Grammar

Overview

The Grammar object defines a recognition grammar, which can be used in all objects that accept caller input (such as Input, Menu, or Confirmation objects). A recognition grammar defines the input a caller can respond with to a particular request prompt. In the voice and video channel, this can be a set of phrases that a caller might say or DTMF keys they might press on the phone’s keypad. In the text and Web channel, the grammar can define a set of options; depending on whether and how the grammar is displayed, the caller can either select an option or respond with free text, which the server then maps to slot values.

Tip: To develop dynamic speech or DTMF grammars effectively, you can use a grammar tool like NuGram IDE, which provides a complete set of Eclipse tools to author, test, and debug speech or DTMF recognition grammars. For further details and a free download, refer to the Nu Echo Website (www.grammarserver.com) or the VoiceObjects Developer Portal (http://developers.voiceobjects.com/partners/).

The Object Definition below covers the configuration of the Grammar object with VoiceObjects Desktop. For information on how to define this object type using VoiceObjectsXML, refer to the VoiceObjectsXML Definition paragraph.

The Grammar object belongs to the object category Resources.

Object Definition

The Definition of the Grammar object provides one section:

·          Grammar



For further details regarding additional object configuration refer to Precondition and Properties in this Object Reference.

Grammar

In the Grammar section, one or more Grammar items can be defined.

Note: The structure of the Grammar editor in Desktop for Web is slightly different, but the provided fields are the same.

Each Grammar Item section contains the following fields:


Property | Description
Label | Optional parameter to identify the Grammar item in a list.
Layer | Optional parameter to define a layer condition for the specific Grammar item. At call time, only Grammar items with layer conditions that evaluate to true are activated. For more information on layers, see the Layer object in this Object Reference, or Chapter 7 – How to Use Layers in the Design Guide. Note that this layer condition only refers to the individual Grammar item. A precondition for the entire Grammar object can be defined within the top-level Precondition section of the Grammar editor.
Language | Defines the language layer the grammar belongs to. At call time, the server only activates the grammars of the currently active language layer. For example, if the currently active language is en-UK, only grammars on either the language layer English (United Kingdom) or English are activated.
Channel | Defines the channel layer the grammar belongs to. At call time, the server only activates the grammars of the currently active channel layer. For example, if the currently active channel is voice, only grammars with the channel layer set to voice are processed. If a grammar is valid for the voice and video channel, or for the text and Web channel, use the combined entries Voice/Video or Text/Web, respectively.
Weighting | Optional parameter to define the relative weighting of this grammar compared to other grammars active at a certain dialog step. This can be used to indicate that some caller inputs are more or less likely to occur than others. The actual effect depends on the way the underlying media platform has implemented this feature. This setting is only relevant in the voice and video channel. If the weight is specified via a Variable, Expression, Layer, or Script object, the weight factor needs to be specified as a numerical value, e.g. 0.2 for 20% or 5.0 for 500%. Note that this field is a combo field, i.e. you can also type in any arbitrary real number.

 

Settings in these fields apply to all subsections specified later on (see below).

If multiple Grammar items exist for the same language and layer in the voice or video channel, all of them are rendered into the resulting VoiceXML. The media platform then merges them into one combined grammar. In the text and Web channel, only the first definition (in object editor order) will be used.
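The activation rules described so far can be summarized in a short Python sketch. This is only an illustration of the documented behavior, not the server's implementation: the GrammarItem class, the function names, and the lower-case channel strings are hypothetical, layer conditions are ignored, and the language fallback (an item on the layer "en" also matches an active language such as "en-US") is an assumption derived from the English (United Kingdom) / English example above.

```python
from dataclasses import dataclass


@dataclass
class GrammarItem:
    label: str
    language: str  # e.g. "en-US" or the parent language "en" (hypothetical encoding)
    channel: str   # "voice", "video", "text", "web", "voice/video", "text/web", "default"


def language_matches(item_language, active_language):
    """True if the item is on the active language layer or on its parent language."""
    item = item_language.lower()
    active = active_language.lower()
    return item == active or item == active.split("-")[0]


def channel_matches(item_channel, active_channel):
    """True if the item's channel layer covers the active channel."""
    channel = item_channel.lower()
    return channel == "default" or active_channel.lower() in channel.split("/")


def active_items(items, active_language, active_channel):
    """Return the Grammar items that would be activated for the current call."""
    matching = [
        item for item in items
        if language_matches(item.language, active_language)
        and channel_matches(item.channel, active_channel)
    ]
    # Text and Web channel: only the first definition in editor order is used.
    if active_channel.lower() in ("text", "web"):
        return matching[:1]
    # Voice and video channel: all matching items are rendered and merged.
    return matching
```

With this sketch, a voice call in en-US activates every matching item, while a Web session with the same items keeps only the first match.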

In addition, each Grammar Item section contains the following three subsections (represented by tabs in Desktop for Web):

·         Voice/Text
To define a voice grammar in the voice and video channel, or to define a text grammar in the text and Web channel.

·         DTMF
To define a DTMF grammar in the voice and video channel. In the text and Web channel, any definition in this section will be ignored.

·         Preprocessing
To define initial processing that dynamically builds a grammar at call time.

Voice/Text

In the Voice/Text section, the content of the grammar is defined. In case of the voice and video channel you define a grammar for voice input, which is specified through an embedded definition that is stored within the Grammar object, or through a reference to an external file. In case of the text and Web channel, you define a grammar for text input, which needs to be an embedded definition.

Embedded Grammar Definition
For an embedded definition, you can type the grammar code into the Grammar field (or text area in Desktop for Web). By default, the TTG check box is selected, indicating that the grammar is defined in text-to-grammar format. This enables you to define grammars via comma-separated lists of utterances without having to deal with the specific grammar formats required by the underlying media platform. TTG is the only format supported when writing grammars in the text and Web channel. More details can be found in the Text-to-Grammar paragraph.

If you clear the TTG check box, you need to make sure that the grammar format you provide is understood by the media platform you are deploying your application on. For more details, see the Grammar Formats paragraph.

In the voice and video channel, you can explicitly use a <grammar> … </grammar> tag pair in your embedded non-TTG grammar definition; if you do so, the definition is copied verbatim into the VoiceXML code that is generated by the server at call time. Note, however, that if XML-related characters such as “<” or “>” are used within the grammar code itself, it needs to be wrapped in CDATA.
If your embedded definition does not use a <grammar> tag, the server automatically wraps it at call time, inserting the appropriate settings for grammar type, grammar weighting, etc.

Embedded grammar definitions can also be provided through references to Variable, Expression, Collection, Layer, or Script objects. If an object reference is used as an embedded definition, the respective object is evaluated at call time and its value in turn becomes the definition for the grammar.

External Grammar Reference
Besides using embedded definitions, the voice and video channel allows a grammar to be defined by referencing an external file. To do so, specify the following:


Property | Description
Location | Optional parameter to specify a location (using a Resource Locator object).
File | Defines the filename for the grammar file to be used.
Rule | Optional parameter to define the starting rule to be used within the grammar file.
Extension | Defines the file extension for the grammar, which is appended to the filename. If you include the extension in the File definition, set the extension to None. If you define an extension other than None, this setting can be overwritten by the corresponding setting in the service definition. For more information on the definition of services, refer to Chapter 2 – Configuring Servers and Services in the Deployment Guide.

Caution: Not all media platforms support the use of grammar files with multiple public rules that can be accessed using the Rule field. Some media platforms only allow one public rule per grammar file. In this case, the Rule field should be left empty.

The URL that is created within the markup code to reference the grammar at call time has the following structure:

[Location] + [File] + [#] + [Rule]

The “#” character is only inserted if both a file and a rule are defined.
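As a rough illustration of this structure, the following Python sketch assembles the reference from its parts. The function name grammar_url and the sample values are hypothetical; the sketch follows the formula literally and does not model the Extension field or any service-level overrides.

```python
def grammar_url(location="", file="", rule=""):
    """Build [Location] + [File] + [#] + [Rule] as described above."""
    # The "#" separator is only inserted if both a file and a rule are defined.
    separator = "#" if file and rule else ""
    return location + file + separator + rule
```

For example, a location of http://example.com/grammars/ with the file cities.grxml and the rule main would yield http://example.com/grammars/cities.grxml#main under these assumptions.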

In a similar manner as described for the embedded grammar definition, filename and rule name can also be defined through a Variable, Expression, Layer, or Script object.

Tip: When working with Voxeo Prophecy and a third-party ASR engine, external grammar files are usually loaded by Prophecy and converted to a format appropriate for the ASR engine. This may prevent the ASR engine from caching the file reference. To bypass the conversion mechanism, use the Tuning property Resource Fetching – Grammar Fetch Options to set voxeo:useuri=true. This forces Prophecy to simply send the file reference itself to the ASR engine. Note that in this case the grammar file must be in a format that can be understood directly by the ASR engine.

Precedence
If both an external grammar reference and an embedded grammar definition are present, use the Precedence setting to decide which one to use. Select Embedded:External if the embedded definition is to be used. Select External:Embedded if the external reference is to be used. If you leave the setting at Default, the Service object definition of this property is used. See Configuring a Service in Chapter 2 – Configuring Servers and Services of the Deployment Guide for more information on how to set the grammar mode in a Service object.

Note: If you want to add more rules to an existing external grammar, you can define these extra rules as an embedded grammar and use the marker @SOURCE@, which will be replaced with the resource locator path, the filename, and the extension of the external file definition at call time.

Type
The Type field defines the format of the grammar and is only relevant in the voice and video channel. A corresponding attribute is rendered into the VoiceXML <grammar> tag at call time. A Type field set to Default indicates that the grammar format defined in the corresponding Service object is used. For more details, refer to Chapter 2 – Configuring Servers and Services in the Deployment Guide. If no format is to be set, select None. Note that only those types are allowed that are supported by the media platform. If you select a type that is not supported, the server renders the default grammar type for the selected platform instead.

Built-in grammars
Special types of grammars are built-in for standard inputs such as yes/no, a date, a phone number etc. The support for built-in grammars depends on the media platform used; in general, only voice and video applications based on VoiceXML can support built-in grammars. Check the documentation of your media platform to see which built-in grammars are provided. See Appendix A – Media Platform Drivers in the Deployment Guide to find out if your platform supports built-in grammars at all.

Note that built-in is not the same type as precompiled. Precompiled grammars are a special type of grammar offered by some media platforms.

To make use of built-in grammars in Input objects, set the Type field to Built-in and leave the Slot name field in the Result Handling empty. In the text area for embedded grammars, enter the type of grammar required. The following types are possible: boolean, date, digits, currency, number, phone, and time. Proprietary types provided by the platform can also be used. For more information on the standard types, refer to the VoiceXML 2.0 specification at http://www.w3.org/TR/voicexml20, Appendix P – Builtin Grammar Types.

For voice and DTMF grammars in the voice and video channel, the type digits can be parameterized in order to set a minimum and/or maximum length for digit sequences. Use minlength, maxlength, and length to set the minimum, maximum, or exact length, respectively.
For DTMF grammars, the type boolean can be parameterized in order to define which key corresponds to an affirmative and a negative answer. Use y and n to set the keys for yes and no, respectively.
The parameters are appended to the type itself.

Examples:

digits?length=4

digits?minlength=2;maxlength=4

boolean?y=1;n=0
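To make the parameter notation concrete, the following Python sketch splits such a definition into the type name and its parameters. It only illustrates the syntax shown in the examples above (a "?" after the type, parameters separated by ";"); the function name parse_builtin is hypothetical, and the sketch does not check whether a given parameter is valid for a given type.

```python
def parse_builtin(spec):
    """Split e.g. 'digits?minlength=2;maxlength=4' into ('digits', {...})."""
    name, _, query = spec.partition("?")
    params = {}
    for part in query.split(";"):
        if part:
            key, _, value = part.partition("=")
            params[key.strip()] = value.strip()
    return name.strip(), params
```

Parameter values are kept as strings, since the media platform, not this sketch, decides how they are interpreted.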

DTMF

In the DTMF section, a DTMF grammar can be defined for touchtone input using the keypad of the phone. Any definition made here will only be applied in the voice and video channel.

The corresponding fields are structurally the same as for Voice/Text, and the same mechanisms apply. Note, however, that according to the VoiceXML specification the weight entry (defined in the Weighting field) has no effect for DTMF grammars.

Note: When using the hash key ‘#’ in a TTG grammar, you need to set the tuning parameter DTMF – Termination Character to “” (the empty string). Otherwise the media platform will interpret the hash key ‘#’ as the termination key to end a sequence of DTMF keys, and not as a key on its own.

Some media platforms allow defining DTMF input within a voice grammar. For these platforms, the use of the DTMF section is optional; for portability reasons, however, using it is recommended. If your application only allows one input mode at a time (either voice or DTMF), VoiceObjects Server renders only the grammar that corresponds to the currently active input mode.

Caution: Some media platforms do not support slot assignments in DTMF grammars. These platforms always return the DTMF keys pressed, but no semantic interpretation. If you use TTG for DTMF grammars, this will be handled by the server, i.e. in this case slot assignments will work as usual. If you use external, or inline but non-TTG grammars, you need to be aware of this restriction. See Appendix A – Media Platform Drivers in the Deployment Guide to find out how your media platform handles this.

Preprocessing

The Preprocessing section enables the dynamic generation of grammars at call time and is supported in all four channels. In certain cases, e.g. when maintaining a shopping cart within an application that the user can remove items from, the specific entries in a grammar are not known at design time.

Both the Voice/Text and the DTMF definition can make use of Variable, Expression, Collection, Layer, or Script objects. At call time, the server first processes whatever is defined in Preprocessing, and then evaluates the Voice/Text and DTMF sections to generate the appropriate grammar definitions. This enables you to fill variables with certain values within the Preprocessing and then use those values in the Voice/Text or DTMF definitions.

For a definition of the individual fields in the Preprocessing section, refer to the Connector object.

The figures below provide an example for the handling of a shopping cart. The Preprocessing section specifies a Java connector that fills a Variable object called Shopping cart grammar with the appropriate grammar. This Variable object is then used as an embedded definition in the Voice/Text section.


 

Text-to-Grammar

When building voice or video applications in a rapid prototyping approach, it is often desirable to start the definition of a grammar by just providing a couple of typical utterances a caller might say. The application designer does not need to know the exact grammar syntax required by the media platform. The text-to-grammar (TTG) functionality is built to support this approach. A TTG definition consists of a comma-separated list of possible caller inputs, optionally enriched with slot values.

While in the voice and video channel TTG is one of several options to define grammars, in the text and Web channel it is the only supported format.

The following paragraphs apply to all channels and describe how to build TTG grammars. Find examples for the various channels at the end of this paragraph.

The tokens of a TTG grammar may contain multiple words, as in the following example:

car insurance, life insurance, health insurance

In certain cases it is desirable to define a slot return value that is associated with a specific utterance. This can be done by providing the slot value within parentheses behind each token:

car insurance (car), life insurance (life), health insurance (health)

If no slot values are defined, the utterance itself is returned as the slot value.

When using TTG, the result is returned in a default slot called sltTTG for all media platforms certified with the VoiceObjects platform (note, though, that it is possible to define a different slot name through a media platform driver setting if required). This slot name must be used in the Result Handling of an Input object to assign it to a Variable or Layer object. It is also possible to define different slot names within one TTG grammar, using a ‘#’ notation within the parentheses:

car insurance (#sltChoice# car), life insurance (#sltChoice# life),
health insurance (#sltChoice# health)

In this example, all three utterances fill a slot sltChoice. For reasons of convenience, you can define the slot name once and then leave it out for all following utterances:

car insurance (#sltChoice# car), life insurance (life), health insurance (health), premium customer (#sltCustomer# premium), gold customer (premium)

In this example, saying car insurance, life insurance or health insurance will fill the slot sltChoice with car, life, or health, respectively, and saying premium customer or gold customer will fill sltCustomer with premium.
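The slot rules described above can be sketched in Python as follows. This is an illustrative reading of the TTG notation, not the parser used by the server: the names parse_ttg, DEFAULT_SLOT, and TOKEN are hypothetical, and the sketch handles neither escaped characters nor markers such as @SELECT@.

```python
import re

DEFAULT_SLOT = "sltTTG"

# utterance, optionally followed by "(value)" or "(#slotName# value)"
TOKEN = re.compile(
    r"^(?P<utt>[^()]+?)\s*"
    r"(?:\(\s*(?:#(?P<slot>[^#()]+)#)?\s*(?P<value>[^()]*)\))?$"
)


def parse_ttg(definition):
    """Return a list of (utterance, slot name, slot value) tuples."""
    entries = []
    slot = DEFAULT_SLOT
    for raw in definition.split(","):
        token = raw.strip()
        if not token:
            continue
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"cannot parse TTG token: {token!r}")
        utterance = match.group("utt").strip()
        # A slot name stays in effect until another one is given.
        if match.group("slot"):
            slot = match.group("slot")
        # Without an explicit slot value, the utterance itself is returned.
        value = (match.group("value") or "").strip() or utterance
        entries.append((utterance, slot, value))
    return entries
```

Applied to the last example above, the sketch assigns car, life, and health to sltChoice, and premium to sltCustomer for both premium customer and gold customer.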

Some ASR engines support garbage matching; this enables the media platform to match any kind of caller input between certain keywords. In TTG, you can make use of garbage by working with the placeholder @GARBAGE@. This placeholder can be put at any position within one utterance. Consider the following example:

@GARBAGE@ car insurance, @GARBAGE@ life insurance, @GARBAGE@ health insurance

This grammar matches utterances such as “I need information on car insurance”, “Please give me info on life insurance”, “life insurance”, etc. If your media platform does not support garbage, the placeholder is removed and has no further effect. Consult your media platform vendor for more information on this.

In the text and Web channel, the utterances are displayed on the screen as options if the tuning parameter Presentation – Input is set to radio, list, or menu. Commas and parentheses are reserved characters in the TTG syntax; to use them in the output, escape them with a backslash. Consider the following example, where the grammar is to be displayed as a menu:

Monday\, 3rd Jan \(today\) (today), Tuesday\, 4th Jan \(tomorrow\) (tomorrow)

This would be shown as “Monday, 3rd Jan (today)” and “Tuesday, 4th Jan (tomorrow)”, with the slot values today and tomorrow, respectively.
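The effect of escaping can be illustrated with a small Python sketch that splits a TTG definition on unescaped commas and separates the display text from the slot value. The function names are hypothetical and the sketch covers only the escaping behavior described above; slot names and markers are not handled.

```python
def split_escaped(definition):
    """Split a TTG definition on commas that are not preceded by a backslash."""
    tokens, current, i = [], [], 0
    while i < len(definition):
        ch = definition[i]
        if ch == "\\" and i + 1 < len(definition):
            current.append(definition[i:i + 2])  # keep the escape for later
            i += 2
            continue
        if ch == ",":
            tokens.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    tokens.append("".join(current).strip())
    return [token for token in tokens if token]


def parse_option(token):
    """Return (display text, slot value) for one token, resolving escapes."""
    display, value = [], []
    in_value = has_value = False
    i = 0
    while i < len(token):
        ch = token[i]
        if ch == "\\" and i + 1 < len(token):
            (value if in_value else display).append(token[i + 1])
            i += 2
            continue
        if ch == "(" and not in_value:
            in_value = has_value = True
        elif ch == ")" and in_value:
            in_value = False
        else:
            (value if in_value else display).append(ch)
        i += 1
    text = "".join(display).strip()
    return text, ("".join(value).strip() if has_value else text)


def parse_options(definition):
    return [parse_option(token) for token in split_escaped(definition)]
```

For the menu example above, this yields the display texts "Monday, 3rd Jan (today)" and "Tuesday, 4th Jan (tomorrow)" with the slot values today and tomorrow.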

If the tuning parameter Presentation – Input is set to radio or list in a Web channel application, it is desirable to mark one utterance as pre-selected, so the browser can display it accordingly. This can be achieved by using the marker @SELECT@ within the parentheses, in front of potential slot assignments. Have a look at the following examples:

male (male), female (@SELECT@female)

male (#Gender#male), female (@SELECT@#Gender#female)

These are functionally identical, with the difference that the second one mentions the slot name explicitly. By using the @SELECT@ marker for female, the browser will be instructed to pre-select this option in the display.

In the Web channel, multi-slot Input objects are possible. The TTG grammar must define the set of options per slot in this case. The following example shows how to do this:

male(#Gender#), female (@SELECT@), ? (#Age#)

This TTG grammar requires two slots named Gender and Age in the corresponding Input object. If these are not present, processing of this object will fail. For the slot called Gender, two options are defined. If no value for the tuning parameter Presentation – Input is defined for this slot name, the server will show these options as radio buttons. For the Age slot, the server will generate an input field for free entry of text. At the bottom of the resulting page, a submit button is shown so that the caller can submit their selection.

TTG can also build grammars on the basis of Collection objects. For single-slot grammars using the standard slot sltTTG, this can be done in two different ways: If the collection has a single column (named utterance), then this column is used for both utterances and slot values as described in the first case above. The corresponding collection would look like this:

<root>
  <row><col name="utterance">car insurance</col></row>
  <row><col name="utterance">life insurance</col></row>
  <row><col name="utterance">health insurance</col></row>
</root>

Alternatively, slot values and multi-lingual grammars can be defined via multi-column collections like this:

<root>
  <row>
    <col name="en-US">car insurance</col>
    <col name="de-DE">autoversicherung</col>
    <col name="key">car</col>
  </row>
  <row>
    <col name="en-US">life insurance</col>
    <col name="de-DE">lebensversicherung</col>
    <col name="key">life</col>
  </row>
  <row>
    <col name="en-US">health insurance</col>
    <col name="de-DE">krankenversicherung</col>
    <col name="key">health</col>
  </row>
</root>

In this format, the column “key” defines the slot return value while columns with language code names such as “en-US” or “de-DE” define the corresponding utterances. A column “default” may be provided, which is used if no matching column exists for the active language setting. If more than one utterance should return the same slot value, separate rows with the same “key” value need to be defined, i.e. you cannot use a comma-separated list of utterances in an utterance column.
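How such a collection maps to utterances and slot values can be sketched in Python as follows. This is an assumption-laden illustration rather than the server's algorithm: the function name collection_to_entries is hypothetical, the fallback to the "default" column is applied per row, and "slotname" columns are not considered.

```python
import xml.etree.ElementTree as ET


def collection_to_entries(xml_text, language):
    """Return (utterance, slot value) pairs for the active language."""
    root = ET.fromstring(xml_text)
    entries = []
    for row in root.findall("row"):
        cols = {col.get("name"): (col.text or "").strip() for col in row.findall("col")}
        # Single-column form, then the active language, then the "default" column.
        utterance = cols.get("utterance") or cols.get(language) or cols.get("default")
        if not utterance:
            continue
        # Without a "key" column, the utterance doubles as the slot value.
        entries.append((utterance, cols.get("key") or utterance))
    return entries
```

Calling the function with the multi-column collection above and the language de-DE would return autoversicherung, lebensversicherung, and krankenversicherung with the slot values car, life, and health.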

For grammars that need to set more than one slot, or single-slot grammars that use a slot name other than sltTTG, you can add a column “slotname” that defines, for every row, the name of the slot to be filled. An example is:


<root>
  <row>
    <col name="en-US">car insurance</col>
    <col name="de-DE">autoversicherung</col>
    <col name="slotname">sltChoice</col>
    <col name="key">car</col>
  </row>
  <row>
    <col name="en-US">life insurance</col>
    <col name="de-DE">lebensversicherung</col>
    <col name="slotname">sltChoice</col>
    <col name="key">life</col>
  </row>
  <row>
    <col name="en-US">health insurance</col>
    <col name="de-DE">krankenversicherung</col>
    <col name="slotname">sltChoice</col>
    <col name="key">health</col>
  </row>
</root>

Note: It is not possible to simply enter the code of the collection into the Grammar field within the Grammar editor. If you want to use it for TTG you need to insert the actual Collection object.

Using TTG in the voice and video channel has the important advantage that grammars are automatically created in a format appropriate for the media platform used. In particular, media platforms can be switched without having to make any changes to the application itself.

The grammar type in which the TTG grammar should be created (e.g. Nuance GSL or SRGS XML) can be determined in a variety of ways. The recommended approach is to define the desired type in the corresponding Service object, and leave all individual Grammar items set to Default.

Not all media platforms support all of the available grammar types. Therefore, if the TTG check box is selected and a grammar type is chosen that is not known to be supported by the current media platform, the server reverts to the default grammar type for the media platform. This is to ensure that TTG definitions always produce grammars that work with the underlying media platform.

Finally, it should be kept in mind that TTG is intended for rapid prototyping and small to medium-sized grammars. As TTG in the voice and video channel creates an inline <grammar> element embedded within the VoiceXML code sent to the media platform, no caching of these grammars is possible. Therefore, large grammars should reference external files, which will enable caching and improve overall system performance.

Note: When using the hash key ‘#’ in a TTG grammar, you need to set the tuning parameter DTMF – Termination Character to “” (the empty string). Otherwise the media platform will interpret the hash key ‘#’ as the termination key to end a sequence of DTMF keys, and not as a key on its own.

The following examples illustrate different uses of TTG, depending on the channel.

The first example shows the use of TTG in a Grammar object embedded in a Hyperlink object that allows jumping back to a top-level menu. Here, no slot is required. The TTG grammar is defined for the voice channel.


 

The same Hyperlink object could be re-used for the text and Web channel, by defining appropriate additional Grammar items using TTG.
In the text channel, the grammar is shown as the access key of the option that the USSD browser generates for the hyperlink. Therefore, define a digit as the grammar for this hyperlink, e.g. 0 (zero). See Chapter 10 – How to Support Multiple Phone Channels in the Design Guide for examples of how the display of a text application could look.
In the Web channel, the grammar will not be processed at all, only the presentation output will be used to display the hyperlink. Nevertheless, a grammar definition is mandatory in all channels. So either re-use the text channel grammar definition in case your application should support both channels, or use the question mark “?” as a placeholder definition for a pure Web grammar:

 

 

The second example shows an Input object that makes use of the slot sltTTG without explicitly defining slot values in the TTG grammar definition. After the caller has selected a flavor, the utterance itself is returned in the slot. Thus the Variable object Flavor contains vanilla, chocolate, strawberry, or banana.

 

 

Note that in case of a single slot name, the slot name definition can be left empty, for convenience reasons. Since TTG grammars provide one slot only, you can always leave the Slot name field empty when making use of TTG grammars.

This grammar works in all four channels, therefore the Channel setting is left at Default. In the voice and video channel, it allows the caller to say one of the four mentioned flavors. In the text and Web channel, a TTG grammar defining between 1 and 5 utterances is displayed on the screen as a set of options, as it is best practice to offer the caller a range of predefined choices rather than to let them type in their response manually. How the grammar is shown (e.g. as hyperlinks, radio buttons, or a drop-down list in the Web channel) can be defined using the tuning parameter Presentation – Input and is described in more detail in the Input object in this Object Reference.

The final example shows an Input object that makes use of explicit slot values, which is again supported by all channels. After the caller has chosen an insurance type by saying or selecting car insurance, health insurance, or life insurance the Variable object Insurance type contains the slot value car, health, or life.

 

 

Note: Not all voice and video platforms can handle blanks in slot return values. Thus when using utterances in TTG that contain blanks, it may be required for these platforms to provide an explicit slot return value for this utterance that does not contain blanks, e.g. “car insurance (car_insurance)”.

Utterance Validation

When building applications for the text or Web channel, the user is sometimes asked to provide free-form input that must still meet certain criteria, such as an account number that must be a seven-digit number not starting with 0.

While the application can certainly perform such validation and loop back to the respective input state if it is not met, it is often more desirable to have a concept similar to the No Match condition known from the voice channel. This can be achieved by using the utterance validation mechanism built into VoiceObjects Server.

To enable utterance validation, the grammar definition must be provided in the format

regex:RegularExpression:regex (#slotName#)

where RegularExpression represents any regular expression and slotName stands for the name of the slot to be validated. Multiple such statements may be used, separated by commas, for multi-slot inputs in the Web channel (though only a single regular expression is allowed for each individual slot). For single nameless slots, the slotName part may be omitted.

As an example, if a single-slot input is supposed to be a seven-digit number not starting with 0, then the following grammar definition could be used:

regex:[1-9][0-9]{6}:regex

If the caller’s input matches the regular expression, the dialog proceeds normally with the next object in the dialog flow.

If the input does not match the regular expression, a No Match event is triggered. This can be handled in the usual way using an event handler; for more information refer to Event Handling. Note that no event counter is kept, so multiple iterations of invalid caller inputs will always trigger the same No Match event handler. If different behavior is desired for different levels, the application may keep its own counter within the No Match event handling.

In the No Match handler, you can use the expression LASTRESULT(validation) to retrieve a comma-separated list of those slots where input validation against the respective regular expressions failed. LASTRESULT(validation, Slotname) will return true if validation succeeded for the specified slot, false if it failed.
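The validation behavior can be approximated with the following Python sketch. It rests on several assumptions that the text above does not confirm: the whole input must match the expression (full match rather than substring search), Python's regular expression syntax is close enough to the dialect used by the server, and a nameless slot is represented by the empty string. The names STATEMENT, parse_validation, and failed_slots are hypothetical.

```python
import re

# One statement: regex:RegularExpression:regex, optionally followed by (#slotName#)
STATEMENT = re.compile(r"regex:(?P<pattern>.+?):regex(?:\s*\(#(?P<slot>[^#]+)#\))?")


def parse_validation(definition):
    """Map each slot name ('' for a nameless slot) to its regular expression."""
    return {
        match.group("slot") or "": match.group("pattern")
        for match in STATEMENT.finditer(definition)
    }


def failed_slots(definition, inputs):
    """Return the slots whose input does not match; empty if all inputs are valid."""
    rules = parse_validation(definition)
    return [
        slot for slot, pattern in rules.items()
        if re.fullmatch(pattern, inputs.get(slot, "")) is None
    ]
```

A non-empty result corresponds to the situation in which a No Match event would be triggered; the returned list plays a role comparable to the list of failed slots retrieved with LASTRESULT(validation).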

Grammar Formats

Three major grammar formats are currently in use in voice and video applications: GSL, JSGF, and XML. The table below provides a sample for each of them.

 

Grammar Format

Sample Grammar

GSL

Highway
(
  ?[freeway highway route]
  [
    (one oh one)              {<sltHighway "101">}
    (one hundred one)         {<sltHighway "101">}
    (one hundred and one)     {<sltHighway "101">}
    (two eighty)              {<sltHighway "280">}
    (two hundred eighty)      {<sltHighway "280">}
    (two hundred and eighty)  {<sltHighway "280">}
    (six eighty)              {<sltHighway "680">}
    (six hundred eighty)      {<sltHighway "680">}
    (six hundred and eighty)  {<sltHighway "680">}
  ]
)

JSGF

public <Highway> = [freeway | highway | route]
                   (
                     (one oh one)              {this.sltHighway="101"} |
                     (one hundred one)         {this.sltHighway="101"} |
                     (one hundred and one)     {this.sltHighway="101"} |
                     (two eighty)              {this.sltHighway="280"} |
                     (two hundred eighty)      {this.sltHighway="280"} |
                     (two hundred and eighty)  {this.sltHighway="280"} |
                     (six eighty)              {this.sltHighway="680"} |
                     (six hundred eighty)      {this.sltHighway="680"} |
                     (six hundred and eighty)  {this.sltHighway="680"}
                   );

SRGS XML

<grammar xml:lang="en-US" version="1.0" root="Highway">
<rule id="Highway" scope="public">
  <item repeat="0-1">
    <one-of>
      <item> freeway </item>
      <item> highway </item>
      <item> route   </item>
    </one-of>
  </item>
  <one-of>
    <item tag="sltHighway='101'"> one oh one              </item>
    <item tag="sltHighway='101'"> one hundred one         </item>
    <item tag="sltHighway='101'"> one hundred and one     </item>
    <item tag="sltHighway='280'"> two eighty              </item>
    <item tag="sltHighway='280'"> two hundred eighty      </item>
    <item tag="sltHighway='280'"> two hundred and eighty  </item>
    <item tag="sltHighway='680'"> six eighty              </item>
    <item tag="sltHighway='680'"> six hundred eighty      </item>
    <item tag="sltHighway='680'"> six hundred and eighty  </item>
  </one-of>
</rule>
</grammar>

 

The example grammars shown above each fill a slot called sltHighway with the number of the highway the caller entered. Note that while the utterances themselves are defined textually (six eighty), the slot values are digit strings (680). This is preferable because it decouples the semantic interpretation from the actual utterances and simplifies the business logic used within the application itself. Because the slot holds the number rather than the text, it is very easy, for instance, to run exactly the same application in Spanish with only the utterances defined in the grammar being different.
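The benefit of digit-string slot values can be sketched outside the grammar formats. The following Python snippet is an illustration only and not VoiceObjects code: the dictionaries, the function names, and the Spanish phrasings are hypothetical and chosen purely to show that the logic consuming sltHighway does not need to change when the utterances do.

```python
# Illustrative sketch only; none of these names are part of VoiceObjects.
# Each language has its own utterances, but all map to the same slot values.
EN_UTTERANCES = {
    "one oh one": "101",
    "one hundred one": "101",
    "one hundred and one": "101",
    "two eighty": "280",
    "six eighty": "680",
}

# Hypothetical Spanish phrasings, chosen only for illustration.
ES_UTTERANCES = {
    "ciento uno": "101",
    "doscientos ochenta": "280",
    "seiscientos ochenta": "680",
}


def fill_slot(utterance, utterances):
    """Return the sltHighway value for a recognized utterance, or None."""
    return utterances.get(utterance.strip().lower())


def describe_highway(slt_highway):
    """Business logic that depends only on the slot value, not the language."""
    return "Highway " + slt_highway


print(describe_highway(fill_slot("six eighty", EN_UTTERANCES)))
print(describe_highway(fill_slot("seiscientos ochenta", ES_UTTERANCES)))
```

Both calls reach describe_highway with the same value, so only the utterance tables differ between languages.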

Note: Some media platforms require grammars to be explicitly tagged with a language code. When you use an embedded grammar definition that contains explicit <grammar> … </grammar> tags, you are therefore responsible for providing this attribute yourself. As a convenience, you can use the marker @LANGUAGE@, which the server replaces with the currently active language at call time when it creates the markup code.
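The effect of the @LANGUAGE@ marker can be pictured as a plain text substitution. The Python sketch below is an assumption about the observable result only, not the server's actual implementation; the function name expand_language_marker is hypothetical.

```python
# Sketch of the marker substitution described above.
# expand_language_marker is a hypothetical name, not a VoiceObjects API.
def expand_language_marker(grammar_markup, active_language):
    """Replace every @LANGUAGE@ marker with the active language code."""
    return grammar_markup.replace("@LANGUAGE@", active_language)


embedded = '<grammar xml:lang="@LANGUAGE@" version="1.0" root="Highway">'
print(expand_language_marker(embedded, "en-US"))
```

With the active language set to en-US, the marker in the xml:lang attribute is replaced by en-US and the rest of the definition is left untouched.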

VoiceObjectsXML Definition

The Grammar object is represented by the VoiceObjectsXML element <grammar>. It has two groups of children.

In addition, the element has the standard attributes described in the XDK Guide.

The <grammar> element uses the embedded <grammarItem> element.

Grammar

Children

·          <expression usage="precondition"> or
<variable usage="precondition"> or
<collection usage="precondition"> or
<script usage="precondition">
Defines the precondition for the Grammar object.

·          +<grammarItem>
Defines the list of Grammar items.

 

Example

<grammar>

  <grammarItem language="en-US">

    <grammarDefinition mode="voice">

      one, two, three

    </grammarDefinition>

    <grammarDefinition mode="dtmf">

      1, 2, 3

    </grammarDefinition>

  </grammarItem>

  <grammarItem language="de-DE">

    <grammarDefinition mode="voice">

      eins, zwei, drei

    </grammarDefinition>

    <grammarDefinition mode="dtmf">

      1, 2, 3

    </grammarDefinition>

  </grammarItem>

</grammar>
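To check a definition like the one above before importing it, the structure can be read with any XML parser. The following Python sketch uses the standard library's xml.etree.ElementTree. It is only an illustration: the helper name summarize is hypothetical, and splitting the embedded text at commas is an assumption made for display purposes, not a description of how the server processes the definition.

```python
import xml.etree.ElementTree as ET

# The example above, written with straight quotes so that it is well-formed XML.
VOICEOBJECTS_XML = """
<grammar>
  <grammarItem language="en-US">
    <grammarDefinition mode="voice">one, two, three</grammarDefinition>
    <grammarDefinition mode="dtmf">1, 2, 3</grammarDefinition>
  </grammarItem>
  <grammarItem language="de-DE">
    <grammarDefinition mode="voice">eins, zwei, drei</grammarDefinition>
    <grammarDefinition mode="dtmf">1, 2, 3</grammarDefinition>
  </grammarItem>
</grammar>
"""


def summarize(xml_text):
    """Map (language, mode) to the list of comma-separated entries."""
    root = ET.fromstring(xml_text.strip())
    summary = {}
    for item in root.findall("grammarItem"):
        # The language attribute defaults to "default" when it is omitted.
        language = item.get("language", "default")
        for definition in item.findall("grammarDefinition"):
            text = definition.text or ""
            entries = [entry.strip() for entry in text.split(",") if entry.strip()]
            summary[(language, definition.get("mode"))] = entries
    return summary


for key, entries in summarize(VOICEOBJECTS_XML).items():
    print(key, entries)
```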

GrammarItem

Attributes

·          label
A text string providing a name for the Grammar item.

·          language
Defines the language for the Grammar item. Can be default or a valid language code (e.g. de-DE or en-US). If not specified, defaults to default.
Appendix A – Language Codes contains a list of all language codes available in VoiceObjects together with the respective language they represent.

·          layer
Defines the layer for the Grammar item. Can either be a reference to a Collection, Expression, Script, or Variable object; or a layer state reference of the form “Layer=State” or “Layer!=State” where “State” is the label of a state for the layer “Layer”.

·          channel
Defines the channel(s) for which this Grammar item is valid. Can be default, voice, video, text, web, voiceVideo, or textWeb. If not specified, defaults to default.

·          weighting
Defines the relative weighting of the grammar. Can be disabled or a positive floating-point value; for example, 0.5 indicates 50% weight and 1.7 indicates 170% weight. If not specified, defaults to disabled.
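The layer state reference form accepted by the layer attribute ("Layer=State" or "Layer!=State") can be illustrated with a small evaluator. This Python sketch is a simplified model, not the server's logic: the function names, the layer name Persona, and its states are hypothetical, and references to Collection, Expression, Script, or Variable objects are not covered.

```python
# Simplified model of layer state references; all names are hypothetical.
def parse_layer_state_reference(reference):
    """Split 'Layer=State' or 'Layer!=State' into (layer, state, negated)."""
    if "!=" in reference:
        layer, state = reference.split("!=", 1)
        return layer.strip(), state.strip(), True
    if "=" in reference:
        layer, state = reference.split("=", 1)
        return layer.strip(), state.strip(), False
    raise ValueError("not a layer state reference: " + reference)


def layer_condition_holds(reference, active_states):
    """Evaluate the reference against a mapping of layer name to active state."""
    layer, state, negated = parse_layer_state_reference(reference)
    matches = active_states.get(layer) == state
    return not matches if negated else matches


active = {"Persona": "Formal"}
print(layer_condition_holds("Persona=Formal", active))
print(layer_condition_holds("Persona!=Formal", active))
```

In this model, a Grammar item with the first condition would be activated and one with the second would not, matching the rule that only items whose layer condition evaluates to true are active at call time.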

 

Children

·          <grammarDefinition mode="voice">
Defines the Grammar item section for voice.

·          <grammarDefinition mode="dtmf">
Defines the Grammar item section for DTMF.

·          <connectorItem>
Defines the preprocessing for the Grammar item. For more information, refer to the Connector object.

 

Example

<grammarItem>

  <grammarDefinition mode="voice" ttg="false" grammarType="xml">

    <![CDATA[<rule id="root" scope="public">operator</rule>]]>

  </grammarDefinition>

</grammarItem>

GrammarDefinition

Attributes

·          mode [required]
Indicates the mode of the Grammar object. Can be either dtmf or voice.

·          location
Defines the location from where the grammar is to be retrieved. Must be a reference to a Resource Locator object.

·          file
Defines the grammar file name, without extension. Can be a constant name, or a reference to an Expression, Script, or Variable object.

·          rule
Defines the rule name to be used within the grammar file. Can be a static name, or a reference to an Expression, Script, or Variable object.

·          grammarPrecedence
Indicates whether the embedded or the external grammar definition takes precedence if both are present. Must be one of embedded:external, external:embedded, or default. If not specified, defaults to default.

·          grammarType
Defines the type of the grammar. Must be one of default, abnf, gsl, jsgf, xml, cisco, precompiled, builtin, none. If not specified, defaults to default.

·          embeddedRef
Defines an embedded grammar through a reference to an Expression, Script or Variable object.

·          ttg
Indicates whether to use TTG on the embedded definition. Ignored if no embedded definition is present. May be true or false. If not specified, defaults to true.

·          grammarFileExtension
Specifies the file extension for the grammar file as either a constant value or a reference to an Expression, Script, or Variable object. Legal constant values are none, fsg, gram, grammar, grm, grxml, gsl, jsgf, ngo, sjv, srgs, txt, xml. If not specified, defaults to grm.
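How the file and grammarFileExtension attributes combine can be sketched as follows. This Python snippet is an assumption about the resulting file name, not the server's implementation: the function name is hypothetical, and treating none as "append no extension" is an interpretation of the attribute description above. Only constant values are handled; references to Expression, Script, or Variable objects are out of scope.

```python
# Constant extension values listed for grammarFileExtension.
LEGAL_EXTENSIONS = {
    "none", "fsg", "gram", "grammar", "grm", "grxml", "gsl",
    "jsgf", "ngo", "sjv", "srgs", "txt", "xml",
}


def grammar_file_name(file, extension="grm"):
    """Build a grammar file name from a base name and a constant extension.

    Hypothetical helper: assumes "none" means no extension is appended and
    uses grm as the default, as stated for grammarFileExtension.
    """
    if extension not in LEGAL_EXTENSIONS:
        raise ValueError("unsupported grammar file extension: " + extension)
    if extension == "none":
        return file
    return file + "." + extension


print(grammar_file_name("cars", "gsl"))
print(grammar_file_name("cars"))
```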

 

Children

·          CDATA
Optional constant embedded grammar definition.

 

Examples

<grammarDefinition mode="voice" ttg="true">

  apple, orange, pear

</grammarDefinition>

 

<grammarDefinition mode="voice" embeddedRef="#MainMenu" />

 

<grammarDefinition mode="voice" location="#Grammar Locator" file="cars" grammarFileExtension="gsl" grammarPrecedence="external:embedded">

  bmw, ford, nissan

</grammarDefinition>

Object Interoperability

The following table contains all object types that can reference a Grammar object:


Object Name

Use Case Example

Input

A Grammar object can be used within an Input object.

Menu

A Grammar object can be used within a Menu item of a Menu object.

Confirmation

A Grammar object can be used within a Correction item of a Confirmation object.

List

A Grammar object can be used for the navigation and selection grammars in the List object.

Hyperlink

A Grammar object can be used within a Hyperlink object.

Pause

A Grammar object can be used as the wake up grammar within a Pause object.

Transfer

A Grammar object can be used as the termination grammar within a Transfer object.

OSDM

A Grammar object can be linked in the parameter set within any OSDM object.

Object Naming Conventions

To make full use of the integrated documentation of VoiceObjects, it is important to provide intuitive and self-explanatory object names and descriptions.

The name of a Grammar object should indicate which types of utterances the grammar covers. The short description should contain a brief explanation of the possible caller utterances, as well as assumptions or restrictions (e.g. the language covered). The table below lists three examples:


Name

Description

Yes/No grammar

Allows the caller to flexibly answer yes/no questions in a variety of ways. US-English.

Date grammar

Allows the caller to enter dates in both absolute format (June first two thousand four) and relative format (next Monday). German.

Mobile phone number grammar

Allows the caller to enter German mobile phone numbers with the area codes 0170, 0171, 0172, and 0173. German.