Output

Overview

The Output object is a configurable dialog component that presents an output to the caller. In the voice and video channel, an output may contain text that is read out through a text-to-speech engine, prerecorded audio or video files, dynamic content, and Silence objects. In the text and Web channel, the output defines what is displayed on the screen of the mobile device. References to Audio, Video, or Silence objects are ignored in these channels.

The Object Definition below covers the configuration of the Output object with VoiceObjects Desktop. For information on how to define this object type using VoiceObjectsXML refer to the VoiceObjectsXML Definition paragraph.

The Output object belongs to the object category Components.

Dialog Flow Scenario

The following dialog flow example describes a simple output from a flight booking application.

Object – Caller

Dialog Flow

Thanks for booking flight 456 from Boston to Chicago.

Object Definition

The Definition of the Output object provides one section:

·          Output
To define the set of Output items.


 

For information regarding additional object configuration refer to Pre-/Postprocessing, Tuning and Properties in this Object Reference.

Note: If an Output object is used within an event handling definition, any pre- or postprocessing defined for that Output object is ignored. This is also the case when a Goto object with an Output object as its destination object is linked into an event handling definition.
An Output object’s pre- and postprocessing is also ignored if the object is referenced within another Output object.

Output

An Output object may contain any number of individual Output items.

 

In each Output Item the following properties can be specified:

Property

Description

Label

Optional parameter to identify the Output item in a list.

Layer

Optional parameter to indicate the custom layer that the Output item belongs to. For more information on layers, see Chapter 7 – How to Use Layers in the Design Guide.

Channel

To identify the channel in which the respective Output item is to be active. If an Output item is appropriate for all channels, the default channel layer setting Default should be used. Otherwise, select one of the four channels Voice, Video, Text, or Web. For convenience, the two combined settings Voice/Video and Text/Web are available for an Output item that is to be active in both channels of the pair. Using a combined setting is equivalent to defining two Output items, one for each individual channel, with all other settings identical.

Language

To identify the language in which the respective Output item is provided. If an Output item is appropriate in all relevant languages, the default language layer setting Default should be used. For languages that are used in multiple countries, both the specific language options (e.g. English (SG), English (US)) and the generic option (English) are provided.

Occurrence

Enables you to play different outputs depending on how often the Output object has already been visited. Always means that the Output item is active regardless of the number of previous visits. Only once means that the Output item is active only the first time the Output object is visited. If >= N means that the Output item is active only if this is at least the Nth visit to the Output object.

Input mode

To tag the Output item with an input mode, so that different outputs can be played depending on the current state of the input mode system layer. This is typically used for applications that are designed to be controlled by voice, by DTMF, or by both. Using this setting you can, for example, define one Output item that is played when the input mode is set to Voice (Please say yes or no), and a different one for input mode DTMF (Please press 1 for yes and 0 for no), both in the same Output object. For more information on designing input mode-specific applications, refer to Chapter 9 – How to Use Different Input Modes in the Design Guide.
You can set the input mode to Voice, DTMF, or Voice+DTMF. If you do not want to distinguish your Output items with regard to input mode, leave the setting at Default. In this case, the Output item is applicable to any input mode. The initial state of the input mode system layer can be set in the Service object. For more details, refer to Configuring a Service in Chapter 2 – Configuring Servers and Services in the Deployment Guide.

Barge-in

Enables barge-in by selecting True or disables it by selecting False.
Barge-in is a feature in the voice and video channel that allows callers to interrupt an output message and respond without waiting until the output has finished. The feature itself is essentially provided by the media platform whereas VoiceObjects provides the possibility to switch barge-in on or off. In the text and Web channel, the barge-in setting is ignored.
The default setting Default indicates that the barge-in setting from the current dialog context is used. The initial barge-in value can be set in the Service object. For more details, refer to Configuring a Service in Chapter 2 – Configuring Servers and Services in the Deployment Guide.

Note: When using Output objects within other Output objects, the barge-in setting of the innermost output is applied to each sub-portion of the total prompt.
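As a rough sketch, the Voice and DTMF prompts mentioned under Input mode could be expressed as follows in VoiceObjectsXML, using the attributes listed in the VoiceObjectsXML Definition section. The labels and the channel and barge-in values are chosen purely for illustration.

```xml
<output>
  <!-- Active while the input mode system layer is set to Voice -->
  <outputItem label="Voice prompt" channel="voiceVideo" inputMode="voice" bargein="true">
    <text>Please say yes or no.</text>
  </outputItem>
  <!-- Active while the input mode system layer is set to DTMF -->
  <outputItem label="DTMF prompt" channel="voiceVideo" inputMode="dtmf" bargein="true">
    <text>Please press 1 for yes and 0 for no.</text>
  </outputItem>
</output>
```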

 

The actual output is defined in the Prompt field.

Prompt

An output can be defined by entering plain text in the text field or by inserting new or existing Output, Variable, Collection, Expression, Script, Audio, Video, Text, Silence, and Layer objects.

Note: Audio, Video and Silence objects are only applicable in the voice and video channel. In the text and Web channel, these objects are ignored. The textual channels, in turn, offer a special character for display purposes that is ignored in the voice and video channel: use the pipe symbol “|” to include a line break in the output. Several line breaks can be achieved by using several pipes, separated by a blank. A double pipe with no blank in between, “||”, results in a single pipe symbol being displayed. In the Web channel, you can also include XHTML elements in the Output items. Use elements such as <b>, <i>, or <center> to modify the layout of the output, or use <div> or <span> to make use of style sheets (CSS). See HTML below for more information.
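The pipe rules can be illustrated with three text fragments for the text or Web channel. This is only a sketch: the wording is made up, and the comments restate the rules from the note rather than showing verified server output.

```xml
<!-- One pipe: a single line break between the two parts -->
<text>Flight 456|Boston to Chicago</text>

<!-- Two pipes separated by a blank: two line breaks -->
<text>Flight 456| |Boston to Chicago</text>

<!-- Double pipe with no blank in between: a pipe symbol is displayed -->
<text>Flight 456 || Boston to Chicago</text>
```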

Existing objects like Variable or other Output objects can be referenced in a prompt by typing in their names using the shortcut notation described in Chapter 6 – Object Editors in the Desktop for Eclipse Guide and Desktop for Web Guide, respectively.

Note: A Collection object will be presented cell by cell, ignoring the XML tags. If a Variable or Script object contains XML code which is formatted as a collection, it will be treated the same way. Any other XML code will be presented including the tags.

Random Prompting

The Output object supports true random prompting, a valuable technique for good user interface design in voice and video applications. In the text and Web channel, random prompting is used less frequently.
Random outputs can be achieved by simply providing multiple Output items with the same language layer, custom layer (defined in the Layer field), occurrence level, channel, and input mode setting.

The algorithm that determines which Output item gets played is as follows:

1.     Start with all Output items defined within the Output object.

2.     Evaluate the custom layer conditions of all Output items and eliminate those whose layer condition evaluates to false.

3.     Eliminate all Output items that do not have their language layer set to either the currently active language, a super-set thereof (e.g. English vs. English (US)), or Default.

4.     Eliminate all Output items that do not have their input mode layer set to either the currently active input mode, or Default.

5.     Eliminate all Output items that do not have their channel layer set to either the currently active channel, or Default.

6.     Assuming this is the Nth visit to the Output object, eliminate all Output items whose occurrence level is set to >=(N+1) or higher. For N>1, also eliminate all Output items with occurrence level Only once. If an Output item with occurrence level >=N exists, eliminate all items with setting >=(N-1) or lower. If no Output item with occurrence level >=N exists, keep the items with the highest level below N and eliminate all items with lower levels.
Example: An Output item that is tagged with Only once is only considered for playing the very first time the Output object is visited. An Output item that is tagged with If >= 2 is considered for playing upon the second visit to the Output object, and for all subsequent visits unless an item with tag If >= 3 is defined in the same Output object.

7.     If more than one Output item remains, select one at random from among the remaining set. True randomizing ensures that all available prompts are played before a repetition occurs.

8.     If no Output item remains, do nothing.

Note: Even if no Output item remains, the Output object’s postprocessing (if defined) will still be processed.
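To make step 6 more concrete, here is a small worked sketch with three Output items that differ only in their occurrence level. The prompts and labels are invented; the comments trace the elimination rules as described above and are not taken from server logs.

```xml
<output>
  <outputItem label="A" occurrence="once">
    <text>Welcome to the main menu. You can say flights, hotels, or cars.</text>
  </outputItem>
  <outputItem label="B" occurrence="2">
    <text>Main menu. Flights, hotels, or cars?</text>
  </outputItem>
  <outputItem label="C" occurrence="4">
    <text>Main menu.</text>
  </outputItem>
</output>
<!-- Visit 1: B and C are eliminated (levels 2 and 4 exceed 1); A remains. -->
<!-- Visit 2: A is eliminated (Only once, N>1); C is eliminated (4 exceeds 2); B remains. -->
<!-- Visit 3: no item has level 3, so the highest level below 3 is kept; B remains. -->
<!-- Visit 4 and later: C remains; B is eliminated because its level is lower. -->
```

Adding a second item with the same occurrence level as B would leave two candidates on visits 2 and 3, and step 7 would then pick one of them at random.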

If an Output object is used in reprompting, its occurrence settings are interpreted differently. All items with settings Always or Only once are ignored. For all items with settings If >= N the value N is interpreted as the reprompting level.

For a detailed description of how to use the reprompting mechanism, refer to Chapter 8 – Advanced Event Handling in the Design Guide.
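As an illustration of the reprompting interpretation, the following sketch uses the type attribute described in the VoiceObjectsXML Definition section together with numeric occurrence values. The prompt wording is made up, and how this Output object is linked into an Input object's event handling is not shown.

```xml
<output type="reprompt">
  <!-- occurrence="1" is read as reprompting level 1 -->
  <outputItem occurrence="1">
    <text>Sorry, I did not catch that. Please say yes or no.</text>
  </outputItem>
  <!-- occurrence="2" is read as reprompting level 2 -->
  <outputItem occurrence="2">
    <text>Please say yes to confirm the booking, or no to cancel it.</text>
  </outputItem>
</output>
```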

SSML

The Speech Synthesis Markup Language (SSML), developed by the Voice Browser Working Group of W3C, allows adding meta information to text that is supposed to be read out through a speech synthesis system in voice or video applications. By using SSML you have better control over how the TTS engine synthesizes speech. It allows you to define voice-related parameters like volume, speed, pitch, emphasis, pronunciation, etc. For more information on SSML, refer to the specification at the W3C Website, or to chapter 4.1.1 of the VoiceXML 2.0 specification.

SSML markup can be included in any Output item as well as alternative text items within the Audio object. VoiceObjects Server passes the extra code to the media platform without changes. The following example shows how to use SSML to tell the TTS engine to put emphasis on a specific word within a given prompt:
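A minimal sketch of such a prompt, as it might be typed into the Prompt field, is given here. The <emphasis> element and its level attribute are standard SSML; the prompt wording reuses the flight booking scenario from this chapter. Whether and how the emphasis is audible depends on the TTS engine.

```xml
<!-- Prompt text with SSML; the server passes the markup to the media platform unchanged -->
Thanks for booking flight 456 from Boston to <emphasis level="strong">Chicago</emphasis>.
```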

Caution: Since the markup you enclose in the Output item definition is passed to the platform as is, make sure it follows the syntax required by SSML. Note that it is up to the platform or TTS engine whether and how SSML is interpreted. A TTS engine that does not support SSML might read the XML tags aloud.

Prompt Marker

SSML allows you to embed markers between different prompts within an output. The prompts can be both TTS and prerecorded audio. The markers can be used to detect if the caller barged in on a prompt, as well as the position within a prompt at which this barge-in occurred. The system function LASTRESULT(type, [index], [phase]) with type=markname returns the name of the last marker processed by the media platform. With type=marktime you get the time in milliseconds that has passed since processing the last marker. LASTRESULT is available through the Expression object.

Markers can be defined within Output items by putting a <mark> element with a name attribute at the desired position. In the following example, a marker "ADSTART" is defined at the beginning of an advertisement:
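A prompt of this kind might look as follows. The <mark> element with its name attribute is standard SSML and the marker name ADSTART comes from the text above; the advertisement wording is invented for illustration.

```xml
<!-- The marker is processed just before the advertisement starts -->
<mark name="ADSTART"/>
Fly from Boston to Chicago this winter at our special holiday rate.
```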

 

In this example, the LASTRESULT function with type=marktime can help in finding out if the message was heard entirely by the caller or if he or she barged in too early during the advertisement. The latter could, for instance, result in a reprompt.

HTML

The Web channel works on the basis of (X)HTML, which is rendered at call time by the server. You can enhance the rendered markup by embedding HTML tags in Output items.

Examples:

Element

Description

<b>

Prints the text bold.

<i>

Prints the text in italics.

<u>

Underlines the text.

<center>

Centers the text in the display.

<h1>, <h2>, …

Applies a heading layout to the text.

<font size=…>

Applies given size to the text.

<font color=…>

Shows text in given color.

<img src=…>

Embeds an image in the display using the given URL.


While these tags are only relevant in the Web channel, the content itself (the prompts) is usually the same in the text and Web channel. By default, the server therefore removes all XML tags from an Output definition within the text channel. This enables developers to use the same Output item for both channels by setting the channel layer to Text/Web. Note, though, that only the tags are removed: a string like “XX_<br/>_YY”, where “_” denotes a blank, results in two consecutive blanks, “XX__YY”.
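A short sketch of one prompt shared by both textual channels follows. The prompt wording and the booking code are made up; the comments restate the default behavior described above.

```xml
<!-- Prompt text of an Output item with channel layer Text/Web -->
Your booking code is <b>QX7</b>.

<!-- Web channel: the <b> element is kept, so the code is shown in bold -->
<!-- Text channel: tags are removed by default, leaving "Your booking code is QX7." -->
```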

By default, Web browsers apply their own styles when displaying content on the screen. As a Web developer, you can influence the layout by using cascading style sheets (CSS). VoiceObjects Server applies one of three default style sheets to all pages rendered for a dialog, depending on the driver selected: default.css for the Mobile Web XHTML 1.0 driver, iphone.css for the Apple iPhone Web XHTML 1.0 driver, and richweb.css for the Rich Web Client XHTML 1.0 driver. The files are located in your VoiceObjects installation at Resources\System\StyleSheet\. You can either modify the relevant CSS file or reference your own style sheet through the tuning parameter Presentation – Style Sheet URL. When this parameter is used, the default style sheet is not applied.

For all elements rendered by the server that are relevant for layout, different classes are used, so you can modify the output of these elements by defining styles for these classes. The following table lists all available classes:

Element + Class

Used for

div.voOutput

Prompt in Output object

a.voMenuItem

Menu items in Menu object

a.voInputItem

Items in Input object, if tuning parameter Presentation – Input is set to menu

a.voConfirmOutput

Confirm output in Confirmation object

a.voCorrectionItem

Correction items in Confirmation object

a.voListNavigationItem

Navigation commands in List object

a.voListSelectionItem

Selection commands in List object

a.voNavigationItem

Single custom and standard navigation command

div.voNavigation

Entire block of navigation commands

input.voSubmit

“Proceed” button

div.voSubmit

“Proceed” button; use this to be able to center the button

input.voText

Text input field of Input object, if tuning parameter Presentation – Input is set to text

input.voPassword

Password input field of Input object, if tuning parameter Presentation – Input is set to password

input.voRadio

Radio buttons of Input object, if tuning parameter Presentation – Input is set to radio

option.voOption

Options (drop-down list) of Input object, if tuning parameter Presentation – Input is set to list

img.voTopLogo

Top logo image, defined through tuning parameter Presentation – Top Logo URL

img.voBottomLogo

Bottom logo image, defined through tuning parameter Presentation – Bottom Logo URL

h6.voTitle

Phone Simulator – Web only. The <title> tag is converted into <h6 class="voTitle"> as the first child element of <body>, so that it is shown within the phone screen.

 

In addition to class, the server adds the id attribute to all elements relevant for layout. This attribute consists of the type and name of the object that this page was rendered for, with the syntax type_name. Type and name are converted to lower-case and all special characters such as blanks, underscores or hyphens are removed from the name. Using this attribute, the style sheet can define formatting that only applies to specific objects of an application.
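As an illustration, for an Output object named Play insurance fee the rules above would yield an id along the following lines. Only the class div.voOutput and the type_name rule come from this section; the exact type string (output), the surrounding markup, and the prompt text are assumptions.

```xml
<!-- Hypothetical rendered fragment for the Output object "Play insurance fee" -->
<div class="voOutput" id="output_playinsurancefee">
  The annual premium starts at just 480 dollars.
</div>
```

A style sheet rule with the selector #output_playinsurancefee would then apply to this object only, while a rule for div.voOutput would apply to the prompts of all Output objects.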

Usage Examples

Random prompting

The first example shows random prompting. By providing three different Output items with the same language setting and without any additional layers (see below), the server will randomly select one of them at call time.
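In VoiceObjectsXML, such an Output object might be sketched as follows. The three prompts are invented; since no language, layer, occurrence, channel, or input mode attributes are set, all items fall back to their defaults and remain candidates for random selection.

```xml
<output>
  <outputItem>
    <text>Welcome to Prime Insurance.</text>
  </outputItem>
  <outputItem>
    <text>Hello, and thanks for calling Prime Insurance.</text>
  </outputItem>
  <outputItem>
    <text>Good to hear from you. This is Prime Insurance.</text>
  </outputItem>
</output>
```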


 

Custom layers

By using custom layers associated with the Output items, it is easy to build time-dependent greetings, for instance. Create one Output item for each time of day as shown below, and associate a layer with each of them, testing whether it is morning, afternoon, or evening.
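A possible VoiceObjectsXML sketch of such a greeting uses the layer state reference form “Layer=State” described in the VoiceObjectsXML Definition section. The layer name Time of day and its states Morning, Afternoon, and Evening are hypothetical and would have to exist as a Layer object in the application.

```xml
<output>
  <!-- Each item is kept only if its layer condition evaluates to true -->
  <outputItem layer="Time of day=Morning">
    <text>Good morning.</text>
  </outputItem>
  <outputItem layer="Time of day=Afternoon">
    <text>Good afternoon.</text>
  </outputItem>
  <outputItem layer="Time of day=Evening">
    <text>Good evening.</text>
  </outputItem>
</output>
```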

Lively welcome outputs

By combining an audio jingle, the time-dependent greeting shown above, and a randomized intro as shown in the first example, a varied and lively welcome output for an application can be created with minimal effort. Note that the Output item in the following example only consists of references to other objects, such as Audio, Output, and Silence.
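Such an Output item could be sketched as a pure list of references. The link attribute is taken from the <variable> example in the VoiceObjectsXML Definition section; using it on <audio>, <silence>, and <output> in the same way is an assumption, and the object names Jingle and Short pause are hypothetical.

```xml
<output>
  <outputItem>
    <!-- References only; no literal text in this item -->
    <audio link="#Jingle"/>
    <silence link="#Short pause"/>
    <output link="#Time-dependent greeting"/>
    <output link="#Randomized intro"/>
  </outputItem>
</output>
```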

Occurrence level

In many cases, it is desirable to have different outputs depending on whether a menu is reached the first time or subsequently. This is easily achieved using the occurrence level setting as shown below. Note how this can be combined with random prompting, to let the menu sound fresh, i.e. non-robotic, even when the caller returns to it many times during the call.
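A sketch of a menu prompt along these lines is given here, with invented wording. The first item is only a candidate on the first visit; from the second visit on, one of the two shorter items is picked at random.

```xml
<output>
  <outputItem occurrence="once">
    <text>You are in the main menu. You can say flights, hotels, or rental cars.</text>
  </outputItem>
  <!-- Two items with the same occurrence level: random prompting on later visits -->
  <outputItem occurrence="2">
    <text>Main menu. Flights, hotels, or rental cars?</text>
  </outputItem>
  <outputItem occurrence="2">
    <text>What next? Flights, hotels, or rental cars?</text>
  </outputItem>
</output>
```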


Language layer

The final example shows the use of the language layer for a multilingual goodbye prompt in English and German.
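A goodbye prompt of this kind might be sketched as follows, using the language codes en-US and de-DE mentioned in the VoiceObjectsXML Definition section. The prompt wording is illustrative only.

```xml
<output>
  <outputItem language="en-US">
    <text>Thank you for calling. Goodbye.</text>
  </outputItem>
  <outputItem language="de-DE">
    <text>Vielen Dank für Ihren Anruf. Auf Wiederhören.</text>
  </outputItem>
</output>
```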


Output objects within other Output objects

Highly sophisticated prompts can be designed by using Output objects within other Output objects. For example, a prompt Welcome [Name]! can be split into two Output objects to provide a fully dynamic greeting. The table below describes the object definitions, where each row of condition and content describes a single Output item.

Output object      Condition                                        Content

Welcome            time = morning; age > 30                         Good morning

Welcome            time = afternoon; age > 30                       Welcome

Welcome            time = evening; age > 30                         Good evening

Welcome            time = morning, afternoon, evening; age <= 30    Hi

Welcome            time = morning, afternoon, evening; age <= 30    Hello

Welcome complete   (none)                                           [O:Welcome] [V:Name]


Processing the Output object Welcome complete during a call will first process the Welcome Output object with all its conditions.

For a 36-year-old caller named Henry, the output will be Good morning Henry, Welcome Henry, or Good evening Henry, depending on the time of day.

For a 23-year-old caller named Lisa, the output will be one of the two random selections Hi Lisa or Hello Lisa.
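In VoiceObjectsXML, the Welcome complete object might look roughly like this. The link attribute on <variable> follows the example in the VoiceObjectsXML Definition section; using it on the nested <output> element is an assumption.

```xml
<output>
  <outputItem>
    <!-- Resolved first, including all of its layer conditions -->
    <output link="#Welcome"/>
    <!-- Caller name held in the Variable object "Name" -->
    <variable link="#Name"/>
  </outputItem>
</output>
```

No <text> element with a blank is needed between the two references, since a single blank is inserted automatically between consecutive non-text children.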

Advanced usage example: Local barge-in

Barge-in, i.e. the capability for the caller to interrupt system output, is routinely used in state-of-the-art voice applications. Its flexibility however is often rather limited on the current generation of media platforms. For example, it is often difficult to define a specific range of prompts over which barge-in should extend.

Consider the following example: A voice application that provides the download of mobile phone ring tones has the following initial dialog flow:

Object – Caller

Dialog Flow

Welcome to Chart Ring. We let you download the Top Ten tunes to your phone!

Here’s how it works… (Plays explanatory intro)

Plays the ring tone of the day

How about it? Download this ring tone now?


The caller may interrupt the application during the intro if he or she already knows how to use it. In this case, though, barge-in should not skip all prompts up to and including the question Download this ring tone now? Instead it would be preferable to still present the ring tone of the day and to follow up with the question of whether the caller wants to buy it. We call this local barge-in, since it allows you to define the scope of the barge-in and limit it to a certain set of prompts. Note that this is very different from simply switching barge-in off, which would force the caller to listen to certain prompts in their entirety before being able to say anything.

Local barge-in can be achieved as follows:
Place the prompt (or prompts) that should exhibit this behavior in the initial output of an Input object rather than in a standalone Output object. As the grammar for the Input object, define a simple TTG utterance that the caller is unlikely to say (e.g. this is unlikely to be entered). Select both options Enable auto-advance on No Input and Enable auto-advance on No Match. In the Event Handling section, define a handler for the Caller Help event and set its continuation to Proceed, if desired. Finally, set the tuning parameter Input – Speech Timeout to 0.01 seconds.

The modified dialog flow looks as follows:

Object – Caller

Dialog Flow

Welcome to Chart Ring. We let you download the Top Ten tunes to your phone! Here’s how it works… (Plays explanatory intro; uses local barge-in)

Plays the ring tone of the day

How about it? Download this ring tone now?


If the caller barges in during the initial prompt, this is interpreted as an input. Since the input is unlikely to match the grammar, this will cause a No Match event. Due to the setting Enable auto-advance on No Match the application proceeds to the next object in the dialog flow and plays the ring tone of the day.

If, on the other hand, the caller does not interrupt the initial prompt then the application waits for an input. Since the Input – Speech Timeout is set to a very small value, a No Input event is thrown almost immediately. Due to the setting Enable auto-advance on No Input the application proceeds to the next object in the dialog flow and plays the ring tone of the day.

VoiceObjectsXML Definition

The Output object is represented by the VoiceObjectsXML element <output>. It has one attribute and four groups of children.

In addition, the element has the standard attributes described in the XDK Guide.

The <output> element uses the embedded <outputItem> element.

Output

Attributes

·          type
Defines whether this Output object is to be used as either initial or reprompt. If not specified, the type defaults to initial. When an Output object is linked, this attribute may be overwritten.

 

Children

·          <expression usage="precondition"> or
<variable usage="precondition"> or
<collection usage="precondition"> or
<script usage="precondition">
Defines the precondition for the Output object.

·          <sequence usage="preprocessing">
Defines the preprocessing sequence for the Output object.

·          +<outputItem>
Defines the list of available Output items.

·          <sequence usage="postprocessing">
Defines the postprocessing sequence for the Output object.

 

Example

<output>
  <outputItem language="en-US">
    <text>Welcome to Prime Insurance.</text>
  </outputItem>
  <outputItem language="de-DE">
    <text>Herzlich willkommen bei Prime Insurance.</text>
  </outputItem>
</output>

OutputItem

Attributes

·          label
A text string providing a name for the Output item.

·          bargein
Defines the barge-in behavior of the Output item. Can be true, false, or default. If not specified, defaults to default.

·          language
Defines the language for which this Output item is valid. Can be default or a valid language code (e.g. de-DE, en-US, etc.). If not specified, defaults to default.
Appendix A – Language Codes contains a list of all language codes available in VoiceObjects together with the respective language they represent.

·          occurrence
Defines the occurrence level for which this Output item is valid. Can be always, once, or an integer between 1 and 10. If not specified, defaults to always.

·          inputMode
Defines the input mode for which this Output item is valid. Can be default, voice, dtmf, or voicedtmf. If not specified, defaults to default.

·          channel
Defines the channel(s) for which this Output item is valid. Can be default, voice, video, text, web, voiceVideo, or textWeb. If not specified, defaults to default.

·          layer
Defines the layer for this Output item. Can either be a reference to a Collection, Expression, Script, or Variable object; or a layer state reference of the form “Layer=State” or “Layer!=State”, where “State” is the label of a state of the layer “Layer”.

Note: When using Output objects within other Output objects, only the barge-in setting of the outermost Output object is used.

 

Children

·          +(<audio>, <video>, <silence>, <output>, <layer>, <variable>, <expression>, <collection>, <script>, <text>)
Defines the actual output as a concatenation of various elements.
Note that a single blank will automatically be inserted between any two consecutive non-text children during import into the VoiceObjects Metadata Repository or deployment to VoiceObjects Server in order to ensure proper output behavior.

Examples

<outputItem>
  <text>The annual premium for your </text>
  <variable link="#Car Manufacturer"/>
  <variable link="#Car model"/>
  <text> starts at just </text>
  <variable link="#Premium"/>
  <text> dollars.</text>
</outputItem>

Object Interoperability

The following table contains all object types that can reference an Output object:

Object Name

Use Case Example

Module

An Output object can be used within the embedded sequence of a Module object, as well as be referenced for the welcome and goodbye output of the Module object.

Input

An Output object can be referenced within an Input object.

Output

An Output object can be referenced from within other Output objects.

Sequence

Most commonly, an Output object is used within a Sequence object in a dialog flow.

Menu

An Output object can be referenced within each Menu item in a Menu object.

Confirmation

An Output object can be referenced in multiple places inside a Confirmation object, as well as used as the destination object within a Correction item.

List

An Output object can be referenced in multiple places inside a List object.

If

An Output object can be used within either the THEN item or the ELSE item of an If object.

Case

An Output object can be used in any WHEN item of a Case object.

Loop

An Output object can be used within the embedded sequence of a Loop object.

Goto

An Output object can be referenced via a Goto object.

Hyperlink

An Output object can be the destination of a Hyperlink object.

Recording

An Output object can be referenced within a Recording object.

Transfer

An Output object can be referenced within a Transfer object.

Pause

An Output object can be referenced as both the welcome and the pause output of a Pause object.

Exit

An Output object can be referenced within an Exit object.

OSDM

An Output object can be linked in the parameter set within any OSDM object.

Object Naming Conventions

In order to leverage the capabilities of the integrated documentation of VoiceObjects it is important to provide intuitive and self-explanatory object names and descriptions.

The name of an Output object should indicate which type of prompt is played to the caller. The table below lists three examples:

Name

Description

 Play insurance fee

Plays the insurance fee retrieved from a database to the caller.

 Randomized intro

Plays a randomized intro to the Insurance Portal, selecting from 5 different prompts.

 Time-dependent greeting

Plays a greeting that depends on the current time of day, e.g. Good morning or Good evening.