* Audio

Overview

The Audio object refers to prerecorded audio that can be played back through an Output object in the voice and video channels (the latter only if the video platform also supports playback of audio). It can refer to a single file, or to a set of files that hold variations of a prompt (for more information on randomizing, see Output). In addition, an Audio object can refer to an audio generation engine, in order to embed dynamically rendered audio into an application.
Alternative text can be provided, which is played back by text-to-speech (TTS) if the audio files are not available or not in the correct format.
The supported audio file formats depend on the underlying media platform that will do the final audio processing.

Using the Audio object, any kind of audio content can be played back in an application, such as speech, music, or earcons. For more details on audio design, see Creating Output in Appendix A – Voice User Interface Design in the Design Guide.

The Audio object, including the alternative text definitions, will be ignored in the text and Web channels.

The Object Definition below covers the configuration of the Audio object with VoiceObjects Desktop. For information on how to define this object type using VoiceObjectsXML, refer to the VoiceObjectsXML Definition section.

For information on how to manage the prompts of your application, refer to the Storyboard Manager Guide.

The Audio object belongs to the object category Resources.

Dialog Flow Scenario

The following dialog flow example demonstrates an Output object with two embedded Audio objects.

Object

Dialog Flow

*

[ Music jingle ]

*

[ Hello and welcome to the VoicePortal.]

Object Definition

The Definition of the Audio object provides the following sections:

·          Audio Resource
To specify an external audio file.

·          Parameter Set
To specify an optional set of parameters to be passed to audio generation or streaming engines.

·          Alternative Texts
To specify the alternative text that will be played back with TTS if the audio file is not available or if the output mode is set to TTS:Audio.


 

For further details regarding additional object configuration refer to Precondition and Properties in this Object Reference.

Audio Resource

The Audio Resource section provides the following properties:

Property

Description

Location

Optional parameter to specify the location (using a Resource Locator object) of the audio file to be used.

File

Defines the file name of the audio file to be used. The name of the file can either be typed in or selected through a file browser by clicking the Browse button to the right of the File field. The file name can also be provided dynamically at call time by assigning a Variable, Expression, Script, or Layer object.

In Desktop for Web, the Browse button is only displayed if the Resource Locator object is available and includes a physical path definition. In addition, the audio file can be played by clicking the Play button to the right of the File field. This starts the default audio player associated with Internet Explorer and plays the audio file in this player. The Play button is only displayed if the Resource Locator object is available and includes a URL definition without references to dynamic objects (Variable, Expression, Script, or Layer).

File prefix

Optionally specifies a prefix that is applied to the file name. The file prefix can also be provided dynamically during call time by assigning a Variable, Expression, Script, or Layer object.

File suffix

Optionally specifies a suffix that is applied to the file name. The file suffix can also be provided dynamically during call time by assigning a Variable, Expression, Script, or Layer object.

Random variations

Defines whether randomizing is enabled for the audio file. To enable randomizing, set the Random variations property to a number between 2 and 99 by selecting the appropriate value from the drop-down list. If randomizing is enabled and audio files are provided whose names consist of the specified file name followed by different numbers (e.g. welcome1.wav, welcome2.wav, welcome3.wav), the server randomly picks one of the available files by appending a random number between 1 and the selected value to the file name whenever the Audio object is processed. True randomizing ensures that all available audio files are played before a repetition occurs.
By default, the Random variations property is set to Disabled indicating that no randomization is used.
For more information on randomization see the Output object.

Extension

Specifies the audio file extension. Instead of selecting one of the provided file extensions from the drop-down list, the extension can also be supplied by using a Variable, Expression, Script, or Layer object.

The complete filename for the audio file is generated at call time according to the following formula:

[prefix] + file + [suffix] + [random number] + [extension]

Parameter Set

The Parameter Set section is optional and relevant only for Audio objects that refer to external audio generation engines. Use it to define any number of parameters, all of which will be added to the final URL as CGI parameters, in order to control the engine if required.

 

This section provides the following properties:

Property

Description

Parameter

Specifies the value of the parameter. Possible parameter types are Variable, Collection, Expression, Layer, Resource Locator or Script objects, and constants. When using a Resource Locator object as a parameter, its URL definition is passed to the script. The field can be left empty to denote an empty value.

Alias

Optionally specifies the name of the parameter. If left empty, the reference ID of the object defined in the Parameter field will be used as the name. This means in particular that for constant values an alias needs to be defined (as they do not have a reference ID). Constant parameters without a defined alias are ignored.


The parameters will be appended to the final URL in the style of a CGI query string.

Alternative Texts

The Alternative Texts section optionally defines the text that will be played back to the caller if the audio file is not in the correct format, not available (e.g. the corresponding file server is down or the network connection is not available) or Output mode is set to TTS:Audio (for more information on Output mode, see Chapter 2 – Configuring Servers and Services in the Deployment Guide).

 

The Alternative Texts section contains items to define the various texts for prompt variations, in case the Audio object represents more than one file. The text entered into the Text field must not contain special characters such as quotation marks, apostrophes, or ampersands. Also avoid using angle brackets unless you want to include SSML tags in your text. The alternative text can be typed in or can be set at call time by a Variable, Expression, Script, or Layer object.

In addition to the Text field, the following properties can be configured:

Property

Description

Label

Optional parameter to identify the alternative text item in a list.

Language

Audio objects can represent prompts in multiple languages through dynamic resource locators. The Language drop-down list allows you to specify a language for a prompt, so that prompts of different languages can be incorporated into the Audio object. When dealing with monolingual applications, this field can be ignored.

Layer

Associates the alternative text item with a custom layer. This item is only eligible for playback if the layer evaluates to true at call time.
If you reference Audio objects in different Output object items using different layer conditions, there is no need to also define the same layers in the Audio object itself. If, on the other hand, different outputs are achieved by using a Layer (or Variable, Expression, Script) object within the resource locator definition of the Audio object, you can define different alternative texts by associating items with the corresponding layer condition.

Variation

If randomizing is enabled, this field allows you to associate a text with a specific prompt variation number. If the Audio object only refers to a single file, this can be left at the default value. Do not define multiple alternative text items with the same variation number (and the other settings being identical), as this may lead to unexpected behavior at call time.
If you want to use the same alternative text for all randomized audio files, you only need to define an alternative text for variation number 1.

Input mode

Associates the alternative text item with a specific input mode. Leave this property at the default value if your application does not distinguish between voice and DTMF input in the wording of your application prompts. Otherwise, tag the item with the appropriate mode.

Note: If the audio file is not available, not in the right format, or Output mode is set to TTS:Audio, the alternative text will be synthesized by the TTS (text-to-speech) system of the underlying media platform. If the media platform does not support TTS, the caller will not hear anything.

For further details on how to use an Audio object, see the Output object in this Object Reference.

 

SSML

The Speech Synthesis Markup Language (SSML), developed by the W3C Voice Browser Working Group, allows adding meta information to text that is supposed to be read out through a speech synthesis system. By using SSML, you have better control over how the TTS engine synthesizes speech. It allows you to define voice-related parameters like volume, speed, pitch, emphasis, pronunciation, etc. For more information on SSML, refer to the specification at the W3C website, or to chapter 4.1.1 of the VoiceXML 2.0 specification.

SSML markup can be included in any alternative text item, as well as in Output items. The server passes the extra code to the media platform without changes. The following example shows how to use SSML to tell the TTS engine to put emphasis on a specific word within a given prompt:
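
As a minimal sketch of such an alternative text (the prompt wording is illustrative and not taken from the product documentation), the standard SSML emphasis element can be wrapped around the word to be stressed. The element is used without attributes here, so that the text stays free of quotation marks:

```xml
Please enter your <emphasis>account</emphasis> number, not your card number.
```

Whether and how strongly the word is stressed depends on the TTS engine of the media platform.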

Caution: Since the markup you enclose in the alternative text item definition is passed to the platform as it is, make sure to provide the correct syntax required by SSML. Note that it is up to the platform or TTS engine, respectively, whether and how SSML is interpreted. If a TTS engine does not support SSML, it might read the XML tags aloud.

VoiceObjectsXML Definition

The Audio object is represented by the VoiceObjectsXML element <audio>. It has six attributes and two groups of children.

In addition, the element has the standard attributes described in the XDK Guide.

The <audio> element uses the embedded <altText> element.

Audio

Attributes

·          location
Defines the location from where the audio file is to be retrieved. Must be a reference to a Resource Locator object.

·          file
The file name of the audio file, without extension. Can be a constant name, or a reference to a Variable, Expression, or Script object.

·          prefix
A prefix to be used together with the audio file name.

·          suffix
A suffix to be used together with the audio file name.

·          random
Indicates whether randomization should be used. Can be either disabled or an integer >= 2 specifying the number of versions in which the file is available. If not specified, defaults to disabled.
For further information on randomization see Random Variations above.

·          audioFileExtension
Specifies the file extension for the audio file as either a constant value or a reference to a Variable, Expression, or Script object. Legal constant values are none, aif, aiff, alaw, alw, au, dwd, ivc, mp3, snd, ulaw, ulw, voc, vox, wav. If not specified, defaults to wav.

The complete filename for the audio file is generated at call time according to the following formula:

[prefix] + file + [suffix] + [random number] + [extension]
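
As an illustration of the formula, the following sketch uses hypothetical prefix, file, and suffix values; only the attribute names are taken from this reference. It assumes that the extension is joined to the name with a dot, as suggested by the welcome1.wav example above.

```xml
<!-- Hypothetical values; only the attribute names come from this reference. -->
<audio location="#Prompt locator"
       prefix="en_"
       file="nomatch"
       suffix="_short"
       random="3"
       audioFileExtension="wav"/>

<!-- Following the formula, the server would request one of:
       en_nomatch_short1.wav
       en_nomatch_short2.wav
       en_nomatch_short3.wav
     (assuming the extension is joined with a dot) -->
```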

 

Children

·          <expression usage="precondition"> or
<variable usage="precondition"> or
<collection usage="precondition"> or
<script usage="precondition">
Defines the precondition for the Audio object.

·          ?<parameterSet>
Defines the parameter set.

·          *<altText>
Defines the corresponding alternative text(s).

 

Example

<audio location="#Prompt locator" file="welcome" audioFileExtension="wav"/>

<audio location="#Prompt locator" file="nomatch" audioFileExtension="wav" random="5"/>

ParameterSet

Children

·          +<item>
Defines the list of entries in the parameter set. The use of the alias attribute is optional when the object attribute is used, and required when the value attribute is used.

 

Example

<parameterSet>

  <item alias="prompt" object="#Prompt"/>

  <item alias="speed" value="0.5"/>

</parameterSet>
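
To show the parameter set in context, the following sketch embeds it in an Audio object that refers to an audio generation engine. The object references, the engine address, and the layout of the resulting URL are assumptions made for illustration; this reference only states that the parameters are added to the final URL as CGI parameters.

```xml
<!-- "#TTS engine" and "#Caller name" are hypothetical object references. -->
<audio location="#TTS engine" file="render" audioFileExtension="none">
  <parameterSet>
    <item alias="name" object="#Caller name"/>
    <item alias="speed" value="0.5"/>
  </parameterSet>
  <altText>Welcome back.</altText>
</audio>

<!-- If the Resource Locator resolved to http://engine.example/audio/ and the
     referenced variable held the value Smith, the request could look roughly like
       http://engine.example/audio/render?name=Smith&speed=0.5
     The exact URL layout is an assumption, not confirmed by this reference. -->
```

The constant value 0.5 carries an alias because constant parameters without an alias are ignored.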

Item

Attributes

·          object
Defines a reference to an object. If this attribute is defined, the value attribute must not be defined.

·          value
Defines a constant value. If this attribute is defined, the object attribute must not be defined.

·          alias
Defines the parameter alias.

 

Example

<item alias="prompt" object="#Prompt"/>

AltText

The <altText> element defines the alternative texts to be used for the Audio object. It may only occur embedded inside an <audio> element.

The <altText> element has six attributes and contains text as a child.

 

Attributes

·          label
A text string providing a name for the alternative text.

·          random
Defines the random index to which this alternative text belongs. An integer >= 1. If not specified, defaults to 1.

·          language
Defines the language for which this alternative text is valid. Can be default or a valid language code (e.g. de-DE, en-US, etc.). If not specified, defaults to default.
Appendix A – Language Codes contains a list of all language codes available in VoiceObjects together with the respective language they represent.

·          inputMode
Defines the input mode for which this alternative text is valid. Can be default, voice, dtmf, or voicedtmf. If not specified, defaults to default.

·          layer
Defines the layer for the alternative text. Can either be a reference to a Collection, Expression, Script, or Variable object; or a layer state reference of the form "Layer=State" or "Layer!=State", where "State" is the label of a state of the layer "Layer".

·          object
Optional reference to an Expression, Variable, or Script object that defines the alternative text.

 

Children

·          CDATA
The static alternative text itself. Note that the text may not contain special characters such as apostrophes, quotation marks, etc. It may, however, contain XML markup to be used e.g. in combination with SSML.

 

Example

<altText language="de-DE">

  Bitte geben Sie Ihre Telefonnummer ein.

</altText>

<altText random="1"> Sorry? </altText>

<altText random="2"> Once more, please. </altText>

<altText random="3"> I did not quite get that. </altText>
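
The following sketch combines the attributes described above in a single Audio object. The file name, labels, and prompt wording are hypothetical; the attribute names and values follow the definitions in this section.

```xml
<audio location="#Prompt locator" file="enter_phone" audioFileExtension="wav">
  <altText label="English, voice" language="en-US" inputMode="voice">
    Please say your phone number.
  </altText>
  <altText label="English, DTMF" language="en-US" inputMode="dtmf">
    Please enter your phone number.
  </altText>
  <altText label="German" language="de-DE">
    Bitte geben Sie Ihre Telefonnummer ein.
  </altText>
</audio>
```

Each item differs from the others in language or input mode, which avoids defining several items with identical settings.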

Object Interoperability

The following table contains all object types that can reference an Audio object:

Object Name

Use Case Example

Output

An Audio object can be referenced within an Output object definition. In addition, an Audio object can be referenced in all objects that provide Output items.

List

An Audio object can be referenced within the Content Formatting definition of a List object, to play back pre-recorded audio as part of the List content.

Connector

An Audio object can be linked within a Connector object as the Wait Loop Audio.

Database

An Audio object can be linked within a Database object as the Wait Loop Audio.

Script

An Audio object can be linked within a Script object as the Wait Loop Audio.

OSDM

An Audio object can be linked in the parameter set within any OSDM object.

Object Naming Conventions

In order to leverage the capabilities of the integrated documentation of VoiceObjects it is important to provide intuitive and self-explanatory object names and descriptions.

The name of an Audio object should be derived from the name of the Output object it is used in. When used in multiple outputs, its name should provide information about what is being played. The table below lists three examples:

Name

Description

* Get claim reference number, Help 1

Plays first level help for the Input object Get claim reference number

* Generic No Input 1

Contains five alternatives for a generic first level No Input prompt (e.g. Sorry?)

* Main menu earcon

Short jingle played every time the caller returns to the main menu