The Audio object refers to prerecorded audio that can be played back through an Output object in the voice and video channels (the latter only if the video platform also supports audio playback). It can refer to a single file or to a set of files that hold variations of a prompt (for more information on randomizing, see Output). In addition, an Audio object can refer to an audio generation engine in order to embed dynamically rendered audio in an application.
Alternative text can be provided, which is played back by text-to-speech (TTS) if the audio files are not available or not in the correct format.
The supported audio file formats depend on the underlying media platform that will do the final audio processing.
Using the Audio object, any kind of audio content can be played back in an application, such as speech, music, or earcons. For more details on audio design, see Creating Output in Appendix A – Voice User Interface Design in the Design Guide.
The Audio object, including the alternative text definitions, is ignored in the text and Web channels.
The Object Definition below covers the configuration of the Audio object with VoiceObjects Desktop. For information on how to define this object type using VoiceObjectsXML, refer to the VoiceObjectsXML Definition paragraph.
For information on how to manage the prompts of your application, refer to the Storyboard Manager Guide.
The Audio object belongs to the object category Resources.
The following dialog flow example demonstrates an Output object with two embedded Audio objects.
| Object | Dialog Flow |
| --- | --- |
| | [ Music jingle ] |
| | [ Hello and welcome to the VoicePortal. ] |
The Definition of the Audio object provides the following sections:
· Audio Resource
To specify an external audio file.
· Parameter Set
To specify an optional set of parameters to be passed to audio generation or streaming engines.
· Alternative Texts
To specify the alternative text that will be played back with TTS if the audio file is not available or if the output mode is set to TTS:Audio.

For further details regarding additional object configuration refer to Precondition and Properties in this Object Reference.
The Audio Resource section provides the following properties:
| Property | Description |
| --- | --- |
| Location | Optional parameter to specify the location (using a Resource Locator object) of the audio file to be used. |
| File | Defines the file name of the audio file to be used. The name of the file can either be typed in or selected through a file browser by clicking the Browse button. In Desktop for Web the Browse button |
| File prefix | Optionally specifies a prefix that is applied to the file name. The file prefix can also be provided dynamically during call time by assigning a Variable, Expression, Script, or Layer object. |
| File suffix | Optionally specifies a suffix that is applied to the file name. The file suffix can also be provided dynamically during call time by assigning a Variable, Expression, Script, or Layer object. |
| Random variations | Defines whether randomizing is enabled for the audio file. To enable randomizing, set the Random variations property to a number between 2 and 99 by selecting an appropriate value from the drop-down list. If randomizing is enabled and different audio files are provided with the specified file name extended by different numbers (e.g. welcome1.wav, welcome2.wav, welcome3.wav), the server randomly picks one of the available files, using a random number between 1 and the selected number, whenever the Audio object is processed. True randomizing ensures that all available audio files are played before a repetition occurs. |
| Extension | Specifies the audio file extension. Instead of selecting one of the provided file extensions from the drop-down list, the extension can also be supplied by using a Variable, Expression, Script, or Layer object. The complete filename for the audio file is generated at call time according to the following formula: [prefix] + file + [suffix] + [random number] + [extension] |
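The file name composition described for the Audio Resource properties can be sketched in Python as follows. This is only an illustration, not the server's implementation: the function name is hypothetical, and it assumes that the extension is joined to the name with a dot and that the extension value none means that no extension is appended.

```python
from typing import Optional


def build_audio_filename(
    file: str,
    prefix: str = "",
    suffix: str = "",
    random_number: Optional[int] = None,
    extension: str = "wav",
) -> str:
    """Compose [prefix] + file + [suffix] + [random number] + [extension]."""
    name = prefix + file + suffix
    if random_number is not None:
        # Only present when randomizing is enabled (e.g. welcome2.wav).
        name += str(random_number)
    # Assumption: the extension is joined with a dot, and "none" appends nothing.
    if extension and extension != "none":
        name += "." + extension
    return name
```

For example, `build_audio_filename("welcome", random_number=2)` yields `welcome2.wav`, matching the naming pattern used in the Random variations description.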
The Parameter Set section is optional and relevant only for Audio objects that refer to external audio generation engines. Use it to define any number of parameters, all of which will be added to the final URL as CGI parameters, in order to control the engine if required.

This section provides the following properties:
| Property | Description |
| --- | --- |
| Parameter | Specifies the value of the parameter. Possible parameter types are Variable, Collection, Expression, Layer, Resource Locator or Script objects, and constants. When using a Resource Locator object as a parameter, its URL definition is passed to the script. The field can be left empty to denote an empty value. |
| Alias | Optionally specifies the name of the parameter. If left empty, the reference ID of the object defined in the Parameter field will be used as the name. This means in particular that for constant values an alias needs to be defined (as they do not have a reference ID). Constant parameters without a defined alias are ignored. |
The parameters will be added to the final URL in a CGI-string like fashion.
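As a rough sketch of how such a parameter set might end up in the final URL, the Python function below applies the naming rules stated above: the alias is used if present, otherwise the reference ID, and constants without an alias are skipped. The function name, the tuple layout, and the example URL are assumptions for illustration; the exact encoding applied by the server is not documented here.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Each entry is (alias, reference_id, value); reference_id is None for constants.
Param = Tuple[Optional[str], Optional[str], str]


def append_parameters(base_url: str, params: List[Param]) -> str:
    """Append the parameters to base_url as a CGI-style query string."""
    pairs = []
    for alias, reference_id, value in params:
        name = alias or reference_id
        if not name:
            # Constant parameter without an alias: ignored.
            continue
        pairs.append((name, value))
    if not pairs:
        return base_url
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(pairs)
```

Called with the hypothetical engine URL `http://example.com/tts` and the parameters `("prompt", "Prompt", "Hello world")` and `("speed", None, "0.5")`, the sketch produces `http://example.com/tts?prompt=Hello+world&speed=0.5`.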
The Alternative Texts section optionally defines the text that will be played back to the caller if the audio file is not in the correct format, not available (e.g. the corresponding file server is down or the network connection is not available) or Output mode is set to TTS:Audio (for more information on Output mode, see Chapter 2 – Configuring Servers and Services in the Deployment Guide).

The Alternative Texts section contains items to define the various texts for prompt variations, in case the Audio object represents more than one file. The text entered into the Text field must not contain special characters such as quotation marks, apostrophes, ampersands, etc. Also avoid using angle brackets unless you want to include SSML tags in your text. The alternative text can be typed in or can be set at call time by a Variable, Expression, Script, or Layer object.
In addition to the Text field, the following properties can be configured:
| Property | Description |
| --- | --- |
| Label | Optional parameter to identify the alternative text item in a list. |
| Language | Audio objects can represent prompts in multiple languages through dynamic resource locators. The Language drop-down list allows you to specify a language for a prompt, so that prompts of different languages can be incorporated into the Audio object. When dealing with monolingual applications, this field can be ignored. |
| Layer | Associates the alternative text item with a custom layer. This item is only eligible for playback if the layer evaluates to true at call time. |
| Variation | If randomizing is enabled, this field allows you to associate a text with a specific prompt variation number. If the Audio object only refers to a single file, this can be left at the default value. Do not define multiple alternative text items with the same variation number (and the other settings being identical), as this may lead to unexpected behavior at call time. |
| Input mode | Associates the alternative text item with a specific input mode. Leave this property at the default value if your application does not distinguish between voice and DTMF input in the wording of your application prompts. Otherwise, tag the item with the appropriate mode. |
Note: If the audio file is not available, not in the right format, or Output mode is set to TTS:Audio, the alternative text will be synthesized by the TTS (text-to-speech) system of the underlying media platform. If the media platform does not support TTS, the caller will not hear anything.
For further details on how to use an Audio object, see the Output object in this Object Reference.
The Speech Synthesis Markup Language (SSML), developed by the W3C Voice Browser Working Group, allows adding meta information to text that is supposed to be read out through a speech synthesis system. By using SSML, you have better control over how the TTS engine synthesizes speech. It allows you to define voice-related parameters like volume, speed, pitch, emphasis, pronunciation, etc. For more information on SSML, refer to the specification at the W3C website, or to chapter 4.1.1 of the VoiceXML 2.0 specification.
SSML markup can be included in any alternative text item, as well as in Output items. The server passes the extra code to the media platform without changes. The following example shows how to use SSML to tell the TTS engine to put emphasis on a specific word within a given prompt:
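A minimal sketch in Python, assuming a hypothetical prompt text: the ALT_TEXT string marks one word with the SSML emphasis element, and the helper (an illustrative name, not part of VoiceObjects) checks that the markup is well-formed XML before it is entered as an alternative text.

```python
import xml.etree.ElementTree as ET

# Hypothetical alternative text that puts emphasis on a single word.
ALT_TEXT = "Please enter your <emphasis>new</emphasis> PIN now."


def is_well_formed(alt_text: str) -> bool:
    """Return True if the text parses as XML once wrapped in a root element."""
    try:
        # The <speak> wrapper only provides a single root for the parser.
        ET.fromstring("<speak>" + alt_text + "</speak>")
    except ET.ParseError:
        return False
    return True
```

The check covers well-formedness only. It does not verify that the tags are valid SSML or that the target TTS engine supports them.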

Caution: Since the markup you enclose in the alternative text item definition is passed to the platform as is, make sure to provide the correct syntax required by SSML. Note that it is up to the platform or TTS engine whether and how SSML is interpreted. If a TTS engine does not support SSML, it might read the XML tags aloud.
The Audio object is represented by the VoiceObjectsXML element <audio>. It has six attributes and two groups of children.
In addition, the element has the standard attributes described in the XDK Guide.
The <audio> element uses the embedded <altText> element.
· location
Defines the location from where the audio file is to be retrieved. Must be a reference to a Resource Locator object.
· file
The file name of the audio file, without extension. Can be a constant name, or a reference to a Variable, Expression, or Script object.
· prefix
A prefix to be used together with the audio file name.
· suffix
A suffix to be used together with the audio file name.
· random
Indicates whether randomization should be used. Can be either disabled or an integer >= 2 specifying the number of versions in which the file is available. If not specified, defaults to disabled.
For further information on randomization see Random Variations above.
· audioFileExtension
Specifies the file extension for the audio file as either a constant value or a reference to a Variable, Expression, or Script object. Legal constant values are none, aif, aiff, alaw, alw, au, dwd, ivc, mp3, snd, ulaw, ulw, voc, vox, wav. If not specified, defaults to wav.
The complete filename for the audio file is generated at call time according to the following formula:
[prefix] + file + [suffix] + [random number] + [extension]
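The [random number] component can be pictured with the following Python sketch of a picker that plays every variation once before any repetition occurs, as described for true randomizing. The class name and the shuffle-based approach are assumptions chosen for illustration; the server's actual algorithm is not documented here. The range check of 2 to 99 mirrors the values offered in Desktop.

```python
import random
from typing import List, Optional


class VariationPicker:
    """Hands out variation numbers 1..count without repeats within a cycle."""

    def __init__(self, count: int, rng: Optional[random.Random] = None) -> None:
        if not 2 <= count <= 99:
            raise ValueError("count must be between 2 and 99")
        self._count = count
        self._rng = rng if rng is not None else random.Random()
        self._remaining: List[int] = []

    def next_variation(self) -> int:
        if not self._remaining:
            # Start a new cycle: refill with all numbers and shuffle them.
            self._remaining = list(range(1, self._count + 1))
            self._rng.shuffle(self._remaining)
        return self._remaining.pop()
```

With `count=5`, five consecutive calls return each of the numbers 1 to 5 exactly once in random order. In this simple sketch, the last number of one cycle can still coincide with the first number of the next.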
· <expression usage="precondition"> or
<variable usage="precondition"> or
<collection usage="precondition"> or
<script usage="precondition">
Defines the precondition for the Audio object.
· ?<parameterSet>
Defines the parameter set.
· *<altText>
Defines the corresponding alternative text(s).
<audio location="#Prompt locator" file="welcome" audioFileExtension="wav"/>
<audio location="#Prompt locator" file="nomatch" audioFileExtension="wav" random="5"/>
· +<item>
Defines the list of entries in the parameter set. The use of the alias attribute is optional when the object attribute is used, and required when the value attribute is used.
<parameterSet>
<item alias="prompt" object="#Prompt"/>
<item alias="speed" value="0.5"/>
</parameterSet>
· object
Defines a reference to an object. If this attribute is defined, the value attribute must not be defined.
· value
Defines a constant value. If this attribute is defined, the object attribute must not be defined.
· alias
Defines the parameter alias.
<item alias="prompt" object="#Prompt"/>
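The attribute rules for <item> can be checked mechanically. The Python sketch below is a hypothetical helper, not part of the XDK. It reads the + marker on <item> as "one or more", reports items that define both object and value, and reports items that use value without an alias.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_parameter_set(xml_text: str) -> List[str]:
    """Return a list of rule violations; the list is empty if none are found."""
    problems: List[str] = []
    root = ET.fromstring(xml_text)
    items = root.findall("item")
    if not items:
        problems.append("parameterSet needs at least one item")
    for index, item in enumerate(items, start=1):
        has_object = "object" in item.attrib
        has_value = "value" in item.attrib
        if has_object and has_value:
            problems.append(f"item {index}: object and value are mutually exclusive")
        elif has_value and "alias" not in item.attrib:
            problems.append(f"item {index}: alias is required when value is used")
    return problems
```

Applied to the <parameterSet> example above, the function returns an empty list.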
The <altText> element defines the alternative texts to be used for the Audio object. It may only occur embedded inside an <audio> element.
The <altText> element has six attributes and contains text as a child.
· label
A text string providing a name for the alternative text.
· random
Defines the random index to which this alternative text belongs. An integer >= 1. If not specified, defaults to 1.
· language
Defines the language for which this alternative text is valid. Can be default or a valid language code (e.g. de-DE, en-US, etc.). If not specified, defaults to default.
Appendix A – Language Codes contains a list of all language codes available in VoiceObjects together with the respective language they represent.
· inputMode
Defines the input mode for which this alternative text is valid. Can be default, voice, dtmf, or voicedtmf. If not specified, defaults to default.
· layer
Defines the layer for the alternative text. Can either be a reference to a Collection, Expression, Script, or Variable object; or a layer state reference of the form “Layer=State” or “Layer!=State” where “State” is the label of a state for the layer “Layer”.
· object
Optional reference to an Expression, Variable, or Script object that defines the alternative text.
· CDATA
The static alternative text itself. Note that the text may not contain special characters such as apostrophes, quotation marks, etc. It may, however, contain XML markup to be used e.g. in combination with SSML.
<altText language="de-DE">
Bitte geben Sie Ihre Telefonnummer ein.
</altText>
<altText random="1"> Sorry? </altText>
<altText random="2"> Once more, please. </altText>
<altText random="3"> I didn’t quite get that. </altText>
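To illustrate the attribute defaults listed above, the following Python sketch reads the <altText> children of an <audio> element and fills in the documented defaults (random 1, language default, inputMode default). The function name and the returned dictionary layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def read_alt_texts(audio_xml: str) -> List[Dict[str, str]]:
    """Collect the altText items of an <audio> element with defaults applied."""
    root = ET.fromstring(audio_xml)
    items: List[Dict[str, str]] = []
    for alt in root.findall("altText"):
        items.append({
            "random": alt.get("random", "1"),
            "language": alt.get("language", "default"),
            "inputMode": alt.get("inputMode", "default"),
            # Leading text only; text after embedded markup is not collected.
            "text": (alt.text or "").strip(),
        })
    return items
```

Because only the leading text of each element is collected, an alternative text that embeds SSML markup would need additional handling.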
The following table contains all object types that can reference an Audio object:
| Icon | Object Name | Use Case Example |
| --- | --- | --- |
| | Output | An Audio object can be referenced within an Output object definition. In addition, an Audio object can be referenced in all objects that provide Output items. |
| | List | An Audio object can be referenced within the Content Formatting definition of a List object, to play back pre-recorded audio as part of the List content. |
| | Connector | An Audio object can be linked within a Connector object as the Wait Loop Audio. |
| | Database | An Audio object can be linked within a Database object as the Wait Loop Audio. |
| | Script | An Audio object can be linked within a Script object as the Wait Loop Audio. |
| | OSDM | An Audio object can be linked in the parameter set within any OSDM object. |
In order to leverage the capabilities of the integrated documentation of VoiceObjects it is important to provide intuitive and self-explanatory object names and descriptions.
The name of an Audio object should be derived from the name of the Output object it is used in. When used in multiple outputs, its name should provide information about what is being played. The table below lists three examples:
| Name | Description |
| --- | --- |
| | Plays first level help for the Input object Get claim reference number |
| | Contains five alternatives for a generic first level No Input prompt (e.g. "Sorry?") |
| | Short jingle played every time the caller returns to the main menu |