Recording

Overview

The Recording object is a special kind of input that collects an audio or video message from the caller. The caller is prompted to leave a message, which the Recording object then records. After a successful recording session, the recorded audio or video stream can be stored as a single file on a file server. If required, recordings stored on the server can afterwards be written to a database using a Connector object.

The Recording object provides multiple termination options which apply either automatically or on demand by the caller. A recording session ends when one of the following conditions is met: the specified interval of final silence occurs (only for audio recording), the maximum recording duration is reached, a DTMF key is pressed by the caller (if DTMF recognition is activated), or the caller hangs up.
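The termination rules can be modeled as a small decision function. The Python sketch below is a hypothetical illustration, not platform behavior: the `RecordingState` fields are invented names, the defaults mirror the documented 1-minute maximum duration and 2-second silence timeout, and the order in which simultaneous conditions are checked is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordingState:
    """Hypothetical snapshot of a running recording session."""
    recording_type: str        # "audio" or "video"
    elapsed_s: float           # seconds recorded so far
    trailing_silence_s: float  # seconds of silence since the caller last spoke
    dtmf_key: Optional[str]    # key pressed by the caller, if any
    hung_up: bool


def termination_reason(
    state: RecordingState,
    max_duration_s: float = 60,
    silence_timeout_s: float = 2,
    dtmf_enabled: bool = False,
) -> Optional[str]:
    """Return the condition that ends the session, or None to keep recording."""
    if state.hung_up:
        return "hangup"
    if state.elapsed_s >= max_duration_s:
        return "max_duration"
    # Final silence only ends audio recordings.
    if state.recording_type == "audio" and state.trailing_silence_s >= silence_timeout_s:
        return "silence"
    # A DTMF key ends the session only if DTMF termination is enabled.
    if dtmf_enabled and state.dtmf_key is not None:
        return "dtmf"
    return None
```

For example, a video recording with five seconds of silence keeps running, while the same silence in an audio recording ends it.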

Due to its nature, the Recording object is only relevant for the voice and video channels; in the text and Web channels it is not processed. For further information on channels, refer to Chapter 10 – How to Support Multiple Phone Channels in the Design Guide.

The Object Definition below covers the configuration of the Recording object with VoiceObjects Desktop. For information on how to define this object type using VoiceObjectsXML refer to the VoiceObjectsXML Definition paragraph.

The Recording object belongs to the object category Actions.

Dialog Flow Scenario

The first dialog flow example demonstrates a video recording session within a video mail service. The caller is informed that the called party is not available and is asked to leave a message. The message is recorded and the resulting video file is stored in a message store.


Object – Caller

Dialog Flow

You have reached the video mail of Tim Miller.
I’m currently not available to take your call.
Please leave a message after the beep.

[ Beep (generated by media platform) ]


Caller

Hi Tim! This is Jeff, I would like to tell you …

Bye!

< Recorded message is stored as a video file >

 

The next dialog flow illustrates the use of a Recording object as part of a personal voice calendar service. It describes how a caller adds a new appointment to the personal calendar by using two consecutive Recording objects to record the subject and the details of the appointment. Instead of a standard beep tone, a special sound is played before each recording starts. Later in the dialog flow, the recorded audio files are played back to the caller for verification, followed by a final confirmation that includes the date and time recognized earlier by an Input object using multi-slot recognition.


Object – Caller

Dialog Flow

< Connecting to the database system to load the caller profile and the personal calendar based on the caller’s number (ANI) >

Hello Susan!

Welcome to your personal voice calendar.

Please say or key in your five-digit PIN.


Caller

7 4 7 0 1

What can I do for you?


Caller

Add an appointment for tomorrow morning from nine to ten.

Enter the appointment subject.

[ Special Sound ]


Caller

Office of Bob Summerfield

< Recorded subject file is stored >

Now enter the appointment details.

[ Special Sound ]


Caller

Clarify the legal issues with Bob …

< Recorded details file is stored >

Do you want to hear a replay?


Caller

Yes

Subject: [Recorded Subject file]

Details: [Recorded Details file]

Do you want to change anything?


Caller

No, thanks.

The new appointment on June 30th [Variable] from 9 A.M. [Variable] to 10 A.M. [Variable] has been added under the subject [Recorded Subject file].

Do you want to add another appointment? …

Object Definition

The Definition of the Recording object provides the following sections:

·          Recording Request
To specify optional output to prompt the caller to say something.

·          Processing
To configure the recording operation behavior.

·          Parameter Set
To optionally specify additional parameters.


 

For information regarding additional object configuration refer to Pre-/Postprocessing, Event Handling, Tuning, and Properties in this Object Reference.


Recording Request

In the Recording Request section, an optional output can be specified that is played to the caller before the recording session starts. It is typically used to prompt the caller to leave a message or to say something.


 

By default, a beep tone is provided by the media platform (see Processing) and should not be included inside the output definition. If a beep tone other than the default one is desired, an Audio or Video object can be included as an alternative at the end of the output stream (see the voice calendar example in Dialog Flow Scenario). In this case, the automated beep tone must be disabled, otherwise both are played back to the caller (see Processing).
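The interaction between a custom beep and the automated one comes down to simple counting. In this hypothetical Python sketch, `custom_beep_in_output` stands for an Audio or Video object placed at the end of the Recording Request, `play_beep` for the beep option in the Processing section, and `platform_supports_beep` reflects that platforms without beep support typically ignore that option.

```python
def tones_before_recording(
    custom_beep_in_output: bool,
    play_beep: bool = True,
    platform_supports_beep: bool = True,
) -> int:
    """Return how many beep-like tones the caller hears before recording."""
    automated_beep = play_beep and platform_supports_beep
    return int(custom_beep_in_output) + int(automated_beep)


# A custom sound with the default beep setting left on plays two tones.
print(tones_before_recording(custom_beep_in_output=True))  # 2
# Disabling the automated beep leaves only the custom sound.
print(tones_before_recording(custom_beep_in_output=True, play_beep=False))  # 1
```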

For more details on output definitions see the Output object in this Object Reference.

Processing

The Processing section specifies the processing behavior of the Recording object. It consists of three parts – the first one specifies the handling of the recorded file, the second one is about timing constraints, and the third one about processing options.


 

The first part of the Processing section provides the following properties to specify the handling of the recorded file:


Property

Description

Recording type

Offers two choices, Audio and Video. Select the entry that matches the type of recording your application needs.

File naming

Offers two different ways of supplying a filename for the recording file:

Generate Unique ID: The server dynamically generates a unique name, including an extension, for the recording file and provides this name in the Variable object specified in the File field (see below). The generated ID is a 40-character string. The filename is constructed by concatenating the ID and the file extension (typically .wav for audio recordings and .3gp for video recordings) that corresponds to the active media platform driver specification.

Example: OVAPac16174800000000000013b0000000f71aa0903e.wav

Defined by Object: The filename without an extension (the extension is set by the current media platform driver) is supplied by the value of an Expression, Layer, Script, or Variable object specified in the File field.

In case of audio recordings, the recorded stream is typically transferred to the server and then stored as a file. Video recordings, on the other hand, are typically stored directly on the media platform, without being sent to the server. This depends on the active media platform. In any case, the file is stored in the location specified through the Location and File settings (see below) of the Processing section.

Note: If a file with the given name already exists at the specified location, it will be overwritten. Recorded audio and video files must be stored reliably even when many call sessions run concurrently, so a user-defined filename (Defined by Object) combined with the corresponding file location must be unique for each call session.

Location

Specifies a Resource Locator object that defines the file server and the corresponding directory into which the recorded audio or video stream is stored (Physical path field in the Resource Locator object definition), as well as the URL by which those recorded files can be accessed at call time for playback (URL field). The URL can, for example, be used to offer the caller a replay of the recorded message and, if desired, a new recording session that replaces the previous message (see the second Dialog Flow Scenario).

Note: Some media platforms can store the recorded files locally, without transferring them to VoiceObjects Server, which improves the processing performance of the Recording object. For information on which platforms do this, refer to Appendix A – Media Platform Drivers in the Deployment Guide.

File

Specifies the name of the audio or video file to be stored and is supplied with the value of a Variable, Expression, Script, or Layer object. It can also serve as a container in which VoiceObjects Server stores the unique filename it generated (see above). In this case only a Variable object can be used to store the filename.

The audio and video file formats in which the recording is stored depend on the media platform used. For most of the platforms supported by VoiceObjects, audio recordings are stored as WAV files with the extension .wav, and video recordings as 3GP files with the extension .3gp.
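The two File naming modes can be sketched in Python as follows. This is an illustration under stated assumptions, not VoiceObjects Server's implementation: SHA-1 over a random UUID merely yields a 40-character string for the Generate Unique ID case (the server's real IDs, like the example in the table above, are not pure hexadecimal), and `session_id` and `purpose` are hypothetical inputs for the Defined by Object case, for example values held in Variable objects. In the second case no extension is appended, because the media platform driver adds it.

```python
import hashlib
import uuid
from datetime import datetime, timezone

# Typical extensions named in the text; the real one comes from the
# active media platform driver.
TYPICAL_EXTENSIONS = {"audio": ".wav", "video": ".3gp"}


def generated_filename(recording_type: str = "audio") -> str:
    """Generate Unique ID mode: a 40-character ID plus an extension.

    Assumption: SHA-1 of a random UUID is used only to obtain a
    40-character string; it is not the server's actual ID scheme.
    """
    unique_id = hashlib.sha1(uuid.uuid4().bytes).hexdigest()  # 40 hex chars
    return unique_id + TYPICAL_EXTENSIONS[recording_type]


def user_defined_basename(session_id: str, purpose: str) -> str:
    """Defined by Object mode: an extension-less name intended to be unique
    per call session (session_id and purpose are hypothetical inputs)."""
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f")
    # A short random suffix guards against two recordings created by the
    # same session within the same microsecond.
    return f"{session_id}_{purpose}_{timestamp}_{uuid.uuid4().hex[:8]}"
```

Combining a session identifier with a timestamp and a random suffix is only one way to satisfy the uniqueness requirement in the note above; any scheme works as long as the combination of filename and location never repeats across concurrent sessions.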

 

To play back a recorded video file, create a Video object and link the Variable object specified in the File field of the Recording object into the File field of the Video object, setting Extension to None. Then embed the Video object in an Output object and insert it into the dialog flow.
Recorded audio files can be played back directly by embedding the Variable object in an Output object and using its Formatting option to play the content back as an audio file (through the text-to-audio feature of the Format object). See the Audio and Format objects for more information.

The second part of the Processing section provides the following properties to specify timing constraints:


Property

Description

Maximum duration

Limits the maximum length of the recording session. The interval is a time designation and is defined to begin immediately after the playback of the Recording Request content (including the generated beep tone if enabled). By default, the value for the maximum duration is set to 1 Minute. When using an Expression, Layer, Script, or Variable object to define the duration interval at call time, or when typing in a custom value instead of selecting one of the values from the drop-down list, the value needs to be a positive number and will be applied as seconds, e.g. for a custom duration of 5.5 minutes, type in 330.

Tip: If the caller speaks longer than the interval defined in the Maximum duration field, the event ASR – Max Processing Time is triggered, provided that a corresponding event handler is defined. A typical use case is an event handler that first tells the caller not to exceed the maximum recording duration (by linking an Output object in the handler) and then restarts the recording (by setting Continuation to Return).

Note: As mentioned above, a recording session begins immediately after the playback of the recording request, including the optional beep tone. As a processing optimization, some media platforms may only begin recording an audio message when the caller starts speaking. This behavior depends on the individual media platform (see the platform's documentation for details).

Silence timeout

Specifies the interval of silence that triggers the termination of the recording session. It indicates to the media platform that the caller has stopped speaking, which terminates the recording session. Note that this only works for audio recordings and again depends on media platform support. The default value of the silence timeout interval is 2 Seconds. When using a Variable, Expression, Script, or Layer object to define the silence timeout interval dynamically at call time, or when typing in a custom value instead of selecting one of the values from the drop-down list, the corresponding value needs to be a positive number and will be applied as seconds, e.g. for a custom timeout of 4 seconds, type in 4.

Note: When a caller hangs up while the Recording object is being processed by the media platform, the recording session terminates automatically. However, the audio or video stream recorded up to the hangup is still available and is stored at the specified file location as usual, despite the call disconnection.

Standby timeout

Specifies the maximum time for which the dialog session is kept alive during the recording process. By default this timeout is set to Default, which indicates that the value is derived from the service. For more details, refer to Chapter 2 – Configuring Servers and Services in the Deployment Guide. When using a Variable, Expression, Script, or Layer object to define the standby timeout dynamically at call time, or when typing in a custom value instead of selecting one of the values from the drop-down list, the corresponding value needs to be a positive number and will be applied as seconds, e.g. for a custom timeout of 5.5 minutes, type in 330.

Note: If a caller records for a period longer than the standby timeout, a return to the session is not possible. Thus designers need to make sure that the standby timeout is at least as long as the maximum duration.
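All three timing properties take custom values as a plain number of seconds, and the standby timeout should be at least as long as the maximum duration. The Python sketch below expresses both rules; `service_default_s` is a hypothetical stand-in for the service-level value that applies when Standby timeout is set to Default.

```python
from typing import Optional, Union


def to_seconds(minutes: float = 0, seconds: float = 0) -> float:
    """Convert a duration to the positive number of seconds expected for a
    custom timing value."""
    total = minutes * 60 + seconds
    if total <= 0:
        raise ValueError("timing values must be positive numbers")
    return total


def standby_covers_recording(
    max_duration_s: float,
    standby_timeout: Union[float, str],
    service_default_s: Optional[float] = None,
) -> bool:
    """Return True if the dialog session outlives the longest recording."""
    if standby_timeout == "default":
        if service_default_s is None:
            raise ValueError("standby timeout is 'default' but no service value was given")
        standby_timeout = service_default_s
    return float(standby_timeout) >= max_duration_s


print(to_seconds(minutes=5.5))  # 330.0
print(standby_covers_recording(to_seconds(minutes=1), 120))  # True
```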

 

The last part of the Processing section specifies options related to the recording:

Enable the playback of a beep tone before the recording begins allows you to enable or disable the playing of a beep tone generated by the media platform before starting to record any input (typically only available for audio recordings). By default the check box is selected which means that the beep tone is played. Support for this depends on the active media platform. If a platform does not support the playback of a beep, the setting is typically ignored.

Enable the termination of the recording with DTMF input specifies whether the caller can terminate the recording session by pressing any DTMF key on the phone. You cannot define a specific DTMF key to end the recording. By default the check box is clear, which means that the termination by means of a DTMF key is not available to the caller.

Note: It is also possible to link Expression, Layer, Script, or Variable objects to define the beep tone and termination behavior dynamically at call time. In this case the values of the linked objects need to evaluate to true or false.

Parameter Set

The Parameter Set section defines an optional list of parameters that are passed to the media platform. These parameters allow designers to make use of special extensions to the recording functionality that are provided by a number of platforms.


 

The example shown here applies to the VoiceGenie platform and indicates that there may be an initial silence of 100ms before the caller needs to speak, and that the entire recording needs to be at least one minute long.

The parameters defined in the Parameter Set section are placed into the markup as attributes of the <record> tag, so the VoiceXML code rendered by the server for the example above looks like this:

<record beginsilence="100ms" mintime="60000ms"> … </record>
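The mapping from Parameter Set entries to attributes can be sketched in Python as below. It only illustrates the idea with the two VoiceGenie parameters from the example; the markup actually rendered by VoiceObjects Server may carry further attributes of its own, and `render_record_open_tag` is a hypothetical helper.

```python
from xml.sax.saxutils import quoteattr


def render_record_open_tag(parameters: dict) -> str:
    """Turn Parameter Set entries into attributes of a VoiceXML <record> tag.

    Entries keep their insertion order; quoteattr adds the quotes and
    escapes characters such as & and <.
    """
    attributes = "".join(
        f" {name}={quoteattr(value)}" for name, value in parameters.items()
    )
    return f"<record{attributes}>"


print(render_record_open_tag({"beginsilence": "100ms", "mintime": "60000ms"}))
# <record beginsilence="100ms" mintime="60000ms">
```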

Meta Information on Recording Result

After a recording has been made, the media platform provides additional information on the recording result. Once the Recording object has been processed, this information is available through the LASTRESULT function in an Expression object.


Function

Description

LASTRESULT (duration)

Returns the duration of a recording in milliseconds.

LASTRESULT (size)

Returns the size of a recording in bytes.

LASTRESULT (utterance)

Returns the DTMF key pressed, if the option Enable the termination of the recording with DTMF input is selected and the caller presses a DTMF key to terminate the recording.

LASTRESULT (maxtime)

If the recording exceeds the time limit specified in the Maximum duration field, this function returns the value true, otherwise false. Note that this is also realized as an event that can be handled with the ASR – Max Processing Time handler without leaving the Recording object.
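Postprocessing logic typically branches on these values. The Python function below is a hypothetical sketch of such a decision, taking the four LASTRESULT values as plain arguments; in VoiceObjects itself the values would be read through Expression objects, and the outcome texts and the empty-recording check are invented for illustration.

```python
from typing import Optional


def describe_recording_result(
    duration_ms: int, size_bytes: int, utterance: Optional[str], maxtime: bool
) -> str:
    """Hypothetical decision based on LASTRESULT(duration), (size),
    (utterance) and (maxtime)."""
    if maxtime:
        return "too long: ask the caller to keep it shorter and re-record"
    if size_bytes == 0 or duration_ms == 0:
        return "empty: nothing was recorded"
    if utterance:
        # The caller ended the recording with a DTMF key.
        return f"ended by DTMF key {utterance} after {duration_ms / 1000:.1f} s"
    return f"ok: {duration_ms / 1000:.1f} s, {size_bytes} bytes"


print(describe_recording_result(12500, 200000, None, False))  # ok: 12.5 s, 200000 bytes
```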

Usage Examples

Several usage examples of the Recording object are listed below:

·          Voice mail or video mail service

·          Video or audio dating application with recordings for personal ads

·          VoiceGraffitiWall to post messages that other callers may listen to or watch

·          Voice calendar service using the Recording object for new appointments made over the phone

·          Generic VoicePad module to record and manage personal notes

·          Voice banking service where the caller can create transaction templates and save them with a recorded name and description. The template can then be reused to transfer money in ongoing call sessions.

·          Voice Forums and transcription applications

VoiceObjectsXML Definition

The Recording object is represented by the VoiceObjectsXML element <recording>. It has nine attributes and seven groups of children.

In addition, the element has the standard attributes described in the XDK Guide.

Recording

Attributes

·          fileNaming
Specifies whether the file name for the recording is to be generated (generate), or whether it is provided by an object reference (object). If not specified, defaults to generate.

·          location
Defines the location where the recorded audio or video file is to be stored. Must be a reference to a Resource Locator object.

·          file
The file name of the audio or video file. Must be a reference to an Expression, Layer, Script, or Variable object. If fileNaming is generate, it must be a reference to a Variable object, and the generated file name is written into this variable.

·          recordingType
The type of recording required. Either audio or video. If not specified, defaults to audio.

·          maxDuration
Defines the maximum length of the recording. A numerical value interpreted as seconds. If not specified, defaults to 60 (one minute).
May be static text or a reference to an Expression, Script or Variable object.
For details see Maximum Duration above.

·          silenceTimeout
Defines the length of silence after which the recording is considered to be complete. An integer value between 1 and 10, interpreted as seconds.
May be static text or a reference to an Expression, Script or Variable object.

·          standbyTimeout
Defines the standby timeout for the Recording object. Either default or a numerical value that is interpreted as seconds. If not specified, defaults to default.
May be static text or a reference to an Expression, Script or Variable object.

·          playBeep
Defines whether a beep tone is played before the recording starts. Either true or false. If not specified, defaults to true.

·          dtmfTermination
Defines whether the caller may terminate the recording by pressing any DTMF key. Either true or false. If not specified, defaults to false.
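The value rules listed above lend themselves to a quick static check before a definition is imported. The Python sketch below is hypothetical and not part of the XDK: it fills in the documented defaults and checks literal values, treats values starting with # as object references (an assumption based on the Example in this section), and does not verify which object type a reference points to.

```python
# Documented defaults for attributes that may be omitted.
DEFAULTS = {
    "fileNaming": "generate",
    "recordingType": "audio",
    "maxDuration": "60",
    "standbyTimeout": "default",
    "playBeep": "true",
    "dtmfTermination": "false",
}


def _is_reference(value: str) -> bool:
    # Assumption: object references are written as "#Name", as in the Example.
    return value.startswith("#")


def _is_positive_number(value: str) -> bool:
    try:
        return float(value) > 0
    except ValueError:
        return False


def check_recording_attributes(attributes: dict) -> dict:
    """Apply documented defaults and check static values; raise ValueError
    listing every violation found."""
    result = {**DEFAULTS, **attributes}
    errors = []
    if result["fileNaming"] not in ("generate", "object"):
        errors.append("fileNaming must be generate or object")
    if result["recordingType"] not in ("audio", "video"):
        errors.append("recordingType must be audio or video")
    for key in ("playBeep", "dtmfTermination"):
        if result[key] not in ("true", "false"):
            errors.append(f"{key} must be true or false")
    max_duration = result["maxDuration"]
    if not _is_reference(max_duration) and not _is_positive_number(max_duration):
        errors.append("maxDuration must be a positive number of seconds")
    silence = result.get("silenceTimeout")
    if silence is not None and not _is_reference(silence):
        if not (silence.isdigit() and 1 <= int(silence) <= 10):
            errors.append("silenceTimeout must be an integer from 1 to 10")
    standby = result["standbyTimeout"]
    if standby != "default" and not _is_reference(standby) and not _is_positive_number(standby):
        errors.append("standbyTimeout must be default or a positive number")
    if errors:
        raise ValueError("; ".join(errors))
    return result
```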

 

Children

·          <expression usage="precondition"> or
<variable usage="precondition"> or
<collection usage="precondition"> or
<script usage="precondition">
Defines the precondition for the Recording object.

·          <sequence usage="preprocessing">
Defines the preprocessing sequence for the Recording object.

·          <output>
Defines the recording request.

·          <parameterSet>
Defines the parameter set for the Recording object.

·          <eventHandling>
Defines the event handling for the Recording object.

·          <tuning>
Defines the tuning settings for the Recording object.

·          <sequence usage="postprocessing">
Defines the postprocessing sequence for the Recording object.

 

Example

<recording dtmfTermination="true" location="#Recording locator" file="#Recording">
  <output>
    <outputItem>
      <text>Please record your message after the tone.</text>
    </outputItem>
  </output>
</recording>
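For generating such definitions programmatically, Python's standard xml.etree.ElementTree module is sufficient. The sketch below rebuilds the Example above; `build_recording` is a hypothetical helper, not part of the XDK, and it produces only the elements shown in the Example.

```python
import xml.etree.ElementTree as ET


def build_recording(
    prompt: str, location_ref: str, file_ref: str, **attributes: str
) -> ET.Element:
    """Build a <recording> element with a single text recording request.

    location_ref and file_ref are object references such as "#Recording";
    extra keyword arguments become additional attributes.
    """
    recording = ET.Element(
        "recording", {"location": location_ref, "file": file_ref, **attributes}
    )
    output = ET.SubElement(recording, "output")
    item = ET.SubElement(output, "outputItem")
    ET.SubElement(item, "text").text = prompt
    return recording


element = build_recording(
    "Please record your message after the tone.",
    "#Recording locator",
    "#Recording",
    dtmfTermination="true",
)
print(ET.tostring(element, encoding="unicode"))
```

The attribute order in the serialized output differs from the Example, which makes no difference in XML.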

Object Interoperability

The table below lists all object types that can reference a Recording object:


Icon

Object Name

Use Case Example

Sequence

A Recording object can be referenced within a Sequence object.

Menu

A Recording object can be used within the Menu object – one menu choice offering a recording session.

If

A Recording object can be either in the THEN item or the ELSE item of an If object.

Case

A Recording object can be used in a WHEN item or the ELSE item of a Case object.

Goto

A Recording object can be referenced within a Goto object.

Hyperlink

A Recording object can be the destination of a Hyperlink object, which can be activated on demand by the caller through a specific voice or DTMF command.

Object Naming Conventions

In order to leverage the capabilities of the integrated documentation of VoiceObjects it is important to provide intuitive and self-explanatory object names and descriptions.

The name of the Recording object should describe what the caller is expected to do, as well as the type of recording (Audio or Video). The name may contain additional hints on the maximum recording length or other options like automatic beep tone generation or enabled DTMF termination. The short description should provide further information on how the recorded message is handled and processed, for example any conventions for the filename or the file location. The table below lists a few examples:


Name

Description

Leave a Video Message (Beep)

Caller can leave a video mail message after the beep tone.

Enter Subject [30s]

Subject of the appointment with a maximum length of 30 seconds. The audio file is stored with the following filename convention: [<AppointmentID>_S.wav]

Enter Details [5min]

Details of the appointment with a maximum length of 5 minutes. The audio file is stored with the following filename convention: [<AppointmentID>_D.wav]

Record Dating Ad (Jingle)

Caller can record a personal dating ad after a short jingle is played back. The recording can be terminated with any DTMF keystroke.
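When File naming is set to Defined by Object, the object in the File field supplies only the base name, since the media platform driver adds the extension. The conventions in the Enter Subject and Enter Details rows could therefore be produced by something like the following Python sketch; `appointment_id` is hypothetical, and in an actual application this logic would live in an Expression, Script, or Variable object.

```python
# Suffixes taken from the naming examples above
# ([<AppointmentID>_S.wav] and [<AppointmentID>_D.wav]).
PART_SUFFIX = {"subject": "S", "details": "D"}


def appointment_basename(appointment_id: str, part: str) -> str:
    """Return the extension-less filename for one appointment recording."""
    if part not in PART_SUFFIX:
        raise ValueError(f"unknown part: {part!r}")
    return f"{appointment_id}_{PART_SUFFIX[part]}"


print(appointment_basename("4711", "subject"))  # 4711_S
```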