10   How to Support Multiple Phone Channels

This chapter provides an introduction to the phone channels supported by VoiceObjects and describes how the VoiceObjects platform has been modified to support multiple channels.

For further information on how to build applications for multiple phone channels refer to How to Build Applications for Multiple Phone Channels in Chapter 7 – How to Use Layers.

About Phone Channels

The notion of channel is predominantly used in the call center environment, where customers can interact with a company using various communication channels such as e-mail, fax, phone, or Web.

The term phone channel, however, focuses on the different ways in which customers can use their phone or mobile device to communicate with a company.

VoiceObjects currently supports four different phone channels:


Phone channel: Voice

Description: The voice channel is the “traditional” channel that customers use in automatic systems, i.e. IVR systems. Interaction takes place through voice or DTMF input and audio or TTS output, powered by media platforms that support VoiceXML, the technical foundation of modern voice applications.

Supported platforms: Traditional media platforms using VoiceXML, like Avaya VP, Genesys GVP, etc.

Phone channel: Video

Description: The video channel is similar to the voice channel, but an application typically uses prerecorded video files instead of audio files to interact with the caller. Input is still done through voice or DTMF. Video applications also run on media platforms on the basis of VoiceXML. They require 3G mobile devices.

Supported platforms: VoiceXML-based media platforms with additional support for video, like Alcatel Multimedia Browser, HP OCMP Video, etc.

Phone channel: Text

Description: The text channel can operate on different underlying bearer protocols.

·         USSD (Unstructured Supplementary Service Data), a texting protocol supported by GSM networks. This allows users of ordinary GSM-based cell phones to interact with an automated system on a purely textual basis, with the look & feel of a text message (SMS, Short Message Service). The advantage of this channel is the availability on virtually all GSM-based handsets and the simple and fail-safe usability.

·         SMS (Short Message Service). Dialogs are started by sending an initial text message to a given short code or phone number; the VoiceObjects application then returns text-based dialog messages, to which the user can respond by sending another message, and so on.

·         IM (Instant Messaging). VoiceObjects applications can run as “chat bots” on any supported chat network. These include Yahoo IM, Skype, Google Talk, AOL, MSN, and Jabber.

·         Twitter (www.twitter.com). Posting of messages on a twitter account and retrieving responses over retweets or direct messages can be controlled by a VoiceObjects application.

Text applications allow free entry of text, which traditional IVRs do not support due to the nature of speech recognition.

Supported platforms: USSD is supported by USSD browsers that use a proprietary markup language for content description, as no standardization is available yet. Examples are Cellicium’s Cellcube, Materna’s AnnyWay SS7 Data Gateway, or Sicap’s USSD Menu Browser.
SMS, IM, and Twitter dialogs can run on Voxeo’s IMified platform, which supports Yahoo IM, Skype, Google Talk, AOL, MSN, and Jabber. In addition, a flexible adapter approach allows the integration of other SMS and IM gateways.

Phone channel: Web

Description: The Web channel uses interactive Web pages that are displayed on browsers embedded in modern cell phones. The technical foundation of this is therefore HTML, the standard markup language used in the Web to code content. Web pages support the display of text & images, using rich layout capabilities. The caller can interact with such an application through forms (input fields, buttons, check boxes, radio buttons, etc.) and hyperlinks. By using the Web channel, also called Mobile Web, companies can retain any corporate identity (i.e. look & feel) they have developed for the World Wide Web.

Supported platforms: Any Web browser installed as a client tool on a mobile device that supports HTML.


VoiceObjects supports applications that are built for only one of these phone channels, as well as applications that serve any combination of channels. A company can therefore leave the choice of channel to the customer, as different customers typically have different channel preferences.

As an example, consider an automated banking service. One customer might use it while driving a car and therefore prefer the voice channel to request the current account balance or to perform a transaction by use of speech recognition. Another customer might be sitting in a train and prefer not to speak. They can choose the “silent” text or Web channel to perform the same tasks as offered through the voice channel.

While this flexibility improves user experience and acceptance for the customer, it also provides important advantages for the service provider. The application that powers all these channels is defined and deployed once using the VoiceObjects platform, with shared business logic, back-end access, etc. Usually, the only things that differ are the output (how the callers are addressed) and potentially the grammars (what the caller can say or type in response to a prompt). The call flow design is identical in most cases. The application can be deployed on VoiceObjects Server as a single service, accessible from various platforms (see table above). Infostore and VoiceObjects Analyzer enable unified reporting and analysis across all deployed channels, which provides a comprehensive view of caller behavior. For example, a single report can provide consolidated information on all calls made to a service, either across all channels or broken down by channel.

Application Examples

This section provides examples of what a text or Web application can look like.

Text channel

Text applications based on USSD do not have any means of formatting or layout. They can display text and use line breaks to structure the output a little. Most applications will be menu-driven, as it is best practice to let the caller navigate through menus and selections rather than to ask the caller for free input. A typical display of a menu, built with the Menu object, might look like this:


 

This is a menu with four items. Each item is represented by an access key, i.e. the key the caller must respond with to select the item (e.g. “1”), and the actual option name (e.g. Service plans).

To respond to this menu, the caller needs to press the Reply key on their phone, enter the response (e.g. “1”) and then press Send:


 

The phone shows some form of “Please wait” message while it submits the value and requests the next page from the server:



Note: The look & feel of USSD-based applications highly depends on the mobile device used. It is not standardized.

Free entry of text works the same way as “selecting” a Menu item: by pressing Reply, entering the response, and pressing Send.

Output objects that simply present a message and provide no further interaction with the caller are displayed with a default “menu item” to proceed to the next dialog step.


 

The access key for this item is always “1”, whereas the option name is “>>” by default. This value can be overwritten using the tuning parameter Presentation – Proceed Label. See New tuning parameters for more information on this.

Note that this extra step means an additional interaction with the caller, which is discouraged in the text channel.

Web channel

The Web channel works on the basis of (X)HTML, the standard markup language for Web content. It offers a wide range of layout capabilities, plus cascading style sheets (CSS) to define the layout independently of the content.

The example from the text channel could look as follows when called with a (mobile) Web browser in the Web channel:


 

Note that a logo was included at the top of the page, the header is printed bold, and the options are hyperlinks that the caller can select and activate using the navigation buttons on their phone.

In an Output object, the option to proceed to the next dialog step is shown by a button:


 

Phone Channel Support with VoiceObjects

The support for multiple phone channels is tightly integrated into the VoiceObjects platform. The following sections summarize how the platform has been changed from the 6.1 to the 7.1 version to accommodate the enhancements required to support this feature.

New media platform drivers

Every media platform driver works on exactly one channel. If a platform supports more than one channel, e.g. voice platforms that can also play video files, two different drivers with the same configuration exist. This is the case for the following platforms:

·         I6NET VXIasterisk 1.5 (Video)

·         NMS Vision VoiceXML Server 2.1 (Video)

·         Voxpilot Open Media Platform 2.5-3.2 (Video)

Most of the existing drivers work on the voice channel, a few on the video channel. The following drivers support the text and Web channel:


ID   Platform name                                      Channel
101  Cellicium Cellcube 3.6                             Text
126  Materna AnnyWay SS7 Data Gateway 2.0.1             Text
102  Sicap USSD Menu Browser 3.2                        Text
132  WindMobile UXML-HTTP Interface Handler Module 1.0  Text
103  Mobile Web XHTML 1.0                               Web
134  Apple iPhone Web XHTML 1.0                         Web
135  Rich Web Client XHTML 1.0                          Web


The channel a service runs in is defined implicitly through the driver. There is therefore no extra setting, e.g. on the Service object, that specifies the channel. To indicate which channel a driver works on, the channel information has been prefixed to the names of the drivers in the Service object and in the tuning item:


 

Only one driver can be defined in the Service object. When deploying applications that support more than one channel, each platform calling the server must include the URL parameter vsDriver in the initial request. This way the session will be started for a specific driver and channel. See the section on Service URL Configuration in Chapter 4 – Service Deployment in the Deployment Guide for more information on using the vsDriver parameter.
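As a minimal sketch of how a calling platform might add the vsDriver parameter to its initial request, the following Python snippet assembles a request URL. The host name, path, and service name are hypothetical placeholders, and passing the driver ID as the parameter value is an assumption; the Deployment Guide section referenced above defines the actual syntax.

```python
from urllib.parse import urlencode


def build_service_url(base_url: str, service_name: str, driver: str) -> str:
    """Append the VSN and vsDriver parameters to a server URL (illustrative only)."""
    query = urlencode({"VSN": service_name, "vsDriver": driver})
    # Use "&" if the base URL already carries a query string.
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"


# Hypothetical server address and service name; "103" is the ID listed
# above for the Mobile Web XHTML 1.0 driver (the value format is assumed).
print(build_service_url("http://server.example.com/service", "BankingService", "103"))
```

Each platform that calls the same service would use its own vsDriver value, so that the session starts with the matching driver and channel.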

Note: Neither driver nor channel can be changed during an ongoing call.

New system layer “channel”

VoiceObjects provides predefined layers that are frequently used in phone applications, the so-called system layers. These used to be language and input mode. To better support the multiple channel approach, a new system layer channel is now available. It is added to all areas where a distinction based on the channel might be desirable. These are:

·         Output item

·         Grammar item

·         Column group item (in the List object)

·         Format item (see Format object below on enhancements made to this object)

·         Event handling item

·         Custom navigation item

·         Tuning item

The following is an example of an output embedded into an Input object:


 

It shows the definition of two Output items that have different prompt definitions based on the channel. The first item is defined for the voice channel, with a reference to an Audio object. The second item has a definition that is valid for the text and Web channel, through a combined value of the channel layer Text/Web. Depending on the channel the session is currently running in, the server will select the correct Output item when processing this Input object.
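The selection described above can be pictured with a small Python sketch. The data model, the function name, and the fallback to a Default item are assumptions made for illustration; they are not taken from the server implementation.

```python
from dataclasses import dataclass


@dataclass
class OutputItem:
    channels: frozenset  # an empty set stands for "Default" (assumed convention)
    prompt: str


def select_output_item(items, channel):
    """Return the first item whose channel layer covers the session channel."""
    for item in items:
        if channel in item.channels:
            return item
    # Assumed fallback: use an item left at Default, if there is one.
    for item in items:
        if not item.channels:
            return item
    return None


items = [
    OutputItem(frozenset({"voice"}), "reference to an Audio object"),
    OutputItem(frozenset({"text", "web"}), "Please enter your account number."),
]
print(select_output_item(items, "web").prompt)
```

The second item models the combined layer value Text/Web, so a single definition serves both channels.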

Input mode handling

The mode of caller input in a voice or video application can be voice or DTMF. Prompts typically vary by input mode, and grammars are inherently different. Therefore, input mode is a system layer available in Output items, and a grammar is defined in the respective section, voice or DTMF. In the text and Web channel, caller input can only be typed, i.e. it takes the form of text. However, prompting in a text or Web application differs by channel rather than by input mode, so the input mode layer and the corresponding dialog context function INPUTMODE have not been extended to include text. The possible states of the layer are still voice, dtmf, or voicedtmf. The only place where text appears as a value of input mode is Infostore, which logs it as such for reports with VoiceObjects Analyzer.

In a text or Web application, the input mode setting should be ignored. Any output definition should leave the layer at Default. When running a service in these channels, the system layer input mode will be in the voice state. This can be changed by neither the Service object setting nor the initial request parameter vsInputMode. Both of these settings will be ignored if the current channel is text or Web, and the layer will be forced to be in the voice state.
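The rule can be summarized in a few lines of Python. This is only a sketch of the described behavior; the function name and the lowercase channel names are hypothetical.

```python
def effective_input_mode(channel: str, requested: str) -> str:
    """Return the input mode layer state that applies to a session.

    `requested` stands for the value coming from the Service object or
    the vsInputMode request parameter: "voice", "dtmf" or "voicedtmf".
    """
    if channel in ("text", "web"):
        # Both settings are ignored; the layer is forced to the voice state.
        return "voice"
    return requested


print(effective_input_mode("web", "dtmf"))    # voice
print(effective_input_mode("voice", "dtmf"))  # dtmf
```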

Object enhancements

Module object

The Module object has been enhanced to include the definition of a title, through an embedded Output object. The title output was mainly introduced for the Web channel, so that a page title can be defined for the HTML pages generated by the server. A Web browser typically shows it as a kind of header at the very top of the screen, separated from the body content by some special layout.


 

The title is considered in each subordinate object of the module (plus the Module object itself, i.e. for the welcome and goodbye message). In the voice, video and text channel, it is played or displayed before the main prompt of each object, i.e. before the input request of the Input object, before the welcome message of the Menu object, etc. It is not played in embedded objects like outputs in event handling or navigation. Note that the main output of an object directly follows the title output in the text and Web channel, with no spaces or line breaks added. To define a line break in the text or Web channel, use the pipe symbol ‘|’, as described in more detail in the Output object below.

For more information, see the Module object in the Object Reference.


Input object

The Input object was not modified to accommodate support for multiple channels. The existing properties of the object are interpreted in different ways depending on the channel, though.

The event reprompt definition is only considered in the voice and video channel. Any definition made for the text or Web channel will be ignored.

The grammar in a text or Web application is also used to define valid input from the caller, but the definition and the processing of the grammar can be different.
First of all, grammar definitions in the text and Web channel must be provided in the TTG format on the voice section, which is now called Voice/Text. Any definition on the DTMF section, any external definition and any embedded non-TTG definition will be ignored.
As the text and Web channel support free entry of text, it is possible to not provide a grammar definition in these channels, which is not allowed in the case of voice or video applications. For technical reasons, a grammar must be defined so that the Input object is valid. If you want to define such an “open-ended” grammar, use the question mark symbol “?” as the only grammar definition.
When defining a proper TTG grammar in the text or Web channel, slot values are still supported. For small grammars with a manageable number of entries, it is best practice to display the available options, so that the caller can simply pick one instead of typing it in. In the text channel, options can be shown as a menu with access keys and option names. In the Web channel, options can be shown as a set of hyperlinks, as a group of radio buttons, or as a dropdown list. If the current grammar has too many entries, the grammar should not be displayed at all; instead the caller should type in the response manually. The server will decide how to show the grammar on the basis of the number of entries in the current grammar. If you do not want to rely on this, use the tuning parameter Presentation – Input to control the display.
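To illustrate the display decision described above, the following Python sketch chooses between showing the options and asking for free text. The function name, the return values, and the threshold max_listed are hypothetical; the limit the server actually applies is not stated here, and the tuning parameter Presentation – Input remains the way to control the display explicitly.

```python
OPEN_ENDED = "?"  # sole grammar definition for free entry of text, as described above


def choose_presentation(entries, max_listed=9):
    """Pick a display style for the grammar of an Input object (illustrative only)."""
    if entries == [OPEN_ENDED]:
        return "free text"   # open-ended grammar: nothing to list
    if len(entries) <= max_listed:
        return "options"     # small grammar: let the caller pick an entry
    return "free text"       # too many entries to display


print(choose_presentation(["checking", "savings"]))
```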

For more information, see the Input object in the Object Reference.


Output object

In the text channel, the only available layout option is a line break. This can be achieved by inserting a pipe symbol ‘|’ inside the Output item definition. Manual carriage returns will not help. If you want to show the pipe symbol ‘|’ in the display instead of triggering a line break, use a double pipe ‘||’. If you want to have two or more line breaks, add blanks between the pipe symbols.
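The pipe rules can be sketched in Python as follows. This is an illustration of the described convention, not the server's rendering code; the function name is hypothetical, and keeping the blank between two pipes in the result is an assumption.

```python
def render_line_breaks(definition: str) -> str:
    """Turn '|' into a line break and '||' into a literal pipe symbol."""
    placeholder = "\x00"  # temporary marker, assumed not to occur in prompts
    text = definition.replace("||", placeholder)  # protect escaped pipes first
    text = text.replace("|", "\n")                # remaining pipes are line breaks
    return text.replace(placeholder, "|")         # restore literal pipes


print(render_line_breaks("Your balance:|100 EUR"))
```

A definition such as "A| |B" yields two line breaks because the blank keeps the two pipes from being read as an escaped pipe.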

Only 182 (in some countries 91) characters can be transmitted in one USSD page in the text channel. It depends on the USSD browser used whether it will split the page and allow the caller to see all content by navigating from one page to the next using a special “Next” option. If a browser does not support this it will most likely cut off the remaining characters. Be aware of this when writing prompts for your text application.
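A simple length check during prompt design helps to avoid truncated pages. The Python sketch below uses the limits quoted above; the function names are hypothetical, and the splitting shown is a naive character-based cut that only approximates what a USSD browser with paging support might do. It also assumes that the whole displayed page, including any menu items, counts towards the limit.

```python
USSD_PAGE_LIMIT = 182  # characters per USSD page (91 in some countries)


def fits_on_one_page(text: str, limit: int = USSD_PAGE_LIMIT) -> bool:
    """Check whether a prompt can be transmitted in a single USSD page."""
    return len(text) <= limit


def split_into_pages(text: str, limit: int = USSD_PAGE_LIMIT) -> list:
    """Cut a prompt into chunks of at most `limit` characters."""
    return [text[i:i + limit] for i in range(0, len(text), limit)] or [""]


prompt = "x" * 200
print(fits_on_one_page(prompt))                    # False
print([len(p) for p in split_into_pages(prompt)])  # [182, 18]
```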

The Web channel offers much richer layout capabilities. To exploit this fully, you can both embed valid XHTML markup inside Output items and use cascading style sheets (CSS) for your pages. Use the tuning parameter Presentation – Style Sheet URL to link an external CSS to an object (typically to the root Module object to get a consistent look & feel, as tuning parameters are inherited).
While line breaks would be realized using the <br></br> element, the pipe symbol ‘|’ will also work in the Web channel.

For more information, see the Output object in the Object Reference.


Menu object

The Menu object was not modified to accommodate support for multiple channels. The existing properties of the object are interpreted in different ways depending on the channel, though.

The welcome message can be used in the voice and video channel to either just play an introductory prompt, or to play an introductory prompt followed by the presentation of the Menu items. In the text and Web channel, the welcome message must only hold an introductory text. The Menu items are presented using the individual Menu item outputs. In the text channel, the Menu item grammar will then define the access keys for the items. If you do not want to define explicit keys for the items, use the auto-numbering feature of the Menu object, which also supports the text channel. In the Web channel, the items will not be shown with access keys, but instead as hyperlinks the caller can activate. Menu item grammars are therefore not required in this channel.
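As a rough picture of how a text channel menu with auto-numbering might be assembled, consider the following Python sketch. The layout (access key, blank, option name, one item per line) and the function name are assumptions, and the option names other than Service plans are invented for the example; the actual display depends on the driver and the device.

```python
def render_text_menu(welcome: str, option_names: list) -> str:
    """Build a text channel menu page with automatically numbered access keys."""
    lines = [welcome]
    for access_key, name in enumerate(option_names, start=1):
        lines.append(f"{access_key} {name}")
    return "\n".join(lines)


print(render_text_menu("Main menu", ["Service plans", "Billing", "Support"]))
```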

For more information, see the Menu object in the Object Reference.


Confirmation object

The Confirmation object presents a summary of collected items and asks whether the information is correct. In the voice and video channel, using the active confirm or deny grammar is enough for the caller to respond to the confirmation request. In the text and Web channel, the caller needs to activate the confirm option explicitly, so it must be displayed on the screen. To provide an output definition for this, a new embedded output Confirm was added.


 

The deny option is not presented in these channels, as it does not make sense to first deny the information and then state which item was wrong. Since the correction item grammars are already active in the confirmation request phase (in all channels), the correction items will be presented together with the confirm option in the text and Web channel. Again, to be able to provide an output definition for each correction item, presentation outputs have been added to the items.

For more information, see the Confirmation object in the Object Reference.


List object

The List object presents items of a one-dimensional list or two-dimensional table, with the option to navigate between these items with commands like next, previous, jump to end etc. This concept of navigating a list works in all channels. Several adjustments have been made to the List object to support the text and Web channel.

The Navigation Outputs section provided Output objects to be played after the activation of a given navigation command. It has been enhanced to hold Presentation Outputs, in order to be able to show the commands in the text and Web channel.


 

For each command defined in the navigation grammar, a presentation output should be defined, so that the commands can be displayed and thus activated by the caller.
The activation outputs are only played in the voice and video channel and ignored in the text and Web channel.

In the Content Formatting section, the column groups now have the channel layer, to be able to make the presentation of a column group dependent on the channel.

In the Selection Commands section, commands can be defined to perform some action on the current List item. Again, to be able to display these commands in the text and Web channel, presentation outputs have been added.

For more information, see the List object in the Object Reference.


Format object

The Format object allows formatting values of dynamic objects like Variable, Expression, or Script. It was originally designed to format data for TTS or audio output (using TTA algorithms for the latter). Since formatting is highly channel-dependent, the Format object was restructured into a set of items, so that one Format object can still be associated with an object like Variable, but different Format items can be defined based on channel, language, or custom layers.


 

Use the Formatting bus to include custom formatting algorithms for the new channels, as desired.

Applications built with a VoiceObjects version lower than 7.1 will be seamlessly upgraded to this new object model with the metadata upgrade.

For more information, see the Format object in the Object Reference and see Appendix B – How to Use the Formatting Bus in the Administration Guide.


Connector object

If no process notification output or wait loop audio are defined in the Connector object, the server renders an additional VoiceXML page in the voice and video channel, for technical purposes explained in the Connector object. This extra rendering step is skipped in the text and Web channel.

For more information, see the Connector object in the Object Reference.


Custom and standard navigation

Hyperlinks and standard navigation commands must be displayed in the text and Web channel, just like navigation or selection commands in the List object, correction items in the Confirmation object, etc. For this reason, again a presentation output has been added to the Hyperlink object and each of the four available standard navigation commands.


 

The autonomous version of the Hyperlink object offers a confirmation step before performing the link. The caller is asked to confirm or deny the activation of this hyperlink. Again, to be able to display the confirm and deny command in the text and Web channel, outputs have been added for the two commands.


 

All hyperlinks are presented before all standard navigation commands by default. To change the order in which the corresponding outputs are presented, use the Label field of the presentation output of each hyperlink and standard navigation command to provide a sort key. The outputs are then ordered by alphanumeric sorting of all labels.
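The effect of label-based ordering can be illustrated with a short Python sketch. The pair representation and the sample labels are hypothetical, and plain string comparison is used here as an approximation of the alphanumeric sorting performed by the server.

```python
def order_presentation_outputs(outputs):
    """Sort (label, output text) pairs by their label."""
    return sorted(outputs, key=lambda pair: pair[0])


outputs = [
    ("20", "Help"),          # a standard navigation command
    ("10", "Main menu"),     # a hyperlink
    ("15", "Back"),          # another standard navigation command
]
for label, text in order_presentation_outputs(outputs):
    print(label, text)
```

With string comparison, a label "10" sorts before "2", so labels of equal length (for example "02" and "10") are the safer choice.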

For more information, see the Hyperlink object in the Object Reference.


Unsupported objects

A number of dialog objects are meaningful only in the voice and video channel. They are therefore ignored in the text and Web channel, i.e. if one of the following five objects occurs during a text or Web session, it will not be processed:

·         Plug-in

·         Pause

·         Recording

·         Silence

·         Transfer

Likewise, the “media” objects Audio and Video are only supported by voice and/or video drivers. If such an object occurs in a text or Web session, it will not be processed; nor will their alternative text definition be used.
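When reviewing an application for text or Web deployment, a simple lookup like the following Python sketch can flag objects that will be skipped. The object names come from the lists above; the function name and the string representation of object types are hypothetical.

```python
# Objects that are not processed in a text or Web session, per the lists above.
IGNORED_IN_TEXT_AND_WEB = frozenset({
    "Plug-in", "Pause", "Recording", "Silence", "Transfer",  # dialog objects
    "Audio", "Video",                                         # media objects
})


def is_processed_in_text_or_web(object_type: str) -> bool:
    """Return False for object types that a text or Web session ignores."""
    return object_type not in IGNORED_IN_TEXT_AND_WEB


for object_type in ("Menu", "Transfer", "Audio"):
    print(object_type, is_processed_in_text_or_web(object_type))
```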

Event handling

The VoiceObjects platform distinguishes between server-related and media platform-related events, as described in Event Handling in the Object Reference.

The voice and video channel support both of these event types, as media platform events are related to VoiceXML. The text and Web channel work with different kinds of platforms or browsers, which either provide a different event model from VoiceXML or none at all. Therefore, the text and Web channel only support server-related event types. These are Error – Internal, Error – Script, and Error – Connector, plus custom events defined through a Hyperlink object or using the ACTIVATEEVENT function in the Expression object.

New tuning parameters

Eight new tuning parameters were added to support tuning the layout of pages in the text and Web channel. They only apply to these channels and are ignored in the voice and video channel.


Parameter

Short description

Presentation – Bottom Logo URL

Allows including a reference to an image file to be displayed at the bottom of the page in the Web channel.

Presentation – Input

Controls how the grammar of an Input object is presented. Note that the TTG format must be used for grammars in the text and Web channel.
It can be shown as a menu of items in the text and Web channel, or as a drop-down list or radio buttons in the Web channel. The alternative is to not show the grammar, but allow free entry of text instead.

Presentation – Input Maximum Length

Specifies the maximum allowed length of input fields in the Web channel, i.e. the number of characters the caller is allowed to type in an input field. By default, no maximum is defined.

Presentation – Menu

Specifies how the items of a Menu object are to be displayed in the Web channel. It can have the values "horizontal" (all Menu items in one row), "vertical" (all items below each other), "2columns" (two items per row), "3columns" (three items per row). The default is “vertical”.

Presentation – Proceed Label

Allows defining the label to be shown on proceed/submit buttons in the Web channel, and for the proceed option in the text channel.

Presentation – Slot Label

Specifies the labels to be shown in front of Input object slots (if any) in the Web channel. Labels can contain text or markup and can be defined per slot, using the following syntax: label1(slot1),label2(slot2), ... By default, no slot labels are defined.

Presentation – Style Sheet URL

Allows including a reference to a CSS file in Web pages.

Presentation – Top Logo URL

Allows including a reference to an image file to be displayed at the top of the page in the Web channel.


For more information on these tuning parameters, refer to Tuning in the Object Reference.
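The syntax of the Presentation – Slot Label parameter, label1(slot1),label2(slot2), can be read as a list of label and slot pairs. The following Python sketch parses such a value; it is an illustration only, the function name and the sample labels are hypothetical, and it assumes that labels contain neither commas nor parentheses.

```python
import re

# One pair: a label followed by the slot name in parentheses.
_PAIR = re.compile(r"([^,()]+)\(([^()]+)\)")


def parse_slot_labels(value: str) -> dict:
    """Map each slot name to the label given in a Slot Label parameter value."""
    return {slot.strip(): label.strip() for label, slot in _PAIR.findall(value)}


print(parse_slot_labels("From:(origin),To:(destination)"))
```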

Debug Viewer

The Debug Viewer can be used as an off-line testing tool to view the markup code rendered for each step, and to interact with the application based on that code. It is mainly used in cases where no media platform is available for real tests.

In the voice and video channel, the Debug Viewer shows VoiceXML, highlights references to audio and grammar files, offers to activate hyperlinks by clicking the corresponding <submit> tags etc.

For the text and Web channel, the Debug Viewer has been adapted to show the markup code required for the respective drivers.
In the text channel, the markup of the text drivers Cellicium Cellcube 3.6, Materna AnnyWay SS7 Data Gateway 2.0.1, Sicap USSD Menu Browser 3.2, and WindMobile UXML-HTTP Interface Handler Module 1.0 is supported. Interactivity is provided by clickable tags like <a>.
In the Web channel, (X)HTML is shown. References to external style sheet and image files are highlighted and the files can be opened by clicking the tags (<link> and <img>). To navigate between dialog steps, use tags like <a>, <option> and <input>.

For more information see Debug Viewer in Chapter 4 – Service Deployment in the Deployment Guide.

Tip: Applications in the Web channel can easily be tested end-to-end by calling the service in a Web browser of your choice, pointing it to the server URL with all usual parameters (like VSN).

Phone Simulator

The Phone Simulator can also be used as an off-line testing tool, aimed at simulating text and Web applications only. As opposed to the Debug Viewer, this tool does not show the markup code, but instead simulates the look & feel of a mobile phone and shows the application as it would be seen on a real device. It comes with three different skins: an ordinary mobile phone, a BlackBerry phone, and an Apple iPhone. For more information see Phone Simulator in Chapter 4 – Service Deployment in the Deployment Guide.