9 How to Use Different Input Modes

This chapter describes how to build input mode-specific applications within VoiceObjects.

The term input mode refers to the way caller input is collected in voice applications, and is therefore relevant only to the voice channel. The input mode can be voice, DTMF, or both.

About Input Modes

Traditional IVR systems were controlled by DTMF (touch tone), whereas modern applications are predominantly controlled by speech, with the option to fall back to DTMF in certain cases. This fallback may be necessary, for example, when a caller logs into an account and enters sensitive data such as a PIN, or calls from a noisy environment that makes speech recognition almost impossible.

With VoiceObjects, you can design applications that provide voice control, DTMF control, or both, and that switch between input modes dynamically at call time, as determined by the application, the environment, or the caller. For example, the application could leave the choice of input mode to the caller by asking for it at the beginning of the dialog, or it could automatically switch the input mode to DTMF after a certain number of No Match events have occurred in the dialog.
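The automatic fallback to DTMF can be pictured as a small piece of state that counts No Match events. The following is a minimal Python sketch, not VoiceObjects code: the class DialogState, the methods on_no_match and on_match, and the threshold NO_MATCH_LIMIT are hypothetical names chosen for illustration, and resetting the counter on a successful match is an assumed design choice.

```python
from dataclasses import dataclass

# Hypothetical threshold: the guide only speaks of "a certain number"
# of No Match events.
NO_MATCH_LIMIT = 2


@dataclass
class DialogState:
    """Illustrative stand-in for the call-time dialog context (hypothetical)."""
    input_mode: str = "voicedtmf"
    no_match_count: int = 0

    def on_no_match(self) -> None:
        # Count recognition failures; once the limit is reached, fall back
        # to DTMF only (in VoiceObjects, by calling INPUTMODE with dtmf).
        self.no_match_count += 1
        if self.no_match_count >= NO_MATCH_LIMIT:
            self.input_mode = "dtmf"

    def on_match(self) -> None:
        # Assumption: a successful recognition resets the counter.
        self.no_match_count = 0


state = DialogState()
state.on_no_match()
print(state.input_mode)  # voicedtmf
state.on_no_match()
print(state.input_mode)  # dtmf
```

In a real application the threshold and the reset behavior are design decisions for the dialog author.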

Setting the input mode to voice only or DTMF only can also affect the wording of the prompts. In a news portal, for example, the main menu prompt that lists the available choices might read “This is the main menu. Please say news or press one, say sports or press two, or say weather or press three.” In this case, both voice and DTMF are possible input modes. The following prompt allows voice as the only input mode: “This is the main menu. We offer news, sports, and weather. Please choose.” The same prompt in a DTMF-only application might read “This is the main menu. For news, press one. For sports, press two. For weather, press three.”

The Input Mode Layer

VoiceObjects supports creating prompts that correspond to the currently active input mode through the system layer input mode, which can be in one of the states voice, dtmf, or voicedtmf. The initial state of this layer is set in the service definition of your application. For more information, refer to Default parameters in Chapter 2 – Configuring Servers and Services in the Deployment Guide.

The system layer input mode can be switched at call time through the Expression function INPUTMODE([mode]). If called without an argument, it returns the currently active input mode. If the argument is voice, dtmf, or voicedtmf, the function sets the input mode to the corresponding value. For more information on how to use this function, see the Expression object in the Object Reference.
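The getter/setter behavior of INPUTMODE([mode]) can be modeled in a few lines. This Python sketch only mirrors the semantics described above; the class InputModeLayer and its method inputmode are hypothetical, and returning the new mode after a set, rejecting unknown values, and defaulting to voicedtmf are assumptions, since the real initial state comes from the service definition.

```python
from typing import Optional

VALID_MODES = ("voice", "dtmf", "voicedtmf")


class InputModeLayer:
    """Python model of the system layer input mode (illustrative only)."""

    def __init__(self, initial: str = "voicedtmf") -> None:
        # Assumed default; in VoiceObjects the service definition sets it.
        self._check(initial)
        self._mode = initial

    @staticmethod
    def _check(mode: str) -> None:
        # Assumption: unknown mode names are rejected.
        if mode not in VALID_MODES:
            raise ValueError(f"unknown input mode: {mode!r}")

    def inputmode(self, mode: Optional[str] = None) -> str:
        """Mimics INPUTMODE([mode]): no argument reads, an argument sets."""
        if mode is not None:
            self._check(mode)
            self._mode = mode
        return self._mode


layer = InputModeLayer()
print(layer.inputmode())  # voicedtmf
layer.inputmode("dtmf")
print(layer.inputmode())  # dtmf
```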

As described above, the state of the system layer input mode can affect both the input and the output of an application. If the mode is set to voice only or DTMF only, the server writes the tuning property inputmodes into the VoiceXML code (see Tuning in the Object Reference for details), which tells the media platform to disable recognition of the inactive mode. This means that if the input mode is set to voice, any DTMF input by the caller is ignored and barge-in using DTMF is not possible. Conversely, if DTMF is the only active input mode, the media platform ignores any speech input by the caller, which can be a good option in noisy environments. At the same time, the server includes only the grammars for the active input mode: if the grammars used in your application are defined separately, using the Voice and DTMF sections of the Grammar editor, only those grammars that correspond to the currently active input mode are included. For more information on how to define grammars, see the Grammar object in the Object Reference.
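To make the two effects concrete, the following Python sketch emits a VoiceXML fragment for a given input mode: the standard VoiceXML 2.0 inputmodes property when only one mode is active, plus only the grammars whose mode matches. It is an illustration under assumptions, not the markup VoiceObjects Server actually generates; GrammarDef, render_inputs, and the grammar file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GrammarDef:
    mode: str  # "voice" or "dtmf": the Grammar editor section it was defined in
    src: str   # hypothetical grammar URI


def render_inputs(active_mode: str, grammars: list[GrammarDef]) -> str:
    """Emit the VoiceXML lines that depend on the system layer input mode."""
    lines = []
    if active_mode in ("voice", "dtmf"):
        # Single-mode operation: restrict recognition via the standard
        # VoiceXML 2.0 inputmodes property.
        lines.append(f'<property name="inputmodes" value="{active_mode}"/>')
    for g in grammars:
        # Include a grammar only if its mode is currently active.
        if active_mode == "voicedtmf" or g.mode == active_mode:
            lines.append(f'<grammar mode="{g.mode}" src="{g.src}"/>')
    return "\n".join(lines)


menu_grammars = [GrammarDef("voice", "main_menu_voice.grxml"),
                 GrammarDef("dtmf", "main_menu_dtmf.grxml")]
print(render_inputs("dtmf", menu_grammars))
```

With dtmf active, the output contains the inputmodes property and only the DTMF grammar; with voicedtmf, no property is written and both grammars are kept.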

At call time, the server's selection of an Output item within (embedded) Output objects can also depend on the input mode. Every Output item contains an input mode parameter that can be set to Voice, DTMF, Voice+DTMF, or Default. Leaving the setting at Default means that the current setting from the dialog context is used. Thus, if your application does not distinguish between input mode-specific prompts, you do not need to change this setting. Otherwise, tagging an Output item with an input mode influences its selection at call time. This means that if the system layer input mode is set to voice, for example, only those items that have their input mode parameter set to Voice are eligible for playback. Likewise, if the input mode is set to voicedtmf, only the Voice+DTMF items are considered. See the Output object in the Object Reference for more information on how to set the input mode for an Output item.
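The selection rule can be sketched as a filter over Output items, using the news portal prompts from earlier in this chapter. This Python sketch is illustrative only: OutputItem, eligible_items, and LAYER_TO_ITEM_MODE are hypothetical names, and treating a Default item as matching whatever mode is active is an interpretation of "the current setting from the dialog context is used".

```python
from dataclasses import dataclass

# Hypothetical mapping from system layer states to Output item labels.
LAYER_TO_ITEM_MODE = {"voice": "Voice", "dtmf": "DTMF", "voicedtmf": "Voice+DTMF"}


@dataclass
class OutputItem:
    text: str
    input_mode: str = "Default"  # "Voice", "DTMF", "Voice+DTMF" or "Default"


def eligible_items(items: list[OutputItem], layer_mode: str) -> list[OutputItem]:
    """Keep only the items that may be played for the active input mode."""
    wanted = LAYER_TO_ITEM_MODE[layer_mode]
    # Assumption: a "Default" item takes its mode from the dialog context,
    # so it is treated as matching whatever mode is currently active.
    return [item for item in items if item.input_mode in (wanted, "Default")]


main_menu = [
    OutputItem("This is the main menu. Please say news or press one, "
               "say sports or press two, or say weather or press three.", "Voice+DTMF"),
    OutputItem("This is the main menu. We offer news, sports, and weather. "
               "Please choose.", "Voice"),
    OutputItem("This is the main menu. For news, press one. For sports, "
               "press two. For weather, press three.", "DTMF"),
]

for mode in ("voice", "dtmf", "voicedtmf"):
    print(mode, "->", eligible_items(main_menu, mode)[0].text)
```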

Tracking of Input Modes

In applications that allow both voice and DTMF input, it can be useful to know which mode the caller used for the last input: globally, for statistical purposes, and in specific dialog steps, where it may influence the next prompts or even the subsequent dialog logic. Infostore, the logging component of VoiceObjects Server, can track the input modes used at the input state level. For more information, refer to the Infostore Guide.

During the dialog, you can determine the last input mode used by calling the LASTRESULT(type, [index]) function. When called with inputmode as the first argument, it returns voice or dtmf, depending on the input mode the caller used. See the Expression object in the Object Reference for more information on using this function.

Use Case: No Match Event

A typical use case for prompts that depend on the last input mode is the No Match event. If the caller can answer a question by voice or by DTMF, any No Match prompt should take into account the input mode the caller used. If, for example, the system asks for a PIN, the caller can either say it or enter it on the phone’s keypad. If the caller used speech, a No Match prompt could read “I’m sorry, I couldn’t understand you. Please give me your four-digit PIN.” If the caller used DTMF instead, the system could respond with something like “Sorry, the number you typed in is not a valid PIN. Please enter a valid four-digit PIN.” You can achieve this with input mode-specific No Match event handlers. For more information, see Event Handling in the Object Reference or Input Mode-Specific No Match Handling in Chapter 8 – Advanced Event Handling.
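The decision an input mode-specific No Match handler makes can be sketched as a lookup keyed by the last input mode. In this Python sketch, last_input_mode stands in for the value that LASTRESULT returns when called with inputmode as its first argument; NO_MATCH_PROMPTS and no_match_prompt are hypothetical names, and falling back to the speech wording for unexpected values is an assumption.

```python
# Prompts taken from the PIN example above; the keys mirror the values
# voice and dtmf that LASTRESULT returns for the inputmode type.
NO_MATCH_PROMPTS = {
    "voice": "I'm sorry, I couldn't understand you. Please give me your four-digit PIN.",
    "dtmf": ("Sorry, the number you typed in is not a valid PIN. "
             "Please enter a valid four-digit PIN."),
}


def no_match_prompt(last_input_mode: str) -> str:
    """Pick the No Match prompt for the mode the caller last used."""
    # Assumption: fall back to the speech wording for any unexpected value.
    return NO_MATCH_PROMPTS.get(last_input_mode, NO_MATCH_PROMPTS["voice"])


print(no_match_prompt("dtmf"))
```

In VoiceObjects itself, the same distinction is made declaratively through input mode-specific No Match event handlers rather than in application code.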