Archive for the ‘Best Practices’ Category

Adapt-to-me, as I don’t want to adapt to you

Sunday, June 28th, 2009

Imagine a computer that not only understands what you say but also says it the way you can understand. Imagine a computer that you can talk to the way you talk to humans and that responds the way humans respond. Imagine a computer that can read your thoughts and communicate with you as seamlessly as you’ve always wanted it to, since you saw 2001. OK, keep on dreaming. 2001 is in the past, yet still in the future.

But indeed a HAL of a lot has changed since 1968. Now we can build machines that can reliably understand spoken commands, whole phrases or sentences, and react accordingly: provide timetables, transfer money, book tickets, or provide assistance with any kind of problem we might have in today’s life 2.0 (man how I hate this 2.0 thing by now!). And those machines are increasingly built in a way that they use human patterns of communication, allowing for more or less free speech, interactive turn-taking, and relatively natural-sounding computer voices.

Over 9 months ago – man, time flies like an arrow (hey, we can even build machines that understand the ambiguity of this sentence; apparently already way back around 1968) – I wrote my first article on Natural Dialog Management (also check out the 11/05/2008 jam session on this topic). I promised I’d continue on this, so here I am. Today I want to write about how you can make a voice application adapt to the caller with regard to their “speaking style”, the vocabulary they use, “how they speak”. Why should you do this? Think of a doctor trying to explain what’s wrong with you. If he or she doesn’t adapt his or her vocabulary to yours, you might just as well stay home and google the symptoms yourself.

Here are some examples where Adapt-to-me (as we like to call it at Voxeo) makes sense in speech applications:

Synonyms

  • If you are a provider of, say, landline telephony as well as high-speed internet, you might have callers calling into your helpline saying “I have problems with my Internet connection” at your first How-Can-I-Help-You input state. Your system might confirm this by saying “I understand you have problems with your DSL, correct?”. Your technology to provide internet might be DSL – but does your customer necessarily know that? How could she respond? Maybe by saying “No, Internet!”?
    Ouch…

Number patterns

  • Ever had the experience of giving out your phone number over the phone and hearing it back from your interlocutor in a way you didn’t even recognize your own number anymore? “My number? That’s six two nine three nine oh four.” – “OK, I’ve jotted that down, that’s sixty-two, ninety-three, nine fourteen?” – “Hang-on… let me think… err, yeah I think that’s it.”
    Ouch…

Date patterns

  • How do you say the expiration date of your credit card? If it was “12/12”, would you say “twelve twelve”, or “December twelve”, or “December two thousand twelve”, or …
    No ouch this time. This is just to demonstrate that there are numerous ways to speak dates. Using the same pattern as the caller when repeating their input can make the system easier to understand, thus cause less frustration, thus fuel acceptance of the overall application, thus increase revenue…?

So I say you can have the computer say “internet connection” instead of “DSL”, and “six two nine three nine oh four” (not even “six two nine three nine zero four”), and even “twelve twelve” if that’s what the caller is inclined to say (maybe hastening to add a “that’s December, two-thousand and twelve”, just to confirm you have fully understood your caller). You will ask: “How?” Let me explain.

VoiceObjects allows you to store the pronunciation of an utterance in a variable, along with its actual value. This is done through the grammar that enables the speech recognizer to understand the caller in the first place. This value is called the pronunciation value. There is no fixed format for what this value should look like; it is completely up to you. Handing this value back to VoiceObjects Server from within a grammar is simple: you add it to the return value for the slot that is filled by the corresponding utterance, separated by a double-pipe (“||”).

Example:

[Figure: pronunciationvalue]

When the server detects this “||” symbol (which is configurable through our media platform driver concept, by the way), it will parse the actual value out of it (“DSL”, as this might be the internal value required for further processing) and assign it to the variable, parse the pronunciation value and assign it as well. By the way, if you’re interested in this value during processing, you can retrieve it via the PRONUNCIATION(RefID) function provided by the Expression object.

What you do with this pronunciation value is straightforward, too: you hand it over to a formatting algorithm (via our Formatting Bus), which takes this pronunciation value (along with the “real” value, which is actually not needed for speaking the variable value back) and uses it to come up with the pronunciation when repeating the value in an output. Note how the grammar, in the above example, returns “internet_connection” as the pronunciation value; this assumes that there is a prerecorded prompt saying “internet connection” as the problem category. Your formatting algorithm would thus probably need to return “internet_connection.wav” as the audio file to use for playback. In fact, for this example you don’t even need your own formatting algorithm. The predefined formatting types utilize the pronunciation value instead of the actual variable value anyway. So choosing, e.g., TTA – Files or TTA – Complete as formatting types for your Variable object will make the platform use “internet_connection.wav” right away. Nice and simple.
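To make the mechanics a bit more tangible, here is a rough Python sketch of what happens to a slot return value such as “DSL||internet_connection”. This is not VoiceObjects code: the names `SlotValue`, `split_slot_value` and `prompt_file`, the “.wav” naming rule, and the fallback to the actual value when no pronunciation value is present are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional

SEPARATOR = "||"  # separator named in the post; configurable in the real product


class SlotValue(NamedTuple):
    value: str                    # internal value used for further processing
    pronunciation: Optional[str]  # how the caller phrased it, if the grammar supplied it


def split_slot_value(raw: str, separator: str = SEPARATOR) -> SlotValue:
    """Split a grammar return value of the form 'value||pronunciation'."""
    value, found, pronunciation = raw.partition(separator)
    return SlotValue(value, pronunciation if found else None)


def prompt_file(slot: SlotValue) -> str:
    """Pick the audio file for playback (hypothetical rule: '<name>.wav')."""
    # Assumption: without a pronunciation value, the actual value is spoken.
    spoken = slot.pronunciation if slot.pronunciation is not None else slot.value
    return f"{spoken}.wav"


slot = split_slot_value("DSL||internet_connection")
print(slot.value)         # DSL
print(prompt_file(slot))  # internet_connection.wav
```

The internal value stays “DSL” for all further processing; only the playback follows the caller’s wording.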

Let’s have a look at the number pattern example now.

First, your grammar must be built in such a way that it can recognize single digits as well as number blocks. Usually, rules that match “one” up to “ninety-nine” suffice. The rest can be nested using smart grammar rule structures. In the tags that compute the value of what was said (as opposed to the words used), you need to add logic that also builds up the pronunciation value as the caller speaks (or rather: as the ASR engine computes the result). As an example, if the caller in fact says “sixty-two ninety-three nine oh four”, the slot return value computed by your grammar rules might be “6293904||62 93 9 oh 4”, which gets parsed as “6293904” for the actual variable value and “62 93 9 oh 4” for the pronunciation value. Your formatting algorithm might make a sequence of “62.wav 93.wav 9.wav oh.wav 4.wav” out of this. In fact, you could just as well use a predefined TTA algorithm for this again, e.g. TTA – Words, and it will do the job.
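For the number pattern, a custom formatting step could look roughly like the following Python sketch. Again, this only illustrates the idea: the function name `audio_sequence`, the one-file-per-chunk naming, and the digit-by-digit fallback are assumptions, not part of the product.

```python
from typing import List


def audio_sequence(slot_return: str, separator: str = "||") -> List[str]:
    """Turn '6293904||62 93 9 oh 4' into one audio file name per spoken chunk."""
    value, found, pronunciation = slot_return.partition(separator)
    # Assumption: without a pronunciation value, read the value digit by digit.
    chunks = pronunciation.split() if found else list(value)
    return [f"{chunk}.wav" for chunk in chunks]


print(audio_sequence("6293904||62 93 9 oh 4"))
# ['62.wav', '93.wav', '9.wav', 'oh.wav', '4.wav']
```

Because the chunks come straight from the pronunciation value, the playback mirrors the grouping the caller used, including “oh” instead of “zero”.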

Last but not least, our famous sample application Prime Telecom, a telco self-service portal coming in three channels (voice, text, mobile Web), provides a sample implementation of Adapt-to-me with the credit card expiration date example I described above. Go check it out today! You can get all the software required to run this sample application for free at http://developers.voiceobjects.com. Go and impress your boss with what VoiceObjects can do for making your phone applications a much more pleasant experience, and your customers much happier. (Or maybe you ARE the boss? But hey – this mission is too important for me to allow you to jeopardize it…)

Oh, and if your boss tells you to implement this within your existing VoiceObjects app, check out the Input object documentation of our Object Reference (search for “pronunciation value”).

Inside Infostore – Part I: Structure and Call Records

Wednesday, April 8th, 2009

Infostore, the VoiceObjects data repository for real-time caller behavior analysis, offers a wealth of information so rich that it can be outright confusing for novice users. So in this series of blog postings, we want to shed light on Infostore’s inner workings and provide technically minded readers with the understanding and some sample SQL to explore the data on their own.

In addition, of course, there is VoiceObjects Analyzer with its comprehensive set of pre-built reports for all of the leading Business Intelligence (BI) frameworks. To find out more about it, as well as the Voxeo VoiceObjects tools in general, go to http://developers.voiceobjects.com/voiceobjects-documentation/.
For those eager to learn more we also offer hands-on training sessions on Infostore. Visit http://www.voiceobjects.com/en/support/training/ for details.

In this first part of the “Inside Infostore” series, we’ll look at the general structure of the Infostore repository and focus on the single dialog statistics record that is written for each call. In the subsequent parts we will then dive deeper into more detailed information about input states, personalization, business tasks, etc.

On a high level, Infostore is organized as a snowflake schema and optimized for immediate analysis of session data, typically by using BI tools, without the need for intermediate ETL processes. In particular this means that there are a number of key fact tables referring to lookup tables for the various dimensions. The following image gives a high-level overview of the relationships:

[Figure: infostoreoverview]

The Infostore data model has been designed for extensibility and integration with data derived e.g. from CRM systems, IVR logging, etc. In the same way, custom data logged by an application can be merged with the standard information contained in the Infostore fact tables.

[Figure: infostoreextensions]

The fact table we will focus on for right now is VOLDDLGSTS containing the dialog statistics, on a level that corresponds to what is often referred to as a Call Detail Record (CDR). In more than a hundred columns, the table contains aggregated information about the respective dialog session and can answer many important questions about application quality and caller behavior even without the need to join other, more detailed fact tables.
The entries in VOLDDLGSTS are the highest level of session information in Infostore, and in most installations it is desirable to have them for each and every session (at least for a certain period of time, such as 30 days). However, through simple configuration on the level of each deployed service it is possible to use statistical sampling and only collect data e.g. for 5% of all calls.

The following paragraphs describe the different types of data present within the VOLDDLGSTS table and provide sample SQL statements to answer typical questions. The SQL has been tested using Microsoft SQL Server; adjustments may be required for other databases. SQL buffs should also note that the statements have been optimized for readability as opposed to performance.
Entries in VOLDDLGSTS belong to specific services identified by a unique ID, the VSC_SID. In all of the samples we assume this SID to be known and fixed. It can be retrieved like this:

select vsc_sid from voldvscobj where vsc_refid='<VSN of service>' and is_current=1

Finally, the SQL statements used here operate on the “raw” Infostore tables. For Analyzer, there is an additional view layer that adjusts localization and performs a few mappings that usually aren’t relevant here. In some statements you see “locale_id=1”, which indicates the English localizations. Should you prefer German, use “locale_id=2” instead.

Basic Session Information
On the most basic level, VOLDDLGSTS contains information about the vitals of each call session, including:

  • When the session started (MONTH_ID, DAY_ID, MINUTE_ID, SECOND_ID)
  • Which context parameters were available for the session (DLG_AAI, DLG_ANI, DLG_CRMID, DLG_DNIS, DLG_GCID, DLG_IID, DLG_RDNIS, DLG_SPSID)
  • Where it was processed (SRV_HOST_IP, SRV_INST_PORT, SRV_INST_NAME)
  • Which media platform driver was used (DRIVER_ID)
  • How long it lasted (DLG_CALL_DUR_MS, DLG_PROC_DUR_MS)

Even on the basis of just this core information, a number of relevant questions can quickly be answered:

  • How many calls were there yesterday / last week?
    Calls for a given day can easily be extracted, using the date format YearMonthDay:
    select count(*) from volddlgsts where vsc_sid=SID and day_id = '20090403'
    Similarly, making use of the date dimension table VOLDDATDAY we can retrieve all calls for a given calendar week:
    select count(*) from volddlgsts where vsc_sid=SID and day_id in (select day_id from volddatday where cw_id = '200914' and locale_id=1)

  • Which percentage of calls comes from within the San Francisco (415) area code?
    For certain applications it is interesting to see where callers are geographically located. This can often be approximated by area codes:
    select 100.0*count(*)/(select count(*) from volddlgsts where vsc_sid=SID) from volddlgsts where vsc_sid=SID and dlg_ani like '415%'

  • Which calls lasted over a minute?
    Depending on the application, long session durations may indicate that callers had problems getting the information they called for. Thus it may be helpful to look at such sessions in more detail.
    select dlg_id,dlg_ani,day_id,minute_id from volddlgsts where vsc_sid=SID and dlg_call_dur_ms > 60000
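If you have no Infostore at hand, the three queries above can be tried against a tiny stand-in table. The following Python sketch uses an in-memory SQLite database; the table holds only the columns named in this posting, the column types and all sample rows are made up, and `SID = 1` is a hypothetical service ID.

```python
import sqlite3

SID = 1  # hypothetical service ID

conn = sqlite3.connect(":memory:")
# Stand-in for VOLDDLGSTS with just a handful of columns; types are assumptions.
conn.execute(
    "create table volddlgsts ("
    "dlg_id integer, vsc_sid integer, day_id text, dlg_ani text, dlg_call_dur_ms integer)"
)
# Made-up sessions: (dlg_id, vsc_sid, day_id, dlg_ani, dlg_call_dur_ms)
conn.executemany(
    "insert into volddlgsts values (?, ?, ?, ?, ?)",
    [
        (1, SID, "20090403", "4155550101", 45000),
        (2, SID, "20090403", "2125550102", 75000),
        (3, SID, "20090404", "4155550103", 130000),
        (4, SID, "20090404", "6505550104", 20000),
    ],
)


def scalar(sql, params=()):
    """Run a query that returns a single value."""
    return conn.execute(sql, params).fetchone()[0]


# How many calls were there on a given day?
calls_on_day = scalar(
    "select count(*) from volddlgsts where vsc_sid=? and day_id=?", (SID, "20090403")
)

# Which percentage of calls comes from the 415 area code?
pct_415 = scalar(
    "select 100.0*count(*)/(select count(*) from volddlgsts where vsc_sid=?) "
    "from volddlgsts where vsc_sid=? and dlg_ani like '415%'",
    (SID, SID),
)

# Which calls lasted over a minute?
long_calls = [
    row[0]
    for row in conn.execute(
        "select dlg_id from volddlgsts where vsc_sid=? and dlg_call_dur_ms > 60000 "
        "order by dlg_id",
        (SID,),
    )
]

print(calls_on_day, pct_415, long_calls)  # 2 50.0 [2, 3]
```

Note the 100.0 in the percentage query: it forces a floating-point division, which matters on databases that would otherwise divide two integers.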

As an exercise, you may want to build SQL statements to answer the following questions:

  • Which percentage of calls came in during weekdays / weekends? (Hint: Use the information in VOLDDATDAY)
  • Show number of sessions per day of week
  • Which is the busiest day of the week (in terms of number of sessions)?

Interaction Details
Moving up from the session basics to information on how the caller interacted with the application, we get the following:

  • How many dialog steps the session encompassed, and of which type (NO_DS_STEP, NO_DS_STEPS_VOICE, NO_DS_STEPS_DTMF, NO_DS_STEPS_TEXT)
  • Which No Input / No Match events occurred during the session (NO_NI, NO_NM, NO_NI_1..4, NO_NM_1..4, NO_DS_NOINPUT, NO_DS_NOMATCH)
  • How well recognition worked (AVG_CONF_VOICE, NO_DS_IMMEDREC, NO_DS_NONIMMEDREC, NO_DS_SUCCESS, NO_DS_NONSUCCESS)
  • How often standard navigation commands were used (NO_BACK, NO_FORWARD, NO_RPTS, NO_SKIP)
  • How often custom navigation commands were used (NO_HYPERLINKS)
  • How the session ended (DLG_EXIT_TYPE_ID, LAST_DS_STEP, LAST_DS_NAME, LAST_DS_TYPE)

Frequently asked questions in this area are:

  • How do calls end?
    There are multiple ways in which calls can end (e.g. caller hanging up, application terminating normally or in exception, etc.) and it is good practice to keep an eye on the distribution. Here we use the localizations for the various exit types contained in VOLDEXTTYP.
    select count(d.dlg_id) as no_sessions, x.dlg_exit_type_dsc from volddlgsts d right outer join voldexttyp x on (d.dlg_exit_type_id = x.dlg_exit_type_id and d.vsc_sid=SID)
    where x.locale_id=1 group by x.dlg_exit_type_dsc

  • Which objects do callers typically hang up in?
    For those calls ending with a caller hang-up it is relevant to look at where in the application this happens, since it may point to spots that cause callers grief.
    select distinct last_ds_name, count(last_ds_name) as no_sessions from volddlgsts where vsc_sid=SID and dlg_exit_type_id=16 group by last_ds_name order by count(last_ds_name) desc

  • Which percentage of calls uses any sort of navigation?
    Most applications offer some way of escaping the normal top-to-bottom dialog flow, either by jumping to specific points (e.g. “main menu”) or by relative navigation (e.g. “back” or “repeat”). If a very large percentage of callers uses them, adjustments in the standard flow might be useful.
    select 100.0*count(*)/(select count(*) from volddlgsts where vsc_sid=SID) from volddlgsts where vsc_sid=SID and no_back+no_rpts+no_forward+no_skip+no_hyperlinks>0

Other questions you may want to explore for yourself could be:

  • Which percentage of calls has both No Input and No Match events?
  • Is the average confidence in short calls higher than in long calls?
  • How does average confidence vary by area code?

Processing Details
In addition to details on the interaction with the caller, VOLDDLGSTS also contains a lot of useful information about the interaction with backends:

  • How many backend interactions occurred, and how long they took (NO_CONNECTOR_EXECS, CONN_EXEC_TIME_MAX, CONN_EXEC_TIME_MIN, CONN_EXEC_TIME_TOT)
  • Which errors occurred during the session (NO_ERRS, NO_ERRS_CONNECTOR, NO_ERRS_INTERNAL, NO_ERRS_MP, NO_ERRS_SCRIPT)
  • How many notifications were sent during the session (NO_NOTIFICATIONS)
  • Which network-related activity took place (NO_REQUESTS, VOL_BYTES)

Interesting questions regarding the backend are e.g.

  • During which times has backend access been slow?
    This may point to problems on the backend itself, or to network congestion.
    select day_id,minute_id from volddlgsts where conn_exec_time_max>3000 and vsc_sid=SID

  • Were any calls aborted due to backend errors?
    Again, this may point to either problems on the backend itself or in the integration code that connects the application to the backend.
    select dlg_id from volddlgsts where dlg_exit_type_id=2 and no_errs_connector>0 and vsc_sid=SID

  • What’s the total data volume (in MB) transferred between IVR and VoiceObjects Server by week?
    This information is useful to ensure that network capacity between the IVR and VoiceObjects Server is sufficient to maintain optimal performance.
    select sum(d.vol_bytes)/1048576.0 as volume, t.cw_id as week from volddlgsts d, volddatday t where d.day_id=t.day_id and t.locale_id=1 and d.vsc_sid=SID group by t.cw_id order by t.cw_id
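The arithmetic behind the weekly volume is simply “sum the bytes per calendar week, then divide by 1024 × 1024 = 1048576 bytes per MB”. Here is the same aggregation as a small Python sketch; the function name `volume_mb_by_week` and the sample rows are made up for illustration.

```python
from collections import defaultdict

BYTES_PER_MB = 1024 * 1024  # 1048576


def volume_mb_by_week(rows):
    """rows: iterable of (cw_id, vol_bytes) pairs; returns {cw_id: megabytes}."""
    totals = defaultdict(int)
    for cw_id, vol_bytes in rows:
        totals[cw_id] += vol_bytes
    return {week: total / BYTES_PER_MB for week, total in sorted(totals.items())}


# Made-up sample data: two sessions in week 200914, one in week 200915.
sample = [("200914", 1048576), ("200914", 524288), ("200915", 3145728)]
print(volume_mb_by_week(sample))  # {'200914': 1.5, '200915': 3.0}
```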

Other interesting backend-related questions could be:

  • What is the average backend processing time?
  • Are errors tied to backend slowdowns?

And finally, of course, you can combine information from the different categories to answer broader questions such as:

  • How does average confidence vary by area code?
  • How much longer are calls with many No Input / No Match events than calls with fewer of them?
  • Do weekday calls show a different caller behavior than weekend calls in terms of events and navigation?

That should do it for today. Keep in mind that we’ve used only a portion of the columns in VOLDDLGSTS so far – and that’s just one of several fact tables in Infostore. So there’s lots more to come.
Next time, we’ll look at how callers navigate through an application by means of module sequences.

Personalization using Layers – Part I

Tuesday, January 20th, 2009

You are going to develop a personalized and flexible phone application? Okay, bad news first: You may get into hot water if increasing complexity leaves you unable to keep the application definition manageable. The good news is that VoiceObjects Desktop and VoiceObjects Server have been designed to cope with exactly this complexity, which makes your life much easier.

Sure, you are right: In most of today’s phone applications personalization is key, because it allows for adapting the user interface

  • to individual users, e.g. by preferred language, persona or input mode,
  • to user segments, e.g. post-paid customer vs. pre-paid customer, or novice user vs. power user, and
  • to other relevant conditions, e.g. workday vs. weekend, happy hour vs. unhappy ;-) hour, or back-end available vs. back-end unavailable.

Real-life applications typically combine different conditions in order to offer the best-suited user interface to a certain user in a certain situation. Additionally, those conditions should be dynamically changeable at call time, e.g. to switch to another dialog language instantly at any dialog step. In the same way a web application server enables personalized web sites, the VoiceObjects phone application server supports personalized phone applications.

And it is not just about delivering the highest quality of service to your users! VoiceObjects Server-based personalization also relieves the media platform or browser from processing all these conditions during each call, ensuring the best media platform performance during dialog execution. In other words: In a VoiceObjects setup the browser is used for dialog presentation, but VoiceObjects Server is responsible for all business logic and takes care of “handpicked” (dynamic) dialogs for each user. Additionally, all applied dialog conditions are automatically logged to the system database (Infostore), and the out-of-the-box reporting of VoiceObjects Analyzer can be used to analyze which conditions have been applied, when, and how often. Last but not least, you can apply these conditions as selection criteria (so-called dimensions) to other statistics, in order to analyze and compare user behavior, task completion or recognition scores in more detail.

VoiceObjects allows for such powerful applications by offering a single concept, called layers. Layers can be thought of as global definitions or filters defined in your application. A developer defines a layer first, and later on applies the layer to each dialog step where this conditional behavior is required. However, in order to use the layer concept to its full capacity, you have to know how to design, configure, switch, apply and manage layer conditions. This article (to be continued) aims to shed some light on what you should keep in mind when using layers.

Let’s start with an important basic distinction: VoiceObjects Desktop is equipped with standard (meaning application-independent) layers, which are already built into other dialog objects. These so-called system layers can evaluate the current dialog language (English-US, Spanish, etc.), input mode (voice / DTMF), phone channel (like voice or mobile Web) and occurrence (visit counter per dialog step). All other (typically application-dependent) layers are so-called custom layers. Each custom layer in your project has to be defined by a Layer object.

The second important concept focuses on the switching behavior of layers: Each layer has a set of so-called states, which are switched on or off following a well-defined process and logic. On the one hand automatic layers are switching… (guess!)… automatically, i.e. some internal logic controls which layer states are currently on (active) or off (inactive). On the other hand manual layers have to be switched “manually”, i.e. by calling the Layer function in an Expression object. The same Layer object type is used for defining both manual and automatic layers. By the way, all system layers are manual layers.

 

You may ask: What determines if a custom layer defined by a Layer object should be automatic or manual? Consider the following: Are there any settings (variables or other indicators, like contract number or time of day) which can/should be constantly monitored in order to control the layer states? This would argue for an automatic layer. Or is this layer more dependent on certain events or exceptions and changing more on individual incidents rather than switching frequently throughout the whole dialog? This would call for a manual layer. You should also take into account that a manual layer always has exactly one state activated, whereas automatic layers can have any number of active states (including none or all states). In general, many custom layers can be configured as an automatic or manual layer, and the choice will be a matter of best practice and personal preferences.

 

Looking deeper into the configuration of automatic layers, you will notice two alternative mechanisms offered by the Layer object: first, a state indicator with a set of indicator values assigned to each state, working like a Case-Else construct; and second, state conditions, working like pre-conditions per state. Note that the two approaches cannot be mixed in a single layer definition.
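If it helps to think in code, here is a toy Python model of the switching behavior described so far. It is purely conceptual and not how VoiceObjects implements layers: the class names, the context dictionary, the optional “else” state of the indicator variant, and all sample states are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional, Set

Context = Dict[str, object]


class ManualLayer:
    """Exactly one state is active; it changes only when switched explicitly."""

    def __init__(self, states: Set[str], initial: str) -> None:
        if initial not in states:
            raise ValueError("a manual layer needs a valid initial state")
        self.states = set(states)
        self.active = initial

    def switch(self, state: str) -> None:
        if state not in self.states:
            raise ValueError(f"unknown state: {state}")
        self.active = state


class IndicatorLayer:
    """Automatic layer, variant 1: an indicator value selects states (case/else style)."""

    def __init__(self, indicator: str, cases: Dict[str, Set[str]],
                 default: Optional[str] = None) -> None:
        self.indicator = indicator
        self.cases = cases
        self.default = default  # the "else" branch (assumption)

    def active_states(self, context: Context) -> Set[str]:
        value = context.get(self.indicator)
        matched = {state for state, values in self.cases.items() if value in values}
        if matched:
            return matched
        return {self.default} if self.default is not None else set()


class ConditionLayer:
    """Automatic layer, variant 2: every state has its own pre-condition."""

    def __init__(self, conditions: Dict[str, Callable[[Context], bool]]) -> None:
        self.conditions = conditions

    def active_states(self, context: Context) -> Set[str]:
        # Any number of states may be active at once, including none or all.
        return {state for state, check in self.conditions.items() if check(context)}


language = ManualLayer({"en-US", "es-US"}, initial="en-US")
language.switch("es-US")

customer = IndicatorLayer(
    "contract_type", {"postpaid": {"P1", "P2"}, "prepaid": {"X1"}}, default="prepaid"
)
timing = ConditionLayer({
    "weekend": lambda ctx: ctx["weekday"] >= 5,
    "happy_hour": lambda ctx: 17 <= ctx["hour"] < 19,
})

ctx = {"contract_type": "P2", "weekday": 5, "hour": 18}
print(language.active)                    # es-US
print(customer.active_states(ctx))        # {'postpaid'}
print(sorted(timing.active_states(ctx)))  # ['happy_hour', 'weekend']
```

The model shows the key difference: a manual layer keeps its single active state until someone switches it, while an automatic layer recomputes its active states from the current settings every time it is asked.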

 

Typical traps when working with layers are:

  1. During development, make sure that your state IDs (which are used internally by VoiceObjects Server during layer processing) comply with the same restrictions as other object reference IDs: in particular, they must not contain special characters (including spaces!) and must be unique within your project.
  2. Do not try to switch an automatic layer using the Expression function Layer(). This would cause an internal server error at call time.
  3. Do not forget to define the initial (default) state that is required for each manual layer.

 

Feel free to add comments based on your experience working with layers. A follow-up article will continue on some best practices about switching layers and applying layers to your dialog definition, and will provide some more references and examples. Stay tuned!

Handling Test Case Data in VoiceObjects 7.4

Wednesday, January 7th, 2009
When developing voice applications, you often find yourself in a situation where you don’t (yet) have access to real back-end systems – yet you need to test your application for a variety of different scenarios, each with a different set of parameters, caller data, request and response data from back-end systems, etc.

In short, you need to handle sets of test data, each set representing a certain test case. Of course, there are several options to deal with this, but as of VoiceObjects 7.4, you have a very elegant solution at your fingertips: the new expression function APPLYCONFIGURATION.

What does it do? Let’s have a look at the inline documentation in the Expression editor:

APPLYCONFIGURATION (configurationXML) – Applies the assignments defined in configurationXML. The XML format used is the same as for application defaults.

Application Defaults

Have you used the application defaults functionality before? If not – it’s simple: It’s about initializing selected variables, layers and collections on the service level. The Service object references an XML configuration file in the Configuration URL field. This configuration XML file will be loaded whenever the service is (re-)deployed. (For more information, check out the section on Application Defaults in the VoiceObjects Deployment Guide.)

The primary use case for Application Defaults is this: When working in multiple environments such as, say, a development, a test, and a production environment, each of those will require some unique configuration settings. For example, database names and credentials, resource locator paths, and other external settings might differ. By “outsourcing” the initialization of the environment-dependent variables to the “application defaults” configuration XML document (which is bound to the service object, not to the project), the project definition itself becomes agnostic of the environment and can hence easily be taken from “dev” to “test”, and from “test” to “prod”, without applying any changes to the project.

For an example of a valid configuration XML file, scroll down to the bottom of this posting. In a nutshell, a configuration XML file references any number of variables, layers, and collections in a given project (by reference ID) and defines their initial values.

In-Session Configuration

Now, VoiceObjects 7.4 takes this concept one step further and makes the same mechanism available on a per-session basis: You can do bulk assignments of variables, layers and collections in a single step within your call flow definition, applying different sets of values for each and every call. Of course, this comes in very handy when you need to manage test case data.

Let’s have a look at our Prime Telecom demo application. It supports 3 different languages and 2 different customer types. For each of the resulting 3×2 = 6 combinations, we need at least one test case.

In the previous version of Prime Telecom, these test cases were handled in the traditional way: Within the Preprocessing sequence of the main Module Prime Telecom Portal, a Connector object invoked a JSP, providing the current language and the customer status as request parameters. As response parameters, the Connector’s parameter set contained each and every variable and collection that needed to be initialized – the customer’s postal address, email address, payment information, current tariff, subscribed tariff add-ons, available tariff add-ons etc. Quite a few parameters had to be maintained. And whenever a new parameter had to be added, it had to be added both in the JSP implementation and to the Connector object’s parameter set. Also, the maintenance of the test data in that JSP was cumbersome at best.

Not so any more.

The new implementation of test case data handling in Prime Telecom relies on test data being organized in configuration XML files, each file representing one test case. In Prime Telecom, these files are named configuration_de-DE_platinum.xml, configuration_de-DE_silver.xml, configuration_en-UK_platinum.xml etc.

In the Preprocessing sequence of the main Module Prime Telecom Portal,

  1. a Connector object reads the configuration XML file (via HTTP GET) for the current language and customer status and assigns its content to a variable;
  2. this variable is then used as the argument of an APPLYCONFIGURATION expression, setting all required variables and collections at once.
[Figure: Prime Telecom Portal Module - Preprocessing]

The beauty of this solution is that there is only one place to maintain the test data: the configuration XML documents. When adding more parameters, only the XML documents need to be adapted; the Connector implementation (in our case, some Java code) and the Connector object’s parameter set remain unchanged. Also, the configuration XML documents are much easier to read and hence to maintain than the old JSP.

Of course, there are more use cases for APPLYCONFIGURATION than “just” handling test cases. For example, a hosted service provider could build application templates that are adapted to each customer’s requirements using this mechanism. Also note that, using VoiceObjects’ web service interface, much of the necessary handling could be automated, creating easy-to-use web front ends for end customers.

Example for a valid configuration XML document

This example shows how two objects are being initialized - the variable with the RefID CustomerBaseTariffName, and the collection with the RefID CustomerPaymentSettings. Note that the <type> nodes are optional; also note that collections need to be masked by <![CDATA[ ... ]]> sections.

<?xml version="1.0" encoding="UTF-8"?>
<configurations>
  <configuration>
    <referenceID>CustomerBaseTariffName</referenceID>
    <type>variable</type>
    <value>Individual Plan</value>
  </configuration>
  <configuration>
    <referenceID>CustomerPaymentSettings</referenceID>
    <type>collection</type>
    <value><![CDATA[
      <root>
        <row>
          <col name="type">Visa</col>
          <col name="number">4140040912440644</col>
          <col name="expdate">0210</col>
        </row>
      </root>
    ]]></value>
  </configuration>
</configurations>
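To get a feeling for what such a document boils down to, here is a rough Python sketch that reads the example above into a plain dictionary of assignments. It is not how VoiceObjects Server processes the file: the function names `configuration_file` and `parse_configuration`, the treatment of a missing <type> node as “variable”, and the list-of-dictionaries shape for collections are assumptions made for illustration. The file naming helper simply follows the Prime Telecom file names mentioned earlier.

```python
import xml.etree.ElementTree as ET

CONFIG_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<configurations>
  <configuration>
    <referenceID>CustomerBaseTariffName</referenceID>
    <type>variable</type>
    <value>Individual Plan</value>
  </configuration>
  <configuration>
    <referenceID>CustomerPaymentSettings</referenceID>
    <type>collection</type>
    <value><![CDATA[
      <root>
        <row>
          <col name="type">Visa</col>
          <col name="number">4140040912440644</col>
          <col name="expdate">0210</col>
        </row>
      </root>
    ]]></value>
  </configuration>
</configurations>
"""


def configuration_file(language: str, customer_status: str) -> str:
    """Name of the test case file for one language / customer type combination."""
    return f"configuration_{language}_{customer_status}.xml"


def parse_configuration(xml_bytes: bytes) -> dict:
    """Map each referenceID to its initial value."""
    assignments = {}
    for node in ET.fromstring(xml_bytes).findall("configuration"):
        ref_id = node.findtext("referenceID")
        # <type> is optional; treating a missing one as "variable" is an assumption.
        kind = node.findtext("type", default="variable")
        value = node.findtext("value", default="")
        if kind == "collection":
            # The CDATA section carries a nested document made of <row>/<col> elements.
            inner = ET.fromstring(value.strip())
            value = [
                {col.get("name"): col.text for col in row.findall("col")}
                for row in inner.findall("row")
            ]
        assignments[ref_id] = value
    return assignments


config = parse_configuration(CONFIG_XML)
print(configuration_file("de-DE", "platinum"))  # configuration_de-DE_platinum.xml
print(config["CustomerBaseTariffName"])         # Individual Plan
print(config["CustomerPaymentSettings"][0]["type"])  # Visa
```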


Exploring the new Expression Functions in VoiceObjects 7.4

Tuesday, January 6th, 2009

As you may know, VoiceObjects 7.4 has been released and is available for download on this developer portal.

Now, I wanted to see what’s in VoiceObjects 7.4 for developers. In particular, I was interested in exploring some of the new expression functions – there is a large number of new functions in the realm of date and time handling, string operations, regular expressions and more, plus some other more VoiceObjects-specific functions. My plan was to adapt our Prime Telecom demo application to leverage some of the new VoiceObjects 7.4 functionality. Of course, the existing “legacy” Prime Telecom application works fine in VoiceObjects 7.4, but then … at the end of the day it’s a demo application, so I thought it should stay up-to-date, best-practice-wise.

I started reviewing the existing Connector and Script objects in Prime Telecom, assuming that these were the lowest-hanging fruit: If I could replace some of them by mere Expression objects, using the new functionality, the gains in application maintainability and scalability would be obvious. And indeed, I did find a few objects that had become obsolete:

  • Three Connector objects that had called custom Java code to validate the format of email addresses, credit card numbers, and expiration dates, could be replaced by simple Expression objects using the new MATCHESREGEXP function. This function checks whether a value matches a given regular expression. No more Java code to maintain here! Check out, for example, the new Expression object Is Email Address Valid in the new version of Prime Telecom to see how it’s done.
  • The Script object First Day Next Month had contained some JavaScript logic to calculate, well, the first day of the next month (relative to the system date). With the new expression functions LASTDAYINMONTH, ADDDATE and CONVERTDATE, this Script object could be easily replaced by Expression objects. There are quite a few more date and time related expressions, like NEXTWEEKDAY or TIMEBETWEEN, that should help you implement most conceivable tasks in date and time arithmetic.
  • The Script object Monthly Rate for All Active Addons contained JavaScript code that iterated over the prices of all tariff add-ons currently subscribed to by the caller. Whenever the caller subscribes to a new add-on, the total monthly rate has to be re-calculated, so this Script object had to be re-executed. Now, with the new Expression function ITERATE, it was easy to replace this script by a few expressions: First, a Collection object is initialized with all prices of the current add-ons (Expression object Assign List of subscribed Add-On Prices). Then, after resetting the “total monthly rate” variable, another expression simply iterates over all rows in this collection and sums up the prices (Expression object Sum up Prices of all Active Add-Ons).
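The logic behind these three replacements is small enough to sketch. The following Python is only an analogy for what the expressions compute, not VoiceObjects expression syntax; the function names and the deliberately loose email pattern are assumptions for illustration, and the pattern used in Prime Telecom may well differ.

```python
import calendar
import re
from datetime import date, timedelta

# Loose pattern (something@something.something); an assumption for this sketch.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_email_address_valid(value):
    """Analogue of a regular-expression match: the whole value must match."""
    return EMAIL_PATTERN.fullmatch(value) is not None


def first_day_next_month(today):
    """Take the last day of the current month, then add one day."""
    last_day = calendar.monthrange(today.year, today.month)[1]
    return today.replace(day=last_day) + timedelta(days=1)


def monthly_rate(addon_prices_in_cents):
    """Reset the total, then iterate over all rows and sum up the prices."""
    total = 0
    for price in addon_prices_in_cents:
        total += price
    return total


print(is_email_address_valid("jane@example.com"))
print(first_day_next_month(date(2009, 1, 6)))
print(monthly_rate([499, 999]))
```

Going via "last day of this month plus one day" avoids special-casing December, which is the usual trap when incrementing the month number by hand.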

After getting rid of a few Connectors and Scripts, I reviewed the existing Expression objects in Prime Telecom. I found some implementation details that looked a bit cumbersome in the light of the new possibilities. Hence, I tried to replace them with new expressions, lowering the total number of objects used and trying to make the code more maintainable at the same time.

  • First, I investigated the new string operations. For a start, using the new functions RIGHT and LEFT instead of MID where appropriate helps keep the number of objects down.
  • The new function COUNTOCCURRENCES made several other expressions in Prime Telecom obsolete. It counts how often a certain substring appears in another string. In combination with the XPATH expression, it helped implement a very efficient way to count available tariff add-ons: Inspect, for example, the Expression object Add-On Row Counter.
  • Note, by the way, that the XPATH expression now takes a third argument returnAsCollection: This allows returning a list of XML nodes not only as a variable containing blank-separated strings (returnAsCollection=false), but alternatively as a handy Collection object (returnAsCollection=true). This makes the result of an XML query much easier to handle, as you can see when you inspect the Calculate Subscribed Add-Ons Sequence object.
  • In the same sequence, I found the new ITERATE function to be useful. I use it in the Iterate over remaining Add-Ons Expression object to create a comma-separated list of the names of subscribed tariff add-ons. ITERATE can make the (much more powerful) Loop object obsolete in cases when you just need to iterate over an expression to perform some simple logic.
  • The new VALUESUBSTITUTION function was instrumental in minimizing the number of expressions in the context of managing the headlines in the web and text channel. This function extracts a piece of functionality that was previously only available in the context of the Formatting object’s Value substitution field: It searches a variable in a Collection object and returns a lookup value – which can differ per language. This comes in very handy in multi-language applications. Check out the expression Headline to see how the (language-dependent) headlines in Prime Telecom’s web and text channel are implemented. 
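For readers who do not have the demo at hand, here is a rough Python analogy of the ideas in this list: counting substrings, returning a query result either as one blank-separated string or as a collection, building a comma-separated list, and a per-language lookup. None of this is VoiceObjects syntax; the sample data, the function names and the language keys are made up for this sketch.

```python
import xml.etree.ElementTree as ET

ADDONS_XML = """<addons>
  <addon subscribed="true"><name>SMS Flat</name></addon>
  <addon subscribed="false"><name>Data Pack</name></addon>
  <addon subscribed="true"><name>Weekend Calls</name></addon>
</addons>"""

# Hypothetical lookup table: one headline per language and key.
HEADLINES = {
    "en-US": {"billing": "Your Bill"},
    "de-DE": {"billing": "Ihre Rechnung"},
}


def count_occurrences(text, substring):
    """Count non-overlapping occurrences of substring in text."""
    return text.count(substring)


def query(xml_text, path, return_as_collection):
    """Return matching node texts as a list, or joined by blanks into one string."""
    values = [node.text for node in ET.fromstring(xml_text).findall(path)]
    return values if return_as_collection else " ".join(values)


def value_substitution(table, language, key):
    """Look up the language-dependent value for a key."""
    return table[language][key]


path = ".//addon[@subscribed='true']/name"
as_string = query(ADDONS_XML, path, return_as_collection=False)
as_list = query(ADDONS_XML, path, return_as_collection=True)

print(count_occurrences(ADDONS_XML, "<addon "))  # number of available add-ons
print(as_string)                                   # names run together
print(", ".join(as_list))                          # comma-separated list
print(value_substitution(HEADLINES, "de-DE", "billing"))
```

Note how the blank-separated string can no longer tell where "SMS Flat" ends and "Weekend Calls" begins, which is one reason a collection result is easier to work with.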

There are many more new expressions to explore; some of the most powerful ones, like APPLYCONFIGURATION, deserve dedicated blog entries just to explore their capabilities.

Let me conclude with two more nice catches that have an impact on the Prime Telecom implementation:

  • The Channel filter is now also available on the Menu Items in Menu objects and on the Correction Mappings in the Confirmation object. This makes it much easier than before to create multi-channel applications with different sets of menu items across the different channels. For example, you might want to offer an “Address Change” self-service in the text-based channels, but not in the voice channel. For an example in Prime Telecom, check out the Confirm Credit Card Confirmation object: The “Correct both” (both the credit card number and the expiration date, that is) Correction Item is available only in the voice channel.
  • In the web channel, fields in web forms are now automatically pre-populated with the according variable values. This works for text fields, radio buttons, and lists. Note how the Prime Telecom web forms no longer “forget” your input when correcting, for example, credit card details.

If you want to reproduce my findings, I suggest you download your copy of the new version of the Prime Telecom demo application, install it in your VoiceObjects 7.4 Developer edition and inspect it in detail.

New Prime Telecom Video Samples

Tuesday, January 6th, 2009

If you are a developer interested in evaluating, playing around with, or just getting to know VoiceObjects, a good way to start is looking into the tutorials and demos that come with it.

In particular, if you’re interested in the multi-phone-channel capabilities of VoiceObjects, or if you just want to explore some of the more advanced implementation concepts of the VoiceObjects platform, you should have a look into the Prime Telecom demo application. You can find it in the Demos&Templates section of the developer portal.

Very recently, we have added a few video demos that show how the Prime Telecom application looks and feels in the different channels. Watch these videos to see what you will be rewarded with if you decide to download and install the Prime Telecom demo.

First, watch a short demo of the Prime Telecom application in the text channel. The same “Enter new credit card data” self service can be invoked on the mobile web channel and as a voice call. For a high-level backstage view of the service implementation, watch this video.


VoiceObjects Integration with NuEcho’s Grammar Server

Monday, December 1st, 2008

When creating speech applications, being able to manage dynamic grammars is often a must. A few examples include

  • Choosing from a user-specific list of accounts, reservations, transaction codes, …
  • Asking the caller to identify with his password, date of birth, or security question;
  • In a banking bill payment application, the “payee list” grammar can be dynamically generated based on the list of payees that has been set up by the user;
  • Address capture: After asking the caller for the zip code, a “street” grammar can be generated dynamically with streets associated with the zip code.

Many more use cases for dynamic grammars can be found in the Nu Echo blog: Part 1 and Part 2.
Now, the question is: How to generate dynamic grammars? In particular, how to generate dynamic grammars at call time? The traditional approach is to create a JSP (or ASP, or PHP, or Perl …) page that is invoked at call time with a set of request parameters, returning the required grammar. Creating such pages is cumbersome at best.
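To make the traditional approach concrete, here is a minimal Python sketch of what such a server-side page essentially does: it takes a list of values known only at call time and prints a grammar in the ABNF form of the W3C SRGS standard. The function name, the rule name `$payee` and the strict character check are assumptions for this illustration; a production page would also need semantic tags, pronunciation handling and proper escaping.

```python
import re

# Only plain words are accepted here; anything else would need escaping rules
# that this sketch deliberately leaves out.
SAFE_NAME = re.compile(r"[A-Za-z0-9 ]+")


def build_payee_grammar(payees, language="en-US"):
    """Return an SRGS ABNF grammar with one alternative per payee."""
    if not payees:
        raise ValueError("at least one payee is required")
    for name in payees:
        if not SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsupported characters in payee name: {name!r}")
    alternatives = " | ".join(payees)
    lines = [
        "#ABNF 1.0 UTF-8;",
        f"language {language};",
        "mode voice;",
        "root $payee;",
        f"public $payee = {alternatives};",
    ]
    return "\n".join(lines) + "\n"


grammar = build_payee_grammar(["Acme Power", "City Water"])
print(grammar)
```

Even in this tiny form, the grammar text is buried inside string handling code, which is a large part of why maintaining such pages gets tedious.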

Nu Echo, the VoiceObjects partner company specializing in grammar development tools, have come up with an interesting approach: They designed an easy-to-use yet powerful markup language for creating dynamic grammars. This Grammar Language extends the ABNF format with dynamic grammar directives that can access variables and objects passed to the instantiation service via an instantiation context. This context maps variable names to values.

Using this approach is fairly simple: You create a grammar template in ABNF format, maybe using the NuGram IDE. This grammar template needs to be uploaded (via HTTP PUT) to the Grammar Server (aka NuGram Server). At call time, the voice application instantiates the grammar via an HTTP POST command, providing an instantiation context that contains the session-specific data. This context must be provided in JSON format, a string format representing, in a nutshell, structured key-value pairs. The Grammar Server then creates the grammar, which can finally be retrieved using an HTTP GET request.
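As a small illustration of what "an instantiation context in JSON format" means, the Python sketch below serializes a mapping of variable names to values. The variable names `currency` and `maxAmount` are invented for this example and do not come from any actual grammar template; the HTTP calls themselves are left out because the server's URLs and parameters are not covered here.

```python
import json


def build_instantiation_context(variables):
    """Serialize the session-specific variables as a JSON string."""
    return json.dumps(variables, ensure_ascii=False, sort_keys=True)


# Hypothetical call-time data for a grammar that asks for an amount of money.
context = build_instantiation_context(
    {
        "currency": {"singular": "euro", "plural": "euros"},
        "maxAmount": 500,
    }
)
print(context)
```

The resulting string would be the body of the instantiation request, with the grammar template referring to the same variable names.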

Now, our idea was that while this sounds simple enough, it should be made even simpler for VoiceObjects developers. All the HTTP based communication with the Grammar Server that is going on behind the scenes should be automated. Also, creating JSON formatted strings shouldn’t be something VoiceObjects developers are bothered with.

What we came up with is the Grammar Server Connector that bridges between VoiceObjects applications and the NuGram Server. This connector uses NuGram Server’s HTTP API to upload, instantiate, and fetch dynamic grammars. All that the VoiceObjects developer has to do is, well, first create the grammar template. In the application, he maintains the instantiation context (i.e. the dynamic data that is known only at call time) in a Collection object. The Grammar Server Connector’s task is then to instantiate the grammar at call time.

To get you started with this new approach to dynamic grammars, I created a simple demo application that uses one single grammar template and a single, simple input state, asking the caller for a currency value – treating the actual currency (US Dollars, Euro, British Pound, Czech Koruny, Mexican Pesos …) as a dynamic value. You can download it, along with the installation package and comprehensive documentation here (scroll down to section “VoiceObjects Integration with NuGram Server”).

Check out the update…

Tuesday, October 21st, 2008

Dear VoiceObjects developers,

We recently updated the Developer Portal with some new product updates. Check out the update below:

  • Now available: VoiceObjects Media Mixer Developer Edition
    VoiceObjects Media Mixer is now available as free Developer Edition. VoiceObjects Media Mixer enables the delivery of innovative video solutions via the creation and dynamic generation of video applications for 3G phones. This optional component to the VoiceObjects Server mixes multiple content types — such as text, images, HTML, audio and video files — and generates video clips in 3gp or Flash video (flv) format.
    http://developers.voiceobjects.com/support-training/media-mixer/


  • New version of VoiceObjects Developer Edition
    Version 7.3 R2 is now available for download, providing a newly designed Test Monitor, an enhanced Storyboard Manager, and several minor enhancements and bug fixes. Users of previous versions of the Developer Edition may use the “Check for Update” from the Developer Edition’s VoiceObjects menu to check for the latest version and to access the provided update package.
    http://developers.voiceobjects.com/downloads/deveditiondownload/


  • Prime Telecom
    To learn more about using VoiceObjects for multiple phone channels try the newly available demo application Prime Telecom. Prime Telecom implements a sample self-service portal for a telecommunication company, supporting three phone channels (voice, text, and Web) in three different languages (US-English, UK-English, and German).
    Additional demos and templates will be published soon.
    http://developers.voiceobjects.com/support-training/developer-edition/demos-templates/


Please note also our next Jam Session on “Natural Dialog Management – Adding NLU and Other Advanced Speech Processing Features to Traditional Voice & DTMF Dialogs” on November 5, 2008. Free registration at http://developers.voiceobjects.com/tech-topics/monthly-jam-sessions/

Best regards,

Michael

Correct me if I’m wrong

Tuesday, October 14th, 2008

Ill-designed voice dialogs can get unnecessarily slow and tedious when they try to over-compensate for speech recognition challenges, whether these occur or not. This can drive me nuts: They verify each and every input individually instead of first collecting all input and then confirming all of it at once. With 4 items to collect, for instance, confirming once at the end would reduce the input states by 3 (!): four inputs plus one confirmation instead of four inputs plus four confirmations, let alone the time those three steps would take. Using VoiceObjects Analyzer, you could easily measure that time and see that it is time spent in vain, which might actually cost your organisation money spent in vain! But what can you do?

Well, VoiceObjects 7.3 makes it pretty easy to apply concepts like implicit confirmation and thus correction to your dialogs. As this occurs so frequently in everyday voice applications, I thought I’d write a post on this.

So check out the following call flow excerpt:

The Input object Get Credit Card Type asks the initial question: “What is the type of your new credit card?” and accepts responses such as “Visa”, “It’s mastercard”, “I have an Amex card”, etc. The following Input object Get Credit Card Number collects the number only – at first sight, that is. In reality, it does more. It

  1. implicitly confirms the collected card type by prompting the caller with “And what is the number of your Mastercard?” (The speech bubble icon behind the object name – denoting a comment on that object – hints at the additional functionality of this object; hovering over it in VoiceObjects Desktop would show you the developer’s comment as a tool-tip).
  2. allows a correction of the card type in case it was misrecognized.

This behaviour is not visible per se from the flow as VoiceObjects call flows are usually optimized for readability, deliberately omitting certain details. Showing too much of the innards would make it harder to follow what’s happening on the surface. (I have positive experience with this approach. But here’s an idea: to mention the correction capability of this input state, the developer could have called the Input object Get Credit Card Number (or Correct Card Type) instead).

Now imagine the recognizer got it wrong and the caller actually said “AmEx card”. What could be the caller’s reaction to this question? Maybe something like “No I said AmEx!” (Damnit!). The Input object Get Credit Card Number has an additional grammar defined (via a second Grammar item in the Grammar section of the Input object) that matches corrections like this. But how can the grammar instruct the server to accept this as a correction of credit card type, reset the corresponding variable, apologize, and ask again for the number, as in “I’m sorry. So what’s the number of your Amex card?” VoiceObjects 7.3 introduced the notion of grammar-driven application control to accomplish this. (Did you notice that the server even adapts to the caller’s choice of words in this example, by saying “Amex” instead of “American Express”? I guess that would make for another nice post on naturalness…)

The grammar can return instructions for the server via a special slot vogrammarcontrol; instructions such as “change the value of variable CCType” and “re-process object Get Credit Card Number”. In our example, the corresponding grammar snippet could look like this:

By detecting the slot vogrammarcontrol in the speech platform’s request after caller input and parsing the slot value to arrive at the three instructions varCCType=Amex (set variable CCType to “Amex”), gosub=Apology (process the object with ReferenceID “Apology”, which happens to be an Output object), and continuation=return (continue by returning to the current object and re-processing it), the server naturally responds to the caller’s correction with “Sorry for that. So then, what’s the number of your Amex card?”.
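The processing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the server's implementation: the exact syntax of the slot value is not shown in this post, so the semicolon separator, the function names and the dictionary of output texts are all assumptions.

```python
def parse_control(slot_value, separator=";"):
    """Split a control string into (key, value) instructions.

    The separator is an assumption of this sketch.
    """
    instructions = []
    for part in slot_value.split(separator):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        instructions.append((key.strip(), value.strip()))
    return instructions


def apply_control(instructions, variables, outputs):
    """Apply the instructions; return the texts to play and the continuation."""
    spoken = []
    continuation = None
    for key, value in instructions:
        if key.startswith("var"):
            # varCCType=Amex -> set variable CCType to "Amex"
            variables[key[len("var"):]] = value
        elif key == "gosub":
            # gosub=Apology -> process the object with that ReferenceID
            spoken.append(outputs[value])
        elif key == "continuation":
            continuation = value
    return spoken, continuation


variables = {"CCType": "Mastercard"}       # what the recognizer got wrong
outputs = {"Apology": "Sorry for that."}   # hypothetical Output object text

spoken, continuation = apply_control(
    parse_control("varCCType=Amex;gosub=Apology;continuation=return"),
    variables,
    outputs,
)
if continuation == "return":
    # Re-process the current input state with the corrected variable.
    spoken.append(f"So then, what's the number of your {variables['CCType']} card?")
print(" ".join(spoken))
```

Because the prompt is rebuilt from the corrected variable, the caller's own wording ("Amex") carries over into the re-prompt without any extra logic.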

This is one of many possible steps towards more natural man-machine interaction. We at VoiceObjects like to call it Natural Dialog Management. Find more details on the feature of grammar-driven application control in the Input object section of the Object Reference, which is part of the VoiceObjects product documentation. I plan to provide some more examples on other Natural Dialog Management features in upcoming posts… Stay tuned!

Welcome to the VoiceObjects Developer Blog!

Monday, October 13th, 2008

Welcome to the VoiceObjects Developer Blog. The VoiceObjects team is very excited about this new communication channel as this will enrich our growing developer community by sharing thoughts and experiences on various topics.

This weblog will focus on best practices for self-service application design, VUI / GUI design, analytics and reporting, and speech grammar development. We are also going to cover new market and technology trends and how they impact the VoiceObjects product roadmap, specifically developments in the W3C standards bodies around VoiceXML, but also emerging phone self-service channels such as Video, SMS, USSD, Web Chat and Instant Messaging. And finally we want to report on successful project and solution deployments with our SI partners and customers.

Here are the VoiceObjects team members who will participate as authors in the Developer Blog:

Andreas Volmer – Presales Manager EMEA
Angelika Salmen – Manager VUI Services
Christopher Schick – Program Manager
Martin Mauelshagen – Program Manager
Michael Codini – CTO
Michael Gill – Director Product Management
Stefan Besling – VP Engineering
Tobias Göbel – Senior Presales Consultant
Volker Kraft – Manager Education Services

If you would like to stay up-to-date on what we are doing, please do subscribe to our site’s feed – and please do pass along your comments. We’d love to hear from you.

The VoiceObjects team