Archive for the ‘Best Practices’ Category

HowTo call stored procedures in a database server from VoiceObjects

Tuesday, August 10th, 2010

Did you ever wonder what’s the best way to call a stored procedure in a relational database server from your VoiceObjects call flow implementation? As you might know, the new Database object was added with VoiceObjects 9.1, and it greatly facilitates the task of executing SELECT, INSERT, and UPDATE statements in a database server. It doesn’t deal with stored procedures, though. And in many projects, database content must be accessed through stored procedures, while direct use of DML (Data Manipulation Language) commands (such as SELECT or INSERT) is a no-no.

So, we went ahead and created a generic VoiceObjects Connector implementation that should be usable in most if not all instances where database integration via stored procedures is required. It was implemented as a CGI connector – that is, the connector code is deployed as a web application in a web application server (such as Tomcat) and communicates with the VoiceObjects Server via XML/HTTP. It supports calling stored procedures with any number of IN and OUT parameters of all kinds of data types, and deals with result sets that may be returned by the stored procedure. Plus, it leverages database connection pools that are configured and maintained on the web application server.

If you’re interested, find all you need to know about this generic “database stored procedure connector” in this new Knowledge Base article. There, you’ll find background information, installation and configuration instructions, as well as the actual download of the connector including the associated Java sources.

PS: Note that the implementation of this new CGI connector actually builds on the Java Servlet framework for CGI connectors that was presented in this recent blog post on VoiceObjects back-end integration.

Reports at your fingertips

Sunday, April 25th, 2010

VoiceObjects has one of the strongest phone self-service reporting and analytics offerings in the market today. VoiceObjects Analyzer provides insight into application usage and system performance for developers, administrators, and business staff like no other. Full access to the rich features of this product requires a Business Intelligence platform underneath, such as Business Objects, Cognos, or MicroStrategy. But did you know that VoiceObjects 9.1 introduced a variety of reports available at your fingertips, right within your favorite GUI Desktop for Eclipse, with no further software or installation required?

If you’re using 9.1 already, you have insanely easy access to reports probably without even knowing it; reports that help you run a session analysis for your server cluster or a single instance, inspect memory usage over a configurable period of time, or even analyze business task completion rates for the service you’ve launched the other day.

Let me show you how it all works. Whether you use Desktop for Eclipse in standalone mode (which is what you get with the “Developer Edition”), or access a full VoiceObjects Server in network mode, you can utilize the so-called Control Center Reports either way. As the name implies, the Control Center is where this neat little feature hides. If you run in standalone mode, you may want to check out this post about how to set up a Control Center view of your embedded VoiceObjects Server before proceeding.

Reports are available for three different areas:

  1. entire server cluster
  2. single server instance
  3. single service

To access the reports for the entire server cluster (the “logical server”), switch to the Server Manager tab of the Control Center view, right-click on the item preceded by “S: ” (which is your logical server representing the cluster, called “VOServer” by default), and click Reports. From the submenu, you can now select a report from the available list:

How to Access Control Center Reports



Once selected, the view switches to the Report Chart tab, where you now see your report:

Sessions By Service


Change the time frame as desired and hit the Refresh button to update the report. You can even export a PDF version of your report to share it with others! (Notice the little Disk icon in the upper right corner…)

Accessing the reports for a single instance works pretty much the same way: right-click on the server instance (denoted with an “I: ”) of your choice and select the desired report from the Reports submenu.

Memory Usage



Service reports can be accessed likewise: switch to the Service Manager tab, right-click on the service of your choice and select the desired report. For example, check out the following report on business task completion rates, telling you how successful your callers are using your service:

Business Task Completion Rates

Note: If you can’t see the entry Reports in your menus, then you have turned off System DB Logging for your server, instance, or service. System DB Logging must be switched on in order to be able to access the reports. If it’s switched off but had been switched on before and you still want to see reports on that historical data, you can switch it on temporarily through the transient setting available in the DB Logging submenu of your server or service.


So there you go: reports at your fingertips, without the need to install a full Business Intelligence suite. However, keep in mind that VoiceObjects Analyzer provides a whole lot more reports, plus analysis features such as drilling, slicing and dicing, and more. The Control Center Reports do not replace the Analyzer, but provide valuable insight mainly for administrative staff right within your favorite IDE.

Alright, I’m outta here. I have to fix this module “Enter New Credit Card” in my application. The task completion rate is way too low as I just realized…

NuBot – Automated end-to-end testing of IVR applications

Monday, October 19th, 2009
Our partner, NuEcho, have started a beta program for the latest addition to their tools portfolio: NuBot, an automated test platform for functional testing and load testing of IVR applications. By the way, NuEcho are still accepting participants in their beta program; so if you’re interested, tell them!
 
Curious to see how NuBot can be used for testing VoiceObjects applications, I enrolled in their beta program and got started right away. I was delighted to see that the NuBot ITE (Integrated Test Environment) client comes as an Eclipse plugin, so it fits in nicely with VoiceObjects Desktop for Eclipse. Here’s a little screenshot from my Eclipse Perspective selection popup:
 
Eclipse Perspectives
 
What does NuBot do? Let’s listen to NuEcho: “The NuBot Platform is a complete and integrated testing infrastructure which allows for testing interactive voice response (IVR) applications. It can be applied to multiple types of applications, whether they use voice recognition or not. The NuBot Platform uses an open-source telephony platform and supports a wide range of telephony standards (SIP, T1, RNIS, analog, and so on). The system’s architecture is as follows: by way of the NuBot plug-in client, the user’s workstation connects to the remote robot server over an IP network (RMI), which in turn processes and executes incoming or outgoing calls.”
 
Remember the VoiceObjects LoadTester? The conceptual difference between LoadTester and NuBot is that LoadTester loads VoiceObjects Server directly (sending http requests in place of the VoiceXML browser), while NuBot allows for end-to-end, over-the-phone testing that involves the entire software stack including the IVR. Both tools have obvious use cases in the testing process.
 
Now, what does it take to create and run test scenarios for an existing application?
 
1) Application Instrumentation with DTMF sequences
 
First of all, your IVR application must be instrumented. To each input state that will be part of a test scenario, you add a unique, fixed-length DTMF sequence. For instrumentation of our dear old friend, the Prime Insurance demo application, I decided to add the sequence “C001” to the main menu, “C300” to the Car Insurance welcome prompt, etc. (In case you didn’t know – DTMF keys comprise not only the well-known *, #, and 0-9, but also the characters A-D.)
 
To make the DTMF instrumentation a smooth experience, I created a new formatting class “DTMF Sequence” for the VoiceObjects Formatting Bus. It takes a sequence definition such as “C1#” and maps it to the corresponding files: c.wav, 1.wav, hash.wav. Simple. I also made sure I can (de-)activate the instrumentation through a single variable, either by setting it in the initial URL, or by listening, early on in the Prime Insurance application, for a hidden DTMF command that only my NuBot test script knows about. If you want to know more about this Formatting Bus implementation, let me know.
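
To give a rough idea of what such a formatting class does, here is a minimal Python sketch of the mapping from a sequence definition to audio file names. It is not the actual Formatting Bus class (that one isn’t shown in this post): the function name is made up, and the file name star.wav for the * key is an assumption. Only c.wav, 1.wav, and hash.wav come from the example above.

```python
# Hypothetical sketch: map a DTMF sequence definition such as "C1#" to audio files.
# "hash.wav" follows the example in the post; "star.wav" is an assumption.
SPECIAL_KEYS = {"#": "hash", "*": "star"}
VALID_KEYS = set("0123456789ABCD") | set(SPECIAL_KEYS)


def dtmf_sequence_to_files(sequence):
    """Return one .wav file name per DTMF key in the sequence."""
    files = []
    for key in sequence.upper():
        if key not in VALID_KEYS:
            raise ValueError(f"not a DTMF key: {key!r}")
        # Digits and the letters A-D map to a file named after the key itself.
        name = SPECIAL_KEYS.get(key, key.lower())
        files.append(name + ".wav")
    return files


print(dtmf_sequence_to_files("C1#"))
```

Rejecting anything outside 0-9, A-D, * and # early keeps a typo in a sequence definition from silently producing a prompt that the test robot can never match.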
  
2) Drawing the call flow
 
Now let’s move on to the NuBot ITE. After creating a new test project, you first need to create a “call flow”. This is a very simple mapping of your application’s input states and their transitions, identified by the DTMF sequences which you defined and created in step 1.  The following screenshot shows how I sketched out the “Car Insurance” Module in Prime Insurance as a NuBot callflow. For example, the initial prompt in the main menu has been extended by the DTMF sequence C001; and the “Ask for car” input state by C301.
 
callflow
 
3) Defining a test scenario
 
Now that NuBot is aware of the basic structure of your IVR application, you can go ahead and create test scenarios. The following screenshot shows such a scenario that drives and tests the Prime Insurance application, simulating user input via DTMF keys and speech input. For the speech input (“Ford”, “Focus”), I created two short voice recordings and added them to my NuBot project. When executing the test, NuBot compares the DTMF sequence played after each step by the application with the DTMF sequence from the call flow definition; in case of a mismatch, an error is reported and the test is aborted.
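
The per-step check described here can be pictured roughly like this. This is not NuBot code; the function and the data layout are hypothetical and only illustrate the idea of comparing the marker that was heard with the marker expected from the call flow, and stopping at the first mismatch.

```python
# Hypothetical sketch of the per-step check; NuBot's real implementation
# is not shown in this post.
def run_scenario(expected_markers, heard_markers):
    """Compare expected and heard DTMF markers step by step.

    Returns (True, None) if every step matches, otherwise (False, index)
    with the index of the first mismatching step, where the test is aborted.
    """
    for step, expected in enumerate(expected_markers):
        # A missing marker (application stayed silent) counts as a mismatch.
        heard = heard_markers[step] if step < len(heard_markers) else None
        if heard != expected:
            return False, step
    return True, None


# C001, C300 and C301 are the markers used for Prime Insurance in this post.
print(run_scenario(["C001", "C300", "C301"], ["C001", "C300", "C301"]))
print(run_scenario(["C001", "C300", "C301"], ["C001", "C301"]))
```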
 
scenario
 
4) Executing test script
 
Finally, you create a “test descriptor” where you define which test scenarios should be executed, which phone number is to be called, how often to repeat tests, how many of them in parallel … in short, you can schedule both functional tests and load tests, and I guess, at some point, also schedule monitoring tests (running 24×7 on a regular basis to monitor a production system, performing functional end-to-end tests).
 
After running the test, the results are fetched from the NuBot Server and can be analyzed locally. You get summary statistics on test success/failure; you’ll see which input states caused test scenarios to fail; and you’ll get detailed statistics on response times. The following screenshot shows the response times from my first successful test execution of Prime Insurance’s Car Insurance module. 
 
response times results

I found NuBot easy to master and a very powerful addition to my automated testing portfolio. I can only recommend getting your hands on it and trying it; it’s about time that we take automated testing more seriously in the IVR application business.

Creating IM Bots … and phone-less testing

Friday, September 4th, 2009

Did you read Tobias Göbel’s very recent blog about “IM-ifying VoiceObjects applications”? Well, I’m not saying it’s outdated. But, things have become even simpler since then.

When provisioning applications on Voxeo Evolution, you are no longer confined to the IVR channel. In addition to voice phone applications, you can provision your applications as “text messaging” applications. And you can provision these text messaging apps as an SMS service and / or map them to bots on the different IM networks, all in one single spot. Read the recent Voxeo blog about this new feature.

Now, what does this mean for VoiceObjects developers?

First of all, the two new options for deploying text messaging apps are actually very different both technically and  in terms of use case. This is represented by the following architectural sketch:

Architecture

Application Type “Instant Messaging Bot”

First, have a look at the right-hand-side of the drawing. In VoiceObjects Server, the media platform driver Voxeo IMified Platform is selected (which has been available since VO 9 R1), while in Evolution, the application type is set to Instant Messaging Bot. In this scenario, Evolution is just the place where you configure your bots on IMified and provision the application’s start URL pointing to your VoiceObjects Server. The actual session-related traffic will be exchanged directly between VoiceObjects Server and the IMified platform, which in turn is connected to the IM networks. In the VoiceObjects service, the text channel will be active, and you can specifically design your application for this channel. (If you have never seen a VoiceObjects service in the text channel yet, watch this little demo video that shows how to test-drive a text app with the VoiceObjects PhoneSimulator).

“Instant Messaging Bot” is the application type you want to use for services designed specifically for the text channel. Under the hood, it uses the XML/HTTP-based IMified API.

Application Type “Prophecy 10 VXML 2.1 /w SMS”  (aka Phoneless testing!)

Now, for the left-hand side of the architectural sketch. The difference looks small, but this is actually a completely different approach. Here, (a beta version of) Voxeo Prophecy 10 is sitting in between the application server and the IMified platform. In VoiceObjects, you configure the exact same media platform driver as in the case of Voxeo-based IVR applications: Voxeo Voice Platform. In other words, VoiceObjects Server renders plain old VXML 2.1. In Evolution, you pick the new application type Prophecy 10 VXML 2.1 /w SMS. The Prophecy server is connected to IMified via MRCP, and rather than using ASR and TTS as IVR resources, it treats IMified as a resource for (dis)playing prompts and processing grammars.

What happens if you talk to an IM bot configured this way? Well, he’ll speak the TTS text that he finds in the VXML, and he will process the input you type into the chat window based on the grammar(s) defined in the VXML code. What if your application expects DTMF input? You simply type in a number. What if you type in an out-of-grammar utterance? Guess what, a NoMatch event is triggered and the corresponding NoMatch prompt will be played. What if you stay inactive for more than a few seconds? Of course – a NoInput event will be triggered, and if you have a NoInput handler defined, it will be processed.

In short: We’re looking at the perfect phone-less testing device for voice applications. Here’s a screenshot from a sample session; you can see me “talking” to the voice channel version of our SpeechTEK 411 demo application. Note the NoMatch and the NoInput situations.

Sample IM session

So - can you think of a more elegant phone-less test harness? I can’t.

When deploying and testing your voice application in this way, just make sure that all your prerecorded prompts have alternative TTS text defined, and set the speech timeout to a much higher value than the standard 3 seconds.

VoiceObjects LoadTester available for download

Friday, July 31st, 2009

Load and performance testing, functional regression testing, active production monitoring – things that ain’t fun, but need to be done if you’re serious about service quality and 24×7 availability.

For internal quality assurance based on test automation, as well as for automated testing in projects delivered by our own professional services and by partners, the Voxeo VoiceObjects team has developed and has been using a great little tool for quite a while: The VoiceObjects LoadTester. Technically, it’s a set of Python scripts that allow for

  • recording test scripts by making reference calls;
  • playing back test scripts by few or by hundreds of virtual callers (simulation of real-world workload), stressing the VoiceObjects Server platform and all involved back-end systems, while at the same time verifying that the system still responds quickly and correctly even under peak load conditions;
  • and analyzing system performance and response times, identifying system bottlenecks, and understanding resource usage.

The primary use case is making sure that the sizing and configuration of the VoiceObjects platform, including the server cluster, load balancer, Infostore setup, and back-end systems, are ready to meet peak load situations, and that the system survives the failure of individual components.

An additional important use case is automated functional regression testing - making sure, with every new service release, that functionality that once was tested successfully hasn’t been broken since then. Finally, the LoadTester can be used for active production monitoring: Executing a set of test scripts in regular intervals (like, every 3 minutes) against the production system, performing transactional, end-to-end testing, that feeds health information and alarms into an existing system monitoring solution.

Now, why do I write about this? Simple: You can now download and use the VoiceObjects LoadTester in your own projects, for free. It will be an official part of the VoiceObjects package in the next revision of VoiceObjects 9.0. And it is available today, as a pre-release, here: LoadTester.zip (this package contains a full installation for Windows, including the Python interpreter and required libraries).

On July 29, 2009, I presented the LoadTester in one of our developer jam sessions. Be sure to download (and scan through) the slides I used in the presentation, get the LoadTester Guide, and - sit back and relax – watch a recording of the webinar.

Finally, these are some images with charts generated by the VoiceObjects LoadTester. Want to know what they show? Well, check out the LoadTester Guide – it’s all explained in there.

 

Mini-Preview of LoadTester charts


Inside Infostore – Part II: Modules and Paths

Wednesday, July 8th, 2009

Back in April, in the first installment of our Inside Infostore series, we looked at the general structure of the Infostore repository for real-time caller behavior analysis and answered a number of interesting questions on the basis of the Call Detail Record table VOLDDLGSTS alone. This time, we’ll take a look at Module information available within Infostore. It provides valuable insight into how callers use your application – which parts they visit, which parts they skip, and exactly how they get to where they end up.

Modules
Module objects in the Voxeo VoiceObjects framework provide a “wrapper” for applications or sub-applications within a bigger one such as a self-service portal. The Prime Insurance sample provides a good model, as shown in its main menu:

pimodules

 A separate Module object encapsulates each of the five branches, as well as the overall application. Each Module defines inheritable event handling, navigation using hyperlinks, and additional application settings.
More information on the Module object can be found within the Object Reference. Best practices in structuring your application using Modules are discussed in the Design Guide. Both are highly recommended additional reading.

Module Tables
Module information is stored within Infostore in five different tables: VOLDMODULE, VOLDMODSEQ, VOLDMODSET, VOLDRELMSQ, and VOLDSUBSEQ. Other tables, such as VOLDDLGSTS, refer to them through surrogate IDs.

VOLDMODULE contains general lookup information on each Module object such as its name, modification timestamps, and key settings.
Data in this table is updated with each deployment or redeployment. In addition to the “real” Module objects, the table also contains an entry for “[End of Dialog]”, which is used to indicate the end of the dialog (as you may have guessed).

VOLDMODSEQ contains an entry for each sequence of Module objects that has been traversed within a call. For example, when somebody calls the Prime Insurance application shown above, selects the car insurance branch from the main menu, and then afterwards also inquires about life insurance, there would be an entry “Prime Insurance Portal,Car Insurance,Life Insurance”.
Data is entered into this table as necessary whenever a new sequence is observed in a call.

VOLDMODSET is similar in that it contains one entry for each set of Module objects that has been visited within a call. Multiple sequences may lead to the same set, and each sequence entry in VOLDMODSEQ contains a pointer to the respective set entry in VOLDMODSET. The set entry is sorted alphabetically, so for the same call example as above the set entry would be “Car Insurance,Life Insurance,Prime Insurance Portal”.
Data is entered into this table as necessary whenever a new set is observed in a call. Since the sequence entry references the corresponding set entry, the set entry is made first.
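
To make the relationship between sequence and set entries concrete, here is a small Python sketch. It only mimics the idea; the function name is made up, and Python’s default string ordering is an assumption that may differ from the collation Infostore actually uses.

```python
# Hypothetical sketch: derive a set entry from a sequence entry.
def module_set_entry(sequence_entry):
    """Return the alphabetically sorted, de-duplicated set entry."""
    modules = sequence_entry.split(",")
    return ",".join(sorted(set(modules)))


seq_a = "Prime Insurance Portal,Car Insurance,Life Insurance"
seq_b = "Prime Insurance Portal,Life Insurance,Car Insurance"

print(module_set_entry(seq_a))
# Two different sequences visiting the same Modules share one set entry.
print(module_set_entry(seq_a) == module_set_entry(seq_b))
```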

VOLDRELMSQ maps individual Module objects to module sequences and the positions at which they occur within these sequences. In the example there would be three separate entries mapping Module “Prime Insurance Portal” to the first position in the sequence, “Car Insurance” to the second position, and “Life Insurance” to the third.
Data is entered into this table as necessary whenever a new sequence is observed in a call.

Finally, VOLDSUBSEQ contains a break-down of Module sequences into their constituent sub-sequences. This information is needed for reports such as the dominant path analysis referred to below. In our example this will result in the following six sub-sequences including the end marker “[End of Dialog]” mentioned above:

  • Prime Insurance Portal,Car Insurance,Life Insurance,[End of Dialog]
  • Prime Insurance Portal,Car Insurance,Life Insurance
  • Prime Insurance Portal,Car Insurance
  • Car Insurance,Life Insurance,[End of Dialog]
  • Car Insurance,Life Insurance
  • Life Insurance,[End of Dialog]

Data is entered into this table as necessary whenever a new sequence is observed in a call.
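
Judging from the six entries listed above, the rule appears to be “every contiguous run of at least two elements, with the end marker appended first”. That is an inference from the example rather than documented behavior, and the function below is a hypothetical Python sketch of it, not the code Infostore uses.

```python
# Hypothetical sketch: break a Module sequence into its sub-sequences.
END_MARKER = "[End of Dialog]"


def sub_sequences(sequence_entry):
    """Return all contiguous runs of two or more elements, longest first per start."""
    modules = sequence_entry.split(",") + [END_MARKER]
    result = []
    for start in range(len(modules) - 1):
        # Walk the end position backwards so longer runs come first.
        for end in range(len(modules), start + 1, -1):
            result.append(",".join(modules[start:end]))
    return result


for entry in sub_sequences("Prime Insurance Portal,Car Insurance,Life Insurance"):
    print(entry)
```

For the sample call this yields the same six entries as the list above, in the same order.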

Taken together, these five tables can be utilized to gain insight into how callers navigate through your applications. The next two sections explore a number of sample questions. 

Basic Orientation
As in part I, the SQL statements shown below have been tested using Microsoft SQL Server. They are meant to be indicative of specific types of information and are formulated for readability rather than performance. Entries in all tables mentioned here belong to specific services identified by a unique ID, the VSC_SID. In all of the samples we assume this SID to be known and fixed. It can be retrieved like this:

select vsc_sid from voldvscobj where vsc_refid='<VSN of service>' and is_current=1

Finally, the SQL statements used here operate on the “raw” Infostore tables. For Analyzer, there is an additional view layer that adjusts localization and performs a few mappings that usually aren’t relevant here.

With these preliminaries out of the way, here are a few high-level questions that can be answered easily on the basis of the Module set and sequence entries as well as the references in the VOLDDLGSTS table we looked at before:

  • How many different sets / sequences do callers visit?
    Obviously this is about the simplest question you can ask regarding sets and sequences, but it does already give you a high-level idea of how callers utilize an application.
    select count(*) from voldmodset where vsc_sid=SID
    select count(*) from voldmodseq where vsc_sid=SID
  • What is the ratio between sets and sequences?
    This ratio is a rough measure of variability between different calls (and callers). When it is close to 1, callers visit the same places in roughly the same way. When it is significantly bigger than 1, different calls visit the same places in significantly different ways. Keep in mind, however, that this value depends on application design at least as much as it depends on caller choices.
    select 1.0 * (select count(*) from voldmodseq where vsc_sid=SID) / (select count(*) from voldmodset where vsc_sid=SID)
  • Which percentage of calls visits a certain sub-application?
    Different choices within a self-service portal will attract callers to differing extents. In our example we want to know the percentage of calls that visit the “Life Insurance” branch of the Prime Insurance Portal.
    select 100.0*count(*)/(select count(*) from volddlgsts where vsc_sid=SID) from volddlgsts where mod_set_sid in (select mod_set_sid from voldmodset where mod_set_name like '%Life Insurance%') and vsc_sid=SID
  • How often does each Module object occur in sequences?
    Module sequences will contain individual Module objects in different numbers, which is indicative of their respective importance in call flows.
    select count(*) as cnt, m.mod_name from voldrelmsq r inner join voldmodule m on (r.mod_sid = m.mod_sid and m.mod_sid>0 and r.vsc_sid=SID) group by m.mod_name order by cnt desc
  • What is the most / least visited sub-application within my application portal?
    If your application is e.g. a banking self-service portal offering sub-applications such as balance checking, making transfers, and brokerage transactions, then you will want to know if 90% of your callers really only want to check their account balance at the end of the month. Note that while this question is similar to the preceding one, here we’re dealing with numbers of actual calls as opposed to just occurrences of Module objects in sequences.
    select m.mod_name, count(dlg_id) as callCount from voldmodule m inner join ((select distinct mod_seq_sid, mod_sid from voldrelmsq) as r inner join volddlgsts as s on r.mod_seq_sid = s.mod_seq_sid and vsc_sid=SID) on r.mod_sid = m.mod_sid and m.vsc_sid=SID group by m.mod_name order by callCount desc
  • What is the average time spent in each Module visited?
    This simple query does, of course, only give you a very rough global estimate – but it can give a hint on whether the “size” of your Modules is reasonable. If the average time spent in a Module is in the order of minutes, you may want to add more structure to your application by adding Module objects in strategic places to obtain a better resolution.
    select avg(dlg_call_dur_ms/(1000.0*no_modules)) from volddlgsts where no_modules>0 and vsc_sid=SID
  • Which Module scopes do callers typically hang up in?
    Knowing where in the application callers hang up can validate (or invalidate) assumptions about caller behavior.
    select count(d.dlg_id) as cnt, m.mod_name as modname from volddlgsts d, voldmodule m where m.mod_sid=d.last_module_sid and d.dlg_exit_type_id=16 and d.last_module_sid>-1 and d.vsc_sid=SID group by m.mod_name order by cnt desc
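
If you don’t have an Infostore at hand, you can still get a feel for two of these queries with a throwaway in-memory database. The following Python sketch uses SQLite rather than SQL Server, and the three tables are heavily reduced toy versions: only the columns needed here are included, and the sample rows and the service ID 1 are made up.

```python
import sqlite3

SID = 1  # hypothetical service ID

conn = sqlite3.connect(":memory:")
# Toy subset of the Infostore tables; the real tables have many more columns.
conn.executescript("""
    create table voldmodset (mod_set_sid integer, vsc_sid integer, mod_set_name text);
    create table voldmodseq (mod_seq_sid integer, vsc_sid integer, mod_set_sid integer);
    create table volddlgsts (dlg_id integer, vsc_sid integer, mod_set_sid integer, mod_seq_sid integer);
""")
conn.executemany("insert into voldmodset values (?, ?, ?)", [
    (1, SID, "Car Insurance,Prime Insurance Portal"),
    (2, SID, "Car Insurance,Life Insurance,Prime Insurance Portal"),
])
conn.executemany("insert into voldmodseq values (?, ?, ?)", [
    (1, SID, 1), (2, SID, 2), (3, SID, 2),
])
conn.executemany("insert into volddlgsts values (?, ?, ?, ?)", [
    (101, SID, 1, 1), (102, SID, 1, 1), (103, SID, 2, 2), (104, SID, 2, 3),
])

# Ratio of sequences to sets; 1.0 * forces a non-integer division.
ratio = conn.execute(
    "select 1.0 * (select count(*) from voldmodseq where vsc_sid = ?)"
    " / (select count(*) from voldmodset where vsc_sid = ?)", (SID, SID)
).fetchone()[0]

# Percentage of calls whose Module set includes the Life Insurance branch.
life_pct = conn.execute(
    "select 100.0 * count(*) / (select count(*) from volddlgsts where vsc_sid = ?)"
    " from volddlgsts where mod_set_sid in"
    " (select mod_set_sid from voldmodset where mod_set_name like '%Life Insurance%')"
    " and vsc_sid = ?", (SID, SID)
).fetchone()[0]

print(ratio, life_pct)
```

With three sequences over two sets and two of four calls touching Life Insurance, the toy data gives a ratio of 1.5 and a share of 50 percent.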

Paths
VoiceObjects Analyzer contains a Dominant Path Analysis report that shows in significant detail how callers navigate through your application, and which choices they predominantly make whenever there is a fork in the road. 

 

dominantpath1

While this report is too complex to replicate fully in manual SQL, we can answer a number of related questions here:

  • Which paths have callers taken to get from one Module to another?
    Different paths can lead to the same destination, and to optimize the flow of an application it is very relevant to look at the various paths callers take to get from one place to another. In our example we want to find all the different paths that lead from the Module object with Reference ID “LifeInsurance” to the one with Reference ID “CarInsurance”.
    select distinct mod_seq_refid as paths from voldsubseq where mod_start_sid=(select mod_sid from voldmodule where mod_refid='LifeInsurance' and vsc_sid=SID) and mod_end_sid=(select mod_sid from voldmodule where mod_refid='CarInsurance' and vsc_sid=SID) and vsc_sid=SID
  •  Which Modules are often visited together?
    As a mirror to the first question, it is also of interest to see which Modules are often visited alongside a given Module. In terms of online shopping, this is a bit like saying “customers who bought this item also liked these other products”.
    select m1.mod_name as name1, m2.mod_name as name2 from voldmodule m1, voldmodule m2 where m1.mod_sid<m2.mod_sid and m1.mod_sid>0 and m2.mod_sid>0 and m1.vsc_sid=SID and m2.vsc_sid=SID and exists (select * from (select * from voldrelmsq where mod_sid=m1.mod_sid and vsc_sid=SID) as rel1 inner join (select * from voldrelmsq where mod_sid=m2.mod_sid and vsc_sid=SID) as rel2 on rel1.mod_seq_sid = rel2.mod_seq_sid)

  • Which Modules occur as immediate predecessors of a given Module object in sequences?
    Expectations about how to use a certain sub-application are driven by the other places the caller has previously been to during the call. Therefore it is relevant to look at the predecessor Module object. In our example, we want to find out which places callers come from just before they enter the “Car Insurance” sub-application within Prime Insurance.
    select distinct mod_start_name as predecessor from voldsubseq where mod_end_sid=(select mod_sid from voldmodule where mod_name='Car Insurance' and vsc_sid=SID) and mod_subseq_count=0 and vsc_sid=SID
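
The predecessor question can also be reasoned about directly on sequence entries. The following Python sketch is purely illustrative: it works on plain sequence strings instead of the VOLDSUBSEQ table, and the function name and sample sequences are made up.

```python
# Hypothetical sketch: find the Modules that come directly before a target Module.
def immediate_predecessors(sequence_entries, target):
    """Return the sorted names of Modules that directly precede the target."""
    preds = set()
    for entry in sequence_entries:
        modules = entry.split(",")
        # Pair each Module with its successor in the sequence.
        for prev, cur in zip(modules, modules[1:]):
            if cur == target:
                preds.add(prev)
    return sorted(preds)


sequences = [
    "Prime Insurance Portal,Car Insurance,Life Insurance",
    "Prime Insurance Portal,Life Insurance,Car Insurance",
]
print(immediate_predecessors(sequences, "Car Insurance"))
```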

And with this, we’ve reached the end of our path for today.
In the next installment we’ll dig one level deeper and look at the detailed information that is written for each caller interaction in the input state table VOLDDSSEQ. In the meantime, we’d love to get your feedback. Just leave a comment below!

Adapt-to-me, as I don’t want to adapt to you

Sunday, June 28th, 2009

Imagine a computer that not only understands what you say but also says it the way you can understand. Imagine a computer that you can talk to the way you talk to humans and that responds the way humans respond. Imagine a computer that can read your thoughts and communicate with you as seamlessly as you’ve always wanted it to, since you saw 2001. OK, keep on dreaming. 2001 is in the past, yet still in the future.

But indeed a HAL of a lot has changed since 1968. Now we can build machines that can reliably understand spoken commands, whole phrases or sentences, and react accordingly: provide timetables, transfer money, book tickets, or provide assistance with any kind of problem we might have in today’s life 2.0 (man how I hate this 2.0 thing by now!). And those machines are increasingly built in a way that they use human patterns of communication, allowing for more or less free speech, interactive turn-taking, and relatively natural-sounding computer voices.

Over 9 months ago – man, time flies like an arrow (hey, we can even build machines that understand the ambiguity of this sentence; apparently already way back around 1968) – I wrote my first article on Natural Dialog Management (also check out the 11/05/2008 jam session on this topic). I promised I’d continue on this, so here I am. Today I want to write about how you can make a voice application adapt to the caller with regard to their “speaking style”, the vocabulary they use, “how they speak”. Why should you do this? Think of a doctor trying to explain what’s wrong with you. If he or she doesn’t adapt his or her vocabulary to yours, you might just as well stay home and google the symptoms yourself.

Here are some examples where Adapt-to-me (as we like to call it at Voxeo) makes sense in speech applications:

Synonyms

  • If you are a provider of, say, landline telephony as well as high-speed internet, you might have callers calling into your helpline saying “I have problems with my Internet connection” at your first How-Can-I-Help-You input state. Your system might confirm this by saying “I understand you have problems with your DSL, correct?”. Your technology to provide internet might be DSL – but does your customer necessarily know that? How could she respond? Maybe with saying “No, Internet!”?
    Ouch…

Number patterns

  • Ever had the experience of giving out your phone number over the phone and hearing it back from your interlocutor in a way you didn’t even recognize your own number anymore? “My number? That’s six two nine three nine oh four.” – “OK, I’ve jotted that down, that’s sixty-two, ninety-three, nine fourteen?” – “Hang-on… let me think… err, yeah I think that’s it.”
    Ouch…

Date patterns

  • How do you say the expiration date of your credit card? If it was “12/12”, would you say “twelve twelve”, or “December twelve”, or “December two thousand twelve”, or …
    No ouch this time. This is just to demonstrate that there are numerous ways to speak dates, and using the same pattern as the caller when repeating their input again can help improve intelligibility of the system, thus cause less frustration, thus fuel acceptance of the overall application, thus increase revenue…?

So I say you can have the computer say “internet connection” instead of “DSL”, and “six two nine three nine oh four” (not even “six two nine three nine zero four”), and even “twelve twelve” if that’s what the caller is inclined to say (maybe hastening to add a “that’s December, two-thousand and twelve”, just to confirm you have fully understood your caller). You will ask: “How?” Let me explain.

VoiceObjects allows you to store the pronunciation of an utterance in a variable, along with its actual value. This is done through the grammar that enables the speech recognizer to understand the caller in the first place. This value is called the pronunciation value. There is no fixed format for what this value should look like; it is completely up to you. Handing this value back to VoiceObjects Server from within a grammar is simple: you append it to the return value for the slot that is filled by the corresponding utterance, separated by a double-pipe (“||”).

Example:

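[Figure: grammar example in which the slot returns “DSL||internet_connection”, i.e. the actual value “DSL” followed by the pronunciation value “internet_connection”]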


When the server detects this “||” symbol (which is configurable through our media platform driver concept, by the way), it will parse the actual value out of it (“DSL”, as this might be the internal value required for further processing) and assign it to the variable, parse the pronunciation value and assign it as well. By the way, if you’re interested in this value during processing, you can retrieve it via the PRONUNCIATION(RefID) function provided by the Expression object.
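To make the parsing step concrete, here is a minimal Python sketch of how a slot return value could be split at the separator. It is only an illustration of the idea, not the server's actual implementation; the function name split_slot_value and the fallback behavior for values without a separator are assumptions made for this sketch.

```python
def split_slot_value(slot_value, separator="||"):
    """Split a slot return value into (actual value, pronunciation value).

    Assumption of this sketch: if no separator is present, the actual value
    doubles as the pronunciation value.
    """
    value, found, pronunciation = slot_value.partition(separator)
    return value, (pronunciation if found else value)


# The separator is a parameter because the article says it is configurable.
print(split_slot_value("DSL||internet_connection"))
print(split_slot_value("6293904||62 93 9 oh 4"))
```

The first part is what the application logic works with; the second part is what gets handed to formatting when the value is spoken back.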

What you do with this pronunciation value is straightforward, too: you hand it over to a formatting algorithm (via our Formatting Bus), which takes this pronunciation value (along with the “real” value, which is actually not needed for speaking the variable value back) and uses it to come up with the pronunciation when repeating the value in an output. Note how the grammar, in the above example, returns “internet_connection” as the pronunciation value; this assumes that there is a prerecorded prompt saying “internet connection” as the problem category. Your formatting algorithm would thus probably need to return “internet_connection.wav” as the audio file to use for playback. In fact, for this example you don’t even need your own formatting algorithm. The predefined formatting types utilize the pronunciation value instead of the actual variable value anyway. So choosing, e.g., TTA – Files or TTA – Complete as formatting types for your Variable object will make the platform use “internet_connection.wav” right away. Nice and simple.

Let’s have a look at the number pattern example now.

First, your grammar must be built in such a way that it can recognize single digits as well as number blocks. Usually, rules that match “one” up to “ninety-nine” suffice. The rest can be nested using smart grammar rule structures. In the tags that compute the value of what was said (as opposed to the words used), you need to add logic that also builds up the pronunciation value as the caller speaks (or rather: as the ASR engine computes the result). As an example, if the caller in fact says “sixty-two ninety-three nine oh four”, the slot return value computed by your grammar rules might be “6293904||62 93 9 oh 4”, which gets parsed as “6293904” for the actual variable value and “62 93 9 oh 4” for the pronunciation value. Your formatting algorithm might make a sequence of “62.wav 93.wav 9.wav oh.wav 4.wav” out of this. In fact, you could just as well use a predefined TTA algorithm for this again, e.g. TTA – Words, and it will do the job.
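The number example can be sketched in a few lines of Python. In a real application this logic lives in the grammar tags and in the formatting algorithm; the helper names build_slot_value and to_audio_files, and the idea of representing the recognition result as (digits, spoken form) pairs, are assumptions made purely for illustration.

```python
def build_slot_value(recognized_tokens, separator="||"):
    """Combine the matched number blocks into 'value||pronunciation'.

    recognized_tokens is a list of (digits, spoken_form) pairs, one pair per
    grammar rule that matched (hypothetical representation for this sketch).
    """
    value = "".join(digits for digits, _ in recognized_tokens)
    pronunciation = " ".join(spoken for _, spoken in recognized_tokens)
    return f"{value}{separator}{pronunciation}"


def to_audio_files(pronunciation, extension=".wav"):
    """Map each token of the pronunciation value to a prompt file name."""
    return [token + extension for token in pronunciation.split()]


# "sixty-two ninety-three nine oh four"
tokens = [("62", "62"), ("93", "93"), ("9", "9"), ("0", "oh"), ("4", "4")]
slot_value = build_slot_value(tokens)
prompt_files = to_audio_files(slot_value.split("||")[1])
print(slot_value)
print(prompt_files)
```

Note how “oh” survives in the pronunciation value while the actual value carries a plain “0”; that is exactly what lets the system echo the number the way the caller said it.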

Last but not least, our famous sample application Prime Telecom, a telco self-service portal coming in three channels (voice, text, mobile Web), provides a sample implementation of Adapt-to-me with the credit card expiration date example I described above. Go check it out today! You can get all the software required to run this sample application for free at http://developers.voiceobjects.com. Go and impress your boss with what VoiceObjects can do for making your phone applications a much more pleasant experience, and your customers much happier. (Or maybe you ARE the boss? But hey – this mission is too important for me to allow you to jeopardize it…)

Oh, and if your boss tells you to implement this within your existing VoiceObjects app, check out the Input object documentation of our Object Reference (search for “pronunciation value”).

Inside Infostore – Part I: Structure and Call Records

Wednesday, April 8th, 2009

Infostore, the VoiceObjects data repository for real-time caller behavior analysis, offers a wealth of information so rich that it can be outright confusing for novice users. So in this series of blog postings, we want to shed light on Infostore’s inner workings and provide technically minded readers with the understanding and some sample SQL to explore the data on their own.

In addition, of course, there is VoiceObjects Analyzer with its comprehensive set of pre-built reports for all of the leading Business Intelligence (BI) frameworks. To find out more about it, as well as the Voxeo VoiceObjects tools in general, go to http://developers.voiceobjects.com/voiceobjects-documentation/.
For those eager to learn more we also offer hands-on training sessions on Infostore. Visit http://www.voiceobjects.com/en/support/training/ for details.

In this first part of the “Inside Infostore” series, we’ll look at the general structure of the Infostore repository and focus on the single dialog statistics record that is written for each call. In the subsequent parts we will then dive deeper into more detailed information about input states, personalization, business tasks, etc.

On a high level, Infostore is organized as a snowflake schema and optimized for immediate analysis of session data, typically by using BI tools, without the need for intermediate ETL processes. In particular this means that there are a number of key fact tables referring to lookup tables for the various dimensions. The following image gives a high-level overview of the relationships:

[Figure: Infostore overview]

The Infostore data model has been designed for extensibility and integration with data derived e.g. from CRM systems, IVR logging, etc. In the same way, custom data logged by an application can be merged with the standard information contained in the Infostore fact tables.

[Figure: Infostore extensions]
The fact table we will focus on for now is VOLDDLGSTS, which contains the dialog statistics on a level that corresponds to what is often referred to as a Call Detail Record (CDR). In more than a hundred columns, the table contains aggregated information about the respective dialog session and can answer many important questions about application quality and caller behavior even without the need to join other, more detailed fact tables.
The entries in VOLDDLGSTS are the highest level of session information in Infostore, and in most installations it is desirable to have them for each and every session (at least for a certain period of time, such as 30 days). However, through simple configuration on the level of each deployed service it is possible to use statistical sampling and only collect data e.g. for 5% of all calls.

The following paragraphs describe the different types of data present within the VOLDDLGSTS table and provide sample SQL statements to answer typical questions. The SQL has been tested using Microsoft SQL Server; adjustments may be required for other databases. SQL buffs should also note that the statements have been optimized for readability as opposed to performance.
Entries in VOLDDLGSTS belong to specific services identified by a unique ID, the VSC_SID. In all of the samples we assume this SID to be known and fixed. It can be retrieved like this:

select vsc_sid from voldvscobj where vsc_refid='<VSN of service>' and is_current=1

Finally, the SQL statements used here operate on the “raw” Infostore tables. For Analyzer, there is an additional view layer that adjusts localization and performs a few mappings that usually aren’t relevant here. In some statements you see “locale_id=1”, which indicates the English localizations. Should you prefer German, use “locale_id=2” instead.

Basic Session Information
On the most basic level, VOLDDLGSTS contains information about the vitals of each call session, including:

  • When the session started (MONTH_ID, DAY_ID, MINUTE_ID, SECOND_ID)
  • Which context parameters were available for the session (DLG_AAI, DLG_ANI, DLG_CRMID, DLG_DNIS, DLG_GCID, DLG_IID, DLG_RDNIS, DLG_SPSID)
  • Where it was processed (SRV_HOST_IP, SRV_INST_PORT, SRV_INST_NAME)
  • Which media platform driver was used (DRIVER_ID)
  • How long it lasted (DLG_CALL_DUR_MS, DLG_PROC_DUR_MS)

Even on the basis of just this core information, a number of relevant questions can quickly be answered:

  • How many calls were there yesterday / last week?
    Calls for a given day can easily be extracted, with the date given in the format YearMonthDay, by use of:
    select count(*) from volddlgsts where vsc_sid=SID and day_id = '20090403'
    Similarly, making use of the date dimension table VOLDDATDAY we can retrieve all calls for a given calendar week:
    select count(*) from volddlgsts where vsc_sid=SID and day_id in (select day_id from volddatday where cw_id = '200914' and locale_id=1)

  • Which percentage of calls comes from within the San Francisco (415) area code?
    For certain applications it is interesting to see where callers are geographically located. This can often be approximated by area codes:
    select 100.0*count(*)/(select count(*) from volddlgsts where vsc_sid=SID) from volddlgsts where vsc_sid=SID and dlg_ani like '415%'

  • Which calls lasted over a minute?
    Depending on the application, long session durations may indicate that callers had problems getting the information they called for. Thus it may be helpful to look at such sessions in more detail.
    select dlg_id,dlg_ani,day_id,minute_id from volddlgsts where vsc_sid=SID and dlg_call_dur_ms > 60000
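If you have no Infostore at hand, the statements above can be tried against a toy table. The following Python sketch uses an in-memory SQLite database instead of Microsoft SQL Server; the table contains only the handful of columns used here, and the column types, sample rows, and the SID value 7 are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for VOLDDLGSTS; column types are assumptions of this sketch.
conn.execute(
    "create table volddlgsts (dlg_id integer, vsc_sid integer, "
    "day_id text, dlg_ani text, dlg_call_dur_ms integer)"
)
conn.executemany(
    "insert into volddlgsts values (?, ?, ?, ?, ?)",
    [
        (1, 7, "20090403", "4155550101", 45000),
        (2, 7, "20090403", "2125550102", 75000),
        (3, 7, "20090404", "4155550103", 120000),
        (4, 8, "20090403", "4155550104", 30000),  # belongs to another service
    ],
)

SID = 7  # hypothetical service ID

# How many calls were there on a given day?
calls_on_day = conn.execute(
    "select count(*) from volddlgsts where vsc_sid=? and day_id=?",
    (SID, "20090403"),
).fetchone()[0]

# Which percentage of calls comes from the 415 area code?
pct_415 = conn.execute(
    "select 100.0*count(*)/(select count(*) from volddlgsts where vsc_sid=?) "
    "from volddlgsts where vsc_sid=? and dlg_ani like '415%'",
    (SID, SID),
).fetchone()[0]

# Which calls lasted over a minute?
long_calls = [
    row[0]
    for row in conn.execute(
        "select dlg_id from volddlgsts "
        "where vsc_sid=? and dlg_call_dur_ms > 60000 order by dlg_id",
        (SID,),
    )
]

print(calls_on_day, round(pct_415, 1), long_calls)
```

The 100.0 in the percentage query matters: it forces floating-point division, so the result is not truncated to a whole number.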

As an exercise, you may want to build SQL statements to answer the following questions:

  • Which percentage of calls came in during weekdays / weekends? (Hint: Use the information in VOLDDATDAY)
  • Show number of sessions per day of week
  • Which is the busiest day of the week (in terms of number of sessions)?
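If you want to cross-check your SQL results for these exercises, the weekday can also be derived directly from the day_id, since it is in YearMonthDay format. The Python sketch below does just that on a made-up list of day IDs; it deliberately does not use VOLDDATDAY, so it makes no assumptions about that table's columns. The function name sessions_per_weekday is hypothetical.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def sessions_per_weekday(day_ids):
    """Count sessions per day of week from day IDs in YearMonthDay format."""
    counts = Counter()
    for day_id in day_ids:
        weekday_index = datetime.strptime(day_id, "%Y%m%d").weekday()
        counts[WEEKDAYS[weekday_index]] += 1
    return counts


# One entry per session (sample data invented for this sketch).
day_ids = ["20090403", "20090403", "20090404", "20090406"]
counts = sessions_per_weekday(day_ids)
busiest_day = counts.most_common(1)[0][0]
weekend_share = 100.0 * (counts["Saturday"] + counts["Sunday"]) / len(day_ids)
print(dict(counts), busiest_day, weekend_share)
```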

Interaction Details
Moving up from the session basics to information on how the caller interacted with the application, we get the following:

  • How many dialog steps the session encompassed, and of which type (NO_DS_STEP, NO_DS_STEPS_VOICE, NO_DS_STEPS_DTMF, NO_DS_STEPS_TEXT)
  • Which No Input / No Match events occurred during the session (NO_NI, NO_NM, NO_NI_1..4, NO_NM_1..4, NO_DS_NOINPUT, NO_DS_NOMATCH)
  • How well recognition worked (AVG_CONF_VOICE, NO_DS_IMMEDREC, NO_DS_NONIMMEDREC, NO_DS_SUCCESS, NO_DS_NONSUCCESS)
  • How often standard navigation commands were used (NO_BACK, NO_FORWARD, NO_RPTS, NO_SKIP)
  • How often custom navigation commands were used (NO_HYPERLINKS)
  • How the session ended (DLG_EXIT_TYPE_ID, LAST_DS_STEP, LAST_DS_NAME, LAST_DS_TYPE)

Frequently asked questions in this area are:

  • How do calls end?
    There are multiple ways in which calls can end (e.g. caller hanging up, application terminating normally or in exception, etc.) and it is good practice to keep an eye on the distribution. Here we use the localizations for the various exit types contained in VOLDEXTTYP.
    select count(d.dlg_id) as no_sessions, x.dlg_exit_type_dsc from volddlgsts d right outer join voldexttyp x on (d.dlg_exit_type_id = x.dlg_exit_type_id and d.vsc_sid=SID)
    where x.locale_id=1 group by x.dlg_exit_type_dsc

  • Which objects do callers typically hang up in?
    For those calls ending with a caller hang-up it is relevant to look at where in the application this happens, since it may point to spots that cause callers grief.
    select distinct last_ds_name, count(last_ds_name) as no_sessions from volddlgsts where vsc_sid=SID and dlg_exit_type_id=16 group by last_ds_name order by count(last_ds_name) desc

  • Which percentage of calls uses any sort of navigation?
    Most applications offer some way of escaping the normal top-to-bottom dialog flow, either by jumping to specific points (e.g. “main menu”) or by relative navigation (e.g. “back” or “repeat”). If a very large percentage of callers uses them, adjustments in the standard flow might be useful.
    select 100.0*count(*)/(select count(*) from volddlgsts where vsc_sid=SID) from volddlgsts where vsc_sid=SID and no_back+no_rpts+no_forward+no_skip+no_hyperlinks>0
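The hang-up and navigation queries can be tried out the same way. Again this is a sketch against an in-memory SQLite table rather than a real Infostore; the sample rows, the object names, and the SID value are invented, and the exit type ID 16 for caller hang-ups is simply taken over from the statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for VOLDDLGSTS with only the columns needed here.
conn.execute(
    "create table volddlgsts (dlg_id integer, vsc_sid integer, "
    "dlg_exit_type_id integer, last_ds_name text, no_back integer, "
    "no_rpts integer, no_forward integer, no_skip integer, "
    "no_hyperlinks integer)"
)
conn.executemany(
    "insert into volddlgsts values (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 7, 16, "MainMenu", 0, 0, 0, 0, 0),
        (2, 7, 16, "MainMenu", 1, 0, 0, 0, 0),
        (3, 7, 16, "EnterPIN", 0, 2, 0, 0, 0),
        (4, 7, 1, "Goodbye", 0, 0, 0, 0, 1),
    ],
)

SID = 7  # hypothetical service ID

# Which objects do callers typically hang up in?
hangup_spots = conn.execute(
    "select last_ds_name, count(*) as no_sessions from volddlgsts "
    "where vsc_sid=? and dlg_exit_type_id=16 "
    "group by last_ds_name order by no_sessions desc",
    (SID,),
).fetchall()

# Which percentage of calls uses any sort of navigation?
pct_navigation = conn.execute(
    "select 100.0*count(*)/(select count(*) from volddlgsts where vsc_sid=?) "
    "from volddlgsts where vsc_sid=? "
    "and no_back+no_rpts+no_forward+no_skip+no_hyperlinks>0",
    (SID, SID),
).fetchone()[0]

print(hangup_spots, pct_navigation)
```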

Other questions you may want to explore for yourself could be:

  • Which percentage of calls has both No Input and No Match events?
  • Is the average confidence in short calls higher than in long calls?
  • How does average confidence vary by area code?

Processing Details
In addition to details on the interaction with the caller, VOLDDLGSTS also contains a lot of useful information about the interaction with backends:

  • How many backend interactions occurred, and how long they took (NO_CONNECTOR_EXECS, CONN_EXEC_TIME_MAX, CONN_EXEC_TIME_MIN, CONN_EXEC_TIME_TOT)
  • Which errors occurred during the session (NO_ERRS, NO_ERRS_CONNECTOR, NO_ERRS_INTERNAL, NO_ERRS_MP, NO_ERRS_SCRIPT)
  • How many notifications were sent during the session (NO_NOTIFICATIONS)
  • Which network-related activity took place (NO_REQUESTS, VOL_BYTES)

Interesting questions regarding the backend are e.g.

  • During which times has backend access been slow?
    This may point to problems on the backend itself, or to network congestion.
    select day_id,minute_id from volddlgsts where conn_exec_time_max>3000 and vsc_sid=SID

  • Were any calls aborted due to backend errors?
    Again, this may point to either problems on the backend itself or in the integration code that connects the application to the backend.
    select dlg_id from volddlgsts where dlg_exit_type_id=2 and no_errs_connector>0 and vsc_sid=SID

  • What’s the total data volume (in MB) transferred between IVR and VoiceObjects Server by week?
    This information is useful to ensure that network capacity between the IVR and VoiceObjects Server is sufficient to maintain optimal performance.
    select sum(d.vol_bytes)/1048576.0 as volume, t.cw_id as week from volddlgsts d, volddatday t where d.day_id=t.day_id and t.locale_id=1 and d.vsc_sid=SID group by t.cw_id order by t.cw_id
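Here is a small sketch of the weekly volume calculation, once more on an in-memory SQLite database with invented sample data. One megabyte is 1,048,576 bytes (1024 × 1024), and dividing by 1048576.0 keeps the result fractional. The sketch also shows why the locale_id filter is needed: assuming the date dimension holds one row per day and locale, joining without the filter would count every session once per locale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-ins for the fact table and the date dimension table.
conn.execute(
    "create table volddlgsts (vsc_sid integer, day_id text, vol_bytes integer)"
)
conn.execute(
    "create table volddatday (day_id text, cw_id text, locale_id integer)"
)
conn.executemany(
    "insert into volddlgsts values (?, ?, ?)",
    [(7, "20090403", 1048576), (7, "20090403", 524288), (7, "20090406", 2097152)],
)
conn.executemany(
    "insert into volddatday values (?, ?, ?)",
    [
        ("20090403", "200914", 1), ("20090403", "200914", 2),
        ("20090406", "200915", 1), ("20090406", "200915", 2),
    ],
)

SID = 7  # hypothetical service ID

volume_by_week = conn.execute(
    "select sum(d.vol_bytes)/1048576.0 as volume, t.cw_id as week "
    "from volddlgsts d, volddatday t "
    "where d.day_id=t.day_id and t.locale_id=1 and d.vsc_sid=? "
    "group by t.cw_id order by t.cw_id",
    (SID,),
).fetchall()

print(volume_by_week)
```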

Other interesting backend-related questions could be:

  • What is the average backend processing time?
  • Are errors tied to backend slowdowns?

And finally, of course, you can combine information from the different categories to answer broader questions such as:

  • How does average confidence vary by area code?
  • How much longer are calls with many No Input / No Match events than calls with fewer of them?
  • Do weekday calls show a different caller behavior than weekend calls in terms of events and navigation?

That should do it for today. Keep in mind that we’ve used only a portion of the columns in VOLDDLGSTS so far – and that’s just one of several fact tables in Infostore. So there’s lots more to come.
Next time, we’ll look at how callers navigate through an application by means of module sequences.

Personalization using Layers – Part I

Tuesday, January 20th, 2009

You are going to develop a personalized and flexible phone application? Okay, bad news first: You may get into hot water if increasing complexity leaves you unable to keep the application definition manageable. The good news is that VoiceObjects Desktop and VoiceObjects Server have been designed to cope with exactly this complexity, which makes your life much easier.

 

Sure, you are right: In most of today’s phone applications personalization is key, because it allows for adapting the user interface

  • to individual users, e.g. by preferred language, persona or input mode,
  • to user segments, e.g. post-paid customer vs. pre-paid customer, or novice user vs. power user, and
  • to other relevant conditions, e.g. workday vs. weekend, happy hour vs. unhappy ;-) hour, or back-end available vs. back-end unavailable.

Real-life applications typically apply a combination of different conditions in order to offer the best suited user interface under certain conditions to a certain user. Additionally those conditions should be dynamically changeable at call time, e.g. to switch to another dialog language instantly at any dialog step. In the same way a web application server enables personalized web sites, the VoiceObjects phone application server supports personalized phone applications.

 

And it is not just about delivering the highest quality of service to your users! VoiceObjects Server based personalization also helps to relieve the media platform or browser from processing all these conditions during each call, ensuring best media platform performance during dialog execution. In other words: In a VoiceObjects setup the browser is used for dialog presentation, but VoiceObjects Server is responsible for any business logic and cares for “handpicked” (dynamic) dialogs for each user. Additionally all applied dialog conditions are automatically logged into the system database (Infostore), and the out-of-the-box reporting of VoiceObjects Analyzer can be used to analyze which conditions have been applied when and how often. Last but not least, you can apply these conditions as selection criteria (so-called dimensions) to other statistics, in order to analyze and compare user behavior, task completion or recognition scores in more detail.

 

VoiceObjects allows for such powerful applications by offering a single concept, called layers. Layers can be thought of as global definitions or filters defined in your application. A developer defines a layer first, and later on applies it to each dialog step where this conditional behavior is required. However, in order to use the layer concept to its full capacity, you have to know how to design, configure, switch, apply and manage layer conditions. This article (to be continued) aims to shed some light on what you should keep in mind when using layers.

 

Let’s start with an important basic distinction: VoiceObjects Desktop is equipped with standard (meaning application-independent) layers, which are already built into other dialog objects. These so-called system layers can evaluate the current dialog language (English-US, Spanish, etc.), input mode (voice / DTMF), phone channel (like voice or mobile Web) and occurrence (visit counter per dialog step). All other (typically application-dependent) layers are so-called custom layers. Each custom layer in your project has to be defined by a Layer object.

 

The second important concept focuses on the switching behavior of layers: Each layer has a set of so-called states, which are switched on or off following a well-defined process and logic. On the one hand, automatic layers switch… (guess!)… automatically, i.e. some internal logic controls which layer states are currently on (active) or off (inactive). On the other hand, manual layers have to be switched “manually”, i.e. by calling the Layer function in an Expression object. The same Layer object type is used for defining both manual and automatic layers. By the way, all system layers are manual layers.

 

You may ask: What determines if a custom layer defined by a Layer object should be automatic or manual? Consider the following: Are there any settings (variables or other indicators, like contract number or time of day) which can/should be constantly monitored in order to control the layer states? This would argue for an automatic layer. Or is this layer more dependent on certain events or exceptions and changing more on individual incidents rather than switching frequently throughout the whole dialog? This would call for a manual layer. You should also take into account that a manual layer always has exactly one state activated, whereas automatic layers can have any number of active states (including none or all states). In general, many custom layers can be configured as an automatic or manual layer, and the choice will be a matter of best practice and personal preferences.

 

Looking deeper into the configuration of automatic layers, you will notice two alternative mechanisms offered by the Layer object: first, a state indicator with a set of indicator values assigned to each state, working like a Case-Else construct; and second, state conditions, working like pre-conditions per state. Note that the two approaches cannot be mixed in a single layer definition.
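To picture the difference between the two mechanisms, here is a purely conceptual Python model. It is not how VoiceObjects Server evaluates layers internally; the function names, the explicit else state, and the sample layers are all assumptions made for illustration.

```python
def states_from_indicator(indicator_value, state_values, else_state):
    """Case-Else style: pick the state whose value set contains the indicator.

    Falls back to else_state if no state matches (assumption of this sketch).
    """
    for state, values in state_values.items():
        if indicator_value in values:
            return {state}
    return {else_state}


def states_from_conditions(context, state_conditions):
    """Pre-condition style: every state whose condition holds is active."""
    return {
        state
        for state, condition in state_conditions.items()
        if condition(context)
    }


# Hypothetical customer-type layer driven by a single indicator variable.
CUSTOMER_TYPE_VALUES = {"premium": {"platinum", "gold"}, "standard": {"silver"}}

# Hypothetical time layer with one independent condition per state.
TIME_CONDITIONS = {
    "weekend": lambda ctx: ctx["weekday"] >= 5,       # Saturday=5, Sunday=6
    "happy_hour": lambda ctx: 17 <= ctx["hour"] < 19,
}

print(states_from_indicator("gold", CUSTOMER_TYPE_VALUES, "unknown"))
print(states_from_conditions({"weekday": 5, "hour": 18}, TIME_CONDITIONS))
```

In this model the indicator variant always yields exactly one active state, whereas the condition variant can yield none, one, or several, which matches the remark that automatic layers can have any number of active states.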

 

Typical traps when working with layers are:

  1. During development, keep in mind that your state IDs (which are used internally by VoiceObjects Server during layer processing) are subject to the same restrictions as other object reference IDs: in particular, they must not contain special characters (incl. spaces!) and must be unique within your project.
  2. Do not try to switch an automatic layer using the Expression function Layer(). This would cause an internal server error at call time.
  3. Do not forget to define the initial (default) state that is required for each manual layer.
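The first and third trap lend themselves to a quick sanity check before deployment. The Python sketch below is a hypothetical helper, not part of VoiceObjects; in particular, the exact character rules for reference IDs are not spelled out in this article, so the pattern used here (letters, digits, underscores) is only an assumption.

```python
import re

# Assumption: IDs may contain only letters, digits and underscores.
STATE_ID_PATTERN = re.compile(r"^[A-Za-z0-9_]+$")


def check_layer(state_ids, automatic, initial_state=None):
    """Return a list of problems found in a layer definition (empty if none)."""
    problems = []
    seen = set()
    for state_id in state_ids:
        if not STATE_ID_PATTERN.match(state_id):
            problems.append(f"invalid characters in state ID: {state_id!r}")
        if state_id in seen:
            problems.append(f"duplicate state ID: {state_id!r}")
        seen.add(state_id)
    # A manual layer needs an initial (default) state.
    if not automatic and initial_state not in seen:
        problems.append("manual layer has no valid initial state")
    return problems


print(check_layer(["novice", "power user", "novice"], automatic=False))
print(check_layer(["novice", "power"], automatic=False, initial_state="novice"))
```

To cover uniqueness across the whole project, you would pass in the state IDs of all layers rather than those of a single one.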

 

Feel free to add comments based on your experience working with layers. A follow-up article will continue on some best practices about switching layers and applying layers to your dialog definition, and will provide some more references and examples. Stay tuned!

Handling Test Case Data in VoiceObjects 7.4

Wednesday, January 7th, 2009
When developing voice applications, you often find yourself in a situation where you don’t (yet) have access to real back-end systems – yet you need to test your application for a variety of different scenarios, each with a different set of parameters, caller data, request and response data from back-end systems, etc.

In short, you need to handle sets of test data, each set representing a certain test case. Of course, there are several options to deal with this, but as of VoiceObjects 7.4, you now have a very elegant solution at your fingertips: the new expression function APPLYCONFIGURATION.

What does it do? Let’s have a look at the inline documentation in the Expression editor:

APPLYCONFIGURATION (configurationXML) – Applies the assignments defined in configurationXML. The XML format used is the same as for application defaults.

Application Defaults

Have you used the application defaults functionality before? If not – it’s simple: It’s about initializing selected variables, layers and collections on the service level. The Service object references an XML configuration file in the Configuration URL field. This configuration XML file will be loaded whenever the service is (re-)deployed. (For more information, check out the section on Application Defaults in the VoiceObjects Deployment Guide.)

The primary use case for Application Defaults is this: When working in multiple environments such as, say, a development, a test, and a production environment, each of those will require some unique configuration settings. For example, database names and credentials might differ, resource locator paths, and any other external settings. By “outsourcing” the initialization of the environment-dependent variables to the “application defaults” configuration XML document (which is bound to the service object, not to the project), the project definition itself becomes agnostic of the environment and can hence easily be taken from “dev” to “test”, and from “test” to “prod”, without applying any changes to the project.


For an example of a valid configuration XML file, scroll down to the bottom of this posting. In a nutshell, a configuration XML file references any number of variables, layers, and collections in a given project (by reference ID) and defines their initial values.

In-Session Configuration

Now, VoiceObjects 7.4 takes this concept one step further and makes the same mechanism available on a per-session basis: You can do bulk assignments of variables, layers and collections in a single step within your call flow definition, applying different sets of values for each and every call. Of course, this comes in very handy when you need to manage test case data.

Let’s have a look at our Prime Telecom demo application. It supports 3 different languages and 2 different customer types. For each of the resulting 3×2 = 6 combinations, we need at least one test case.

In the previous version of Prime Telecom, these test cases were handled in the traditional way: Within the Preprocessing sequence of the main Module Prime Telecom Portal, a Connector object invoked a JSP, providing the current language and the customer status as request parameters. As response parameters, the Connector’s parameter set contained each and every variable and collection that needed to be initialized – the customer’s postal address, email address, payment information, current tariff, subscribed tariff add-ons, available tariff add-ons etc. Quite a few parameters had to be maintained. And whenever a new parameter had to be added, it had to be added both in the JSP implementation and to the Connector object’s parameter set. Also, the maintenance of the test data in that JSP was cumbersome at best.

Not so any more.

The new implementation of test case data handling in Prime Telecom relies on test data being organized in configuration XML files, each file representing one test case. In Prime Telecom, these files are named configuration_de-DE_platinum.xml, configuration_de-DE_silver.xml, configuration_en-UK_platinum.xml etc.

In the Preprocessing sequence of the main Module Prime Telecom Portal,

  1. a Connector object reads the configuration XML file (via http get) for the current language and customer status and assigns its content to a variable;
  2. this variable is then used as the argument of an APPLYCONFIGURATION expression, setting all required variables and collections at once.
Prime Telecom Portal Module - Preprocessing
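The file selection in step 1 boils down to a simple naming scheme. Here is a minimal Python sketch of it, assuming the pattern configuration_<language>_<customer status>.xml that the file names above suggest; the helper names are hypothetical, and only the languages and customer types actually named in this posting are used.

```python
from itertools import product


def config_file_name(language, customer_status):
    """Build the name of the configuration XML file for one test case."""
    return f"configuration_{language}_{customer_status}.xml"


def all_config_file_names(languages, customer_statuses):
    """List one file name per language / customer status combination."""
    return [
        config_file_name(language, status)
        for language, status in product(languages, customer_statuses)
    ]


files = all_config_file_names(["de-DE", "en-UK"], ["platinum", "silver"])
print(files)
```

With all three languages in place, the same call yields the six test case files mentioned earlier.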

The beauty of this solution is that there is only one place to maintain the test data: In the configuration XML documents. When adding more parameters, only the XML documents need to be adapted; the Connector implementation (in our case, some Java code) and the Connector object’s parameter set remain unchanged. Also, the configuration XML documents are much easier to read and hence to maintain than the old JSP.

Of course, there are more use cases to APPLYCONFIGURATION than “just” handling test cases. For example, a hosted service provider could build application templates which become adapted to each customer’s requirements using this mechanism. Also note that, using VoiceObjects’ web service interface, much of the necessary handling could be automated, creating easy-to-use web front ends for end customers.

Example for a valid configuration XML document

This example shows how two objects are being initialized - the variable with the RefID CustomerBaseTariffName, and the collection with the RefID CustomerPaymentSettings. Note that the <type> nodes are optional; also note that collections need to be masked by <![CDATA[ ... ]]> sections.

<?xml version="1.0" encoding="UTF-8"?>
<configurations>
  <configuration>
    <referenceID>CustomerBaseTariffName</referenceID>
    <type>variable</type>
    <value>Individual Plan</value>
  </configuration>
  <configuration>
    <referenceID>CustomerPaymentSettings</referenceID>
    <type>collection</type>
    <value><![CDATA[
      <root>
        <row>
          <col name="type">Visa</col>
          <col name="number">4140040912440644</col>
          <col name="expdate">0210</col>
        </row>
      </root>
    ]]></value>
  </configuration>
</configurations>
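If you generate or validate such files with a script, it helps to see what assignments a document actually contains. The following Python sketch reads the example above with the standard library's ElementTree and turns it into a plain dictionary. It only illustrates the document structure and is not what APPLYCONFIGURATION does internally; the function name parse_configuration and the choice to represent a collection as a list of row dictionaries are assumptions.

```python
import xml.etree.ElementTree as ET

CONFIG_XML = """<?xml version="1.0" encoding="UTF-8"?>
<configurations>
  <configuration>
    <referenceID>CustomerBaseTariffName</referenceID>
    <type>variable</type>
    <value>Individual Plan</value>
  </configuration>
  <configuration>
    <referenceID>CustomerPaymentSettings</referenceID>
    <type>collection</type>
    <value><![CDATA[
      <root>
        <row>
          <col name="type">Visa</col>
          <col name="number">4140040912440644</col>
          <col name="expdate">0210</col>
        </row>
      </root>
    ]]></value>
  </configuration>
</configurations>"""


def parse_configuration(xml_text):
    """Map each referenceID to its value; collections become lists of rows."""
    assignments = {}
    root = ET.fromstring(xml_text.encode("utf-8"))
    for node in root.findall("configuration"):
        ref_id = node.findtext("referenceID")
        kind = node.findtext("type")  # optional node, may be None
        value = node.findtext("value", default="")
        if kind == "collection":
            # The CDATA section holds a nested XML document of rows/columns.
            rows = ET.fromstring(value.strip())
            value = [
                {col.get("name"): col.text for col in row.findall("col")}
                for row in rows.findall("row")
            ]
        assignments[ref_id] = value
    return assignments


assignments = parse_configuration(CONFIG_XML)
print(assignments)
```

Because the collection content sits inside a CDATA section, the outer parser sees it as plain text, which is why it has to be parsed a second time.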