Posts Tagged ‘Analyzer’

How to enable Infostore on your Developer Edition

Tuesday, June 30th, 2009

A couple of weeks ago we had started to take a look at the structure of Infostore,  the VoiceObjects data repository for real-time caller behavior analysis. (Incidentally – yes, the second installment of this series is in the pipeline.)

 Infostore is the basis for statistical data analysis using Business Intelligence (BI) tools such as MicroStrategy (for which a free edition is available), SAP BusinessObjects or IBM Cognos. As such a certain amount of data is required to obtain useful and statistically significant reports. However we realize that many of you would like to explore the capabilities of Infostore even in your development environments using Voxeo VoiceObjects Developer Edition, so in this article we’ll show you how to enable Infostore in this setup. And it’s just about as easy as 1-2-3:

  1. Set up a database
    Infostore, being a data repository, requires a relational database to live in. Since it is conceptually designed to hold many millions of records, it currently supports large-scale commercial database systems that can do this with high performance, including Oracle, Microsoft SQL Server, IBM DB2, and PostgreSQL.
    For evaluations all of these provide free versions that can be downloaded; one that is particularly easy to work with and that we’ll use in this article is Microsoft SQL Server Express. Installation and configuration of the database itself is very straightforward. Appendix B of the Installation Guide provides  instructions on how to set things up.
  2. Create the Infostore schema
    Once you have a database user and (empty) schema, it is time to create the tables and fill them with initial configuration data. Infostore supports partitioned and non-partitioned tables; in this article we will use the non-partitioned version since the data volume for evaluation will be limited.
    So the first thing to do is to run the script LDCreate.sql that can be found in <EclipseFolder>\plugins\com.voiceobjects.eclipseDesktop_9.0.0\driver\db\SQLServer of your Developer Edition installation. This creates all of the database tables that Infostore needs.
    After this has finished successfully, run LDConfigure.sql from the same folder. This inserts a whole bunch of static look-up data into the tables and prepares Infostore for its use.
  3. Configure Developer Edition
    With the preparations of the two preceding steps, we can now point Voxeo VoiceObjects Developer Edition to use this Infostore repository.
    Open the configuration file Eclipse_Configuration.xml that is located in <EclipseFolder>\plugins\com.voiceobjects.eclipseDesktop_9.0.0\config and locate the section System Log Repository Connection. Within this section, there is a single <connection> entry.
    Set its enabled attribute to true (the default entry is false). Then set the correct database type <rdbms>, connection URL <dburl>, user <user> and password <password>.  There are examples for the various supported database types; for SQL Server it will be something like:
                <rdbms>SQLServer</rdbms>
                <dburl>jdbc:jtds:sqlserver://localhost:1433/VODB</dburl>
                <user>voadmin</user>
                <password>voadmin</password>
  4. Start Developer Edition and make your calls
    After having changed and saved the configuration file, (re-)start your Developer Edition. If you have configured a Control Center entry for the embedded server, you should see a green light in the System DB Logging column.
    Deploy an application using the Test Application command and make a call either with a media platform such as Voxeo Prophecy or using the Debug Viewer. This will create the first fact record entries in your Infostore repository.
  5. Enjoy
    To explore the entries created in Infostore, refer to the sample SQL queries listed here. Or take a look at the Infostore Guide to learn more about advanced features.

As always, if you have any problems, questions, or suggestions do not hesitate to contact our Extreme Support.

And if you want to learn how you can fill your new Infostore repository faster than by manually making calls, join us for our next Developer Jam Session taking place on July 29 at 11am Eastern to learn about LoadTester. Registration is open now.

Inside Infostore – Part I: Structure and Call Records

Wednesday, April 8th, 2009

Infostore, the VoiceObjects data repository for real-time caller behavior analysis, offers a wealth of information so rich that it can be outright confusing for novice users. So in this series of blog postings, we want to shed light on Infostore’s inner workings and provide technically minded readers with the understanding and some sample SQL to explore the data on their own.

In addition, of course, there is VoiceObjects Analyzer with its comprehensive set of pre-built reports for all of the leading Business Intelligence (BI) frameworks. To find out more about it, as well as the Voxeo VoiceObjects tools in general, go to http://developers.voiceobjects.com/voiceobjects-documentation/.
For those eager to learn more we also offer hands-on training sessions on Infostore. Visit http://www.voiceobjects.com/en/support/training/ for details.

In this first part of the “Inside Infostore” series, we’ll look at the general structure of the Infostore repository and focus on the single dialog statistics record that is written for each call. In the subsequent parts we will then dive deeper into more detailed information about input states, personalization, business tasks, etc.

On a high level, Infostore is organized as a snowflake schema and optimized for immediate analysis of session data, typically by using BI tools, without the need for intermediate ETL processes. In particular this means that there are a number of key fact tables referring to lookup tables for the various dimensions. The following image gives a high-level overview of the relationships:

infostoreoverview

The Infostore data model has been designed for extensibility and integration with data derived e.g. from CRM systems, IVR logging, etc. In the same way, custom data logged by an application can be merged with the standard information contained in the Infostore fact tables.

infostoreextensions
The fact table we will focus on for right now is VOLDDLGSTS containing the dialog statistics, on a level that corresponds to what is often referred to as a Call Detail Record (CDR). In more than a hundred columns, the table contains aggregated information about the respective dialog session and can answer many important questions about application quality and caller behavior even without the need to join other, more detailed fact tables.
The entries in VOLDDLGSTS are the highest level of session information in Infostore, and in most installations it is desirable to have them for each and every session (at least for a certain period of time, such as 30 days). However, through simple configuration on the level of each deployed service it is possible to use statistical sampling and only collect data e.g. for 5% of all calls.

The following paragraphs describe the different types of data present within the VOLDDLGSTS table and provide sample SQL statements to answer typical questions. The SQL has been tested using Microsoft SQL Server; adjustments may be required for other databases. SQL buffs should also note that the statements have been optimized for readability as opposed to performance.
Entries in VOLDDLGSTS belong to specific services identified by a unique ID, the VSC_SID. In all of the samples we assume this SID to be known and fixed. It can be retrieved like this:

select vsc_sid from voldvscobj where vsc_refid=’<VSN of service>’ and is_current=1

Finally, the SQL statements used here operate on the “raw” Infostore tables. For Analyzer, there is an additional view layer that adjusts localization and performs a few mappings that usually aren’t relevant here. In some statements you see “locale_id=1″, which indicates the English localizations. Should you prefer German, use “locale_id=2″ instead.

Basic Session Information
On the most basic level, VOLDDLGSTS contains information about the vitals of each call session, including:

  • When the session started (MONTH_ID, DAY_ID, MINUTE_ID, SECOND_ID)
  • Which context parameters were available for the session (DLG_AAI, DLG_ANI, DLG_CRMID, DLG_DNIS, DLG_GCID, DLG_IID, DLG_RDNIS, DLG_SPSID)
  • Where it was processed (SRV_HOST_IP, SRV_INST_PORT, SRV_INST_NAME)
  • Which media platform driver was used (DRIVER_ID)
  • How long it lasted (DLG_CALL_DUR_MS, DLG_PROC_DUR_MS)

Even on the basis of just this core information, a number of relevant questions can quickly be answered:

  • How many calls were there yesterday / last week?
    Calls for a given day can easily be extracted with the data format YearMonthDay by use of:
    select count(*) from volddlgsts where vsc_sid=SID and day_id = ‘20090403′
    Similarly, making use of the date dimension table VOLDDATDAY we can retrieve all calls for a given calendar week:
    select count(*) from volddlgsts where vsc_sid=SID and day_id in (select day_id from volddatday where cw_id = ‘200914′ and locale_id=1)

  • Which percentage of calls comes from within the San Francisco (415) area code?
    For certain applications it is interesting to see where callers are geographically located. This can often be approximated by area codes:
    select 100.0*count(*)/(select count(*) from volddlgsts where vsc_sid=SID) from volddlgsts where vsc_sid=SID and dlg_ani like ‘415%’

  • Which calls lasted over a minute?
    Depending on the application, long session durations may indicate that callers had problems getting the information they called for. Thus it may be helpful to look at such sessions in more detail.
    select dlg_id,dlg_ani,day_id,minute_id from volddlgsts where vsc_sid=SID and dlg_call_dur_ms > 60000

As an excercise, you may want to build SQL statements to answer the following questions:

  • Which percentage of calls came in during weekdays / weekends? (Hint: Use the information in VOLDDATDAY)
  • Show number of sessions per day of week
  • Which is the busiest day of the week (in terms of number of sessions)?

Interaction Details
Moving up from the session basics to information on how the caller interacted with the application, we get the following:

  • How many dialog steps the session encompassed, and of which type (NO_DS_STEP, NO_DS_STEPS_VOICE, NO_DS_STEPS_DTMF, NO_DS_STEPS_TEXT)
  • Which No Input / No Match events occurred during the session (NO_NI, NO_NM, NO_NI_1..4, NO_NM_1..4, NO_DS_NOINPUT, NO_DS_NOMATCH)
  • How well recognition worked (AVG_CONF_VOICE, NO_DS_IMMEDREC, NO_DS_NONIMMEDREC, NO_DS_SUCCESS, NO_DS_NONSUCCESS)
  • How often standard navigation commands were used (NO_BACK, NO_FORWARD, NO_RPTS, NO_SKIP)
  • How often custom navigation commands were used (NO_HYPERLINKS)
  • How the session ended (DLG_EXIT_TYPE_ID, LAST_DS_STEP, LAST_DS_NAME, LAST_DS_TYPE)

Frequently used questions in this area are:

  • How do calls end?
    There are multiple ways in which calls can end (e.g. caller hanging up, application terminating normally or in exception, etc.) and it is good practice to keep an eye on the distribution. Here we use the localizations for the various exit types contained in VOLDEXTTYP.
    select count(d.dlg_id) as no_sessions, x.dlg_exit_type_dsc from volddlgsts d right outer join voldexttyp x on (d.dlg_exit_type_id = x.dlg_exit_type_id and d.vsc_sid=SID)
    where x.locale_id=1 group by x.dlg_exit_type_dsc

  • Which objects do callers typically hang up in?
    For those calls ending with a caller hang-up it is relevant to look at where in the application this happens, since it may point to spots that cause callers grief.
    select distinct last_ds_name, count(last_ds_name) as no_sessions from volddlgsts where vsc_sid=SID and dlg_exit_type_id=16 group by last_ds_name order by count(last_ds_name) desc

  • Which percentage of calls uses any sort of navigation?
    Most applications offer some way of escaping the normal top-to-bottom dialog flow, either by jumping to specific points (e.g. “main menu”) or by relative navigation (e.g. “back” or “repeat”). If a very large percentage of callers uses them, adjustments in the standard flow might be useful.
    select 100.0*count(*)/(select count(*) from volddlgsts where vsc_sid=SID) from volddlgsts where vsc_sid=SID and no_back+no_rpts+no_forward+no_skip+no_hyperlinks>0

Other questions you may want to explore for yourself could be:

  • Which percentage of calls has both No Input and No Match events?
  • Is the average confidence in short calls higher than in long calls?
  • How does average confidence vary by area code?

Processing Details
In addition to details on the interaction with the caller, VOLDDLGSTS also contains a lot of useful information about the interaction with backends:

  • How many backend interactions occurred, and how long they took (NO_CONNECTOR_EXECS, CONN_EXEC_TIME_MAX, CONN_EXEC_TIME_MIN, CONN_EXEC_TIME_TOT)
  • Which errors occurred during the session (NO_ERRS, NO_ERRS_CONNECTOR, NO_ERRS_INTERNAL, NO_ERRS_MP, NO_ERRS_SCRIPT)
  • How many notifications were sent during the session (NO_NOTIFICATIONS)
  • Which network-related activity took place (NO_REQUESTS, VOL_BYTES)

Interesting questions regarding the backend are e.g.

  • During which times has backend access been slow?
    This may point to problems on the backend itself, or to network congestion.
    select day_id,minute_id from volddlgsts where conn_exec_time_max>3000 and vsc_sid=SID

  • Were any calls aborted due to backend errors?
    Again, this may point to either problems on the backend itself or in the integration code that connects the application to the backend.
    select dlg_id from volddlgsts where dlg_exit_type_id=2 and no_errs_connector>0 and vsc_sid=SID

  • What’s the total data volume (in MB) transferred between IVR and VoiceObjects Server by week?
    This information is useful to ensure that network cpacacity between the IVR and VoiceObjects Server is sufficient to maintain optimal performance.
    select sum(d.vol_bytes)/10485476 as volume, t.cw_id as week from volddlgsts d, volddatday t where d.day_id=t.day_id and t.locale_id=1 and d.vsc_sid=SID group by t.cw_id order by t.cw_id

Other interesting backend-related questions could be:

  • What is the average backend processing time?
  • Are errors tied to backend slowdowns?

And finally, of course, you can combine information from the different categories to answer broader questions such as:

  • How does average confidence vary by area code?
  • How much longer are calls with many No Input / No Match events than calls with fewer of them?
  • Do weekday calls show a different caller behavior than weekend calls in terms of events and navigation?

That should do it for today. Keep in mind that we’ve used only a portion of the columns in VOLDDLGSTS so far – and that’s just one of several fact tables in Infostore. So there’s lots more to come.
Next time, we’ll look at how callers navigate through an application by means of module sequences.