A - Sizing and Configuration Recommendations

This document provides guidelines and recommendations for sizing and configuring VoiceObjects installations. The guidelines are based on load test results as well as on experience from existing, live VoiceObjects deployments.

The focus of this document is on VoiceObjects Server. Sizing and configuration of the server machine or machines hosting VoiceObjects Desktop is not critical for a production environment since it will typically only be used for server administration. Sizing of the RDBMS server and of the media platform components should follow guidelines of the respective software vendors.

Note: In this document, VoiceObjects Desktop refers to the Desktop process to which Desktop for Eclipse and Desktop for Web connect. Thus the machine hosting VoiceObjects Desktop is the machine running the Desktop process, not the machine on which Eclipse with the Desktop for Eclipse plug-in is installed, or on which the Internet Explorer session for Desktop for Web is started.

General Configuration Guidelines

As stated in Appendix A – System Requirements and Supported Configurations in the Installation Guide, in a production environment it is strongly recommended to install

·          VoiceObjects Server,

·          VoiceObjects Desktop, and

·          the RDBMS server(s) operating the VoiceObjects Metadata Repository and the Infostore Repository

on separate server machines. The components of the media platform (including the VoiceXML browser, speech recognition and speech synthesis modules etc.) should again be installed on other server machines.

Several instances of VoiceObjects Server will typically be deployed on at least two different physical server machines for failover reasons. In addition, the database server(s) may be operated in a failsafe, redundant configuration, following the guidelines of the respective RDBMS vendor.

Caution: To optimize performance, make sure that all system services and automatic jobs (such as cron jobs on Linux) that are not required for working with VoiceObjects are disabled on the server machines running VoiceObjects.

Cluster Deployment of VoiceObjects Server

VoiceObjects Server can be deployed in a server cluster. On each physical machine, one or several server instances may be deployed. A load balancer is responsible for distributing the requests from the VoiceXML gateway among the available server instances.

Different NICs (Network Interface Cards) are used for serving dynamic VoiceXML pages and for accessing the Metadata Repository and back-end systems, respectively.

All servers access the same Metadata Repository database and the same Infostore database, each of which may also be set up as a database server cluster.

A single instance of VoiceObjects Desktop is deployed onto a dedicated machine. In the production environment, it is used for system administration, monitoring and maintenance. It accesses the same Metadata Repository database as the servers.

The actual number of required machines and server instances will mainly depend on the number of IVR ports to be handled by the cluster. Guidelines for calculating the necessary hardware are given in the section Determining the Required Number of Servers.

The installation procedure for deploying several server instances on a single machine is described in the section Configure Multiple Server Instances per Machine.

Configuration details about load balancers are documented in the section Load Balancer Configuration.

Note that some manual changes to the standard configuration are necessary to set up a server cluster. The configuration steps that need to be applied to the individual server instances are described in Chapter 1 – Advanced Configuration of VoiceObjects.

Note: When installing a cluster of VoiceObjects Servers, make sure that all server instances use the same logical server name. In the standard installation, the logical server name defaults to VOServer.

JVM memory configuration

The reason for configuring several instances per machine is twofold. It is based on the fact that VoiceObjects Server is a Java Web application, being hosted by a Java Web application server:

·          The memory space that a 32-bit Java virtual machine (JVM) can address is limited. The actual limit is well below 2 GB. For a server machine equipped with 8 GB RAM, for example, it is hence mandatory to have several server instances running in parallel in order to make use of the full available memory. It is recommended to configure the JVM that is running VoiceObjects Server with 1 GB (or more) heap size. The values for the initial and for the maximum heap size should be identical, in order to prevent unnecessary and time-consuming memory allocation activities.
Caution: The total memory reserved for the corresponding JVM process might be up to 50% higher than the configured heap space, as Java claims this additional memory for internal data structures such as class stores or thread stacks.

·          When the Java Garbage Collector is activated, the server is practically unavailable for a short time, impacting page response times during this period. The larger the heap size, the longer the garbage collection periods. A configuration with several server instances is hence preferred to a configuration with one large instance, because this will result in many short garbage collection periods (which will ideally be unobservable to the caller) rather than a few long ones.
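The memory budget described above can be checked with a short calculation. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that every instance reaches the full 50% overhead stated in the caution above, which makes the result a worst-case estimate.

```python
def max_process_memory_gb(instances: int, heap_gb: float, overhead: float = 0.5) -> float:
    """Worst-case RAM claimed by all JVM processes on one machine.

    overhead=0.5 reflects the "up to 50% higher" bound for internal JVM
    data structures (class stores, thread stacks); this is an upper bound,
    not a measured value.
    """
    return instances * heap_gb * (1.0 + overhead)


# Example: 4 server instances with 1 GB heap each on an 8 GB machine.
estimate = max_process_memory_gb(instances=4, heap_gb=1.0)
print(f"Worst-case JVM footprint: {estimate:.1f} GB of 8 GB")
```

In this example the four instances leave some memory for the operating system even in the worst case, which fits the recommendation of 4 instances on an 8 GB machine.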


Architectural sketch



Figure 1: In this sample setup, there are two machines that can each host several server instances. Requests from the media platform and from the USSD gateway are distributed to these instances by a load balancer. The media platform, VoiceObjects Server, and the database servers hosting the Metadata Repository and the Infostore Repository are individually clustered.

Calculate number of database connections

The number of database connections used depends on the kind of repository. A repository that serves as the Metadata Repository typically needs more connections than one used for storing logging data. All database connections for an instance of VoiceObjects Server or VoiceObjects Desktop are reserved at once when the instance has started successfully.

 

Metadata Repository

For the Metadata Repository the number of database connections used by a VoiceObjects cluster can be calculated as follows:

[Total number of server instances + total number of desktop instances] * 10 connections = Total number of used database connections.

 

Infostore Repository

For the Infostore Repository the number of database connections used by a VoiceObjects cluster can be calculated as follows:

[Total number of server instances] * 6 connections = Total number of used database connections.

This already includes three connections used for custom logging only. If your system does not contain custom logging, or if you use a separate RDBMS for storing the custom logging data, calculate with only three connections for the Infostore Repository.
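The two formulas above can be written down as a small calculation. This is a sketch for planning purposes; the function and parameter names are hypothetical, and the cluster in the example (8 server instances, 1 desktop instance) is made up for illustration.

```python
def metadata_connections(server_instances: int, desktop_instances: int) -> int:
    """Connections to the Metadata Repository: 10 per server or desktop instance."""
    return (server_instances + desktop_instances) * 10


def infostore_connections(server_instances: int, custom_logging_in_infostore: bool = True) -> int:
    """Connections to the Infostore Repository.

    6 per server instance, of which 3 are used for custom logging only.
    Without custom logging in this repository, 3 per instance remain.
    """
    per_instance = 6 if custom_logging_in_infostore else 3
    return server_instances * per_instance


# Example cluster (illustrative numbers): 8 server instances, 1 desktop instance.
print(metadata_connections(8, 1))        # Metadata Repository
print(infostore_connections(8))          # Infostore, with custom logging
print(infostore_connections(8, False))   # Infostore, without custom logging
```

The results can be compared against the connection limits configured on the RDBMS servers.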

Recommended Hardware for VoiceObjects Server

In this section, a typical hardware configuration for one VoiceObjects Server machine is specified. A server cluster may consist of several such machines.

The recommendations below are based on the sample Prime Insurance application for the voice channel, delivered with the current version of VoiceObjects. These recommendations should be interpreted as rules of thumb. Adjustments to the required hardware, especially CPU power and available RAM, will depend on the complexity of your applications, the call volumes and the supported channels. The CPU load for instance will be higher in the text or Web channel as the interaction between caller and system is typically faster than in the voice channel.

Memory (RAM)

On a machine with two dual-core CPUs, 8 GB of RAM is recommended. This configuration typically provides a good balance between CPU power and memory sizing.

CPU

In general, each CPU core should support only one server instance. For example, a server machine with two dual-core CPUs or one quad-core CPU can run at most 4 server instances. With proper memory assignment in place (see above), each of these server instances is able to handle 125 IVR ports simultaneously on average, summing up to 500 IVR ports per server machine with 4 server instances. Maximum peak load on such a machine is 800 concurrent sessions.

The exact maximum number of ports depends on the complexity of the application, the implementation details of back-end integration, and on the deployed hardware.

Network adapters (NIC)

Each VoiceObjects Server machine should be equipped with 2 NICs, with one NIC dedicated to communicating with the VoiceXML gateway, and the other NIC to the back-end systems.

In most cases 100 Mbit NICs are sufficient, because both the VoiceXML pages delivered and the back-end data requested by the server typically require only small bandwidth. 1 Gbit adapters will be needed if requests to back-end systems cause high network throughput, or if static resources (e.g. large audio files) have to be served. Note, however, that in an optimized configuration, media resources are served by a dedicated file server rather than by the Service Execution Environment.

Caution: All server machines participating in one cluster have to be located within the same local area network (LAN); wide area networks (WAN) are not supported.

Hard disk space

At least 1 GB of free disk space is required. More may be needed depending on the number and size of resource (audio, grammar) files.

Caution: Depending on the settings made for the log framework of VoiceObjects the amount of space that is used by log information may also be significant. Refer to Log file configuration below for more detailed information.

Determining the Required Number of Servers

Taking the machine hardware as described above (dual CPU with two cores, 8 GB RAM) as a building block, it is straightforward to calculate the number of required servers:

Given that one machine can handle 500 IVR ports, N machines can handle N * 500 ports. For a media platform with 1440 ports, for example, the minimum requirement is 3 machines, and with 4 machines you are on the safe side.

For smaller deployments with fewer than 500 ports, one machine can usually sustain the entire load. To achieve failover capabilities, it is nevertheless recommended to set up a cluster of at least 2 machines, so that the entire load can still be handled if one machine is no longer available.
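The machine count can be derived with a ceiling division. The sketch below uses the 500 ports per machine stated above; the function name and the `spare_machines` parameter (extra machines kept for failover) are illustrative assumptions.

```python
import math


def machines_required(ivr_ports: int, spare_machines: int = 0,
                      ports_per_machine: int = 500) -> int:
    """Minimum number of server machines for a given number of IVR ports.

    spare_machines adds headroom so that the load can still be handled
    if that many machines fail.
    """
    minimum = max(1, math.ceil(ivr_ports / ports_per_machine))
    return minimum + spare_machines


print(machines_required(1440))                    # minimum for 1440 ports
print(machines_required(1440, spare_machines=1))  # "on the safe side"
print(machines_required(300, spare_machines=1))   # small deployment with failover
```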

Number of services per server

A service is an application running on VoiceObjects Server, identified by a VSN (VoiceObjects Service Name). A single server or a server cluster can, in principle, host any number of services in parallel.

These services compete for memory, though, because their metadata is cached by each server instance. The memory footprint of services mainly depends on the number of objects that are used in the corresponding dialog definitions.

As a rule of thumb, no more than about 30 services should be deployed in parallel on a given server cluster group. The optimal number for an actual deployment should be determined, however, by monitoring the memory usage as displayed in the info box of the corresponding server instance in the Control Center. To open the info box for a server instance double-click it or select Details from its context menu. In Desktop for Web the memory figures are displayed in the Server Instance section in the Control Center. The same information is available through the Web Services Interface (WSI) and through the Command Line Interface (CLI). The sum of all service caches should not exceed 25-30% of the memory that is assigned to the corresponding JVM.

Number of server instances per machine

On one 8 GB machine, 4 server instances should be configured to run in parallel, each instance using a heap size of 1 GB or more. On a 4 GB machine, only 2 server instances should be used.

These numbers should be a good starting point for configuration optimization. In a given environment, load tests should reveal whether more or fewer instances per machine enhance performance.

It is recommended that each server instance should be configured to handle about 125 IVR ports as its normal workload. With 4 server instances on a dual CPU machine with two cores with 8 GB RAM, this adds up to 500 IVR ports that the machine can handle.

Note that these numbers already include a certain level of safety, as each server instance can handle up to 200 parallel sessions. When this maximum limit is reached, no further initial requests will be processed by this instance. This feature works as a safety net: it ensures ongoing service operations even if several server instances in the cluster are no longer functional, by making sure that the remaining instances do not get overloaded.

Depending on the configured number of server instances per machine, a setting in the system-components.xml configuration file should be adapted. Locate the setting

<component id="ServerManagerClusterNode" …>

<communication-properties> … ;port_range=5; …

</communication-properties>

</component>

The parameter port_range must reflect the number of server instances. So, if you set up 3 server instances per box, set port_range = 3 to reflect the total number of server instances.
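If the setting is maintained by a script, the value can be rewritten with a simple substitution. The sketch below assumes that the communication properties form a semicolon-separated list of key=value pairs, as the excerpt above suggests. The function name and the other keys in the sample string are hypothetical and not taken from the actual file.

```python
import re


def set_port_range(properties: str, instances: int) -> str:
    """Return the properties string with port_range set to `instances`."""
    if instances < 1:
        raise ValueError("at least one server instance is required")
    updated, count = re.subn(r"(\bport_range\s*=\s*)\d+",
                             rf"\g<1>{instances}", properties)
    if count != 1:
        raise ValueError("expected exactly one port_range entry")
    return updated


# Hypothetical properties string; only port_range comes from the excerpt above.
sample = "some_key=some_value;port_range=5;other_key=1"
print(set_port_range(sample, 3))
```

The function raises an error rather than guessing when the entry is missing or duplicated, so that a malformed file is noticed before the server is started.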

Number of server instances in one cluster

The maximum supported number of instances per logical server is 16, which can be realized on anywhere between 4 and 16 machines. All machines involved in the cluster should run the same number of server instances. This number also limits the number of ports that one cluster can handle: With a single server instance handling no more than 125 ports, the overall limit is 16 * 125 = 2000 ports. If the number of ports exceeds this limit, separate clusters have to be configured.
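The cluster limit translates directly into the number of separate clusters needed for a large port count. This is a minimal sketch using the figures from this section; the function names are hypothetical.

```python
import math

MAX_INSTANCES_PER_CLUSTER = 16   # per logical server, as stated above
PORTS_PER_INSTANCE = 125         # normal workload per server instance


def cluster_port_limit() -> int:
    """Maximum number of IVR ports one cluster should handle."""
    return MAX_INSTANCES_PER_CLUSTER * PORTS_PER_INSTANCE


def clusters_required(ivr_ports: int) -> int:
    """Number of separate clusters needed for the given number of IVR ports."""
    return max(1, math.ceil(ivr_ports / cluster_port_limit()))


print(cluster_port_limit())
print(clusters_required(5000))
```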

Shared resources

The linear scaling between the number of server machines and the number of IVR ports may be negatively affected by several factors:

·          the scaling properties of the load balancer,

·          the back-end system(s),

·          the logging database,

·          the file server(s).

Any of these resources shared by the server machines may cause sub-linear scaling (i.e. the load that can be handled by N machines can be less than N times the load handled by a single machine).

Bottlenecks causing sub-linear scaling behavior can be identified in load tests and removed by providing additional resources, clustering and load balancing.

Fault tolerance

In a fault tolerant setup, the system must continue to work after breakdown of a single component. If a server instance or an entire machine fails, the load balancer is responsible for distributing the load among the remaining server instances. For details regarding configuration of load balancers, cf. section Load Balancer Configuration.

Fault tolerance can hence be achieved by simply adding extra machines to the server cluster.

Recommended Software Configuration

Operating system

On Intel systems, Linux with kernel 2.6 is recommended. As compared to kernel 2.4, this kernel has been improved in terms of scalability and responsiveness, partially as a result of enhanced multithreading performance.

For an optimal configuration, the Linux kernel should be recompiled with a minimum set of required modules. When tuning kernel parameters, watch out for parameters related to TCP/IP and to multi-threading.

Additional server processes (like RDBMS or Web server) that are not required for running VoiceObjects should be deactivated or removed.

If VoiceObjects Server is hosted on a Microsoft Windows server system, make sure to disable all unnecessary Windows services (like RDBMS, Web server, Indexing etc.).

Cluster support

VoiceObjects supports both Windows and Linux clusters. Heterogeneous clusters (i.e. a mixture of Linux and Windows machines) are not supported.

Java Development Kit (JDK)

Recommended: Sun JDK 1.5.0

Also supported: IBM JDK 1.5 (only for AIX).

Web application server

The Jetty server, which is installed with VoiceObjects by default, has proven exceptional performance. The Jetty configuration as shipped with the VoiceObjects installer is production ready.

Database servers (RDBMS)

For production deployments, Oracle is the preferred database management system to host the Metadata Repository and the System Logging database.

MS SQL Server will typically be used in Windows based development environments.

Load Balancer Configuration

When deploying VoiceObjects Server in a cluster, a load balancer must be configured to distribute the requests from the VoiceXML gateway to the individual server instances of the cluster.

The load balancer has the following tasks:

·          Load balance traffic for better scalability: Distribute the load between the server instances such that each of the instances serves – at least approximately – the same load.

·          Call stickiness: Make sure that once a call is mapped to a server instance, subsequent requests from the same call are mapped to the same instance.

·          Fault tolerance: Make sure that once an instance in the cluster is no longer available, no further calls are mapped to this instance for a certain time.

Let us have a closer look at these three tasks.

Load balancing traffic

Several strategies exist for load balancing traffic.

·          Round-robin is the simplest approach: The load balancer assigns the requests to a list of server instances on a rotating basis. The first request is allocated to a server instance picked randomly from the group. For subsequent requests, the load balancer follows the circular order to redirect the request.

·          In random load balancing, requests are routed to server instances at random. Like round-robin, random load balancing is recommended only for homogeneous cluster deployments, where each server instance runs on a similarly configured machine.

·          Weight-based load balancing improves the round-robin algorithm by taking into account a pre-assigned weight for each server instance. When using a weight-based algorithm, carefully determine the relative weights to assign to each server instance. Factors to consider include the processing capacity of the machine's hardware in relationship to other machines and the number of non-clustered ("pinned") objects each server hosts.

·          In a more sophisticated schedule, calls are routed to the server instance that bears the least load at that moment (i.e. has the lowest CPU or memory usage). This corresponds to a weight-based load-balancing scheme with dynamic weights.
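The strategies listed above can be sketched in a few lines each. The code below is a simplified illustration, not the implementation of any particular load balancer; the instance names, weights and load figures are hypothetical.

```python
import itertools
import random


def round_robin(instances, rng=random):
    """Iterate over instances in circular order, starting at a random one."""
    start = rng.randrange(len(instances))
    return itertools.cycle(instances[start:] + instances[:start])


def weighted_choice(weights, rng=random):
    """Pick an instance at random; `weights` maps instance name to weight.

    With equal weights this is plain random load balancing.
    """
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def least_loaded(loads):
    """Pick the instance with the lowest current load (dynamic weights)."""
    return min(loads, key=loads.get)


picker = round_robin(["vo1", "vo2", "vo3"])
print([next(picker) for _ in range(4)])
print(weighted_choice({"vo1": 3, "vo2": 1}))
print(least_loaded({"vo1": 0.7, "vo2": 0.2, "vo3": 0.5}))
```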

Call stickiness

Ensuring "call stickiness" is the same as ensuring "session stickiness" in the language of Web applications.

In order to ensure that subsequent requests from the same call are always routed to the same server instance, several approaches are possible:

·          The load balancer maps the first request from a new call to one server instance in the server cluster. In the response VoiceXML code generated by this server instance, all embedded URLs directly reference this instance, rather than the load balancer. This configuration ensures that all subsequent requests from the same call will automatically be directed to the same server instance, bypassing the load balancer.

·          Cookie Persistence: Many load balancers can be configured such that they use a Web cookie for identifying application sessions. In this mode, the first request from a call is mapped to one of the available server instances. The response VoiceXML contains a cookie representing the dialog ID (which is equivalent to a session ID). Subsequent requests from the same call will contain the session cookie and thus allow the load balancer to map the request to the same server instance.
When using this approach, VoiceObjects Server must be configured to use session tracking based on cookies generated from the Servlet engine. Since this is not the default mode, the <SessionTrackingMode> tag must be manually changed to Standard (refer to Session tracking mode in Chapter 1 – Advanced Configuration of VoiceObjects).

·          Some VoiceXML gateways do not support Web cookies. On these platforms, sessions are tracked via URL encoding: The HTTP request parameter representing the dialogID is encoded into the URLs. Load balancers that are aware of the HTTP protocol can often be configured to use this scheme in order to ensure session stickiness.
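Whether the dialog ID arrives in a cookie or as a URL parameter, the routing decision itself is the same: look up the ID, and assign an instance only when the ID is new. The following sketch illustrates that logic under simplified assumptions. The class name and instance names are hypothetical, and extracting the dialog ID from the HTTP request is left out.

```python
import itertools


class StickyRouter:
    """Maps every request of a call to the instance chosen for its first request."""

    def __init__(self, instances):
        self._next_instance = itertools.cycle(instances)  # round-robin for new calls
        self._by_dialog = {}

    def route(self, dialog_id):
        """Return the instance for this dialog ID, assigning one if it is new."""
        if dialog_id not in self._by_dialog:
            self._by_dialog[dialog_id] = next(self._next_instance)
        return self._by_dialog[dialog_id]

    def end_call(self, dialog_id):
        """Forget the mapping once the call has ended."""
        self._by_dialog.pop(dialog_id, None)


router = StickyRouter(["vo1", "vo2"])
print(router.route("dialog-a"), router.route("dialog-b"), router.route("dialog-a"))
```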

Caution: In an IVR environment, IP persistence, a commonly used approach for ensuring sticky sessions in Web application server clusters, is not a valid approach for voice applications. IP persistence maps all requests from the same client IP address to the same server instance. While this approach works well for a public Internet server that serves thousands or millions of different PCs, it does not work in the IVR environment, because there is only one client, the VoiceXML gateway, communicating with the VoiceObjects Server cluster. Even if there are several instances of the VoiceXML gateway, mapping requests by client IP address would concentrate the load on as many server instances as there are gateways.

Fault tolerance

The load balancer internally maintains a list of available server instances. Once one of these instances is no longer available (e.g. because the server doesn’t answer a request within a given time interval), the corresponding entry is marked off this list for a certain time. When this time elapses, the load balancer will try again to map requests to that instance.
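The mark-off behavior described above can be modeled as a list with expiry times. This is a sketch of the general idea only; the class name, the default retry interval of 30 seconds and the instance names are assumptions, and real load balancers differ in detail.

```python
import time


class AvailabilityList:
    """Tracks which server instances may currently receive requests."""

    def __init__(self, instances, retry_after_s=30.0, clock=time.monotonic):
        self._instances = list(instances)
        self._retry_after_s = retry_after_s
        self._clock = clock
        self._down_until = {}

    def mark_unavailable(self, instance):
        """Take the instance off the list for the retry interval."""
        self._down_until[instance] = self._clock() + self._retry_after_s

    def available(self):
        """Instances that are not marked off, or whose retry interval has elapsed."""
        now = self._clock()
        return [i for i in self._instances
                if i not in self._down_until or self._down_until[i] <= now]


instances = AvailabilityList(["vo1", "vo2", "vo3"])
instances.mark_unavailable("vo2")
print(instances.available())
```

The injectable `clock` parameter exists only to make the behavior testable without waiting for the interval to elapse.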

Some load balancers can be configured to check for the availability of server instances proactively. The VoiceObjects Server “Ping” mechanism may be used for this purpose; it allows checking for the service status on a given server through a simple HTTP request. For details, cf. section Representative Load Test Results.

Load balancer options

Several options are available for the actual load balancer implementation:

·          Some VoiceXML gateways provide basic load balancer functionality by maintaining a list of available application servers. Basic load balancer functionality can also be implemented by hard-coding it into the start VoiceXML page.

·          The Web application server that hosts VoiceObjects Server may offer an internal load balancing mechanism when deployed in a server cluster. Load-balancing support is available, for example, for JBoss, IBM WebSphere Application Server and BEA WebLogic Server.

·          In environments that must guarantee highest availability, a dedicated software or hardware load balancer is recommended.

Hardware load balancers are based on DNAT (Destination Network Address Translation). Here, the request packet’s destination address is changed to an internal cluster address. When the response comes back from the cluster, the translation is reversed, so that the response appears to the client to originate from the address it originally contacted.

A software load balancer may be deployed to dedicated machines, or coexist with the VoiceObjects Servers on the same machines. In the latter case, a valid approach consists in deploying one instance of the load balancer software to every server machine in the VoiceObjects Server cluster and configuring the URL list in the VoiceXML gateway with all load balancer instances. This way, both the VoiceObjects Server cluster and the load balancer are deployed redundantly, ensuring fault tolerance when one server becomes unavailable.

Load and Performance Tests

Sizing and configuration recommendations can, by nature, only offer rough guidelines. The optimal sizing and configuration always depends on the specific environment, the specific application, back-end systems, network details etc.

In consequence, pre-production load and performance tests are an indispensable milestone in any serious service deployment. Successful design, implementation, execution and analysis of load tests are a non-trivial task and a prerequisite for project success.

In a load test, thousands of automated calls exert a load on the media platform, application server and back-end systems which as closely as possible duplicates realistic customer calling conditions. During the simulation, virtual users dial into the systems under test, enter account numbers and PIN numbers, verify that the right prompt responses are being played, and measure system and network response times throughout each test call.

The VoiceObjects platform provides an integrated LoadTester to measure the scalability and performance of your application server and back-end systems. For details on working with the VoiceObjects LoadTester refer to the LoadTester Guide.

Why load tests

There is more than one reason for planning and performing load tests. Questions that can be answered by load tests include:

·          Scaling: Will my servers still scale linearly under peak time conditions? Are all parts of the infrastructure sized correctly? Undersizing means bad customer experience, oversizing involves unnecessary expenses.

·          Performance: Will callers experience acceptable response times during peak times?

·          Accuracy: Does the application start generating errors under load?

·          Identify bottlenecks: How much load do I need to break the system? Which is the first system component to operate at full capacity? Which component is the first to break?

·          Fault tolerance: Does the system survive the breakdown of a single component? Do the fault tolerance strategies (load balancing, redundancy) really work?

·          Long-term stability: Is the system stable under load over extended periods of hours and days?

·          Error reproduction: Can I reproduce errors that only occur under load conditions?

For some of these tests, it is important that the simulated load is as realistic as possible, while for others it is sufficient to just generate heavy load or to stress certain system components.

When to test

The earlier, the better. Load tests can and should be conducted in early stages of a project, when it is still possible to change fundamental design decisions and hardware configuration.

One of the advantages of using a phone application server like VoiceObjects is that architecturally, a prototype and the final application are the same. In consequence, meaningful load tests can be conducted with an early prototype that only implements a small subset of features and a few typical back-end connectors.

In addition to pre-production load tests, load tests should always be repeated when software or hardware components are upgraded or added. The results should be compared with previous load test results to uncover the effect of the change to system performance and stability. In order that load test results can be compared with each other, it is important to ensure that load tests can easily be reproduced.

Design load tests

First, you need to decide which parts of the system to include in the test. The two most important scenarios are as follows:

1.        In a communication services environment, a complete end-to-end test requires the simulation of real phone users. Load tests of this type provide information on the entire system, including realistic response time data from the caller perspective, but require substantial effort.

2.        The minimal approach, which includes only the VoiceObjects application server (including load balancing) and back-end systems, but excludes all IVR components, consists in generating HTTP requests against VoiceObjects Server. This type of test is easy to conduct, and it is adequate for validating sizing and configuration of the server and back-end systems.

For both testing scenarios, standard load-testing tools exist. For scenario #1, Empirix (www.empirix.com) offers the Hammer Contact Center Test System.

For scenario #2, most standard load test tools for Web applications can be used. VoiceObjects has developed a load test tool optimized for testing VoiceObjects-based voice applications, including an HTTP recorder and a load test execution engine.

In the following, we will focus on the second scenario.

Simulate realistic user behavior

When validating system scalability and stability, the goal is to simulate a load that is as close as possible to a realistic, production-level system load. Most importantly, this involves realistic simulation of typical user behavior. The more accurately users are modeled, the more reliable performance test results will be. Simulating realistic user behavior involves the following:

1.        Identify a few typical paths through the application that your callers will typically take. Describe each path as a test case. Note that, depending on the purpose of the test, there are two alternative guidelines for choosing test cases: If the goal is to simulate realistic load, typical user behavior must be analyzed (or predicted). If, on the other hand, the goal is to stress certain system components, define paths that exert a controlled stress on single components.

2.        For the generation of load test scripts, you actually call in to the service with a phone and talk through the test case. The HTTP traffic (requests and responses) between the VoiceXML gateway and VoiceObjects Server is recorded. After finishing a test case and hanging up the phone, the load test script will automatically be generated by the recorder.

3.        In the generated load test scripts, randomize test data where possible in order to prevent artifacts from caching in load test results.

4.        Realistic "user think times" must be inserted into the test script between subsequent requests. The user think times as recorded by the HTTP recorder (cf. step #2) are a good starting point. When executing the load test, these think times will be used to simulate the delays caused by the user listening to prompts and giving answers, including the typical processing time of the IVR components not included in the load test.

5.        Add response verification logic to the test scripts. During a load test, it is important to verify every server response – if a load test fails to verify server responses, errors that only occur when the system is under load will go unnoticed.

6.        In order to achieve a realistic distribution of caller activities, attribute a statistical weight to each of the load test scripts. Popular paths (e.g. “get account balance”) should be executed more frequently during a load test than less frequently used paths (e.g. “change PIN”).

Note: User think times must be randomized in order to de-synchronize concurrent virtual users during the load simulation (user synchronization artifacts are a frequent problem with poorly implemented load tests and cause meaningless results). The random distribution of user delays should be carefully modeled in order to generate realistic user behavior patterns: Typically, a negative exponential distribution with lower and upper limits is an adequate choice. The lower limit should reflect the minimal processing time of the media platform plus the duration of prompt replay when barge-in is disabled. The upper limit should include the duration of prompt replay plus timeout settings on the ASR.
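Randomized think times and weighted script selection can be sketched as follows. The code is illustrative and independent of the VoiceObjects LoadTester. The function names, the numeric limits and the script weights are assumptions, and the truncation is implemented by redrawing, which is one of several possible ways to apply the limits.

```python
import random


def think_time(scale_s, lower_s, upper_s, rng=random):
    """Draw one user think time in seconds.

    Samples a negative exponential distribution with scale `scale_s` and
    redraws until the value lies within [lower_s, upper_s]. The mean of
    the truncated distribution differs from `scale_s`.
    """
    if scale_s <= 0 or not 0 <= lower_s < upper_s:
        raise ValueError("need scale_s > 0 and 0 <= lower_s < upper_s")
    while True:
        value = rng.expovariate(1.0 / scale_s)
        if lower_s <= value <= upper_s:
            return value


def pick_script(script_weights, rng=random):
    """Choose the next test script according to its statistical weight."""
    names = list(script_weights)
    weights = [script_weights[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(7)  # fixed seed so that a test run can be reproduced
delays = [think_time(5.0, 1.5, 20.0, rng) for _ in range(5)]
script = pick_script({"get_account_balance": 8, "change_pin": 1}, rng)
print([round(d, 2) for d in delays], script)
```

Seeding the generator per virtual user keeps runs reproducible while still de-synchronizing the users from each other.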

For stress tests, it must be possible to globally limit or even turn off user think times in order to achieve maximum throughput.

Modeling the load

Having defined and implemented the user profiles, the next step is to decide how much load to exert on the system. It is a good idea to start with an increasing load test: The test starts with a small number of virtual users. After fixed time intervals, more virtual users are added to the simulation, leading to a load profile that looks, over time, like a staircase. An increasing load test is an adequate tool to get an understanding of the system’s scaling properties, and to quickly identify system bottlenecks.

First increasing and then decreasing the number of virtual users helps detect whether the system completely recovers after heavy load. Compare performance measures in the first and in the last part of such a test, when equal numbers of concurrent virtual users use the voice application.
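The staircase profile, first increasing and then decreasing, can be expressed as a small function that returns the number of virtual users for a given point in time. This is a sketch under assumed parameters (step length, users per step, number of steps); real load test tools usually offer their own way of configuring such a profile.

```python
def staircase_users(t_s, step_s, users_per_step, steps_up):
    """Number of virtual users at time t_s (seconds since test start).

    The load rises by users_per_step every step_s seconds for steps_up steps,
    then falls again by the same amount per step. The peak level is held for
    one step only; after the last step the load is zero.
    """
    step = int(t_s // step_s)
    total_steps = 2 * steps_up - 1
    if step < 0 or step >= total_steps:
        return 0
    level = step + 1 if step < steps_up else total_steps - step
    return level * users_per_step


if __name__ == "__main__":
    # Hypothetical profile: 10-minute steps, 50 users per step, 3 steps up.
    for minute in range(0, 60, 10):
        print(minute, staircase_users(minute * 60, 600, 50, 3))
```

Because the same user counts occur once on the way up and once on the way down, response times measured at equal levels can be compared directly to check whether the system has recovered.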

Load tests with a constant number of virtual users are used to get statistically sound performance and throughput data. The appropriate number of concurrent users will be defined directly, or calculated from targets like calls per hour or HTTP requests per second.
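The conversion between a calls-per-hour target and the number of concurrent virtual users follows from Little's law (concurrency = arrival rate * average call duration). The sketch below assumes that the average call duration is known; the value of 510 seconds is the typical call length reported for the sample application later in this document, and the target of 2,800 calls per hour is a hypothetical example.

```python
import math


def concurrent_users(calls_per_hour, avg_call_duration_s):
    """Concurrent virtual users needed to sustain the given call rate."""
    return math.ceil(calls_per_hour * avg_call_duration_s / 3600.0)


def calls_per_hour(concurrent, avg_call_duration_s):
    """Call rate produced by a constant number of concurrent virtual users."""
    return concurrent * 3600.0 / avg_call_duration_s


if __name__ == "__main__":
    print(concurrent_users(2800, 510))       # users needed for 2,800 calls/hour
    print(round(calls_per_hour(400, 510)))   # call rate of 400 parallel sessions
```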

A stress test can help to quickly reproduce errors, or to identify resource limitations. During a stress test, user think times are turned off.

What to measure

From the end user perspective, the most important data to measure are response times (average, min/max, percentiles) and error rates. When measuring page response times, it is important to separate DNS lookup time (which should be avoided by using IP addresses instead of hostnames), connection time, network latency for sending request and response, and server processing time. If possible, back-end processing time should be measured as well.
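The response time statistics named above can be computed from the raw measurements with a few lines of code. The following sketch uses the Python standard library; the function name and the 500 ms threshold for slow responses are assumptions chosen for illustration.

```python
import statistics


def summarize(response_times_ms, slow_threshold_ms=500.0):
    """Return average, min/max, quartiles and the share of slow responses."""
    q1, median, q3 = statistics.quantiles(response_times_ms, n=4)
    slow = sum(1 for t in response_times_ms if t > slow_threshold_ms)
    return {
        "mean": statistics.fmean(response_times_ms),
        "min": min(response_times_ms),
        "max": max(response_times_ms),
        "q1": q1,
        "median": median,
        "q3": q3,
        "share_slow": slow / len(response_times_ms),
    }


if __name__ == "__main__":
    # Hypothetical measurements in milliseconds.
    print(summarize([5, 7, 8, 10, 12, 15, 20, 600]))
```

Reporting quartiles next to the average matters because a few slow outliers can raise the average well above the response time that most requests actually see.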

From a system perspective, the most important data are throughput data like requests per second, calls per hour, or kB/sec. Related measures feeding into the throughput performance are, for example, the average VoiceXML page size, and caching efficiency.

Load test execution

Before starting the test, be sure to turn off dialog tracing and to set logging to the same level as planned for the production system.

During the load test, monitor the machines that generate the load. It is important that resource utilization (CPU, memory, network IO) of these machines is well below saturation.

Moreover, simulated virtual users must be able to act (i.e., send and receive requests) independently: If requests from several virtual users are artificially queued on the load generation machine, the generated server load will be too low, leading to overly optimistic load test results. Ideally, each virtual user is represented by a separate thread or process.

Monitor server resource utilization

During the load test, resource utilization from all system components that are involved in the test must be measured and logged.

When correlated with load test data like number of requests per second, number of parallel virtual users etc., server monitoring data is essential in identifying system bottlenecks and verifying system sizing.

For all servers, this data includes monitoring CPU and memory usage, network throughput and IO activity.

For all relevant server processes, this data includes CPU usage and memory consumption per process, plus more specific, process-related measures like the number of concurrent threads.

For application servers like database server, Web application server or directory server, application-specific monitoring data are available through the system utility perfmon on Windows, and through command line utilities or SNMP on Unix. Proper analysis of this data will require application specialists for the respective servers.

Caution: Sometimes a load test involves resources (back-end systems, network components etc.) that are not exclusively dedicated to the application under test. In this case, it is important to measure the “background noise” or normal activity levels of these components before and after the test in order to get an understanding of their utilization by the voice application.

Results analysis

On test completion, the raw data gathered during the test must be consolidated and statistically analyzed. Graphical tools are used for plotting data from different sources and identifying correlations. Performance data (response times, throughput, error rates) and data from monitoring system components are correlated to identify bottlenecks and verify sizing.

Data pre-processing and visualization must be consistent across subsequent load tests in order to make them comparable.

Proper results analysis is a joint effort involving the respective experts for all system components involved in the test.

Representative Load Test Results

VoiceObjects conducts daily load and performance tests in its test lab.

The following results are from tests of the VoiceObjects 9.0 release. VoiceObjects Server was running on a single workstation, equipped with the following hardware and software:


CPUs                             2 AMD Opteron 2212 (dual core), 2 GHz
Memory                           8 GB
Operating System                 Linux based on Kernel 2.6 (64 bit)
Java virtual machine             Sun JDK 1.5.0-11 (32 bit)
Web application server           Jetty 6.1.19 Web Application Server
RDBMS (Infostore Repository)     MS SQL Server 2008

Caution: The test results listed below cannot be guaranteed in general as they may differ based on the overall setup of the environment at the customer's side.

The system was configured to run a cluster of 4 server instances in parallel, with each JVM’s heap size set to 1 GB. Note that in this case the actual amount of memory used by each JVM is up to 50% higher than the configured heap space, as Java claims this additional memory for internal data structures such as class stores or thread stacks.

As described in the section on Load and Performance Tests, the load was generated by simulating virtual users with realistic behavior patterns and timing. The virtual users were calling into the standard Prime Insurance sample voice application, which is shipped with VoiceObjects, and the Prime Telecom sample application, which is designed for the Web and text channel (and which is available for download on the VoiceObjects Developer Portal). The path of each test call in the Prime Insurance application consisted of 77 HTTP requests (and hence an equal number of VoiceXML pages being served) and was designed to involve all types of objects in a representative way. A typical virtual call lasted about 510 seconds. A typical call in the Prime Telecom application consisted of 31 HTTP requests and lasted about 80 seconds.

During the test the server management functionality was also tested by reloading the service list every hour and afterwards redeploying and restoring the application. In addition System DB logging was fully active, including 10% Input State logging.

The results below are from a long-running load test (60 hours), which proved

·         the stability of VoiceObjects Server,

·         the absence of memory leaks, and

·         consistent performance in high-load conditions.

The number of virtual callers generating the load varied over time in a wave-shaped (or sinusoidal) form: during “peak times”, 560 parallel sessions (400 voice + 160 text/Web sessions) were active.

The following table summarizes the most important results from this load test.


Call volume
    Total number of calls          417,325
    Total number of requests       ~ 18 million

Response time (statistics over all VoiceXML pages served during the test)
    Average response time          18.3 ms
    1st quartile                   7 ms
    2nd quartile (median)          10 ms
    3rd quartile                   15 ms
    Peaks > 500 ms                 ~ 0.03 %

Memory utilization (per Server JVM instance)
    Average used                   375 MB
    Maximum used                   450 MB

CPU usage (Server)
    Average usage                  33%
    Maximum usage                  60%

The following three charts visualize the results for this particular instance.

The first chart shows the memory usage during the 60 hours.

The second chart shows the load on the system (the bars) and the minimal response times as well as the first quartile, median, third quartile and average for the response times.

Finally the third chart shows the status of the Infostore queue during the test run.

Production Monitoring

Once in production, the performance of the service and of the media platform must be monitored on a permanent basis. Administrators can observe the current system usage and performance through the VoiceObjects Control Center. In addition, alarming mechanisms must be implemented that signal high load or error conditions to a management platform.

Notifications

Notifications can be used to monitor both VoiceObjects Server and individual services. The configuration of notifications is documented in detail in Chapter 8 – Notifications in the Deployment Guide.

Notifications, which can be SNMP traps and/or e-mails, are sent to a system management platform whenever the state of a server or a service changes, or when back-end errors occur. In addition, custom notifications can be configured to be sent from within a service.

Server “Ping” connection test

As an alternative to notifications, the availability of a server instance can be tested by requesting the following URL from the server:

http://…/VoiceObjects/DialogMapping?ping=instance

If the server is available, the response contains the logical server name.

The availability of individual services can be tested by requesting the following URL from the server:

http://…/VoiceObjects/DialogMapping?VSN=[ReferenceID]&ping=instance

The response contains the status of the service on this server instance.

Details are documented in the section Connection Test in Chapter 4 – Service Deployment in the Deployment Guide.
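The two connection tests can be automated with a small script, for example as part of a monitoring job. The sketch below is an illustration only: the base URL (everything that precedes /VoiceObjects in the URLs above) depends on the installation and is passed in as a parameter, the host name in the usage example is hypothetical, and the response body is returned unparsed because its exact format is described in the Deployment Guide rather than here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def ping_url(base_url, reference_id=None):
    """Build the connection test URL for a server instance or for one service."""
    params = {}
    if reference_id is not None:
        params["VSN"] = reference_id  # Reference ID of the service to test
    params["ping"] = "instance"
    return f"{base_url.rstrip('/')}/VoiceObjects/DialogMapping?{urlencode(params)}"


def ping(base_url, reference_id=None, timeout_s=5.0):
    """Request the ping URL and return (ok, text).

    ok is False if the request fails with a network or HTTP error; text then
    holds the error message instead of the response body.
    """
    try:
        with urlopen(ping_url(base_url, reference_id), timeout=timeout_s) as resp:
            return True, resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError, HTTPError and timeouts derive from OSError
        return False, str(exc)


if __name__ == "__main__":
    # Hypothetical host and port; replace with the real server address.
    print(ping("http://voserver.example:8080"))
```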

Resource usage monitoring

Monitoring of resource usage on the VoiceObjects Server machines should be integrated into the system management platform used by the system operators. As a minimum requirement, CPU, memory, File IO and Network IO utilization should be monitored on a permanent basis.

Active monitoring end-to-end

In addition to passive monitoring as described above, active application monitoring is recommended. In an approach that is orthogonal to monitoring separate server and network components, actual application requests are sent to the server in a virtual, simulated caller session. The goal is to identify problems with the application or with the back-end that would go unnoticed with pure component monitoring.

Ideally, active monitoring takes a full end-to-end approach. This requires a simulated user calling into the application on a regular basis (e.g. every 2 minutes), driving through the voice application while measuring response times and validating system prompts.

As discussed in the section on load testing, a simpler approach consists of simulating a caller session by sending appropriate HTTP requests to the VoiceObjects Server cluster. The server response times and the validity of the generated VoiceXML should be measured and reported to a system management platform.

Tip: The test scripts developed for preproduction load tests can be reused for system monitoring, in most cases without any further development efforts.

Log File Configuration

This section illustrates the logging mechanism of VoiceObjects. It also gives recommendations on how best to configure it, depending on the current requirements. The examples below are based on a setup of one logical server machine holding 4 VoiceObjects Server instances and one VoiceObjects Desktop instance, with the server instances located in:

[VoiceObjects]\Platform_1

[VoiceObjects]\Platform_2

[VoiceObjects]\Platform_3

[VoiceObjects]\Platform_4

and the Desktop in:

[VoiceObjects]\Platform

Log files

The operational logging on the VoiceObjects platform comprises 5 components:

1          VoiceObjects Server logs in
[VoiceObjects]\Platform_[1-4]\WEB-INF\log\VOServer_log\System

2          VoiceObjects Desktop logs in
[VoiceObjects]\Platform\WEB-INF\log\VODesktop_log\System

3          Bootstrap logs for VoiceObjects Server in
[VoiceObjects]\Platform_[1-4]\WEB-INF\log\VOServer_log\System\

and for VoiceObjects Desktop in
[VoiceObjects]\Platform\WEB-INF\log\VODesktop_log\System\

4          Console output for VoiceObjects Server in
[VoiceObjects]\Platform_[1-4]\WEB-INF\log\VOServer_log\System\
 
and for VoiceObjects Desktop in
[VoiceObjects]\Platform\WEB-INF\log\VODesktop_log\System\

5          Jetty logs (independent of VoiceObjects logs) in [VoiceObjects]\Jetty\logs

The VoiceObjects Server logs (1) are highly configurable. See instructions below for details on how to configure them. Also refer to Configuring the Logging Facilities in Chapter 1 – Advanced Configuration of VoiceObjects in the Administration Guide.

The VoiceObjects Desktop logs (2) are identical to (1) in structure and content.

The Bootstrap logs in (3) are only used during start-up of individual server and desktop instances, until the logging framework is fully functional. These logs are not otherwise accessed during run time, and are therefore not critical.

(4) is independent of the Server logging, since it is handled by the underlying Web application server (Jetty). In normal circumstances, this output contains only two types of information: start-up and redeploy notifications, and low-level error messages from the JVM (e.g. out of memory) that cannot be passed to the logging framework.

(5) contains information about accesses to the underlying Web application server (Jetty), e.g. all incoming requests are logged. Configuration of these logs is described in the Jetty documentation (http://www.mortbay.org). Based on our experience, these logs can be ignored.

The settings for the VoiceObjects Server logs can be critical for disk space. Therefore we provide recommended configuration settings below.

Configure VoiceObjects Server logs

All Server log settings can be configured in
[VoiceObjects]\Platform_[1-4]\WEB-INF\config\VOServer_logSettings.xml

Configuration does NOT require a restart of VoiceObjects Server; changes are picked up automatically.

In the current installation, the log level for all logs is set to WARN or ERROR. As a result, VoiceObjects produces very little log output. In most cases this is adequate, but during the installation and development phases of applications it may be useful to increase the log level of some categories, as described below.

Tip: The log levels supported by VoiceObjects are, ordered from the largest to the smallest amount of log data, DEBUG, INFO, WARN, ERROR and FATAL_ERROR.

The log level of the following categories can be set to DEBUG to produce more revealing output:

·          Executer (sub category ConnectorSet)
Lists general information about all processed connectors.

·          Executer (sub category Connector)
Provides detailed information about what happens within connectors.

·          VXMLLogger
Lists all VXML pages that were sent from VoiceObjects Server to the media platform.

·          URLLogger
Lists all URL requests that were sent from the media platform to VoiceObjects Server.

In order to change the log level for a log category do the following:

1.        Open the file [VoiceObjects]\Platform_[1-4]\WEB-INF\config\VOServer_logSettings.xml.

2.        Locate this text snippet:
<category name="XYZ" log-level="WARN">
where XYZ stands for the category you want to change the log level for.

3.        Change the log level for all log categories listed above to DEBUG.

These settings clearly show the input and output to VoiceObjects Server. Over time these settings can replace trace files, which saves the extra load when the system is busy: the logging framework is much less resource-intensive than the trace functionality (e.g. in terms of CPU usage, memory, and disk space).
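When several server instances have to be switched to DEBUG and back, the manual steps above can be scripted. The following Python sketch only illustrates the idea: it relies on the category element and its name and log-level attributes as shown in the snippet above, while the root element and the layout of the sample document are assumptions and will differ from the real VOServer_logSettings.xml.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified stand-in for VOServer_logSettings.xml (not the real layout).
SAMPLE = """<log-settings>
  <category name="VXMLLogger" log-level="WARN"></category>
  <category name="URLLogger" log-level="WARN"></category>
  <category name="Other" log-level="ERROR"></category>
</log-settings>"""


def set_log_level(xml_text, category_names, level):
    """Set log-level on every category whose name is in category_names.

    Returns the modified XML text and the list of categories that were changed.
    """
    root = ET.fromstring(xml_text)
    changed = []
    for category in root.iter("category"):
        if category.get("name") in category_names:
            category.set("log-level", level)
            changed.append(category.get("name"))
    return ET.tostring(root, encoding="unicode"), changed


if __name__ == "__main__":
    new_xml, changed = set_log_level(SAMPLE, {"VXMLLogger", "URLLogger"}, "DEBUG")
    print(changed)
    print(new_xml)
```

Note that ElementTree discards XML comments and the XML declaration when parsing in this way, so a script used on real configuration files should work on a backup copy first.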

By default, 26 log components are defined, each of which writes its own log file. The files are rotated when they reach a size of 5 MB (or 10 MB for some files), and the rotation cycles after 249 files (i.e. the first file is overwritten when file 249 reaches 5 MB). The filenames of the rotation files are built as follows:

[log_file].[rotation_number]

Example:

SRV_ObjectValidation.log.000000 – First rotation of log file SRV_ObjectValidation

SRV_ObjectValidation.log.000001 – Second rotation

and so on, until

SRV_ObjectValidation.log.000248 – Last rotation

 

In the current configuration, this produces a maximum log file size of about 185 GB for four server instances and one Desktop instance. Counting every file at 5 MB gives 5 instances * 26 log files * 249 files each * 5 MB, or about 162 GB; the files that rotate at 10 MB add to this. That is quite a large amount, even for a theoretical maximum.

It is important to have enough space available (especially for DEBUG settings) for the following targets of VoiceObjects Server:

·          VoiceServerTarget

·          VXMLLoggerTarget

·          RootCausesTarget

·          URLLoggerTarget

In order to change the number of rotation files for a log target do the following:

1.        Open the file [VoiceObjects]\Platform_[1-4]\WEB-INF\config\VOServer_logSettings.xml.

2.        Locate the text snippet:
<async-target id="XYZ" queue-size="1000" priority="NORM">
where XYZ stands for the target you want to change the file rotation for.

3.        Locate the tag <rotation type="revolving" max="249">.

4.        Change the value of the max attribute for all log targets listed above according to the desired number of rotation files. Note that the rotation numbering starts with 0: if a file rotation over 5 files is desired, the parameter should be set to 4.

Especially when moving to a production environment, the number of rotation files should be reduced to 100 for the targets VoiceServer and VXMLLogger. 20 files are probably enough for RootCauses and URLLogger. All other targets, whose files are generally empty, can be limited further, e.g. to a 5-file rotation. For the VoiceObjects Desktop logs, a 5-file rotation for all targets is sufficient.


This results in the following maximum disk space calculation:


Per server instance:

2 log files * 100 files each * 5 MB  = 1 GB

2 log files * 20 files each * 5 MB  = 0.2 GB

22 log files * 5 files each * 5 MB  = 0.55 GB

Total = 1.75 GB


Per Desktop instance:

26 log files * 5 files each * 5 MB = 0.65 GB


The maximum log size for (1) and (2) together for a server machine with four server instances and one Desktop instance is thus:

4 server instances * 1.75 GB + 1 Desktop instance * 0.65 GB = 7.65 GB

Together with (3) and (4), a quota of around 8 GB is reasonable for all log information written by VoiceObjects on a server machine with 4 server instances and one Desktop instance.
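The calculation above can be repeated for other rotation settings or other numbers of instances with a short script. The sketch below reproduces the figures from this section; like the text, it counts every file at 5 MB and 1 GB as 1000 MB. The function and variable names are illustrative.

```python
FILE_SIZE_MB = 5  # rotation size per file, as used in the calculation above

# (number of log files, rotation files per log file) per instance type
SERVER_TARGETS = [(2, 100), (2, 20), (22, 5)]
DESKTOP_TARGETS = [(26, 5)]


def instance_quota_mb(targets, file_size_mb=FILE_SIZE_MB):
    """Maximum log volume of one instance in MB."""
    return sum(logs * files * file_size_mb for logs, files in targets)


def machine_quota_mb(server_instances, desktop_instances):
    """Maximum log volume of all instances on one machine in MB."""
    return (server_instances * instance_quota_mb(SERVER_TARGETS)
            + desktop_instances * instance_quota_mb(DESKTOP_TARGETS))


if __name__ == "__main__":
    print(f"per server instance:  {instance_quota_mb(SERVER_TARGETS) / 1000:.2f} GB")
    print(f"per Desktop instance: {instance_quota_mb(DESKTOP_TARGETS) / 1000:.2f} GB")
    print(f"machine (4 + 1):      {machine_quota_mb(4, 1) / 1000:.2f} GB")
```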