This brand new title provides an insight into the rich diversity of techniques, tools and knowledge used in performance engineering, covering the whole life cycle, from design through to operation, of both networks and systems. Performance modelling is discussed as an essential technique, providing the ability to predict performance through a quantitative understanding of how varying demand affects the behaviour of a system or network.
Inspec keywords: quality of service; telephony; Internet; IP networks; telecommunication traffic; voice communication; telecommunication network management
Other keywords: combined Web-telephony system; adaptive network overload control; traffic characterisation; voice traffic; performance monitoring; intelligent network overload control; IP-dial; telecommunication performance engineering; quality of service; broadband window; in-life capacity management; IP network; capacity planning
Subjects: Telecommunication applications
Forty years ago, or even thirty, performance engineering was a very different area from what it is now. For a start, it was then termed 'teletraffic'; more significantly, the range of systems and problems studied was much narrower, the variety much less, and the speed of evolution incomparably lower. Only with the opening-up of the business following privatisation and liberalisation in the 1980s did this situation begin to alter. Within the UK, a sea change was effected by the so-called 'overall grade-of-service' studies carried out within BT, which made evident for the first time the extent of variability of performance figures over a properly managed network.
'Telecommunications traffic' is a phrase used to describe all the variety and complexity of the usage of a network. It is the comings and goings of demand, in response to user behaviour and to the network's reaction. Its detailed specification is a means to an end - merely the first step in the evaluation of the quantities that are meaningful and useful to the network engineer, and therefore the type and complexity of its characterisation reflects the richness of its array of uses. Time-honoured representations of network traffic by pure-chance streams, which have worked well for many years, are now insufficient. The new data networks display a complexity of behaviour which requires a corresponding richness of input, and this has required the development of a multitude of traffic descriptions. These include not only extensions of classical models, such as multi-bit-rate sources, but also such very different areas as the effective bandwidth models now in widespread use for ATM modelling, self-similar or fractal traffics, and the complex self-coupled and adaptive behaviour of TCP/IP traffic streams. This chapter will distinguish loosely between characterisation and modelling. The first of these implies the purely phenomenological description of traffic, i.e. the analysis of measurement data, and the production of high-level statistical descriptions. Modelling, by contrast, covers the detailed low-level production of probabilistic models of traffic that can actually be used to make predictions about network and system behaviour.
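To make the distinction concrete, the short Python sketch below (illustrative only, not taken from the chapter) samples inter-arrival times from the classical exponential 'pure chance' model alongside heavy-tailed Pareto ON-period durations, the kind of ingredient from which self-similar traffic models are often built; the rate and shape parameters are invented for the example.

    # Illustrative sketch: classical 'pure chance' (Poisson) inter-arrivals versus
    # heavy-tailed (Pareto) ON-period durations used in self-similar traffic models.
    import random
    import statistics

    def poisson_interarrivals(rate, n):
        """Exponential inter-arrival times -- the classical pure-chance traffic model."""
        return [random.expovariate(rate) for _ in range(n)]

    def pareto_on_periods(alpha, xm, n):
        """Heavy-tailed ON-period durations; aggregating many such sources produces
        the long-range dependence observed in measured data traffic."""
        return [xm / ((1.0 - random.random()) ** (1.0 / alpha)) for _ in range(n)]

    if __name__ == "__main__":
        random.seed(1)
        exp_samples = poisson_interarrivals(rate=1.0, n=10000)
        par_samples = pareto_on_periods(alpha=1.4, xm=1.0, n=10000)  # alpha < 2: infinite variance
        print("exponential: mean %.2f, max %.1f" % (statistics.mean(exp_samples), max(exp_samples)))
        print("pareto:      mean %.2f, max %.1f" % (statistics.mean(par_samples), max(par_samples)))

The point of the comparison is that both streams can have similar means, yet the heavy-tailed source occasionally produces extreme values that dominate the aggregate behaviour, which is precisely what the classical models fail to capture.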
The PSTN has been around for a long time and its performance is well understood. The technology is mature and the issues regarding the management of the network are, for the most part, also well understood. We can see this in the excellent performance and network availability delivered by the PSTN - a record that is to be envied by some younger network technologies. This chapter considers IP-dial traffic, i.e. customers using their computers to access the Internet by connecting across the PSTN to their Internet service providers (ISPs), using analogue modems or ISDN terminal adapters. The PSTN must attempt to deliver dial tone and connectivity to their PCs, while at the same time protecting the interests of other customers on the network making voice calls.
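The classical Erlang B formula gives a feel for the dimensioning problem this creates, since long-holding-time dial sessions offer far more erlangs per customer than ordinary voice calls. The Python sketch below is illustrative only; the port counts, session rate and mean holding time are invented values, not figures from the chapter.

    # Illustrative Erlang B calculation for sizing a pool of ISP modem/terminal-adapter ports.
    def erlang_b(offered_erlangs, circuits):
        """Blocking probability for 'circuits' servers offered 'offered_erlangs' of traffic,
        computed with the standard numerically stable recurrence."""
        b = 1.0
        for n in range(1, circuits + 1):
            b = (offered_erlangs * b) / (n + offered_erlangs * b)
        return b

    if __name__ == "__main__":
        # Example: 1440 dial sessions per hour with a 20-minute mean holding time = 480 erlangs.
        offered = 1440 * (20 / 60.0)
        for ports in (480, 500, 520):
            print(f"{ports} ports -> blocking {erlang_b(offered, ports):.3%}")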
Following the success of the Internet, over the last five years or so IP has firmly established itself as the networking protocol of choice for a wide range of both traditional and emerging applications. More recently, among business users, there has been much interest in adding real-time applications, including voice over IP (VoIP) and interactive videoconferencing. A major driver for this is the desire to consolidate all applications on to a single, multi-service platform. This chapter focuses specifically on the DiffServ approach to QoS. DiffServ involves the segregation of traffic into a small number of classes but, unlike either the ATM QoS approach or the IETF IntServ approach, there is no signalling control plane to manage end-to-end behaviour. Instead the approach relies upon per-class capacity planning at each router individually (the so-called per-hop mechanism) in order to ensure that, overall, each class gets the standard of service required.
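By way of illustration, the sketch below shows the per-hop idea in its simplest form: a router keeping one queue per class and serving the highest-priority non-empty queue first. The class names and the strict-priority discipline are assumptions made for the example, not a description of any particular router implementation.

    # Simplified per-hop scheduler: one queue per DiffServ-style class, strict priority.
    from collections import deque

    CLASSES = ["EF", "AF", "BE"]          # e.g. voice, assured data, best effort (illustrative)

    class PerHopScheduler:
        def __init__(self):
            self.queues = {c: deque() for c in CLASSES}

        def enqueue(self, packet, cls):
            self.queues[cls].append(packet)

        def dequeue(self):
            # Strict priority: always serve EF before AF before BE.
            for cls in CLASSES:
                if self.queues[cls]:
                    return cls, self.queues[cls].popleft()
            return None

    if __name__ == "__main__":
        s = PerHopScheduler()
        s.enqueue("web-1", "BE")
        s.enqueue("voice-1", "EF")
        s.enqueue("voice-2", "EF")
        print([s.dequeue() for _ in range(3)])   # the EF packets leave first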
This section explores the performance modelling of IT systems. This chapter will not describe performance engineering and the full range of activities it encompasses. What it will do, however, is illustrate where modelling and prediction fit in with, and how they interact with, those other performance engineering activities.
In its most abstract sense, performance testing is the loading of a system, or a part of a system, with a synthetic workload, i.e. taking a real system and using specialist scripts and tools to emulate the activities of multiple users (where a user may be another system) performing typical user actions. Atypical workloads are sometimes used to stretch a system to, and beyond, the anticipated limit to add confidence that it will be able to support the expected demand. What differentiates performance testing is that the focus of the work is on a real system with a controlled load, rather than a paper- or computer-simulation-based exercise, or a live system with a live and uncontrolled load. An advantage that performance testing has is that it works with the full complexity of the system in its implemented state, rather than a simplified model, or how it should work according to the design. The downside is that testing can only begin when there is something to test, and this is inevitably later in the life cycle than other forms of performance engineering; major problems found at a late stage are obviously more costly to fix than if they had been found at an earlier point in time. However, as many modern systems are an integration of specialist components bought in from third parties, who may be reluctant to share details of how their product operates for good commercial reasons, performance testing may be the first opportunity for the integrator to find out whether the components being used to build the system live up to their specifications and expectations. It should be noted that performance testing is not limited to a particular technology, although the choice of tools used may be strongly linked to either the underlying technology or to the system interfaces where work is presented.
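As a flavour of what such a synthetic workload looks like in practice, the sketch below emulates a number of concurrent 'users' issuing a typical request against a system under test and records response times. The target URL, user count and think time are placeholders, not values drawn from the chapter.

    # Minimal load-generation sketch: concurrent synthetic users with think time.
    import time
    import statistics
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    TARGET = "http://example.invalid/search"   # hypothetical system under test
    USERS = 20
    REQUESTS_PER_USER = 10
    THINK_TIME_S = 0.5

    def one_user(user_id):
        timings = []
        for _ in range(REQUESTS_PER_USER):
            start = time.perf_counter()
            try:
                urllib.request.urlopen(TARGET, timeout=10).read()
            except OSError:
                pass                            # real tests would count failures separately
            timings.append(time.perf_counter() - start)
            time.sleep(THINK_TIME_S)            # emulate user 'think time'
        return timings

    if __name__ == "__main__":
        with ThreadPoolExecutor(max_workers=USERS) as pool:
            results = [t for user in pool.map(one_user, range(USERS)) for t in user]
        print(f"{len(results)} requests, mean {statistics.mean(results):.3f}s, "
              f"95th pct {sorted(results)[int(0.95 * len(results))]:.3f}s")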
This chapter has presented evidence to show that overloads occur frequently in telephony networks, and can greatly exceed the capacity of the terminating lines (and of the network) to handle the surge of calls. The typical causes of overloads are media-stimulated events, emergencies and equipment failures. Their impact on the network is to reduce effective switch throughput (ultimately leading to switch failure) and to generate high levels of repeat attempts - most of which will fail to complete successfully, but nevertheless consume network resources, thereby reducing the capacity available to other, non-event, call streams. It is not economic to provide sufficient network or terminating line capacity to handle such events; consequently overload controls are necessary to ensure that switches are protected and ineffective traffic is minimised.
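One well-known control of this kind is call gapping, in which a destination under control admits at most one call attempt per gap interval and rejects the remainder cheaply, before they consume further network resources. The sketch below is a minimal illustration of that mechanism; the gap interval and attempt pattern are invented for the example.

    # Illustrative call-gapping control: admit at most one call per gap interval.
    import time

    class CallGap:
        def __init__(self, gap_seconds):
            self.gap = gap_seconds
            self.next_allowed = 0.0

        def admit(self, now=None):
            now = time.monotonic() if now is None else now
            if now >= self.next_allowed:
                self.next_allowed = now + self.gap
                return True          # pass the call attempt forward
            return False             # reject early, protecting switch throughput

    if __name__ == "__main__":
        gap = CallGap(gap_seconds=1.0)
        accepted = sum(gap.admit(now=t * 0.1) for t in range(100))  # 100 attempts over 10 s
        print(f"accepted {accepted} of 100 attempts")               # roughly one per second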
The original idea of an 'intelligent network' (IN) was to be able to define a system capability that would support the rapid building and deployment of a large range of new services into a telephony network. Services include those with advanced call distribution features, such as call queueing. The ITU specifies a series of recommendations that define such a capability, incorporating the intelligent network application protocol (INAP). Abernethy and Munday provide a useful overview of IN standards and services.
Operators of carrier-scale IP networks are faced with the constant challenge of providing a consistent service in the most cost-effective way while dealing with rapid network expansion and traffic growth, the introduction of new services, and the deployment of new technologies. This is particularly challenging for networks that offer quality of service (QoS). Such networks support a wide range of applications with very different performance requirements. They enable customers to prioritise their applications according to performance and business need by offering multiple classes of service with different targets for delay and throughput, backed by service level agreements (SLAs) or service level guarantees (SLGs). For example, voice applications should be assigned to a high-priority class to ensure the low delay and loss necessary for satisfactory call quality. On the other hand, Web browsing and e-mail are usually considered to be lower priority and can tolerate a 'best-effort' service. This chapter focuses on two key activities for capacity planning in which performance engineering has an important role to play. The first is the development of modelling tools that provide accurate performance prediction for a wide range of scenarios. The second is the detailed analysis of network and traffic measurements. Both are essential in achieving an optimum balance between the cost of a network and the performance experienced by the users.
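A hedged sketch of the simplest kind of prediction that such modelling tools build on is given below: the M/M/1 mean delay as a function of link utilisation, which shows why delay-sensitive classes need headroom. The link speed and packet size used are illustrative assumptions, not measurements from the chapter.

    # M/M/1 mean delay versus utilisation: the simplest capacity-planning prediction.
    def mm1_mean_delay(utilisation, service_time_s):
        """Mean sojourn time (queueing plus service) for an M/M/1 queue."""
        if not 0 <= utilisation < 1:
            raise ValueError("utilisation must be in [0, 1)")
        return service_time_s / (1.0 - utilisation)

    if __name__ == "__main__":
        link_bps = 155e6                 # e.g. an STM-1 access link (illustrative)
        mean_packet_bits = 500 * 8
        service = mean_packet_bits / link_bps
        for rho in (0.3, 0.5, 0.7, 0.9, 0.97):
            print(f"utilisation {rho:.0%}: mean delay {mm1_mean_delay(rho, service) * 1e6:.1f} us")

The steep rise in delay as utilisation approaches 100 per cent is the basic reason why per-class capacity planning leaves headroom on every link, and why accurate traffic measurement matters as much as the model itself.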
The Capacity Forum is at the centre of the whole capacity management process, while the Capacity Report is at the centre of the data collection and analysis. The Capacity Forum is the co-ordinating team that keeps the issues and methods under constant review. It recognises that focusing the reporting into one work area is efficient, economical and effective. It takes the statistical reporting task away from separate business units, which contributes to local morale. It puts the task under the control of a dedicated individual who is free to develop the skills required to produce a very high quality report. The motivation for this emphasis on capacity and performance is 'customer satisfaction', but capacity and performance engineering is also about minimising costs. The aim, therefore, is to prime the business with the network information required to achieve customer satisfaction at minimum cost. Covering both systems and services, each new issue of the Capacity Report builds on the picture established in previous months, and projects this picture into the future. Managers use the report, primarily, for managing engineering intervention necessitated by forthcoming special events, system changes, new services and market trends. In addition, account managers of major customers appreciate the information delivered through the Capacity Report, because it puts them in a position of real knowledge to discuss issues that their customers may not fully, or correctly, understand. It also enables them to discuss new services, and how to create better business solutions from the current and future product range. The TeleMarketing Services Capacity Report has been in production every month for several years. It was a very successful innovation that has grown and evolved in a way that could never have been envisaged at the outset. A web of links now reaches out to the business, giving the kind of coherence and structure that encourages objective discussion among otherwise disjoint communities. With its comprehensive and flexible coverage, straightforward and consistent format, simple graphs and tables, together with its emphasis on end-to-end performance and focus on customer satisfaction, the TeleMarketing Services Capacity Report stands as a prime example of how to monitor a complex, evolving system, and plan its progress towards providing the yet more sophisticated services of the future.
A full performance analysis of a complex computer system can be difficult to complete in a short time-scale if the system consists of many interworking subsystems, often widely separated geographically, and if time and budget pressures prevent the normal in-depth assessment of performance. There is, however, a way to identify performance problems and provide indications of where performance engineering effort should be directed to greatest effect - the performance health check. This chapter covers the full richness of the performance health check; normally a subset of the techniques would be employed, tailored to the particular requirements of the situation - time-scales often dictate how deeply the analysis should go. Note that the approach here concentrates on systems; networks would be covered in an analogous way.
This chapter will take a critical personal look at traditional and future potential broadband markets, consider competing technologies, and try to assess whether or not any significant network market exists in each case. While there may be no single 'killer application' that will justify a high-speed network on its own, there are many services that add together to give traffic demand, and some services that can soak up whatever bandwidth is available, if the price is right.