This book gives an overview of many practical aspects of video compression systems used in broadcast TV, IPTV, telecommunication and many other video applications. Although the book concentrates on MPEG real-time video compression systems, many aspects are equally applicable to off-line and/or non-MPEG video compression applications.
Inspec keywords: motion estimation; video coding; mobile handsets; data compression; decoding; codecs; concatenated codes; statistical multiplexing; transcoding; high definition television
Other keywords: mobile device; picture quality assessment; bit-stream processing; transcoding; concatenated codec; HDTV; digital video compression system; motion estimation; high definition television; statistical multiplexing; MPEG video compression; MPEG decoder
Subjects: High definition television and video; Communication switching; Video signal processing; Mobile radio systems; Image and video coding
This chapter discusses video compression systems. The introduction of MPEG-4 advanced video coding (AVC) compression into broadcast and telecommunication systems adds another level of complexity. Whereas just a few years ago, MPEG-2 was the main video compression algorithm used throughout the broadcast industry, from contribution links right down to direct-to-home (DTH) applications, there are now different algorithms operating in separate parts of the production and transmission paths.
Digital video is becoming more and more popular. Not only are broadcasters changing over to digital transmission, but most video consumer products, such as camcorders, DVD recorders, etc., are now also using digital video signals.
Many chapters of this book make references to picture quality. Therefore, it is appropriate to give a brief overview of picture quality measurement methods. For example, statistical multiplexing systems, described in Chapter 12, have to have a measure of picture quality in order to allocate appropriate bit rates to each of the encoders. Although it will be shown in Chapter 4 that the quantisation parameter (QP) is one of the main factors affecting picture quality, there are many other factors influencing the quality of compressed video signals. This chapter gives a brief summary of picture quality assessment methods.
Chapter 2 introduced some important aspects of analogue and digital video, particularly relevant to video compression. In this chapter we have a first look at some basic video compression techniques before we move on to specific MPEG algorithms. Although some of the examples are already based on MPEG-2 coding tools, the principles explained in this chapter are applicable to the majority of video compression algorithms in use today. But before we explore the vast area of video compression methods, we will have a brief look at audio compression techniques.
Having investigated the basic principles of video compression, it is time to have a look at some real compression algorithms. By far the most widely used family of video compression algorithms in broadcast and telecommunication applications is the MPEG family. The Moving Picture Experts Group (MPEG) is a working group of ISO/IEC (the International Organization for Standardization/International Electrotechnical Commission), i.e. a non-governmental international standards organisation. The first MPEG meeting was held in May 1988 in Ottawa, Canada. To date, MPEG has produced a number of highly successful international standards for video compression, as well as for multimedia content management.
Despite the fact that MPEG and the closely related ITU H.26x video compression standards are widely used in broadcast, telecommunications, IPTV as well as many other applications, there are, of course, numerous other video compression algorithms, each with its own advantages and niche applications. Although it goes beyond the scope of this book to introduce all commonly used compression algorithms, it is worth having a look at a few of the more important ones in order to show different approaches and algorithm variations. In particular, the Video Codec 1 (VC-1) algorithm, developed by Microsoft, is a good example of a vendor-driven, block-based, motion-compensated compression algorithm with some interesting differences from MPEG algorithms. Since the decoding algorithm has been standardised by the Society of Motion Picture and Television Engineers (SMPTE), most of its compression tools are now in the public domain. Suitable for coding interlaced video signals and fully integrated with other Microsoft products, it provides an alternative for IPTV streaming applications. A second algorithm worth mentioning is the Chinese Audio Video Coding Standard (AVS). Although it has many similarities with MPEG-4 (AVC), it avoids some of the most processing-intensive parts of the MPEG-4 (AVC) standard. In contrast to the block-based VC-1 and AVS algorithms, which have many similarities with MPEG algorithms, it is worth also looking at some non-block-based compression algorithms, i.e. algorithms based on wavelet technology. Wavelet transforms are quite different from block-based transforms. Therefore, a brief summary of wavelet theory is provided in Appendix E. Two algorithms are examined in more detail: JPEG 2000, which is becoming increasingly relevant to broadcast applications, and Dirac, a family of open-source algorithms developed primarily by the BBC research department.
Motion estimation is arguably one of the most important sub-functions of any motion-compensated video compression algorithm, which is reason enough to devote a whole chapter to it. Although there are differences in terms of prediction modes and block sizes, the removal of temporal redundancy in video signals inevitably requires a search engine that provides motion information on predicted blocks or block partitions. However, motion estimation techniques are not only a major part of video compression algorithms, they are also used in noise reduction, de-interlacing (see also Chapter 8), standards conversion and many other video-processing applications. It is, therefore, not surprising that a large number of motion estimation algorithms have been developed for video compression as well as for other application areas.
Pre-processing is an important part of any professional compression encoder. It includes vital functions such as noise reduction, forward analysis, picture re-sizing and frame synchronisation. Very often, the pre-processing functions have direct connections with the main compression engine, which is why, in many cases, integrated pre-processing produces better results than external pre-processing, even if the external equipment might be more sophisticated. Pre-processing is as much a system function as it is an encoder function. It is closely related to most of the concepts discussed in the remaining chapters of this book, and in many cases it would be beneficial to introduce the system aspects first before explaining how pre-processing functions can improve the end-to-end performance. Nevertheless, it was felt that since pre-processing is an integral part of encoders, it should be explained in conjunction with compression algorithms rather than towards the end of the book. There are many cross-references in this chapter, and the reader should not hesitate to read some of the other chapters before returning to this one.
After many years of research and several attempts to introduce HDTV into broadcast systems, HDTV has finally hit the consumer market. It took the combination of several factors to make it happen, the most important of these being the availability of large affordable display devices for consumers. A second important factor was the development of HDTV-capable successors to DVD players. Last but not least, the high compression efficiency of MPEG-4 (AVC) makes it possible to transmit HDTV signals at bit rates that are not much higher than SDTV transmissions were in the early years of MPEG-2. In fact, using DVB-S2 forward error correction (FEC) and modulation, together with MPEG-4 (AVC) compression, makes it possible to transmit HDTV signals within a bandwidth equivalent to that of SDTV MPEG-2 compression with DVB-S FEC and modulation.
The requirements of video transmission systems to and from mobile devices differ considerably from those of satellite or terrestrial direct-to-home (DTH) transmissions. The main considerations are to keep the power consumption of mobile devices as low as possible and to make sure that frequency deviations due to the Doppler effect in moving receivers do not degrade the performance of the transmission system. In terms of video compression algorithms for mobile devices, there are also different requirements compared to traditional DTH applications. Not only should the compression algorithm provide high efficiency on relatively small, non-interlaced images, but it should also require little processing power and memory space for encoding as well as decoding.
This chapter shows that the channel change time between MPEG-4 (AVC) bit streams tends to be longer than between MPEG-2 bit streams due to larger buffer sizes and lower bit rates. Channel change time between different multiplexes is largely implementation dependent. Once a bit error has been detected by the decoder, the decoding process can only resume at the next slice header. In MPEG-2 each row of macroblocks starts with a slice header, whereas in MPEG-4 (AVC) slice headers are usually only at the start of a picture.
CBR encoders have to be set to a bit rate close to the maximum bit-rate demand in order to avoid compression artefacts. Most of the time the encoder can cope with a lower bit rate. By sharing a common bit-rate capacity, encoders can free up bit rate for other channels during scenes of low criticality and obtain higher bit rates during critical scenes. By modelling the bit-rate demand for different content, the bit-rate saving of statistical multiplexing can be calculated. The bit-rate demand of MPEG-2 and MPEG-4 (AVC) encoders is strongly correlated, although the bit-rate saving of MPEG-4 (AVC) increases for less critical scenes. Noise reduction frees up bit rate for other channels in the statistical multiplexing group.
In the previous chapter we had a look at statistical multiplexing systems that are mainly used for large head-end transmission systems. However, MPEG compression is also used for point-to-point transmissions between broadcast studios and for distribution to local studios or transmitters.
In the previous two chapters, we have seen that MPEG video compression is used for contribution and distribution (C&D) applications as well as for statistical multiplexing systems in DTH head-end systems. Today, it is inevitable that video signals undergo several stages of compression and decompression before they reach the end user, and with the growth of digital television networks, concatenation of compression encoding and decoding is becoming more and more prevalent. In this chapter we will have a closer look at all combinations of concatenation between MPEG-2 and MPEG-4 (AVC).
This chapter shows that small bit-rate changes can be achieved with bit-rate changers, thus avoiding the need for decoders and encoders. Transcoders consist of closely coupled decoders and re-encoders. Transcoders are used for large bit-rate or profile changes or conversion from one compression standard to another. Splicing between statistically multiplexed video signals requires bit-rate changers.
Over the last 10 years, the compression efficiency of MPEG algorithms has improved significantly, and once all the compression tools of MPEG-4 (AVC) have been fully exploited, it is difficult to see how it could be advanced even further. However, as processing power seems to increase steadily according to Moore's Law, more advanced algorithms can be conceived, and the question is not whether new compression algorithms will be developed, but rather at what point a new standard should be defined. Apart from MPEG, there are many research programmes and initiatives that could lead to future, more efficient, video compression standards. One of the areas in which compression efficiency could improve on MPEG-4 (AVC) is how to deal with texture with random motion, for example splashing water. This type of content contains little redundancy and is difficult to compress. It has been shown that by synthesising such picture areas rather than trying to compress them, significant coding gains can be achieved.
There are a number of test sequences referred to in this book. The tables in this appendix give a brief description of the sequences in terms of content and types of motion.
This appendix contains SDTV and HDTV conversion formulae between RGB and YUV and vice versa.
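As an illustration of the kind of conversion this appendix covers, the following is a minimal sketch rather than the book's own formulae. It assumes the ITU-R BT.601 luma coefficients for SDTV and the BT.709 coefficients for HDTV, and it uses normalised colour-difference signals in the range -0.5 to +0.5 (often written Pb/Pr) in place of the scaled analogue U/V signals. The function names are hypothetical.

```python
# Luma coefficients (Kr, Kb); Kg follows from Kr + Kg + Kb = 1.
# Assumption: SDTV uses ITU-R BT.601, HDTV uses ITU-R BT.709.
LUMA_COEFFS = {
    "SDTV": (0.299, 0.114),
    "HDTV": (0.2126, 0.0722),
}


def rgb_to_ypbpr(r, g, b, system="SDTV"):
    """Convert normalised RGB (0..1) to luma and colour-difference signals."""
    kr, kb = LUMA_COEFFS[system]
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    pb = 0.5 * (b - y) / (1.0 - kb)  # scaled B - Y, range -0.5..+0.5
    pr = 0.5 * (r - y) / (1.0 - kr)  # scaled R - Y, range -0.5..+0.5
    return y, pb, pr


def ypbpr_to_rgb(y, pb, pr, system="SDTV"):
    """Inverse of rgb_to_ypbpr for the same system."""
    kr, kb = LUMA_COEFFS[system]
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * pr
    b = y + 2.0 * (1.0 - kb) * pb
    g = (y - kr * r - kb * b) / kg
    return r, g, b
```

Because the two systems use different luma coefficients, a signal converted with one set must be converted back with the same set; mixing them causes colour errors.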
PSNR is calculated by summing up the squared pixel differences between the distorted and the source video signal. It has to be calculated for each component separately.
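The calculation described here can be sketched as follows. This is a minimal example under stated assumptions: it uses the widely used definition PSNR = 10 log10(peak² / MSE), where MSE is the sum of squared differences divided by the number of samples, and it assumes 8-bit video with a peak value of 255. The function name and the flat-list input format are hypothetical.

```python
import math


def psnr(source, distorted, peak=255):
    """PSNR in dB for one component (e.g. Y, U or V) given as flat sample lists."""
    if len(source) != len(distorted) or not source:
        raise ValueError("components must be non-empty and of equal size")
    # Sum of squared pixel differences between distorted and source signal.
    sse = sum((s - d) ** 2 for s, d in zip(source, distorted))
    if sse == 0:
        return math.inf  # identical signals: no distortion
    mse = sse / len(source)
    return 10.0 * math.log10(peak ** 2 / mse)
```

The function is called once per component, for example `psnr(y_src, y_dec)` for luma and again for each chrominance component, since the values are not combined into a single figure here.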
The 8 × 8 two-dimensional discrete cosine transform (DCT) and inverse DCT are defined.
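For orientation, a direct (unoptimised) sketch of an 8 × 8 DCT pair is given below. It assumes the orthonormal form commonly used in MPEG and JPEG coding, with a scaling factor of 1/4 and C(0) = 1/√2, C(k) = 1 otherwise; the appendix's own notation may differ. The function names are hypothetical, and practical encoders use fast factorisations, not this four-fold loop.

```python
import math

N = 8  # block size


def _c(k):
    """Normalisation factor: 1/sqrt(2) for the DC index, 1 otherwise."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def _basis(x, u):
    """Cosine basis function for sample position x and frequency index u."""
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))


def dct_8x8(block):
    """Forward DCT. block[y][x] holds pixels; result[v][u] holds coefficients."""
    return [[0.25 * _c(u) * _c(v) * sum(block[y][x] * _basis(x, u) * _basis(y, v)
                                        for y in range(N) for x in range(N))
             for u in range(N)] for v in range(N)]


def idct_8x8(coeffs):
    """Inverse DCT. coeffs[v][u] holds coefficients; result[y][x] holds pixels."""
    return [[0.25 * sum(_c(u) * _c(v) * coeffs[v][u] * _basis(x, u) * _basis(y, v)
                        for v in range(N) for u in range(N))
             for x in range(N)] for y in range(N)]
```

With this scaling, a flat block of value 128 produces a DC coefficient of 1024 and all other coefficients equal to zero.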
At first sight, wavelet transformation seems to combine several advantages of sub-band coding and conventional FFT or DCT while being computationally more efficient. The continuous nature of the transform, as opposed to DCT blocks, helps to avoid artefacts, and it appears to be better suited to the spatial de-correlation of texture in images. In the form of quadrature mirror filters (QMFs), a special case of wavelet filters has been known for some time. Wavelet theory generalises the principle of QMFs and provides a broader mathematical basis.
This appendix gives the formula for calculating the phase correlation surface and, by comparison, that for the cross-correlation surface.
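The difference between the two surfaces can be illustrated with a one-dimensional sketch. It assumes the usual definitions: the cross-correlation is the inverse Fourier transform of the cross-power spectrum G·F*, while the phase correlation normalises each spectral term to unit magnitude before the inverse transform. The book treats two-dimensional surfaces; the one-dimensional form, the naive DFT and the function names below are simplifications chosen for brevity.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def correlation_surfaces(reference, shifted, eps=1e-12):
    """Return (phase_correlation, cross_correlation) of two equal-length signals."""
    f = dft(reference)
    g = dft(shifted)
    cross_spectrum = [gk * fk.conjugate() for fk, gk in zip(f, g)]
    # Cross-correlation: inverse transform of the cross-power spectrum as it is.
    cross = [v.real for v in dft(cross_spectrum, inverse=True)]
    # Phase correlation: discard magnitudes, keep only the phase differences.
    normalised = [c / max(abs(c), eps) for c in cross_spectrum]
    phase = [v.real for v in dft(normalised, inverse=True)]
    return phase, cross
```

For a signal that is a circular shift of the reference, the phase correlation collapses to a single sharp peak at the displacement, whereas the cross-correlation peak at the same position is broadened by the signal's own autocorrelation.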
Design an FIR low-pass filter with a bandwidth of fsi/(2N), where fsi is the input sampling frequency. The filter can be designed by inverse Fourier transforming the theoretical frequency response of the low-pass filter. Since the theoretical frequency response is a rectangular function, the inverse Fourier transform generates a sin(x)/x waveform of infinite length. After transformation into the spatial domain, a window function has to be applied to limit the number of filter coefficients.
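The design procedure can be sketched as a windowed-sinc filter. The example below makes several assumptions that are not stated in the text: N is taken to be an integer down-sampling factor, the window is a Hamming window, the number of taps is odd, and the coefficients are normalised to unity DC gain. The function name is hypothetical.

```python
import math


def lowpass_fir(n_factor, taps=31):
    """Windowed-sinc low-pass coefficients with cut-off fsi / (2 * n_factor)."""
    if taps < 3 or taps % 2 == 0:
        raise ValueError("taps must be an odd number >= 3")
    centre = (taps - 1) // 2
    coeffs = []
    for n in range(taps):
        # Ideal response: inverse Fourier transform of a rectangular
        # frequency response, i.e. a sin(x)/x function sampled at the taps.
        x = (n - centre) / n_factor
        ideal = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        # Hamming window limits the infinite sin(x)/x to a finite length.
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (taps - 1))
        coeffs.append(ideal * window / n_factor)
    gain = sum(coeffs)
    return [c / gain for c in coeffs]  # normalise to unity gain at DC
```

For example, `lowpass_fir(2, taps=31)` gives a symmetric half-band-style filter suitable as a starting point for 2:1 down-sampling; more taps narrow the transition band at the cost of more computation.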
Assuming that a bit error occurs somewhere in a group of pictures (GOP), the expected error propagation time through the GOP is calculated as a function of the GOP structure and the relative frame sizes of I, P and B frames.
This appendix shows how the bit-rate demand model is derived by defining the cumulative distribution function (CDF).