next up previous contents index
Next: Data Structures Up: No Title Previous: Database

Data Compression

  Large amounts of data can create enormous problems in storage and transmission. A good example is given by digitized images: a single DIN A4 colour picture, scanned at 300 dpi with 8 bits/pixel/colour, produces 30 MBytes of data.

The widespread, consumer-market use of information in the form of images has contributed much to the development of data compression techniques. The design goal of image compression is to represent images with as few bits as possible, according to some fidelity criterion, to save storage and transmission channel capacity. All image compression techniques try to get rid of the inherent redundancy, which may be spatial (neighbouring pixels), spectral (pixels in different spectral bands in a colour image) or temporal (correlated images in a sequence, e.g. television).

There are lossless methods, which are reversible, viz. do not sacrifice any information, and lossy methods which may be used if the quality of a compression-decompression sequence is judged by general criteria, like unchanged quality for the human visual system. Note that in image processing jargon, ``lossless'' is sometimes used in the sense of ``no visible loss''.

Examples of lossless methods are run-length coding, Huffman coding , or the Lempel-Ziv-Welsh (LZW) method. In run-length coding one replaces runs , sequences of equal greyvalues, by their lengths and the greyvalues. Huffman and LZW coding are approximations to entropy encoding, i.e. frequently used sequences are replaced by short codes, rare sequences by longer codes. In Huffman coding, sequences are single greyvalues, for LZW they are strings of greyvalues.

Of the many lossy coding techniques the simplest may be thresholding, applicable in some situations; the most important ones are predictive and transform coding. In predictive coding, one removes the correlation between neighbouring pixels locally, and quantizes only the difference between the value of a sample and a predicted value ( Quantization). Transform coding decorrelates the whole signal, e.g. a block of pixels in an image, as a unit, and then quantizes the transform coefficients, viz one sets insignificant coefficients to zero. Only complete sets of unitary transforms are considered, i.e. transforms with the property of equal energy in the spatial domain and in the transform domain. This compression works well if the energy is clustered in a few transform samples. One talks of zonal coding, if certain coefficients are systematically set to zero (e.g. frequencies in the Fourier domain), and of adaptive coding, if coefficients are set to zero according to some threshold criterion of significance ( e.g. rank reduction in principal component analysis)

The following sets of unitary transforms are usually described in the literature ( [Rabbani91])

They are listed above in order of decreasing energy compaction and computer time used. The popular JPEG algorithm for compression of colour images uses essentially the discrete cosine transform (DCT), followed by quantization and Huffman coding (JPEG, short for the original committee ``Joint Photographic Experts Group'', is a widely used compression standard for still images).


next up previous contents index
Next: Data Structures Up: No Title Previous: Database

Rudolf K. Bock, 7 April 1998