That’s Maths: Data compression is music to most of our ears

Audiophiles decry the reduction in fidelity of ‘lossy’ compression on MP3s, but most of us don’t notice

The arrival of mobile phones was quickly followed by "txtese", an abbreviated form of language that allows messages to be written and sent rapidly using SMS (short message service). The simplest strategy is to omit most of the vowels; because English is highly redundant, the meaning usually remains clear.
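The vowel-dropping strategy is easy to try out. The short Python sketch below is only an illustration: the function name `drop_vowels`, the rule of keeping a vowel that starts a word, and the sample sentence are all assumptions made for the example, since there is no standard SMS language.

```python
import re


def drop_vowels(text: str) -> str:
    """Remove every vowel that is not the first letter of a word.

    Hypothetical helper for illustration; real txtese follows no fixed rule.
    """
    # The lookbehind (?<=[A-Za-z]) demands a letter just before the vowel,
    # so a vowel at the start of a word (as in "all") is left alone.
    return re.sub(r"(?<=[A-Za-z])[aeiouAEIOU]", "", text)


message = "Please send the meeting notes before lunch"  # made-up sample text
short = drop_vowels(message)
print(short)                      # Pls snd th mtng nts bfr lnch
print(len(short), "of", len(message), "characters")
```

The shortened sentence is still readable, which is the redundancy of English at work.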

Of course, abbreviated spelling like this is nothing new. When telegram messages were charged by the word, all sorts of ingenious ways were found to save money by shortening them. There is no standard SMS language; a wide variety of ploys are used, the context normally serving to remove ambiguities.

Data compression

The study of data compression goes back to a 1948 paper, A Mathematical Theory of Communication, by Claude Shannon, a brilliant American mathematician and engineer. Shannon, known as the "father of information theory", made many crucial contributions to the development of computing. His seminal paper on communications is 55 pages long, and replete with groundbreaking ideas.

Shannon gave a mathematical definition of the quantity of information in a message, called entropy, which allows the information content to be measured precisely. He showed that there is a limit, set by what he called the entropy rate, to how far information can be compressed without loss. It is mathematically impossible to do better than this. However, if we permit some distortion or loss of information, much higher compression ratios are possible.
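To give a feel for entropy, here is a minimal Python sketch that estimates it from the frequencies of single characters in a string. The function name `entropy_bits_per_symbol` is an assumption made for this example, and counting characters one at a time ignores the dependence between neighbouring letters, so for real English text the figure it gives is higher than the true entropy rate.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(text: str) -> float:
    """Estimate entropy in bits per character from single-character frequencies.

    Hypothetical helper for illustration; it treats characters as independent.
    """
    counts = Counter(text)
    total = len(text)
    # Sum p * log2(1/p) over the distinct characters, where p = n / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(entropy_bits_per_symbol("aaaa"))  # 0.0: one repeated symbol carries no information
print(entropy_bits_per_symbol("abab"))  # 1.0: two equally likely symbols need one bit each
print(entropy_bits_per_symbol("abcd"))  # 2.0: four equally likely symbols need two bits each
```

The more predictable the text, the lower the entropy, and the more it can be squeezed.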

The redundancy of English was estimated by Shannon to be about 50 per cent. So, we should be able to compress a typical message to half its length. For example, “Hpy Xmas 2 all IT rdrs” is about half the length of its fully expanded version, yet it is quite comprehensible.
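The "about half" claim can be checked by counting characters. The sketch below assumes that the fully expanded message reads "Happy Christmas to all Irish Times readers"; that expansion is a guess made for the example and other readings are possible.

```python
short = "Hpy Xmas 2 all IT rdrs"
# Assumed expansion of the abbreviated message (not given in the text).
full = "Happy Christmas to all Irish Times readers"

ratio = len(short) / len(full)
print(f"{len(short)} / {len(full)} = {ratio:.2f}")  # 22 / 42 = 0.52
```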

Drastic reduction

Data compression enables music files to be drastically reduced in size. Music on a compact disc is uncompressed. An average song lasting three minutes requires about 32MB of storage. The MP3 format allows the size, and also the download time, to be reduced by a factor of about 10. The compression is "lossy", so the result does not sound identical to the original. Audiophiles decry the reduction in fidelity and stick with CDs or vinyl, but most of us don't notice any degradation in quality.
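The 32MB figure follows from simple arithmetic. The Python sketch below assumes the standard CD audio parameters, which are not stated above: 44,100 samples per second, 16 bits (two bytes) per sample and two stereo channels. The variable names are chosen for the example, and the factor of 10 for MP3 is the rough figure quoted in the text.

```python
SAMPLE_RATE_HZ = 44_100   # assumed: standard CD sampling rate
BYTES_PER_SAMPLE = 2      # assumed: 16-bit samples
CHANNELS = 2              # assumed: stereo
SONG_SECONDS = 3 * 60     # a three-minute song
MP3_FACTOR = 10           # approximate reduction quoted in the text

cd_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * SONG_SECONDS
cd_megabytes = cd_bytes / 1_000_000
mp3_megabytes = cd_megabytes / MP3_FACTOR

print(f"Uncompressed: {cd_megabytes:.1f} MB")  # Uncompressed: 31.8 MB
print(f"MP3 (approx): {mp3_megabytes:.1f} MB")  # MP3 (approx): 3.2 MB
```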

MP3 has had a huge impact on how people acquire and save music. Thanks to data compression, it is possible to download and store a large volume of music on a PC or iPod. The compression is based on "perceptual noise shaping", which exploits the limits of human hearing. For example, if two sounds are played simultaneously, we tend to hear the louder one, so the softer one may be omitted without much harm. Data compression is also vital for the efficient storage and transmission of images and videos, using formats such as JPEG.

Shannon’s work provides the mathematical underpinning for data storage and compression. Zip files, MP3s and JPEG images are made possible thanks to it. So, whether you are texting your friends, watching videos on the web or enjoying music on the move, you are benefiting from the application of Shannon’s mathematical theory of information.