Data Compression

I. Introduction
In the modern era known as the “Information Age,” electronic information is steadily becoming more important. Unfortunately, maintaining data consumes valuable storage and transmission resources, since even keeping information in storage requires some power. Moreover, some of the largest files are in formats replete with repetition, and thus are larger than they need to be. Data compression is the study of methods that can be applied to data to make it take up less space. Its uses are vast, and algorithms will need to improve in order to cope with the inevitably larger files of the future. Thus, I
The chronicle of modern data compression begins in the late 1940s with a method designed by Claude Shannon and Robert Fano, logically named Shannon-Fano coding. It uses the frequencies of the symbols in a file to assign shorter codes to the symbols that occur more often. One of Fano’s students, David Huffman, improved this into an optimal version, which has made Shannon-Fano coding obsolete (except for historical purposes).
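To make the idea of frequency-based codes concrete, here is a minimal sketch of Huffman's construction in Python. It is only an illustration: the function names `huffman_codes` and `encode` are hypothetical, and the sketch works on characters of a string rather than on the bytes of a real file.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each symbol in `text` to a Huffman bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A lone symbol still needs at least one bit.
        return {next(iter(freq)): "0"}

    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes
        # with 0 and 1 respectively.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1

    return heap[0][2]


def encode(text, codes):
    """Concatenate the code of every symbol in `text`."""
    return "".join(codes[ch] for ch in text)


codes = huffman_codes("abracadabra")
bits = encode("abracadabra", codes)
```

For the sample string "abracadabra", the most frequent symbol, "a", receives a one-bit code and the whole string encodes to 23 bits, compared with 88 bits at eight bits per character (not counting the space needed to store the code table itself).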
The next step up, arithmetic coding, is similar to Huffman coding and typically compresses even better, but it requires more resources to implement. It is also burdened with patents, one of the most adverse properties (in terms of adoption) that a data compression algorithm can have. The same applies to others such as LZW, MP3 and more obscure ones. LZW, which stands for Lempel-Ziv-Welch, was developed by Terry Welch, is still patented, and is used in .gif and .pdf files.
MP3, developed by the German ‘Fraunhofer Society’ in the 1990s, is used for compressing sound by removing redundancy and frequencies that are inaudible. It is, however, a lossy compression algorithm, so called because it discards some non-essential information in order to save a lot of space. I decided to gear my research mostly toward lossless compression algorithms, and will say little more about lossy ones.
The ZIP file format is also worthy of mention in any account of data compression. It was
