Compression formats

From MobileRead

Compression formats can be either lossy or lossless. The idea of any compression is to reduce the storage requirements of data and to improve performance when moving large amounts of data.

Don't confuse compression formats with eBook formats. Some e-book readers list a compression format as a supported format, but these readers can only extract the compressed file to get to the file or files inside. The reader must still support the actual underlying eBook format. Also, some eBook formats already include compression.
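The distinction can be illustrated with a minimal sketch using Python's standard zipfile module. The archive is built in memory, and the file names and contents are invented for the example. Opening the archive only exposes the files inside; whether a reader can display each one still depends on its support for that file's own format.

```python
import io
import zipfile

# Build a small ZIP archive in memory; the file names and contents are made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("book.txt", "Chapter 1\n" * 50)
    archive.writestr("book.fb2", "<FictionBook>...</FictionBook>")

# Extracting only gives access to the files inside the container.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    plain_text = archive.read("book.txt").decode("utf-8")

print(names)  # ['book.txt', 'book.fb2']
```

Here the plain text file can be shown by almost anything, while the second file would still need a reader that understands its eBook format.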

Lossless

These are lossless compression formats that reduce the amount of space required to store a document without discarding any of its data. Text, unlike music, can be compressed a great deal; sometimes the saving can be as much as 90%. Some of these formats, such as RAR, ZIP and LHA, are also containers in that they can hold multiple files. In some cases the ability to hold multiple files is more important than the actual compression.
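As a rough illustration of how well text can compress, the following sketch uses Python's standard zlib, gzip and bz2 modules. The sample text is deliberately repetitive and is an assumption made for the example, so the savings it shows are higher than typical book text would give.

```python
import bz2
import gzip
import zlib

# Hypothetical sample: highly repetitive text, which compresses unusually well.
text = ("It was the best of times, it was the worst of times. " * 200).encode("utf-8")

def saving(original: bytes, compressed: bytes) -> float:
    """Return the fraction of space saved (0.9 means 90% smaller)."""
    return 1 - len(compressed) / len(original)

results = {
    "zlib (DEFLATE)": zlib.compress(text, 9),
    "gzip": gzip.compress(text),
    "bz2": bz2.compress(text),
}

for name, blob in results.items():
    print(f"{name}: {len(text)} -> {len(blob)} bytes, saved {saving(text, blob):.1%}")

# Lossless means the round trip returns the original bytes exactly.
assert zlib.decompress(results["zlib (DEFLATE)"]) == text
assert gzip.decompress(results["gzip"]) == text
assert bz2.decompress(results["bz2"]) == text
```

The final assertions are the point of lossless compression: whatever the saving, decompression must reproduce the input byte for byte.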

  • RAR - a file compression system providing some of the most compact resultant files currently available in wide distribution. The premier tool for RAR is WinRAR, but 7-Zip can also extract RAR files.
  • ZIP - the most universal of the compression tools. Slightly less efficient than RAR files, ZIP files have been around longer and enjoy more support.
  • LHA - a Japanese-developed compressed archive file format; its archives usually carry the .lzh extension. A Microsoft Compressed (LZH) Folder Add-on is included with the Japanese version of Windows to use this format.
  • LZH - the compression method used inside LHA archives. It is an encoding scheme that replaces data with a pointer to previous data that is identical to the current data, in the style of LZ77, and then applies Huffman coding to the result.
  • LZW - Used in GIF files and can be used in TIF files. Lempel–Ziv–Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978.
  • LZX - an LZ77 family compression algorithm and also the name of a file archiver. Both were invented by Jonathan Forbes and Tomi Poutanen. The algorithm is used primarily on Windows: in LIT and CHM files, in cabinet (CAB) files, and in WIM (Windows Imaging Format, used to compress disk images).
  • GZIP - A compression format (.gz) that was developed by the GNU project. It uses the same DEFLATE compression as ZIP but is not a ZIP archive: it is designed to be compressed or decompressed on the fly and only holds one file. Often that file is a tar (.tar) file, which is a container (archive) format. When used together the file extension is usually .tar.gz or .tgz.
  • BZIP2 - compresses files using the Burrows-Wheeler block sorting text compression algorithm, and Huffman coding. Compression is generally considerably better than that achieved by more conventional LZ77 and LZ78-based compressors, and approaches the performance of the PPM (prediction by partial matching) family of statistical compressors. The file extension is generally .bz2.
  • RLE - Run-length encoding is a very simple compression scheme for data in which the same value appears multiple times in a row. Instead of repeating the value, it is stored once along with a count of the number of times it is to appear.
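
Run-length encoding, the last entry in the list, is simple enough to sketch in a few lines of Python. This is a minimal illustration on a character string, assuming (value, count) pairs as the encoded form; real formats that use RLE pack the counts into bytes in their own ways.

```python
from itertools import groupby

def rle_encode(data: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(data)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (character, count) pairs back into the original string."""
    return "".join(char * count for char, count in pairs)

encoded = rle_encode("WWWWBBBWW")
print(encoded)  # [('W', 4), ('B', 3), ('W', 2)]
assert rle_decode(encoded) == "WWWWBBBWW"
```

The scheme only helps when runs are common; data with no repeated neighbors would grow, since every value gains a count.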

Lossy

This form of data compression attempts to compress by leaving out data whose absence won't be noticed. Examples include JPG, MP3 and AAC. On playback the decoder reconstructs an approximation of the original that most users will not be able to tell apart from it. It is not generally used on eBook text but is often used on graphics images and sound. It is also popular for digital video, such as MPEG.

Some consider GIF images to be lossy, but this is not due to the compression, which uses lossless LZW. It is because GIF supports a maximum of 256 different colors within an image, which may cause slight color loss when an image with more colors is converted.
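
The color loss can be sketched without any image library. The example below assumes a fixed 256-entry palette that keeps 3 bits of red, 3 of green and 2 of blue; this palette and the pixel values are assumptions chosen for illustration, and real GIF encoders usually build a palette adapted to each image. The loss happens when colors are mapped to palette entries, before LZW compression is ever applied.

```python
def to_332_palette(rgb: tuple[int, int, int]) -> int:
    """Map a 24-bit RGB color to one of 256 palette indices (3 bits red, 3 green, 2 blue)."""
    r, g, b = rgb
    return (r >> 5) << 5 | (g >> 5) << 2 | (b >> 6)

# Three slightly different reds in a hypothetical source image.
pixels = [(200, 10, 10), (205, 12, 8), (210, 15, 5)]
indices = [to_332_palette(p) for p in pixels]

print(len(set(pixels)), "distinct colors before,", len(set(indices)), "after")
```

All three shades land on the same palette entry, so the difference between them cannot be recovered from the GIF.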

Video compression is nearly always lossy. An exception is animated GIF.
