Compression

 

Although the cost of a byte of storage has declined rapidly and is still declining, data compression techniques can almost always reduce the effective cost still further by squeezing more data into the same space. Consider a text archive or collection of documents. It may be advantageous to hold it in compressed form to save space if each document is accessed infrequently (so expansion is performed rarely) but must be available quickly when needed (so it has to be kept on-line). Compression can also save time (and money) when data is transmitted; for example, compressing source code might reduce the number of diskettes needed to distribute software.

Data compression relies on there being redundancy in the input. Random strings of characters are not compressible to any great degree, and neither are object files. Natural-language text is redundant in that not all text units (characters, character pairs, words) occur with equal frequency. Compression tends to remove redundancy, so compressing an already compressed file is normally not worthwhile. Usually compression must be lossless: the input file must be exactly recoverable by applying the corresponding expansion technique. In some cases an inexact reversal is acceptable; for example, when a source program in a free-format language is expanded, it may not need to have exactly the same layout as the original.
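The claims about redundancy can be checked with a general-purpose compressor. The minimal sketch below uses Python's standard `zlib` module (DEFLATE) purely as a stand-in; it is not a method described in this text. It compares redundant text with random bytes of the same length, confirms that the round trip is lossless, and then tries compressing the compressed output a second time. The vocabulary, the seed and the helper name `compressed_fraction` are illustrative choices.

```python
import random
import zlib


def compressed_fraction(data: bytes) -> float:
    """Size after zlib (DEFLATE) compression as a fraction of the original size."""
    return len(zlib.compress(data, 9)) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Redundant input: a small vocabulary, so characters and words recur often.
words = [b"the", b"of", b"and", b"a", b"to",
         b"data", b"text", b"file", b"compression", b"input"]
text = b" ".join(rng.choice(words) for _ in range(2000))

# Input with no redundancy to exploit: uniformly random bytes of the same length.
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

once = zlib.compress(text, 9)
assert zlib.decompress(once) == text  # lossless: the input is exactly recoverable

print(f"text:       {compressed_fraction(text):.2f}")   # well below 1
print(f"random:     {compressed_fraction(noise):.2f}")  # about 1: nothing gained
print(f"compressed: {compressed_fraction(once):.2f}")   # about 1: nothing gained
```

The second compression gains nothing because the first pass has already removed the repetition that the compressor relies on.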

There are many ways of measuring the degree of compression achieved; a common one is:

$$\text{degree of compression} = \frac{\text{length(input)} - \text{length(output)} - \text{size}(X)}{\text{length(input)}}$$

X is any information needed in addition to the compressed text in order to recreate the original. For example, if the original file is 2000 bytes long and is compressed to 1000 bytes, and a 100-byte table is required to expand it again, then the degree of compression is (2000 - 1000 - 100)/2000 = 45%. Because of such overheads, the “compressed” version of a short file may even be larger than the original.

In general, compression techniques operate by mapping sections of the input file onto (smaller) sections of the output file. Techniques can be classified by the type of input object replaced (fixed or variable length) and the type of output object (fixed or variable length). Further characteristics of a compression method are whether the mapping is adaptive (changes as the input is processed) or static, and whether compression requires one pass or more than one pass over the input file.
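Huffman coding, a technique this text does not name and which is used here only as an example, fits the classification above as a static, two-pass, fixed-to-variable method. In the minimal sketch below, pass 1 counts byte frequencies and builds a code table (the X of the formula), and pass 2 replaces each fixed-length byte with its variable-length code. Bits are kept as a string of '0' and '1' characters for readability rather than efficiency, the byte layout assumed in `table_size` is a hypothetical format, and all function names are our own.

```python
import heapq
from collections import Counter


def build_code_table(data: bytes) -> dict[int, str]:
    """Pass 1: count byte frequencies and derive a static Huffman code table."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A file with one distinct byte still needs a non-empty code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {byte: partial code}); the
    # unique tie_breaker stops heapq from ever comparing two dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees: prefix 0 on one side, 1 on the other.
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(data: bytes, table: dict[int, str]) -> str:
    """Pass 2: replace each fixed-length byte by its variable-length code."""
    return "".join(table[b] for b in data)


def decode(bits: str, table: dict[int, str]) -> bytes:
    """Read bits until they match a code; works because no code prefixes another."""
    inverse = {code: sym for sym, code in table.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return bytes(out)


def table_size(table: dict[int, str]) -> int:
    """Hypothetical format: 1 byte symbol + 1 byte length + code bits padded to bytes."""
    return sum(2 + (len(code) + 7) // 8 for code in table.values())


def degree_of_compression(input_len: int, output_len: int, x_size: int) -> float:
    """(length(input) - length(output) - size(X)) / length(input)."""
    return (input_len - output_len - x_size) / input_len


def report(data: bytes) -> float:
    table = build_code_table(data)
    bits = encode(data, table)
    assert decode(bits, table) == data  # lossless round trip
    output_bytes = (len(bits) + 7) // 8
    return degree_of_compression(len(data), output_bytes, table_size(table))


print(degree_of_compression(2000, 1000, 100))  # 0.45, the example in the text
print(report(b"abracadabra " * 50))            # about 0.68: the table pays for itself
print(report(b"ab"))                           # -2.5: overhead exceeds the saving
```

The two-byte input shows the overhead effect: a 6-byte table outweighs a 1-byte saving, so the degree of compression is negative.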

Compression is not without disadvantages: reduced redundancy makes a file more vulnerable to storage and transmission errors.
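The vulnerability can be made concrete with a toy run-length encoding (our choice of method; the text names none) that stores each run of equal bytes as a two-byte (count, value) pair, so a variable-length input object becomes a fixed-length output object. In this sketch one corrupted byte in the raw data damages one byte, while one corrupted byte in the encoded form damages an entire run on expansion. The function names are hypothetical.

```python
def rle_encode(data: bytes) -> bytes:
    """Toy run-length encoding: each run becomes a (count, byte) pair, count <= 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)


def rle_decode(packed: bytes) -> bytes:
    """Expand each (count, byte) pair back into a run of equal bytes."""
    out = bytearray()
    for k in range(0, len(packed), 2):
        count, value = packed[k], packed[k + 1]
        out += bytes([value]) * count
    return bytes(out)


def damaged_bytes(a: bytes, b: bytes) -> int:
    """Number of positions that differ, counting any length mismatch as damage."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


original = b"a" * 8 + b"b" * 8
packed = rle_encode(original)  # b"\x08a\x08b": 16 bytes become 4
assert rle_decode(packed) == original

# Corrupt one byte of the uncompressed data...
raw = bytearray(original)
raw[0] = ord("z")
print(damaged_bytes(original, bytes(raw)))              # 1

# ...and make the same kind of one-byte error in the compressed data.
bad = bytearray(packed)
bad[1] = ord("z")  # the byte value of the first run
print(damaged_bytes(original, rle_decode(bytes(bad))))  # 8
```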

 

Vocabulary for the text:

to decline: to decrease, to get worse

rapid: fast, quick

to reduce: to decrease; to turn into

to squeeze: to cram in, to compress, to press

to expand: to widen, to enlarge (here: to decompress)

survey: inspection, overview, examination

redundancy: superfluity, excess

to occur: to happen, to be found

worthwhile: worth the effort

loss: loss, damage

recoverable: restorable

corresponding: matching, appropriate

reverse: opposite, inverse; negative

layout: arrangement, placement, plan

to measure: to measure, to assess

overheads: additional (indirect) costs

to map: to represent, to transform

variable: (n.) a variable; (adj.) changeable

pass: a pass, a scan; to let through, to skip

vulnerable: open to damage, easily hurt