What you need to know about information theory

Much of the testing and verification work that EEs do today centers on concepts with a theoretical basis in Claude Shannon’s (of Nyquist-Shannon renown) article, "A Mathematical Theory of Communication," which appeared in 1948 in the Bell System Technical Journal.

Shannon’s full treatise gets to be quite mathematical. Fortunately, some of the fundamental concepts involve only a minimum of mathematics, which most engineers can readily understand.

Communication over a channel, such as an Ethernet cable or a wireless link, is the primary motivation of information theory. However, such channels often fail to deliver an exact reconstruction of the transmitted signal; noise, periods of silence, and other forms of signal corruption often degrade quality.

Shannon stated that these are the basic elements in communication:

A message derived from an information source
A signal that originates at a transmitter
The channel through which the signal is conveyed
A receiver in which the signal becomes the message
A destination, the final recipient of the message, which can be a person or a machine

Shannon spelled out in his noisy-channel coding theorem that information, a set of messages sent over a noisy channel, may be reconstructed at the receiver with a low probability of error notwithstanding the channel noise. Shannon stated that this is possible at any rate of information transfer up to the channel capacity, which depends solely on the statistical properties of the channel.

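Subsequently, much of information theory has become an effort to find methods (codes) that make communication over noisy channels more efficient and less error-prone. These codes are mostly of two types: data compression (source coding) techniques, which remove redundancy from a message, and error correction (channel coding) techniques, which add controlled redundancy so that errors can be detected and repaired.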

Consider the communications process over a discrete channel. In the diagram, X represents all the messages transmitted, and Y all the messages received during a unit time over our channel. p(y|x) is the conditional probability distribution of Y given X. (As a quick review, the conditional probability distribution of Y given X is the probability distribution of Y when X is known to be a particular value.) Here, p(y|x) is a property of the communications channel that represents the nature of the noise in the channel. The joint distribution of X and Y (that is, the distribution giving the probability of each pair of values of X and Y) is then completely determined by the channel and by p(x), the marginal distribution of messages sent over the channel. The maximum rate of information, or signal, that can be sent over the channel is called the channel capacity, C, and is given by:

$$C = \max_{p(x)} I(X; Y)$$

This relationship is read as follows: C is the largest value of I(X; Y), the mutual information between X and Y, that can be reached over all choices of the input distribution p(x). Mutual information measures how much observing Y tells the receiver about X. Any rate of information transfer, R, must be below the channel capacity if the transfer is to be reliable. Formally, for any R < C and any coding error ε > 0, for large enough N there exists a code of length N and rate ≥ R, together with a decoding algorithm, such that the maximal probability of block error (that is, the probability that at least one symbol in a block of symbols is decoded incorrectly) is ≤ ε. That is, below capacity it is always possible to transmit with arbitrarily small block error. In addition, for any rate R > C, it is impossible to transmit with an arbitrarily small block error.
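To make the capacity idea concrete, here is a minimal Python sketch, not taken from Shannon's paper. It computes the mutual information I(X; Y) for a small discrete channel, then approximates the capacity by trying many input distributions and keeping the best one. The function names, the two-input restriction, the grid search, and the example channel (each bit flipped with probability 0.1) are all assumptions made for illustration.

```python
import math

def mutual_information(p_x, p_y_given_x):
    """Return I(X;Y) in bits.

    p_x[x] is the input distribution; p_y_given_x[x][y] is the channel,
    i.e. the probability of receiving y when x was sent.
    """
    n_y = len(p_y_given_x[0])
    # Marginal distribution of the output: p(y) = sum over x of p(x) * p(y|x)
    p_y = [sum(p_x[x] * p_y_given_x[x][y] for x in range(len(p_x)))
           for y in range(n_y)]
    total = 0.0
    for x, px in enumerate(p_x):
        for y in range(n_y):
            joint = px * p_y_given_x[x][y]
            if joint > 0:  # terms with zero probability contribute nothing
                total += joint * math.log2(p_y_given_x[x][y] / p_y[y])
    return total

def capacity_binary_input(p_y_given_x, steps=1000):
    """Approximate the capacity of a two-input channel by a grid search
    over the probability q of sending the first input symbol."""
    best = 0.0
    for i in range(steps + 1):
        q = i / steps
        best = max(best, mutual_information([q, 1 - q], p_y_given_x))
    return best

# Hypothetical example channel: each bit is flipped with probability 0.1
channel = [[0.9, 0.1],
           [0.1, 0.9]]
print(round(capacity_binary_input(channel), 3))  # about 0.531 bits per channel use
```

A grid search is only practical for very small input alphabets; it is used here because it mirrors the "maximum over p(x)" in the definition directly.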

Thus channel coding is concerned with finding such nearly optimal codes able to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.

The channel capacity of particular channel models is degraded by noise. An analog communications channel is typically modeled as being subject to Gaussian noise. Noise in binary channels gets to be more mathematical. Suffice it to say that noise either raises the probability that the channel delivers the input bit flipped (the binary symmetric channel) or raises the probability that all information about an input bit is lost (the binary erasure channel).
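For the standard textbook channel models, the effect of noise on capacity has well-known closed forms. The sketch below collects three of them: the binary symmetric channel (bits flipped with probability p), the binary erasure channel (bits lost with probability p), and the band-limited Gaussian-noise channel (the Shannon-Hartley formula). The article does not give these formulas; they are added here for illustration, and the function names and example numbers are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary event that occurs with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(flip_prob):
    """Binary symmetric channel: C = 1 - H(p) bits per channel use."""
    return 1.0 - binary_entropy(flip_prob)

def bec_capacity(erasure_prob):
    """Binary erasure channel: C = 1 - p bits per channel use."""
    return 1.0 - erasure_prob

def awgn_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley: C = B * log2(1 + S/N) bits per second.

    snr_linear is the signal-to-noise power ratio as a plain ratio, not in dB.
    """
    return bandwidth_hz * math.log2(1.0 + snr_linear)

print(bsc_capacity(0.5))          # 0.0: a coin-flip channel carries nothing
print(bec_capacity(0.25))         # 0.75
print(awgn_capacity(3000, 1023))  # about 30000 bits per second
```

In each case more noise (a flip probability closer to 0.5, a higher erasure probability, or a lower signal-to-noise ratio) gives a lower capacity.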

Translated into modern communication parlance, the capacity of a communication system relates to the number of symbol transfers per second, or the symbol rate, which is measured in baud; and to the number of distinguishable values each symbol can take, which can be thought of as the symbol space. If we have n symbol transfers per second and s possible symbols, then sⁿ gives the total number of distinct messages that could be sent in one second as a sequence of n symbols. Thus the channel capacity can be viewed as the number of things that can be communicated given the sⁿ message space.
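As a rough illustration of the message-space arithmetic, the sketch below counts the sⁿ possible messages and converts that count to bits by taking the base-2 logarithm, which gives n · log₂(s) bits per second. This is a noise-free upper bound; it assumes every symbol is received correctly. The function names and example values are hypothetical.

```python
import math

def message_count(symbols, sequence_length):
    """Number of distinct messages: s ** n for s symbols and n transfers."""
    return symbols ** sequence_length

def bits_per_second(symbols, baud):
    """Noise-free information rate: log2(s ** n) = n * log2(s) bits per second."""
    return baud * math.log2(symbols)

# Hypothetical link: 4 distinguishable symbol values at 1000 baud
print(message_count(2, 8))       # 256 possible 8-symbol binary messages
print(bits_per_second(4, 1000))  # 2000.0, since each symbol carries 2 bits
```

Doubling the symbol rate doubles the bit rate, whereas doubling the number of symbol values adds only one bit per symbol.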

There is a lot more to information theory than just channel capacity. For example, Shannon also introduced the idea of information entropy, a measure of the uncertainty in a message. But it is the basic channel qualities that tend to get the most attention in modern communications.