Information Theory: Entropy & Channel Capacity

Shannon entropy of a discrete random variable $X$ with PMF $p(x)$:

$$H(X) = -\sum_x p(x) \log_2 p(x) \quad \text{(bits)}.$$

Properties: $H(X) \geq 0$, with equality iff $X$ is deterministic; maximized at the uniform distribution on $|X|$ outcomes, giving $H = \log_2 |X|$. Joint and conditional entropy satisfy $H(X, Y) = H(X) + H(Y \mid X)$.

Mutual information:

$$I(X; Y) = H(X) + H(Y) - H(X, Y) = H(X) - H(X \mid Y) \geq 0.$$

Symmetric, nonnegative, and zero iff $X \perp Y$.

Source coding theorem: $H(X)$ is the minimum average codeword length (in bits/symbol) for lossless compression of i.i.d. $X$. Noisy-channel coding theorem: the maximum reliable rate over a memoryless channel is

$$C = \max_{p(x)} I(X; Y) \quad \text{(channel capacity)}.$$

For the binary symmetric channel with bit-flip probability $p$, $C_{\rm BSC} = 1 - H_2(p)$ where $H_2(p) = -p \log_2 p - (1-p)\log_2(1-p)$ is the binary entropy.

Postgraduate Science

Information Theory: Entropy & Channel Capacity

Interactive: binary entropy curve

Quiz