Postgraduate Science

Guest · Admin login

Information Theory: Entropy & Channel Capacity

Shannon entropy, mutual information, and the noisy-channel coding theorem.

Shannon entropy of a discrete random variable $X$ with PMF $p(x)$:

$$H(X) = -\sum_x p(x) \log_2 p(x) \quad \text{(bits)}.$$

Properties: $H(X) \geq 0$, with equality iff $X$ is deterministic; maximized at the uniform distribution on $|X|$ outcomes, giving $H = \log_2 |X|$. Joint and conditional entropy satisfy $H(X, Y) = H(X) + H(Y \mid X)$.

Mutual information:

$$I(X; Y) = H(X) + H(Y) - H(X, Y) = H(X) - H(X \mid Y) \geq 0.$$

Symmetric, nonnegative, and zero iff $X \perp Y$.

Source coding theorem: $H(X)$ is the minimum average codeword length (in bits/symbol) for lossless compression of i.i.d. $X$. Noisy-channel coding theorem: the maximum reliable rate over a memoryless channel is

$$C = \max_{p(x)} I(X; Y) \quad \text{(channel capacity)}.$$

For the binary symmetric channel with bit-flip probability $p$, $C_{\rm BSC} = 1 - H_2(p)$ where $H_2(p) = -p \log_2 p - (1-p)\log_2(1-p)$ is the binary entropy.

Interactive: binary entropy curve

Quiz

1. Entropy $H(X)$ of a uniform distribution on $N$ outcomes is:
2. Mutual information $I(X;Y)$ is:
3. The Shannon source-coding theorem says you can compress an i.i.d. source down to (but not below) on average:
4. Capacity of the binary symmetric channel with crossover $p$:
5. $H(X, Y) = $ ?
6. Kullback–Leibler divergence $D(p \| q)$ satisfies: