Postgraduate Science

Guest · Admin login

Optimization: Gradient Descent & Convexity

First-order optimization, convex functions, and why convexity is so powerful.

Optimize a smooth $f: \mathbb R^n \to \mathbb R$. At a critical point $\nabla f(x^*) = 0$; local minima additionally have positive-semidefinite Hessian.

Gradient descent: iteratively step against the gradient,

$$x_{k+1} = x_k - \eta\, \nabla f(x_k).$$

For $L$-smooth convex $f$ with step size $\eta \leq 1/L$, the sub-optimality decays as $O(1/k)$; for strongly convex, $O(e^{-k})$ (linear / geometric rate). Stochastic variants (SGD) average noisy gradients, achieving $O(1/\sqrt k)$ for general convex problems.

A set $C$ is convex iff $\lambda x + (1 - \lambda) y \in C$ for all $x, y \in C$ and $\lambda \in [0,1]$. A function $f$ is convex iff its epigraph is convex; equivalently $f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y)$ — Jensen's inequality.

Why convexity matters: every local minimum is a global minimum, and many efficient algorithms have provable convergence guarantees. Convex problems are the bedrock of LP, QP, SDP, and modern ML loss landscapes.

Interactive: gradient descent on a 2D landscape

Quiz

1. At a local minimum of a smooth $f$ on $\mathbb R^n$:
2. For an $L$-smooth convex $f$, gradient descent with $\eta = 1/L$ converges at rate:
3. Jensen's inequality for convex $f$:
4. Every local minimum of a convex function is:
5. SGD (stochastic gradient descent) has convergence rate for convex $f$:
6. The intersection of convex sets is: