The Gradient

The directional derivative gave us a vector worth naming. Collecting the two partials of f(x, y) into a single vector defines the gradient:

\nabla f = \left( f_x,\; f_y \right).

It is more than bookkeeping. The gradient is a genuine arrow living in the xy-plane, and that arrow has three superpowers: it points in the direction of steepest ascent, its length is exactly how steep that climb is, and it always stands perpendicular to the level curves. All three fall out of one identity.
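To make the definition concrete, here is a minimal sketch that approximates the two partials with central differences and packs them into one vector. The helper name `numerical_gradient`, the step `h`, and the sample function `sample_f` are illustrative assumptions, not part of the text above.

```python
def numerical_gradient(f, x, y, h=1e-6):
    """Approximate (f_x, f_y) at (x, y) with central differences."""
    f_x = (f(x + h, y) - f(x - h, y)) / (2 * h)
    f_y = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (f_x, f_y)


def sample_f(x, y):
    # Assumed example function; its exact gradient is (2xy, x^2 + 3y^2).
    return x**2 * y + y**3


grad = numerical_gradient(sample_f, 1.0, 2.0)
exact = (2 * 1.0 * 2.0, 1.0**2 + 3 * 2.0**2)  # (4.0, 13.0)
print(grad, exact)  # the two should agree to several decimal places
```

Central differences are used here only because they need nothing beyond evaluating f; when the partials are known in closed form, the exact gradient is the better choice.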

Deriving the three properties

Everything starts from the directional-derivative formula and a single fact about dot products.

Step 1 — write the rate as a dot product. For a unit direction \mathbf{u},

D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}.

Step 2 — turn the dot product into a cosine. For any two vectors, \mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta. Here \|\mathbf{u}\| = 1, so with \theta the angle between \nabla f and \mathbf{u},

D_{\mathbf{u}} f = \|\nabla f\|\,\|\mathbf{u}\|\cos\theta = \|\nabla f\|\cos\theta.

Step 3 — maximise over directions. As \mathbf{u} swings around, the only thing that changes is \cos\theta, which is largest when \theta = 0 — that is, when \mathbf{u} points along \nabla f. There \cos\theta = 1 and

\max_{\mathbf{u}} D_{\mathbf{u}} f = \|\nabla f\|.

So \nabla f points in the direction of steepest ascent, and the steepest slope is its magnitude \|\nabla f\|. (At \theta = 180^\circ, \cos\theta = -1: the opposite direction -\nabla f is steepest descent.)
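Steps 1 to 3 can be checked numerically by brute force: sweep many unit directions, evaluate the dot product for each, and compare the winner with the gradient's own direction and length. This is only a sketch; the sample gradient (1, 2) and the grid of 3600 directions are assumptions chosen for illustration.

```python
import math


def directional_derivative(grad, angle):
    """D_u f = grad . u for the unit vector u = (cos angle, sin angle)."""
    u = (math.cos(angle), math.sin(angle))
    return grad[0] * u[0] + grad[1] * u[1]


grad = (1.0, 2.0)  # assumed sample gradient at some point
grad_norm = math.hypot(grad[0], grad[1])
grad_angle = math.atan2(grad[1], grad[0])

# Sample 3600 directions around the circle and keep the steepest one.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_angle = max(angles, key=lambda a: directional_derivative(grad, a))
best_rate = directional_derivative(grad, best_angle)

# Rate in the exactly opposite direction: steepest descent.
opposite_rate = directional_derivative(grad, grad_angle + math.pi)

print(best_rate, grad_norm)       # nearly equal
print(best_angle, grad_angle)     # nearly equal
print(opposite_rate, -grad_norm)  # nearly equal
```

The agreement is only as fine as the grid of directions, so the best sampled rate falls very slightly short of \|\nabla f\|.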

Step 4 — set up the level curve. A level curve is the set where f stays constant. Walk along it with a path \big(x(t), y(t)\big), so that

f\big(x(t), y(t)\big) = c \quad\text{(constant) for all } t.

Step 5 — differentiate the constant. The right side is constant, so its derivative is zero; the left side opens up by the chain rule:

\frac{d}{dt}\, f\big(x(t), y(t)\big) = f_x\, x'(t) + f_y\, y'(t) = 0.

Step 6 — recognise the dot product. That sum is the gradient dotted with the path's velocity \mathbf{T} = (x', y'), which is the tangent to the level curve:

\nabla f \cdot \mathbf{T} = 0.

A zero dot product means perpendicular. The gradient is at right angles to the level curve through every point — it always points straight "across the contours", never along them.
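Steps 4 to 6 can be tried on a level curve that is not a circle. The sketch below assumes the example f(x, y) = x^2 + 4y^2 and walks along its level curve f = 4, an ellipse, checking that the gradient dotted with the velocity vanishes at every sampled point. The function, the parametrisation, and the helper names are illustrative assumptions.

```python
import math


def f(x, y):
    """Assumed example function with elliptical level curves."""
    return x**2 + 4 * y**2


def grad_f(x, y):
    """Exact gradient (f_x, f_y) of the example function."""
    return (2 * x, 8 * y)


def path(t):
    """Point (x(t), y(t)) on the level curve f = 4."""
    return (2 * math.cos(t), math.sin(t))


def velocity(t):
    """Tangent vector T = (x'(t), y'(t)) of the path above."""
    return (-2 * math.sin(t), math.cos(t))


levels = []
dots = []
for k in range(8):
    t = 2 * math.pi * k / 8
    x, y = path(t)
    g = grad_f(x, y)
    T = velocity(t)
    levels.append(f(x, y))                   # stays at 4 along the path
    dots.append(g[0] * T[0] + g[1] * T[1])   # gradient . tangent

print(levels)
print(dots)  # every entry is zero up to rounding
```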

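Let f be differentiable at a point with \nabla f \neq \mathbf{0} there. Then: (1) \nabla f points in the direction of steepest ascent, and -\nabla f in the direction of steepest descent; (2) the maximum rate of increase is \|\nabla f\|; (3) \nabla f is perpendicular to the level curve of f through that point.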

A worked example

Let f(x, y) = x^2 + y^2 at the point (3, 4).

Step 1 — the gradient. f_x = 2x, f_y = 2y, so \nabla f(3, 4) = (6, 8).

Step 2 — direction of steepest ascent. Straight along (6, 8) — radially outward from the origin, which makes sense: f is a bowl and the fastest way up is directly away from the bottom.

Step 3 — maximum rate. \|\nabla f\| = \sqrt{6^2 + 8^2} = \sqrt{100} = 10.

Step 4 — perpendicularity check. The level curves of x^2 + y^2 are circles centred at the origin; their tangents are perpendicular to the radius, and (6, 8) is exactly the radial direction. The gradient crosses the contour at a right angle, as promised.
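The arithmetic of the four steps can be replayed in a few lines. This is a sketch rather than a general tool: the tangent (-y, x) is an assumption that holds because the level curves here are circles centred at the origin, and the variable names are illustrative.

```python
import math

x, y = 3.0, 4.0

# Step 1: gradient of f(x, y) = x^2 + y^2 is (2x, 2y).
grad = (2 * x, 2 * y)

# Steps 2 and 3: steepest direction as a unit vector, and the maximum rate.
max_rate = math.hypot(grad[0], grad[1])
unit_dir = (grad[0] / max_rate, grad[1] / max_rate)

# Step 4: a tangent to the circle through (x, y) is (-y, x).
tangent = (-y, x)
dot = grad[0] * tangent[0] + grad[1] * tangent[1]

print(grad)      # (6.0, 8.0)
print(max_rate)  # 10.0
print(unit_dir)  # approximately (0.6, 0.8)
print(dot)       # 0.0
```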

If \nabla f is the steepest way up, then -\nabla f is the steepest way down — and rolling downhill is how almost every machine-learning model is trained. Gradient descent repeatedly nudges a point against its gradient,

\mathbf{x}_{n+1} = \mathbf{x}_n - \eta\, \nabla f(\mathbf{x}_n),

where the small step size \eta is the "learning rate". Each step lowers f (for small enough \eta, wherever \nabla f \neq \mathbf{0}), and the update stops moving only where \nabla f = \mathbf{0}: a critical point, the subject of the next page. Training a neural network with a billion parameters is, at heart, this one line run a great many times on a function nobody could ever picture. The gradient is what makes the impossible-to-visualise navigable.
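The update rule can be run on the bowl from the worked example. The sketch below assumes f(x, y) = x^2 + y^2, a starting point of (3, 4), a learning rate of 0.1, and 50 steps; all of these, along with the function names, are illustrative choices rather than recommended settings.

```python
def f(x, y):
    return x**2 + y**2


def grad_f(x, y):
    return (2 * x, 2 * y)


def gradient_descent(start, eta=0.1, steps=50):
    """Apply x_{n+1} = x_n - eta * grad f(x_n) and return every visited point."""
    x, y = start
    history = [(x, y)]
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - eta * gx, y - eta * gy  # step against the gradient
        history.append((x, y))
    return history


history = gradient_descent((3.0, 4.0))
values = [f(x, y) for x, y in history]

print(values[0])   # 25.0, the starting height
print(values[-1])  # very close to 0, the bottom of the bowl
```

For this particular bowl each step multiplies the point by 1 - 2\eta, so \eta = 0.1 shrinks it by a factor of 0.8 every time. A learning rate above 1 would make that factor exceed 1 in magnitude and the iterates would move away from the minimum, which is why the step size has to be small enough.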

Watch it stay perpendicular

The rings are level curves of f(x, y) = x^2 + y^2 (circles). Slide the point around its ring: the gradient arrow \nabla f = (2x, 2y) always points straight outward, crossing the contour at a perfect right angle and growing longer the farther out you go — its length \|\nabla f\| = 2\sqrt{x^2 + y^2} is the steepest slope there.