Multivariable Taylor and the Hessian

A straight line is blind to curvature. The best linear approximation tells you which way a landscape tilts, but stand at a spot where the ground is dead level — a mountain peak, the bottom of a valley, or a mountain pass — and the tilt is zero at all three. The linear term throws up its hands: peak, pit, and pass look identical to first order. To tell them apart you must look at how the surface bends, and bending is a second-order quantity.

The bookkeeping device for all the second derivatives at once is the Hessian matrix, and the statement that packages first and second order together is the second-order Taylor expansion. Together they answer the question every optimizer asks ("is this critical point a minimum, a maximum, or a saddle?"), and they are the reason multivariable optimization has a systematic test at all. The same matrix decides the stability of an equilibrium in physics (a ball rests in a bowl, not on a dome) and drives second-order methods such as Newton's method in machine learning.

Second order in one variable, then in many

You already know the one-variable Taylor expansion to second order:

f(a + h) = f(a) + f'(a)\,h + \tfrac12 f''(a)\,h^2 + o(h^2).

Three ingredients: the value, the slope times the step, and — the new part — half the curvature times the step squared. That quadratic term is what bends the approximation off the tangent line to hug the graph. Now promote every piece to vectors. Let f : \mathbb{R}^n \to \mathbb{R} be twice continuously differentiable (C^2). Then

f(a + \mathbf{h}) = f(a) + \nabla f(a)\cdot \mathbf{h} + \tfrac12\, \mathbf{h}^{\top} H(a)\, \mathbf{h} + o\big(\lVert \mathbf{h}\rVert^2\big).

The slope f'(a) became the gradient \nabla f(a), and the curvature f''(a) became the Hessian H(a) — the n \times n matrix of all second partial derivatives,

H(a) = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{pmatrix}, \qquad H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}.

The middle term \nabla f(a)\cdot\mathbf{h} is the linear map from before; the new term \tfrac12\,\mathbf{h}^\top H\,\mathbf{h} is a quadratic form — the multivariable "\tfrac12 f'' h^2."
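To make the expansion concrete, here is a minimal numerical sketch. The test function f(x, y) = e^x \sin y, the base point, and the step direction are illustrative assumptions, not taken from the text; any smooth function would do. It compares the first-order and second-order approximations against the exact value as the step shrinks.

```python
import math

# Illustrative test function (an assumption, not from the text): f(x, y) = e^x sin y.
def f(x, y):
    return math.exp(x) * math.sin(y)

def grad(x, y):
    """Analytic gradient (f_x, f_y)."""
    e = math.exp(x)
    return (e * math.sin(y), e * math.cos(y))

def hessian(x, y):
    """Analytic Hessian ((f_xx, f_xy), (f_xy, f_yy))."""
    e = math.exp(x)
    return ((e * math.sin(y), e * math.cos(y)),
            (e * math.cos(y), -e * math.sin(y)))

def taylor1(a, h):
    """First-order approximation f(a) + grad f(a) . h."""
    g = grad(*a)
    return f(*a) + g[0] * h[0] + g[1] * h[1]

def taylor2(a, h):
    """Second-order approximation: taylor1 plus (1/2) h^T H(a) h."""
    H = hessian(*a)
    quad = sum(h[i] * H[i][j] * h[j] for i in range(2) for j in range(2))
    return taylor1(a, h) + 0.5 * quad

def errors(t, a=(0.3, 0.7), direction=(1.0, -1.0)):
    """Absolute errors of both approximations for the step h = t * direction."""
    h = (t * direction[0], t * direction[1])
    exact = f(a[0] + h[0], a[1] + h[1])
    return abs(exact - taylor1(a, h)), abs(exact - taylor2(a, h))

if __name__ == "__main__":
    for t in (0.1, 0.05, 0.025):
        e1, e2 = errors(t)
        print(f"t={t:<6} first-order error={e1:.2e}  second-order error={e2:.2e}")
```

For this smooth f, halving the step should cut the first-order error by a factor of roughly 4 and the second-order error by roughly 8, which is what a remainder smaller than \lVert \mathbf{h}\rVert^2 looks like in practice.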

See the curvature term earn its place

Take f(x) = \sin x and approximate it at a base point a two ways: the tangent line f(a) + f'(a)(x - a) (first order) and the Taylor parabola f(a) + f'(a)(x - a) + \tfrac12 f''(a)(x - a)^2 (second order). Move a along the curve and compare.

The line always shoots off tangentially, hugging the curve for only a sliver before drifting away. The parabola, carrying the curvature term \tfrac12 f''(a)(x - a)^2, bends the right way and clings to the curve over a much wider stretch. The parabola flips from opening upward to opening downward as a crosses an inflection point, where f'' changes sign. The sign of the second-order term is exactly the information the linear term cannot carry. In many variables that single sign becomes the definiteness of the Hessian.
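The comparison can also be checked by hand or in a few lines of code. The sketch below uses f(x) = \sin x with f'(x) = \cos x and f''(x) = -\sin x; the function names and the sample points are illustrative choices.

```python
import math

def tangent_line(a, x):
    """First-order approximation of sin at base point a, evaluated at x."""
    return math.sin(a) + math.cos(a) * (x - a)

def taylor_parabola(a, x):
    """Second-order approximation: tangent line plus (1/2) f''(a) (x - a)^2."""
    return tangent_line(a, x) + 0.5 * (-math.sin(a)) * (x - a) ** 2

def opens_upward(a):
    """The parabola opens upward exactly when f''(a) = -sin(a) is positive."""
    return -math.sin(a) > 0

if __name__ == "__main__":
    a, x = 1.0, 1.5
    print("exact   :", math.sin(x))
    print("line    :", tangent_line(a, x))
    print("parabola:", taylor_parabola(a, x))
    # x = pi is an inflection point of sin, so the opening direction flips there.
    print("opens upward just before pi:", opens_upward(3.0))
    print("opens upward just after pi :", opens_upward(3.3))
```

Half a unit away from the base point the parabola is already far closer to \sin x than the tangent line, and the opening direction changes as a passes \pi.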

The second-derivative test: reading the quadratic form

At a critical point a the gradient vanishes, \nabla f(a) = 0, so the linear term drops out and the Taylor expansion reduces to

f(a + \mathbf{h}) \approx f(a) + \tfrac12\, \mathbf{h}^{\top} H(a)\, \mathbf{h}.

The behavior near a is now dictated entirely by the quadratic form, that is, by whether \mathbf{h}^\top H\, \mathbf{h} is positive, negative, or mixed as \mathbf{h} varies. Its sign is governed by the eigenvalues of H(a), which is a symmetric matrix because the mixed partial derivatives of a C^2 function commute:

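At a critical point a of a C^2 function:

If every eigenvalue of H(a) is positive (H positive definite), a is a local minimum.

If every eigenvalue of H(a) is negative (H negative definite), a is a local maximum.

If H(a) has eigenvalues of both signs (H indefinite), a is a saddle point.

If some eigenvalue is zero and the remaining ones share a single sign, the test is inconclusive.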

In two variables you need not find eigenvalues: the discriminant does the job. With H = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix}, let D = \det H = f_{xx}f_{yy} - f_{xy}^2. Then D > 0,\ f_{xx} > 0 gives a minimum; D > 0,\ f_{xx} < 0 a maximum; D < 0 a saddle; and D = 0 is inconclusive. (The determinant is the product of the two eigenvalues, so D > 0 means they share a sign and D < 0 means they differ; when D > 0, f_{xx} reveals the common sign.)
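The two-variable test is short enough to write out as code. The sketch below is one possible rendering, with hypothetical function names; it classifies from the discriminant and, as a cross-check, from the eigenvalues of the symmetric 2 \times 2 matrix computed in closed form. The tolerance `tol` is an assumption added to cope with floating-point round-off and is not part of the mathematical test.

```python
import math

def classify_by_discriminant(fxx, fxy, fyy, tol=1e-12):
    """Second-derivative test in two variables using D = fxx*fyy - fxy^2."""
    D = fxx * fyy - fxy ** 2
    if D > tol:
        # D > 0 forces fxx != 0, so its sign decides between min and max.
        return "local minimum" if fxx > 0 else "local maximum"
    if D < -tol:
        return "saddle"
    return "inconclusive"

def eigenvalues_2x2_symmetric(fxx, fxy, fyy):
    """Eigenvalues (smaller, larger) of the matrix [[fxx, fxy], [fxy, fyy]]."""
    mean = 0.5 * (fxx + fyy)
    radius = math.hypot(0.5 * (fxx - fyy), fxy)
    return mean - radius, mean + radius

def classify_by_eigenvalues(fxx, fxy, fyy, tol=1e-12):
    """Same classification, read off the signs of the eigenvalues."""
    lo, hi = eigenvalues_2x2_symmetric(fxx, fxy, fyy)
    if lo > tol:
        return "local minimum"
    if hi < -tol:
        return "local maximum"
    if lo < -tol and hi > tol:
        return "saddle"
    return "inconclusive"

if __name__ == "__main__":
    for entries in [(2, 1, 2), (-3, 1, -2), (1, 3, 1), (1, 1, 1)]:
        print(entries,
              classify_by_discriminant(*entries),
              "| eigenvalues:", eigenvalues_2x2_symmetric(*entries))
```

Both routes agree on every input here, which is the content of the parenthetical remark: the discriminant and f_{xx} together carry the same sign information as the pair of eigenvalues.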

Three worked classifications

Each of these has its only critical point at the origin, where \nabla f = (0, 0).

Bowl — f = x^2 + y^2. H = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, eigenvalues 2, 2 both positive; D = 4 > 0 and f_{xx} = 2 > 0. Local minimum — the surface curves up every way you leave the origin.

Saddle — f = x^2 - y^2. H = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}, eigenvalues 2 and -2 of opposite sign; D = -4 < 0. Saddle — up along the x-axis, down along the y-axis. This is the Pringle-crisp shape, and no first-order information could ever have flagged it.

Inconclusive — f = x^2 + y^4 vs g = x^2 - y^4. Both have the same Hessian at the origin, \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}, with a zero eigenvalue — D = 0. The test is silent, and rightly so: f has a minimum there while g has a saddle. Two functions, identical to second order, different fates — the tie is broken only by the fourth-order term.
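The inconclusive case is easy to probe numerically. The sketch below estimates both Hessians at the origin with central finite differences (the helper `hessian_fd` and its step size are illustrative assumptions, and the estimates are approximate), then samples each function along the y-axis, where the zero eigenvalue leaves the second-order term silent.

```python
def f(x, y):
    return x**2 + y**4

def g(x, y):
    return x**2 - y**4

def hessian_fd(func, x, y, h=1e-3):
    """Central finite-difference estimate of the 2x2 Hessian of func at (x, y)."""
    fxx = (func(x + h, y) - 2 * func(x, y) + func(x - h, y)) / h**2
    fyy = (func(x, y + h) - 2 * func(x, y) + func(x, y - h)) / h**2
    fxy = (func(x + h, y + h) - func(x + h, y - h)
           - func(x - h, y + h) + func(x - h, y - h)) / (4 * h**2)
    return ((fxx, fxy), (fxy, fyy))

if __name__ == "__main__":
    print("Hessian of f at origin (approx):", hessian_fd(f, 0.0, 0.0))
    print("Hessian of g at origin (approx):", hessian_fd(g, 0.0, 0.0))
    # Along the y-axis the quadratic form vanishes, so the quartic term decides.
    for t in (0.1, 0.2):
        print(f"t={t}: f(0,t)={f(0.0, t):+.4f}  g(0,t)={g(0.0, t):+.4f}")
```

Both estimated Hessians come out close to \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}, yet f rises along the y-axis while g falls, so g is positive along the x-axis and negative along the y-axis near the origin.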

Two traps sit around the Hessian test.