Multivariable Taylor and the Hessian

A straight line is blind to curvature. The best linear approximation tells you which way a landscape tilts, but stand at a spot where the ground is dead level — a mountain peak, the bottom of a valley, or a mountain pass — and the tilt is zero at all three. The linear term throws up its hands: peak, pit, and pass look identical to first order. To tell them apart you must look at how the surface bends, and bending is a second-order quantity.

The bookkeeping device for all the second derivatives at once is the Hessian matrix, and the statement that packages first and second order together is the second-order Taylor expansion. Together they answer the question every optimiser asks, "is this critical point a minimum, a maximum, or a saddle?", and they are the reason multivariable optimisation has a systematic test at all. The same matrix decides the stability of an equilibrium in physics (a ball rests in a bowl, not on a dome) and drives second-order methods such as Newton's method in machine learning.

Second order in one variable, then in many

You already know the one-variable Taylor expansion to second order:

f(a + h) = f(a) + f'(a)\,h + \tfrac12 f''(a)\,h^2 + o(h^2).

Three ingredients: the value, the slope times the step, and — the new part — half the curvature times the step squared. That quadratic term is what bends the approximation off the tangent line to hug the graph. Now promote every piece to vectors. Let f : \mathbb{R}^n \to \mathbb{R} be twice continuously differentiable (C^2). Then

f(a + \mathbf{h}) = f(a) + \nabla f(a)\cdot \mathbf{h} + \tfrac12\, \mathbf{h}^{\top} H(a)\, \mathbf{h} + o\big(\lVert \mathbf{h}\rVert^2\big).

The slope f'(a) became the gradient \nabla f(a), and the curvature f''(a) became the Hessian H(a) — the n \times n matrix of all second partial derivatives,

H(a) = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{pmatrix}, \qquad H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}.

The middle term \nabla f(a)\cdot\mathbf{h} is the linear map from before; the new term \tfrac12\,\mathbf{h}^\top H\,\mathbf{h} is a quadratic form — the multivariable "\tfrac12 f'' h^2."
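To see the quadratic term pull its weight numerically, here is a minimal Python sketch. The test function f(x, y) = e^x \cos y, the base point, and the step direction are assumptions chosen purely for illustration; the gradient and Hessian are hand-derived for that function.

```python
import math

def f(x, y):
    # Hypothetical test function, chosen only for illustration.
    return math.exp(x) * math.cos(y)

def grad_f(x, y):
    # Hand-derived gradient (f_x, f_y).
    e = math.exp(x)
    return (e * math.cos(y), -e * math.sin(y))

def hess_f(x, y):
    # Hand-derived Hessian ((f_xx, f_xy), (f_xy, f_yy)).
    e = math.exp(x)
    return ((e * math.cos(y), -e * math.sin(y)),
            (-e * math.sin(y), -e * math.cos(y)))

def taylor1(a, h):
    # Value plus gradient dotted with the step.
    g = grad_f(*a)
    return f(*a) + g[0] * h[0] + g[1] * h[1]

def taylor2(a, h):
    # Add half the quadratic form h^T H h.
    H = hess_f(*a)
    quad = H[0][0] * h[0] ** 2 + 2 * H[0][1] * h[0] * h[1] + H[1][1] * h[1] ** 2
    return taylor1(a, h) + 0.5 * quad

a = (0.3, 0.7)
for t in (0.1, 0.05, 0.025):
    h = (t, -t)
    exact = f(a[0] + h[0], a[1] + h[1])
    err1 = abs(exact - taylor1(a, h))
    err2 = abs(exact - taylor2(a, h))
    print(f"t={t:<6} first-order error={err1:.2e}  second-order error={err2:.2e}")
```

If the expansion is doing its job, each halving of t should shrink the first-order error by roughly a factor of 4 and the second-order error by roughly a factor of 8.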

If f is C^2, mixed partials commute (Clairaut / Schwarz): \dfrac{\partial^2 f}{\partial x_i \partial x_j} = \dfrac{\partial^2 f}{\partial x_j \partial x_i}.
Hence H(a) is a symmetric matrix, so it has real eigenvalues and an orthonormal basis of eigenvectors. That is the fact that makes the second-derivative test work.
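When second partials are tedious to derive by hand, the Hessian can be estimated with central differences. The sketch below is one simple way to do it, not a production routine; the helper name hessian_fd, the step size eps, and the polynomial test function are all illustrative assumptions.

```python
def hessian_fd(f, a, eps=1e-4):
    """Estimate the Hessian of f at point a with a four-point central stencil."""
    n = len(a)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Evaluate f with coordinate i moved by si*eps and j by sj*eps.
                p = list(a)
                p[i] += si * eps
                p[j] += sj * eps
                return f(p)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * eps ** 2)
    return H

def poly(p):
    # Hypothetical test function: x^2 y + y z^3.
    x, y, z = p
    return x ** 2 * y + y * z ** 3

# Hand-computed Hessian of poly at (1, 2, 3), for comparison.
expected = [[4.0, 2.0, 0.0],
            [2.0, 0.0, 27.0],
            [0.0, 27.0, 36.0]]

H = hessian_fd(poly, [1.0, 2.0, 3.0])
for row_est, row_exact in zip(H, expected):
    print([f"{v:.4f}" for v in row_est], row_exact)
```

This stencil treats i and j symmetrically, so its output is symmetric by construction: it assumes the C^2 hypothesis rather than checking it.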

See the curvature term earn its place

Take f(x) = \sin x and approximate it at a base point a in two ways: the tangent line f(a) + f'(a)(x - a) (first order) and the Taylor parabola f(a) + f'(a)(x - a) + \tfrac12 f''(a)(x - a)^2 (second order). Move a along the curve and compare.

The line always shoots off tangentially, hugging the curve for only a sliver before drifting away. The parabola, carrying the curvature term \tfrac12 f''(a), bends the right way and clings to the curve over a much wider stretch. Watch the parabola flip from opening upward to opening downward as you cross an inflection point, where f'' changes sign — the sign of the second-order term is exactly the information the linear term cannot carry. In many variables that single sign becomes the definiteness of the Hessian.
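The same comparison can be made with numbers instead of a picture. A minimal sketch, with the base points and offsets chosen arbitrarily for illustration:

```python
import math

def curvature_coeff(a):
    # (1/2) f''(a) for f = sin; its sign says which way the parabola opens.
    return -0.5 * math.sin(a)

def tangent(a, x):
    # First-order approximation of sin at base point a.
    return math.sin(a) + math.cos(a) * (x - a)

def parabola(a, x):
    # Second-order approximation: tangent line plus the curvature term.
    return tangent(a, x) + curvature_coeff(a) * (x - a) ** 2

a = 1.0
for dx in (0.5, 0.25, 0.125):
    x = a + dx
    err_line = abs(math.sin(x) - tangent(a, x))
    err_parab = abs(math.sin(x) - parabola(a, x))
    print(f"dx={dx:<6} line error={err_line:.5f}  parabola error={err_parab:.5f}")

# On either side of the inflection point at x = pi the coefficient changes sign.
print(curvature_coeff(3.0), curvature_coeff(3.3))
```

A negative coefficient means the parabola opens downward, a positive one upward, so the last line shows the flip across x = \pi.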

The second-derivative test: reading the quadratic form

At a critical point a the gradient vanishes, \nabla f(a) = 0, so the linear term drops out and the Taylor expansion reduces to

f(a + \mathbf{h}) \approx f(a) + \tfrac12\, \mathbf{h}^{\top} H(a)\, \mathbf{h}.

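The behaviour near a is now dictated entirely by the quadratic form: by whether \mathbf{h}^\top H\, \mathbf{h} is positive, negative, or mixed. Writing \mathbf{h} = \sum_i c_i \mathbf{v}_i in an orthonormal eigenbasis of the symmetric matrix H(a) gives \mathbf{h}^\top H\, \mathbf{h} = \sum_i \lambda_i c_i^2, so its sign is governed by the eigenvalues \lambda_i: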

At a critical point a of a C^2 function:

H(a) positive definite (all eigenvalues > 0) \Rightarrow strict local minimum — the bowl curves up in every direction.
H(a) negative definite (all eigenvalues < 0) \Rightarrow strict local maximum.
H(a) indefinite (eigenvalues of both signs) \Rightarrow saddle point — up one way, down another.
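H(a) singular but semidefinite (some eigenvalue = 0, the others all of one sign) \Rightarrow the test is inconclusive. A zero eigenvalue alongside eigenvalues of both signs is still a saddle.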
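For a 2 \times 2 symmetric Hessian the eigenvalues have a closed form, so the cases above can be sketched in a few lines. The function names and the tolerance tol are illustrative assumptions; in floating point an eigenvalue is rarely exactly zero, so some threshold has to stand in for "singular".

```python
import math

def sym2_eigenvalues(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return (mean - radius, mean + radius)

def classify(a, b, c, tol=1e-12):
    """Classify a critical point from its 2x2 Hessian [[a, b], [b, c]]."""
    lo, hi = sym2_eigenvalues(a, b, c)
    if lo > tol:
        return "strict local minimum"   # both eigenvalues positive
    if hi < -tol:
        return "strict local maximum"   # both eigenvalues negative
    if lo < -tol and hi > tol:
        return "saddle point"           # one of each sign
    return "inconclusive"               # a (numerically) zero eigenvalue

for entries in [(2, 0, 2), (-2, 0, -2), (2, 0, -2), (1, 2, 1), (2, 0, 0)]:
    print(entries, sym2_eigenvalues(*entries), classify(*entries))
```

The entry (1, 2, 1) is worth a look: every entry of the matrix is positive, yet the eigenvalues are -1 and 3, so the point is a saddle.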

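In two variables you need not find eigenvalues: the discriminant does it. With H = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix}, let D = \det H = f_{xx}f_{yy} - f_{xy}^2. Then D > 0,\ f_{xx} > 0 gives a minimum; D > 0,\ f_{xx} < 0 a maximum; D < 0 a saddle; and D = 0 is inconclusive. (The determinant is the product of the two eigenvalues, so D > 0 means they share a sign and D < 0 means they differ. When D > 0, f_{xx}f_{yy} > f_{xy}^2 \ge 0 forces f_{xx} and f_{yy} to have the same sign, so the trace f_{xx} + f_{yy}, which is the sum of the eigenvalues, has the sign of f_{xx}.)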
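As a sketch of the discriminant test in code, take the hypothetical function f(x, y) = x^3 - 3x + y^2 (an assumption for illustration, not one of this section's examples). Its gradient (3x^2 - 3, 2y) vanishes at (1, 0) and (-1, 0), and its second partials are f_{xx} = 6x, f_{xy} = 0, f_{yy} = 2.

```python
def second_derivative_test(fxx, fxy, fyy, tol=1e-12):
    """Two-variable test using D = fxx * fyy - fxy^2."""
    D = fxx * fyy - fxy ** 2
    if D > tol:
        # D > 0 forces fxx != 0, so its sign settles min versus max.
        return "minimum" if fxx > 0 else "maximum"
    if D < -tol:
        return "saddle"
    return "inconclusive"

def second_partials(x, y):
    # Hand-derived for the hypothetical f(x, y) = x^3 - 3x + y^2.
    return (6 * x, 0.0, 2.0)

for point in [(1.0, 0.0), (-1.0, 0.0)]:
    print(point, second_derivative_test(*second_partials(*point)))
```

At (1, 0) the discriminant is 12 with f_{xx} = 6, and at (-1, 0) it is -12, so the sketch should report a minimum and a saddle respectively.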

Three worked classifications

Each of these has its only critical point at the origin, where \nabla f = (0, 0).

Bowl — f = x^2 + y^2. H = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, eigenvalues 2, 2 both positive; D = 4 > 0 and f_{xx} = 2 > 0. Local minimum — the surface curves up every way you leave the origin.

Saddle — f = x^2 - y^2. H = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}, eigenvalues 2 and -2 of opposite sign; D = -4 < 0. Saddle — up along the x-axis, down along the y-axis. This is the Pringle-crisp shape, and no first-order information could ever have flagged it.

Inconclusive — f = x^2 + y^4 vs g = x^2 - y^4. Both have the same Hessian at the origin, \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}, with a zero eigenvalue — D = 0. The test is silent, and rightly so: f has a minimum there while g has a saddle. Two functions, identical to second order, different fates — the tie is broken only by the fourth-order term.
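The split between f and g shows up as soon as you step off the origin along the y-axis, the direction in which the shared quadratic form is flat. A small numerical sanity check (not a proof), with the step sizes chosen arbitrarily:

```python
def f(x, y):
    return x ** 2 + y ** 4

def g(x, y):
    return x ** 2 - y ** 4

def quadratic_model(x, y):
    # (1/2) h^T H h with H = [[2, 0], [0, 0]]: identical for f and g.
    return x ** 2

for t in (0.1, 0.01, 0.001):
    print(f"t={t:<6} model(0,t)={quadratic_model(0.0, t):.1e}  "
          f"f(0,t)={f(0.0, t):.1e}  g(0,t)={g(0.0, t):.1e}  g(t,0)={g(t, 0.0):.1e}")
```

The quadratic model reports zero along the y-axis for both functions, while f rises above f(0, 0) = 0 and g dips below it; along the x-axis g still rises, which is what makes the origin a saddle for g.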

Two traps sit around the Hessian test.

Inconclusive is a real verdict, not a minimum. When D = 0 (or, in more variables, H is semidefinite with a zero eigenvalue) the quadratic form is flat in some direction, and the outcome is decided by higher-order terms the Hessian cannot see. The pair x^2 \pm y^4 above, with the same Hessian but one a minimum and one a saddle, is the standard warning. Do not read "D = 0" as "boundary case, probably a minimum"; it means "look harder."
The Hessian is only guaranteed symmetric for C^2 functions. Clairaut's theorem, that f_{xy} = f_{yx}, needs the mixed partials to be continuous. There is a famous rogue, f(x,y) = \dfrac{xy(x^2 - y^2)}{x^2 + y^2} (with f(0,0)=0), whose mixed partials at the origin disagree: differentiating in x first and then in y gives f_{xy}(0,0) = -1, while the other order gives f_{yx}(0,0) = +1. Its matrix of second partials at the origin is not symmetric, and the eigenvalue reasoning breaks. Almost every function you meet is C^2, but the hypothesis is doing quiet work.
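The disagreement can be reproduced numerically by taking the two derivatives one after the other, in each order. This is a rough sketch under assumed step sizes: the inner step h is deliberately much smaller than the outer step k, because each mixed partial is an iterated limit (inner derivative first, outer derivative second). Here f_xy means "x first, then y".

```python
def f(x, y):
    # The rogue function, with the value 0 assigned at the origin.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def f_x(x, y, h=1e-7):
    # Central-difference estimate of the partial derivative in x.
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def f_y(x, y, h=1e-7):
    # Central-difference estimate of the partial derivative in y.
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

k = 1e-3  # outer step, much larger than the inner step h

# x first, then y: differentiate f_x along the y-axis.
f_xy = (f_x(0.0, k) - f_x(0.0, -k)) / (2 * k)

# y first, then x: differentiate f_y along the x-axis.
f_yx = (f_y(k, 0.0) - f_y(-k, 0.0)) / (2 * k)

print("f_xy(0,0) estimate:", f_xy)
print("f_yx(0,0) estimate:", f_yx)
```

The two estimates should land close to -1 and +1. Away from the origin the function is smooth and the same procedure gives matching values in either order.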