Lagrange Multipliers

Unconstrained optimisation sets the whole gradient to zero. But the interesting problems come fenced in: maximise output subject to a fixed budget, minimise surface area at a fixed volume. We want to optimise f(x, y) while a constraint holds,

g(x, y) = c.

Now \nabla f = \mathbf{0} is the wrong condition: the best point on the constraint curve is rarely a free peak of f. The right condition is startlingly clean. At a constrained optimum the two gradients are parallel:

\nabla f = \lambda\, \nabla g,

for some scalar \lambda, the Lagrange multiplier. One little equation captures the whole geometry.

Deriving the condition geometrically

The argument rests entirely on what the gradient means — steepest ascent, and perpendicular to level curves.

Step 1 — you are confined to the constraint curve. Every allowed point satisfies g = c, so you may only travel along the curve g(x, y) = c. Let \mathbf{T} be its tangent direction at a candidate point.

Step 2 — at a constrained max, moving can't increase f. If stepping along the curve in direction \mathbf{T} raised f even slightly, the point wouldn't be the maximum: you'd just take that step. If stepping along \mathbf{T} lowered f at a nonzero rate, stepping the opposite way, along -\mathbf{T}, would raise it. So the rate of change of f along the curve must be zero. That rate is a directional derivative:

D_{\mathbf{T}} f = \nabla f \cdot \mathbf{T} = 0.

Step 3 — so \nabla f has no component along the constraint. A zero dot product means \nabla f is perpendicular to the tangent \mathbf{T} of the constraint curve.

Step 4 — but \nabla g is also perpendicular to that tangent. The constraint curve is itself a level curve of g (the level g = c), and a gradient is always perpendicular to its own level curve:

\nabla g \cdot \mathbf{T} = 0.

Step 5 — two vectors perpendicular to the same line are parallel. In the plane there is only one direction perpendicular to \mathbf{T} (up to sign). Both \nabla f and \nabla g lie along it, so, provided \nabla g \neq \mathbf{0}, \nabla f is a scalar multiple of \nabla g:

\nabla f = \lambda\, \nabla g.

Geometrically: at the optimum the level curve of f is tangent to the constraint curve. As the value of f increases, its contours sweep across the constraint; the last contour that still touches the constraint only grazes it, sharing a tangent line and hence having parallel normals.
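
The five steps can be checked numerically. The sketch below is only an illustration under assumptions that are not in the text: the objective f(x, y) = x + y, the constraint x^2 + y^2 = 2, and the helper names tangent, dot and cross are all made up for the check. It compares the known maximiser (1, 1) with another point on the same circle.

```python
import math

def grad_f(x, y):
    # f(x, y) = x + y (illustrative objective, assumed for this check)
    return (1.0, 1.0)

def grad_g(x, y):
    # g(x, y) = x**2 + y**2 (illustrative constraint g = 2)
    return (2.0 * x, 2.0 * y)

def tangent(x, y):
    # Rotate grad g by 90 degrees to get a tangent to the level curve of g.
    gx, gy = grad_g(x, y)
    return (-gy, gx)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def cross(u, v):
    # 2D cross product: zero exactly when u and v are parallel.
    return u[0] * v[1] - u[1] * v[0]

optimum = (1.0, 1.0)                # maximiser of x + y on the circle
elsewhere = (math.sqrt(2.0), 0.0)   # another point on the same circle

for name, p in [("optimum", optimum), ("elsewhere", elsewhere)]:
    T = tangent(*p)
    print(name,
          "grad f . T =", dot(grad_f(*p), T),
          "grad g . T =", dot(grad_g(*p), T),
          "cross(grad f, grad g) =", cross(grad_f(*p), grad_g(*p)))
```

At the optimum both dot products and the cross product vanish. At the other point \nabla g is still perpendicular to the tangent, but \nabla f is not, so f can still be raised by moving along the curve.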

To find the extrema of f(x, y) subject to g(x, y) = c (with \nabla g \neq \mathbf{0} on the constraint), solve the system:

Parallel gradients: \nabla f = \lambda\, \nabla g, i.e. f_x = \lambda g_x and f_y = \lambda g_y.
The constraint itself: g(x, y) = c.
That is three equations in the three unknowns x, y, \lambda; solve for the candidate points.
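Evaluate f at every solution. Provided the constrained extrema exist (they always do when f is continuous and the constraint curve is closed and bounded), the largest value is the constrained maximum and the smallest the constrained minimum.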
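
When the system cannot be solved by hand, it can be handed to a root finder. The sketch below is one possible approach, not a prescribed one: it applies Newton's method with a finite-difference Jacobian to the three equations. The function name solve_lagrange, its parameters and the starting guess are hypothetical, and the test case f(x, y) = xy with x + y = 10 is chosen only because its answer is easy to confirm by hand.

```python
def solve_lagrange(grad_f, grad_g, g, c, guess, steps=50, h=1e-6):
    """Newton's method on F(x, y, lam) = (f_x - lam*g_x, f_y - lam*g_y, g - c)."""

    def F(v):
        x, y, lam = v
        fx, fy = grad_f(x, y)
        gx, gy = grad_g(x, y)
        return [fx - lam * gx, fy - lam * gy, g(x, y) - c]

    v = list(guess)
    for _ in range(steps):
        Fv = F(v)
        if max(abs(r) for r in Fv) < 1e-12:
            break
        # Finite-difference Jacobian, built column by column.
        J = [[0.0] * 3 for _ in range(3)]
        for j in range(3):
            vp = v[:]
            vp[j] += h
            Fp = F(vp)
            for i in range(3):
                J[i][j] = (Fp[i] - Fv[i]) / h
        # Solve J * d = -Fv by Gaussian elimination with partial pivoting.
        # (Assumes J is nonsingular; a zero pivot would raise ZeroDivisionError.)
        A = [J[i] + [-Fv[i]] for i in range(3)]
        for k in range(3):
            p = max(range(k, 3), key=lambda r: abs(A[r][k]))
            A[k], A[p] = A[p], A[k]
            for r in range(k + 1, 3):
                m = A[r][k] / A[k][k]
                for col in range(k, 4):
                    A[r][col] -= m * A[k][col]
        d = [0.0] * 3
        for k in (2, 1, 0):
            s = A[k][3] - sum(A[k][j] * d[j] for j in range(k + 1, 3))
            d[k] = s / A[k][k]
        v = [v[i] + d[i] for i in range(3)]
    return tuple(v)

# Test case: f = x*y, g = x + y, c = 10.
x, y, lam = solve_lagrange(
    grad_f=lambda x, y: (y, x),
    grad_g=lambda x, y: (1.0, 1.0),
    g=lambda x, y: x + y,
    c=10.0,
    guess=(1.0, 2.0, 1.0),
)
print(round(x, 6), round(y, 6), round(lam, 6))
```

Newton's method converges only locally and returns one candidate per starting guess, so a problem with several candidate points needs several guesses before the final comparison of f values.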

A worked example

Maximise f(x, y) = xy subject to the constraint x + y = 10. (The largest-area rectangle of a fixed perimeter, in disguise.)

Step 1 — the two gradients. With g(x, y) = x + y,

\nabla f = (y,\, x), \qquad \nabla g = (1,\, 1).

Step 2 — set them parallel. \nabla f = \lambda \nabla g gives two equations:

y = \lambda \cdot 1, \qquad x = \lambda \cdot 1.

Step 3 — eliminate \lambda. Both equal \lambda, so x = y.

Step 4 — use the constraint. Substitute x = y into x + y = 10:

x + x = 10 \;\Rightarrow\; x = 5, \quad y = 5.

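Step 5 — the optimal value. f(5, 5) = 5 \cdot 5 = 25, with multiplier \lambda = 5. This is a maximum rather than a minimum: on the line, f = x(10 - x) is a downward-opening parabola with its vertex at x = 5. Among all ways to split 10 into two parts, the product is largest when the parts are equal, so the constrained maximum is 25.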
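
As a sanity check on the algebra, the constraint can simply be walked point by point. This is a brute-force sketch, not part of the method: the grid spacing of 0.01 and the restriction to 0 <= x <= 10 (non-negative side lengths, in the rectangle reading) are assumptions made for the check.

```python
def f(x, y):
    return x * y

# Walk along the constraint x + y = 10 by setting y = 10 - x.
xs = [i / 100 for i in range(0, 1001)]   # x from 0.00 to 10.00
best_x = max(xs, key=lambda x: f(x, 10 - x))
best_y = 10 - best_x
print(best_x, best_y, f(best_x, best_y))   # prints 5.0 5.0 25.0
```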

The multiplier \lambda looks like algebraic scaffolding to be discarded, but it carries real meaning. It measures the rate at which the optimal value of f changes as the constraint is relaxed: to first order, the gain from one extra unit of c. Writing the optimal value f^{*} as a function of the constraint level c,

\frac{d f^{*}}{d c} = \lambda.
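
One way to see why, sketched under the assumption that the optimal point (x^{*}(c), y^{*}(c)) varies differentiably with c (the starred symbols are introduced here only for this argument): differentiate f^{*}(c) = f(x^{*}(c), y^{*}(c)) by the chain rule, substitute f_x = \lambda g_x and f_y = \lambda g_y, and note that differentiating the constraint g(x^{*}(c), y^{*}(c)) = c with respect to c gives 1.

\frac{d f^{*}}{d c} = f_x \frac{d x^{*}}{d c} + f_y \frac{d y^{*}}{d c} = \lambda \left( g_x \frac{d x^{*}}{d c} + g_y \frac{d y^{*}}{d c} \right) = \lambda\, \frac{d}{d c}\, g\bigl(x^{*}(c),\, y^{*}(c)\bigr) = \lambda \cdot 1 = \lambda.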

In our example \lambda = 5: nudging the budget from x + y = 10 to 11 would lift the maximum product by about 5. Indeed the new optimum is 5.5^2 = 30.25, up 5.25, and 5 is the first-order estimate. To an economist this is a shadow price: the marginal value of one more unit of the scarce resource. It tells a firm the most it should be willing to pay, per unit and at the margin, to loosen a binding constraint: the sensitivity of the prize to the fence.
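
The same comparison can be made numerically. The sketch below relies on the closed form f^{*}(c) = (c/2)^2, which holds for this example only (x = y = c/2); the function name best_product and the step size h are assumptions for illustration.

```python
def best_product(c):
    # Closed form for this example only: x = y = c / 2, so f* = (c / 2) ** 2.
    return (c / 2) ** 2

c = 10.0
lam = c / 2        # multiplier from the equations y = lam, x = lam
h = 1e-6

# Central-difference estimate of d f*/d c at c = 10.
slope = (best_product(c + h) - best_product(c - h)) / (2 * h)

# Actual change when the constraint level moves a full unit, from 10 to 11.
jump = best_product(11.0) - best_product(10.0)

print("lambda:", lam)
print("numerical d f*/d c:", round(slope, 6))
print("full-unit change:", jump)
```

The derivative matches \lambda, while the full-unit change is slightly larger because f^{*} is curved in c.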

Slide to the tangency

The straight line is the constraint x + y = 10; the curved arcs are level curves of f = xy (hyperbolae, one per value of the product). Slide the point along the constraint and watch the contour through it: almost everywhere the contour cuts across the line, meaning you could slide to a higher contour. At x = y = 5 the contour is tangent to the line, the gradient \nabla f = (y, x) = (5, 5) lines up with \nabla g = (1, 1), and the product f = xy peaks at 25.
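
The sliding can be imitated in a few lines. In this sketch the function name misalignment and the sample points are assumptions; it reports the product and the 2D cross product of \nabla f and \nabla g at several points on the line, which is zero only where the gradients are parallel.

```python
def misalignment(x):
    # On the line x + y = 10 we have y = 10 - x.
    y = 10 - x
    # grad f = (y, x), grad g = (1, 1); their 2D cross product is y*1 - x*1.
    return y * 1 - x * 1

for x in [2, 3, 4, 5, 6, 7, 8]:
    print("x =", x, " product =", x * (10 - x), " cross =", misalignment(x))
```

The cross product changes sign at x = 5, exactly where the product stops rising and starts falling.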