The KKT Conditions

Lagrange multipliers handle optimization with equality constraints: at the optimum the objective's gradient lines up with the constraint's gradient, \nabla f = \lambda \nabla h. But real problems are full of inequalities — budgets you may or may not exhaust, capacities you may or may not hit. The Karush–Kuhn–Tucker (KKT) conditions are the grand generalization: the single set of equations that a constrained optimum must satisfy, inequalities and all. They are the bedrock on which nearly every constrained solver is built.

Set the problem in standard form — minimize f subject to inequality constraints g_i(\mathbf{x}) \le 0 and equalities h_j(\mathbf{x}) = 0 — and bundle everything into one Lagrangian, attaching a multiplier to each constraint:

\mathcal{L}(\mathbf{x}, \boldsymbol{\mu}, \boldsymbol{\lambda}) = f(\mathbf{x}) + \sum_i \mu_i\, g_i(\mathbf{x}) + \sum_j \lambda_j\, h_j(\mathbf{x}).

The \mu_i \ge 0 price the inequalities and the \lambda_j (any sign) price the equalities.

The four conditions

At a constrained local minimum \mathbf{x}^\star (under a mild constraint qualification), there exist multipliers \boldsymbol{\mu}, \boldsymbol{\lambda} such that:

The fourth is the clever one. It says that for each inequality, at least one of the pair \mu_i and g_i is zero:

So complementary slackness quietly decides which constraints matter, turning a messy inequality problem into an equality problem on the active set.

A worked KKT solve

Minimize the squared distance to the point (2, 2) subject to a budget x + y \le 2:

\min\; f(x, y) = (x - 2)^2 + (y - 2)^2 \quad\text{s.t.}\quad g(x, y) = x + y - 2 \le 0.

Step 1 — try the unconstrained minimum. That is (2, 2), but 2 + 2 = 4 > 2 violates the budget. So the constraint must be active: x + y = 2.

Step 2 — write stationarity. With \nabla f = (2(x-2),\, 2(y-2)) and \nabla g = (1, 1):

2(x - 2) + \mu = 0, \qquad 2(y - 2) + \mu = 0.

Step 3 — solve. Subtracting the two gives x = y; with x + y = 2 that forces x = y = 1. Back-substitute: \mu = -2(1 - 2) = 2.

Step 4 — check every KKT condition. Primal feasibility: 1 + 1 - 2 = 0 \le 0 ✓. Dual feasibility: \mu = 2 \ge 0 ✓. Complementary slackness: \mu \cdot g = 2 \cdot 0 = 0 ✓. All satisfied, so (1, 1) is the constrained minimum, at squared distance (1-2)^2 + (1-2)^2 = 2.

Geometrically the answer is a tangency: the smallest circle centred at (2, 2) that still touches the feasible half-plane grazes the boundary line at exactly one point, and there its gradient \nabla f is antiparallel to the constraint's \nabla g. Slide the point along the boundary: the circle's radius (the objective) bottoms out precisely at the tangent point.

When KKT is the whole answer

In general the KKT conditions are necessary: any constrained optimum (meeting a constraint qualification) satisfies them — so they generate the candidates. They become sufficient, pinning down a genuine global minimum, exactly when the problem is convex: a convex objective over a convex feasible set. For a convex program, a KKT point is the global optimum — no further checking required. This is why so much effort goes into casting problems as convex: it turns "solve the KKT equations" into a complete method.

For decades these were the "Kuhn–Tucker conditions", after Harold Kuhn and Albert Tucker's 1951 paper. Only later did scholars notice that William Karush had derived the very same conditions in his 1939 master's thesis at the University of Chicago — twelve years earlier and entirely unpublished beyond the department. Karush had moved into other work and never pressed the claim; Kuhn himself, on learning of it, generously insisted the credit be shared. Today the third initial is firmly restored, a small act of mathematical justice — and a reminder that a great idea can sit unnoticed in a library for a decade.