The KKT Conditions
Lagrange multipliers
handle optimization with equality constraints: at the optimum the objective's gradient lines
up with the constraint's gradient, \nabla f = \lambda \nabla h. But real
problems are full of inequalities — budgets you may or may not exhaust, capacities you may or
may not hit. The Karush–Kuhn–Tucker (KKT) conditions are the grand generalization:
the single set of equations that a constrained optimum must satisfy, inequalities and all. They are the
bedrock on which nearly every constrained solver is built.
Set the problem in standard form — minimize f subject to inequality
constraints g_i(\mathbf{x}) \le 0 and equalities
h_j(\mathbf{x}) = 0 — and bundle everything into one
Lagrangian, attaching a multiplier to each constraint:
\mathcal{L}(\mathbf{x}, \boldsymbol{\mu}, \boldsymbol{\lambda})
= f(\mathbf{x}) + \sum_i \mu_i\, g_i(\mathbf{x}) + \sum_j \lambda_j\, h_j(\mathbf{x}).
The \mu_i \ge 0 price the inequalities and the
\lambda_j (any sign) price the equalities.
The four conditions
At a constrained local minimum \mathbf{x}^\star (under a mild constraint
qualification), there exist multipliers \boldsymbol{\mu}, \boldsymbol{\lambda}
such that:
- Stationarity — the Lagrangian's gradient vanishes:
\nabla f + \sum_i \mu_i \nabla g_i + \sum_j \lambda_j \nabla h_j = \mathbf{0}.
- Primal feasibility — the point obeys the constraints:
g_i(\mathbf{x}^\star) \le 0 and
h_j(\mathbf{x}^\star) = 0.
- Dual feasibility — the inequality multipliers are non-negative:
\mu_i \ge 0.
- Complementary slackness —
\mu_i\, g_i(\mathbf{x}^\star) = 0 for every
i.
The fourth is the clever one. It says that for each inequality, at least one of the pair
\mu_i and g_i is zero:
- If a constraint is inactive (g_i < 0, strictly inside),
it isn't holding you back, so its multiplier \mu_i = 0 — it drops out of
stationarity entirely.
- If a constraint is active (g_i = 0, pressed against the
boundary), its multiplier may be positive and it behaves just like a Lagrange equality constraint.
So complementary slackness quietly decides which constraints matter, turning a messy
inequality problem into an equality problem on the active set.
A worked KKT solve
Minimize the squared distance to the point (2, 2) subject to a budget
x + y \le 2:
\min\; f(x, y) = (x - 2)^2 + (y - 2)^2 \quad\text{s.t.}\quad g(x, y) = x + y - 2 \le 0.
Step 1 — try the unconstrained minimum. That is (2, 2), but
2 + 2 = 4 > 2 violates the budget. So the constraint must be
active: x + y = 2.
Step 2 — write stationarity. With
\nabla f = (2(x-2),\, 2(y-2)) and
\nabla g = (1, 1):
2(x - 2) + \mu = 0, \qquad 2(y - 2) + \mu = 0.
Step 3 — solve. Subtracting the two gives x = y; with
x + y = 2 that forces x = y = 1. Back-substitute:
\mu = -2(1 - 2) = 2.
Step 4 — check every KKT condition. Primal feasibility:
1 + 1 - 2 = 0 \le 0 ✓. Dual feasibility:
\mu = 2 \ge 0 ✓. Complementary slackness:
\mu \cdot g = 2 \cdot 0 = 0 ✓. All satisfied, so
(1, 1) is the constrained minimum, at squared distance
(1-2)^2 + (1-2)^2 = 2.
Geometrically the answer is a tangency: the smallest circle centred at
(2, 2) that still touches the feasible half-plane grazes the boundary line
at exactly one point, and there its gradient \nabla f is antiparallel to
the constraint's \nabla g. Slide the point along the boundary: the circle's
radius (the objective) bottoms out precisely at the tangent point.
When KKT is the whole answer
In general the KKT conditions are necessary: any constrained optimum (meeting a
constraint qualification) satisfies them — so they generate the candidates. They become
sufficient, pinning down a genuine global minimum, exactly when the problem
is convex: a convex
objective over a convex feasible set. For a convex program, a KKT point is the global optimum
— no further checking required. This is why so much effort goes into casting problems as convex: it
turns "solve the KKT equations" into a complete method.
For decades these were the "Kuhn–Tucker conditions", after Harold Kuhn and Albert Tucker's 1951 paper.
Only later did scholars notice that William Karush had derived the very same conditions
in his 1939 master's thesis at the University of Chicago — twelve years earlier and entirely
unpublished beyond the department. Karush had moved into other work and never pressed the claim; Kuhn
himself, on learning of it, generously insisted the credit be shared. Today the third initial is firmly
restored, a small act of mathematical justice — and a reminder that a great idea can sit unnoticed in a
library for a decade.
-
The sign of \mu matters — a lot. For a
minimization with constraints written as g \le 0, the multiplier
must be \mu \ge 0. Equality multipliers \lambda
are free in sign. Mix up the convention (max vs min, \le vs
\ge) and a genuine optimum can look like it fails dual feasibility.
-
Complementary slackness is a case split, not one equation.
\mu_i g_i = 0 means each constraint is either active
(g_i = 0) or free (\mu_i = 0). You typically
solve by guessing the active set, solving the resulting equalities, then checking the guess.
-
KKT is only necessary without convexity. A non-convex problem can have KKT points
that are saddles or mere local optima. And KKT needs a constraint qualification (e.g. LICQ)
— at a badly-behaved boundary corner the multipliers may fail to exist at all.