The KKT Conditions

Lagrange multipliers handle optimization with equality constraints: at the optimum the objective's gradient lines up with the constraint's gradient, \nabla f = \lambda \nabla h. But real problems are full of inequalities — budgets you may or may not exhaust, capacities you may or may not hit. The Karush–Kuhn–Tucker (KKT) conditions are the grand generalization: the single set of equations that a constrained optimum must satisfy, inequalities and all. They are the bedrock on which nearly every constrained solver is built.

Set the problem in standard form — minimize f subject to inequality constraints g_i(\mathbf{x}) \le 0 and equalities h_j(\mathbf{x}) = 0 — and bundle everything into one Lagrangian, attaching a multiplier to each constraint:

\mathcal{L}(\mathbf{x}, \boldsymbol{\mu}, \boldsymbol{\lambda}) = f(\mathbf{x}) + \sum_i \mu_i\, g_i(\mathbf{x}) + \sum_j \lambda_j\, h_j(\mathbf{x}).

The \mu_i \ge 0 price the inequalities and the \lambda_j (any sign) price the equalities.

The four conditions

At a constrained local minimum \mathbf{x}^\star (under a mild constraint qualification), there exist multipliers \boldsymbol{\mu}, \boldsymbol{\lambda} such that:

Stationarity — the Lagrangian's gradient vanishes: \nabla f + \sum_i \mu_i \nabla g_i + \sum_j \lambda_j \nabla h_j = \mathbf{0}.
Primal feasibility — the point obeys the constraints: g_i(\mathbf{x}^\star) \le 0 and h_j(\mathbf{x}^\star) = 0.
Dual feasibility — the inequality multipliers are non-negative: \mu_i \ge 0.
Complementary slackness — \mu_i\, g_i(\mathbf{x}^\star) = 0 for every i.

The fourth is the clever one. It says that for each inequality, at least one of the pair \mu_i and g_i is zero:

If a constraint is inactive (g_i < 0, strictly inside), it isn't holding you back, so its multiplier \mu_i = 0 — it drops out of stationarity entirely.
If a constraint is active (g_i = 0, pressed against the boundary), its multiplier may be positive and it behaves just like a Lagrange equality constraint.

So complementary slackness quietly decides which constraints matter, turning a messy inequality problem into an equality problem on the active set.

A worked KKT solve

Minimize the squared distance to the point (2, 2) subject to a budget x + y \le 2:

\min\; f(x, y) = (x - 2)^2 + (y - 2)^2 \quad\text{s.t.}\quad g(x, y) = x + y - 2 \le 0.

Step 1 — try the unconstrained minimum. That is (2, 2), but 2 + 2 = 4 > 2 violates the budget. So the constraint must be active: x + y = 2.

Step 2 — write stationarity. With \nabla f = (2(x-2),\, 2(y-2)) and \nabla g = (1, 1):

2(x - 2) + \mu = 0, \qquad 2(y - 2) + \mu = 0.

Step 3 — solve. Subtracting the two gives x = y; with x + y = 2 that forces x = y = 1. Back-substitute: \mu = -2(1 - 2) = 2.

Step 4 — check every KKT condition. Primal feasibility: 1 + 1 - 2 = 0 \le 0 ✓. Dual feasibility: \mu = 2 \ge 0 ✓. Complementary slackness: \mu \cdot g = 2 \cdot 0 = 0 ✓. All satisfied, so (1, 1) is the constrained minimum, at squared distance (1-2)^2 + (1-2)^2 = 2.

Geometrically the answer is a tangency: the smallest circle centred at (2, 2) that still touches the feasible half-plane grazes the boundary line at exactly one point, and there its gradient \nabla f is antiparallel to the constraint's \nabla g. Slide the point along the boundary: the circle's radius (the objective) bottoms out precisely at the tangent point.

When KKT is the whole answer

In general the KKT conditions are necessary: any constrained optimum (meeting a constraint qualification) satisfies them — so they generate the candidates. They become sufficient, pinning down a genuine global minimum, exactly when the problem is convex: a convex objective over a convex feasible set. For a convex program, a KKT point is the global optimum — no further checking required. This is why so much effort goes into casting problems as convex: it turns "solve the KKT equations" into a complete method.

For decades these were the "Kuhn–Tucker conditions", after Harold Kuhn and Albert Tucker's 1951 paper. Only later did scholars notice that William Karush had derived the very same conditions in his 1939 master's thesis at the University of Chicago — twelve years earlier and entirely unpublished beyond the department. Karush had moved into other work and never pressed the claim; Kuhn himself, on learning of it, generously insisted the credit be shared. Today the third initial is firmly restored, a small act of mathematical justice — and a reminder that a great idea can sit unnoticed in a library for a decade.

The sign of \mu matters — a lot. For a minimization with constraints written as g \le 0, the multiplier must be \mu \ge 0. Equality multipliers \lambda are free in sign. Mix up the convention (max vs min, \le vs \ge) and a genuine optimum can look like it fails dual feasibility.
Complementary slackness is a case split, not one equation. \mu_i g_i = 0 means each constraint is either active (g_i = 0) or free (\mu_i = 0). You typically solve by guessing the active set, solving the resulting equalities, then checking the guess.
KKT is only necessary without convexity. A non-convex problem can have KKT points that are saddles or mere local optima. And KKT needs a constraint qualification (e.g. LICQ) — at a badly-behaved boundary corner the multipliers may fail to exist at all.