What Is Optimization?

Every time you pick the shortest queue, pack a suitcase to fit, or leave home at the last safe moment, you are optimizing — searching a space of choices for the one that scores best. Airlines route planes to burn the least fuel; a phone battery charges as fast as heat allows; a portfolio chases the most return for a fixed risk. Optimization is the branch of mathematics that makes "the best choice" precise and then finds it.

Three ingredients turn a vague wish into a solvable problem:

Decision variables \mathbf{x} = (x_1, \dots, x_n) — the knobs you get to turn.
an objective function f(\mathbf{x}) — the single number you want as small (or as large) as possible.
constraints — the rules the choice must obey, carving out the feasible region of allowed \mathbf{x}.

Written out, the whole subject fits in one template — the standard form:

\min_{\mathbf{x}} \; f(\mathbf{x}) \quad\text{subject to}\quad g_i(\mathbf{x}) \le 0,\quad h_j(\mathbf{x}) = 0.

The g_i are inequality constraints (budgets, capacities, "at least this much"), the h_j are equality constraints (something must balance exactly). The set of \mathbf{x} satisfying every constraint is the feasible region \mathcal{F}; a point inside it is feasible, a point outside is infeasible. Optimization asks for the feasible point with the best objective value.

Minimize or maximize? Same problem

By convention we write everything as a minimization. That costs nothing, because maximizing f is exactly minimizing -f:

\max_{\mathbf{x}\in\mathcal{F}} f(\mathbf{x}) = -\min_{\mathbf{x}\in\mathcal{F}} \bigl(-f(\mathbf{x})\bigr).

Flip the sign of the objective, solve the minimization, and flip the sign of the answer back. The location \mathbf{x}^\star of the optimum is identical either way — only the reported value changes sign. So a solver only ever needs to know how to go downhill.

Likewise, a "\ge" constraint becomes a "\le" one by negating it: g(\mathbf{x}) \ge 0 is the same rule as -g(\mathbf{x}) \le 0. Every optimization problem can therefore be squeezed into the one standard form above — which is why a single theory covers such a sprawling zoo of applications.

Local versus global — the central difficulty

A point \mathbf{x}^\star is a local minimum if it beats every feasible neighbour close by; it is a global minimum if it beats every feasible point everywhere. A landscape can have many local minima but only the deepest is global. Standing in a valley, you cannot tell by looking at your feet whether a deeper valley lies over the next ridge — and that, in one sentence, is why optimization is hard.

The picture below is an objective f(x) = \sin x + 0.15\,x restricted to a feasible interval. It dips into several local minima, each a little higher than the last because of the upward drift. Reveal the steps: the feasible interval, then the local minima, then the single global minimum among them. A greedy searcher that stopped at the first dip it found would still be wrong about the whole picture.

A worked set-up: the fence problem

A farmer has 100 m of fencing and wants to enclose the largest possible rectangular paddock against a straight river (so one side needs no fence). Turn the wish into standard form.

Variables. Let the two sides perpendicular to the river each be x and the side parallel be y.

Objective. Maximize the area A = xy — equivalently, minimize -xy.

Constraints. The fence used is 2x + y = 100 (an equality), and lengths are non-negative: x \ge 0, y \ge 0 (inequalities).

Solve. The equality lets us eliminate y = 100 - 2x, so A(x) = x(100 - 2x) = 100x - 2x^2. Then A'(x) = 100 - 4x = 0 \Rightarrow x = 25, giving y = 50 and area A = 1250 m². The feasible region for x is the interval [0, 50]; the optimum sits in its interior, and the boundary points x = 0 or x = 50 give zero area. This is the whole optimization workflow in miniature: name the variables, write the objective, list the constraints, then hunt the feasible region for the best point.

In linear programming, integer programming, dynamic programming, the word "programming" predates the electronic computer. It comes from military logistics of the 1940s, where a program meant a schedule or plan of activities — how many tons of supplies to move where and when. George Dantzig, who invented the simplex method in 1947, was literally optimizing programs in that older sense: timetables and deployment plans for the U.S. Air Force. The name stuck long after the meaning drifted, so "mathematical programming" is just the historical name for optimization.

A local minimum is not automatically global. Finding a point where the objective can't be improved locally tells you nothing about far-off valleys. Only special structure — most importantly convexity — promotes "local" to "global".
The optimum can hide on the boundary. Setting \nabla f = \mathbf{0} only finds interior stationary points. On a constrained region the best feasible point may sit on an edge or a corner where the gradient does not vanish at all — you must check the boundary too.
Mind the sign when you flip max to min. Minimizing -f finds the same location as maximizing f, but its reported value is negated. Forget to flip it back and your "maximum profit" comes out negative.