The Pumping Lemma

We already know that a finite-state machine has no real memory — only its current state. So it feels obvious that a language like "the same number of 0s then 1s" cannot be regular: the machine would have to count, and it has nowhere to keep the count. But "it feels obvious" is not a proof. How do you prove, rigorously, that no finite-state machine — however cleverly built, however many states — can recognise a language?

The pumping lemma is the tool for exactly this. It is the single most useful theorem for showing a language is not regular, and it works by turning the machine's finiteness against it with a beautifully simple counting argument: the pigeonhole principle.

The idea: a long walk must repeat a state

Suppose a finite-state machine has p states, and we feed it a string that is at least p symbols long. As it reads the string, it passes through a sequence of states — one before each symbol, so at least p + 1 states in all. But there are only p different states available. By the pigeonhole principle, some state must be visited twice.

The chunk of input read between those two visits forms a loop in the machine: it starts and ends in the very same state. And here is the crucial consequence — if a loop takes you from a state back to itself, you can go around it as many times as you like (or skip it entirely) and the machine cannot tell the difference. Every one of those altered strings ends in the same final state, so the machine accepts them all, or rejects them all.

Break the string into three pieces x\,y\,z: the walk up to the first visit (x), the loop (y), and the rest (z). Because y is a loop, every string x\,y^i\,zy repeated i times, for any i \ge 0 — is treated identically. "Pumping" y is repeating it. Step through the diagram to watch the loop grow.

The lemma, stated precisely

The counting argument packages up into a clean guarantee about every regular language:

Read it as a promise the language must keep. If a language is regular, it must let you pump every long string. So to prove a language is not regular, you show it breaks that promise — you find one long string that cannot possibly be pumped.

Worked example: \{\,0^{n}1^{n}\,\} is not regular

Let L = \{\,0^{n}1^{n} : n \ge 0\,\} — equal numbers of 0s then 1s. This is exactly the balanced-brackets language a finite-state machine choked on. We prove it is not regular by contradiction, using the lemma as our lever.

  1. Assume L is regular. Then it has some pumping length p.
  2. Pick a stubborn string. Choose s = 0^{p}1^{p}. It is in L, and |s| = 2p \ge p, so the lemma applies — it must split as s = xyz with the three conditions.
  3. Corner the loop. Since |xy| \le p and the first p symbols of s are all 0s, the loop y lies entirely inside the block of 0s. And |y| \ge 1, so y is one or more 0s.
  4. Pump and contradict. Pump up to i = 2: xy^{2}z has extra 0s but the same number of 1s. So it has more 0s than 1s and is not in L — yet the lemma guaranteed it would be. Contradiction.

The assumption was the only thing that could give. Therefore L is not regular. No finite-state machine, of any size, recognises equal-count strings — exactly the intuition, now nailed down.

You have a free choice of i — pick whichever value breaks the language most easily. For 0^{p}1^{p}, i = 0 works just as well: deleting y removes some 0s and again unbalances the counts. Pumping down to zero copies is often the slickest contradiction — one well-chosen i is all you need.

It is a game against an adversary

The trickiest thing about the lemma is the tangle of "for all" and "there exists". The cleanest way to keep them straight is to picture a two-player game — you are trying to prove the language is not regular, and a stubborn Adversary is trying to stop you:

Notice how the quantifiers alternate: for all p, there exists a string, for all splits, there exists an i. Winning the game is the proof.

The pumping lemma is a one-way tool. It says "regular \Rightarrow pumpable", so failing to pump proves not regular. But it does not say the reverse: some non-regular languages happen to be pumpable anyway. So you can never use the pumping lemma to prove a language is regular — passing the test proves nothing. To show a language is regular, build a finite-state machine or a regular expression for it.

A second classic slip: letting yourself choose the split xyz. You don't — the Adversary does, so your argument has to defeat every valid split at once. That is exactly why the |xy| \le p condition is such a gift: it pins the loop into the opening block of the string, killing off most of the Adversary's options for you.