Algorithm efficiency

Imagine two friends each looking up the name "Zara" in a thick paper phone book. Alice starts at page one and reads every single name in order until she reaches Zara near the very end. Ben opens the book in the middle, sees he's landed on "M", and throws away the whole first half in one go — then repeats, halving what's left each time. Both find Zara. Both are correct. But Ben finishes in a handful of flips while Alice is still turning pages.

This is the heart of algorithm efficiency: two algorithms can solve the very same problem and both give the right answer, yet one does dramatically less work than the other. Choosing the clever one isn't showing off — for a big enough phone book it's the difference between an answer now and an answer next week. This page is about how we measure "less work" in a fair way, and how to name the patterns we find.

Why we don't just use a stopwatch

The obvious idea is to time each algorithm and pick the faster one. But a stopwatch measures the computer, not the algorithm. Run Alice's method on a brand-new gaming PC and Ben's on a ten-year-old laptop and you might "prove" that reading every page is quicker — which tells you nothing useful. The same code is faster on a faster machine, slower when the computer is busy, and different again in another programming language.

So computer scientists ignore seconds and count steps instead — the number of basic operations the algorithm performs, such as comparisons or swaps. And we don't care about the count for one particular input; we ask a sharper question:

As the input size n gets bigger, how fast does the number of steps grow?

Here n is the "size" of the input — the number of names in the phone book, the length of a list, the number of items to sort. An algorithm whose work grows slowly as n grows will win on big inputs no matter how fast or slow the computer is. Growth is the thing that matters.

Searching: linear vs binary

Alice's method is linear search: check items one by one from the start. In the worst case — the item is last, or missing entirely — she checks all n of them. Double the list and she does double the work. We say linear search takes about n steps.

Ben's method is binary search: on a sorted list, jump to the middle and throw away half. Each step throws away half of what remains, so the number of steps is how many times you can halve n before reaching one item — and that is the logarithm, \log_2 n. A list of a thousand items needs only about 10 steps (2^{10} = 1024); a million items needs about 20. That's a staggering difference:

list size n linear search (~n) binary search (~log₂ n) 10 10 ~4 100 100 ~7 1,000 1,000 ~10 1,000,000 1,000,000 ~20

The catch (there's always a catch): binary search needs the list to be sorted first. Linear search works on any old jumble. Efficiency is often a trade — which is exactly why we compare.

Count the steps yourself

Talk is cheap — let's count. The code below builds a sorted list of 1000 numbers and searches it for a value near the end, once with linear search and once with binary search. Each function keeps a tally of how many comparisons it makes. Press Run and read the two counts.

Linear search grinds through hundreds of comparisons; binary search finishes in about ten. Try changing n to 100000 and run again: the binary-search count barely moves, while the linear count explodes. That gap is efficiency.

Sorting: bubble sort vs merge sort

The same story plays out when we sort a list. Bubble sort repeatedly walks the list swapping neighbours that are out of order. Because it makes roughly one pass per item, and each pass looks at roughly every item, its work grows like n \times n = n^2. Sort 10 items and it's about 100 steps; sort 1000 and it's about a million. Ten times the data, a hundred times the work.

Merge sort is cleverer: it splits the list in half, sorts each half, then merges them back together. All that splitting gives \log_2 n levels, and each level does about n work, so its total grows like n \log_2 n — enormously smaller than n^2 for a big list. Sorting a million items, bubble sort wants a trillion-ish steps while merge sort wants only about twenty million. One is instant; the other never finishes.

Giving the patterns a name: Big-O

We keep saying "about n", "about \log_2 n", "about n^2". Big-O notation is just a tidy way to say exactly that: it captures the growth rate of an algorithm and throws away the fussy details (the exact constant, the small terms) that don't matter once n is large. You read O(n) as "order n".

Here are the ones you'll meet most often, slowest-growing (best) first:

O(1) — constant. The work doesn't grow with n at all. E.g. reading the first item of a list. The dream.
O(\log n) — logarithmic. Grows very slowly; doubling n adds just one step. Binary search.
O(n) — linear. Work grows in step with the input; double the data, double the work. Linear search.
O(n \log n) — a little worse than linear, but still very good. The best general-purpose sorts, like merge sort.
O(n^2) — quadratic. Grows painfully fast; double the data and the work goes up fourfold. Bubble sort.

Big-O describes a trend, not an exact count — an O(n) algorithm that does 3n steps and one that does n steps are both just O(n), because as n rockets up, the shape of the growth is what decides the winner, not the "3".

The "O" is for order — as in "order of magnitude", the rough size of a thing. The notation dates back to a German number theorist, Paul Bachmann, in 1894, long before computers existed, and was popularised by Edmund Landau (so mathematicians sometimes call it a "Landau symbol"). Computer scientists borrowed it in the 1960s — Donald Knuth championed it — to talk about how algorithms scale. So this very modern-sounding idea is wearing a 130-year-old hat.

See the growth

This is why growth rate is everything. The chart plots the number of steps against the input size n for the five orders above. Watch how \log n barely lifts off the floor, n climbs steadily, and n^2 shoots off the top of the chart almost immediately — even though for tiny n on the far left they're all bunched together.

The take-away in one sentence: for small inputs the lines are close, but for large inputs the growth rate wins by a mile. Pick the algorithm with the flatter curve and you win as n grows.

A "slower-looking" algorithm can genuinely be the right choice for small inputs. Big-O only describes what happens as n gets large; it deliberately ignores the constants. For a list of five items, plain old bubble sort is often quicker in practice than a fancy O(n \log n) sort, because merge sort's splitting-and-merging overhead outweighs its cleverness until the list is big enough. Real sorting libraries even switch to a simple sort for tiny chunks. So O(n^2) beating O(n \log n) at n = 5 is not a contradiction — Big-O is a statement about big n.

A second habit worth building: we usually compare the worst case. Linear search might get lucky and find the target on the very first try (its "best case" is O(1)), but we plan for the case where it's last or missing, because that's what could bite us. Efficiency is about guarantees, not lucky days.