A relational database splits its data into linked tables. But which columns go in which table? Get that wrong and the same problems that ruin a flat file creep back in — the same fact stored in a dozen places, quietly drifting out of step. Normalisation is the disciplined process that decides, step by step, exactly how to lay the tables out so that every fact lives in exactly one place.
The goal is to remove redundancy (repeated data) and, with it, the anomalies that redundancy causes when you try to update, insert or delete rows. We do this in named stages — First (1NF), Second (2NF) and Third Normal Form (3NF) — each one tightening the rules a little further. This page works one messy table all the way through, and by the end you will be able to normalise a table yourself and say why each split was made.
Imagine a small shop keeps every order on a single paper-style table. Each order can contain several products, so the item details are crammed into one cell — a repeating group:
| OrderID | OrderDate | CustID | CustName | CustCity | Items (ProductID, Name, Price, Qty) |
|---|---|---|---|---|---|
| 1001 | 02 Jun | C7 | Ade | Leeds | P3 Mug £4 ×2 · P8 Pen £1 ×5 |
| 1002 | 02 Jun | C7 | Ade | Leeds | P3 Mug £4 ×1 |
| 1003 | 03 Jun | C9 | Bea | Hull | P8 Pen £1 ×3 · P5 Pad £2 ×1 |
It looks compact, but it is a trap. Customer Ade and city Leeds are copied onto every order Ade makes; “Mug £4” is repeated wherever a mug is bought; and the Items cell holds a whole list, which a database cannot search or total properly. This repetition is data redundancy, and it breeds three specific faults.
When facts are duplicated, ordinary operations start to misbehave. These are the anomalies that normalisation exists to prevent:
All three have the same root cause: a fact is tangled up with other, unrelated facts. Normalisation untangles them so each fact stands on its own, stored once. Let's do it stage by stage.
A table is in 1NF when:
We fix our table by giving each item its own row. Now “P3 Mug £4 ×2” becomes
a proper row. Because one order spans several rows, no single column is unique any more — but the
combination of OrderID and ProductID is. That pair is a
composite primary key (underlined below).
| OrderID | ProductID | OrderDate | CustID | CustName | CustCity | ProdName | Price | Qty |
|---|---|---|---|---|---|---|---|---|
| 1001 | P3 | 02 Jun | C7 | Ade | Leeds | Mug | £4 | 2 |
| 1001 | P8 | 02 Jun | C7 | Ade | Leeds | Pen | £1 | 5 |
| 1002 | P3 | 02 Jun | C7 | Ade | Leeds | Mug | £4 | 1 |
| 1003 | P8 | 03 Jun | C9 | Bea | Hull | Pen | £1 | 3 |
| 1003 | P5 | 03 Jun | C9 | Bea | Hull | Pad | £2 | 1 |
The lists are gone and every cell is atomic — this is valid 1NF. But look how much is now repeated down the rows: the whole order and customer block is copied for every item on the order, and each product's name and price reappears wherever it is bought. 1NF is only the beginning.
2NF attacks redundancy that comes from a composite key. A table is in 2NF when it is in 1NF and has no partial dependencies — that is, no non-key column depends on only part of the key. Every non-key column must depend on the whole key.
Our key is (OrderID, ProductID). Ask of each other column: what does it really
depend on?
OrderDate, CustID, CustName, CustCity
depend on OrderID alone — the same for every item on the order. (Partial.)ProdName, Price depend on ProductID alone — the same
wherever that product appears. (Partial.)Qty depends on the whole key — how many of that
product on that order.We remove each partial dependency into its own table, keyed by the part it depends on. One table becomes three:
| OrderID | OrderDate | CustID | CustName | CustCity |
|---|---|---|---|---|
| 1001 | 02 Jun | C7 | Ade | Leeds |
| 1002 | 02 Jun | C7 | Ade | Leeds |
| 1003 | 03 Jun | C9 | Bea | Hull |
| ProductID | ProdName | Price |
|---|---|---|
| P3 | Mug | £4 |
| P8 | Pen | £1 |
| P5 | Pad | £2 |
| OrderID | ProductID | Qty |
|---|---|---|
| 1001 | P3 | 2 |
| 1001 | P8 | 5 |
| 1002 | P3 | 1 |
| 1003 | P8 | 3 |
| 1003 | P5 | 1 |
Each product's name and price is now stored once (green). The
OrderLines table keeps only the fact that genuinely needs both parts of the key —
the quantity — and uses OrderID and ProductID as
foreign keys back to the other two tables. Notice that this also cures the
insertion and deletion anomalies: a brand-new product can now go straight into
Products even if no one has bought it, and deleting an order can never erase a
product.
One table is still hiding redundancy. Look at Orders: Ade and Leeds are
still repeated on orders 1001 and 1002. Why? Because those columns don't really depend on
the order at all — they depend on the customer.
A table is in 3NF when it is in 2NF and has no transitive
dependencies: no non-key column depends on another non-key column rather than
directly on the key. Here OrderID → CustID → CustName: the customer's name depends on
the key only indirectly, by way of CustID. That indirect chain is a
transitive dependency.
We break it by lifting the customer details out into their own table, leaving
CustID behind as a foreign key:
| OrderID | OrderDate | CustID |
|---|---|---|
| 1001 | 02 Jun | C7 |
| 1002 | 02 Jun | C7 |
| 1003 | 03 Jun | C9 |
| CustID | CustName | CustCity |
|---|---|---|
| C7 | Ade | Leeds |
| C9 | Bea | Hull |
Ade and Leeds are now written once. Together with the earlier split, the whole
design has reached 3NF as four clean tables — Customers,
Orders, Products and OrderLines — each about
one kind of thing, linked by keys. Step through the finished schema:
| Form | Rule added | Removes |
|---|---|---|
| 1NF | Atomic values, no repeating groups, a primary key | Lists / repeating groups in a cell |
| 2NF | 1NF + no partial dependency on part of a composite key | Redundancy from a composite key |
| 3NF | 2NF + no transitive dependency (non-key → non-key) | Non-key fields hiding facts about something else |
A neat sanity check: in a fully 3NF table, every non-key attribute is a fact about the primary key, and nothing else. If a column is really a fact about something other than the key of its table, it is in the wrong table.
There is a famous mnemonic that captures 2NF and 3NF together. A table is in third normal form when every non-key attribute depends on:
“the key, the whole key, and nothing but the key.”
But do not over-normalise blindly! Normalisation is a trade-off. We gained rock-solid data integrity and zero redundancy — at the cost of more tables, which means answering a question like “what did Ade order and where does she live?” now needs a join across three or four tables instead of reading one row. More integrity, more query complexity. For almost all systems that trade is well worth it, which is why 3NF is the normal target — but it is a genuine engineering decision, not an automatic reflex.
Normalisation was invented by Edgar F. “Ted” Codd at IBM in 1970, in the paper that launched relational databases. He called the stages “normal forms” — borrowing a term from mathematics for a canonical, tidied-up standard shape — and defined 1NF, 2NF and 3NF one after another, each a stricter refinement of the last. Later he and others added even stricter forms (BCNF, 4NF, 5NF) for rare tricky cases, but for A-level and for the vast majority of real databases, reaching 3NF is the practical goal.