Database Normalisation

A relational database splits its data into linked tables. But which columns go in which table? Get that wrong and the same problems that ruin a flat file creep back in — the same fact stored in a dozen places, quietly drifting out of step. Normalisation is the disciplined process that decides, step by step, exactly how to lay the tables out so that every fact lives in exactly one place.

The goal is to remove redundancy (repeated data) and, with it, the anomalies that redundancy causes when you try to update, insert or delete rows. We do this in named stages — First (1NF), Second (2NF) and Third Normal Form (3NF) — each one tightening the rules a little further. This page works one messy table all the way through, and by the end you will be able to normalise a table yourself and say why each split was made.

The problem: one bloated table

Imagine a small shop keeps every order on a single paper-style table. Each order can contain several products, so the item details are crammed into one cell — a repeating group:

Orders (unnormalised — UNF)
OrderID	OrderDate	CustID	CustName	CustCity	Items (ProductID, Name, Price, Qty)
1001	02 Jun	C7	Ade	Leeds	P3 Mug £4 ×2 · P8 Pen £1 ×5
1002	02 Jun	C7	Ade	Leeds	P3 Mug £4 ×1
1003	03 Jun	C9	Bea	Hull	P8 Pen £1 ×3 · P5 Pad £2 ×1

It looks compact, but it is a trap. Customer Ade and city Leeds are copied onto every order Ade makes; “Mug £4” is repeated wherever a mug is bought; and the Items cell holds a whole list, which a database cannot search or total properly. This repetition is data redundancy, and it breeds three specific faults.

Why redundancy hurts: the three anomalies

When facts are duplicated, ordinary operations start to misbehave. These are the anomalies that normalisation exists to prevent:

Update anomaly. If Ade moves to Manchester, her city is written on many rows. Change some and miss others and the database now claims Ade lives in two cities at once — it disagrees with itself.
Insertion anomaly. You want to add a new product to the catalogue that nobody has ordered yet. In this table there is nowhere to put it — a product only exists as part of an order, so you are blocked from recording a fact you clearly know.
Deletion anomaly. If order 1003 is the only order containing the Pad (P5) and you delete that order, the very existence of the Pad — its name and price — vanishes with it. You lost a fact you never meant to throw away.

All three have the same root cause: a fact is tangled up with other, unrelated facts. Normalisation untangles them so each fact stands on its own, stored once. Let's do it stage by stage.

Stage 1 — First Normal Form (1NF)

A table is in 1NF when:

every value is atomic — a single value per cell, not a list;
there are no repeating groups — no “Item 1, Item 2, Item 3…” columns and no lists crammed into one cell;
there is a primary key that uniquely identifies each row.

We fix our table by giving each item its own row. Now “P3 Mug £4 ×2” becomes a proper row. Because one order spans several rows, no single column is unique any more — but the combination of OrderID and ProductID is. That pair is a composite primary key (underlined below).

Orders (1NF) — key = (OrderID, ProductID)
OrderID	ProductID	OrderDate	CustID	CustName	CustCity	ProdName	Price	Qty
1001	P3	02 Jun	C7	Ade	Leeds	Mug	£4	2
1001	P8	02 Jun	C7	Ade	Leeds	Pen	£1	5
1002	P3	02 Jun	C7	Ade	Leeds	Mug	£4	1
1003	P8	03 Jun	C9	Bea	Hull	Pen	£1	3
1003	P5	03 Jun	C9	Bea	Hull	Pad	£2	1

The lists are gone and every cell is atomic — this is valid 1NF. But look how much is now repeated down the rows: the whole order and customer block is copied for every item on the order, and each product's name and price reappears wherever it is bought. 1NF is only the beginning.

Stage 2 — Second Normal Form (2NF)

2NF attacks redundancy that comes from a composite key. A table is in 2NF when it is in 1NF and has no partial dependencies — that is, no non-key column depends on only part of the key. Every non-key column must depend on the whole key.

Our key is (OrderID, ProductID). Ask of each other column: what does it really depend on?

OrderDate, CustID, CustName, CustCity depend on OrderID alone — the same for every item on the order. (Partial.)
ProdName, Price depend on ProductID alone — the same wherever that product appears. (Partial.)
Only Qty depends on the whole key — how many of that product on that order.

We remove each partial dependency into its own table, keyed by the part it depends on. One table becomes three:

Orders
OrderID	OrderDate	CustID	CustName	CustCity
1001	02 Jun	C7	Ade	Leeds
1002	02 Jun	C7	Ade	Leeds
1003	03 Jun	C9	Bea	Hull

Products
ProductID	ProdName	Price
P3	Mug	£4
P8	Pen	£1
P5	Pad	£2

OrderLines
OrderID	ProductID	Qty
1001	P3	2
1001	P8	5
1002	P3	1
1003	P8	3
1003	P5	1

Each product's name and price is now stored once (green). The OrderLines table keeps only the fact that genuinely needs both parts of the key — the quantity — and uses OrderID and ProductID as foreign keys back to the other two tables. Notice that this also cures the insertion and deletion anomalies: a brand-new product can now go straight into Products even if no one has bought it, and deleting an order can never erase a product.

Stage 3 — Third Normal Form (3NF)

One table is still hiding redundancy. Look at Orders: Ade and Leeds are still repeated on orders 1001 and 1002. Why? Because those columns don't really depend on the order at all — they depend on the customer.

A table is in 3NF when it is in 2NF and has no transitive dependencies: no non-key column depends on another non-key column rather than directly on the key. Here OrderID → CustID → CustName: the customer's name depends on the key only indirectly, by way of CustID. That indirect chain is a transitive dependency.

We break it by lifting the customer details out into their own table, leaving CustID behind as a foreign key:

Orders (3NF)
OrderID	OrderDate	CustID
1001	02 Jun	C7
1002	02 Jun	C7
1003	03 Jun	C9

Customers
CustID	CustName	CustCity
C7	Ade	Leeds
C9	Bea	Hull

Ade and Leeds are now written once. Together with the earlier split, the whole design has reached 3NF as four clean tables — Customers, Orders, Products and OrderLines — each about one kind of thing, linked by keys. Step through the finished schema:

The whole journey at a glance

What each normal form removes
Form	Rule added	Removes
1NF	Atomic values, no repeating groups, a primary key	Lists / repeating groups in a cell
2NF	1NF + no partial dependency on part of a composite key	Redundancy from a composite key
3NF	2NF + no transitive dependency (non-key → non-key)	Non-key fields hiding facts about something else

A neat sanity check: in a fully 3NF table, every non-key attribute is a fact about the primary key, and nothing else. If a column is really a fact about something other than the key of its table, it is in the wrong table.

There is a famous mnemonic that captures 2NF and 3NF together. A table is in third normal form when every non-key attribute depends on:

“the key, the whole key, and nothing but the key.”

The key — every non-key field must depend on the primary key (that's just being a proper table).
The whole key — not just part of a composite key (that's 2NF, killing partial dependencies).
Nothing but the key — not on some other non-key field (that's 3NF, killing transitive dependencies).

But do not over-normalise blindly! Normalisation is a trade-off. We gained rock-solid data integrity and zero redundancy — at the cost of more tables, which means answering a question like “what did Ade order and where does she live?” now needs a join across three or four tables instead of reading one row. More integrity, more query complexity. For almost all systems that trade is well worth it, which is why 3NF is the normal target — but it is a genuine engineering decision, not an automatic reflex.

Normalisation was invented by Edgar F. “Ted” Codd at IBM in 1970, in the paper that launched relational databases. He called the stages “normal forms” — borrowing a term from mathematics for a canonical, tidied-up standard shape — and defined 1NF, 2NF and 3NF one after another, each a stricter refinement of the last. Later he and others added even stricter forms (BCNF, 4NF, 5NF) for rare tricky cases, but for A-level and for the vast majority of real databases, reaching 3NF is the practical goal.