Database Normalisation

A relational database splits its data into linked tables. But which columns go in which table? Get that wrong and the same problems that ruin a flat file creep back in — the same fact stored in a dozen places, quietly drifting out of step. Normalisation is the disciplined process that decides, step by step, exactly how to lay the tables out so that every fact lives in exactly one place.

The goal is to remove redundancy (repeated data) and, with it, the anomalies that redundancy causes when you try to update, insert or delete rows. We do this in named stages — First (1NF), Second (2NF) and Third Normal Form (3NF) — each one tightening the rules a little further. This page works one messy table all the way through, and by the end you will be able to normalise a table yourself and say why each split was made.

The problem: one bloated table

Imagine a small shop keeps every order on a single paper-style table. Each order can contain several products, so the item details are crammed into one cell — a repeating group:

Orders (unnormalised — UNF)
OrderIDOrderDateCustID CustNameCustCityItems (ProductID, Name, Price, Qty)
100102 JunC7 AdeLeeds P3 Mug £4 ×2 · P8 Pen £1 ×5
100202 JunC7 AdeLeeds P3 Mug £4 ×1
100303 JunC9 BeaHull P8 Pen £1 ×3 · P5 Pad £2 ×1

It looks compact, but it is a trap. Customer Ade and city Leeds are copied onto every order Ade makes; “Mug £4” is repeated wherever a mug is bought; and the Items cell holds a whole list, which a database cannot search or total properly. This repetition is data redundancy, and it breeds three specific faults.

Why redundancy hurts: the three anomalies

When facts are duplicated, ordinary operations start to misbehave. These are the anomalies that normalisation exists to prevent:

All three have the same root cause: a fact is tangled up with other, unrelated facts. Normalisation untangles them so each fact stands on its own, stored once. Let's do it stage by stage.

Stage 1 — First Normal Form (1NF)

A table is in 1NF when:

We fix our table by giving each item its own row. Now “P3 Mug £4 ×2” becomes a proper row. Because one order spans several rows, no single column is unique any more — but the combination of OrderID and ProductID is. That pair is a composite primary key (underlined below).

Orders (1NF) — key = (OrderID, ProductID)
OrderIDProductIDOrderDate CustIDCustNameCustCity ProdNamePriceQty
1001P302 JunC7AdeLeedsMug£42
1001P802 JunC7AdeLeedsPen£15
1002P302 JunC7AdeLeedsMug£41
1003P803 JunC9BeaHullPen£13
1003P503 JunC9BeaHullPad£21

The lists are gone and every cell is atomic — this is valid 1NF. But look how much is now repeated down the rows: the whole order and customer block is copied for every item on the order, and each product's name and price reappears wherever it is bought. 1NF is only the beginning.

Stage 2 — Second Normal Form (2NF)

2NF attacks redundancy that comes from a composite key. A table is in 2NF when it is in 1NF and has no partial dependencies — that is, no non-key column depends on only part of the key. Every non-key column must depend on the whole key.

Our key is (OrderID, ProductID). Ask of each other column: what does it really depend on?

We remove each partial dependency into its own table, keyed by the part it depends on. One table becomes three:

Orders
OrderIDOrderDateCustIDCustNameCustCity
100102 JunC7AdeLeeds
100202 JunC7AdeLeeds
100303 JunC9BeaHull
Products
ProductIDProdNamePrice
P3Mug£4
P8Pen£1
P5Pad£2
OrderLines
OrderIDProductIDQty
1001P32
1001P85
1002P31
1003P83
1003P51

Each product's name and price is now stored once (green). The OrderLines table keeps only the fact that genuinely needs both parts of the key — the quantity — and uses OrderID and ProductID as foreign keys back to the other two tables. Notice that this also cures the insertion and deletion anomalies: a brand-new product can now go straight into Products even if no one has bought it, and deleting an order can never erase a product.

Stage 3 — Third Normal Form (3NF)

One table is still hiding redundancy. Look at Orders: Ade and Leeds are still repeated on orders 1001 and 1002. Why? Because those columns don't really depend on the order at all — they depend on the customer.

A table is in 3NF when it is in 2NF and has no transitive dependencies: no non-key column depends on another non-key column rather than directly on the key. Here OrderID → CustID → CustName: the customer's name depends on the key only indirectly, by way of CustID. That indirect chain is a transitive dependency.

We break it by lifting the customer details out into their own table, leaving CustID behind as a foreign key:

Orders (3NF)
OrderIDOrderDateCustID
100102 JunC7
100202 JunC7
100303 JunC9
Customers
CustIDCustNameCustCity
C7AdeLeeds
C9BeaHull

Ade and Leeds are now written once. Together with the earlier split, the whole design has reached 3NF as four clean tablesCustomers, Orders, Products and OrderLines — each about one kind of thing, linked by keys. Step through the finished schema:

The whole journey at a glance

What each normal form removes
FormRule addedRemoves
1NFAtomic values, no repeating groups, a primary keyLists / repeating groups in a cell
2NF1NF + no partial dependency on part of a composite keyRedundancy from a composite key
3NF2NF + no transitive dependency (non-key → non-key)Non-key fields hiding facts about something else

A neat sanity check: in a fully 3NF table, every non-key attribute is a fact about the primary key, and nothing else. If a column is really a fact about something other than the key of its table, it is in the wrong table.

There is a famous mnemonic that captures 2NF and 3NF together. A table is in third normal form when every non-key attribute depends on:

“the key, the whole key, and nothing but the key.”

But do not over-normalise blindly! Normalisation is a trade-off. We gained rock-solid data integrity and zero redundancy — at the cost of more tables, which means answering a question like “what did Ade order and where does she live?” now needs a join across three or four tables instead of reading one row. More integrity, more query complexity. For almost all systems that trade is well worth it, which is why 3NF is the normal target — but it is a genuine engineering decision, not an automatic reflex.

Normalisation was invented by Edgar F. “Ted” Codd at IBM in 1970, in the paper that launched relational databases. He called the stages “normal forms” — borrowing a term from mathematics for a canonical, tidied-up standard shape — and defined 1NF, 2NF and 3NF one after another, each a stricter refinement of the last. Later he and others added even stricter forms (BCNF, 4NF, 5NF) for rare tricky cases, but for A-level and for the vast majority of real databases, reaching 3NF is the practical goal.