A school needs to keep track of its pupils: their name, which class they are in, and who teaches that class. Where do you put all of that? The obvious first answer is one big table — like a single giant spreadsheet with a row for every pupil and a column for every fact. That is exactly what a flat-file database is: all the data lives in one table.
It sounds tidy, and for a tiny amount of data it is fine. But as soon as the same facts start turning up again and again, a flat file quietly turns into a mess. This page is about why that happens, and how a relational database — data split across several linked tables — fixes it. Understanding this one trade-off is the heart of GCSE databases.
Here is our school as a single flat-file table. Read it row by row — each pupil, their class, and their teacher:
| Name | Class | Teacher |
|---|---|---|
| Priya | 10A | Mr Okafor |
| Sam | 10A | Mr Okafor |
| Ada | 10A | Mr Okafor |
| Leo | 10B | Ms Bell |
| Maya | 10C | Dr Rai |
| Ben | 10A | Mr Okafor |
Look at the highlighted Teacher column. “Mr Okafor” is written out four separate times, once for every pupil in 10A. Nothing there is wrong, exactly — but the same fact (“10A is taught by Mr Okafor”) is stored over and over. This repetition of identical data is called data redundancy.
Repeating the same fact on many rows causes two real headaches:
Suppose Mr Okafor leaves and Ms Bell takes over class 10A. In the flat file you must find every single 10A row and change the teacher — four edits for four pupils. Change three of them and get distracted, and the table now claims that some 10A pupils are taught by Ms Bell and others by Mr Okafor. Same class, two answers. That contradiction is called an update anomaly, and it is the classic reason flat files go wrong: a fact stored many times must be updated in many places, and humans miss some.
The cure is simple to state: store each fact exactly once. Then there is only ever one place to change it, and the data can never disagree with itself. That single idea is what relational databases are built to deliver.
You can watch an update anomaly happen. This flat-file list tries to change 10A's teacher but only fixes the first matching row — press Run:
A relational database refuses to repeat facts. Instead of one table, it uses several tables, each about one kind of thing, and it links them together. Our school splits cleanly into two:
| ClassID | Teacher |
|---|---|
| 10A | Mr Okafor |
| 10B | Ms Bell |
| 10C | Dr Rai |
| Name | ClassID |
|---|---|
| Priya | 10A |
| Sam | 10A |
| Ada | 10A |
| Leo | 10B |
| Maya | 10C |
| Ben | 10A |
Now “Mr Okafor” is written once, in the Classes table. The Students
table doesn't repeat the teacher's name — it just stores the short ClassID
(like 10A) that points to the matching row in Classes. That shared column
is a key: it is how the two tables are joined back together when you need the
full picture.
Step through the diagram to see the link:
Two special columns make the linking work. You need to recognise both for GCSE:
ClassID is the primary
key — no two classes share it.
ClassID is a
foreign key pointing at the Classes table.
Follow a student's ClassID across to the Classes table and you instantly find their
teacher — no duplication needed. Change the teacher for 10A? You edit one cell,
and every 10A pupil is automatically up to date, because they were never storing the teacher's
name in the first place. The update anomaly simply cannot happen.
Yes! If your data is small and simple — a personal list of birthdays, a shopping list, the high scores in one game — a flat file (a single spreadsheet or CSV) is quick to set up and perfectly fine. The problems only bite once the same facts repeat across many rows and the data needs updating. As a rule of thumb: one kind of thing, little repetition ⇒ flat file is fine; several related kinds of things, lots of shared facts ⇒ go relational.
| Feature | Flat file | Relational |
|---|---|---|
| Tables | One big table | Several linked tables |
| How facts are stored | Repeated on every relevant row | Each fact stored once |
| Data redundancy | High | Low |
| Risk of inconsistency | High (update anomalies) | Low |
| Links between tables | None | Primary & foreign keys |
| Best for | Small, simple data | Large, related data |