Input validation

A shop's website has a box that asks "How old are you?". Someone types -7. Someone else types hello. Another leaves it blank and just hits Enter. A computer that blindly trusts whatever it's handed will happily store a customer who is minus seven years old — and later crash, misbehave, or let bad data poison everything downstream.

The golden rule of handling any data from the outside world is: never trust input. Whether it comes from a keyboard, a file, a web form or another program, you must check it is sensible before you use it. That checking is called input validation.

Validation doesn't mean being suspicious for its own sake — it means catching mistakes early, while you can still ask for the value again, instead of letting a bad value sail deep into your program and cause chaos later.

The four everyday checks

Most validation is built from a small toolkit of checks. You pick the ones that make sense for the data you're expecting:

Presence check — is there anything there at all? Reject an empty answer where a value is required (a name, an email, an age).
Range check — is a number within sensible limits? A human age might be 0 to 120; a month is 1 to 12; a percentage 0 to 100.
Type / format check — is it the right kind of thing, in the right shape? A number where a number is expected; an email that contains an @; a UK postcode that matches the postcode pattern.
Length check — is it the right size? A password at least 8 characters; a message no longer than 280; a phone number of exactly the right number of digits.

Real forms usually stack several of these together. "A valid age" might mean: present and a whole number and between 0 and 120. All the checks must pass.

Validation is just a function that says yes or no

In this Primer's Run boxes there's no keyboard to type into — so instead of reading a value from a user, we'll write validation the way professionals test it: as a function that takes a value and returns whether it's valid. Then we call it with several preset test values and print the verdicts. This is exactly how you'd check your logic before ever wiring it to a real form.

Here's a range check for a human age. Press Run and read the verdicts:

    function isValidAge(age: number): boolean {
      return age >= 0 && age <= 120; // range check: 0 to 120 inclusive
    }

    const tests: number[] = [25, 0, 120, -7, 121, 200];

    for (const age of tests) {
      const verdict = isValidAge(age) ? "VALID" : "rejected";
      console.log("age " + age + " -> " + verdict);
    }

Notice how the function contains no console.log of its own — it just answers true or false. That makes it easy to reuse anywhere and easy to test, because you can throw a whole list of values at it and eyeball the results.

Stacking the checks: a fuller validator

Real input arrives as text: even a "number" typed into a box is really the characters "25". So a thorough validator often does a presence check first (is there anything there at all?), then a type/format check (can this text even be read as a whole number?), then a range check. Here we validate an age supplied as a string, and return a helpful message saying why it failed:

    function checkAge(input: string): string {
      if (input.trim() === "") return "rejected: nothing entered (presence)";
      const age = Number(input);
      if (Number.isNaN(age)) return "rejected: not a number (type)";
      if (!Number.isInteger(age)) return "rejected: must be a whole number (format)";
      if (age < 0 || age > 120) return "rejected: outside 0-120 (range)";
      return "VALID (" + age + ")";
    }

    const tests: string[] = ["25", "", "hello", "3.5", "-7", "121", "120"];

    for (const input of tests) {
      console.log("\"" + input + "\" -> " + checkAge(input));
    }

Each if is one guard. The checks run in a sensible order — there's no point range-checking something that isn't even a number — and the first failure returns straight away with a clear reason. Only input that survives every guard is declared VALID.
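The order of the guards matters for a second reason, one specific to JavaScript's Number(): it converts an empty or all-spaces string to 0, not to NaN. The sketch below uses a hypothetical variant, checkAgeNoPresence, which is not part of the lesson's code. It leaves out the presence guard to show what slips through without it.

```typescript
// Number() turns an empty or blank string into 0, not NaN.
console.log(Number(""));      // 0
console.log(Number("   "));   // 0
console.log(Number("hello")); // NaN

// Hypothetical variant of checkAge with the presence guard left out.
function checkAgeNoPresence(input: string): string {
  const age = Number(input);
  if (Number.isNaN(age)) return "rejected: not a number (type)";
  if (!Number.isInteger(age)) return "rejected: must be a whole number (format)";
  if (age < 0 || age > 120) return "rejected: outside 0-120 (range)";
  return "VALID (" + age + ")";
}

// A blank answer is silently accepted as age 0.
console.log(checkAgeNoPresence("")); // VALID (0)
```

Because a blank answer would pass every later guard as the number 0, the presence check has to run on the raw text before the conversion.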

A length check and a format check

Not everything is a number. A length check guards passwords and messages; a simple format check can catch an email with no @. Again we write each as a yes/no function and try several values:
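A minimal sketch of the two checks follows. It assumes an 8-character minimum for passwords (the figure from the toolkit above) and a rough email rule of exactly one @ with at least one character on each side. The function names isValidPassword and looksLikeEmail are illustrative choices, not fixed names.

```typescript
// Length check: a password needs at least 8 characters (assumed minimum).
function isValidPassword(password: string): boolean {
  return password.length >= 8;
}

// Rough format check: exactly one @, with something before it and after it.
function looksLikeEmail(email: string): boolean {
  const at = email.indexOf("@");
  const onlyOneAt = at === email.lastIndexOf("@");
  return at > 0 && at < email.length - 1 && onlyOneAt;
}

const passwords: string[] = ["hunter2", "12345678", "correct horse"];
for (const p of passwords) {
  console.log("password \"" + p + "\" -> " + (isValidPassword(p) ? "VALID" : "rejected"));
}

const emails: string[] = ["ada@example.com", "ada.example.com", "@example.com", "ada@", "a@b@c"];
for (const e of emails) {
  console.log("email \"" + e + "\" -> " + (looksLikeEmail(e) ? "VALID" : "rejected"));
}
```

Only "ada@example.com" passes the email check, and "hunter2" is the only password rejected (it has 7 characters).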

The email check here is deliberately rough — real email validation is famously fiddly. That's a useful lesson in itself: a format check is usually a reasonable filter, not a perfect one.

Keep asking until it's valid

On a real form you don't just reject bad input and give up — you ask again. The pattern is a while loop that repeats while the input is still invalid, and only lets the program move on once a good value arrives:

    while (input is NOT valid) { ask again }

We can't really read a keyboard here, so let's simulate a user who first types some rubbish and eventually types something sensible. We walk through that list of attempts with a while loop, stopping the moment one is valid:

    function isValidAge(age: number): boolean {
      return Number.isInteger(age) && age >= 0 && age <= 120;
    }

    // pretend these are what a user types, one attempt after another
    const attempts: number[] = [-7, 200, 3.5, 34];
    let i: number = 0;

    // keep asking WHILE the current attempt is not valid
    while (!isValidAge(attempts[i])) {
      console.log("rejected " + attempts[i] + " - please try again");
      i = i + 1;
    }

    console.log("Accepted age: " + attempts[i]);

In a genuine program, the step that moves on to the next attempt would instead prompt the user again on each pass; the shape of the loop is identical. The condition is the negation of your validator, while (!isValid(...)), so the loop runs while things are still wrong and exits the instant they're right.

Like any while loop, this one needs something to change each pass or it never ends. Here it's the fresh attempt (i = i + 1 fetches the next one). In a real program the "progress" is the user typing something new. If your loop somehow re-tests the same rejected value forever, you've built an infinite loop — the classic while trap.
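The simulated loop above ends only because the list of attempts happens to finish with a valid value. If every attempt were invalid, i would run past the end of the list, the validator would keep rejecting the missing value, and the loop would never stop. One way to guard against that is to add a second condition to the while. The sketch below does this with a hypothetical helper, firstValidAge, and an assumed attempt limit, maxAttempts; neither comes from the lesson's code.

```typescript
function isValidAge(age: number): boolean {
  return Number.isInteger(age) && age >= 0 && age <= 120;
}

// Returns the first valid attempt, or null if the attempts run out
// or the attempt limit is reached before a valid value turns up.
function firstValidAge(attempts: number[], maxAttempts: number): number | null {
  let i = 0;
  while (i < attempts.length && i < maxAttempts) {
    const attempt = attempts[i];
    if (isValidAge(attempt)) {
      return attempt;
    }
    console.log("rejected " + attempt + " - please try again");
    i = i + 1; // progress: move on to the next attempt
  }
  return null;
}

console.log(firstValidAge([-7, 200, 3.5, 34], 5)); // 34
console.log(firstValidAge([-7, 200, 3.5], 5));     // null (ran out of attempts)
console.log(firstValidAge([-7, 200, 3.5, 34], 2)); // null (limit reached first)
```

Returning null makes "no valid value arrived" an explicit outcome the caller has to handle, so the loop can never spin forever.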

Validation only checks that input is reasonable; it can never check that it's true. If someone types their age as 34 when they're really 35, every check passes: 34 is present, is a whole number, and is inside 0-120. It's a perfectly valid age. It's just the wrong one, and no amount of validation can tell.

The same goes for dates: 29/02/2023 can be rejected (2023 wasn't a leap year, so that date doesn't exist), but 15/03/2023 is a valid date — even if the event actually happened on the 16th. Validation guards against impossible and nonsensical input; it can't guard against honest mistakes or lies.
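The date example can be sketched as a yes/no function too. This version assumes the Gregorian calendar and takes the day, month and year as three separate numbers. The names isLeapYear, daysInMonth and isValidDate are this sketch's own.

```typescript
// Leap years: divisible by 4, except century years not divisible by 400.
function isLeapYear(year: number): boolean {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// Number of days in a given month (1 to 12) of a given year.
function daysInMonth(month: number, year: number): number {
  if (month === 2) return isLeapYear(year) ? 29 : 28;
  if (month === 4 || month === 6 || month === 9 || month === 11) return 30;
  return 31;
}

// Type check (whole numbers), then range checks on the month and the day.
function isValidDate(day: number, month: number, year: number): boolean {
  if (!Number.isInteger(day) || !Number.isInteger(month) || !Number.isInteger(year)) {
    return false;
  }
  if (month < 1 || month > 12) return false;
  return day >= 1 && day <= daysInMonth(month, year);
}

console.log(isValidDate(29, 2, 2023)); // false: 2023 was not a leap year
console.log(isValidDate(29, 2, 2024)); // true: 2024 was a leap year
console.log(isValidDate(15, 3, 2023)); // true, whether or not it is the right date
console.log(isValidDate(31, 4, 2023)); // false: April has 30 days
```

The function can say that 15/03/2023 exists, but nothing in it could know the event really happened on the 16th.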

And always test the boundary values — the exact edges of your ranges. Is 120 allowed but 121 rejected? Is 0 allowed? A check written age < 120 instead of age <= 120 quietly rejects everyone who is exactly 120 — an "off-by-one" bug you'll only catch by testing the edge itself. When you test a validator, always throw the two values either side of each limit at it.
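To see the off-by-one bug show up, the sketch below runs the correct range check next to a deliberately buggy copy, isValidAgeBuggy (a made-up name for this illustration), over the limits themselves and the values either side of them.

```typescript
// Correct range check: both limits are inclusive.
function isValidAge(age: number): boolean {
  return age >= 0 && age <= 120;
}

// Deliberately buggy copy: < instead of <= at the upper limit.
function isValidAgeBuggy(age: number): boolean {
  return age >= 0 && age < 120;
}

// Each limit, plus the value either side of it.
const boundaryTests: number[] = [-1, 0, 1, 119, 120, 121];

for (const age of boundaryTests) {
  console.log(
    "age " + age + " -> correct: " + isValidAge(age) + ", buggy: " + isValidAgeBuggy(age)
  );
}
```

The two versions agree on every value except 120, which is why a test list made only of comfortable mid-range values would never reveal the bug.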

Why bother? Because bad data spreads

Validation feels like extra work for input that's "probably fine". But an unchecked value doesn't stay put — it gets stored, added up, drawn on a chart, emailed to someone. A single -7 for an age can turn an average into nonsense, break a graph's axis, or crash a line of code three files away that assumed ages are positive. Catching it at the front door, with one small check, is far cheaper than hunting it down later.

This is also your first taste of thinking about security: input validation is the front line of defence against malicious input, not just clumsy typing. You'll meet that idea properly when you study defensive design.