You've researched users, sketched a design, and built a
Evaluation methods fall into two big families. Expert-based methods have specialists inspect the design (fast, cheap, no users needed). User-based methods watch real people try it (slower, but they reveal what users actually do). Good teams use both: experts catch the obvious problems cheaply, then users reveal the surprising ones.
Here is how each method works, who it involves, and what it's good at:
In a heuristic evaluation, a handful of usability experts inspect the interface and check it against a list of established rules of thumb, called heuristics. The most famous list is Nielsen's 10 heuristics, which include visibility of system status (show what's happening: feedback!), match between the system and the real world (speak the user's language), user control and freedom (an undo or escape route), consistency, error prevention, and recognition rather than recall. Each expert notes every place the design breaks a heuristic.
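Once each expert has logged their findings, the notes need to be merged. The sketch below is one minimal way to do that, assuming each finding is recorded as an (expert, screen, heuristic, note) tuple. The data, the names, and the record layout are all hypothetical and only illustrate the idea of counting violations per heuristic and spotting issues that several experts flagged independently.

```python
from collections import Counter, defaultdict

# Hypothetical findings: (expert, screen, heuristic broken, note).
findings = [
    ("expert_a", "checkout", "visibility of system status", "no spinner while payment processes"),
    ("expert_b", "checkout", "visibility of system status", "no feedback after clicking Pay"),
    ("expert_a", "basket", "user control and freedom", "no way to undo removing an item"),
    ("expert_c", "search", "match between the system and the real world", "filter labelled 'SKU class'"),
    ("expert_c", "checkout", "visibility of system status", "page looks frozen during payment"),
]

# How often each heuristic is broken across all experts.
violations_per_heuristic = Counter(heuristic for _, _, heuristic, _ in findings)

# Which experts flagged the same screen and heuristic.
experts_per_issue = defaultdict(set)
for expert, screen, heuristic, _ in findings:
    experts_per_issue[(screen, heuristic)].add(expert)

# List issues with the most independent reports first.
ranked = sorted(experts_per_issue.items(), key=lambda item: len(item[1]), reverse=True)
for (screen, heuristic), experts in ranked:
    print(f"{screen}: {heuristic} (flagged by {len(experts)} expert(s))")
```

An issue that several experts report independently is usually a safe place to start fixing, though a problem only one expert spotted can still be serious.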
In usability testing, you give real users representative tasks ("find and buy a blue size-M jumper") and watch. You measure things like success rate, time on task, errors, and where people get stuck, and often ask them to think aloud, narrating their thoughts, so you hear the confusion in real time. The gold of usability testing is watching someone fail at a task you thought was obvious.
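The measures named above can be computed from very simple session notes. The following is a minimal sketch, assuming one record per participant for a single task. The field names and the numbers are made up for illustration, and measuring time on task only over successful attempts is one common convention rather than the only option.

```python
from statistics import mean, median

# Hypothetical session records for one task ("find and buy a blue size-M jumper").
sessions = [
    {"user": "P1", "completed": True, "seconds": 74, "errors": 0},
    {"user": "P2", "completed": True, "seconds": 131, "errors": 2},
    {"user": "P3", "completed": False, "seconds": 240, "errors": 4},
    {"user": "P4", "completed": True, "seconds": 95, "errors": 1},
    {"user": "P5", "completed": False, "seconds": 210, "errors": 3},
]


def summarise(sessions):
    """Return success rate, median time on task, and mean errors per user."""
    completed = [s for s in sessions if s["completed"]]
    times = [s["seconds"] for s in completed]
    return {
        # Share of participants who finished the task.
        "success_rate": len(completed) / len(sessions),
        # Time on task, measured only for successful attempts.
        "median_time_on_task": median(times) if times else None,
        # Average number of errors, counting every participant.
        "mean_errors": mean([s["errors"] for s in sessions]),
    }


summary = summarise(sessions)
print(summary)
```

With only a handful of participants these numbers are rough indicators; the notes on where people got stuck usually matter more than the averages.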
A key idea: you don't need many users. Watching just 5 users typically uncovers around 80% of the usability problems — so testing often, with a few people, beats one giant study.
In A/B testing, you show version A to one random half of your live users and version B to the other half, then measure which performs better on a real metric (clicks, sign-ups, purchases, time spent). Because the two groups are large and randomly assigned, a difference too big to be explained by chance can be attributed to the design change: it's a controlled experiment on real behaviour.
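To judge whether a gap between A and B is more than noise, one standard option is a two-proportion z-test. The sketch below uses only the Python standard library. The visitor and conversion counts are invented, the function name is hypothetical, and real experiments often rely on a dedicated statistics package and a sample size fixed in advance.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Probability of a gap at least this large if there were no real difference.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_a, rate_b, z, p_value


# Hypothetical result: 10,000 visitors per version.
rate_a, rate_b, z, p_value = two_proportion_z_test(480, 10_000, 560, 10_000)
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p_value:.3f}")
```

A small p-value (0.05 is a common threshold) suggests the difference is unlikely to be chance alone. It does not say why version B did better, which is where the other methods come in.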
Testing with only five users feels wrong: surely more is better? For finding usability problems, no. The same big, obvious issues trip up almost everyone, so by the fifth user you're mostly watching people hit the same walls you've already seen; extra users add little new. Nielsen's rule of thumb is that ~5 users reveal about 80% of the problems, so it's far smarter to run several small tests (fixing issues between them) than one huge, expensive one. (A/B testing is different: it measures a rate, so there you do want large numbers for a reliable result.)
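The diminishing returns can be sketched with the simple discovery model usually cited behind this rule of thumb: if each user independently hits a given problem with probability p, then n users find a share 1 - (1 - p)^n of the problems. The value p = 0.31 used below is the average Nielsen reports and is treated here as an assumption; real projects vary, and the function name is hypothetical.

```python
def share_of_problems_found(n_users, p_single=0.31):
    """Expected share of problems seen at least once by n_users.

    p_single is the assumed chance that one user hits a given problem.
    """
    return 1 - (1 - p_single) ** n_users


previous = 0.0
for n in (1, 3, 5, 10, 15):
    found = share_of_problems_found(n)
    # "gain" is how much this row adds over the previous row.
    print(f"{n:>2} users: {found:.0%} found (+{found - previous:.0%})")
    previous = found
```

With that assumed rate, five users come out at roughly 84%, in line with the rule of thumb, and going from 10 to 15 users adds only a couple of percentage points.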
Classic confusions to avoid: