A website you use has millions of accounts, and every one of them has a password. Where does the website keep those passwords? The obvious answer — write them down in a database — is a catastrophe waiting to happen. Databases get stolen. When one leaks, every password in it is instantly in a criminal's hands, and because people reuse passwords, their email, banking and everything else falls too. So a well-built system does something that sounds almost paradoxical: it never stores your password at all, yet it can still check that you typed the right one.
The trick that makes this possible is hashing. A cryptographic hash function takes any input — a word, a password, a whole book, a video file — and crunches it down to a short, fixed-size string of characters called a digest (or just "the hash"). This one idea is the backbone of password storage, file-integrity checks, digital signatures and blockchains — and it is the single thing this page teaches.
Picture a machine with a funnel on top: you pour anything you like in, and out of the bottom drops a
fixed-length fingerprint. Pour in the word cat and you might get
77af7784…; pour in the complete works of Shakespeare and you get a different string of
exactly the same length. The output is called a digest because the function has "digested" the
input down to a compact summary.
A real hash such as SHA‑256 always produces a 256-bit digest — written as 64 hexadecimal characters — whether you feed it one letter or a gigabyte. That fixed size is the first clue that hashing is a one-way street: a gigabyte of information cannot possibly be crammed into 64 characters, so the digest can never contain enough to rebuild the original. Meet the four properties that make a hash function genuinely useful.
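A minimal sketch with Python's standard `hashlib` module shows the fixed size in practice. A three-letter word and a megabyte of filler text both come out as 64 hex characters. The two inputs are arbitrary stand-ins chosen for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()

short_digest = sha256_hex(b"cat")
# One megabyte of filler, standing in for "a whole book"
long_digest = sha256_hex(b"x" * 1_000_000)

for label, digest in [("short", short_digest), ("long", long_digest)]:
    # 256 bits / 4 bits per hex character = 64 characters, whatever the input size
    print(label, len(digest), digest)
```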
The first property is determinism: the same input always produces the same digest, every time, on every computer, forever. If
hashing hunter2 gave a different answer each time, it would be worthless for checking
passwords — you could never match a login against what you stored. Determinism is what makes hashing
checkable: hash the password the user just typed, compare it to the stored digest, and if
the two digests match, the passwords matched.
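The check described above relies on nothing but determinism. Here is a short sketch, again using `hashlib.sha256` and the example password `hunter2` from the text:

```python
import hashlib

def digest(text: str) -> str:
    # Hash functions work on bytes, so encode the string first
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

first = digest("hunter2")
second = digest("hunter2")   # same input, computed a second time
other = digest("hunter3")    # a different input

print(first == second)  # True: same input, same digest
print(first == other)   # False: different input, different digest
```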
Second, hashing is one-way. Given the input, it is easy and fast to compute the digest; given only the digest, it is practically impossible to recover the input. There is no "unhash" button. This is the property that lets a company store the digest of your password safely: even if an attacker steals the entire database of digests, they still can't read the passwords back out of them.
Think of it like a smoothie. Blending a strawberry and a banana into a smoothie is quick and easy — but no one can look at the smoothie and reconstruct the exact fruit that went in, still less put the banana back together. Hashing is a mathematical blender: easy one way, hopeless the other.
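Because there is no inverse function, the only generic way to get from a digest back to an input is to guess inputs and hash each one forward. The sketch below does this for a hypothetical 4-digit PIN. It succeeds only because there are just 10,000 candidates. Against a long, random password, the same loop would need more guesses than could ever be computed.

```python
import hashlib
from typing import Optional

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# A digest an attacker might hold; the PIN "4821" is a made-up example
stolen_digest = sha256_hex("4821")

def guess_pin(target: str) -> Optional[str]:
    """Try every 4-digit PIN: hashing forward and comparing is the only move available."""
    for n in range(10_000):
        candidate = f"{n:04d}"          # "0000" .. "9999"
        if sha256_hex(candidate) == target:
            return candidate
    return None                          # the input was not a 4-digit PIN

print(guess_pin(stolen_digest))  # 4821
```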
Third comes the avalanche effect. Change the input by the tiniest amount (flip a single letter, add one space, toggle one bit) and the digest changes completely and unpredictably. About half of the output bits flip, so nearly every character of the hex digest changes and the two digests bear no resemblance whatsoever. A small change causes a huge, scattered effect, like a single pebble starting an avalanche. This is exactly why you cannot "creep up" on the answer by trying inputs that are close to the target: close inputs give you no hint at all, because their digests look totally unrelated.
Take two almost-identical strings, Hello and hello, which differ only in one capital letter. In ASCII that is a single bit of a single byte ('H' is 0x48, 'h' is 0x68). Yet their SHA‑256 digests share almost nothing: roughly half of the 256 output bits differ, and compared character by character the two hex strings look completely unrelated.
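You can reproduce the Hello/hello comparison with real SHA-256 using Python's `hashlib`. The sketch prints both digests and counts how many of the 256 output bits differ. For a good hash, that count lands somewhere near 128 (half the bits), though the exact number depends on the inputs.

```python
import hashlib

def sha256_int(text: str) -> int:
    # Read the 32-byte digest as one big integer so bits can be compared with XOR
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")

a, b = "Hello", "hello"
print(hashlib.sha256(a.encode("utf-8")).hexdigest())
print(hashlib.sha256(b.encode("utf-8")).hexdigest())

# The inputs differ in exactly one bit: 'H' is 0x48, 'h' is 0x68
input_bits_changed = bin(ord("H") ^ ord("h")).count("1")

# XOR leaves a 1 wherever the two digests disagree; count those bits
output_bits_changed = bin(sha256_int(a) ^ sha256_int(b)).count("1")

print(input_bits_changed, "input bit changed")
print(output_bits_changed, "of 256 output bits changed")
```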
Fourth, a good hash is collision-resistant. A collision is when two different inputs produce the same digest. Because there are infinitely many possible inputs but only finitely many digests, collisions must exist in principle (the pigeonhole principle: with more pigeons than holes, some hole must hold two). The point of a good cryptographic hash is that, although collisions exist, no one can actually find one: there is no known method faster than blindly trying an astronomical number of inputs. This matters because if an attacker could easily manufacture a second input with the same digest as your password, they could log in as you without ever knowing your real password.
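Collisions are easy to find when the digest is tiny. As an illustration only, this sketch keeps just the first 4 hex characters (16 bits) of SHA-256, so there are only 65,536 possible "digests". The pigeonhole principle guarantees a collision within 65,537 inputs, and in practice one typically turns up after a few hundred. Each extra bit doubles the space, which is why the full 256-bit digest puts collisions out of practical reach. The names `tiny_hash` and `find_collision` are hypothetical helpers for this demo.

```python
import hashlib

def tiny_hash(text: str) -> str:
    """A deliberately weakened hash: only the first 4 hex characters (16 bits) of SHA-256."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:4]

def find_collision() -> tuple[str, str, int]:
    seen: dict[str, str] = {}      # truncated digest -> first input that produced it
    n = 0
    while True:
        candidate = f"input-{n}"
        h = tiny_hash(candidate)
        if h in seen:
            # Two different inputs, same (truncated) digest
            return seen[h], candidate, n + 1
        seen[h] = candidate
        n += 1

first, second, tries = find_collision()
print(first, second, tiny_hash(first), tries)
```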
Now put it together. When you create an account, the system hashes your password and stores only the digest — your actual password is thrown away and never written down. When you log in later, the system hashes whatever you type this time and compares the two digests. Match? You're in. It has verified your password without ever storing it.
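The sign-up and log-in flow can be sketched with a plain dictionary standing in for the database. The names `users`, `register` and `log_in` are hypothetical. To mirror the paragraph, this version uses unsalted SHA-256; anything real also needs the salt discussed later on this page, plus a deliberately slow hash. `hmac.compare_digest` compares the two digests in constant time, which is a standard precaution against timing attacks.

```python
import hashlib
import hmac

users: dict[str, str] = {}   # username -> hex digest; the password itself is never stored

def digest(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def register(username: str, password: str) -> None:
    users[username] = digest(password)   # keep only the digest, discard the password

def log_in(username: str, password: str) -> bool:
    stored = users.get(username)
    if stored is None:
        return False
    # Re-hash what was typed this time and compare it with the stored digest
    return hmac.compare_digest(stored, digest(password))

register("alice", "hunter2")
print(log_in("alice", "hunter2"))   # True
print(log_in("alice", "hunter3"))   # False
```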
This is why a well-run website cannot email you your forgotten password — it genuinely doesn't have it, only an irreversible digest. The best it can do is let you reset it. If a site ever emails you your original password in plain text, that is a red flag that they stored it insecurely.
Hashing alone has a gap. Because it's deterministic, the password password123 hashes to
the same digest for everyone who uses it. Attackers exploit this by precomputing the digests
of millions of common passwords into a giant lookup table — a rainbow table — and
then just looking up each stolen digest to see if it's a known one. Two users with the same password
also stand out, because their stored digests are identical.
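Here is a sketch of why unsalted digests leak. The attacker hashes a word list once and builds a digest-to-password table. After that, cracking each stolen digest takes a single lookup. The word list and the stolen "database" are made-up examples. A real rainbow table also stores its data far more compactly than this plain dictionary does.

```python
import hashlib

def digest(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Attacker's side, done once in advance: digest -> password for common passwords
common_passwords = ["123456", "password", "password123", "qwerty", "letmein"]
lookup_table = {digest(p): p for p in common_passwords}

# A stolen table of unsalted digests (hypothetical users)
stolen = {
    "alice": digest("password123"),
    "bob": digest("password123"),
    "carol": digest("correct horse battery staple"),
}

for user, d in stolen.items():
    print(user, lookup_table.get(d, "<not in table>"))

# Identical passwords also give identical digests
print(stolen["alice"] == stolen["bob"])   # True
```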
The defence is a salt: a random, unique string generated for each user and added to their password before hashing. The salt is stored in the clear, right alongside the digest. Now the same password produces a different digest for every user, because each has a different salt, so a precomputed rainbow table is useless and identical passwords no longer look identical.
The salt doesn't need to be secret — its whole job is just to be different for every user, which is enough to defeat precomputed tables. The rule of thumb for password storage is short: hash, and salt.
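Adding a salt changes only a couple of lines. In this sketch, each user gets 16 random bytes from Python's `secrets` module, stored next to the digest. The salt is simply prepended to the password before SHA-256; that is one simple way to combine them, not the only one. Production systems normally go further and use a deliberately slow, salted password hash such as `hashlib.pbkdf2_hmac`, `hashlib.scrypt` or Argon2. The helper names are hypothetical.

```python
import hashlib
import hmac
import secrets

def salted_digest(salt: bytes, password: str) -> str:
    # Prepend the per-user salt to the password bytes, then hash the combination
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

def make_record(password: str) -> tuple[bytes, str]:
    salt = secrets.token_bytes(16)          # random and unique per user, but not secret
    return salt, salted_digest(salt, password)

def check(record: tuple[bytes, str], password: str) -> bool:
    salt, stored = record
    return hmac.compare_digest(stored, salted_digest(salt, password))

alice = make_record("password123")
bob = make_record("password123")

print(alice[1] == bob[1])             # False: same password, different salts, different digests
print(check(alice, "password123"))    # True
print(check(alice, "password124"))    # False
```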
Hashing is not encryption. The two are constantly confused, and treating one like the other is a serious mistake, so nail the difference now.
Encryption is a two-way process: it scrambles data using a key, and anyone holding the right key can unscramble it and read the original. Hashing is a one-way process: there is no key, and it is not meant to be reversed, ever. You never "un-hash" a digest to get the input back; you only ever re-hash a fresh input and compare digests. Use encryption when someone needs to read the data later (a message in transit, a file at rest). Use hashing when no one should ever need the original back, only to check it, which is exactly the situation with passwords.
Passwords aren't the only job. Because the avalanche effect makes a digest a sensitive fingerprint of a file, download sites often publish the hash of a file next to the download link. You hash the file you received and compare: if even one byte was corrupted in transit (or tampered with by an attacker), the digests won't match and you know not to trust it. The same idea powers Git commit ids, digital signatures and blockchains: a hash is a tiny, tamper-evident summary of a much larger thing.
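Here is a sketch of the download check. It hashes the received file in chunks, so a large file never has to fit in memory, and compares the result against the published hex digest. In real use you would supply the file path and the published value. The demo instead writes a small temporary file, records its digest, then changes one byte. The helper names are hypothetical.

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file incrementally and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    # Published digests may use upper- or lower-case hex; normalise before comparing
    return file_sha256(path) == published_hex.strip().lower()

# Demo: create a file, record its digest, then corrupt one byte
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pretend this is a large download")
    path = tmp.name

published = file_sha256(path)
intact = matches_published(path, published)

with open(path, "r+b") as f:
    f.write(b"P")   # overwrite the first byte: 'p' becomes 'P'

tampered = matches_published(path, published)
os.remove(path)

print(intact)     # True: file unchanged
print(tampered)   # False: one changed byte, different digest
```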
Collisions are unavoidable in theory, so a hash's strength is never "no collisions exist" but "no one can find one". That's a practical, not absolute, guarantee, and it can expire. The once-standard MD5 and SHA‑1 hashes are now considered broken because researchers found efficient ways to manufacture collisions, so they've been retired in favour of SHA‑256 and friends. A cryptographic hash is only "secure" until someone proves otherwise, which is why the recommended algorithms change over the years.