Hashing

A website you use has millions of accounts, and every one of them has a password. Where does the website keep those passwords? The obvious answer, writing them down in a database, is a catastrophe waiting to happen. Databases get stolen. When one leaks, every password in it is instantly in a criminal's hands, and because people reuse passwords, their email, banking and everything else fall too. So a well-built system does something that sounds almost paradoxical: it never stores your password at all, yet it can still check that you typed the right one.

The trick that makes this possible is hashing. A cryptographic hash function takes any input — a word, a password, a whole book, a video file — and crunches it down to a short, fixed-size string of characters called a digest (or just "the hash"). This one idea is the backbone of password storage, file-integrity checks, digital signatures and blockchains — and it is the single thing this page teaches.

Input in, digest out

Picture a machine with a funnel on top: you pour anything you like in, and out of the bottom drops a fixed-length fingerprint. Pour in the word cat and you might get 77af7784…; pour in the complete works of Shakespeare and you get a different string of exactly the same length. The output is called a digest because the function has "digested" the input down to a compact summary.

\text{any input (any length)} \;\xrightarrow{\;\text{hash function}\;}\; \text{digest (fixed length)}

A real hash such as SHA‑256 always produces a 256-bit digest — written as 64 hexadecimal characters — whether you feed it one letter or a gigabyte. That fixed size is the first clue that hashing is a one-way street: a gigabyte of information cannot possibly be crammed into 64 characters, so the digest can never contain enough to rebuild the original. Meet the four properties that make a hash function genuinely useful.
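The fixed-size claim is easy to check for yourself. The sketch below assumes a Node.js environment, whose built-in `node:crypto` module provides SHA-256; the helper name `sha256Hex` is made up for this example.

```typescript
import { createHash } from "node:crypto";

// Hash a string with SHA-256 and return the digest as lowercase hex.
function sha256Hex(input: string): string {
  return createHash("sha256").update(input, "utf8").digest("hex");
}

const tiny = "a";                      // one character
const large = "a".repeat(1_000_000);   // a million characters

console.log(sha256Hex(tiny).length);   // 64
console.log(sha256Hex(large).length);  // 64
```

Both digests are 64 hex characters long, even though one input is a million times bigger than the other.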

Property 1 — deterministic

The same input always produces the same digest, every time, on every computer, forever. If hashing hunter2 gave a different answer each time, it would be worthless for checking passwords — you could never match a login against what you stored. Determinism is what makes hashing checkable: hash the password the user just typed, compare it to the stored digest, and if the two digests match, the passwords matched.

\operatorname{hash}(x) = \operatorname{hash}(x) \quad \text{always}

Property 2 — one-way (irreversible)

Given the input it is easy and fast to compute the digest; given only the digest it is practically impossible to recover the input. There is no "unhash" button. This is the property that lets a company store the digest of your password safely: even if an attacker steals the entire database of digests, they still can't read the passwords back out of them.

Think of it like a smoothie. Blending a strawberry and a banana into a smoothie is quick and easy — but no one can look at the smoothie and reconstruct the exact fruit that went in, still less put the banana back together. Hashing is a mathematical blender: easy one way, hopeless the other.

Property 3 — the avalanche effect

Change the input by the tiniest amount (flip a single letter, add one space, toggle one bit) and the digest changes completely and unpredictably. About half of the output bits flip, so most characters of the digest change, and there is no resemblance whatsoever between the two digests. A small change causes a huge, scattered effect, like a single pebble starting an avalanche. This is exactly why you cannot "creep up" on the answer by trying inputs that are close to the target: close inputs give you no hint at all, because their digests look totally unrelated.

The code below hashes three almost-identical strings, Hello, hello and hellp (each differs from the next by a single character), and prints the digests so you can compare them character by character:

// A small demonstration hash (NOT secure; a real one is SHA-256).
// It still shows the avalanche effect: a tiny input change scrambles the whole output.
function demoHash(input: string): string {
  let h1 = 0x811c9dc5; // two rolling 32-bit values, mixed together
  let h2 = 0x1000193;
  for (let i = 0; i < input.length; i++) {
    const c = input.charCodeAt(i);
    h1 = (h1 ^ c) * 0x01000193; // XOR in the char, then multiply (spreads bits)
    h2 = ((h2 << 5) + h2 + c) ^ (h1 >>> 3);
    h1 = h1 >>> 0; // keep them as unsigned 32-bit numbers
    h2 = h2 >>> 0;
  }
  // stitch the two values into a 16-character hex digest
  return h1.toString(16).padStart(8, "0") + h2.toString(16).padStart(8, "0");
}
console.log("hash('Hello') =", demoHash("Hello"));
console.log("hash('hello') =", demoHash("hello")); // ONE letter different...
console.log("hash('hellp') =", demoHash("hellp")); // ...and again
console.log();
console.log("Notice: one tiny change -> a completely different digest.");

The digests for Hello and hello share almost nothing, even though the inputs differ by a single bit of a single letter. (This is only a teaching hash, and a genuine one like SHA‑256 has a far stronger avalanche and a much longer digest, but the effect is the real thing.)
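To see the "about half the bits" claim with a real hash, you can count the differing bits directly. This is a minimal sketch assuming Node.js and its built-in `node:crypto` module; `differingBits` is a helper invented for this example.

```typescript
import { createHash } from "node:crypto";

// Count how many bits differ between two equal-length byte buffers.
function differingBits(a: Buffer, b: Buffer): number {
  let count = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i]; // bits set wherever the two bytes disagree
    while (x !== 0) {
      count += x & 1;
      x >>>= 1;
    }
  }
  return count;
}

// Raw 32-byte SHA-256 digests of two inputs that differ by one bit.
const d1 = createHash("sha256").update("Hello").digest();
const d2 = createHash("sha256").update("hello").digest();

const changed = differingBits(d1, d2);
console.log(`${changed} of ${d1.length * 8} bits differ`);
```

The count should land close to 128, which is half of the 256 output bits.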

Property 4 — collision-resistant

A collision is when two different inputs produce the same digest. Because there are infinitely many possible inputs but only finitely many digests, collisions must exist in principle (this is the pigeonhole principle). The point of a good cryptographic hash is that, although collisions exist, no one can actually find one: there is no known method faster than blindly trying an astronomical number of inputs. This matters because if an attacker could easily manufacture a second input with the same digest as your password, they could log in as you without ever knowing your real password.
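Collisions become easy to find once the digest is made tiny, which shows why digest length matters. The sketch below assumes Node.js and its `node:crypto` module, and deliberately cripples SHA-256 by keeping only its first 16 bits; `tinyHash` and `findCollision` are names made up for this illustration.

```typescript
import { createHash } from "node:crypto";

// Keep only the first 4 hex characters (16 bits) of SHA-256: a deliberately tiny digest.
function tinyHash(input: string): string {
  return createHash("sha256").update(input).digest("hex").slice(0, 4);
}

// Try the inputs "0", "1", "2", ... until two of them share a tiny digest.
// With only 65,536 possible digests, the pigeonhole principle guarantees this ends.
function findCollision(): [string, string] {
  const seen = new Map<string, string>(); // tiny digest -> first input that produced it
  for (let i = 0; ; i++) {
    const input = String(i);
    const digest = tinyHash(input);
    const earlier = seen.get(digest);
    if (earlier !== undefined) return [earlier, input];
    seen.set(digest, input);
  }
}

const [first, second] = findCollision();
console.log(first, second, tinyHash(first), tinyHash(second));
```

With 16 bits a collision typically turns up after a few hundred tries. With the full 256 bits the same blind search would need on the order of 2^128 attempts, which is what "no one can find one" means in practice.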

Deterministic — the same input always yields the same digest.
Fixed-size output — the digest length is the same for any input.
One-way (pre-image resistant) — you cannot feasibly recover an input from its digest.
Avalanche — a one-bit change to the input changes about half the output bits.
Collision-resistant — you cannot feasibly find two inputs with the same digest.

The big use: storing passwords safely

Now put it together. When you create an account, the system hashes your password and stores only the digest — your actual password is thrown away and never written down. When you log in later, the system hashes whatever you type this time and compares the two digests. Match? You're in. It has verified your password without ever storing it.

\text{sign up: store } \operatorname{hash}(\text{password}) \qquad\qquad \text{log in: check } \operatorname{hash}(\text{typed}) \overset{?}{=} \text{stored digest}
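The sign-up and log-in steps can be sketched in a few lines. This assumes Node.js and its `node:crypto` module; the in-memory `users` map and the function names are hypothetical stand-ins for a real database and API. It uses a bare, unsalted SHA-256 purely to show the principle, which the sections on salting and slow hashes further down improve on.

```typescript
import { createHash } from "node:crypto";

const sha256Hex = (s: string): string =>
  createHash("sha256").update(s).digest("hex");

// Hypothetical in-memory "database": username -> stored digest.
const users = new Map<string, string>();

function signUp(username: string, password: string): void {
  users.set(username, sha256Hex(password)); // only the digest is kept
}

function logIn(username: string, typed: string): boolean {
  const stored = users.get(username);
  // Re-hash what was typed and compare digests; the password itself is never stored.
  return stored !== undefined && sha256Hex(typed) === stored;
}

signUp("alice", "hunter2");
console.log(logIn("alice", "hunter2")); // true
console.log(logIn("alice", "hunter3")); // false
```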

This is why a well-run website cannot email you your forgotten password — it genuinely doesn't have it, only an irreversible digest. The best it can do is let you reset it. If a site ever emails you your original password in plain text, that is a red flag that they stored it insecurely.

The weakness: rainbow tables, and the fix: salting

Hashing alone has a gap. Because it's deterministic, the password password123 hashes to the same digest for everyone who uses it. Attackers exploit this by precomputing the digests of millions of common passwords into a giant lookup table (a rainbow table is a space-saving form of such a table) and then just looking up each stolen digest to see if it's a known one. Two users with the same password also stand out, because their stored digests are identical.
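Here is roughly what that attack looks like from the attacker's side. It is a toy sketch assuming Node.js and `node:crypto`; the five-entry password list and the "stolen" database are invented for the example, and it builds a plain lookup table, not a true space-optimised rainbow table.

```typescript
import { createHash } from "node:crypto";

const sha256Hex = (s: string): string =>
  createHash("sha256").update(s).digest("hex");

// Attacker's precomputation, done once: digest -> password for common passwords.
const commonPasswords = ["123456", "password", "password123", "letmein", "qwerty"];
const table = new Map<string, string>();
commonPasswords.forEach((p) => table.set(sha256Hex(p), p));

// Hypothetical stolen database of unsalted digests.
const stolen = new Map<string, string>([
  ["alice", sha256Hex("password123")],
  ["bob", sha256Hex("correct horse battery staple")],
  ["carol", sha256Hex("password123")],
]);

// Cracking is now a lookup per user, with no hashing needed at attack time.
const cracked = new Map<string, string>();
stolen.forEach((digest, user) => {
  const hit = table.get(digest);
  if (hit !== undefined) cracked.set(user, hit);
});

console.log(cracked); // alice and carol are recovered; bob's passphrase is not in the table
```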

The defence is a salt: a random, unique string generated for each user and added to their password before hashing. The salt is stored in the clear alongside the digest. Now the same password produces a different digest for every user, because each has a different salt, so a precomputed rainbow table is useless and identical passwords no longer look identical.

\text{store: } \big(\; \text{salt}, \;\; \operatorname{hash}(\text{salt} + \text{password}) \;\big)

function demoHash(input: string): string {
  let h1 = 0x811c9dc5, h2 = 0x1000193;
  for (let i = 0; i < input.length; i++) {
    const c = input.charCodeAt(i);
    h1 = ((h1 ^ c) * 0x01000193) >>> 0;
    h2 = (((h2 << 5) + h2 + c) ^ (h1 >>> 3)) >>> 0;
  }
  return h1.toString(16).padStart(8, "0") + h2.toString(16).padStart(8, "0");
}
const password = "letmein";
// Without salt: two users with the SAME password store the SAME digest.
console.log("Alice, no salt:", demoHash(password));
console.log("Bob, no salt:", demoHash(password)); // identical -> a giveaway
// With a unique salt each, the SAME password now hashes differently:
console.log("Alice + salt: ", demoHash("f3a9c1" + password));
console.log("Bob + salt: ", demoHash("81be40" + password)); // totally different

The salt doesn't need to be secret — its whole job is just to be different for every user, which is enough to defeat precomputed tables. The rule of thumb for password storage is short: hash, and salt.
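Putting the salt into the sign-up and log-in flow might look like the sketch below. It assumes Node.js and `node:crypto` (for `randomBytes` and SHA-256); the `StoredRecord` shape and the function names are hypothetical, and SHA-256 is used only to keep the example short.

```typescript
import { createHash, randomBytes } from "node:crypto";

// What gets stored per user: the salt in the clear, plus hash(salt + password).
interface StoredRecord {
  salt: string;
  digest: string;
}

const hashWithSalt = (salt: string, password: string): string =>
  createHash("sha256").update(salt + password).digest("hex");

function makeRecord(password: string): StoredRecord {
  const salt = randomBytes(16).toString("hex"); // fresh random salt for each user
  return { salt, digest: hashWithSalt(salt, password) };
}

function verify(record: StoredRecord, typed: string): boolean {
  // Re-hash the typed password with the user's own stored salt, then compare.
  return hashWithSalt(record.salt, typed) === record.digest;
}

const alice = makeRecord("letmein");
const bob = makeRecord("letmein");
console.log(alice.digest === bob.digest); // false: same password, different salts
console.log(verify(alice, "letmein"));    // true
console.log(verify(alice, "letmeout"));   // false
```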

Hashing is not encryption

These two ideas are constantly confused, so nail the difference now. Encryption is a two-way process: you scramble plaintext into ciphertext with a key, and anyone with the key can reverse it back to the plaintext. That reversibility is the whole point — the receiver needs to read the message.

Hashing is a one-way process: there is no key, and it is not meant to be reversed — ever. You never "un-hash" a digest to get the input back; you only ever re-hash a fresh input and compare digests. Use encryption when someone needs to read the data later (a message in transit, a file at rest). Use hashing when no one should ever need the original back, only to check it — which is exactly the situation with passwords.
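The contrast shows up clearly in code. In this sketch, which assumes Node.js and its `node:crypto` module, a message is encrypted and decrypted with AES-256-GCM (one reasonable choice of cipher, picked here for illustration) and then hashed with SHA-256. Notice that the hashing half has no key and no inverse call at all.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

const message = "meet me at noon";

// Encryption: reversible by anyone who holds the key.
const key = randomBytes(32); // 256-bit secret key
const iv = randomBytes(12);  // per-message nonce for AES-GCM
const cipher = createCipheriv("aes-256-gcm", key, iv);
const ciphertext = Buffer.concat([cipher.update(message, "utf8"), cipher.final()]);
const tag = cipher.getAuthTag(); // integrity tag, needed again when decrypting

const decipher = createDecipheriv("aes-256-gcm", key, iv);
decipher.setAuthTag(tag);
const recovered = Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
console.log(recovered === message); // true: the plaintext comes back

// Hashing: no key and no way back; the only check is to re-hash and compare.
const digest = createHash("sha256").update(message).digest("hex");
const matches = createHash("sha256").update("meet me at noon").digest("hex") === digest;
console.log(matches); // true
```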

Hashing is not encryption, and treating it like encryption is a serious mistake. Keep these straight:

A hash has no key and is not reversible. There is no "unhash" operation — if you find yourself wanting to decrypt a hash, you've misunderstood what a hash is.
Never store passwords as plain text, and never store them reversibly encrypted either — if you (or a thief) can decrypt them, that's a disaster. Store a salted hash so the originals are gone for good.
Don't use a fast general-purpose hash (such as plain SHA‑256) directly for passwords in real systems: attackers can try billions of guesses per second. Purpose-built slow password hashes (bcrypt, scrypt, Argon2) deliberately take longer to compute, which cripples brute-forcing. The salted-hash principle is still exactly the one taught here.
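As one possible shape for the last point, the sketch below uses scrypt through Node.js's built-in `node:crypto` module with its default cost settings. The `salt:key` storage format and the function names are assumptions made for this example, not a standard.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Derive a slow, salted hash and pack it as "saltHex:keyHex" (a made-up format).
function hashPassword(password: string): string {
  const salt = randomBytes(16);
  const key = scryptSync(password, salt, 32); // 32-byte output, default cost parameters
  return `${salt.toString("hex")}:${key.toString("hex")}`;
}

function checkPassword(stored: string, typed: string): boolean {
  const [saltHex, keyHex] = stored.split(":");
  const expected = Buffer.from(keyHex, "hex");
  const actual = scryptSync(typed, Buffer.from(saltHex, "hex"), expected.length);
  // Constant-time comparison, so timing does not reveal how many bytes matched.
  return timingSafeEqual(actual, expected);
}

const record = hashPassword("hunter2");
console.log(checkPassword(record, "hunter2")); // true
console.log(checkPassword(record, "hunter3")); // false
```

Each call to `scryptSync` is deliberately expensive, so the same check that costs a legitimate login a fraction of a second costs an attacker that much for every single guess.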

Another everyday use: checking a file hasn't changed

Passwords aren't the only job. Because the avalanche effect makes a digest a sensitive fingerprint of a file, download sites often publish the hash of a file next to the download link. You hash the file you received and compare: if even one byte was corrupted in transit, or tampered with by an attacker, the digests won't match and you know not to trust it. The same idea powers Git commit IDs, digital signatures and blockchains: a hash is a tiny, tamper-evident summary of a much larger thing.
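The download check can be sketched as follows, assuming Node.js and `node:crypto`. To keep the example self-contained, in-memory buffers stand in for the publisher's file and the copies you received; in practice you would hash the bytes of the downloaded file.

```typescript
import { createHash } from "node:crypto";

const sha256Hex = (data: Buffer): string =>
  createHash("sha256").update(data).digest("hex");

// Stand-ins for a real download: the publisher's file and two received copies.
const original = Buffer.from("installer contents v1.0");
const publishedDigest = sha256Hex(original); // what the site would list next to the link

const goodCopy = Buffer.from(original); // arrived intact
const badCopy = Buffer.from(original);
badCopy[0] ^= 0x01;                     // a single bit flipped in transit

console.log(sha256Hex(goodCopy) === publishedDigest); // true
console.log(sha256Hex(badCopy) === publishedDigest);  // false
```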

Collisions are unavoidable in theory — there are infinitely many inputs and only finitely many digests — so a hash's strength is never "no collisions exist" but "no one can find one". That's a practical, not absolute, guarantee, and it can expire: the once-standard MD5 and SHA‑1 hashes are now considered broken, because researchers found efficient ways to manufacture collisions, so they've been retired in favour of SHA‑256 and friends. A cryptographic hash is only "secure" until someone proves otherwise — which is why the recommended algorithms change over the years.