Imagine a school register with two thousand pupils, and you need to find one pupil's record from
their name. With an
A hash table pulls off something that feels like magic: it stores
key → value pairs (name → record, word → definition, username → account) and lets
you fetch a value from its key almost instantly, no matter how many pairs it holds.
Instead of searching for where a key lives, it calculates where the key lives — in
one step. That trick is the single idea of this page, and it powers dictionaries in Python, objects
and Map in JavaScript/TypeScript, database indexes, and much more besides.
Underneath, a hash table is just an ordinary array of slots, which we'll call buckets. The clever part is the hash function: a small piece of code that takes a key and crunches it down to a whole number — the index of the bucket where that key's value should live.
The mod (remainder) step is what squashes any number the hash produces into a valid
index — a value from tableSize - 1, so it always
lands inside the array. To store a pair you hash the key and drop the value in that
bucket; to look it up later you hash the same key, go straight to the same bucket,
and there it is. You never scan the whole array — you compute the address and jump to it:
Because computing a hash and then indexing an array both take the same tiny amount of time whether the
table holds ten items or ten million, lookup, insertion and deletion are all
constant time, written
A hash function for text can be as simple as this: add up the character code of every letter, then
take the remainder when divided by the table size. (Every character has a numeric code — 'A'
is 'B' is charCodeAt reads it.) Press Run and watch three keys turn into bucket
indices:
The same key always produces the same index — that's what makes lookup possible. A hash function must be deterministic (same input, same output every time) and fast to compute; the best ones also spread different keys as evenly as possible across the buckets, for a reason we're about to meet.
Put the hash function to work as a tiny dictionary. To store, hash the key and place the value in that bucket; to look up, hash again and read the same bucket. Notice there is no searching anywhere — the index is computed both times:
This works beautifully — right up until two different keys land in the same bucket. And with only seven buckets, that is going to happen sooner than you might think.
A collision is when hash(keyA) and hash(keyB) give the
same index for two different keys. Our simple "sum the character codes" hash collides very
easily — any two words made of the same letters (an anagram!) add up to exactly the same total.
Watch "listen" and "silent" collide, and see the danger:
The value for "listen" has been silently overwritten. A hash table is useless unless it
has a plan for collisions — and it always needs one, because collisions can never be fully avoided
(more on that in the warning below). There are two classic plans.
With chaining (also called separate chaining), each bucket doesn't hold a single value — it holds a small list of all the pairs that hashed there. To store, you append to the bucket's list; to look up, you go to the bucket and scan its (usually very short) list for the matching key. Colliding keys simply queue up together instead of clobbering one another:
The chain is why we store the whole key alongside the value, and compare on it inside the
bucket: the index alone can't tell "listen" from "silent", so we confirm by
matching the actual key. As long as the chains stay short, lookup is still effectively
The other approach keeps just one value per bucket but, on a collision, goes looking for another free bucket. This is open addressing, and the simplest version is linear probing: if the computed bucket is taken, try the next one along, then the next, wrapping around the end of the array, until you find an empty slot.
Look-up follows the very same trail: start at the hashed bucket and step forward until you either
find the key or hit an empty slot (which means it isn't there). Run it and watch "silent"
get bumped one place past the bucket "listen" already occupies:
Chaining is simpler and copes gracefully when a table gets crowded; open addressing avoids the extra lists and keeps everything in one tidy array, which the hardware often likes. Both are widely used — what matters is that some strategy is always there.
The speed of a hash table depends on how crowded it is. The load factor measures exactly that — the number of stored items divided by the number of buckets:
A load factor of
The cure is resizing (also called rehashing): when the load factor crosses a
threshold — commonly around
The honest answer is that
Collisions are not a sign of a bug — they are unavoidable, and the maths proves it. By the pigeonhole principle, if you have more keys than buckets then at least two keys must share a bucket, no matter how brilliant the hash function is. And there are always vastly more possible keys (every possible word, name or number) than the handful of buckets in the array. So a hash table without a collision strategy is simply broken; handling collisions is not optional.
The subtler trap is a poor hash function. A hash that clusters keys into just a few
buckets — leaving most empty — creates long chains or long probe trails, and your beautiful