Every time you write new Thing() in Java, [1, 2, 3] in JavaScript, or
malloc(sizeof(struct)) in C, a lump of memory is set aside somewhere to hold that
value. But where, exactly, does your object live — and once you stop using it, who tidies
it away? Get this wrong and your program either forgets to give memory back (a leak
that grows until it dies) or hands the same memory to two owners at once (a
crash waiting to happen).
This page is about the two great regions a running program carves its memory into — the
stack and the heap — and the strategies, from hand-written
free() to automatic garbage collection, that decide when heap memory
can be reclaimed. The key idea underneath it all: the heap is really an
When a program runs, the operating system gives it a block of memory. Two ends of that block get special treatment.
A quick rule of thumb: small, short-lived, known-size values (a loop counter, a pointer) sit on the
stack; anything created with new/malloc that lives on after the current
call lives on the heap. The stack cleans itself; the heap is the hard part — and the rest of this
page is about it.
Because a stack frame dies the instant its function returns. If you returned a pointer to a local variable, it would point at memory that has already been popped and reused — a dangling pointer to garbage. Anything that must survive the call has to go on the heap. Also, the stack is small (often a megabyte or two): recurse too deep and you get the famous stack overflow. The heap is where the room is.
Here is the mental model that makes garbage collection click. Draw every heap object as a node and every reference (pointer) as a directed arrow. Add the roots — the variables the program can touch directly: things on the stack and in globals. An object is live if you can reach it by following arrows from a root; if you can't, it is garbage, no matter how many other objects still point at it.
Watch the two-phase sweep in the figure. First we mark everything reachable from the roots. Then whatever is left unmarked — here a little island of two objects pointing only at each other — is swept away. That self-referential pair is the classic trap we'll meet in a moment.
malloc and free
In a language like C, you are the memory manager. You ask for heap memory with
malloc and you must give it back with free when you're done. It's simple,
fast, and unforgiving — three mistakes lie in wait:
free. The memory is never reclaimed;
do it in a loop and the program bloats until it's killed.free a block but keep a
pointer to it and read through it later. That memory may already belong to something else.free the same block twice, corrupting the
allocator's own bookkeeping.Because these bugs are so easy to make and so hard to find, most modern languages take the job away from you and reclaim heap memory automatically. That automatic janitor is the garbage collector, and it comes in a few flavours.
The simplest automatic scheme. Give every object a counter of how many references point at it. Every time you make a new reference, bump the count up; every time a reference goes away, drop it down. The instant a count hits zero, nothing can reach the object, so free it immediately.
Reference counting is wonderfully prompt — memory is reclaimed the moment it becomes
unused, with no big pauses — and it's used by Python (alongside a cycle collector) and by C++'s
shared_ptr. But it has a famous blind spot, which the "Watch out!" box below spells out:
it cannot free a cycle.
Instead of tracking counts as you go, a tracing collector figures out reachability directly, exactly like the figure above. It runs in two phases:
Because it works from reachability rather than counts, mark-and-sweep reclaims cycles that reference counting leaks: two objects pointing at each other are still garbage if no root can reach them, and the mark phase simply never visits them. The cost is that a full collection may pause the program while it traces the heap. Let's watch both strategies decide the fate of the same little graph — including a cycle:
Objects 3 and 4 reference each other, so each keeps the other's count at
one — reference counting never frees them. Mark-and-sweep starts from the root, never reaches them,
and collects both.
Tracing the entire heap on every collection is expensive. Real collectors lean on a striking empirical fact, the generational hypothesis:
So split the heap by age. New objects go in a small young generation (the nursery), which is collected often and quickly — since almost everything there is already dead, each pass throws away a lot and traces very little. An object that survives a few collections is promoted to the old generation, which is large and collected only rarely. You spend your effort where the garbage actually is, and touch the long-lived survivors hardly at all. This is how the JVM, .NET and modern JavaScript engines keep GC pauses small.
A points to
B and B points back to A, each holds the other's count at
one — even after the program has dropped every reference to both. Neither ever hits
zero, so a pure reference counter never frees them. Reachability (mark-and-sweep) is the fix.
free has its own trio of traps: forgetting to free (leak),
using a pointer after freeing (dangling / use-after-free), and freeing the same block twice
(double free). These are precisely the bugs a garbage collector exists to prevent.