Memory Management and Garbage Collection

Every time you write new Thing() in Java, [1, 2, 3] in JavaScript, or malloc(sizeof(struct)) in C, a lump of memory is set aside somewhere to hold that value. But where, exactly, does your object live — and once you stop using it, who tidies it away? Get this wrong and your program either forgets to give memory back (a leak that grows until it dies) or hands the same memory to two owners at once (a crash waiting to happen).

This page is about the two great regions a running program carves its memory into — the stack and the heap — and the strategies, from hand-written free() to automatic garbage collection, that decide when heap memory can be reclaimed. The key idea underneath it all: the heap is really an graph of objects pointing at one another, and "garbage" is simply the part of that graph you can no longer reach.

Two regions: the stack and the heap

When a program runs, the operating system gives it a block of memory. Two ends of that block get special treatment.

A quick rule of thumb: small, short-lived, known-size values (a loop counter, a pointer) sit on the stack; anything created with new/malloc that lives on after the current call lives on the heap. The stack cleans itself; the heap is the hard part — and the rest of this page is about it.

Because a stack frame dies the instant its function returns. If you returned a pointer to a local variable, it would point at memory that has already been popped and reused — a dangling pointer to garbage. Anything that must survive the call has to go on the heap. Also, the stack is small (often a megabyte or two): recurse too deep and you get the famous stack overflow. The heap is where the room is.

Reachability: the heap as an object graph

Here is the mental model that makes garbage collection click. Draw every heap object as a node and every reference (pointer) as a directed arrow. Add the roots — the variables the program can touch directly: things on the stack and in globals. An object is live if you can reach it by following arrows from a root; if you can't, it is garbage, no matter how many other objects still point at it.

Watch the two-phase sweep in the figure. First we mark everything reachable from the roots. Then whatever is left unmarked — here a little island of two objects pointing only at each other — is swept away. That self-referential pair is the classic trap we'll meet in a moment.

Doing it by hand: malloc and free

In a language like C, you are the memory manager. You ask for heap memory with malloc and you must give it back with free when you're done. It's simple, fast, and unforgiving — three mistakes lie in wait:

Because these bugs are so easy to make and so hard to find, most modern languages take the job away from you and reclaim heap memory automatically. That automatic janitor is the garbage collector, and it comes in a few flavours.

Strategy 1 — reference counting

The simplest automatic scheme. Give every object a counter of how many references point at it. Every time you make a new reference, bump the count up; every time a reference goes away, drop it down. The instant a count hits zero, nothing can reach the object, so free it immediately.

Reference counting is wonderfully prompt — memory is reclaimed the moment it becomes unused, with no big pauses — and it's used by Python (alongside a cycle collector) and by C++'s shared_ptr. But it has a famous blind spot, which the "Watch out!" box below spells out: it cannot free a cycle.

Strategy 2 — mark and sweep

Instead of tracking counts as you go, a tracing collector figures out reachability directly, exactly like the figure above. It runs in two phases:

Because it works from reachability rather than counts, mark-and-sweep reclaims cycles that reference counting leaks: two objects pointing at each other are still garbage if no root can reach them, and the mark phase simply never visits them. The cost is that a full collection may pause the program while it traces the heap. Let's watch both strategies decide the fate of the same little graph — including a cycle:

// A tiny heap: each object has an id and a list of ids it references. interface Obj { id: number; refs: number[]; } const heap: Obj[] = [ { id: 0, refs: [1] }, // reachable from a root { id: 1, refs: [2] }, // reached via 0 { id: 2, refs: [] }, // reached via 1 (a leaf) { id: 3, refs: [4] }, // NOT rooted -- part of a cycle { id: 4, refs: [3] }, // NOT rooted -- points back to 3 ]; const roots: number[] = [0]; // the only thing the program can touch directly const byId = (id: number) => heap.find((o) => o.id === id)!; // --- Mark & sweep: mark everything reachable from the roots, sweep the rest --- function markSweep(): { live: number[]; collected: number[] } { const marked = new Set<number>(); const stack = [...roots]; while (stack.length > 0) { const id = stack.pop()!; if (marked.has(id)) continue; // already visited -- this is what tames cycles marked.add(id); for (const r of byId(id).refs) stack.push(r); } const live = heap.map((o) => o.id).filter((id) => marked.has(id)); const collected = heap.map((o) => o.id).filter((id) => !marked.has(id)); return { live, collected }; } // --- Reference counting: count incoming refs from ALL objects (ignoring reachability) --- function refCounts(): Map<number, number> { const count = new Map<number, number>(); for (const o of heap) count.set(o.id, 0); for (const r of roots) count.set(r, count.get(r)! + 1); // held by a root for (const o of heap) for (const r of o.refs) count.set(r, count.get(r)! + 1); return count; } const { live, collected } = markSweep(); console.log("mark-sweep LIVE :", live); // [0, 1, 2] console.log("mark-sweep COLLECTED:", collected); // [3, 4] -- the cycle is reclaimed const counts = refCounts(); const leaked = [...counts].filter(([, c]) => c > 0).map(([id]) => id); console.log("ref-count NON-ZERO :", leaked); // [0,1,2,3,4] -- 3 and 4 keep each other alive! console.log("=> ref counting LEAKS the 3<->4 cycle that mark-sweep frees");

Objects 3 and 4 reference each other, so each keeps the other's count at one — reference counting never frees them. Mark-and-sweep starts from the root, never reaches them, and collects both.

Strategy 3 — generational garbage collection

Tracing the entire heap on every collection is expensive. Real collectors lean on a striking empirical fact, the generational hypothesis:

So split the heap by age. New objects go in a small young generation (the nursery), which is collected often and quickly — since almost everything there is already dead, each pass throws away a lot and traces very little. An object that survives a few collections is promoted to the old generation, which is large and collected only rarely. You spend your effort where the garbage actually is, and touch the long-lived survivors hardly at all. This is how the JVM, .NET and modern JavaScript engines keep GC pauses small.