A parser is a strict grammar teacher: it checks that your sentence is well-formed. Feed it
x + 1 and it happily builds a tree — a plus node with a variable on the left and a literal
on the right. Grammatically perfect. But there is a question the parser never asks:
was x ever declared? Does x + 1 actually mean
anything?
That is the job of the next stage, semantic analysis. Parsing checks
form; semantic analysis checks meaning. It walks the
x + 1 is fine syntax, but nonsense if
x was never introduced. "Undefined variable x.""apple" * 3 parses like any other multiplication,
yet multiplying a string by a number is meaningless in most languages. "Cannot multiply a string by
a number."max(1) is a valid function call to the grammar, but if
max expects two arguments the program is broken. "Expected 2 arguments, got
1."None of these are grammar mistakes — every one is a perfectly legal shape. Catching them needs two tools that this page is about: a symbol table to remember what every name means, and a type checker that walks the tree working out the type of each node.
The moment a program says let count = 0, the compiler has learned a fact it must remember:
the name count now exists, it is a variable (not a function or a type), it holds a number,
and it lives at some storage location. All of that bundled knowledge is a name's
attributes, and the structure that stores them is the symbol table:
a dictionary mapping each name to its attributes.
number, string, bool, a function
signature…
Every time the analyser meets a use of a name — count + 1 — it does a
lookup: fetch count's attributes so it can check the use makes sense.
A single program does millions of these lookups, so the operation must be fast. That is exactly why a
symbol table is almost always backed by a
The kind attribute — and it matters more than it looks. In many languages the same
identifier can be a type and a value in different contexts, and the analyser leans on
kind to tell them apart. When the parser sees foo(x) it can't know whether
foo is a function being called or (in some languages) a type doing a cast; only the
symbol table, filled in by an earlier declaration, resolves the ambiguity. This is one reason
semantic analysis is a separate pass after parsing: it needs the whole picture of what every
name is.
A real program doesn't have one flat namespace. The same name i can be a loop counter in
one function and something entirely different in another. What keeps them apart is scope
— the region of program text where a declaration is visible. And the elegant way to track scope is a
stack of symbol tables:
{ … }, a loop),
it pushes a fresh, empty symbol table onto the stack.The stack mirrors the nesting of blocks precisely. Here is a small program and the stack of tables as the analyser reaches the innermost line:
| Stack (top = innermost) | Names in this table |
|---|---|
| scope 2 (the inner block) | pi |
scope 1 (function area) | r, two |
| scope 0 (global) | g, area |
Now the line two * pi * r needs to resolve three names. Name
resolution searches the stack from the top down — innermost scope first, then outward until a
match is found:
pi — found immediately in scope 2. ✔two — not in scope 2; look outward to scope 1. Found. ✔r — not in scope 2 or… wait, it's in scope 1. Found. ✔If a name is not found in any table on the stack, it was never declared — that is the "undeclared variable" error. Because the search always begins at the top, an inner declaration can shadow an outer one of the same name: the inner is found first, so it wins. This inside-out search is called lexical (or static) scoping — "lexical" because you can work out which declaration a name refers to purely from where it sits in the text, before the program ever runs.
Under lexical (static) scope, a name refers to whatever declaration encloses it in the source text — you resolve it by reading the program. Under dynamic scope, a name refers to the most recent binding on the call stack at run time: which declaration wins depends on who called whom, and can differ from one run to the next.
Dynamic scope makes code fiendishly hard to reason about — you can't tell what a variable means just
by looking at the function that uses it. That is why almost every modern language
(
With names resolved, the analyser can check types. The idea is beautifully recursive
and it walks the
number. A string literal
has type string. (Leaves — no children to check.)+ checks its two children: if both are number,
the result is number; if one is a string and the other a number,
that's a type error — reported, not crashed.
The function below is a real, complete mini type checker. It models a tiny AST, keeps a
scope as a stack of Maps, and check(node, scope) returns the
node's type or throws the moment it meets an undeclared name or a type it can't reconcile. Then
it checks two sample programs — one good, one broken. Press Run:
Look at what came back: the good program is reported as number, while the two broken ones
produce diagnostics, not crashes. That is the whole spirit of semantic analysis — it doesn't
run your program, it reasons about it, and when the meaning doesn't add up it hands you
a clear error with a location. The output of a clean pass is an annotated AST: the same
tree, now with a type stamped on every node and every name linked to its symbol-table entry, ready for
the code generator that comes next.
x was declared or whether the operands' types agree. Semantic
analysis is a separate pass over the finished AST — that's why "a" * 3 parses
cleanly and only later is flagged.