Syntax and Semantics
Write down 2 + 3. On the page it is just five marks — a numeral, a
plus sign, another numeral, and the spaces between them. That arrangement of marks is the
syntax: the form of the expression, the shapes you are allowed to write.
But the marks mean something — they denote the number five. That meaning
is the semantics.
Every programming language lives a double life this way. Its syntax says which
strings of characters are well-formed — a grammar, a spell-checker for programs. Its
semantics says what those well-formed programs actually do or
mean. This one distinction — form versus meaning — is the backbone of
the whole subject, and it is what this page is about.
Grammatical but meaningless
Natural language shows the split cleanly. The linguist Noam Chomsky's sentence
"Colorless green ideas sleep furiously" is perfectly grammatical —
subject, adjectives, verb, adverb, all in the right places — yet it means nothing: ideas have no
colour, and cannot be both colourless and green, nor sleep. Syntax is satisfied; semantics is not.
Programs do the same. A source file can pass the parser (good syntax) and still be nonsense — a
variable used before it exists, a number added to a function, a type mismatch. The parser is happy;
the meaning is broken. Keeping "well-formed" and "meaningful" apart is the first habit of
a language designer.
Concrete syntax: the characters you type
The concrete syntax is the literal text — every character, parenthesis and space.
Which strings count as legal is fixed by a grammar, usually written in
BNF (Backus–Naur Form) as a set of production rules. A grammar is exactly the kind
of formal-language machinery you met with
regular
expressions and languages, scaled up to describe nesting.
Here is a tiny grammar for arithmetic expressions:
Expr ::= Expr "+" Term | Term
Term ::= Term "*" Factor | Factor
Factor ::= Number | "(" Expr ")"
Number ::= digit { digit }
Read ::= as "can be", and | as "or". These
four rules say a legal expression is a sum of terms, a term is a product of factors, and a factor
is either a number or a parenthesised sub-expression. The string 2 * (3 + 4) obeys the
rules; the string 2 * * ) 3 does not — it is a syntax error, rejected
before we ever ask what it means.
Abstract syntax: the tree of meaning-bearing structure
Parentheses and spacing exist only to guide the reader; once the shape is fixed they can be thrown
away. What survives is the abstract syntax — the essential nested structure,
captured as an abstract syntax tree (AST). Parsing is exactly the
job of turning concrete text into this tree.
In the AST for 2 * (3 + 4), the outermost operation — the multiplication — sits at the
root. Its children are the number 2 and the whole
3 + 4 sub-expression, which is itself a little tree. Notice there are
no parentheses in the tree: the grouping they expressed is now carried by the
shape. A tree is
precisely the right structure for this.
Two different strings can parse to the same tree — 2*(3+4) and
2 * ( 3 + 4 ) differ only in spacing, so their concrete syntax differs but their
abstract syntax is identical. The tree is what the rest of the language machinery actually works on.
From form to meaning: two ways to say what a program means
We now have a clean tree. What does it mean? There are two classic answers, and they are
the two great schools of formal semantics.
-
Operational semantics — meaning is how it runs. Give a set of
state-transition rules that rewrite the program one small step at a time until
it reaches a final value. The meaning of an expression is the sequence of steps and the
value it ends on. (Think of it as defining an idealised interpreter.)
-
Denotational semantics — meaning is the mathematical object it denotes.
Define a function \llbracket \cdot \rrbracket that maps each piece of
syntax straight to a value in some mathematical domain, built compositionally from the meanings
of its parts. The meaning of 2 * (3 + 4) simply is the number
14.
Same program, two lenses: one describes the journey (the steps), the other names the
destination (the value). A good language design makes sure they agree.
Operational semantics: meaning as small steps
Let us make the operational view concrete. We represent an expression as an AST — a union type: an
expression is either a number literal, or a node holding an operator and two
sub-expressions. A small-step rule finds the innermost fully-evaluated operation (a
redex), performs it, and hands back a slightly smaller tree. Repeat until only a number
remains. Press Run and watch the expression reduce, one transition at a time.
type Expr = number | { op: "+" | "*"; left: Expr; right: Expr };
// Pretty-print an expression back into concrete syntax.
function show(e: Expr): string {
if (typeof e === "number") return String(e);
return "(" + show(e.left) + " " + e.op + " " + show(e.right) + ")";
}
// ONE small step: reduce the left-most innermost redex.
function step(e: Expr): Expr {
if (typeof e === "number") return e; // already a value
if (typeof e.left === "number" && typeof e.right === "number") {
return e.op === "+" ? e.left + e.right : e.left * e.right;
}
if (typeof e.left !== "number") { // reduce the left child first
return { op: e.op, left: step(e.left), right: e.right };
}
return { op: e.op, left: e.left, right: step(e.right) }; // then the right
}
// The MEANING is the whole run: keep stepping until a number falls out.
function evaluate(e: Expr): number {
console.log("start: " + show(e));
while (typeof e !== "number") {
e = step(e);
console.log(" --> " + show(e));
}
return e;
}
// 2 * (3 + 4)
const ast1: Expr = { op: "*", left: 2, right: { op: "+", left: 3, right: 4 } };
console.log("value =", evaluate(ast1));
console.log("");
// (1 + 2) * (3 + 4)
const ast2: Expr =
{ op: "*", left: { op: "+", left: 1, right: 2 }, right: { op: "+", left: 3, right: 4 } };
console.log("value =", evaluate(ast2));
Each printed line is one state transition. The meaning of the program, on this
view, is the whole trace and the number it lands on. Change a rule — say make
* reduce its right child first — and you have defined a subtly different operational
semantics.
Denotational semantics: meaning as the value denoted
The denotational view skips the journey and names the destination directly. We write one function
that maps a piece of syntax to the mathematical value it denotes, defined
compositionally — the meaning of a node is built from the meanings of its
children:
type Expr = number | { op: "+" | "*"; left: Expr; right: Expr };
// The denotation function ⟦·⟧ : Expr → number.
// The meaning of a whole is composed from the meanings of its parts.
function meaning(e: Expr): number {
if (typeof e === "number") return e; // a literal denotes itself
const l = meaning(e.left);
const r = meaning(e.right);
return e.op === "+" ? l + r : l * r; // + denotes addition, * denotes multiplication
}
const ast: Expr = { op: "*", left: 2, right: { op: "+", left: 3, right: 4 } };
console.log("2 * (3 + 4) denotes", meaning(ast)); // 14 — no steps, just the value
No trace, no reduction order — the expression simply is 14. For
this tiny language the two semantics visibly agree: the operational run ends on the number
the denotational function names. Proving that they always agree — an
adequacy theorem — is a central sport in the theory of programming languages.
Because they are good at different questions. Operational semantics is close to
an implementation, so it is the natural setting for reasoning about evaluation order,
performance, and side effects — how a real interpreter or compiler behaves. Denotational
semantics is mathematical and compositional, which makes it powerful for proofs: showing
two programs are equivalent, or that an optimisation preserves meaning, by comparing the values
they denote. A third style, axiomatic semantics (Hoare logic), describes meaning
through logical pre- and post-conditions — the foundation of program verification.
Same meaning, different form — and vice versa
Because form and meaning are independent, you get all four combinations:
-
Different syntax, same semantics. 2 + 3 and
3 + 2 read differently but both denote 5.
So do
x * 2 and x + x. This is the licence a compiler uses to optimise.
-
Same syntax, different semantics. The characters
a + b mean integer
addition in one language, string concatenation in another, and vector addition in a third. The
form is fixed; the meaning is assigned by the language.
-
A syntactically valid program is not the same as a meaningful one. Passing the
parser only proves the form is legal. A type error, an undefined variable, a divide by
zero — these are semantic errors in a perfectly grammatical program. "It
compiles the grammar" is a much weaker promise than "it makes sense".
-
The program is the tree, not the text. It is tempting to think the source
characters are the program, but two different strings can denote the same AST, and the
AST is what carries meaning. Concrete syntax is a surface; the abstract syntax underneath is
what the semantics acts on.
-
Don't confuse the two kinds of "meaning". Operational semantics says the
meaning is how it steps; denotational semantics says the meaning is the value it
denotes. They should agree, but they are answering different questions — don't blur them.
- Syntax fixes the form: which strings are well-formed, given by a
grammar (concrete syntax) and captured as a tree (abstract syntax / AST).
- Semantics fixes the meaning: what a well-formed program does or
denotes.
- Operational semantics gives meaning as state-transition steps;
denotational semantics gives meaning as the mathematical value denoted.