Syntax and Semantics

Write down 2 + 3. On the page it is just five marks — a numeral, a plus sign, another numeral, and the spaces between them. That arrangement of marks is the syntax: the form of the expression, the shapes you are allowed to write. But the marks mean something — they denote the number five. That meaning is the semantics.

Every programming language lives a double life this way. Its syntax says which strings of characters are well-formed — a grammar, a spell-checker for programs. Its semantics says what those well-formed programs actually do or mean. This one distinction — form versus meaning — is the backbone of the whole subject, and it is what this page is about.

Grammatical but meaningless

Natural language shows the split cleanly. The linguist Noam Chomsky's sentence "Colorless green ideas sleep furiously" is perfectly grammatical — subject, adjectives, verb, adverb, all in the right places — yet it means nothing: ideas have no colour, and cannot be both colourless and green, nor sleep. Syntax is satisfied; semantics is not.

Programs do the same. A source file can pass the parser (good syntax) and still be nonsense — a variable used before it exists, a number added to a function, a type mismatch. The parser is happy; the meaning is broken. Keeping "well-formed" and "meaningful" apart is the first habit of a language designer.

Concrete syntax: the characters you type

The concrete syntax is the literal text — every character, parenthesis and space. Which strings count as legal is fixed by a grammar, usually written in BNF (Backus–Naur Form) as a set of production rules. A grammar is exactly the kind of formal-language machinery you met with regular expressions and languages, scaled up to describe nesting.

Here is a tiny grammar for arithmetic expressions:

Expr ::= Expr "+" Term | Term Term ::= Term "*" Factor | Factor Factor ::= Number | "(" Expr ")" Number ::= digit { digit }

Read ::= as "can be", and | as "or". These four rules say a legal expression is a sum of terms, a term is a product of factors, and a factor is either a number or a parenthesised sub-expression. The string 2 * (3 + 4) obeys the rules; the string 2 * * ) 3 does not — it is a syntax error, rejected before we ever ask what it means.

Abstract syntax: the tree of meaning-bearing structure

Parentheses and spacing exist only to guide the reader; once the shape is fixed they can be thrown away. What survives is the abstract syntax — the essential nested structure, captured as an abstract syntax tree (AST). Parsing is exactly the job of turning concrete text into this tree.

In the AST for 2 * (3 + 4), the outermost operation — the multiplication — sits at the root. Its children are the number 2 and the whole 3 + 4 sub-expression, which is itself a little tree. Notice there are no parentheses in the tree: the grouping they expressed is now carried by the shape. A tree is precisely the right structure for this.

Two different strings can parse to the same tree — 2*(3+4) and 2 * ( 3 + 4 ) differ only in spacing, so their concrete syntax differs but their abstract syntax is identical. The tree is what the rest of the language machinery actually works on.

From form to meaning: two ways to say what a program means

We now have a clean tree. What does it mean? There are two classic answers, and they are the two great schools of formal semantics.

Operational semantics — meaning is how it runs. Give a set of state-transition rules that rewrite the program one small step at a time until it reaches a final value. The meaning of an expression is the sequence of steps and the value it ends on. (Think of it as defining an idealised interpreter.)
Denotational semantics — meaning is the mathematical object it denotes. Define a function \llbracket \cdot \rrbracket that maps each piece of syntax straight to a value in some mathematical domain, built compositionally from the meanings of its parts. The meaning of 2 * (3 + 4) simply is the number 14.

Same program, two lenses: one describes the journey (the steps), the other names the destination (the value). A good language design makes sure they agree.

Operational semantics: meaning as small steps

Let us make the operational view concrete. We represent an expression as an AST — a union type: an expression is either a number literal, or a node holding an operator and two sub-expressions. A small-step rule finds the innermost fully-evaluated operation (a redex), performs it, and hands back a slightly smaller tree. Repeat until only a number remains. Press Run and watch the expression reduce, one transition at a time.

type Expr = number | { op: "+" | "*"; left: Expr; right: Expr }; // Pretty-print an expression back into concrete syntax. function show(e: Expr): string { if (typeof e === "number") return String(e); return "(" + show(e.left) + " " + e.op + " " + show(e.right) + ")"; } // ONE small step: reduce the left-most innermost redex. function step(e: Expr): Expr { if (typeof e === "number") return e; // already a value if (typeof e.left === "number" && typeof e.right === "number") { return e.op === "+" ? e.left + e.right : e.left * e.right; } if (typeof e.left !== "number") { // reduce the left child first return { op: e.op, left: step(e.left), right: e.right }; } return { op: e.op, left: e.left, right: step(e.right) }; // then the right } // The MEANING is the whole run: keep stepping until a number falls out. function evaluate(e: Expr): number { console.log("start: " + show(e)); while (typeof e !== "number") { e = step(e); console.log(" --> " + show(e)); } return e; } // 2 * (3 + 4) const ast1: Expr = { op: "*", left: 2, right: { op: "+", left: 3, right: 4 } }; console.log("value =", evaluate(ast1)); console.log(""); // (1 + 2) * (3 + 4) const ast2: Expr = { op: "*", left: { op: "+", left: 1, right: 2 }, right: { op: "+", left: 3, right: 4 } }; console.log("value =", evaluate(ast2));

Each printed line is one state transition. The meaning of the program, on this view, is the whole trace and the number it lands on. Change a rule — say make * reduce its right child first — and you have defined a subtly different operational semantics.

Denotational semantics: meaning as the value denoted

The denotational view skips the journey and names the destination directly. We write one function that maps a piece of syntax to the mathematical value it denotes, defined compositionally — the meaning of a node is built from the meanings of its children:

type Expr = number | { op: "+" | "*"; left: Expr; right: Expr }; // The denotation function ⟦·⟧ : Expr → number. // The meaning of a whole is composed from the meanings of its parts. function meaning(e: Expr): number { if (typeof e === "number") return e; // a literal denotes itself const l = meaning(e.left); const r = meaning(e.right); return e.op === "+" ? l + r : l * r; // + denotes addition, * denotes multiplication } const ast: Expr = { op: "*", left: 2, right: { op: "+", left: 3, right: 4 } }; console.log("2 * (3 + 4) denotes", meaning(ast)); // 14 — no steps, just the value

No trace, no reduction order — the expression simply is 14. For this tiny language the two semantics visibly agree: the operational run ends on the number the denotational function names. Proving that they always agree — an adequacy theorem — is a central sport in the theory of programming languages.

Because they are good at different questions. Operational semantics is close to an implementation, so it is the natural setting for reasoning about evaluation order, performance, and side effects — how a real interpreter or compiler behaves. Denotational semantics is mathematical and compositional, which makes it powerful for proofs: showing two programs are equivalent, or that an optimisation preserves meaning, by comparing the values they denote. A third style, axiomatic semantics (Hoare logic), describes meaning through logical pre- and post-conditions — the foundation of program verification.

Same meaning, different form — and vice versa

Because form and meaning are independent, you get all four combinations:

Different syntax, same semantics. 2 + 3 and 3 + 2 read differently but both denote 5. So do x * 2 and x + x. This is the licence a compiler uses to optimise.
Same syntax, different semantics. The characters a + b mean integer addition in one language, string concatenation in another, and vector addition in a third. The form is fixed; the meaning is assigned by the language.

A syntactically valid program is not the same as a meaningful one. Passing the parser only proves the form is legal. A type error, an undefined variable, a divide by zero — these are semantic errors in a perfectly grammatical program. "It compiles the grammar" is a much weaker promise than "it makes sense".
The program is the tree, not the text. It is tempting to think the source characters are the program, but two different strings can denote the same AST, and the AST is what carries meaning. Concrete syntax is a surface; the abstract syntax underneath is what the semantics acts on.
Don't confuse the two kinds of "meaning". Operational semantics says the meaning is how it steps; denotational semantics says the meaning is the value it denotes. They should agree, but they are answering different questions — don't blur them.

Syntax fixes the form: which strings are well-formed, given by a grammar (concrete syntax) and captured as a tree (abstract syntax / AST).
Semantics fixes the meaning: what a well-formed program does or denotes.
Operational semantics gives meaning as state-transition steps; denotational semantics gives meaning as the mathematical value denoted.