Syntax and Semantics

Write down 2 + 3. On the page it is just five marks — a numeral, a plus sign, another numeral, and the spaces between them. That arrangement of marks is the syntax: the form of the expression, the shapes you are allowed to write. But the marks mean something — they denote the number five. That meaning is the semantics.

Every programming language lives a double life this way. Its syntax says which strings of characters are well-formed — a grammar, a spell-checker for programs. Its semantics says what those well-formed programs actually do or mean. This one distinction — form versus meaning — is the backbone of the whole subject, and it is what this page is about.

Grammatical but meaningless

Natural language shows the split cleanly. The linguist Noam Chomsky's sentence "Colorless green ideas sleep furiously" is perfectly grammatical — subject, adjectives, verb, adverb, all in the right places — yet it means nothing: ideas have no colour, and cannot be both colourless and green, nor sleep. Syntax is satisfied; semantics is not.

Programs do the same. A source file can pass the parser (good syntax) and still be nonsense — a variable used before it exists, a number added to a function, a type mismatch. The parser is happy; the meaning is broken. Keeping "well-formed" and "meaningful" apart is the first habit of a language designer.

Concrete syntax: the characters you type

The concrete syntax is the literal text — every character, parenthesis and space. Which strings count as legal is fixed by a grammar, usually written in BNF (Backus–Naur Form) as a set of production rules. A grammar is exactly the kind of formal-language machinery you met with regular expressions and languages, scaled up to describe nesting.

Here is a tiny grammar for arithmetic expressions:

Expr ::= Expr "+" Term | Term Term ::= Term "*" Factor | Factor Factor ::= Number | "(" Expr ")" Number ::= digit { digit }

Read ::= as "can be", and | as "or". These four rules say a legal expression is a sum of terms, a term is a product of factors, and a factor is either a number or a parenthesised sub-expression. The string 2 * (3 + 4) obeys the rules; the string 2 * * ) 3 does not — it is a syntax error, rejected before we ever ask what it means.

Abstract syntax: the tree of meaning-bearing structure

Parentheses and spacing exist only to guide the reader; once the shape is fixed they can be thrown away. What survives is the abstract syntax — the essential nested structure, captured as an abstract syntax tree (AST). Parsing is exactly the job of turning concrete text into this tree.

In the AST for 2 * (3 + 4), the outermost operation — the multiplication — sits at the root. Its children are the number 2 and the whole 3 + 4 sub-expression, which is itself a little tree. Notice there are no parentheses in the tree: the grouping they expressed is now carried by the shape. A tree is precisely the right structure for this.

Two different strings can parse to the same tree — 2*(3+4) and 2 * ( 3 + 4 ) differ only in spacing, so their concrete syntax differs but their abstract syntax is identical. The tree is what the rest of the language machinery actually works on.

From form to meaning: two ways to say what a program means

We now have a clean tree. What does it mean? There are two classic answers, and they are the two great schools of formal semantics.

Same program, two lenses: one describes the journey (the steps), the other names the destination (the value). A good language design makes sure they agree.

Operational semantics: meaning as small steps

Let us make the operational view concrete. We represent an expression as an AST — a union type: an expression is either a number literal, or a node holding an operator and two sub-expressions. A small-step rule finds the innermost fully-evaluated operation (a redex), performs it, and hands back a slightly smaller tree. Repeat until only a number remains. Press Run and watch the expression reduce, one transition at a time.

type Expr = number | { op: "+" | "*"; left: Expr; right: Expr }; // Pretty-print an expression back into concrete syntax. function show(e: Expr): string { if (typeof e === "number") return String(e); return "(" + show(e.left) + " " + e.op + " " + show(e.right) + ")"; } // ONE small step: reduce the left-most innermost redex. function step(e: Expr): Expr { if (typeof e === "number") return e; // already a value if (typeof e.left === "number" && typeof e.right === "number") { return e.op === "+" ? e.left + e.right : e.left * e.right; } if (typeof e.left !== "number") { // reduce the left child first return { op: e.op, left: step(e.left), right: e.right }; } return { op: e.op, left: e.left, right: step(e.right) }; // then the right } // The MEANING is the whole run: keep stepping until a number falls out. function evaluate(e: Expr): number { console.log("start: " + show(e)); while (typeof e !== "number") { e = step(e); console.log(" --> " + show(e)); } return e; } // 2 * (3 + 4) const ast1: Expr = { op: "*", left: 2, right: { op: "+", left: 3, right: 4 } }; console.log("value =", evaluate(ast1)); console.log(""); // (1 + 2) * (3 + 4) const ast2: Expr = { op: "*", left: { op: "+", left: 1, right: 2 }, right: { op: "+", left: 3, right: 4 } }; console.log("value =", evaluate(ast2));

Each printed line is one state transition. The meaning of the program, on this view, is the whole trace and the number it lands on. Change a rule — say make * reduce its right child first — and you have defined a subtly different operational semantics.

Denotational semantics: meaning as the value denoted

The denotational view skips the journey and names the destination directly. We write one function that maps a piece of syntax to the mathematical value it denotes, defined compositionally — the meaning of a node is built from the meanings of its children:

type Expr = number | { op: "+" | "*"; left: Expr; right: Expr }; // The denotation function ⟦·⟧ : Expr → number. // The meaning of a whole is composed from the meanings of its parts. function meaning(e: Expr): number { if (typeof e === "number") return e; // a literal denotes itself const l = meaning(e.left); const r = meaning(e.right); return e.op === "+" ? l + r : l * r; // + denotes addition, * denotes multiplication } const ast: Expr = { op: "*", left: 2, right: { op: "+", left: 3, right: 4 } }; console.log("2 * (3 + 4) denotes", meaning(ast)); // 14 — no steps, just the value

No trace, no reduction order — the expression simply is 14. For this tiny language the two semantics visibly agree: the operational run ends on the number the denotational function names. Proving that they always agree — an adequacy theorem — is a central sport in the theory of programming languages.

Because they are good at different questions. Operational semantics is close to an implementation, so it is the natural setting for reasoning about evaluation order, performance, and side effects — how a real interpreter or compiler behaves. Denotational semantics is mathematical and compositional, which makes it powerful for proofs: showing two programs are equivalent, or that an optimisation preserves meaning, by comparing the values they denote. A third style, axiomatic semantics (Hoare logic), describes meaning through logical pre- and post-conditions — the foundation of program verification.

Same meaning, different form — and vice versa

Because form and meaning are independent, you get all four combinations: