Index Notation & the Summation Convention

By the time you meet relativity, electromagnetism or the stress in a solid, vectors and matrices stop being tidy little arrows and grids and become objects with many components that must be manipulated by the hundred. Writing them out longhand — or even with bold letters and dot and cross products — becomes hopeless. Physicists switched, over a century ago, to a language that scales: index notation, sharpened by Einstein's summation convention.

The idea is disarmingly simple. Instead of the vector \mathbf{a}, talk about its typical component a_i — the i-th entry, where i runs over 1, 2, 3 (or 1, \dots, n). Instead of the matrix M, talk about M_{ij} — the entry in row i, column j. One symbol now stands for the whole array, and the index tells you which slot. This little shift turns geometry into bookkeeping, and bookkeeping is mechanical.

Free indices and dummy indices

Every index in an expression is one of two kinds, and telling them apart is the whole grammar.

A free index appears exactly once in each term and is free to take any value 1, 2, 3. It labels which equation you are looking at. In

c_i = a_i + b_i,

the index i is free: this single line is shorthand for the three equations c_1 = a_1 + b_1, c_2 = a_2 + b_2, c_3 = a_3 + b_3. The free indices must match on both sides of an equation and in every term — that is your first and best error check.
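As a rough sketch of what a free index means in practice, the line c_i = a_i + b_i can be read as a loop that writes one equation per value of i. The function name `addVec` and the length check are illustrative assumptions, not part of the notation, and array indices here start at 0 rather than 1.

```typescript
type Vec = number[];

// c_i = a_i + b_i: the free index i labels one equation per component.
function addVec(a: Vec, b: Vec): Vec {
  // The free index must run over the same range in every term.
  if (a.length !== b.length) {
    throw new Error("free index i must have the same range in every term");
  }
  const c: Vec = [];
  for (let i = 0; i < a.length; i++) {
    c.push(a[i] + b[i]); // equation number i
  }
  return c;
}

console.log(addVec([1, 2, 3], [10, 20, 30])); // [11, 22, 33]
```

The length check plays the role of the error check described above: if the free index does not match across terms, the expression is rejected before any arithmetic happens.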

A dummy index (or repeated index) appears twice in a single term. It is being summed over — and its name is irrelevant, exactly like the i in \sum_i. In

\sum_{i=1}^{3} a_i b_i = a_1 b_1 + a_2 b_2 + a_3 b_3,

the i is a dummy: you could rename it k without changing a thing.

Einstein's convention: drop the sigma

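Einstein noticed that in the formulas of physics a repeated index is almost always summed, so the \textstyle\sum is pure decoration. His convention throws it away:

Any index that appears exactly twice in a single term is summed over its full range, with no \textstyle\sum written.

So the dot product collapses to two symbols: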

\mathbf{a}\cdot\mathbf{b} = \sum_{i=1}^{3} a_i b_i \;=\; a_i b_i.

The repeated i on the right is understood to be summed. That is the entire notation. Everything below is just this rule, applied.

The Kronecker delta

The workhorse of index notation is the Kronecker delta \delta_{ij} — the components of the identity matrix:

\delta_{ij} = \begin{cases} 1 & i = j, \\ 0 & i \neq j. \end{cases}

As a grid it is 1 down the diagonal and 0 everywhere else — the figure shows the 3\times 3 case. Because the basis vectors are orthonormal, \hat{\mathbf{e}}_i\cdot\hat{\mathbf{e}}_j = \delta_{ij}, which is where it comes from.

Its magic is the sifting property: summed against something, it renames an index. Because \delta_{ij} is zero unless j = i, the sum \delta_{ij} a_j keeps only the j = i term:

\delta_{ij}\, a_j = a_i.

The delta "eats" the dummy j and hands its index over to whatever it was summed against. Two more facts you will use constantly:

\delta_{ij}\,\delta_{jk} = \delta_{ik}, \qquad \delta_{ii} = \delta_{11} + \delta_{22} + \delta_{33} = 3 \;\;(= n \text{ in } n \text{ dimensions}).

Watch that last one: \delta_{ii} has a repeated i, so it is summed — it is the trace of the identity, which is the dimension, not 1.
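The three delta facts above can be checked by brute force. The following is a minimal sketch, assuming 0-based array indices (the text counts from 1); the helper names `delta`, `sift`, `deltaTrace` and `deltaContract` are made up for illustration.

```typescript
// Kronecker delta: 1 when the indices agree, 0 otherwise.
function delta(i: number, j: number): number {
  return i === j ? 1 : 0;
}

// Sifting: delta_ij a_j, summed over the dummy j, returns a_i.
function sift(a: number[]): number[] {
  const out: number[] = [];
  for (let i = 0; i < a.length; i++) {
    let s = 0;
    for (let j = 0; j < a.length; j++) s += delta(i, j) * a[j];
    out.push(s);
  }
  return out;
}

// delta_ii: the repeated i is summed, so the result is the dimension n.
function deltaTrace(n: number): number {
  let s = 0;
  for (let i = 0; i < n; i++) s += delta(i, i);
  return s;
}

// delta_ij delta_jk, summed over j, for fixed free indices i and k.
function deltaContract(i: number, k: number, n: number): number {
  let s = 0;
  for (let j = 0; j < n; j++) s += delta(i, j) * delta(j, k);
  return s;
}

console.log(sift([7, 8, 9]));        // [7, 8, 9]
console.log(deltaTrace(3));          // 3
console.log(deltaContract(1, 1, 3)); // 1, same as delta(1, 1)
console.log(deltaContract(0, 2, 3)); // 0, same as delta(0, 2)
```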

The dot product and matrix multiplication, in indices

With the convention in hand, the two central operations of linear algebra become one-liners.

The dot product

In the dot product the only index is repeated, so nothing is left free and the result is a scalar:

\mathbf{a}\cdot\mathbf{b} = a_i b_i.

Matrix times vector

Component i of M\mathbf{x} is row i of M dotted with \mathbf{x}. The dummy j is summed; the free i survives, so the result is a vector:

(M\mathbf{x})_i = M_{ij}\, x_j.
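Read as code, M_{ij} x_j is one loop over the free index i wrapped around one sum over the dummy j. The sketch below assumes a matrix stored as an array of rows; the name `matvec` is hypothetical.

```typescript
type Vec = number[];
type Mat = number[][];

// (Mx)_i = M_ij x_j: the free index i picks the row, the dummy j is summed.
function matvec(M: Mat, x: Vec): Vec {
  return M.map((row) => {
    if (row.length !== x.length) {
      throw new Error("dummy index j must have the same range in M and x");
    }
    let s = 0;
    for (let j = 0; j < x.length; j++) s += row[j] * x[j];
    return s;
  });
}

console.log(matvec([[1, 2], [3, 4]], [5, 6])); // [17, 39]
```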

Matrix times matrix

The product AB sums row i of A against column j of B over the shared (dummy) index k:

(AB)_{ij} = A_{ik}\, B_{kj}.

Two free indices (i, j) survive, so the result is a matrix, exactly as it should be. Notice how the pattern of the indices encodes the rule: the two copies of k sit next to each other (the "inner" slots), and both must run over the same range, which is precisely why the inner dimensions have to agree for the product to exist.

Why bother? Proofs become mechanical

The transpose swaps indices: (A^{\mathsf T})_{ij} = A_{ji}. Watch how a fact that is fiddly to prove with grids falls out in one line — (AB)^{\mathsf T} = B^{\mathsf T} A^{\mathsf T}:

\big((AB)^{\mathsf T}\big)_{ij} = (AB)_{ji} = A_{jk}\, B_{ki} = B_{ki}\, A_{jk} = (B^{\mathsf T})_{ik}\,(A^{\mathsf T})_{kj} = (B^{\mathsf T} A^{\mathsf T})_{ij}.

Every step is a definition or the freedom to reorder numbers (components are just scalars, so A_{jk}B_{ki} = B_{ki}A_{jk}). No cleverness, no picture; the indices do the thinking. That is the point of the notation: it converts proofs of vector and matrix identities into symbol pushing that anyone can check.

And because a component is just a number, you can compute these sums directly. Here are a_i b_i and (AB)_{ij} = A_{ik}B_{kj} written as literal loops over the repeated index:

type Vec = number[];
type Mat = number[][];

// Dot product: sum over the repeated index i → a_i b_i
function dot(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// Matrix product: (AB)_ij = A_ik B_kj, where the dummy index k is summed
function matmul(A: Mat, B: Mat): Mat {
  const n = A.length, m = B[0].length, inner = B.length;
  const C: Mat = Array.from({ length: n }, () => Array(m).fill(0));
  for (let i = 0; i < n; i++)
    for (let j = 0; j < m; j++)
      for (let k = 0; k < inner; k++)
        C[i][j] += A[i][k] * B[k][j]; // accumulate the sum over k
  return C;
}

console.log("a·b =", dot([1, 2, 3], [4, 5, 6])); // 1·4 + 2·5 + 3·6 = 32
console.log("AB =", matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])); // [[19, 22], [43, 50]]

The triple loop is the summation convention made concrete: the outer two loops walk the free indices i, j, and the inner loop performs the sum over the dummy index k.
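The same kind of loop gives a numerical spot check of the transpose identity (AB)^{\mathsf T} = B^{\mathsf T} A^{\mathsf T} derived earlier. This is only a sketch on one hand-picked pair of matrices, so it illustrates the identity rather than proving it; the helper names `transpose` and `product` and the sample matrices are assumptions made for the example.

```typescript
type Mat = number[][];

// (A^T)_ij = A_ji: the transpose swaps the two indices.
function transpose(A: Mat): Mat {
  const T: Mat = [];
  for (let i = 0; i < A[0].length; i++) {
    const row: number[] = [];
    for (let j = 0; j < A.length; j++) row.push(A[j][i]);
    T.push(row);
  }
  return T;
}

// (AB)_ij = A_ik B_kj, with the dummy index k summed.
function product(A: Mat, B: Mat): Mat {
  const C: Mat = [];
  for (let i = 0; i < A.length; i++) {
    const row: number[] = [];
    for (let j = 0; j < B[0].length; j++) {
      let s = 0;
      for (let k = 0; k < B.length; k++) s += A[i][k] * B[k][j];
      row.push(s);
    }
    C.push(row);
  }
  return C;
}

const A: Mat = [[1, 2, 3], [4, 5, 6]];   // 2 x 3
const B: Mat = [[1, 0], [2, 1], [0, 3]]; // 3 x 2

const lhs = transpose(product(A, B));            // (AB)^T
const rhs = product(transpose(B), transpose(A)); // B^T A^T

console.log(JSON.stringify(lhs)); // [[5,14],[11,23]]
console.log(JSON.stringify(rhs)); // [[5,14],[11,23]]
```

Using non-square matrices makes the ordering matter: A^{\mathsf T} B^{\mathsf T} would be a 3 x 3 matrix here, so only B^{\mathsf T} A^{\mathsf T} can match the 2 x 2 result.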

Einstein introduced the convention in his 1916 paper on general relativity and later joked that it was his "great discovery in mathematics." He was teasing, but only half. General relativity is drowning in sums: a single term can carry four or five indices, and every repeated one is summed over four spacetime dimensions. An equation like R_{\mu\nu} - \tfrac12 R\,g_{\mu\nu} = 8\pi G\,T_{\mu\nu} (in units with c = 1) looks compact, but R_{\mu\nu} and R are themselves built from contractions, and written out with explicit \textstyle\sum signs it would sprawl across the page in a thicket of summation symbols. Dropping them isn't laziness: it makes the structure visible, so you can see which indices are free (and label the equation) and which are contracted (and summed away). The convention is arguably the most successful piece of notation in twentieth-century physics.

Never let an index appear three times in one term. If you write something like a_i b_i c_i, it is ambiguous and meaningless, because the convention only defines a sum for an index that appears exactly twice. If two factors happen to reuse the same dummy letter, rename one of them before combining. For example, to multiply the scalars (a_i b_i) and (c_j d_j) you must use different dummy indices: (a_i b_i)(c_j d_j), never (a_i b_i)(c_i d_i) = a_i b_i c_i d_i, which has i four times and is nonsense. When in doubt, count how many times each letter appears in a single term: it must be once (free) or twice (dummy), never more.
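The counting rule is mechanical enough to automate. The following is a minimal sketch, assuming a term is handed over as a flat list of its index letters (for example M_{ij} x_j becomes ["i", "j", "j"]); the function `classifyIndices` and the `IndexReport` shape are hypothetical and not part of any library.

```typescript
type IndexReport = { free: string[]; dummy: string[]; invalid: string[] };

// Count each index letter in one term: once = free, twice = dummy, more = error.
function classifyIndices(indices: string[]): IndexReport {
  const counts: Record<string, number> = {};
  for (const idx of indices) counts[idx] = (counts[idx] || 0) + 1;

  const report: IndexReport = { free: [], dummy: [], invalid: [] };
  for (const idx of Object.keys(counts)) {
    if (counts[idx] === 1) report.free.push(idx);
    else if (counts[idx] === 2) report.dummy.push(idx);
    else report.invalid.push(idx);
  }
  return report;
}

// M_ij x_j: i is free, j is a dummy.
console.log(classifyIndices(["i", "j", "j"]));
// (a_i b_i)(c_j d_j): both i and j are dummies, nothing is free.
console.log(classifyIndices(["i", "i", "j", "j"]));
// a_i b_i c_i d_i: i appears four times, so the term is rejected.
console.log(classifyIndices(["i", "i", "i", "i"]));
```

A check like this only counts letters within one term; comparing the free indices across the terms of an equation would be a separate step.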