Index Notation & the Summation Convention
By the time you meet relativity, electromagnetism or the stress in a solid, vectors and matrices
stop being tidy little arrows and grids and become objects with many components that must
be manipulated by the hundred. Writing them out longhand — or even with bold letters and dot and
cross products — becomes hopeless. Physicists switched, over a century ago, to a language that
scales: index notation, sharpened by Einstein's summation convention.
The idea is disarmingly simple. Instead of the vector \mathbf{a}, talk
about its typical component a_i — the i-th
entry, where i runs over 1, 2, 3 (or
1, \dots, n). Instead of the matrix M, talk
about M_{ij} — the entry in row i, column
j. One symbol now stands for the whole array, and the index tells you
which slot. This little shift turns geometry into bookkeeping, and bookkeeping is
mechanical.
Free indices and dummy indices
Every index in an expression is one of two kinds, and telling them apart is the whole grammar.
A free index appears once in a term and is free to take any value
1, 2, 3. It labels which equation you are looking at. In
c_i = a_i + b_i,
the index i is free: this single line is shorthand for the three
equations c_1 = a_1 + b_1, c_2 = a_2 + b_2,
c_3 = a_3 + b_3. The free indices must match on both sides
of an equation and in every term — that is your first and best error check.
A dummy index (or repeated index) appears twice in a
single term. It is being summed over — and its name is irrelevant, exactly like the
i in \sum_i. In
\sum_{i=1}^{3} a_i b_i = a_1 b_1 + a_2 b_2 + a_3 b_3,
the i is a dummy: you could rename it k
without changing a thing.
Einstein's convention: drop the sigma
Einstein noticed that in physics a repeated index is always summed, so the
\textstyle\sum is pure decoration. His convention throws it away:
- Any index repeated twice in a term is automatically summed over its range 1, 2, 3 (or 1, \dots, n). No \textstyle\sum is written.
- An index may appear at most twice in a term. Three occurrences of the same index in one term are meaningless and a sure sign that you have made a mistake.
- A free index (appearing once) must appear in every term and on both sides of the equation.
So the dot product collapses to two characters:
\mathbf{a}\cdot\mathbf{b} = \sum_{i=1}^{3} a_i b_i \;=\; a_i b_i.
The repeated i on the right is understood to be summed. That is the
entire notation. Everything below is just this rule, applied.
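Because the three rules only involve counting letters, they can be checked mechanically. The sketch below is a toy under a stated simplification: each term is represented only by the string of its index letters (so M_{ij} x_j becomes "ijj"), and `classifyIndices` and `sameFreeIndices` are hypothetical helper names, not a real expression parser.

```typescript
// A term is represented only by its index letters, e.g. M_{ij} x_j -> "ijj".
interface IndexReport {
  free: string[];  // letters appearing once: they label the equation
  dummy: string[]; // letters appearing twice: summed by the convention
}

function classifyIndices(term: string): IndexReport {
  const counts: Record<string, number> = {};
  for (let p = 0; p < term.length; p++) {
    const letter = term[p];
    counts[letter] = (counts[letter] ?? 0) + 1;
  }
  const report: IndexReport = { free: [], dummy: [] };
  for (const letter of Object.keys(counts)) {
    const n = counts[letter];
    if (n === 1) report.free.push(letter);
    else if (n === 2) report.dummy.push(letter);
    else throw new Error(`index ${letter} appears ${n} times: not a valid term`);
  }
  return report;
}

// Rule 3: every term of an equation must carry the same set of free indices.
function sameFreeIndices(terms: string[]): boolean {
  const key = (t: string) => classifyIndices(t).free.sort().join("");
  return terms.every((t) => key(t) === key(terms[0]));
}

console.log(classifyIndices("ijj"));        // M_{ij} x_j: free i, dummy j
console.log(sameFreeIndices(["i", "ijj"])); // y_i = M_{ij} x_j -> true
console.log(sameFreeIndices(["i", "jj"]));  // y_i = a_j b_j   -> false
```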
The Kronecker delta
The workhorse of index notation is the Kronecker delta
\delta_{ij} — the components of the identity matrix:
\delta_{ij} = \begin{cases} 1 & i = j, \\ 0 & i \neq j. \end{cases}
As a grid it is 1 down the diagonal and 0
everywhere else — the figure shows the 3\times 3 case. Because the
basis vectors are orthonormal, \hat{\mathbf{e}}_i\cdot\hat{\mathbf{e}}_j = \delta_{ij},
which is where it comes from.
Its magic is the sifting property: summed against something, it renames an index.
Because \delta_{ij} is zero unless j = i,
the sum \delta_{ij} a_j keeps only the j = i
term:
\delta_{ij}\, a_j = a_i.
The delta "eats" the dummy j and hands its index over to whatever it
was summed against. Two more facts you will use constantly:
\delta_{ij}\,\delta_{jk} = \delta_{ik}, \qquad \delta_{ii} = \delta_{11} + \delta_{22} + \delta_{33} = 3 \;\;(= n \text{ in } n \text{ dimensions}).
Watch that last one: \delta_{ii} has a repeated
i, so it is summed — it is the trace of the identity, which is the
dimension, not 1.
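The three delta facts can be confirmed by brute force. In this sketch the helper names `delta`, `sift`, `composeIsDelta` and `trace` are illustrative, indices run from 0, and every summed index is written out as an explicit loop:

```typescript
type Vec = number[];

// Kronecker delta: 1 when the two indices agree, 0 otherwise.
const delta = (i: number, j: number): number => (i === j ? 1 : 0);

// delta_ij a_j: the dummy j is summed, and only the j = i term survives.
function sift(a: Vec): Vec {
  const out: Vec = [];
  for (let i = 0; i < a.length; i++) {
    let s = 0;
    for (let j = 0; j < a.length; j++) s += delta(i, j) * a[j];
    out.push(s);
  }
  return out;
}

// delta_ij delta_jk = delta_ik, checked for every pair of free indices (i, k).
function composeIsDelta(n: number): boolean {
  for (let i = 0; i < n; i++)
    for (let k = 0; k < n; k++) {
      let s = 0;
      for (let j = 0; j < n; j++) s += delta(i, j) * delta(j, k);
      if (s !== delta(i, k)) return false;
    }
  return true;
}

// delta_ii: the index is repeated, so it is summed. The result is n, not 1.
function trace(n: number): number {
  let s = 0;
  for (let i = 0; i < n; i++) s += delta(i, i);
  return s;
}

console.log(sift([7, 8, 9]));    // [ 7, 8, 9 ]
console.log(composeIsDelta(3));  // true
console.log(trace(3), trace(4)); // 3 4
```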
The dot product and matrix multiplication, in indices
With the convention in hand, the two central operations of linear algebra become one-liners.
The dot product
The dot product is a sum over a single repeated index. No free index survives, so the result is a scalar:
\mathbf{a}\cdot\mathbf{b} = a_i b_i.
Matrix times vector
Row i of M dotted with
\mathbf{x}. The dummy j is summed; the free
i survives, so the result is a vector:
(M\mathbf{x})_i = M_{ij}\, x_j.
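A direct translation, assuming matrices are stored as arrays of rows (the name `matvec` is hypothetical): the `map` over rows walks the free index i, and the inner loop performs the sum over the dummy j.

```typescript
type Vec = number[];
type Mat = number[][];

// (Mx)_i = M_ij x_j: the free index i picks the row, the dummy j is summed.
function matvec(M: Mat, x: Vec): Vec {
  return M.map((row, i) => {
    // The summed index j must run over the same range in M and in x.
    if (row.length !== x.length) {
      throw new Error(`row ${i} has ${row.length} entries, x has ${x.length}`);
    }
    let s = 0;
    for (let j = 0; j < x.length; j++) s += row[j] * x[j]; // sum over j
    return s;
  });
}

console.log(matvec([[1, 2], [3, 4]], [5, 6])); // [ 17, 39 ]
```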
Matrix times matrix
The product
AB sums row i of A
against column j of B over the shared
(dummy) index k:
(AB)_{ij} = A_{ik}\, B_{kj}.
Two free indices (i, j) survive, so the result is a matrix — exactly as
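it should be. Notice how the pattern of the indices encodes the rule: the repeated
k fills the two adjacent "inner" slots (the column of A and the row of B), and a single summed index can run over only one range. The number of columns of A must therefore equal the number of rows of B, which is
precisely why the inner dimensions have to agree for the product to exist.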
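Writing out a single entry makes the role of the inner index concrete. Taking A as 2\times 3 and B as 3\times 2 (sizes chosen only for illustration), the entry in row 1, column 2 expands to:

```latex
(AB)_{12} = A_{1k}\,B_{k2} = A_{11}B_{12} + A_{12}B_{22} + A_{13}B_{32}.
```

Each term pairs a column index of A with the same row index of B. If B had only two rows, the term A_{13}B_{32} would have no partner, which is the index-notation reading of "the inner dimensions must agree."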
Why bother? Proofs become mechanical
The transpose swaps indices: (A^{\mathsf T})_{ij} = A_{ji}. Watch how a
fact that is fiddly to prove with grids falls out in one line —
(AB)^{\mathsf T} = B^{\mathsf T} A^{\mathsf T}:
\big((AB)^{\mathsf T}\big)_{ij} = (AB)_{ji} = A_{jk}\, B_{ki} = B_{ki}\, A_{jk} = (B^{\mathsf T})_{ik}\,(A^{\mathsf T})_{kj} = (B^{\mathsf T} A^{\mathsf T})_{ij}.
Every step is a definition or the freedom to reorder numbers (components are just
scalars, so A_{jk}B_{ki} = B_{ki}A_{jk}). No cleverness, no picture —
the indices do the thinking. That is the point of the notation: it converts vector-identity
proofs into symbol pushing that anyone can check.
And because a component is just a number, you can compute these sums directly. Here are
a_i b_i and (AB)_{ij} = A_{ik}B_{kj} written
as literal loops over the repeated index:
type Vec = number[];
type Mat = number[][];
// Dot product: sum over the repeated index i, i.e. a_i b_i
function dot(a: Vec, b: Vec): number {
  if (a.length !== b.length) throw new Error("dot: index ranges differ");
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}
// Matrix product: (AB)_ij = A_ik B_kj, where the dummy index k is summed
function matmul(A: Mat, B: Mat): Mat {
  const n = A.length, m = B[0].length, inner = B.length;
  // The inner dimensions must agree: every row of A needs as many entries as B has rows
  if (A.some((row) => row.length !== inner)) throw new Error("matmul: inner dimensions differ");
  const C: Mat = Array.from({ length: n }, () => Array(m).fill(0));
  for (let i = 0; i < n; i++)          // free index i
    for (let j = 0; j < m; j++)        // free index j
      for (let k = 0; k < inner; k++)
        C[i][j] += A[i][k] * B[k][j];   // accumulate the sum over k
  return C;
}
console.log("a·b =", dot([1, 2, 3], [4, 5, 6])); // 1·4 + 2·5 + 3·6 = 32
console.log("AB =", matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])); // [[19, 22], [43, 50]]
The triple loop is the summation convention made concrete: the outer two loops walk the
free indices i, j, and the inner loop performs the sum over the dummy
index k.
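The same machinery gives a quick numerical spot check of the identity (AB)^{\mathsf T} = B^{\mathsf T} A^{\mathsf T} proved earlier. It checks one example, so it is a sanity test rather than a proof. The block repeats a compact `matmul` so that it runs on its own; `transpose` and `matEqual` are illustrative helpers, and the matrices are arbitrary integer test data:

```typescript
type Mat = number[][];

// (A^T)_ij = A_ji: swap the two indices.
function transpose(A: Mat): Mat {
  return A[0].map((_, j) => A.map((row) => row[j]));
}

// (AB)_ij = A_ik B_kj, with the dummy k summed by reduce over row i of A.
function matmul(A: Mat, B: Mat): Mat {
  return A.map((row) =>
    B[0].map((_, j) => row.reduce((s, aik, k) => s + aik * B[k][j], 0))
  );
}

// Entry-by-entry comparison; exact equality is safe because the data are integers.
function matEqual(X: Mat, Y: Mat): boolean {
  return X.length === Y.length &&
    X.every((row, i) => row.length === Y[i].length && row.every((v, j) => v === Y[i][j]));
}

const A: Mat = [[1, 2, 3], [4, 5, 6]];      // 2×3
const B: Mat = [[7, 8], [9, 10], [11, 12]]; // 3×2
console.log(matEqual(transpose(matmul(A, B)), matmul(transpose(B), transpose(A)))); // true
```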
Einstein introduced the convention in his 1916 paper on general relativity and later joked that
it was his "great discovery in mathematics." He was teasing — but only half. General relativity
is drowning in sums: a single term can carry four or five indices, each summed over four
spacetime dimensions. Written with explicit \textstyle\sum signs, an
equation like R_{\mu\nu} - \tfrac12 R\,g_{\mu\nu} = 8\pi G\,T_{\mu\nu}, once its curvature terms are expanded in components,
would sprawl across the page in a thicket of summation symbols. Dropping them isn't laziness:
it makes the structure visible, so you can see which indices are free (and label the
equation) and which are contracted (and summed away). The convention is arguably the most
successful piece of notation in twentieth-century physics.
Never let an index appear three times in one term. If you write something like
a_i b_i c_i, it is ambiguous and meaningless — the
convention only defines a sum for an index that appears exactly twice. If two
separate sums happen to use the same dummy letter, rename one of them
before combining them in one term. For example, to multiply the scalars
(a_i b_i) and (c_j d_j) you must use
different dummy indices:
(a_i b_i)(c_j d_j), never
(a_i b_i)(c_i d_i) = a_i b_i c_i d_i — that last form has
i four times and is nonsense. When in doubt, count how many times
each letter appears in a single term: it must be one (free) or two (dummy), never more.
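To see that the renaming changes the arithmetic and not just the look, the sketch below evaluates (a_i b_i)(c_j d_j) on small, arbitrarily chosen vectors and, for contrast, the single sum one might wrongly read into a_i b_i c_i d_i. That second reading is an assumption made only for illustration; the convention itself assigns the expression no meaning.

```typescript
type Vec = number[];

// a_i b_i: one dummy index, summed.
const dot = (a: Vec, b: Vec): number => a.reduce((s, ai, i) => s + ai * b[i], 0);

// (a_i b_i)(c_j d_j): two independent sums with distinct dummy letters.
const productOfDots = (a: Vec, b: Vec, c: Vec, d: Vec): number => dot(a, b) * dot(c, d);

// A wrong reading of a_i b_i c_i d_i as one sum over i across all four factors.
// Shown only for contrast: it is a different quantity altogether.
function fourFoldSum(a: Vec, b: Vec, c: Vec, d: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i] * c[i] * d[i];
  return s;
}

const a = [1, 2], b = [3, 4], c = [5, 6], d = [7, 8];
console.log(productOfDots(a, b, c, d)); // (3 + 8)(35 + 48) = 11 · 83 = 913
console.log(fourFoldSum(a, b, c, d));   // 1·3·5·7 + 2·4·6·8 = 105 + 384 = 489
```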