A neural
network that treats an image as one long list of pixels is in trouble. A modest
200 \times 200 colour image is
120{,}000 numbers; a fully-connected first layer of a thousand neurons
would need 120 million weights — and it would have thrown away the single
most important fact about an image: that nearby pixels belong together, and a
feature like an edge looks the same wherever it appears.
A convolutional neural network (CNN) is built around that fact. Instead of
connecting every pixel to every neuron, it slides a tiny filter across the image,
looking for one small pattern everywhere at once.
Convolution: one small filter, slid everywhere
A filter (or kernel) is a little grid of weights — often just
3 \times 3. You slide it over every position of the image, and at each
spot compute a dot product of the filter with the patch beneath it. The grid of results is a
feature map that lights up wherever the filter's pattern occurs. A filter like
\begin{bmatrix}-1&0&1\\-1&0&1\\-1&0&1\end{bmatrix}
responds strongly at a vertical edge — dark on the left, bright on the right — and
stays quiet on flat regions. Run it on a little image with an edge down the middle:
type Grid = number[][];
// A 5×5 "image": dark (0) on the left, bright (1) on the right — a vertical edge.
const image: Grid = [
[0, 0, 1, 1, 1],
[0, 0, 1, 1, 1],
[0, 0, 1, 1, 1],
[0, 0, 1, 1, 1],
[0, 0, 1, 1, 1],
];
// A vertical-edge filter: (right column) minus (left column).
const filter: Grid = [
[-1, 0, 1],
[-1, 0, 1],
[-1, 0, 1],
];
function convolve(img: Grid, k: Grid): Grid {
const out: Grid = [];
for (let i = 0; i + 3 <= img.length; i++) {
const row: number[] = [];
for (let j = 0; j + 3 <= img[0].length; j++) {
let sum = 0;
for (let di = 0; di < 3; di++)
for (let dj = 0; dj < 3; dj++) sum += img[i + di][j + dj] * k[di][dj];
row.push(sum);
}
out.push(row);
}
return out;
}
// The feature map lights up (big numbers) exactly where the edge is, and is 0 on flat areas.
for (const row of convolve(image, filter)) console.log(row.join(" "));
The output is large at the edge and zero over the uniform bright region — the filter has
found the edge, and it would find one anywhere in the image using the exact same nine
weights.