Why Go Deep?

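One hidden layer is, in theory, enough: given enough units, it can approximate any continuous function on a bounded input range as closely as you like. So why stack many layers? Because depth builds features in a hierarchy, and for many functions that is a far more efficient description. A shallow network can need exponentially more units to match what a deep one does with a handful of layers. Each layer composes the patterns found by the one before into something richer.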
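To make the efficiency argument concrete, here is a minimal sketch using a hand-built toy rather than a trained network. It assumes a one-dimensional input in [0, 1] and a "layer" made of two ReLU units that together form a tent shape. The names tent_layer, deep_net and count_peaks are hypothetical, chosen only for this illustration.

```python
def relu(z):
    return max(0.0, z)

def tent_layer(x):
    # One layer built from two ReLU units:
    # rises from 0 to 1 on [0, 0.5], then falls back to 0 on [0.5, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_net(x, depth):
    # Stack the same layer `depth` times: each layer folds the output of the last.
    for _ in range(depth):
        x = tent_layer(x)
    return x

def count_peaks(depth, samples=4097):
    # Sample the network on an evenly spaced grid and count strict local maxima.
    ys = [deep_net(i / (samples - 1), depth) for i in range(samples)]
    return sum(1 for a, b, c in zip(ys, ys[1:], ys[2:]) if b > a and b > c)

for depth in range(1, 6):
    print(depth, "layers,", 2 * depth, "ReLU units,", count_peaks(depth), "peaks")
```

Each added layer costs two more units but doubles the number of peaks, so the zigzag grows exponentially with depth. A single hidden layer of n ReLU units on a one-dimensional input can produce at most n + 1 straight-line pieces, so matching the same zigzag with one layer takes a number of units that grows exponentially with the depth it replaces.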

In a vision network the story is vivid: the first layer detects tiny edges; the next assembles edges into textures and corners; the next into parts — an eye, a wheel; the next into whole objects. Nobody programmed those features; the network learned them, level by level, because depth made that composition natural.

Features of features of features

Move up through the layers of a vision network and the concepts grow from edges to textures to parts to objects. Deeper layers respond to more abstract things, each built from the simpler pieces below it. "Deep" refers to the number of stacked layers, and this hierarchy is what that depth buys.

The end of the beginning

Depth is why a single architecture can learn to see, hear, translate and converse: stack enough layers, feed enough data, and let backpropagation discover the hierarchy. From here the field branches into specialised shapes — convolutional networks for images, transformers for language and beyond — but every one of them is built from the pieces you now understand: vectors and matrices, dot products, activations, a loss, and gradient descent. You've reached the frontier of classical deep learning.
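As a recap, the pieces listed above fit together in a few dozen lines. The sketch below is illustrative only: it assumes a tiny network with one tanh hidden layer, a squared-error loss, the XOR problem as data, and updates made one example at a time (stochastic gradient descent). The names forward, mean_loss and train, and the hyperparameters, are hypothetical choices for this example.

```python
import math
import random

# XOR: inputs are vectors, targets are single numbers.
XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

def forward(x, W1, b1, w2, b2):
    # Hidden layer: one dot product per row of the matrix W1, then a tanh activation.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    # Output layer: a dot product of the hidden vector with w2, plus a bias.
    y = sum(w * hi for w, hi in zip(w2, h)) + b2
    return h, y

def mean_loss(data, W1, b1, w2, b2):
    # Loss: mean squared error over the dataset.
    return sum((forward(x, W1, b1, w2, b2)[1] - t) ** 2 for x, t in data) / len(data)

def train(data, hidden=4, lr=0.1, steps=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    start = mean_loss(data, W1, b1, w2, b2)
    for _ in range(steps):
        for x, t in data:
            h, y = forward(x, W1, b1, w2, b2)
            dy = 2.0 * (y - t)  # derivative of squared error with respect to y
            for j in range(hidden):
                # Backpropagation: chain rule through the output weight and tanh.
                dh = dy * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * dy * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * dh * x[i]
                b1[j] -= lr * dh
            b2 -= lr * dy
    return start, mean_loss(data, W1, b1, w2, b2)

start, end = train(XOR)
print("loss before training:", start)
print("loss after training:", end)
```

Real frameworks compute these gradients automatically and batch the arithmetic into matrix operations, but the ingredients are the same ones named above.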

From here the climb continues all the way to the systems writing and reading these words. The Deep Learning branch picks up exactly where this leaves off — the modern training engine, the Transformer, and how today's large language models are trained and served.