Multiple Features

Real problems have many features — a house has size, bedrooms, age, location, and more. Give each feature its own weight, and the prediction is the sum of (weight × feature), plus the bias:

h(\vec{x}) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b.

That sum is exactly a dot product of the weight vector with the feature vector. So the whole model collapses to one clean line:

h(\vec{x}) = \vec{w}\cdot\vec{x} + b.

This is where linear algebra pays off: no matter how many features there are, the prediction is a single dot product plus the bias.
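To make the dot-product form concrete, here is a minimal sketch in plain Python. The function name `predict`, the three features (size, bedrooms, age), and every number are made up for illustration; they are not taken from a real dataset.

```python
def predict(weights, features, bias):
    """Return w . x + b for a single example."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    # Sum of (weight * feature) over all features, plus the bias.
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical house: size (hundreds of sq ft), bedrooms, age (decades).
x = [20.0, 3.0, 1.5]
# Hypothetical weights and bias, chosen only to keep the arithmetic easy.
w = [10.0, 5.0, -2.0]
b = 50.0

price = predict(w, x, b)
print(price)  # 262.0  (200 + 15 - 3 + 50)
```

Adding a fourth or a hundredth feature changes only the length of the two lists; the function body stays the same.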

Weights decide what matters

Hold the features fixed (one house) and vary the weights: the prediction \vec{w}\cdot\vec{x} + b changes in response. For a positive feature value, a large positive weight means "this feature strongly raises the prediction"; a negative weight pushes it down; a near-zero weight means "this feature barely matters." These readings compare cleanly only when the features are on similar scales. Training is the search for the weights that make the predictions match the data.
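A small experiment shows the effect of one weight. This sketch keeps a hypothetical house fixed and changes only the weight on its third feature (age); the helper `with_weight` and all values are assumptions made for the example.

```python
def predict(weights, features, bias):
    """Return w . x + b for a single example."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def with_weight(weights, index, value):
    """Return a copy of weights with one entry replaced."""
    new_weights = list(weights)
    new_weights[index] = value
    return new_weights


# Fixed features for one hypothetical house: size, bedrooms, age.
x = [20.0, 3.0, 1.5]
b = 50.0
base_w = [10.0, 5.0, -2.0]

results = []
for age_weight in (-20.0, -2.0, 0.0, 2.0):
    w = with_weight(base_w, 2, age_weight)
    results.append((age_weight, predict(w, x, b)))

for age_weight, prediction in results:
    print(f"age weight {age_weight:6.1f} -> prediction {prediction:.1f}")
```

With these numbers, the strongly negative weight lowers the prediction the most, the zero weight removes the feature's influence entirely, and the positive weight raises it.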

Same algorithm, more knobs

Nothing else changes. The cost is still mean squared error, and gradient descent still rolls downhill, just in more dimensions, adjusting every weight at once. Writing the model as \vec{w}\cdot\vec{x}+b is also why a whole dataset can be processed as one big matrix–vector multiply: stack the examples as the rows of a matrix X, and X\vec{w} + b (with b added to every entry) gives all the predictions at once. That is fast, and exactly the kind of operation hardware is built for.
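The following sketch puts the pieces together under stated assumptions: the cost is plain mean squared error (no extra factor of 1/2, so the gradient carries a factor of 2), the dataset is a tiny noise-free one generated from known weights, and the learning rate and step count are hand-picked for this data. The names `predict_all`, `mse`, and `gradient_step` are illustrative. Pure Python lists are used so the loops stay visible; with NumPy arrays, the body of `predict_all` would reduce to `X @ w + b`.

```python
def predict_all(X, w, b):
    """Matrix-vector product X w, plus b: one dot product per row of X."""
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]


def mse(X, y, w, b):
    """Mean squared error of the predictions against the targets y."""
    preds = predict_all(X, w, b)
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def gradient_step(X, y, w, b, lr):
    """One gradient descent update of every weight and the bias."""
    m = len(y)
    errors = [p - t for p, t in zip(predict_all(X, w, b), y)]
    # d(MSE)/d(w_j) = (2/m) * sum_i error_i * x_ij
    grad_w = [2.0 / m * sum(e * row[j] for e, row in zip(errors, X))
              for j in range(len(w))]
    # d(MSE)/d(b) = (2/m) * sum_i error_i
    grad_b = 2.0 / m * sum(errors)
    new_w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    return new_w, b - lr * grad_b


# Tiny made-up dataset: 5 examples, 2 features each.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [0.0, 1.0]]
true_w, true_b = [2.0, -1.0], 0.5      # hypothetical "ground truth"
y = predict_all(X, true_w, true_b)     # noise-free targets

w, b = [0.0, 0.0], 0.0                 # start from all zeros
for _ in range(5000):
    w, b = gradient_step(X, y, w, b, lr=0.05)

final_cost = mse(X, y, w, b)
print([round(wj, 4) for wj in w], round(b, 4), final_cost)
```

The update rule is the single-feature one applied to each weight in turn; only the length of `w` grew. On real data the features usually need scaling first, and the learning rate has to be tuned rather than copied from this example.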