Representing Sound

Clap your hands. The air around them squashes and stretches, and that ripple of pressure races outwards until it wobbles your eardrum — that is sound. Draw the pressure over time and you get a smooth, endlessly wiggling wave: a continuous curve with no gaps and no steps, gliding through every value in between.

A computer has a problem, though. It can only store numbers — a finite list of them — and this wave has infinitely many points. So how do we squeeze a smooth, never-ending curve into a box that only holds numbers? The answer is one of the neatest tricks in computing: sampling.

Sampling: measuring the wave over and over

Instead of storing the whole curve, we measure its height at regular moments — thousands of times a second — and write down just those numbers. Each measurement is a sample. Play the samples back quickly and your ear stitches them together into something that sounds like the original.

The picture below shows a sound wave (the smooth curve) with the samples marked as dots. The computer keeps only the dots — the heights — and throws the curve itself away.

Two choices decide how faithfully those dots capture the wave: how often we measure, and how precisely we record each height.
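To make the measuring step concrete, here is a minimal sketch that samples a pure tone at evenly spaced moments. The function name `sampleTone` and its parameters are made up for this illustration, and a real recording would measure a microphone signal rather than a sine formula.

```typescript
// Hypothetical helper: measure a pure tone at evenly spaced moments.
// toneHz, sampleRateHz and durationSeconds are illustrative parameters.
function sampleTone(toneHz: number, sampleRateHz: number, durationSeconds: number): number[] {
  const sampleCount = Math.round(sampleRateHz * durationSeconds);
  const heights: number[] = [];
  for (let i = 0; i < sampleCount; i++) {
    const t = i / sampleRateHz; // time of the i-th measurement, in seconds
    heights.push(Math.sin(2 * Math.PI * toneHz * t)); // wave height at that moment
  }
  return heights;
}

// One thousandth of a second of a 1,000 Hz tone, measured 8,000 times a second.
const dots = sampleTone(1000, 8000, 0.001);
console.log(dots.length); // 8
console.log(dots.map((h) => h.toFixed(3)).join(", "));
```

The array `dots` is all the computer would keep: eight heights that rise to 1, fall to -1 and come back, with nothing stored for the moments in between.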

Sample rate — how often we measure

The sample rate is the number of samples taken each second, measured in hertz (Hz). A rate of $44{,}100\text{ Hz}$, the standard for CDs, means the height is measured $44{,}100$ times a second. Take samples more often (dots closer together) and the joined-up dots hug the real curve much more tightly.
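As a rough sketch of what a sample rate means in practice, the two helpers below work out how many samples a clip contains and how far apart the samples are in time. The names `samplesIn` and `gapBetweenSamplesMs` are invented for this example.

```typescript
// Hypothetical helpers for reasoning about a sample rate.

// How many samples are taken in a clip of the given length.
function samplesIn(sampleRateHz: number, seconds: number): number {
  return sampleRateHz * seconds;
}

// Time between one sample and the next, in milliseconds.
function gapBetweenSamplesMs(sampleRateHz: number): number {
  return 1000 / sampleRateHz;
}

console.log(samplesIn(44100, 1));                    // 44100
console.log(gapBetweenSamplesMs(44100).toFixed(4));  // "0.0227"
console.log(gapBetweenSamplesMs(8000).toFixed(4));   // "0.1250"
```

At the CD rate the dots are only about 0.023 milliseconds apart, which is why they follow the curve so closely.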

Bit depth — how precisely we record each height

The bit depth is how many bits we use to store each single sample. With $n$ bits, a sample can be one of $2^n$ different height levels. So:

8-bit gives $2^8 = 256$ possible heights;
16-bit gives $2^{16} = 65{,}536$ heights (CD quality).

More levels means each dot can sit closer to the true height of the wave, instead of being rounded to the nearest available rung.
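The rounding described above can be sketched as follows. This is only an illustration: it assumes heights lie between -1 and 1 and that the levels are evenly spaced, and the names `levelCount` and `quantize` are hypothetical.

```typescript
// Number of distinct height levels available with a given bit depth.
function levelCount(bitDepth: number): number {
  return Math.pow(2, bitDepth);
}

// Round a height in the range -1..1 to the nearest available level.
// The -1..1 range and the even spacing are assumptions for this sketch.
function quantize(height: number, bitDepth: number): number {
  const steps = levelCount(bitDepth) - 1;               // gaps between lowest and highest rung
  const rung = Math.round(((height + 1) / 2) * steps);  // rung index, from 0 to steps
  return (rung / steps) * 2 - 1;                        // convert back to the -1..1 range
}

const trueHeight = 0.3;
for (const depth of [4, 8, 16]) {
  const stored = quantize(trueHeight, depth);
  const error = Math.abs(stored - trueHeight);
  console.log(`${depth}-bit: ${levelCount(depth)} levels, stored ${stored.toFixed(6)}, error ${error.toExponential(2)}`);
}
```

Each extra bit doubles the number of rungs, so the gap between the stored height and the true one shrinks quickly as the bit depth goes up.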

Digital sound is captured by two settings:

Sample rate: samples per second, in Hz (how often the height is measured).
Bit depth: bits per sample (how finely each height is recorded, giving $2^n$ levels).

Working out the file size

Every second we take (sample rate) samples, and each sample costs (bit depth) bits. Stereo sound has two channels (left and right), so everything doubles. Multiply it all together:

$$\text{size (bits)} = \text{sample rate} \times \text{bit depth} \times \text{seconds} \times \text{channels}$$

Divide by 8 to get bytes, then by 1024 for kibibytes, and again for mebibytes. This program does the whole calculation — change the numbers and press Run:

    // A 30-second stereo clip at CD quality. Change any of these and Run!
    const sampleRate: number = 44100; // samples per second (Hz)
    const bitDepth: number = 16;      // bits per sample
    const seconds: number = 30;       // how long the clip is
    const channels: number = 2;       // 1 = mono, 2 = stereo

    const bits: number = sampleRate * bitDepth * seconds * channels;
    const bytes: number = bits / 8;
    const kib: number = bytes / 1024;
    const mib: number = kib / 1024;

    console.log("Total bits: " + bits.toLocaleString());
    console.log("Bytes: " + bytes.toLocaleString());
    console.log("KiB: " + kib.toFixed(1));
    console.log("MiB: " + mib.toFixed(2));

Notice how quickly it grows: a single three-minute stereo song at CD quality comes to about $31.8\text{ MB}$, or roughly $30.3\text{ MiB}$, which is exactly why we invented compressed formats like MP3.

The quality-versus-size trade-off

Look back at the formula: every factor multiplies the size. Doubling the sample rate doubles the file. Doubling the bit depth doubles it too. Higher settings sound better but cost more storage and bandwidth, so engineers pick the lowest settings that still sound good enough for the job. A traditional phone call samples at only $8{,}000\text{ Hz}$, because speech stays understandable without the highest frequencies; a music studio may record at $96{,}000\text{ Hz}$ or more.
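To see the trade-off in numbers, the sketch below applies the size formula to three sets of settings and prints the cost of one minute of uncompressed sound. The function name `uncompressedBytes` and the three presets are illustrative assumptions, not fixed standards for every phone or studio.

```typescript
// Uncompressed size in bytes, using the formula from the text.
function uncompressedBytes(sampleRateHz: number, bitDepth: number, seconds: number, channels: number): number {
  return (sampleRateHz * bitDepth * seconds * channels) / 8;
}

// Illustrative presets; the exact numbers are assumptions chosen for comparison.
const presets = [
  { label: "phone-style voice", rate: 8000, depth: 8, channels: 1 },
  { label: "CD quality", rate: 44100, depth: 16, channels: 2 },
  { label: "studio-style", rate: 96000, depth: 24, channels: 2 },
];

for (const p of presets) {
  const mebibytes = uncompressedBytes(p.rate, p.depth, 60, p.channels) / (1024 * 1024);
  console.log(`${p.label}: ${mebibytes.toFixed(2)} MiB per minute`);
}
```

With these settings one minute costs roughly 0.46 MiB, 10.09 MiB and 32.96 MiB respectively, so the studio-style preset is about seventy times larger than the phone-style one.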

It is tempting to think "more is always better — turn everything up!" But sample rate and bit depth are a trade-off, not a free upgrade. A higher sample rate and a higher bit depth do give better, more accurate sound — but they also make the file bigger, every single time. There is no setting that is both more accurate and smaller.

And remember: what the computer stores is still an approximation of the smooth wave. Every height is rounded to the nearest available level, so a little detail is always lost, and wiggles that are too quick for the sample rate are missed altogether. More samples and more levels get you closer to the original, but the file only ever holds a list of numbers, never the continuous wave you started with.

There's a beautiful rule here, discovered by Harry Nyquist and Claude Shannon: to capture a sound faithfully you must sample more than twice as fast as the highest frequency in it. Human ears top out around $20{,}000\text{ Hz}$, so CDs sample at $44{,}100\text{ Hz}$, a bit more than double, and that covers everything your ear can pick up. Sample too slowly and high notes fold down into false lower tones, an error called aliasing.
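The folding effect can be worked out with a little arithmetic. The sketch below assumes a single pure tone and ideal sampling with no filtering beforehand; the names `nyquistLimitHz` and `apparentFrequencyHz` are made up for this example.

```typescript
// Highest frequency a given sample rate can capture: half the rate.
function nyquistLimitHz(sampleRateHz: number): number {
  return sampleRateHz / 2;
}

// Frequency a pure tone appears to have after sampling.
// Tones below the limit come out unchanged; higher ones fold down.
function apparentFrequencyHz(toneHz: number, sampleRateHz: number): number {
  const nearestMultiple = Math.round(toneHz / sampleRateHz) * sampleRateHz;
  return Math.abs(toneHz - nearestMultiple);
}

console.log(nyquistLimitHz(44100));             // 22050
console.log(apparentFrequencyHz(1000, 44100));  // 1000
console.log(apparentFrequencyHz(30000, 44100)); // 14100
```

A 1,000 Hz tone survives untouched, but a 30,000 Hz tone sampled at the CD rate would come back as a 14,100 Hz tone that was never in the original sound. Real recorders avoid this by filtering out frequencies above the limit before sampling.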