The Boltzmann Distribution

Sprinkle a pinch of table salt into a flame and it burns orange; the sodium atoms are being kicked into an excited state and then dropping back, spilling their extra energy as light. But here is the quiet puzzle: at any instant, the overwhelming majority of those atoms are sitting placidly in their lowest energy state, and only a vanishing sliver are excited. Nature seems to prefer low energy. Climb a mountain and the air thins around you for the same reason: an air molecule high up has more gravitational energy, and high-energy configurations are simply rarer. Both facts, rare excited atoms and thin mountain air, follow from the same law, the single most useful formula in all of statistical physics.

That law is the Boltzmann distribution, and it fits on one line: the probability of finding a system in a particular state of energy E, when it sits at temperature T, falls off exponentially with that energy,

P(E) \;\propto\; e^{-E/k_B T}.

The exponential factor e^{-E/k_B T} is called the Boltzmann factor. This one page is devoted to that single idea: where the exponential comes from, what the temperature does to it, and how to use it to count populations — of atoms, of molecules, of air.

The result, stated cleanly

Take a system — a single atom, say — that can occupy any of a set of states, each with a definite energy. Keep it in thermal contact with its surroundings at temperature T. Then the probability that we catch it in a state of energy E is

P(E) = \frac{1}{Z}\,e^{-E/k_B T},

where k_B = 1.381\times 10^{-23}\ \text{J/K} is Boltzmann's constant (it just converts a temperature into an energy), and Z is a normalising number — the partition function — chosen so that the probabilities of all the states add up to 1. We will meet Z properly on its own page; for now the crucial thing is the shape, the exponential e^{-E/k_B T}.

The exponential often earns its keep as a ratio, because there the awkward Z cancels. Comparing two states of energies E_1 and E_2,

\frac{P_1}{P_2} = \frac{e^{-E_1/k_B T}}{e^{-E_2/k_B T}} = e^{-(E_1 - E_2)/k_B T}.

Only the energy gap matters. If state 1 lies above state 2 by an amount \Delta E = E_1 - E_2 > 0, then P_1/P_2 = e^{-\Delta E/k_B T} < 1: the upper state is always the rarer one. That is the sodium atom's secret — the excited state costs energy, so it is exponentially under-populated.
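To put a number on the sodium atom's secret, here is a minimal sketch of the ratio formula. The gap of about 2.1 eV corresponds to sodium's familiar 589 nm line; the flame temperature of 2000 K is an assumed round figure, and the function name `boltzmann_ratio` is purely illustrative. Degeneracy is ignored here, so this is a per-state comparison only.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann's constant in eV/K


def boltzmann_ratio(delta_e_ev: float, temperature_k: float) -> float:
    """Return P_upper / P_lower for two single states separated by delta_e_ev."""
    return math.exp(-delta_e_ev / (K_B_EV * temperature_k))


# Sodium's ~2.1 eV gap in a flame at an ASSUMED temperature of 2000 K
ratio = boltzmann_ratio(2.1, 2000.0)
print(f"excited / ground per state: {ratio:.2e}")
```

With these inputs the ratio comes out at a few parts per million: the flame glows because there are a great many atoms, not because many of them are excited at once.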

Where the exponential comes from

Why that shape, and not some other falling curve? The exponential is forced on us by a single deep idea from entropy and multiplicity: an isolated system is equally likely to be in any of its accessible microstates. Here is the argument in miniature.

Let our little system (the atom) be in weak thermal contact with a huge reservoir (the rest of the world) at temperature T. Together they have a fixed total energy E_{\text{tot}}. If the atom takes energy E, the reservoir is left with E_{\text{tot}} - E. Because all microstates of the combined, isolated whole are equally likely, the probability of the atom having energy E is proportional to the number of ways the reservoir can arrange the energy it has left — its multiplicity \Omega_R(E_{\text{tot}} - E):

P(E) \;\propto\; \Omega_R\!\left(E_{\text{tot}} - E\right).

That multiplicity is an astronomically steep function, so it is far kinder to work with its logarithm, which is exactly the reservoir's entropy, S_R = k_B \ln \Omega_R. Since the atom's energy E is tiny compared with the reservoir's, Taylor-expand the reservoir's entropy about E_{\text{tot}}:

S_R(E_{\text{tot}} - E) \approx S_R(E_{\text{tot}}) - E\,\frac{\partial S_R}{\partial E}.

Now recall the very definition of temperature from thermodynamics — \dfrac{\partial S}{\partial E} = \dfrac{1}{T}. Substituting,

S_R(E_{\text{tot}} - E) \approx \text{const} - \frac{E}{T}.

Finally undo the logarithm. Since \Omega_R = e^{S_R/k_B}, we get \Omega_R(E_{\text{tot}} - E) \propto e^{-E/k_B T}, and therefore

P(E) \;\propto\; e^{-E/k_B T}.

Single state. A system in equilibrium at temperature T occupies a state of energy E with probability P(E) = \dfrac{1}{Z}\,e^{-E/k_B T}.
Ratio of two states. \dfrac{P_1}{P_2} = e^{-(E_1 - E_2)/k_B T} — only the energy gap matters, and higher energy always means lower probability.
Energy levels with degeneracy. If a level of energy E contains g distinct states, its total population is P(\text{level}) \propto g\,e^{-E/k_B T}.

The exponential is not an accident of atoms or gases — it is the shadow of the reservoir's entropy, and it appears for any system held at a fixed temperature.
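The reservoir argument can be checked numerically. The sketch below swaps in a concrete toy reservoir that the text does not specify: a collection of identical oscillators sharing energy quanta (an Einstein solid), whose multiplicity is a binomial coefficient. The sizes `N_OSC` and `Q_TOTAL` are arbitrary illustrative choices. It compares the exact multiplicity ratio with the exponential predicted from the reservoir's temperature.

```python
import math


def log_multiplicity(n_osc: int, quanta: int) -> float:
    """ln of the number of ways to share `quanta` units among `n_osc` oscillators."""
    # ln C(quanta + n_osc - 1, quanta), via log-gamma to avoid huge integers
    return (math.lgamma(quanta + n_osc)
            - math.lgamma(quanta + 1)
            - math.lgamma(n_osc))


N_OSC = 100_000    # reservoir size (illustrative assumption)
Q_TOTAL = 50_000   # total number of energy quanta (illustrative assumption)

# 1/T = dS/dE becomes, per quantum: eps / (k_B T) = d(ln Omega) / dq
beta_eps = log_multiplicity(N_OSC, Q_TOTAL + 1) - log_multiplicity(N_OSC, Q_TOTAL)


def reservoir_weight(n: int) -> float:
    """Omega_R(E_tot - n*eps) / Omega_R(E_tot): relative odds the small system holds n quanta."""
    return math.exp(log_multiplicity(N_OSC, Q_TOTAL - n)
                    - log_multiplicity(N_OSC, Q_TOTAL))


for n in range(5):
    print(n, reservoir_weight(n), math.exp(-beta_eps * n))
```

The two printed columns agree closely, and the agreement improves as the reservoir grows, which is the Taylor expansion at work: the bigger the reservoir, the better the first-order term describes it.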

What temperature does: the thermal energy scale kT

Look again at the exponent, -E/k_B T. Energy only ever appears divided by the combination k_B T, so k_B T is the natural yardstick against which every energy is measured. It is the characteristic thermal energy — a rough measure of how much energy the surrounding jostling can casually hand to your system. At room temperature k_B T \approx 4.1\times 10^{-21}\ \text{J} \approx 0.025\ \text{eV}.

A state whose energy sits far above k_B T (E \gg k_B T) has a Boltzmann factor buried deep in the exponential tail — it is almost never occupied.
A state within a few k_B T of the ground state is genuinely in play, thermally accessible.

Picture an energy ladder. Each rung is a state, and a bar beside it shows that state's relative population e^{-E/k_B T}. As the temperature changes, so does the personality of the distribution: lower k_B T and the population collapses into the ground rung; raise it and the bars even out as the higher rungs fill in. High temperature flattens; low temperature concentrates.
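The ladder can also be tabulated rather than drawn. This is a rough sketch assuming five equally spaced rungs, both of which are illustrative choices and not part of the argument. The populations are divided by their sum so the three temperatures can be compared directly.

```python
import math


def ladder_populations(n_rungs: int, spacing_over_kt: float) -> list[float]:
    """Normalised populations of equally spaced rungs; spacing is in units of k_B*T."""
    weights = [math.exp(-n * spacing_over_kt) for n in range(n_rungs)]
    total = sum(weights)
    return [w / total for w in weights]


# Rung spacing measured against k_B*T: large means cold, small means hot
for spacing_over_kt in (5.0, 1.0, 0.2):
    pops = ladder_populations(5, spacing_over_kt)
    print(spacing_over_kt, [round(p, 3) for p in pops])
```

When the spacing is 5 k_B T, more than 99% of the population sits on the ground rung; when it is 0.2 k_B T, the five rungs are filled almost evenly, though the lower rung always stays ahead of the one above it.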

Physicists write the reciprocal 1/k_B T so often that it gets its own symbol: the inverse temperature

\beta \equiv \frac{1}{k_B T}, \qquad\text{so that}\qquad P(E) \propto e^{-\beta E}.

In this language a hot system has small \beta (a gentle exponential, a flat distribution) and a cold system has large \beta (a steep exponential, everything in the ground state). Absolute zero is \beta \to \infty: the ground state, and nothing else.

The Boltzmann factor as a curve

Plot the Boltzmann factor e^{-E/k_B T} against energy E at several temperatures. Two features are worth burning into memory, and the curves make both unmistakable.

Every curve falls. No matter how hot it gets, the Boltzmann factor is monotonically decreasing in E. Higher energy is always less probable than lower energy. Temperature never inverts this — it only changes the steepness.
Temperature tilts the tail. Raise k_B T and the curve flattens, lifting the far states out of the basement; lower it and the curve plunges, crushing everything but the ground state.

Don't forget to count the states: degeneracy

The Boltzmann factor answers "how likely is one state of energy E?" But often several distinct states share the same energy — we say the level is degenerate, with degeneracy g. To get the population of a whole level, multiply the Boltzmann factor by how many states it holds:

P(\text{level}) \propto g\,e^{-E/k_B T}, \qquad \frac{P_1}{P_2} = \frac{g_1}{g_2}\,e^{-(E_1 - E_2)/k_B T}.

This is why a high level can occasionally hold more total population than a lower one even though each individual state obeys the falling exponential: a big enough head-count in the upper level can beat a modest exponential penalty. In hydrogen, for instance, labelling the levels by shell number n, the n=2 shell holds g_2 = 8 states to the ground shell's g_1 = 2, a factor of 4 that we must carry along. It rarely rescues the upper level, but leaving it out is a genuine error, not a rounding one.

Worked examples

Example 1 — a two-level atom. An atom has a ground state and a single excited state a gap \Delta E above it, with \Delta E = k_B T exactly (both non-degenerate). What fraction of atoms are excited, relative to the ground state? Using the ratio,

\frac{P_{\text{exc}}}{P_{\text{gnd}}} = e^{-\Delta E/k_B T} = e^{-1} \approx 0.368.

So for every 1000 atoms in the ground state, about 368 are excited. Push the gap to \Delta E = 2k_B T and it drops to e^{-2} \approx 0.135; at \Delta E = 5k_B T it is e^{-5} \approx 0.0067 — under one percent. Each extra k_B T of gap costs another factor of e.

Example 2 — hydrogen in the Sun (the punchline). The gap between hydrogen's first excited shell and its ground shell is \Delta E = 10.2\ \text{eV}. The Sun's surface is at about T = 5800\ \text{K}, where k_B T \approx 0.50\ \text{eV}. So

\frac{\Delta E}{k_B T} \approx \frac{10.2}{0.50} \approx 20.4,

\frac{P_2}{P_1} = \frac{g_2}{g_1}\,e^{-\Delta E/k_B T} = 4\,e^{-20.4} \approx 5\times 10^{-9}.

Fewer than one hydrogen atom in a hundred million is in the n=2 state, even at the blazing surface of a star. The degeneracy factor of 4 barely dents an exponential of -20. This is why the visible absorption lines of hydrogen are faint in cool stars and only strengthen in hotter ones — a fact that underpins the whole classification of stars.
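The arithmetic of this example is easy to reproduce. The sketch below uses the gap, degeneracies and solar temperature quoted above; the second temperature, 10000 K, is an illustrative stand-in for a hotter star and is not taken from the text. The function name `level_ratio` is likewise just a label for this sketch.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann's constant in eV/K


def level_ratio(g_upper: int, g_lower: int,
                delta_e_ev: float, temperature_k: float) -> float:
    """Population ratio of two levels, including their degeneracies."""
    boltzmann_factor = math.exp(-delta_e_ev / (K_B_EV * temperature_k))
    return (g_upper / g_lower) * boltzmann_factor


# Hydrogen n=2 versus n=1: g = 8 and 2, gap 10.2 eV
for temperature_k in (5800.0, 10000.0):  # the Sun, and an ASSUMED hotter star
    print(temperature_k, level_ratio(8, 2, 10.2, temperature_k))
```

Less than doubling the temperature raises the n=2 population by a factor of several thousand, which is the exponential sensitivity behind the strengthening of hydrogen lines in hotter stars.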

Example 3 — why the air thins (the barometric formula). A molecule of mass m at height h carries gravitational potential energy E = mgh. Treat that as the energy in the Boltzmann factor and the number density of molecules must fall off as

n(h) = n_0\,e^{-mgh/k_B T}.

This is the barometric formula, and it is nothing but the Boltzmann distribution applied to gravitational potential energy. The height at which the density drops by a factor of e is the scale height H = k_B T/mg; for air at room temperature that works out to roughly 8\ \text{km} — which is exactly why the summit air of a Himalayan peak is so brutally thin, and why airliners cruise where the density is a fraction of its sea-level value. The atmosphere is a Boltzmann distribution you can walk up into.
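A quick numerical check of the scale height, as a sketch under stated assumptions: air is treated as a single species of mean molecular mass about 29 atomic mass units, the temperature is held at 293 K all the way up (the real atmosphere cools with height), and g is taken as constant. The summit height of 8849 m is Everest's.

```python
import math

K_B = 1.381e-23            # J/K, as quoted earlier
ATOMIC_MASS = 1.6605e-27   # kg per atomic mass unit
G = 9.81                   # m/s^2, assumed constant with height


def scale_height(mass_kg: float, temperature_k: float, g: float = G) -> float:
    """Height over which the density falls by a factor of e: H = k_B T / (m g)."""
    return K_B * temperature_k / (mass_kg * g)


def density_fraction(height_m: float, mass_kg: float,
                     temperature_k: float, g: float = G) -> float:
    """n(h) / n_0 for an isothermal atmosphere."""
    return math.exp(-height_m / scale_height(mass_kg, temperature_k, g))


m_air = 29.0 * ATOMIC_MASS  # ASSUMED mean molecular mass of air
print(f"scale height: {scale_height(m_air, 293.0) / 1000:.1f} km")
print(f"density at 8849 m: {density_fraction(8849.0, m_air, 293.0):.2f} of sea level")
```

Under these assumptions the scale height lands between 8 and 9 km, and the density at the summit is a little over a third of its sea-level value.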

One mistake trips up almost everyone: treating the bare Boltzmann factor e^{-E/k_B T} as a probability. It is not one, and it cannot be: at E = 0 it equals 1, and a single state cannot have probability 1 if there are other states available. It is only an unnormalised weight. To turn weights into genuine probabilities you must divide by their sum, the partition function

Z = \sum_{\text{states }i} e^{-E_i/k_B T}, \qquad P(E_i) = \frac{e^{-E_i/k_B T}}{Z}.

The Boltzmann factor tells you the relative odds of two states — which is why it is perfectly safe inside a ratio, where Z cancels. But the moment you want an absolute probability, or an average, you need Z. Two more traps hide nearby: remember to weight a level by its degeneracy g (states, not levels, obey the plain factor), and never say "higher temperature makes high-energy states more likely than low ones" — the distribution is always monotonically decreasing in E. Higher T only makes the high states less unlikely, never more likely than the ground state.
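The difference between weights and probabilities is clearest in code. The sketch below uses a hypothetical three-level system, with energies measured in units of k_B T and degeneracies 1, 3 and 5; none of these numbers come from the text. Because the sum runs over levels, each weight carries its degeneracy g.

```python
import math


def boltzmann_probabilities(energies, degeneracies, kt):
    """Return (Z, level probabilities) for levels with the given energies and degeneracies."""
    weights = [g * math.exp(-e / kt) for e, g in zip(energies, degeneracies)]
    z = sum(weights)  # the partition function
    return z, [w / z for w in weights]


# HYPOTHETICAL three-level system, energies in units where k_B*T = 1
energies = [0.0, 1.0, 2.0]
degeneracies = [1, 3, 5]

z, probs = boltzmann_probabilities(energies, degeneracies, kt=1.0)
mean_energy = sum(p * e for p, e in zip(probs, energies))

print("Z =", round(z, 4))
print("level probabilities:", [round(p, 4) for p in probs])
print("mean energy:", round(mean_energy, 4))
```

In this made-up system the middle level as a whole is more populated than the ground level, thanks to its three states, yet each individual state in it is still less likely than the ground state by a factor of e.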

A laser needs a population inversion — more atoms in the upper state than the lower — so that stimulated emission wins over absorption. But look at what the Boltzmann distribution forbids: in genuine thermal equilibrium, P_{\text{upper}}/P_{\text{lower}} = e^{-\Delta E/k_B T} is always less than 1 for a positive gap, no matter how hot you make it. The best that infinite temperature can do is e^{0} = 1 — the two states equally full, never inverted. You cannot make a laser by heating. Real lasers cheat thermal equilibrium entirely, pumping atoms into a metastable state faster than they decay.

There is a delicious twist. If you formally invert a population, then P_{\text{upper}} > P_{\text{lower}} forces e^{-\Delta E/k_B T} > 1, which requires a negative absolute temperature. Far from being colder than absolute zero, negative-temperature systems are in a sense hotter than infinity — they will dump heat into anything at any ordinary positive temperature. Ludwig Boltzmann (1844–1906), who first wrote down the distribution and carved S = k \log W onto the world, would have savoured that a population inversion is quite literally off the top of the temperature scale.
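The sign flip can be seen by solving the ratio formula for T. This is a sketch only: the function name `effective_temperature` and the 1 eV gap are illustrative, and the formula simply inverts P_upper/P_lower = e^{-\Delta E/k_B T} for two non-degenerate states.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann's constant in eV/K


def effective_temperature(ratio_upper_to_lower: float, delta_e_ev: float) -> float:
    """Solve ratio = exp(-dE / (k_B T)) for T; ratio must be positive and not equal to 1."""
    return -delta_e_ev / (K_B_EV * math.log(ratio_upper_to_lower))


# ILLUSTRATIVE gap of 1 eV
print(effective_temperature(0.5, 1.0))  # normal population: positive T
print(effective_temperature(2.0, 1.0))  # inverted population: negative T
```

As the ratio approaches 1 from below, T runs off to plus infinity; just past 1 it reappears at minus infinity, which is the sense in which an inverted population sits beyond infinite temperature.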