Ray Tracing

Look at a glossy car in an animated film: the sky curves across its bonnet, the road glints back off its doors, a soft shadow pools beneath it. None of that was painted by hand. It was ray traced — computed by literally simulating how light travels. Where projection squashes a 3-D world onto the screen with a matrix, ray tracing turns the whole idea inside out: for every single pixel, it shoots a straight line — a ray — out from the eye, through that pixel, into the scene, and asks "what's the first thing this ray hits?"

Answer that, colour the pixel with whatever the ray found, and repeat for a few million pixels. Add one more idea — when the ray hits something shiny, bounce it and keep going — and reflections, shadows and glass fall out almost for free. That single, honest simulation of light is why films and offline renderers reach for ray tracing when they want photorealism.

A ray is a point plus a direction

A ray is the simplest object in the whole subject: start at an origin O (the eye), pick a direction D (through the pixel), and slide forward. Every point on the ray is

P(t) = O + t\,D, \qquad t \ge 0.

The parameter t is distance along the ray (in units of |D|). At t=0 you're at the eye; as t grows you march out into the scene. We only care about t \ge 0 — things behind the eye don't count. Rendering a whole image is then just: for each pixel, build its ray, and find the smallest positive t at which the ray strikes a surface. The smallest t wins because that's the nearest object — everything behind it is hidden.
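A minimal Python sketch of the ray itself. It assumes points and directions are plain tuples, so 2-D and 3-D both work. The helper names `point_at` and `nearest_hit` are hypothetical. `nearest_hit` takes the distances at which a scene reported hits and keeps the smallest positive one, which is the "smallest t wins" rule.

```python
from typing import Iterable, Optional, Tuple

Vec = Tuple[float, ...]


def point_at(origin: Vec, direction: Vec, t: float) -> Vec:
    """Evaluate P(t) = O + t*D one component at a time."""
    return tuple(o + t * d for o, d in zip(origin, direction))


def nearest_hit(candidates: Iterable[float]) -> Optional[float]:
    """Return the smallest positive t, or None if nothing lies in front of the eye."""
    in_front = [t for t in candidates if t > 0.0]  # t <= 0 is at or behind the eye
    return min(in_front) if in_front else None


if __name__ == "__main__":
    eye, aim = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    print(point_at(eye, aim, 2.5))        # (0.0, 0.0, 2.5)
    print(nearest_hit([7.0, -1.0, 3.0]))  # 3.0: the hit behind the eye is ignored
```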

Ray meets sphere: it all comes down to a quadratic

Spheres are the "hello world" of ray tracing, because the intersection maths is exact and clean. A sphere is every point P at distance r from a centre C: |P - C|^2 = r^2. Substitute the ray P(t)=O+tD and expand. Writing m = O - C:

|O + tD - C|^2 = r^2 \;\Longrightarrow\; (D\cdot D)\,t^2 + 2(D\cdot m)\,t + (m\cdot m - r^2) = 0.
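The expansion in that step rests on one idea. A vector's squared length is its dot product with itself, and the dot product distributes like ordinary multiplication. Because O + tD - C = m + tD, the two cross terms t\,(D\cdot m) and t\,(m\cdot D) are equal and combine into one:

|m + tD|^2 = (m + tD)\cdot(m + tD) = (D\cdot D)\,t^2 + 2\,(D\cdot m)\,t + m\cdot m.

Setting this equal to r^2 and moving r^2 to the left-hand side gives the quadratic.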

That's just a quadratic a t^2 + b t + c = 0 in the unknown distance t, with

a = D\cdot D, \qquad b = 2\,(D\cdot m), \qquad c = m\cdot m - r^2.

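Whether the ray's line meets the sphere at all depends only on the discriminant \Delta = b^2 - 4ac, exactly as in school algebra. If \Delta < 0 there are no real roots and the ray misses. If \Delta = 0 there is one repeated root and the ray just grazes the sphere. If \Delta > 0 there are two real roots: the distances at which the ray enters and leaves the sphere.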
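Here is a sketch of a sphere intersection routine built around the discriminant, in Python. It assumes vectors are plain tuples, and `dot` and `intersect_sphere` are illustrative names, not from any library. The routine computes a, b and c as defined above and rejects the ray when \Delta < 0. Otherwise it returns the smallest root with t \ge 0. The extra "behind" verdict covers the case where both roots are negative, so the sphere lies behind the eye.

```python
import math
from typing import Optional, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def intersect_sphere(
    origin: Sequence[float],
    direction: Sequence[float],
    centre: Sequence[float],
    radius: float,
) -> Tuple[str, Optional[float]]:
    """Return (verdict, t), where verdict is 'miss', 'graze', 'hit' or 'behind'."""
    m = [o - ci for o, ci in zip(origin, centre)]  # m = O - C
    a = dot(direction, direction)
    b = 2.0 * dot(direction, m)
    c = dot(m, m) - radius * radius
    disc = b * b - 4.0 * a * c  # the discriminant decides hit or miss
    if disc < 0.0:
        return "miss", None  # no real roots: the line passes the sphere by
    root = math.sqrt(disc)
    near = (-b - root) / (2.0 * a)
    far = (-b + root) / (2.0 * a)
    verdict = "graze" if disc == 0.0 else "hit"
    for t in (near, far):  # nearer root first; far only matters if the eye is inside
        if t >= 0.0:
            return verdict, t
    return "behind", None  # both roots negative: the sphere is behind the eye
```

The exact `disc == 0.0` test is fine for illustration. Real scenes almost never graze exactly, so in practice the tangent case just falls into "hit" or "miss".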

Worked example: hit or miss?

Put the eye at O=(0,0), fire straight along D=(1,0), and place a unit sphere (r=1) at C=(3,0). Then m = O - C = (-3, 0), and

a = D\cdot D = 1, \quad b = 2(D\cdot m) = 2(-3) = -6, \quad c = m\cdot m - r^2 = 9 - 1 = 8,

\Delta = b^2 - 4ac = 36 - 32 = 4 > 0 \;\Rightarrow\; \text{HIT}.

Two roots: t = \frac{6 \pm 2}{2} = 4 or 2. The nearer one is t=2, so the ray strikes the front of the sphere at P(2) = (2,0) — dead on, exactly one radius in front of the centre. Now nudge the aim upward to D=(1,2): recomputing gives a=5, b=-6, c=8, so \Delta = 36 - 160 = -124 < 0 — a miss, the ray sails past above the sphere. The sign of one number, the discriminant, is the whole yes/no.
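This arithmetic is easy to slip on, so here is a short Python check. It recomputes a, b, c and \Delta for both aims using the same 2-D setup: eye at the origin, unit sphere at (3, 0). `sphere_quadratic` is an illustrative helper name. Integer inputs keep the numbers exact, so they match the hand calculation digit for digit.

```python
import math


def sphere_quadratic(origin, direction, centre, radius):
    """Return (a, b, c, discriminant) of a*t^2 + b*t + c = 0 for a ray and a sphere."""
    m = [o - ci for o, ci in zip(origin, centre)]       # m = O - C
    a = sum(d * d for d in direction)                   # a = D.D
    b = 2 * sum(d * mi for d, mi in zip(direction, m))  # b = 2 (D.m)
    c = sum(mi * mi for mi in m) - radius ** 2          # c = m.m - r^2
    return a, b, c, b * b - 4 * a * c


if __name__ == "__main__":
    eye, centre, radius = (0, 0), (3, 0), 1
    for aim in [(1, 0), (1, 2)]:
        a, b, c, disc = sphere_quadratic(eye, aim, centre, radius)
        print(f"D={aim}: a={a}, b={b}, c={c}, disc={disc}")
        if disc >= 0:
            roots = sorted((-b + s * math.sqrt(disc)) / (2 * a) for s in (-1, 1))
            print("  roots:", roots)
    # D=(1, 0): a=1, b=-6, c=8, disc=4
    #   roots: [2.0, 4.0]
    # D=(1, 2): a=5, b=-6, c=8, disc=-124
```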

Figure: Watch a ray bounce. The eye sits on the left and the vertical line is the image plane, one point per pixel. A primary ray aimed through a pixel either strikes the sphere, where the surface normal and the reflected ray springing off it are drawn, or, aimed high or low enough, misses completely because the discriminant has gone negative.

The bounce: reflecting about the normal

When the ray lands, the surface has an outward unit normal N — for a sphere, simply the direction from the centre to the hit, N = (P - C)/r. The law of reflection says the bounced ray leaves at the same angle it arrived, measured from the normal: angle in equals angle out. In vectors, the incoming direction D reflects to

R = D - 2\,(D\cdot N)\,N.

The term (D\cdot N)N is the part of D pointing along the normal. Subtracting it twice flips that component and leaves the sideways part untouched, which is precisely a mirror bounce. Spawn a fresh ray from the hit point in direction R, trace it, and you've captured a reflection. Do this recursively, tracing reflections of reflections, and mirrored hallways and chrome spheres render themselves. In practice the recursion is capped at a maximum depth so that two facing mirrors cannot bounce a ray forever. This recursion is what puts the "tracing" in ray tracing.
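Here is a compact Python sketch of the bounce and the recursion. It assumes every sphere is a perfect mirror and vectors are plain tuples. The names `reflect`, `first_hit` and `trace`, the `max_depth` value and the `EPS` tolerance are illustrative choices. `EPS` is the epsilon trick described under shadow acne below. Instead of shading, `trace` just counts bounces, which keeps the recursion easy to follow.

```python
import math
from typing import List, Optional, Tuple

Vec = Tuple[float, ...]
Sphere = Tuple[Vec, float]  # (centre, radius); every sphere is assumed to be a perfect mirror

EPS = 1e-6  # illustrative tolerance so a bounced ray does not re-hit the point it left


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def add(u: Vec, v: Vec) -> Vec:
    return tuple(a + b for a, b in zip(u, v))


def sub(u: Vec, v: Vec) -> Vec:
    return tuple(a - b for a, b in zip(u, v))


def scale(u: Vec, s: float) -> Vec:
    return tuple(a * s for a in u)


def reflect(d: Vec, n: Vec) -> Vec:
    """Mirror d about the unit normal n: R = D - 2 (D.N) N."""
    return sub(d, scale(n, 2.0 * dot(d, n)))


def first_hit(origin: Vec, direction: Vec, spheres: List[Sphere]) -> Optional[Tuple[float, Sphere]]:
    """Nearest (t, sphere) with t > EPS, or None if the ray escapes."""
    best = None
    for centre, radius in spheres:
        m = sub(origin, centre)
        a = dot(direction, direction)
        b = 2.0 * dot(direction, m)
        c = dot(m, m) - radius * radius
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            continue
        root = math.sqrt(disc)
        near, far = (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)
        t = near if near > EPS else far  # skip roots at or behind the origin
        if t > EPS and (best is None or t < best[0]):
            best = (t, (centre, radius))
    return best


def trace(origin: Vec, direction: Vec, spheres: List[Sphere], depth: int = 0, max_depth: int = 5) -> int:
    """Follow a ray from mirror to mirror and return the number of bounces."""
    if depth >= max_depth:
        return 0  # recursion cap: facing mirrors would otherwise bounce forever
    hit = first_hit(origin, direction, spheres)
    if hit is None:
        return 0  # the ray escapes to the background
    t, (centre, radius) = hit
    p = add(origin, scale(direction, t))     # hit point P(t)
    n = scale(sub(p, centre), 1.0 / radius)  # outward unit normal N = (P - C) / r
    return 1 + trace(p, reflect(direction, n), spheres, depth + 1, max_depth)


if __name__ == "__main__":
    print(reflect((1.0, -1.0), (0.0, 1.0)))  # (1.0, 1.0): the component along N flips
    one = [((3.0, 0.0), 1.0)]
    two = [((3.0, 0.0), 1.0), ((-3.0, 0.0), 1.0)]
    print(trace((0.0, 0.0), (1.0, 0.0), one))  # 1: bounces straight back, then escapes
    print(trace((0.0, 0.0), (1.0, 0.0), two))  # 5: ping-pongs until max_depth stops it
```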

Here's a trap everyone hits. You find an intersection at point P, then shoot a new ray from P: a shadow ray toward a light, or a reflection ray. But floating-point rounding leaves the computed P a hair off the true surface, often just inside the very surface it came from. The new ray starts at t=0 and immediately reports an intersection with that same surface at some microscopic t, so the surface decides it is shadowing itself. The screen fills with a grimy black speckle called shadow acne.

The standard cure is to start each secondary ray a tiny epsilon off the surface — push the origin a smidgen along the normal, or ignore any hit with t < \varepsilon for some small \varepsilon. One nudge, and the acne clears up.
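To see both cures work, the Python sketch below places a hit point by hand 1e-9 inside the top of a unit sphere. This is a hypothetical stand-in for rounding error. It then fires a shadow ray toward a light assumed to sit straight overhead. The function name `first_root_after` and the values 1e-6 and 1e-4 are illustrative, not prescribed. A suitable epsilon depends on the scale of the scene.

```python
import math
from typing import Optional, Sequence


def first_root_after(
    origin: Sequence[float],
    direction: Sequence[float],
    centre: Sequence[float],
    radius: float,
    t_min: float,
) -> Optional[float]:
    """Smallest ray/sphere root with t > t_min, or None."""
    m = [o - ci for o, ci in zip(origin, centre)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * mi for d, mi in zip(direction, m))
    c = sum(mi * mi for mi in m) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > t_min:  # any hit closer than t_min is treated as the surface we left
            return t
    return None


if __name__ == "__main__":
    centre, radius = (0.0, 0.0), 1.0
    hit_point = (0.0, 1.0 - 1e-9)  # top of the sphere, placed 1e-9 inside to mimic rounding
    normal = (0.0, 1.0)            # outward unit normal there
    to_light = (0.0, 1.0)          # hypothetical light straight overhead

    print(first_root_after(hit_point, to_light, centre, radius, t_min=0.0))
    # tiny positive t (about 1e-9): the sphere shadows itself, i.e. acne
    print(first_root_after(hit_point, to_light, centre, radius, t_min=1e-6))
    # None: ignoring hits with t < epsilon clears it
    lifted = tuple(p + 1e-4 * n for p, n in zip(hit_point, normal))
    print(first_root_after(lifted, to_light, centre, radius, t_min=0.0))
    # None: nudging the origin along the normal works too
```

Too small an epsilon lets acne through. Too large a one makes contact shadows "float" off the surface, which is why renderers usually tune it to the scene's scale.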

Cost. The naive algorithm tests every ray against every object, so a frame costs on the order of (\text{pixels}) \times (\text{objects}) intersection tests, and reflection and shadow rays multiply that again. A 4K frame (3840 × 2160) is about 8.3 million primary rays at one ray per pixel, before a single bounce. That's why real-time engines long preferred rasterization, which processes each triangle once per frame. The modern rescue has two parts. The first is acceleration structures such as bounding-volume hierarchies, which skip whole swathes of a scene at once and cut the per-ray cost from linear in the object count to roughly logarithmic. The second is dedicated ray-tracing hardware, now built into GPUs. Film renderers, working offline with minutes or hours to spend on each frame, can afford to pay the full price for the realism.
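To make the cost concrete, here is a deliberately naive Python renderer. It fires one primary ray per pixel, tests each ray against every sphere, and counts the tests. The pinhole camera (eye at the origin, image plane at z = 1), the ASCII output, the two-sphere scene and the helper names are all hypothetical. There is no acceleration structure, so the count is exactly pixels times objects.

```python
import math
from typing import List, Sequence, Tuple

Sphere = Tuple[Tuple[float, float, float], float]  # (centre, radius)


def hits_sphere(origin: Sequence[float], direction: Sequence[float],
                centre: Sequence[float], radius: float) -> bool:
    """True if the ray meets the sphere at some t >= 0."""
    m = [o - ci for o, ci in zip(origin, centre)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * mi for d, mi in zip(direction, m))
    c = sum(mi * mi for mi in m) - radius * radius
    disc = b * b - 4.0 * a * c
    return disc >= 0.0 and (-b + math.sqrt(disc)) / (2.0 * a) >= 0.0  # far root in front


def render_naive(width: int, height: int, spheres: List[Sphere]) -> Tuple[List[str], int]:
    """Brute force: every primary ray is tested against every sphere."""
    tests = 0
    image = []
    for row in range(height):
        line = ""
        for col in range(width):
            # Map the pixel centre onto a [-1, 1] x [-1, 1] image plane at z = 1.
            x = (col + 0.5) / width * 2.0 - 1.0
            y = 1.0 - (row + 0.5) / height * 2.0
            hit = False
            for centre, radius in spheres:  # no early exit: a nearest hit needs them all
                tests += 1
                hit = hits_sphere((0.0, 0.0, 0.0), (x, y, 1.0), centre, radius) or hit
            line += "#" if hit else "."
        image.append(line)
    return image, tests


if __name__ == "__main__":
    scene = [((0.0, 0.0, 3.0), 1.0), ((1.5, 0.5, 5.0), 1.0)]
    image, tests = render_naive(16, 8, scene)
    print("\n".join(image))
    print(tests, "tests =", 16 * 8, "pixels x", len(scene), "spheres")  # 256 tests
    print(3840 * 2160, "primary rays in a 4K frame")                    # 8294400
```

Doubling the number of spheres doubles `tests`. An acceleration structure exists to break exactly that proportionality.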