Stochastic Processes

A stochastic process is a whole family of random variables (X_t)_{t \in T}, all living on one probability space (\Omega, \mathcal{F}, \mathbb{P}) and indexed by a parameter t we read as time. Where a single random variable models one uncertain number, a process models an uncertain quantity that evolves: a price, a queue length, a particle's position.

The index set T may be discrete (T = \{0, 1, 2, \dots\}, in which case we write X_n) or continuous (T = [0, \infty)). The state space S is where the values live (often \mathbb{Z} or \mathbb{R}): each X_t is a map \Omega \to S.

Two ways to look at it

A process has two faces, and switching between them is the whole art. Fix a time t and you are left with a single random variable X_t — a snapshot, with its own distribution. Fix an outcome \omega \in \Omega instead and the randomness is spent; what remains is one deterministic curve

t \longmapsto X_t(\omega),

a single sample path (or trajectory): one realised history of the world. Nature draws a single \omega once and for all; we then watch the path it traces out. The figure below draws several such paths of a simple random walk; each coloured trajectory is a different \omega.
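A minimal sketch of how such paths could be generated, assuming a simple symmetric random walk started at 0. The function name `random_walk_path` and the use of a seed as a stand-in for the outcome \omega are illustrative choices, not part of the text.

```python
import random

def random_walk_path(n_steps, rng):
    """Return one sample path [X_0, ..., X_n] of a simple symmetric random walk."""
    path = [0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.choice((-1, 1)))
    return path

# Each seed plays the role of one outcome omega: fixing it fixes the whole path.
paths = {seed: random_walk_path(20, random.Random(seed)) for seed in range(5)}

for seed, path in paths.items():
    print(f"omega={seed}: {path}")
```

Re-running with the same seed reproduces the same trajectory, which mirrors the idea that once \omega is drawn nothing random is left.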

A function of two arguments

The two viewpoints are really two ways of holding one object still: a process is a function of two arguments, time and chance,

X : T \times \Omega \to S, \qquad (t, \omega) \longmapsto X_t(\omega).

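Feed it both a time t and an outcome \omega and you get a single value in the state space. Now freeze one slot at a time:

Freeze the time t: what remains is the map \omega \longmapsto X_t(\omega), a single random variable (the snapshot at time t).

Freeze the outcome \omega: what remains is the map t \longmapsto X_t(\omega), a single deterministic sample path.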

Neither slice is the whole story. A sample path forgets how likely it was; a single X_t forgets how it relates to X_s at other times. The relations between the snapshots — how X_{t_1} and X_{t_2} move together — are what make a process more than a bundle of unrelated random variables.
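One way to make the two-argument picture concrete is a table with one row per outcome and one column per time. The sketch below assumes a simple symmetric random walk and illustrative sizes (`N_OUTCOMES`, `N_TIMES` are made-up names); it slices the table both ways and then estimates how two snapshots move together.

```python
import random

N_OUTCOMES, N_TIMES = 2000, 11   # illustrative sizes, not from the text

rng = random.Random(0)

# X[omega][t]: row index = outcome, column index = time.
X = []
for _ in range(N_OUTCOMES):
    path = [0]
    for _ in range(N_TIMES - 1):
        path.append(path[-1] + rng.choice((-1, 1)))
    X.append(path)

sample_path = X[7]                  # freeze omega: one deterministic curve
snapshot = [row[10] for row in X]   # freeze t = 10: one random variable across outcomes

def empirical_cov(s, t):
    """Sample covariance of X_s and X_t across the simulated outcomes."""
    xs = [row[s] for row in X]
    xt = [row[t] for row in X]
    ms, mt = sum(xs) / len(xs), sum(xt) / len(xt)
    return sum((a - ms) * (b - mt) for a, b in zip(xs, xt)) / len(xs)

# For this walk the exact value is Cov(X_s, X_t) = min(s, t); the estimate should be close.
print(empirical_cov(4, 10))
```

Neither `sample_path` nor `snapshot` alone reveals the covariance; it only appears when two columns are read together across the same rows.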

What a process really is: its finite-dimensional distributions

How much information pins a process down? You cannot write down the law of the whole (uncountably infinite) path directly. The trick is to look at finitely many times at once. Pick any finite list of times t_1 < t_2 < \cdots < t_n in T and read off the snapshots there. The result is a random vector

\big(X_{t_1}, X_{t_2}, \dots, X_{t_n}\big) \in S^n,

and that vector has a joint law on S^n — its finite-dimensional distribution (an fdd):

\mu_{t_1, \dots, t_n}(B) \;=\; \mathbb{P}\!\left[\,(X_{t_1}, \dots, X_{t_n}) \in B\,\right], \qquad B \subseteq S^n \text{ measurable}.
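For a small discrete example the fdd can be computed exactly rather than estimated. The sketch below assumes a simple symmetric random walk started at 0 and enumerates every step sequence up to the largest requested time; the helper name `fdd` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def fdd(times):
    """Exact joint law of (X_{t_1}, ..., X_{t_n}) for a simple symmetric walk from 0."""
    horizon = max(times)
    law = Counter()
    weight = Fraction(1, 2 ** horizon)   # each step sequence is equally likely
    for steps in product((-1, 1), repeat=horizon):
        path = [0]
        for step in steps:
            path.append(path[-1] + step)
        law[tuple(path[t] for t in times)] += weight
    return dict(law)

mu = fdd((1, 2))
for point, prob in sorted(mu.items()):
    print(point, prob)
```

The dictionary maps each point of S^n to its probability, so \mu_{t_1, \dots, t_n}(B) is the sum of the values over the points in B. Enumeration costs 2^horizon paths, so this is only meant for tiny horizons.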

The family of all these joint laws — one for every finite list of times — is what a process really is, for almost every purpose. Two processes that match on every finite list of times are, distributionally, the same object: every probability you can compute from finitely many observations agrees. So when we say "a Brownian motion" or "a Poisson process" we are naming a family of fdds, not one particular set of paths.

Two stochastic processes with the same index set and state space have the same law precisely when all their finite-dimensional distributions agree: for every finite list of times t_1 < \cdots < t_n the joint laws \mu_{t_1, \dots, t_n} coincide. (Informally: the fdds are the full identikit of a process, and nothing finite is left to check.)

The fdds cannot be chosen arbitrarily, though. They must be consistent: drop a time from the list and the smaller joint law must be the marginal of the bigger one (observing fewer times can't change the law of the times you kept), and if the same times are listed in a different order, the joint law must have its coordinates permuted the same way. These are the Kolmogorov consistency conditions.
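Both conditions can be checked mechanically on a small example. This sketch again assumes a simple symmetric random walk from 0 with exactly enumerated laws; `fdd`, `marginalise` and `permute` are hypothetical helper names introduced only for the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def fdd(times):
    """Exact joint law of the walk at the listed times (any order), as {point: prob}."""
    horizon = max(times)
    law = Counter()
    weight = Fraction(1, 2 ** horizon)
    for steps in product((-1, 1), repeat=horizon):
        path = [0]
        for step in steps:
            path.append(path[-1] + step)
        law[tuple(path[t] for t in times)] += weight
    return dict(law)

def marginalise(law, drop):
    """Sum out coordinate number `drop` of a joint law."""
    out = Counter()
    for point, prob in law.items():
        out[point[:drop] + point[drop + 1:]] += prob
    return dict(out)

def permute(law, order):
    """Reorder the coordinates of every point according to `order`."""
    return {tuple(point[i] for i in order): prob for point, prob in law.items()}

big = fdd((2, 3, 5))
# Marginalisation: dropping the middle time 3 must give the law at times (2, 5).
assert marginalise(big, 1) == fdd((2, 5))
# Permutation: listing the times as (5, 2, 3) must permute the coordinates the same way.
assert permute(big, (2, 0, 1)) == fdd((5, 2, 3))
print("consistent")
```

Here consistency holds automatically because every law is read off one underlying process; the conditions only bite when a family of laws is written down first and a process is sought afterwards.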

We have said an fdd family describes a process. The converse — that a suitable family actually builds one — is a genuine theorem, and it is what lets us define processes by listing their finite laws instead of constructing paths by hand.

Kolmogorov's extension theorem. Let the state space be \mathbb{R}^d or, more generally, a Polish space with its Borel \sigma-algebra. Given a family of finite-dimensional distributions \{\mu_{t_1, \dots, t_n}\} that satisfies the two consistency conditions above (marginalisation and permutation), there exist a probability space (\Omega, \mathcal{F}, \mathbb{P}) and a process (X_t)_{t \in T} on it whose finite-dimensional distributions are exactly the given \mu_{t_1, \dots, t_n}.

This is the licence behind every "let (W_t) be a Brownian motion": one specifies the Gaussian fdds, checks consistency, and the theorem hands back an honest process realising them — no explicit \Omega required.
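To see what "specifying the Gaussian fdds" amounts to for Brownian motion: the vector (W_{t_1}, \dots, W_{t_n}) is centred Gaussian with covariance \min(t_i, t_j). The sketch below builds that covariance matrix and draws one such vector by summing independent Gaussian increments, which is one standard way to realise these laws for increasing times; the function names are hypothetical.

```python
import math
import random

def bm_covariance(times):
    """Covariance matrix of (W_{t_1}, ..., W_{t_n}): entry (i, j) is min(t_i, t_j)."""
    return [[min(s, t) for t in times] for s in times]

def sample_bm_fdd(times, rng):
    """Draw (W_{t_1}, ..., W_{t_n}) for increasing times via independent increments."""
    values, w, prev = [], 0.0, 0.0
    for t in times:
        w += rng.gauss(0.0, math.sqrt(t - prev))   # increment ~ N(0, t - prev)
        values.append(w)
        prev = t
    return values

times = (0.5, 1.0, 2.0)
cov = bm_covariance(times)

# Consistency is visible in the matrix: dropping the time 1.0 deletes its row and column.
sub = bm_covariance((0.5, 2.0))
assert sub == [[cov[i][j] for j in (0, 2)] for i in (0, 2)]

print(cov)
print(sample_bm_fdd(times, random.Random(1)))
```

Since a centred Gaussian law is determined by its covariance, and the covariance of a sub-list of times is a sub-matrix, the marginalisation condition holds without further work.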

One subtlety: versions and modifications. The fdds fix the law at finitely many times, but they are silent about the fine structure of the paths. Two processes (X_t) and (Y_t) with identical fdds are called versions of one another; when they live on the same probability space and \mathbb{P}[X_t = Y_t] = 1 at each fixed t, they are called modifications. Either way, one can have continuous paths and the other not. The fdds determine a process only up to its finite-time law; an extra argument (a path regularity criterion, such as Kolmogorov's continuity theorem) is needed to pick the continuous version we usually want.
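A standard toy example makes the gap visible. Take U uniform on [0, 1), let X_t = 0 for all t, and let Y_t = 1 when t = U and 0 otherwise. At each fixed t the two agree with probability one, yet every path of Y has a jump. The sketch below is a finite-sample illustration of that example, not a proof; the names `X`, `Y` and the chosen times are illustrative.

```python
import random

def X(t, u):
    """The zero process: X_t(omega) = 0 for every t."""
    return 0

def Y(t, u):
    """Y_t(omega) = 1 exactly at the random time t = U(omega), else 0."""
    return 1 if t == u else 0

rng = random.Random(0)
fixed_times = [0.1, 0.25, 0.5, 0.9]
outcomes = [rng.random() for _ in range(10_000)]   # U(omega), uniform on [0, 1)

# At each fixed time, compare the two processes on every sampled outcome.
agree = all(X(t, u) == Y(t, u) for t in fixed_times for u in outcomes)

# Along each path of Y, look at the fixed times plus the random time U(omega).
path_sup = {max(Y(t, u) for t in fixed_times + [u]) for u in outcomes}

print(agree, path_sup)
```

Any finite list of observation times misses U almost surely, so the fdds of X and Y coincide, while the supremum over each whole path of Y is 1 and that of X is 0.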

Adapted to the flow of information

Time also carries information. A filtration (\mathcal{F}_t)_{t \in T} is an increasing family of sub-σ-algebras of \mathcal{F}, \mathcal{F}_s \subseteq \mathcal{F}_t for s \le t: the growing record of what is known by each time. A process is adapted to (\mathcal{F}_t) when every X_t is \mathcal{F}_t-measurable:

X_t \text{ is known by time } t \quad\text{for every } t.

No peeking into the future. A trading strategy may use today's price and all of history, but not tomorrow's; adaptedness is the precise statement of that honesty. Classic examples of processes, each adapted to the filtration generated by its own past: the discrete-time random walk; Brownian motion in continuous time (the Bachelier–Wiener process); and the Poisson process counting arrivals over time.
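In discrete time, and assuming the filtration generated by the walk itself (\mathcal{F}_n records X_0, \dots, X_n), adaptedness has a simple operational reading: the value at time n must be computable from the path up to time n. The sketch below contrasts one process that passes this test with one that fails it; both function names and both sample paths are made up for the illustration.

```python
def running_max(path):
    """M_n = max(X_0, ..., X_n): uses the path only up to time n, so it is adapted."""
    out, best = [], float("-inf")
    for x in path:
        best = max(best, x)
        out.append(best)
    return out

def distance_to_final(path):
    """Z_n = X_N - X_n: needs the final value X_N, so it peeks into the future."""
    return [path[-1] - x for x in path]

# Two hypothetical paths that agree up to time 3 and differ afterwards.
path_a = [0, 1, 0, 1, 2, 3]
path_b = [0, 1, 0, 1, 0, -1]
n = 3

# An adapted process cannot tell the two outcomes apart before they diverge.
assert running_max(path_a)[: n + 1] == running_max(path_b)[: n + 1]
# The non-adapted one already differs at time 0.
assert distance_to_final(path_a)[0] != distance_to_final(path_b)[0]
print("running_max is adapted on this pair; distance_to_final is not")
```

Comparing two outcomes that share a past is only a necessary check, but it captures the idea that information unavailable at time n cannot influence an adapted process at time n.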