Prompting

Once in-context learning works, the prompt is the program. You no longer change the model to change its behaviour; you change the text you feed it. The same frozen weights become a translator, a classifier, or a reasoner depending entirely on how you phrase the input. Prompting is the discipline of writing that input well — and it has a few patterns that reliably move the needle.

The patterns, line by line

Step 1 — zero-shot vs few-shot. The baseline lever is how many examples you supply. Zero-shot states the task and asks; few-shot prepends k worked examples first:

\text{zero-shot: } \texttt{[task] [query]} \qquad\quad \text{few-shot: } \texttt{[(x}_1\texttt{→y}_1\texttt{) … (x}_k\texttt{→y}_k\texttt{)] [query]}.

Few-shot pins down the desired format and disambiguates the task — often the cheapest accuracy you can buy.

Step 2 — chain-of-thought: ask it to show its work. For a multi-step problem, append a simple instruction — "Let's think step by step" — or demonstrate a worked solution that spells out the reasoning. Instead of emitting the answer directly, the model now generates intermediate steps r_1, r_2, \dots, r_m first:

x \;\to\; r_1 \to r_2 \to \cdots \to r_m \;\to\; \hat{y}.

Step 3 — why writing the steps helps: each step conditions the next. The model is autoregressive — every token it emits is fed back in as context for the next. So a written reasoning step r_i literally becomes part of what the final answer is conditioned on:

p_\theta(\hat{y} \mid x,\, r_1, \dots, r_m) \quad\text{vs.}\quad p_\theta(\hat{y} \mid x).

The intermediate tokens are scratch memory the model can read back. A direct answer has to do all the work in one forward pass with nowhere to store partial results; a chain-of-thought answer spreads the computation across many tokens, each building on the last. That is why "show your working" dramatically improves multi-step arithmetic and reasoning — and why it does little for one-step lookups, which need no scratch space.

Step 4 — system prompts and role conditioning. A final lever is to prepend a persistent instruction that frames who the model is and how it should respond — a system prompt ("You are a careful maths tutor; explain each step"). Because every later token is conditioned on it, that framing steers tone, format, and behaviour across the whole exchange — role-conditioning the same weights without touching them.

With frozen weights, the prompt steers the model. Three patterns:

The flip side of "the prompt is the program" is prompt sensitivity: trivial rewordings — a relocated example, a changed instruction, even the order of the few-shot demonstrations — can swing accuracy noticeably. The model has no stable API; you are programming in natural language, a famously underspecified one. This is why prompt engineering is empirical, and why robust prompts are tested, not just written.

And chain-of-thought is more than a trick — it buys two concrete things. First, more compute per answer: a fixed-depth transformer does a bounded amount of work per token, so a one-token answer caps the computation, whereas emitting m reasoning tokens runs the network m more times, with each pass able to build on the last. Second, explicit intermediate state: the partial results are written into the context where attention can read them back, rather than having to be juggled invisibly inside one forward pass. More serial computation plus external scratch memory — that is the whole reason "think step by step" turns a hard multi-step problem into a sequence of easy one-step ones.

Plain vs. chain-of-thought

Flip between a plain prompt and a chain-of-thought prompt for the same word problem. The plain prompt asks for the answer directly and the model blurts a wrong one-step guess. The chain-of-thought prompt adds "let's think step by step," so the model writes out each intermediate result — and because each line is fed back as context for the next, it lands on the right total. Same model, same question; only the phrasing changed.