Intelligent agents

A room thermostat senses the temperature and switches the heating on or off. A robot vacuum bumps around your floor, turning when it hits a wall. A chess program studies the board and pushes a pawn. A web crawler follows links, deciding which page to fetch next. These four things look nothing alike — yet artificial intelligence describes every one of them with a single idea: each is an agent.

An agent is anything that perceives its environment through sensors and acts upon that environment through actuators. That's the whole definition, and its power is that it is almost absurdly general — it fits a thermostat, a Mars rover, a stock-trading bot, a fly, and you. AI's central question then becomes sharp and concrete: what should the agent do next?

This page is about that question in the abstract — before any search, logic or learning. We'll pin down what an agent is, what it means for one to behave well (rationality), and how to specify the exact task an agent faces (the PEAS checklist). Get this frame right and every later technique — search, planning, game-playing — is just a different answer to "what should I do next?".

The perceive–act loop

An agent sits inside an environment and runs the same loop forever: it reads a percept from its sensors, thinks, and sends an action to its actuators — which changes the environment, producing the next percept. Round and round.
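The loop described above can be sketched in a few lines. This is a minimal illustration, not a standard API: the `Environment` interface, `AgentProgram` type, `runLoop` function, `Room` class, its temperature dynamics and the 20-degree setpoint are all hypothetical names and numbers invented for this example.

```typescript
// All names below are hypothetical, chosen for illustration only.
interface Environment<P, A> {
  percept(): P;            // what the sensors report right now
  apply(action: A): void;  // how an actuator command changes the world
}

type AgentProgram<P, A> = (percept: P) => A; // the "think" step

// Run the loop for a fixed number of steps and return the actions taken.
function runLoop<P, A>(env: Environment<P, A>, agent: AgentProgram<P, A>, steps: number): A[] {
  const actions: A[] = [];
  for (let i = 0; i < steps; i++) {
    const p = env.percept(); // sense
    const a = agent(p);      // think
    env.apply(a);            // act, which changes the next percept
    actions.push(a);
  }
  return actions;
}

type HeatAction = "HeatOn" | "HeatOff";

// Toy room (made-up dynamics): +2 degrees per step with heating on, -1 with it off.
class Room implements Environment<number, HeatAction> {
  temperature: number;
  constructor(start: number) {
    this.temperature = start;
  }
  percept(): number {
    return this.temperature;
  }
  apply(action: HeatAction): void {
    this.temperature += action === "HeatOn" ? 2 : -1;
  }
}

// Thermostat: heat whenever the room is below an assumed 20-degree setpoint.
const thermostat: AgentProgram<number, HeatAction> = (t) => (t < 20 ? "HeatOn" : "HeatOff");

const room = new Room(18);
console.log(runLoop(room, thermostat, 5).join(" "));
// prints "HeatOn HeatOff HeatOn HeatOff HeatOff"
```

The agent never touches the room's state directly: it only sees what `percept()` returns and only influences the world through `apply()`, which is the separation the definition of an agent asks for.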

Everything the agent has ever sensed is its percept sequence. In principle an agent's behaviour is captured by an agent function that maps every possible percept sequence to an action:

f : \mathcal{P}^{*} \longrightarrow \mathcal{A}

(from percept sequences \mathcal{P}^{*} to actions \mathcal{A}). That function is a purely mathematical object — a giant lookup table. The agent program is the concrete code, running on some physical architecture, that implements it. AI is largely the craft of writing a compact program that behaves like an impossibly large table.
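To make the distinction between function and program concrete, here is a sketch of the most literal agent program possible: one that stores the table and looks up the whole percept history. The two-percept thermostat world, the string encoding of histories and every name here (`key`, `table`, `makeTableDrivenAgent`, `tableRows`) are assumptions made for the example.

```typescript
type Percept = "cold" | "warm";
type Action = "HeatOn" | "HeatOff";

// Encode a percept sequence as a string key, e.g. ["cold", "warm"] -> "cold,warm".
const key = (seq: Percept[]): string => seq.join(",");

// The agent function written out literally, for sequences of length 1 and 2 only.
const table: Record<string, Action> = {
  "cold": "HeatOn",
  "warm": "HeatOff",
  "cold,cold": "HeatOn",
  "cold,warm": "HeatOff",
  "warm,cold": "HeatOn",
  "warm,warm": "HeatOff",
};

// Table-driven agent program: remember every percept, then look the whole history up.
function makeTableDrivenAgent(lookup: Record<string, Action>) {
  const history: Percept[] = [];
  return (p: Percept): Action | undefined => {
    history.push(p);
    return lookup[key(history)]; // undefined once the history outgrows the table
  };
}

// Rows needed to cover every sequence of length 1..maxLen with n distinct percepts.
function tableRows(n: number, maxLen: number): number {
  let rows = 0;
  for (let t = 1; t <= maxLen; t++) rows += Math.pow(n, t);
  return rows;
}

const agent = makeTableDrivenAgent(table);
console.log(agent("cold"), agent("warm"), agent("cold")); // prints "HeatOn HeatOff undefined"
console.log(tableRows(2, 2), tableRows(2, 20));           // prints "6 2097150"
```

Even with only two possible percepts, twenty steps of history already need about two million rows, and the table in this sketch gives up after two steps. The one-line rule "heat when cold" behaves identically on every row shown, which is the sense in which a compact program stands in for the table.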

A concrete agent: the vacuum world

Take the classic toy world: two squares, A and B, each either clean or dirty. Our vacuum agent senses only two things — which square it is in and whether that square is dirty — and can Suck, move Left, or move Right.

A simple reflex agent ignores history entirely and picks its action from the current percept alone, by a handful of condition–action rules. Run it and watch it react:

type Percept = { location: "A" | "B"; dirty: boolean };

// A simple reflex agent: action depends ONLY on the current percept.
function reflexVacuum(p: Percept): string {
  if (p.dirty) return "Suck"; // clean up first, always
  return p.location === "A" ? "Right" : "Left"; // else patrol to the other square
}

const stream: Percept[] = [
  { location: "A", dirty: true },
  { location: "A", dirty: false },
  { location: "B", dirty: true },
  { location: "B", dirty: false },
];

for (const p of stream) {
  console.log(`in ${p.location}, dirty=${p.dirty} -> ${reflexVacuum(p)}`);
}

Notice how tiny the program is compared with the function it stands for: the agent function would need a row for every conceivable percept sequence, but a few rules reproduce it. That compression — a small program acting like a vast table — is the recurring trick of the whole field.

What makes an agent good? Rationality

We don't want just any behaviour; we want good behaviour. But "good" has to be judged by an outside performance measure: a score on the state of the environment, not on how pleased the agent feels with itself. (Measure a vacuum by how clean the floor stays over time, not by how much dirt it sucks up, or it will happily dump dirt out and re-suck it forever.)
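The vacuum example can be checked with a small simulation. The sketch below scores two agents under two candidate measures. Everything in it is an assumption made for illustration: the `Dump` action, the `gamer` policy, the starting state (both squares dirty) and the second measure (one point per clean square per time step).

```typescript
type Square = "A" | "B";
type Action = "Suck" | "Left" | "Right" | "Dump";
interface Percept { location: Square; dirty: boolean }

// Honest reflex agent: clean if dirty, otherwise patrol.
const honest = (p: Percept): Action =>
  p.dirty ? "Suck" : p.location === "A" ? "Right" : "Left";

// Hypothetical gaming agent: sucks dirt, then dumps it back so it can suck again.
const gamer = (p: Percept): Action => (p.dirty ? "Suck" : "Dump");

function simulate(policy: (p: Percept) => Action, steps: number) {
  const dirty: Record<Square, boolean> = { A: true, B: true };
  let location: Square = "A";
  let dirtSucked = 0;       // measure 1: units of dirt sucked (scores the agent's activity)
  let cleanSquareSteps = 0; // measure 2: clean squares per time step (scores the environment)
  for (let t = 0; t < steps; t++) {
    const action = policy({ location, dirty: dirty[location] });
    if (action === "Suck" && dirty[location]) {
      dirty[location] = false;
      dirtSucked++;
    } else if (action === "Dump") {
      dirty[location] = true;
    } else if (action === "Left") {
      location = "A";
    } else if (action === "Right") {
      location = "B";
    }
    cleanSquareSteps += (dirty.A ? 0 : 1) + (dirty.B ? 0 : 1);
  }
  return { dirtSucked, cleanSquareSteps };
}

const h = simulate(honest, 10);
const g = simulate(gamer, 10);
console.log(`honest: sucked=${h.dirtSucked}, clean=${h.cleanSquareSteps}`); // honest: sucked=2, clean=18
console.log(`gamer: sucked=${g.dirtSucked}, clean=${g.cleanSquareSteps}`);  // gamer: sucked=5, clean=5
```

Under the first measure the gaming agent looks better (5 against 2); under the second the honest agent wins clearly (18 against 5). The measure that scores the state of the environment is the one that rewards what we actually wanted.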

For each possible percept sequence, a rational agent selects an action expected to maximise its performance measure, given

the evidence provided by the percept sequence so far, and
whatever built-in prior knowledge the agent has.

Read that carefully: rationality is about doing the best you can with what you know. It is not the same as being perfect, all-knowing, or always successful. A rational agent can still lose — if the world was unlucky or hid the crucial fact from its sensors — and it would have made the same choice again. That gap between rational and omniscient is one of the most important ideas on this page.

Suppose you look both ways, see no cars, and cross the road — only for a cargo door to fall off a passing plane and flatten you. Were you irrational? Of course not. Rationality is judged against the information you had, not against a crystal ball you didn't. Demanding that an agent maximise the actual outcome would require it to predict the future perfectly — that's omniscience, and it's impossible. Rationality maximises expected performance, which is exactly what an engineer can actually build. The remedy for bad luck isn't magic; it's better sensors and, sometimes, a little exploration to gather information before committing.

A classic slip is to treat rational as a synonym for omniscient, perfect, or always wins. It is none of these. An omniscient agent knows the actual outcome of every action in advance; no real agent can. A rational agent simply picks the action with the best expected value given its percepts and knowledge — and may still be defeated by chance or by things it could not perceive. When an exam asks whether an agent is rational, ask "did it do the best it could with the information available?", never "did it get the best possible result?".
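"Best expected value" has a direct computational reading: weight each possible outcome's score by its probability, sum, and pick the action with the largest total. The sketch below applies that to the road-crossing story. The probabilities and scores are invented for illustration, and the names `Outcome`, `Option`, `expectedScore` and `rationalChoice` are hypothetical.

```typescript
interface Outcome {
  probability: number; // the agent's belief, given its percepts and prior knowledge
  score: number;       // performance measure if this outcome happens
}

interface Option {
  action: string;
  outcomes: Outcome[];
}

// Expected performance: probability-weighted sum of the scores.
function expectedScore(option: Option): number {
  return option.outcomes.reduce((sum, o) => sum + o.probability * o.score, 0);
}

// A rational choice maximises expected score, not the score that actually occurs.
function rationalChoice(options: Option[]): Option {
  if (options.length === 0) throw new Error("no actions to choose from");
  return options.reduce((best, o) => (expectedScore(o) > expectedScore(best) ? o : best));
}

// Invented numbers: the road looks clear, but freak accidents are not impossible.
const options: Option[] = [
  {
    action: "cross",
    outcomes: [
      { probability: 0.999, score: 10 },    // arrive safely
      { probability: 0.001, score: -1000 }, // freak accident
    ],
  },
  { action: "stay", outcomes: [{ probability: 1, score: 0 }] },
];

const choice = rationalChoice(options);
console.log(choice.action); // prints "cross"
```

With these numbers crossing has an expected score of about 8.99 against 0 for staying put, so crossing is the rational choice even on the rare occasion when the bad outcome is the one that actually happens.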

Specifying the task: PEAS

Before you can design a rational agent you must nail down the task environment precisely. The standard checklist is PEAS — Performance measure, Environment, Actuators, Sensors. Fill in all four and you have said exactly what problem you are solving. Here it is for a self-driving taxi:

Performance measure — safe, fast, legal, comfortable trips; maximise profit; minimise fuel and complaints.
Environment — roads, other traffic, pedestrians, weather, road signs, passengers.
Actuators — steering, accelerator, brake, indicators, horn, a screen/voice for the passenger.
Sensors — cameras, LIDAR, radar, GPS, speedometer, odometer, engine sensors.

Every PEAS entry is a design decision. Add a microphone to S and the taxi can take spoken directions; put "obey the speed limit" in P and a whole class of behaviours is ruled out. The same agent architecture, pointed at a different PEAS, is a completely different product.
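One way to keep a PEAS description honest is to write it down as data. The sketch below is only an illustration: the `PEAS` interface and the helpers `missingParts` and `withSensor` are hypothetical names, and the taxi entries are copied from the checklist above.

```typescript
interface PEAS {
  performance: string[];
  environment: string[];
  actuators: string[];
  sensors: string[];
}

const taxi: PEAS = {
  performance: ["safe", "fast", "legal", "comfortable", "maximise profit", "minimise fuel", "minimise complaints"],
  environment: ["roads", "other traffic", "pedestrians", "weather", "road signs", "passengers"],
  actuators: ["steering", "accelerator", "brake", "indicators", "horn", "passenger screen/voice"],
  sensors: ["cameras", "LIDAR", "radar", "GPS", "speedometer", "odometer", "engine sensors"],
};

// A task is only fully specified once all four parts are non-empty.
function missingParts(spec: PEAS): string[] {
  const missing: string[] = [];
  if (spec.performance.length === 0) missing.push("performance");
  if (spec.environment.length === 0) missing.push("environment");
  if (spec.actuators.length === 0) missing.push("actuators");
  if (spec.sensors.length === 0) missing.push("sensors");
  return missing;
}

// Changing one entry yields a new task specification; the original is left untouched.
function withSensor(spec: PEAS, sensor: string): PEAS {
  return {
    performance: spec.performance,
    environment: spec.environment,
    actuators: spec.actuators,
    sensors: spec.sensors.concat([sensor]),
  };
}

const talkingTaxi = withSensor(taxi, "microphone");
console.log(talkingTaxi.sensors.length - taxi.sensors.length); // prints 1

const halfSpecified: PEAS = { performance: [], environment: ["floor"], actuators: ["wheels"], sensors: [] };
console.log(missingParts(halfSpecified).join(", ")); // prints "performance, sensors"
```

Treating the specification as a value makes the design decisions visible: `taxi` and `talkingTaxi` are different tasks, even though only one entry differs.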

Reading the environment

Task environments come in flavours, and the flavour dictates how hard the agent's job is. The main axes to classify along:

Fully vs partially observable — can the sensors see the entire relevant state (chess) or only part of it (poker, driving)? A partially observable world usually forces the agent to keep internal memory of what it can't currently see.
Deterministic vs stochastic — does an action have a single guaranteed outcome (a sliding puzzle) or a spread of possible ones (a dice game, a slippery road)?
Episodic vs sequential — is each decision self-contained (spotting defects on a conveyor belt) or does it commit you down a path where earlier choices haunt later ones (chess, driving)?
Static vs dynamic — does the world hold still while you think (a crossword) or keep changing under you (a real road)?
Discrete vs continuous — finitely many distinct states and actions (board games) or smoothly varying quantities (steering angle, speed)?
Single- vs multi-agent — are you alone, or are there other agents whose choices matter — perhaps adversaries actively working against you (chess, markets)?
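The first axis has a direct consequence for the agent program. In the vacuum world the agent sees only its own square, so a pure reflex agent can never tell that the job is finished. The sketch below adds a small internal memory to fix that. It rests on assumptions not stated in the text: `Suck` always succeeds, clean squares stay clean, and there is an extra `NoOp` action; the name `makeStatefulVacuum` is hypothetical.

```typescript
type Square = "A" | "B";
type Action = "Suck" | "Left" | "Right" | "NoOp";
interface Percept { location: Square; dirty: boolean }

function makeStatefulVacuum(): (p: Percept) => Action {
  // Internal memory: squares the agent believes are clean (missing = never observed).
  const knownClean: { A?: boolean; B?: boolean } = {};
  return (p) => {
    // Either the square is already clean, or it will be after this Suck (assumed reliable).
    knownClean[p.location] = true;
    if (p.dirty) return "Suck";
    const other: Square = p.location === "A" ? "B" : "A";
    if (knownClean[other]) return "NoOp"; // memory says the unseen square is clean too
    return p.location === "A" ? "Right" : "Left";
  };
}

const vacuum = makeStatefulVacuum();
const percepts: Percept[] = [
  { location: "A", dirty: true },
  { location: "A", dirty: false },
  { location: "B", dirty: true },
  { location: "B", dirty: false },
];
console.log(percepts.map(vacuum).join(" ")); // prints "Suck Right Suck NoOp"
```

On the last percept a memoryless reflex agent would head back to A and patrol forever; the stateful agent stops because it remembers what it can no longer see. If squares could get dirty again, that memory would go stale and the agent would need to revisit them.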

The hardest corner (partially observable, stochastic, sequential, dynamic, continuous, multi-agent) is roughly "driving a taxi in a city", which is why autonomous driving has proved so difficult. The easiest corner (fully observable, deterministic, episodic, static, discrete, single-agent) needs very little machinery. A tidy puzzle sits right next to it: it is sequential rather than episodic, because each move constrains the next, but benign on every other axis, and it's where the search techniques on the next pages live.
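The six axes can be recorded as a small profile, which makes it easy to see at a glance where a task sits. This is a sketch only: the `TaskEnvironment` interface, its field names and the `hardAxes` helper are hypothetical, and counting hard axes is a rough indicator, not a real difficulty metric.

```typescript
interface TaskEnvironment {
  observability: "full" | "partial";
  outcomes: "deterministic" | "stochastic";
  structure: "episodic" | "sequential";
  dynamics: "static" | "dynamic";
  values: "discrete" | "continuous";
  agents: "single" | "multi";
}

// List the axes on which an environment takes the harder of the two values.
function hardAxes(env: TaskEnvironment): string[] {
  const hard: string[] = [];
  if (env.observability === "partial") hard.push("partially observable");
  if (env.outcomes === "stochastic") hard.push("stochastic");
  if (env.structure === "sequential") hard.push("sequential");
  if (env.dynamics === "dynamic") hard.push("dynamic");
  if (env.values === "continuous") hard.push("continuous");
  if (env.agents === "multi") hard.push("multi-agent");
  return hard;
}

// The hardest corner: city taxi driving.
const taxiDriving: TaskEnvironment = {
  observability: "partial",
  outcomes: "stochastic",
  structure: "sequential",
  dynamics: "dynamic",
  values: "continuous",
  agents: "multi",
};

// The easiest corner: the benign value on every axis.
const easiestCorner: TaskEnvironment = {
  observability: "full",
  outcomes: "deterministic",
  structure: "episodic",
  dynamics: "static",
  values: "discrete",
  agents: "single",
};

console.log(hardAxes(taxiDriving).length, hardAxes(easiestCorner).length); // prints "6 0"
```

Because each field is a two-valued literal type, the compiler rejects a profile that leaves an axis out, which mirrors the advice to classify a task along every axis before choosing a technique.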

In theory the agent function is just a table: percept sequence in, action out. Why bother with clever programs? Because the table is astronomically large. For automated driving a single camera alone delivers on the order of tens of megabytes per second; the number of distinct hour-long percept sequences dwarfs the number of atoms in the observable universe. No storage could hold that table, and no designer could fill it in. The entire discipline exists to replace an impossible table with a feasible program (by search, by logical rules, by learned parameters) that computes the right action on demand instead of looking it up.
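The size of the table can be estimated with back-of-the-envelope arithmetic. The camera parameters below (1080 x 720 pixels, 24 bits of colour, 30 frames per second) are illustrative assumptions, as is the commonly quoted rough figure of 10^80 atoms in the observable universe. Since the count itself is far too large to store in a number, the sketch works with its base-10 logarithm.

```typescript
// Assumed camera parameters, for illustration only.
const frameWidth = 1080;
const frameHeight = 720;
const bitsPerPixel = 24;
const framesPerSecond = 30;
const secondsPerHour = 3600;

const bitsPerFrame = frameWidth * frameHeight * bitsPerPixel;
const megabytesPerSecond = ((bitsPerFrame / 8) * framesPerSecond) / 1e6;
const bitsPerHour = bitsPerFrame * framesPerSecond * secondsPerHour;

// Every distinct bit pattern is a distinct one-hour percept sequence: 2^bitsPerHour of them.
// log10(2^n) = n * log10(2), which stays representable even though 2^n does not.
const log10Sequences = bitsPerHour * (Math.log(2) / Math.LN10);

// Rough, commonly quoted estimate for atoms in the observable universe: 10^80.
const log10Atoms = 80;

console.log(`${megabytesPerSecond.toFixed(1)} MB/s`); // prints "70.0 MB/s"
console.log(`about 10^(${log10Sequences.toExponential(2)}) one-hour percept sequences`);
// prints "about 10^(6.07e+11) one-hour percept sequences"
console.log(log10Sequences > log10Atoms); // prints true
```

Under these assumptions the exponent alone is around 600 billion, against an exponent of 80 for the atom count, and that is for one camera and sequences of exactly one hour. A table covering shorter sequences and more sensors would be larger still.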