Assembly Language and Instruction Sets

You already know that the CPU marches through a program one instruction at a time. But what is one of those instructions, really? Deep down, the CPU only understands machine code: raw binary patterns like 0001 0100 0000 1001. Each pattern is an order: "add", "load this", "jump there". That's genuinely all the processor can execute.

The trouble is that human beings are hopeless at reading binary. Writing a whole program as strings of 1s and 0s would be slow, error-prone and impossible to debug. So very early in computing, people invented a thin, human-friendly skin over machine code: they gave every binary instruction a short, memorable name. That skin is assembly language.

Assembly is a low-level language — "low" meaning close to the hardware, not "simple". Instead of a binary pattern you write a short word called a mnemonic (pronounced "nem-on-ic" — the first m is silent), a memory aid such as LOAD, ADD, STORE or BRANCH. Crucially, each mnemonic maps almost one-to-one onto a single machine-code instruction. One line of assembly becomes (roughly) one instruction the CPU runs — a completely different relationship from a high-level language, where one line can become dozens.

From binary, to mnemonic, to English

Here is the same single instruction seen at three levels. Read across: they all mean exactly the same thing to the machine, but each is easier for a person to read than the one before it.

Machine code (what the CPU runs)	Assembly (the mnemonic)	Plain English
`0001 0100`	`LOAD 4`	Copy the number in memory box 4 into the CPU.
`0010 0101`	`ADD 5`	Add the number in memory box 5 to it.
`0011 0110`	`STORE 6`	Write the result back into memory box 6.

Notice how each machine-code byte splits into two halves. The first part (the opcode, or "operation code") says what to do — 0001 means "load", 0010 means "add". The second part (the operand) says what to do it to — usually a memory address or a value. The mnemonic ADD 5 is just a readable stand-in for that pair: opcode "add", operand "5".
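To make that split concrete, here is a minimal TypeScript sketch. The function name `decode` is made up for illustration, and the layout (top 4 bits = opcode, bottom 4 bits = operand) is simply the toy encoding from the table above; real CPUs use wider and more varied formats.

```typescript
// Split one 8-bit instruction into its two 4-bit halves.
// Assumes the toy layout above: high 4 bits = opcode, low 4 bits = operand.
function decode(instruction: number): { opcode: number; operand: number } {
  const opcode = (instruction >> 4) & 0b1111; // shift the top half down, keep 4 bits
  const operand = instruction & 0b1111;       // keep only the bottom 4 bits
  return { opcode, operand };
}

const add5 = decode(0b00100101); // the byte 0010 0101 from the table (ADD 5)
console.log(add5.opcode);  // 2, which is 0010 in binary: "add"
console.log(add5.operand); // 5
```

The shift and the mask are the whole trick: once the two halves are separated, the opcode picks the operation and the operand says where to apply it.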

The instruction set: a CPU's whole vocabulary

A CPU cannot do "anything" — it can only do the specific handful of operations its designers built into it. That complete list of every operation a particular processor understands is called its instruction set (or instruction set architecture, ISA). It is the CPU's entire vocabulary: if an instruction isn't in the set, the chip simply has no way to obey it.

A typical teaching instruction set is tiny — often only a dozen or so operations — yet that is enough to compute anything computable. Here is a representative slice:

Mnemonic	What the CPU does
`LOAD addr`	Copy the value at memory address `addr` into the accumulator (a working register inside the CPU).
`STORE addr`	Copy the accumulator's value out to memory address `addr`.
`ADD addr`	Add the value at `addr` to the accumulator.
`SUB addr`	Subtract the value at `addr` from the accumulator.
`INPUT`	Read a number from the input and put it in the accumulator.
`OUTPUT`	Send the accumulator's value to the output.
`BRANCH addr`	Jump: make the next instruction the one at `addr` (an unconditional jump).
`BRZ addr`	Branch only if the accumulator is zero (a conditional jump — this is how loops and IFs are built).
`HALT`	Stop the program.

Every program that CPU will ever run — an operating system, a game, a web browser — is built entirely out of instructions drawn from this list. The genius of computing is how much you can do with so few, humble verbs.
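As a hedged illustration of how a conditional jump builds a loop, the sketch below simulates a countdown using only mnemonics from the table. The names `Instruction` and `runCountdown` are hypothetical, and the simulator keeps the program and the data in separate places for simplicity, whereas a real LMC stores both in one memory.

```typescript
// A hypothetical mini-interpreter for a handful of the mnemonics above.
// Simplification: branch addresses index the program, data addresses index `memory`.
type Instruction = { op: "LOAD" | "SUB" | "OUTPUT" | "BRZ" | "BRANCH" | "HALT"; addr?: number };

function runCountdown(program: Instruction[], memory: Record<number, number>): number[] {
  const output: number[] = [];
  let accumulator = 0;
  let pc = 0; // program counter: position of the next instruction

  while (pc < program.length) {
    const { op, addr = 0 } = program[pc];
    pc += 1; // by default, carry on to the next line
    if (op === "LOAD") accumulator = memory[addr];
    else if (op === "SUB") accumulator -= memory[addr];
    else if (op === "OUTPUT") output.push(accumulator);
    else if (op === "BRZ") { if (accumulator === 0) pc = addr; } // jump only on zero
    else if (op === "BRANCH") pc = addr;                         // always jump
    else break;                                                  // HALT
  }
  return output;
}

const countdown: Instruction[] = [
  { op: "LOAD", addr: 10 },  // 0: accumulator = start value
  { op: "OUTPUT" },          // 1: print the current count
  { op: "BRZ", addr: 5 },    // 2: reached zero? leave the loop
  { op: "SUB", addr: 11 },   // 3: subtract one
  { op: "BRANCH", addr: 1 }, // 4: go round again
  { op: "HALT" },            // 5: stop
];

// Box 10 holds the start value 3, box 11 holds the constant 1.
console.log(runCountdown(countdown, { 10: 3, 11: 1 }).join(" ")); // 3 2 1 0
```

BRANCH sends control back up the listing and BRZ provides the way out, so two jumps are all the loop needs.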

A whole program: add two numbers

Let's write a real (if tiny) assembly program in the style of the Little Man Computer (LMC) — a simple model CPU you may meet in class. The job: read two numbers, add them, print the answer. Read it top to bottom; the CPU runs the lines in order unless an instruction tells it to jump. This is a non-runnable listing — no browser can execute assembly, because assembly is specific to one make of CPU:

INPUT      // read the first number into the accumulator
STORE 20   // save it in memory box 20 (call it "first")
INPUT      // read the second number into the accumulator
STORE 21   // save it in memory box 21 (call it "second")
LOAD 20    // copy "first" back into the accumulator
ADD 21     // add "second" to it -> accumulator now holds the sum
OUTPUT     // print the sum
HALT       // stop

Trace it with the inputs 7 and 3:

INPUT → accumulator = 7. STORE 20 → box 20 holds 7.
INPUT → accumulator = 3. STORE 21 → box 21 holds 3.
LOAD 20 → accumulator = 7 again (we'd overwritten it with 3).
ADD 21 → accumulator = 7 + 3 = 10.
OUTPUT → prints 10. HALT → done.

Eight instructions, and three of them (two STOREs and a LOAD) do nothing but shuffle values between the CPU and memory. Notice we had to STORE the first number and later LOAD it back, because the very next INPUT would otherwise clobber it. The CPU only holds one working value at a time, so the program has to be careful about where everything lives. That constant loading and storing is the flavour of all assembly programming.
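The assembly listing itself cannot run here, but the trace can be checked with a small simulation. What follows is a minimal sketch in TypeScript, not a real LMC implementation: the names `Step`, `run` and `addTwoNumbers` are invented for this example, and the machine understands only the six mnemonics the program uses.

```typescript
// A minimal sketch of an accumulator machine, just big enough for the listing above.
type Step = { op: "INPUT" | "STORE" | "LOAD" | "ADD" | "OUTPUT" | "HALT"; addr?: number };

const addTwoNumbers: Step[] = [
  { op: "INPUT" },
  { op: "STORE", addr: 20 },
  { op: "INPUT" },
  { op: "STORE", addr: 21 },
  { op: "LOAD", addr: 20 },
  { op: "ADD", addr: 21 },
  { op: "OUTPUT" },
  { op: "HALT" },
];

function run(program: Step[], inputs: number[]): number[] {
  const memory: Record<number, number> = {}; // memory boxes, keyed by address
  const pending = inputs.slice();            // copy, so the caller's array is left alone
  const outputs: number[] = [];
  let accumulator = 0;                       // the CPU's single working value

  for (const { op, addr = 0 } of program) {
    if (op === "HALT") break;
    if (op === "INPUT") accumulator = pending.shift() ?? 0; // 0 if the inputs run out
    else if (op === "STORE") memory[addr] = accumulator;
    else if (op === "LOAD") accumulator = memory[addr] ?? 0;
    else if (op === "ADD") accumulator += memory[addr] ?? 0;
    else outputs.push(accumulator);                         // OUTPUT
    console.log(`${op} -> accumulator = ${accumulator}`);   // one trace line per step
  }
  return outputs;
}

const results = run(addTwoNumbers, [7, 3]);
console.log(results[0]); // 10
```

Because this program never jumps, a plain top-to-bottom loop is enough; a version with BRANCH or BRZ would need a program counter instead.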

The same idea, one line of high-level code

Now watch what a high-level language (like TypeScript, Python or Java) does with the exact same task. The whole add-and-show becomes essentially one readable line:

const first = 7;
const second = 3;
console.log(first + second); // add and show: one line does what 8 assembly instructions did

The high-level version doesn't mention memory boxes, accumulators, loading or storing at all. You just say first + second and it happens. That is the entire trade-off between the two kinds of language:

High-level — easy for humans, reads like maths/English, works on many different CPUs, but you give up fine control and it must be translated before it can run.
Assembly (low-level) — verbose and fiddly, but you control the machine precisely, instruction by instruction, and it maps almost directly to what the CPU actually does.

So when would anyone still choose assembly today? When you need every last drop of speed or the smallest possible program — device drivers, the innermost loop of a games console, the software on a tiny microcontroller in a washing machine — or when you must touch a piece of hardware directly. For almost everything else, the convenience of a high-level language wins.

Who turns assembly into binary? The assembler

The CPU still only understands binary, so LOAD 4 has to become 0001 0100 (in the toy encoding from the table earlier) before it can run. The program that does that translation is called an assembler. Because assembly is almost one-to-one with machine code, the assembler's job is mostly a straight look-up (swap each mnemonic for its opcode, work out the addresses), which is why assembly is so much simpler to translate than a high-level language (whose translator, a compiler, may turn one line into many instructions).
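The look-up idea can be sketched in a few lines of TypeScript. This is a toy under stated assumptions: `OPCODES` and `assembleLine` are illustrative names, the three opcodes are the ones from the table earlier in this section, and operands are limited to 4 bits to match that 8-bit format. A real assembler also handles labels, data and much larger instruction sets.

```typescript
// Opcodes copied from the three-row table earlier in this section.
// A real instruction set would have an entry for every mnemonic.
const OPCODES: Record<string, string> = {
  LOAD: "0001",
  ADD: "0010",
  STORE: "0011",
};

// Translate one line such as "LOAD 4" into "0001 0100".
function assembleLine(line: string): string {
  const parts = line.trim().split(/\s+/);
  const mnemonic = parts[0];
  const opcode = OPCODES[mnemonic];
  if (opcode === undefined) throw new Error(`Unknown mnemonic: ${mnemonic}`);

  const operand = Number(parts[1]);
  // Rejects missing operands (NaN), fractions, and anything outside 0-15.
  if (operand !== Math.floor(operand) || operand < 0 || operand > 15) {
    throw new Error(`Operand must fit in 4 bits (0-15): ${parts[1]}`);
  }
  const operandBits = ("0000" + operand.toString(2)).slice(-4); // pad to 4 bits
  return `${opcode} ${operandBits}`;
}

for (const line of ["LOAD 4", "ADD 5", "STORE 6"]) {
  console.log(`${line} -> ${assembleLine(line)}`);
}
// LOAD 4 -> 0001 0100
// ADD 5 -> 0010 0101
// STORE 6 -> 0011 0110
```

Almost all of the work is the table look-up plus a number-to-binary conversion, which is the sense in which assembling is simpler than compiling.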

Machine code — the binary instructions the CPU actually executes.
Assembly language — human-readable mnemonics with a near one-to-one correspondence to those machine-code instructions.
Instruction set — the complete list of operations one CPU understands; it defines which mnemonics (and which binary opcodes) even exist for that processor.

And the assembler is the tool that translates assembly language into machine code.

Instruction sets come in two broad styles. A CISC design (Complex Instruction Set Computer, e.g. the classic x86 chips in many PCs) offers a large set of rich, sometimes elaborate instructions — a single instruction might do quite a lot of work. A RISC design (Reduced Instruction Set Computer, e.g. the ARM chips in nearly every phone) deliberately keeps the set small and simple, so each instruction is tiny and fast, and the chip can be simpler and more power-efficient — you just use more of them. Neither is "better" everywhere: RISC's simplicity and low power won it the mobile world, while CISC's rich instructions long dominated desktops. It's a genuine engineering trade-off, not a right answer.

Each assembly instruction does one tiny step — nothing more. A single instruction can only load, or store, or perform one operation like a single add. There is no assembly instruction for "add these two numbers and print the answer" — that everyday high-level statement becomes several separate assembly instructions (load, add, output, and usually some stores in between), as our eight-line program showed. If a question asks you to hand-write assembly, expect one small step per line.

And assembly is not portable. Because the mnemonics are just names for a particular CPU family's binary instruction set, an ARM program will not run on an x86 chip, and vice versa — their instruction sets are different. This is the opposite of high-level code, which you can usually re-translate for any CPU. Low-level means tied to the hardware.