This chapter gives some more detail on the Central Processing Unit (CPU) and leads up to where we can write significant programs in assembly/machine code. First we will give an overview of how a processor and memory function together to execute a single machine instruction - the famous fetch-decode-execute cycle.
A CPU consists of three major parts:
A fourth part, main memory, is never far from the CPU but from a logical point of view is best kept separate.
We will pay most attention to the data path part of the processor, and what must happen in it to cause useful things to happen - to cause program instructions to be executed.
In the system we describe, the control part is implemented by a microprogram; that is, the fetching, decoding and execution of each machine instruction are carried out by executing a set of sequencing steps called a microprogram. Note on terminology: the term microprogram was devised in the early 1950s (see Ferry, 2003), long before microprocessors were ever dreamt of.
Figure 6.1 shows the data path part of our hypothetical CPU from (Tanenbaum, 1990), page 170 onwards. Here, we briefly describe the components of Figure 6.1. Then we give a qualitative discussion of how it executes program instructions. Finally we describe the execution of instructions in some detail.
There are 16 identical 16-bit registers. However, they are not general purpose; each has a special use:
For now, we can ignore all the others, though we give brief descriptions below.
The A and B latches hold stable versions of the values on the A and B buses. Without them there would be problems if, for example, AC were connected straight into the A input of the ALU while the output of the ALU was connected back to AC: it would be unclear which version of AC to use, because the value would be continuously changing.
The ALU input A can be fed with either: (i) the contents of the A latch; or (ii) the contents of MBR, i.e. what was originally the contents of a memory location.
In Mac-1a the ALU may perform just one of four functions:
Any other functions have to be programmed.
The shifter is not a register - it passes the ALU output straight through: shifted left, shifted right or not shifted.
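The ALU and shifter can be modelled in a few lines. This is only a sketch under assumptions: the four function codes (0 = A + B, 1 = A AND B, 2 = A, 3 = NOT A) follow the Mic-1 ALU in (Tanenbaum, 1990) and are assumed to carry over to Mac-1a; the one-bit, logical shifts and the names `alu` and `shifter` are likewise illustrative.

```python
MASK = 0xFFFF  # the data path is 16 bits wide


def alu(f, a, b):
    """Return (result, N, Z) for a 2-bit function code f (assumed Mic-1 coding)."""
    if f == 0:
        result = (a + b) & MASK   # A + B, carry out of bit 15 is dropped
    elif f == 1:
        result = a & b            # A AND B
    elif f == 2:
        result = a & MASK         # A passed straight through
    elif f == 3:
        result = ~a & MASK        # NOT A
    else:
        raise ValueError("function code must be 0..3")
    n = (result >> 15) & 1        # N: result is negative (top bit set)
    z = 1 if result == 0 else 0   # Z: result is zero
    return result, n, z


def shifter(mode, value):
    """Pass the ALU output through: shifted left, shifted right or unchanged."""
    if mode == "left":
        return (value << 1) & MASK
    if mode == "right":
        return value >> 1         # logical shift assumed
    if mode == "none":
        return value
    raise ValueError("mode must be 'left', 'right' or 'none'")
```

Anything else, subtraction for example, has to be built from these: A - B can be formed as A + (NOT B) + 1.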
The MAR (memory address register) is a register which is used as a gateway - a `buffer' - onto the address bus. Likewise the MBR (memory buffer register; it might be better to call this the memory data register) is the gateway onto the data bus.
The memory is considered to be a collection of cells or locations, each of which can be addressed individually, and thus written to or read from. Effectively, memory is like an array in C, Basic or any other high-level language. For brevity, we shall refer to this memory `array' as M and the address of a general cell as x, and so the contents of the cell at address x as M[x], or m[x].
To read from a memory cell, the controller must cause the following to happen:
To write to a memory cell, the controller must cause something similar to happen:
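As a rough sketch of both operations, the class below routes every access through MAR and MBR. The exact sequence of control signals is an assumption (address into MAR, read or write requested, data passing through MBR), and the class and method names are hypothetical.

```python
class MemoryPort:
    """Memory reached only through MAR (address) and MBR (data)."""

    def __init__(self, size=4096):
        self.cells = [0] * size   # the memory 'array'
        self.mar = 0              # memory address register
        self.mbr = 0              # memory buffer (data) register

    def read(self, address):
        self.mar = address                 # 1. place the address in MAR
        self.mbr = self.cells[self.mar]    # 2. request a read; data arrives in MBR
        return self.mbr                    # 3. the CPU takes the value from MBR

    def write(self, address, value):
        self.mar = address                 # 1. place the address in MAR
        self.mbr = value & 0xFFFF          # 2. place the 16-bit data in MBR
        self.cells[self.mar] = self.mbr    # 3. request a write; memory stores MBR


port = MemoryPort()
port.write(0x102, 55)
value = port.read(0x102)   # value is 55; MAR holds 0x102, MBR holds 55
```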
It is a feature of all general purpose computers that executable instructions and data occupy the same memory space. Often, programs are organised so that there are blocks of instructions and blocks of data. But, there is no fundamental reason, except tidiness and efficiency, why instructions and data cannot be mixed up together.
To describe the details of operation of the CPU, we use a simple language called Register Transfer Language (RTL). The notation is as follows.
M[x] denotes contents of location x; sometimes m[x], or even just [x]. Think of an envelope with £100 in it, and your address on it.
Reg denotes a register; Reg = PC, IR, AC, R1 or R2.
M[M[x]] denotes contents of the address contained in location x. Think of an envelope containing another envelope.
We use <- to denote transfer: A <- B. Pronounce this as `A gets B'. In the case of A <- M[x], we say `A gets contents of x'.
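One way to read this notation is as ordinary array indexing. The sketch below uses a Python list named M for memory and a variable x for an address; these names and the cell values are chosen purely for illustration.

```python
M = [0] * 16      # a tiny memory of 16 cells

M[5] = 9          # location 5 contains an address: 9
M[9] = 100        # location 9 contains the value 100 (the envelope with 100 in it)

x = 5
direct = M[x]         # contents of location x
indirect = M[M[x]]    # contents of the address contained in location x
                      # (the envelope inside the envelope)

AC = M[9]             # a transfer: 'AC gets contents of 9'
```

Here `direct` is 9, `indirect` is 100, and AC ends up holding 100.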
Back in section 2.7, we produced a simple model of a computer. Here we show it again, Figure 6.2.
At the end of section 2.7 we admitted that we had been telling only half the truth! And we admitted that we had to fit the program into memory as well. Fine, here goes. We're going to use the same program.
In this more realistic model, the person operating the CPU has no list of instructions available on the desk, but must read one instruction at a time from memory.
Recall what was needed: add the contents of memory cell 0 to the contents of memory cell 1, store the result in cell 2; if the result is greater-than-or-equal-to 40, put 1 in cell 3, otherwise put 0 in cell 3. (We are adding marks, and cell 3 contains an indicator of Pass (1) or Fail (0).)
And the program, with appropriate numerical codes (so that the instructions can be stored in memory). Each numerically coded instruction is given as four hexadecimal digits; the first digit gives the operation required (load, add, store, ...) - the opcode; the last three digits give the address or data - the operand.
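The split into opcode and operand is a matter of shifting and masking. A minimal sketch follows; the function names are hypothetical and the example word 0x2101 is made up for illustration.

```python
def split_instruction(word):
    """Split a 16-bit instruction into (opcode, operand)."""
    opcode = (word >> 12) & 0xF    # first hexadecimal digit
    operand = word & 0x0FFF        # last three hexadecimal digits
    return opcode, operand


def join_instruction(opcode, operand):
    """Pack an opcode and a 12-bit operand back into one 16-bit word."""
    return ((opcode & 0xF) << 12) | (operand & 0x0FFF)


opcode, operand = split_instruction(0x2101)   # opcode 0x2, operand 0x101
```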
The opcodes are as follows:
I have to renumber the program steps from P1-P14 to P101 ..., for reasons which will soon become evident. Also, we will use hexadecimal numbering.
We now have to revise Figure 6.2 to show the program, Figure 6.4. The revisions are as follows:
In this revised model, the CPU operator has no list of instructions on his/her desk (the CPU); he/she must go through the following cycle of steps for each instruction step:
If the operation is a jump, then all the operator does is take the operand (the jump-to address) and place it in the PC - thus stopping the PC pointing to the next instruction in sequence.
There we have it. The famous fetch-decode-execute cycle. The CPU is a pretty busy place!
How do the CPU and its controller execute a sequence of instructions? Let us start by considering the execution of the instruction at location 0x100; what follows is an endless loop of the so-called fetch-decode-execute cycle.
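One pass of that loop can be sketched as a single function. The opcode numbering is an assumption (it follows the ordering of Tanenbaum's Mac-1: LODD = 0, STOD = 1, ADDD = 2, SUBD = 3, JPOS = 4, JZER = 5, JUMP = 6, LOCO = 7), as are the function name `step` and the dictionary of registers; treat it as an illustration rather than a definition of the machine.

```python
# Assumed opcode numbers, in Tanenbaum's Mac-1 order.
LODD, STOD, ADDD, SUBD, JPOS, JZER, JUMP, LOCO = range(8)


def step(M, cpu):
    """Run one fetch-decode-execute cycle on memory M and registers cpu."""
    # Fetch: IR gets M[PC]; PC then points at the next instruction.
    cpu["IR"] = M[cpu["PC"]]
    cpu["PC"] = (cpu["PC"] + 1) & 0x0FFF
    # Decode: first hex digit is the opcode, last three are the operand.
    opcode = cpu["IR"] >> 12
    operand = cpu["IR"] & 0x0FFF
    # Execute.
    if opcode == LODD:
        cpu["AC"] = M[operand]
    elif opcode == STOD:
        M[operand] = cpu["AC"]
    elif opcode == ADDD:
        cpu["AC"] = (cpu["AC"] + M[operand]) & 0xFFFF
    elif opcode == SUBD:
        cpu["AC"] = (cpu["AC"] - M[operand]) & 0xFFFF
    elif opcode == JPOS:
        if (cpu["AC"] & 0x8000) == 0:   # AC >= 0: overwrite PC
            cpu["PC"] = operand
    elif opcode == JZER:
        if cpu["AC"] == 0:
            cpu["PC"] = operand
    elif opcode == JUMP:
        cpu["PC"] = operand
    elif opcode == LOCO:
        cpu["AC"] = operand             # immediate: the operand is the data
    else:
        raise ValueError("opcode not modelled in this sketch")


# Add cell 0 to cell 1 and store the result in cell 2; program at 0x100.
M = [0] * 4096
M[0], M[1] = 25, 30
M[0x100:0x103] = [0x0000, 0x2001, 0x1002]   # LODD 0; ADDD 1; STOD 2
cpu = {"PC": 0x100, "IR": 0, "AC": 0}
for _ in range(3):
    step(M, cpu)
```

After the three cycles, cell 2 holds 55 and PC has moved on to 0x103. Note that a jump does nothing more than overwrite PC after the fetch has already incremented it.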
We now examine the instruction set, by which assembly programmers can program the machine. We will call the machine Mac-1a; Mac-1a is a restricted version of Tanenbaum's Mac-1. The main characteristics of Mac-1a are: data word length 16 bits; address size 12 bits.
Exercise. What is the maximum number of words we can have in the main memory of Mac-1a? (neglect memory mapped input-output). How many bytes?
There are two addressing modes: immediate and direct; we will neglect Tanenbaum's local and indirect modes for the time being.
It is accumulator based: that is, everything is done through AC. Thus `Add' is done as follows: put operand 1 in AC; add the contents of a memory location to AC; the result is left in AC. If necessary, i.e. we want to retain the result, the contents of AC are then copied to memory.
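The two addressing modes and the accumulator style can be shown side by side. In this sketch the helper `fetch_operand`, the addresses and the values are all hypothetical; AC is just a Python variable standing in for the accumulator.

```python
def fetch_operand(M, mode, operand):
    """Return the data an instruction works on, given its addressing mode."""
    if mode == "immediate":
        return operand          # the operand field is itself the data
    if mode == "direct":
        return M[operand]       # the operand field is the address of the data
    raise ValueError("mode must be 'immediate' or 'direct'")


M = [0] * 4096
M[0x201], M[0x202] = 25, 30

# Accumulator-style add: everything passes through AC.
AC = fetch_operand(M, "direct", 0x201)                    # operand 1 into AC
AC = (AC + fetch_operand(M, "direct", 0x202)) & 0xFFFF    # add; result stays in AC
M[0x200] = AC                                             # copy AC out to keep it

pass_mark = fetch_operand(M, "immediate", 0x028)          # the constant 40
```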
The Mac-1a programmer has no access to the PC or other CPU registers. Also, for present purposes, assume that SP does not exist.
A limited version of the Mac-1 instruction set is shown in Figure 6.5. The columns are as follows:
Control of the CPU - fetch, decode, execute - is done by a microcontroller which obeys a program of microinstructions. We might think of the microcontroller as a black-box such as that shown in Figure 6.6. The microcontroller has a set of inputs and a set of outputs - just like any other circuit, ALU, multiplexer, etc. Therefore, instead of microprogramming, it can be made from logic hardware.
To design the circuit, all you have to do is prepare a truth-table (6 input columns: the 4-bit op-code plus N and Z; 22 output columns), and generate the logic.
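To make the idea concrete, here is a sketch of just one output column of such a truth table: a hypothetical `load PC from operand' signal. The opcode values (JPOS = 0100, JZER = 0101, JUMP = 0110) are assumed to follow Tanenbaum's Mac-1; the other 21 outputs are not modelled.

```python
from itertools import product

# Assumed Mac-1 opcode values for the three jump instructions.
JPOS, JZER, JUMP = 0b0100, 0b0101, 0b0110


def load_pc(opcode, n, z):
    """One output column: 1 if PC must be loaded with the operand."""
    if opcode == JUMP:
        return 1            # unconditional jump
    if opcode == JZER:
        return z            # jump if the accumulator is zero
    if opcode == JPOS:
        return 1 - n        # jump if the accumulator is not negative
    return 0                # every other instruction leaves PC alone


# 4 opcode bits plus N and Z give 2**6 = 64 rows.
truth_table = {
    (opcode, n, z): load_pc(opcode, n, z)
    for opcode, n, z in product(range(16), (0, 1), (0, 1))
}
```

The table has 64 rows, and logic can be generated from each output column in the usual way. Because the output depends only on the current inputs, it is purely combinational.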
There is no reason why this hardware circuit could not decode an instruction in ONE clock period, i.e. a lot faster than the microcode solution.
The microprogrammed solution allows arbitrarily complex instructions to be built-up. It may also be more flexible, for example, there were many machines that users could microprogram themselves; and, there were computers which differed only by their microcode, perhaps one optimised for execution of C programs, another for COBOL programs.
On the other hand, if implemented on a chip, control store takes up a lot of chip space. And, as you can see by examining (Tanenbaum, 1990), microcode interpretation may be relatively slow - and gets slower, the more instructions there are.
Figure 6.7 shows the full Mac-1 CPU with its microcontrol unit.
Machines with large sets of complex (and perhaps slow) instructions (implemented with microcode), are called CISC - complex instruction set computer.
Those with small sets of relatively simple instructions, probably implemented in logic, are called RISC - reduced instruction set computer.
Most early machines - before about 1965 - were RISC. Then the fashion switched to CISC. Now the fashion is switching back to RISC, albeit with some special go-faster features that were not present on early RISC.
CISC machines are easier to program in machine and assembly code (see next chapter), because they have a richer set of instructions. But, nowadays, fewer and fewer programmers use assembly code, and compilers are becoming better. It comes down to a trade-off: complexity of `silicon' (microcode and CISC) or complexity of software (highly efficient optimising compilers and RISC).
a0 = a1 + a2:
    lodd a1
    addd a2
    stod a0
Taking into account the fetch-execute cycle, and that there is a controller
which also uses MAR and MBR, and assuming that the program starts at 100Hex
(lodd a1 is there), and that a0, a1, a2 are at 100Hex, 101Hex,
and 102Hex, respectively, describe precisely, and in order, all the data
travel along the bus, to and from memory. Distinguish addresses and data.
These are also recall type questions that appear as parts of examination questions.
(a) the purpose of the ALU; (b) the role of the MAR and MBR in CPU-memory interaction; (c) in addition to address and data, what further information needs to be transferred via the `system bus' between CPU and memory; (d) the AC register; (e) the PC register; (f) the AMUX (A-multiplexer); (g) the A and B latches; (h) the F0F1 bits input to the ALU; (i) the N,Z bits output from the ALU.
(a) Is it possible, in one step, to transfer the output of the ALU/Shifter to both the A and B registers?
(b) Is it possible, in one step, to transfer the contents of both the AC and PC registers to the A Latch?