next up previous contents
Next: 7. Assembly Language Programming Up: Lecture Notes on Computer Previous: 5. The Components of

Subsections


6. The Central Processing Unit (CPU)

6.1 Introduction

This chapter gives some more detail on the Central Processing Unit (CPU) and leads up to where we can write significant programs in assembly/machine code. First we will give an overview of how a processor and memory function together to execute a single machine instruction - the famous fetch-decode-execute cycle.

A CPU consists of three major parts:

  1. The internal registers, the ALU and the connecting buses - sometimes called the data path;

  2. The input-output interface, which is the gateway through which data are sent and received from main memory and input-output devices;

  3. The control part, which directs the activities of the data path and input-output interface, e.g. opening and closing access to buses, selecting ALU function, etc. We will avoid going into much detail about the control.

A fourth part, main memory, is never far from the CPU but from a logical point of view is best kept separate.

We will pay most attention to the data path part of the processor, and what must happen in it to cause useful things to happen - to cause program instructions to be executed.

In the system we describe, the control part is implemented by microprogram, i.e. how the fetching, decoding and execution of a machine instruction can be implemented by execution of a set of sequencing steps called a microprogram. Note on terminology: the term microprogram was devised in the early 1950s (see (Ferry, 2003)), long before microprocessors were ever dreamt of.

6.2 The Architecture of Mic-1

Figure 6.1 shows the data path part of our hypothetical CPU from (Tanenbaum, 1990), page 170 onwards. Here, we briefly describe the components of Figure 6.1. Then we give a qualitative discussion of how it executes program instructions. Finally we describe the execution of instructions in some detail.

Figure 6.1: Mic-1 CPU (from Tanenbaum, Structured Computer Organisation, 3rd ed.)
\begin{figure}\centerline{
\hbox{
\psfig{figure=f2.eps}
}}\end{figure}

6.2.1 Registers

There are 16 identical 16-bit registers. But, they are not general purpose, each has a special use:

PC, program counter
The PC points to memory location that holds the next instruction to be executed;

AC, accumulator
The accumulator is like the display register in a calculator; most operations use it implicitly as an unmentioned input, and the result of any operation is placed in it.

For now, we can ignore all the others, though we give brief descriptions below.

SP, stack-pointer
Used for maintaining a data area called the stack; the stack is used for remembering where we came from when we call subprograms; likewise for remembering data when an interrupt is being processed; it is also used as a communication medium for for passing data to subprograms; finally, it is used as storage area for local variables in subprograms;

IR, Instruction Register
Holds the instruction (the actual instruction data) currently being executed.

TIR, Temporary Instruction Register
Holds temporary versions of the instruction while it is being decoded;

0, +1, -1
Constants; it is handy to have copies of them close by - avoids wasting time accessing main memory.

AMASK
Another constant; used for masking (anding) the address part of the instruction; i.e. AMASK and IR address.

SMASK
ditto for stack (relative) addresses.

A, B, ...F
General purpose registers; but general purpose only for the microprogrammer, i.e. the assembly language cannot address them.

6.2.2 Internal Buses

There are three internal buses, A and B (source) buses and C (destination) bus.

6.2.3 External Buses

The address bus and the data bus. Minor point to note: many buses, in particular those in many of the Intel 80X86 family, use the same physical bus (connections) for both address and data; it's simple to do - the control part of the bus just has to make sure all users of the bus know when it's data, when it's address.

6.2.4 Latches

A and B latches hold stable versions of A and B buses. There would be problems if, for example, AC was connected straight into the A input of the ALU and, meanwhile, the output of the ALU was connected to AC, i.e.. what version of AC to use; the answer would be continuously changing.

6.2.5 A-Multiplexer (AMUX)

The ALU input A can be fed with either: (i) the contents of the A latch; or (ii) the contents of MBR, i.e. what was originally the contents of a memory location.

6.2.6 ALU

In Mac-1a the ALU may perform just one of four functions:

0
$ A + B$; note `plus', rather than or; $ (F_1, F_0) = (0, 0)$;

1
$ A\ \mathbf{and}\ B$; $ (F_1, F_0) = (0, 1)$;

2
$ A$ straight through, $ B$ ignored; $ (F_1, F_0) = (1, 0)$;

3
$ \mathbf{not}\ A$; $ (F_1, F_0) = (1, 1)$.

Any other functions have to be programmed.

6.2.7 Shifter

The shifter is not a register - it passes the ALU output straight through: shifted left, shifted right or not shifted.

6.2.8 Memory Address Register (MAR) and Memory Buffer Register (MBR) and Memory

The MAR is a register which is used as a gateway - a `buffer' - onto the address bus. Likewise the MBR (it might be better to call this memory data register) for the data bus.

The memory is considered to be a collection of cells or locations, each of which can be addressed individually, and thus written to or read from. Effectively, memory is like an array in C, Basic or any other high-level language. For brevity, we shall refer to this memory `array' as $ M$ and the address of a general cell as $ x$ and so, the contents of the cell at address $ x$ as $ M[x]$, or $ m[x]$.

To read from a memory cell, the controller must cause the following to happen:

  1. Put an address, $ x$, in MAR;

  2. Requests read - by asserting a read control line;

  3. At some time later, the contents of $ x$, $ M[x]$ appear in MBR, from where, the controller can cause it to be ...

  4. ...transferred to the AC or somewhere else.

To write to a memory cell, the controller must cause something similar to happen:

  1. Put an address, $ x$, in MAR;

  2. Put the data in MBR;

  3. Requests write - by asserting a write control line;

  4. At some time later, the data arrive in memory cell $ x$.

It is a feature of all general purpose computers that executable instructions and data occupy the same memory space. Often, programs are organised so that there are blocks of instructions and blocks of data. But, there is no fundamental reason, except tidiness and efficiency, why instructions and data cannot be mixed up together.

6.2.9 Register Transfer Language

To describe the details of operation of the CPU, we use a simple language called Register Transfer Language (RTL). The notation is as follows.

$ M[x]$ denotes contents of location $ x$; sometimes $ m[x]$, or even just $ [x]$. Think of an envelope with £100 in it, and your address on it.

Reg denotes a register; Reg = PC, IR, AC, R1 or R2.

$ [M[x]]$ denotes contents of the address contained in $ M[x]$. Think of an envelope containing another envelope.

We use $ \leftarrow$ to denote transfer: $ A \leftarrow B$. Pronounce this as `A gets B'. In the case of $ A \leftarrow M[x]$, we say `A gets contents of x'.

6.3 Simple Model of a Computer - Part 3

Back in section 2.7, we produced a simple model of a computer. Here we show it again, Figure 6.2.

Figure 6.2: Mechanical Computer
\begin{figure}\begin{verbatim}+-- 0 --+-- 1 --+-- 2 --+-- 3 --+-- 4 --+
\ver...
...ert
\vert 1 2 3 \vert
\vert 0 \vert
+------------+\end{verbatim}
\end{figure}

At the end of section 2.7 we admitted that we had been telling only half the truth! And we admitted that we had to fit the program into memory as well. Fine, here goes. Were going to use the same program.

In this more realistic model, the person operating the CPU has no list of instructions available on the desk, but must read one instruction at a time from memory.

Recall what was needed: add the contents of memory cell 0 to the contents of memory cell 1, store the result in cell 2; if the result is greater-than-or-equal-to 40, put 1 in cell 3, otherwise put 0 in cell 3. (We are adding marks, and cell 3 contains an indicator of Pass (1) or Fail (0).

And the program, with appropriate numerical code (so that instructions be stored in memory). The numerically coded instruction is given in four Hexadecimal digits; the first digit gives the operation required (load, add, store, ...) - the opcode; the last three digits give the address or data - the operand.

The opcodes are as follows:

Figure 6.3: Opcodes
\begin{figure}\begin{verbatim}------------------------------------------------...
...-----------------------------------------------------\end{verbatim}
\end{figure}

I have to renumber the program steps from P1-P14 to P101 ..., for reasons which will soon become evident. Also, we will use hexadecimal numbering.

P101.
Load contents of memory 0 into AC. Code: 0 000

P102.
Add contents of memory 1 to contents of AC. Code: 2 001

P103.
Store the contents of AC in memory 2. Code: 1 002

P104.
Load the constant 40 into the AC. Code: 7 028 (40dec is 28Hex)

P105.
Store the contents of AC in memory 4. Code: 1 004

P106.
Load the contents of memory 2 into AC. Code: 0 002

P107.
Subtract contents of memory 4 from contents of AC. Code: 3 004

P108.
If AC is positive, jump to instruction P10c. Code: 4 10c

P109.
Load the constant 0 into the AC. Code 7 000

P10a.
Store the contents of AC in memory 3. Code 1 003

P10b.
Jump to P10e. Code 6 10e

P10c.
Load the constant 1 into the AC. Code 7 001

P10d.
Store the contents of AC in memory 3. Code 1 003

P10e.
Stop.

We now have to revise Figure 6.2 to show the program, Figure 6.4. The revisions are as follows:

Figure 6.4: Mechanical Computer with Program
\begin{figure}\begin{verbatim}+-- 0 --+-- 1 --+-- 2 --+-- 3 --+-- 4 --+
\ver...
...ert
\vert 1 2 3 \vert
\vert 0 \vert
+------------+\end{verbatim}
\end{figure}

In this revised model, the CPU operator has no list of instructions on his/her desk (the CPU); he/she must go through the following cycle of steps for each instruction step:

Fetch
(a) Take the number in the PC; (b) place it in MAR; (c) shout "Bus"; (d) add one to the number in the PC - to make it point to the next step; (e) wait until a new number arrives in the MBR; (f) take the number in the MBR and put it in the Instruction Register;

Decode
(a) Take the number in IR; (b) Take the top digit (opcode), look it up in Figure 6.3, and see what has to be done; (c) take the number in the bottom three digits - this signifies the operand.

Execute
Perform the action required. E.g. Add contents of memory 1 to contents of AC (2 001). Opcode is 2, operand is 001. We've already done this: (a) write 1 on a piece of paper and place it in MAR; (b) put a tick against Read; (c) shout "Bus"; (d) some time later, the contents of cell 1 (33) will arrive in MBR; (e) look at what is in AC and in MBR, use the calculator to add them (22 + 33); (f) write down a copy of the result and put it in AC. Thus, in the case shown, a piece of paper with 55 on it would be put in to AC.

If the operation is a jump, then all the operator does is take the operand (the jump-to address) and place it in the PC - thus stopping the PC pointing to the next instruction in sequence.

There we have it. The famous fetch-decode-execute cycle. The CPU is a pretty busy place!

6.4 The Fetch-Decode-Execute Cycle

How does the CPU and its controller execute a sequence of instructions? Let us start by considering the execution the instruction at location 0x100; what follows is an endless loop of the so-called fetch-decode-execute cycle.

Fetch:
read the next instruction and put it in the Instruction Register. Point to the next instruction, ready for the next Fetch.

Decode:
figure out what the instruction means;

Execute:
do what is required that instruction; if it is a JUMP type instruction, then revise the pointing to the jumped-to instruction. Go back to Fetch.

6.5 Instruction Set

We now examine the instruction set, by which assembly programmers can program the machine. We will call the machine Mac-1a; Mac-1a is a restricted version of Tanenbaum's Mac-1. The main characteristics of Mac-1a are: data word length 16-bit; address size 12-bits.

Exercise. What is the maximum number of words we can have in the main memory of Mac-1a? (neglect memory mapped input-output). How many bytes?

There are two addressing modes : immediate and direct; we will neglect Tanenbaum's local and indirect for the meanwhile.

It is accumulator based: that is, everything is done through AC; thus, `Add' is done as follows: put operand 1 in AC, add to memory location, result is put in AC; if necessary, i.e. we want to retain the result, the contents of the AC is now copied to memory.

The Mac-1a programmer has no access to the PC or other CPU registers. Also, for present purposes, assume that SP does not exist.

A limited version of the Mac-1 instruction set is shown in Figure 6.5. The columns are as follows:

Binary code for instruction.
I.e. what the instruction looks like in computer memory, Machine code.

Mnemonic.
The name given to the instruction. Used when coding in assembly code.

Long name.
Descriptive name for instruction.

Action.
What the instruction actually does, described formally in register transfer language (RTL).

Figure 6.5: Mac-1a Instruction Set (limited version of Mac-1)
\begin{figure}\begin{verbatim}-----------------------------------------------...
...-----------------------------------------------------\end{verbatim}
\end{figure}

6.6 Microprogram Control versus Hardware Control

Control of the CPU - fetch, decode, execute - is done by a microcontroller which obeys a program of microinstructions. We might think of the microcontroller as a black-box such as that shown in Figure 6.6. The microcontroller has a set of inputs and a set of outputs - just like any other circuit, ALU, multiplexer, etc. Therefore, instead of microprogramming, it can be made from logic hardware.

Figure 6.6: Controller Black-box, either Microcontroller or Logic
\begin{figure}\begin{verbatim}Op-Code eg. JUMP
Inputs: N Z 0 1 1 0
6 \vert \v...
...\vert \vert \vert \vert \vert \vert \vert \vert\end{verbatim}
\par\end{figure}

To design the circuit, all you have to do is prepare a truth-table (6 input columns - op-code (4 bits) and N, Z, 22 output columns), and generate the logic.

There is no reason why this hardware circuit could not decode an instruction in ONE clock period, i.e. a lot faster than the microcode solution.

The microprogrammed solution allows arbitrarily complex instructions to be built-up. It may also be more flexible, for example, there were many machines that users could microprogram themselves; and, there were computers which differed only by their microcode, perhaps one optimised for execution of C programs, another for COBOL programs.

On the other hand, if implemented on a chip, control store takes up a lot of chip space. And, as you can see by examining (Tanenbaum, 1990), microcode interpretation may be relatively slow -- and gets slower, the more instructions there are.

Figure 6.7 shows the full Mac-1 CPU with its microcontrol unit.

Figure 6.7: Mac-1 CPU including control (from Tanenbaum, Structured Computer Organisation, 3rd ed.)
\begin{figure}\centerline{
\hbox{
\psfig{figure=f3.eps}
}}\end{figure}

6.7 CISC versus RISC

Machines with large sets of complex (and perhaps slow) instructions (implemented with microcode), are called CISC - complex instruction set computer.

Those with small sets of relatively simple instructions, probably implemented in logic are called RISC - reduced instruction set computer.

Most early machines - before about 1965 - were RISC. Then the fashion switched to CISC. Now the fashion is switching back to RISC, albeit with some special go-faster features that were not present on early RISC.

CISC machines are easier to program in machine and assembly code (see next chapter), because they have a richer set of instructions. But, nowadays, less and less programmers use assembly code, and compilers are becoming better. It comes down to a trade off, complexity of `silicon' (microcode and CISC) or complexity of software (highly efficient optimising compilers and RISC).

6.8 Exercises

  1. Consider the Mac-1a assembly code to do the equivalent of: a0 = a1 + a2:

       lodd a1
       addd a2
       stod a0
    

    Taking into account the fetch-execute cycle, and that there is a controller which also uses MAR and MBR, and assuming that the program starts at 100Hex (lodd a1 is there), and that a0, a1, a2 are at 100Hex, 101Hex, and 102Hex, respectively, describe precisely, and in order, all the data travel along the bus, to and from memory. Distinguish addresses and data.

  2. Which Mac-1a instructions make use of the N, Z condition flags in their execution (I say execution to distinguish it from decode).

  3. Some machines allow assembly programmers access to registers such as A, B, C ...in the Mac-1 CPU. Why might programs be speeded up by using these registers? For example, let us assume that there are instructions such as LODDA address, LODDB address, ...which cause registers a, B, etc. to be loaded instead of AC; also, corresponding ADDDA, ADDDB, which cause registers A, B, etc. to be added to AC.

  4. If access to main memory is a bottleneck (the `von Neumann bottleneck'), think of ways a alleviate the problem; hint: find a definition of (a) cache memory, (b) pipelining.

  5. Using assembly language, show how to clear the accumulator on Mac-1A.

  6. Many machines have a HALT instruction - which causes the machine to stop dead at the current PC address. Using assembly language, show how to halt Mac-1A, e.g. make it sit at PC = 100 effectively doing nothing.

  7. Many machines have a NOOP - no-operation, i.e. the instruction wastes a bit of time and no other effect. Using assembly language, show how to program the same effect as a NOOP.

  8. List and describe two major shortcomings of Mac-1A, i.e. the limited set of instructions in Figure 6.5.

  9. Make a prioritized list of instructions you think would improve Mac-1A. Explain and give a rationale in each case.

  10. Assuming that Mac-1a uses 16-bit twos complement to store signed integers, why is it impossible to LOCO 'load constant' minus 1, or indeed, any negative number. Suggest a method to circumvent this problem - which actually impacts positive numbers as well.

6.9 Self assessment questions

These are also recall type questions that appear as parts of examination questions.

  1. Figure 6.1 depicts a Central Processing Unit. Briefly describe:

    (a) the purpose of the ALU; (b) the role of the MAR and MBR in CPU-memory interaction; (c) in addition to address, data, what additional information needs to be transferred via the 'system bus' between CPU and Memory; (d) the AC register; (e) the PC register; (f) the AMUX (A-multiplexer); (g) the A and B latches; (h) the F0F1 bits input to the ALU; (i) the N,Z bits output from the ALU.

  2. Figure 6.1 depicts a Central Processing Unit.

    (a) Is it possible to write, in one step, to transfer the output of the ALU/Shifter to both the A and B registers?

    (b) Is it possible to write, in one step, to transfer the contents of both the AC and PC registers to the A Latch?

  3. In the context of Figure 6.1 describe the fetch-decode-execute cycle.

  4. In the context of Figure 6.1 and the instruction lodd 500 describe the fetch-decode-execute cycle.

  5. In the context of Figure 6.1 and the instruction stod 500 describe the fetch-decode-execute cycle.

  6. In the context of Figure 6.1 and the instruction addd 501 describe the fetch-decode-execute cycle.

  7. Why must a normal arithmetic-and-logic-unit (ALU) have, in addition to one data output (the result), extra output(s)?

  8. Why, in general, will a machine with cache memory run faster?

  9. At any instant, a (data) bus can have many receivers/listeners, but only one active transmitter/talker; why? explain.


next up previous contents
Next: 7. Assembly Language Programming Up: Lecture Notes on Computer Previous: 5. The Components of

平成17年1月9日