The instruction set (Mac-1a) introduced in chapter 7 was severely limited - particularly in its inability to call subprograms. We now extend our coverage to the remainder of the Mac-1 instruction set, and attempt some more ambitious programs.
Firstly, we will introduce the additional instructions - especially those based on the stack; we will explain the purpose of a stack and its use for handling subprograms. Next, we will show how some more real programs can be constructed, from simple three or four liners, to subroutines, input-output and interrupts.
In addition, we will describe the memory addressing modes found on most general purpose computers.
Although this chapter is entirely about Mac-1, the presentation is such that the principles of general-purpose computers are emphasised. Thus, someone who follows this chapter will have little difficulty in understanding Motorola 68000, Intel 80x86 series (Pentium), or virtually any existing computer.
Figure 8.1 shows the Mac-1 instruction set extended to the full repertoire given in (Tanenbaum, 1990); we do not bother with the binary version of the instruction - as in Figure 6.5 and Figure 7.1, since we will not be assembling programs using the additional instructions, nor writing them in machine code.
jneg x Jump on negative if ac<0 : pc <- x jnze x Jump on nonzero if ac!=0 : pc <- x
These save some of the trouble encountered using just jpos and
jzer; however, as we have seen, they are not essential.
The stack is a special memory area. Although we give a lot of detail below, you will be able to get by knowing the major uses of the stack. In general, the stack is somewhere for the CPU to place a data item that it needs to memorise temporarily:
In Mac-1, the register SP is the stack-pointer and is dedicated to maintaining the stack; the stack itself - the data pointed-to - is actually part of main-memory.
The stack is a memory into which values can be stored and from which they can be retrieved on a last-in-first-out (LIFO) basis. Ideally, you store with a PUSH and retrieve with a POP. It may help to think of an analogy such as a spring loaded canteen tray dispenser, or a bus conductor's coin dispenser; the main point is that you can only put on the top (PUSH), get from the top (TOP) or remove the value at the top (POP). In spite of its simplicity this device has a remarkably large impact on the computational capability of a computer. A stack gives us a sort-of indirect addressing and also a sort-of indexed addressing via a the stack pointer; but a stack does much more than that, it is the basis of the implementation of functions and procedures, and blocks in block-structured high-level languages.
SP points to the top of the stack - i.e. to the memory location where the last value was pushed.
push Push onto stack sp <- sp-1; m[sp] <- ac pop Pop off stack ac <- m[sp]; sp <- sp+1
In the case of Mac-1, the stack grows from high memory towards low
memory. push increases the size of the stack by one and places a
value in the new memory cell (at the top). pop exactly reverses the
process, i.e.. retrieves the last value written (the top) and
decreases the size of the stack. push followed by pop has exactly
no effect. And, as usual with Mac-1 most things are done through
the accumulator (AC); push pushes the number in the AC, and
pop
removes the top of the stack and places it in the AC.
push operates as follows:
push /sp <- sp - 1 ;SP decremented, NB. this INCREASES
size of stack.
/m[sp] <- ac ;put contents of AC into the memory
cell that SP POINTS TO
pop operates as follows:
pop ;ac <- m[sp] ;get contents of cell pointed to by
SP, into AC.
;sp <- sp + 1 ;decrease size of stack.
Note carefully again that the stack actually grows downwards, one word at a time - actually this is the case on a great many machines. Normally, in Mac-1 programs, we will assume that SP starts off pointing at memory cell 4020.
Example. The tables below show how the state of the stack and memory cells change, in response to the following code fragment, (assume SP initially set to 4020, and that a0 is at 500, and contains 30, that a1 is at 501 and contains 91):
/(a)
lodd a0 /ac <- [a0] (=30)
push /(b)
lodd a1 /ac <- [a1] (=91)
push /(c)
pop /ac <- m[sp]; sp <- sp -1
stod a0 /(d)
pop /
stod a1 /(e)
In examples like this, to show the address of a memory cell and what it contains, we use the notation:
address: contents
500: 30
a0 500: 30
a1 501: 91
AC : ? 4018: ?
4019: ?
SP: 4020 --points to---> 4020: ?
a0 500: 30
a1 501: 91
AC : 30 4018: ?
+->4019: 30
SP: 4019 --points to--+ 4020: ?
a0 500: 30
a1 501: 91
AC : 91 +-->4018: 91
| 4019: 30
SP: 4018 --points to--+ 4020: ?
a0 500: 91
a1 501: 91
AC : 91 4018: ?
+->4019: 30
SP: 4019 --points to--+ 4020: ?
a0 500: 91
a1 501: 30
AC : 30 4018: ?
4019: ?
SP: 4020 --points to---->4020: ?
Comments:
pshi Push indirect sp <- sp-1; m[sp]<-m[ac] popi Pop indirect m[ac] <- m[sp]; sp <- sp+1
Thus, the AC is used `indirectly' - see indirect addressing mode, section 8.13; i.e. the value that is contents of the memory cell that is pointed-to by [ac] is pushed and popped.
call and retn are used for calling subprograms (methods or
functions in Java) and RETurNing from them.
call performs three steps.
call x Call procedure sp<-sp-1; make room on stack for PC
m[sp] <- pc; save PC
pc <- x; put jump-to address in PC
call causes all of the following to happen:
JUMP label, but the important difference is that we remember where
we were in the calling program, i.e. we must remember where we came from,
so that we can get back there again.
retn performs two steps.
retn Return from procedure
pc <- m[sp]; take top of stack and put in PC
sp <- sp + 1; decrease size of stack, i.e. delete what was
on the top.
retn causes all of the following to happen:
Note: In computer science, the terms subprogram, subroutine,
procedure, method (in object-oriented programming languages
such as Java), function (in C and C++) are largely equivalent. As
mentioned above, call and retn are used for CALLing
subroutines and RETurNing
from them. One major objective of subroutines is to avoid having to repeat
large chunks of code.
In section 2.6 we showed some cookery recipes. In those, notice how the writer of recipes can improve the readability of the cookery book by avoiding repetition of common sub-recipes, e.g. making a sauce, that crop up frequently, let's say 20 times. Not only does this decrease the size of the cookery book (19 half pages saved), but also increases readability of the book; in addition, it means that if the sub-recipe is to be altered, it need be altered only in one place, rather than 20.
I'll start off with a simple example; this example may fail to impress you;
if so, imagine that it is something large an complex, e.g. reading a string
from the keyboard, see section 8.10 -- such that (i) you
wouldn't want to type more than once; (ii) would use up a large amount of
memory for each repetition; (iii) would, if such in here there and
everywhere, would hinder the readability of the program; and, finally, (iv)
if it ever had to be changed (e.g. from y = x * 4 + 3; to
y = x * 5 + 9;), you would prefer to have to change it in one place
only.
y = x * 4 + 3;
which can be written in assembly language as:
loco 3
addd x
addd x
addd x
addd x
stod y
However, for reasons that will soon become clear, I want to make this code a
little more general (not specific to x); let's write a program that
multiplies the AC by four and adds three, leaving the result in the AC.
Because we cannot add constants such as
, we'll have to change the
program a little and use a temporary variable tmp. In addition, we'll
give it a label - so that we can jump to it when needed.
/ this program multiplies AC by four and adds 3, leaving the result in AC
m4p3 stod tmp /200
loco 3 /201
addd tmp /202
addd tmp /203
addd tmp /204
addd tmp /205
From now on, let us assume that m4p3 is assembled and loaded at
address 200.
Now, let us say we have:
a2 = a1 * 4 + 3;
a4 = a3 * 4 + 3;
a6 = a5 * 4 + 3;
and we want to use just one copy of m4p3.
A simplistic solution would be:
startprog: (assume startprog is at 100)
lodd a1 /100
jump m4p3 /101
stod a2 /** /102
lodd a3
jump m4p3
stod a4
lodd a5
jump m4p3
stod a6
But, without subprograms, the is a major problem: the program never gets to
`**', because when
it's finished m4p3 it continues on to the next
instruction after 205 and not 105 as desired.
Note, however, there is nothing to stop you (at 206) JUMPing to 102,
but this defeats the whole purpose of the subprogram -- you won't be able
to use it for a3 or anywhere else.
This gets us to one of the crucial differences between JUMP and CALL (subprogram). With JUMP, it's a one way ticket, you don't ever come back! With CALL you can remember where you came from and JUMP back there (using RETN) when you're finished in the subprogram.
Figure 8.2 shows how we can make m4p3 into a
proper subprogram -- all we need to do is add retn at the end; the
label m4p3 is all we need to name it.
/ this subprogram multiplies AC by four and adds 3, leaving the result in AC
m4p3 stod tmp /200
loco 3 /201
addd tmp /202
addd tmp /203
addd tmp /204
addd tmp /205
retn
And here is how to call it.
lodd a1
call m4p3
stod a2
lodd a3
call m4p3
stod a4
lodd a5
call m4p3
stod a6
Figure 8.2 shows the sequence of actions. Note: there is no explicit use of the stack, all PUSHes and POPs are done implicitly by CALL, RETN.
Subprogram m4p3 above uses AC to pass parameters. To make it
completely general, we need to use the stack for passing parameters to the
subprogram and returning results from it. In addition, the use of tmp
is messy, and bad practice.
First, we must introduce load, store and arithmetic instructions that operate on stack memory.
lodl x Load local ac <- m[sp+x] stol x Store local m[sp+x] <- ac addl x Add local ac <- ac + m[sp+x] subl x Subtract local ac <- ac - m[sp+x]
The term local come from local variables - local to
the subprogram. From now on, if we want to pass something to a subprogram,
we push it, and if we need store a value in a temporary variable, we
also push it.
Each of these instructions allows you to access the memory cell x
below the top of the stack; for example,
lodl 0 /loads into AC the most recently pushed value
/NOT the same as pop, as nothing is removed from
/the stack
lodl 1 /loads into AC the last pushed but one
stol 2 /stores value in AC into cell 2 from top
/NOT the same as PUSH, as no new space is
/created on stack, i.e. you may be overwriting
/something valuable.
Beware, when using the stack, it is easy to unintentionally overwrite important values, e.g. the return address of a subprogram, or to write to parts of memory that are not really part of the stack.
Now, we can revise m4p3.
/ this subprogram multiplies AC by four and adds 3, leaving the result in AC
m4p3 loco 3
addl 1
addl 1
addl 1
addl 1
stol 2
retn
Notice how the stack removes any need for named temporary variables.
Question: why addl 1 and not addl 0. Answer: because the
return address is at 0 - it was the last thing pushed.
And here is how to call it.
loco 0 /actually, you could push anything
/what is important is making space for the result
push /make space for return value
lodd a1
push /push input value
call m4p3
pop /pop input value (remove from stack)
pop /pop output value
stod a2
loco 0
push
lodd a3
push
call m4p3
pop
pop
stod a4
etc...
In connection with subprograms, there are four uses for the stack:
In general, the stack looks like Figure 8.3. This is called the subprogram's stack frame.
If you called the procedure m4p3 recursively three times - or,
indeed, there were three nested calls of different procedures - the
situation would look like Figure 8.4. In this way a procedure
can call itself again and again, without
one call interfering with the other; the only limit is the size of
the stack.
In the scheme mentioned in the previous two subsections, parameters are
passed by by value/copy. Thus, subprogram m4p3 can do whatever
it likes to the memory location that contains the input values -- the
copy of a1 (the parameter) is on the stack and a1
itself are separate and so a1 in the
caller will never change; in fact, subprogram m4p3 can treat it as a
local variable.
Subprograms which use the stack for passing parameters, and for their working (local) storage can be in use by more than one process at a time (e.g. in a multitasking operating system); such subprograms are called reentrant.
If a subprogram used global data, or used some local storage in its own program space - rather than using the stack, then the different (multiple and simultaneous) users of it would get their data mixed up.
Multitasking operating systems make much use of reentrant subprograms - there needs to be just one copy of the subprogram, even if it is being used by a great many processes.
As indicated above, a non-subprogram solution could have
been used to repeat the mp4p3 code as
many times as was required. And as we mentioned, this repetition of code would
have made the overall program larger, as well as other more serious
problems.
Now, if we are content to accept the increase in program size, use of a macro avoids the other problems (i.e. more than one copy to maintain, difficulty of reading code with large chunks repeated).
Essentially, you declare a macro containing the working bits of the subroutine (no need for the housekeeping bits at the top and bottom) and then insert the macro code wherever the CALL appears. Macros are used whenever you want to trade memory for speed -- you waste no time PUSHing and POPping the stack.
There are no direct instructions for input- output; instead Mac-1a uses memory-mapped input-output, whereby some memory cells are mapped to input-output ports; for simplicity we assume that there are only two ports, one connected to a standard-input device, the other connected to a standard-output device:
Note: recall that 0x signifies Hexadecimal.
We assume that each device works with bytes (i.e. 8-bits).
A read from address 0xFFC yields a 16-bit word, with the actual data byte in the lower order byte. There is no use in reading the input port until the connected device has put the data there: so 0xFFD is used to read the input status register; the top bit (sign) of 0xFFD is set when the input data is available (DAV).
Thus, a read routine should go into a tight loop, continuously reading 0xFFD, until it goes negative; then 0xFFC can be read to get the data. Reading 0xFFC clears 0xFFD again.
Output, to 0xFFE, runs along the same lines as input. A write to 0xFFE will send the lower order byte to the standard-output device. The sign bit of 0xFFFH signifies that the device is in a ready to receive (RDY) state; again there is no use writing data to the output port until the device is ready to read it.
testStatus: lodd fff /read status
jpos testStatus /not ready
jzer testStatus /not ready
out: lodd 500
stod ffe /output
testStatus: lodd fff /read status
jpos testStatus /not ready
jzer testStatus /not ready
out: lodd 500
stod ffe /output
is unsatisfactory for many purposes; what happens, for example, if the output device is broken, or is switched off, and, as a consequence never becomes ready; the program would stay in the tight loop and the only way to stop it would be to reset/reboot.
Change the code to count its `not-ready' failures and, if this count ever reaches `maxcount' (e.g. maxcount = 100) to put -1 in AC and JUMP to label `exit'.
The scheme of input-output outlined above is called polled input-output(I/O). Polled I/O is unsatisfactory for two major reasons:
One solution: change the code to count its not-ready failures and,
if this count ever reaches maxcount, e.g. maxcount = 100, to put
-1 in AC and JUMP to label exit.
A partial solution is as follows: control does not stay in the tight loop, but the CPU goes off and does other things, returning now and again to check status. But, interrupts provide the real solution.
[NB. Mac-1 has no interrupts -- the following is modelled on interrupts on the 80X86].
As we have indicated in the previous section, it would be intolerable to have the CPU wasting its time constantly monitoring input (and output) status registers.
Consider the case of the simple case of a keyboard (GUI interfaces with a mouse present an even greater problem). In Windows and other operating systems the keyboard is read even when the computer is away running another part of the program. This is done with a special type of subroutine call - an interrupt.
When you hit a key on the keyboard, something like the following happens:
Note: These hardware actions are invisible to the programmer; i.e. they do not need any assembly/machine code.
Why address of address? The answer is simple -- for flexibility. It is nice not to force OS manufacturers to use fixed addresses for ISRs. With the interrupt vector containing a fixed address that points to a variable address, we have the flexibility of changing the variable address. In fact, low memory (addresses 0-1023) is reserved for a 256 entry interrupt vector table (IVT); four bytes (32 bits) per interrupt vector.
In other words, the interrupt vector of the keyboard never changes, but the driver software (ISR -- subprogram) may be placed anywhere as long as the address in the interrupt vector table (IVT) is kept updated.
lodd a1
subd a2 /compare a1 and a2
jzer xyz /if equal jump to xyz, i.e. if Z flag is 1, jump
The CPU can be interrupted just as it is about to fetch jzer xyz; it
then goes off and services that interrupt (executes the ISR); what was in
flag Z will have been overwritten, and a wrong decision to jump or not jump
may occur on return from the ISR.
0x 0070 07FB); 0x 0070 07FB is the actual address of the
interrupt service routine;
CALL 0070 07FB
We are now in the interrupt service routine; i.e. this part is programmed.
It would be easy to say that IRQs are the same as interrupt vectors; that is close to the truth, but not quite. In fact, all interrupts go through an intermediate devise, a PIC (Programmable Interrupt Controller); when a device wants to interrupt, it sends its IRQ (number) to the PIC; the PIC translates IRQ to interrupt vector and passes the request to the CPU. The story then continues as above.
A key factor is the transparency of interrupts. The interrupt
causes the service routine to run, but when that routine is
finished, and IRET executed, the executing program should be none
the wiser -- except, maybe, it notices that an instruction took 20
or 30
-sec to run, instead of just 1
-sec; and, of course,
there will be another character in the input buffer.
(a) what is the total time for each interrupt?
(b) estimate the highest interrupt frequency that may occur?
Assume that there are no other generators of interrupts.
In addition to the term polled I/O, we have the term programmed I/O; programmed I/O refers to the practice of the CPU reading each byte into the accumulator (LODD) and then storing it in memory (STOD). This is inefficient: (i) like polled I/O it keeps the CPU occupied rather inefficiently; (ii) all data must pass twice through the system bus. Programmed I/O may be used in an interrupt service routine.
Devices like disk or tape may require very rapid data transfer of data from the device to memory and vice-versa; we cannot tolerate any inefficiency.
In addition, if you look at the system diagram in Figure 8.5, you will see that both the disk (for example) and memory are connected to the system bus. Hence, there may be little requirement for the CPU to get involved in a data transfer, except to initiate it. Typically, a third device will be connected to the bus - a DMA controller - which mediates between the two data transfer devices and ensures an orderly use of the bus. In this case, the data passes through the system bus only once.
Recalling Mac-1a, and memory read/write: during a LODD, for example, Mac-1a would place the address to be read in MAR, and issue an RD. In a machine supporting DMA, DMA would be able to capture the bus (since I/O devices can lose data if they are kept waiting, DMA would have highest priority access to the bus). In addition, the CPU would need to use some sort of protocol to determine when data finally arrives following an RD.
When DMA is in operation, the only effect on the CPU and executing program is some slowing down, because the DMA must steal bus cycles -- so called cycle-stealing -- for the data transfer. There is only one bus, so traffic must take its turn. The degree of CPU slowing down will depend of how fast the DMA transfer can take place. If the DMA transfer can use the full data rate of the bus, then the CPU will stop for a while.
In a computer, memory locations can hold instructions or data. In addition, as we shall see the data can be interpreted either as a plain value, e.g. 100, or as an address or reference to another data item. Those of who are familiar with C/C++ will recognise pointers; and Java people will recognise references.
In general, machine instructions usually take zero, one, or two
operands (e.g. in lodd a0; a0 is the single operand;
lodd is the operation;
Actually Mac-1 has no multi-operand instructions. For a start, it is an
accumulator machine, i.e. in instructions like addd, lodd, the
second - implicit - operand is AC, the accumulator.
Operands can be data, or can refer to data - i.e. address of data,
or can be labels - which translate to addresses - of instructions,
e.g. for jumps.
The question of addressing is concerned with how operands are interpreted. In the case of data the operand can be:
LOCO 5; 0111 0000 0000 0101;
0000 0101 0000 0001 ; lodd 501;
If memory cell 501 contains `7'
+--------------+
501: | ... 0111 |
+--------------+
it is the `7' that gets loaded into the AC.
xxxx 0101 0000 0001 lodi 501; if memory cell 501 contains
`50a', and memory cell 50a contains `3', we have:
+--------------+
501: | 50a |
+--------------+
...
+--------------+
50a: | 3 |
+--------------+
and it is the `3' that gets loaded into the AC.
lodd 1230
loop: addi 1230+SI ;ac<- [ac]+m[1230+si]
inc SI
...test for end
jump loop
Of course, because Mac-1 has no index registers, I have had to invent one (SI), and an addressing mode for add: addi - add direct using SI as an index; also I had to invent an `increment instruction. Indexing is a bit like running your finger along a table of figures. The instructions 'lodl'...'subl' introduced below are a bit like indexed - but they use a very special index register called the stack-pointer (SP).
Addressing modes look complicated, but if you are careful to analyse what a construct means - by drawing a diagram, if necessary - then there are no real pitfalls.
Also, for those who are not specialists in assembly programming, you should keep to the simple modes and only use the complex modes when they are absolutely essential.
addr. 20 contains 40 addr. 30 contains 50 addr. 40 contains 60 addr. 50 contains 70 (1) load immediate 20 (2) load direct 20 (3) load indirect 20 (4) load immediate 30 (5) load direct 30 (6) load indirect 30
c = 0x45; error = outch(c,1000);
error = write4(a,b,c,d);.
in AC. If it gets END-OF-INPUT, it
returns
These are also recall type questions that appear as parts of examination questions.
push, pop.
(b) Consider the following program:
loco 29 stod a1 loco 31 stod a2 /1 lodd a1 /2 push /3 lodd a2 /4 push /5 pop /6 stod a1 /7 pop /8 stod a2 /9
(i) after /1 what is in a1, a2, the AC register?
(ii) after /2 what is in a1, a2, the AC register?
(iii) after /3 what is in a1, a2, the AC register, top of stack?
(iv) after /4 what is in a1, a2, the AC register, top of stack?
(v) after /5 what is in a1, a2, the AC register, top of stack?
(vi) after /6 what is in a1, a2, the AC register, top of stack?
(vii) after /7 what is in a1, a2, the AC register, top of stack?
(viii) after /8 what is in a1, a2, the AC register, top of stack?
(ix) after /9 what is in a1, a2, the AC register, top of stack?
main1: loco 22
stod a1
call sub1
lodd a1 /1
sub1: loco 1
addd a1
stod a1
retn
main2: loco 22
stod a1
jump sub1
lodd a1 /2
sub2: loco 1
addd a1
stod a1
retn
Write a fragment of Mac-1 code that will read from the standard input device into location 1000; include appropriate comments.
(b) Describe a situation in which programmed input-output would be appropriate.
Input: data mapped to 4092/0xFFC (lower-order byte = data byte); status mapped to 4093/FFD (sign bit set denotes 'data available').
Output, mapped to 4094/0xFFE (lower-order byte); status 4095/0xFFF (sign bit set denotes 'ready') ..."
(a) Briefly, explain the principle used in both 'read' and 'write' operations.
(b) Outline a fragment of program (using pseudo-code or assembly code - see Figure XX) that will write the contents of the lower- order byte of address 500 to the output device mentioned in the first part of the question.
(c) Explain, briefly, why programmed input-output is unsatisfactory for many applications and how interrupts can provide some remedy.
call and retn; you should mention the roles of
the stack and stack pointer; illustrate your answer with appropriate
examples / diagrams. In your answer please mention the major reasons why
call and retn cannot be replaced by simple jump instructions.
(b) In a certain computer system the time taken for the processor to recognise and acknowledge an interrupt is 4 microsecs; it takes 10 micro secs to save OR restore the PC and other registers. If the execution time for the interrupt handler instructions for peripheral X is 70 micro secs,
(i) what is the total time for each interrupt?
(ii) estimate the highest interrupt frequency that may occur?
Assume that there are no other generators of interrupts.