8. Further Assembly Programming

8.1 Introduction

The instruction set (Mac-1a) introduced in chapter 7 was severely limited - particularly in its inability to call subprograms. We now extend our coverage to the remainder of the Mac-1 instruction set, and attempt some more ambitious programs.

Firstly, we will introduce the additional instructions - especially those based on the stack; we will explain the purpose of a stack and its use for handling subprograms. Next, we will show how some more real programs can be constructed, from simple three or four liners, to subroutines, input-output and interrupts.

In addition, we will describe the memory addressing modes found on most general purpose computers.

Although this chapter is entirely about Mac-1, the presentation is such that the principles of general-purpose computers are emphasised. Thus, someone who follows this chapter will have little difficulty in understanding Motorola 68000, Intel 80x86 series (Pentium), or virtually any existing computer.

8.2 Mac-1 Instruction Set Extensions

Figure 8.1 shows the Mac-1 instruction set extended to the full repertoire given in (Tanenbaum, 1990). We omit the binary versions of the instructions that were shown in Figure 6.5 and Figure 7.1, since we will neither assemble programs that use the additional instructions nor write them in machine code.

Figure 8.1: Complete Mac-1 Instruction Set
Mnemonic Name Action(s)
-------- ---- ---------
...
... 0 - 4095
3. y is an address offset in range 0 - 255

8.2.1 The Additional Jumps

jneg x    Jump on negative    if ac<0 : pc <- x
jnze x    Jump on nonzero     if ac!=0 : pc <- x

These save some of the trouble encountered using just jpos and jzer; however, as we have seen, they are not essential.

8.3 The Stack

The stack is a special memory area. Although we give a lot of detail below, you will be able to get by knowing the major uses of the stack. In general, the stack is somewhere for the CPU to place a data item that it needs to remember temporarily.

In Mac-1, the register SP is the stack-pointer and is dedicated to maintaining the stack; the stack itself - the data pointed-to - is actually part of main-memory.

The stack is a memory into which values can be stored and from which they can be retrieved on a last-in-first-out (LIFO) basis. Ideally, you store with a PUSH and retrieve with a POP. It may help to think of an analogy such as a spring-loaded canteen tray dispenser, or a bus conductor's coin dispenser; the main point is that you can only put a value on the top (PUSH), read the value at the top (TOP), or remove the value at the top (POP). In spite of its simplicity, this device has a remarkably large impact on the computational capability of a computer. A stack gives us a sort of indirect addressing and also a sort of indexed addressing via the stack pointer; but it does much more than that: it is the basis of the implementation of functions and procedures, and of blocks in block-structured high-level languages.

SP points to the top of the stack - i.e. to the memory location where the last value was pushed.

8.3.1 Direct Accumulator-Stack Instructions

push      Push onto stack     sp <- sp-1; m[sp] <- ac
pop       Pop off stack       ac <- m[sp]; sp <- sp+1

In the case of Mac-1, the stack grows from high memory towards low memory. push increases the size of the stack by one and places a value in the new memory cell (at the top). pop exactly reverses the process, i.e. it retrieves the last value written (the top) and decreases the size of the stack. A push followed immediately by a pop leaves both AC and SP unchanged. As usual with Mac-1, most things are done through the accumulator (AC): push pushes the number in the AC, and pop removes the top of the stack and places it in the AC.

push operates as follows:

     push      /sp <- sp - 1    SP decremented; NB this INCREASES
               /                the size of the stack.
               /m[sp] <- ac     put contents of AC into the memory
               /                cell that SP POINTS TO.

pop operates as follows:

     pop       /ac <- m[sp]     get contents of the cell pointed to
               /                by SP into AC.
               /sp <- sp + 1    decrease size of stack.

Note carefully again that the stack grows downwards, one word at a time; this is the case on a great many machines. Normally, in Mac-1 programs, we will assume that SP starts off pointing at memory cell 4020.

Example. The tables below show how the state of the stack and memory cells change, in response to the following code fragment, (assume SP initially set to 4020, and that a0 is at 500, and contains 30, that a1 is at 501 and contains 91):

                    /(a)
          lodd a0   /ac <- [a0] (=30)
          push      /(b)
          lodd a1   /ac <- [a1] (=91)
          push      /(c)
          pop       /ac <- m[sp]; sp <- sp+1
          stod a0   /(d)
          pop       /
          stod a1   /(e)

In examples like this, to show the address of a memory cell and what it contains, we use the notation:

          address: contents
          500:      30

At the beginning (a):

     a0   500: 30
     a1   501: 91
          AC : ?                   4018:      ?
                                   4019:      ?
          SP: 4020 --points to---> 4020:      ?

At (b):

     a0   500: 30
     a1   501: 91
          AC : 30                  4018:      ?
                                +->4019:      30
          SP: 4019 --points to--+  4020:      ?

At (c):

     a0   500: 30
     a1   501: 91
          AC : 91               +-->4018:      91
                                |   4019:      30
          SP: 4018 --points to--+   4020:      ?

At (d):

     a0   500: 91
     a1   501: 91
          AC : 91                  4018:      ?
                                +->4019:      30
          SP: 4019 --points to--+  4020:      ?

At (e):

     a0   500: 91
     a1   501: 30
          AC : 30                  4018:      ?
                                   4019:      ?
          SP: 4020 --points to---->4020:      ?

Comments:

  1. The contents of a0 and a1 have been swapped; if we had wanted the same values POPped as PUSHed, we would have had to POP in the reverse order of PUSHing;

  2. Once the SP moves back (after POP) we show ? in the stack area; the value would probably remain, but it would be exceptionally foolish to rely on this happening - as we shall see later when we mention interrupts.
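The trace above can be checked mechanically. What follows is a minimal sketch in Python, not Mac-1 code: it models memory as a dictionary and AC and SP as plain attributes, and replays the fragment using the register-transfer definitions of push and pop from section 8.3.1. The class name Mac1 and its method names are assumptions made for this illustration only.

```python
class Mac1:
    """Toy model of the Mac-1 accumulator, stack pointer and memory."""

    def __init__(self, sp=4020):
        self.ac = 0
        self.sp = sp
        self.mem = {}          # address -> contents; unset cells are unknown

    def lodd(self, addr):      # ac <- m[addr]
        self.ac = self.mem[addr]

    def stod(self, addr):      # m[addr] <- ac
        self.mem[addr] = self.ac

    def push(self):            # sp <- sp-1; m[sp] <- ac
        self.sp -= 1
        self.mem[self.sp] = self.ac

    def pop(self):             # ac <- m[sp]; sp <- sp+1
        self.ac = self.mem[self.sp]
        self.sp += 1


A0, A1 = 500, 501
m = Mac1()
m.mem[A0], m.mem[A1] = 30, 91

m.lodd(A0); m.push()           # (b) SP = 4019, m[4019] = 30
m.lodd(A1); m.push()           # (c) SP = 4018, m[4018] = 91
m.pop();    m.stod(A0)         # (d) a0 = 91, SP = 4019
m.pop();    m.stod(A1)         # (e) a1 = 30, SP = 4020

print(m.mem[A0], m.mem[A1], m.sp)
```

In this model the popped cells keep their old contents, which corresponds to the remark in comment 2: the values probably remain, but nothing should rely on them.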

8.3.2 Indirect Accumulator-Stack Instructions

pshi      Push indirect       sp <- sp-1; m[sp]<-m[ac]
popi      Pop indirect        m[ac] <- m[sp]; sp <- sp+1

Thus, the AC is used `indirectly' - see indirect addressing mode, section 8.13; i.e. AC holds an address, and it is the contents of the memory cell at that address, m[ac], that are pushed (pshi), or that receive the popped value (popi).
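As a hedged illustration of the difference from plain push and pop, the following Python sketch implements the two register transfers literally. The dictionary layout (keys "ac", "sp", "mem") and the sample addresses 500 and 501 are assumptions for the example, not part of Mac-1.

```python
def pshi(state):
    """sp <- sp-1; m[sp] <- m[ac]"""
    state["sp"] -= 1
    state["mem"][state["sp"]] = state["mem"][state["ac"]]


def popi(state):
    """m[ac] <- m[sp]; sp <- sp+1"""
    state["mem"][state["ac"]] = state["mem"][state["sp"]]
    state["sp"] += 1


state = {"ac": 500, "sp": 4020, "mem": {500: 30, 501: 0}}

pshi(state)            # pushes m[500] = 30; the AC itself is not pushed
state["ac"] = 501      # point AC at a different cell
popi(state)            # m[501] <- 30

print(state["mem"][501], state["sp"])
```

Taken together, the pair copies one memory cell to another through the stack, without the copied value ever passing through the AC.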

8.3.3 Call and Return - CALL and RETN

call and retn are used for calling subprograms (methods or functions in Java) and RETurNing from them.

8.3.4 CALL

call performs three steps.

call x    Call procedure   sp<-sp-1;     make room on stack for PC
  
                           m[sp] <- pc;  save PC

                           pc <- x;      put jump-to address in PC

call causes all of the following to happen:

  1. Decrement the stack pointer - so that we will not overwrite last thing put on stack,

  2. The contents of PC - which is pointing to NEXT instruction, the one just after the CALL - is pushed onto the stack, and,

  3. Jump to `x': the address of the start of the subprogram is put in the PC register, which is all a jump does. Thus, we go off to the subprogram, just as in JUMP label; the important difference is that we remember where we were in the calling program, so that we can get back there again.

8.3.5 RETN

retn performs two steps.

retn      Return from procedure   

                pc <- m[sp]; take top of stack and put in PC 

                sp <- sp + 1; decrease size of stack, i.e. delete what was
                                                           on the top.

retn causes all of the following to happen:

  1. Pops the stack, to yield an address; if the program is correct, the top of the stack will contain the address of the next instruction after the call from which we are returning; it is with this instruction that we want to resume in the calling program;

  2. Jump to the popped address, i.e. put the address into the PC register.

8.4 Subprograms

8.4.1 Introduction

Note: In computer science, the terms subprogram, subroutine, procedure, method (in object-oriented programming languages such as Java), function (in C and C++) are largely equivalent. As mentioned above, call and retn are used for CALLing subroutines and RETurNing from them. One major objective of subroutines is to avoid having to repeat large chunks of code.

In section 2.6 we showed some cookery recipes. In those, notice how the writer of recipes can improve the readability of the cookery book by avoiding repetition of common sub-recipes, e.g. making a sauce, that crop up frequently, let's say 20 times. Not only does this decrease the size of the cookery book (19 half pages saved), but also increases readability of the book; in addition, it means that if the sub-recipe is to be altered, it need be altered only in one place, rather than 20.

I'll start off with a simple example. It may fail to impress you; if so, imagine that it is something large and complex, e.g. reading a string from the keyboard (see section 8.10), such that (i) you wouldn't want to type it more than once; (ii) it would use up a large amount of memory for each repetition; (iii) if repeated here, there and everywhere, it would hinder the readability of the program; and, finally, (iv) if it ever had to be changed (e.g. from y = x * 4 + 3; to y = x * 5 + 9;), you would prefer to change it in one place only. The example is the statement:

          y = x * 4 + 3;

which can be written in assembly language as:

          loco 3
          addd x
          addd x
          addd x
          addd x
          stod y

However, for reasons that will soon become clear, I want to make this code a little more general (not specific to x); let's write a program that multiplies the AC by four and adds three, leaving the result in the AC.

Because we cannot add constants such as 3, we'll have to change the program a little and use a temporary variable tmp. In addition, we'll give it a label - so that we can jump to it when needed.

/ this program multiplies AC by four and adds 3, leaving the result in AC
m4p3      stod tmp         /200
          loco 3           /201
          addd tmp         /202
          addd tmp         /203
          addd tmp         /204
          addd tmp         /205

From now on, let us assume that m4p3 is assembled and loaded at address 200.

Now, let us say we have:

          a2 = a1 * 4 + 3;
          a4 = a3 * 4 + 3;
          a6 = a5 * 4 + 3;

and we want to use just one copy of m4p3.

8.4.2 The Wrong Way - using JUMP!

A simplistic solution would be:

startprog: (assume startprog is at 100)
          lodd a1              /100
          jump m4p3            /101
          stod a2   /**        /102
          lodd a3
          jump m4p3   
          stod a4
          lodd a5
          jump m4p3   
          stod a6

But, without subprograms, there is a major problem: the program never gets to `**', because when it has finished m4p3 it continues on to the next instruction after 205 (i.e. 206), and not to 102 as desired.

Note, however, there is nothing to stop you (at 206) JUMPing to 102, but this defeats the whole purpose of the subprogram -- you won't be able to use it for a3 or anywhere else.

This gets us to one of the crucial differences between JUMP and CALL (subprogram). With JUMP, it's a one way ticket, you don't ever come back! With CALL you can remember where you came from and JUMP back there (using RETN) when you're finished in the subprogram.

8.4.3 The Correct Way - CALL and RETN

The listing below shows how we can make m4p3 into a proper subprogram -- all we need to do is add retn at the end; the label m4p3 is all we need to name it.

/ this subprogram multiplies AC by four and adds 3, leaving the result in AC
m4p3      stod tmp         /200
          loco 3           /201
          addd tmp         /202
          addd tmp         /203
          addd tmp         /204
          addd tmp         /205
          retn             /206

And here is how to call it.

          lodd a1
          call m4p3
          stod a2
          lodd a3
          call m4p3
          stod a4
          lodd a5
          call m4p3
          stod a6

Figure 8.2 shows the sequence of actions. Note: there is no explicit use of the stack, all PUSHes and POPs are done implicitly by CALL, RETN.

Figure 8.2: Subprogram Call and Return
calling program subprogram
--------------- ---...
...---<------------------------------- 206:
106: etc....
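The call and return sequence can also be followed with a small simulation. The Python sketch below is an illustration under stated assumptions, not an implementation of Mac-1: it interprets only the handful of instructions used here, keeps instructions and data in separate dictionaries for simplicity, ignores 16-bit overflow, and uses a hypothetical `halt` marker to stop. The data addresses (501 to 506 and 510) are likewise chosen for the example. The caller is placed at 100 and m4p3 at 200, as in the text.

```python
def place(start, instructions):
    """Map consecutive addresses, beginning at start, to instructions."""
    return {start + i: ins for i, ins in enumerate(instructions)}


def run(program, data, pc, sp=4020):
    """Run a tiny subset of Mac-1 until the hypothetical 'halt' marker."""
    mem = dict(data)
    ac = 0
    while True:
        op, arg = program[pc]
        pc += 1                      # PC now points at the NEXT instruction
        if op == "lodd":
            ac = mem[arg]
        elif op == "stod":
            mem[arg] = ac
        elif op == "loco":
            ac = arg
        elif op == "addd":
            ac += mem[arg]
        elif op == "call":           # sp <- sp-1; m[sp] <- pc; pc <- x
            sp -= 1
            mem[sp] = pc
            pc = arg
        elif op == "retn":           # pc <- m[sp]; sp <- sp+1
            pc = mem[sp]
            sp += 1
        elif op == "halt":
            return mem, sp
        else:
            raise ValueError(f"unknown instruction {op!r}")


# Data addresses are assumptions chosen for the illustration.
A1, A2, A3, A4, A5, A6, TMP = 501, 502, 503, 504, 505, 506, 510
M4P3 = 200

program = {}
program.update(place(100, [
    ("lodd", A1), ("call", M4P3), ("stod", A2),
    ("lodd", A3), ("call", M4P3), ("stod", A4),
    ("lodd", A5), ("call", M4P3), ("stod", A6),
    ("halt", None),
]))
program.update(place(M4P3, [
    ("stod", TMP), ("loco", 3),
    ("addd", TMP), ("addd", TMP), ("addd", TMP), ("addd", TMP),
    ("retn", None),
]))

mem, sp = run(program, {A1: 5, A3: 7, A5: 10}, pc=100)
print(mem[A2], mem[A4], mem[A6], sp)
```

Each call saves a different return address (102, 105 and 108 in this layout), which is how the single copy of m4p3 finds its way back to three different places.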

8.4.4 Subprograms with Parameters

Subprogram m4p3 above uses AC to pass parameters. To make it completely general, we need to use the stack for passing parameters to the subprogram and returning results from it. In addition, the use of tmp is messy, and bad practice.

First, we must introduce load, store and arithmetic instructions that operate on stack memory.

lodl x    Load local          ac <- m[sp+x]  
stol x    Store local         m[sp+x] <- ac
addl x    Add local           ac <- ac + m[sp+x]
subl x    Subtract local      ac <- ac - m[sp+x]

The term local comes from local variables - local to the subprogram. From now on, if we want to pass something to a subprogram, we push it, and if we need to store a value in a temporary variable, we also push it.

Each of these instructions allows you to access the memory cell x places from the top of the stack, i.e. at address sp+x; for example,

          lodl 0   /loads into AC the most recently pushed value 
                   /NOT the same as pop, as nothing is removed from
                   /the stack
 
          lodl 1   /loads into AC the last pushed but one
          stol 2   /stores value in AC into cell 2 from top
                   /NOT the same as PUSH, as no new space is
                   /created on stack, i.e. you may be overwriting
                   /something valuable.

Beware, when using the stack, it is easy to unintentionally overwrite important values, e.g. the return address of a subprogram, or to write to parts of memory that are not really part of the stack.

Now, we can revise m4p3.

/ this subprogram multiplies its stack parameter by four and adds 3, storing the result in the stack cell reserved by the caller
m4p3      loco 3  
          addl 1 
          addl 1 
          addl 1
          addl 1
          stol 2 
          retn

Notice how the stack removes any need for named temporary variables. Question: why addl 1 and not addl 0? Answer: because the return address is at offset 0 - it was the last thing pushed.

And here is how to call it.

          loco 0      /actually, you could push anything
                      /what is important is making space for the result
          push        /make space for return value
          lodd a1
          push        /push input value
          call m4p3
          pop         /pop input value (remove from stack)
          pop         /pop output value
          stod a2
          loco 0
          push 
          lodd a3
          push
          call m4p3
          pop
          pop
          stod a4
                  etc...
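To see why the offsets 1 and 2 are the right ones, it helps to lay out the stack as m4p3 finds it on entry. The following Python sketch walks through one call by hand, under these assumptions: a1 holds 5, the return address is 105 (a hypothetical value; it is simply wherever the instruction after call happens to live), and the helper names push, pop and local are inventions for the illustration.

```python
mem = {}
reg = {"sp": 4020, "ac": 0}


def push(value):
    """sp <- sp-1; m[sp] <- value"""
    reg["sp"] -= 1
    mem[reg["sp"]] = value


def pop():
    """Return m[sp]; sp <- sp+1"""
    value = mem[reg["sp"]]
    reg["sp"] += 1
    return value


def local(x):
    """Address referred to by lodl/stol/addl/subl x, i.e. sp + x."""
    return reg["sp"] + x


RETURN_ADDRESS = 105       # hypothetical address of the instruction after call

# --- caller ---
push(0)                    # loco 0; push   -> space for the result
push(5)                    # lodd a1; push  -> input value (a1 = 5 here)
push(RETURN_ADDRESS)       # done implicitly by call m4p3

# --- inside m4p3: offset 0 = return address, 1 = input, 2 = result cell ---
frame = [mem[local(i)] for i in range(3)]
reg["ac"] = 3                          # loco 3
for _ in range(4):
    reg["ac"] += mem[local(1)]         # addl 1 (four times)
mem[local(2)] = reg["ac"]              # stol 2
pc = pop()                             # retn: pc <- m[sp]; sp <- sp+1

# --- back in the caller ---
pop()                      # pop input value (discarded)
a2 = pop()                 # pop output value; stod a2
print(frame, a2, reg["sp"])
```

The list frame shows the three cells in offset order: return address, input value, result cell. After the two pops in the caller, SP is back where it started.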

8.5 Stack Frame

In connection with subprograms, there are four uses for the stack:

  1. Passing parameters:

    1. Passing parameters to the subprogram;

    2. Returning values from the subprogram.

  2. Storing the return address;

  3. Saving the environment (registers);

  4. Finally, all local variables are created on the stack.

In general, the stack looks like Figure 8.3. This is called the subprogram's stack frame.

Figure 8.3: Stack Frame
4020
High Memory ^
| Parameters
Return A...
...registers (not used in our example)
Local Variables.

8.6 Recursive Subprograms

If you called the procedure m4p3 recursively three times - or, indeed, there were three nested calls of different procedures - the situation would look like Figure 8.4. In this way a procedure can call itself again and again, without one call interfering with the other; the only limit is the size of the stack.

Figure 8.4: Stack Frames for Recursive or Repeated Procedure Calls
4020
High Memory ^
| Parameters (in and o...
...registers (not used in our example)
Local Variables.

8.7 Parameters Passed By Value

In the scheme described in the previous sections, parameters are passed by value (copy). Thus, subprogram m4p3 can do whatever it likes to the memory location that contains the input value: the copy of a1 (the parameter) on the stack and a1 itself are separate, so a1 in the caller will never change; in fact, subprogram m4p3 can treat the copy as a local variable.

8.8 Reentrant Subprograms

Subprograms which use the stack for passing parameters, and for their working (local) storage can be in use by more than one process at a time (e.g. in a multitasking operating system); such subprograms are called reentrant.

If a subprogram used global data, or used some local storage in its own program space - rather than using the stack, then the different (multiple and simultaneous) users of it would get their data mixed up.

Multitasking operating systems make much use of reentrant subprograms - there needs to be just one copy of the subprogram, even if it is being used by a great many processes.
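The mixing-up of data can be demonstrated with a small analogy in Python. This is only a sketch: Python generators stand in for processes that can be suspended part-way through a subprogram, a shared dictionary stands in for the fixed cell tmp used by the first version of m4p3, and a generator's own local variable stands in for storage on that process's stack. All names are assumptions made for the example.

```python
def m4p3_shared(x, shared):
    """Non-reentrant: keeps its working value in one fixed cell ('tmp')."""
    shared["tmp"] = x                  # stod tmp
    yield                              # another process may run here
    return 3 + 4 * shared["tmp"]


def m4p3_private(x):
    """Reentrant: the working value is private to this activation."""
    tmp = x                            # lives in this activation only
    yield                              # another process may run here
    return 3 + 4 * tmp


def finish(activation):
    """Resume a suspended activation and return its result."""
    try:
        next(activation)
    except StopIteration as done:
        return done.value
    raise RuntimeError("activation did not finish")


def interleave(make):
    """Start A with 5, start B with 7, then let A and B finish in turn."""
    a, b = make(5), make(7)
    next(a)                            # A runs up to its suspension point
    next(b)                            # B runs while A is suspended
    return finish(a), finish(b)


cell = {}
shared_results = interleave(lambda x: m4p3_shared(x, cell))
private_results = interleave(m4p3_private)
print(shared_results, private_results)
```

With the shared cell, process A resumes to find B's value in tmp and computes the wrong answer; with private storage each activation gets the result for its own input.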

8.9 Macros

As indicated above, a non-subprogram solution could have been used: repeat the m4p3 code as many times as required. As we mentioned, this repetition would have made the overall program larger, and would have caused other, more serious, problems.

Now, if we are content to accept the increase in program size, use of a macro avoids the other problems (i.e. more than one copy to maintain, difficulty of reading code with large chunks repeated).

Essentially, you declare a macro containing the working bits of the subroutine (no need for the housekeeping bits at the top and bottom) and then insert the macro code wherever the CALL appears. Macros are used whenever you want to trade memory for speed -- you waste no time PUSHing and POPping the stack.

8.10 Input-Output Instructions

There are no direct instructions for input-output; instead Mac-1a uses memory-mapped input-output, whereby some memory cells are mapped to input-output ports. For simplicity we assume that there are only two ports, one connected to a standard-input device, the other connected to a standard-output device.

Note: recall that 0x signifies Hexadecimal.

We assume that each device works with bytes (i.e. 8-bits).

8.10.1 Input from standard-input device

A read from address 0xFFC yields a 16-bit word, with the actual data byte in the lower order byte. There is no use in reading the input port until the connected device has put the data there: so 0xFFD is used to read the input status register; the top bit (sign) of 0xFFD is set when the input data is available (DAV).

Thus, a read routine should go into a tight loop, continuously reading 0xFFD, until it goes negative; then 0xFFC can be read to get the data. Reading 0xFFC clears 0xFFD again.

8.10.2 Output to the standard-output device

Output, to 0xFFE, runs along the same lines as input. A write to 0xFFE will send the lower order byte to the standard-output device. The sign bit of 0xFFF signifies that the device is ready to receive (RDY); again, there is no use writing data to the output port until the device is ready to read it.

8.10.2.0.1 Example

Write a fragment of program that will output the contents of the lower-order byte of address 500 to the standard output device mentioned in section 8.10.2.

    testStatus:     lodd fff             /read status
                    jpos testStatus      /not ready
                    jzer testStatus      /not ready
          out:      lodd 500
                    stod ffe             /output
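The logic of this polling loop can be sketched in Python against a simulated device. Everything about the device here is an assumption for illustration: FakeOutputDevice is a hypothetical stand-in that becomes ready after a fixed number of status reads, read_status plays the role of reading 0xFFF, and write_data plays the role of writing to 0xFFE.

```python
RDY = 0x8000                   # sign bit of the 16-bit status word


class FakeOutputDevice:
    """Hypothetical device: becomes ready after a fixed number of status reads."""

    def __init__(self, polls_until_ready):
        self.remaining = polls_until_ready
        self.written = []

    def read_status(self):             # models a read of 0xFFF
        if self.remaining > 0:
            self.remaining -= 1
            return 0x0000              # sign bit clear: not ready
        return RDY                     # sign bit set: ready to receive

    def write_data(self, word):        # models a write to 0xFFE
        self.written.append(word & 0xFF)   # only the low-order byte is sent


def polled_write(device, word):
    """Spin until the device reports RDY, then write; return the failed polls."""
    polls = 0
    while not (device.read_status() & RDY):
        polls += 1
    device.write_data(word)
    return polls


dev = FakeOutputDevice(polls_until_ready=3)
polls = polled_write(dev, 0x1241)
print(polls, [hex(b) for b in dev.written])
```

Note that the loop has no exit other than the device becoming ready, exactly like the assembly fragment.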

8.10.2.0.2 Exercise

Write a fragment of program that will read from the standard input device into 501.

8.10.2.0.3 Exercise

Write a program that will output the contents of the lower-order bytes of addresses a0 to a9 (say, 500 to 509) to the standard output device - use the earlier example, and the previous examples on loops (chapter 6) as your building blocks.

8.10.2.0.4 Exercise

Write a program which will:

  1. Continuously read from the standard input device, until:

  2. The number -1 (0xFFFF) is received to signify END-OF-INPUT;

  3. Send what was read to the standard output device.

8.10.2.0.5 Exercise

Write a program which will:

  1. Continuously read from the standard input device, until:

  2. The number -1 (0xFFFF) is received to signify END-OF-INPUT;

  3. Add 1 to each number just read;

  4. Send them to the standard output device.

8.10.2.0.6 Exercise

Write a program which will:

  1. Continuously read from the standard input device, until:

  2. The number -1 (0xFFFF) is received to signify END-OF-INPUT;

  3. For each character read, check if it is in the range 'A' to 'Z', if so, make it lower-case, i.e. add 0x20 to it;

  4. If it is not an upper case character, leave it alone;

  5. Send it to the standard output device;

8.10.2.0.7 Exercise

The example above:

    testStatus:     lodd fff             /read status
                    jpos testStatus      /not ready
                    jzer testStatus      /not ready
          out:      lodd 500
                    stod ffe             /output

is unsatisfactory for many purposes. What happens, for example, if the output device is broken, or is switched off, and, as a consequence, never becomes ready? The program would stay in the tight loop and the only way to stop it would be to reset/reboot.

Change the code to count its `not-ready' failures and, if this count ever reaches `maxcount' (e.g. maxcount = 100) to put -1 in AC and JUMP to label `exit'.

8.10.3 Polled I/O

The scheme of input-output outlined above is called polled input-output (I/O). Polled I/O is unsatisfactory for two major reasons:

A partial solution is as follows: control does not stay in the tight loop, but the CPU goes off and does other things, returning now and again to check status. But, interrupts provide the real solution.

8.11 Interrupts

[NB. Mac-1 has no interrupts -- the following is modelled on interrupts on the 80X86].

As we have indicated in the previous section, it would be intolerable to have the CPU wasting its time constantly monitoring input (and output) status registers.

Consider the simple case of a keyboard (GUIs with a mouse present an even greater problem). In Windows and other operating systems the keyboard is read even when the computer is away running another part of the program. This is done with a special type of subroutine call - an interrupt.

When you hit a key on the keyboard, something like the following happens:

8.11.0.0.1 Hardware Actions

Note: These hardware actions are invisible to the programmer; i.e. they do not need any assembly/machine code.

  1. The keyboard is connected to a keyboard controller (a chip) on the motherboard. When you hit a key, two pieces of data become apparent to the controller: (i) a code identifying the key (including shift, control etc.); (ii) that a key was hit. The controller will store the code in a register;

  2. The keyboard controller puts a `1' on an interrupt line on the system bus; if you look at the pin layout on a Pentium chip, you will see one pin labelled INTR (interrupt) and another NMI (non-maskable interrupt). In the discussion here, we will deal only with INTR. The difference between INTR and NMI is that NMIs will always be responded to, whereas the CPU can disable INTRs using an instruction DI (disable interrupts; the 80x86 mnemonic is CLI), see below;

  3. As soon as the CPU is prepared to handle the interrupt, it sends an acknowledge signal to the controller. Normally, during each fetch part of the fetch-decode-execute cycle, the CPU will check the interrupt line. Note: this means we have to add a little to fetch.

  4. The keyboard controller sends a small integer (0-255), an interrupt vector to identify itself. There could be many interrupting devices: keyboard, mouse, disk, ethernet card, etc., each with different interrupt servicing software subprograms (drivers), so the CPU must first identify the device in order to be able to select the appropriate subprogram;

  5. The CPU reads and stores the interrupt vector; in effect, the interrupt vector is the address where the address of the interrupt service routine (ISR) is held, i.e. an address of an address. ISRs are like other subprograms: their address is where their first instruction is held.

    Why address of address? The answer is simple -- for flexibility. It is nice not to force OS manufacturers to use fixed addresses for ISRs. With the interrupt vector containing a fixed address that points to a variable address, we have the flexibility of changing the variable address. In fact, low memory (addresses 0-1023) is reserved for a 256 entry interrupt vector table (IVT); four bytes (32 bits) per interrupt vector.

    In other words, the interrupt vector of the keyboard never changes, but the driver software (ISR -- subprogram) may be placed anywhere as long as the address in the interrupt vector table (IVT) is kept updated.

  6. The CPU PUSHes the PC onto the stack. That is, a normal action before you jump to a subprogram (see CALL);

  7. The CPU PUSHes the condition flags N, Z, onto the stack; in fact, in a Pentium, there is a FLAGS register and it is this that is saved on the stack. Why? Consider the following:

         lodd a1
         subd a2   /compare a1 and a2
         jzer xyz  /if equal jump to xyz, i.e. if Z flag is 1, jump
    

    The CPU can be interrupted just as it is about to fetch jzer xyz; it then goes off and services that interrupt (executes the ISR); what was in flag Z will have been overwritten, and a wrong decision to jump or not jump may occur on return from the ISR.

  8. The CPU multiplies the interrupt vector by four to get an address in the IVT; the interrupt vector is a number in the range 0 to 255 and we want an address in the range 0 to 1023. Another way of looking at this is that each IVT entry is four bytes; hence interrupt vector 0 points to address 0, interrupt vector 1 points to address 4, interrupt vector 2 points to address 8, etc.

  9. The CPU takes the address of the interrupt service routine from the entry in the IVT. That is, let's say the interrupt vector is 1; the CPU goes to memory cell 4, (and 5, 6 and 7) to get a 32 bit number (e.g. 0x 0070 07FB); 0x 0070 07FB is the actual address of the interrupt service routine;

  10. The interrupt service routine is CALLed (but implicitly); i.e. in the example above, we have the equivalent of CALL 0070 07FB
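Steps 8 and 9 amount to a table lookup, which can be sketched as follows. This Python fragment follows the simplified picture given in the text, in which each IVT entry is a single 32-bit address; the function name isr_address is hypothetical, and the little-endian byte order is an assumption based on the 80x86 convention.

```python
IVT_ENTRIES = 256
ENTRY_SIZE = 4                         # four bytes (32 bits) per vector


def isr_address(memory, vector):
    """Return the ISR address stored in the IVT entry for this vector."""
    if not 0 <= vector < IVT_ENTRIES:
        raise ValueError("interrupt vector must be in the range 0 to 255")
    base = vector * ENTRY_SIZE         # step 8: vector * 4
    entry = memory[base:base + ENTRY_SIZE]
    return int.from_bytes(entry, "little")   # step 9: read the 32-bit entry


# Low memory, addresses 0 to 1023, holds the table.
memory = bytearray(IVT_ENTRIES * ENTRY_SIZE)

# Install the example from the text: vector 1 -> ISR at 0x007007FB,
# stored in memory cells 4, 5, 6 and 7.
memory[4:8] = (0x007007FB).to_bytes(ENTRY_SIZE, "little")

print(hex(isr_address(memory, 1)))
```

Moving the driver elsewhere in memory only requires rewriting the four bytes of its table entry; the vector that the device sends stays the same.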

8.11.0.0.2 Software Actions

We are now in the interrupt service routine; i.e. this part is programmed.

  1. The first instruction is usually DI (disable interrupts); i.e. prevent the CPU getting confused by being interrupted while it is handling an interrupt;

  2. PUSH (save) all CPU registers (accumulators) onto the stack; in Mac-1, this means only the AC; however, in a Pentium, there are many accumulators. Why do this? The reason is the same as the reason given above for the FLAGS; when you return from servicing an interrupt, you must ensure that the CPU is left exactly as you found it. However, if an interrupt service routine is certain that it will not modify a register, then it need not save it;

  3. Read the input port -- in the normal way. If things are designed properly the device will always be ready after it has generated an interrupt; however, there is nothing to stop the ISR (interrupt service routine) first checking the status register;

  4. Put the data in a buffer (memory area -- an array), and maintain the buffer pointer; all this means is put the input character somewhere where the reading program can get at it;

  5. The interrupt service routine may need to tell the keyboard controller that it has completed servicing the interrupt;

  6. Restore (POP) all saved registers;

  7. Execute a return-from-interrupt (IRET on an 80x86): (i) IRET POPs the saved FLAGS and writes their values back (into N and Z); and (ii) it POPs the return address into the PC; i.e. the same as RET, with the additional step of restoring the FLAGS.

8.11.0.0.3 IRQs - Interrupt ReQuests

It would be easy to say that IRQs are the same as interrupt vectors; that is close to the truth, but not quite. In fact, all interrupts go through an intermediate device, a PIC (Programmable Interrupt Controller); when a device wants to interrupt, it sends its IRQ (number) to the PIC; the PIC translates IRQ to interrupt vector and passes the request to the CPU. The story then continues as above.

8.11.0.0.4 Transparency of Interrupts

A key factor is the transparency of interrupts. The interrupt causes the service routine to run, but when that routine is finished, and IRET executed, the executing program should be none the wiser -- except, maybe, it notices that an instruction took 20 or 30 microseconds to run, instead of just 1 microsecond; and, of course, there will be another character in the input buffer.

8.11.0.0.5 Exercise

In a certain computer system the time taken for the processor to recognise and acknowledge an interrupt is 4 microsec.; it takes 10 microsec. to save the PC and flags register, ditto 10 microsec. to restore them. If the execution time for the interrupt handler instructions for the peripheral device is 70 microsecs.:

(a) what is the total time for each interrupt?

(b) estimate the highest interrupt frequency that may occur?

Assume that there are no other generators of interrupts.

8.12 Direct Memory Access (DMA)

In addition to the term polled I/O, we have the term programmed I/O; programmed I/O refers to the practice of the CPU reading each byte into the accumulator (LODD) and then storing it in memory (STOD). This is inefficient: (i) like polled I/O it keeps the CPU occupied rather inefficiently; (ii) all data must pass twice through the system bus. Programmed I/O may be used in an interrupt service routine.

Devices like disk or tape may require very rapid transfer of data from the device to memory and vice-versa; we cannot tolerate any inefficiency.

In addition, if you look at the system diagram in Figure 8.5, you will see that both the disk (for example) and memory are connected to the system bus. Hence, there may be little requirement for the CPU to get involved in a data transfer, except to initiate it. Typically, a third device will be connected to the bus - a DMA controller - which mediates between the two data transfer devices and ensures an orderly use of the bus. In this case, the data passes through the system bus only once.

Figure 8.5: System with DMA
C P U Memory Disk DMA Controller
+---------+ +...
...-----------------------------------------------------

Recalling Mac-1a, and memory read/write: during a LODD, for example, Mac-1a would place the address to be read in MAR, and issue an RD. In a machine supporting DMA, DMA would be able to capture the bus (since I/O devices can lose data if they are kept waiting, DMA would have highest priority access to the bus). In addition, the CPU would need to use some sort of protocol to determine when data finally arrives following an RD.

When DMA is in operation, the only effect on the CPU and executing program is some slowing down, because the DMA must steal bus cycles -- so called cycle-stealing -- for the data transfer. There is only one bus, so traffic must take its turn. The degree of CPU slowing down will depend on how fast the DMA transfer can take place. If the DMA transfer can use the full data rate of the bus, then the CPU will stop for a while.

8.13 Addressing - General

In a computer, memory locations can hold instructions or data. In addition, as we shall see, the data can be interpreted either as a plain value, e.g. 100, or as an address of, or reference to, another data item. Those of you who are familiar with C/C++ will recognise pointers; Java people will recognise references.

In general, machine instructions take zero, one, or two operands; e.g. in lodd a0, a0 is the single operand and lodd is the operation.

Actually Mac-1 has no multi-operand instructions. For a start, it is an accumulator machine, i.e. in instructions like addd, lodd, the second - implicit - operand is AC, the accumulator.
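The idea of an implicit accumulator operand can be sketched as a tiny interpreter in Python. This is an illustrative model only: the function name `run`, the dictionary used as memory, and the addresses 100 to 102 are assumptions made for the example, and 16-bit overflow is ignored. Only loco, lodd, addd, subd and stod are modelled.

```python
def run(program, memory):
    """Execute a list of (mnemonic, operand) pairs on a one-address
    accumulator machine and return the final value of AC."""
    ac = 0  # the accumulator: the implicit second operand
    for op, x in program:
        if op == "loco":      # ac <- x        (x is a constant)
            ac = x
        elif op == "lodd":    # ac <- m[x]
            ac = memory[x]
        elif op == "addd":    # ac <- ac + m[x]
            ac = ac + memory[x]
        elif op == "subd":    # ac <- ac - m[x]
            ac = ac - memory[x]
        elif op == "stod":    # m[x] <- ac
            memory[x] = ac
        else:
            raise ValueError("instruction not modelled: " + op)
    return ac


if __name__ == "__main__":
    memory = {100: 7, 101: 5, 102: 0}   # hypothetical data locations
    program = [("lodd", 100), ("addd", 101), ("stod", 102)]
    print(run(program, memory), memory[102])
```

Each instruction names just one operand; the other operand, and the destination of loads and arithmetic, is always AC.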

Operands can be data, or can refer to data (i.e. be the address of data), or can be labels of instructions, which translate to addresses, e.g. for jumps.

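The question of addressing is concerned with how operands are interpreted. In the case of data, the operand can be, for example, the value itself (immediate), the address of the value (direct), or the address of a location that in turn holds the address of the value (indirect).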

Addressing modes look complicated, but if you are careful to analyse what a construct means - by drawing a diagram, if necessary - then there are no real pitfalls.
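In the same spirit as drawing a diagram, it can help to trace a mode by hand against a small table of memory contents. The Python sketch below does this for immediate, direct and indirect addressing (the three modes named in the exercises of this chapter). The function name `load` and the memory contents are made up for illustration; this is not Mac-1 syntax.

```python
def load(mode, operand, memory):
    """Return the value a 'load' would place in the accumulator."""
    if mode == "immediate":
        return operand                    # the operand is the value itself
    if mode == "direct":
        return memory[operand]            # the operand is the address of the value
    if mode == "indirect":
        return memory[memory[operand]]    # the operand is the address of the address
    raise ValueError("unknown addressing mode: " + mode)


if __name__ == "__main__":
    memory = {10: 25, 25: 99}             # hypothetical contents: m[10]=25, m[25]=99
    for mode in ("immediate", "direct", "indirect"):
        print(mode, 10, "->", load(mode, 10, memory))
```

Reading the three branches side by side shows that each mode simply adds one more memory look-up between the operand and the value finally loaded.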

Also, if you are not a specialist in assembly programming, you should keep to the simple modes and use the complex modes only when they are absolutely essential.

8.14 Exercises

  1. Comparison of the Mac-1 instruction set with those of other machines, e.g. Intel 80X86 (Pentium II is equivalent to 80686).

    1. Mac-1 has instructions with, at most, one operand; on a computer like the 80X86, instructions often have two operands. Think about operations that could use three or more operands and write them down, with justification. Are such instructions a good or a bad idea? Give a two- or three-point discussion.

    2. Research and design an instruction set that has no operands at all, except for stack operations PUSH and POP. Write up your findings (one A4 page or less).

  2. Figure 8.1 gives a list of Mac-1 instructions. Do some research on the 8086/8088/80X86 or the 68000 series and choose six more of your favourite instructions that you want included in the Figure. List them with description and justification: what would you use them for, and how would they help?

  3. Describe, using pictorial illustrations and examples, the following addressing modes: (a) immediate; (b) direct; (c) indirect.

  4. Given the memory values below and a one-address machine with an accumulator, what values do the following instructions load into the accumulator? Illustrate your answer with picture(s).

    addr. 20 contains 40
    addr. 30 contains 50
    addr. 40 contains 60
    addr. 50 contains 70
    
    (1) load immediate 20
    (2) load direct 20
    (3) load indirect 20
    (4) load immediate 30
    (5) load direct 30
    (6) load indirect 30
    

  5. Calling subprograms.

    1. Sketch the start of a subroutine that has 4 (16-bit integer) arguments passed to it. Also, show the CALL in the calling program and the few instructions preceding the CALL.

    2. Draw a picture of the stack just before the subroutine gets down to its work.

  6. Explain how an index register may assist in the handling of arrays; use as an example the case where you wish to add 3 to the array of 10 numbers starting at address 0x500.

  7. Write a procedure 'outch(c,maxtries)' that writes a character (the first argument) to the standard output device; the second argument, 'maxtries', is the number of 'not-readies' after which a 'time-out' failure is declared. On success it should return 0 in AC; on time-out, -1 in AC; e.g.

    c = 0x45;
    error = outch(c,1000);
    

  8. Use 'outch' from the previous exercise to write a procedure that has four character parameters and writes these to the standard-output device, i.e. called as: error = write4(a,b,c,d);

  9. Write a procedure 'ch=inch(maxtries)' that reads from the standard-input device. 'ch' is returned in AC. It times out after 'maxtries' not-readies and returns -2 in AC. If it gets END-OF-INPUT, it returns -1 in AC. END-OF-INPUT is a special character which you check for; assume that you have a variable set to its value or, if it makes it easier, assume the special value is 0.

8.15 Self assessment questions

These are also recall type questions that appear as parts of examination questions.

  1. (a) Referring as necessary to Figure 8.1, explain the instructions push, pop.

    (b) Consider the following program:

       loco 29
       stod a1
       loco 31
       stod a2  /1
       lodd a1  /2
       push     /3
       lodd a2  /4
       push     /5
       pop      /6
       stod a1  /7
       pop      /8
       stod a2  /9
    

    (i) after /1 what is in a1, a2, the AC register?

    (ii) after /2 what is in a1, a2, the AC register?

    (iii) after /3 what is in a1, a2, the AC register, top of stack?

    (iv) after /4 what is in a1, a2, the AC register, top of stack?

    (v) after /5 what is in a1, a2, the AC register, top of stack?

    (vi) after /6 what is in a1, a2, the AC register, top of stack?

    (vii) after /7 what is in a1, a2, the AC register, top of stack?

    (viii) after /8 what is in a1, a2, the AC register, top of stack?

    (ix) after /9 what is in a1, a2, the AC register, top of stack?

  2. Referring as necessary to Figure 8.1, and using the following programs as examples, explain the chief difference(s) between 'jump' and 'call'. In the case of 'main1' what will be in 'a1' at /1?

    main1: loco 22
           stod a1
           call sub1
           lodd a1    /1
    
    sub1:  loco 1
           addd a1
           stod a1
           retn
    
    main2: loco 22
           stod a1
           jump sub2
           lodd a1    /2
    
    sub2:  loco 1
           addd a1
           stod a1
           retn
    

  3. Explain what is meant by the term reentrant in reentrant subprogram. (See section 8.8.) Why is it important that subprograms in multitasking operating systems be reentrant?

  4. What is the difference between a macro and a subprogram? (See section 8.9.) Extensive use of macros would lead to larger, yet faster, executable programs. Explain.

  5. Memory-mapped input in Mac-1. The input port is mapped to address 0ffcH (hex); a read from 0ffcH yields a 16-bit word, with the actual data byte in the lower-order byte. The input status register is mapped to 0ffdH; its top (sign) bit is set when input data is available (DAV). Reading 0ffcH clears 0ffdH again.

    Write a fragment of Mac-1 code that will read from the standard input device into location 1000; include appropriate comments.

  6. (a) Explain why programmed input-output is inappropriate in all but specialised situations.

    (b) Describe a situation in which programmed input-output would be appropriate.

  7. In the context of a machine instruction, explain the difference between immediate addressing and direct addressing. In Mac-1 assembly language, give three examples of direct addressing; give one example of immediate addressing. Hint: addd, lodd, loco, stod, subd.

  8. What is meant by ROM? RAM?

  9. Explain the difference between ROM and RAM.

  10. Give one good reason why a computer system should have some ROM memory.

  11. The following describes a simple memory-mapped input-output scheme: "Mac-1a uses memory-mapped I/O, whereby some memory cells are 'mapped' to I/O ports; for simplicity we assume that there are only two ports, one connected to a 'standard-input device', another connected to a 'standard-output device':

    Input: data mapped to 4092/0xFFC (lower-order byte = data byte); status mapped to 4093/0xFFD (sign bit set denotes 'data available').

    Output: data mapped to 4094/0xFFE (lower-order byte); status mapped to 4095/0xFFF (sign bit set denotes 'ready') ..."

    (a) Briefly, explain the principle used in both 'read' and 'write' operations.

    (b) Outline a fragment of program (using pseudo-code or assembly code - see Figure XX) that will write the contents of the lower-order byte of address 500 to the output device mentioned in the first part of the question.

    (c) Explain, briefly, why programmed input-output is unsatisfactory for many applications and how interrupts can provide some remedy.

  12. Referring to Figure 8.1, explain the operation of the Mac-1 instructions: call and retn; you should mention the roles of the stack and stack pointer; illustrate your answer with appropriate examples / diagrams. In your answer please mention the major reasons why call and retn cannot be replaced by simple jump instructions.

  13. Referring to Figure 8.5, explain how DMA may improve the efficiency of a computer system.

  14. (a) Explain, briefly, the principles of programmed I/O, polled I/O and interrupt-driven I/O, and explain why programmed and polled I/O are unsuitable in most cases.

    (b) In a certain computer system the time taken for the processor to recognise and acknowledge an interrupt is 4 microseconds; it takes 10 microseconds to save OR restore the PC and other registers. If the execution time for the interrupt handler instructions for peripheral X is 70 microseconds,

    (i) what is the total time for each interrupt?

    (ii) estimate the highest interrupt frequency that may occur.

    Assume that there are no other generators of interrupts.

  15. When writing cookery recipes or other instructions, how may the concept of subprogram assist: (a) the readability of the instructions; (b) the overall size of the instructions; (c) the maintainability of the instructions?



9 January 2005