University of Ulster, Magee College, Faculty of Informatics.
PGDip/MSc Computing and Information Systems, FT (1103), PT (1405).
Module: Knowledge Based Systems. Autumn Term, 1995.

Guest Lecture on Neural Networks in Expert Systems and Knowledge Based Systems

Lecturer: J.G. Campbell. Date: / /95.

[Note: this lecture is an adapted version of a lecture on Neural Networks that forms part of the BSc 4 Image Processing module; however, the lecture is aimed at a general audience -- no prerequisite knowledge is required.]

9. Neural Networks.
------------------

9.1 Introduction.
----------------

In the search for efficient computing structures for artificial intelligence and knowledge-based systems, one natural and reasonable approach is to attempt to model the workings of mammalian brains. The term 'artificial neural network' or simply 'neural network' refers to computing architectures which are supposedly based on the networks present in brains.

There are two common motivations for the study of neural networks:

- to research computational models of human / mammalian mental activity.

- as novel computational structures and algorithms.

We will focus on the latter (algorithmic), and show that neural networks perform well at a range of computational tasks, and, moreover, that they are strongly related to many well known traditional algorithms. In addition, our view is that, while human cognitive and computational processes are surely of interest, so little that is known seems to be practically implementable that psychology seems mostly irrelevant to those who are attempting to develop artificially intelligent systems. Indeed, there is good reason to question the validity of the term 'artificial intelligence', per se; it is significant that research in this area is often now called 'knowledge engineering'.
Furthermore, perhaps the history of mechanical intelligence may parallel that of mechanical flight: the real progress was made when the obsession with feathers and flapping was removed!

However, we will initially accept the claim that, at a mechanical level, much of what goes on in human brains can be expressed in terms of (1) pattern recognition, and (2) computation of functions -- either logical functions or numerical functions. We will show that artificial neural networks are capable of performing simple versions of these tasks.

As discussed in chapter 8, pattern recognition is concerned with 'making sense' of what we see, hear, smell, touch. When you see a face that you have seen before, you recognise it: learning, perception, recognition. In the example developed below, we show a very simple model of text character recognition.

Very roughly, in the context of artificial intelligence and knowledge-based systems, 'computation of functions' is to do with creating some new information out of information you already have, e.g. you recognise the character '2' followed by '+' followed by '5' and you can infer '7', i.e. arithmetic.

For engineers and computer scientists (leaving aside psychologists and neurobiologists), interest in neural networks is twofold:

- the algorithms associated with them seem to be efficient at certain tasks, e.g. pattern recognition; they can 'learn' from examples, and are 'model-free', unlike some competing statistical algorithms (e.g. multiple linear regression).

- they are implementable in parallel and special purpose hardware, i.e. they can be made to work FAST.

Additional impetus arises from the possibility of implementing neural networks using optical components, which are fast and use little power.

In this lecture we will explain some of the applications of neural networks to image processing - specifically pattern recognition.
First, in section 9.2, we give a historical background, and discuss the early motivation for neural network research. Then, in section 9.3, we describe the basics of artificial neurons, their relationship with 'real' neurons, and go on to describe perceptrons and multilayer networks. Next, section 9.4 gives a very brief introduction to the implementation of neural networks in software and hardware. Section 9.5 introduces training. Since most of the chapter is on backpropagation-trained multilayer feedforward networks, section 9.6 mentions some other architectures. Section 9.7 is conclusions and summary. Finally, Section 9.8 gives a bibliography and references.

9.2. Historical Background
--------------------------

See Nagy (1991), Widrow and Lehr (1990), Hecht-Nielsen (1991).

The initial studies of neural networks were started in the 1940s by psychologists trying to come up with a mechanical / mathematical model of human thought (MacCarthy 1955). The subject was then taken up in the 1950s by the AI (artificial-intelligence) community, especially those interested in pattern recognition, who, not unreasonably, reckoned that the best way to produce artificial intelligence was to produce artificial brains. And, since brains were believed to be made of networks of 'processing units' that we call neurons, how better to produce artificial brains than to use artificial neural networks?

Significant work on neurophysiology started about 1850; however, the earliest notable paper is McCulloch and Pitts (1943), which started by identifying the equivalence of the ON/OFF response of a neuron with a (logical) proposition, i.e. one that has value true or false. They then went on to show how simple one- and two-input neurons could implement the NOT, AND, and OR Boolean functions. Consequently, of course, more complex Boolean functions are only a matter of connecting up a network.

Some early work in machine pattern recognition focussed on human pattern recognition from a psychological perspective, e.g.
(Deutsch 1955), and other contributions in the collection (Uhr 1966). Others focussed on the physiological structure of mammalian brains and vision and auditory systems, e.g. (Hubel and Wiesel 1962). Work on the brains of frogs, culminating in Barlow (1953), identified evidence for a 'matched detector' for small dark objects (e.g. a fly) - a 'fly detector' neuron that 'fires' when a fly-like object enters the frog's visual field; this was an inspiration for the 'perceptron'.

Starting around 1956, Frank Rosenblatt, at Cornell University, invented and built the 'Perceptron', which, amongst other things, was used to model the processing that happens in the visual cortex - the part of the brain that does initial processing on signals sent from the retina (sensitive part of the eye). It was shown that a perceptron could recognise (differentiate between) different patterns (see next section for definition of a pattern). More importantly, it was proved mathematically, and demonstrated, that a perceptron could 'learn': by giving it example patterns along with what each pattern 'represents', you can get it to self-organise (learn) such that, if a pattern similar to one of the learned patterns is encountered, the machine can recognise what it represents. See, for example, Duda and Hart (1973), Block (1962), Rosenblatt (1960), Hecht-Nielsen (1991).

There was also an active group at Stanford led by Bernard Widrow (Widrow and Lehr, 1990), which developed perceptron-like structures: Adaline and Madaline. They developed the Widrow-Hoff or 'delta-rule' training algorithm for Adaline (effectively a perceptron).

However, at the same time, another school of artificial intelligence was being set up - mostly based on 'symbolic processing'; this is what you will read in most current books on AI (see Minsky, 1960). The language LISP was developed for this sort of AI.
PROLOG (PROgramming LOGic) (Bratko, 1991) is another (more advanced) example of a symbolic processing approach to AI and knowledge-based systems.

Also, much of the pattern recognition work going on was based on statistical theory - statistical decision theory; statistical pattern recognition was probably born in Chow (1957), but statistical decision theory has been around since the early 1900s. Actually, statistics and statistical decision theory are remarkably good examples of knowledge-based systems: knowledge capture may be easy (estimate the statistical parameters), representation is easy (the parameters), and statistical inference and decision theory are well understood.

Much work was done on perceptron-like structures until 1969, when Marvin Minsky and Seymour Papert (Massachusetts Institute of Technology) published a book called 'Perceptrons', Minsky and Papert (1969), which showed that a simple, single layer perceptron could not differentiate between certain quite dissimilar patterns (see next section); in addition, they showed that a perceptron could not compute a very simple function - the XOR function. (Minsky had studied computational aspects of neural networks in his PhD thesis, 1953 - but I don't know of any publications arising from that time).

This Minsky-Papert attack proved a great setback to neural network research; it caused what neural net researchers call the 'Dark Ages' - government funding almost completely dried up. Hecht-Nielsen (1991) attributes a conspiratorial motive to Minsky and Papert, namely, that the MIT AI Laboratory had just been set up and was focussing on LISP based AI, and needed to spike other consumers of grants. A good story, whatever the truth, and given extra spice by the coincidence that Minsky and Rosenblatt attended the same high-school, and the same class. Moreover, any bitterness is probably justified because neural network researchers spent the best part of 20 years in the wilderness.
Work did not stop, however, and the current upsurge of interest began in 1986 with the famous PDP books, which announced the invention of a viable training algorithm (backpropagation) for multilayer networks (Rumelhart and McClelland, 1986).

9.3 Neural Networks Basics.
---------------------------

9.3.0 Introduction.
------------------

This section motivates the study of neural networks by demonstrating how neural nets do pattern recognition and compute functions. First we give a simplified account of 'real' neurons. Then we show how a very simple neuron can perform a limited pattern recognition task. We introduce the 'perceptron', and its limitations. Then we show how these limitations can be mitigated by combining neurons in 'layers'.

9.3.1 Brain Cells.
-----------------

Here is a very brief mention of how mammalian brains are thought to be constructed; and, at a small scale, one theory of how they 'work'.

These brains are composed of networks of interconnected neurons. Figure 9.3-1 shows a NEURON and its surroundings. A neuron has a CELL BODY with one AXON stretching out from the cell body, which also has many DENDRITES protruding from it.

Figure 9.3-1 A 'Real' Neuron.
----------------------------

The AXON is the channel by which the neuron sends signals to other neurons. These signals are in the form of a series of electrical pulses - the more frequent the pulses, the stronger the effect of the signal. AXONS only transmit - away from their neuron. DENDRITES receive signals from the axons of other neurons.

Between the axons of the transmitter neuron and the dendrites of receiver neurons are SYNAPSES. Synapses are narrow regions of conductive material; the strength of the excitation signal received by the receiver neuron depends on how well the pulses are conducted from the axon (sender) to the dendrite (receiver).
Neurons operate by sending signals between one another, and each neuron fires (sends pulses) only if it receives a minimum number of excitatory pulses via its dendrites. The number of excitatory pulses received obviously depends on (a) the number of pulses injected by the connected axons (from other neurons), and (b) the proportion of those pulses that are conducted by the synapses.

There are a great many neurons in a typical human brain, perhaps 10^11. There are even more synapses - perhaps 10^16. There is good reason to believe that synapses form what we understand as memory.

Artificial neural networks were initially studied in an attempt to model the brain. In fact, the similarity - actual to artificial - may be relatively slight; however, artificial neural networks are now used as computing structures in their own right.

In the RETINA of an eye there are light sensitive cells - like neurons; when a cell is illuminated it will transmit pulses via its AXON to one (or more than one) other neuron(s).

9.3.2 Artificial Neurons.
-------------------------

The feedforward neural network shown in Figure 9.3-2 is a parallel network of neurons (processing units, sometimes called nodes). Each circle in Figure 9.3-2 represents a neuron of the form shown in Figure 9.3-3; each performs the weighted sum given by eqn. (9.3-1) followed by application of the threshold given by eqn. (9.3-2), and shown in Figure 9.3-4.

     sum =  sum  wi.xi                          (9.3-1)
           i=1..n

     output = y = 1 if sum > T                  (9.3-2)
                = 0 otherwise

Eqn. (9.3-1) can be written in the vector notation, s = x.w, -- we can consider a single neuron as a 'template matcher'.

         +------------------------------------+
         |                                    |
 x1 ->---+                                    +----> y1
         |                                    |
 x2 ->---+                                    +----> y2
         |                                    |
         |                                    |
         |                                    |
 xp ->---+                                    +----> yc
         |                                    |
         +------------------------------------+

Figure 9.3-2 Artificial Neural Network.
---------------------------------------

[Diagram: inputs x1, x2, ..., xn, weighted by w1, w2, ..., wn, enter a summing unit, s = sum wi.xi; s passes to a threshold unit with threshold input T (s > T?), giving output y = 1 if s > T, = 0 otherwise.]

Figure 9.3-3 Artificial Neuron.
-------------------------------

In fact, the two equations can be expressed much more neatly by bringing in the negative of the threshold (T) as a weight, w0, which is always tied to a +1 signal:

     sum =  sum  (wi.xi)                        (9.3-3)
           i=0..n

where w0 = - T, x0 = +1, and the summation is now over 0..n.

Now, the thresholding simplifies to:

     output = 1 if sum > 0                      (9.3-4)
            = 0 otherwise

The weight w0 tied to +1 represents the so-called 'BIAS' input - you could think of it as an 'inhibitory' input, since, normally, w0 will be negative, whereas the other inputs are all 'excitatory'.

Eqns. 9.3-3 and 9.3-4 are shown in Figure 9.3-4 and 9.3-5. Figure 9.3-5 makes the threshold function implicit; this is by far the most common way of representing neurons.

[Diagram: as Figure 9.3-3, but with an extra input +1, weighted by w0, entering the summing unit, s = sum wi.xi; the threshold unit compares s with 0 (s > 0?), giving output y = 1 if s > 0, = 0 otherwise.]

Figure 9.3-4 Artificial Neuron - with Bias.
------------------------------------------

    x1          | +1
      \         |
       \w1      |w0
        \       |
         \      v
          +--+--+          output
  x2  w2  |     |             y
 ---------+     +---------->
   .      |     |
   .     /+--+--+
        /
       /wn
      /          (Note: normally the neuron is represented
    xn            by a circle).

Figure 9.3-5 Artificial Neuron - Threshold Implicit.
---------------------------------------------------

Neural networks are trained rather than programmed: that is, the weights are adjusted to provide a best fit representation for a (training) set of examples of pairs (x,y). A training rule called 'backpropagation', which can effectively train multi-layer networks, has made multilayer networks a practical reality; see section 9.5.

The weights correspond to the synapses mentioned in section 9.3.1, and it is these weights that represent the memory / knowledge-base of the system.

9.3.3 Neural Networks and Knowledge Based Systems.
--------------------------------------------------

Knowledge-based systems form a branch of artificial intelligence; to some extent they represent a milder form of 'expert-system' - with, perhaps, the aims slightly lowered.

Knowledge-based systems try to automate the sort of complex decision task that confronts, for example, a medical doctor during diagnosis. No two cases are the same, and different cases may carry different amounts of evidence. In essence, the doctor makes a decision based on a large number of variables, but some variables may be unknown for some patients, some may pale into insignificance given certain values for others, etc. This process is most difficult to codify. However, there are sufficient advantages for us to try: if we could codify the expertise of a specialist, waiting lists could be shortened, the expertise could be distributed more easily, the expertise would not die with the specialist, etc.

There are four major parts in a knowledge based system:

- KNOWLEDGE ELICITATION: this is the extraction of knowledge from the expert; it may be done by person to person interview, by questionnaire, or by specialised computer program,

- KNOWLEDGE REPRESENTATION: we must code the knowledge in a manner that allows it to be stored and retrieved on a computer,

- KNOWLEDGE DATABASE: where the knowledge is stored (using the 'representation' code mentioned above),

- INFERENCE ENGINE: this takes data and questions from a user and provides answers, and/or updates the knowledge database.

Figure 9.3.3-1 depicts a possible organisation and operation of a knowledge based system.

Actually, a well designed database with a good database management system, coupled with a query language that is usable by non-computing people, goes a long way to fulfilling the requirements of a knowledge based system.
Also, some of the pattern recognition systems we mention could be called knowledge based - after all, they store knowledge (the training data or some summary of it) and make inferences (based on the measured data or feature vector); feature extraction is very similar, in principle, to knowledge representation. Furthermore, we note that neural networks show much promise as knowledge-based systems.

                         +--------------+
                         |              |
                         |   DATABASE   |
                         |              +<--------+
                         +----+---------+         |
                              ^                   |
                              |                   |
                              v                   v
E    +--------------+    +----+---------+    +----+---------+
X    |  KNOWLEDGE   |    |  KNOWLEDGE   |    |  INFERENCE   |
P--->+ ELICITATION  +--->+REPRESENTATION+<---+    ENGINE    |
E    |              |    |              |    |              |
R    +--------------+    +--------------+    +-+------+-----+
T                                              ^      |
       examples,                               |      |
       raw knowledge                  New Data |      |
                                    Questions, |      v
                                          etc.     Answers

Figure 9.3.3-1 Knowledge Based System
-------------------------------------

Where do neural networks fit in here?

First, the inference engine. This is the neural network, i.e. Figure 9.3-2. The questions / new data are the input data -- the xs, input at the nodes at the left of the network; the answers are the outputs -- the ys, from the right hand side of the network. You may worry that the inputs and outputs of the network are numerical, but it is not too difficult to code non-numerical inputs as numbers, and to translate numerical outputs into some appropriate form -- e.g. voice synthesis.

Second, the knowledge database. Well, this is in the network too -- the weights represent all the knowledge in the system.

Third, knowledge elicitation and knowledge representation. These are done by presenting, to the network training algorithm, representative examples (inputs and outputs) of the expertise the network must eventually emulate.

One big advantage, and also a disadvantage, with neural networks is that the knowledge about a particular topic may be distributed over many of the weights. This (advantage) makes them robust to damage to parts of the system.
On the other hand, it makes them difficult to understand / debug, i.e. they are 'opaque' -- it is not easy to get an explanation from them (a requirement in many expert systems / KBS). In a rule based system (even a fuzzy rule based system), it is possible to identify which rules have fired, and hence, for example, identify the cause of a bad or peculiar decision.

9.3.4 Neurons for Recognising Patterns.
--------------------------------------

[This is meant to be only a motivating example, so do not take it too literally; in fact, some of the examples in section 9.3.12 may be more appropriate and more easily understood]

Imagine that we have nine light sensitive neurons, one corresponding to each of the nine cells shown below (the letter 'C'). And imagine that there are nine axons, one from each of the receptor neurons, and all nine of them connected to the dendrites of a single 'C' recognising neuron.

Assuming the character is white-on-black and that bright (filled in with '*') corresponds to '1', and dark to '0', the array corresponding to the 'C' is

     x[1]=1, x[2]=1, x[3]=1,
     x[4]=1, x[5]=0, x[6]=0,
     x[7]=1, x[8]=1, x[9]=1.

 Pixel number:  1    2    3
              +----+----+----+
              |****|****|****|
              |****|****|****|
              +----+----+----+
            4 |****|5   |6   |
              |****|    |    |
              +----+----+----+
              |****|****|****|
              |****|****|****|
              +----+----+----+
                7    8    9

Figure 9.3-5(a) A Letter 'C'
---------------------------

The letter 'T' would give a different observation vector:

     'T': 1,1,1, 0,1,0, 0,1,0
     'O': 1,1,1, 1,0,1, 1,1,1
     'C': 1,1,1, 1,0,0, 1,1,1
     etc.

So how is the recognition done?

(1) Pixel by Pixel comparison: Compare the input (candidate) image with the stored letter, pixel for pixel; we could code this up in a rule-based system:

     if (x[1] == 1) and (x[2] == 1) and (x[3] == 1) and
        (x[4] == 1) and (x[5] == 0) and (x[6] == 0) and
        (x[7] == 1) and (x[8] == 1) and (x[9] == 1)
     then letter is 'C'.

But we haven't really solved anything: any minor difference in any pixel would fool the system, e.g. addition of a small amount of noise.
(a) The recognition system needs to be invariant (tolerant) to noise.

(b) What if there is a minor change in grey level? Grey Cs are the same as white Cs: the system needs to be amplitude invariant - tolerant to changes in amplitude.

(2) Maximum Correlation or Template Matching: Compute the correlation (match) of x with each of the templates for each of the potential letters, and choose the character with maximum correlation. That is, we choose the letter with maximum corr(elation); this is described mathematically as follows:

     c(x,Xj) = x . Xj                           (a)

i.e. the dot product of x and Xj,

                p-1
             =  sum  xi . Xij                   (b)
                i=0

i.e. the dot product of x with the jth template letter.

This is called TEMPLATE MATCHING because we are matching (correlating) the input with each template (the Xjs), and choosing the one that matches best.

Template matching is more immune to noise - we expect the ups and downs of the noise to balance one another.

Returning to artificial neurons, we want to make the 'C' neuron emit a '1' when a 'C' appears, and '0' otherwise. Suppose we set up the following vector of weights, wC, for the 'C' recognising neuron, with the threshold weight w0 = -6.5:

     vector element    0     1 2 3   4  5  6   7 8 9
     wC =          [-6.5,   1,1,1,  1,-1,-1,  1,1,1]

Then for a 'C' input we get:

     sumC = -6.5 +1+1+1 +1-0-0 +1+1+1 = 0.5

which is > 0, so the neuron will fire for a 'C'. For a 'T' input we get:

     sumT = -6.5 +1+1+1 +0-1-0 +0+1+0 = -3.5

which is < 0, and so the neuron does not fire.

The foregoing discussion is grossly simplified; for example, we have not mentioned 'shift-invariance': a 'C' is a 'C' no matter where it appears in the visual field.

[Actually, we would get better discrimination if we substituted -1 for 0 in the above, but that is fine detail which will not concern us for now]

9.3.5 Perceptrons.
------------------

The neuron represented by equations 9.3-3, 9.3-4 and Figure 9.3-5 is a single PERCEPTRON; actually, it is a 'straight-through' perceptron.
The distinction is necessary because Rosenblatt's perceptron had an additional layer of so-called 'associator' units between the retina and the inputs to the variable weights, see Figure 9.3.6. Each input of an associator unit is connected via a weight to some, relatively randomly positioned, cell in the retina; as well as random positioning, the weights are randomly distributed in {-1, 0, +1}.

Figure 9.3.6 Perceptron with Associator Units.
---------------------------------------------

9.3.6 Neural Network Training.
-----------------------------

As stated earlier, neural networks are trained - not programmed. Therefore, the simple 'C' neuron would be trained by presenting it with a number of examples of 'Cs' and of the other letters, and by adjusting the nine weights (and the bias weight) until it reliably gave '1' for 'Cs' and '0' for the others.

9.3.7 Limitations of Perceptrons.
---------------------------------

For ease of explanation we now reduce the input vector to two dimensions. Recall eqns. 9.3-3 and 9.3-4:

     sum =  sum  wi.xi                          (9.3-3)
           i=0..n

where w0 = - T and the summation is now over 0,1,2 (n=2); and, eqn. 9.3-4, the neuron fires if sum > 0; thus, eqn. 9.3-3 can be written out in full as:

     sum = w0.(+1) + w1.x1 + w2.x2              (9.3-5)
         ( = f(x,w) )
          > 0 for fire, i.e. 1 output
         <= 0 for 0 output.

Thus, the (sharp) boundary between ft(x,w) = 1 and ft(x,w) = 0 (ft() is f() thresholded) is given by:

     f(x,w) = 0                                 (9.3-6)

i.e. w0 + w1.x1 + w2.x2 = 0                     (9.3-7)

i.e. w1.x1 + w2.x2 = - w0                       (9.3-8)

This is a straight line which cuts the x1 axis at -(w0/w1) and cuts the x2 axis at -(w0/w2), see Figure 9.3-7.

      \ |
        +  -w0/w2
        | \
        |   \
        |     \
        |       \
     x2 |class 0  \   class 1
        |           \
        |             \
        |               \
        |                 \ -w0/w1
      0 +-------------------+-------->
        0         x1          \

Figure 9.3-7 Perceptron Linear Decision Boundary.
-----------------------------------------------

Thus, the perceptron can only discriminate between patterns which can be separated by a (single) straight line.
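
As an illustrative sketch only (not part of the original lecture material), eqns. 9.3-5 to 9.3-8 can be tried out in a few lines of Python. The function names and the example weights are assumptions chosen for illustration; any weights with non-zero w1 and w2 would serve.

```python
def neuron_output(w, x):
    """Hard-limit neuron of eqns. 9.3-4 and 9.3-5.

    w = (w0, w1, w2), with w0 the bias weight tied to +1; x = (x1, x2).
    """
    s = w[0] * 1.0 + w[1] * x[0] + w[2] * x[1]
    return 1 if s > 0 else 0


def intercepts(w):
    """Points where the boundary w0 + w1.x1 + w2.x2 = 0 cuts the axes.

    Returns (-w0/w1, -w0/w2); assumes w1 and w2 are non-zero.
    """
    w0, w1, w2 = w
    return -w0 / w1, -w0 / w2


# Hypothetical weights, chosen only so that the intercepts are round numbers.
w = (-3.0, 1.0, 1.5)
x1_cut, x2_cut = intercepts(w)

# A point on the origin side of the line gives 0; a point beyond it gives 1.
below = neuron_output(w, (0.0, 0.0))
above = neuron_output(w, (3.0, 2.0))
```

Every point on one side of the line gives the same output, which is why a single neuron of this kind can only separate classes that are linearly separable.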
Likewise for functions, see section 9.3.8: OR and AND can be computed, BUT XOR cannot; see exercises in section 9.3.12. This was one of the Achilles' heels that Minsky and Papert successfully attacked.

9.3.8 Neurons for Computing Functions.
-------------------------------------

The neuron in Figure 9.3-8 can compute the 'AND' function. The AND function is as follows:

 x1 x2 AND(x1,x2)    Neuron summation                          Hard-limit (>0?)
 ----------------    ----------------------------------------  ----------------
 0  0     0          sum = -0.5x1 + 0.35x0 + 0.35x0 = -0.5     => output = 0
 1  0     0          sum = -0.5x1 + 0.35x1 + 0.35x0 = -0.15    => output = 0
 0  1     0          sum = -0.5x1 + 0.35x0 + 0.35x1 = -0.15    => output = 0
 1  1     1          sum = -0.5x1 + 0.35x1 + 0.35x1 = +0.2     => output = 1
 ----------------    ----------------------------------------  ----------------

   x1          +1
     \          |
      \0.35     |
       \        |-0.5
        \       |
         +--+--+
 x2 0.35 |     |
---------+     +------------> F
         |     |
         +--+--+

Figure 9.3-8 AND Function via Neural Network.
--------------------------------------------

Ex. 9.3-1 Work out the weights required for an OR function.

Ex. 9.3-2. (a) Are the weights for any function unique? Ans. No.
(b) Rationalise why this is the case (non-uniqueness of weights).

The example in Figure 9.3-9 shows how a TWO-LAYER network of two interconnected neurons can be used to compute the XOR function; of course, this is obvious from our knowledge of Boolean algebra:

     A XOR B = A AND B' OR A' AND B

where ' denotes complement.

Figure 9.3-9 XOR Function via Neural Network.
-------------------------------------------

Figure 9.3-10 shows the AND, OR and XOR functions plotted in the (x1,x2) plane, together with appropriate boundaries: linear for AND and OR, while XOR needs the ORing of two decision regions.
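
The following Python sketch, offered as an illustration under stated assumptions rather than as the network of Figure 9.3-9, checks the AND neuron and builds XOR from the Boolean identity above. The AND weights (-0.5, 0.35, 0.35) are those of Figure 9.3-8. The XOR weights are hypothetical, and the sketch uses two hidden neurons plus an output neuron, which may differ from the arrangement in the figure.

```python
def fire(weights, inputs):
    """Hard-limit neuron: weights[0] is the bias weight w0 (tied to +1)."""
    s = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if s > 0 else 0


# Weights taken from Figure 9.3-8.
AND_W = (-0.5, 0.35, 0.35)

# Hypothetical weights (assumptions) for the pieces of
# A XOR B = (A AND B') OR (A' AND B).
A_AND_NOT_B = (-0.5, 1.0, -1.0)
NOT_A_AND_B = (-0.5, -1.0, 1.0)
OR_W = (-0.5, 1.0, 1.0)


def xor(a, b):
    """Two processing layers: two hidden neurons, then an OR neuron."""
    h1 = fire(A_AND_NOT_B, (a, b))
    h2 = fire(NOT_A_AND_B, (a, b))
    return fire(OR_W, (h1, h2))


PATTERNS = [(0, 0), (1, 0), (0, 1), (1, 1)]
and_table = [fire(AND_W, p) for p in PATTERNS]
xor_table = [xor(a, b) for a, b in PATTERNS]
```

No single choice of (w0, w1, w2) passed to fire() reproduces xor_table, whereas the layered version does; this is the limitation, and its remedy, discussed in sections 9.3.7 and 9.3.8.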
     (a)                  (b)                  (c)

Figure 9.3-10 Decision Boundaries for (a) AND, (b) OR, (c) XOR
--------------------------------------------------------------

Even though we have analysed only these simple functions, it should be obvious that combinations of neurons -- in a network -- can implement arbitrarily complex functions, with many inputs.

9.3.9 Complex Boundaries via Multiple Layer Nets
------------------------------------------------

We have shown in the previous section how two layers can implement a non-linear decision boundary; now we give qualitative arguments to show that two- and three-layer networks can implement more complex boundaries / decision regions; actually, three layers can implement arbitrarily complex boundaries.

Figure 9.3.11 shows two input neurons fed into a second-layer neuron that implements the AND function as given in the previous section. Each of the input neurons implements a linear boundary, and ANDing the boundaries produces the decision region shown in Figure 9.3.12(a).

Now, it is easy to argue that N input neurons, ANDed together, can yield ANY arbitrary open or closed convex decision region as shown in Figures 9.3.12(b) and 9.3.12(c).

Finally, if we add another, third, layer that effectively ORs the convex regions produced by the second layer, we can obtain completely arbitrary decision regions; see Figure 9.3.12(d).

     (a) ANDing two linear boundaries   (b), (c) ANDing many linear boundaries   (d) Third layer

Figure 9.3.12 Complex Decision Regions via Multiple Layers.
----------------------------------------------------------

9.3.10 'Soft' Threshold Functions.
---------------------------------

Up to now we have used the hard-limit (McCulloch-Pitts 'all-or-nothing') neuron activation function:

     output = 1 if sum > 0
            = 0 otherwise.
                                                (9.3-9)

For reasons mostly to do with training, most neural networks now use a 'softer' activation function, namely the sigmoid function (also called the logistic function):

     output = 1/(1 + exp(-a.sum))               (9.3-10)

see Figure 9.3-13. Usually, the 'gain' factor 'a' is set to 1.0 (obviously, setting a to a very high value yields the simple threshold function ( > 0) ).

Figure 9.3.13 Sigmoid Function.
------------------------------

9.3.11 Multilayer Feedforward Neural Network.
---------------------------------------------

The generalised feedforward neural network shown in Figure 9.3-2 is a parallel network of neurons. Although there is no intrinsic reason against a general topology, so long as the data flow forward, the layered structure of Figure 9.3-2, in which outputs from layer n flow only to inputs in layer n+1, is preferred in most practical hardware and software implementations.

Each circle in Figure 9.3-2 represents a neuron. Early neural network designs, e.g. Nilsson (1965), used the so-called threshold (hard-limit) activation functions mentioned earlier; however, the sigmoid is preferred where the network is required to provide other than 'hard' decision outputs, and, particularly, for ease of training - see below. As indicated earlier, the bias inputs are important in that they give a neuron freedom to shift the position of its threshold.

The neural network architecture is defined by: number of input nodes, number of output nodes, number of processing layers, number of nodes in each processing layer; its 'memory' is the matrix of weights for each processing layer.

In the literature, there is often confusion as to what constitutes a layer; we adopt and recommend the convention that networks are named according to the number of processing layers; thus, Figure 9.3-2 is two-layer; the input 'layer' does not count because it does no processing, but the output layer does.
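
A minimal Python sketch of eqn. 9.3-10 and of a forward pass through a layered network is given here for illustration. The representation of a network as a list of layers, each layer a list of per-neuron weight tuples (w0, w1, ..., wn), is an assumption made for this sketch, as are the function names and the example weights.

```python
import math


def sigmoid(s, a=1.0):
    """Sigmoid (logistic) activation of eqn. 9.3-10 with gain a."""
    return 1.0 / (1.0 + math.exp(-a * s))


def forward(layers, x, a=1.0):
    """Propagate input vector x through each processing layer in turn.

    layers: list of processing layers; each layer is a list of neurons;
    each neuron is a tuple (w0, w1, ..., wn), w0 being the bias weight.
    Outputs of layer n become the inputs of layer n+1.
    """
    signal = list(x)
    for layer in layers:
        signal = [
            sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], signal)), a)
            for w in layer
        ]
    return signal


# Hypothetical two-layer network: 2 inputs, 2 hidden nodes, 1 output node.
net = [
    [(-0.5, 1.0, -1.0), (-0.5, -1.0, 1.0)],  # hidden layer
    [(-0.5, 1.0, 1.0)],                      # output layer
]

soft = forward(net, (1, 0))            # gain 1.0: a 'soft' output in (0, 1)
hard = forward(net, (1, 0), a=50.0)    # high gain: close to a hard decision
half = sigmoid(0.0)                    # 0.5, the midpoint of the sigmoid
```

With the gain raised to 50 the outputs sit very close to 0 or 1, which illustrates the remark that a high gain approaches the simple threshold function. Much larger gains would need a numerically safer sigmoid, since math.exp overflows for very large arguments.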
By convention, processing layers whose outputs are not available outside the network are called 'hidden'.

Clearly, the numbers of nodes in the input and output layers are fixed by the problem; but the number and content of the hidden layers are free. One layer networks are rather trivial, in that they are simply one neuron per output. Two layers are common but three layers are more general - see the previous section.

After the number of processing layers is specified, it remains to specify the number of nodes in the hidden layers. For a two-layer net (one hidden layer) Eberhart and Dobbins (1990) suggest

     numberhidden = sqrt(numberin + numberout + 2)

9.3.12 Exercises.
-----------------

Ex. 1 (a) Plot the following data on a two-dimensional surface; note: the class means are class 0 = (1.0,1.5), class 1 = (2.0,3.0).

     class (y)    x1      x2
        0        0.40    1.50
        0        1.00    0.50
        0        1.00    1.50
        0        1.00    2.50
        0        1.60    1.50
        1        1.40    3.00
        1        2.00    2.00
        1        2.00    3.00
        1        2.00    4.00
        1        2.60    3.00

       +
       |
     4 +                   1
       |
       +
       |
     3 +             1     1     1
       |
       +         0
       |
     2 +                   1
       |
       +   0     0     0
       |
     1 +
       |
       +         0
       |
       +----+----+----+----+----+----+----> x1
       0         1         2         3

Figure 9.3.14 Feature Space Diagram.
------------------------------------

(b) Verify that a single neuron neural network with the following weights:

     w0 =  38.0  (bias)
     w1 = -13.6  weight on x1
     w2 = - 8.0  weight on x2

will discriminate between the two classes.

(c) Recall Figure 9.3-7, and draw the class boundary; Hint: it intercepts the

     x2-axis (vertical) at   -w0/w2 = 4.75
     x1-axis (horizontal) at -w0/w1 = 2.8 (approx.)

(d) Verify that this boundary line approximately bisects the line joining the two means.

2. In the character recognition example given in section 9.3.4 the feature space is nine dimensional. Thus, visualisation of the data in feature space is difficult. The following example is easier to visualise.

Consider an imaging system which has just two pixels - or, a simple organism which has just two light sensitive cells in its retina, see Figure 9.3-15. Call the outputs of these x1 and x2; they form a two-dimensional vector x = (x1,x2).

        x1    x2
     +-----+-----+
     |     |     |
     |     |     |
     +-----+-----+

Figure 9.3-15 Two Pixel Image
-----------------------------

(a) If the components are binary (0 or 1) we can consider a problem in
which we wish to distinguish 'bright' objects - class 1, from dark -
class 0. For now we will define class 1 as 'both pixels light'. I.e.
we have ['*' denotes light (1)]:

        x1    x2
     +-----+-----+
     |*****|*****|   class 1
     |*****|*****|
     +-----+-----+

        x1    x2
     +-----+-----+
     |*****|     |   class 0
     |*****|     |
     +-----+-----+

        x1    x2
     +-----+-----+
     |     |*****|   class 0
     |     |*****|
     +-----+-----+

        x1    x2
     +-----+-----+
     |     |     |   class 0
     |     |     |
     +-----+-----+

Figure 9.3-16
-------------

Note the similarity with the Boolean AND function.

The feature space representation of these classes is shown in Figure
9.3-17; '@' represents class 0, '*' represents class 1. We have shown
a linear boundary which segregates the classes.

     ^                       \
   1 @                        \    *
     |                         \
     |                          \
 x2  |          class 0          \   class 1
     |                            \
     |                             \
     |                              \
     |                               \
   0 @-----------------------------@> \
     0             x1              1

Figure 9.3-17 Two-dimensional Scatter Diagram - Feature Space
-------------------------------------------------------------

(b) Let us change to a problem in which we wish to distinguish striped
objects (class 0, say) from plain (class 1). I.e. we have ['*' denotes
light (1)]:

        x1    x2
     +-----+-----+
     |*****|*****|   class 1
     |*****|*****|
     +-----+-----+

        x1    x2
     +-----+-----+
     |*****|     |   class 0
     |*****|     |
     +-----+-----+

        x1    x2
     +-----+-----+
     |     |*****|   class 0
     |     |*****|
     +-----+-----+

        x1    x2
     +-----+-----+
     |     |     |   class 1
     |     |     |
     +-----+-----+

Figure 9.3-18
-------------

Draw the feature space diagram. Draw appropriate decision boundary
line(s) - note the difficulty compared to (a). Note the similarity
with the Boolean XOR function (strictly, its complement: class 1 here
is 'both pixels the same').

(c) Let us change to a problem in which we wish to distinguish
left-handed objects (class 1, say) from right-handed (class 2), with
neither left- nor right-handed as reject, class 0. I.e.
we have ['*' denotes light (1)]:

        x1    x2
     +-----+-----+
     |*****|*****|   class 0
     |*****|*****|
     +-----+-----+

        x1    x2
     +-----+-----+
     |*****|     |   class 1
     |*****|     |
     +-----+-----+

        x1    x2
     +-----+-----+
     |     |*****|   class 2
     |     |*****|
     +-----+-----+

        x1    x2
     +-----+-----+
     |     |     |   class 0
     |     |     |
     +-----+-----+

Figure 9.3-19
-------------

Draw the feature space diagram. Show the linear boundaries.

(d) (See (a)) Describe a state of affairs that corresponds to Boolean
OR; draw the diagrams. Will a single linear boundary do? [Yes].

3. Change the system in Ex. 2 to allow non-binary data. Allow the data
to extend from 0 to +1 and assume Real values (e.g. 0.995, 0.0256).
Now extend 2(a) to (d) assuming that there are small amounts of noise
on the pixels, e.g. we have values spread over the range 0.9 to 1.0
for light, and 0.0 to 0.1 for dark. Draw the feature space diagrams
for each case (a) to (d). Draw suggested linear boundaries.

4. Now let the noise in Ex. 3 increase. Now, we have values spread
over the range 0.55 to 1.0 for light, and 0.0 to 0.45 for dark.
(i) Draw the feature space diagrams for each case (a) to (d).
(ii) Draw suggested linear boundaries.

5. Consider the following trivialised credit-worthiness expert system
(see Luger and Stubblefield, p. 484 for a less trivial example of the
same thing). Let x1 represent age of the client, let x2 denote
collateral. Code age as: >25, x1 = 1; 0 otherwise. Has collateral:
x2 = 1; has not: x2 = 0. Consider the following set of examples:

     age    collat.
     x1     x2        loan?    y
     ------------------------------------------
     1      1         yes      1   (yes coded as 1)
     0      1         no       0   (no coded as 0)
     1      0         no       0
     0      0         no       0
     ------------------------------------------

Now this knowledge can be represented by a single neuron network -- it
is the AND function encountered in section 9.3.8.

Obviously, it is possible, more realistically, as in Exs. 3 and 4, to
allow x1 and x2 to assume non-binary values.

6. Consider another loan-assessment system. The expert system must
capture the following examples.
x1 is annual salary, (for convenience of this example) divided by
100,000, e.g. salary 20,000 gives x1 = 0.2. x2 is years owning own
residence, (again for convenience of the example) divided by 2, e.g.
1 year gives x2 = 0.5.

     salary    resid.     x1      x2      loan
     ------------------------------------------
      5,000     0         0.05    0       no
      5,000     0.5       0.05    0.25    no
     25,000     0         0.25    0       no
     10,000     0.2       0.1     0.1     no
     50,000     0         0.5     0       yes
     40,000     0.2       0.4     0.1     yes
     30,000     0.4       0.3     0.2     yes
     20,000     0.6       0.2     0.3     yes
     10,000     0.8       0.1     0.4     yes
      5,000     1.0       0.05    0.5     yes
     ------------------------------------------

(a) Explain how such 'knowledge' can be captured in a single neuron.
Hint: recall OR function.

(b) Plot the data on a scatter plot, and show that the 'yes' and the
'no' can be separated by a linear boundary.
Answer: a line joining (x= 0, y= 0.4) to (x=0.4, y = 0) does the
trick.

(c) Verify that the following weights implement an appropriate
boundary:

     w0 = -0.4
     w1 =  1.0
     w2 =  1.0

Ans 1.   -w0/w2 = 0.4, y-axis intercept
         -w0/w1 = 0.4, x-axis intercept
         and see (b).

Ans 2.   Fill in some values from the table and see; recall the
         equation of the neural node:

         sum = 1.w0   +  x1.w1    +  x2.w2
               (bias)    (input 1)   (inp. 2)

         if sum > 0, then output = 1 else output = 0

(d) How could this arrangement be extended to include more input
variables?

(e) How could this arrangement be extended to include more output
variables?


9.4 Implementation.
-------------------

9.4.1 Software.
---------------

Currently, most neural networks are implemented in software; for
example, the neuron in eqns. 9.3-3 and 9.3-4 could be implemented as:

     VAR
       sum       : REAL;
       w, x      : ARRAY [0..9] OF REAL;  (* NB element 0 = bias; *)
                                          (* x[0] is fixed at 1.0 *)
       output, i : INTEGER;

     (* ...initialise x *)
     (* ...initialise w *)

     sum := 0.0;
     FOR i := 0 TO 9 DO
       sum := sum + w[i]*x[i];   (* weighted sum of inputs *)
     END;
     IF sum > 0.0 THEN           (* hard-limit threshold *)
       output := 1;
     ELSE
       output := 0;
     END;


9.4.2 Hardware.
---------------

If we are to operate neural networks in real-time, we must implement
them in parallel or in some sort of special purpose fast hardware.
Accordingly, there are plenty of hardware implemented (digital) neural
network boards as add-ons for PCs; usually these are based on DSP
chips such as the Texas Instruments TMS320C40, or the Intel i860.

Also, there are a number of analogue integrated circuit neural network
chips, see e.g. Brauch et al (1992), and IEEE (1992), IEEE (1993) for
special issues on neural network hardware.

In the past variable weights were a problem; Rosenblatt's Perceptron
used variable resistors driven by motors. Bernard Widrow formed a
company which produced (profitably, by all accounts) a device called a
'memistor' (memory-resistor). The memistor was sort-of 'liquid-state'!
It used a copper wire (the variable resistor) immersed in a copper
sulphate solution; the copper wire was a cathode and there was a
copper plate anode; an appropriate voltage level and polarity
deposited copper on, or removed it from, the wire, thus changing its
cross-sectional area and hence its resistance.


9.4.3 Optical Implementations.
------------------------------

Obviously fast, and low power.

- optical multipliers are easy in principle: light x transmissivity,

- summer: just use the summing effect of a sensor.


9.5 Training Neural Networks.
-----------------------------

9.5.1 Introduction.
-------------------

Up until 1986 (Rumelhart and McClelland, 1986) training was the big
problem. Neural networks are trained rather than programmed: that is,
the weights are adjusted to provide a best fit representation for a
(training) set of examples of pairs (x,y), i.e. (input vector,
output).

It was clear enough (and pointed out by Minsky and Papert (1969)) that
multilayer nets could possibly overcome some of the problems of the
single layer - but multiple layers couldn't be trained.
A training rule called 'back-propagation', which can effectively train
multi-layer networks, has made multilayer networks a practical
reality.


9.5.2 Hebbian Learning Algorithm.
---------------------------------

D.O. Hebb in 1949 proposed a neural learning algorithm that has been
highly influential. Hebb (see Wasserman (1989), p. 212) proposed the
following deceptively simple training rule: a synapse (weight)
connecting two neurons is strengthened (weight increased) whenever
both neurons fire, i.e.

     wij(t+1) = wij(t) + outi(t).outj(t)                    (9.5-1)

where wij(t) is the weight connecting neuron i and neuron j, at
      time t,
      outi(t) and outj(t) are the respective outputs at time t,
      wij(t+1) is the (updated) weight at time t+1.


9.5.3 The Perceptron Training Rule.
-----------------------------------

This is a supervised training rule: the weights are adjusted to
provide a best fit representation for a (training) set of examples of
pairs (x,y), i.e. (input vector, output).

Algorithm:
---------

Initialise: Set all weights to small random numbers.

Loop:
   1.  Apply an input pattern to the net; compute the output y'
   2.  Compare y' with y, the target output
   3.1 If y' = y (correct) goto 1.
   3.2 If incorrect and y' = 0; add each input xi to its
       corresponding weight wi;
   3.3 If incorrect and y' = 1; subtract each input xi from its
       corresponding weight wi;
Until overall result satisfactory.

Rosenblatt proved that IF THERE WAS A SOLUTION (complete separation)
the perceptron would find it. However, questions remain: what about
suboptimal solutions? when to stop? etc.


9.5.4 Widrow-Hoff Rule.
-----------------------

Widrow's Adaline (Widrow and Lehr, 1990) was just a continuous valued
version of the perceptron (as well as binary output, the perceptron
has binary inputs). The Widrow-Hoff training rule is a steepest
descent algorithm - it adjusts the weights (and biases) to minimise
the sum-of-squares-error (sum of (target - output)^2).
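
Before the Widrow-Hoff algorithm is set out, the perceptron training
rule of section 9.5.3 can be sketched in Python as follows. This is
only an illustrative sketch: the function names, the choice of the
two-pixel AND problem as training data, the fixed bias input x0 = 1,
the random seed, the initial weight range and the epoch limit are all
assumptions made for the example.

```python
import random

# Two-pixel 'both light' (AND) problem; each input is (bias=1, x1, x2).
AND_EXAMPLES = [
    ((1, 0, 0), 0),
    ((1, 1, 0), 0),
    ((1, 0, 1), 0),
    ((1, 1, 1), 1),
]


def output(weights, x):
    """Threshold neuron: 1 if the weighted sum exceeds 0, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > 0 else 0


def train_perceptron(examples, max_epochs=1000, seed=0):
    """Perceptron rule: add inputs on a missed 1, subtract on a false 1."""
    rng = random.Random(seed)
    n = len(examples[0][0])
    # Initialise: small random weights (the range is an assumption).
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            y_out = output(weights, x)
            if y_out == y:
                continue                      # step 3.1: correct
            mistakes += 1
            if y_out == 0:                    # step 3.2: add inputs
                weights = [w + xi for w, xi in zip(weights, x)]
            else:                             # step 3.3: subtract inputs
                weights = [w - xi for w, xi in zip(weights, x)]
        if mistakes == 0:                     # 'result satisfactory'
            break
    return weights


if __name__ == "__main__":
    w = train_perceptron(AND_EXAMPLES)
    print(w, [output(w, x) for x, _ in AND_EXAMPLES])
```

Here 'overall result satisfactory' is taken to mean a complete pass
with no mistakes, which is reachable for AND because the classes are
linearly separable; for a non-separable problem such as XOR the loop
would simply run until the epoch limit.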
Actually, it is only a little different from the perceptron rule:
step 3 needs modification to cope with continuous values:

Algorithm:
---------

Initialise: Set all weights to small random numbers.

Loop:
   1.  Apply an input pattern to the net; compute the output y'
   2.  Compare y' with y, the target output
   3.1 errorj = yj - y'j ; i.e. target - output, for neuron j
   3.2 modify weight ij according to:

          wij(n+1) = wij(n) + a.xi.errorj

       (a here is a small positive learning-rate constant, not the
       sigmoid gain)
Until overall result satisfactory.


9.5.4 Statistical Training.
---------------------------

Actually, the Adaline (continuous valued perceptron) training problem
is identical to linear regression, and so, where the data are
suitable, the Moore-Penrose pseudo-inverse yields an appropriate
solution. See Duda and Hart (1973).


9.5.5 Backpropagation.
----------------------

Backpropagation is another iterative descent rule. Back-propagation
training proceeds in stages:

Initialise:
-----------
The weights are initialised to small random values in the range
[-0.3, 0.3].

Train:
------
Repeat until total error small enough:

   Repeat for all training data:

   (1) a training input vector is applied to the input layer of the
       network, and the outputs of the hidden layers and, finally,
       the output layer are computed,

   (2) then the weights of the output neurons (layer n) are adjusted
       according to gradient descent on the error between the target
       outputs and the actual outputs;

   (3) the weights of the previous layer are adjusted according to
       the same criterion (NB. the adjustments at layer n-1 are still
       optimising the overall output - layer n);

The theory is remarkably concise and simple, depending only on chain
rule derivatives; however, the continuity of the sigmoid function, and
the simplicity of its derivative, are crucial.

Training continues iteratively: at each iteration all the example
inputs and outputs are presented and they all contribute towards the
estimation of the error gradient.
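
The stages above can be sketched for a two-layer network (one hidden
layer, one output neuron) trained on XOR. This is a hedged sketch, not
a reference implementation: the helper names, two hidden nodes, the
learning rate of 0.5, the epoch count and the seed are assumptions for
illustration, and both layers' adjustments are computed from the
weights as they stood at the start of the iteration. The error is the
sum of (target - output)^2, and the sigmoid derivative is y.(1 - y).

```python
import math
import random

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def forward(x, hidden_w, out_w):
    """Each weight row is [bias, w1, w2, ...]; returns (hidden, output)."""
    h = [sigmoid(row[0] + sum(w * xi for w, xi in zip(row[1:], x)))
         for row in hidden_w]
    y = sigmoid(out_w[0] + sum(w * hi for w, hi in zip(out_w[1:], h)))
    return h, y


def total_error(data, hidden_w, out_w):
    """Sum-of-squares error over the whole training set."""
    return sum((t - forward(x, hidden_w, out_w)[1]) ** 2 for x, t in data)


def train(data, n_hidden=2, rate=0.5, epochs=5000, seed=1):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Initialise: small random weights in [-0.3, 0.3].
    hidden_w = [[rng.uniform(-0.3, 0.3) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    out_w = [rng.uniform(-0.3, 0.3) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        grad_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        grad_o = [0.0] * (n_hidden + 1)
        for x, t in data:                        # all examples contribute
            h, y = forward(x, hidden_w, out_w)   # stage (1)
            delta_o = (t - y) * y * (1.0 - y)    # stage (2): output layer
            for j, hj in enumerate([1.0] + h):
                grad_o[j] += delta_o * hj
            for k in range(n_hidden):            # stage (3): chain rule
                delta_h = delta_o * out_w[k + 1] * h[k] * (1.0 - h[k])
                for i, xi in enumerate([1.0] + list(x)):
                    grad_h[k][i] += delta_h * xi
        out_w = [w + rate * g for w, g in zip(out_w, grad_o)]
        hidden_w = [[w + rate * g for w, g in zip(row, g_row)]
                    for row, g_row in zip(hidden_w, grad_h)]
    return hidden_w, out_w


if __name__ == "__main__":
    hw, ow = train(XOR)
    print("total error:", total_error(XOR, hw, ow))
    for x, t in XOR:
        print(x, t, round(forward(x, hw, ow)[1], 3))
```

Whether a given run reaches the correct XOR outputs depends on the
random starting weights; a run that stalls at a high total error is
best restarted with a different seed.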
Lippmann (1987) gives a concise (half-page) and complete specification
of the algorithm; van Camp (1992), and Winston (1992) also give good
explanations.

There are three big problems with backpropagation:

- long training time (usually), and this is not easy to parallelise,

- the algorithm may get stuck in local minima; the only thing to do
  (provided you can determine when it has got stuck) is to
  reinitialise with fresh random weights and start again. It is said
  that XOR will only train correctly 90% of the time; I've had no
  problems with XOR (two-bit input, one bit out), but I've had
  problems with the equivalent four-bit problem. Simulated annealing
  provides a possible solution to this problem.

- it is not easy to interpret the weights.


9.5.6 Simulated Annealing.
--------------------------

- see Boltzmann training (Wasserman, 1989), p. 81...


9.5.7 Genetic Algorithms.
-------------------------

- see Luger and Stubblefield.


9.6. Other Neural Networks.
---------------------------

[This is very short, so see any of the textbooks, but especially
Wasserman (1989).]

- Hopfield: acts as content-addressable memory that can tolerate
  imperfect inputs,

- Kohonen: self-organising (i.e. unsupervised training); for the most
  part, acts as a k-means clustering algorithm,

- Neocognitron: (Fukushima et al., 1983) shift- and scale-invariant
  neural network that is supposed to model the human visual system
  more closely than any other network,

- WISARD (see Boyle and Thomas (1988)); remarkably simple in concept -
  based on RAM; a cross between a straight look-up table and a
  perceptron; training simply consists of writing to the RAM,
  application of reading it out,

- recurrent: layer n+1 outputs fed back to layer n; some promise for
  prediction.


9.7. Conclusion.
----------------

The objective of this chapter has been just to give you a flavour of
the sorts of processing tasks that neural networks can do. We have
made some simplifications; however, the major principles of neural
networks are present.
Nevertheless, we have shown how neurons - or very simple networks of
them - can:

- recognise simple patterns,

- compute functions,

- store and recall knowledge.

And, of course, combining these capabilities will allow them to
recognise arbitrarily complex patterns.

One thing that we have covered only sparsely is the ability of neural
networks to 'generalise' -- i.e. you can apply them to (input) data
which do not appear in the training data, and get a sensible result;
of course, these 'unknown' data must be somewhat similar to what the
network has been trained on. Thus, neural networks are not just lookup
tables, or rule-bases, as might have appeared from the very simple
examples given.

In general there is much frothy talk about neural networks, and a
certain 'magic' attributed to them. As in everything, neural networks
are no panacea -- if, for example, your example (training) data are
contradictory, or you have very little training data, then neural
networks will not help you, and nor will any KBS for that matter
(garbage in, garbage out, still applies!).


9.8 Exercises.
--------------

Included here are some exam questions that I have used in the past.
See also section 9.3.12.

1. Explain how a neural network may be used as a component of a
knowledge-based system. [Hint: how are the following implemented:
knowledge-base, knowledge elicitation, inference].

2. Explain some advantages and disadvantages of neural networks
compared to other knowledge-based system techniques.

3. (a) What is meant by the statement, "neural networks are trained,
not programmed"? Explain one major advantage, and one major
disadvantage of this fact.
                                                          [10 marks]

(b) What is the significance of the XOR function in the history and
theory of neural networks?
                                                          [6 marks]

(c) Figure 7-1 below shows four two-pixel images and their associated
classes (class 0 or class 1); '*' denotes bright, value 1, blank
denotes dark, value 0; describe a neural network that will distinguish
class 1 objects from class 0.
                                                          [9 marks]

        x1    x2
     +-----+-----+
     |*****|*****|   class 1
     |*****|*****|
     +-----+-----+

        x1    x2
     +-----+-----+
     |*****|     |   class 0
     |*****|     |
     +-----+-----+

        x1    x2
     +-----+-----+
     |     |*****|   class 0
     |     |*****|
     +-----+-----+

        x1    x2
     +-----+-----+
     |     |     |   class 0
     |     |     |
     +-----+-----+

Figure 7-1
----------

4. (a) Describe the operation of a two-input, single layer neural
network, and discuss the difficulty of implementing an XOR function
using such a network.
                                                          [10 marks]

(b) Figure 8 below shows four two-pixel images and their associated
classes (class 0 or class 1); '*' denotes bright, value 1, blank
denotes dark, value 0; describe a neural network that will distinguish
class 1 objects from class 0.
                                                          [10 marks]

        x1    x2
     +-----+-----+
     |*****|*****|   class 1
     |*****|*****|
     +-----+-----+

        x1    x2
     +-----+-----+
     |*****|     |   class 0
     |*****|     |
     +-----+-----+

        x1    x2
     +-----+-----+
     |     |*****|   class 0
     |     |*****|
     +-----+-----+

        x1    x2
     +-----+-----+
     |     |     |   class 0
     |     |     |
     +-----+-----+

Figure 8
--------

5. (a) Describe the activities carried out by a single (neuron)
processing unit; explain what components of the network represent its
'memory', and explain what is meant by the statement 'neural networks
are trained, not programmed'.
                                                          [10 marks]

(b) Explain how the neuron shown in Figure 8-1 computes the AND
function (F = A AND B); show how an alternative choice of weights can
implement an 'OR' neuron.
                                                          [4 marks]

     A              +1
      \             |
       \0.35        |
        \           |-0.5
         \          |
          \      +--+--+
     B   0.35    |     |
     ------------+     +------------> F
                 |     |
                 +--+--+

(c) Explain how you would apply a neural network to pattern
recognition. Identify a major weakness of a single layer neural
network and explain how a multilayer network can overcome this
weakness.
                                                          [6 marks]

6.
(a) Explain the similarity between the activity of an artificial
neuron and template matching.

(b) Explain how a neuron can implement an AND function. Hence, explain
how a neural network may implement any Boolean function.

(c) Explain the limitations of a single layer neural network for
pattern recognition and show how multiple layers can remedy this
problem.

7. (a) Explain what components of a neural network represent its
'memory', and explain what is meant by the statement 'neural networks
are trained, not programmed'.

(b) Explain how the neuron shown in Figure 18-1 computes the AND
function (F = A AND B); show how an alternative choice of weights can
implement an 'OR' neuron.

     A              +1
      \             |
       \0.35        |
        \           |-0.5
         \          |
          \      +--+--+
     B   0.35    |     |
     ------------+     +------------> F
                 |     |
                 +--+--+

(c) Explain how you would apply a single neuron to pattern
recognition.

(d) Sketch the software implementation of a single neuron.

(e) Explain a weakness of a single neuron (or single layer of neurons)
and show how multiple layers can remedy this problem.

8. (a) Give an intuitive explanation of neural network training.

(b) Explain 'sigmoid activation function'.

(c) Explain how to use a multiple layer neural network for pattern
recognition. What problem does the 'soft' sigmoid activation cause,
and how is it solved?

(d) What is meant by the statement: "neural networks are model free"?

9. (a) Give a feature space explanation of the similarity of pattern
recognition and Boolean function computation.

(b) Discuss the difficulty of implementing XOR using a single layer of
neurons.

(c) Relate the XOR problem to pattern recognition. [Ans: linear
boundaries].

(d) Discuss two weaknesses of neural networks.


9.9. Recommended Reading.
-------------------------

[This is included mainly for someone who would like to pursue the
topic further, e.g. as part of a project / dissertation].

The current BEST book on neural networks is Haykin (1994).
Wasserman (1989) is particularly easy and complete for teach-yourself;
it covers all the major architectures, and training algorithms; it
gets to the essence of the matter often much better than most of the
original papers.

From an Artificial Intelligence / Expert Systems point of view, Luger
and Stubblefield (1993), and the other popular AI textbook, Winston
(1992), both give good introductions to neural networks.

Lippmann (1987) has been the traditional teach-yourself guide - but,
in my opinion, is not easy going.

Rumelhart and McClelland (1986) was obviously influential but, being
an edited collection, seems uneven and not easy to read. There is a
companion volume (Vol. 3) that contains software, although I have
always found the software difficult to understand and, hence, use.
(PS. I have the disk belonging to the library copy!)

Nagy (1991) gives a good brief account of the history; Hecht-Nielsen
(1990) gives his own colourful version of the story - but is not a
good teach-yourself book; ditto Kosko (1992): very clever, crusading
and inspiring - but the book would have benefited from better editing.

Duda and Hart (1973) is still the classic on pattern recognition;
Agrawala (1976) gives many classic pattern recognition papers.
Schalkoff (1992) is a modern pretender - and gives a modern coverage
of neural nets (but only feedforward backprop.)

Uhr (1966) is a collection that includes much of the influential early
work on human perception and pattern recognition.

IEEE publish a bi-monthly Transactions on Neural Networks. IEEE Trans.
on Systems, Man and Cybernetics often contains NN applications.

Applications are covered well in Gonzalez and Woods (1992), Winston
(1992), Luger and Stubblefield (1993); the Rosenfeld Image Processing
survey papers (Rosenfeld, 1993, 1992, 1991, etc.) have sections on
neural network applications to image processing.

Mathematics packages like MATLAB now have promising neural net
additions.
There are dedicated NN software packages: most of them seem expensive
for what they offer - don't buy one without a good recommendation,
trial, and/or review.


9.10. References and Bibliography.
----------------------------------

[I've not separated the references from the bibliography]

Agrawala, A.K. 1976. Machine Recognition of Patterns. IEEE Press.

Aleksander, I. and H. Morton. 1990. An Introduction to Neural
Computing. London: Chapman and Hall.

Arbib, M.A. and J. Buhmann. 1990. Neural Networks. In Encyclopedia of
AI (Shapiro, 1990).

Barlow, H.B. 1953. Summation and Inhibition in the Frog's Retina.
J. Physiology, Vol. 119, pp. 69-88.

Beck, J.V. and K.J. Arnold. 1977. Parameter Estimation in Engineering
and Science. John Wiley.

Block, H.D. 1962. The Perceptron: A Model for Brain Functioning.
Reviews of Modern Physics, Vol. 34, No. 1, January.

Bratko, I. 1991. PROLOG: Programming for Artificial Intelligence.
Addison-Wesley.

Boyle, R.D. and R.C. Thomas. 1988. Computer Vision: A First Course.
Blackwell Scientific. Includes simple easy to grasp description of
WISARD (more than its inventors ever managed!).

Brauch, J., S.M. Tam, M.A. Holler, and A.L. Shmurun. 1992. Analog VLSI
Neural Networks for Impact Signal Processing. IEEE Micro, Vol. 12,
No. 6, December.

Brookshear, J.G. 1991. Computer Science: An Overview. Benjamin /
Cummings. See section 10.2, p. 366, for a very accessible and simple
introduction to NNs.

Campbell, J.G. and A.A. Hashim. 1992. Fuzzy Sets, Pattern Recognition,
Linear Estimation, and Neural Networks - a Unification of the Theory
with Relevance to Remote Sensing. In Cracknell, A.P. and R.A. Vaughan,
eds. Proc. 18th Annual Conf. of the Remote Sensing Society. University
of Dundee, September, pp. 508-517.

Chow, C.K. 1957. An Optimum Character Recognition System Using
Decision Functions. IRE Trans. Electron. Comput., Vol. EC-6, Dec.
1957.

Davalo, E. and P. Naim. 1991. Neural Networks. Macmillan Press.

Deutsch, J.A. 1955.
A Theory of Shape Recognition. British Journal of Psychology, Vol. 46,
pp. 30-37. Reprinted in (Uhr 1966).

Devijver, P.A. and J. Kittler. 1982. Pattern Recognition: A
Statistical Approach. Englewood Cliffs, NJ: Prentice-Hall.

Duda, R.O. and P.E. Hart. 1973. Pattern Classification and Scene
Analysis. Wiley-Interscience.

Eberhart, R.C. and R.W. Dobbins, eds. 1990. Neural Network PC Tools.
Academic Press.

Feller, W. 1966. An Introduction to Probability Theory and its
Applications, Volume II. John Wiley and Sons.

Fisher, R.A. 1936. The Use of Multiple Measurements in Taxonomic
Problems. Annals of Eugenics, Vol. 7, pp. 179-188. (in Agrawala,
1976).

Fix, E. and J.L. Hodges. 1951. Discriminatory Analysis, Nonparametric
Discrimination: Consistency Properties. USAF School of Aviation
Medicine, Randolph AFB, TX. Project 21-49-004, Report No. 4, February.

Fix, E. and J.L. Hodges. 1952. Discriminatory Analysis, Nonparametric
Discrimination: Small Sample Performance. USAF School of Aviation
Medicine, Randolph AFB, TX. Project 21-49-004, Report No. 11, August.

Funahashi, K-I. 1989. On the Approximate Realisation of Continuous
Mappings by Neural Networks. Neural Networks, Vol. 2, pp. 183-192.

Fukushima, K., S. Miyake, and T. Ito. 1983. Neocognitron: A Neural
Network Model for a Mechanism of Visual Pattern Recognition. IEEE
Trans. Systems, Man, and Cybernetics. Vol. SMC-13, No. 5.

Fukunaga, K. 1992. Introduction to Statistical Pattern Recognition.
2nd ed. Academic Press.

Gonzalez, R.C. and R.E. Woods. 1992. Digital Image Processing. 3rd ed.
Addison-Wesley.

Hammerstrom, D. 1993a. Neural Networks at Work. IEEE Spectrum, June
1993, pp. 26-32.

Hammerstrom, D. 1993b. Working with Neural Networks. IEEE Spectrum,
July 1993, pp. 46-53.

Haykin, S. 1994. Neural Networks. Macmillan.

Hecht-Nielsen, R. 1987. Kolmogorov's Mapping Neural Network Existence
Theorem. Proc. IEEE 1st International Conference, Neural Network,
Vol. III pp.
11-14.

Hecht-Nielsen, R. 1990. Neurocomputing. Addison-Wesley.

Hinton, G.E. 1992. How Neural Networks Learn from Experience.
Scientific American, Sept. 1992.

Hopfield, J.J. 1982. Neural Networks and Physical Systems with
Emergent Collective Computational Abilities. Proc. Natl. Acad. Sci.
USA, Vol. 79, pp. 2554-2558, April 1982.

Hubel, D.H. and T.N. Wiesel. 1962. Receptive Fields, Binocular
Interaction, and Functional Architecture in the Cat's Visual Cortex.
Journal of Physiology. Vol. 160, pp. 106-123. Reprinted in (Uhr 1966).

IEEE. 1993. Special Issue on Neural Network Hardware. IEEE Trans.
Neural Networks. Vol. 4 No. 3. May.

IEEE. 1992. Special Issue on Neural Network Hardware. IEEE Trans.
Neural Networks. Vol. 3 No. 3. May.

IEEE. 1990a. Special Issue on Neural Networks I: Theory and Modelling.
Proceedings of the IEEE, 78, No. 9, Sept. 1990.

IEEE. 1990b. Special Issue on Neural Networks II: Analysis,
Techniques, and Applications. Proceedings of the IEEE, 78, No. 10,
Oct. 1990.

IEEE. 1983. Special Issue on Neural and Sensory Information
Processing. IEEE Trans. Systems, Man, and Cybernetics. Vol. SMC-13,
No. 5.

Karnofsky, K. 1993. Neural Networks and Character Recognition.
Dr. Dobb's Journal, June 1993.

Kosko, B. 1992. Neural Networks and Fuzzy Systems. Prentice-Hall Int.

Kosko, B. 1991. Neural Networks for Signal Processing. Prentice-Hall.

Lippmann, R.P. 1987. An Introduction to Computing with Neural Nets.
IEEE ASSP Magazine, April.

Luger, G.F. and W.A. Stubblefield. 1993. Artificial Intelligence.
2nd ed. Benjamin/Cummings.

MacCarthy, R.A. 1955. Electronic Principles in Brain Design. J. Irish
Medical Association. Vol. 37, No. 221. November.

McCulloch, W.S. and W. Pitts. 1943. A Logical Calculus of the Ideas
Immanent in Nervous Activity. Bulletin of Mathematical Biophysics,
Vol. 5, 1943.

Mehra, P. and B.W. Wah (eds). 1992. Artificial Neural Networks:
Concepts and Theory. IEEE Press. (ordered for library 10/8/93).

Minsky, M. 1961.
Steps Towards Artificial Intelligence. Proc. IRE, Vol. 49, No. 1,
Jan. 1961.

Minsky, M.L. and S.A. Papert. 1969. Perceptrons. MIT Press.
Expanded/Reprinted edition, 1988, MIT Press.

Nagy, G. 1991. Neural Networks - Then and Now. IEEE Trans. Neural
Networks. Vol. 2 No. 2.

Nilsson, N.J. 1965. Learning Machines: Foundations of Trainable
Pattern-Classifying Systems. New York: McGraw-Hill.

Rao Vemuri, V. (ed.). 1992. Artificial Neural Networks: Concepts and
Control Applications. IEEE Press. (ordered for library 10/8/93)

Rosenblatt, F. 1961. Principles of Neurodynamics: Perceptrons and the
Theory of Brain Mechanisms. Washington D.C.: Spartan Books.

Rosenblatt, F. 1960. Perceptron Simulation Experiments. Proc. IRE.
Vol. 48, pp. 301-309. March.

Rosenfeld, A. and A.C. Kak. 1982a. Digital Picture Processing. Vol. 1.
Academic Press.

Rosenfeld, A. and A.C. Kak. 1982b. Digital Picture Processing. Vol. 2.
Academic Press.

Rosenfeld, A. 1992. Survey, Image Analysis and Computer Vision: 1991.
CVGIP: Image Understanding, Vol. 55, No. 3, May, pp. 349-380.

Rosenfeld, A. 1993. Survey, Image Analysis and Computer Vision: 1992.
CVGIP: Image Understanding, Vol. 58, No. 1, July, pp. 85-135.

Rosenfeld, A. 1990. Survey, Image Analysis and Computer Vision: 1989.
Computer Vision, Graphics, and Image Processing, Vol. 50, pp. 188-240.

Rosenfeld, A. 1989. Survey, Image Analysis and Computer Vision: 1988.
Computer Vision, Graphics, and Image Processing, Vol. 46, pp. 196-264.

Rosenfeld, A. 1988. Survey, Image Analysis and Computer Vision: 1987.
Computer Vision, Graphics, and Image Processing, Vol. 42, pp. 234-293.

Schalkoff, R. 1992. Pattern Recognition: Statistical, Structural and
Neural Approaches. Wiley.

Shapiro, S. (ed.) 1990. Encyclopedia of Artificial Intelligence. ??
(in reference section of Magee library).

Therrien, C.W. 1989. Decision Estimation and Classification. New York:
John Wiley.

Uhr, L. 1966.
Pattern Recognition: Theory, Experiment, Computer Simulations, and
Dynamic Models of Form Perception and Discovery. New York: John Wiley.

van Camp, D. 1992. Neurons for Computers. Scientific American,
Sept. 1992.

Wasserman, P.D. 1989. Neural Computing - Theory and Practice.
New York: van Nostrand Reinhold.

Widrow, B. and M.A. Lehr. 1990. 30 Years of Adaptive Neural Networks.
Proceedings of the IEEE, 78, No. 9, Sept. 1990.

Winston, P.H. 1992. Artificial Intelligence. 3rd ed. Addison-Wesley.