University of Ulster, Magee College.
BSc. Applied Computing (1220), Year 4.
Course: Image Processing (AC460).
Autumn Term 1995.
Lecturer: J.G. Campbell.
Date: / /94

2. Digital Image Fundamentals.
------------------------------

2.1 Visual Perception
---------------------

Here, briefly, are some points about human visual perception:

- the perceived image may differ from the actual light image (i.e. the perceived BRIGHTNESS image is a considerably modified 'copy' of the physical light intensity emanating from the scene),

- there are two types of light sensor on the retina - rods and cones,

- rods are more sensitive than cones; rods are used for night (scotopic) vision; rods are largely colour insensitive (e.g. no colour is evident in moonlight),

- cones are used in brighter light; cones can sense colour,

- perceived (subjective) brightness (Bs) is roughly a logarithmic function of light intensity (L): thus, if you multiply L by 10, Bs increases by only 1 unit; multiply L by 100 and Bs increases by 2 units; by 1000, 3 units; etc.

- the visual system can handle a range of about 10**10 (10 thousand million) in light intensity (from the threshold of scotopic vision to the glare limit), (Q. how many bits is that?)

- to handle this range, the eye must adapt by opening and closing the pupil; opening the pupil - in darkness - lets more light in; closing it - in bright light - lets less light in,

- the eye can handle a range of only about 160 at any one instant, i.e. when there is no opening or closing of the pupil; this is one reason why 8 bits (256 levels) are normally enough in a display memory.

2.2 An Image Model
------------------

Note: in this chapter I am a bit cavalier about physical units; it would take too much time to deal rigorously with the physics of light radiation.

2.2.1 A General Imaging System.
------------------------------

A general camera-based sensing arrangement is shown in Figure 2.2-1: the object, some distance from the camera lens, is projected onto the image plane. At the image plane there is a mosaic of light sensitive sensors; these have the effect of transforming the two-dimensional continuous image lightness function, fi(y,x), into a discrete function, f'[r,c], where r(ow) and c(olumn) are the discrete spatial coordinates; as in Chapter 1, we use square brackets, [,], to indicate a discrete domain. Eventually, f'[.] gets digitised to yield a digital image, fd[r,c] (where digital often connotes discrete in space, in addition to integer valued; since all the images under discussion will be digital, we drop the 'd' subscript in normal usage).

Figure 2.2-1 Image Capture Schematic
------------------------------------

In most image sensors, the mosaic of light cells completely covers the image plane, and the light cell corresponding to f[r,c] has a finite area, say

    A = (yr-sy/2 <= y <= yr+sy/2, xc-sx/2 <= x <= xc+sx/2),

i.e. a cell centred on (yr, xc) with sides sy and sx, and so the sensing process involves integration (averaging) as well as spatial sampling:

    fd[r,c] = double integral over A of fi(y,x) dy dx          (2.2-1)

Thus, we arrive at a digital image, fd[r,c], where fd can take on discrete values [0,1..G-1], and r is in [0,1..n-1], c is in [0,1..m-1] (from now on we drop the 'd', i.e. fd[,] -> f[,]), i.e.

    f[r,c] : [0,n-1] x [0,m-1]  ->  [0,G-1]
                  (domain)           (range)

which can be viewed as a matrix (two-dimensional array) of numbers:

             | f[0,0]    f[0,1]    ...  f[0,m-1]   |
             | f[1,0]    f[1,1]    ...  f[1,m-1]   |
    f[r,c] = |   .         .              .        |          (2.2-2)
             |   .         .              .        |
             | f[n-1,0]  f[n-1,1]  ...  f[n-1,m-1] |

In many image processing applications, f[r,c]
is represented by an 8-bit byte (f -> [0..255]); the range [0..255] derives not only from storage convenience, but from the facts that: (a) human eyes can, simultaneously, perceive only about 160 light levels (see section 2.1), and, (b) most optical sensors struggle to exceed a signal-to-noise ratio of about 48 decibels [20.log10(256) is approximately 48 dB].

Mostly we will be dealing with monochrome images - i.e. f[r,c] represents a grey level. In a colour image f[r,c] must give a colour. From the point of view of image processing, a colour image can be represented by three monochrome images, each representing the intensity of a primary colour (e.g. red, green, blue). Thus, fr[r,c], fg[r,c], and fb[r,c], for red, green and blue. Of course, we can generalise to any number of 'wavebands' / 'colours', in or out of the visible spectrum. A generalised 'colour' image is represented by f[b,r,c], where b denotes colour (b = band), where, normally, b = 0, 1, and 2, for red, green, and blue.

2.2.2 Radiometric Measurement and Calibration.
----------------------------------------------

2.2.2.1 Motivation.
-------------------

In chapter 1 we defined an image: " ... monochrome image, refers to a two-dimensional brightness function f(x,y), where x and y denote spatial coordinates, and the value of f at any point (x,y) gives the brightness (or, grey level) at that point".

For this section it would be better to talk of light intensity or lightness (instead of brightness). Correct terms: LIGHTNESS describes the real physical light intensity; BRIGHTNESS is only in the mind.

Think now of the scene as a flat two-dimensional plane - a sheet of coloured paper. Its lightness, f(x,y), is the product of two factors:

    i(x,y) - the illumination of the scene, i.e. the amount of light falling on the scene, at (x,y),

    r(x,y) - the reflectance of the scene, i.e. the ratio of reflected light intensity to incident light.
    (2.2-3)    f(x,y) = i(x,y).r(x,y)

Naturally occurring ranges of values of i and r:

    Illumination (i):
        Sunny day at surface of earth    9000 units
        Cloudy day                       1000
        Full moon                        0.01
        Office lighting                  100

    Reflectance (r):
        Snow                             0.93
        White paint                      0.80
        Stainless steel                  0.65
        Black velvet                     0.01

N.B. Pure white (r=1.0) and pure black (r=0.0) are hard to achieve.

2.2.2.2 Uneven Illumination.
----------------------------

More often than not, when we sense a scene, we want to measure r(x,y), so we assume that i(x,y) is a constant, I0, so that f(x,y) = r(x,y).I0. Thus, except for the multiplicative constant, we have r(x,y).

If illumination is NOT constant across the scene, then we have problems disentangling which variations are due to r, and which are due to i.

2.2.2.3 Uneven Sensor Response.
-------------------------------

Most modern electronic cameras are Charge-Coupled-Device (CCD) based. In a CCD you have a rectangular array of light sensitive devices, i=0,1,...n-1, j=0,1,...m-1, at the image plane. The voltage given out by each device is proportional to the amount of light falling on it.

Often it is assumed that an image f(x,y) arriving at the camera's image plane is converted into values (analogue or digital), fc(x,y), which are proportional to f(x,y), i.e.

    (2.2-4)    fc(x,y) = f(x,y).K

If K = K(x,y), i.e. it varies across the image plane, then we have an uneven sensor response, whose effect is the same as that of uneven illumination. However, in this case, if K(x,y) can be relied on to stay constant with time, we can estimate it, e.g. by imaging a sheet of constant reflectance under constant illumination. This is RADIOMETRIC CALIBRATION. An example is given at the end of the Chapter.

2.3 Imaging Geometry.
---------------------

2.3.1 General.
--------------

Figure 2.3-1 shows the imaging geometry (Rosenfeld & Kak, 1982b). The reference frame (x,y,z) is based on the image plane. P, at coordinates (x,y,z), is a general point in the scene, and Pc, at (u,v,0), is its projection onto the image plane.
By similar triangles, the following relationships hold:

    u = -f.x / (z - f)        v = -f.y / (z - f)          (2.3-1)

(here f is the focal length of the lens, not the image function of section 2.2).

Reference frame (x,y,z) is based on the centre of the image plane; O is the origin. The coordinate frame (x',y',z') is based on the object.

Figure 2.3-1 Imaging Geometry.
------------------------------

2.3.2 Geometric Distortion.
---------------------------

Eqn. 2.3-1 yields two important pieces of information: first, the image is inverted (u and v have the opposite sign to x and y), and, second, there is a scale change: the larger z, the smaller the image.

Normally, camera users are unaware of the inversion; the recording process takes care of it. Clearly, however, scaling is a problem, since the size of the image changes with distance from the camera; it is not easy to ensure that the object remains at a fixed distance. The problem is exacerbated if the object is tilted with respect to the image plane: there are then different scalings for x and y, and we have perspective distortion, see Figure 2.3-2a.

In addition, due to imperfections, lens systems may be subject to other forms of geometric distortion, involving non-linear terms in x, y and cross terms; typical are barrel distortion and pincushion distortion, see Figures 2.3-2b and 2.3-2c. These show the distorted images of an object consisting of orthogonal, parallel, and equally spaced lines ruled on a plane.

Figure 2.3-2 Geometric Distortion: (a) perspective distortion, (b) barrel distortion, (c) pin-cushion distortion.
---------------------------------

The existence of a range of distortion types, each with its own parameters, has serious implications for machine vision and pattern recognition: essentially they increase the 'search-space' for any matching procedure; an alternative, but equivalent, interpretation is that they introduce extra 'invariance' requirements on a recognition algorithm (see Chapter 8).
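
As an illustration only, the projection of eqn. 2.3-1 can be sketched in C roughly as follows. The names ImagePoint, project_point and focal_len are hypothetical (they do not come from these notes), and focal_len is assumed to play the role of f in eqn. 2.3-1, renamed so that it does not clash with the image function f.

```c
#include <assert.h>

/* Image-plane coordinates (u, v) of a projected scene point (eqn. 2.3-1). */
typedef struct {
    double u;
    double v;
} ImagePoint;

/*
 * Project the scene point (x, y, z) onto the image plane.
 * focal_len stands for f in eqn. 2.3-1 (an assumed name, not from the notes).
 */
ImagePoint project_point(double x, double y, double z, double focal_len)
{
    ImagePoint p;
    double denom = z - focal_len;

    assert(denom != 0.0);            /* eqn. 2.3-1 is undefined when z = f */
    p.u = -focal_len * x / denom;    /* minus sign: the image is inverted  */
    p.v = -focal_len * y / denom;    /* larger z gives a smaller image     */
    return p;
}
```

For instance, with focal_len = 1.0 the point (1.0, 2.0, 11.0) maps to u = -0.1, v = -0.2; moving the same point back to z = 21.0 halves both values, which is the scale change discussed in section 2.3.2.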

2.3.3 Geometric Calibration
---------------------------

In cases where we must make accurate spatial measurements from an image, it may be necessary to calibrate it geometrically. Essentially, this entails performing, numerically, the inverse of the distortion introduced when the image was created.

2.3.4 Object Frame versus Camera Frame
--------------------------------------

Figure 2.3-1 shows two reference frames: (x,y,z), based on the camera, and (x',y',z'), based on the object. The position of the origin of the object frame (its range) with respect to the camera frame, together with the relative orientation of the object frame, is called its pose (in the aerospace industry the orientation is called attitude). Thus, pose = range vector (r = (rx,ry,rz)) plus attitude, which consists of:

    pitch = rotation about the x-axis,
    yaw   = rotation about the y-axis, and
    roll  = rotation about the z-axis.

2.3.5 Lighting Angles.
----------------------

As mentioned in section 2.2, the spatial and colour distributions of the light source are important factors. In addition, directionality may be important: e.g. (a) directional light from a single oblique source causes shadows; (b) a light source close to the camera axis may cause specular reflection from shiny surfaces.

2.4 Sampling and Quantisation.
------------------------------

See chapter 1. Be aware of:

- the squared increase in data volume with increase in spatial resolution; i.e. go from 2 mm x 2 mm pixels to 1 mm x 1 mm and the number of pixels increases by a factor of FOUR (not two),

- ditto as the image size increases.

Look at Gonzalez and Woods, pp. 35-37, to see the effects of reducing resolution (sampling grid), and of reducing grey levels; notice how contouring becomes evident in Figure 2.10 (e) (16 levels, 4 bits), and (f) (8 levels, 3 bits).

2.5 Colour.
-----------

2.5.1 Electromagnetic Waves and the Electromagnetic Spectrum.
-------------------------------------------------------------

Light is a form of energy conveyed by waves of electromagnetic radiation.
The radiation is characterised by its wavelength; the range of wavelengths is called the ELECTROMAGNETIC (EM) SPECTRUM. Visible light occupies a very small part of the spectrum. Table 2.5-1 shows the EM spectrum: the left hand columns give the wavelength in metres, the middle gives the name of the band, and the right gives the frequency of the radiation in Hertz (cycles per second).

------------------------------------------------------------------------------
Wavelength (m)                  Name                          Frequency (Hz)
------------------------------------------------------------------------------
10^-15      1 femtometre (fm)   Gamma rays                    3 x 10^23
10^-12      1 picometre         X-rays                        3 x 10^20
10^-9       1 nanometre         X-rays                        3 x 10^17
10^-8       10 nm               Ultraviolet                   3 x 10^16
10^-7       100 nm              Ultraviolet
4 x 10^-7   400 nm              Visible light (violet)
7 x 10^-7   700 nm              Visible light (red)
10^-6       1 micrometre        Infrared (near)               3 x 10^14
10^-5       10 micrometres      Infrared                      3 x 10^13
                                Infrared (heat)
10^-3       1 millimetre        Infrared (heat) + microwaves  3 x 10^11 (300 GHz)
10^-1       0.1 metres          Microwaves                    3 x 10^9 (3 GHz)
1           1 metre             TV etc. (UHF)                 3 x 10^8 (300 MHz)
                                FM radio (VHF)                ~ 100 MHz
10          10 metres           Radio (shortwave)             30 MHz
100         100 metres          Radio (shortwave)             3 MHz
200-600     200-600 m           Radio (medium wave)           1.5 MHz to 500 kHz
1500        1.5 km              Radio (long wave)             200 kHz
------------------------------------------------------------------------------

Table 2.5-1 The Electromagnetic Spectrum.
-----------------------------------------

Thus, crudely, if you were to 'speed-up' the frequency of vibration of a TV signal, you would get microwaves; speed-up microwaves -> heat radiation, -> light -> UV -> X-rays, etc. (Incidentally, domestic microwave cookers work at approx. 2.45 GHz; the microwaves heat food mainly by agitating its water molecules, H2O.)

It is possible to use various parts of the EM spectrum for imaging: e.g. X-rays, microwaves, infrared (near), and thermal infrared. But our major interest will be in visible light.

2.5.2 The Visible Spectrum.
---------------------------

The visible spectrum stretches from about 400 nm to 700 nm. The reason why this part of the spectrum is visible is that the rods and cones in our retinas are sensitive to these wavelengths, and insensitive to the remainder; e.g. if you look at a clothes iron in the dark, you may 'feel' the heat radiated from it, but your eyes will not convert that energy into a light sensation; similarly with microwaves and X-rays: they may cause damage, but you will not 'see' them.

The relative spectral sensitivity of human eyes within the visible spectrum is shown in Figure 2.5-1, with approximate indication of corresponding colours.

[Plot: relative sensitivity (0% to 100%) against wavelength (400 nm to 700 nm), with the colour bands Violet, Blue, Green, Yellow, Orange, Red marked along the top; the curve peaks at about 550 nm.]

Figure 2.5-1 Relative Spectral Sensitivity of the Eye.
------------------------------------------------------

The term 'spectral' is often used - it refers to the electromagnetic radiation frequency SPECTRUM - the range of frequencies which make up the light; we will have cause to cover other forms of spectra (see Chapter 3).

From Figure 2.5-1 we can see that the eye is very sensitive to radiation in the green-yellow range (peak at 550 nm), and relatively insensitive to blue, violet, and deep red; to appear equally bright, a blue light around 475 nm (relative sensitivity approx. 10%) would have to put out 10 times more power than the equivalent green-yellow light.

Why did humans evolve this way? Well, the energy emitted by the sun (at least that part that reaches the earth) has an energy spectrum graph similar to Figure 2.5-1.

2.5.3 Sensors.
--------------

A light sensor is likely to have a similar spectral response curve to Figure 2.5-1, though usually flatter and wider - i.e. more equally sensitive across wavelengths, and sensitive to UV and to near infrared.
If Figure 2.5-1 were the spectral response of a sensor, then a blue light (see above), compared to a green-yellow light of the same power, would produce a sensor output of 10% of the voltage of the green-yellow.

2.5.4 Spectral Selectivity and Colour.
--------------------------------------

We have already mentioned that a colour sensor (e.g. in a colour TV camera) is merely three monochrome sensors: one which senses blue, one green, and one red.

What is meant by sensing blue, green, or red? What we do is arrange for the sensor to have an effective response curve that is high in green (e.g.) and low elsewhere. But we have already said that sensors have a fairly flat curve (maybe 200-1000 nm), so we must arrange somehow to block out the non-green light.

Wavelength sensitive blocking is done by a colour FILTER. A green filter allows through green light but absorbs the rest; similarly blue and red.

[Plot: relative sensitivity (0% to 100%) against wavelength (400 nm to 700 nm), with the colour bands Violet, Blue, Green, Yellow, Orange, Red marked along the top; the curve is high in the green and low elsewhere.]

Figure 2.5-2 Relative Sensitivity of a Green Filter
---------------------------------------------------

So, we use three separate sensors, each with its own filter (blue, green, and red) located somewhere between the lens and the sensor, i.e. we have f[d,r,c], d = 0 (blue), 1 (green), and 2 (red).

2.5.5 Spectral Responsivity.
----------------------------

The relative response of a sensor can be described as a function of wavelength (forget about (x,y) or (r,c) for the meanwhile):

    d(\)        (\ denotes lambda, the wavelength)

The light arriving through the lens can also be described as a function of \ : g(\), and the overall output is found by integration:

    (2.5-1)    voltage = integral from 0 to inf of d(\) . g(\) d\

Obviously, the integral can be limited to (say) 100 nm to 1000 nm.
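
In practice d(\) and g(\) are only known as tables of samples, so the integral of eqn. 2.5-1 has to be approximated numerically. The following is a minimal sketch, assuming equally spaced wavelength samples and using the trapezoidal rule; the function name sensor_output and its parameters are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Approximate eqn. 2.5-1, voltage = integral of d(lambda) . g(lambda),
 * by the trapezoidal rule over n equally spaced wavelength samples.
 *
 *   d[k]  - relative sensor response at the k-th wavelength
 *   g[k]  - incoming light at the k-th wavelength
 *   step  - wavelength spacing between adjacent samples (e.g. in nm)
 */
double sensor_output(const double d[], const double g[], size_t n, double step)
{
    double sum = 0.0;
    size_t k;

    assert(n >= 2 && step > 0.0);
    for (k = 0; k + 1 < n; k++) {
        double left  = d[k] * g[k];          /* integrand at sample k     */
        double right = d[k + 1] * g[k + 1];  /* integrand at sample k + 1 */
        sum += 0.5 * (left + right) * step;  /* area of one trapezoid     */
    }
    return sum;
}
```

For example, with d = {0, 1, 0}, g = {2, 2, 2} and step = 100, the integrand samples are {0, 2, 0} and the two trapezoids contribute 100 each, giving 200.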

If we have a filter in front of the sensor, with relative transmittance (the fraction of energy it lets through) t(\), then the light arriving at the sensor, g'(\), is the product of g() and t():

    (2.5-2)    g'(\) = g(\) . t(\)

and eqn. 2.5-1 changes to:

    (2.5-3)    voltage = integral from 0 to inf of d(\) . g(\) . t(\) d\

or,

    (2.5-4)    voltage = integral from 0 to inf of d(\) . g'(\) d\

2.5.6 Colour Display.
---------------------

So now we have three images stored in memory; how do we display them to produce a proper sensation of colour?

Similarly to our model of a colour camera as three monochrome cameras, a colour monitor can be thought of as three monochrome monitors: one which gives out blue light, one green and one red.

A monochrome cathode ray tube display works by using an electron gun to squirt electrons at a fluorescent screen; the more electrons, the brighter the image; what controls the amount of electrons is a voltage that represents brightness, say fv(r,c).

A monochrome screen is coated uniformly with phosphor that gives out white light - i.e. its energy spectrum is similar to Figure 2.5-1.

A colour screen is coated with minute spots of colour phosphor: a blue phosphor spot, a green, a red, a blue, a green, ... following the raster pattern mentioned in Chapter 1. The green phosphor has a relative energy output like the curve in Figure 2.5-2; the blue has a curve that peaks in the blue, etc.

There are three electron guns - one controlled by the blue image voltage (say, fb(r,c)), one by the green (fg(r,c)) and one by the red (fr(r,c)). Between the guns and the screen, there is an intricate arrangement called a 'shadow-mask' that ensures that electrons from the blue gun reach only the blue phosphor spots, green -> green spots, etc.

2.5.7 Additive Colour.
----------------------

If you add approximately equal measures (I'm being very casual here, and not mentioning units of measure) of blue light, green light and red light, you get white light.
That's what happens on a colour screen when you see bright white: each of the blue, green, and red spots is being excited a lot, and equally. Bring down the level of excitation, but keep the three equal, and you get varying shades of grey.

Your intuition may lead you to think of subtractive colour; filters are subtractive: the more filters, the darker; combine blue, green and red filters and you get black. However, with additive colour, the more light added in, the brighter; the more mixture, the closer to grey - and eventually white.

2.5.8 Colour Reflectance.
-------------------------

[This subsection may be skimmed at the first reading]

All this brings a new dimension to the discussion of illumination and reflectance in section 2.2. Now we can think of illumination (i) and reflectance (r) as functions of \ as well as (x,y).

Thus, the lightness function is now spectral (and therefore a function of \), i.e. f(\,x,y) is the product of two factors:

    i(\,x,y) - the spectral illumination of the scene, i.e. the amount of light falling on the scene, at (x,y), at wavelength \,

    r(\,x,y) - the spectral reflectance of the scene, i.e. the ratio of reflected light intensity to incident light, at (x,y), at wavelength \.

    (2.5-5)    f(\,x,y) = i(\,x,y).r(\,x,y)

Why does an object look green (assuming it is being illuminated with white light)? Simply because its r(\,..) function is high for \ in the green region (500-550 nm), and low elsewhere (again, see Figure 2.5-2).

Of course, illumination comes into the equation: a white card illuminated with green light (in this case i(\,..) looks like Figure 2.5-2) will look green, etc.

2.5.9 Exercises.
----------------

Ex. 2.5-1. Write down cases where you might want to use very narrow band filters, i.e. you want to be very selective about the colour of light you let into the sensor.

Ex. 2.5-2.
A coloured card whose reflectivity is r(\,x,y) is illuminated with coloured light with a spectrum i(\) (constant over the spatial coordinates (x,y)); this is sensed with a camera whose CCD sensor has a responsivity d(\) (again constant over x,y); a filter with transmittance t(\) is used. Show that the overall voltage output is

    v(x,y) = integral of r(\,x,y) . i(\) . t(\) . d(\) d\

Ex. 2.5-3. A blue card is illuminated with white light; explain the relative levels of output from a colour camera for blue, green, red.

Ex. 2.5-4. A blue card is illuminated with red light; explain the relative levels of output from a colour camera for blue, green, red.

Ex. 2.5-5. A blue card is illuminated with blue light; explain the relative levels of output from a colour camera for blue, green, red. What, if any, will be the change from Ex. 2.5-4?

Ex. 2.5-6. A white card is illuminated with yellow light; explain the relative levels of output from a colour camera for blue, green, red.

Ex. 2.5-7. A white card is illuminated with both blue and red lights; explain the relative levels of output from a colour camera for blue, green, red.

Ex. 2.5-8. A blue card is illuminated with both blue and red lights; explain the relative levels of output from a colour camera for blue, green, red; what, if any, will be the change from Ex. 2.5-6?

2.6 Photographic Film.
----------------------

Many images start off as photographs, so film cannot be ignored. Realise that:

- just like the eye, film is limited in the range of illumination that it can handle,

- a camera adapts by opening / closing the lens diaphragm,

- or, by increasing / decreasing exposure time.

2.7 General Characteristics of Sensing Methods.
-----------------------------------------------

2.7.1 Active versus Passive.
----------------------------

Active methods require, in addition to a sensor, a source of energy which illuminates or otherwise probes or excites the object. See Figures 2.7-1 (a), (b), and (c).
Passive methods operate by sensing some emission that emanates naturally (e.g. reflected sunlight) from the object, see Figure 2.7-2.

2.7.2 Methods of Interaction.
-----------------------------

(a) Absorption. Here we assume that the object is relatively transparent, see Figure 2.7-1 (c). This is how X-rays work.

(b) Reflection. See section 2.5.8, and Figures 2.7-1 (a), (b) and Figure 2.7-2 (a).

(c) Emission. See Figure 2.7-2 (b); here the sensed object creates the sensed energy (e.g. a piece of hot metal, the sun).

Figure 2.7-1 Active Sensing Configurations.
-------------------------------------------

Figure 2.7-2 Passive Sensing Configurations.
--------------------------------------------

2.7.3 Contrast.
---------------

For sensing to be effective the sensed signal must change for different parts of the object (otherwise we have the equivalent of a blank screen); CONTRAST defines the magnitude of sensed signal change that differentiates (generally speaking) between object present and not present.

E.g. for X-rays, let G0 be the image grey level corresponding to just soft tissue, and let Gb be the grey level for bone; then the contrast for bone, Cb, is

    (2.7-1)    Cb = (Gb - G0)/G0

2.7.4 Exercises.
----------------

Ex. 2.7-1. (a) What is meant by ACTIVE sensing? (b) Explain how, and why, active infra-red sensing cameras could be used by wildlife film-makers.

Ex. 2.7-2. (a) What is meant by PASSIVE sensing? (b) Explain how, and why, passive infra-red sensing cameras could be used by wildlife film-makers. (c) In a military application, why would passive sensing be preferred to active?

Ex. 2.7-3. Identify and explain one application of aerial thermal infra-red sensing.

Ex. 2.7-4. (a) Explain how a medical X-ray system works. (b) Identify and explain uses of X-ray images, other than medical.

Ex. 2.7-5. Referring to Figures 2.7-1 and 2.7-2, identify a suitable sensing arrangement for detecting flaws (small holes) in paper manufacture.

Ex. 2.7-6.
In Ex. 2.7-5, assume that you have a single line of sensors (512 of them across the moving roll of paper). The sensor is sampled rapidly, giving out 512 samples for every millimetre of paper longitudinal movement; the sensor width also corresponds to a transverse extent of 1 mm.

(a) Assuming that you have a function, say sread(f), that reads the samples into an array f (unsigned char f[512]), and that your computer can keep up with the processing, suggest processing to detect small holes. [Assume a background (normal) readout of 10, and a much higher readout when there is a hole].

Hint:

    #define NPIXELS 512
    unsigned char f[NPIXELS];

    while(1){                 /*do forever*/
        waitForSignal();      /*waits for sampling signal*/
        sread(f);
        for(i=0;i