University of Ulster, Magee College. BSc. Applied Computing (1220), Year 4. Course: Image Processing (AC460). Autumn Term 1995. Lecturer: J.G. Campbell. Date: / /95 3. Transforms in Image Processing. --------------------------------- 3.1 Introduction. ---------------- Linear transformations particularly the Fourier transform, are important tools for image and signal analysis and processing. Fourier and related transforms have wide ranging applications in image enhancement - and general image filtering, in image restoration, in image data compression, in image pattern recognition and in image registration. We will start this chapter with a review of the mathematics of vectors, matrices and transforms. In Section 3.3 we will briefly introduce digital signal processing concepts (the processing one-dimensional sequences of data). Section 3.4 introduces the concept of the Fourier series expansion of functions, and section 3.5 the one-dimensional integral Fourier transform. Section 3.6 defines and describes the discrete Fourier transform (DFT). Section 3.7 introduces a fast algorithm for computing the DFT, the Fast Fourier transform - FFT. Section 3.8 discusses convolution, impulse responses, linear systems, and convolution in continuous systems, and two-dimensional convolution. Section 3.9 describes how the FFT can be used as a tool for performing 'fast' convolutions. Section 3.10 looks at the 'matrix transformation' nature of the DFT. Section 3.11 shows how the DFT can be used to compute correlations. Section 3.12 introduces the two-dimensional version of the DFT (the 2DDFT), and section 3.13 shows that the 2DDFT is 'separable', and indicates the importance of separability in image transforms. Section 3.14 introduces other transforms: the Discrete Cosine transform, and the Walsh-Hadamard transform. Section 3.15 discusses applications of the DFT. 3.2 Mathematical Preliminaries. ------------------------------- 3.2.1. Linear Simultaneous Equations. ------------------------------------- Eqn. 3.2-1 is a system of linear (simultaneous) equations. (3.2-1) y1 = 3.x1 + 1.x2 y2 = 2.x1 + 4.x2 Ex. 3.2.1-1 Practically, Eqn. (3.2-1) could express the following: Price of an apple = x1, price of an orange = x2, (both unknown). Person A buys 3 oranges, and 1 apple and the total bill is 5p (y1). Person B buys 2 oranges, 4 apples and the total bill is 10p (y2). Q. What is x1, the price of apples, and x2, the price of oranges? (1) 5 = 3.x1 + 1.x2 (2) 10 = 2.x1 + 4.x2 (1) => x2 = 5 - 3.x1 substitute into (2) 10 = 2.x1 + 4.(5 - 3.x1) 10 = 2.x1 - 12.x1 + 20 10 - 20 = -10.x1 -10 = -10.x1 Thus, x1 = 1 ------- Now, substitute x1 = 1 into (1) 5 = 3.1 + x2 5 - 3 = x2 x2 = 2 ------ The simultaneous equations 3.2-1 can be written in matrix form as eqn. 3.2-2, (3.2-2) y = A x / \ where y is a two-dimensional vector |y1| |y2| \ / / \ x is a two-dimensional vector |x1| |x2| \ / and A is a 2 row x 2 column matrix / \ |3 1| |2 4| \ / NOTE: a VECTOR is simply an array of numbers. A vector with n numbers is called an n-DIMENSIONAL vector; such a vector represents a point in n-dimensional space. DON'T try to visualise n > 3 ! just think of the n numbers grouped together. In two- or three-dimensions it is possible to visualise a vector as a line with an arrow-head - the arrow indicates the path between the origin (0,0) and the point (x,y) that the vector represents; again, for our purposes this view has limited use. Generally, a system of m equations, in n variables, (3.2-3) y1 = a11.x1 + a12.x2 ...+ a1n.xn y2 = a21.x1 + a22.x2 ...+ a2n.xn ... yr = ar1.x1 + ar2.x2 ...+ arc.xc....+arn.xn ... ym = am1.x1 + am2.x2 ...+ amn.xn can be written in matrix form as, (3.2-4) y = A x / \ where y is an m-dimensional vector |y1| |y2| | | |..| |ym| \ / / \ x is an n-dimensional vector |x1| |x2| | | |..| |xn| \ / and A is an m-row x n-column matrix / \ |a11 a12 a1n| |a21 a22 a2n| | | | ..arc.. | | | |am1 am2 amn| \ / That is, the matrix 'A' is a rectangular array of numbers whose element in row 'r', column 'c' is 'arc'. The matrix A is said to be m x n, i.e. m rows, n columns. Vectors can be considered as specialisations on matrices, i.e. matrices with only one column. Thus x is m x 1, and y is n x 1. The set of equations in (3.2-3) or (3.2-4) can be interpreted as the definition of a function which takes n arguments (x1, x2.. xn) and returns m variables (y1, y2..ym). Such a function is also called a TRANSFORMATION: it transforms n-tuples of real numbers to m-tuples of real numbers. Equation 3.2-4 is a LINEAR transformation because there are no terms in xr**2 or higher, only in xr. 3.2.2 Matrix Multiplication --------------------------- We may multiply two matrices A, m x n, B, q x p, as long as n = q. Such a multiplication produces an m x p result. Thus, (3.2-5) C = A B mxp mxn nxp Method: The element at the rth row and cth column of C is the product (DOT product, or INNER product) of the rth row vector of A with the cth column vector of B. I.e. pictorially: n p p ---------------- ---------- ----------- |----> | | | | | | | A | | B | | = | C | | | | | | | | m | | | | | n | | m ---------------- | V | ----------- | | | | ---------- [NB. From now on, for convenience of typing, we will leave out the | around matrices] Thus, C = A B, A = a11 a12 B = b11 b12 a21 a22 b21 b22 (3.2-5) C = (a11b11 + a12b21) (a11b12 + a12b22) (a21b11 + a22b21) (a21b12 + a22b22) Example. 3.2-1. Consider eqns. 3.2-3, and 3.2-4, y = A x, thus, the product of A (m x n) and x (n x 1) y1 = a11.x1 + a12.x2 ...+ a1n.xn row1 col1 = product of 1st row of A with 1st col of x etc. The product is (m x n) x (n x 1) to the result is (m x 1), i.e. y. Ex. 3.2.2-2. Apply the transformation given by / \ |3 1| |2 4| \ / to x = 1 2 3.2.3 Graphics Transformations. ------------------------------- Consider SCALING and ROTATION in computer graphics. Here the vectors are x y and the output, transformed, vector is x' y' The scaling transformation takes the form: (3.2-6) x' = x.Sx y' = y.Sy That is, x is expanded (Sx > 1) or contracted (Sx < 1) to give x'; ditto y-axis. Sx and Sy are called scaling factors. The matrix is Sx 0 0 Sy The following transformation rotates (x,y) a clockwise angle B about the origin (0,0): (3.2-7) x' = x.cosB + y.sinB y' = -x.sinB + y.cosB The matrix is, (3.2-8) R(B) = cosB sinB -sinB cosB Ex. 3.2.3-1 (a) Rotate the point 0.707 0.707 clockwise by 45 degrees. [Note: Sin(45) = 0.707 = 1/sqrt(2) = Cos(45); 0.707 * 0.707 = 0.5 ] (b) rotate the point 0 1 clockwise by 90 degrees. Ex. 3.2.3-2. What is the effect of applying the rotation matrix twice. That is, what is R(B). R(B).| x | | y | ? The following formulae may be useful: sin(a+b) = sin(a)cos(b) + cos(a)sin(b) cos(a+b) = cos(a)cos(b) - sin(a)sin(b) Ex. 3.2.3-3. What is the effect of applying the negative rotation, -B, to a point that has already been rotated by +B. That is, what is R(-B). R(B).| x | | y | ? 3.2.4 Multiplication by a Scalar. --------------------------------- (3.2-9) c . a11 a12 = c.a11 c.a12 a21 a22 c.a21 c.a22 3.2.5 Addition of Matrices. --------------------------- (3.2-10) a11 a12 + b11 b12 = a11+b11 a12+b12 a21 a22 b21 b22 a21+b21 a22+b22 The matrices must be the same size (dimensions). 3.2.6. Inverses of Matrices. ---------------------------- Only for square matrices (m = n). Consider eqn (3.2-1) y1 = 3.x1 + 1.x2 y2 = 2.x1 + 4.x2 i.e. y = A x where y = y1, x = x1, and A = 3 1 y2 x2 2 4 Apply this to x = 1 2 y = y1 = 3.1 + 1.2 = 5 y2 = 2.1 + 4.2 = 10 What if you know y = 5, 10 and want to retrieve x = x1, x2? Answer. Apply the inverse transformation to y. That is, multiply y by the inverse of the matrix. (3.2-11) x = A^^ y (normally write A^^ as A-1, pronounce 'A-inverse') In the case of a 2 x 2 matrix A = a11 a12 a21 a22 (3.2-12) A^^ = (1/det(A)) . a22 -a12 -a21 a11 where det(A) = a11.a22 - a12.a21 If det(A) is zero, then A is not invertible, it is SINGULAR. Thus for A = 3 1 det(A) = 3.4 - 2.1 = 10 2 4 so A^^ = (1/10) . 4 -1 = 0.4 -0.1 -2 3 -0.2 0.3 Therefore, apply A^^ to 5 10 A^^ . y = x 0.4 -0.1 . 5 = (5 . 0.4) + (10 . -0.1) = 1 -0.2 0.3 10 (5 . -.2) + (10 . 0.3) = 2 which is what we started off with, i.e. x = 1 2 Note: in Ex. 3.2.1-1, we were actually doing something that is very similar to inverting the matrix A = 3 1 2 4 3.2.7 Diagonal Matrices. ------------------------ The scaling matrix mentioned in section 3.2.3 Sx 0 0 Sy is DIAGONAL, i.e. the only non-zero elements are on the diagonal. The inverse of a diagonal matrix a11 0 0 a22 is 1/a11 0 0 1/a22 3.2.8 Transpose of a Matrix (At) -------------------------------- Only for square matrices. (3.2-13) If A = a11 a12 At = a11 a21 a21 a22 a12 a22 I.e. replace column 1 with row 1 etc. Usually written A superscript T. Pronounce 'A-transpose'. 3.2.9 The Identity Matrix ------------------------- (3.2-14) I = 1 0 0 1 i.e. produces no transformation effect. Thus, (3.2-14a) I . A = A Note: If (3.2-15) A . B = I then B= A^^ (inverse of A) 3.2.10 Orthogonal Matrix ------------------------ A matrix which satisfies the property: (3.2-16) A . At = I i.e. the transpose of the matrix is its inverse (see eqn 3.2-15) Another way of viewing this is: For each row of the matrix (ar1 ar2 .... arn), the dot product with itself is 1, and with all other rows 0 (see section 3.2.13). I.e. n (3.2.17) # arc.apc = 1 , for r = p c=1 = 0 otherwise. 3.2.11 Complex Numbers ---------------------- A complex number is simply a convenient way of representing the pair of numbers that represent the coordinates (x,y) of points in a plane, (3.2-18) z = x + jy where j = sqrt(-1). In many ways, a complex number is like a two-dimensional vector. The modulus of the complex number (which may be interpreted as the distance between the origin and (x,y) ) is given by: (3.2-19) |z| = |x+jy| = sqrt{ x*x + y*y} i.e. using Pythagoras' Theorem The angle, or ARGUMENT, which may be interpreted as the angle between the line (0,0) to (x,y) and the x-axis, is given by: (3.2-20) arg(z) = arctan(y/x) i.e. the angle whose tangent (opposite/adjacent) is y/x. Addition: --------- If z = x + jy, w = u +jv (3.2-21) z+w = x+u + j(y+v) A graphical interpretation of addition of complex numbers is, - draw a line from (0,0) to (x,y), - using (x,y) as the origin, draw a line to (u,v), - the point reached is (x+u, y+v). I.e. VECTOR addition. Multiplication: If z = x + jy, w = u +jv z.w = (x + jy).(u + jv) = x.u + j.j.y.v + j.(y.u + x.v) j.j = -1 (sqrt(-1).sqrt(-1) = -1) (3.2-22) z.w = (x.u - y.v) + j.(y.u +x.v) Note: if complex numbers have zero imaginary parts, the rules given here collapse to the rules of normal arithmetic for real numbers. Ex. 3.2.11-1. Verify the last statement, i.e. set y, v = 0 in (3.2-21) and (3.2-22). The COMPLEX CONJUGATE of a complex number, c = a + jb is c* = a - jb 3.2.12 Complex Numbers and Matrices. ----------------------------------- Matrices and vectors can contain complex numbers. The rules for matrix addition, multiplication, given above, all apply; we just replace the normal addition, multiplication with the complex versions given in section 3.2.11. 3.2.13 Vector Inner (Dot) Product --------------------------------- Let vector x = x1 i.e. x = (x1, x2, .. xn)T (T denotes transposed) x2 . . xn and vector y = (y1, y2, .. yn)T The INNER PRODUCT (DOT PRODUCT) of x and y is the matrix product (see section 3.2.2) (3.2-23) x.y = xT . y dimensions: 1x1 1xn nx1 n (3.2-24) x.yT = # xi.yi i = 1 If the dot product of two vectors is 0, they are said to be ORTHOGONAL, see section 3.2.10 above, and 3.2.16, 17 and 18 below. 3.2.14 Vector Addition. ---------------------- Three n x 1 vectors x, y, z: (3.2-25) z = x + y z1 = x1+y1 z2 = x2+y2 ... zn = xn+yn THE REMAINING SUB-SECTIONS 3.2.15 TO 3.2.19 ARE REQUIRED ONLY FOR Pattern Recognition (Chapter 8) and Neural Networks (Chapter 9). 3.2.15 Distance Between Vectors. ------------------------------- Considering n-dimensional vectors as points in n-dimensional space, we can talk about the distance, d, between vectors x and y: (3.2-26) d = sqrt[(x1-y1)^2 + (x2-y2)^2 + ... + (xn-yn)^2] or n (3.2-27) d = sqrt[ # (xi-yi)^2 ] i = 1 Ex. 3.2.15-1 Verify eqn. (3.2-27) for the points (1,1), (1,3) | 3 + * (1,3) | 2 + | dim 2 1 + * (1,1) | +---+---+---+---+---+--- 1 2 3 4 dim 1 d = sqrt[ (1-1)^2 + (1-3)^2 ] = sqrt[ 0 4 ] = 2 3.2.16 Length or Magnitude of a Vector. -------------------------------------- Considering now an n-dimensional vector as the line joining the origin to its 'point' in n-dimensional space, we can talk about the LENGTH - or, more usually, the MAGNITUDE - of a vector as the distance between the vectors x and the origin (0,0,0...): (3.2-28) |x| = sqrt[(x1-0)^2 + (x2-0)^2 + ... + (xn-0)^2] or n (3.2-29) |x| = sqrt[ # (xi)^2 ] i = 1 Ex. 3.2.16-1. Verify that the magnitude of the vector (1,0) is 1. Ex. 3.2.16-2. Verify that the magnitude of the vector (0,2) is 2. Ex. 3.2.16-3. Verify that the magnitude of the vector (1,1) is 1.414; i.e. sqrt(2). Note that it is NOT 1, as you can easily verify by sketching. 3.2.17 Normalised or Unit Length Vectors. ---------------------------------------- Quite often we are just interested in the relative DIRECTIONs of vectors (see Chapters 8 and 9) and, for easier comparison, we would like to reduce all vectors to unity magnitude - this is called NORMALISATION. Normalisation is performed by the following (scaling - see section 3.2.3) transformation: (3.2.30) xi' = xi/|x| where xi' is the ith component of the normalised vector, xi is the ith component of the original vector, and |x| is the magnitude of the original vector, see eqn. 3.2-29. Ex. 3.2.17-1 Normalise the vector (0,1); Ans. (0,1). Ex. 3.2.17-2. (a) Normalise the vector (0,2); Ans. (0,1) (b) Verify, with a diagram, that normalisation has retained the 'direction' of the original vector. Ex. 3.2.17-3. (a) Normalise the vector (1,1); Ans. (0.707,0.707). (b) Verify. Ans. 0.707 = 1/sqrt(2) magnitude = sqrt(x1^2 + x2^2) = sqrt[{1/sqrt(2)}^2 + {1/sqrt(2)}^2] = sqrt[ 1/2 + 1/2] = 1 Ex. 3.2.17-4. Verify that, in two-dimensions, all unit vectors lie on the unit circle (a circle with centre at the origin (0,0) and with radius 1. 3.2.18 Template Matching of Unit Vectors. ---------------------------------------- Quite often we wish to compare two vectors, x and y (assume they have already been normalised). One way is to compute the distance between them (see section 3.2.15). n (3.2-27) d = sqrt[ # (xi-yi)^2 ] i = 1 Then we can use the common sense rule: SMALL distance => SIMILAR BIG distance => DIFFERENT Alternatively, we can compute how well the corresponding components correlate, i.e. perform a 'template' matching by multiplying corresponding components and summing (i.e. a dot product - see section 3.2.13): n (3.2-31) c = [ # (xi.yi) ] i = 1 Then we can use the rule: BIG correlation value (c) => SIMILAR SMALL => DIFFERENT [See section 3.2.13: if c = 0 the two vectors are orthogonal] Maximising correlation is equivalent to minimising distance. ----------------------------------------------------------- Intuitive proof (two-dimensions): Expand (3.2-27) for the case of 2-dimensions: (3.2-32) d = [ (x1-y1)^2 + (x2-y2)^2] = [ x1^2+y1^2 -2.x1.y1 + x2^2+y2^2 -2.x2.y2 ] = [ x1^2+x2^2 + y1^2+y2^2 -2.x1.y1 -2.x2.y2 ] 2 2 2 d = [ # xi^2 + # yi^2 -2.# xi.yi ] i=1 i=1 i=1 Since x and y are normalised to unit magnitude: 2 2 # xi^2 = # yi^2 = 1 i=1 i=1 Thus, (3.2-33) d = -2.[c - 1] When the vectors are the same, x = y, so that c = # xi.yi = # xi.xi = 1 , since x is unit magnitude. This is the highest possible value that c can attain (i.e. when the vectors are the SAME, the components are completely matched / correlated). In which case (3.2-33) gives d = -2[1 -1] = 0 (i.e. when the vectors are the SAME, the distance between them is 0) 3.2.19 Vectors in Neural Networks and Pattern Recognition. ---------------------------------------------------------- Many approaches to automatic 'pattern recognition' (especially neural networks) use representations of patterns as arrays of numbers (vectors). The recognition process often involves finding the 'most similar', of a set of stored patterns, to an unknown pattern. Obviously, the distance is clearly a good measure of similarity: small distance large similarity; clearly also, 'template-matching' or correlation is intuitively appealing. We have shown that they both give the same result. Neural Networks: --------------- The computation performed by a single neuron, as used in artificial neural networks, is simply the dot product between the input excitations xi and the weights, wi, n (3.2-34) sum = [ # (xi.wi) ] i = 1 followed by passing 'sum' through some threshold function, such as: (3.2.35) output = 1 if sum > T = 0 otherwise. Note the similarity with 'template-matching'. Example 3.2.19-1. Figure 3.2.19-1 shows a letter 'C' in a small (3 x 3) part of a digital image (a digital picture). A digital picture is represented by brightness numbers (pixels) for each picture point. Now, represent the nine pixel values as elements of a vector. Assuming the character is white-on-black and that bright (filled in with '*') corresponds to '1', and dark to '0', the components of the vector corresponding to the 'C' are: x1=1, x2=1, x3=1, x4=1, x5=0, x6=0, x7=1, x8=1, x9=1. Pixel number: 1 2 3 +----+----+----+ |****|****|****| |****|****|****| +----+----+----+ 4 |****|5 |6 | |****| | | +----+----+----+ |****|****|****| |****|****|****| +----+----+----+ 7 8 9 Figure 3.2.19-1 A Letter 'C' --------------------------- The letter 'T' would give a different observation vector: 'T': 1,1,1, 0,1,0, 0,1,0 'O': 1,1,1, 1,0,1, 1,1,1 'C': 1,1,1, 1,0,0, 1,1,1 etc. 3.3 Digital Signal Processing. ------------------------------ 3.3.1 Introduction. ------------------- As introduced in Chapter 1, a continuous image is a two-dimensional lightness function f(x,y), where x and y denote spatial coordinates, and the value of f at any point (x,y) gives the lightness (or, grey level) at that point. Likewise a (continuous) signal is a one-dimensional function (usually of time) f(t), where the value of f(.) at t gives the intensity at (time) t. The function, f(.), could represent anything: the loudness of sound on a record, on a telephone line; it could represent temperature, pressure, the retail-price-index, the price of sliced pan-loaves..., although these last two are liable to be sampled (discrete) rather than continuous. The sorts of objectives we have in signal processing match those of image processing: - clean up signals so that humans find them more useful, e.g. remove noise from a telephone signal, to make it more audible. - automate human tasks. E.g. recognition of spoken words. Whilst the main impetus for digital image processing came from the need for cleaning up pictures from space probes, and tasks like optical character recognition, much of early digital signal processing research was based on seismic data processing for oil prospecting; such was the complexity of the required processing, that available analogue (continuous) methods were inadequate and the processing had to be done (digitally) on computers. We have already indicated the forces that have driven systems from analogue/continuous to discrete/digital; signal processing has also been subject to these forces and more and more signal processing is becoming digital. 3.3.2 Finite Sampled Signals. ----------------------------- The domain of the function f(.) is, strictly, - infinity to + infinity, but, for practical purposes, attention is often limited to some piece of the infinite domain: t0 <= t <= t1, or t1 <= t <= t1, where t0, t1 are finite. This is the same as, in imaging, limiting our attention to some finite rectangular part of an infinite image. (Notation: the literature on signal processing commonly uses x(t) as the 'general' signal). We use xc(t) to explicitly state that xc() is continuous. If xc(t) is sampled (just like the image in Chapter 1) we obtain an array of numbers, (a discrete or digital data sequence), x[n] , n = 0,1,2 ..,N-1 where N is the number of samples. Notation: The [] denote a discrete function. Typically, x[n] is obtained by sampling x(t) PERIODICALLY at intervals T, i.e. (3.3-1) x[n] = xc(nT), n=0,1,2, ...,N-1 Of course, x[n] is some digital representation of a real number. Figure 3.3-1 show examples of some sampled signals. If these were continuous, we would have two changes: (a) instead of existing just at discrete sample points, n = 0, 1, 2 etc. they would exist continuously -- there would be a continuous line joining the points, (b) the discrete / integer abscissa n would be replaced by a real valued one, e.g. t (for time). 1 * x[n] | * * | | +---1---*---3---4---5---*---7----------------------> n | | | * * -1 | * ---------------------------------------- n 0 1 2 3 4 5 6 7 ---------------------------------------- x[n] 1 0.707 0 -.707 -1 -.707 0 0.707 ---------------------------------------- Figure 3.3-1 Sampled Cosine Signal, x[n] = cos(2PI.n/N) ------------------------------------------------------- 1 | * x[n] | * * | | *---1---2---3---*---5---6---7----------------------> n | | | * * -1 | * ---------------------------------------- n 0 1 2 3 4 5 6 7 ---------------------------------------- x[n] 0 0.707 1 .707 0 -.707 1 -.707 ---------------------------------------- Figure 3.3-2 Sampled Sine Signal, x[n] = sin(2PI.n/N) ----------------------------------------------------- Ex. 3.3-0 (a) Use a calculator to verify the values in Figures 3.3- 1 and 3.3-2. (b) Verify that 0.707 = sqrt(2)/2. (c) Verify that both cos and sin are periodic, i.e. the repeat themselves for n = 8, 9, ... 1 * x[n] | | | +---*---*---*---*---*---*---*-----> n 0 1 2 3 4 5 6 7 -------------------------------------- n 0 1 2 3 4 5 6 7 -------------------------------------- x[n] 1 0 0 0 0 0 0 0 -------------------------------------- Figure 3.3-3 Impulse Signal. ---------------------------- 1 * * * * * * * * x[n] | | | +--------------------------------> n 0 1 2 3 4 5 6 7 -------------------------------------- n 0 1 2 3 4 5 6 7 -------------------------------------- x[n] 1 1 1 1 1 1 1 1 -------------------------------------- Figure 3.3-4 DC (Direct Current) Signal. --------------------------------------- 3.3.3 Sampling Frequency. ------------------------- The symbol T, in eqn 3.3-1, is the sampling PERIOD. The sampling frequency (fs) is (3.3-2) fs = 1/T Nyquist Sampling Theorem: If a continuous signal contains only frequencies in the range 0 to fmax (the BANDWIDTH), then by sampling it at twice fmax we obtain a discrete sequence from which it is possible to EXACTLY reconstruct the original continuous signal; that is, there is no loss of information. (3.3-3) fs = 2.fmax , for no loss of information. Example 3.3-1. For most voice recognition tasks, the effective bandwidth is 5 KHz (5,000 cycles per second). Thus, fs=10 Khz Sampling period, T: T = 1/fs = 1/10,000 seconds = 100 microseconds Ex. 3.3-2. If hi-fi music has a bandwidth of 15 Khz, calculate the minimum allowable sampling frequency fs; what is the corresponding sampling period T. Ex. 3.3-3. Compact Discs (CDs) actually use fs=44.1 KHz. (a) What is sampling period, T? (b) For one second of music, what is N in eqn. 3.3-1? Ans. 44,100. But N.B. requires two channels - 88,200 - for stereo. (c) one minute? (d) one hour? (e) Assuming 16 bits per sample, how many Megabytes are there on a one-hour CD. Ans. 635 Megabytes. 3.3.4 Amplitude resolution. --------------------------- If the range is properly chosen, 12-bits ([0..4095]), is adequate for hi-fi reproduction of audible signals; CDs use 16-bits. 3.3.5 Frequencies. ------------------ When we say fmax is 15 KHz, we mean that the most rapidly varying component (xm(t)) of the signal corresponds to a 'pure' tone, or sinusoid, (sine function) whose frequency is 15KHz. I.e. (3.3-4) xm(t) = am . sin (2.PI.fmax.t) where PI = 3.14159... am is the 'amount' of that frequency present. Example. 3.3-4. If fmax is 1 Hz, work out the values of xm(t) for time samples t = 0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0 Assume am = 1. t 2.PI.t xm(t) = sin() --------------------------------------------------- 0.0 0.0 0.0 0.125 0.785 0.707 0.25 1.57 1.0 0.375 2.36 0.707 0.5 3.14 0.0 0.625 3.93 -0.707 0.75 4.71 -1.0 0.875 5.50 -0.707 1.0 6.28 0.0 ---------------------------------------------------- Ex. 3.3-5. Plot xm(t) from Ex. 3.3-4. Recall Figure 3.3-2 1 | * x[n] | * * | | *---1---2---3---*---5---6---7----------------------> n | | | * * -1 | * ---------------------------------------- n 0 1 2 3 4 5 6 7 ---------------------------------------- x[n] 0 0.707 1 .707 0 -.707 1 -.707 ---------------------------------------- Figure 3.3-2 Sampled Sine Signal, x[n] = sin(2PI.n/N) ----------------------------------------------------- Ex. 3.3-6. If fmax is 2 Hz, work out the values of xm(t) for t = 0, 1/16, 1/8, 3/16, 1/4 .... 15/16, 1 Assume am = 1. Plot them. Ex. 3.3-7. If fmax is 0.5 Hz, work out the values of xm(t) for t = 0, 1/8, 1/4 .... 7/8, 1 Assume am = 1. Plot them. 3.3.6 Phase. ------------ Example. 3.3-8. If fn is 1 Hz, work out the values of xcn(t) = cos(2.PI.fn.t) for t = 0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0 t 2.PI.t xcn(t) = cos() --------------------------------------------------- 0.0 0.0 1.0 0.125 0.785 0.707 0.25 1.57 0.0 0.375 2.36 -0.707 0.5 3.14 -1.0 0.625 3.93 -0.707 0.75 4.71 0.0 0.875 5.50 0.707 1.0 6.28 1.0 ---------------------------------------------------- If you plot this you will find it is the same shape as sin(), but is shifted along the t-axis by 0.25 (i.e. shifted by PI/2). Thus, cos() and sin() are 'out of PHASE' by PI/2. Phase is a term for 'where-you-are-in-the-cycle'. It is usually measured in radians: phase = ph = [0 .. 2.PI] Sometimes degrees ph = [0 .. 360] Usually written with a Greek letter lower case 'phi'. Human auditory perception is not usually sensitive to phase. 3.3.7 Periodic Signals. ----------------------- A signal x(t) is periodic with period Td if: (3.3-4) x(t) = x(t+Td) Thus, the 1Hz sines and cosines, above, are periodic with period 1 sec. The 2Hz signals have a period 0.5. A 4Hz signal would have a period 0.25, etc. In analysis of signals it is often convenient to assume that signals are periodic. This can be done by taking a sufficiently large chunk of signal and assuming that it is repeated. 3.4 Fourier Series. ------------------- 3.4.1 General -------------- Roughly speaking, ANY signal can be represented as a weighted sum of pure sinusoids: (3.4-1a) xn(t) = sin (2.PI.fn.t), t = t0..t1. nmax (3.4-1b) x(t) = # an.xn(t) n=0 where an is the 'weight' of the contribution from frequency fn Ex. 3.4-1. What does component 0 (n=0) look like? Thus if the signal is sin(2.PI. 1 . f. t), i.e. a pure 1Hz sinusoid, nmax=1 a0 = 0 a1 = 1.0, fnmax = 1.0 Notation: it is useful to introduce a symbol for 2.PI.f (the RADIAN FREQUENCY). Greek lower-case 'omega' is normally used, we will use 'w'. Thus sin(wt) = sin(2.PI.f.t) Ex. 3.4-1. Add the following components: 1.sin(wt) + (1/3).sin(3wt) + (1/5).sin(5wt) ... where w = 2.PI.1. As you add more terms you will get something which proceeds from the 'sine-wave' shape, to a 'square-wave' shape. STRICTLY, eqn 3.4-1 will not completely work. We must (1) include cosines as well, (2) insist that the signal x(t) is periodic. ---------------------------------------- nmax (3.4-2) x(t) = a0 + # { an.cos(n.wd.t) + bn.sin(n.wd.t) } n=1 where wd = 2.PI/Td, Td is the (repetition) period, i.e. wd corresponds to the frequency (fd) of repetition Unless you know otherwise, nmax is infinity. The infinite series, x(t)= a0 + a1.cos(1.wd.t) + b1.sin(1.wd.t) + a2.cos(2.wd.t) ...... is called the FOURIER SERIES corresponding to x(t). Note that a0 corresponds to the n=0 term in (3.4-1). Notation: in what follows, t2 ! x(t) dx t1 denotes the integral of x(t) between t1 and t2. The coefficients an, bn, in (3.4-2) are evaluated by the expressions, +Td/2 (3.4-3a) a0 = (1/Td) ! x(t) dt -Td/2 +Td/2 (3.4-3b) an = (2/Td) ! x(t).cos(n.wd.t) dt -Td/2 +Td/2 (3.4-3c) bn = (2/Td) ! x(t).sin(n.wd.t) dt -Td/2 These are the Fourier coefficients of x(t). Example 3.4-2. If x(t) = sin (1.t), wd=1, and Td = 2.PI/wd = 2.PI there is only one component, b1 +Td/2 b1 = (2/Td) ! x(t).sin(1.1.t) dt -Td/2 This is just the same as integrating from 0 to Td, +Td = (2/Td) ! sin(1.t).sin(1.1.t) dt 0 (Integral of sin**2(t) = t/2 - sin(2t)/4) = (2/Td){ [t/2 - sin(2t)/4)] evaluated @t=Td -[t/2 -sin(2t)/4)] evaluated @t=0 } = (2/Td){ Td/2 - sin(2Td) - 0 - sin(0)} = (2/Td){ Td/2 - 0 } , since 2Td = 4.PI and sin(4.PI)=0 b1 = 1, as expected! ------- All the above says is that: ANY PERIODIC WAVEFORM CAN BE REPRESENTED BY WEIGHTED SUM OF A NUMBER OF HARMONICS OF THE FUNDAMENTAL REPETITION FREQUENCY. Harmonic: = integer multiple of a fundamental frequency. 3.4.2 Orthogonal Functions. -------------------------- [Recall orthogonal vectors - section 3.2.13] In example 3.4-2 all the other coefficients are zero because the functions, cos(n.wd.t), sin(n.wd.t) are ORTHOGONAL in the range -Td/2 to +Td/2, i.e. +Td/2 (3.4-4a) (2/Td) ! sin(k.wd.t).sin(n.wd.t) dt = 1, for k=n -Td/2 = 0 otherwise Similarly, +Td/2 (3.4-4b) (2/Td) ! cos(k.wd.t).cos(n.wd.t) dt = 1, for k=n -Td/2 = 0 otherwise and, +Td/2 (3.4-4c) (2/Td) ! sin(k.wd.t).cos(n.wd.t) dt = 1, for k=n -Td/2 = 0 otherwise 3.4.3 Finite Fourier Series. ---------------------------- If in eqn. 3.4-2, nmax (3.4-2) x(t) = a0 + # { an.cos(n.wd.t) + bn.sin(n.wd.t) } n=1 where wd = 2.PI/Td, Td is the (repetition) period, i.e. wd corresponds to the frequency (fd) of repetition the largest frequency present is just less than N.wd, i.e. N times the repetition frequency, then we can set nmax = N-1 and we can represent x(t) with a0, (N-1) a-coefficients and (N-1) b-coefficients. i.e. a total of 2N-1 coefficients. 3.4.4 Complex Fourier Series. ----------------------------- Eqn. 3.4-2 nmax (3.4-2) x(t) = a0 + # { an.cos(n.wd.t) + bn.sin(n.wd.t) } n=1 can be written even more compactly as: +nmax (3.4-5) x(t) = # { cn.exp(j.n.wd.t) } n=-nmax Now the coefficients are complex numbers cn = an + jbn, [ j=sqrt(-1) ] Note that: (3.4-6a) exp(jB) = cos(B) +j.sin(B) (3.4-6b) cos(B) = (exp(jB) + exp(-jB))/2 (3.4-6c) sin(B) = (exp(jB) - exp(-jB)/2.j Now the coefficients are given by: +Td/2 (3.4-7) cn = (1/Td) ! x(t).exp(-j.n.wd.t) dt -Td/2 Equations 3.4-5 and 3.4-7 introduce nothing new - they are just shorthand for 3.4-2 and 3.4-3, respectively. Ex. Verify the last statement by substituting eqn 3.4-6a in (3.4-5). cn = (an - jbn)/2, c-n = (an + jbn)/2 and c0 = a0 3.5 The Fourier Transform. -------------------------- If we now relax the 'periodic' restriction, i.e. let Td -> infinity, we have +infinity (3.4-5) x(t) = # { cn.exp(j.n.wd.t) } and n=-infinity +Td/2 (3.4-7) cn = (1/Td) ! x(t).exp(-j.n.wd.t) dt -Td/2 Td -> infinity => wd = 2.PI/Td -> dw (ie VERY small) n.wd -> n.dw (call this w) Eqn. (3.4-7) now becomes: +inf (3.5-1) cn = (dw/2.PI) ! x(t).exp(-jwt) dt -inf Eqn. (3.4-5) is now a sum of infinitely many terms, i.e. an integral: +infinity (3.5-2) x(t) = ! cn.exp(jwt) -infinity Substituting eqn. 3.5-1 in (3.5-2) +inf +inf (3.5-3) x(t) = ! dw.[ ! x(t)exp(-jwt)dt].exp(jwt)/(2.PI) -inf -inf The thing in the [] is a function of angular frequency (w) and is called the FOURIER TRANSFORM of x(t). ----------------------------------------------------------- Fourier transform: ------------------ +inf (3.5-4) X(w) = (1/2.PI) ! x(t).exp(-jwt) dw -inf ----------------------------------------------------------- If we substitute X(w) = ... in (3.5-3) we get ----------------------------------------------------------- Inverse Fourier transform: -------------------------- +inf (3.5-5) x(t) = (1/2.PI) ! X(w)exp(+jwt) dw -inf ----------------------------------------------------------- The simplest way of thinking about the Fourier transform is that it gives an estimate, at each value of w, the 'amount' of (a) cos (wt) in the signal (b) sin (wt) in the signal Here you can get an estimate for continuous w. On a 'graphic-equaliser' you see much larger 'w-blocks'. Further, no distinction is made between sin(), and cos() - human ears being insensitive to phase cannot tell the difference between cos() and sin() ! X(w) is a complex number: (3.5-6) X(w) = Xr(w) + j Xi(w) where Xr, and Xi, are the real part, and imaginary part, respectively. The Xrs correspond to the cosines, the Xis correspond to sines. Negative Frequencies?? Just a mathematical convenience. Physically, a negative frequency is interpreted as the same as a positive frequency. The Fourier Transform, X(w), is often called the SPECTRUM of x(t). If you take the modulus of the complex numbers: (3.5-7) Xa(w) = |X(w)| = |Xr(w)+jXi(w)| = sqrt(Xr*Xr + Xi*Xi) Xa(w) is called the AMPLITUDE spectrum; Xa*Xa is called the POWER spectrum. Likewise, by calculating the ARGUMENT (angle) of the complex number Xr + jXi, (3.5-8) Xp(w) = arg(X(w)) = arctan(Xi(w)/Xr(i)) we get the PHASE spectrum. 3.6 Discrete Fourier Transform. ------------------------------- 3.6.1 Definition. ---------------- The Discrete Fourier Transform (DFT) of the digital signal / sequence x[n] , i=0,1...N-1, is N-1 (3.6-1) X[u] = (1/N) # x[n] exp(-j.2PI.u.n/N) n=0 u=0,1...N-1 The Inverse DFT (IDFT) is N-1 (3.6-2) x[n] = # X[u] exp(+j.2PI.u.n/N) n=0 Interpretation of DFT: If x[n] has been produced by sampling at fs = 1/T, the largest frequency present in x[n] is fs/2 (section 3.3.3). Examine eqn. 3.6-1, and look at just the cos part, N-1 (3.6-1) X[u] = (1/N) # x[n] exp(-j.2PI.u.n/N) n=0 cos part: N-1 (3.6-3) Xc[u] = (1/N) # x[n] cos(2PI.u.n/N) n=0 at u=0 : cos() = cos(2PI.(0).n/N) ------- = cos(0) = 1 i.e. we are matching x[n] with a constant (sometimes called the DC component - DC = direct current) signal. at u=1 : cos() = cos(2PI.1.n/N -------- this term varies slowly from cos(0) at n=0 to cos(2PI.(N-1)/N)) = cos(2PI) i.e. just one cycle in the sequence. at u=2: we get 2 cycles in the sequence ------- at N/2: we get N/2 cycles in the sequence, i.e. 2 samples per ------- cycle, i.e. very rapidly varying, this corresponds to fs/2. 3.6.2 Discrete Fourier Spectrum. ------------------------------- As with the continuous Fourier Transform, X(w), X[u] is often also called the SPECTRUM of x[n] - or the DISCRETE spectrum. If you take the modulus of the complex numbers, X[u]=Xc[u]+j Xs[u]: (3.6-4) Xa[u] = |X[u]| = |Xc[u]+jXs[u]| = sqrt(Xc[u]*Xc[u] + Xs[u]*Xs[u]) Xa[u] is called the AMPLITUDE spectrum; Xa*Xa is called the POWER spectrum. Likewise, by calculating the ARGUMENT (angle) of the complex number (3.6-5) Xp[u] = arg(X[u]) = arctan(Xs[u]/Xc(i)) we get the PHASE spectrum. 3.6.3 Interpretation. -------------------- The components near the beginning of the DFT correspond to the 'slowly varying' parts of the signal; the components near the middle, the 'fast moving' parts. Domains: Often when working with sequences x[n], or signals x(t) we say we are in the TIME DOMAIN; in image processing, this becomes the SPATIAL DOMAIN. If we are working with X[u], or X(w), then we are in the FREQUENCY DOMAIN. 3.6.4 Frequency Discrimination by the DFT. ------------------------------------------ In general, if we have N samples, the spacing of each frequency sample is fs/N. (3.6-6) df = fs/N The larger the N, the finer is our discrimination of frequency. If N -> infinity, then we have a continuous spectrum, i.e. the continuous Fourier transform. Above N/2 we have no additional information; these terms correspond to the 'negative' frequencies of the continuous transform. Example 3.6-1. Consider a CD signal sequence; it (one channel) is sampled at fs=44.1 KHz. Take a one second sequence, i.e. 44,100 samples. Take a DFT of x[n], n = 0, 1, ...,44099. We have df = 44100/44100 = 1 Hz, i.e. a sampling of 1 Hz of the spectrum. The highest frequency is N/2; this is fmax = 44100/2 = 22050 i.e. the highest frequency represnted (fs/2). Example 3.6-2. Show how the DFT obtained in Example 3.6-1 could be used to create 'graphic-spectrum-analyser' output in 10 'bands' representing the frequency range 0 to 15 Khz? 0 to 15 Khz, so we are interested in the first 15000 frequency samples. There are 10 bands so each band will have 1500 Hz. First band: 0 - 1500 Hz 2nd 1501-3000 Hz etc. Therefore, for band 1, sum the AMPLITUDE spectrum from w = 0 to w= 1500. See eqn. 3.5-7. Band 2: sum for w= 1501 to 3000 etc. Ex. 3.6-3. (a) What is the DFT of a sequence containing the following single cycle of cosine-wave. Think, before you calculate. 1 * x[n] | * * | | +---1---*---3---4---5---*---7----------------------> n | | | * * -1 | * ---------------------------------------- n 0 1 2 3 4 5 6 7 ---------------------------------------- x[n] 1 0.707 0 -.707 -1 -.707 0 0.707 ---------------------------------------- Answer: Note all real, imag. part = 0, for all u. ---------------------------------------- u 0 1 2 3 4 5 6 7 ---------------------------------------- Xr[u] 0 1/2 0 0 0 0 0 1/2 Xi[u] 0 0 0 0 0 0 0 0 ---------------------------------------- (b) What is the amplitude spectrum? Answer: ---------------------------------------- u 0 1 2 3 4 5 6 7 ---------------------------------------- Xa[u] 0 1/2 0 0 0 0 0 1/2 ---------------------------------------- (c) Phase spectrum? Ex. 3.6-4. (a) What is the DFT of a sequence containing the following single cycle of sine-wave. Again, think, before you calculate. 1 | * x[n] | * * | | *---1---2---3---*---5---6---7----------------------> n | | | * * -1 | * ---------------------------------------- n 0 1 2 3 4 5 6 7 ---------------------------------------- x[n] 0 0.707 1 .707 0 -.707 1 -.707 ---------------------------------------- Answer: Note all imaginary, real part = 0, for all u, i.e. just the opposite of the cosine. ---------------------------------------- u 0 1 2 3 4 5 6 7 ---------------------------------------- Xr[u] 0 0 0 0 0 0 0 0 Xi[u] 0 1/2 0 0 0 0 0 1/2 ---------------------------------------- (b) What is the amplitude spectrum? Answer: ---------------------------------------- u 0 1 2 3 4 5 6 7 ---------------------------------------- Xa[u] 0 1/2 0 0 0 0 0 1/2 ---------------------------------------- (c) Phase spectrum? Ex. 3.6-5. (a) What is the DFT of a sequence containing an 'impulse' x[n] = {1,0,0,0,0.....}; work out the first few terms. ANS: Recall the definition of the DFT, eqn. (3.6-1) N-1 (3.6-1) X[u] = (1/N) # x[n] exp(-j.2PI.u.n/N) n=0 u=0,1...N-1 We can separate this into a 'cos' part, Xc[], and and a 'sin' part, X[s[]: N-1 Xc[u] = (1/N) # x[n].cos(2PI.u.n/N) n=0 N-1 Xs[u] = (1/N) # x[n].sin(2PI.u.n/N) n=0 And, since exp(-jB) = cos(B) - j.sin(B), X[u] = Xc[u] -j.Xs[u] Thus, work out Xc[]: N-1 Xc[u] = (1/N) # x[n].cos(2PI.u.n/N) n=0 Xc[0] = (1/N)[ x[0].cos(2PI.0.0/N) + x[1].cos(2PI.0.1/N) +etc..] = (1/N)[ 1.1 + 0 . 1 + etc. all zeros] since x[0] = 1, and x[i] = 0 i =1..N-1 and, cos(0) = 1. = 1/N Xc[1] = (1/N)[ x[0].cos(2PI.1.0/N) + x[1].cos(2PI.1.1/N) +etc..] = (1/N)[ 1.1 + 0.(don't care) + etc. all zeros] since x[0] = 1, and x[i] = 0 i =1..N-1 = 1/N Similarly Xc[2] ... Xc[7], all = 1/N Now Xs[]: N-1 Xs[u] = (1/N) # x[n].sin(2PI.u.n/N) n=0 Xs[0] = (1/N)[ x[0].sin(2PI.0.0/N + .....] = (1/N)[ 1.0 + 0 + 0 .... since sin(0) = 0 Xs[0] = 0 Similarly Xs[1] ... Xs[7] all = 0 Thus, X[0] = 1/N + j.0 = 1/N = 1/8 = 0.125 | imaginary part Thus: 1 * x[n] | | | +---*---*---*---*---*---*---*-----> n 0 1 2 3 4 5 6 7 -------------------------------------- n 0 1 2 3 4 5 6 7 -------------------------------------- x[n] 1 0 0 0 0 0 0 0 -------------------------------------- DFT -- real, imaginary part = 0. ---------------------------------------------- u 0 1 2 3 4 5 6 7 --------------------------------------------- X[u] 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8 ---------------------------------------------- (b) What is the amplitude spectrum? Work out the first few terms, you will be able to guess the remainder. Recall the definition of the discrete amplitude spectrum, eqn (3.6- 5): Xa[u] = |X[u]| = |Xc[u]+j.Xs[u]| = sqrt(Xc*Xc + Xs*Xs) Thus, the amplitude spectrum of the impulse is simply ---------------------------------------------- u 0 1 2 3 4 5 6 7 --------------------------------------------- X[u] 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8 ---------------------------------------------- I.e. an 'impulse' has equal amounts of all frequencies. So, if you fed impulses into your amplifier, the 'graphic-spectrum-analyser' would show an equal reading on all bands. Ex. 3.6-6 Work out the DFT, amplitude spectrum of a DC signal, i.e. {1, 1, 1, 1, ...}. A pattern should emerge for u= 1,2,3 ... 1 * * * * * * * * x[n] | | | +--------------------------------> n 0 1 2 3 4 5 6 7 -------------------------------------- n 0 1 2 3 4 5 6 7 -------------------------------------- x[n] 1 1 1 1 1 1 1 1 -------------------------------------- [Hint: this sequence correlates exactly with the constant sequence [cos(0), cos(0), cos(0) .....] =[1, 1, 1, ...] and does not correlate with any other of the transform sequences, e.g. cos(2PI.1.i/N), i=0,1,..7 the most slowly varying cos() cos(2PI.2.i/N), i=0,1,..7 the next cos()] Thus, work out Xc[]: N-1 Xc[u] = (1/N) # x[n].cos(2PI.u.n/N) n=0 Xc[0] = (1/N)[ x[0].cos(2PI.0.0/N) + x[1].cos(2PI.0.1/N) +etc..] = (1/N)[ 1.1 + 1.1 + etc. all 1.1s] since x[0] = 1, and x[1] = 1, etc. 2,3..7 and, cos(0) = 1. = (1/8)[1+1... 8 of them] = (1/8)[8] = 1 Xc[1] = (1/N)[ x[0].cos(2PI.1.0/N) + x[1].cos(2PI.1.1/N) +etc..] = (1/N)[for every positive cos() we have a negative] i.e. they all cancel = (1/8)[0] = 0 Similarly Xc[2] ... Xc[7], all = 0 Similarly, the Xs[] - they are all 0 - following the argument for Xc[1]. Thus: ---------------------------------------------- u 0 1 2 3 4 5 6 7 --------------------------------------------- X[u] 1 0 0 0 0 0 0 0 ---------------------------------------------- Thus, the DFT of the constant sequence [1,1,1,1...] is an 'impulse'. Note: we have already see that the DFT of an 'impulse' is a constant sequence; i.e. the impulse and the constant sequence are DUALs of one another. (b) What is the amplitude spectrum? ANS: ---------------------------------------------- u 0 1 2 3 4 5 6 7 --------------------------------------------- X[u] 1 0 0 0 0 0 0 0 ---------------------------------------------- (c) What is the phase spectrum? Ex. 3.6-7. What is the DFT of the following sequence. Look at Exs. 3.6-3, and 3.6-6, and think, before you calculate; the DFT is a linear transformation, i.e. DFT(x[] + y[]) = DFT(x[]) + DFT(y[]) 2 * x[n] | * * | | 1 + * * | | | * * 0 +---1---2---3---*---5---6---7----------------------> n 0 ---------------------------------------- n 0 1 2 3 4 5 6 7 ---------------------------------------- x[n] 2 1.707 1 -.293 0 -.293 1 1.707 ---------------------------------------- Answer: Note all real, imag. part = 0, for all u. ---------------------------------------- u 0 1 2 3 4 5 6 7 ---------------------------------------- Xr[u] ? ? ? ? ? ? ? ? Xi[u] 0 0 0 0 0 0 0 0 ---------------------------------------- Ex. 3.6-8. From what has been said above about 'square waves', give a qualitative estimate of the amplitude spectrum of a single cycle square-wave x[n] = {1,1,1..up to N/2-1, 0 at N/2, -1,-1,-1 up to N-1} 3.6.5 Implementation of the DFT. -------------------------------- i.e. let us write a program to perform the DFT. Recall the definition of the DFT: Forward: N-1 (3.6-1) X[u] = (1/N) # x[n] exp(-j.2PI.u.n/N) n=0 u=0,1...N-1 Inverse DFT: N-1 (3.6-2) x[n] = # X[u] exp(+j.2PI.u.n/N) n=0 First, we have to solve the problem of how to represent complex numbers; this can be done simply by splitting X[u] into the real, Xr[u], and imaginary, Xr[u], parts: (3.6-7) X[u] = Xr[u] + j.Xi[u] (for all the cases we deal with, Xr == Xc (cosine component) Xi == Xs (sine) ) Actually, see below, there will be a negative sign on the sin term. Now, using, (3.6-8) exp(j.B) = cos(B) + j.sin(B) exp(-j.B) = cos(B) - j.sin(B) and, in general, the sequence to be transformed may be complex (actually for a signal, xi[n] will be zero, but we must be general especially for the inverse transform). (3.6-9) x[n] = xr[n] +/- j.xi[n] Therefore, (3.6-10) x[n].exp(j.B) = (xr[n] +j.xi[n]).(cos(B) + j.sin(B)) = {xr[n].cos(B) - xi[n].sin(B)} + j.{xi[n].cos(B) +/- xr[n].sin(B)} real part imag. part Hence, (3.6-11) real part N-1 Xr[u] = (1/N) # {xr[n].cos(2PI.u.n/N) - xi[n].sin(2PI.u.n/N)} n=0 for u=0,1...N-1 If x[] is real, i.e. xi[n] == 0 for all n, this simplifies back to the same as the cosine part, Xc[] (3.6-12) N-1 Xr[u] = (1/N) # {xr[n].cos(2PI.u.n/N)} n=0 for u=0,1...N-1 (3.6-13) imag. part N-1 Xi[u] = (1/N) # {xi[n].cos(2PI.u.n/N) + xr[n].sin(2PI.u.n/N)} n=0 for u=0,1...N-1 Again if x[] is real, this simplifies to: (3.6-14) N-1 Xi[u] = - (1/N) # xr[n].sin(2PI.u.n/N) n=0 for u=0,1...N-1 Using eqns. 3.6-11, and 3.6-13, both forward and inverse DFTs can be done by the same code; for inverse ( X[u] -> x[n] ), we simply replace the negative sign on the sin part, by positive, and we don't divide across by N. But to simplify things, let us assume that we are transforming forward, and that the signal is real. Hence, the software, function rfdft (Real, Forward DFT): void rfdft( float x[], /*- real input -*/ float xro[], float xio[],/*- real and imag. outputs -*/ int N) /*- length -*/ { float pi = 3.1415926,tpi; int n, u; tpi=2.0*pi; for(u=0;u data[1], xim0 -> data[2], ... i.e. xre0, xim0, xre1, xim1, ..., xreN-1, ximN-1; Likewise, the output (transformed) data are interleaved: Xre0, Xim0, Xre1, Xim1, ..., XreN-1, XimN-1; or using our earlier notation: Xc0, Xs0, Xc1, Xs1, ..., XcN-1, XsN-1; N: is the length of the array - which must be a power of 2 (2, 4, 8, 16, 32, 1024, ... etc.) isign: defines whether the transform is Inverse (+1) or Forward (-1); all this does is set the sign in the exp( ) to select the appropriate equation: N-1 (3.6-1) X[u] = (1/N) # x[n] exp(-j.2PI.u.n/N) n=0 N-1 (3.6-2) x[n] = # X[u] exp(+j.2PI.u.n/N) n=0 A typical calling program: /* get data from image store */ d=0; /*dimension 0 - assume there is only one */ r=0; /*row 0 - assume there is only one row */ for(j=1,c=cl;c<=ch;c++,j+=2){ IMget(&val,d,r,c,ims); data[j]=val; data[j+1]=0.0; /*zero complex part*/ } /* now do fft */ four1(data,len,-1);/* we use -1 for forward*/ /* Now extract the DFT re. and imag. parts. and, if required, compute the amplitude and phase spectrum - recall eqns. (3.6-4) and (3.6-5). (3.6-4) Xa(w) = |X[u]| = |Xc[u]+jXs[u]| = sqrt(Xc*Xc + Xs*Xs) Xa[u] is called the AMPLITUDE spectrum; Xa*Xa is called the POWER spectrum. Xp[u] = arg(X[u]) = arctan(Xs[u]/Xc(i)) = PHASE spectrum. */ /* now extract re and cmplx parts*/ for(j=1,c=cl;c<=ch;c++,j+=2){ re=data[j]; /* Real and Imaginary are interleaved*/ im=data[j+1]; if(ftrri){ IMput(&re,d,r,c,imd0); IMput(&im,d,r,c,imd1); } else if(ftrap){ /*amplitude and phase reqd. */ amp=sqrt(re*re+im*im); pha=0.0; if(fabs(re)>100.0*FLT_EPSILON)pha=atan2(im,re); IMput(&,d,r,c,imd0); IMput(&pha,d,r,c,imd1); } else if(ftra){ /*amplitude only reqd.*/ amp=sqrt(re*re+im*im); IMput(&,d,r,c,imd0); } } Frequencies associated with output data. Positive frequencies: data[1] = Xc[0], cosine(0) - DC term data[2] = Xs[0], sine(1) term; always 0 for real input. data[3] = Xc[1], cosine(1) term; freq 1 => 1.df data[4] = Xs[1], sine(1) term data[5] = Xc[2], cosine(2) term freq 2 => 2.df data[6] = Xs[2], sine(2) term ... data[N+1] = Xc[N/2], cos(N/2) term => (N/2).df data[N+2] = Xs[N/2], sin(N/2) term Negative Frequencies. data[N+3] = Xc[N/2 +1], cos(-[N/2+1]) term data[N+4] = Xs[N/2 +1], sin(-[N/2+1]) term ... data[2N] = Xc[N-1] cos(-1) term data[2N+1]= Xs[N-1] sin(-1) term. If you look at any amplitude or power spectrum, you will see that the negative frequencies contain exactly the same information as the positive, i.e. the spectrum is symmetrical about (N/2).df term. Recall section 3.6.4, where we defined df = fs/N, thus, the highest frequency in the DFT/FFT is (N/2).fs/N = fs/2 which is reassuring, since the whole of sampled data theory is based on the assumption that we cannot represent frequencies higher than sampling frequency/2. The larger the N, the finer is our discrimination of frequency. If N -> infinity, then we have a continuous spectrum, i.e. the continuous Fourier transform. Above N/2 we have no additional information; these terms correspond to the 'negative' frequencies of the continuous transform. Ex. 3.7-1. If, in the program above, the sampling frequency is 10,000 Hz, work out 'df' and, hence, write the few lines of program that compute the amplitude of frequencies (get as close as possible): 0 Hz, 50 Hz, 100 Hz, 3000 Hz, 5000 Hz, 6000 Hz (be careful!). Ex. 3.7-2. If we have data from a CD, and an FFT of length 32768; write the few lines of program that compute the average amplitude in five bands of frequencies between 0 and 15000 Hz. What is the highest frequency available? 3.8 Convolution. ---------------- 3.8.1 General ------------- A great many signal processing operations can be defined in terms of convolution. Here we will deal with discrete convolution, using sums. There is an equivalent continuous convolution which is defined in Section 3.8.5, see below. The discrete convolution of x[], and h[] is given by, +inf (3.8-1) y[n] = x[n] (*) h[n] = # x[n-m].h[m] m=-inf If we agree that h[m] is zero outside the range [0..N-1], we have, N-1 (3.8-2) y[n] = x[n] (*) h[n] = # x[n-m].h[m] m=0 Convolution is commutative, so that, (3.8-3) x[](*)h[] = h[](*)x[], i.e. +inf (3.8-4) y[n] = x[n] (*) h[n] = # x[m].h[n-m] m=-inf You will find eqns 3.8-1 and 3.8-4 used interchangably in the literature. Example 3.8-1. Convolve the sequence x[] = {0,1,0,0....} with the sequence h[] = {1/3,2/3,1,0,0..} These are shown in Figure 3.8-1, x[n] | 1 | * | | *-----*--*--*--*-----------------------------> n 0 1 2 etc. (a) x[] an 'impulse' h[n] | 1 | * | * * +--------*--*--*--*-------------------------> n 0 1 2 etc. (b) h[] a 'half-triangle' hr[n] | 1 * | * | * ----------+--*--*--*--*--*--*----------------------> n -2 -1 0 1 2 etc. (c) h[] reversed ' flipped about the n = 0 Figure 3.8-1 Sequences for convolution. --------------------------------------- Apply eqn 3.8-2, N-1 (3.8-2) y[n] = x[n] (*) h[n] = # x[n-m].h[m] m=0 y[n=0]: m=0 x[n-m].h[m] = x[0-0].h[0] = 0.1/3 =0 m=1 x[0-1].h[1] = 0.2/3 =0 m=2 x[0-2].h[2] = 0.1 =0 y[0] = 0 ======== y[n=1]: m=0 x[n-m].h[m] = x[1-0].h[0] = 1.1/3 =1/3 m=1 x[1-1].h[1] = 0.2/3 =0 m=2 x[1-2].h[2] = 0.1 =0 y[1] = 1/3 ========== y[n=2]: m=0 x[n-m].h[m] = x[2-0].h[0] = 0.1/3 =0 m=1 x[2-1].h[1] = 1.2/3 =2/3 m=2 x[2-2].h[2] = 0.1 =0 y[2] = 2/3 ========== y[n=3]: m=0 x[n-m].h[m] = x[3-0].h[0] = 0.1/3 =0 m=1 x[3-1].h[1] = 0.2/3 =0 m=2 x[3-2].h[2] = 1.1 =1 y[2] = 1 ========== If you continue, you will find that the remainder of ys are zero. Thus convolution can be done mechanically as follows: - reverse ('flip') the convolution sequence h[n] about n=0, i.e. {1/3,2/3,1} becomes {1,2/3,1/3}, call this hr[], see Figure 3.8-1(c) - For each point, n, in the input sequence (x[n]) - overlay hr[] on x[], with hr[]s rightmost point at n (i.e. hr[] slides along by one for each iteration), - multiply the corresponding values, hr[].x[], - sum the products - the sum is the convolved result at [n] SUMMARY: 'flip', sum of products, slide right, sum of products, slide right, .... Ex. 3.8-2. Show that is the convolution of the 'rectangular' sequence {1,1,1,0,0,0,0...} with itself is a 'triangular' sequence; here we do it out mechanically: {1,1,1,0,0,0,0...} {1,1,1,0,0,0,0...} (flip) 1 {1,1,1,0,0,0,0...} {1,1,1,0,0,0,0...} (slide, multiply, add) 2 {1,1,1,0,0,0,0...} {1,1,1,0,0,0,0...} (slide, multiply, add) 3 {1,1,1,0,0,0,0...} {1,1,1,0,0,0,0...} (slide, multiply, add) 2 {1,1,1,0,0,0,0...} {1,1,1,0,0,0,0...} 1 {1,1,1,0,0,0,0...} {1,1,1,0,0,0,0...} 0 Thus, output is y[] = {1,2,3,2,1,0,0,0...} y[n] | 3 | * 2 | * * 1 * * +-------------*--*-------------------------> n 0 1 2 etc. 3.8.2 Impulse Response. ----------------------- In digital signal processing, a sequence consisting of a single '1' is called an 'impulse' sequence, see Example 3.8-1. In the convolution operation, h[] is called the IMPULSE RESPONSE. That is, if we convolve an impulse sequence with h[] we get an output identical to h[]. Likewise, see Ex. 3.9-3, if we convolve any sequence, x[], with an h[] that is an impulse, we get an output identical to x[]. This is easy to verify - replace h[] in Ex. 3.8-2 with [1,0,0,0...] and work it out. 3.8.3 Linear Systems. --------------------- Let y1[n] be the result of passing x1[n] through a 'system', and y2[] the result of x2[] passing through the same system. Then if the system is LINEAR, the result of passing x[] = x1[] + x2[] through the system is, y[] = y1[] + y2[] That is, we can add before or after the system. The term system is very general, we mean anything which applies processing to a sequence. In this Chapter we deal almost exclusively with linear systems (system = filter or other similar processing); in Chapter 4 we will deal with some operations that are not linear (e.g. median filtering). Any filter or operation that can be applied by a straight convolution (one-dimensional, or two-dimensional) is sure to be linear; but if you do anything like squaring, or taking absolute values, and it becomes non-linear. 3.8.4 Some Interpretations of Convolution. ------------------------------------------ (1) Weighted sum: the output is a weighted sum of past inputs; i.e. weighted according to the impulse response. (2) Delayed impulses: any sequence is just a sum of delayed and weighted (i.e. have values other than 1) impulse sequences. Thus the convolution is just a weighted sum of delayed impulse responses. This is a consequence of the linearity of convolution. Linear systems can ALWAYS be completely defined by their impulse response. (3) Tapped Delay Line: a method of applying a weighted sum; of interest only to engineers. Ex. 3.8-3. What is the result of convolving ANY sequence, x[], with h[n] = {1,0,0,0...}, i.e. h[] is an impulse. It may help to recall that: x[] (*) h[] = h[] (*) x[] 3.8.5 Convolution of Continuous Signals. ---------------------------------------- For continuous signals convolution is given by: +inf (3.8-5) y(t) = x(t) (*) h(t) = ! x(t1).h(t-t1).dt1 -inf 3.8.6 Two-Dimensional Convolution. ---------------------------------- Convolution can be extended to two-dimensions in a straightforward manner, if f[r,c] is the input image, h[r,c] the convolution kernel (or convolution mask), and g[r,c] the output image, N-1 recall(3.8-2) y[n] = x[n] (*) h[n] = # x[n-m].h[m] m=0 g[] = f[] (*) h[] N-1 M-1 (3.8-6) g[r,c] = # # f[r-k,c-l].h[k,l] k=0 l=0 As mentioned in the one-d. case, convolution is commutative, so that, g[] = h[] (*) f[] i.e. (3.8-7) is equivalent to eqn. 3.8-6, r c (3.8-7) g[r,c] = # # f[k,l].h[r-k,c-l] k=r-N+1 l=c-M+1 In Chapter 4.11, we will encounter convolution again; there we will have h[] extending from k = -w to +w instead of 0 to N-1 and l = -v to +v instead of 0 to M-1 It is only a matter of convention (cf. storage of h[][] in an array that does not allow negative subscripts). Thus, eqn. (3.8-7) can be further rewritten as: r+w c+v (3.8-7) g[r,c] = # # f[k,l].h[r-k,c-l] k=r-w l=c-v 3.8.7 Digital Filters. ---------------------- 3.8.7.1 Finite Impulse Response. -------------------------------- Recall the discrete convolution equation (3.8-2), rewritten with minor changes of notation (simply: N -> M and h[] -> b[] - this is just to agree with common usage): M-1 (3.8-8) y[n] = x[n] (*) b[n] = # x[n-m].b[m] m=0 This is, in fact, a digital filter. If we allow the index 'n' to represent discrete time (t = n.dt), then eqn. 3.8-2 says that the output of the filter at n (e.g. now) is a weighted weighted sum of the current input (x[n]) and the last N-1 inputs (x[n-1], x[n-2], ..., x[n-N+1]), i.e. see Figure 3.8-1, \ x[n-4] b[4] | x[n-3] b[3] \ x[n-2] b[2] + multiply and sum ---> y[n] x[n-1] b[1] / x[n] b[0] | x[n+1] / x[n+2] Figure 3.8-1 Another Way of Looking at Convolution. -------------------------------------------------- The above digital filter is called a FINITE IMPULSE RESPONSE (FIR) filter, because the impulse response is b[0] .. b[M-1], which is clearly finite in duration; or non-recursive - for reasons which will become evident soon. If we were doing running averages over the last five samples, this would be (3.8-9) y[n] = 1/5 ( x[n] + x[n-1] + x[n-2] + x[n-3] + x[n-4]) i.e. b[i] = 1/5, i=0..4, and 0 otherwise. 3.8.7.2 Recursive or Infinite Impulse Response (IIR). ----------------------------------------------------- The general form for a recursive filter is: M-1 K (3.8-10) y[n] = # x[n-m].b[m] - # y[n-k].a[k] m=0 k=1 That is, to get the current output, we do a weighted sum of the last M inputs (as before), plus, we also use the last K OUTPUTS. Thus, the filter is called RECURSIVE; also INFINITE IMPULSE RESPONSE - because of the recursion, the impilse response is, theoretically, infinite in duration. For M = 5 and K = 3, we have the situation depicted in Figure 3.8- 2, \ y[n-5] x[n-4] b[4] | y[n-4] x[n-3] b[3] \ |a[3] y[n-3] x[n-2] b[2] + mult & sum + <---|a[2] y[n-2] x[n-1] b[1] / | |a[1] y[n-1] x[n] b[0] | +--------> y[n] x[n+1] / x[n+2] Figure 3.8-2 Recursive Filter. ----------------------------- Ex. 3.8-4. Express the moving average of (3.8-9) as a recursive filter. y[n] = 1/5 ( x[n] + x[n-1] + x[n-2] + x[n-3] + x[n-4]) can be reworked recursively as: y[n] = (1/5).( 5.y[n-1] - x[n-5] + x[n]) = y[n-1] - (1/5).x[n-5] + (1/5).x[n] i.e. K = 1, a[1] = 1 N = 5, b[0] = 1/5, b[1] = 0, b[2] = 0, b[3] = 0, b[4] = 0 b[5] = 1/5 . 3.9 Fourier Transforms and Convolution. --------------------------------------- Although the DFT is useful in its own right, one of its greatest uses is as a tool for efficient computation of convolutions. Let N-1 y[n] = x[n] (*) h[n] = # x[n-m].h[m] m=0 Then, (3.9-1) Y[u] = N. X[u].H[u] where X[] is the DFT of x[] H[] is the DFT of h[] and Y[] is the DFT of y[] N is the length of the DFT That is, convolution in the time or spatial domain is replaced by multiplication in the frequency domain (except for the multiplicative factor 1/N). Thus, we can do convolution as follows: (1) Take DFT of x[]: X[u] = DFT(x[]) (2) Take DFT of h[]: H[u] = DFT(h[]) (3) Multiply DFTs : Y[u] = X[u].H[u] , u=1,..N-1 (4) Take InverseDFT of Y, multiplied by N : y[n] =IDFT(N.Y[]) Why go to all this bother, when we have an easily applied formula for the convolution? Unless we implement the DFT using the FFT there is no reason. But with the FFT there is great saving to be made. Consider N = 1024, log2(1024) = 10, [2**10 = 1024]. Let us convolve h[] and x[] each 1024 long sequences. (a) Calculations for straight convolution: N*N (multiplies + add) = 1 Million (b) Using FFT: Step 1. FFT of x : Nlog2(N) complex operations Step 2. FFT of h : ditto Step 3. Multiply X.H: N complex multiplies. Step 4. IFFT of Y : Nlog2(N) complex ops. Thus, a total of (3Nlog2(N) + N) complex operations, assuming multiply takes nearly the same time as mult. + add. = 3000.10 + 1000 complex ops. = 31000 Now if complex operations take about TWICE the time of ordinary, we have: FFT Method versus Straight Convolution -------------------------------------------------- 62,000 1,000,000 -------------------------------------------------- Thus, the FFT is 16 times faster. -------- With larger signals, and with images (as we shall see), the savings are even greater. Ex. 3.9-1. The convolution of a rectangular sequence, (see Ex.3.8-2.) {1,1,1,1,0,0,0,0....} with itself is a triangular sequence, {shaped like a triangle}. If the DFT of the rectangular sequence is a 'sin(x)/x' function - see Gonzalez & Woods p.83, what is the DFT of a 'triangle'. Ex. 3.9-2. Verify that for h[] = [1,0,0,...] (an impulse), and for x[] = anything, eqn. (3.9-1) makes good sense: (3.9-1) Y[u] = N.X[u].H[u] We will work with a sequence of length 8. Recall Ex. 3.6-3, where we worked out the DFT of an impulse h[] = [1,0,0,0...] : ---------------------------------------------- u 0 1 2 3 4 5 6 7 --------------------------------------------- H[u] 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8 ---------------------------------------------- Therefore the right-hand side of eqn. (3.9-1) becomes: = 8. X[u]. (1/8) = X[u] Therefore Y[u] = X[u] u=0,1..7 and if we take the Inverse DFT (IDFT): IDFT(Y[]) = IDFT(X[]) = x[] i.e. we have shown, via eqn. (3.9-1) that the convolution of x[] with an impulse is itself (unchanged). And, see section 3.8.2, we know this to be true. I.e. eqn. (3.9-1) has been verified for this case. Here is an example, from DataLab, of convolution via DFT, see section 3.7.2 for a better description of the FFT function, 'four1'. for(i=1,c=cl;c<=ch;c++,i+=2){ IMget(&val,d,r,c,ims1); data1[i]=val; data1[i+1]=0.0; /*zero complex part*/ IMget(&val,d,r,c,ims2); data2[i]=val; data2[i+1]=0.0; /*zero complex part*/ } /* now do fft */ four1(data1,len,-1);/* see Num. Rec. p 411 NB. use of -1 for forward*/ four1(data2,len,-1); /* now extract re and cmplx parts and multiply*/ for(i=1,c=cl;c<=ch;c++,i+=2){ re1=data1[i]; im1=data1[i+1]; re2=data2[i]; im2=data2[i+1]; /* --- multiply DFTs --- */ re=re1*re2-im1*im2; im=im1*re2+im2*re1; } } four1(dt1,len,1); /* -- inverse DFT --- */ for(i=1,c=cl;c<=ch;c++,i+=2){ re=dt1[i]; IMput(&re,d,r,c,imd); } } 3.10 The Discrete Fourier Transform as a Matrix Transformation. -------------------------------------------------------------- Why is the DFT called a 'transformation? In 3.2.1 we said that multiplying a vector by a matrix is a transformation, i.e. (3.10-1) y = A . x (mx1) (mxn) (nx1) If y is (Nx1) and x is (Nx1), A must be (NxN), (3.10-2) y = A . x (Nx1) (NxN) (Nx1) This can be expressed using summation as, N-1 (3.10-3) y[u] = # aun . x[n] n=0 u=0,1,...N-1 where aun is the element of A in row u, col n y[u] is the uth element of vector y x[n] is the nth element of vector x. Eqn. 3.10-3 is precisely the same form as eqn. 3.6-1 N-1 (3.6-1) X[u] = (1/N) # x[n] exp(-j.2PI.u.n/N) n=0 u=0,1...N-1 where aun is now complex, (3.10-4) aun = (1/N) . exp(-j.2PI.u.n/N) But, as pointed out in sections 3.2.8 and 3.2.9, this makes no difference to the general principle. We repeat an earlier interpretation of the DFT, as follows: row u of the transformation matrix is made up of, cos(2PI.u.n/N) - j.sin(2PI.u.n/N), n=0,1...N-1 i.e. the cos() part is the cosine-wave with frequency 'u', and ditto the sine part, again with the same frequency. The values y[u] depends on how well x[], the input signal, matches the sine and cosine waves. Of course y[] is now complex, with, - the 'real' part corresponding to cos(u...) - the imaginary part corresponding to sin(u..). If you prefer, you can think of the transformation as being non-complex and having 2.N rows, N rows corresponding to 'cos' terms, and N corresponding to 'sin' terms. However, you can lose a lot of mathematical flexibility in this move. As can be imagined, the Inverse DFT just applies the inverse matrix, B = A inverse, (3.10-5) bnu = exp(j.2PI.n.u/N) Ex. 3.10-1. Check that the DFT and IDFT are indeed inverses of each other, i.e. find a few elements of C = A.B where A and B are given by eqns. 3.10-3 and 3.10-5. Check that C is, indeed, the identity matrix (see section 3.2.6), i.e. it has 1s along the diagonal, zero elsewhere. Ex. 3.10-2. Verify that the DFT matrix is orthogonal, i.e. its transpose is its inverse (see section 3.2.7). 3.11 Cross-Correlation. ----------------------- The cross-correlation of two sequences is given defined as: +inf (3.11-1) c[k] = x[n] (c) y[n] = # x[m].y[m+k] m=-inf What we obtain from the cross-correlation is 'how well y[], shifted right by k points, matches x[]. For example, if x[] is exactly the same shape as y[], only delayed by 10 points, then c[] above will exhibit a strong peak at c[k=10]. Cross-correlation is the same as convolution only we do not 'flip' the correlation signal, y[], prior to application; see section 3.8.1 for a discussion of convolution, (and the 'flip'). Thus, cross-correlation can be done efficiently using the FFT - see section 3.9; we just reverse one of the signals, before applying the DFT. As we shall see later, there are a great many applications of cross-correlation in image processing: (1) Template matching. We want to find if (or where) an object of a certain shape is in an image. First, we create an image containing the object - a 'template' (w[], say). Then we cross-correlate the template with the image (f[]), to produce the cross-correlation image c[]. Those pixels in c[] that are greater than a threshold, correspond to centres of 'matching' objects in the image, i.e. objects that match the template. Obviously, 'template-matching' has many applications in 1-dimensional signal processing, e.g. radar signal analysis, electrocardiogram analysis. (2) Image registration. Two images, f1[], and f2[], represent the same scene, perhaps satellite images taken at different dates. But, they are not registered, i.e. pixel [r,c] of f1[] does not correspond to pixel [r,c] of f2[]. Assume thay are shifted in rows and columns, with respect to one another. Solution: cross-correlate the two images, find the point of maximum correlation, the indices of that point give the shifts. Clearly, also, there are many similar applications in 1-d signals. Ex. 3.11-1. The AUTOCORRELATION is the cross correlation of a signal with itself, +inf (3.11-2) axx[k] = x[n] (c) x[n] = # x[m].x[m+k] m=-inf Explain how the DFT can be used to compute this. 3.12 The Two-Dimensional Discrete Fourier Transform. ---------------------------------------------------- The two-dimensional DFT can be written down simply by extending eqn. 3.6-1 to two-dimensions, as shown in eqn. 3.12-1. However, if we left it at that, we would miss a lot, in fact the two-d. DFT can be applied by successive applications of the one-d. DFT - this is why we spent so much time on the one-d. version. For an N row x M column input image f[r,c], c=0..M-1, r=0..N-1. Two-D. DFT: N-1 M-1 (3.12-1) F[u,v]=(1/MN) # # f[r,c].exp[-j.2PI(u.r/N+v.c/M)] r=0 c=0 for u=0,1...N-1, and v=0,1...M-1. Two-D. Inverse DFT: N-1 M-1 (3.12-2) f[r,c]= # # F[u,v].exp[j.2PI(u.r/N+v.c/M)] u=0 v=0 for r=0,1...N-1, and c=0,1...M-1. Interpretation of Two-d. DFT: ----------------------------- Remember that the u th component of the 1-d. version corresponds to how well the input sequence matches the sine-wave, sin(2PI.u.n/N) - imaginary part, and, cos(2PI.u.n/N) - real part. The u, v th component of the 2-d. DFT corresonds to how well the input image (2-dimensions) matches an image made up by multiplying a sine-wave, sin(2PI.u.r/N), along the columns, by a sine-wave, sin(2PI.v.c/M) along the columns; ditto the cosine part. Thus, recalling our interpretation of the one-dimensional DFT in section 3.6: "The components near the beginning of the DFT correspond to the 'slowly varying' parts of the signal; the components near the middle, the 'fast moving' parts". The same is true for the two-dimensional, only now we have to think of relative 'speed of variation' in two-dimensions. 3.13 The Two-Dimensional DFT as a Separable Transformation. ----------------------------------------------------------- In 3.10 we saw that the one-d. DFT could be written as a matrix transformation, (3.10-2) y = A . x (Nx1) (NxN) (Nx1) Things are more complicated for the two-d. version, but only just. To write eqn. 3.12-1, N-1 M-1 (3.12-1) F[u,v]=(1/MN) # # f[r,c].exp[-j.2PI(u.r/N+v.c/M)] r=0 c=0 in matrix form, we have to observe that the transformation KERNEL, exp[-j.2PI(u.r/N+v.c/M)] can be separated into a 'row part', exp[-j.2PI(u.r/N)], and a 'column part', exp[-j.2PI(v.c/M)], by noting that, (3.13-1) exp[-j.2PI(u.r/N+v.c/M)]=exp[-j.2PI.u.r/N].exp[-j.2PI.v.c/M] The consequence of eqn. 3.13-1 is that eqn. 3.12-1 can be written in matrix form as, (3.13-2) F[u,v] = P . f . Q (NxM) (NxN) (NxM) (MxM) Matrix Q is just the (MxM) one-d. DFT matrix (kernel) that corresponds to DFTing each row. Matrix P is the (NxN) one-d. DFT matrix that corresponds to DFTing each column. For this reason the two-dimensional DFT can be said to be SEPARABLE. We will encounter other separable operators, i.e. those that can be separated into two sets of one-dimensional operations (one for rows, one for columns). Separable operators are particularly important because they generally require O(N**3) operations, while non-separable require O(N**4). Note that, if N=1024, N**3 = 1000 Million, N**4 = 1,000,000 Million (i.e. 1000 times more). If an O(N**3) operation takes one minute, an O(N**4) operation takes nearly 17 hours. Of course, for the DFT, we can use the FFT and achieve even greater savings, i.e. O(N**3) -> O(logN.N**2). If N=1024, logN.N**2 = 10.1,000,000 = 10 Million. Thus, 1 Million Million, reduces to 10 Million! Eqn. 3.13-2 looks a bit horrendous, but becomes more placid if we approach it one step at a time: (a) DFT each column, individually, forgetting that it is part of an image; i.e. apply Q to f. Result: (NxM) image f' with the columns now DFTs. (b) Now DFT each row of image f', again individually. Result: Image F[u,v] that is the two-dimensional DFT of f[r,c]. ------------------------------------------------------- The same considerations apply to the Inverse transform. 3.14 Other Transforms. ---------------------- 3.14.1 General. --------------- A good many other transforms are used in image and signal processing. They all obey the same principles, eqn 3.10-2 in the one-dimensional case, and eqn 3.13-2, in the two dimensional case; only the matrix kernels, A - in one-d., or P and Q in 2-d., change. (3.10-2) y = A . x (Nx1) (NxN) (Nx1) (3.13-2) F[u,v] = P . f . Q (NxM) (NxM) (NxM) (MxM) As mentioned above, only transformations that are SEPARABLE are of much interest; indeed, again for performance reasons, we are often interested only in transforms that have 'fast' implementations, like the FFT; i.e. O(NlogN) operations instead of O(N**2). 3.14.2 Discrete Cosine Transform. --------------------------------- The one dimensional Discrete Cosine transform (DCT) is given by, N-1 (3.14-1) Xc(u) = (2.au/N) # x[n]. cos[PI.u.(2n+1)/(2N)] n=0 where au = 1/sqrt(2), for u=0 = 1 , for u=1,2...N-1 That is we are using cosine functions, only, as a basis for the transform. The Inverse DCT is given by, N-1 (3.14-2) x[n] = # au.Xc[u]. cos[PI.u.(2n+1)/(2N)] n=0 The major interest in the DCT is for image compression, where, for a large class of images, it can be shown to offer better compression than the DFT. The DCT forms the basis of a major image compression standard being promoted by a world authority, the Joint Photographic Experts Group (JPEG). The DCT can be implemented using a double sized FFT, see Rosenfeld and Kak Vol 1. p. 155; the only significance of this statement is that we have a 'fast' algorithm for the DCT. 3.14.3 Walsh-Hadamard Transform. --------------------------------- Returning to the matrix version of transforms, as given by eqn. (3.10-2), (3.10-2) y = A . x (Nx1) (NxN) (Nx1) we can write the matrix for the Walsh-Hadamard transform entirely in terms of +1s and -1s, e.g. 2x2 Hadamard: (3.14-3) A(2x2) = 1 1 1 -1 (8x8) Walsh-Hadamard: aun is given by Table 3.14-1. -------------------------------------------- n 0 1 2 3 4 5 6 7 u -------------------------------------------- 0 + + + + + + + + 1 + - + - + - + - 2 + + - - + + - - 3 + - - + + - - + 4 + + + + - - - - 5 + - + - - + - + 6 + + - - - - + + 7 + - - + - + + - -------------------------------------------- Table 3.14-1. Walsh-Hadamard Transform Kernel for N=8 Examination of Table 3.14-1 indicates the nature of the Walsh- Hadamard transform - the basis waveforms are 'square-wave' in form, compared to the sine/cosine waves of the DFT. In Walsh-Hadamard transforms we speak of SEQUENCY, as a generalised frequency. The Walsh-Hadamard transform has a 'fast' implementation; in addition, since the elements of the matrices are either +1, or -1, no multiplication is required. As with the DCT, the major interest in the Walsh-Hadamard transform is image compression. 3.15 Applications of the Discrete Fourier Transform. ---------------------------------------------------- 3.15.1 Introduction. -------------------- This section discusses the applications of the DFT. Since we have shown the strong relationship between he one-dimensional DFT and the two-dimensional version, and, having noted their analogous properties, we will sometimes discuss just the DFT and make no distinction between one-, and two-dimensions. Often we work in one-dimension for simplicity of notation; extention to two-dimensions is generally straightforward. The major applications are: (1) Analysis of images and signals. That is analyse the frequency content, for example, compute the amplitude spectrum, or power spectrum. (2) Filtering. (3) As a tool for 'fast' computation of convolutions, and correlations. (4) In data compression. (5) In template matching and image registration (i.e. using correlation). (6) Various applications related to description of one-, and two-dimensional shapes; of particular importance, here, is the SHIFT INVARIANT property of the DFT power-spectrum. (7) Image restoration and deblurring. Since this Chapter 3 is already too long, we will try to be brief! You are strongly encouraged to glance at the recommended texts - these give many good examples, especially pictures, which we cannot do here. In what follows, F[u] is the DFT of sequence f[n], F[u,v] is the 2-d. DFT of image f[u,v], Fc[u] refers to cosine 'parts', Fs[u] refers to sine. 3.15.2 Frequency Analysis. -------------------------- The power spectrum given by, (3.15-1) P[u] = Fc[u]**2 +Fs[u]**2 This gives an estimate of the relative 'strength', in the sequence, of frequency 'u'. In two-dimensions, P[u,v] gives the relative strength of cross terms, e.g. a low (slowly varying) frequency along the rows, a high (rapidly varying) frequency along the columns. 3.15.3 Filtering. ----------------- Filtering refers to frequency filtering. A low-pass filter is an operation which, when applied to a sequence, rejects all high-frequencies, and 'passes' all low, i.e. it 'smoothes' the input. A low-pass filter is defined by two parameters: - the cut-off frequency, - the rate of cutoff. A high-pass filter does the opposite, it rejects low-frequencies, and passes high, i.e. 'sharpens' the input. A band-pass filter passes intermediate frequencies, e.g. in a telephone system very low frequencies, below about 300 Hz, and high frequencies, above about 3000 Hz, are attenuated. That is, a band-pass filter with cut-off frequencies at 300 Hz and 3000 Hz is used. Filters can be applied in the spatial domain (or time domain) or in the frequency domain. Spatial domain filters are applied using convolution, see Chapter 4. Frequency domain filters are applied by: (1) taking the DFT, x[n] -> X[u], (2) multiplying the DFT array by an array of weights, (3.15-2) X'[u] = F[u].X[u], u=0,1,..N-1 (3) applying the IDFT, X'[u] -> y[n], the filtered output sequence. For a low-pass filter: F[u] = 1.0 , for u < cutoff = 0.0 , above For a high-pass filter: F[u] = 0.0 , for u< cutoff 1.0 , above. In practical cases more tapered weighting may be applied. In the two dimensional case, similarly, we define pass and reject regions in the (u,v) plane. 3.15.4 Fast Convolution. ------------------------ We have already indicated in section 3.9 that the FFT can be used to perform fast convolutions: If +inf y[n] = x[n] (*) h[n] = # x[n-m].h[m] m=-inf then, Y[u] = X[u].H[u] That is, convolution in the time or spatial domain is replaced by multiplication in the frequency domain. Thus, we can do convolution as follows: (1) Take DFT of x[]: X[u] = DFT(x[]) (2) Take DFT of h[]: H[u] = DFT(h[]) (3) Multiply DFTs : Y[u] = X[u].H[u] , u=1,..N-1 (4) Take InverseDFT of Y : y[n] = IDFT(Y[]) For NxM images, each of the arrays above, X[], H[], Y[] are two-dimensional. The multiplication in step (3) becomes: Y[u,v] = X[u,v].H[u,v] , for u=0,1,2...N-1, v=0,1,2...M-1. 3.15.5 Fast Correlation. ------------------------ If we observe that cross-correlation (sometimes called just correlation), eqn. 3.11-1, +inf (3.11-1) c[k] = x[n] (c) y[n] = # x[m].y[m+k] m=-inf is 'convolution' with one of the sequences reversed, then the method given in section 3.15.4 can be used to provide 'fast' correlation, i.e. we only require the preliminary reversal, x[i]=x[N-1-i], i=0,1..N-1 Again, the extension to two-dimensions is straightforward. Note: the reversal is not really neccessary. Examine the DFT as given by eqn 3.6-1, N-1 (3.6-1) X[u] = (1/N) # x[n] exp(-j.2PI.u.n/N) n=0 Arguing qualitatively, reversing x[n] is equivalent to setting n = -n , in exp(). Now, (3.15-3) exp(jB) = cosB + j.sinB exp(-jB) = cosB - j.sinB i.e. exp(jB) is the complex conjugate of exp(-jB), so that the DFT of the reversed sequence is the complex conjugate of X[u] (X*[u]). Thus, for correlation, we replace, (3) Multiply DFTs : Y[u] = X[u].H[u] , u=1,..N-1 with, (3) Multiply complex conjugate of DFT of x[], with DFT of H[] : Y[u] = X[u].H[u] , u=1,..N-1 3.15.6 Data Compression. ------------------------ Generally speaking, observe that the 'large' components of an image are more important to human viewers, in the majority of cases, for the majority of uses. Observe, also, the ready acceptance of fairly blurred pictures in newspapers, or as family snaps - the usefullness of the picture is only slightly degraded by lack of sharpness. The essence of data compression (image data compression, signal compression, or any other) is to reduce the amount of data to be transmitted, stored, etc. Now recall that the DFT of an (NxN) image has, itself, (NxN) data elements, and that the original image can be reconstructed from the DFT by applying the IDFT. Consider, then, dispensing with some of the higher frequencies, say K of them. Now we have only (N-K) DFT data elements to transmit. We can reconstruct using the IDFT, the only loss will be in a slight smoothing of the image (equivalent to low-pass filtering). This is a very qualitative argument - but you can see what is happening. If we removed K rows, or K columns, or K arbitrary pixels from an image, the efffect would be enormous, by removing K elements from the DFT, the effect is much less, or, at least, much more subtle. The Discrete Cosine and Walsh-Hadamard transforms can be used in an analogous manner. 3.15.7 Deconvolution. --------------------- DECONVOLUTION is the process of restoring an image (or signal) that has been subjected to (unwanted) degradation by convolution. Usually this takes the form of blurring. The convolution, +inf (3.15.4) y[n] = x[n] (*) h[n] = # x[n-m].h[m] m=-inf is not, in general, reversible. If we have y[n], the result of convolving a wanted signal x[n] with some 'unwanted' operator (e.g. blurring), and we don't know the operator's impulse response, h[], we have very little chance - but see below. It is as if we have the sum of ten numbers, and are asked to estimate what are the individuals that were used to make up the sum - impossible! If, however, we know h[], or can estimate it, we can use the DFT to DECONVOLVE h[] (invert the convolution) as follows: (a) If the observe that Y[u] = X[u] . H[u] (convolution -> multiplication in frequency domain), (b) therefore, using h[n], or an estimate of it, compute using the DFT, H[u]. (c) compute X'[u] = Y[u]/H[u] , u=0,1...N-1 (i.e. you can 'cancel-out' the effects of the convolution, in the frequency domain), (d) obtain the 'restored' x[n], using the IDFT. The extension to two-dimensions is again straightforward. Ex. 3.15-1. An amateur photographer chances upon a bank robbery. As the robbers' van speeds past him he takes a photograph of the side of the van. When printed the photograph turns out to be blurred - the photographer forgot to 'pan' with the moving van; thus the photograph has been 'convolved' with a smoothing function. The sign on the van cannot be read. Assuming the the smoothing was along the horizontal (i.e. along the rows) suggest a digital image processing method for restoring the image. (The image is scanned at 100 pixels per inch, the smoothing appears to be about 1/10 inch wide, i.e. 10 pixels). Suggest a restoration technique. Ex. 3.15-2. If, in the previous question, the smoothing was NOT along the horizontal, how could we remedy the situation; (the processing is simpler if the smoothing is along one dimension, only). Ex. 3.15-3. Suggest a way in which we might estimate the blurring function. Hint: if the van was black, and there was a shiny bright spot somewhere in the field of view of the photograph,...? Recently, the Hubble Space Telescope was launched in a satellite. Unfortunately, the optics had not been tested properly, and the resulting images were blurred. From a knowledge of the source defect, and, by examining actual blurred results, it was possible to estimate the blurring function, and, by deconvolution, to partially restore the images. Deconvolution (one-d. sound signals) is very important in oil prospecting. end of chapter 3.