With a little help from Hamming (1989) and Oppenheim and Schafer (1989). For the preliminaries, I've cut and pasted from my Image Processing class notes. 1. Matrix Transformation. Lets stay discrete -- but there's no problem replacing matrix eqn.s with integral equations [except that I break out in a rash -- post-traumatic-stress-syndrome from a class on them sometime in 1970 ;-) ] Here is a matrix transformation. (1-1) y = A x / \ where y is an m-dimensional vector |y1| |y2| | | |..| |ym| \ / / \ x is an n-dimensional vector |x1| |x2| | | |..| |xn| \ / and A is an m-row x n-column matrix / \ |a11 a12 a1n| |a21 a22 a2n| | | | ..arc.. | | | |am1 am2 amn| \ / 2. Convolution. # --> Big-sigma The discrete convolution of x[], and h[] (impulse response of the *linear-time-invariant-system* ) is given by, +inf (2-1) y[n] = x[n] (*) h[n] = # x[n-m].h[m] m=-inf If we agree that h[m] is zero outside the range [0..N-1], (Finite Impulse Response - FIR), we have, N-1 (2-2) y[n] = x[n] (*) h[n] = # x[n-m].h[m] m=0 n = 0..N-1 Convolution is commutative, so that, (2-3) x[](*)h[] = h[](*)x[], i.e. +inf (2-4) y[n] = x[n] (*) h[n] = # x[m].h[n-m] m=-inf 3. Convolution as a Matrix Transform (3-1) y = G x where G is composed or rows containing the impulse response h[]; each row being h[] shifted along by on sample. This is arm-wavy, but correct; The 'system' is represented by a matrix in eqn. 3-1; for a continuous system you have a convolution integral equation. 4. Eigenvectors. A matrix may have certain 'characteristic' vectors (eigenvectors) for which application of the matrix is equivalent to multiplication by a constant. (4-1) G x = l x where G and x are as in (3-1) and l is a constant In this case, the (family of) x's are the eigenvectors, and the l's are the corresponding eigenvalues. 5. Eigenvectors of Linear-time-invariant (LTI) Systems Take eqn. 2-1. +inf (2-1) y[n] = x[n] (*) h[n] = # x[n-m].h[m] m=-inf (Oppenheim and Schafer p. 39) try x[k] = exp(jwk) as a possible eigenvector (5-1) y[n] = # h[m].exp(jw[n-m]) = exp(jwn){ # h[m].exp(-jwm) } Define (5-2) H[exp(jw)] = H(jw) = # h[m].exp(-jwm) and (5-1) becomes (5-3) y[n] = H(jw)exp(jwn) But putting (2-1) in the form of (3-1) y[] = G x and substituting for x a sequence col(exp[jw.0], exp[jw.1] ... we have (5-4) G x = H(jw) x Here, H(jw) is the (complex) eigenvalue; and the complex eigenvector is the complex exponential (5-5) exp(jwk) = cos(wk) + j sin(wk) Of course, H(jw) turns out to be the Fourier transform of the impulse response h[], evaluated at jw. The fact that the F.T. is called the spectrum is related to the eigenvalue connection. H(jw) is called the transfer function, or frequency transfer function. 6. Consequences. 6.1 Purity of response. If you put a signal exp(jwk) into an LTI system, you will get back H'.exp(jwk) where H' = H(jw), is the transfer function at w. No other functions / vectors, e.g. Walsh possess this property 6.2 Optimality of Information Retention (MSE) OK. I was not correct. This depends on the statistics of the process whose signal is to be compressed/ represented. In that case, Karhunen-Loeve is the answer, at least in the mean-squared-error sense. The K-L basis is formed by the eigenvectors of the correlation matrix. If the process is [see (Papoulis, 1991, p. 297)] wide-sense stationary, Also called 'second-order stationary'. If the following two conditions are met, the process is called wide-sense stationary: mX(t) = mX 6.2-1 ie. m is constant for all t, and, RXX(t1,t2) = RXX(t1-t2) 6.2-2 ie. the autocorrelation depends only on (t1-t2) (the displacement). In the discrete case, wide-sense stationarity has the following important consequence, when RXX is expressed as a matrix: RXX = r00 r01 r02 ... r0 n-1 6.2-3 r10 r11 r1 n-1 rn-1 0 rn-1 1 ... rn-1 n-1 Applying eqn. 6.2-2 RXX = r0 r1 r2 ... rn-1 6.2-4 r1 r0 r1 rn-2 rn-1 rn-2 r1 r0 ie. RXX is a circulant matrix -- each row is merely the previous row rotated by one position. Sets of linear simultaneous equations that are characterised by a circulant matrix (or more generally a Toeplitz matrix) can be solved using 'fast' and recursive algorithms - such solution is required in prediction. Another property of Toeplitz matrices is that they are diagonalised by the Discrete Fourier Transform -- an important consequence of this fact is that the DFT is equivalent to the Karhunen-Loeve Transform for such data (the KL transform is an important transform in lossy data compression). Incidentally, discrete convolution can be expressed as multiplication by a circulant matrix -- the delayed impulse response weights form the rows; therefore the matrix may be disgonalised by the DFT, and this is why convolution decomposes into multiplication in the Discrete Fourier domain. [see (Jain, 1989), (Pratt, 1991), (Press, et al, 1992)]. 7. References. R.W. Hamming, 1989, Digital Filters 3rd ed. Prentice Hall, p 31... A.K. Jain. 1989. Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall Int. Papoulis A. 1991. Probability, Random Variables and Stochastic Processes. 3rd ed. New York: McGraw-Hill. A.V. Oppenheim and R.W. Schafer, 1989, Discrete-time Signal Processing, Prentice Hall, pp. 39 ... W.K. Pratt. 1991. Digital Image Processing. New York: Wiley- Interscience. W.H. Press, S.A. Teukolsky, W.T. Vetterling,and B.P. Flannery. 1992. Numerical Recipes in C. Cambridge, U.K. : Cambridge University Press. end.