University of Ulster, Magee College. BSc. Applied Computing (1220), Year 4. Course: Image Processing (AC460). Autumn Term 1995. Lecturer: J.G. Campbell. Date: / /95 7. Aspects of Image Segmentation. --------------------------------- 7.1 Introduction. ----------------- Image segmentation is a process which partitions an image into regions (or segments) based upon similarities within regions - and differences between regions. Commonly, an image represents a scene in which there are different objects or, more generally, regions; it is a widely held view that the first stage, in human interpretation of a scene, is its division into regions; hence, there is great interset in image segmentation. However, although humans have little difficulty in separating the scene into regions, this process can be difficult to automate. Figure 7.1-1 shows a possible context for segmentation - this might represent the information flow in an automated inspection system that monitors that manufactured parts are the correct shape, or a character recognition system, etc. Raw Image [Pixel values are intensities, noise | corrupted] | | (Preprocessing) | Preprocessed Image [Pixels represent physical | attribute, e.g. thickness | of absorber, greyness | of scene] | | **(Segmentation)** | Segmented / Symbolic Image [Label each pixel | e.g. object/background] | | Feature Extraction | e.g. line detection, moments | Extracted Features / Relational Structure | | Shape Detection/Matching | (Pattern Recognition) | Identity and Position of an object Figure 7.1-1 Image Analysis Model --------------------------------- The major point to note in Figure 7.1-1 is that we start off with 'raw-data' (an array of grey levels) and we end up with 'information' - the identification and position of an object. As we progress, the data & processing move from low-level to high-level. But here we are mainly concerned with the process of labelling each pixel either 'object' or 'background' - SEGMENTATION. Examples of practical segmentation tasks are: - segment an image of metal parts on a conveyer belt into metal (object), background non-metal, i.e. conveyer belt. - segment a scanned image of a printed page into ink, white page (background). The result could be used for fax transmittal, or for further processing to recognise characters. - segment an X-ray image of a printed circuit board into metal (object), or plastic (background). Again, further processing may be desired to identify individual tracks and other features. - the same could apply to a medical X-ray: soft-tissue, bone. - segment a multispectral satellite image of the earth into regions of different landuse. Although, in this case the pixels are vector valued (each pixel has many attributes - not just one grey level), the underlying principles are the same. - segmentation for data compression: if we have an image with (say) only two 'colours' of interest in it; let us say metal, non-metal; assume the image is 256 grey-levels x 3 colours, i.e. 24 bits per pixel. Now if we can segment into metal, non-metal, we can get away with 1 bit ! - provided we transmit, before the 1 bit label data, what is called a 'code-book'; the 'code-book' gives the 'mean' colour of the two regions / objects, from which the picture can be reconstructed. The paper by Haralick and Shapiro, see reference at end of section, gives a good 'wish list' for segmentation: "What should a good image segmentation be? Regions of an image segmentation should be uniform and homogeneous with respect to some charactersitic (property) such as grey tone or texture. Region interiors should be simple and without many small holes. Adjacent regions of a segmentation should have significantly different values with respect to the characteristic on which they (the regions themselves) are uniform. Boundaries of each segment should be simple, not ragged, and must be spatially accurate." In this chapter we will cover the three general approaches to image segmentation, namely, single pixel classification, boundary based methods, and region growing methods. There are other methods - many of them; and, unfortunately, segmentation is one of the areas of image processing where there is certainly no agreed theory, nor agreed set of methods. Broadly speaking, single pixel classification methods label pixels on the basis of the pixel value alone, i.e. the process is concerned only with the position of the pixel in 'grey level space', or 'colour space' - in the case of multivalued images. The term CLASSIFICATION is used beacuse the different regions are considered to be populated by pixels of different CLASSES. Boundary based methods detect boundaries of regions; subsequently pixels enclosed by a boundary can be labelled accordingly. Finally, region growing methods are based on the identification spatially connected groups of similarly valued pixels; often the 'grouping' procedure is applied iteratively - in which case the term relaxation is used. R.M. Haralick and L.G. Shapiro "Image Segmentation Techniques" Comp. Vision Graphics Im. Proc. Vol 29 pp 100-132 1985 7.2 Single Pixel Classification ------------------------------- 7.2.1 Introduction. ------------------ Single pixel classification is reminiscent of the 'point' enhancement methods covered in Chapter 4; each pixel is labelled according to its OWN value - for that labelling, the values of its neighbours are irrelevant. We start off with simple thresholding, where choice of label depends on which side of a threshold the pixel value lies; this is applied globally. Next, as in Chapter 4, we discuss how we can make the threshold adaptive to local variations in the image. Then we explain thresholding can be used to determine boundaries, and, hence, regions. Then we generalise thresholding to 'level-banding' for multilevel / multiclass problems. Next we generalise monochrome thresholding to multispectral 'classification', and finish off with unsupervised classification: clustering. 7.2.2 Threshold selection ------------------------- Let the input image have a histogram p(z). Let there be two peaks in p(), at zi, zj, i.e. p(zi) is a local maximum (local = the maximum of p() within Zl grey levels, e.g. Zl = 10), and ditto p(zj). Let zk be the mid-point between zi and zj, ( zk=(zi+zj)/2 ), then if p(zk) is much smaller than either p(zi) or p(zj) (i.e. zk is in a significant valley), then zk should be a useful threshold for segmenting the picture. The segmentation rule is: label background if z < T1 label object otherwise i.e. in pseudo-code: if z < T1 label = background else label = object. Figure 7.2-1 gives an example (this is similar to the histogram of an object on a black background, with noise added, e.g. the rectangle example in DataLab). 0 +-----------------------> p(z) 0| | |* |************* 60 |******************** |*********** |** |* | |* |******* 160|**************** |****** |** |* | V (not to scale) z Figure 7.2-1 Histogram of Object on Dark Background with Noise. -------------------------------------------------------------- Thus, zi = 60, zj = 160; pk = 100 (mid-point); thus, threshold at 100 would separate the object from the background. This method is highly intuitive. Often 'ad hoc' changes have to be made to cope with particular distributions of grey levels. The 'clustering' methods mentioned below are more general. 7.2.3 Local versus Global ------------------------- If there is unevenness of illumination the histogram approach may run into difficulties - the histogram may tell more about the illumination distribution, than about the colours of the object(s) and background. In such cases it may be beneficial to perform segmentation first on small local areas, followed by some rationalisation of the different segmentation results. 7.2.4 Segmentation Based on Boundary Pixels. ------------------------------------------- In some cases (see for example p. 455 of Gonzalez & Woods) there may be two or more background levels (the same may be true for illumination effects - see (b) above), which may confuse the segmentation, e.g. there will be large peaks in the histogram, corresponding to each level. In such cases it may be better to work on a histogram of only those points on or near boundaries. I.e. points with a large gradient value (see Chapter 4). 7.2.5 Multilevel Thresholding. ----------------------------- Same principle as in 7.2.2; e.g. imagine Figure 7.2-1 with three peaks, not two - i.e. lowest peak is noise on the dark background, the next corresponds to a grey object, and the third to a white object. Again, look for local peaks in histogram. Choose (multiple) thresholds in between. Then apply rule: label background if z < T1; label object1 if T1<= z < T2 label object2 if T2 < z or in pseudo-code: if z < T1 then label = background else if z < T2 then label = object1 else label = object2. 7.2.6 Multilevel Thresholding for Edge Detection. ------------------------------------------------- Consider an image that has two main regions (say grey level 3 +/- 2 (A) and grey level 10+/-2 (B)). It is likely that at the boundary between A and B the grey level will gradually increase between (say) 3 and 10, i.e. the sequence of pixels along a row might be 3,5,6,7,9,10. If we segment the image into {5..7} (Edge) and others, we will get a contour showing the edge between the two regions. 7.2.7 Multispectral Images. --------------------------- Consider a two-colour image. Its histogram can be represented as a two-dimensional scatter plot showing the numbers of pixels in slots. See Figure 7.2-1, which shows the two-dimensional histogram of an image having Red and Green bands. Slots left blank are zero. 10 | 14931 9 | 244 | 1 121 7 | 112311 | 132 Red | | | 1221 2 | 113311 | 125711 0 +----------------------------- 0 Green 10 1 4 7 Figure 7.2-1 Two-dimensional scatter diagram/histogram ------------------------------------------------------ If an image corresponding to Figure 7.2-1 was to be segmented, it would be natural to segment it into three regions - corresponding to the clusters of points at (Green,Red) = (1,10), (7,7), and, (4,2). Obviously it would be possible to extend the histogram valley method (see (a) above) to two-dimensions - and beyond - but a more general concept - that of clustering - comes into play. 7.2.8 Clustering. ---------------- Clustering has been studied by statisticians for many decades. Their interest is in finding significant groupings of points according to some property or set of properties. In one dimension clustering is easy to visualise - a cluster is a 'hill' shape in the histogram. In two-dimensions ditto. In three dimensions a 'globule' where the points are fairly dense (i.e. a cluster!). Beyond that, visualisation is difficult, but the principle is the same - peaks in the density are good cluster centres. 7.2.10 k-Means Clustering. ------------------------- There is a vast array of clustering methods - see any book on image processing or pattern recognition. A very simple, but powerful method is k-Means clustering (sometimes called ISODATA - though ISODATA is really a special algorithm for doing k-means clustering): Procedure k-Means Clustering: Inputs: nc = number of classes, xij data to be clustered; i=1..d; j=1..N d = dimensionality of data vectors (d colours) N = total number of pixels. Outputs: labj, j=1..N, labels of pixels xj labj = {1..nc} Note: For the purposes of this procedure we are totally unconcerned with the spatial position of pixels - just their position in data space - see Figure 7.2-1. 1. Partition the pixels arbitrarily into nc classes (labels), e.g. divide the image into nc regions, label all the pixels in the first region '1', second region, label '2', etc. 2. Compute the mean vectors of each class: mic[i][c], i=1..d, c=1..nc 3. Procedure: k-Means Classifier: 3.0 Change = False 3.1 for j = 1..N do 3.2 reset Distmin = Bignumber, class = 0 3.3 Find nearest mean: 3.3.1 for c = 1..nc do 3.3.2 Compute Dist = Distance (xj to mean mc) 3.3.3 if Dist < Distmin then Distmin = Dist class = c 3.3.4 end. 3.4 labj = class; if 'changed' set Change = True. 3.5 end. 4. if Change = True goto 2 (*loop again*) 5. end. (* we are finished*) Function Distance(x,m) (*this calculates squared Euclidean distance - because there is no need to take square root, since we are not interested in actual values, just in finding the minimum*) Dist = 0 for i = 1 .. d do Dist = Dist + (xi -mi)**2 (*Euclidean distance*) return Dist end. The above algorithm always terminates, i.e. when there are no class changes. It will always iterate to the same result - NO MATTER THE STARTING POINT. I have found that it terminates in about nc**2 iterations, i.e. if there are 4 classes, about 15 to 20 iterations. Note: this method morks equally well for 1-dimensional data as for 50-dimensional. The following distance may be quicker (especially for those without maths co-processors): Dist = Dist + abs(xi-mi) (* 'city-block' distance*) Ex. 7.2-1. Use of DataLab function 'cukm'. Generate a rectangle image ('grect') add noise to it ('ggna'). Run cukm with the argument 2 (number of classes). After termination the 'label' image is displayed. Note: DataLab images can have label images - in addition to 'data' images; the image needs to be configured ('config') to have labels (nl=1, or 2), if the image has no labels 'cukm' will not work. Example 7.2-2. The following 10 x 10 image is a small part of a satellite image of the River Foyle and its east bank at St. Columb's Park. Segment the image using (a) k-means clustering (k=2), (b) thresholding based on histogram; use broad grey level slots for your histogram, e.g. 20 levels: 0-19, 20-39, ... 0 2 7 7 7 10 10 10 13 15 5 7 10 10 10 10 10 10 15 31 7 10 10 10 10 13 15 23 71 92 10 10 10 13 15 23 63 106 132 116 10 10 18 45 47 79 124 135 124 122 13 45 87 79 79 119 138 116 116 127 50 114 87 77 106 132 140 124 132 122 106 103 90 119 132 146 146 135 138 135 108 85 108 132 146 148 143 132 124 127 92 100 127 140 151 148 148 135 98 92 (a) Answer: - from DataLab [use batch file 'ipderry1'] DataLab I-02 01/11/94 21:34 IMage Print Type = IBYTE Bounds = Rows: 0 - 9, Cols: 0 - 9, Dims: 0 - 0. Number of classes: 2 The 2 labels are: 1 2 Class frequencies: 43 57 Prior probabilities: 0.430000 0.570000 Class means ----------- Class 1: 15.790698 Class 2: 118.070175 Min: 0.00 Max: 151.00 Dim: 0 0 2 7 7 7 10 10 10 13 15 etc... Here are the data labelled - printed using 'tpsv': [0, 0] label:data 1: 0,1: 2,1: 7,1: 7,1: 7,1: 10,1: 10,1: 10,1: 13,1: 15, 1: 5,1: 7,1: 10,1: 10,1: 10,1: 10,1: 10,1: 10,1: 15,1: 31 1: 7,1: 10,1: 10,1: 10,1: 10,1: 13,1: 15,1: 23,2: 71,2: 92 1: 10,1: 10,1: 10,1: 13,1: 15,1: 23,1: 63,2:106,2:132,2:116 1: 10,1: 10,1: 18,1: 45,1: 47,2: 79,2:124,2:135,2:124,2:122 etc... [5, 0] 1:13,1:45,2:87,2:79,2:79,2:119,2:138,2:116,2:116,2:127 [6, 0] 1:50,2:114,2:87,2:77,2:106,2:132,2:140,2:124,2:132,2:122 [7, 0] 2:106,2:103,2:90,2:119,2:132,2:146,2:146,2:135,2:138,2:135 [8, 0] 2:108,2:85,2:108,2:132,2:146,2:148,2:143,2:132,2:124,2:127 [9, 0] 2:92,2:100,2:127,2:140,2:151,2:148,2:148,2:135,2:98,2:92 Ex. 7.2-3. Verify that the 3-colour image below gives the same labelling as the one-colour image in Ex. 7.2-2. Answer: DataLab I-01 01/11/94 21:34 IMage Print Type = IBYTE Bounds = Rows: 0 - 9, Cols: 0 - 9, Dims: 0 - 2. Number of classes: 2 The 2 labels are: 1 2 Class frequencies: 43 57 Prior probabilities: 0.430000 0.570000 Class means ----------- Class 1: 15.790698 34.348839 37.465115 Class 2: 118.070175 71.982452 53.035088 Data Ranges ----------- Min: 0.00 15.00 22.00 Max: 151.00 103.00 82.00 Dim: 0 (infra-red) 0 2 7 7 7 10 10 10 13 15 5 7 10 10 10 10 10 10 15 31 7 10 10 10 10 13 15 23 71 92 10 10 10 13 15 23 63 106 132 116 10 10 18 45 47 79 124 135 124 122 13 45 87 79 79 119 138 116 116 127 50 114 87 77 106 132 140 124 132 122 106 103 90 119 132 146 146 135 138 135 108 85 108 132 146 148 143 132 124 127 92 100 127 140 151 148 148 135 98 92 Dim: 1 (red) 15 23 31 31 23 23 23 23 23 39 23 31 23 23 23 31 23 15 31 39 31 23 31 31 31 23 31 47 63 55 31 39 47 47 31 31 55 79 95 71 47 63 55 39 39 55 71 79 95 87 71 55 39 47 63 71 71 79 95 87 63 39 47 63 79 95 95 95 103 95 47 39 55 71 87 103 103 103 87 79 39 47 71 87 87 87 87 79 55 55 39 55 63 79 63 55 79 63 55 71 Dim: 2 (green) 30 30 37 45 37 45 37 30 37 37 30 30 30 37 37 37 30 22 45 45 37 30 45 52 30 30 37 30 52 52 45 45 45 45 30 22 52 67 67 60 60 45 37 37 30 37 52 67 75 60 52 30 30 45 45 52 60 67 82 60 37 22 30 45 60 67 67 67 75 67 30 22 37 60 60 67 75 67 60 60 22 30 52 67 67 60 60 52 45 52 22 30 45 52 45 45 52 45 45 67 7.2.10 Discussion. ----------------- With respect to the Haralick and Shapiro criteria given in the introduction, the single pixel methods described in this section clearly perform well on the homogeneity criteria. But, since they are concerned only with single pixels, they may not perform well on the spatial criteria, i.e. interiors ... without small holes, boundaries ... not ragged. It is possible to smooth the results of single pixel segmentations using 'shrink-expand' or 'expand-shrink' smooothing - see Niblack p. 117. General Remark: be wary of authors who put forward a general theory of segmentation - there is none. Also, be wary of ad hoc methods that work on one type of image - they may not transfer well to another type, nor (maybe) be robust to changes in (say) noise level. Note on Neural Netorks: The Kohonen neural network effectively implements a k-means clustering. 7.3 Boundary Based Methods -------------------------- Summary: Find boundaries (connected and closed) between regions; pixels enclosed by the boundary are in the region. (see Niblack Chapter 5.2, Rosenfeld Chapter 10) Chapter 4 has described methods of enhancing edge points - i.e. gradients. However, these merely produce a measure of 'edginess' for each pixel - there are a few extra steps before we get a set of linked boundary points: 1. Compute the gradient, e.g. using the combination of Sobel operators - see Chapter 4. 2. Thin the edges. There are many methods; two examples: (i) retain only points whose gradient is a local maximum in its gradient direction, (ii) (simpler). Work on horizontal and vertical edges (e.g. Sobel, before they are added). For vertical edges, choose a threshold (T); eliminate all pixels with values less than T. Then scan the image vertically eliminating all but the maximum of groups (connected runs) of edge points. Ditto horizontal. Then add vertical and horizontal thinned edge points. 3. Determine possible edge neighbours. If edge points are to be linked, they must (a) be neighbours, (b) have edge values in the same direction (see gradient angle in Chapter 4) 4. Link edge points -> edge chains. 5. Link edge chains. 6. Extend edge chains, e.g. to 'jump' across a discontinuity in an otherwise continuous chain. 7. Eliminate edge chains: i.e. those that are - too short, - too curvy,... 8. Eliminate non-closed chains. 9. The closed chains now form the boundaries of regions. 7.4 Region Growing Methods -------------------------- (see G&W p. 458...) A simple intuitive region growing method works as follows; note the similarity with clustering - but with a spatial criterion added). (1) Choose a 'seed' point (perhaps a pixel having a value at the peak of the histogram). (2) Grow the region using the two criteria: (a) the difference in grey level (or colour distance if multicolour - see k-Means Clustering algorithm above) between the seed and the candidate is less than D; typically, D would be chosen as some percentage of the image grey level range (max. grey level - min. grey level). (b) the candidate must be 'connected' to another pixel which has already been included. Connected can be defined as 'touching', i.e. one of the eight pixels which border a central pixel. 7.5 Exercises. -------------- 1. Use the histogram / threshold method to segment the image given in Figure 7.5-1 into 2 regions. 1 2 3 1 3 2 3 1 2 3 2 3 1 3 2 3 1 2 3 1 3 1 3 2 3 8 2 3 1 2 1 2 3 7 8 9 9 8 7 1 2 3 1 8 9 9 8 7 7 2 3 1 2 9 9 8 7 7 8 3 3 1 2 9 9 8 7 7 8 3 1 2 3 1 3 2 3 1 2 3 2 3 1 3 2 3 1 2 3 1 3 1 3 2 3 1 2 3 1 2 Figure 7.5-1 ------------ 2. Use the k-Means Clustering algorithm to segment the image in Figure 7.5-1 into 2 regions. Initialise by labelling all the pixels rows 1 to 5 as class 1, all in rows 6 to 10 as class 2, then iterate... (Don't worry, it terminates very quickly). 3. It should be obvious from Ex. 2 that a better initialisation would make the process terminate even sooner, i.e. choose as starting means the histogram peaks, and initialise all labels according to the closest of these means. Redo Ex. 2 using this method. 4. Run an edge detector on Figure 7.5-1. Choose an appropriate 'edge' threshold. Thin the edges. Then link edge points. Find regions. 5. Analyse the problem of segmenting an image of a human face. Ideally you want to be able to extract the face from its background. (a) Identify the major problems with respect to the simple models mentioned above. (b) Will we need to segment into multiple classes? - As well as separating 'face' from background we may need to segment within the face? Mention problems. (c) Would colour help? (d) How will lighting affect the problem? (e) Suggest a layout for the subject (face), camera, and background. 6. Segmentation for data compression. Take the image in Ex. 1, assume the grey levels are coded in FOUR bits [0..15]. (a) Now as in Ex. 1 segment it into two regions - call them regions 0, 1; (b) Calculate the means for each region, m0, m1. (c) code region 0 pixels as '0' and region 1 pixels as '1'; (d) Assuming 8 bits for the means (the 'code-book'), how many bits in the coded image ? (e) reconstruct the image, using the code-book and coded 1-bit image. (f) How would you estimate the 'loss' / 'distortion' caused by the compression. 7. Repeat Ex. 6 for the 10 x 10 three colour image in Ex. 7.2-3. end.