Experimental evaluation of nonparametric clustering algorithms for image segmentation

An experimental evaluation of 12 nonparametric clustering algorithms for image segmentation was carried out. Algorithms developed at the FRC ICT are compared with those from the ENVI, ELKI and Smile software packages. Seven model datasets were generated to estimate clustering accuracy. Computational efficiency was evaluated using digital photographs and fragments of multispectral images obtained from the WorldView-2 satellite.


Introduction
Image segmentation is required to solve a number of applied problems. The images can be multispectral images obtained from satellites, aircraft or unmanned aerial vehicles, as well as conventional digital photographs (e.g. medical examination data). Segmentation has two main goals: splitting the image into parts for further analysis, and grouping pixels into higher-level informative structures [1]. One of the most common approaches to image segmentation is based on data clustering algorithms [2]. Image segmentation is usually performed with neither a priori information about the probabilistic characteristics of the classes nor training samples. Under these conditions, the nonparametric approach to clustering is the most suitable [3]. It allows detecting clusters of complex structure without strict restrictions on the probability density function. However, it has not become widespread due to its high computational complexity. The grid-based approach makes it possible to achieve high computational efficiency by processing a relatively small number of grid cells instead of individual data elements. At the same time, the clustering accuracy strongly depends on the grid structure [4]. Efficient density- and grid-based algorithms for multispectral image segmentation have been developed at the FRC ICT. In this paper, these algorithms are compared experimentally with six of the most popular clustering algorithms implemented in the ENVI [5], ELKI [6] and Smile [7] software packages.
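To illustrate why grid-based processing is cheap, the following minimal Python sketch (not one of the FRC ICT algorithms) quantizes each pixel's channel values into cells of a fixed size and counts the non-empty cells; a density- and grid-based algorithm would then operate on these cell counts rather than on the individual pixels. The function name `grid_cells`, the uniform cell size and the sample pixel values are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def grid_cells(pixels: Iterable[Sequence[int]], cell_size: int) -> Counter:
    """Hypothetical helper: map each pixel (a tuple of channel values) to the
    integer index of its grid cell and count the pixels in each non-empty cell."""
    counts: Counter = Counter()
    for px in pixels:
        # Integer division by the cell size gives the cell index per channel.
        cell: Tuple[int, ...] = tuple(v // cell_size for v in px)
        counts[cell] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical 3-channel (RGB) pixels with 8-bit values.
    pixels = [(10, 20, 30), (12, 22, 31), (200, 180, 40), (201, 182, 41), (11, 21, 29)]
    cells = grid_cells(pixels, cell_size=16)
    print(len(pixels), "pixels ->", len(cells), "non-empty cells")  # 5 pixels -> 2 non-empty cells
    print(dict(cells))  # {(0, 1, 1): 3, (12, 11, 2): 2}
```

Changing `cell_size` changes which pixels share a cell, which is one concrete way the result of a grid-based method depends on the grid structure.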

Algorithms and datasets
Twelve clustering algorithms were evaluated experimentally. Six of them (HCA_MS [8; 9], HECA_MS [10], ECCA [11], CCAE [11], MeanSC [12], EMeanSC [13]) were developed at the FRC ICT. Three efficient nonparametric algorithms (DBSCAN [14], OPTICS [15] and DENCLUE [16]) were taken from the ELKI and Smile data mining software packages. In addition, a parallel implementation of the iterative density-based MeanShift algorithm [17], which is effective in terms of clustering quality, was developed; the number of iterations in the experiments was limited to ten. Furthermore, the implementations of the k-means [18] and ISODATA [18] clustering algorithms from the ENVI software package were evaluated (the number of iterations was also limited to ten, and the cluster centers from the previous iteration were used to initialize the next one, which improves segmentation quality). These algorithms are included in most popular software packages for satellite image processing and are therefore often used in practice.

Seven model datasets (Figure 1) and eight test images were used in this work: five digital photos (Figure 2) and three fragments of multispectral images obtained from the WorldView-2 satellite (Figure 3). Three spectral channels (red, green and blue) were used to process the digital photos, and four (red, green, blue and near-infrared) to process the satellite images. The model datasets and RGB composites of the test images are available at [19]. The experiments were performed on a personal computer with an Intel Core i7 CPU (4 cores, 2.3 GHz each) and 8 GB of RAM.
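The MeanShift setup described above, with the number of iterations capped at ten, can be sketched as follows. This is a minimal single-threaded flat-kernel version written for illustration, not the parallel implementation used in the experiments; the function name `mean_shift`, the `bandwidth` value and the mode-merging threshold of `bandwidth / 2` are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_shift(points: List[Point], bandwidth: float, max_iter: int = 10) -> List[int]:
    """Flat-kernel mean shift (illustrative sketch): each mode estimate starts at
    a data point and is moved to the mean of the data points within `bandwidth`,
    for at most `max_iter` iterations. Points whose modes end up closer than
    bandwidth / 2 receive the same cluster label."""
    modes = [list(p) for p in points]
    for _ in range(max_iter):
        moved = False
        for i, m in enumerate(modes):
            window = [p for p in points if _dist(p, m) <= bandwidth]
            if not window:  # defensive; the window is non-empty in practice
                continue
            new = [sum(c) / len(window) for c in zip(*window)]
            if _dist(new, m) > 1e-9:
                moved = True
            modes[i] = new
        if not moved:  # all modes converged before the iteration cap
            break

    # Merge nearby modes into clusters; every point gets a cluster (no noise label).
    centers: List[List[float]] = []
    labels: List[int] = []
    for m in modes:
        for k, c in enumerate(centers):
            if _dist(m, c) < bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(mean_shift(pts, bandwidth=2.0))  # [0, 0, 0, 1, 1, 1]
```

Because every point is assigned to the cluster of some mode, this procedure has no notion of "noise", which is consistent with the treatment of model dataset No. 6 in the experiments.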

Experimental evaluation
In the first experiment, clustering accuracy was estimated according to the following definition. Suppose that for a dataset $X = \{x_1, \dots, x_N\}$ of size $N$, the reference partition $P^*$ into classes $\{C_0^*, \dots, C_K^*\}$ is known. Then, for an arbitrary partition (clustering) $P$ into $M$ clusters …

The goal of tuning the algorithm parameters was to maximize clustering accuracy. The MeanShift, k-means and DENCLUE algorithms cannot detect "noise"; therefore, the accuracy values for model dataset No. 6 were calculated without taking the "noise" class into account. The segmentation accuracy values obtained on the model datasets are presented in Table 1, and the processing times are shown in Table 2.
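The exact accuracy formula used in this experiment is not available here, so the following Python sketch shows only one common way to score a clustering against a reference partition: cluster purity, where each cluster is credited with its majority reference class. It may differ from the authors' definition. The optional `ignore_label` argument (a hypothetical parameter) mimics excluding the "noise" class, as was done for model dataset No. 6.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Hashable, Optional, Sequence


def purity_accuracy(reference: Sequence[Hashable], predicted: Sequence[Hashable],
                    ignore_label: Optional[Hashable] = None) -> float:
    """Cluster purity: the fraction of elements that belong to the majority
    reference class of their cluster. Elements whose reference label equals
    `ignore_label` (e.g. a "noise" class) are excluded from the computation."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have equal length")
    by_cluster: DefaultDict[Hashable, Counter] = defaultdict(Counter)
    for r, p in zip(reference, predicted):
        if ignore_label is not None and r == ignore_label:
            continue
        by_cluster[p][r] += 1
    total = sum(sum(c.values()) for c in by_cluster.values())
    if total == 0:
        return 0.0
    # Each cluster contributes the size of its largest reference class.
    correct = sum(max(c.values()) for c in by_cluster.values())
    return correct / total


if __name__ == "__main__":
    ref = [1, 1, 1, 2, 2, 2, 0, 0]            # class 0 plays the role of "noise"
    pred = ["a", "a", "b", "b", "b", "b", "a", "c"]
    print(round(purity_accuracy(ref, pred, ignore_label=0), 3))  # 0.833
    print(purity_accuracy(ref, pred))                             # 0.75
```

Purity never penalizes splitting a class into many small clusters, which is why matching-based measures (one cluster per class) are often preferred when the number of clusters is not fixed.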
All algorithms except DENCLUE, k-means and MeanShift produce the reference partition for model datasets No. 1-3. Processing model dataset No. 4 with the MeanSC and EMeanSC algorithms results in one misclassified data element, while the DBSCAN and OPTICS algorithms give a slightly less accurate result. All algorithms except DBSCAN and OPTICS achieved a clustering accuracy of about 85% on model dataset No. 5; the errors are caused by the significant overlap of the model classes. Model dataset No. 6 contains "noise" that could not be detected by the MeanShift, k-means and DENCLUE algorithms. The algorithms developed at the FRC ICT achieved a clustering accuracy above 84% on model dataset No. 6, whereas the remaining algorithms achieved less than 80%. For model dataset No. 7, only the HCA_MS, HECA_MS, ECCA, CCAE, MeanSC and EMeanSC algorithms detect all classes; the DBSCAN, OPTICS, DENCLUE and k-means algorithms correctly separate only the normally distributed classes. The results of the experiment demonstrate that the algorithms developed at the FRC ICT are superior to the known nonparametric algorithms in terms of clustering quality and/or processing time.
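Density-based algorithms such as DBSCAN [14] can label low-density points as noise instead of forcing them into a cluster. The following minimal textbook-style DBSCAN in Python illustrates this; it is not the ELKI or Smile implementation, and the names `dbscan`, `eps` and `min_pts` as well as the sample points are illustrative assumptions. Its brute-force neighbor search costs O(N²) distance computations, which hints at why such algorithms become slow on large images.

```python
import math
from typing import List, Tuple

NOISE = -1
UNVISITED = -2


def region_query(points: List[Tuple[float, ...]], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: List[Tuple[float, ...]], eps: float, min_pts: int) -> List[int]:
    """Return a cluster label per point; NOISE (-1) marks points that are
    neither core points nor within eps of a core point."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: expand from it
                seeds.extend(j_neighbors)
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
           (50.0, 50.0)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```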
In the second experiment, the considered algorithms were applied to the test images and their processing times were compared. The results are presented in Table 3, where dashes correspond to unacceptably long processing times (more than 18 hours). The results show that the DBSCAN, OPTICS and DENCLUE algorithms cannot efficiently handle large images; in addition, their processing time increases significantly with the number of channels. The algorithms from the ENVI package are better adapted to image analysis, but their processing time for images larger than 9 million pixels exceeds 5 minutes. In contrast, the HCA_MS, HECA_MS, ECCA, CCAE, MeanSC and EMeanSC algorithms allow interactive segmentation of large multispectral images.

Table 3 (excerpt; the test-image column labels, the name and first two values of the first row, and the remaining rows are missing). Processing time for the eight test images.

Algorithm   1      2      3      4      5      6      7      8
…           …      …      5      17     1196   75     302    588
ISODATA     1      15     5      9      1178   68     332    337
DBSCAN      194    2731   13098  -      -      39965  -      -
OPTICS      638    5244   40013  -      -      -      -      -
DENCLUE     6934   39849  -      -      -      -      -      -