
Evaluating the Performance of Four Selections in Genetic Algorithms-Based Multispectral Pixel Clustering

  • Kutubi, Abdullah Al Rahat (Department of Applied Information Technology, Graduate School, Kookmin University) ;
  • Hong, Min-Gee (Department of Applied Information Technology, Graduate School, Kookmin University) ;
  • Kim, Choen (Department of Forest Resources, College of Science and Technology, Kookmin University)
  • Received : 2018.02.17
  • Accepted : 2018.02.26
  • Published : 2018.02.28

Abstract

This paper compares the performance of four selection methods used in applying genetic algorithms (GAs) to automatically optimize multispectral pixel clusters for unsupervised classification of KOMPSAT-3 data, since selection, among the three main GA operators that also include crossover and mutation, is the driving force of the overall clustering GA. Experimental results demonstrate that the tournament selection performs better than the other selections, especially in both the number of generations and the convergence rate. However, it is computationally more expensive than the elitism selection, which has the slowest convergence rate in the comparison and a lower probability of finding optimum cluster centers than the other selections. The ranked-based selection and the proportional roulette wheel selection show similar performance in the average Euclidean distance of the pixel clustering, even though the ranked-based selection is computationally much more expensive than the proportional roulette wheel. With respect to finding the global optimum, the tournament selection has a higher potential to reach the global optimum than the ranked-based selection, which spends a lot of computational time on fitness smoothing. The tournament selection-based clustering GA successfully classified the KOMPSAT-3 multispectral data with sufficient thematic accuracy (an achieved Kappa coefficient of 0.923).

Keywords

1. Introduction

Because little a priori knowledge of the multispectral (MS) data is available in land cover classification, clustering methods, often called unsupervised classification, have been developed to generate classes (i.e., clusters) in such a way as to maximize the MS similarity between the individual pixel values within each cluster and minimize the similarity between the pixel values in discrete clusters. However, conventional clustering algorithms such as K-means, the iterative self-organizing data analysis technique algorithm (ISODATA), and others cannot yield the optimal classification (i.e., thematic) map of all pixels in the MS image data when the MS data involve highly nonlinear and overlapping class boundaries in the organization of the feature space into homogeneous regions (i.e., in the intensity space; see Maulik and Bandyopadhyay, 2003) compared to those on the ground. At the pixel-level classification, searching for the optimum cluster center of each class in a vast solution space is an optimization problem.

Genetic algorithm (GA) is the popular optimization technique, which is extensively in use to search those optimum solutions efficiently (Goldberg, 1989). Recently, researchers of the MS pixel clustering have been paying attention to GAs in order to improve the classification accuracy (Bandyopadhyay and Pal, 2001; Mitrakis et al., 2008; Guo et al., 2010; Luo and Liao, 2014).

GA was first proposed by Holland (1975). It mimics the principles of natural selection in order to develop solutions to large optimization problems. Along with other optimization search techniques, GA has gained popularity in artificial intelligence areas such as pattern recognition, robotics, and image processing (Goldberg, 1989; Greffenstette, 1994; Yao and Tian, 2003; Pedergnana et al., 2013). A GA evolves by manipulating an initial population of individuals, aiming to obtain a better population in the next generation. Each individual has an associated fitness value, which is a measure of the quality of its solution. This fitness value is used in the selection of individuals, which are then used to generate the next generation through crossover and mutation. The selection mechanism decides which individuals should be selected for mating. Every selection method maintains the general principle that the fittest individuals have a higher probability of becoming parents. Crossover generates new individuals by combining cross sections of two or more selected parents. Mutation acts by altering randomly selected genes in the individuals. Moreover, the same selection method may perform differently in different scenarios. Therefore, this study tests the performance of four GA selection methods for MS pixel clustering. We compare and analyze the performance of the tournament, proportional roulette wheel, ranked-based, and elitism selection methods.

The rest of the paper is organized as follows: Section 2 describes the related work. Section 3 depicts the GA procedure for MS genetic clustering. Section 4 discusses the different selection methods. Section 5 describes the methodology for processing the KOMPSAT-3 MS image for this comparison. In Section 6, experimental results are provided to compare the effectiveness and the classification accuracy of the proposed four selection methods. Lastly, Section 7 draws the conclusion.

2. Related Work

There are two key criteria for comparing the performance of a GA: the total number of generations required to reach the global optimum and how quickly or slowly the GA converges to it. Along with these two criteria, the computational complexity can be used to test the efficiency of each selection method in a GA. Previously, several performance comparisons of selection methods have been presented, both generalized (Blickle and Thiele, 1995; Zhang et al., 2005; Jadaan et al., 2008; Pandey, 2016) and for specific problems (Razali and Geraghty, 2011; Chudasama et al., 2011).

Goldberg and Deb (1991) compared the proportional roulette wheel, ranked-based, tournament, and genitor selection methods based on solutions to deterministic difference or differential equations, using the convergence rate, growth ratio, and computational complexity. They found that the proportional roulette wheel selection is significantly slower than the others, and that the binary tournament is efficient in terms of computational complexity. Julstrom (1999) compared the rank-based selection (both linear and exponential) with the tournament selection (both 2-tournament without replacement and k-tournament with replacement). He pointed out that the tournament selection is preferred over the rank-based selection in terms of computational complexity, because repeated tournament selection is much faster than sorting the population in order to assign rank-based probabilities to the individuals. Jadaan et al. (2008) compared the proportional roulette wheel with the rank-based roulette wheel selection method on several mathematical fitness functions and found that the ranked-based method is more efficient than the proportional one in the number of generations. Zhong et al. (2005) compared the roulette wheel selection with the tournament selection on 7 different generalized mathematical test functions and stated that the tournament selection converges more quickly than the roulette wheel selection. Razali and Geraghty (2011) presented a performance comparison of GAs solving the Traveling Salesman Problem (TSP) using the tournament, proportional roulette wheel, and rank-based roulette wheel selections. They found that the tournament selection is more appropriate for small problems, while the rank-based roulette wheel can be used to solve larger problems. Chudasama et al. (2011) applied three selection methods (roulette wheel, elitism, and tournament) to find the optimal solution to the TSP. They observed that all of the selection methods performed similarly at the initial stage, whereas in the last stage the elitism method obtained the fittest population, followed by the tournament and the roulette wheel selection methods.

3. Genetic Algorithm Procedure

GA is a simple, random-search, general-purpose optimization tool motivated by the laws of the natural evolution process. These algorithms search for a good solution to a problem using certain heuristic guidelines. The major motivation of GA is "survival of the fittest" (Holland, 1975), which implies that fitter individuals have a higher priority to pass their genetic features on to the next generation. In a GA, each individual is a candidate solution to the problem. There are three main steps in a GA: the initialization of the population (usually random), the evaluation of the fitness function, and finally the generation of a new population by selection, crossover, and mutation. In the selection stage, individuals are selected from the current population depending on their fitness values.

In this case, the fittest individuals have a higher probability of being selected for the next generation through crossover and mutation. The GA procedure for pixel clustering (see Fig. 1) begins by defining some important terms such as the fitness function, termination condition, encoding type, crossover and mutation rates, the maximum number of generations, and the population size. The initial random population of individuals is generated from KOMPSAT-3 MS imagery. Each pixel represents an individual, formed by arranging the gray value of each band sequentially. Then the fitness of each individual is evaluated. If the current population does not satisfy the termination condition, the population is transformed through parent selection, crossover, and gray value variation (mutation) to generate a new population with a better cluster center.
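The flow just described can be condensed into a short sketch. The paper's implementation is in Matlab; the following Python skeleton is only illustrative, and the operator arguments (`select`, `crossover`, `mutate`, `fitness_fn`) are placeholders for the methods detailed in Sections 4 and 5.

```python
import random

def run_clustering_ga(pixels, select, crossover, mutate, fitness_fn,
                      pop_size=400, max_gen=30, threshold=1.0):
    """Skeleton of the clustering GA in Fig. 1: initialize a random
    population of pixels, evaluate fitness against the current cluster
    center, and reproduce until the termination condition is met."""
    population = random.sample(pixels, pop_size)
    center = None
    for _ in range(max_gen):
        n_bands = len(population[0])
        # cluster center = per-band mean gray value (step 4 of Sec. 5)
        center = [sum(p[i] for p in population) / len(population)
                  for i in range(n_bands)]
        fitness = [fitness_fn(p, center) for p in population]
        # terminate when the mean Euclidean distance falls below the threshold
        mean_distance = sum(1.0 / f for f in fitness) / len(fitness)
        if mean_distance < threshold:
            break
        population = mutate(crossover(select(population, fitness, pop_size)))
    return center
```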


Fig. 1. Flow chart of GA for pixel clustering of MS data.

4. Four Selection Methods

1) Tournament selection

Tournament selection is the most robust of the compared selection methods because it is simple to code and computationally efficient to implement on both nonparallel and parallel architectures (Miller and Goldberg, 1995). It consists of two major steps: in the first step, n individuals are randomly selected from the population, and in the second step, the fittest of the selected individuals is chosen (see Fig. 2). These two steps are repeated until the mating pool is full. The number of individuals competing in each tournament is known as the tournament size. Increasing the tournament size (ts) degrades the diversity, while increasing the diversity degrades the convergence speed. The major advantages of this selection method are its efficient time complexity, i.e., O(n) (Yao and Tian, 2003), its control over the domination of the fittest individuals through the selective pressure (Zhong et al., 2005), and its lack of any fitness scaling or sorting requirement, unlike the ranked-based selection (Goldberg and Deb, 1991).


Fig. 2. Diagram of tournament selection procedure with ts=2.

The pseudocode for performing tournament selection in the GA of this paper is as follows:

Inputs: D = [d1, d2, d3, d4, …, dn], TS

A = [ ]

while not IsMatingPoolFull(A)

    U = [ ]

    for i = 1 to TS do

        U = [U, random(D)]

    end

    A = [A, selectOne(U)]

end

Outputs: A
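A minimal executable version of this pseudocode (a Python sketch; the paper's implementation is in Matlab) might look as follows, where `fitness` is a list parallel to `population`:

```python
import random

def tournament_selection(population, fitness, pool_size, ts=2):
    """Fill a mating pool by repeated tournaments of size ts:
    draw ts random individuals, keep the fittest, repeat until full."""
    pool = []
    while len(pool) < pool_size:                       # IsMatingPoolFull
        contestants = random.sample(range(len(population)), ts)
        winner = max(contestants, key=lambda i: fitness[i])  # selectOne
        pool.append(population[winner])
    return pool
```

With ts equal to the population size, every tournament returns the globally fittest individual, which illustrates how a larger tournament size reduces diversity.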

2) Proportional roulette wheel

In this method, each individual corresponds to a portion of a roulette wheel. The width of each portion is proportional to the individual's fitness value, so the individual with the highest fitness value occupies the widest portion. Though every portion has a chance of being selected, the widest portion is the most likely to be pointed at on each spin. Thus, more highly fitted individuals will be selected more often than less fitted ones. The fitness value is used to associate a probability of selection with each individual. Fig. 3 presents the basic strategy of this selection method. We consider the proportional roulette wheel algorithm described in Grefenstette (1997) and Razali and Geraghty (2011).


Fig. 3. Roulette wheel selection with n spins.

The basic steps of the roulette wheel selection, as given in Shukla et al. (2015), are as follows.

1. Calculate the fitness value of every individual in the population.

2. Calculate the summation of fitness values within all individuals in the population.

3. Calculate the probability of selection for each individual by dividing its fitness value with the summation calculated in step 2.

4. Partition the roulette wheel into l portions (the total number of individuals), where the individual with the highest probability of selection occupies the widest portion, and so on.

5. Spin the wheel. When the wheel stops, the portion at which the pointer points is selected, and the individual occupying that portion goes to the mating pool. Repeat step 5 until the mating pool is full.

If fi is the fitness value of the ith individual in the population, its probability of being selected is given in Eq. (1):

\(p_{i}=\frac{f_{i}}{\sum_{j=1}^{l} f_{j}}\)       (1)

where l is the number of individuals in the population and j=1, 2, 3…., l.
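The five steps above can be sketched in Python (illustrative only; the cumulative-sum wheel is a standard implementation choice, not taken from the paper):

```python
import random

def roulette_wheel_selection(population, fitness, pool_size):
    """Proportional roulette wheel: p_i = f_i / sum(f), as in Eq. (1)."""
    total = sum(fitness)                      # step 2: total fitness
    probs = [f / total for f in fitness]      # step 3: selection probabilities
    cum, acc = [], 0.0                        # step 4: wheel portions
    for p in probs:
        acc += p
        cum.append(acc)
    pool = []
    while len(pool) < pool_size:              # step 5: spin until pool is full
        spin = random.random()
        for i, edge in enumerate(cum):
            if spin <= edge:
                pool.append(population[i])
                break
    return pool
```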

3) Elitism selection

Elitism is a method whereby a small portion of elite individuals is passed to the next generation unchanged (i.e., crossover and mutation are not applied to this elite portion). It can have a dramatic impact on performance because it does not waste time rediscovering previously discarded partial solutions. If the elitism rate is high, however, elitism may cause the GA to converge on a local maximum instead of the global maximum.

The number of elite individuals kept unaffected for the next generation is called the 'elitism rate.' Fig. 4 shows the basic flow of the elitism selection. We implement the elitism algorithm described in Chudasama et al. (2011).


Fig. 4. Diagram of elitism selection flow

The pseudocode for the elitism selection in the GA of this paper is shown below:

Inputs: D = [d1, d2, d3, d4, …, dn]

E = SelectElites(D)

N = SelectNonElites(D)

N = CrossOver(N)

N = Mutation(N)

S = [N, E]

Outputs: S
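An executable sketch of one elitism generation (Python for illustration; `crossover` and `mutation` are passed in as placeholder operators for the methods described elsewhere in this section):

```python
def elitism_generation(population, fitness, elitism_rate, crossover, mutation):
    """One generation with elitism: the top fraction of individuals passes
    to the next generation unchanged; the rest undergo crossover and mutation."""
    n_elite = max(1, int(elitism_rate * len(population)))
    # rank indices from fittest to least fit
    ranked = sorted(range(len(population)),
                    key=lambda i: fitness[i], reverse=True)
    elites = [population[i] for i in ranked[:n_elite]]   # SelectElites
    others = [population[i] for i in ranked[n_elite:]]   # SelectNonElites
    others = mutation(crossover(others))                 # operators on non-elites
    return others + elites                               # S = [N, E]
```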

4) Ranked-based selection

Baker (1985) introduced the idea of ranking individuals according to their fitness values in order to keep constant pressure in the parent selection, thereby eliminating the possibility of takeover by dominant individuals seen in the proportional roulette wheel selection. The main strategy of this selection method is to sort the population by fitness and rank the individuals from worst to best: the best-fitted individual gets rank 'N' and the worst-fitted one gets rank '1.' Parent selection then proceeds as in the proportional roulette wheel selection, but using the ranking values in place of the raw fitness values. The effect of ranking on the roulette wheel selection is shown in Fig. 5. The time complexity of this selection consists of the sorting time, O(n ln n), and the selection time (between O(n) and O(n2)), as analyzed by Goldberg and Deb (1991). Though it preserves diversity, this method is computationally expensive because of the sorting, and it may lead to slower convergence since the best individual does not differ much from the worst one. Either a linear or an exponential mapping function can be used to map the rank values of individuals to their corresponding selection probabilities. In this case, we combine the algorithms stated in Jadaan et al. (2008) and Blickle and Thiele (1995), where the mapping functions (Eq. (2) and Eq. (3)) are taken from Blickle and Thiele (1995).


Fig. 5. Effect of ranking individuals on the width of each portion in the roulette wheel of the ranked-based selection.

Probability function by linear mapping:

\(p_{i}=\frac{1}{N}\left(r^{w}+\left(r^{b}-r^{w}\right) \frac{i-1}{N-1}\right); \quad i \in\{1,2, \ldots, N\}, \quad r^{b}=2-r^{w}, \quad r^{w} \in[0,1]\)       (2)

where

pi is the selection probability of the ith individual, N is the population size, and rw and rb are the reproduction rates of the worst and the best individual, respectively.

Probability function by exponential mapping (nonlinear):

\(p_{i}=\frac{c^{N-i}}{\sum_{j=1}^{N} c^{N-j}} \text { and } \sum_{i=1}^{N} p_{i}=1 ; \quad i \in\{1,2, \ldots, N\}\)       (3)

where 0 < c < 1 is the base of the exponential mapping (Blickle and Thiele, 1995).
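Both mapping functions are easy to evaluate directly. The following Python sketch computes Eq. (2) and Eq. (3) for a population of size N, assuming rank 1 is the worst individual and rank N the best, as in the text; the default parameter values are illustrative choices, not taken from the paper:

```python
def linear_rank_probs(n, r_worst=0.5):
    """Linear ranking, Eq. (2): the individual of rank i gets
    p_i = (r_w + (r_b - r_w) * (i-1)/(N-1)) / N, with r_b = 2 - r_w."""
    r_best = 2.0 - r_worst
    return [(r_worst + (r_best - r_worst) * (i - 1) / (n - 1)) / n
            for i in range(1, n + 1)]

def exponential_rank_probs(n, c=0.9):
    """Exponential ranking, Eq. (3): p_i proportional to c**(N - i)."""
    weights = [c ** (n - i) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

In both cases the probabilities sum to 1 and increase monotonically with rank, so the best-ranked individual is the most likely parent.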

5. Methodology

1) Cluster center optimization procedure

The purpose of this paper is both to automatically determine the optimum cluster center for each land cover class by using a GA in MS pixel classification and to compare the performance of the proposed four selection methods in terms of the convergence rate, i.e., the number of generations and the computational cost required to learn the optimum cluster center. The following steps are performed to learn the optimum pixel cluster center.

1. Choose a land cover class. Here, two land cover classes (i.e., sample test regions) are selected namely urban and forest as shown in Fig. 6.


Fig. 6. Test sites of (a) urban cover type, (b) forest cover type, and (c) mixture class cover type (5 ground objects) for performing the classification accuracy test in the KOMPSAT-3 false-color composite

2. Encode (represent as an individual) each n-band pixel. Here, we use value encoding, i.e., the gray values of all bands in a pixel represent the individual. Each n-band pixel is represented as n sequentially arranged gray values (see Table 1).

Table 1. Value encoding of a pixel to represent the individual


3. Produce an initial population (formulated in Eq. (4)) by encoding each pixel from the current land cover class (urban or forest) to an individual as described in step 2.

\(P_{t}=\left(I_{t 1}, I_{t 2}, \ldots, I_{t l}\right); \quad t=1,2,3, \ldots, T\)       (4)

where T is the number of the different land cover classes, l is the initial population size, and Itl is lth individual from tth land cover class.

4. Calculate the cluster center (Ck) explained in Eq. (5) for the current land cover class by using the mean of the gray values of each band of all individuals in the current population beginning from the initial population produced in step 3.

Ck = (ck,1, ck,2, ………, ck,i); i = 1, 2, 3 …, n,       (5)

\(c_{k, i}=\frac{\sum_{j=1}^{l} I_{k j}(i)}{l}\)

where k is the current generation, l is the population size, and n is the total number of the different bands in each pixel. Ikj(i) is the gray value of the ith band in the jth individual (i.e., pixel) in the kth generation.

5. Calculate the fitness value of each individual in the current population (the initial population in the first generation) by taking the inverse of the Euclidean distance from the individual to the current cluster center (Maulik and Bandyopadhyay, 2003). The smaller the Euclidean distance, the higher the individual's fitness value. The fitness function is formulated in Eq. (6).

\(f_{k}(j)=\frac{1}{\sqrt{\sum_{i=1}^{n}\left(I_{k j}(i)-c_{k, i}\right)^{2}}}\)       (6)

where fk(j) is the fitness value of the jth individual in the kth generation.
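Steps 4 and 5 can be sketched together in Python (illustrative; the small `eps` guard against a pixel coinciding exactly with the center is our addition, not part of the fitness definition):

```python
import math

def cluster_center(population):
    """Per-band mean gray value over all individuals (pixels) in the
    current population, as in Eq. (5)."""
    n_bands = len(population[0])
    l = len(population)
    return [sum(pix[i] for pix in population) / l for i in range(n_bands)]

def fitness(pixel, center, eps=1e-9):
    """Inverse Euclidean distance of a pixel to the cluster center, Eq. (6)."""
    d = math.sqrt(sum((p - c) ** 2 for p, c in zip(pixel, center)))
    return 1.0 / (d + eps)   # eps avoids division by zero at the center
```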

6. Apply the selection method described in Sec. 4 to fill the mating pool M for the reproduction.

7. Randomly separate (Rc·M) individuals at the crossover rate (Rc) from the mating pool of size M. Then, apply the single-point crossover to those isolated individuals. Let \(I_{1}^{\prime}, I_{2}^{\prime}, I_{3}^{\prime}, I_{4}^{\prime}, \ldots\) be the nominated individuals from the mating pool, randomly paired as \(\left(I_{1}^{\prime}, I_{2}^{\prime}\right),\left(I_{3}^{\prime}, I_{4}^{\prime}\right),\left(I_{5}^{\prime}, I_{6}^{\prime}\right) \ldots\) The single-point crossover operation is shown in Eq. (7) and Eq. (8).

\(\begin{aligned} I_{1}^{\prime \prime}=&\left(I_{1}^{\prime}(\text { band } 1)\left|I_{1}^{\prime}(\text { band } 2)\right| \ldots\left|I_{1}^{\prime}(\text { band } m)\right|\right.\\ &\left.I_{2}^{\prime}(\text { band }(m+1))\left|I_{2}^{\prime}(\text { band }(m+2))\right| \ldots \mid I_{2}^{\prime}(\text { band } n)\right) \end{aligned}\)       (7)

\(\begin{aligned} I_{2}^{\prime \prime}=&\left(I_{2}^{\prime}(\text { band } 1)\left|I_{2}^{\prime}(\text { band } 2)\right| \ldots\left|I_{2}^{\prime}(\text { band } m)\right|\right.\\ &\left.I_{1}^{\prime}(\text { band }(m+1))\left|I_{1}^{\prime}(\text { band }(m+2))\right| \ldots \mid I_{1}^{\prime}(\text { band } n)\right) \end{aligned}\)       (8)

where the crossover point \(m=\lfloor n / 2\rfloor\). If the number of bands is n > 4, one can apply the multi-point crossover.

After crossover operation, newly generated individuals are appended to the unaltered individuals in the mating pool.
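A sketch of the single-point crossover of Eqs. (7) and (8), treating each individual as a list of n band gray values:

```python
def single_point_crossover(p1, p2, m=None):
    """Swap the band values of two parents after the crossover point
    m = floor(n/2), producing two offspring."""
    n = len(p1)
    if m is None:
        m = n // 2
    child1 = p1[:m] + p2[m:]   # Eq. (7)
    child2 = p2[:m] + p1[m:]   # Eq. (8)
    return child1, child2
```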

8. In the case of optimizing the MS pixel cluster center by GA, mutation refers to a small variation in the gray value of each band in the pixel. First, randomly separate (Rm·M) individuals at the mutation rate (Rm) from the mating pool of size M. Then, make a small variation in the gray value of each band of the selected individuals. Let \(I_{1}^{\prime}, I_{2}^{\prime}, I_{3}^{\prime}, I_{4}^{\prime}, \ldots\) be the selected individuals from the mating pool. For example, the mutation operation for \(I_{1}^{\prime}\) is shown in Eq. (9).

\(\begin{aligned} I_{1}^{\prime \prime}=& I_{1}^{\prime}+\Delta I ; \text { if } r=1 \\ & \text { Otherwise, } I_{1}^{\prime \prime}=I_{1}^{\prime}-\Delta I \end{aligned}\)       (9)

where r = round (random (.)) and random (.) is uniform random function ∈ [0,1].
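A sketch of the mutation of Eq. (9), applying the ±ΔI variation independently to each band (the per-band application is our reading of step 8):

```python
import random

def mutate_pixel(pixel, delta=1):
    """Add or subtract a gray-value variation delta (ΔI) on each band,
    with the sign chosen by r = round(random()), as in Eq. (9)."""
    mutated = []
    for gray in pixel:
        r = round(random.random())          # 0 or 1, uniformly
        mutated.append(gray + delta if r == 1 else gray - delta)
    return mutated
```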

9. The cluster center is recalculated after each iteration, as in step 4.

10. Terminate the reproduction when the average Euclidean distance per individual of the current generation falls below the threshold T. The termination condition is stated in Eq. (10).

\(\frac{1}{\frac{\sum_{j=1}^{M} f_{k}(j)}{M}}<T\)       (10)

where k indicates the kth iteration and M is the size of the mating pool. If the average Euclidean distance is less than the threshold, the optimal results have been found and further reproduction terminates. Otherwise, the above steps are carried out again.

11. Get the optimal cluster center for the current land cover class.

12. Repeat these steps for each remaining land cover class, thereby obtaining the optimal cluster centers for all land cover classes.

2) Clustering algorithm

The multispectral pixel (MSP)_Classifier (Algorithm 1) shown in Table 2 is the designed algorithm applied in the unsupervised MS pixel classification. This algorithm uses voting when deciding which class to assign a pixel to. The band with the highest discriminating power is given extra votes in the voting pool, which contributes to improving the classification accuracy. For example, in this study, the near-infrared band (band 4 of KOMPSAT-3) is given extra votes in the voting pool because it exhibits the highest discriminating power.
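The voting rule can be sketched as follows. This is an illustrative reconstruction, not the paper's Algorithm 1: the per-band nearest-center vote and the `extra_votes` weighting scheme are assumptions based on the description above.

```python
def classify_pixel(pixel, centers, extra_votes=None):
    """Assign a pixel to a class by per-band voting: each band votes for
    the class whose center is nearest in that band; weighted bands
    (e.g., NIR) contribute extra votes. `extra_votes` maps a band index
    to its additional vote weight (hypothetical parameter)."""
    extra_votes = extra_votes or {}
    votes = [0.0] * len(centers)
    for band, gray in enumerate(pixel):
        nearest = min(range(len(centers)),
                      key=lambda k: abs(gray - centers[k][band]))
        votes[nearest] += 1 + extra_votes.get(band, 0)
    return max(range(len(centers)), key=lambda k: votes[k])
```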

Table 2. Matlab procedure for GAs-based MS pixel classification


The Matlab programming language is used to implement these four selection methods of the GA. The cluster center optimization procedure and the classification algorithm described in this section are also programmed in Matlab. The classification experiments based on the four compared selection methods are conducted on KOMPSAT-3 MS data having five classes (i.e., land cover categories): coniferous, broadleaf, uncultivated farmland, urban, and water, corresponding to the mixture cover type shown in Fig. 6(c).

6. Used Data and Study Area

The 14-bit digital number (DN) MS data from KOMPSAT-3, launched May 17, 2012, were used to compare the different selection methods in the GA for optimizing the cluster centers of homogeneous pixels (i.e., the classification of objects into categories), since the pixel DN values of all MS bands were radiometrically and geometrically corrected according to the radiometric coefficients developed for each band by the Korea Aerospace Research Institute (Yeom et al., 2016). Scenes captured by KOMPSAT-3 have 0.7 m spatial resolution in the panchromatic band. The MS data, on the other hand, comprise 4 spectral bands (blue: 450 to 520 nm, green: 520 to 600 nm, red: 630 to 690 nm, near-infrared: 760 to 900 nm) and produce color images with essentially 2.8 m spatial resolution.

Fig. 6 shows the false-color scene (10646 rows by 12030 columns of pixels) composed of KOMPSAT-3 MS bands 4 (near-infrared), 3 (red), and 2 (green), acquired on May 23, 2013, over the Gunsan study area in the Geum River basin of South Korea. The proposed comparisons are tested on the three sites shown in the enlarged images in Fig. 6. Two sites represent the urban cover type and the forest cover type, respectively. The remaining site, Fig. 6(c), is used to test the accuracy of the resulting classification.

7. Results and Analysis

1) Comparison in optimizing cluster center

The comparison of the different selection methods in terms of the Euclidean distance of each individual pixel in the population at each generation is shown for the two different land cover types: the urban type in Fig. 7, based on Fig. 6(a), and the forest type in Fig. 8, based on Fig. 6(b). With population size = 400, mutation rate = 0.15, crossover rate = 0.5, elitism rate = 0.1, and total number of generations (i.e., iterations) = 30, the clustering results of the different selection methods are presented in Table 3. From this table, it can be seen that the tournament selection and the proportional roulette wheel selection have a better chance of obtaining the optimum pixel cluster center, from which the Euclidean distance of each pixel is minimum. The ranked-based selection has a lower probability of getting an optimum cluster center because the Euclidean distance fluctuates from high to low and vice versa (see Figs. 7 and 8), although it shows a medium Euclidean distance over most intervals of the generations. The major drawback of the elitism selection method is that it suffers from premature convergence, so it gets stuck at a suboptimum in an early generation instead of converging to the global optimum. Compared to the ranked-based selection, the tournament selection needs a moderate number of generations to reach the global optimum, and it does not suffer from premature convergence, unlike the elitism selection.

Table 3. Optimized results from the different selection methods in two different types of reflectance data, urban and forest, respectively. Each KOMPSAT-3 image has four bands. The displayed lowest Euclidean distances (per pixel) are found within 30 iterations



Fig. 7. Comparison of the selection methods in minimizing the Euclidean distance of the urban land cover type pixel


Fig. 8. Comparison of the selection methods in minimizing the Euclidean distance of the forest land cover type pixel

Fig. 9 and Table 4 show that the ranked-based selection exhibits the most time-consuming computation, since it needs to sort the individuals to assign ranking probabilities. The tournament selection also needs less computational time than the proportional roulette wheel selection. The fitness of each pixel within the same cluster (i.e., class) is nearly the same, so the fitness smoothing performed by the ranked-based selection provides no benefit here. Consequently, for learning the cluster center of each homogeneous area in MS genetic clustering, the GA with tournament selection outperforms the other selection methods.


Fig. 9. Computational time comparison among the four selection methods.

Table 4. Computational time required for each selection method


2) Comparison in classification accuracy

The test site of Fig. 6(c) is classified into five land cover categories, namely, coniferous (green), broadleaf (red), uncultivated farmland (yellow), urban (blue), and water (black), using Algorithm 1, MSP_Classifier. The corresponding color classification results are shown in Fig. 10. To quantitatively assess the classification accuracy of the classified images, we compare these four genetic clustering images (Fig. 10(a)-(d)) with the supervised classification image created from the KOMPSAT-3 data of the same test site, which is produced with ERDAS IMAGINE's maximum likelihood (ML) classifier and treated as the reference source (i.e., site-specific test pixels) in place of ground truth data in the confusion matrix (i.e., error matrix). In this supervised classification approach, 5 polygon regions of interest (ROIs) per thematic class (i.e., land cover category) are used for training the ML classifier.


Fig. 10. Clustering results comparison on the mixture class cover test site KOMPSAT-3 MS data among using (a) tournament, (b) roulette wheel, (c) elitism, and (d) ranked-based selection, respectively

Table 5 shows the confusion matrix obtained by comparing 750 pixel samples (i.e., 150 pixel samples per class) at sampled locations on each clustering image with the same locations on the supervised classification image.

Table 5. Comparison of the thematic accuracy of four selections in a confusion matrix using 750 test samples


Among the four selections in the thematic accuracy assessment, the overall accuracy of the tournament selection (93.86%) is better than that of the other three methods. As expected, the Kappa coefficient of the tournament selection, 0.92, is also the best of the four. Moreover, Table 5 shows that all four selection methods yield very satisfying results, with high producer's accuracies (86.7%-96.67%) and user's accuracies (86.30%-100%).

8. Conclusion

In this paper, we have compared the performance of four GA selection methods in optimizing the cluster centers for MS pixel clustering (i.e., unsupervised classification), including an assessment of the resulting classification accuracies. The approach was conducted on KOMPSAT-3 data. The advantages of the tournament selection method over the other three are as follows. First, in minimizing the Euclidean distance per pixel from the cluster center, the tournament selection outperforms the other selections, whereas the ranked-based and proportional roulette wheel selections exhibit nearly similar performance. Second, the tournament selection reaches the global optimum earlier than the ranked-based selection, whose fitness smoothing is of little benefit for pixels of nearly equal fitness, and performs better than the elitism selection, which, despite having the fastest computation time, encounters premature convergence. Third, in the thematic accuracy assessment of the classified images of the mixture class test site obtained with each of the four selection-based classifiers, the tournament selection-based clustering GA (i.e., classifier) provides the highest overall accuracy, 93.6%, and the highest Kappa coefficient, 0.923, among the four classifiers. More generally, our experimental procedure could be applied to suitable remote sensing image classification tasks, because the obtained Kappa coefficient above 0.75 indicates excellent agreement between the classification produced by the tournament selection-based clustering GA and the supervised ML classification used as the reference data.

Acknowledgments

This work was supported by a grant from the National Research Foundation of Korea (NRF) under the "Space Technology Development Program" (Project No. NRF-2015M1A3A3A0201225) funded by the Korean government (MSIP). Page charges of the manuscript were supported by the Industry-Academic Cooperation Foundation, Kookmin University.

References

  1. Baker, J. E., 1985. Adaptive selection methods for genetic algorithms, Proc. of the 1st International Conference on Genetic Algorithms and Their Applications, Lawrence Erlbaum Associates, Hillsdale, NJ, USA, pp. 101-111.
  2. Bandyopadhyay, S. and S. K. Pal, 2001. Pixel classification using variable string genetic algorithms with chromosome differentiation, IEEE Transactions on Geoscience and Remote Sensing, 39(2): 303-308. https://doi.org/10.1109/36.905238
  3. Blickle, T. and L. Thiele, 1995. A comparison of selection schemes used in genetic algorithms, TIK-Report Nr. 11, Swiss Federal Institute of Technology, Zurich, Switzerland.
  4. Chudasama, C., S. M. Shah, and M. Panchal, 2011. Comparison of parents selection methods of genetic algorithm for TSP, IJCA Proc. on International Conference on Computer Communication and Networks CSI-COMNET-2011, pp. 85-87.
  5. Goldberg, D. E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA, USA.
  6. Goldberg, D. E. and K. Deb, 1991. A comparative analysis of selection schemes used in genetic algorithms, in: Foundations of Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, USA, pp. 69-93.
  7. Grefenstette, J. J., 1994. Evolutionary algorithms in robotics, in: Robotics and Manufacturing: Recent Trends in Research, Education, and Application: Proc. ISRAM '94, ASME Press, New York, USA, vol. 5, pp. 65-72.
  8. Grefenstette, J., 1997. Proportional selection and sampling algorithms, in: Handbook of Evolutionary Computation, Institute of Physics, Bristol, UK, pp. C2.2:1-C2.2:7.
  9. Guo, Y. Q., B. Y. Wu, Z. H. Ju, W. Jun, and Z. Luyan, 2010. Remote sensing image classification by the chaos genetic algorithm in monitoring land use changes, Mathematical and Computer Modelling, 51(11-12): 1408-1416. https://doi.org/10.1016/j.mcm.2009.10.023
  10. Holland, J. H., 1975. Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, USA.
  11. Jadaan, O. A., L. Rajamani, and C. R. Rao, 2008. Improved selection operator for GA, Journal of Theoretical and Applied Information Technology, 4(4): 269-277.
  12. Julstrom, B. A., 1999. It's all the same to me: revisiting rank-based probabilities and tournaments, Proc. CEC 99, vol. 2, pp. 1501-1505.
  13. Luo, Y.-M. and M. H. Liao, 2014. A clonal selection algorithm for classification of mangroves remote sensing image, International Journal of Control and Automation, 7(4): 395-404. https://doi.org/10.14257/ijca.2014.7.4.36
  14. Maulik, U. and S. Bandyopadhyay, 2003. Fuzzy partitioning using a real-coded variable-length genetic algorithm for pixel classification, IEEE Transactions on Geoscience and Remote Sensing, 41(5): 1075-1081. https://doi.org/10.1109/TGRS.2003.810924
  15. Miller, B. L. and D. E. Goldberg, 1995. Genetic algorithms, tournament selection, and the effects of noise, Complex Systems, 9: 193-212.
  16. Mitrakis, N. E., C. A. Topaloglou, T. K. Alexandridis, J. B. Theocharis, and G. C. Zalidis, 2008. Decision fusion of GA self-organizing neuro-fuzzy multilayered classifiers for land cover classification using textural and spectral features, IEEE Transactions on Geoscience and Remote Sensing, 46(7): 2137-2152. https://doi.org/10.1109/TGRS.2008.916481
  17. Pandey, H. M., 2016. Performance evaluation of selection methods of genetic algorithm and network security concerns, Procedia Computer Science, 78: 13-18. https://doi.org/10.1016/j.procs.2016.02.004
  18. Pedergnana, M., P. R. Marpu, M. Dalla Mura, J. A. Benediktsson, and L. Bruzzone, 2013. A novel technique for optimal feature selection in attribute profiles based on genetic algorithms, IEEE Transactions on Geoscience and Remote Sensing, 51(6): 3514-3528. https://doi.org/10.1109/TGRS.2012.2224874
  19. Razali, N. M. and J. Geraghty, 2011. Genetic algorithm performance with different selection strategies in solving TSP, Proc. WCE 2011, IAENG, vol. 2, pp. 1134-1139.
  20. Shukla, A., H. M. Pandey, and D. Mehrotra, 2015. Comparative review of selection techniques in genetic algorithm, in: Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE), IEEE, pp. 515-519.
  21. Yao, H. B. and L. Tian, 2003. A genetic-algorithm-based selective principal component analysis (GA-SPCA) method for high dimensional data feature extraction, IEEE Transactions on Geoscience and Remote Sensing, 41(6): 1469-1478. https://doi.org/10.1109/TGRS.2003.811691
  22. Yeom, J. M., C. G. Jin, D. H. Lee, and K. S. Han, 2016. Radiometric characteristics of KOMPSAT-3 multispectral images using the spectra of well-known surface tarps, IEEE Transactions on Geoscience and Remote Sensing, 54(10): 5914-5924. https://doi.org/10.1109/TGRS.2016.2574902
  23. Zhang, J., X. Hu, J. Zhang, and M. Gu, 2005. Comparison of performance between different selection strategies on simple genetic algorithms, Proc. CIMCA-IAWTIC'05, IEEE, Piscataway, NJ, USA, vol. 2, pp. 1115-1121.