access icon free Automatic clustering method based on evolutionary optimisation

How to set the cluster number plays a key role in many clustering applications. To address this issue, this study introduces an automatic clustering method based on evolutionary algorithms (EAs). The basic idea is to convert a clustering problem into a global optimisation problem and tackle it by an EA. A new validity index, which balances the inter-cluster consistency and the intra-cluster consistency, is proposed to be the objective function. Three adaptive coding schemes, which can deal with variable-length optimisation problems by using a fixed-length chromosome, are designed to detect the cluster number automatically. The validity index and adaptive coding schemes are incorporated in an EA for automatic clustering. The authors approach is compared with some widely used validity indices and an adaptive coding scheme on some artificial data sets and two real-world problems. The experimental results suggest that their method not only successfully detects the correct cluster numbers but also achieve stable results for most of test problems.

Inspec keywords: pattern clustering; evolutionary computation; optimisation

Other keywords: adaptive coding schemes; objective function; intracluster consistency; validity index; evolutionary optimisation; EA; variable-length optimisation problems; intercluster consistency; global optimisation problem; automatic clustering method; evolutionary algorithms; fixed-length chromosome

Subjects: Optimisation techniques; Data handling techniques

References

    1. 1)
      • 53. Mahalanobis, P.C.: ‘On the generalised distance in statistics’. Proc. Nat. Inst. Sciences of India, 1936, vol. 2, no 1, pp. 4955.
    2. 2)
      • 55. Gong, M., Jiao, L.: ‘Image texture classification using a manifold distance-based evolutionary clustering method’, Opt. Eng., 2008, 47, (7), pp. 077201-1077201-10 (doi: 10.1117/1.2955785).
    3. 3)
      • 46. Sanghamitra, B., Ujjwal, M.: ‘Genetic clustering for automatic evolution of clusters and application to image classification’, Pattern Recognit., 2002, 35, (6), pp. 11971208 (doi: 10.1016/S0031-3203(01)00108-X).
    4. 4)
      • 44. Bezdek, J.C., Pal, N.R.: ‘Some new indexes of cluster validity’, IEEE Trans. Syst. Man Cybern., B, 1998, 28, (3), pp. 301315 (doi: 10.1109/3477.678624).
    5. 5)
      • 37. Maulik, U., Bandyopadhyay, S.: ‘Genetic algorithm-based clustering technique’, Pattern Recognit., 2000, 33, (9), pp. 14551465 (doi: 10.1016/S0031-3203(99)00137-5).
    6. 6)
      • 54. Fischer, B.: ‘Path-based clustering for grouping of smooth curves and texture segmentation’, IEEE Trans. Pattern Anal. Mach. Intell., 2003, 25, (4), pp. 513518 (doi: 10.1109/TPAMI.2003.1190577).
    7. 7)
      • 17. Krista, R.Z., Borut, Z.: ‘Validity index for clusters of different sizes and densities’, Pattern Recognit. Lett., 2011, 32, (2), pp. 221234 (doi: 10.1016/j.patrec.2010.08.007).
    8. 8)
      • 34. Krishna, K., Murty, N.: ‘Genetic k-means algorithm’, IEEE Trans. Syst. Man Cybern., B, 1999, 29, (3), pp. 433439 (doi: 10.1109/3477.764879).
    9. 9)
      • 33. Murthy, C.A., Chowdhury, N.: ‘In search of optimal clusters using genetic alorithms’, Pattern Recognit. Lett., 1996, 17, (8), pp. 825832 (doi: 10.1016/0167-8655(96)00043-8).
    10. 10)
      • 39. Bandyopadhyay, S.: ‘Multiobjective simulated annealing for fuzzy clustering with stability and validity’, IEEE Trans. Syst. Man Cybern., C, 2011, 41, (5), pp. 682691 (doi: 10.1109/TSMCC.2010.2088390).
    11. 11)
      • 45. Pal, N.R., Bezdek, J.C.: ‘On cluster validity for the fuzzy c-means model’, IEEE Trans. Fuzzy Syst., 1995, 3, (3), pp. 370379 (doi: 10.1109/91.413225).
    12. 12)
      • 31. Krovi, R.: ‘Genetic algorithm for clustering: a preliminary investigation’. Proc. 25th Hawaii Int. Conf. on System Sciences, 1992, pp. 540544.
    13. 13)
      • 6. Hruschka, E.R., Campello, R.J.G.B., Freitas, A.A., Carvalho, A.C.P.L.F.: ‘A survey of evolutionary algorithms for clustering’, IEEE Trans. Syst. Man Cybern. C, 2009, 39, (2), pp. 133155 (doi: 10.1109/TSMCC.2008.2007252).
    14. 14)
      • 41. Paoli, A., Melgani, F., Pasolli, E.: ‘Clustering of hyperspectral images based on multiobjective particle swarm optimization’, IEEE Trans. Geosci. Remote, 2009, 47, (12), pp. 41754188 (doi: 10.1109/TGRS.2009.2023666).
    15. 15)
      • 29. Kuncheva, L.I., Bezdek, J.C.: ‘Selection of cluster prototypes from data by a genetic algorithm’. Proc. 5th European Congress on Intelligent Techniques and Soft Computing, 1997, pp. 16831688.
    16. 16)
      • 10. Tsekouras, G.E., Sarimveis, H.: ‘A new approach for measuring the validity of fuzzy c-means algorithm’, Adv. Eng. Softw., 2004, 35, pp. 567575 (doi: 10.1016/j.advengsoft.2004.05.001).
    17. 17)
      • 51. Rand, W.M.: ‘Objective criteria for the evaluation of clustering methods’, J. Am. Stat. Assoc., 1971, 66, (336), pp. 846850 (doi: 10.1080/01621459.1971.10482356).
    18. 18)
      • 24. Rhee, H.S., Oh, K.W.: ‘A validity measure for fuzzy clustering and its use in selecting optimal number of clusters’. Proc. Fifth IEEE Int. Conf. on Fuzzy Systems, 1996, vol. 2, pp. 10201025.
    19. 19)
      • 25. Sheng, W., Swift, S., Zhang, L., Liu, X.: ‘A weighted sum validity function for clustering with a hybrid niching genetic algorithm’, IEEE Trans. Syst. Man Cybern., B, 2005, 35, (6), pp. 11561167 (doi: 10.1109/TSMCB.2005.850173).
    20. 20)
      • 36. Scheunders, P.: ‘A genetic c-means clustering algorithm applied to color image quantization’, Pattern Recognit., 1997, 30, (6), pp. 859866 (doi: 10.1016/S0031-3203(96)00131-8).
    21. 21)
      • 47. Ujjwal, M., Sanghamitra, B.: ‘Fuzzy partitioning using a real-coded variable-length genetic algorithm for pixel classification’, IEEE Trans. Geosci. Remote, 2003, 41, (5), pp. 10751081 (doi: 10.1109/TGRS.2003.810924).
    22. 22)
      • 49. Blake, C.L., Merz, C.J.: ‘UCI repository of machine learning databases’, [http://www.ics.uci.edu/~mlearn/MLRepository.html], Department of Information and Computer Science, University of California, Irvine, CA, 1998.
    23. 23)
      • 27. Wu, K.L., Yang, M.S.: ‘Alternative c-means clustering algorithms’, Pattern Recognit., 2002, 35, pp. 22672278 (doi: 10.1016/S0031-3203(01)00197-2).
    24. 24)
      • 7. Lloyd, S.P.: ‘Least squares quantization in PCM’, IEEE Trans. Inf. Theory, 1982, 28, (2), pp. 129137 (doi: 10.1109/TIT.1982.1056489).
    25. 25)
      • 5. Falkenauer, E.: ‘Genetic algorithms and grouping problems’ (John Wiley Sons, 1998).
    26. 26)
      • 19. Shen, J., Chang, S.: ‘Determination of cluster number in clustering microarry data’, Appl. Math. Comput., 2005, 169, (2), pp. 11721185 (doi: 10.1016/j.amc.2004.10.076).
    27. 27)
      • 50. Halgamuge, S.K., Glesner, M.: ‘Neural networks in designing fuzzy systems for real world applications’, Fuzzy Set Syst., 1994, 65, (1), pp. 112 (doi: 10.1016/0165-0114(94)90242-9).
    28. 28)
      • 30. Melanie, M.: ‘An introduction to genetic algorithms’ (MIT Press, 1998).
    29. 29)
      • 4. Xu, R., Wunsch, D.: ‘Survey of clustering algorithms’, IEEE Trans. Neural Netw., 2005, 16, (3), pp. 645678 (doi: 10.1109/TNN.2005.845141).
    30. 30)
      • 2. Ducournau, A., Bretto, A., Rital, S., Laget, B.: ‘A reductive approach to hypergraph clustering: an application to image segmentation’, Pattern Recognit., 2012, 45, (7), pp. 27882803 (doi: 10.1016/j.patcog.2012.01.005).
    31. 31)
      • 12. Chou, C.H., Su, M.C., Lai, E.: ‘A new cluster validity measure and its application to image compression’, Pattern Anal. Appl., 2004, 7, (2), pp. 205220 (doi: 10.1007/s10044-004-0218-1).
    32. 32)
      • 52. Tsai, D.M., Lin, C.C.: ‘Fuzzy c-means based clustering for linearly and nonlinearly separable data’, Pattern Recognit., 2011, 44, (8), pp. 17501760 (doi: 10.1016/j.patcog.2011.02.009).
    33. 33)
      • 8. Bezdek, J.C.: ‘Pattern recognition in handbook of fuzzy computation’ (IOP Publishing Ltd, 1998).
    34. 34)
      • 48. Anderson, E.: ‘The irises of the gaspe peninsula’, Bull. Am. Iris Soc., 1935, 59, pp. 25.
    35. 35)
      • 42. Das, S., Abraham, A., Konar, A.: ‘Automatic kernel clustering with a multi-elitist particle swarm optimization algorithm’, Pattern Recognit. Lett., 2008, 29, (5), pp. 688699 (doi: 10.1016/j.patrec.2007.12.002).
    36. 36)
      • 3. MacQueen, J.B.: ‘Some methods for classification and analysis of multivariate observations’. Proc. Fifth Berkeley Symp. on Mathematical Statistics and Probability, 1967, pp. 281297.
    37. 37)
      • 26. Liang, Z., Li, Y.: ‘Multiple kernels for generalised discriminant analysis’, IET Comput. Vis., 2010, 4, (2), pp. 117128 (doi: 10.1049/iet-cvi.2008.0039).
    38. 38)
      • 21. Calinskia, T., Harabasza, J.: ‘A dendrite method for cluster analysis’, Commun. Stat., 1974, 3, pp. 127.
    39. 39)
      • 11. Davies, D.L., Bouldin, D.W.: ‘A cluster separation measure’, IEEE Trans. Pattern Anal. Mach. Intell., 1974, PAMI-1, pp. 224227 (doi: 10.1109/TPAMI.1979.4766909).
    40. 40)
      • 1. Theodoridis, S., Koutroumbas, K.: ‘Pattern recognition’ (Academic Press, 2008, 4th edn.).
    41. 41)
      • 20. Dunn, J.C.: ‘Well-separated clusters and the optimal fuzzy partitions’, J. Cybern., 1974, 4, (1), pp. 95104 (doi: 10.1080/01969727408546059).
    42. 42)
      • 15. Sun, H., Wang, S., Jiang, Q.: ‘FCM-based model selection algorithms for determining the number of cluster’, Pattern Recognit., 2004, 37, (10), pp. 20272037 (doi: 10.1016/j.patcog.2004.03.012).
    43. 43)
      • 35. Baeck, T., Fogel, D.B.: ‘Handbook of evolutionary computation’ (IOP Publishing Ltd and Oxford University Press, 1997).
    44. 44)
      • 40. Saha, S., Bandyopadhyay, S.: ‘A symmetry based multiobjective clustering technique for automatic evolution of clusters’, Pattern Recognit., 2010, 43, (3), pp. 738751 (doi: 10.1016/j.patcog.2009.07.004).
    45. 45)
      • 32. Lu, Y., Lu, S., Fotouhi, F., Deng, Y., Brown, S.J.: ‘Incremental genetic k-means algorithm and its application in gene expression data analysis’, BMC Bioinf., 2004, 5, (1), pp. 110 (doi: 10.1186/1471-2105-5-1).
    46. 46)
      • 13. Xie, X., Beni, G.: ‘A validity measure for fuzzy clustering’, IEEE Trans. Pattern Anal. Mach. Intell., 1991, 13, (8), pp. 841847 (doi: 10.1109/34.85677).
    47. 47)
      • 16. Malay, K.B., Sanghamitra, B., Ujjwal, M.: ‘Validity index for crisp and fuzzy clusters’, Pattern Recognit., 2004, 37, (3), pp. 487501 (doi: 10.1016/j.patcog.2003.06.005).
    48. 48)
      • 43. Wang, Y., Cai, Z.X., Zhang, Q.F.: ‘Differential evolution with composite trial vector generation strategies and control parameters’, IEEE Trans. Evol. Comput., 2011, 15, (1), pp. 5566 (doi: 10.1109/TEVC.2010.2087271).
    49. 49)
      • 14. Kwon, S.: ‘Cluster validity index for fuzzy clustering’, Electron. Lett., 1998, 34, (22), pp. 21762177 (doi: 10.1049/el:19981523).
    50. 50)
      • 28. Holland, J.: ‘Adaptation in natural and artificial systems’ (The MIT Press, 1992).
    51. 51)
      • 23. Rezaee, M.R., Reiber, J.H.C.: ‘A new cluster validity index for the fuzzy c-means’, Pattern Recognit. Lett., 1998, 19, (3–4), pp. 237246 (doi: 10.1016/S0167-8655(97)00168-2).
    52. 52)
      • 9. Fukuyama, Y., Sugeno, M.: ‘A new method of choosing the number of clusters for fuzzy c-means method’. Proc. Fifth Fuzzy System Symp., 1989, pp. 247250.
    53. 53)
      • 38. Merz, P., Zell, A.: ‘Clustering gene expression profiles with memetic algorithms’, Lect. Notes Comput. Sc., 2002, 2439/2002, pp. 811820 (doi: 10.1007/3-540-45712-7_78).
    54. 54)
      • 18. Krista, R.Z.: ‘Cluster validity index for estimation of fuzzy clusters of different sizes and densities’, Pattern Recognit., 2010, 43, (10), pp. 33743390 (doi: 10.1016/j.patcog.2010.04.025).
    55. 55)
      • 22. Rousseeuw, P.J.: ‘Silhouettes: a graphicalaid to the interpretation and validation of cluster analysis’, J. Comput. Appl. Math., 1987, 20, (1), pp. 5365 (doi: 10.1016/0377-0427(87)90125-7).
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2012.0187
Loading

Related content

content/journals/10.1049/iet-cvi.2012.0187
pub_keyword,iet_inspecKeyword,pub_concept
6
6
Loading