Two-phase clustering algorithm with density exploring distance measure
- Author(s): Jingjing Ma 1 ; Xiangming Jiang 1 ; Maoguo Gong 1
-
-
View affiliations
-
Affiliations:
1:
Key Lab of Intelligent Perception and Image Understanding of Ministry of Education of China, Xidian University , PO Box 224, Xi'an 710071 , People's Republic of China
-
Affiliations:
1:
Key Lab of Intelligent Perception and Image Understanding of Ministry of Education of China, Xidian University , PO Box 224, Xi'an 710071 , People's Republic of China
- Source:
Volume 3, Issue 1,
March
2018,
p.
59 – 64
DOI: 10.1049/trit.2018.0006 , Online ISSN 2468-2322
Here, the authors propose a novel two-phase clustering algorithm with a density exploring distance (DED) measure. In the first phase, the fast global K-means clustering algorithm is used to obtain the cluster number and the prototypes. Then, the prototypes of all these clusters and representatives of points belonging to these clusters are regarded as the input data set of the second phase. Afterwards, all the prototypes are clustered according to a DED measure which makes data points locating in the same structure to possess high similarity with each other. In experimental studies, the authors test the proposed algorithm on seven artificial as well as seven UCI data sets. The results demonstrate that the proposed algorithm is flexible to different data distributions and has a stronger ability in clustering data sets with complex non-convex distribution when compared with the comparison algorithms.
Inspec keywords: pattern clustering; sorting; statistical distributions
Other keywords: non-convex distribution; DED measure; data points; density exploring distance measure; cluster number; comparison algorithms; UCI data sets; two-phase clustering algorithm; fast global K-means clustering algorithm; data distributions
Subjects: Other topics in statistics; Combinatorial mathematics; Data handling techniques
References
-
-
1)
-
[7]. Huang, J., Liu, J.: ‘A similarity-based modularization quality measure for software module clustering problems’, Inf. Sci., 2016, 342, pp. 96–110 (doi: 10.1016/j.ins.2016.01.030).
-
-
2)
-
[20]. Wang, L., Bo, L., Jiao, L.: ‘A modified k-means clustering with a density-sensitive distance metric’. Proc. of the Int. Conf. on Rough Sets and Knowledge Technology, Chongquing, China, 2006, pp. 544–551.
-
-
3)
-
[9]. Meng, F., Li, H., Wu, Q., et al: ‘Globally measuring the similarity of superpixels by binary edge maps for superpixel clustering’, IEEE Trans. Circuits Syst. Video Technol., 2016, doi: 10.1109/TCSVT.2016.2632148.
-
-
4)
-
[14]. Cao, F., Liang, J., Li, D., et al: ‘A dissimilarity measure for the k-modes clustering algorithm’, Knowl.-Based Syst., 2012, 26, pp. 120–127 (doi: 10.1016/j.knosys.2011.07.011).
-
-
5)
-
[30]. Bousquet, O., Chapelle, O., Hein, M.: ‘Measure based regularization’. Advances in Neural Information Processing Systems, Vancouver, Canada, 2004, pp. 1221–1228.
-
-
6)
-
[2]. Hartigan, J.A., Wong, M.A.: ‘Algorithm as 136: a k-means clustering algorithm’, J. R. Stat. Soc. C, Appl. Stat., 1979, 28, (1), pp. 100–108.
-
-
7)
-
[6]. Wu, J., Wu, Z., Cao, J., et al: ‘Fuzzy consensus clustering with applications on big data’, IEEE Trans. Fuzzy Syst., 2017, 25, (6), pp. 1430–1445 (doi: 10.1109/TFUZZ.2017.2742463).
-
-
8)
-
[8]. Xia, G., Sun, H., Feng, L., et al: ‘Human motion segmentation via robust kernel sparse subspace clustering’, IEEE Trans. Image Process., 2018, 27, (1), pp. 135–150 (doi: 10.1109/TIP.2017.2738562).
-
-
9)
-
[26]. Machné, R., Murray, D.B., Stadler, P.F.: ‘Similarity-based segmentation of multi-dimensional signals’, Sci. Rep., 2017, 7, (1), p. 12355 (doi: 10.1038/s41598-017-12401-8).
-
-
10)
-
[28]. Zong, Y., Xu, G., Zhang, Y., et al: ‘A robust iterative refinement clustering algorithm with smoothing search space’, Knowl.-Based Syst., 2010, 23, (5), pp. 389–396 (doi: 10.1016/j.knosys.2010.01.012).
-
-
11)
-
[18]. Su, M.-C., Chou, C.-H.: ‘A modified version of the k-means algorithm with a distance based on cluster symmetry’, IEEE Trans. Pattern Anal. Mach. Intell., 2001, 23, (6), pp. 674–680 (doi: 10.1109/34.927466).
-
-
12)
-
[31]. Blum, A., Chawla, S.: ‘Learning from labeled and unlabeled data using graph mincuts’. Proc. of the 18th Int. Conf. on Machine Learning (ICML), Williamstown, MA, USA, 2001, pp. 19–26.
-
-
13)
-
[22]. Likas, A., Vlassis, N., Verbeek, J.J.: ‘The global k-means clustering algorithm’, Pattern Recognit., 2003, 36, (2), pp. 451–461 (doi: 10.1016/S0031-3203(02)00060-2).
-
-
14)
-
[3]. Wang, W., Yang, J., Muntz, R., et al: ‘STING: a statistical information grid approach to spatial data mining’. Proc. of the 23rd Int. Conf. on Very Large Data Bases, Athens, Greece, 1997, pp. 186–195.
-
-
15)
-
[21]. Gong, M., Jiao, L., Wang, L., et al: ‘Density-sensitive evolutionary clustering’. Proc. of the 11th Pacific-Asia Conf. on Knowledge Discovery and Data Mining, Nanjing, China, 2007, pp. 507–514.
-
-
16)
-
[23]. Maulik, U., Bandyopadhyay, S.: ‘Genetic algorithm-based clustering technique’, Pattern Recognit., 2000, 33, (9), pp. 1455–1465 (doi: 10.1016/S0031-3203(99)00137-5).
-
-
17)
-
[17]. Lu, Y., Hou, X., Chen, X.: ‘A novel travel-time based similarity measure for hierarchical clustering’, Neurocomputing, 2016, 173, pp. 3–8 (doi: 10.1016/j.neucom.2015.01.090).
-
-
18)
-
[10]. Taşdemir, K., Yalçin, B., Yildirim, I.: ‘Approximate spectral clustering with utilized similarity information using geodesic based hybrid distance measures’, Pattern Recognit., 2015, 48, (4), pp. 1465–1477 (doi: 10.1016/j.patcog.2014.10.023).
-
-
19)
-
[13]. Son, L.H.: ‘Generalized picture distance measure and applications to picture fuzzy clustering’, Appl. Soft Comput., 2016, 46, (C), pp. 284–295 (doi: 10.1016/j.asoc.2016.05.009).
-
-
20)
-
[4]. Agrawal, R., Gehrke, J., Gunopulos, D., et al: ‘Automatic subspace clustering of high dimensional data for data mining applications’. Proc. of ACM-SIGMOND Int. Conf. Management on Data, Seattle, Washington, USA, 1998, pp. 94–105.
-
-
21)
-
[29]. Zhu, S., Wang, D., Li, T.: ‘Data clustering with size constraints’, Knowl.-Based Syst., 2010, 23, (8), pp. 883–889 (doi: 10.1016/j.knosys.2010.06.003).
-
-
22)
-
[32]. Blake, C.L., Merz, C.J.: ‘UCI repository of machine learning databases’. Technical Report, Department of Information and Computer Science, University of California, Irvine, CA, 1998.
-
-
23)
-
[5]. Guha, S., Rastogi, R., Shim, K.: ‘CURE: an efficient clustering algorithm for large databases’. Proc. of ACM-SIGMOND Int. Conf. Management on Data, Seattle, Washington, USA, 1998, pp. 73–84.
-
-
24)
-
[24]. Kang, Q., Liu, S., Zhou, M., et al: ‘A weight-incorporated similarity-based clustering ensemble method based on swarm intelligence’, Knowl.-Based Syst., 2016, 104, pp. 156–164 (doi: 10.1016/j.knosys.2016.04.021).
-
-
25)
-
[1]. Jain, A., Murty, M., Flynn, P.: ‘Data clustering: a review’, ACM Comput. Surv., 1999, 31, (3), pp. 264–323 (doi: 10.1145/331499.331504).
-
-
26)
-
[19]. Charalampidis, D.: ‘A modified k-means algorithm for circular invariant clustering’, IEEE Trans. Pattern Anal. Mach. Intell., 2005, 27, (12), pp. 1856–1865 (doi: 10.1109/TPAMI.2005.230).
-
-
27)
-
[25]. Kang, Z., Peng, C., Cheng, Q.: ‘Twin learning for similarity and clustering: a unified kernel approach’. AAAI, San Francisco, California, USA, 2017, pp. 2080–2086.
-
-
28)
-
[16]. Ferrari, D.G., De Castro, L.N.: ‘Clustering algorithm selection by meta-learning systems: a new distance-based problem characterization and ranking combination methods’, Inf. Sci., 2015, 301, pp. 181–194 (doi: 10.1016/j.ins.2014.12.044).
-
-
29)
-
[15]. Yang, P., Zhu, Q., Huang, B.: ‘Spectral clustering with density sensitive similarity function’, Knowl.-Based Syst., 2011, 24, (5), pp. 621–628 (doi: 10.1016/j.knosys.2011.01.009).
-
-
30)
-
[27]. Zhou, D., Bousquet, O., Lal, T.N., et al: ‘Learning with local and global consistency’. Advances in Neural Information Processing Systems, Vancouver, Canada, 2004, pp. 321–328.
-
-
31)
-
[11]. Mori, U., Mendiburu, A., Lozano, J.A.: ‘Similarity measure selection for clustering time series databases’, IEEE Trans. Knowl. Data Eng., 2016, 28, (1), pp. 181–195 (doi: 10.1109/TKDE.2015.2462369).
-
-
32)
-
[12]. Ye, J.: ‘Single-valued neutrosophic clustering algorithms based on similarity measures’, J. Classif., 2017, 34, (1), pp. 148–162 (doi: 10.1007/s00357-017-9225-y).
-
-
1)