A density-based enhancement to dominant sets clustering
- Author(s): Jian Hou 1 ; E Xu 1, 2 ; Wei-Xue Liu 1 ; Qi Xia 3 ; Nai-Ming Qi 3
-
-
View affiliations
-
Affiliations:
1:
School of Information Science and Technology, Bohai University, Jinzhou, Liaoning 121013, People's Republic of China;
2: China Centre for Industrial Security Research, Beijing Jiaotong University, Beijing 100044, People's Republic of China;
3: School of Astronautics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, People's Republic of China
-
Affiliations:
1:
School of Information Science and Technology, Bohai University, Jinzhou, Liaoning 121013, People's Republic of China;
- Source:
Volume 7, Issue 5,
October 2013,
p.
354 – 361
DOI: 10.1049/iet-cvi.2013.0072 , Print ISSN 1751-9632, Online ISSN 1751-9640
Although there is no shortage of clustering algorithms, existing algorithms are often afflicted by problems of one kind or another. Dominant sets clustering is a graph-theoretic approach to clustering and exhibits significant potential in various applications. However, the authors' work indicates that this approach suffers from two major problems, namely over-segmentation tendency and sensitiveness to distance measures. In order to overcome these two problems, the authors present a density-based enhancement to dominant sets clustering where a cluster merging step is used to fuse adjacent clusters close enough from the original dominant sets clustering. Experiments on various datasets validate the effectiveness of the proposed method.
Inspec keywords: set theory; pattern clustering; graph theory; unsupervised learning
Other keywords: density-based enhancement; distance measure sensitiveness; adjacent cluster fusion; unsupervised learning tools; over-segmentation tendency; cluster merging step; dominant sets clustering; graph-theoretic approach
Subjects: Combinatorial mathematics; Data handling techniques; Knowledge engineering techniques
References
-
-
1)
-
5. Wang, W., Yang, J., Sting, M.R.: ‘A statistical information grid approach to spatial data mining’. Proc. VLDB Conf., Athens, Greece, August 1997, pp. 186–195.
-
-
2)
-
21. Yang, X.W., Liu, H.R., Laecki, L.J.: ‘Contour-based object detection as dominant set computation’, Pattern Recognit., 2012, 45, (5), pp. 1927–1936 (doi: 10.1016/j.patcog.2011.11.010).
-
-
3)
-
1. Bradley, P.S., Mangasarian, O.L.: ‘Clustering via concave minimization’. Proc. Neural Information Processing Systems, Denver, USA, 1997, pp. 368–374.
-
-
4)
-
31. Chang, H., Yeung, D.Y.: ‘Robust path-based spectral clustering’, Pattern Recognit., 2008, 41, (1), pp. 191–203 (doi: 10.1016/j.patcog.2007.04.010).
-
-
5)
-
3. Zhang, T., Ramakrishnan, R., Birch, L.M.: ‘An efficient data clustering method for very large databases’. Proc. ACM SIGMOD Int. Conf. Management of Data, Montreal, Canada, June 1996, pp. 103–114.
-
-
6)
-
6. Shi, J., Malik, J.: ‘Normalized cuts and image segmentation’, IEEE Trans. Pattern Anal. Mach. Intell., 2000, 22, (8), pp. 167–172 (doi: 10.1109/ICIP.1998.723676).
-
-
7)
-
7. Dhillon, I.: ‘Co-clustering documents and words using bipartite spectral graph partitioning’. Proc. Int. Conf. Knowledge Discovery and Data Mining, San Francisco, USA, August 2001, pp. 269–274.
-
-
8)
-
33. Fu, L., Medico, E.: ‘FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data’, BMC Bioinf., 2007, 8, (1), pp. 1–15 (doi: 10.1186/1471-2105-8-3).
-
-
9)
-
23. Hamid, R., Maddi, S., Johnson, A.Y., Bobick, A.F., Essa, I.A., Isbell, C.: ‘A novel sequence representation for unsupervised analysis of human activities’, Artif. Intell., 2009, 173, (14), pp. 1221–1244 (doi: 10.1016/j.artint.2009.05.002).
-
-
10)
-
16. Monti, S., Tamayo, P., Mesirov, J., Golub, T.: ‘Consensus clustering: a resampling based method for class discovery and visualization of gene expression microarray data’, Mach. Learn., 2003, 52, (1–2), pp. 91–118 (doi: 10.1023/A:1023949509487).
-
-
11)
-
12. Fraley, C., Raftery, A.E.: ‘How many clusters? which clustering method? answers via model-based cluster analysis’, Comput. J., 1998, 41, (8), pp. 578–588 (doi: 10.1093/comjnl/41.8.578).
-
-
12)
-
19. Pavan, M., Pelillo, M.: ‘Dominant sets and pairwise clustering’, IEEE Trans. Pattern Anal. Mach. Intell., 2007, 29, (1), pp. 167–172 (doi: 10.1109/TPAMI.2007.250608).
-
-
13)
-
11. Smyth, P.: ‘Clustering with Monte Carlo cross-validation’. Proc. Int. Conf. Knowledge Discovery and Data Mining, Portland, USA, August 1996, pp. 126–133.
-
-
14)
-
32. Jain, A., Law, M.: ‘Data clustering: a user's dilemma’, (LNCS3776), 2005, pp. 1–10.
-
-
15)
-
27. Wu, Z., Leahy, R.: ‘An optimal graph theoretic approach to data clustering: theory and its application to image segmentation’, IEEE Trans. Pattern Anal. Mach. Intell., 1993, 15, (11), pp. 1101–1113 (doi: 10.1109/34.244673).
-
-
16)
-
34. Zahn, C.T.: ‘Graph-theoretical methods for detecting and describing gestalt clusters’, IEEE Trans. Comput., 1971, 100, (1), pp. 68–86 (doi: 10.1109/T-C.1971.223083).
-
-
17)
-
9. Ng, A., Jordan, M., Weiss, Y.: ‘On spectral clustering: analysis and an algorithm’. Proc. Neural Information Processing Systems, Vancouver, Canada, December 2002, pp. 849–856.
-
-
18)
-
18. Evanno, G., Regnaut, S., Goudet, J.: ‘Detecting the number of clusters of individuals using the software structure: a simulation study’, Mol. Ecol., 2005, 14, (8), pp. 2611–2620 (doi: 10.1111/j.1365-294X.2005.02553.x).
-
-
19)
-
24. Wang, M., Ye, Z., Wang, Y., Wang, S.: ‘Dominant sets clustering for image retrieval’, Signal Process., 2008, 88, (11), pp. 2843–2849 (doi: 10.1016/j.sigpro.2008.04.007).
-
-
20)
-
15. Tibshirani, R., Walther, G., Hastie, T.: ‘Estimating the number of clusters in a data set via the gap statistic’, J. Royal Stat. Soc. B, Stat. Methodol., 2001, 63, (2), pp. 411–423 (doi: 10.1111/1467-9868.00293).
-
-
21)
-
29. Rota Bulò, S., Pelillo, M., Bomze, I.M.: ‘Graph-based quadratic optimization: a fast evolutionary approach’, Comput. Vis. Image Underst., 2011, 115, (7), pp. 984–995 (doi: 10.1016/j.cviu.2010.12.004).
-
-
22)
-
2. Arthor, D., Vassilvitskii, S.: ‘K-means ++ : the advantages of careful seeding’. Proc. ACM-SIAM Symp. Discrete Algorithms, New Orleans, USA, January 2007, pp. 1027–1035.
-
-
23)
-
13. Hansen, M.H., Yu, B.: ‘Model selection and the principle of minimum descriptor length’, J. Am. Stat. Assoc, 1998, 96, (454), pp. 746–774 (doi: 10.1198/016214501753168398).
-
-
24)
-
14. Ray, S., Turi, R.H.: ‘Determination of number of clusters in k-means clustering and application in color image segmentation’. Proc. Int. Conf. Advances in Pattern Recognition and Digital Techniques, Calcutta, India, December 1999, pp. 137–143.
-
-
25)
-
17. Tibshirani, R., Walther, G., Botstein, D., Brown, P.: ‘Cluster validation by prediction strength’, J. Computat. Graph. Stat., 2005, 14, (3), pp. 511–528 (doi: 10.1198/106186005X59243).
-
-
26)
-
22. Frommlet, F.: ‘Tag snp selection based on clustering according to dominant sets found using replicator dynamics’, Adv. Data Anal. Classif., 2010, 4, (1), pp. 65–83 (doi: 10.1007/s11634-010-0059-2).
-
-
27)
-
25. Hou, J., Pelillo, M.: ‘A simple feature combination method based on dominant sets’, Pattern Recognit., 2013, 46, (11), pp. 3129–3139 (doi: 10.1016/j.patcog.2013.04.005).
-
-
28)
-
4. Ester, M., Kriegel, H.P., Sander, J., Xu, X.W.: ‘A density-based algorithm for discovering clusters in large spatial databases with noise’. Proc. Int. Conf. Knowledge Discovery and Data Mining, Portland, USA, August 1996, pp. 226–231.
-
-
29)
-
10. Belkin, M., Niyogi, P.: ‘Laplacian eigenmaps for dimensionality reduction and data representation’, Neural Comput., 2003, 15, (6), pp. 1373–1396 (doi: 10.1162/089976603321780317).
-
-
30)
-
20. Torsello, A., Bulo, S.R., Pelillo, M.: ‘Beyond partitions: allowing overlapping groups in pairwise clustering’. Proc. Int. Conf. Pattern Recognit., Tempa, USA, December 2008, pp. 1–4.
-
-
31)
-
8. Ding, C., He, X., Zha, H., Gu, M., Simon, H.: ‘A min-max cut algorithm for graph partitioning and data clustering’. Proc. Int. Conf. Data Mining, San Jose, USA, November 2011, pp. 107–114.
-
-
32)
-
30. Gionis, A., Mannila, H., Tsaparas, P.: ‘Clustering aggregation’, ACM Trans. Knowl. Discov. Data, 2007, 1, (1), pp. 1–30 (doi: 10.1145/1217299.1217303).
-
-
33)
-
26. Brendan, J.F., Delbert, D.: ‘Clustering by passing messages between data points’, Science, 2007, 315, pp. 972–976 (doi: 10.1126/science.1136800).
-
-
34)
-
28. Pavan, M., Pelillo, M.: ‘A graph-theoretic approach to clustering and segmentation’. Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, Madison, USA, June 2003, pp. 145–152.
-
-
1)