MapReduce framework based big data clustering using fractional integrated sparse fuzzy C means algorithm

Big data analytics has gained significant interest over traditional data-processing methodologies because it extracts hidden patterns and correlations from massive data, termed big data. Clustering plays a significant role in relieving the computational complexity of this task. Using clustering algorithms, the big data arriving from distributed sources is processed with the MapReduce framework (MRF). The MRF comprises two functions, a map function and a reduce function: the map function is based on the proposed Fractional Sparse Fuzzy C-Means (FrSparse FCM) algorithm, and the reduce function is based on the particle swarm optimisation-based whale optimisation algorithm (P-Whale). The optimal centroids are first computed in the mapper phase using the proposed algorithm and are then optimally tuned in the reducer phase, so that the proposed FrSparse FCM-based MRF processes the big data in parallel. Experiments are performed on the Skin data set and the localisation data set taken from the UCI machine learning repository, and the analysis uses the accuracy and DB Index metrics. The analysis shows that the proposed method achieved a maximum accuracy of 90.6012% and a minimum DB Index of 5.33.
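The mapper/reducer split described above can be illustrated with a minimal sketch. It is not the proposed method: the mapper below runs standard fuzzy C-means rather than FrSparse FCM, and the reducer merges local centroids by a membership-weighted average rather than tuning them with P-Whale. All function names, the farthest-first initialisation, and the synthetic two-blob data are assumptions made for illustration only.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centroids(points, c):
    """Farthest-first initialisation (an illustrative, deterministic choice)."""
    chosen = [list(points[0])]
    while len(chosen) < c:
        nxt = max(points, key=lambda p: min(dist2(p, q) for q in chosen))
        chosen.append(list(nxt))
    return chosen

def fcm(points, c, m=2.0, iters=20):
    """Standard fuzzy C-means, NOT the paper's FrSparse FCM.
    Returns (centroids, weights); weights[j] is the sum of u_ij**m for cluster j."""
    centroids = init_centroids(points, c)
    dim = len(points[0])
    weights = [0.0] * c
    for _ in range(iters):
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)), normalised over j.
        u = []
        for p in points:
            inv = [max(dist2(p, ctr), 1e-12) ** (-1.0 / (m - 1.0)) for ctr in centroids]
            s = sum(inv)
            u.append([v / s for v in inv])
        # Centroid update: mean of the points weighted by u_ij**m.
        for j in range(c):
            w = [row[j] ** m for row in u]
            weights[j] = sum(w)
            centroids[j] = [sum(wi * p[k] for wi, p in zip(w, points)) / weights[j]
                            for k in range(dim)]
    return centroids, weights

def mapper(partition, c):
    """Map phase: cluster one data partition independently of the others."""
    return fcm(partition, c)

def reducer(mapped, c):
    """Reduce phase (stand-in for P-Whale tuning): align every local centroid
    to the nearest centroid of the first mapper, then take a weighted average."""
    seeds = mapped[0][0]
    dim = len(seeds[0])
    sums = [[0.0] * dim for _ in range(c)]
    totals = [0.0] * c
    for centroids, weights in mapped:
        for ctr, w in zip(centroids, weights):
            j = min(range(c), key=lambda i: dist2(ctr, seeds[i]))
            totals[j] += w
            for k in range(dim):
                sums[j][k] += w * ctr[k]
    return [[s / totals[j] for s in sums[j]] if totals[j] > 0 else list(seeds[j])
            for j in range(c)]

# Synthetic data (hypothetical): two Gaussian blobs, split into two partitions.
rng = random.Random(42)
blob_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
blob_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(100)]
points = blob_a + blob_b
partitions = [points[0::2], points[1::2]]

mapped = [mapper(part, 2) for part in partitions]   # could run in parallel
global_centroids = reducer(mapped, 2)
print(global_centroids)
```

Each mapper call touches only its own partition, so the calls are independent and could be distributed across workers; only the small list of local centroids and weights reaches the reducer.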

Inspec keywords: Big Data; data mining; learning (artificial intelligence); data analysis; pattern classification; parallel processing; pattern clustering; fuzzy set theory; particle swarm optimisation

Other keywords: fractional sparse fuzzy C-means algorithm; massive data; particle swarm optimisation-based whale optimisation algorithm; map function; Skin data; MapReduce framework; big data analytics; FrSparse FCM-based MRF; localisation data; big data clustering; clustering algorithms

Subjects: Optimisation techniques; Parallel software; Combinatorial mathematics; Knowledge engineering techniques; Data handling techniques

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-ipr.2019.0899