
Development of content-based SMS classification application by using Word2Vec-based feature extraction

Mobile instant messaging applications such as WhatsApp, Messenger, and Viber offer phone users benefits such as low cost, ease of use, and stable, collective, and direct communication. Even so, SMS (short message service) is still considered a more reliable, privacy-preserving technology for mobile communication. For this reason, institutions that want to reach customers for advertising, information, or promotion tend to use SMS. However, spam messages sent from unknown sources are a serious problem for SMS recipients. This study proposes a content-based classification model that uses machine learning to filter out unwanted messages. The model used in the classification is built from the selected dataset with the Word2Vec word embedding tool. From this model, two new features are derived that measure the distance of a message to spam words and to ham words. The performance of several classification algorithms is compared with these two features taken into account. The random forest method achieved a classification accuracy of 99.64%. This is a higher correct classification rate than that reported by other studies using the same dataset.
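The abstract does not say exactly how the two distance features are computed, so the following is only a minimal sketch under stated assumptions. It takes the distance of a message to "spam words" and "ham words" to be the cosine distance between the mean vector of the message's tokens and the mean vector of each word list. The tiny hand-written vectors, the two word lists, and all function names are hypothetical stand-ins for embeddings that a trained Word2Vec model would supply.

```python
import math

# Hypothetical 3-dimensional vectors standing in for trained Word2Vec embeddings.
TOY_VECTORS = {
    "free":  [0.90, 0.10, 0.00],
    "win":   [0.80, 0.20, 0.10],
    "prize": [0.85, 0.15, 0.05],
    "lunch": [0.10, 0.90, 0.20],
    "home":  [0.00, 0.80, 0.30],
    "later": [0.10, 0.70, 0.40],
}

# Assumed word lists; the study derives its spam and ham words from the dataset.
SPAM_WORDS = ["free", "win", "prize"]
HAM_WORDS = ["lunch", "home", "later"]


def centroid(words, vectors):
    """Mean vector of the words that have an embedding, or None if none do."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def distance_features(message, vectors=TOY_VECTORS):
    """Return (distance_to_spam_words, distance_to_ham_words) for one message."""
    tokens = message.lower().split()
    message_vector = centroid(tokens, vectors)
    if message_vector is None:
        # No token has an embedding: fall back to a neutral distance for both.
        return (1.0, 1.0)
    spam_centroid = centroid(SPAM_WORDS, vectors)
    ham_centroid = centroid(HAM_WORDS, vectors)
    return (
        cosine_distance(message_vector, spam_centroid),
        cosine_distance(message_vector, ham_centroid),
    )


spam_like = distance_features("win a free prize")
ham_like = distance_features("home for lunch later")
```

In a full pipeline, these two values would be appended to each message's feature vector before a classifier such as random forest is trained; that training step is not shown here.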

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-sen.2018.5046