High-accuracy document classification with a new algorithm
- Author(s): T. Temel 1
Affiliations:
1:
Department of Mechatronics Engineering, Faculty of Natural Sciences, Architecture and Engineering, Bursa Technical University, 16320 Bursa, Turkey
- Source: Volume 54, Issue 17, 23 August 2018, pp. 1028–1030
DOI: 10.1049/el.2018.0790, Print ISSN 0013-5194, Online ISSN 1350-911X
A new algorithm based on the learning vector quantisation (LVQ) classifier is presented. It uses a modified proximity measure that enforces a predetermined correct-classification level during training, together with a sliding-mode approach that keeps the variation in weight updates stable as training converges. The proposed algorithm and several well-known counterparts are implemented with Python libraries and compared on a text classification task for document categorisation. The results show that the new classifier is competitive with these algorithms in both training and testing performance.
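The abstract only outlines the method. For orientation, here is a minimal sketch of the standard LVQ1 training loop (Kohonen) that LVQ classifiers build on. Training halts once a target training accuracy is reached, as a simplified and hypothetical reading of "a predetermined correct classification level". It does not reproduce the paper's modified proximity measure or its sliding-mode weight update. Plain squared Euclidean distance, the linearly decaying learning rate, and all function names (`train_lvq1`, `nearest_prototype`, `predict`, `accuracy`) are illustrative assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance (assumed here; the paper uses a modified measure)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_prototype(x, prototypes):
    """Index of the prototype closest to x (ties go to the lower index)."""
    return min(range(len(prototypes)),
               key=lambda i: squared_distance(x, prototypes[i][0]))


def predict(x, prototypes):
    """Label of the winning (nearest) prototype."""
    return prototypes[nearest_prototype(x, prototypes)][1]


def accuracy(data, prototypes):
    """Fraction of (vector, label) pairs classified correctly."""
    correct = sum(1 for x, y in data if predict(x, prototypes) == y)
    return correct / len(data)


def train_lvq1(data, prototypes, lr=0.1, target_accuracy=0.95,
               max_epochs=100, seed=0):
    """Plain LVQ1 training with a hypothetical accuracy-based stopping rule.

    data: list of (feature_vector, label); prototypes: list of (vector, label).
    Returns trained copies of the prototypes; the inputs are not modified.
    """
    rng = random.Random(seed)
    protos = [(list(w), label) for w, label in prototypes]
    samples = list(data)
    for epoch in range(max_epochs):
        # Stop once the required training accuracy is reached (assumed criterion).
        if accuracy(samples, protos) >= target_accuracy:
            break
        rng.shuffle(samples)
        alpha = lr * (1 - epoch / max_epochs)  # linearly decaying step size
        for x, y in samples:
            w, label = protos[nearest_prototype(x, protos)]
            # Attract the winner if its label is correct, repel it otherwise.
            direction = 1.0 if label == y else -1.0
            for j in range(len(w)):
                w[j] += direction * alpha * (x[j] - w[j])
    return protos


if __name__ == "__main__":
    # Toy 1-D data: class "a" near 0, class "b" near 1.
    data = [([0.0], "a"), ([0.2], "a"), ([0.8], "b"), ([1.0], "b")]
    init = [([0.7], "a"), ([1.0], "b")]  # "a" prototype deliberately misplaced
    trained = train_lvq1(data, init, lr=0.5, target_accuracy=1.0, max_epochs=20)
    print(accuracy(data, trained))
```

In the toy run, the misplaced "a" prototype initially captures the point at 0.8 (3 of 4 points correct). One epoch of attract/repel updates pulls it towards 0, after which all training points are classified correctly and training stops. In a document-categorisation setting, the feature vectors would typically be TF-IDF or similar term-weight vectors rather than scalars.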
Inspec keywords: learning (artificial intelligence); pattern classification; text analysis
Other keywords: modified proximity-measure; sliding-mode approach; document categorisation; stable variation; Python libraries; high-accuracy document classification; text classification; learning vector quantisation classifier; weight updates; predetermined correct classification level
Subjects: Document processing and analysis techniques; Knowledge engineering techniques