Feature extraction based on information gain and sequential pattern for English question classification

The purpose of question classification (QC) is to assign a question to an appropriate category from a set of predefined categories that constitute a question taxonomy. Well-selected question features can significantly improve QC performance. However, feature extraction, particularly syntactic feature extraction, has a high computational cost. To maintain or enhance performance without syntactic features, this study presents a hybrid approach that combines semantic and lexical feature extraction. These features are generated by improved information gain and sequential pattern mining methods, respectively. The selected features are then fed into classifiers for question classification. Benchmark testing is performed on the public UIUC data set. The results show that the proposed approach achieves a coarse-grained accuracy of 96% and a fine-grained accuracy of 90.4%, which is superior to existing methods.
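As a rough illustration of the information gain criterion mentioned above, the following Python sketch scores a single word by how much knowing its presence reduces uncertainty about the question category. It implements only the textbook definition, IG(t) = H(C) - P(t)H(C|t) - P(not t)H(C|not t); the "improved" variant used in the study is not described in the abstract and is not reproduced here. The function names, the toy questions, and the labels `HUM` and `LOC` are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(term, questions, labels):
    """Textbook information gain of `term` with respect to the labels.

    A question "contains" the term if the term appears among its
    whitespace-separated, lower-cased tokens (a simplifying assumption).
    """
    with_term, without_term = [], []
    for question, label in zip(questions, labels):
        if term in question.lower().split():
            with_term.append(label)
        else:
            without_term.append(label)
    total = len(labels)
    conditional = (
        (len(with_term) / total) * entropy(with_term)
        + (len(without_term) / total) * entropy(without_term)
    )
    return entropy(labels) - conditional


# Hypothetical toy data, not taken from the UIUC data set.
questions = [
    "who wrote hamlet",
    "who discovered penicillin",
    "where is paris",
    "where is the nile",
]
labels = ["HUM", "HUM", "LOC", "LOC"]

ig_who = information_gain("who", questions, labels)
ig_the = information_gain("the", questions, labels)
ig_unseen = information_gain("zebra", questions, labels)
```

In this toy setting, "who" separates the two categories perfectly and therefore scores higher than "the", while a word that never occurs scores zero. A feature selector built on this idea would keep the highest-scoring words as features.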

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-sen.2018.0006