Feature extraction based on information gain and sequential pattern for English question classification

The purpose of question classification (QC) is to assign a question to an appropriate category from the set of predefined categories that constitute a question taxonomy. Well-chosen question features can significantly improve the performance of QC. However, feature extraction, particularly syntactic feature extraction, has a high computational cost. To maintain or enhance performance without syntactic features, this study presents a hybrid approach that combines semantic feature extraction and lexical feature extraction. These features are generated by improved information gain and sequential pattern mining methods, respectively. The selected features are then fed into classifiers for question classification. Benchmark testing is performed using the public UIUC data set. The results reveal that the proposed approach achieves an accuracy of 96% on the coarse-grained classes and 90.4% on the fine-grained classes, which is superior to existing methods.
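As a rough illustration of the feature-scoring idea mentioned in the abstract, the sketch below computes the standard (textbook) information gain of a single word with respect to question categories. It is not the improved variant proposed in the article, whose details are not given here. The function names, the whitespace tokenisation, and the four toy questions are assumptions made purely for illustration; only the coarse labels HUM and LOC are borrowed from the UIUC taxonomy.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(questions, labels, term):
    """Information gain of the binary feature 'term occurs in the question'.

    Tokenisation is a plain lowercase whitespace split (an assumption;
    a real system would use a proper tokenizer).
    """
    with_term, without_term = [], []
    for question, label in zip(questions, labels):
        tokens = question.lower().split()
        (with_term if term in tokens else without_term).append(label)

    n = len(labels)
    # Entropy of the labels after splitting on presence/absence of the term.
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


# Hypothetical toy data, not taken from the UIUC data set.
questions = [
    "Who wrote Hamlet",
    "Who discovered penicillin",
    "Where is Paris",
    "Where is the Nile",
]
labels = ["HUM", "HUM", "LOC", "LOC"]

# Rank a few candidate words by how well they separate the two categories.
for word in ("who", "the"):
    print(word, round(information_gain(questions, labels, word), 4))
```

In this toy setting "who" perfectly separates the two categories and so receives the maximum gain of one bit, while "the" appears in only one question and scores much lower. Words with high gain would be the ones kept as features before training a classifier.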


