Improving word vector model with part-of-speech and dependency grammar information

Part-of-speech (POS) tagging and dependency grammar (DG) parsing are basic components of natural language processing. However, current word vector models do not make full use of POS and DG information, which limits their performance. The authors first put forward the concept of a POS vector and then, based on the continuous bag-of-words (CBOW) model, constructed four models, CBOW + P, CBOW + PW, CBOW + G, and CBOW + G + P, to incorporate POS and DG information into word vectors. The CBOW + P and CBOW + PW models are based on POS tagging, the CBOW + G model is based on DG parsing, and the CBOW + G + P model combines both. POS information is integrated into the training of word vectors through the POS vector, which addresses the difficulty of measuring POS similarity; a POS-vector correlation coefficient and a distance-weighting function are used to train the POS vectors jointly with the word vectors. DG information is used to correct the information loss caused by fixed context windows, with a dependency-relation weight measuring the differences between dependency relations. Experiments demonstrated the superior performance of the proposed models while keeping the time complexity of the CBOW base model.
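The abstract compresses several mechanisms, so a rough sketch may help make them concrete. The snippet below is a minimal illustration only, not the authors' published formulation: the coefficient `pos_corr`, the weighting `dist_weight`, the relation weights `REL_WEIGHT`, and the aggregation itself are all assumptions made for the purpose of the sketch.

```python
import numpy as np

# Minimal sketch of the two ideas in the abstract, under assumed formulations:
#  1) CBOW + P: context words are weighted by a POS correlation coefficient
#     and a distance weight before averaging (the paper's exact coefficient
#     and weighting function are not reproduced here).
#  2) CBOW + G: the fixed window is augmented with dependency-parse
#     neighbours, each scaled by a per-relation weight.
# All names below (pos_corr, dist_weight, REL_WEIGHT, ...) are illustrative.

rng = np.random.default_rng(0)
V, P, D = 10_000, 40, 100                 # vocab size, POS tag count, dimension
word_vecs = rng.normal(scale=0.1, size=(V, D))   # input word embeddings
pos_vecs = rng.normal(scale=0.1, size=(P, D))    # POS vectors, trained jointly

def pos_corr(p_ctx, p_tgt):
    """POS correlation coefficient: here, cosine similarity of POS vectors
    (a hypothetical stand-in for the paper's coefficient)."""
    a, b = pos_vecs[p_ctx], pos_vecs[p_tgt]
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def dist_weight(offset):
    """Distance weighting: nearer context words count more (assumed 1/|offset|)."""
    return 1.0 / abs(offset)

# Hypothetical per-relation weights for dependency contexts (CBOW + G).
REL_WEIGHT = {"nsubj": 1.0, "dobj": 1.0, "amod": 0.8, "advmod": 0.6}

def project(window, dep_neighbours, target_pos):
    """Build the CBOW projection vector h from
    window:         (word_id, pos_id, offset-from-target) triples, and
    dep_neighbours: (word_id, relation) pairs from the dependency parse."""
    h, total = np.zeros(D), 0.0
    for w, p, off in window:                      # CBOW + P part
        wgt = dist_weight(off) * pos_corr(p, target_pos)
        h += wgt * word_vecs[w]
        total += abs(wgt)
    for w, rel in dep_neighbours:                 # CBOW + G part
        wgt = REL_WEIGHT.get(rel, 0.5)
        h += wgt * word_vecs[w]
        total += wgt
    return h / total if total else h

# Example: two window words plus one dependency neighbour of the target word.
h = project([(17, 5, -1), (42, 8, +1)], [(99, "nsubj")], target_pos=3)
print(h.shape)   # (100,)
```

Note that the extra work per context word is a few scalar multiplications, so weighting of this kind leaves the asymptotic training cost of CBOW unchanged, which is consistent with the abstract's claim about time complexity.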
