
Characterising text mining: a systematic mapping review of the Portuguese language

Documents written in natural language constitute a major part of the artefacts produced during the software engineering life cycle. Studies indicate that more than 80% of enterprise data is stored in some sort of unstructured form, mainly as text. Moreover, the growth of user-generated content, especially from social media, provides a huge amount of data from which the experiences, opinions, and feelings of users can be discovered. Text mining refers to the set of tools, techniques, and algorithms adopted to extract useful information from unstructured data. Considering that Portuguese ranks among the ten most spoken languages and is the second most common on Twitter, this study aims to map current primary studies that relate to the application of text mining for Portuguese. A systematic mapping method was applied and 6075 primary studies were retrieved up to the year 2014. A total of 203 studies were included, of which more than 60% analyse texts written in the Brazilian variant. The majority of studies focus on the text classification task. Support vector machine and Naïve Bayes appear as the main algorithms. The Folha de São Paulo and Público newspapers appear as the main corpora, followed by the Portuguese Attorney General's Office corpus and Twitter.

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-sen.2016.0226