Word cloud segmentation for simplified exploration of trending topics on Twitter

Twitter is a popular and rapidly growing microblogging platform where people share opinions and news on any topic of interest; it had 310 million monthly active users as of the first quarter of 2016. More than 7000 tweets are posted every second. With such an enormous volume of data being generated, extracting useful information becomes difficult. Tweets collected on a given topic may comprise numerous conversation threads about relevant sub-topics, yet these sub-topics are hard to discern when the data is visualised as a single word cloud. The authors transform a corpus of tweets to a spectral domain and evaluate the results from a number of clustering algorithms, including K-means, latent semantic indexing and non-negative matrix factorisation, to construct clustered word clouds that help identify sub-topics under a broader topic.
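
The idea of splitting one word cloud into several clustered ones can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain TF-IDF weighting with K-means only (no spectral transform, latent semantic indexing or non-negative matrix factorisation), uses only the Python standard library, and runs on four made-up tweets. The function names `tokenize`, `tfidf_vectors`, `kmeans` and `clustered_word_counts` are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Lower-case, keep purely alphabetic tokens longer than two characters.
    return [w for w in text.lower().split() if w.isalpha() and len(w) > 2]

def tfidf_vectors(docs):
    # Build L2-normalised TF-IDF vectors over a shared, sorted vocabulary.
    vocab = sorted({w for d in docs for w in d})
    df = Counter(w for d in docs for w in set(d))
    n = len(docs)
    vectors = []
    for d in docs:
        tf = Counter(d)
        # Smoothed IDF so that words present in every document keep a weight.
        vec = [tf[w] * (math.log((1 + n) / (1 + df[w])) + 1) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    # Deterministic farthest-point initialisation (an assumption made here
    # so the sketch needs no random seed), followed by Lloyd iterations.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def clustered_word_counts(tweets, k):
    # One word-frequency table per cluster; each table would feed one word cloud.
    docs = [tokenize(t) for t in tweets]
    labels = kmeans(tfidf_vectors(docs), k)
    clouds = [Counter() for _ in range(k)]
    for doc, label in zip(docs, labels):
        clouds[label].update(doc)
    return labels, clouds

# Hypothetical toy tweets covering two unrelated sub-topics.
tweets = [
    "striker scores late goal in final",
    "late goal wins the final",
    "storm brings heavy rain tonight",
    "heavy rain and storm warning tonight",
]
labels, clouds = clustered_word_counts(tweets, 2)
for cloud in clouds:
    print(cloud.most_common(3))
```

Each `Counter` holds the word frequencies for one cluster, so rendering them separately gives one word cloud per sub-topic instead of a single mixed cloud.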

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-sen.2016.0307