Enhancing transport data collection through social media sources: methods, challenges and opportunities for textual data

Social media data now enriches and supplements information flows in many sectors of society. The question addressed here is whether social media can act as a credible information source of sufficient quality to meet the needs of transport planners, operators, policy makers and the travelling public. A typology of primary transport data needs and of current and new data sources is first established, after which the study focuses on social media textual data in particular. Three sub-questions are investigated: the potential to use social media data alongside existing transport data, the technical challenges in extracting transport-relevant information from social media, and the wider barriers to the uptake of this data. Following an overview of the text mining process used to extract relevant information from a corpus, the challenges this approach poses for the transport sector are reviewed. These include ontologies, sentiment analysis, location names and measuring accuracy. Finally, institutional issues in the greater use of social media are highlighted, with the conclusion that the potential of social media information has not yet been fully explored. The contribution of this study is in scoping the technical challenges of mining social media data within the transport context, laying the foundation for further research in this field.

Inspec keywords: social networking (online); text analysis; data mining

Other keywords: text mining process; data collection; location names; social media sources; institutional issues; data mining; information extraction; ontologies; measuring accuracy; sentiment analysis

Subjects: Knowledge engineering techniques; Document processing and analysis techniques; Information networks; Information analysis and indexing

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-its.2013.0214