Online social platforms implement moderation mechanisms to filter out unwanted content and to act against verbal aggression, abuse, sexual harassment, and similar behaviour. In this study, the authors investigate chat biometrics: the identification of users from their verbal behaviour on a social platform. Typical application scenarios are the re-identification of banned users who return under different identities and of aggressors who operate through multiple fake accounts. They propose a novel processing pipeline and contrast the problem with authorship recognition, which is relatively well studied in the literature. They evaluate the proposed approach on a large corpus of multiparty chat records in Turkish, which they previously collected from a multiplayer game environment. To test the robustness of the system across domain changes, they also introduce a new corpus collected from a well-known Turkish social platform, Ekşisözlük; to test it on different languages, they additionally evaluate it on Portuguese and English news datasets. They evaluate both instance-based and profile-based approaches and provide detailed analyses of the amount of text required to identify a person reliably.

Chat biometrics
