Malware classification based on API calls and behaviour analysis

This study presents a runtime behaviour-based classification procedure for Windows malware. Runtime behaviours are extracted with a particular focus on identifying malicious sequences of application programming interface (API) calls, in addition to file, network and registry activities. Mining and searching n-grams over API call sequences is introduced to discover episodes that represent the behaviour-based features of a malware sample. The Voting Experts algorithm is used to extract malicious API patterns from the API call sequences. The classification model is built with online machine learning algorithms and compared with baseline classifiers. The model is trained and tested on 17,400 malware samples belonging to 60 distinct families and 532 benign samples, and it reaches a malware classification accuracy of 98%.
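
The n-gram mining step can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (api_ngrams, ngram_counts, shared_ngrams), the min_support threshold and the example traces are assumptions made for illustration only. It shows how contiguous n-grams might be drawn from an API call trace and how n-grams recurring across several traces could be kept as candidate behaviour features.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def api_ngrams(calls: Sequence[str], n: int) -> List[NGram]:
    """Return every contiguous n-gram of an API call trace, in order."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A trace shorter than n yields an empty list.
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def ngram_counts(calls: Sequence[str], n: int) -> Counter:
    """Count how often each n-gram occurs within a single trace."""
    return Counter(api_ngrams(calls, n))


def shared_ngrams(traces: Iterable[Sequence[str]], n: int,
                  min_support: int) -> Set[NGram]:
    """Keep n-grams that occur in at least min_support traces.

    min_support is a hypothetical threshold, not a value from the study.
    """
    support: Counter = Counter()
    for trace in traces:
        # Count each n-gram at most once per trace.
        support.update(set(api_ngrams(trace, n)))
    return {gram for gram, seen in support.items() if seen >= min_support}


# Hypothetical traces; the call names are real Windows APIs, the sequences are invented.
trace_a = ["CreateFileW", "WriteFile", "RegSetValueExW", "CreateFileW", "WriteFile"]
trace_b = ["CreateFileW", "WriteFile", "CloseHandle"]

bigram_counts = ngram_counts(trace_a, 2)
common_bigrams = shared_ngrams([trace_a, trace_b], 2, min_support=2)
```

In this sketch, bigram_counts could serve as a per-sample feature vector, while common_bigrams narrows the vocabulary to patterns seen in more than one sample. The study's Voting Experts segmentation and online classifiers are separate steps that this example does not cover.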

Inspec keywords: learning (artificial intelligence); application program interfaces; pattern classification; invasive software; data mining

Other keywords: application programming interface call; runtime behaviour-based classification procedure; behaviour-based features; Voting Experts algorithm; classification model; baseline classifiers; n-gram; mining; API call sequences; online machine learning algorithms; behaviour analysis; malicious API pattern extraction; malware classification; Windows malware; malware classification accuracy

Subjects: Data security; General utility programs; Knowledge engineering techniques; Operating systems

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-ifs.2017.0430