Malware and malicious code not only incur considerable costs and losses but also damage the reputation of the targeted organisations. Malware developers, hackers, and information security specialists are continuously improving their strategies to defeat each other. Unfortunately, there is no one-size-fits-all solution for detecting and eradicating all malware. The situation is further aggravated by undetected vulnerabilities, which commonly affect computer software and internet tools. Such vulnerabilities often remain undetected until malware developers fully exploit them, eventually causing considerable financial and reputational losses. In this paper, we propose a novel scheme to detect and classify malware using only image representations of the malware binaries. Highly discriminative features of the malware category and structure are extracted into a compact subspace using principal component analysis. Then, an optimised support vector machine model classifies the extracted features into malware categories. Unlike existing classification models, our solution requires only simple algebraic dot products to classify malware based on representative digital images. To assess its performance, we consider three publicly available image datasets: Malimg, Ember and BIG 2015. Our performance analysis indicates that our classifier outperforms state-of-the-art models and attains classification accuracies of 0.998, 0.911, and 0.997 on the Malimg, Ember and BIG 2015 malware datasets, respectively.
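The claim that classification reduces to dot products can be illustrated with a minimal sketch of the inference step. It is not the authors' implementation. It assumes a byte-per-pixel grayscale mapping for the binary image, a linear one-vs-rest support vector machine, and toy values for the PCA mean, the PCA components, and the SVM weights; all function names and numbers below are hypothetical and were not learned from Malimg, Ember or BIG 2015.

```python
# Illustrative sketch only: every parameter value here is made up for the
# example; real values would come from fitting PCA and an SVM on training data.

def bytes_to_image(data, width):
    """Assumed mapping: one byte -> one grayscale pixel (0-255), row by row.
    Trailing bytes that do not fill a complete row are dropped."""
    rows = len(data) // width
    return [list(data[r * width:(r + 1) * width]) for r in range(rows)]


def dot(u, v):
    """Plain algebraic dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pca_project(x, mean, components):
    """Centre x and project it onto each principal component (one dot each)."""
    centred = [xi - mi for xi, mi in zip(x, mean)]
    return [dot(centred, c) for c in components]


def svm_classify(z, weights, biases):
    """Linear one-vs-rest decision: index of the largest score w_k . z + b_k."""
    scores = [dot(w, z) + b for w, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)


def predict(data, width, mean, components, weights, biases):
    """Full inference path: bytes -> image -> flat vector -> PCA -> SVM."""
    image = bytes_to_image(data, width)
    x = [float(p) for row in image for p in row]
    return svm_classify(pca_project(x, mean, components), weights, biases)


# Toy model for 2x2 images: two orthonormal components, three classes.
MEAN = [128.0, 128.0, 128.0, 128.0]
COMPONENTS = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
WEIGHTS = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
BIASES = [0.0, 0.0, 0.0]

if __name__ == "__main__":
    for sample in (bytes([255] * 4), bytes([0] * 4), bytes([255, 0, 255, 0])):
        print(predict(sample, 2, MEAN, COMPONENTS, WEIGHTS, BIASES))
```

Under these assumptions, classifying one sample costs one dot product per principal component plus one per class. A non-linear kernel would replace the final step with kernel evaluations against the support vectors.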

Malware classification using compact image features and multiclass support vector machines
