PriMera Scientific Engineering (ISSN: 2834-2550)

Research Article

Volume 6 Issue 5

Static Malware Classification: A Comparative Analysis of Machine Learning Techniques on Byte and Opcode Features

Chijioke Erasmus Ogbonna*, Michael Nsor, Praise Onyehanere, Husein Harun, Aghoghomena Emadoye, Ifunanya Obinwanne, Gloria Nwachukwu Ogochukwu and JOHN Joshua Junior

April 28, 2025

DOI: 10.56831/PSEN-06-197

Abstract

Malware classification is essential to cybersecurity because it enables early detection and mitigation of threats. This study investigates the efficacy of static analysis combined with machine learning algorithms for identifying malware families. We extract three distinct feature sets from program binaries: byte histograms, opcode frequencies, and section distributions. We conduct a comparative analysis of traditional machine learning classifiers (Decision Tree, Support Vector Classifier (SVC), K-Nearest Neighbors, Naive Bayes, Stochastic Gradient Descent, Logistic Regression, and Random Forest) alongside a baseline Deep Neural Network (DNN). Evaluation on the two completed feature sets, byte histograms and opcode frequencies, reveals significant performance variation across models and feature representations. Opcode frequency features generally yield higher classification accuracy than byte histograms. Notably, Logistic Regression achieves consistently high accuracy and low false positive rates on both feature sets, reaching up to 92.9% accuracy with opcode features. While traditional models such as Logistic Regression and SVC perform strongly, the initial evaluation of the basic DNN architecture shows promise on opcode data but requires further exploration. These findings underscore the continued relevance of static feature engineering and comparative model analysis in malware detection, and they identify opcode patterns as particularly discriminative. Analysis incorporating section distribution features is ongoing and is intended to provide a more complete understanding.
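The abstract relies on two completed static feature sets and reports accuracy and false positive rate. The Python sketch below shows one way such feature vectors and metrics could be computed. It is not the authors' code. It assumes the following:

- The byte histogram is a 256-bin count of byte values normalized by file length.
- Opcode mnemonics have already been extracted by a disassembler, which is not shown.
- The opcode vocabulary is a hypothetical fixed list.
- False positive rate is computed one-vs-rest per malware family and then macro-averaged. The paper's exact definition may differ.

All function names are illustrative.

```python
"""Sketch of static feature vectors and evaluation metrics (illustrative only)."""
from collections import Counter
from typing import Hashable, List, Sequence


def byte_histogram(data: bytes) -> List[float]:
    """Return a 256-bin histogram of byte values, normalized by file length."""
    counts = [0] * 256
    for value in data:  # iterating over bytes yields ints in 0..255
        counts[value] += 1
    if not data:
        return [0.0] * 256
    return [c / len(data) for c in counts]


def opcode_frequencies(mnemonics: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """Return the relative frequency of each opcode in a fixed vocabulary.

    Assumptions: mnemonics come from a disassembler (not shown), are compared
    case-insensitively, and opcodes outside the vocabulary are ignored, so the
    returned vector sums to 1 whenever any vocabulary opcode is present.
    """
    counts = Counter(m.lower() for m in mnemonics)
    in_vocab = [counts[op.lower()] for op in vocabulary]
    total = sum(in_vocab)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [c / total for c in in_vocab]


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted family matches the true family."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_false_positive_rate(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """One-vs-rest FPR = FP / (FP + TN) for each class, averaged over classes."""
    classes = set(y_true) | set(y_pred)
    rates = []
    for c in classes:
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        tn = sum(t != c and p != c for t, p in zip(y_true, y_pred))
        if fp + tn:  # skip classes with no negative samples
            rates.append(fp / (fp + tn))
    return sum(rates) / len(rates) if rates else 0.0


if __name__ == "__main__":
    hist = byte_histogram(b"MZ\x90\x00")
    print(hist[0x4D], hist[0x00])  # 0.25 0.25
    vocab = ["mov", "push", "call", "ret"]
    print(opcode_frequencies(["mov", "MOV", "push", "jmp"], vocab))
    y_true = ["a", "a", "b", "c"]
    y_pred = ["a", "b", "b", "c"]
    print(accuracy(y_true, y_pred))  # 0.75
    print(macro_false_positive_rate(y_true, y_pred))
```

Fixed-length vectors of this form can be passed to any of the classifiers compared in the study. Macro-averaging the per-family false positive rate is only one reasonable choice for multiclass data. A sample-weighted average would emphasize larger families instead.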

Keywords: Malware Classification; Static Analysis; Machine Learning; Cybersecurity; Digital Forensics; Byte Histograms; Opcode Frequencies
