ADVANCED MALICIOUS SOFTWARE DETECTION USING DNN
DOI:
https://doi.org/10.51903/jtie.v1i1.144
Keywords:
Malicious software detection, DNN, Portable Executable
Abstract
Malicious software analysis has two main components. Advanced malicious software analysis examines the basic structure of a malicious executable, whereas aggressive malicious software analysis investigates the behavior of the software after running it in a controlled environment. Advanced malicious software analysis is usually performed by contemporary anti-malicious software systems using signature-based analysis.
The purpose of this research is to propose and evaluate a deep neural network (DNN) for the advanced detection of portable executable (PE) files, learning the features of PE malicious software in order to minimize the occurrence of false positives when detecting advanced malicious software. The proposed model, a neural network (NN) with dropout, is compared against a decision tree model to examine how well it performs in detecting real malicious PE files. Setup-agnostic methods are used to extract features from the files. One dataset is used to train the proposed model, and the outcomes are measured against other common malicious software datasets.
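As a rough illustration of the dropout mechanism mentioned above, the sketch below applies inverted dropout to a list of hidden-layer activations. It is not the study's implementation: the function name `dropout`, the rate of 0.5, and the toy activations are assumptions chosen only for the example, since the abstract does not state the network's layers or dropout rate.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout (illustrative sketch, not the study's code).

    During training, each unit is zeroed with probability `rate` and the
    surviving units are rescaled by 1 / (1 - rate) so that the expected
    activation is unchanged. At inference time the input passes through.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical hidden-layer activations and a seeded generator.
rng = random.Random(0)
hidden = [0.5, 1.0, 1.5, 2.0]

dropped = dropout(hidden, rate=0.5, rng=rng)                     # training pass
unchanged = dropout(hidden, rate=0.5, rng=rng, training=False)   # inference pass
```

Randomly silencing units during training discourages the network from relying on any single PE feature, which is one common motivation for adding dropout to a small classifier.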
The results of this study illustrate that using simple DNNs to learn PE feature vectors is not only effective but also less system-intensive than the traditional detection approach. The proposed model achieves an AUC of 99.8% with a 90% true positive rate at a 1% false positive rate on the ROC curve. This shows that the model has the potential to complement or replace conventional anti-malicious software systems; for future research, it is proposed to implement the model in practice.
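The two reported metrics, the area under the ROC curve and the true positive rate at a fixed false positive rate, can be computed from classifier scores as sketched below. This is a minimal pure-Python example under stated assumptions: the labels and scores are invented toy values (1 = malicious, 0 = benign), and the helper names `roc_points`, `auc`, and `tpr_at_fpr` are illustrative, not taken from the study.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr), starting at (0, 0), one per distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def tpr_at_fpr(points, max_fpr):
    """Highest true positive rate among thresholds with FPR <= max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


# Toy data, invented for illustration only.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

points = roc_points(labels, scores)
area = auc(points)                    # 0.75 for this toy data
tpr_1pct = tpr_at_fpr(points, 0.01)   # 0.5 for this toy data
```

Reporting the true positive rate at a low false positive rate matters for anti-malicious software systems because benign files vastly outnumber malicious ones in practice, so even a small false positive rate produces many false alarms.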