Abstract
At present, most malware analysis relies on feature extraction. Features such as opcodes, PE structure, assembly code, strings, and captured dynamic behavior are extracted from malware samples, and machine learning or deep learning algorithms learn from these features to detect and classify malicious software. However, as the ways malware can transform itself multiply and encryption techniques mature, feature selection and feature extraction become increasingly difficult, so effective feature extraction methods and classification algorithms are needed to counter such sophisticated malware. This paper first reviews existing work, in China and abroad, on feature fusion for malware classification and identifies the problems that remain. It then collects data sets of benign and malicious samples, preprocesses them, and extracts features. For dynamic features, a Cuckoo Sandbox is set up to capture API call information, and key API behavior features are extracted with TF-IDF. For static features, the malware is disassembled to obtain opcode information, and important opcode-combination features are extracted using N-grams, the Apriori algorithm, and information gain. The dynamic and static features are then fused, and a factorization machine (FM) is used as the classification algorithm to model the interactions between features. The resulting classifier achieves accuracy and recall above 95% for malware classification.
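As an illustration of the dynamic-feature step, the following Python sketch weights the API names in sandbox call traces with TF-IDF and keeps the highest-weighted APIs of each sample. It assumes each trace has already been reduced to a list of API names (parsing the Cuckoo report is not shown) and uses the plain, unsmoothed IDF log(N/df); the abstract does not state which TF-IDF variant or cut-off the authors used, and the function names `tfidf` and `top_apis` are hypothetical.

```python
import math
from collections import Counter


def tfidf(documents):
    """Compute TF-IDF weights for each API name in each call trace.

    documents: list of API-call sequences, one per sample (hypothetical
    input format). Returns one {api: weight} dict per sample, using
    tf = count / trace length and the unsmoothed idf = log(N / df).
    """
    n_docs = len(documents)
    # Document frequency: number of traces in which each API appears at least once.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            api: (c / total) * math.log(n_docs / df[api])
            for api, c in counts.items()
        })
    return weights


def top_apis(weights, k):
    """Keep the k highest-weighted APIs of one trace (ties broken by name)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [api for api, _ in ranked[:k]]
```

For example, with three hypothetical traces in which `NtOpenFile` occurs in every trace, its IDF (and therefore its weight) is 0, while an API that occurs in only one trace, such as `RegSetValueExW`, ranks highest for that trace. This is the property that makes TF-IDF useful for suppressing ubiquitous system calls.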
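The static-feature step can be sketched in the same hedged way: the code below turns each disassembled sample's opcode sequence into a set of N-gram presence features and ranks them by information gain with respect to the benign/malicious label. It is a minimal sketch assuming binary presence features; the Apriori frequent-itemset mining mentioned in the abstract is omitted, and the helper names are hypothetical.

```python
import math
from collections import Counter


def ngrams(opcodes, n):
    """Set of contiguous opcode n-grams occurring in one disassembled sample."""
    return {tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(samples, labels, feature):
    """IG of the binary feature 'n-gram present / absent' w.r.t. the labels."""
    with_f = [y for s, y in zip(samples, labels) if feature in s]
    without_f = [y for s, y in zip(samples, labels) if feature not in s]
    n = len(labels)
    conditional = (len(with_f) / n) * entropy(with_f) + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - conditional


def select_ngrams(opcode_seqs, labels, n=2, k=10):
    """Rank every n-gram seen in the corpus by IG and keep the top k."""
    samples = [ngrams(seq, n) for seq in opcode_seqs]
    vocab = set().union(*samples)
    # Highest IG first; ties broken by the n-gram itself for determinism.
    scored = sorted(vocab, key=lambda g: (-information_gain(samples, labels, g), g))
    return scored[:k]
```

An n-gram that occurs in every sample (for instance a common function prologue) has zero information gain, whereas one that occurs only in malicious samples separates the classes and is ranked first.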
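For the classifier, a second-order factorization machine scores a fused feature vector x as ŷ(x) = w0 + Σ_i w_i·x_i + Σ_{i<j} ⟨v_i, v_j⟩·x_i·x_j, where each feature i has a k-dimensional latent vector v_i. Interactions between, say, a TF-IDF API feature and an opcode N-gram feature are therefore captured without learning a separate weight for every pair. The pure-Python sketch below assumes fusion by simple concatenation into one vector, a logistic loss, and plain SGD; the paper's actual training setup is not given in the abstract, and the class and parameter names are hypothetical. It computes the pairwise term in O(kn) with Rendle's identity, Σ_{i<j} ⟨v_i, v_j⟩·x_i·x_j = ½ Σ_f [(Σ_i v_{i,f}·x_i)² − Σ_i v_{i,f}²·x_i²], and includes an O(kn²) reference implementation to check it.

```python
import math
import random


class FactorizationMachine:
    """Second-order FM: y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j."""

    def __init__(self, n_features, k=4, seed=0):
        rng = random.Random(seed)
        self.k = k
        self.w0 = 0.0
        self.w = [0.0] * n_features
        # One k-dimensional latent vector per feature, small random initialisation.
        self.v = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_features)]

    def _sums(self, x):
        """Per-factor sums s_f = sum_i v_{i,f} x_i, shared by scoring and gradients."""
        return [sum(self.v[i][f] * x[i] for i in range(len(x))) for f in range(self.k)]

    def raw_score(self, x):
        """Linear-time score using 0.5 * sum_f [s_f^2 - sum_i v_{i,f}^2 x_i^2]."""
        s = self._sums(x)
        pairwise = 0.5 * sum(
            s[f] ** 2 - sum((self.v[i][f] * x[i]) ** 2 for i in range(len(x)))
            for f in range(self.k)
        )
        linear = sum(wi * xi for wi, xi in zip(self.w, x))
        return self.w0 + linear + pairwise

    def predict_proba(self, x):
        """Probability of the malicious class via a numerically stable sigmoid."""
        z = self.raw_score(x)
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def sgd_step(self, x, y, lr=0.01):
        """One SGD update on the logistic loss for a label y in {0, 1}."""
        s = self._sums(x)  # computed with the pre-update parameters
        g = self.predict_proba(x) - y  # d(loss) / d(raw score)
        self.w0 -= lr * g
        for i, xi in enumerate(x):
            if xi == 0.0:
                continue  # zero-valued features receive zero gradient
            self.w[i] -= lr * g * xi
            for f in range(self.k):
                # d(score)/d(v_{i,f}) = x_i * s_f - v_{i,f} * x_i^2
                self.v[i][f] -= lr * g * (xi * s[f] - self.v[i][f] * xi * xi)


def naive_pairwise(fm, x):
    """O(k n^2) reference for the interaction term, used only to check raw_score."""
    n = len(x)
    return sum(
        sum(fm.v[i][f] * fm.v[j][f] for f in range(fm.k)) * x[i] * x[j]
        for i in range(n) for j in range(i + 1, n)
    )
```

The factorised form is the design choice that matters for this kind of data: fused malware feature vectors are high-dimensional and sparse, so most feature pairs are never observed together in training, yet the shared latent vectors still yield an estimate for their interaction.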
Authors
TAO Wenwei (陶文伟)
WU Jinyu (吴金宇)
ZHANG Fuchuan (张富川)
CAO Yang (曹扬)
WU Hao (吴昊)
TANG Ying (唐瑛)
WANG Baohui (王宝会)
(China Southern Power Grid Co., Ltd., Guangzhou 510623, China; College of Software, Beihang University, Beijing 100191, China)
Source
Network New Media Technology (《网络新媒体技术》)
2023, No. 3, pp. 20-26 (7 pages)