Research on Malware Classification Algorithm Based on Feature Fusion
Abstract: Most current malware analysis relies on feature extraction. Features such as opcodes, PE structure, assembly code, strings, and captured dynamic behavior information are extracted from malware, and machine learning or deep learning algorithms learn from these features to classify it. However, as malware transformation and encryption techniques mature, feature selection and feature extraction are becoming increasingly difficult, so effective feature extraction methods and classification algorithms are needed to counter such complex malware. The paper first reviews the state of research, in China and abroad, on feature fusion for malware classification and identifies the problems that remain at this stage. Data sets of normal and malicious samples are then collected, preprocessed, and used for feature extraction. For dynamic features, a Cuckoo sandbox is set up to capture dynamic API information, and the TF-IDF method is used to extract key API behavior features. For static features, the malware is disassembled to obtain static opcode information, and the N-gram, Apriori, and information gain methods are used to extract important opcode combination features. The dynamic and static features are then fused, and a factorization machine is used as the classification algorithm to model the interactions between features. The resulting classification accuracy and recall both exceed 95%.
Authors: TAO Wenwei; WU Jinyu; ZHANG Fuchuan; CAO Yang; WU Hao; TANG Ying; WANG Baohui (China Southern Power Grid Co., Ltd., Guangzhou 510623, China; College of Software, Beihang University, Beijing 100191, China)
Source: Network New Media Technology (《网络新媒体技术》), 2023, No. 3, pp. 20-26 (7 pages)
Keywords: malware; feature fusion; classification; TF-IDF; N-gram
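The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy API traces, and the opcode sequence are hypothetical, the Apriori and information gain selection steps are omitted, and the factorization machine is shown only as a scoring function (the standard second-order form, with no training loop) that yields one real-valued score per sample.

```python
import math
from collections import Counter

def opcode_ngrams(opcodes, n=3):
    """Static side: count contiguous opcode n-grams in one disassembled sample."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))

def tfidf(docs):
    """Dynamic side: TF-IDF weight per API name, one dict per sample trace."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency of each API name
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            api: (c / len(doc)) * math.log(n_docs / df[api])
            for api, c in counts.items()
        })
    return weights

def fuse(dynamic, static, api_vocab, ngram_vocab):
    """Fusion: concatenate dynamic and static features into one vector."""
    return ([dynamic.get(a, 0.0) for a in api_vocab]
            + [float(static.get(g, 0)) for g in ngram_vocab])

def fm_score(x, w0, w, V):
    """Second-order factorization machine score:
    w0 + sum_i w[i]*x[i] + sum_{i<j} <V[i], V[j]> * x[i] * x[j],
    with the pairwise term computed in O(k * n) instead of O(n^2)."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

# Toy data (hypothetical, for illustration only).
api_traces = [["CreateFileW", "WriteFile", "WriteFile"],
              ["CreateFileW", "RegSetValueExW"]]
api_weights = tfidf(api_traces)
grams = opcode_ngrams(["push", "mov", "call", "push", "mov", "call"], n=3)
features = fuse(api_weights[0], grams,
                api_vocab=["CreateFileW", "WriteFile"],
                ngram_vocab=[("push", "mov", "call")])
```

In this sketch an API that appears in every trace receives a TF-IDF weight of zero, which is why TF-IDF favors behavior that distinguishes samples. The factorization machine's pairwise term is what lets a dynamic feature and a static feature contribute jointly to the score, which corresponds to the feature interaction modeling the abstract mentions.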