Research on Malware Classification Algorithm Based on Feature Fusion


Abstract: Malware analysis today relies mostly on feature extraction. Features such as opcodes, PE structure, assembly code, strings, and captured dynamic behavior are extracted from malware, and machine learning or deep learning algorithms learn from these features to detect and classify it. However, with the growing maturity of malware morphing and encryption techniques, feature selection and feature extraction are becoming increasingly difficult, so effective feature extraction methods and classification algorithms are needed to counter such complex malware. This paper first reviews the state of research, in China and abroad, on feature fusion for malware classification and identifies the problems that remain open. Datasets of normal and malicious samples are then collected, preprocessed, and used for feature extraction. For dynamic features, a Cuckoo sandbox is set up to capture dynamic API information, and TF-IDF is used to extract key API behavior features. For static features, the malware is disassembled to obtain opcode information, and N-gram, Apriori, and information gain methods are used to extract important opcode-combination features. The dynamic and static features are then fused, and a factorization machine is used as the classification algorithm to model the interactions between features. The resulting classification accuracy and recall both exceed 95%.
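The abstract does not give implementation details, so the following is only a minimal sketch of the kind of pipeline it describes, using the Python standard library: TF-IDF weights over API call names, opcode N-gram counts, fusion by concatenation, and the second-order factorization machine score. All function names, the sample API and opcode names, and the parameter values are hypothetical. The Apriori and information gain selection steps are omitted, and the model weights here are fixed by hand rather than learned.

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF per document; docs is a list of token lists (e.g. API names per sample)."""
    n = len(docs)
    # Document frequency: in how many samples each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights

def ngrams(opcodes, n=2):
    """Count contiguous opcode n-grams in one disassembled sample."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))

def fuse(dynamic, static, vocab):
    """Fuse dynamic and static features into one dense vector ordered by vocab."""
    merged = {**dynamic, **static}
    return [float(merged.get(key, 0.0)) for key in vocab]

def fm_score(x, w0, w, V):
    """Second-order factorization machine score.

    score = w0 + sum_i w_i x_i + sum_{i<j} <V_i, V_j> x_i x_j,
    with the pairwise term computed in O(k * n) via the usual identity
    0.5 * sum_f [(sum_i V_if x_i)^2 - sum_i (V_if x_i)^2].
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise

# Hypothetical toy data: two samples' API traces and one opcode stream.
api_traces = [["CreateFileW", "WriteFile", "WriteFile"],
              ["CreateFileW", "RegSetValueExW"]]
api_weights = tfidf(api_traces)
opcode_counts = ngrams(["push", "mov", "push", "mov"], n=2)

vocab = ["WriteFile", "RegSetValueExW", ("push", "mov"), ("mov", "push")]
x = fuse(api_weights[0], opcode_counts, vocab)
```

In this sketch the fused vector `x` is what a trained factorization machine would score. The latent matrix `V` lets the model weight pairs of features, for example an API behavior together with an opcode pattern, which is the feature-interaction modeling the abstract refers to.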

Authors: TAO Wenwei; WU Jinyu; ZHANG Fuchuan; CAO Yang; WU Hao; TANG Ying; WANG Baohui (China Southern Power Grid Co., Ltd., Guangzhou 510623, China; College of Software, Beihang University, Beijing 100191, China)

Affiliations: China Southern Power Grid Co., Ltd.; College of Software, Beihang University

Source: Network New Media Technology (《网络新媒体技术》), 2023, No. 3, pp. 20-26 (7 pages)

Keywords: malware; feature fusion; classification; TF-IDF; N-gram

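Classification codes: TP311.5 [Automation and Computer Technology: Computer Software and Theory]; TP309 [Automation and Computer Technology: Computer System Architecture]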

