Abstract
To address the problems that existing coronary artery disease (CAD) diagnostic models perform below the level required for clinical use and lack interpretability, this paper proposes a CAD prediction and feature analysis model that combines XGBoost with SHAP. First, after feature engineering, the processed dataset is fed into an XGBoost model for training, and the model is then optimized to further improve its performance. Second, the model is compared experimentally with six machine learning models, including SVM and naive Bayes, and with eight mainstream machine learning models; the parameter-optimized XGBoost model reaches 0.9942, 0.9970, 0.9941 and 0.9998 in accuracy, specificity, F1 score and AUC respectively, outperforming the existing models on all four metrics. Finally, the SHAP framework is introduced to improve model interpretability, and the feature importance rankings of four models are combined to identify the important factors affecting CAD, giving doctors a decision reference for making a correct diagnosis. The proposed model has the potential to be a useful diagnostic tool in hospitals.
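The four metrics reported in the abstract (accuracy, specificity, F1 score and AUC) can be illustrated with a small sketch. This is not the authors' code: the labels, scores and the 0.5 decision threshold below are made-up toy values, and the helper names are hypothetical. It only shows how each metric is usually defined for a binary classifier.

```python
# Toy illustration of the four evaluation metrics named in the abstract.
# All data here is invented for demonstration; it is NOT from the paper.

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = disease present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def specificity(tn, fp):
    """Fraction of true negatives among all actual negatives."""
    return tn / (tn + fp)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in count form."""
    return 2 * tp / (2 * tp + fp + fn)

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a toy example.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
spec = specificity(tn, fp)
f1 = f1_score(tp, fp, fn)
auc_value = auc(y_true, scores)
print(acc, spec, f1, auc_value)
```

Accuracy, specificity and F1 depend on the chosen threshold, whereas AUC is computed from the raw scores and is threshold-independent, which is one reason a paper may report all four together.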
Authors
Chen Xiaokun (陈小昆); Zuo Hangxu (左航旭); Liao Bin (廖彬); Sun Ruina (孙瑞娜)
Affiliations: College of Statistics & Data Science, Xinjiang University of Finance & Economics, Urumchi 830012, China; Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; School of Networks Security, University of Chinese Academy of Sciences, Beijing 100093, China
Source
Application Research of Computers (《计算机应用研究》)
Indexed in: CSCD; PKU Core Journals (北大核心)
2022, Issue 6, pp. 1796-1804 (9 pages)
Funding
National Natural Science Foundation of China (61562078, 71563048)
Xinjiang Tianshan Youth Program (2018Q073)
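Scientific Research Program of Xinjiang Higher Education Institutions, Natural Science Project (XJEDU2021Y037)
Xinjiang "Tianshan Cedar Program" for Young Top Talents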