Abstract
Terahertz time-domain spectroscopy (THz-TDS) combined with chemometric methods was used to distinguish Calculus bovis from substances easily confused with it. THz time-domain spectra were acquired for Coptidis rhizome, rhubarb, cattail pollen, artificial Calculus bovis, adulterated Calculus bovis, and natural Calculus bovis. A random forest (RF) model and three support vector machine (SVM) models, each using a different parameter-optimization method, were built to classify the THz absorption spectra of the six substances. To address the drop in the RF model's recognition rate caused by an unbalanced sample dataset, an RF model based on the synthetic minority over-sampling technique (SMOTE) was proposed. The results show that both the RF and SVM models reach a classification accuracy of about 95.00%, but the RF model runs much faster: its running time is only 2% of that of the optimal PSO-SVM model. The SMOTE-based RF model effectively resolves the low recognition rate under data imbalance, raising it from 84.17% on the unbalanced data to 94.17% while leaving the computation speed essentially unchanged. These findings provide a new method for identifying rare Chinese medicines using terahertz spectroscopy.
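The abstract does not describe how SMOTE was implemented, so the following is only a minimal sketch of the general idea: each synthetic minority sample is placed at a random point on the line segment between a real minority sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the toy three-point "spectra" are hypothetical illustrations, not the authors' code or data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority-class neighbours.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base sample (excluding itself)
        others = [s for s in minority if s is not base]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Toy minority class: three made-up absorption values per sample.
minority = [[1.0, 2.0, 3.0], [1.2, 2.1, 2.9], [0.9, 1.8, 3.2], [1.1, 2.2, 3.1]]
new_samples = smote(minority, n_new=6, k=3)
```

In a workflow like the one the abstract outlines, the synthetic samples would presumably be appended to the minority class before the random forest is trained, so that each class contributes a comparable number of spectra.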
Authors
Zhang Long (章龙)
Li Chun (李春)
Li Tianying (李天莹)
Zhang Yan (张岩)
Jiang Ling (蒋玲)
College of Information Science and Technology, Nanjing Forestry University, Nanjing, Jiangsu 210037, China
Source
Laser & Optoelectronics Progress (《激光与光电子学进展》)
CSCD
Peking University Core Journal
2020, Issue 23, pp. 356-362 (7 pages)
Funding
National Natural Science Foundation of China (31200541)
Natural Science Foundation of Jiangsu Province (BK20161526)
Keywords
spectroscopy
terahertz time-domain spectroscopy
natural Calculus bovis
random forest
unbalanced data
support vector machine