Forensic Automatic Speaker Recognition Based on Enhanced ECAPA-TDNN
Abstract: To improve the reliability and accuracy of forensic speaker recognition, and to support the shift of forensic voice examination methods and procedures toward a scientific evaluation paradigm, a forensic automatic speaker recognition method based on an improved ECAPA-TDNN (Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network) model is proposed. To improve the accuracy and generalization ability of the model, the method integrates spatial attention, channel attention, and multi-head attention mechanisms. First, the fused spectrogram and gammatone frequency cepstral coefficient (GFCC) feature, which gave the best training results, is used as the network input. The trained neural network then serves as a deep feature extractor, and the strength of the speech evidence is assessed within the likelihood ratio (LR) framework for quantifying forensic evidence. Experimental results show that on the VoxCeleb1 dataset the method achieves a Cllr of 0.156, better than the forensic automatic speaker recognition systems reported in previously published literature. On the Chinese zhaishell dataset, both the false acceptance rate and the false rejection rate are zero; the minimum LR supporting the same-source hypothesis is 1.72×10^6, and the maximum LR supporting the different-source hypothesis is 5.83×10^-21. The method therefore further improves the reliability and accuracy of the recognition system and can provide strong support for conclusions in the evaluation of forensic speech evidence.
Authors: WAN Mei-xi; WANG Hua-peng; YAN Dao-shen; LIU Peng-zhan; XU Ming-yang (School of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang 110854, China)
Source: Science Technology and Engineering (Peking University core journal), 2024, No. 27, pp. 11763-11773 (11 pages)
Funding: National Key Research and Development Program of China (2017YFC0821000); Key Laboratory of Forensic Science, Ministry of Justice (Academy of Forensic Science) project (KF202117)
Keywords: speaker recognition; likelihood ratio; ECAPA-TDNN; attention mechanism; feature fusion
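The channel attention in ECAPA-TDNN comes from squeeze-and-excitation (SE) blocks: each channel is averaged over time, passed through a small bottleneck network, and turned into a gate between 0 and 1 that rescales that channel. The abstract does not describe how the added spatial and multi-head attention are wired in, so the sketch below covers only a standard SE channel gate, in plain Python, with hypothetical weight names (w1, b1, w2, b2) and no claim to match the paper's layer sizes.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = channels, columns = time frames


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dense(weights: Matrix, inputs: List[float], bias: List[float]) -> List[float]:
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, inputs)) + b for row, b in zip(weights, bias)]


def squeeze_excitation(x: Matrix, w1: Matrix, b1: List[float],
                       w2: Matrix, b2: List[float]) -> Matrix:
    """Rescale each channel of a (channels x frames) feature map by a learned gate.

    Hypothetical parameters: w1 is (bottleneck x channels), w2 is (channels x bottleneck).
    """
    # Squeeze: average every channel over time -> one descriptor per channel.
    z = [sum(row) / len(row) for row in x]
    # Excitation: bottleneck layer + ReLU, then expand back to one gate per channel.
    hidden = [max(0.0, h) for h in dense(w1, z, b1)]
    gates = [sigmoid(g) for g in dense(w2, hidden, b2)]
    # Scale: multiply every frame of channel c by gate c.
    return [[gate * v for v in row] for gate, row in zip(gates, x)]
```

With all weights and biases set to zero, every gate is sigmoid(0) = 0.5 and the output is half the input; training pushes gates toward 1 for informative channels and toward 0 for the rest.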
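The abstract uses the trained network as a deep feature extractor and then expresses evidence strength as a likelihood ratio, but it does not say how embedding comparison scores are converted to LRs. A minimal sketch, assuming cosine scoring of two embeddings and Gaussian models of same-source and different-source reference scores (one simple option; logistic-regression calibration is another common choice), could look like this. The function names and the toy score lists are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import List, Sequence


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def fit_score_model(scores: List[float]) -> NormalDist:
    """Fit a Gaussian to reference scores (hypothetical back-end, not the paper's)."""
    return NormalDist(mean(scores), stdev(scores))


def log_gaussian(x: float, dist: NormalDist) -> float:
    """Natural-log Gaussian density, computed directly so tiny densities do not underflow."""
    z = (x - dist.mean) / dist.stdev
    return -0.5 * z * z - math.log(dist.stdev * math.sqrt(2.0 * math.pi))


def log10_lr(score: float, same: NormalDist, diff: NormalDist) -> float:
    """log10 LR = log10 p(score | same source) - log10 p(score | different source)."""
    return (log_gaussian(score, same) - log_gaussian(score, diff)) / math.log(10.0)


if __name__ == "__main__":
    same = fit_score_model([0.7, 0.8, 0.9])   # toy same-source reference scores
    diff = fit_score_model([-0.1, 0.0, 0.1])  # toy different-source reference scores
    print(round(log10_lr(0.8, same, diff), 1))  # 13.9
```

With these toy references, a score of 0.8 yields log10 LR of about 13.9 (support for the same-source hypothesis), while a score of 0.0 yields a negative log10 LR. Working in the log domain avoids dividing two densities that may both underflow to zero for extreme scores.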
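Cllr, the metric reported for VoxCeleb1, is the log-likelihood-ratio cost: Cllr = 1/2 [ (1/N_ss) Σ_i log2(1 + 1/LR_i) + (1/N_ds) Σ_j log2(1 + LR_j) ], where i runs over same-source trials and j over different-source trials. A system that always outputs LR = 1 scores exactly 1, and lower is better. A direct Python transcription (the function name is illustrative) is:

```python
import math
from typing import Sequence


def cllr(lr_same: Sequence[float], lr_diff: Sequence[float]) -> float:
    """Log-likelihood-ratio cost over same-source and different-source trial LRs."""
    # Same-source trials are penalised when their LR is small.
    ss = sum(math.log1p(1.0 / lr) for lr in lr_same) / len(lr_same)
    # Different-source trials are penalised when their LR is large.
    ds = sum(math.log1p(lr) for lr in lr_diff) / len(lr_diff)
    # log1p gives natural logs; dividing by ln 2 converts the cost to bits (log2).
    return 0.5 * (ss + ds) / math.log(2.0)
```

For instance, cllr([1.0], [1.0]) evaluates to 1 (up to floating-point rounding), while a pair of trials at the extremes reported for zhaishell (1.72×10^6 and 5.83×10^-21) gives a cost below 10^-6. math.log1p is used because computing 1 + LR directly loses all precision when LR is as small as 10^-21.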