

Triplet deep hashing method for speech retrieval
Abstract: Existing deep hashing methods for content-based speech retrieval make insufficient use of supervised information, so the generated hash codes are suboptimal and both retrieval precision and retrieval efficiency are low. To address these problems, a triplet deep hashing method for speech retrieval was proposed. Firstly, spectrogram image features were fed to the model in triplet form to extract the effective information in the speech features. Then, an Attention mechanism-Residual Network (ARN) model was proposed: a spatial attention mechanism was embedded into the Residual Network (ResNet), and the salient-region representation was improved by aggregating the energy-salient-region information of the whole spectrogram. Finally, a novel triplet cross-entropy loss was introduced to map the classification information and the similarity between spectrogram image features into the learned hash codes, achieving maximum class separability and maximum hash code discriminability during model training. Experimental results show that the efficient and compact binary hash codes generated by the proposed method achieve recall, precision and F1 score above 98.5% in speech retrieval. Compared with methods such as single-label retrieval, the average running time of the proposed method using Log-Mel spectrograms as features is shortened by 19.0% to 55.5%. Therefore, the method significantly improves retrieval efficiency and precision while reducing the amount of computation.
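The training objective described above combines two supervision signals: a triplet term that enforces hash code discriminability between same-class and different-class spectrograms, and a cross-entropy term that enforces class separability. The paper's exact "triplet cross-entropy loss" formulation is not given in the abstract, so the following is only a minimal NumPy sketch of that standard combination; the function names, the margin value, and the sign-based quantization step are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Triplet term (assumed form): pull the anchor's hash-layer output
    toward the positive (same speech class) and push it away from the
    negative (different class) by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.mean(np.maximum(d_pos - d_neg + margin, 0.0))

def cross_entropy_loss(logits, labels):
    """Classification term: softmax cross-entropy over class logits,
    encouraging class separability of the learned representation."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])

def binarize(hash_layer_output):
    """Quantize real-valued hash-layer outputs to binary codes by sign,
    a common final step in deep hashing retrieval pipelines."""
    return (hash_layer_output >= 0).astype(np.int8)

# Toy usage: one anchor/positive/negative triplet of 2-D hash outputs.
anchor = np.array([[1.0, 0.0]])
positive = np.array([[1.0, 0.0]])
negative = np.array([[-1.0, 0.0]])
total = triplet_margin_loss(anchor, positive, negative) \
        + cross_entropy_loss(np.array([[10.0, -10.0]]), np.array([0]))
codes = binarize(np.array([[0.3, -0.7]]))
```

In practice both terms would be minimized jointly by backpropagation through the ARN model, and `binarize` would be applied only at indexing/query time.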
Authors: ZHANG Qiuyu (张秋余); WEN Yongwang (温永旺) (School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu 730050, China)
Source: Journal of Computer Applications (《计算机应用》, CSCD, Peking University Core), 2023, Issue 9, pp. 2910-2918 (9 pages)
Funding: National Natural Science Foundation of China (61862041).
Keywords: speech retrieval; triplet deep hashing; attention mechanism; spectrogram feature; triplet cross-entropy loss