
Automatic music genre classification based on vision transformer network (cited by: 4)
Abstract With the rapid development of the online music industry, demand for automatic music retrieval and classification systems is increasing. Correct labeling of music genres by computer is an important prerequisite for accurate classification of music types and for the performance of music recommendation systems. Convolution operations cannot extract global representations, so deep convolutional neural networks are weak at global modeling of music genre data. To address this problem, an automatic music genre classification method based on the Vision Transformer (ViT) neural network was proposed. After the audio to be classified was preprocessed, the Short-Time Fourier Transform (STFT) was used to convert it into spectrogram slices of uniform size, thereby transforming the music into frequency-domain features. To avoid overfitting during training, the spectrogram slice set was augmented by adding white noise. The generated spectrogram slice set and its augmented counterpart were then used to train the constructed ViT neural network to classify music genres automatically. Simulation results show that the constructed ViT network reaches a test recognition accuracy of 91.01% on the public music genre classification dataset GTZAN, which is 1.00 to 5.00 percentage points higher than music genre classification methods based on traditional Convolutional Neural Networks (CNNs) such as AlexNet, AlexNet-enhanced and VGG16.
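The preprocessing steps named in the abstract (STFT, uniform-size spectrogram slices, white-noise augmentation) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract gives no parameters, so the window length `n_fft`, hop size `hop`, slice width `width`, noise level `snr_db`, the 22050 Hz sample rate, and all function names are assumptions, and the noise is assumed to be Gaussian and added directly to the magnitude slices.

```python
import numpy as np

def stft_magnitude(signal, n_fft=512, hop=256):
    """Magnitude STFT with a Hann window; returns (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # rfft keeps only the non-negative frequency bins of each real-valued frame
    return np.abs(np.fft.rfft(frames, axis=1)).T

def slice_spectrogram(spec, width=128):
    """Cut a spectrogram into non-overlapping slices of equal width.

    Trailing frames that do not fill a whole slice are dropped.
    """
    n_slices = spec.shape[1] // width
    return [spec[:, k * width : (k + 1) * width] for k in range(n_slices)]

def add_white_noise(spec_slice, snr_db=20.0, rng=None):
    """Return a copy of the slice with zero-mean Gaussian noise added.

    The noise power is set from an assumed signal-to-noise ratio in dB.
    Values are not clipped, so small entries may become negative.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(spec_slice ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=spec_slice.shape)
    return spec_slice + noise

# Hypothetical input: 3 seconds of a 440 Hz tone at an assumed 22050 Hz rate
sample_rate = 22050
t = np.arange(sample_rate * 3) / sample_rate
audio = np.sin(2 * np.pi * 440 * t)

spec = stft_magnitude(audio)
slices = slice_spectrogram(spec)
rng = np.random.default_rng(0)
augmented = [add_white_noise(s, rng=rng) for s in slices]
training_set = slices + augmented  # original slices plus their noisy copies
```

Each slice in `training_set` has the same shape, which is what allows a network with a fixed input size, such as a ViT, to be trained on them.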
Authors DONG Anming; LIU Zongyin; YU Jiguo; HAN Yubing; ZHOU You (Big Data Institute, Qilu University of Technology, Jinan Shandong 250353, China; School of Mathematics and Statistics, Qilu University of Technology, Jinan Shandong 250353, China; School of Computer Science and Technology, Qilu University of Technology, Jinan Shandong 250353, China; Shandong HiCon New Media Institute Company Limited, Jinan Shandong 250013, China)
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2022, Issue S01, pp. 54-58 (5 pages)
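Funding National Key Research and Development Program of China (2017YFB1400500); Key Research and Development Program of Shandong Province (2019JZZY020124); Natural Science Foundation of Shandong Province (ZR2017BF012); Youth Innovation Team Development Program of Higher Education Institutions of Shandong Province (2019KJN010); Qilu University of Technology (Shandong Academy of Sciences) Basic Research Strengthening Program for the Computer Science and Technology Discipline (2021JC02014); Qilu University of Technology (Shandong Academy of Sciences) Talent Training Enhancement Program for the Computer Science and Technology Discipline (2021PY05001).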
Keywords vision transformer network; music genre; feature transform; spectrogram; deep learning; data enhancement