Abstract
To address problems in lip feature extraction and temporal modeling in lip-reading, this paper proposes a deep learning model that combines a convolutional neural network (CNN) with a bi-directional long short-term memory network (Bi-LSTM). The CNN learns lip features, the Bi-LSTM encodes the temporal information in those features, and a softmax classifier produces the final classification. To compensate for the lack of Chinese lip-reading data, two large Chinese lip-reading datasets, NUMBER DATASET and PHRACE DATASET, were built. The model was compared with traditional lip-reading methods on both datasets. On NUMBER DATASET the recognition accuracy is 81.3%, which is 8.1% higher than the traditional method; on PHRACE DATASET it is 83.5%, which is 9% higher than the traditional method. These results show that the model can effectively improve the accuracy of lip-reading recognition.
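The pipeline described in the abstract (per-frame CNN features, bi-directional LSTM encoding, softmax classification) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the layer sizes, the single 3x3 convolution with global average pooling, the omission of bias terms, the random untrained weights, the 10-class output, and all function and parameter names (`conv_features`, `lstm_pass`, `classify_clip`, `params`) are assumptions chosen only to keep the example small and dependency-free.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols, scale=0.1):
    """Random weights; a real model would learn these by training."""
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def conv_features(frame, kernels):
    """CNN stand-in (assumed): valid 3x3 convolution, ReLU, global average pooling."""
    h, w = len(frame), len(frame[0])
    feats = []
    for k in kernels:
        acc = 0.0
        for i in range(h - 2):
            for j in range(w - 2):
                s = sum(k[a][b] * frame[i + a][j + b] for a in range(3) for b in range(3))
                acc += max(s, 0.0)
        feats.append(acc / ((h - 2) * (w - 2)))
    return feats

def lstm_pass(seq, W, hidden):
    """One LSTM direction without biases; W maps [x; h] to four stacked gates."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    outputs = []
    for x in seq:
        z = matvec(W, x + h)
        i = [sigmoid(v) for v in z[:hidden]]                 # input gate
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]       # forget gate
        o = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]   # output gate
        g = [math.tanh(v) for v in z[3 * hidden:]]           # candidate cell state
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        outputs.append(h)
    return outputs

def classify_clip(frames, params):
    """Frames -> per-frame features -> Bi-LSTM encoding -> softmax probabilities."""
    feats = [conv_features(f, params["kernels"]) for f in frames]
    hidden = params["hidden"]
    fwd = lstm_pass(feats, params["W_fwd"], hidden)
    bwd = lstm_pass(feats[::-1], params["W_bwd"], hidden)[::-1]
    # Final forward state (last frame) joined with final backward state (first frame).
    summary = fwd[-1] + bwd[0]
    return softmax(matvec(params["W_out"], summary))

# Hypothetical sizes: 4 kernels, 8 hidden units per direction, 10 classes.
N_KERNELS, HIDDEN, N_CLASSES = 4, 8, 10
params = {
    "kernels": [rand_matrix(3, 3, 1.0) for _ in range(N_KERNELS)],
    "hidden": HIDDEN,
    "W_fwd": rand_matrix(4 * HIDDEN, N_KERNELS + HIDDEN),
    "W_bwd": rand_matrix(4 * HIDDEN, N_KERNELS + HIDDEN),
    "W_out": rand_matrix(N_CLASSES, 2 * HIDDEN),
}

# A fake clip: 5 grayscale lip-region frames of 8x8 pixels.
frames = [[[random.random() for _ in range(8)] for _ in range(8)] for _ in range(5)]
probs = classify_clip(frames, params)
print(len(probs), round(sum(probs), 6))
```

Reading the sequence in both directions lets the encoding of a clip depend on frames before and after each time step, which is the motivation for a Bi-LSTM over a single-direction LSTM. In practice a framework such as PyTorch or TensorFlow would replace these hand-written loops.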
Authors
骆天依
刘大运
李修政
房国志
安欣
魏华杰
胡城
LUO Tian-yi; LIU Da-yun; LI Xiu-zheng; FANG Guo-zhi; AN Xin; WEI Hua-jie; HU Cheng (School of Automation, Harbin University of Science and Technology; School of Computer Science and Technology, Harbin University of Science and Technology; College of Measurement and Control Technology and Communication Engineering, Harbin University of Science and Technology, Harbin 150080, China)
Source
Software Guide (《软件导刊》)
2019, No. 10, pp. 36-39 (4 pages)
Funding
Heilongjiang Province College Student Innovation and Entrepreneurship Project (20180214007)
Keywords
lip-reading
convolutional neural network
bi-directional long short-term memory
deep learning
sequential coding