Abstract
People's co-speech gestures often carry a distinctive personal style. Prior work has proposed speech-driven personal-style gesture generation based on generative adversarial networks, but the generated motions are unnatural and suffer from temporal discontinuity. To address this problem, this paper proposes a speech-driven personal-style gesture generation method based on spatio-temporal graph convolutional networks, introducing a temporal-dynamics discriminator built on a spatio-temporal graph convolutional network (STGCN). The method first constructs the spatial and temporal structural relationships among gesture joints, then uses the STGCN to capture the spatial correlations of the joints and to extract their temporal dynamic features, so that the generated gestures remain temporally coherent and better match the behavior and structure of real gestures. Experiments on the speech-gesture dataset constructed by Ginosar et al. show that, compared with related methods, the proposed approach improves the percentage of correct keypoints (PCK) by about 2%~5% and produces more natural gestures.
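The central technical component described above is the STGCN-based temporal-dynamics discriminator. As a rough illustration only (the paper provides no code), the following PyTorch sketch shows what a spatio-temporal graph convolution block and a sequence-level discriminator of this kind can look like; the tensor layout, channel sizes, kernel size, class names, and the adjacency matrix are assumptions made for illustration, not details taken from the paper.

```python
# Minimal sketch (not the authors' code) of a spatio-temporal graph convolution
# block for skeleton sequences, used here as a temporal-dynamics discriminator.
# All sizes and the adjacency matrix are illustrative assumptions.
import torch
import torch.nn as nn


class STGCNBlock(nn.Module):
    """One spatial graph convolution followed by a temporal convolution."""

    def __init__(self, in_channels, out_channels, adjacency, t_kernel=9):
        super().__init__()
        # Fixed, normalized adjacency over the gesture joints, shape (V, V).
        self.register_buffer("A", adjacency)
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.temporal = nn.Conv2d(
            out_channels, out_channels,
            kernel_size=(t_kernel, 1), padding=(t_kernel // 2, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        x = self.spatial(x)                           # mix channels per joint
        x = torch.einsum("nctv,vw->nctw", x, self.A)  # aggregate over neighboring joints
        x = self.temporal(x)                          # model dynamics along the frame axis
        return self.relu(x)


class TemporalDiscriminator(nn.Module):
    """Scores whether a 2D-keypoint gesture sequence looks temporally coherent."""

    def __init__(self, adjacency):
        super().__init__()
        self.backbone = nn.Sequential(
            STGCNBlock(2, 32, adjacency),   # input channels = (x, y) per joint
            STGCNBlock(32, 64, adjacency),
        )
        self.head = nn.Linear(64, 1)

    def forward(self, pose_seq):
        # pose_seq: (batch, 2, frames, joints)
        h = self.backbone(pose_seq)
        h = h.mean(dim=(2, 3))              # pool over frames and joints
        return self.head(h)                 # real/fake logit per sequence
```

In such a design, the 1x1 spatial convolution plus adjacency aggregation models correlations between connected joints within a frame, while the temporal convolution along the frame axis lets the discriminator penalize jerky, discontinuous motion, which is the property the paper targets.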
Authors
张斌
刘长红
曾胜
揭安全
ZHANG Bin; LIU Chang-hong; ZENG Sheng; JIE An-quan (School of Computer & Information Engineering, Jiangxi Normal University, Nanchang 330022, China)
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2022, No. S02, pp. 604-608 (5 pages)
Computer Science
Funding
National Natural Science Foundation of China (62067004, 61662030)
Keywords
Cross-modal generation
Gesture generation
Personal style learning
Spatio-temporal graph convolutional networks
Temporal dynamics