Abstract
Prior work has shown that incorporating the visual semantic information of images can improve text-based machine translation. Most existing approaches, however, feed only the overall (global) visual semantics of an image into the translation model. An image may contain several distinct semantic objects, and these local objects influence the prediction of target words at the decoder to different degrees. To address this, we propose a multimodal machine translation model that incorporates image attention. The model computes the interaction between the source-language words and both the global semantics of the image and the local semantics of its different regions. It then fuses this interaction, as an image attention, into the textual attention weights, which strengthens the alignment between the decoder hidden states and the source-language words. We evaluate the model on the English-German pairs of the multimodal machine translation dataset Multi30k and on manually annotated Indonesian-Chinese pairs. The results show that the proposed model achieves good improvements over existing multimodal machine translation models based on recurrent neural networks, which demonstrates its effectiveness.
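The abstract describes the fusion of image attention into textual attention only at a high level. The following is a minimal NumPy sketch under explicit assumptions, not the authors' implementation: it assumes dot-product attention scores, one global image feature vector stacked with k local (region) feature vectors, and a convex combination with a hypothetical mixing weight `lam` to merge the image attention into the textual attention weights. The function names `softmax` and `fused_attention`, the shapes, and the use of NumPy are all illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def fused_attention(s, H, V_local, v_global, lam=0.5):
    """Hypothetical fusion of image attention into textual attention.

    s        : decoder hidden state, shape (d,)
    H        : source word annotations, shape (n, d)
    V_local  : local (region) image features, shape (k, d)
    v_global : global image feature, shape (d,)
    lam      : assumed mixing weight between text and image attention
    """
    # Textual attention (assumed dot-product score): match each source word to s.
    alpha_text = softmax(H @ s)                       # (n,)

    # Stack the global and local visual features into k + 1 vectors.
    V = np.vstack([v_global[None, :], V_local])       # (k+1, d)

    # Word-image interaction: each source word attends over the visual vectors.
    beta = softmax(H @ V.T, axis=1)                   # (n, k+1)
    visual_ctx = beta @ V                             # (n, d) image context per word

    # Image attention over source words: score each word's visual context
    # against the decoder state, then normalise over the source sentence.
    alpha_img = softmax(visual_ctx @ s)               # (n,)

    # Assumption: fuse by convex combination, so the result stays a distribution.
    alpha = (1.0 - lam) * alpha_text + lam * alpha_img
    context = alpha @ H                               # (d,) source context vector
    return alpha, context

# Usage with random features: 5 source words, 3 image regions, dimension 8.
rng = np.random.default_rng(0)
d, n, k = 8, 5, 3
alpha, ctx = fused_attention(rng.normal(size=d), rng.normal(size=(n, d)),
                             rng.normal(size=(k, d)), rng.normal(size=d))
print(alpha.shape, round(float(alpha.sum()), 6))  # (5,) 1.0
```

The convex combination is used here only because it keeps the fused weights a valid probability distribution over source words; setting `lam=0` recovers plain textual attention. The fusion rule in the paper itself may differ.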
Authors
LI Xia (李霞)
MA Junteng (马骏腾)
QIN Shihao (覃世豪)
Affiliations: Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangzhou, Guangdong 510006, China; School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, Guangdong 510006, China
Source
Journal of Chinese Information Processing (《中文信息学报》)
Indexed in: CSCD; Peking University core journal list (北大核心)
2020, No. 7, pp. 68-78 (11 pages)
Funding
National Natural Science Foundation of China (61976062)
Guangzhou Science and Technology Program Project (201904010303)
Keywords
multimodal machine translation
image attention
global visual semantic information
local visual semantic information