Abstract
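Since statistical machine translation emerged, reordering has been a key problem for language pairs whose word orders differ markedly, and reordering methods trained on large-scale corpora have been studied extensively. Chinese-Mongolian bilingual corpus resources are currently very limited, so existing reordering methods that depend on large-scale corpora and linguistic knowledge struggle to perform well. This paper analyzes related prior work and proposes a reordering method for Chinese-Mongolian statistical machine translation under limited-corpus conditions. Guided by linguistic knowledge, the method identifies the phrase types that most strongly affect the word order of the translation, develops reordering schemes for these phrase types, and incorporates them into an existing reordering model to improve reordering. Experiments show that the method yields a significant improvement under limited-corpus conditions.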
Reordering models play an important role in reducing word-order differences between language pairs in statistical machine translation. Most reordering approaches require a large parallel corpus. Resources for Chinese and minority languages are very scarce and difficult to expand substantially in a short time, so current reordering approaches do not perform well when translating between Chinese and minority languages. After analyzing related studies, this paper proposes a source-side reordering method based on a small parallel corpus. Using linguistic knowledge, we analyzed both the corpus and the translations to identify the verb phrases that clearly affect the word order of the translations. We then studied reordering rules for these verb phrases, including manually written rules and automatically extracted rules. Experiments show that our method improves the performance of state-of-the-art phrase-based translation models.
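The idea of source-side reordering with manual and automatically extracted rules can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Node` tree type, the rule table keyed by (parent label, child labels), the Penn Chinese Treebank-style labels (`VP`, `VV`, `NP`), the alignment-derived "target position" observations, and the `min_count`/`min_prob` thresholds are all hypothetical. The sketch shows one manual rule that moves a verb after its object (Chinese is verb-object, Mongolian is object-verb) and a simple frequency-based way to extract such rules from word-aligned data.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical rule key: (parent label, tuple of child labels).
RuleKey = Tuple[str, Tuple[str, ...]]
# A permutation lists source child indices in their new (target-like) order.
Rules = Dict[RuleKey, Tuple[int, ...]]


@dataclass
class Node:
    """Parse-tree node: preterminals carry a word, internal nodes carry children."""
    label: str
    children: List[Node] = field(default_factory=list)
    word: Optional[str] = None

    def leaves(self) -> List[str]:
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.leaves()]


# Illustrative manual rule: Chinese VP -> VV NP (verb-object) becomes NP VV,
# matching Mongolian object-verb order.
MANUAL_RULES: Rules = {("VP", ("VV", "NP")): (1, 0)}


def reorder(node: Node, rules: Rules) -> Node:
    """Return a new tree with children permuted wherever a rule matches."""
    if node.word is not None:
        return node
    kids = [reorder(child, rules) for child in node.children]
    perm = rules.get((node.label, tuple(child.label for child in kids)))
    if perm is not None:
        kids = [kids[i] for i in perm]
    return Node(node.label, kids)


def extract_rules(
    observations: Sequence[Tuple[str, Sequence[str], Sequence[int]]],
    min_count: int = 2,
    min_prob: float = 0.6,
) -> Rules:
    """Keep the dominant non-monotone child order for each (parent, children) pattern.

    Each observation is (parent label, child labels, target position of each child);
    the positions are assumed to come from word alignments.
    """
    counts: Dict[RuleKey, Counter] = defaultdict(Counter)
    for parent, child_labels, tgt_pos in observations:
        # Order source children by where their aligned words appear in the target.
        perm = tuple(sorted(range(len(tgt_pos)), key=lambda i: tgt_pos[i]))
        counts[(parent, tuple(child_labels))][perm] += 1
    rules: Rules = {}
    for key, perms in counts.items():
        perm, n = perms.most_common(1)[0]
        total = sum(perms.values())
        identity = tuple(range(len(perm)))
        if perm != identity and n >= min_count and n / total >= min_prob:
            rules[key] = perm
    return rules


if __name__ == "__main__":
    # "我 读 书" (I read books): IP(NP(PN 我) VP(VV 读 NP(NN 书)))
    tree = Node("IP", [
        Node("NP", [Node("PN", word="我")]),
        Node("VP", [Node("VV", word="读"), Node("NP", [Node("NN", word="书")])]),
    ])
    print(" ".join(reorder(tree, MANUAL_RULES).leaves()))  # 我 书 读

    obs = [
        ("VP", ["VV", "NP"], [2, 1]),
        ("VP", ["VV", "NP"], [3, 0]),
        ("VP", ["VV", "NP"], [0, 1]),
        ("NP", ["DNP", "NP"], [0, 1]),
    ]
    print(extract_rules(obs))  # {('VP', ('VV', 'NP')): (1, 0)}
```

In a full system the reordered source sentences would typically be used for both training and decoding, leaving any residual reordering to the decoder's own reordering model. On a small corpus, the two thresholds trade rule coverage against noise from sparse or misaligned observations.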
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal
2013, No. 5, pp. 198-204 (7 pages)
Funding
Chinese Academy of Sciences Informatization Special Project (XXH12504-1-10)
National Natural Science Foundation of China (61070099)
Keywords
statistical machine translation
reordering
verb phrase
small parallel corpus