
Topic-aware pivot language approach for statistical machine translation

Abstract: The pivot language approach for statistical machine translation (SMT) is an effective way to overcome the resource bottleneck for certain language pairs. However, conventional implementations make little use of pivot-side context information, which leads to erroneous estimates of translation probabilities. In this study, we propose two topic-aware pivot language approaches that use different levels of pivot-side context. The first method takes advantage of document-level context by assuming that bridged phrase pairs should have similar document-level topic distributions. The second method focuses on the effect of local context. Central to this approach are the assumptions that phrase sense can be reflected by local context in the form of probabilistic topics, and that bridged phrase pairs should be compatible in their latent sense distributions. We then build an interpolated model that combines the two methods to further improve system performance. Experimental results on French-Spanish and French-German translation using English as the pivot language demonstrate the effectiveness of topic-based context in pivot-based SMT.
Source: Journal of Zhejiang University-Science C (Computers and Electronics) (浙江大学学报C辑, English edition), indexed in SCIE and EI, 2014, No. 4, pp. 241-253 (13 pages)
Funding: Project supported by the National High-Tech R&D Program of China (No. 2012BAH14F03), the National Natural Science Foundation of China (Nos. 61005052 and 61303082), the Research Fund for the Doctoral Program of Higher Education of China (No. 20120121120046), the Natural Science Foundation of Fujian Province of China (No. 2011J01360), and the Fundamental Research Funds for the Central Universities, China (No. 2010121068)
Keywords: Natural language processing, Pivot-based statistical machine translation, Topical context information
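
The abstract describes bridging source and target phrases through a pivot language and preferring bridged pairs whose topic distributions agree. The following is a minimal Python sketch of that general idea, not the authors' model: the standard triangulation sum over pivot phrases is the only part taken from the pivot approach itself, while the Hellinger-based similarity, the multiplicative weighting, the renormalization, the function names, and the toy French-English-Spanish data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def hellinger_similarity(p, q):
    """Return 1 minus the Hellinger distance of two equal-length distributions.

    Assumption: the abstract does not name a similarity measure; Hellinger
    is used here only because it is bounded in [0, 1].
    """
    dist = math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))
    return 1.0 - dist


def triangulate(src_to_pivot, pivot_to_tgt, src_topics=None, tgt_topics=None):
    """Bridge source and target phrases through shared pivot phrases.

    src_to_pivot: {src: {pivot: p(pivot | src)}}
    pivot_to_tgt: {pivot: {tgt: p(tgt | pivot)}}
    src_topics:   {(src, pivot): topic distribution}  (hypothetical input)
    tgt_topics:   {(pivot, tgt): topic distribution}  (hypothetical input)
    """
    use_topics = src_topics is not None and tgt_topics is not None
    table = {}
    for src, pivots in src_to_pivot.items():
        scores = defaultdict(float)
        for pivot, p_pivot_given_src in pivots.items():
            for tgt, p_tgt_given_pivot in pivot_to_tgt.get(pivot, {}).items():
                weight = 1.0
                if use_topics:
                    # Down-weight bridges whose two sides disagree on topic.
                    weight = hellinger_similarity(src_topics[(src, pivot)],
                                                  tgt_topics[(pivot, tgt)])
                scores[tgt] += p_tgt_given_pivot * p_pivot_given_src * weight
        total = sum(scores.values())
        if total > 0:
            # Renormalize so the scores for each source phrase sum to 1.
            table[src] = {tgt: s / total for tgt, s in scores.items()}
    return table


# Toy data: English "bank" is ambiguous between finance and riverside senses.
src_to_pivot = {"banque": {"bank": 1.0}}
pivot_to_tgt = {"bank": {"banco": 0.5, "orilla": 0.5}}
# Topic distributions over two made-up topics: (finance, nature).
src_topics = {("banque", "bank"): [0.9, 0.1]}
tgt_topics = {("bank", "banco"): [0.9, 0.1], ("bank", "orilla"): [0.1, 0.9]}

baseline = triangulate(src_to_pivot, pivot_to_tgt)
topic_aware = triangulate(src_to_pivot, pivot_to_tgt, src_topics, tgt_topics)
print(baseline["banque"])
print(topic_aware["banque"])
```

In this toy case the plain triangulation cannot separate the two Spanish candidates, whereas the topic-weighted variant favors "banco" because its pivot-side topic distribution matches that of the French side.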

