Re-ranking for Statistical Machine Translation Using Simplex Algorithm

Original title: 单纯形算法在统计机器翻译Re-ranking中的应用. Cited by: 2

Abstract: In recent years, discriminative re-ranking has been applied to many branches of natural language processing (NLP), such as parsing, part-of-speech tagging, and machine translation, and has improved results under each task's own evaluation metric. Taking statistical machine translation (SMT) as an example, this paper explains in detail the principle and procedure of re-ranking translation candidates with the simplex algorithm, the implementation and use of the algorithm, and the feature selection method used in the re-ranking experiments. It reports results on Chinese-to-English translation with NIST-2002 as the development set and NIST-2005 as the test set: re-ranking raises BLEU by 1.26% absolute on the development set and 1.16% absolute on the test set.
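The re-ranking idea in the abstract can be illustrated with a small sketch. The abstract does not say which simplex variant the paper uses, so this example assumes the downhill simplex (Nelder-Mead) method, which is commonly used to tune feature weights. The two features, the toy n-best lists, the per-sentence quality scores (a stand-in for BLEU, which is normally computed at corpus level), and all function names are hypothetical and not taken from the paper.

```python
from typing import Callable, List, Tuple

# (candidate text, feature values, quality score against the reference)
Candidate = Tuple[str, List[float], float]

def rerank(nbest: List[Candidate], weights: List[float]) -> Candidate:
    """Return the candidate with the highest weighted feature sum."""
    return max(nbest, key=lambda c: sum(w * f for w, f in zip(weights, c[1])))

def loss(weights: List[float], dev: List[List[Candidate]]) -> float:
    """Negative mean quality of the top-ranked candidates (lower is better)."""
    return -sum(rerank(nbest, weights)[2] for nbest in dev) / len(dev)

def nelder_mead(f: Callable[[List[float]], float], start: List[float],
                step: float = 1.0, iters: int = 200) -> List[float]:
    """Minimal downhill simplex search; returns the best vertex found."""
    n = len(start)
    # Initial simplex: the start point plus one point shifted along each axis.
    simplex = [list(start)] + [
        [x + (step if i == j else 0.0) for j, x in enumerate(start)]
        for i in range(n)
    ]
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(t: float) -> List[float]:
            # Point on the line through the worst vertex and the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [
                    [(b + x) / 2 for b, x in zip(best, p)] for p in simplex[1:]
                ]
    return min(simplex, key=f)

# Toy development set: two source sentences, two candidates each.
dev = [
    [("a", [-1.0, 3.0], 0.2), ("b", [-2.0, 5.0], 0.6)],
    [("c", [-1.5, 4.0], 0.5), ("d", [-1.0, 2.0], 0.1)],
]
start = [1.0, 0.0]  # use only the first feature, ignore the second
tuned = nelder_mead(lambda w: loss(w, dev), start)
print("loss before:", loss(start, dev), "loss after:", loss(tuned, dev))
```

Because the objective is piecewise constant in the weights, a derivative-free search such as this one is a natural fit, although it can stall on flat regions; in practice several random restarts are a common precaution.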

Authors: Fu Lei (付雷), Liu Qun (刘群)

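Affiliations: Graduate School of the Chinese Academy of Sciences, Beijing; Multilingual Interaction Technology and Evaluation Laboratory, Institute of Computing Technology, Chinese Academy of Sciences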

Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 3, pp. 28-33 (6 pages)

Funding: National Natural Science Foundation of China (grant 60573188)

Keywords: artificial intelligence; machine translation; discriminative re-ranking; simplex algorithm; statistical machine translation (SMT)

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
