Ranking Data Quality of Web Article Content by Extracting Facts

Times cited: 5
Abstract: Data quality assessment of Web article content determines how useful the acquired data is. Existing approaches rely heavily on lexical features or user interactions to obtain quality indicators. As a result, they lack generality and cannot capture the factual meaning of the content. This paper therefore proposes a fact-based quality assessment (FQA) approach. Given a target article, FQA first builds the article's context on the Web by collecting relevant articles, and it extracts facts from every article. It then constructs the accuracy baseline by voting and the completeness baseline by iterating over fact graphs. Finally, it quantifies the accuracy and completeness dimensions by comparing the facts of the target article with these dimension baselines. FQA quantifies data quality dimensions from the facts of the content rather than from particular features, and this lets it achieve high assessment precision. Experimental results verify the superior performance of FQA.
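The abstract names the FQA steps but not their formulas. The Python code below is a minimal sketch of one possible reading, not the authors' implementation. It rests on several assumptions:

- A fact is a (subject, predicate, object) triple, and facts from different articles match only on exact string equality.
- The accuracy baseline is a majority vote per (subject, predicate) across the context articles.
- The abstract does not say how its graph iteration works. A PageRank-style power iteration over a fact co-occurrence graph is swapped in here as a stand-in for building the completeness baseline.

All function names, the damping factor, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Assumption: a "fact" is a (subject, predicate, object) triple, and facts from
# different articles are matched by exact string equality.
Fact = tuple[str, str, str]


def accuracy_baseline(context: list[set[Fact]]) -> dict[tuple[str, str], str]:
    """Voting: for each (subject, predicate), keep the object stated by most articles."""
    votes: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for article in context:
        for s, p, o in article:
            votes[(s, p)][o] += 1
    # Highest vote count wins; ties go to the lexicographically smallest object.
    return {
        key: min(counter.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for key, counter in votes.items()
    }


def accuracy(target: set[Fact], baseline: dict[tuple[str, str], str]) -> float:
    """Share of the target's checkable facts whose object agrees with the vote."""
    checkable = [(s, p, o) for s, p, o in target if (s, p) in baseline]
    if not checkable:
        return 0.0
    correct = sum(1 for s, p, o in checkable if baseline[(s, p)] == o)
    return correct / len(checkable)


def completeness_baseline(
    context: list[set[Fact]], damping: float = 0.85, iterations: int = 50
) -> dict[Fact, float]:
    """Stand-in for the paper's graph iteration: PageRank over fact co-occurrence."""
    nodes: set[Fact] = set()
    neighbors: dict[Fact, set[Fact]] = defaultdict(set)
    for article in context:
        nodes |= article
        # Link every pair of facts that appear in the same article.
        for a, b in combinations(article, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(nodes)
    if n == 0:
        return {}
    score = {f: 1.0 / n for f in nodes}
    for _ in range(iterations):
        new = {f: (1 - damping) / n for f in nodes}
        for f in nodes:
            out = neighbors[f]
            if out:
                share = damping * score[f] / len(out)
                for g in out:
                    new[g] += share
            else:
                # Isolated fact: spread its mass uniformly so scores still sum to 1.
                for g in nodes:
                    new[g] += damping * score[f] / n
        score = new
    return score


def completeness(target: set[Fact], weights: dict[Fact, float]) -> float:
    """Weighted share of the baseline facts that the target article also states."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w for f, w in weights.items() if f in target) / total


# Hypothetical context articles about a made-up company "acme".
context = [
    {("acme", "founded", "1990"), ("acme", "hq", "Berlin")},
    {("acme", "founded", "1990"), ("acme", "ceo", "Lee")},
    {("acme", "founded", "1991"), ("acme", "hq", "Berlin")},
]
target = {("acme", "founded", "1991"), ("acme", "hq", "Berlin")}

print(accuracy(target, accuracy_baseline(context)))  # 0.5
print(round(completeness(target, completeness_baseline(context)), 3))  # 0.5
```

In this toy run, the target states the minority founding year, so only one of its two checkable facts agrees with the vote. It also covers the two facts that make up one symmetric half of the four-node co-occurrence graph, so its weighted completeness is 0.5.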
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list; 2014, No. 11, pp. 247-251, 255 (6 pages in total).
Funding: National Natural Science Foundation of China (61003040, 61100135); Fundamental Research Funds for the Central Universities (LGZD201324).
Keywords: Data quality; Web article; Accuracy; Completeness; Quality dimensions; Fact