
Pre-trained Language Models for Product Attribute Extraction (cited by: 5)
Abstract: Attribute extraction is a key step in constructing a knowledge graph; its goal is to extract attribute values associated with entities from unstructured text. This paper casts attribute extraction as a sequence labeling problem. To alleviate the lack of labeled data for product attribute extraction, we use distant supervision to automatically label texts from multiple e-commerce sources. To evaluate system performance accurately, we construct a manually annotated test set, yielding a new multi-domain annotated dataset for e-commerce product attribute extraction. On this dataset, we run in-domain and cross-domain attribute extraction experiments with several pre-trained language models and analyze the results. The results show that pre-trained language models improve extraction performance: ELECTRA performs best in the in-domain experiments, while BERT performs best in the cross-domain experiments. We also find that adding a small amount of annotated target-domain data effectively improves cross-domain attribute extraction and enhances the model's domain adaptability.
Authors: ZHANG Shiqi; MA Jin; ZHOU Xiabing; JIA Hao; CHEN Wenliang; ZHANG Min (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China)
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 1, pp. 56-64 (9 pages)
Funding: National Natural Science Foundation of China (61876115).
Keywords: attribute extraction; distant supervision; pre-trained language model; domain adaptation
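
The abstract describes casting attribute extraction as sequence labeling and using distant supervision to label text automatically. The following is a minimal sketch of that general idea, not the authors' procedure: it assumes a hypothetical dictionary of known attribute values, character-level BIO tags, and a longest-match-first rule, none of which are specified in this record. The function name `distant_label`, the label names, and the sample text are all illustrative.

```python
from typing import Dict, List, Tuple


def distant_label(
    text: str, attribute_values: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Assign a BIO tag to each character of `text` by dictionary matching.

    `attribute_values` maps an attribute name to known surface values.
    This dictionary is a hypothetical stand-in for whatever knowledge
    source drives the distant supervision.
    """
    tags = ["O"] * len(text)

    # Try longer values first so that a long match is not broken up
    # by a shorter value contained inside it.
    pairs = sorted(
        (
            (value, attr)
            for attr, values in attribute_values.items()
            for value in values
            if value
        ),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )

    for value, attr in pairs:
        start = text.find(value)
        while start != -1:
            end = start + len(value)
            # Label the span only if no earlier match has claimed it.
            if all(tag == "O" for tag in tags[start:end]):
                tags[start] = f"B-{attr}"
                for i in range(start + 1, end):
                    tags[i] = f"I-{attr}"
            start = text.find(value, start + 1)

    return list(zip(text, tags))


# Hypothetical product title and value dictionary, for illustration only.
example = distant_label(
    "红色纯棉连衣裙",
    {"COLOR": ["红色"], "MATERIAL": ["纯棉", "棉"]},
)
for char, tag in example:
    print(char, tag)
```

Labels produced this way are noisy, since a dictionary match does not guarantee that the matched string is used as an attribute value in context. This is consistent with the abstract's choice to evaluate on a manually annotated test set rather than on automatically labeled data.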
