
Named Entity Recognition of Local Chronicles Literature in Traditional Chinese Opera Based on Multi-dimensional Feature Analysis
Abstract: Local chronicles are a form of regional documentation unique to China and of great historical value. Digitizing them and mining the knowledge they contain matters for preserving and disseminating traditional Chinese culture and for building a culturally strong nation. Named entity recognition (NER) is a foundational technology and key step that strongly affects knowledge organization and discovery in local chronicles. Although NER for local chronicles has made some progress, a systematic technical solution adapted to the textual features of local chronicles and the characteristics of domain resources is still lacking. This study therefore proposes an NER model for opera-related local chronicles that integrates multi-dimensional features with Bi-LSTM-CRF. First, syntactic features are combined with textual features (symbols, tail words, word formation, context, and negative examples) to characterize opera-related entities in local chronicles. Second, the Bi-LSTM-CRF model, which performs well on long text structures, uses these parsed entity features to improve the efficiency of entity recognition. Finally, an empirical study is conducted on the Chu Opera Chronicles (《楚剧志》). The results show that the proposed model outperforms the baseline model in NER performance, reaching an F1 score of 0.869.
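For readers unfamiliar with how an F1 score such as the one reported above is usually obtained in NER, the following is a minimal sketch of entity-level F1 over BIO-tagged sequences. The abstract does not state the tagging scheme or matching criterion, so the BIO scheme, exact-span matching, and the label names `TROUPE` and `PLAY` are assumptions made purely for illustration, not the authors' evaluation code.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 where a prediction counts only on exact span and type match."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: one troupe entity found, one play entity missed.
gold = [["B-TROUPE", "I-TROUPE", "O", "B-PLAY"]]
pred = [["B-TROUPE", "I-TROUPE", "O", "O"]]
score = entity_f1(gold, pred)
```

In this toy case precision is 1.0 and recall is 0.5, so the harmonic mean penalizes the missed entity more than a simple average would.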
Authors: Zhai Shanshan; Yu Huajuan; Chen Jianyao; Xia Lixin (School of Information Management, Central China Normal University, Wuhan 430079; Intelligent Computing Laboratory for Cultural Heritage, Wuhan University, Wuhan 430072; University of Wisconsin-Milwaukee, Milwaukee 53202)
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; indexed in CSSCI, CSCD, and the Peking University Core Journals list), 2024, No. 9, pp. 1094-1104 (11 pages)
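Funding: General Program of the National Social Science Fund of China, "Research on the Automatic Construction and Long-term Evolution of Intangible Cultural Heritage Knowledge Graphs from a Digital Humanities Perspective" (20BTQ071); Open Fund of the Ministry of Education Philosophy and Social Sciences Laboratory, Intelligent Computing Laboratory for Cultural Heritage, Wuhan University, "Semantic Annotation and Association Discovery of Local Chronicle Digital Resources for the Intangible Cultural Heritage Domain" (2023ICLCH007).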
Keywords: local chronicles literature; local chronicles on traditional Chinese opera; named entity recognition; Bi-LSTM-CRF; multi-dimensional feature analysis