
Chinese address segment method based on BERT (基于BERT的中文地址分词方法)

Cited by: 3
Abstract: To address the poor accuracy and low recognition rate of traditional Chinese address segmentation, a Chinese address segmentation method based on BERT is proposed. The labels for non-administrative address elements are redesigned, and a BERT-BiLSTM-CRF model is built so that Chinese address segmentation is recast as a named entity recognition task. BERT is trained on a large volume of nationwide address data to obtain abstract features of the text. A bidirectional long short-term memory network then models the text as a sequence and uses the surrounding context to extract further features. Finally, a conditional random field selects the optimal label sequence, from which the correct address levels are extracted. On the training dataset used, the method reaches a precision of 98.21% and an F1 score of 98.23, which the authors present as evidence of its effectiveness.
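The recasting of segmentation as named entity recognition can be illustrated with a minimal sketch. It assumes a character-level BIO tagging scheme and made-up label names (`PROV`, `CITY`, `DISTRICT`); the abstract does not state which scheme or label set the authors use, and the BERT-BiLSTM-CRF model itself is not reproduced here. The sketch only shows how a segmented address maps to per-character tags and back.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # (text, label); the labels are hypothetical


def segments_to_bio(segments: List[Segment]) -> Tuple[List[str], List[str]]:
    """Turn labeled address segments into per-character BIO tags."""
    chars: List[str] = []
    tags: List[str] = []
    for text, label in segments:
        for i, ch in enumerate(text):
            chars.append(ch)
            # First character of a segment opens the entity, the rest continue it.
            tags.append(("B-" if i == 0 else "I-") + label)
    return chars, tags


def bio_to_segments(chars: List[str], tags: List[str]) -> List[Segment]:
    """Rebuild segments from a per-character BIO tag sequence."""
    pieces: List[List[str]] = []
    for ch, tag in zip(chars, tags):
        prefix, _, label = tag.partition("-")
        # Start a new segment on "B-" or whenever the label changes.
        if prefix == "B" or not pieces or pieces[-1][1] != label:
            pieces.append([ch, label])
        else:
            pieces[-1][0] += ch
    return [(text, label) for text, label in pieces]


# Assumed example address, not taken from the paper's dataset.
example = [("江苏省", "PROV"), ("南京市", "CITY"), ("玄武区", "DISTRICT")]
example_chars, example_tags = segments_to_bio(example)
```

In this framing, a sequence tagger predicts one tag per character, and the segment boundaries fall out of the decoded tag sequence via `bio_to_segments`.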
Authors: SUN Shiqi (孙士琦); TANG Kun (汤鲲) (Wuhan Research Institute of Posts and Telecommunications, Wuhan 430074, China; Fiber Home Starry Sky Co., Ltd., Nanjing 210000, China)
Source: Electronic Design Engineering (《电子设计工程》), 2021, No. 9, pp. 155-159 (5 pages)
Keywords: BERT; Chinese address segmentation; long short-term memory network; conditional random fields; named entity recognition