Abstract
In the context of smart government, automatically classifying large volumes of science and technology policy texts with deep learning can reduce the cost of manual processing and improve the efficiency of policy matching. This paper uses the BERT deep learning model to classify science and technology policies automatically. Keywords are extracted from the policy text with the TextRank and TF-IDF algorithms, then fused with the policy title and fed into the BERT model to optimize the experiment. The classification performance of different deep learning models is also compared to verify the effectiveness of the method. The results show that, with the BERT model, combining the title with TF-IDF policy keywords gives the best classification performance, with an accuracy of up to 94.41%. This indicates that adding policy keywords to the title improves the accuracy of automatic policy text classification with BERT and enables effective classification of science and technology policy texts.
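The keyword-and-title fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `tfidf_keywords` and `fuse_title_and_keywords`, the whitespace tokenization, the unsmoothed TF-IDF formula, and the ` [SEP] ` joining format are all assumptions made for illustration. Chinese policy text would first need a word segmenter, and the fused string would then be passed to a BERT tokenizer and classifier, which are omitted here.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        # Plain TF-IDF (assumed variant): relative term frequency times log(N / df).
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by score descending, then alphabetically so ties are deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


def fuse_title_and_keywords(title, keywords, sep=" [SEP] "):
    """Concatenate a title with its keywords into one classifier input string."""
    # The separator format is hypothetical; the source does not specify it.
    return title + sep + " ".join(keywords)


# Toy corpus of pre-tokenized policy bodies (hypothetical data).
docs = [
    "subsidy for high tech enterprise research subsidy".split(),
    "talent program for returning research scholars".split(),
    "tax relief for small enterprise".split(),
]
keywords = tfidf_keywords(docs, top_k=2)
text = fuse_title_and_keywords("Notice on research subsidies", keywords[0])
print(text)
```

Terms that appear in every document (here, "for") receive a score of zero, so the keywords appended to the title tend to be the ones that distinguish a policy from the rest of the corpus.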
Authors
SHEN ZiQiang (沈自强)
LI Ye (李晔)
DING QingYan (丁青艳)
WANG JinYing (王金颖)
BAI QuanMin (白全民)
Affiliations: School of Economics and Management, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, P.R. China; Institute of Science and Technology for Development of Shandong, Jinan 250014, P.R. China; Shandong Computer Science Center (National Super Computer Center in Jinan), Jinan 250014, P.R. China
Source
Digital Library Forum (《数字图书馆论坛》)
CSSCI
2022, Issue 1, pp. 10-16 (7 pages)
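Funding
Supported by the Shandong Province Higher Education Youth Innovation Science and Technology Support Program, project "Industrial Transformation in the Intelligent Era: Technology, Institutions, and Entrepreneurial Orientation" (No. 2020RWG009).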
Keywords
Science and Technology Policy
Text Classification
BERT Model
Keyword Extraction