Abstract
NLP (Natural Language Processing) is a major research direction in artificial intelligence, and text classification is an important branch of NLP. Natural language processing enables computers, mobile phones, and other electronic devices to recognize and understand human language. Because natural language is inherently complex, many technical difficulties remain unsolved. The main ones are the continual emergence of new words, the polysemy of Chinese words, and the flexibility of natural language. Using journal articles as experimental data, this paper studies Chinese text classification. Building on the traditional convolutional neural network model, it proposes CNNSVM (Convolutional Neural Network and Support Vector Machine Classifier), a text classification model that combines a convolutional neural network with a support vector machine. Compared with the traditional method, CNNSVM adds an attention mechanism, simplifies the model parameters, and replaces the softmax layer of the traditional model with a classifier based on a support vector machine. Experimental results show that the model improves the extraction of feature words and effectively addresses the weak generalization ability of the softmax layer.
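The abstract describes the model only at a high level. A CNN extracts features, an attention mechanism weights them, and an SVM-based classifier takes the place of the softmax output layer. The NumPy sketch below illustrates that general CNN-plus-SVM pattern. It rests on several assumptions that do not come from the paper: the embedding and filter weights are random and frozen, attention is dot-product pooling over convolution windows, and the classifier is a linear one-vs-rest SVM trained with the Pegasos subgradient method. All class and function names (`CnnFeatureExtractor`, `conv1d_relu`, `attention_pool`, `LinearSVM`) are hypothetical.

```python
import numpy as np


def conv1d_relu(x, filters, bias):
    """Valid 1-D convolution over a (T, d) sequence of word vectors.

    filters has shape (k, w, d); returns a (T - w + 1, k) ReLU feature map.
    """
    _, w, _ = filters.shape
    windows = np.stack([x[t:t + w] for t in range(x.shape[0] - w + 1)])
    return np.maximum(np.einsum("twd,kwd->tk", windows, filters) + bias, 0.0)


def attention_pool(h, u):
    """Dot-product attention pooling (assumed form): softmax(h @ u) weights each window."""
    scores = h @ u
    weights = np.exp(scores - scores.max())  # shift by max for numerical stability
    weights /= weights.sum()
    return weights @ h, weights


class CnnFeatureExtractor:
    """Hypothetical CNN + attention encoder; weights are random and frozen for illustration."""

    def __init__(self, vocab_size, dim=16, n_filters=32, width=3, seed=0):
        g = np.random.default_rng(seed)
        self.emb = g.normal(0.0, 1.0, (vocab_size, dim))
        self.filters = g.normal(0.0, 0.3, (n_filters, width, dim))
        self.bias = np.zeros(n_filters)
        self.query = g.normal(0.0, 1.0, n_filters)  # attention query vector
        self.width = width

    def __call__(self, token_ids):
        x = self.emb[np.asarray(token_ids, dtype=int)]
        if x.shape[0] < self.width:  # zero-pad documents shorter than the filter width
            x = np.vstack([x, np.zeros((self.width - x.shape[0], x.shape[1]))])
        pooled, _ = attention_pool(conv1d_relu(x, self.filters, self.bias), self.query)
        return pooled


class LinearSVM:
    """One-vs-rest linear SVM trained with Pegasos (stochastic subgradient on the hinge loss).

    Stands in for the softmax output layer; the bias is folded in as a constant feature.
    """

    def __init__(self, lam=1e-2, epochs=50, seed=0):
        self.lam, self.epochs, self.seed = lam, epochs, seed

    @staticmethod
    def _with_bias(X):
        X = np.asarray(X, dtype=float)
        return np.hstack([X, np.ones((X.shape[0], 1))])

    def fit(self, X, y):
        Xb, y = self._with_bias(X), np.asarray(y)
        rng = np.random.default_rng(self.seed)
        self.classes_ = np.unique(y)
        self.W = np.zeros((len(self.classes_), Xb.shape[1]))
        for c, label in enumerate(self.classes_):
            target = np.where(y == label, 1.0, -1.0)  # +1 for this class, -1 for the rest
            w, step = np.zeros(Xb.shape[1]), 0
            for _ in range(self.epochs):
                for i in rng.permutation(len(Xb)):
                    step += 1
                    eta = 1.0 / (self.lam * step)
                    violated = target[i] * (w @ Xb[i]) < 1.0  # margin check uses current w
                    w *= 1.0 - eta * self.lam  # shrink from the L2 regularizer
                    if violated:
                        w += eta * target[i] * Xb[i]
            self.W[c] = w
        return self

    def decision_function(self, X):
        return self._with_bias(X) @ self.W.T

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]


if __name__ == "__main__":
    # Toy corpus: class 0 uses token ids 0-4, class 1 uses token ids 5-9.
    docs = [[0, 1, 2, 3], [1, 2, 4], [0, 3, 4, 2], [5, 6, 7], [6, 8, 9, 5], [7, 9, 8]]
    labels = [0, 0, 0, 1, 1, 1]
    encoder = CnnFeatureExtractor(vocab_size=10)
    X = np.stack([encoder(d) for d in docs])
    svm = LinearSVM().fit(X, labels)
    print(svm.predict(X))
```

In practice the CNN and attention weights would be learned rather than random. One common approach is to train them with a temporary softmax head and then fit the SVM on the learned features. The abstract does not state which kernel, attention form, or training schedule CNNSVM actually uses.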
Authors
HE Kai (何铠); GUAN You-qing (管有庆); GONG Rui (龚锐)
School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Source
Computer Technology and Development (计算机技术与发展), 2022, No. 7, pp. 22-27 (6 pages)
Funding
Natural Science Research Program of Jiangsu Higher Education Institutions (05KJD520146).
Keywords
natural language processing
word frequency algorithm
Chinese text classification
weight preprocessing
word density weight