Abstract
This paper discretizes and extracts features from the header information and body content of an email, so that each email is represented as a group of vectors. It then uses an information-entropy-based decision tree classification technique to build a spam classification and recognition model, and finally examines and tests the model experimentally. The experiments show that, after comparative learning on a certain number of spam and normal emails, the model is able to identify spam with good results.
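The abstract does not give the feature set or the tree-construction details, so the following is only a minimal sketch of the general idea, assuming an ID3-style criterion: each email is a record of discretized features, and the feature with the largest information gain (entropy reduction) is chosen as the split. The feature names `sender_forged` and `has_link`, the toy data, and the helper names are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one discrete feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels):
    """Feature with the highest information gain (candidate root split)."""
    return max(rows[0].keys(),
               key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy data: each email reduced to discretized features.
emails = [
    {"sender_forged": "yes", "has_link": "yes"},
    {"sender_forged": "yes", "has_link": "no"},
    {"sender_forged": "no", "has_link": "yes"},
    {"sender_forged": "no", "has_link": "no"},
]
labels = ["spam", "spam", "normal", "normal"]

print(best_feature(emails, labels))
```

In this toy set `sender_forged` separates the two classes perfectly while `has_link` carries no information, so the former would be chosen as the root split; a full tree would repeat the selection recursively on each subset.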
Source
《计算机科学》 (Computer Science)
CSCD
Peking University Core Journal (北大核心)
2008, No. 2, pp. 87-89 (3 pages)
Keywords
Decision tree
Information gain
Information entropy
Data mining
Spam