Abstract
Most existing mining algorithms discover contrast patterns between only two classes of data, and finding contrast patterns across multiple classes remains a challenge. Association rule mining generates a large number of rules, many of which are redundant. Non-redundant rule mining removes the redundant rules, but some of the remaining rules are still of too little interest for data in a specific application domain. This paper therefore presents an efficient mining algorithm for multiple class data. The algorithm defines pathogenic patterns and protective patterns on a statistical basis and discovers both kinds of pattern in multiple class medical data. Simulation experiments yield an intuitive causal graph for the multiple class medical data, and a classifier built from the rules generated by the algorithm confirms its efficiency and practicality. The generated rules provide accurate and useful information and can be applied in practice in fields such as medical research.
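The abstract does not spell out the statistical definition of the two patterns, but the keywords list the odds ratio. The following is a minimal sketch, assuming a one-vs-rest odds ratio with a 95% confidence interval is used to label a pattern for one target class. The function names `odds_ratio` and `label_pattern`, the confidence-interval thresholds, and the zero-cell correction are illustrative assumptions, not the authors' algorithm.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio of a pattern for one target class versus all other classes.

    a: target-class records that contain the pattern
    b: target-class records that do not
    c: other-class records that contain the pattern
    d: other-class records that do not
    Returns (odds ratio, lower bound, upper bound) of an approximate 95% CI.
    """
    # Assumption: apply the Haldane-Anscombe correction when a cell is zero.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio) from the 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


def label_pattern(a, b, c, d):
    """Hypothetical labeling rule: the whole CI must lie on one side of 1."""
    _, lower, upper = odds_ratio(a, b, c, d)
    if lower > 1:
        return "pathogenic"
    if upper < 1:
        return "protective"
    return "inconclusive"


if __name__ == "__main__":
    # Pattern seen in 30 of 100 target-class records and 10 of 100 others.
    print(odds_ratio(30, 70, 10, 90))
    print(label_pattern(30, 70, 10, 90))
```

For multiple class data, the same 2x2 table would be built once per class, treating that class as the target and pooling the remaining classes, so a single pattern may be pathogenic for one class and protective for another.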
Authors
ZHANG Xin-ying (张新英)
FU Chuan-nan (付川南)
College of Information and Business, Zhongyuan University of Technology, Zhengzhou 451191, China
Source
Journal of China Academy of Electronics and Information Technology (《中国电子科学研究院学报》)
Peking University Core Journal (北大核心)
2017, Issue 4, pp. 359-364 (6 pages)
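Funding
Key Science and Technology Research Project of Henan Province (152102210155)
Key Scientific Research Project of Higher Education Institutions of Henan Province (17A413014)
College-level Research Project of the College of Information and Business, Zhongyuan University of Technology (ky1615)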
Keywords
data mining
multiple class data
optimized rules
interestingness
odds ratio