Abstract
Most current improved Apriori algorithms run into a performance bottleneck when processing large datasets. To address this, this paper takes the probability distribution of items across transactions as its starting point. Building on the reverse coding (RC) technique of BF-Apriori, it designs a pattern matching algorithm based on reverse coding (PMRC) and a candidate frequent itemset generation (CFIG) algorithm to improve the time efficiency of the rule mining process. Together, the three sub-algorithms (RC, PMRC, and CFIG) make up the proposed improved Apriori algorithm, BF_Advanced-Apriori. Theoretical analysis and experimental results show that BF_Advanced-Apriori has an advantage on large datasets and outperforms B-Apriori and BF-Apriori, both of which are based on bit vectors.
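The abstract does not spell out how RC, PMRC, or CFIG work, so the sketch below illustrates only the generic bit-vector idea that B-Apriori and BF-Apriori are said to share: each item is stored as a bit vector over transaction IDs, the support of an itemset is the population count of the bitwise AND of its items' vectors, and candidates are produced by the classic Apriori join-and-prune step. It is a minimal sketch under that assumption, not the paper's BF_Advanced-Apriori; every function name is hypothetical.

```python
from functools import reduce
from itertools import combinations
import operator

# Illustrative sketch only: a generic bit-vector Apriori, NOT the paper's
# RC/PMRC/CFIG sub-algorithms. All function names here are hypothetical.


def item_bitvectors(transactions):
    """Map each item to an int whose bit i is set iff transaction i contains it."""
    vectors = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(itemset, vectors):
    """Support count of a non-empty itemset: popcount of the AND of its bit vectors."""
    common = reduce(operator.and_, (vectors[item] for item in itemset))
    return bin(common).count("1")


def generate_candidates(frequent_k):
    """Classic Apriori join + prune over a set of sorted k-tuples."""
    candidates = set()
    for a, b in combinations(sorted(frequent_k), 2):
        # Join step: a < b lexicographically, so equal (k-1)-prefixes imply a[-1] < b[-1].
        if a[:-1] == b[:-1]:
            cand = a + (b[-1],)
            # Prune step: every k-subset of a candidate must itself be frequent.
            if all(sub in frequent_k for sub in combinations(cand, len(cand) - 1)):
                candidates.add(cand)
    return candidates


def bit_vector_apriori(transactions, min_support):
    """Return {sorted itemset tuple: support count} for all frequent itemsets."""
    vectors = item_bitvectors(transactions)
    level = {}
    for item in sorted(vectors):
        count = support((item,), vectors)
        if count >= min_support:
            level[(item,)] = count
    frequent = {}
    while level:
        frequent.update(level)
        next_level = {}
        for cand in generate_candidates(set(level)):
            count = support(cand, vectors)
            if count >= min_support:
                next_level[cand] = count
        level = next_level
    return frequent


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(bit_vector_apriori(data, 2).items()):
        print(itemset, count)
```

Items must be mutually comparable (for example, all strings) because itemsets are kept as sorted tuples. Python's arbitrary-precision integers stand in for bit vectors here to keep the sketch short; an implementation tuned for large datasets would more likely use fixed-width machine words.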
Source
Journal of Hangzhou Dianzi University: Natural Sciences
2011, No. 5, pp. 83-86 (4 pages)
Funding
Supported by the Zhejiang Provincial Science and Technology Plan Project (C31066, C21093)
Keywords
bit vector
item support distribution
reverse coding
partial matching