Improved density biased sampling algorithm (cited by: 6)
Abstract: Random sampling is widely used in data mining algorithms. It is very effective on uniformly distributed data sets, but on skewed data sets it tends to lose small clusters, which lowers clustering accuracy. To address this, a grid-based density biased sampling algorithm (G_DBS) is proposed that obtains an approximate density biased sample in a single scan of the data set. Experimental evaluation shows that G_DBS not only improves clustering accuracy but is also robust to noise and efficient, making it an effective approach to mining massive data sets.
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 7, pp. 1695-1698 (4 pages)
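Funding: Natural Science Foundation of Chongqing (2005BB2063); Key Project of the Natural Science Foundation of Chongqing (2005BA2003); Science and Technology Research Project of the Chongqing Municipal Education Commission (050509)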
Keywords: data mining; biased sampling; clustering; data reduction; mass data
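
The grid-based density biased sampling idea described in the abstract can be illustrated with a minimal sketch. This record does not give the details of G_DBS, so the code below is not the authors' algorithm: it follows the general density biased sampling scheme of Palmer and Faloutsos, uses two passes over in-memory data for clarity rather than the single scan the paper reports, and the names `grid_cell`, `density_biased_sample`, `cell_size`, `target_size`, and the exponent `e` are all assumptions made for illustration.

```python
import random
from collections import Counter


def grid_cell(point, cell_size):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(int(x // cell_size) for x in point)


def density_biased_sample(points, cell_size, target_size, e=0.5, seed=0):
    """Return (point, weight) pairs; points in dense cells are kept less often.

    Hypothetical sketch: e=0 reduces to uniform sampling, e=1 gives every
    occupied cell the same expected number of sampled points.
    """
    rng = random.Random(seed)
    # Pass 1: count how many points fall in each grid cell.
    counts = Counter(grid_cell(p, cell_size) for p in points)
    # Normalizer chosen so the expected sample size is about target_size
    # (exact when no probability is clipped to 1).
    alpha = target_size / sum(n ** (1 - e) for n in counts.values())
    sample = []
    # Pass 2: keep each point with probability alpha / n**e, where n is the
    # population of its cell, and record the inverse probability as a weight.
    for p in points:
        n = counts[grid_cell(p, cell_size)]
        prob = min(1.0, alpha / n ** e)
        if rng.random() < prob:
            sample.append((p, 1.0 / prob))
    return sample
```

With `e` close to 1, a small cluster occupying its own cell is sampled at a much higher rate than a dense cluster, which is the behavior that keeps small clusters from disappearing on skewed data. The returned weights let a downstream clustering step compensate for the bias.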

References (8)

  • 1 PALMER C, FALOUTSOS C. Density biased sampling: an improved method for data mining and clustering[C]// Proceedings of 2000 ACM SIGMOD International Conference on Management of Data. Dallas, USA: ACM Press, 2000: 82-92.
  • 2 KERDPRASOP K, KERDPRASOP N, SATTAYATHAM P. Density biased clustering based on reservoir sampling[C]// Proceedings of the Sixteenth International Workshop on Database and Expert Systems Applications. Copenhagen, Denmark: [s.n.], 2005: 1122-1126.
  • 3 TOIVONEN H. Sampling large databases for association rules[C]// Proceedings of the 22nd International Conference on Very Large Databases (VLDB'96). Bombay, India: Morgan Kaufmann, 1996: 134-145.
  • 4 CHEN B, HAAS P, SCHEUERMANN P. A new two-phase sampling based algorithm for discovering association rules[C]// Proceedings of 2002 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Alberta, Canada: ACM Press, 2002: 462-468.
  • 5 HAN J, KAMBER M. Data Mining: Concepts and Techniques[M]. FAN Ming, MENG Xiaofeng, trans. Beijing: China Machine Press, 2001.
  • 6 KOLLIOS G, GUNOPULOS D, KOUDAS N, et al. Efficient biased sampling for approximate clustering and outlier detection in large datasets[J]. IEEE Transactions on Knowledge and Data Engineering, 2003, 15(5): 1170-1187.
  • 7 GUHA S, RASTOGI R, SHIM K. CURE: An efficient clustering algorithm for large databases[C]// Proceedings of 1998 ACM SIGMOD International Conference on Management of Data. Seattle, USA: ACM Press, 1998: 73-84.
  • 8 [EB/OL]. [2006-12-31]. http://kdd.ics.uci.edu/databases/covertype/covertype.html.

Documents sharing references with this article (44)

Co-cited documents (53)

  • 1 VAPNIK V N. The Nature of Statistical Learning Theory[M]. New York: Springer-Verlag, 1995.
  • 2 BOTTOU B L, WESTON J. Breaking SVM complexity with cross training[J]. Advances in Neural Information Processing Systems, 2005, 4(17): 81-88.
  • 3 ZHU Meihong. Application of sampling techniques in data mining[J]. Statistics and Decision (统计与决策), 2007, 23(16): 147-150. Cited by: 4
  • 4 GU B H, HU F F, LIU H. Sampling and its application in data mining: a survey[R]. Singapore: National University of Singapore, 2000.
  • 5 PALMER C R, FALOUTSOS C. Density biased sampling: an improved method for data mining and clustering[C]// Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 2000: 82-92.
  • 6 NANOPOULOS A, THEODORIDIS Y, MANOLOPOULOS Y. Indexed-based density biased sampling for clustering applications[J]. Data & Knowledge Engineering, 2006, 57(1): 37-63.
  • 7 APPEL A P, PATERLINI A A, de SOUSA E P M, et al. A density-biased sampling technique to improve cluster representativeness[C]// Proceedings of PKDD 2007. Berlin: Springer, 2007: 366-373.
  • 8 HUANG J B, SUN H L, KANG J M, et al. ESC: an efficient synchronization-based clustering algorithm[J]. Knowledge-Based Systems, 2013, 40: 111-122.
  • 9 ZHAO Y C, CAO J, ZHANG C Q, et al. Enhancing grid-density based clustering for high dimensional data[J]. Journal of Systems and Software, 2011, 84(9): 1524-1539.
  • 10 PILEVAR A H, SUKUMAR M. GCHL: a grid-clustering algorithm for high-dimensional very large spatial data bases[J]. Pattern Recognition Letters, 2005, 26(7): 999-1010.

Citing documents (6)

Second-level citing documents (18)
