A Weighted-Distance-Based Local Outlier Detection Algorithm    Cited by: 4

English title (as published): A Weighted-distance Based Outliers Detection Algorithm
Abstract: Different attributes contribute differently to the distance between data points, so an attribute-weighting strategy for distance measurement is proposed. Categorical (nominal) attributes are weighted by the information entropy of their values, numerical attributes are weighted by the standard deviation of their values, and mixed data are weighted by combining the categorical and numerical weights. The weighting strategy amplifies the difference between outliers and normal data. Simulation experiments, run separately for each attribute type, validate the strategy, and the results demonstrate its effectiveness.
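The abstract does not give the exact weighting formulas, so the following is only a minimal sketch of one plausible reading, under loudly stated assumptions: each categorical attribute's weight is proportional to the Shannon entropy of its value distribution, each numerical attribute's weight is proportional to its standard deviation, weights within each type are normalized to sum to 1, and a mixed-data distance averages the numerical and categorical parts in proportion to how many attributes of each type there are. The function names `entropy_weights`, `std_weights` and `weighted_mixed_distance` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy_weights(cat_columns: List[Sequence[str]]) -> List[float]:
    """Weight categorical attributes by the Shannon entropy of their values.

    ASSUMPTION: weight is proportional to entropy and normalized to sum to 1.
    """
    if not cat_columns:
        return []
    entropies = []
    for col in cat_columns:
        n = len(col)
        counts = Counter(col)
        entropies.append(-sum((c / n) * math.log2(c / n) for c in counts.values()))
    total = sum(entropies)
    if total == 0:  # every attribute is constant: fall back to equal weights
        return [1.0 / len(entropies)] * len(entropies)
    return [h / total for h in entropies]


def std_weights(num_columns: List[Sequence[float]]) -> List[float]:
    """Weight numerical attributes by their population standard deviation.

    ASSUMPTION: weight is proportional to std and normalized to sum to 1.
    """
    if not num_columns:
        return []
    stds = []
    for col in num_columns:
        mean = sum(col) / len(col)
        stds.append(math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)))
    total = sum(stds)
    if total == 0:  # every attribute is constant: fall back to equal weights
        return [1.0 / len(stds)] * len(stds)
    return [s / total for s in stds]


def weighted_mixed_distance(x_num, y_num, x_cat, y_cat, w_num, w_cat) -> float:
    """HYPOTHETICAL mixed distance: weighted Euclidean on numerical attributes
    plus a weighted mismatch count on categorical ones, averaged in proportion
    to the number of attributes of each type. Numerical values are assumed to
    be pre-scaled to a comparable range such as [0, 1]."""
    d_num = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(w_num, x_num, y_num)))
    d_cat = sum(w for w, a, b in zip(w_cat, x_cat, y_cat) if a != b)
    p, q = len(w_num), len(w_cat)
    return (p * d_num + q * d_cat) / (p + q)


# Toy data, stored column-wise (one list per attribute).
cat_cols = [["a", "a", "a", "b"], ["x", "y", "x", "y"]]
num_cols = [[0.0, 0.25, 0.5, 1.0], [0.5, 0.5, 0.5, 0.5]]
w_cat = entropy_weights(cat_cols)  # the more varied second attribute gets more weight
w_num = std_weights(num_cols)      # the constant second attribute gets weight 0
d = weighted_mixed_distance([0.0, 0.5], [1.0, 0.5], ["a", "x"], ["b", "x"], w_num, w_cat)
```

Under these assumptions an attribute that never varies contributes nothing to the distance. If the full paper maps entropy or standard deviation to weights differently (for example inversely), only the final normalization step of `entropy_weights` and `std_weights` would need to change.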
Institution: PLA University of Science and Technology
Source: Science Technology and Engineering (《科学技术与工程》), Peking University Core Journal, 2014, No. 15, pp. 79-82, 92 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (70971137)
Keywords: attribute weighting; information entropy; standard deviation; local outlier factor (LOF) algorithm
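The keywords name the local outlier factor (LOF) algorithm of Breunig et al. (2000), which the title suggests is the detector the weighted distance feeds into. The sketch below is a brute-force LOF that follows the standard definitions (k-distance, k-distance neighborhood with ties, reachability distance, local reachability density) and takes the distance function as a parameter, so any attribute-weighted metric can be plugged in. The weights `W` and the helper `weighted_euclid` in the demo are hypothetical stand-ins, not values or code from the paper.

```python
import math
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")


def lof_scores(points: Sequence[P], k: int,
               dist: Callable[[P, P], float]) -> List[float]:
    """Local Outlier Factor with a pluggable distance, O(n^2) brute force.

    Assumes 1 <= k < len(points) and no duplicate points (duplicates can make
    a reachability-distance sum zero, i.e. an infinite local density).
    """
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    k_dist: List[float] = []       # k-distance of each point
    neighbors: List[List[int]] = []  # k-distance neighborhood N_k(p), ties included
    for i in range(n):
        others = sorted((d[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neighbors.append([j for dj, j in others if dj <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of each neighbor's density to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(n)]


# Hypothetical attribute weights plugged into a weighted Euclidean distance.
W = [0.7, 0.3]


def weighted_euclid(a, b) -> float:
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(W, a, b)))


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = lof_scores(pts, k=2, dist=weighted_euclid)
# The four square corners score 1.0; (5, 5) scores well above 1.
```

Scores near 1 mean a point is about as dense as its neighbors, while clearly larger scores flag local outliers. Changing `W` reshapes the neighborhoods, which is the lever an attribute-weighting strategy like the one in the abstract would pull.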

References (12)

[1] Hawkins D. Identification of Outliers. London: Chapman & Hall, 1980.
[2] Han Jiawei, Kamber M, Pei Jian. Data Mining: Concepts and Techniques (3rd edition). Translated by Fan Ming, Meng Xiaofeng. Beijing: China Machine Press, 2012: 351-375.
[3] Rousseeuw P J, Hubert M. Robust statistics for outlier detection. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2011, 1(1): 73-79.
[4] Eskin E. Anomaly detection over noisy data using learned probability distributions. Proceedings of the Int Conf on Machine Learning, Stanford University, 2000: 255-262.
[5] Latecki L J, Lazarevic A, Pokrajac D. Outlier detection with kernel density functions. MLDM, 2007, 4571: 61-75.
[6] Breunig M M, Kriegel H P, Ng R, et al. LOF: identifying density-based local outliers. ACM SIGMOD Int Conf on Management of Data, 2000: 93-104.
[7] Jin W, Tung A K H, Han J, et al. Ranking outliers using symmetric neighborhood relationship. Knowledge Discovery and Data Mining (PAKDD06), Singapore, 2006: 577-593.
[8] Hu Caiping, Qin Xiaolin. DLOF: a density-based local outlier detection algorithm. Journal of Computer Research and Development, 2010, 47(12): 2110-2116.
[9] Wang Jinghua, Zhao Xinxiang, Zhang Guoyan, Liu Jianyin. NLOF: a new density-based local outlier detection algorithm. Computer Science, 2013, 40(8): 181-185.
[10] Ke Zhang, Hutter M, Jin Huidong. A new local distance-based outlier detection approach for scattered real-world data. Advances in Knowledge Discovery and Data Mining, 2009, 5476: 813-822.

