Abstract
Because different attributes contribute differently to the distance between data points, an attribute-weighting strategy for distance measurement is proposed. Categorical (nominal) attributes are weighted by the information entropy of their values, and numerical attributes are weighted by the standard deviation of their values. For mixed attributes, the weights obtained from the categorical and the numerical attributes are combined. The weighting amplifies the difference between outliers and normal data. Simulation experiments, run separately for each attribute type, confirm the effectiveness of the proposed strategy.
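The abstract names the quantities behind the weights but not the formulas, so the following is only a minimal sketch of how such a scheme might look. Everything beyond "entropy for categorical attributes, standard deviation for numerical attributes" is an assumption made for illustration: the function names, normalising the raw scores to sum to 1, letting a larger entropy or standard deviation mean a larger weight, and adding the categorical and numerical parts to handle mixed attributes. The paper's actual definitions may differ.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a categorical attribute's observed values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def std_dev(values):
    """Population standard deviation of a numerical attribute."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def normalise(raw):
    """Scale raw scores so they sum to 1 (uniform weights if all are zero).

    Assumption: the abstract does not say how raw scores become weights.
    """
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def weighted_distance(a, b, cat_idx, num_idx, cat_w, num_w):
    """Distance between records a and b under per-attribute weights.

    Assumption: categorical part = weighted mismatch count, numerical part =
    weighted Euclidean distance, and the two parts are simply added.
    """
    d_cat = sum(w for i, w in zip(cat_idx, cat_w) if a[i] != b[i])
    d_num = math.sqrt(sum(w * (a[i] - b[i]) ** 2 for i, w in zip(num_idx, num_w)))
    return d_cat + d_num


# Hypothetical toy data: column 0 is categorical, column 1 is numerical.
rows = [("red", 1.0), ("red", 2.0), ("blue", 9.0)]
cat_w = normalise([entropy([r[0] for r in rows])])
num_w = normalise([std_dev([r[1] for r in rows])])
print(weighted_distance(rows[0], rows[2], [0], [1], cat_w, num_w))
```

With a single attribute of each type, as in the toy data, normalisation gives each a weight of 1; the weights only start to matter once several attributes of the same type compete.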
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal (北大核心)
2014, No. 15, pp. 79-82, 92 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (70971137)