Abstract
Outlier detection is an important branch of data mining. In the power industry it can be applied to grid fault detection, equipment fault detection, and abnormal electricity consumption detection, among other areas. Based on the characteristics of power big data, this paper studies outlier detection algorithms suited to such data. The fast search and find of density peaks clustering algorithm has two shortcomings when used for outlier detection: it does not consider the local characteristics of the data, and its local density depends on the choice of the cut-off distance. To address them, the paper uses the idea of KNN (K-Nearest Neighbors) to redefine the local density and the distance, and proposes a KNN-based fast density peaks outlier detection algorithm that detects outliers more accurately. Simulation experiments on anomaly detection in the daily load data of distribution transformers in a province demonstrate the effectiveness of the algorithm.
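The abstract does not give the redefined formulas, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes that the local density of a point is the inverse of the mean distance to its k nearest neighbors, that the distance term is the distance to the nearest point of higher density (as in density peaks clustering), and that the outlier score is the ratio of the two. The function name `knn_density_peak_scores` and the scoring rule are hypothetical.

```python
import math


def knn_density_peak_scores(points, k):
    """Return one outlier score per point; a higher score is more anomalous.

    Assumed definitions (illustrative, not taken from the paper):
      rho_i   = 1 / mean distance from point i to its k nearest neighbors
      delta_i = distance from point i to the nearest point with higher rho
      score_i = delta_i / rho_i
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # KNN-based local density: no cut-off distance is needed.
    rho = []
    for i in range(n):
        nearest = sorted(d for j, d in enumerate(dist[i]) if j != i)[:k]
        rho.append(1.0 / (sum(nearest) / k + 1e-12))

    # Distance to the nearest denser point; the densest point gets its
    # largest distance to any other point.
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))

    # Isolated points have low density and a large delta, hence a large score.
    return [delta[i] / rho[i] for i in range(n)]


if __name__ == "__main__":
    # Toy data: five clustered points and one isolated point (the last one).
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    scores = knn_density_peak_scores(data, k=2)
    print(scores.index(max(scores)))
```

Because the density here is derived from the k nearest neighbors rather than from a global cut-off distance, it adapts to the local scale of the data; the choice of k remains a parameter to tune.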
Source
《电力信息与通信技术》
2017, No. 6, pp. 36-41 (6 pages)
Electric Power Information and Communication Technology
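Funding
National High Technology Research and Development Program of China (863 Program) (2015AA050204)
Science and Technology Project of State Grid Corporation of China (520626150032)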
Keywords
smart grid big data
outlier detection
KNN algorithm
density-based clustering