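To address anomaly detection and repair in expressway toll data, this work proposes an anomaly detection algorithm based on similarity coefficients and the sum of similar coefficients (SSC), together with a multidimensional data prediction and repair method based on XGBoost (eXtreme gradient boosting). Both algorithms are applied to detect and repair anomalies in real toll data. The results show that the SSC-based detection algorithm takes the correlation between data dimensions into account and accurately detects anomalies in multidimensional data. Compared with an improved Lagrange algorithm that handles only single-dimensional data, the XGBoost multivariate prediction algorithm raises R² from 0.9166 to 0.9856. The proposed algorithms are effective and accurate, and can provide high-quality data support for data analysis by highway management departments.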
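The abstract compares the two repair methods by R² but does not say how the score was computed. As a minimal sketch, assuming the standard coefficient of determination (1 minus the ratio of residual to total sum of squares), the metric can be written as below. The function name `r_squared` and the sample values are hypothetical and are not taken from the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the standard definition; the paper may use a variant.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error of the repaired values.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observed values around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical held-out values and two hypothetical sets of repaired values.
    truth = [10.0, 12.0, 15.0, 11.0, 14.0]
    repair_a = [10.5, 11.0, 14.0, 12.0, 13.0]
    repair_b = [10.1, 12.2, 14.8, 11.1, 13.9]
    print(r_squared(truth, repair_a), r_squared(truth, repair_b))
```

A score closer to 1 means the repaired values track the held-out values more closely, which is the sense in which the abstract reports an improvement.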
With the rapid development of the aviation industry and newly emerging crowd-sourcing projects such as Flightradar24 and FlightAware, a large amount of air traffic data, particularly four-dimensional (4D) trajectory data, has become available to the public. To guarantee the accuracy and reliability of results, data cleansing, including error identification and mitigation, is the first step in analyzing 4D trajectory data. Data cleansing techniques for 4D trajectory data are investigated. A back propagation (BP) neural network algorithm is applied to repair errors. The Newton interpolation method is used to obtain evenly spaced trajectory samples, uniformly distributed over each flight's 4D trajectory data. Furthermore, a new method is proposed to compress the data while maintaining the intrinsic characteristics of the trajectories. Density-based spatial clustering of applications with noise (DBSCAN) is applied to identify the remaining outliers among the sample points. Experiments are performed on a data set of one day of 4D trajectory data over Europe. The results show that the proposed method is more efficient and effective than existing approaches. The work contributes to the first step of data preprocessing and lays a foundation for downstream 4D trajectory analysis.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61861136005, 61851110763, and 71731001).
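The 4D trajectory abstract names Newton interpolation as the way to obtain evenly spaced samples, but gives no detail on the polynomial order or on how neighbouring points are chosen. The following is a minimal sketch under stated assumptions: one coordinate (for example altitude) is resampled against time, and each new sample is interpolated from a local window of four recorded points. The function names, the `window` parameter, and the demo values are all hypothetical.

```python
from bisect import bisect_left


def divided_differences(ts, ys):
    """Return Newton divided-difference coefficients for points (ts, ys)."""
    coeffs = list(ys)
    n = len(ts)
    for level in range(1, n):
        # Update from the end so lower-order entries are still available.
        for i in range(n - 1, level - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (ts[i] - ts[i - level])
    return coeffs


def newton_eval(ts, coeffs, t):
    """Evaluate the Newton-form polynomial at t with Horner-like nesting."""
    result = coeffs[-1]
    for i in range(len(coeffs) - 2, -1, -1):
        result = result * (t - ts[i]) + coeffs[i]
    return result


def resample_evenly(ts, ys, step, window=4):
    """Resample (ts, ys) at evenly spaced times from ts[0] to ts[-1].

    Assumes ts is strictly increasing. Each sample uses the `window`
    recorded points around it (an assumption, not the paper's stated choice).
    """
    n = len(ts)
    count = int((ts[-1] - ts[0]) / step + 1e-9) + 1
    samples = []
    for k in range(count):
        t = ts[0] + k * step
        pos = bisect_left(ts, t)
        # Clamp the window so it stays inside the recorded points.
        lo = max(0, min(pos - window // 2, n - window))
        hi = min(n, lo + window)
        seg_t, seg_y = ts[lo:hi], ys[lo:hi]
        coeffs = divided_differences(seg_t, seg_y)
        samples.append((t, newton_eval(seg_t, coeffs, t)))
    return samples


if __name__ == "__main__":
    # Hypothetical irregular timestamps (s) and altitudes (m) for one flight.
    times = [0.0, 1.0, 3.0, 4.0, 7.0]
    altitudes = [1000.0, 1010.0, 1045.0, 1060.0, 1080.0]
    for t, alt in resample_evenly(times, altitudes, step=1.0):
        print(t, round(alt, 2))
```

A local window is used here because a single high-degree polynomial through every point of a long track tends to oscillate; whether the paper does the same is not stated in the abstract.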