Funding: Supported by projects of the National Natural Science Foundation of China (Nos. 41272360, 41472299, 41672322)
Abstract: Constructing a statistical model that best fits the background is a key step in geochemical anomaly identification, but such a model is hard to construct when the sample population has an unknown and/or complex distribution. Isolation forest is an outlier detection approach that explicitly isolates anomalous samples rather than modeling the population distribution. It can extract multivariate anomalies from very large, high-dimensional datasets with an unknown population distribution. For this reason, we tentatively applied the method to identify multivariate anomalies in the stream sediment survey data of the Lalingzaohuo district, an area with a complex geological setting in Qinghai Province, China. The performance of the isolation forest algorithm in anomaly identification was compared with that of a continuous restricted Boltzmann machine. The results show that the isolation forest model outperforms the continuous restricted Boltzmann machine in multivariate anomaly identification in terms of the receiver operating characteristic curve, the area under the curve, and data-processing efficiency. The anomalies identified by the isolation forest model occupy 19% of the study area and contain 82% of the known mineral deposits, whereas those identified by the continuous restricted Boltzmann machine occupy 35% of the study area and contain 88% of the known mineral deposits. Processing the dataset took 4.07 and 279.36 seconds with the two models, respectively. Therefore, isolation forest is a useful anomaly detection method that can quickly extract multivariate anomalies from geochemical exploration data.
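To make the isolation idea concrete, the following is a minimal from-scratch sketch in Python, not the implementation used in the study. It follows the commonly described isolation forest scheme (random axis-parallel splits on subsamples, anomaly score 2^(-E[h]/c(n))). The function names, the tree count, the subsample size of 64, and the synthetic two-element data are all assumptions made for illustration.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Split rows on a random feature at a random value until isolated or depth-limited."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # simplification: stop if the chosen feature is constant here
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, row, depth=0):
    """Depth at which row reaches a leaf, plus a correction for unsplit leaf points."""
    if "feature" not in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if row[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], row, depth + 1)


def anomaly_scores(data, n_trees=100, sample_size=64, seed=0):
    """Return one score per row in (0, 1]; higher means easier to isolate."""
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(data, size), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(size)
    scores = []
    for row in data:
        mean_path = sum(path_length(t, row) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_path / norm))
    return scores


# Synthetic illustration: 200 background samples of two elements plus one extreme sample.
data_rng = random.Random(42)
samples = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
samples.append((10.0, 10.0))
scores = anomaly_scores(samples)
```

No background distribution is fitted anywhere in this sketch: a sample scores high only because random splits separate it from the rest in few steps, which is why the approach suits populations with unknown or complex distributions.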
Funding: Supported by the National Natural Science Foundation of China (No. 40872193)
Abstract: This paper presents a nonlinear multidimensional scaling model, called kernelized fourth quantification theory, which integrates kernel techniques with the fourth quantification theory. The model can handle mineral prediction without defining a training area. In mineral target prediction, the pre-defined statistical cells, such as grid cells, can be implicitly transformed using kernel techniques from the input space to a high-dimensional feature space, where the clusters that are nonlinearly separable in the input space are expected to be linearly separable. The transformed cells in the feature space are then mapped by the fourth quantification theory onto a low-dimensional scaling space, where the scaled cells can be visually clustered according to their spatial locations. At the same time, the cells that lie far from the cluster center of the majority of the scaled cells are recognized as anomaly cells. Finally, whether the anomaly cells can serve as mineral potential target cells can be tested by spatially superimposing the known mineral occurrences onto the anomaly cells. A case study shows that nearly all the known mineral occurrences spatially coincide with the anomaly cells that have nearly the smallest scaled coordinates in the one-dimensional scaling space. In the case study, the mineral target cells delineated by the new model are similar to those predicted by the well-known weights of evidence (WofE) model.
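The implicit transformation into a feature space can be illustrated with the standard kernel trick: squared distances between transformed cells follow from kernel values alone, d² = k(x, x) + k(y, y) − 2k(x, y). The sketch below covers only this kernel step, not the fourth quantification mapping. The Gaussian (RBF) kernel, the `gamma` value, and the three example cells are assumptions, since the abstract does not state which kernel was used.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian kernel between two cells described by numeric feature tuples."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared)


def kernel_matrix(cells, gamma=0.5):
    """Pairwise kernel values for all cells."""
    return [[rbf_kernel(a, b, gamma) for b in cells] for a in cells]


def feature_space_sq_distance(K, i, j):
    """Squared distance between cells i and j in the implicit feature space."""
    return K[i][i] + K[j][j] - 2.0 * K[i][j]


# Hypothetical cells: two similar ones and one that differs strongly.
cells = [(0.0, 0.0), (0.1, 0.0), (3.0, 4.0)]
K = kernel_matrix(cells)
d_near = feature_space_sq_distance(K, 0, 1)
d_far = feature_space_sq_distance(K, 0, 2)
```

The feature-space coordinates are never computed explicitly; only the kernel matrix `K` is needed, which is what makes the transformation implicit.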
Funding: Supported by projects of the National Natural Science Foundation of China (Nos. 41272360, 41472299, 61133011)
Abstract: Model performance assessment is a key procedure in mineral potential mapping, but research on it is seldom reported in the literature. Cumulative gain and lift charts are well known in the data mining community specializing in marketing and sales applications, and they are widely used for model performance assessment in customer churn prediction. In this paper, they are introduced into the field of mineral potential mapping for model performance assessment. These two charts can be viewed as a graphic representation of the advantage of using a predictive model to choose mineral targets. A cumulative gain curve shows how much a predictive model is superior to a random guess in mineral target prediction. A lift chart expresses how much more likely the mineral targets predicted by a model are to be deposit-bearing than those chosen by random selection. As an illustration, the cumulative gain and lift charts are applied to measure the performance of weights of evidence, logistic regression, restricted Boltzmann machine, and multilayer perceptron models in mineral potential mapping in the Altay district in northern Xinjiang, China. The results show that the cumulative gain and lift charts can visually reveal that the first three models perform well while the last one performs poorly. Thus, the cumulative gain and lift charts can serve as a graphic tool for model performance assessment in mineral potential mapping.
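The two charts can be computed as follows. This is a minimal sketch under the usual definitions: cells are ranked by predicted score, the gain at a given fraction of top-ranked cells is the share of all deposit-bearing cells captured, and the lift is the gain divided by that fraction. The function name, the bin count, and the ten example cells are hypothetical.

```python
def cumulative_gain_and_lift(scores, labels, n_bins=10):
    """Return (fraction_of_cells, gain, lift) for each cumulative bin.

    scores: predicted mineral potential per cell (higher = more prospective).
    labels: 1 if the cell contains a known deposit, else 0.
    """
    total_positive = sum(labels)
    if total_positive == 0:
        raise ValueError("at least one deposit-bearing cell is required")
    # Rank cells from highest to lowest predicted score.
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    n = len(ranked)
    rows = []
    for b in range(1, n_bins + 1):
        k = -(-b * n // n_bins)  # integer ceiling of b * n / n_bins
        fraction = k / n
        gain = sum(ranked[:k]) / total_positive
        rows.append((fraction, gain, gain / fraction))
    return rows


# Hypothetical example: ten cells, three of which contain known deposits.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
example_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
chart = cumulative_gain_and_lift(example_scores, example_labels, n_bins=5)
```

A random selection has a gain equal to the fraction of cells chosen and a lift of 1, so a model is useful to the extent that its gain curve lies above the diagonal and its lift stays above 1 for the top-ranked cells.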
Funding: Supported by the National Natural Science Foundation of China (Nos. 41672322, 41872244)
Abstract: Isolation forest and elliptic envelope were used to detect geochemical anomalies, and the bat algorithm was adopted to optimize the parameters of the two models. The two bat-optimized models and their default-parameter counterparts were used to detect multivariate geochemical anomalies in the 1:50000-scale stream sediment survey data collected from the Helong district, Jilin Province, China. Based on the modeling results, receiver operating characteristic (ROC) curve analysis was performed to evaluate the performance of the two bat-optimized models and their default-parameter counterparts. The results show that the bat algorithm can improve the performance of the two models in geochemical anomaly detection by optimizing their parameters. The optimal threshold determined by the Youden index was used to identify geochemical anomalies among the geochemical data points. Compared with the anomalies detected by the elliptic envelope models, the anomalies detected by the isolation forest models have a stronger spatial association with the mineral occurrences discovered in the study area. According to the results of this study and previous work, it can be inferred that the background population of the study area is complex, which makes it unsuitable for fitting an elliptic envelope model.
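Threshold selection with the Youden index can be sketched as follows: for each candidate threshold on the anomaly score, compute J = TPR − FPR and keep the threshold that maximizes it. This is a minimal illustration of the standard definition, not the authors' code. The function name and the eight example data points are hypothetical.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = TPR - FPR.

    scores: anomaly score per data point (higher = more anomalous).
    labels: 1 if the point is associated with a known mineral occurrence, else 0.
    A point is flagged as anomalous when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best_threshold, best_j = None, -1.0
    # Try every distinct score as a threshold, from strictest to loosest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:  # on ties, the stricter threshold is kept
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Hypothetical scores for eight data points, three near known occurrences.
point_scores = [0.95, 0.9, 0.8, 0.6, 0.5, 0.4, 0.3, 0.1]
point_labels = [1, 1, 0, 1, 0, 0, 0, 0]
threshold, j_value = youden_threshold(point_scores, point_labels)
```

In this example the threshold 0.6 captures all three occurrence-related points while flagging one of the five remaining points, giving J = 1 − 0.2 = 0.8.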