The proliferation of robot accounts on social media platforms has had a significant negative impact, necessitating robust measures to counter network anomalies and safeguard content integrity. Social robot detection has emerged as a pivotal yet intricate task aimed at mitigating the dissemination of misleading information. While graph-based approaches have attained remarkable performance in this area, they face a fundamental limitation: the homogeneity assumption in graph convolution allows social robots to evade detection by mingling with genuine human profiles. To address this challenge and counter such camouflage tactics, this work proposes a social robot detection framework based on enhanced HOmogeneity and Random Forest (HORFBot). At the core of HORFBot lies a homogeneous graph enhancement strategy that uses edge-removal techniques to divide the graph into multiple revealing subgraphs. Contrastive learning is then used to train multiple graph convolutional networks, each specialized to the nuances of one of these subgraphs. In the final stage, these feature-rich base classifiers are fused, aggregating their outputs into a single, comprehensive detection result. Extensive experiments on three social robot detection datasets show that the method improves the accuracy of social robot detection and outperforms the comparison methods.
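The abstract does not specify the fusion rule for the base classifiers. The sketch below assumes soft voting: each base classifier's predicted bot probability is averaged, the same way scikit-learn's random forest averages class probabilities across trees, and the result is thresholded at 0.5. `fuse_soft_vote`, the threshold, and the scores are illustrative assumptions, not HORFBot's published procedure.

```python
from statistics import fmean

def fuse_soft_vote(prob_lists, threshold=0.5):
    """Fuse per-account bot probabilities from several base classifiers.

    prob_lists: one list per base classifier (assumed interface), each
    holding P(bot) for the same accounts in the same order.
    Returns 0/1 labels (1 = bot) from the averaged probability.
    """
    if not prob_lists:
        raise ValueError("need at least one base classifier")
    n = len(prob_lists[0])
    if any(len(p) != n for p in prob_lists):
        raise ValueError("all classifiers must score the same accounts")
    # zip(*...) groups the scores each account received from every classifier
    averaged = [fmean(scores) for scores in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]

# Three hypothetical subgraph classifiers scoring four accounts
print(fuse_soft_vote([[0.9, 0.2, 0.6, 0.4],
                      [0.8, 0.1, 0.4, 0.5],
                      [0.7, 0.3, 0.3, 0.7]]))  # [1, 0, 0, 1]
```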
Evaluation of water richness in sandstone is an important research topic in the prevention and control of mine water disasters, and the water richness of sandstone is closely related to its porosity. Reflection seismic exploration data provide high-density spatial sampling, which forms an important data basis for predicting the porosity of coal seam roof sandstone from reflection seismic data. First, the basic principles of the variational mode decomposition (VMD) method and the random forest method are introduced. Then, a geological model of coal seam roof sandstone is constructed, seismic forward modeling is conducted, and random noise is added. The decomposition effects of the empirical mode decomposition (EMD) method and the VMD method on noisy signals are compared and analyzed. The test results show that the first-order intrinsic mode function (IMF1) and IMF2 decomposed by the VMD method contain the main effective components of the seismic signals. A workflow for predicting coal seam roof sandstone porosity that combines VMD with the random forest method is proposed. The feasibility and effectiveness of the method are verified by trial calculations of porosity prediction on the model data. Taking actual coalfield reflection seismic data as an example, the porosity of the sandstone in the roof of the No. 8 coal seam is predicted. The application results show the potential application value of the proposed porosity prediction method. This method has important theoretical guiding significance for evaluating water richness in coal seam roof sandstone and for the prevention and control of mine water disasters.
To improve the efficiency of air quality analysis and the accuracy of predictions, this paper proposes a composite method based on Vector Autoregressive (VAR) and Random Forest (RF) models. The theoretical section introduces the models and their estimation algorithms. In the empirical analysis section, the proposed method is applied to global air quality data from 2022 to 2024. Specifically, principal component analysis (PCA) is first conducted, and the VAR and Random Forest methods are then used for prediction on the reduced-dimensional data. The results show that the RMSE of the hybrid model is 45.27, significantly lower than the 49.11 of the VAR model alone, verifying its superiority. The stability and predictive performance of the model are effectively enhanced.
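RMSE is the metric used above to compare the hybrid model with VAR alone. It is the square root of the mean squared prediction error. The minimal sketch below (the helper name `rmse` is illustrative) also works out the relative reduction implied by the two reported values.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Relative RMSE reduction of the VAR-RF hybrid over VAR alone,
# using the two RMSE values reported in the abstract.
var_rmse, hybrid_rmse = 49.11, 45.27
reduction = (var_rmse - hybrid_rmse) / var_rmse
print(f"{reduction:.1%}")  # 7.8%
```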
The agricultural Internet of Things (IoT) system is a critical component of modern smart agriculture, and methods for assessing its security risks have attracted increasing attention from industry. Current agricultural IoT security risk assessment methods rely primarily on expert judgment, which introduces subjective factors that reduce the credibility of the results. To address this issue, this study constructed a dataset for agricultural IoT security risk assessment based on real-world security reports and proposed a PCARF algorithm, built on random forest principles, that incorporates ensemble learning strategies to improve prediction accuracy. Compared with the second-best model, the proposed model achieved a 2.7% increase in accuracy, a 3.4% improvement in recall, a 3.1% rise in Area Under the Curve (AUC), and a 7.9% boost in Matthews Correlation Coefficient (MCC). Extensive comparative experiments showed that the proposed model outperforms the others in prediction accuracy and robustness.
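The PCARF algorithm is not described in enough detail to reproduce here. MCC, the metric with the largest reported gain, is easy to compute from a binary confusion matrix. The sketch below assumes a binary labeling of risk outcomes; the counts in the usage line are made up.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (the usual convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical confusion-matrix counts for a risk classifier
print(round(mcc(tp=40, fp=10, fn=5, tn=45), 3))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix. This makes it informative when one class dominates.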
Real-time intelligent lithology identification while drilling is vital to realizing downhole closed-loop drilling. The complex and changeable geological environment encountered during drilling poses many challenges for lithology identification. This paper studies three problems in intelligent lithology identification: difficult extraction of feature information, low precision in identifying thin layers, and limited applicability of the model. This work aims to improve the overall performance of the lithology identification model from three aspects: data feature extraction, class balance, and model design. A new real-time intelligent lithology identification model, the dynamic felling strategy weighted random forest algorithm (DFW-RF), is proposed. According to the feature selection results, gamma ray and 2 MHz phase resistivity are the logging-while-drilling (LWD) parameters that most significantly influence lithology identification. The overall performance of the DFW-RF lithology identification model was verified in 3 wells in different areas. Compared with five typical lithology identification algorithms, the DFW-RF model achieves a higher lithology identification accuracy and F1 score. The model improves the identification accuracy of thin-layer lithology and is effective and feasible in different geological environments. The DFW-RF model performs efficiently in the real-time intelligent identification of lithologic information for closed-loop drilling, has broad applicability, and is worth adopting widely in logging interpretation.
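The abstract reports an F1 score without saying how it is averaged across lithology classes. The minimal sketch below assumes macro averaging, which weights every class equally. That way, thick, common lithologies cannot mask poor thin-layer performance. The class names are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-F1).

    Every class counts equally, so a rare class (e.g. a thin layer)
    influences the score as much as a common one.
    """
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical lithology labels for four depth samples
print(macro_f1(["sand", "sand", "shale", "coal"],
               ["sand", "shale", "shale", "coal"]))
```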
As massive underground projects have become common in dense urban cities, a question has arisen: which model best predicts Tunnel Boring Machine (TBM) performance in these tunneling projects? Predicting TBM performance in complex geological conditions remains a great challenge for practitioners and researchers, yet a reliable and accurate prediction is essential for planning a practical tunnel construction schedule. TBM performance is very difficult to estimate because of the many geotechnical and geological factors and machine specifications involved. Most previously proposed intelligent techniques in this field are based on a single (base) model with limited accuracy. Hence, this study introduces a hybrid random forest (RF) technique optimized by global harmony search with generalized opposition-based learning (GOGHS) for forecasting the TBM advance rate (AR). The main objective of the GOGHS-RF model is to optimize the RF hyperparameters, e.g., the number of trees and the maximum tree depth. In the modelling, a comprehensive database was used: the parameters most influential on TBM performance served as input variables and the TBM AR as the output variable. To examine the capability of the GOGHS-RF model, three further hybrid models were also constructed to forecast TBM AR: particle swarm optimization-RF, genetic algorithm-RF, and artificial bee colony-RF. The developed models were evaluated using several performance indices, including the coefficient of determination (R²), root-mean-square error (RMSE), and mean absolute percentage error (MAPE). The results showed that GOGHS-RF estimates TBM AR more accurately than the other applied models. The newly developed GOGHS-RF model achieved R² = 0.9937 and 0.9844 for the training and testing stages, respectively, which are higher than those of a previously developed RF. In addition, the importance of the input parameters was interpreted through the SHapley Additive exPlanations (SHAP) method, which showed that thrust force per cutter is the most important variable for TBM AR. The GOGHS-RF model can be used in mechanized tunnel projects for predicting and checking performance.
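The three performance indices named above can be computed as shown below. This sketch assumes the usual definitions: R² is 1 minus the ratio of the residual to the total sum of squares, and MAPE is expressed in percent. Some studies instead report R² as a squared correlation, and the paper does not give its exact formulas.

```python
import math

def regression_indices(y_true, y_pred):
    """Return (R2, RMSE, MAPE in percent) for observed vs. predicted values.

    R2 = 1 - SS_res / SS_tot. MAPE assumes no observed value is zero.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mape = 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mape

# Toy observed vs. predicted advance rates (hypothetical values)
print(regression_indices([2.0, 4.0], [2.0, 5.0]))
```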
Precise and timely prediction of crop yields is crucial for food security and the development of agricultural policies. However, crop yield is influenced by multiple factors within complex growth environments, and previous research has paid relatively little attention to how environmental factors and drought interfere with the growth of winter wheat. More effective methods are therefore needed to explore the relationship between these factors and crop yield, which makes precise yield prediction increasingly important. This study used four types of indicators (meteorological, crop growth status, environmental, and drought indices) for Henan Province from October 2003 to June 2019 as the basic data for predicting winter wheat yield. The sparrow search algorithm combined with random forest (SSA-RF) was applied under different input indicators, and the accuracy of its winter wheat yield estimates was calculated. This accuracy was compared with that of partial least squares regression (PLSR), extreme gradient boosting (XGBoost), and random forest (RF) models. Finally, the optimal yield estimation method was used to predict winter wheat yield in three typical years. The findings are as follows:

1) SSA-RF estimated winter wheat yield better than the other algorithms. The best estimate was achieved by SSA-RF with all four types of indicators combined (R² = 0.805, RRMSE = 9.9%).
2) Crop growth status and environmental indicators play significant roles in wheat yield estimation, accounting for 46% and 22% of the yield importance among all indicators, respectively.
3) Selecting indicators from October to April of the following year yielded the highest accuracy in winter wheat yield estimation, with an R² of 0.826 and an RMSE of 9.0%. Yield estimates can therefore be completed two months before the winter wheat harvest in June.
4) Prediction performance is slightly affected by severe drought. The SSA-RF model is more accurate for a wet year (2018, R² = 0.820) than for a severe drought year (2011, R² = 0.680) or a normal year (2017, R² = 0.790).

This study could provide an innovative approach for remote sensing estimation of winter wheat yield.
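Yield accuracy above is partly reported as RRMSE. The minimal sketch below assumes the common definition: RMSE divided by the mean observed yield, expressed in percent. The paper does not state its normalization.

```python
import math

def rrmse_percent(y_true, y_pred):
    """Relative RMSE: RMSE divided by the mean observed value, in percent."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return 100 * rmse / (sum(y_true) / n)

# Hypothetical county yields (observed vs. predicted, same units)
print(rrmse_percent([100, 100], [110, 90]))  # 10.0
```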
Driven piles are a practical and convenient structural component used in many geological environments, so determining pile drivability is of great importance in complex geotechnical applications. Conventional methods of predicting pile drivability often rely on simplified physical models or empirical formulas, which may lack accuracy or applicability in complex geological conditions. This study therefore presents a practical machine learning approach for predicting the drivability parameters of piles (i.e., maximum compressive stress, maximum tensile stress, and blows per foot): a Random Forest (RF) optimized by Bayesian Optimization (BO) and Particle Swarm Optimization (PSO). The approach not only improves prediction accuracy but also adapts better to varying geological environments. Support vector regression, extreme gradient boosting, k-nearest neighbors, and decision tree models are also applied for comparison. Of the 4072 samples collected, each with 17 model inputs, 3258 were randomly selected for training and the remaining 814 were used for testing. The models were compared and evaluated using two performance indices: the root mean square error (RMSE) and the coefficient of determination (R²). For the three parameters, the optimized RF model achieved lower RMSE values than the other models (0.044, 0.438, and 0.146) and higher R² values (0.966, 0.884, and 0.977). In addition, the sensitivity and uncertainty of the optimized RF model were analyzed using Sobol sensitivity analysis and Monte Carlo (MC) simulation. The optimized RF model could therefore be used to predict pile performance and may provide a useful reference for similar engineering conditions.
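The 3258/814 partition is an 80/20 random split of the 4072 samples. The abstract gives only the counts, so the seeded shuffle below is an assumed procedure that reproduces those sizes.

```python
import random

def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Randomly partition sample indices into train and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(4072)
print(len(train_idx), len(test_idx))  # 3258 814
```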
Fatigue reliability-based design optimization of aeroengine structures involves repeated reliability calculations and large numbers of calls to an implicit, highly nonlinear limit state function. This leaves the traditional direct Monte Carlo and surrogate methods prone to unacceptable computing efficiency and accuracy. To address this, a random forest (RF) model is presented that fuses a random subspace strategy and weight allocation technology into bagging ensemble theory to speed up the reliability calculation. Moreover, by embedding the RF model into a multilevel optimization model, an efficient RF-assisted fatigue reliability-based design optimization framework is developed. The framework's effectiveness is validated on the low-cycle fatigue reliability-based design optimization of an aeroengine turbine disc. The results show that the proposed framework achieves high computing accuracy and efficiency. This work sheds light on the development of theories and methods for reliability-based design optimization of complex engineering structures.
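The cost the framework targets comes from direct Monte Carlo reliability analysis. This method estimates the failure probability as the fraction of random samples with g(X) ≤ 0, so it needs a very large number of limit state evaluations. Each call to `limit_state` below is what a surrogate such as the RF model would replace. The sketch uses an assumed stress-strength toy problem (not the turbine-disc model) and checks the estimate against the closed-form answer.

```python
import random
from statistics import NormalDist

def mc_failure_probability(limit_state, sampler, n=200_000, seed=1):
    """Crude Monte Carlo estimate of P[g(X) <= 0].

    limit_state: g(x) -> float, where g <= 0 means failure.
    sampler: rng -> x, draws one random input vector.
    """
    rng = random.Random(seed)
    failures = sum(limit_state(sampler(rng)) <= 0 for _ in range(n))
    return failures / n

# Toy stress-strength example (assumed): resistance R ~ N(5, 1),
# load S ~ N(3, 1), limit state g = R - S.
pf = mc_failure_probability(
    limit_state=lambda x: x[0] - x[1],
    sampler=lambda rng: (rng.gauss(5, 1), rng.gauss(3, 1)),
)
# Since g ~ N(2, sqrt(2)), the exact answer is Phi(-2 / sqrt(2)).
exact = NormalDist().cdf(-2 / 2 ** 0.5)
print(pf, exact, "reliability:", 1 - pf)
```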
In the Internet era, widely used web applications have become targets of hacker attacks because they hold large amounts of personal information. Among these attacks, stealing private data through cross-site scripting (XSS) is one of the most common. Deep learning-based XSS attack detection methods currently have good application prospects. However, they are prone to overfitting and suffer from high false alarm rates and low accuracy. To address these issues, we propose a multi-stage feature extraction and fusion model for XSS detection based on Random Forest feature enhancement. The model uses Random Forests to capture the intrinsic structure and patterns of the data by extracting leaf node indices as features. These are then merged with the original data features to form a feature set with richer information content. Further feature extraction is conducted through three parallel channels:

- Channel I uses parallel one-dimensional convolutional layers (1D convolutional layers) with different kernel sizes to extract local features at different scales and perform multi-scale feature fusion.
- Channel II employs one-dimensional max pooling layers (max 1D pooling layers) of various sizes to extract key features from the data.
- Channel III extracts global information bidirectionally using a Bi-Directional Long Short-Term Memory network (Bi-LSTM) and incorporates a multi-head attention mechanism to enhance global features.

Finally, XSS attacks are classified and predicted by fusing the features of the three channels. To test the effectiveness of the model, we conduct experiments on six datasets. We achieve an accuracy of 100% on the UNSW-NB15 dataset and 99.99% on the CICIDS2017 dataset, higher than that of existing models.
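The Random Forest feature enhancement step maps each sample to the leaf it reaches in every tree and appends those leaf indices to the original features. The abstract does not say whether the indices are one-hot encoded or appended as raw integers. This sketch assumes one-hot encoding and uses two hand-built, hypothetical trees in place of a trained forest.

```python
# A tree is either a leaf id (int) or a tuple (feature, threshold, left, right).
# These two toy trees are hand-built and hypothetical; a real forest would be
# trained on the XSS feature vectors.
TREES = [
    (0, 0.5, 0, (1, 2.0, 1, 2)),  # leaves 0, 1, 2
    (1, 1.0, 0, 1),               # leaves 0, 1
]
N_LEAVES = [3, 2]

def leaf_index(tree, x):
    """Follow threshold splits until a leaf id is reached."""
    while not isinstance(tree, int):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree

def enhance(x, trees=TREES, n_leaves=N_LEAVES):
    """Concatenate the original features with one-hot leaf indicators."""
    extra = []
    for tree, k in zip(trees, n_leaves):
        one_hot = [0] * k
        one_hot[leaf_index(tree, x)] = 1
        extra.extend(one_hot)
    return list(x) + extra

print(enhance([0.9, 3.0]))  # [0.9, 3.0, 0, 0, 1, 0, 1]
```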
In recent years, machine learning (ML) and deep learning (DL) have significantly advanced intrusion detection systems, effectively addressing potential malicious attacks across networks. This paper introduces a robust method for detecting and categorizing attacks in the Internet of Things (IoT) environment using the NSL-KDD dataset. To achieve high accuracy, the authors combine feature extraction using an autoencoder with a gated recurrent unit (GRU). Relevant features are then selected using the cuckoo search algorithm integrated with particle swarm optimization (PSO), and PSO is also employed for training on the features. Final classification is carried out by the proposed RF-GNB classifier, which combines a random forest with a Gaussian Naïve Bayes classifier. The proposed model was evaluated with standard metrics such as precision, accuracy, recall, and F1-score, and compared with existing models. It detected approximately 99.87% of intrusions within the IoT environment. These results demonstrate the high performance of the proposed method and its efficacy in improving the accuracy of intrusion detection in IoT network systems.
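The abstract does not describe how the random forest and Gaussian Naïve Bayes outputs are combined in RF-GNB. This sketch therefore covers only the Gaussian Naïve Bayes half: it estimates per-class feature means and variances and scores new records by log-posterior. The feature values, labels, and the additive `var_smoothing` term are illustrative assumptions.

```python
import math
from collections import defaultdict

def fit_gnb(X, y, var_smoothing=1e-9):
    """Estimate class priors plus per-feature means and variances."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small additive epsilon keeps a zero-variance feature from dividing by zero
        variances = [sum((v - m) ** 2 for v in col) / n + var_smoothing
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gnb(model, row):
    """Return the class with the highest Gaussian log-posterior score."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                      for x, m, var in zip(row, means, variances))
        return math.log(prior) + log_lik
    return max(model, key=log_posterior)

# Hypothetical two-feature traffic records
X = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.1]]
y = ["normal", "normal", "attack", "attack"]
model = fit_gnb(X, y)
print(predict_gnb(model, [1.1, 1.9]), predict_gnb(model, [5.1, 6.0]))
```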
Automatically detecting Ulva prolifera (U. prolifera) in rainy and cloudy weather using remote sensing imagery has been a long-standing problem. Here, we address this challenge by combining high-resolution Synthetic Aperture Radar (SAR) imagery with machine learning to detect U. prolifera in the South Yellow Sea of China (SYS) in 2021. The findings indicate that the Random Forest model can accurately and robustly detect U. prolifera, even in the presence of complex ocean backgrounds and speckle noise. Visual inspection confirmed that the method identified the majority of pixels containing U. prolifera without misidentifying noise or seawater pixels as U. prolifera. The method also performed consistently across different images, with an average Area Under the Curve (AUC) of 0.930 (±0.028). The analysis yielded an overall accuracy of over 96%, with an average Kappa coefficient of 0.941 (±0.038). Compared with the traditional thresholding method, the Random Forest model has a lower estimation error of 14.81%. Practical application indicates that the method can detect the unprecedented 2021 U. prolifera outbreak and derive its continuous spatiotemporal changes. This study provides a potential new method for detecting U. prolifera and enhances our understanding of macroalgal outbreaks in the marine environment.
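The Kappa coefficient reported above corrects overall accuracy for the agreement expected by chance. The minimal sketch below computes Cohen's kappa from a confusion matrix; the pixel counts are hypothetical.

```python
def cohens_kappa(matrix):
    """Cohen's kappa from a square confusion matrix (rows = reference class)."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    # Chance agreement: product of the row and column marginals
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical pixel counts: [[algae->algae, algae->sea], [sea->algae, sea->sea]]
print(round(cohens_kappa([[45, 5], [3, 47]]), 2))  # 0.84
```

Here the overall accuracy is 0.92 and the chance agreement is 0.5, so kappa = (0.92 - 0.5) / (1 - 0.5) = 0.84.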
Using flexible damping technology to improve tunnel lining structures is an emerging method of resisting earthquake disasters, and several methods have been explored to predict the mechanical response of tunnel linings with a damping layer. However, traditional numerical methods suffer from complex modelling and long computation times. Therefore, a random forest regressor (RFR) prediction model is proposed, based on 240 numerical simulation results of the mechanical response of tunnel linings. In addition, circle mapping (CM) is used to improve the Archimedes optimization algorithm (AOA), reptile search algorithm (RSA), and Chernobyl disaster optimizer (CDO), further improving the predictive performance of the RFR model. The performance evaluation shows that CMRSA-RFR is the best prediction model. Damping layer thickness is the most important feature for predicting the maximum principal stress of a tunnel lining containing a damping layer. This study verifies the feasibility of combining numerical simulation with machine learning and provides a new solution for predicting the mechanical response of aseismic tunnels with a damping layer.
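The abstract does not say how circle mapping is applied. A common use in metaheuristics is to initialize the population with a chaotic sequence instead of uniform random numbers. The sketch below follows that assumption, using the frequently cited circle map parameters a = 0.5 and b = 0.2. The RF hyperparameter bounds are hypothetical.

```python
import math

def circle_map_sequence(n, x0=0.7, a=0.5, b=0.2):
    """Generate n values in [0, 1] with the circle chaotic map.

    x_{k+1} = (x_k + b - a / (2*pi) * sin(2*pi*x_k)) mod 1
    """
    seq, x = [], x0
    for _ in range(n):
        x = (x + b - a / (2 * math.pi) * math.sin(2 * math.pi * x)) % 1.0
        seq.append(x)
    return seq

def init_population(pop_size, lower, upper):
    """Map chaotic values onto per-dimension search bounds."""
    dim = len(lower)
    values = circle_map_sequence(pop_size * dim)
    return [[lower[d] + values[i * dim + d] * (upper[d] - lower[d])
             for d in range(dim)]
            for i in range(pop_size)]

# Hypothetical RF search space: number of trees in [50, 500], max depth in [2, 30]
pop = init_population(5, lower=[50, 2], upper=[500, 30])
print(pop)
```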
Machine learning has emerged as a pivotal tool for deciphering and managing the excess of information in an era of abundant data. This paper presents a comprehensive analysis of machine learning algorithms, focusing on the structure of random forests and their efficacy in mitigating overfitting, a prevalent issue in decision tree models. It also introduces a novel approach to enhancing decision tree performance through an optimized pruning method called Adaptive Cross-Validated Alpha CCP (ACV-CCP). This method refines traditional cost complexity pruning by streamlining the selection of the alpha parameter. It uses cross-validation within the pruning process to achieve a reliable, computationally efficient alpha selection that generalizes well to unseen data. By improving computational efficiency and balancing model complexity, ACV-CCP allows decision trees to maintain predictive accuracy while minimizing overfitting, narrowing the performance gap between decision trees and random forests. Our findings illustrate how ACV-CCP contributes to the robustness and applicability of decision trees, offering a perspective on building computationally efficient and well-generalizing machine learning models.
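The description above does not specify ACV-CCP's cross-validated alpha selection. The candidate alphas it chooses among, however, come from standard cost complexity pruning. Each internal node t has an effective alpha (R(t) − R(T_t)) / (|T_t| − 1), where R(t) is the error if t were collapsed to a leaf, R(T_t) is the summed leaf error of the subtree rooted at t, and |T_t| is that subtree's leaf count. The node with the smallest value (the weakest link) is pruned first. The sketch below computes this on a hypothetical tree.

```python
# Node = dict with "error" (R(t) if the node were a leaf) and optional
# "children" (list of child nodes). The toy tree values are made up.

def subtree_stats(node):
    """Return (sum of leaf errors R(T_t), number of leaves |T_t|)."""
    children = node.get("children")
    if not children:
        return node["error"], 1
    err, leaves = 0.0, 0
    for child in children:
        child_err, child_leaves = subtree_stats(child)
        err += child_err
        leaves += child_leaves
    return err, leaves

def weakest_link(node, path="root"):
    """Find the internal node with the smallest effective alpha.

    Returns (alpha_eff, path), or None if the node is a single leaf.
    """
    children = node.get("children")
    if not children:
        return None
    sub_err, leaves = subtree_stats(node)
    best = ((node["error"] - sub_err) / (leaves - 1), path)
    for i, child in enumerate(children):
        cand = weakest_link(child, f"{path}.{i}")
        if cand is not None and cand[0] < best[0]:
            best = cand
    return best

tree = {"error": 0.5, "children": [
    {"error": 0.2, "children": [{"error": 0.05}, {"error": 0.1}]},
    {"error": 0.1},
]}
print(weakest_link(tree))  # node "root.0" has alpha of about 0.05
```

Pruning the weakest link and repeating yields the increasing sequence of alphas over which a cross-validated selection would search.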
This study proposes a new real-time manufacturing process monitoring method to detect process shifts in manufacturing operations. Real-time production process monitoring is critical in today's smart manufacturing: the more robust the monitoring model, the more reliably a process can be kept under control. Many researchers have developed real-time monitoring methods to detect process shifts early. However, these methods are limited in how quickly they detect process shifts and in how well they handle varied data volumes and varieties. In this paper, a robust monitoring model called GRU-RF-RTC is proposed to detect process shifts rapidly; it combines a Gated Recurrent Unit (GRU) and Random Forest (RF) with Real-Time Contrast (RTC). The effectiveness of GRU-RF-RTC is first evaluated on multivariate normal and non-normal distribution datasets. Then, to demonstrate its applicability in a real manufacturing setting, the model is evaluated on real-world normal and non-normal problems. The results show that GRU-RF-RTC outperforms other methods in detecting process shifts quickly. It achieves the lowest average out-of-control run length (ARL1) in all synthetic and real-world problems under both normal and non-normal cases. The results on real-world problems highlight the significance of GRU-RF-RTC for modern manufacturing process monitoring applications. The proposed method improves shift detection capability by 42.14% in normal and 43.64% in gamma distribution problems.
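ARL1, the comparison metric above, is the average number of observations between a process shift and the first alarm; lower is better. The sketch below estimates it by simulation for a mean shift in standard normal data. It uses a plain 3-sigma Shewhart rule as a stand-in detector, not GRU-RF-RTC.

```python
import random

def run_length(stream, is_alarm):
    """Number of observations until the first alarm (inclusive), or None."""
    for i, x in enumerate(stream, start=1):
        if is_alarm(x):
            return i
    return None  # no alarm within the stream

def estimate_arl1(shift, limit=3.0, reps=2000, horizon=10_000, seed=0):
    """Average out-of-control run length for a mean shift in N(0, 1) data,
    monitored with a 3-sigma Shewhart rule (a stand-in detector)."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(reps):
        # Post-shift observations drawn from N(shift, 1)
        stream = (rng.gauss(shift, 1.0) for _ in range(horizon))
        rl = run_length(stream, lambda x: abs(x) > limit)
        lengths.append(rl if rl is not None else horizon)
    return sum(lengths) / len(lengths)

print(estimate_arl1(1.0), estimate_arl1(3.0))  # small shifts take longer to flag
```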
Machine learning (ML) algorithms are frequently used in landslide susceptibility modeling. Different data handling strategies may produce variations in landslide susceptibility models, even when the same ML algorithm is used. This research compares combinations of inventory data handling, cross-validation (CV), and hyperparameter tuning strategies for generating landslide susceptibility maps. The results are expected to provide a general strategy for landslide susceptibility modeling using ML techniques. The authors employed eight landslide inventory data handling scenarios to convert a landslide polygon into landslide points: a point on the toe (minimum height), on the scarp (maximum height), or at the center of the landslide; randomly placed points inside the polygon (1, 3, 5, or 10 points); and 15 m grid sampling. For each data handling strategy, three random forest configurations were applied:

- non-spatial CV with non-spatial hyperparameter tuning;
- spatial CV with spatial hyperparameter tuning;
- spatial CV with forward feature selection and no hyperparameter tuning.

The resulting 24 random forest ML workflows were applied to a complete inventory of 743 landslides triggered by Tropical Cyclone Cempaka (2017) in Pacitan Regency, Indonesia, using 11 landslide controlling factors. The results show that grid sampling with spatial CV and spatial hyperparameter tuning is favorable. This strategy minimizes overfitting, produces a relatively high-performance predictive model, and reduces susceptibility artifacts in the landslide area. Careful inventory data handling, CV, and hyperparameter tuning strategies should be considered in landslide susceptibility modeling to increase the applicability of landslide susceptibility maps in practice.
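Spatial CV prevents nearby points from landing in both the training and test folds. The abstract does not say how the spatial folds were formed. This sketch assumes square blocks with round-robin block-to-fold assignment, and the block size is an arbitrary choice.

```python
from collections import defaultdict

def spatial_block_folds(points, block_size, k):
    """Assign (x, y) points to k folds by square spatial blocks.

    All points in the same block share a fold, so a test point is less
    likely to have spatially autocorrelated neighbours in the training folds.
    """
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        blocks[(int(x // block_size), int(y // block_size))].append(idx)
    folds = [[] for _ in range(k)]
    # Deterministic round-robin over sorted block keys (an assumed scheme)
    for i, key in enumerate(sorted(blocks)):
        folds[i % k].extend(blocks[key])
    return folds

# Hypothetical landslide point coordinates in metres
pts = [(1, 1), (2, 2), (11, 1), (12, 3), (1, 11), (21, 21)]
print(spatial_block_folds(pts, block_size=10, k=2))  # [[0, 1, 2, 3], [4, 5]]
```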
The Random Forest (RF) method was used to classify whether a rockburst will occur, and its intensity, in underground rock projects. Main control factors of rockburst were selected for the analysis, including in-situ stress values, the uniaxial compressive strength and tensile strength of the rock, and the elastic energy index of the rock. The traditional indicators were summarized and divided into indexes I and II. The Random Forest model and criterion were obtained by training on 36 sets of rockburst samples from underground rock projects in China and abroad. Another 10 samples were tested and evaluated with the model, and the evaluated results agree well with the field records. The misjudgment ratios of the support vector machine (SVM), artificial neural network (ANN), and random forest methods are 10%, 20%, and 0, respectively. The misjudgment ratio using index I is smaller than that using index II. It is suggested that index I combined with the RF model can accurately classify rockburst grade.
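Random forest classification takes a majority vote across trees, and the misjudgment ratio is the share of misclassified test samples. A minimal sketch follows; the rockburst grade labels are hypothetical.

```python
from collections import Counter

def forest_vote(tree_predictions):
    """Majority vote over per-tree class labels for one sample.

    On a tie, Counter.most_common returns the tied label encountered first.
    """
    return Counter(tree_predictions).most_common(1)[0][0]

def misjudgment_ratio(y_true, y_pred):
    """Fraction of samples whose predicted rockburst grade is wrong."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)

print(forest_vote(["moderate", "strong", "moderate"]))  # moderate
# 1 wrong grade out of 10 test samples = a 10% misjudgment ratio
print(misjudgment_ratio(["none"] * 10, ["none"] * 9 + ["weak"]))  # 0.1
```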
Funding (HORFBot social robot detection study): Funds for the Central Universities (grant number CUC24SG018).
Funding (VMD and random forest sandstone porosity study): National Natural Science Foundation of China (Grant No. 42274180); National Key Research and Development Program of China (2021YFC2902003).
Abstract: To improve the efficiency of air quality analysis and the accuracy of predictions, this paper proposes a composite method based on Vector Autoregressive (VAR) and Random Forest (RF) models. The theoretical section introduces the models and their estimation algorithms. In the empirical section, the proposed method is applied to global air quality data from 2022 to 2024: principal component analysis (PCA) is first conducted, and the VAR and Random Forest methods are then used for prediction on the reduced-dimensional data. The RMSE of the hybrid model is 45.27, lower than the 49.11 of the VAR model alone (a reduction of about 7.8%), which indicates that the hybrid improves both the stability and the predictive performance of the model.
Abstract: The agricultural Internet of Things (IoT) system is a critical component of modern smart agriculture, and methods for assessing its security risk have garnered increasing attention from the industry. Current agricultural IoT security risk assessment methods rely primarily on expert judgment, which introduces subjective factors that reduce the credibility of the assessment results. To address this issue, this study constructed a dataset for agricultural IoT security risk assessment based on real-world security reports and proposed PCARF, an algorithm built on random forest principles that incorporates ensemble learning strategies to enhance prediction accuracy. Compared with the second-best model, the proposed model achieved a 2.7% increase in accuracy, a 3.4% improvement in recall, a 3.1% rise in Area Under the Curve (AUC), and a 7.9% boost in Matthews Correlation Coefficient (MCC). Extensive comparative experiments showed that the proposed model outperforms the others in prediction accuracy and robustness.
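MCC is the metric with the largest reported gain. Because it uses all four cells of the confusion matrix, it stays informative when risk classes are imbalanced. The standard binary definition is sketched below; returning 0 when a marginal count is zero is a common convention, assumed here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom
```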
Funding: Financially supported by the National Natural Science Foundation of China (No. 52174001); the National Natural Science Foundation of China (No. 52004064); the Hainan Province Science and Technology Special Fund “Research on Real-time Intelligent Sensing Technology for Closed-loop Drilling of Oil and Gas Reservoirs in Deepwater Drilling” (ZDYF2023GXJS012); and the first batch of key science and technology projects of the Heilongjiang Provincial Government and Daqing Oilfield, “Research on the Construction Technology of Gulong Shale Oil Big Data Analysis System” (DQYT-2022-JS-750).
Abstract: Real-time intelligent lithology identification while drilling is vital to realizing downhole closed-loop drilling. The complex and changeable geological environment encountered during drilling makes lithology identification challenging. This paper studies three problems in intelligent lithology identification: difficult extraction of feature information, low precision in identifying thin layers, and limited applicability of the model. It seeks to improve the overall performance of the lithology identification model in three respects: data feature extraction, class balance, and model design. A new real-time intelligent lithology identification model, the dynamic felling strategy weighted random forest algorithm (DFW-RF), is proposed. According to the feature selection results, gamma ray and 2 MHz phase resistivity are the logging-while-drilling (LWD) parameters that most significantly influence lithology identification. The overall performance of the DFW-RF model has been verified on 3 wells in different areas. Compared with five typical lithology identification algorithms, the DFW-RF model achieves a higher identification accuracy and F1 score. The model improves the identification accuracy of thin-layer lithology and is effective and feasible in different geological environments. The DFW-RF model enables efficient real-time intelligent identification of lithologic information in closed-loop drilling, has broader applicability, and merits wide use in logging interpretation.
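The abstract names a "dynamic felling strategy" for weighting trees but gives no formula. The sketch below is a hypothetical reading of the idea built on scikit-learn: score each tree of a fitted forest on held-out data, fell the weakest 30%, and weight the surviving trees' probability votes by their accuracy. The keep ratio, the weighting rule, and the function names are assumptions; passing `class_weight="balanced"` when building the forest is one standard scikit-learn option for class balance, not necessarily the paper's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def tree_accuracies(rf, X_val, y_val):
    """Validation accuracy of every tree in a fitted forest.
    Tree outputs are class-index probabilities, mapped back through rf.classes_."""
    return np.array([
        np.mean(rf.classes_[np.argmax(t.predict_proba(X_val), axis=1)] == y_val)
        for t in rf.estimators_
    ])


def fell_and_weight(rf, X_val, y_val, keep_ratio=0.7):
    """Drop ('fell') the weakest trees and weight the survivors by accuracy."""
    accs = tree_accuracies(rf, X_val, y_val)
    n_keep = max(1, int(round(keep_ratio * len(accs))))
    keep = np.argsort(accs)[::-1][:n_keep]
    kept = accs[keep]
    weights = kept / kept.sum() if kept.sum() > 0 else np.full(n_keep, 1.0 / n_keep)
    return keep, weights


def weighted_predict(rf, keep, weights, X):
    """Weighted soft vote over the surviving trees."""
    proba = sum(w * rf.estimators_[i].predict_proba(X) for i, w in zip(keep, weights))
    return rf.classes_[np.argmax(proba, axis=1)]
```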
Funding: the National Natural Science Foundation of China (Grant 42177164) and the Distinguished Youth Science Foundation of Hunan Province of China (2022JJ10073).
Abstract: As massive underground projects have become common in dense urban cities, a question has arisen: which model best predicts Tunnel Boring Machine (TBM) performance in these tunneling projects? Predicting TBM performance in complex geological conditions remains a great challenge for practitioners and researchers, yet a reliable and accurate prediction is essential to planning a workable tunnel construction schedule. TBM performance is difficult to estimate because of the many geotechnical and geological factors and machine specifications involved. The intelligent techniques previously proposed in this field are mostly based on a single or base model with a low level of accuracy. Hence, this study introduces a hybrid random forest (RF) technique optimized by global harmony search with generalized opposition-based learning (GOGHS) for forecasting the TBM advance rate (AR). The main objective of the GOGHS-RF model is to optimize RF hyper-parameters such as the number of trees and the maximum tree depth. In the modelling, a comprehensive database of the parameters most influential on TBM performance was used as input, with TBM AR as output. To examine the capability of the GOGHS-RF model, three further hybrid models, particle swarm optimization-RF, genetic algorithm-RF and artificial bee colony-RF, were also constructed to forecast TBM AR. The developed models were evaluated with several performance indices, including the coefficient of determination (R²), root-mean-square error (RMSE), and mean absolute percentage error (MAPE). The results showed that GOGHS-RF estimates TBM AR more accurately than the other models. The newly developed GOGHS-RF model achieved R² = 0.9937 and 0.9844 for the training and testing stages, respectively, which are higher than those of a previously developed RF. The importance of the input parameters was also interpreted with the SHapley Additive exPlanations (SHAP) method, which showed that thrust force per cutter is the most important variable for TBM AR. The GOGHS-RF model can be used in mechanized tunnel projects for predicting and checking performance.
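For reference, the three performance indices used to compare the hybrid models have standard definitions, sketched here in NumPy. MAPE is reported in percent and assumes no zero targets.

```python
import numpy as np


def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y, yhat):
    """Root-mean-square error, in the units of y."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


def mape(y, yhat):
    """Mean absolute percentage error in percent; y must contain no zeros."""
    return float(100.0 * np.mean(np.abs((y - yhat) / y)))
```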
Funding: Under the auspices of the National Natural Science Foundation of China (No. 52079103).
Abstract: Precise and timely prediction of crop yields is crucial for food security and the development of agricultural policies. However, crop yield is influenced by multiple factors within complex growth environments, and previous research has paid relatively little attention to the interference of environmental factors and drought with the growth of winter wheat. More effective methods are therefore needed to explore the relationship between these factors and crop yield and to make precise yield predictions. This study used four types of indicators, namely meteorological, crop growth status, environmental, and drought index, from October 2003 to June 2019 in Henan Province as the basic data for predicting winter wheat yield. The accuracy of winter wheat yield estimation was calculated using the sparrow search algorithm combined with random forest (SSA-RF) under different input indicators and compared with that of partial least squares regression (PLSR), extreme gradient boosting (XGBoost), and random forest (RF) models. Finally, the optimal yield estimation method was used to predict winter wheat yield in three typical years. The findings are as follows: 1) SSA-RF estimates winter wheat yield better than the other algorithms, and the best estimate is achieved by SSA-RF using all four types of indicators (R² = 0.805, RRMSE = 9.9%). 2) Crop growth status and environmental indicators play significant roles in wheat yield estimation, accounting for 46% and 22%, respectively, of the yield importance among all indicators. 3) Selecting indicators from October to April of the following year yielded the highest accuracy in winter wheat yield estimation, with an R² of 0.826 and an RMSE of 9.0%, so yield estimates can be completed two months before the winter wheat harvest in June. 4) Prediction performance is slightly affected by severe drought: the SSA-RF model is more accurate for the wet year (2018) (R² = 0.820) than for the severe drought year (2011) (R² = 0.680) and the normal year (2017) (R² = 0.790). This study provides an innovative approach for remote sensing estimation of winter wheat yield.
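The abstract reports importance shares per indicator type (46% for crop growth status, 22% for environmental indicators) without saying how they were computed. One straightforward way, assumed here, is to sum a fitted random forest's impurity-based `feature_importances_` over the columns in each indicator group. The SSA hyperparameter search is not shown, and the mapping of columns to groups is up to the user.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def group_importance(model, feature_groups):
    """Sum a fitted forest's impurity-based importances over each indicator group.
    feature_groups maps a group name to a list of column indices; because
    feature_importances_ is normalised, the group shares add up to 1."""
    imp = model.feature_importances_
    return {name: float(imp[cols].sum()) for name, cols in feature_groups.items()}
```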
Funding: Supported by the National Science Foundation of China (42107183).
Abstract: Driven piles are used in many geological environments as a practical and convenient structural component, so determining pile drivability is of great importance in complex geotechnical applications. Conventional methods of predicting pile drivability often rely on simplified physical models or empirical formulas, which may lack accuracy or applicability in complex geological conditions. This study therefore presents a practical machine learning approach, a Random Forest (RF) optimized by Bayesian Optimization (BO) and Particle Swarm Optimization (PSO), which both enhances prediction accuracy and better adapts to varying geological environments, to predict the drivability parameters of piles (maximum compressive stress, maximum tensile stress, and blows per foot). Support vector regression, extreme gradient boosting, k-nearest neighbors, and decision tree models are also applied for comparison. Of the 4072 collected data samples, each with 17 model inputs, 3258 were randomly selected for training and the remaining 814 were used for testing. The models were compared and evaluated with two performance indices, the root mean square error (RMSE) and the coefficient of determination (R²). The optimized RF model achieved lower RMSE than the other prediction models for the three parameters, specifically 0.044, 0.438, and 0.146, and higher R² values than the other techniques, specifically 0.966, 0.884, and 0.977. In addition, the sensitivity and uncertainty of the optimized RF model were analyzed using Sobol sensitivity analysis and Monte Carlo (MC) simulation. The optimized RF model can be used to predict pile performance and may provide a useful reference for similar engineering conditions.
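The abstract couples RF with both BO and PSO but gives no details of either. The sketch below shows only a PSO part: a standard inertia-weight particle swarm searching over `n_estimators` and `max_depth` to minimise 3-fold cross-validated RMSE with scikit-learn. Swarm size, coefficients (w = 0.7, c1 = c2 = 1.5), search bounds, and function names are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def cv_rmse(params, X, y):
    """3-fold cross-validated RMSE of an RF with (n_estimators, max_depth) = rounded params."""
    n_trees, depth = int(round(params[0])), int(round(params[1]))
    model = RandomForestRegressor(n_estimators=n_trees, max_depth=depth, random_state=0)
    return -cross_val_score(model, X, y, cv=3, scoring="neg_root_mean_squared_error").mean()


def pso_tune_rf(X, y, bounds, n_particles=6, n_iter=5, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Inertia-weight PSO over continuous positions, rounded to integers when evaluated."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    pos = rng.uniform(lo, hi, size=(n_particles, lo.size))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([cv_rmse(p, X, y) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)          # keep particles inside the search box
        vals = np.array([cv_rmse(p, X, y) for p in pos])
        better = vals < pbest_val                 # update personal bests
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return np.round(gbest).astype(int), float(pbest_val.min())
```

The 3258/814 division of the 4072 samples corresponds to an 80/20 train/test split; the tuning above would run on the training portion only.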
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52105136), the Hong Kong Scholar program (Grant No. XJ2022013), the China Postdoctoral Science Foundation (Grant No. 2021M690290), and the Academic Excellence Foundation of BUAA (Grant No. BY2004103).
Abstract: Fatigue reliability-based design optimization of aeroengine structures involves many repeated reliability calculations and large numbers of calls to an implicit, highly nonlinear limit state function, so the traditional direct Monte Carlo and surrogate methods tend to have unacceptable computing efficiency and accuracy. To address this, a random forest (RF) model is presented that fuses the random subspace strategy and a weight allocation technique into bagging ensemble theory to improve the efficiency of reliability calculation. Moreover, by embedding the RF model into a multilevel optimization model, an efficient RF-assisted fatigue reliability-based design optimization framework is developed. The effectiveness of the framework is validated on the low-cycle fatigue reliability-based design optimization of an aeroengine turbine disc. The optimization results show that the proposed framework achieves high computing accuracy and efficiency. This work sheds light on the development of theory and methods for reliability-based design optimization of complex engineering structures.
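The abstract fuses the random subspace strategy and weight allocation into bagging but does not give the weighting rule. This hypothetical sketch assumes inverse out-of-bag MSE weights and shows how such a surrogate stands in for the expensive limit state function g in a direct Monte Carlo estimate of the failure probability P_f = P(g(X) < 0), with reliability 1 − P_f. It is not the paper's RF model or its multilevel optimization; class and function names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


class SubspaceBagging:
    """Bagged regression trees: each member sees a bootstrap sample of the rows and a
    random subset of the columns; members are weighted by inverse out-of-bag MSE."""

    def __init__(self, n_models=50, max_features=0.7, seed=0):
        self.n_models = n_models
        self.max_features = max_features
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n, d = X.shape
        k = max(1, int(self.max_features * d))
        self.members, errors = [], []
        for _ in range(self.n_models):
            rows = self.rng.integers(0, n, size=n)             # bootstrap sample
            cols = self.rng.choice(d, size=k, replace=False)   # random subspace
            tree = DecisionTreeRegressor(random_state=0).fit(X[rows][:, cols], y[rows])
            oob = np.setdiff1d(np.arange(n), rows)             # rows this tree never saw
            if len(oob) == 0:
                oob = np.arange(n)
            resid = tree.predict(X[oob][:, cols]) - y[oob]
            errors.append(np.mean(resid ** 2) + 1e-12)
            self.members.append((tree, cols))
        inv = 1.0 / np.array(errors)
        self.weights = inv / inv.sum()                         # lower error -> larger weight
        return self

    def predict(self, X):
        preds = np.array([tree.predict(X[:, cols]) for tree, cols in self.members])
        return self.weights @ preds


def failure_probability(surrogate, sample_inputs, n=50_000):
    """Direct Monte Carlo estimate of P_f = P(g(X) < 0) with the surrogate standing in for g."""
    X = sample_inputs(n)
    return float(np.mean(surrogate.predict(X) < 0.0))
```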
Abstract: In the Internet era, widely used web applications have become targets of hacker attacks because they hold large amounts of personal information. Among these vulnerabilities, stealing private data through cross-site scripting (XSS) attacks is one of the attacks most commonly used by hackers. Deep learning-based XSS attack detection methods have good application prospects, but they are prone to overfitting and suffer from high false alarm rates and low accuracy. To address these issues, we propose a multi-stage feature extraction and fusion model for XSS detection based on Random Forest feature enhancement. The model uses Random Forests to capture the intrinsic structure and patterns of the data by extracting leaf node indices as features, which are then merged with the original data features to form a more informative feature set. Further feature extraction is conducted through three parallel channels. Channel I uses parallel one-dimensional convolutional layers (1D convolutional layers) with different kernel sizes to extract local features at different scales and perform multi-scale feature fusion; Channel II uses one-dimensional max pooling layers (max 1D pooling layers) of various sizes to extract key features from the data; and Channel III extracts global information bidirectionally using a Bi-Directional Long Short-Term Memory Network (Bi-LSTM) and incorporates a multi-head attention mechanism to enhance global features. Finally, XSS is classified and predicted by fusing the features of the three channels. To test the effectiveness of the model, we conduct experiments on six datasets, achieving an accuracy of 100% on the UNSW-NB15 dataset and 99.99% on the CICIDS2017 dataset, higher than that of existing models.
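The leaf-index enhancement step can be sketched with scikit-learn, where `RandomForestClassifier.apply` returns, for each sample, the index of the leaf it reaches in every tree. One-hot encoding those indices and concatenating them with the raw features gives the enriched feature set; the three-channel network that follows is not shown. Forest size, depth, and the one-hot step are assumptions, not details from the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder


def fit_leaf_augmenter(X_train, y_train, n_estimators=50, max_depth=6, seed=0):
    """Fit a forest and return a function that appends one-hot leaf indices to X."""
    rf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth,
                                random_state=seed).fit(X_train, y_train)
    # apply() -> (n_samples, n_estimators) array of leaf indices, one column per tree
    enc = OneHotEncoder(handle_unknown="ignore").fit(rf.apply(X_train))

    def augment(X):
        leaves = enc.transform(rf.apply(X)).toarray()  # sparse one-hot -> dense
        return np.hstack([X, leaves])

    return augment
```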
Funding: The Deanship of Scientific Research at Shaqra University funded this research work through project number SU-ANN-2023051.
Abstract: In recent years, machine learning (ML) and deep learning (DL) have significantly advanced intrusion detection systems, effectively addressing potential malicious attacks across networks. This paper introduces a robust method for detecting and categorizing attacks within the Internet of Things (IoT) environment, using the NSL-KDD dataset. To achieve high accuracy, the authors used a feature extraction technique that combines an autoencoder with a gated recurrent unit (GRU). Relevant features are then selected using a cuckoo search algorithm integrated with particle swarm optimization (PSO), and PSO is also employed for training on the features. The final classification is carried out with the proposed RF-GNB classifier, which combines a random forest with a Gaussian Naïve Bayes classifier. The proposed model was evaluated with standard metrics such as precision, accuracy, recall, and F1-score, and compared with several existing models. It detected approximately 99.87% of intrusions within the IoT environment, demonstrating the efficacy of the proposed method in increasing the accuracy of intrusion detection in IoT network systems.
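The abstract does not describe how RF-GNB combines its two classifiers. Assuming soft voting (averaging the predicted class probabilities of both models), the combination can be sketched with scikit-learn's `VotingClassifier`; the autoencoder-GRU feature extraction and the cuckoo search/PSO feature selection stages are omitted, and the forest size is an illustrative choice.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.naive_bayes import GaussianNB


def make_rf_gnb(n_estimators=200, seed=0):
    """Soft-voting ensemble of a random forest and a Gaussian Naive Bayes classifier."""
    return VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=n_estimators, random_state=seed)),
            ("gnb", GaussianNB()),
        ],
        voting="soft",  # average predict_proba outputs instead of counting hard votes
    )
```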
Funding: Under the auspices of the National Natural Science Foundation of China (No. 42071385) and the National Science and Technology Major Project of High Resolution Earth Observation System (No. 79-Y50-G18-9001-22/23).
Abstract: Automatically detecting Ulva prolifera (U. prolifera) in rainy and cloudy weather using remote sensing imagery has been a long-standing problem. Here, we address this challenge by combining high-resolution Synthetic Aperture Radar (SAR) imagery with machine learning, and detect U. prolifera in the South Yellow Sea of China (SYS) in 2021. The findings indicate that the Random Forest model can detect U. prolifera accurately and robustly, even in the presence of complex ocean backgrounds and speckle noise. Visual inspection confirmed that the method identified the majority of pixels containing U. prolifera without misidentifying noise pixels or seawater pixels as U. prolifera. The method also performed consistently across different images, with an average Area Under Curve (AUC) of 0.930 (±0.028). The overall accuracy exceeded 96%, with an average Kappa coefficient of 0.941 (±0.038). Compared with the traditional thresholding method, the Random Forest model has a lower estimation error of 14.81%. Practical application indicates that this method can be used to detect the unprecedented U. prolifera bloom of 2021 and to derive its continuous spatiotemporal changes. This study provides a potential new method for detecting U. prolifera and enhances our understanding of macroalgal outbreaks in the marine environment.
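The Kappa coefficient corrects overall accuracy for chance agreement: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected from the marginal class frequencies. A minimal implementation for pixel labels (for example U. prolifera vs. seawater) follows; it is undefined when p_e = 1.

```python
import numpy as np


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa computed from the confusion matrix of two label sequences."""
    labels = np.union1d(y_true, y_pred)
    idx = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)))
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    n = cm.sum()
    p_o = np.trace(cm) / n                               # observed agreement
    p_e = (cm.sum(axis=0) @ cm.sum(axis=1)) / n ** 2     # chance agreement from marginals
    return float((p_o - p_e) / (1 - p_e))
```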
Funding: Project (2023YFB2390400) supported by the National Key R&D Programs for Young Scientists, China; Projects (U21A20159, 52079133, 52379112, 52309123, 41902288) supported by the National Natural Science Foundation of China; Project (2024AFB041) supported by the Hubei Provincial Natural Science Foundation, China; Project (QTKS0034W23291) supported by the Key Laboratory of Water Grid Project and Regulation of Ministry of Water Resources, China; Project (2023SGG07) supported by the Visiting Researcher Fund Program of State Key Laboratory of Water Resources Engineering and Management, China; Project (2022KY56(ZDZX)-02) supported by the Key Research Program of FSDI, China; Project (SKS-2022103) supported by the Key Research Program of the Ministry of Water Resources, China; Project (202102AF080001) supported by the Yunnan Major Science and Technology Special Program, China.
Abstract: Using flexible damping technology to improve tunnel lining structures is an emerging method of resisting earthquake disasters, and several methods have been explored for predicting the mechanical response of tunnel lining with a damping layer. However, traditional numerical methods suffer from complex modelling and long computation times. Therefore, a prediction model based on the random forest regressor (RFR) is proposed, built on 240 numerical simulation results of the mechanical response of tunnel lining. In addition, circle mapping (CM) is used to improve the Archimedes optimization algorithm (AOA), reptile search algorithm (RSA), and Chernobyl disaster optimizer (CDO) to further improve the predictive performance of the RFR model. The performance evaluation results show that CMRSA-RFR is the best prediction model, and that damping layer thickness is the most important feature for predicting the maximum principal stress of tunnel lining containing a damping layer. This study verifies the feasibility of combining numerical simulation with machine learning and provides a new solution for predicting the mechanical response of aseismic tunnels with damping layers.
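The abstract does not give the circle mapping formula or say where it is applied. A common choice in metaheuristics, assumed here, is to replace uniform random initialization of the population with the chaotic circle map x_{k+1} = (x_k + b − a/(2π)·sin(2πx_k)) mod 1, with the frequently used values a = 0.5 and b = 0.2. The sketch generates a population within given bounds, for example ranges of RFR hyperparameters; function names are illustrative.

```python
import numpy as np


def circle_map_sequence(n, x0=0.7, a=0.5, b=0.2):
    """n iterates of the circle map x <- (x + b - a/(2*pi) * sin(2*pi*x)) mod 1."""
    xs = np.empty(n)
    x = x0
    for k in range(n):
        x = np.mod(x + b - (a / (2 * np.pi)) * np.sin(2 * np.pi * x), 1.0)
        xs[k] = x
    return xs


def circle_map_population(n_agents, lower, upper, x0=0.7):
    """Initial population within [lower, upper] drawn from the chaotic sequence instead of U(0, 1)."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    seq = circle_map_sequence(n_agents * lower.size, x0=x0).reshape(n_agents, lower.size)
    return lower + seq * (upper - lower)
```

Unlike a random generator, the sequence is fully determined by `x0`, so the same seed value always reproduces the same initial population.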
Abstract: In an era of abundant data, machine learning has emerged as a pivotal tool for deciphering and managing large volumes of information. This paper presents a comprehensive analysis of machine learning algorithms, focusing on the structure of random forests and their efficacy in mitigating overfitting, a prevalent issue in decision tree models. It also introduces a novel approach to enhancing decision tree performance through an optimized pruning method called Adaptive Cross-Validated Alpha CCP (ACV-CCP). This method refines traditional cost complexity pruning by streamlining the selection of the alpha parameter, using cross-validation within the pruning process to achieve a reliable, computationally efficient alpha selection that generalizes well to unseen data. By improving computational efficiency and balancing model complexity, ACV-CCP allows decision trees to maintain predictive accuracy while minimizing overfitting, narrowing the performance gap between decision trees and random forests. Our findings illustrate how ACV-CCP contributes to the robustness and applicability of decision trees, offering a perspective on achieving computationally efficient and generalizable machine learning models.
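ACV-CCP's exact selection rule is not given in the abstract. The baseline it refines can be sketched with scikit-learn: compute the pruning path with `cost_complexity_pruning_path`, then choose the `ccp_alpha` with the best cross-validated accuracy. This is the standard procedure, not ACV-CCP itself.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def select_ccp_alpha(X, y, cv=5, seed=0):
    """Pick ccp_alpha from the tree's own pruning path by cross-validated accuracy."""
    path = DecisionTreeClassifier(random_state=seed).cost_complexity_pruning_path(X, y)
    # The last alpha prunes the tree to a single node; clip tiny negative round-off.
    alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0.0, None))
    scores = [
        cross_val_score(DecisionTreeClassifier(random_state=seed, ccp_alpha=a), X, y, cv=cv).mean()
        for a in alphas
    ]
    best = float(alphas[int(np.argmax(scores))])
    return best, DecisionTreeClassifier(random_state=seed, ccp_alpha=best).fit(X, y)
```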
Funding: The support of the National Science and Technology Council of Taiwan (Contract Nos. 111-2221 E-011081 and 111-2622-E-011019) is acknowledged. Support from the Intelligent Manufacturing Innovation Center (IMIC), National Taiwan University of Science and Technology (NTUST), Taipei, Taiwan, a Featured Areas Research Center in the Higher Education Sprout Project of the Ministry of Education (MOE), Taiwan (since 2023), is appreciated. We also thank the Wang Jhan Yang Charitable Trust Fund (Contract No. WJY 2020-HR-01) for its financial support.
Abstract: This study proposes a new real-time manufacturing process monitoring method to monitor and detect process shifts in manufacturing operations, since real-time production process monitoring is critical in today's smart manufacturing: the more robust the monitoring model, the more reliably a process can be kept under control. Many researchers have developed real-time monitoring methods to detect process shifts early. However, these methods are limited in how quickly they can detect process shifts and in handling varied data volumes and types. In this paper, a robust monitoring model combining a Gated Recurrent Unit (GRU) and Random Forest (RF) with Real-Time Contrast (RTC), called GRU-RF-RTC, is proposed to detect process shifts rapidly. The effectiveness of GRU-RF-RTC is first evaluated using multivariate normal and non-normal distribution datasets. Then, to demonstrate its applicability in a real manufacturing setting, the model is evaluated on real-world normal and non-normal problems. The results show that GRU-RF-RTC detects process shifts faster than the other methods, with the lowest average out-of-control run length (ARL1) in all synthetic and real-world problems under both normal and non-normal cases, improving shift detection capability by 42.14% in normal and 43.64% in gamma distribution problems. The experimental results on real-world problems highlight the significance of GRU-RF-RTC for modern manufacturing process monitoring applications.
Abstract: Machine learning (ML) algorithms are frequently used in landslide susceptibility modeling. Different data handling strategies can produce different landslide susceptibility models, even with the same ML algorithm. This research compares combinations of inventory data handling, cross validation (CV), and hyperparameter tuning strategies for generating landslide susceptibility maps, with the aim of providing a general strategy for landslide susceptibility modeling with ML techniques. The authors employed eight scenarios for converting a landslide polygon into landslide points: a point on the toe (minimum height), a point on the scarp (maximum height), a point at the center of the landslide, random points inside the polygon (1, 3, 5, or 10 points), and 15 m grid sampling. For each data handling strategy, three random forest workflows were applied: CV with non-spatial hyperparameter tuning, spatial CV with spatial hyperparameter tuning, and spatial CV with forward feature selection and no hyperparameter tuning. The resulting 24 random forest ML workflows were applied to a complete inventory of 743 landslides triggered by Tropical Cyclone Cempaka (2017) in Pacitan Regency, Indonesia, using 11 landslide controlling factors. The results show that grid sampling with spatial CV and spatial hyperparameter tuning is favorable because it minimizes overfitting, produces a relatively high-performance predictive model, and reduces susceptibility artifacts in the landslide area. Careful handling of inventory data, CV, and hyperparameter tuning should therefore be considered in landslide susceptibility modeling to increase the practical applicability of landslide susceptibility maps.
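Several of the polygon-to-point scenarios can be sketched in plain Python and NumPy: the area-weighted centroid (shoelace formula), random points drawn uniformly inside the polygon by rejection sampling, and a regular grid (15 m in the study) clipped to the polygon. The toe and scarp scenarios need elevations from a DEM and are not shown. Coordinates are assumed to be in a projected CRS in metres, and the function names are illustrative.

```python
import numpy as np


def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices without repeating the first."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def centroid(poly):
    """Area-weighted polygon centroid (shoelace formula)."""
    p = np.asarray(poly, float)
    x, y = p[:, 0], p[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)
    cross = x * yn - xn * y
    area = cross.sum() / 2.0
    return float((x + xn) @ cross / (6 * area)), float((y + yn) @ cross / (6 * area))


def random_points(poly, k, seed=0):
    """k points drawn uniformly inside the polygon by rejection sampling from its bounding box."""
    rng = np.random.default_rng(seed)
    p = np.asarray(poly, float)
    (xmin, ymin), (xmax, ymax) = p.min(axis=0), p.max(axis=0)
    pts = []
    while len(pts) < k:
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        if point_in_polygon(x, y, poly):
            pts.append((x, y))
    return pts


def grid_points(poly, spacing=15.0):
    """Cell-centre points of a regular grid, kept only if they fall inside the polygon."""
    p = np.asarray(poly, float)
    (xmin, ymin), (xmax, ymax) = p.min(axis=0), p.max(axis=0)
    xs = np.arange(xmin + spacing / 2, xmax, spacing)
    ys = np.arange(ymin + spacing / 2, ymax, spacing)
    return [(x, y) for x in xs for y in ys if point_in_polygon(x, y, poly)]
```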
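Abstract: Objective: To build a data identification model from ultra-high-performance liquid chromatography coupled with quadrupole time-of-flight mass spectrometry (UHPLC-QTOF-MS^E) analysis with digital quantification, combined with the Random Forest (RF) algorithm, in order to achieve digital identification of the source species of shells from the Chinese pond turtle, Brazilian turtle, Taiwan turtle, and snapping turtle, and of soft-shelled turtle shell (Biejia). Methods: After sample pretreatment, tortoise shells (Guijia) from different sources and batches were analyzed by UHPLC-QTOF-MS^E. Using a pooled sample as the reference, peak positions were aligned, extracted, and quantified to obtain exact mass-retention time (EMRT) data pairs reflecting polypeptide ion information. Feature selection based on the information gain ratio was then used to obtain the important polypeptide ions, and the data were modeled with the RF algorithm; the models were evaluated by accuracy (Acc), precision (P), area under the curve (AUC), and other parameters from internal cross-validation. Finally, the optimal model was used to verify the identification of the source species of tortoise shell. Results: Feature selection based on the information gain ratio yielded 71 characteristic polypeptides. The established RF model showed excellent discrimination: accuracy, precision, and AUC all exceeded 0.950, and the correct rate in external identification validation was 100.0%. Conclusion: UHPLC-QTOF-MS^E analysis combined with the RF algorithm can efficiently and accurately achieve digital identification of tortoise shells from different source species, and can provide a reference for the quality control and source verification of tortoise shell.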
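The information gain ratio used for feature selection can be sketched as follows: the information gain of a feature divided by its split information (the entropy of the feature's own values). The sketch assumes features have already been discretized, for example whether an EMRT peak is detected in a sample; the abstract does not say how continuous intensities were handled, and the helper names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    cond = 0.0
    for value, count in Counter(feature).items():
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        cond += (count / n) * entropy(subset)        # conditional entropy H(labels | feature)
    gain = entropy(labels) - cond
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0


def top_k_features(rows, labels, k):
    """Indices of the k columns with the highest gain ratio (rows = samples)."""
    scores = [gain_ratio(list(col), labels) for col in zip(*rows)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```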
Funding: Projects (50934006, 10872218) supported by the National Natural Science Foundation of China; Project (2010CB732004) supported by the National Basic Research Program of China; Project (kjdb2010-6) supported by the Doctoral Candidate Innovation Research Support Program of Science & Technology Review, China; Project (201105) supported by the Scholarship Award for Excellent Doctoral Student, Ministry of Education, China.
Abstract: The Random Forest (RF) method was used to classify whether rockburst will occur, and its intensity, in underground rock projects. Main control factors of rockburst, such as in-situ stress, the uniaxial compressive strength and tensile strength of rock, and the elastic energy index of rock, were selected for the analysis. The traditional indicators were summarized and divided into indexes I and II. The RF model and criterion were obtained by training on 36 sets of rockburst samples from underground rock projects in China and abroad, and another 10 samples were tested and evaluated with the model. The evaluation results agree well with the practical records. The misjudgment ratios of the support vector machine (SVM), artificial neural network (ANN), and random forest methods are 10%, 20%, and 0, respectively, and the misjudgment ratio using index I is smaller than that using index II. Using index I with the RF model is therefore suggested for accurately classifying rockburst grade.