Growing competition, economic recession, and financial crises have increased business failures, prompting researchers to develop new approaches that yield more accurate and reliable predictions. The classification and regression tree (CART) is one of the modeling techniques developed for this purpose. In this study, the CART method is explained and its power to predict financial failure is tested. CART is applied to data on industrial companies traded on the Istanbul Stock Exchange (ISE) between 1997 and 2007. The results show that CART has high predictive power for financial failure one, two, and three years prior to failure, and that profitability ratios are the most important ratios in predicting failure.
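As an illustration of how a CART classifier chooses a split, the following minimal sketch searches one financial ratio for the threshold that minimizes the weighted Gini impurity of the two resulting groups. The ratio values, the labels, and the function names are hypothetical and are not taken from the study; a full CART would repeat this search over every ratio and recurse on each side.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one feature.

    The threshold is None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical return-on-assets values and outcomes (1 = failed, 0 = healthy).
roa = [-0.12, -0.05, -0.01, 0.03, 0.06, 0.10]
failed = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_split(roa, failed)
print(threshold, impurity)
```

In this toy data the failed and healthy firms separate cleanly on the single ratio, so the chosen split leaves both child nodes pure.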
Background: Vegetation distribution maps are of great significance for nature protection and management. In diverse tropical forests, accurate spatial mapping of vegetation types is challenging; the high species diversity and abundance of rare species challenge classification concepts, while remote sensing signals may not vary systematically with species composition, which complicates the delineation of vegetation types in the landscape. Methods: We used a combination of field-based compositional data and their relations to environmental variables to predict the distribution of forest types in the Wuzhishan National Natural Reserve (WNNR), Hainan Island, China, using multivariate regression trees (MRT). The MRT was based on arboreal vegetation composition in 132 plots of 20 m × 20 m with a regular spacing of 1 km. In addition to the MRT, non-metric multidimensional scaling (NMDS) was used to evaluate vegetation-environment relationships. Results: The MRT model worked best when using 14 key environmental variables covering topography, climate, latitude, and soil, although the difference from the simpler model including only topographical variables was small. The full model classified the 132 plots into 3 vegetation types, 6 formation groups, 20 formations, and 65 associations at different hierarchical syntaxonomic levels. This model was the basis for forest vegetation maps for the WNNR. MRT and NMDS showed that elevation was the main driving force for the distribution of vegetation types and formation groups. Climate, latitude, and soil (especially available P), together with topographic variables, all influenced the distribution of formations and associations. Conclusions: While elevation determines forest-type distributions, lower-level syntaxonomic forest classes respond to the topographic diversity typical of mountains. Apart from providing the first detailed forest vegetation map for any part of WNNR, we show how, in spite of limitations, MRT with existing environmental data can be a useful method for mapping diverse and remote tropical forests.
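To make the multivariate regression tree idea concrete, the sketch below finds the single threshold on one environmental variable that minimizes the summed squared deviation of several response columns at once, which is the usual MRT splitting criterion. The elevations, the two-species abundance table, and the function names are hypothetical and only illustrate the mechanics; they are not the study's data or code.

```python
def multivariate_sse(rows):
    """Sum of squared deviations from the column means over all response columns."""
    if not rows:
        return 0.0
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return sum((value - mean) ** 2 for row in rows for value, mean in zip(row, means))


def best_mrt_split(predictor, responses):
    """Return (threshold, sse) of the best binary split on one predictor.

    The threshold is None if no split reduces the multivariate SSE.
    """
    order = sorted(range(len(predictor)), key=lambda i: predictor[i])
    best = (None, multivariate_sse(responses))
    for k in range(1, len(order)):
        low, high = predictor[order[k - 1]], predictor[order[k]]
        if low == high:
            continue  # no threshold fits between identical predictor values
        left = [responses[i] for i in order[:k]]
        right = [responses[i] for i in order[k:]]
        sse = multivariate_sse(left) + multivariate_sse(right)
        if sse < best[1]:
            best = ((low + high) / 2, sse)
    return best


# Hypothetical plots: elevation (m) and abundances of two tree species per plot.
elevation = [300, 450, 600, 900, 1100, 1300]
abundance = [[9, 1], [8, 2], [9, 1], [1, 8], [2, 9], [1, 9]]
threshold, sse = best_mrt_split(elevation, abundance)
print(threshold, sse)
```

Here the lowland plots are dominated by the first species and the upland plots by the second, so the split falls between 600 m and 900 m.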
Plant epidemics are often associated with weather-related variables, but it is difficult to identify weather-related predictors for models that predict plant epidemics. To predict Fusarium head blight (FHB) epidemics of wheat, Shah et al. explored a functional approach using scalar-on-function regression to model a binary outcome (FHB epidemic or non-epidemic) with respect to weather time series spanning 140 days relative to anthesis. The scalar-on-function models fit the data better than previously described logistic regression models. In this work, given the same dataset and models, we attempt to reproduce the results of Shah et al. using a different approach, boosted regression trees. After fitting, the classification accuracy and model statistics are surprisingly good.
Tree-based models have been widely applied in both academic and industrial settings due to their natural interpretability, good predictive accuracy, and high scalability. In this paper, we focus on improving the single-tree method and propose the segmented linear regression trees (SLRT) model, which replaces the traditional constant leaf model with linear ones. From the parametric view, SLRT can be employed as a recursive change-point detection procedure for segmented linear regression (SLR) models, which is much more efficient and flexible than the traditional grid search method. Along these lines, we propose to use the conditional Kendall's τ correlation coefficient to select the underlying change points. From the non-parametric view, we propose an efficient greedy splitting method that selects the splits by analyzing the association between residuals and each candidate split variable. Further, with the SLRT as a single-tree predictor, we propose a linear random forest approach that aggregates the SLRTs by a weighted average. Both simulation and empirical studies showed significant improvements over CART trees and even the random forest.
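The difference between a constant leaf and a linear leaf can be sketched with a single split. The example below is not the SLRT algorithm (it uses neither the conditional Kendall's τ selection nor the residual-based greedy search); it is the plain exhaustive search that the abstract describes as the traditional baseline, fitting a separate least-squares line on each side of every candidate threshold. The data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def best_linear_split(xs, ys, min_leaf=2):
    """Exhaustively search thresholds; each side gets its own fitted line."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    xs = [xs[i] for i in order]
    ys = [ys[i] for i in order]
    best = None
    for i in range(min_leaf, len(xs) - min_leaf + 1):
        if xs[i - 1] == xs[i]:
            continue  # no threshold fits between identical x values
        left = fit_line(xs[:i], ys[:i])
        right = fit_line(xs[i:], ys[i:])
        total = left[2] + right[2]
        if best is None or total < best["sse"]:
            best = {
                "threshold": (xs[i - 1] + xs[i]) / 2,
                "left": left[:2],    # (intercept, slope) for x <= threshold
                "right": right[:2],  # (intercept, slope) for x > threshold
                "sse": total,
            }
    return best


# Hypothetical data: y = x up to x = 4, then y = 20 - 2x from x = 5 onward.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 1, 2, 3, 4, 10, 8, 6, 4]
result = best_linear_split(xs, ys)
print(result)
```

A constant-leaf tree would need many splits to approximate these two sloped segments, whereas two linear leaves recover them with one split.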
Using groundwater level monitoring data from the Shuping landslide in the Three Gorges Reservoir area, the factors influencing groundwater level were selected based on the response relationship between candidate factors, such as rainfall and reservoir level, and the change in groundwater level. A classification and regression tree (CART) model was then constructed from this subset of factors and used to predict the groundwater level. In verification, the predictions for the test sample were consistent with the measured values, with a mean absolute error of 0.28 m and a relative error of 1.15%. For comparison, a support vector machine (SVM) model constructed using the same set of factors gave a mean absolute error of 1.53 m and a relative error of 6.11%. This indicates that the CART model not only has better fitting and generalization ability but also has strong advantages in analyzing the dynamic characteristics of landslide groundwater and in screening important variables. It is an effective method for predicting groundwater level in landslides.
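The two error measures quoted above can be computed as in the following sketch. The abstract does not define "relative error", so this assumes it is the mean of |predicted − observed| / |observed|; the groundwater levels are hypothetical values chosen only to exercise the functions.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference, in the units of the data (here metres)."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def mean_relative_error(observed, predicted):
    """Average of |error| / |observed|; assumes no observed value is zero."""
    return sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical groundwater levels (m): measured versus model output.
observed = [170.0, 172.0, 175.0, 160.0]
predicted = [170.5, 171.5, 175.0, 161.0]

mae = mean_absolute_error(observed, predicted)
mre = mean_relative_error(observed, predicted)
print(f"MAE = {mae:.2f} m, relative error = {100 * mre:.2f}%")
```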
To address the poor generalization ability of the back-propagation (BP) neural network in model updating hybrid tests, the AdaBoost regression tree algorithm is introduced into the model updating procedure. During the learning phase, the regression tree is selected as the weak regression model to be trained, and multiple trained weak regression models are then integrated into a strong regression model. Finally, the prediction is generated through voting by all the selected regression models. A 2-DOF nonlinear structure was numerically simulated using the online AdaBoost regression tree algorithm, with the BP neural network algorithm as a comparison. The results show that the prediction accuracy of the online AdaBoost regression tree algorithm is 48.3% higher than that of the BP neural network algorithm, which verifies that it has better generalization ability. Furthermore, it can effectively eliminate the influence of weight initialization and improve the prediction accuracy of the restoring force in hybrid tests.
Cable-stayed bridges have been widely used in high-speed railway infrastructure. Accurate determination of the cable’s representative temperatures is vital during the design, construction, and maintenance of cable-stayed bridges. However, the representative temperatures of stay cables are not specified in existing design codes. To address this issue, this study investigates the distribution of the cable temperature and determines its representative temperature. First, an experimental investigation spanning one year was carried out near the bridge site to obtain the temperature data. Statistical analysis of the measured data reveals that the temperature distribution is generally uniform across the cable cross-section, without a significant temperature gradient. Then, based on the limited data, the Monte Carlo, gradient boosted regression trees (GBRT), and univariate linear regression (ULR) methods are employed to predict the cable’s representative temperature throughout the service life. These methods effectively overcome the limitation of insufficient monitoring data and accurately predict the representative temperature of the cables. However, each method has its own advantages and limitations in terms of applicability and accuracy. A comprehensive evaluation of the performance of these methods is conducted, and practical recommendations are provided for their application. The proposed methods and representative temperatures provide a good basis for the operation and maintenance of in-service long-span cable-stayed bridges.
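For readers unfamiliar with gradient boosted regression trees, the following minimal sketch shows the core loop for squared-error loss: each round fits a one-split tree (a stump) to the current residuals and adds a shrunken copy of it to the ensemble. It is a generic illustration, not the study's model; the air and cable temperatures, the hyperparameters, and all function names are hypothetical, and the stump fitter assumes the predictor has at least two distinct values.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split by squared error.

    Returns (threshold, left_mean, right_mean). Assumes xs contains at
    least two distinct values.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        low, high = xs[order[k - 1]], xs[order[k]]
        if low == high:
            continue
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (low + high) / 2, left_mean, right_mean)
    return best[1:]


def fit_gbrt(xs, ys, n_trees=100, learning_rate=0.1):
    """Boost stumps on residuals; returns (base, learning_rate, stumps)."""
    base = sum(ys) / len(ys)  # initial prediction: the mean response
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, preds)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and every shrunken stump contribution."""
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (left_mean if x <= threshold else right_mean)
        for threshold, left_mean, right_mean in stumps
    )


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical air temperature (deg C) versus cable temperature (deg C).
air = [-5, 0, 5, 10, 15, 20, 25, 30]
cable = [-3.0, 1.5, 6.0, 12.0, 18.5, 24.0, 30.5, 37.0]

model = fit_gbrt(air, cable)
mse_baseline = mse(cable, [sum(cable) / len(cable)] * len(cable))
mse_boosted = mse(cable, [predict(model, x) for x in air])
print(mse_baseline, mse_boosted)
```

The learning rate trades off the number of rounds against how aggressively each stump corrects the residuals; the values used here are arbitrary.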
Modeling dynamic systems with linear parametric models usually suffers from limitations that affect forecasting performance and policy implications. This paper advances a non-parametric autoregressive distributed lag model that employs a Bayesian additive regression tree (BART). The performance of the BART model is compared with selection models such as Lasso, Elastic Net, and Bayesian networks in simulation experiments with linear and non-linear data generating processes (DGP), and on US macroeconomic time series data. The results show that the BART model is quite competitive against the linear parametric methods when the DGP is linear, and outperforms the competing methods when the DGP is non-linear. The empirical results suggest that the BART estimators are generally more efficient than the traditional linear methods when modeling and forecasting macroeconomic time series.
Hydraulic fracturing is an effective technology for hydrocarbon extraction from unconventional shale and tight gas reservoirs. A potential risk of hydraulic fracturing is the upward migration of stray gas from the deep subsurface to shallow aquifers. The stray gas can dissolve in groundwater, leading to chemical and biological reactions that could negatively affect groundwater quality and contribute to atmospheric emissions. Knowledge of light hydrocarbon solubility in the aqueous environment is essential for the numerical modelling of flow and transport in the subsurface. Herein, we compiled a database containing 2129 experimental data points for methane, ethane, and propane solubility in pure water and various electrolyte solutions over wide ranges of operating temperature and pressure. Two machine learning algorithms, namely regression tree (RT) and boosted regression tree (BRT), tuned with a Bayesian optimization algorithm (BO), were employed to determine the solubility of the gases. The predictions were compared with the experimental data as well as with four well-established thermodynamic models. Our analysis shows that the BRT-BO is sufficiently accurate, and the predicted values agree well with those obtained from the thermodynamic models. The coefficient of determination (R^2) between experimental and predicted values is 0.99 and the mean squared error (MSE) is 9.97×10^(-8). The leverage statistical approach further confirmed the validity of the model developed.
Sustainable intensification of cultivated land use (SICLU) and large-scale operations (LSO) are widely acknowledged strategies for enhancing agricultural performance. However, the existing literature has faced challenges in precisely defining SICLU and constructing comprehensive indicators, which has hindered the exploration of factors influencing LSO within the SICLU framework. To address this gap, we integrated self-efficacy theory into the design of an index framework for evaluating SICLU. We subsequently employed econometric models to analyze the significant factors that impact LSO. Our findings reveal that SICLU can be divided into four key dimensions: intensive management, efficient output, resource conservation, and ecological environment optimization. Furthermore, it is crucial to incorporate belief-based cognitive factors into the index system, as farmers’ understanding of fertilizer and pesticide application significantly influences their willingness to engage in LSO. Moreover, we identify grain market turnover as the most influential factor in promoting LSO, with single-factor contribution rates reaching 70.9% for cultivated land transfer willingness and 62.5% for the total planting area. Interestingly, unlike irrigation and agricultural machinery inputs, increased labor inputs correspond to larger planting areas for farmers. This trend may be attributed to reduced labor availability because of rural labor migration, whereas the reduction in irrigation and agricultural input is contingent on innovations in production practices and the transfer of cultivated land management rights. Importantly, SICLU dynamically influences LSO, with each index related to SICLU having an optimal range that fosters LSO. These insights offer valuable guidance for policymakers, emphasizing farmers as their central focus, with the adjustment of input and output factors as a means to achieve LSO as the ultimate goal. In conclusion, we propose research avenues for further enriching the SICLU framework to ensure that it aligns with the specific characteristics of regional agricultural development.
In enterprise operations, maintaining manual rules for enterprise processes can be expensive, time-consuming, and dependent on specialized knowledge of that enterprise domain. Recently, rule generation has been automated in enterprises, particularly through machine learning, to streamline routine tasks. Typically, these machine learning models are black boxes in which the reasons for decisions are not always transparent, and end users need to verify the model proposals as part of user acceptance testing in order to trust them. In such scenarios, rules excel over machine learning models because end users can verify the rules and therefore trust them more. In many scenarios, the truth label changes frequently, so it is difficult for a machine learning model to learn until a considerable amount of data has accumulated, whereas rules can be adapted to the new truth. This paper presents a novel framework for generating human-understandable rules using the Classification and Regression Tree (CART) decision tree method, which ensures both optimization and user trust in automated decision-making processes. The framework generates comprehensible rules of the form "if condition, then predicted class", even in domains where noise is present. The proposed system transforms enterprise operations by automating the production of human-readable rules from structured data, resulting in increased efficiency and transparency. Removing the need for manual rule construction saves time and money while guaranteeing that users can readily check and trust the automatic judgments of the system. The performance metrics of the framework, which achieves 99.85% accuracy and 96.30% precision, further support its efficiency in translating complex data into comprehensible rules, ultimately empowering users and enhancing organizational decision-making processes.
We quantified deviations in regional forest biomass between simple extrapolation of plot data by the biomass expansion factor (BEF) method and estimates obtained from a local biomass model, based on large-scale empirical field inventory sampling data. The sources and relative contributions of deviations between the two models were analyzed with the boosted regression trees method. Relative to the local model, BEF overestimated accumulative biomass by 22.12%. The predominant sources of the total deviation (70.94%) were stand-structure variables, with stand age and diameter at breast height being the major factors. Compared with biotic variables, abiotic variables had a smaller overall contribution (29.06%), with elevation and soil depth being the most important among the examined abiotic factors. Large deviations in regional forest biomass and carbon stock estimates are likely to be obtained with BEF relative to estimates based on local data. To minimize deviations, stand age and elevation should be included in regional forest-biomass estimation.
BACKGROUND Liver disease refers to any pathology that can harm or destroy the liver or prevent it from functioning normally. The global community has recently witnessed an increase in the mortality rate due to liver disease. This could be attributed to many factors, among which are human habits, awareness issues, poor healthcare, and late detection. To curb the growing threat from liver disease, early detection is critical to help reduce the risks and improve treatment outcomes. Emerging technologies such as machine learning, as shown in this study, could be deployed to assist in enhancing its prediction and treatment. AIM To present a more efficient system for timely prediction of liver disease using a hybrid eXtreme Gradient Boosting model with hyperparameter tuning, with a view to assisting in early detection, diagnosis, and reduction of the risks and mortality associated with the disease. METHODS The dataset used in this study consisted of 416 people with liver problems and 167 with no such history. The data were collected from the state of Andhra Pradesh, India, through https://www.kaggle.com/datasets/uciml/indian-liver-patientrecords. The population was divided into two sets depending on the disease state of the patient. This binary information was recorded in the attribute "is_patient". RESULTS The results indicated that the chi-square automated interaction detection and classification and regression trees models achieved accuracy levels of 71.36% and 73.24%, respectively, which was much better than the conventional method. The proposed solution would assist patients and physicians in tackling the problem of liver disease and ensuring that cases are detected early, to prevent progression to cirrhosis (scarring) and to enhance the survival of patients. The study showed the potential of machine learning in health care, especially as it concerns disease prediction and monitoring. CONCLUSION This study contributed to the knowledge of machine learning applications in health and to the efforts toward combating the problem of liver disease. However, relevant authorities have to invest more in machine learning research and other health technologies to maximize their potential.
Background: Tropical dry forests cover less than 13% of the world's tropical forests, and their area and biodiversity are declining. In southern Africa, the major threat is increasing population pressure, while drought caused by climate change is a potential threat in the drier transition zones to shrub land. Monitoring climate change impacts in these transition zones is difficult because there is inadequate information on forest composition to allow disentanglement from other environmental drivers. Methods: This study combined historical and modern forest inventories covering an area of 21,000 km^2 in a transition zone in Namibia and Angola to distinguish late succession tree communities, to understand their dependence on site factors, and to detect trends in forest composition over the last 40 years. Results: The woodlands were dominated by six tree species that represented 84% of the total basal area and can be referred to as Baikiaea - Pterocarpus woodlands. A boosted regression tree analysis revealed that late succession tree communities are primarily determined by climate and topography. The Schinziophyton rautanenii and Baikiaea plurijuga communities are common on slightly inclined dune or valley slopes and had the highest basal area (5.5 - 6.2 m^2 ha^-1). The Burkea africana - Guibourtia coleosperma and Pterocarpus angolensis - Dialium englerianum communities are typical of the sandy plateaux and have a higher proportion of smaller stems, caused by a higher fire frequency. A decrease in overall basal area or a trend of increasing domination by the more drought- and cold-resilient B. africana community was not confirmed by the historical data, but there were significant decreases in basal area for Ochna pulchra and the valuable fruit tree D. englerianum. Conclusions: The slope communities are more sheltered from fire, frost, and drought but are more susceptible to human expansion. The community with the important timber tree P. angolensis can best withstand high fire frequency but shows signs of a higher vulnerability to climate change. Conservation and climate adaptation strategies should include protection of the slope communities through refuges. Follow-up studies are needed on short-term dynamics, especially near the edges of the transition zone towards shrub land.
Interior Alaska has a short growing season of 110 d. Knowledge of the timing of crop flowering and maturity provides information for agricultural decision making. In this study, six machine learning algorithms, namely Linear Discriminant Analysis (LDA), Support Vector Machines (SVMs), k-nearest neighbor (kNN), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Random Forest (RF), were selected to forecast the timing of barley flowering and maturity based on the Alaska Crop Datasets and climate data from 1991 to 2016 in Fairbanks, Alaska. Among 32 models fit to forecast flowering time, two from LDA, 12 from SVMs, four from NB, and three from RF outperformed models from other algorithms, achieving the highest accuracy. Models from kNN performed worst in forecasting flowering time. Among 32 models fit to forecast maturity time, two models from LDA outperformed the models from other algorithms. Models from kNN and RPART performed worst in forecasting maturity time. Models from machine learning methods also provided a variable importance explanation. In this study, four out of six algorithms gave the same variable importance order. Sowing date was the most important variable for forecasting flowering but a less important variable for forecasting maturity. The daily maximum temperature may be more important than the daily minimum temperature for fitting flowering models, while the daily minimum temperature may be more important than the daily maximum temperature for fitting maturity models. The results indicate that machine learning models are a promising technique for forecasting the timing of flowering and maturity of barley.
This article aims to assess health habits, safety behaviors, and anxiety factors in the community during the novel coronavirus disease (COVID-19) pandemic in Saudi Arabia, based on primary data collected through a questionnaire with 320 respondents. In other words, this paper aims to provide empirical insights into the correlation and correspondence between sociodemographic factors (gender, nationality, age, citizenship factors, income, and education) and psycho-behavioral effects on individuals in response to the emergence of this new pandemic. To focus on the interaction between these variables and their effects, we suggest different methods of analysis, comprising regression trees and support vector machine regression (SVMR) algorithms. According to the regression tree results, the age variable plays a predominant role in health habits, safety behaviors, and anxiety. The health habit index, which focuses on the extent of behavioral change toward commitment to using health and protection methods, is highly affected by gender and age. The average monthly income is also a relevant factor but has contrasting effects during the COVID-19 pandemic period. The results of the SVMR model reveal a strong positive effect of income, with R^(2) values of 99.59%, 99.93%, and 99.88% corresponding to health habits, safety behaviors, and anxiety, respectively.
Wholesale and retail markets for electricity and power require consumers to forecast electricity consumption at different time intervals. The study aims to increase the economic efficiency of the enterprise by introducing an algorithm for forecasting electric energy consumption while leaving the technological process unchanged. A high-quality forecast can substantially reduce the cost of electrical energy, because power cannot be stockpiled. When excess electrical power is bought, costs increase either through selling it on the balancing energy market or through maintaining reserve capacity. If the purchased power is insufficient, costs increase because additional capacity must be purchased. This paper illustrates three methods of forecasting electric energy consumption: the autoregressive integrated moving average method, artificial neural networks, and classification and regression trees. Actual electrical energy consumption data were used to make day-, week-, and month-ahead predictions. The performance of the prediction models was tested in the Statistica simulation environment. Analysis of the economic efficiency of the prediction methods demonstrated that using the artificial neural networks method for short-term forecasts reduced the cost of electricity most efficiently. However, for mid-range predictions, the classification and regression tree was the most efficient method for a Jerky Enterprise. The results indicate that reducing the calculation error decreases the expense of purchasing electric energy.
It is now widely recognized that the statistical property of long memory may be due to reasons other than the data generating process being fractionally integrated. We propose a new procedure aimed at distinguishing between a null hypothesis of unifractal fractionally integrated processes and an alternative hypothesis of other processes which display the long memory property. The procedure is based on a pair of empirical, but consistently defined, statistics, namely the number of breaks reported by Atheoretical Regression Trees (ART) and the range of the Empirical Fluctuation Process (EFP) in the CUSUM test. The new procedure establishes through simulation the bivariate distribution of the number of breaks reported by ART with the CUSUM range for simulated fractionally integrated series. This bivariate distribution is then used to empirically construct a test which rejects the null hypothesis for a candidate series if its pair of statistics lies on the periphery of the bivariate distribution determined from simulation under the null. We apply these methods to the realized volatility series of 16 stocks in the Dow Jones Industrial Average and show that the rejection rate of the null is higher than if either statistic were used as a univariate test.
In the Acadian Forest Region of northeastern North America, forest managers are under increasing public pressure to restore the forest to a more historic, natural condition by reducing clearcutting and promoting partial-cut treatments that more closely emulate historic, local natural disturbance regimes. However, although numerous studies on the effects of partial cutting on forest regeneration response have been conducted in surrounding temperate and boreal forest ecosystems, few studies directly explore responses to various forms of harvesting within the Acadian Forest ecosystem, with its unique mixture of northern hardwoods and boreal forest species. Here, we conducted one of the first retrospective studies on forest regeneration following a variety of harvesting methods in the Acadian Forest, using univariate and multivariate regression trees to assess regeneration response in 50 naturally regenerating, harvested forest sites in New Brunswick, Canada. Our study shows that regeneration was highly influenced by harvest type, overstory composition, and environmental conditions as reflected by ecoregion classification. Canopy opening size (as controlled by harvest method) significantly influenced the dominance of regenerating species. The presence of conspecific overstory trees increased the likelihood of their regeneration following disturbance, supporting the direct-regeneration hypothesis, especially for species with limited seed dispersal (e.g., sugar maple (Acer saccharum Marsh.) and American beech (Fagus grandifolia Ehrh.)). Despite reported problems elsewhere in eastern North America, neither American beech nor balsam fir (Abies balsamea (L.) Mill.) constituted significant competition for the desired species on a broad scale, but the presence of beech was a significant deterrent for yellow birch (Betula alleghaniensis Britt.).
Abstract: Increasing competition, economic recession, and financial crises have increased business failures, and researchers have accordingly sought new approaches that yield more accurate and more reliable results. The classification and regression tree (CART) is one of the modeling techniques developed for this purpose. In this study, the CART method is explained and its power to predict financial failure is tested. CART is applied to data from industrial companies traded on the Istanbul Stock Exchange (ISE) between 1997 and 2007. The results show that CART has high predictive power for financial failure one, two, and three years prior to failure, and that profitability ratios are the most important ratios in predicting failure.
Funding: Financially supported by the National Key R&D Program of China (2021YFD220040403 and 2021YFD220040304) and the China Scholarship Council (202107565021).
Abstract: Background: Vegetation distribution maps are of great significance for nature protection and management. In diverse tropical forests, accurate spatial mapping of vegetation types is challenging; the high species diversity and abundance of rare species challenge classification concepts, while remote sensing signals may not vary systematically with species composition, complicating the technical capability for delineating vegetation types in the landscape. Methods: We used a combination of field-based compositional data and their relations to environmental variables to predict the distribution of forest types in the Wuzhishan National Natural Reserve (WNNR), Hainan Island, China, using multivariate regression trees (MRT). The MRT was based on arboreal vegetation composition in 132 plots of 20 m × 20 m with a regular spacing of 1 km. Apart from the MRT, non-metric multidimensional scaling (NMDS) was used to evaluate vegetation-environment relationships. Results: The MRT model worked best when using 14 key environmental variables including topography, climate, latitude and soil, although the difference with the simpler model including only topographical variables was small. The full model classified the 132 plots into 3 vegetation types, 6 formation groups, 20 formations and 65 associations at different hierarchical syntaxonomic levels. This model was the basis for forest vegetation maps for the WNNR. MRT and NMDS showed that elevation was the main driving force for the distribution of vegetation types and formation groups. Climate, latitude, and soil (especially available P), together with topographic variables, all influenced the distribution of formations and associations. Conclusions: While elevation determines forest-type distributions, lower-level syntaxonomic forest classes respond to the topographic diversity typical for mountains. Apart from providing the first detailed forest vegetation map for any part of WNNR, we show how, in spite of limitations, MRT with existing environmental data can be a useful method for mapping diverse and remote tropical forests.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12071173 and 12171192) and the Huaian Key Laboratory for Infectious Diseases Control and Prevention (HAP201704).
Abstract: Plant epidemics are often associated with weather-related variables, but it is difficult to identify weather-related predictors for models predicting plant epidemics. To predict Fusarium head blight (FHB) epidemics of wheat, Shah et al. explored a functional approach using scalar-on-function regression to model a binary outcome (FHB epidemic or non-epidemic) with respect to weather time series spanning 140 days relative to anthesis. The scalar-on-function models fit the data better than previously described logistic regression models. In this work, given the same dataset and models, we attempt to reproduce the article by Shah et al. using a different approach, boosted regression trees. After fitting, the classification accuracy and model statistics are surprisingly good.
Abstract: Tree-based models have been widely applied in both academic and industrial settings due to their natural interpretability, good predictive accuracy, and high scalability. In this paper, we focus on improving the single-tree method and propose the segmented linear regression trees (SLRT) model, which replaces the traditional constant leaf model with linear ones. From the parametric view, SLRT can be employed as a recursive change-point detection procedure for segmented linear regression (SLR) models, which is much more efficient and flexible than the traditional grid search method. Along this line, we propose to use the conditional Kendall's τ correlation coefficient to select the underlying change points. From the non-parametric view, we propose an efficient greedy splitting method that selects the splits by analyzing the association between residuals and each candidate split variable. Further, with the SLRT as a single-tree predictor, we propose a linear random forest approach that aggregates the SLRTs by a weighted average. Both simulation and empirical studies showed significant improvements over CART trees and even the random forest.
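The idea of replacing constant leaves with linear ones can be illustrated with a small sketch. It is not the authors' SLRT algorithm: it uses a plain exhaustive search over one feature (the grid-search style the abstract contrasts with) rather than the residual-association or Kendall's τ selection, and the function names `fit_line`, `line_sse`, and `best_linear_split` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # All x identical: fall back to a constant leaf.
        return my, 0.0
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def line_sse(xs, ys):
    """Sum of squared residuals of the fitted line."""
    intercept, slope = fit_line(xs, ys)
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def best_linear_split(xs, ys, min_leaf=2):
    """Return (threshold, sse) of the split that minimizes the summed SSE
    of two linear leaves; threshold is None if no split beats a single line."""
    pairs = sorted(zip(xs, ys))
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]
    best = (None, line_sse(sx, sy))
    for k in range(min_leaf, len(pairs) - min_leaf + 1):
        if sx[k - 1] == sx[k]:
            continue  # cannot split between identical feature values
        total = line_sse(sx[:k], sy[:k]) + line_sse(sx[k:], sy[k:])
        if total < best[1]:
            best = ((sx[k - 1] + sx[k]) / 2, total)
    return best


# Two linear regimes with a change point between x = 4 and x = 5.
xs = list(range(10))
ys = [2 * x for x in range(5)] + [30 - x for x in range(5, 10)]
threshold, sse = best_linear_split(xs, ys)
```

On this toy data a constant-leaf tree would need several splits to approximate each regime, whereas a single split with linear leaves recovers both segments.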
Funding: Supported by the China Earthquake Administration, Institute of Seismology Foundation (IS201526246).
Abstract: Using groundwater level monitoring data from the Shuping landslide in the Three Gorges Reservoir area, the factors influencing groundwater level were selected based on the response relationship between candidate factors, such as rainfall and reservoir level, and the change in groundwater level. A classification and regression tree (CART) model was then constructed from the selected subset and used to predict the groundwater level. In verification, the predictions for the test sample were consistent with the measured values, with a mean absolute error of 0.28 m and a relative error of 1.15%. For comparison, a support vector machine (SVM) model constructed using the same set of factors gave a mean absolute error of 1.53 m and a relative error of 6.11%. This indicates that the CART model not only has better fitting and generalization ability but also has strong advantages in analyzing the dynamic characteristics of landslide groundwater and in screening important variables. It is an effective method for predicting groundwater level in landslides.
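As a rough illustration of the two ingredients mentioned above, the sketch below shows how a CART-style regression split on one feature can be chosen by minimizing the summed squared error of the two child nodes, and how mean absolute error and mean relative error can be computed. The data and function names are hypothetical, and the relative-error definition (mean of |error| / |actual|) is an assumption, since the abstract does not state the formula used.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the best binary split of y on x.
    threshold is None when no split improves on the unsplit node."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


def mean_absolute_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mean_relative_error(actual, predicted):
    # Assumed definition: average of |error| / |actual|; actual must be nonzero.
    return mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


# Hypothetical example: one predictor (e.g. reservoir level) and a response.
x = [1, 2, 3, 10, 11, 12]
y = [5, 5, 5, 9, 9, 9]
threshold, total = best_split(x, y)
```

A full CART model applies this search recursively over all candidate features and then prunes the tree; only the single-split step is shown here.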
Funding: The National Natural Science Foundation of China (No. 51708110).
Abstract: To address the poor generalization ability of the back-propagation (BP) neural network in model updating hybrid tests, a novel method called the AdaBoost regression tree algorithm is introduced into the model updating procedure. During the learning phase, the regression tree is selected as a weak regression model to be trained, and multiple trained weak regression models are then integrated into a strong regression model. Finally, the training results are generated through voting by all the selected regression models. A 2-DOF nonlinear structure was numerically simulated using the online AdaBoost regression tree algorithm, with the BP neural network algorithm as a contrast. The results show that the prediction accuracy of the online AdaBoost regression tree algorithm is 48.3% higher than that of the BP neural network algorithm, which verifies that the online AdaBoost regression tree algorithm has better generalization ability than the BP neural network algorithm. Furthermore, it can effectively eliminate the influence of weight initialization and improve the prediction accuracy of the restoring force in hybrid tests.
Funding: Project (2017G006-N) supported by the Science and Technology Research and Development Program of China Railway Corporation.
Abstract: Cable-stayed bridges have been widely used in high-speed railway infrastructure. The accurate determination of the cable's representative temperatures is vital during the intricate processes of design, construction, and maintenance of cable-stayed bridges. However, the representative temperatures of stayed cables are not specified in the existing design codes. To address this issue, this study investigates the distribution of the cable temperature and determines its representative temperature. First, an experimental investigation spanning one year was carried out near the bridge site to obtain the temperature data. Statistical analysis of the measured data reveals that the temperature distribution is generally uniform along the cable cross-section, without a significant temperature gradient. Then, based on the limited data, the Monte Carlo, gradient boosted regression trees (GBRT), and univariate linear regression (ULR) methods are employed to predict the cable's representative temperature throughout the service life. These methods effectively overcome the limitations of insufficient monitoring data and accurately predict the representative temperature of the cables. However, each method has its own advantages and limitations in terms of applicability and accuracy. A comprehensive evaluation of the performance of these methods is conducted, and practical recommendations are provided for their application. The proposed methods and representative temperatures provide a good basis for the operation and maintenance of in-service long-span cable-stayed bridges.
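For readers unfamiliar with gradient boosted regression trees, the following is a minimal sketch of the general technique with squared-error loss, where each round fits a one-split tree (a stump) to the current residuals. It is not the study's model: the single feature, the stump depth, the number of rounds, the learning rate, and all function names are illustrative assumptions.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree; return (threshold, left_value, right_value)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue  # no threshold fits between identical values
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, (lo + hi) / 2, lv, rv)
    if best is None:
        # All feature values identical: predict the mean residual everywhere.
        m = sum(residuals) / len(residuals)
        return (x[0], m, m)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def fit_gbrt(x, y, n_rounds=100, learning_rate=0.1):
    """Start from the mean of y, then repeatedly fit stumps to the residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, xi)
                for pi, xi in zip(pred, x)]
    return base, stumps, learning_rate


def predict_gbrt(model, xi):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


# Hypothetical nonlinear target on a single feature.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v * v for v in x]
model = fit_gbrt(x, y)
```

The small learning rate shrinks each stump's contribution, so many weak trees are combined gradually; in practice a library implementation with deeper trees and validation-based tuning would be used.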
Abstract: Modeling dynamic systems with linear parametric models usually suffers from limitations that affect forecasting performance and policy implications. This paper advances a non-parametric autoregressive distributed lag model that employs a Bayesian additive regression tree (BART). The performance of the BART model is compared with selection models like Lasso, Elastic Net, and Bayesian networks in simulation experiments with linear and non-linear data generating processes (DGP), and on US macroeconomic time series data. The results show that the BART model is quite competitive against the linear parametric methods when the DGP is linear, and outperforms the competing methods when the DGP is non-linear. The empirical results suggest that the BART estimators are generally more efficient than the traditional linear methods when modeling and forecasting macroeconomic time series.
Abstract: Hydraulic fracturing is an effective technology for hydrocarbon extraction from unconventional shale and tight gas reservoirs. A potential risk of hydraulic fracturing is the upward migration of stray gas from the deep subsurface to shallow aquifers. The stray gas can dissolve in groundwater, leading to chemical and biological reactions that could negatively affect groundwater quality and contribute to atmospheric emissions. Knowledge of light hydrocarbon solubility in the aqueous environment is essential for the numerical modelling of flow and transport in the subsurface. Herein, we compiled a database containing 2129 experimental data points of methane, ethane, and propane solubility in pure water and various electrolyte solutions over wide ranges of operating temperature and pressure. Two machine learning algorithms, namely the regression tree (RT) and the boosted regression tree (BRT), tuned with a Bayesian optimization algorithm (BO), were employed to determine the solubility of gases. The predictions were compared with the experimental data as well as four well-established thermodynamic models. Our analysis shows that the BRT-BO is sufficiently accurate, and the predicted values agree well with those obtained from the thermodynamic models. The coefficient of determination (R^2) between experimental and predicted values is 0.99 and the mean squared error (MSE) is 9.97×10^(-8). The leverage statistical approach further confirmed the validity of the model developed.
Funding: Under the auspices of the National Natural Science Foundation of China (No. 42071226, 41671176) and the Taishan Scholars Youth Expert Support Plan of Shandong Province (No. TSQN202306183).
Abstract: Sustainable intensification of cultivated land use (SICLU) and large-scale operations (LSO) are widely acknowledged strategies for enhancing agricultural performance. However, the existing literature has faced challenges in precisely defining SICLU and constructing comprehensive indicators, which has hindered the exploration of factors influencing LSO within the SICLU framework. To address this gap, we integrated self-efficacy theory into the design of an index framework for evaluating SICLU. We subsequently employed econometric models to analyze the significant factors that impact LSO. Our findings reveal that SICLU can be divided into four key dimensions: intensive management, efficient output, resource conservation, and ecological environment optimization. Furthermore, it is crucial to incorporate belief-based cognitive factors into the index system, as farmers' understanding of fertilizer and pesticide application significantly influences their willingness to engage in LSO. Moreover, we identify grain market turnover as the most influential factor in promoting LSO, with single-factor contribution rates reaching 70.9% for cultivated land transfer willingness and 62.5% for the total planting areas. Interestingly, unlike irrigation and agricultural machinery inputs, increased labor inputs correspond to larger planting areas for farmers. This trend may be attributed to reduced labor availability because of rural labor migration, whereas the reduction in irrigation and agricultural input is contingent on innovations in production practices and the transfer of cultivated land management rights. Importantly, SICLU dynamically influences LSO, with each index related to SICLU having an optimal range that fosters LSO. These insights offer valuable guidance for policymakers, emphasizing farmers as their central focus, with the adjustment of input and output factors as a means to achieve LSO as the ultimate goal. In conclusion, we propose research avenues for further enriching the SICLU framework to ensure that it aligns with the specific characteristics of regional agricultural development.
Abstract: In enterprise operations, maintaining manual rules for enterprise processes can be expensive, time-consuming, and dependent on specialized domain knowledge. Recently, rule generation has been automated in enterprises, particularly through Machine Learning, to streamline routine tasks. Typically, these machine learning models are black boxes in which the reasons for decisions are not always transparent, and end users need to verify the model proposals as part of user acceptance testing before they can trust them. In such scenarios, rules excel over Machine Learning models because end users can verify the rules and place more trust in them. In many scenarios, the truth label changes frequently, so it becomes difficult for a Machine Learning model to learn until a considerable amount of data has been accumulated, whereas rules can be adapted to the new truth. This paper presents a novel framework for generating human-understandable rules using the Classification and Regression Tree (CART) decision tree method, which ensures both optimization and user trust in automated decision-making processes. The framework generates comprehensible rules of the form "if condition, then predicted class", even in domains where noise is present. The proposed system transforms enterprise operations by automating the production of human-readable rules from structured data, resulting in increased efficiency and transparency. Removing the need for human rule construction saves time and money while guaranteeing that users can readily check and trust the automatic judgments of the system. The performance metrics of the framework, 99.85% accuracy and 96.30% precision, further support its efficiency in translating complex data into comprehensible rules, eventually empowering users and enhancing organizational decision-making processes.
Funding: Supported by the Major Research Development Program of China (2016YFC0502704); National Science Foundation of China (31670645, 31470578 and 31200363); National Forestry Public Welfare Foundation of China (201304205); Fujian Provincial Department of S&T Project (2013YZ0001-1, 2015Y0083, 2016Y0083, 2016T3037 and 2016T3032); Key Laboratory of Urban Environment and Health of CAS (KLUEH-C-201701); Youth Innovation Promotion Association CAS (2014267); Key Program of the Chinese Academy of Sciences (KFZDSW-324).
Abstract: We quantified deviations in regional forest biomass from simple extrapolation of plot data by the biomass expansion factor (BEF) method versus estimates obtained from a local biomass model, based on large-scale empirical field inventory sampling data. The sources and relative contributions of deviations between the two models were analyzed by the boosted regression trees method. Relative to the local model, BEF overestimated accumulative biomass by 22.12%. The predominant sources of the total deviation (70.94%) were stand-structure variables, with stand age and diameter at breast height being the major factors. Compared with biotic variables, abiotic variables had a smaller overall contribution (29.06%), with elevation and soil depth being the most important among the examined abiotic factors. Large deviations in regional forest biomass and carbon stock estimates are likely to be obtained with BEF relative to estimates based on local data. To minimize deviations, stand age and elevation should be included in regional forest-biomass estimation.
Abstract: BACKGROUND Liver disease indicates any pathology that can harm or destroy the liver or prevent it from functioning normally. The global community has recently witnessed an increase in the mortality rate due to liver disease. This could be attributed to many factors, among which are human habits, awareness issues, poor healthcare, and late detection. To curb the growing threats from liver disease, early detection is critical to help reduce the risks and improve treatment outcomes. Emerging technologies such as machine learning, as shown in this study, could be deployed to assist in enhancing its prediction and treatment. AIM To present a more efficient system for timely prediction of liver disease using a hybrid eXtreme Gradient Boosting model with hyperparameter tuning, with a view to assisting in early detection, diagnosis, and reduction of risks and mortality associated with the disease. METHODS The dataset used in this study consisted of 416 people with liver problems and 167 with no such history. The data were collected from the state of Andhra Pradesh, India, through https://www.kaggle.com/datasets/uciml/indian-liver-patientrecords. The population was divided into two sets depending on the disease state of the patient. This binary information was recorded in the attribute "is_patient". RESULTS The results indicated that the chi-square automated interaction detection and classification and regression trees models achieved accuracy levels of 71.36% and 73.24%, respectively, which was much better than the conventional method. The proposed solution would assist patients and physicians in tackling the problem of liver disease and ensuring that cases are detected early to prevent it from developing into cirrhosis (scarring) and to enhance the survival of patients. The study showed the potential of machine learning in health care, especially as it concerns disease prediction and monitoring. CONCLUSION This study contributed to the knowledge of machine learning application to health and to the efforts toward combating the problem of liver disease. However, relevant authorities have to invest more in machine learning research and other health technologies to maximize their potential.
Funding: Support of The Future Okavango (TFO) and SASSCAL projects, which were funded by the German Federal Ministry of Education and Research under promotion numbers 01 LL 0912 A and 01 LG1201 M, respectively; support by the KLIMOS ACROPOLIS research platform (Belgian Development Aid through VLIR/ARES).
Abstract: Background: Tropical dry forests cover less than 13 % of the world's tropical forests and their area and biodiversity are declining. In southern Africa, the major threat is increasing population pressure, while drought caused by climate change is a potential threat in the drier transition zones to shrub land. Monitoring climate change impacts in these transition zones is difficult as there is inadequate information on forest composition to allow disentanglement from other environmental drivers. Methods: This study combined historical and modern forest inventories covering an area of 21,000 km^2 in a transition zone in Namibia and Angola to distinguish late succession tree communities, to understand their dependence on site factors, and to detect trends in the forest composition over the last 40 years. Results: The woodlands were dominated by six tree species that represented 84 % of the total basal area and can be referred to as Baikiaea - Pterocarpus woodlands. A boosted regression tree analysis revealed that late succession tree communities are primarily determined by climate and topography. The Schinziophyton rautanenii and Baikiaea plurijuga communities are common on slightly inclined dune or valley slopes and had the highest basal area (5.5 - 6.2 m^2 ha^-1). The Burkea africana - Guibourtia coleosperma and Pterocarpus angolensis - Dialium englerianum communities are typical for the sandy plateaux and have a higher proportion of smaller stems caused by a higher fire frequency. A decrease in overall basal area or a trend of increasing domination by the more drought and cold resilient B. africana community was not confirmed by the historical data, but there were significant decreases in basal area for Ochna pulchra and the valuable fruit tree D. englerianum. Conclusions: The slope communities are more sheltered from fire, frost and drought but are more susceptible to human expansion. The community with the important timber tree P. angolensis can best withstand high fire frequency but shows signs of a higher vulnerability to climate change. Conservation and climate adaptation strategies should include protection of the slope communities through refuges. Follow-up studies are needed on short term dynamics, especially near the edges of the transition zone towards shrub land.
Abstract: Interior Alaska has a short growing season of 110 days. Knowledge of the timing of crop flowering and maturity provides information for agricultural decision making. In this study, six machine learning algorithms, namely Linear Discriminant Analysis (LDA), Support Vector Machines (SVMs), k-nearest neighbor (kNN), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Random Forest (RF), were selected to forecast the timing of barley flowering and maturity based on the Alaska Crop Datasets and climate data from 1991 to 2016 in Fairbanks, Alaska. Among 32 models fit to forecast flowering time, two from LDA, 12 from SVMs, four from NB, and three from RF outperformed models from other algorithms with the highest accuracy. Models from kNN performed worst in forecasting flowering time. Among 32 models fit to forecast maturity time, two models from LDA outperformed the models from other algorithms. Models from kNN and RPART performed worst in forecasting maturity time. Models from machine learning methods also provided a variable importance explanation. In this study, four out of six algorithms gave the same variable importance order. Sowing date was the most important variable for forecasting flowering but a less important variable for forecasting maturity. The daily maximum temperature may be more important than the daily minimum temperature for fitting flowering models, while the daily minimum temperature may be more important than the daily maximum temperature for fitting maturity models. The results indicate that models from machine learning provide a promising technique for forecasting the timing of flowering and maturity of barley.
Abstract: This article aims to assess health habits, safety behaviors, and anxiety factors in the community during the novel coronavirus disease (COVID-19) pandemic in Saudi Arabia based on primary data collected through a questionnaire with 320 respondents. In other words, this paper aims to provide empirical insights into the correlation and the correspondence between sociodemographic factors (gender, nationality, age, citizenship factors, income, and education) and psycho-behavioral effects on individuals in response to the emergence of this new pandemic. To focus on the interaction between these variables and their effects, we suggest different methods of analysis, comprising regression trees and support vector machine regression (SVMR) algorithms. According to the regression tree results, the age variable plays a predominant role in health habits, safety behaviors, and anxiety. The health habit index, which focuses on the extent of behavioral change toward the commitment to use the health and protection methods, is highly affected by gender and age factors. The average monthly income is also a relevant factor but has contrasting effects during the COVID-19 pandemic period. The results of the SVMR model reveal a strong positive effect of income, with R^(2) values of 99.59%, 99.93% and 99.88% corresponding to health habits, safety behaviors, and anxiety.
Abstract: Wholesale and retail markets for electricity and power require consumers to forecast electricity consumption at different time intervals. The study aims to increase the economic efficiency of the enterprise by introducing an algorithm for forecasting electric energy consumption without changing the technological process. A good-quality forecast makes it possible to substantially reduce the costs of electrical energy, because power cannot be stockpiled. Therefore, when buying excess electrical power, costs can increase either by selling it on the balancing energy market or by maintaining reserve capacity. If the purchased power is insufficient, the cost increase is due to the purchase of additional capacity. This paper illustrates three methods of forecasting electric energy consumption: the autoregressive integrated moving average method, artificial neural networks, and classification and regression trees. Actual electrical energy consumption data were used to make day-, week- and month-ahead predictions. The performance of the prediction models was verified in the Statistica simulation environment. Analysis of the economic efficiency of the prediction methods demonstrated that using the artificial neural networks method for short-term forecasts reduced the cost of electricity more efficiently. However, for mid-range predictions, the classification and regression tree was the most efficient method for a Jerky Enterprise. The results indicate that reducing the calculation error decreases the expenses for the purchase of electric energy.
Abstract: It is now widely recognized that the statistical property of long memory may be due to reasons other than the data generating process being fractionally integrated. We propose a new procedure aimed at distinguishing between a null hypothesis of unifractal fractionally integrated processes and an alternative hypothesis of other processes which display the long memory property. The procedure is based on a pair of empirical, but consistently defined, statistics, namely the number of breaks reported by Atheoretical Regression Trees (ART) and the range of the Empirical Fluctuation Process (EFP) in the CUSUM test. The new procedure establishes through simulation the bivariate distribution of the number of breaks reported by ART with the CUSUM range for simulated fractionally integrated series. This bivariate distribution is then used to empirically construct a test which rejects the null hypothesis for a candidate series if its pair of statistics lies on the periphery of the bivariate distribution determined from simulation under the null. We apply these methods to the realized volatility series of 16 stocks in the Dow Jones Industrial Average and show that the rejection rate of the null is higher than if either statistic was used as a univariate test.
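One of the two statistics, the range of the empirical fluctuation process, can be sketched as follows. This assumes the common OLS-based CUSUM process for a mean-only model, that is, cumulative sums of demeaned observations scaled by the residual standard deviation and the square root of the sample size; the abstract does not specify which CUSUM variant the authors use, and the function names are hypothetical.

```python
from math import sqrt


def ols_cusum_process(series):
    """Empirical fluctuation process of the OLS-CUSUM test for a constant-mean
    model: scaled cumulative sums of residuals, starting at 0."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    m = sum(series) / n
    resid = [v - m for v in series]
    sigma = sqrt(sum(r * r for r in resid) / (n - 1))
    if sigma == 0:
        raise ValueError("series is constant; process is undefined")
    process = [0.0]
    running = 0.0
    for r in resid:
        running += r
        process.append(running / (sigma * sqrt(n)))
    return process


def cusum_range(series):
    """Range (max minus min) of the empirical fluctuation process."""
    process = ols_cusum_process(series)
    return max(process) - min(process)


# A level shift produces a long excursion of the process, hence a large range;
# a series fluctuating around a fixed mean keeps the process close to zero.
shifted = [0.0] * 10 + [1.0] * 10
alternating = [0.0, 1.0] * 10
ranges = (cusum_range(shifted), cusum_range(alternating))
```

In the procedure described above, this range would be paired with the number of breaks reported by ART and compared against their simulated joint distribution under the null; that simulation step is not shown here.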
Abstract: In the Acadian Forest Region of northeastern North America, forest managers are under increasing public pressure to restore the forest to a more historic, natural condition by reducing clearcutting and promoting partial-cut treatments that more closely emulate historic, local natural disturbance regimes. However, although numerous studies on the effects of partial-cutting on forest regeneration response have been conducted in surrounding temperate and boreal forest ecosystems, there are few studies that directly explore responses to various forms of harvesting within the Acadian Forest ecosystem, with its unique mixture of northern hardwoods and boreal forest species. Here, we conducted one of the first retrospective studies on forest regeneration following a variety of harvesting methods in the Acadian Forest using univariate and multivariate regression trees to assess regeneration response in 50 naturally-regenerating, harvested forest sites in New Brunswick, Canada. Our study shows that regeneration was highly influenced by harvest type, overstory composition, and environmental conditions as reflected by ecoregion classification. Canopy opening size (as controlled by harvest method) significantly influenced the dominance of regenerating species. The presence of conspecific overstory trees increased the likelihood of their regeneration following disturbance, supporting the direct-regeneration hypothesis, especially for species with limited seed dispersal (e.g., sugar maple (Acer saccharum Marsh.) and American beech (Fagus grandifolia Ehrh.)). Despite reported problems elsewhere in eastern North America, neither American beech nor balsam fir (Abies balsamea (L.) Mill.) constituted significant competition for the desired species on a broad scale, but the presence of beech was a significant deterrent for yellow birch (Betula alleghaniensis Britt.).