624 articles found
Computational Intelligence Prediction Model Integrating Empirical Mode Decomposition, Principal Component Analysis, and Weighted k-Nearest Neighbor (cited by: 2)
1
Authors: Li Tang, He-Ping Pan, Yi-Yong Yao 《Journal of Electronic Science and Technology》 CAS CSCD 2020, Issue 4, pp. 341-349 (9 pages)
On the basis of machine learning, suitable algorithms can perform advanced time series analysis. This paper proposes a complex k-nearest neighbor (KNN) model for predicting financial time series. The model uses a complex feature extraction process that integrates forward rolling empirical mode decomposition (EMD) for financial time series signal analysis with principal component analysis (PCA) for dimension reduction. The information-rich features are extracted and then input to a weighted KNN classifier in which the features are weighted with PCA loadings. Finally, the prediction is generated via regression on the selected nearest neighbors. The structure of the model as a whole is original. Test results on real historical data sets confirm the effectiveness of the model for predicting the Chinese stock index, an individual stock, and the EUR/USD exchange rate.
Keywords: Empirical mode decomposition (EMD); k-nearest neighbor (KNN); principal component analysis (PCA); time series
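As a rough illustration of the last two steps in this abstract (feature-weighted neighbor search followed by regression on the selected neighbors), a minimal sketch might look like the following. It is not the authors' implementation: the function name `weighted_knn_predict`, the toy data, and the weight values are hypothetical, and the weights are simply assumed to be given (for example, derived from PCA loadings) rather than computed here.

```python
import math

def weighted_knn_predict(train_x, train_y, query, feature_weights, k):
    """Predict a value by averaging the targets of the k nearest training points.

    Distances are Euclidean, with each squared feature difference scaled
    by a non-negative weight (assumed to be supplied by the caller).
    """
    def dist(a, b):
        return math.sqrt(sum(w * (ai - bi) ** 2
                             for w, ai, bi in zip(feature_weights, a, b)))

    # Sort training indices by weighted distance to the query, keep the k closest.
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Hypothetical toy data: the second feature gets a very small weight,
# so neighbors are chosen almost entirely by the first feature.
X = [(0.0, 5.0), (1.0, -3.0), (2.0, 9.0), (10.0, 0.0)]
y = [1.0, 2.0, 3.0, 10.0]
prediction = weighted_knn_predict(X, y, query=(1.2, 0.0),
                                  feature_weights=(1.0, 0.01), k=2)
```

With these weights the two selected neighbors are the second and third samples, so the prediction is the mean of their targets.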
Machine learning-based models for prediction of in-hospital mortality in patients with dengue shock syndrome
2
Authors: Luan Thanh Vo, Thien Vu, Thach Ngoc Pham, Tung Huu Trinh, Thanh Tat Nguyen 《World Journal of Methodology》 2025, Issue 3, pp. 89-99 (11 pages)
BACKGROUND Severe dengue in children with critical complications has been associated with high mortality rates, varying from approximately 1% to over 20%. To date, there is a lack of data on machine-learning-based algorithms for predicting the risk of in-hospital mortality in children with dengue shock syndrome (DSS). AIM To develop machine-learning models to estimate the risk of death in hospitalized children with DSS. METHODS This single-center retrospective study was conducted at the tertiary Children's Hospital No. 2 in Viet Nam between 2013 and 2022. The primary outcome was the in-hospital mortality rate in children with DSS admitted to the pediatric intensive care unit (PICU). Nine significant features were predetermined for further analysis using machine learning models. An oversampling method was used to enhance model performance. Supervised models, including logistic regression, Naïve Bayes, Random Forest (RF), K-nearest neighbors, Decision Tree, and Extreme Gradient Boosting (XGBoost), were employed to develop predictive models. The Shapley Additive Explanation was used to determine the degree of contribution of the features. RESULTS In total, 1278 PICU-admitted children with complete data were included in the analysis. The median patient age was 8.1 years (interquartile range: 5.4-10.7). Thirty-nine patients (3%) died. The RF and XGBoost models demonstrated the highest performance. The Shapley Additive Explanations model revealed that the most important predictive features included younger age, female sex, presence of underlying diseases, severe transaminitis, severe bleeding, low platelet counts requiring platelet transfusion, elevated international normalized ratio, blood lactate, and serum creatinine, a large volume of resuscitation fluid, and a high vasoactive inotropic score (>30). CONCLUSION We developed robust machine learning-based models to estimate the risk of death in hospitalized children with DSS. The study findings are applicable to the design of management schemes to enhance survival outcomes of patients with DSS.
Keywords: Dengue shock syndrome; Dengue mortality; Machine learning; Supervised models; Logistic regression; Random forest; k-nearest neighbors; Support vector machine; Extreme Gradient Boost; Shapley additive explanations
Propagation Path Loss Models at 28 GHz Using K-Nearest Neighbor Algorithm
3
Authors: Vu Thanh Quang, Dinh Van Linh, To Thi Thao 《通讯和计算机(中英文版)》 2022, Issue 1, pp. 1-8 (8 pages)
In this paper, we develop and apply the K-Nearest Neighbor algorithm to propagation path loss regression. The path loss models describe the dependency of the attenuation value on distance using machine learning algorithms based on experimental data. The algorithm works by choosing the k nearest points and training on the dataset to find the optimal k value. The proposed method is applied to improve and adjust the path loss model at 28 GHz in the Keangnam area, Hanoi, Vietnam. Experiments in both line-of-sight and non-line-of-sight scenarios, using many combinations of transmit and receive antennas at different transmit antenna heights and random receive antenna locations, were carried out with Wireless Insite software. The results were compared with the 3GPP and NYU Wireless path loss models to verify the performance of the proposed approach.
Keywords: k-nearest neighbor regression; 5G; millimeter waves; path loss
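The idea of searching for an optimal k in a distance-to-path-loss regression can be sketched as below. This is only an assumed illustration: the abstract does not say how k was selected, so leave-one-out error is used here as one common choice, and the helper names (`knn_regress`, `loo_mse`, `best_k`) and the data points are hypothetical, not measurements from the paper.

```python
def knn_regress(xs, ys, query, k):
    """Average the targets of the k training points closest to query (1-D input)."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))
    return sum(ys[i] for i in order[:k]) / k

def loo_mse(xs, ys, k):
    """Leave-one-out mean squared error of k-NN regression for a given k."""
    total = 0.0
    for i in range(len(xs)):
        # Hold out sample i and predict it from all remaining samples.
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        total += (knn_regress(rest_x, rest_y, xs[i], k) - ys[i]) ** 2
    return total / len(xs)

def best_k(xs, ys, candidates):
    """Return the candidate k with the smallest leave-one-out error."""
    return min(candidates, key=lambda k: loo_mse(xs, ys, k))

# Hypothetical (distance in m, path loss in dB) pairs, NOT measured data.
distances = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
loss_db = [80.0, 86.0, 90.0, 92.0, 94.0, 96.0, 97.0, 98.0]
k_star = best_k(distances, loss_db, candidates=[1, 2, 3, 4])
```

Every candidate k must be smaller than the number of samples, because one sample is held out in each round.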
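k-Nearest Neighbor query algorithm based on an irregular region partitioning method (cited by: 1)
4
Authors: 张清清, 李长云, 李旭, 周玲芳, 胡淑新, 邹豪杰 《计算机系统应用》 2015, Issue 9, pp. 186-190 (5 pages)
As more and more data accumulate, the demands on data processing and analysis capability keep growing. The traditional k-Nearest Neighbor (kNN) query algorithm is limited by its regular region partitioning method, which tends to unbalance the overall computational load, and by the low data processing capacity of a single-process or single-machine runtime environment. This paper proposes and describes in detail an improved kNN query algorithm based on an irregular region partitioning method, and implements it with MapReduce, a model for distributed parallel computation over large-scale datasets. Experimental results and analysis show that, under the MapReduce framework, the kNN query algorithm based on irregular region partitioning achieves high data processing efficiency and supports efficient querying in big-data environments well.
Keywords: k-nearest neighbor (kNN) query algorithm; irregular region partitioning method; MapReduce; big data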
Mapping aboveground biomass by integrating geospatial and forest inventory data through a k-nearest neighbor strategy in North Central Mexico (cited by: 3)
5
Authors: Carlos A AGUIRRE-SALADO, Eduardo J TREVIÑO-GARZA, Oscar A AGUIRRE-CALDERÓN, Javier JIMÉNEZ-PÉREZ, Marco A GONZÁLEZ-TAGLE, José R VALDEZ-LAZALDE, Guillermo SÁNCHEZ-DÍAZ, Reija HAAPANEN, Alejandro I AGUIRRE-SALADO, Liliana MIRANDA-ARAGÓN 《Journal of Arid Land》 SCIE CSCD 2014, Issue 1, pp. 80-96 (17 pages)
As climate change negotiations progress, monitoring biomass and carbon stocks is becoming an important part of current forest research. National governments are therefore interested in developing forest-monitoring strategies using geospatial technology. Among statistical methods for mapping biomass, there is a nonparametric approach called k-nearest neighbor (kNN). We compared four variations of kNN distance metrics for the spatially explicit estimation of aboveground biomass in a portion of the Mexican north border of the intertropical zone. Satellite-derived, climatic, and topographic predictor variables were combined with Mexican National Forest Inventory (NFI) data for this purpose. The performance of the distance metrics applied in the kNN algorithm was evaluated using leave-one-out cross-validation. The results indicate that the Most Similar Neighbor (MSN) approach maximizes the correlation between predictor and response variables (r=0.9). Our results agree with those reported in the literature. These findings confirm the predictive potential of the MSN approach for mapping forest variables at pixel level under the policy of Reducing Emission from Deforestation and Forest Degradation (REDD+).
Keywords: k-nearest neighbor; Mahalanobis; most similar neighbor; MODIS BRDF-adjusted reflectance; forest inventory; the policy of Reducing Emission from Deforestation and Forest Degradation
Real-Time Spreading Thickness Monitoring of High-core Rockfill Dam Based on K-nearest Neighbor Algorithm (cited by: 4)
6
Authors: Denghua Zhong, Rongxiang Du, Bo Cui, Binping Wu, Tao Guan 《Transactions of Tianjin University》 EI CAS 2018, Issue 3, pp. 282-289 (8 pages)
During the storehouse surface rolling construction of a core rockfill dam, the spreading thickness of the dam face is an important factor that affects the construction quality of the dam storehouse's rolling surface and the overall quality of the entire dam. Currently, the method used to monitor and control spreading thickness during dam construction is manual sampling checks after spreading, which makes it difficult to monitor the entire dam storehouse surface. In this paper, we present an in-depth study based on real-time monitoring and control theory of storehouse surface rolling construction and obtain the rolling compaction thickness by analyzing the construction track of the rolling machine. By comparison, the traditional method can only analyze the rolling thickness of the dam storehouse surface after it has been compacted and cannot determine the thickness of the dam storehouse surface in real time. To solve these problems, our system monitors the construction progress of the leveling machine and employs a real-time spreading thickness monitoring model based on the K-nearest neighbor algorithm. Taking the LHK core rockfill dam in Southwest China as an example, we performed real-time monitoring of the spreading thickness and conducted real-time interactive queries regarding the spreading thickness. This approach provides a new method for controlling the spreading thickness of the core rockfill dam storehouse surface.
Keywords: Core rockfill dam; Dam storehouse surface construction; Spreading thickness; k-nearest neighbor algorithm; Real-time monitor
Pruned fuzzy K-nearest neighbor classifier for beat classification (cited by: 3)
7
Authors: Muhammad Arif, Muhammad Usman Akram, Fayyaz-ul-Afsar Amir Minhas 《Journal of Biomedical Science and Engineering》 2010, Issue 4, pp. 380-389 (10 pages)
Arrhythmia beat classification is an active area of research in ECG-based clinical decision support systems. In this paper, a Pruned Fuzzy K-nearest neighbor (PFKNN) classifier is proposed to classify six types of beats present in the MIT-BIH Arrhythmia database. We tested our classifier on approximately 103100 beats for the six beat types present in the database. Fuzzy KNN (FKNN) can be implemented very easily, but classification with a large number of training examples can be very time consuming and requires large storage space. Hence, we propose a time-efficient Arif-Fayyaz pruning algorithm, especially suitable for FKNN, which can maintain good classification accuracy with an appropriate retained ratio of training data. By using the Arif-Fayyaz pruning algorithm with Fuzzy KNN, we achieved a beat classification accuracy of 97% and a geometric mean of sensitivity of 94.5% with only 19% of the total training examples. The accuracy and sensitivity are comparable to those of FKNN when all the training data are used. Principal Component Analysis is used to further reduce the dimension of the feature space from eleven to six without compromising accuracy and sensitivity. PFKNN was found to be robust against noise present in the ECG data.
Keywords: Arrhythmia; ECG; k-nearest neighbor; Pruning; Fuzzy; Classification
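For readers unfamiliar with fuzzy KNN, the membership rule it commonly uses can be sketched as follows. This is an assumption-laden sketch of the classic inverse-distance membership formulation with crisp training labels, not the PFKNN classifier or the Arif-Fayyaz pruning step from the paper; the function name, the fuzzifier default `m=2.0`, and the toy points are hypothetical.

```python
import math
from collections import defaultdict

def fuzzy_knn_memberships(train_x, train_labels, query, k, m=2.0, eps=1e-12):
    """Return {class: membership} for the query; memberships sum to 1.

    Each of the k nearest neighbors votes for its own class with weight
    1 / d**(2 / (m - 1)), where d is its distance to the query and m > 1
    is the fuzzifier.
    """
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_labels)
    )[:k]
    exponent = 2.0 / (m - 1.0)
    weights = defaultdict(float)
    for d, label in nearest:
        # eps guards against division by zero when the query equals a training point.
        weights[label] += 1.0 / (max(d, eps) ** exponent)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Hypothetical 2-D feature vectors with two beat classes, "N" and "V".
X = [(0.0, 0.0), (0.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
labels = ["N", "N", "V", "V"]
memberships = fuzzy_knn_memberships(X, labels, query=(0.5, 0.5), k=3)
```

Unlike crisp KNN voting, the result is a graded membership per class, so a distant neighbor from another class lowers confidence without flipping the decision.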
A Short-Term Traffic Flow Forecasting Method Based on a Three-Layer K-Nearest Neighbor Non-Parametric Regression Algorithm (cited by: 7)
8
Authors: Xiyu Pang, Cheng Wang, Guolin Huang 《Journal of Transportation Technologies》 2016, Issue 4, pp. 200-206 (7 pages)
Short-term traffic flow forecasting is one of the core technologies for realizing traffic flow guidance. In this article, in view of the repetitive nature of traffic flow changes, a short-term traffic flow forecasting method based on a three-layer K-nearest neighbor non-parametric regression algorithm is proposed. Specifically, two screening layers based on shape similarity were introduced into the K-nearest neighbor non-parametric regression method, and the forecasting results were output using weighted averaging on the reciprocal values of the shape similarity distances together with the most-similar-point distance adjustment method. According to the experimental results, the proposed algorithm improves the predictive ability of the traditional K-nearest neighbor non-parametric regression method and greatly enhances the accuracy and real-time performance of short-term traffic flow forecasting.
Keywords: Three-Layer; Traffic Flow Forecasting; k-nearest neighbor; Non-Parametric Regression
A Novel Neighbor-Preferential Growth Scale-Free Network Model and its Properties (cited by: 1)
9
Authors: Yongshang Long, Zhen Jia 《Communications and Network》 2017, Issue 2, pp. 111-123 (13 pages)
In this paper, we propose a novel neighbor-preferential growth (NPG) network model. Theoretical analysis and numerical simulations indicate that the new model can reproduce not only a scale-free degree distribution, whose power exponent is related to the edge-adding number m, but also a small-world effect with a large clustering coefficient and a small average path length. Interestingly, the clustering coefficient of the model is close to that of a globally coupled network, and the average path length is close to that of a star coupled network. Meanwhile, the synchronizability of the NPG model is much stronger than that of the BA scale-free network, and even stronger than that of the synchronization-optimal growth network.
Keywords: Network model; Neighbor-preferential; Scale-free; Small-world
Active learning accelerated Monte-Carlo simulation based on the modified K-nearest neighbors algorithm and its application to reliability estimations
10
Authors: Zhifeng Xu, Jiyin Cao, Gang Zhang, Xuyong Chen, Yushun Wu 《Defence Technology(防务技术)》 SCIE EI CAS CSCD 2023, Issue 10, pp. 306-313 (8 pages)
This paper proposes an active learning accelerated Monte-Carlo simulation method based on the modified K-nearest neighbors algorithm. The core idea of the proposed method is to judge whether or not the output of a random input point can be postulated through a classifier implemented with the modified K-nearest neighbors algorithm. Compared to other active learning methods that resort to experimental designs, the proposed method is characterized by employing Monte-Carlo simulation for sampling inputs and by saving a large portion of the actual evaluations of outputs through accurate classification, which makes it applicable to most structural reliability estimation problems. Moreover, the validity, efficiency, and accuracy of the proposed method are demonstrated numerically. In addition, the optimal value of K that maximizes the computational efficiency is studied. Finally, the proposed method is applied to the reliability estimation of carbon fiber reinforced silicon carbide composite specimens subjected to random displacements, which further validates its practicability.
Keywords: Active learning; Monte-Carlo simulation; k-nearest neighbors; Reliability estimation; Classification
GHM-FKNN: a generalized Heronian mean based fuzzy k-nearest neighbor classifier for the stock trend prediction
11
Authors: 吴振峰, WANG Mengmeng, LAN Tian, ZHANG Anyuan 《High Technology Letters》 EI CAS 2023, Issue 2, pp. 122-129 (8 pages)
Stock trend prediction is a challenging problem because it involves many variables. To address the problem that some existing machine learning techniques, such as random forest (RF), probabilistic random forest (PRF), k-nearest neighbor (KNN), and fuzzy KNN (FKNN), have difficulty in accurately predicting the stock trend (uptrend or downtrend) for a given date, a generalized Heronian mean (GHM) based FKNN predictor named GHM-FKNN was proposed. GHM-FKNN combines the GHM aggregation function with the ideas of the classical FKNN approach. The comparison results showed that GHM-FKNN outperformed the existing methods RF, PRF, KNN, and FKNN on independent test datasets corresponding to three stocks, namely AAPL, AMZN, and NFLX. Compared with RF, PRF, KNN, and FKNN, GHM-FKNN achieved the best performance, with an accuracy of 62.37% for AAPL, 58.25% for AMZN, and 64.10% for NFLX.
Keywords: stock trend prediction; Heronian mean; fuzzy k-nearest neighbor (FKNN)
Diagnosis of Disc Space Variation Fault Degree of Transformer Winding Based on K-Nearest Neighbor Algorithm
12
Authors: Song Wang, Fei Xie, Fengye Yang, Shengxuan Qiu, Chuang Liu, Tong Li 《Energy Engineering》 EI 2023, Issue 10, pp. 2273-2285 (13 pages)
The winding is one of the most important components in power transformers. Ensuring the health of the winding is of great importance to the stable operation of the power system. To efficiently and accurately diagnose the disc space variation (DSV) fault degree of a transformer winding, this paper presents a winding fault diagnostic method based on the K-Nearest Neighbor (KNN) algorithm and the frequency response analysis (FRA) method. First, a laboratory winding model is used, and DSV faults of four different degrees are produced by changing the disc spacing in the winding. Then, a series of FRA tests are conducted to obtain the FRA results and set up the FRA dataset. Second, ten different numerical indices are used to obtain features of the FRA curves of the faulted winding. Third, 10-fold cross-validation is employed to determine the optimal k-value of KNN. In addition, to improve the accuracy of the KNN model, the relationship between the accuracy of the KNN algorithm and the k-value is compared under four distance functions. After the most appropriate distance metric and k-value are obtained, the fault classification model based on KNN and FRA is constructed and used to classify the degrees of DSV faults. The identification accuracy of the proposed model reaches 98.30%. Finally, the performance of the model is assessed by comparison with the support vector machine (SVM), SVM optimized by particle swarm optimization (PSO-SVM), and random forest (RF). The results show that the diagnostic accuracy of the proposed model is the highest and that the model can be used to accurately diagnose the DSV fault degrees of the winding.
Keywords: Transformer winding; frequency response analysis (FRA) method; k-nearest neighbor (KNN); disc space variation (DSV)
Efficient Parallel Processing of k-Nearest Neighbor Queries by Using a Centroid-based and Hierarchical Clustering Algorithm
13
Author: Elaheh Gavagsaz 《Artificial Intelligence Advances》 2022, Issue 1, pp. 26-41 (16 pages)
The k-Nearest Neighbor method is one of the most popular techniques for both classification and regression. Because of how it operates, the application of this classifier may be limited to problems with a certain number of instances, particularly when run time is a consideration. However, the classification of large amounts of data has become a fundamental task in many real-world applications, so it is logical to scale the k-Nearest Neighbor method to large-scale datasets. This paper proposes a new k-Nearest Neighbor classification method (KNN-CCL) which uses a parallel centroid-based and hierarchical clustering algorithm to separate the training dataset into multiple parts. The introduced clustering algorithm uses four stages of successive refinements and generates high-quality clusters. The k-Nearest Neighbor approach subsequently makes use of them to predict the test datasets. Finally, sets of experiments are conducted on the UCI datasets. The experimental results confirm that the proposed k-Nearest Neighbor classification method performs well with regard to classification accuracy and performance.
Keywords: Classification; k-nearest neighbor; Big data; Clustering; Parallel processing
FLOCKING OF A THERMODYNAMIC CUCKER-SMALE MODEL WITH LOCAL VELOCITY INTERACTIONS
14
Authors: 金春银, 李双智 《Acta Mathematica Scientia》 SCIE CSCD 2024, Issue 2, pp. 632-649 (18 pages)
In this paper, we study the flocking behavior of a thermodynamic Cucker–Smale model with local velocity interactions. Using the spectral gap of a connected stochastic matrix, together with an elaborate estimate on perturbations of a linearized system, we provide a sufficient framework in terms of initial data and model parameters to guarantee flocking. Moreover, it is shown that the system achieves a consensus at an exponential rate.
Keywords: Flocking; local interaction; thermodynamical Cucker-Smale model; stochastic matrix; neighbor graph
A Study of EM Algorithm as an Imputation Method: A Model-Based Simulation Study with Application to a Synthetic Compositional Data
15
Authors: Yisa Adeniyi Abolade, Yichuan Zhao 《Open Journal of Modelling and Simulation》 2024, Issue 2, pp. 33-42 (10 pages)
Compositional data, which carry relative information, are a crucial aspect of machine learning and related fields. They are typically recorded as closed data, that is, data that sum to a constant such as 100%. The statistical linear model is the most used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data are present. The linear regression model is a commonly used statistical modeling technique applied in various settings to find relationships between variables of interest. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested as a solution for situations involving missing data. The EM algorithm iteratively finds the best estimates of parameters in statistical models that depend on unobserved variables or data, in the sense of maximum likelihood or maximum a posteriori (MAP) estimation. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function. The maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performed on a synthetic compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
Keywords: Compositional Data; Linear Regression Model; Least Square Method; Robust Least Square Method; Synthetic Data; Aitchison Distance; Maximum Likelihood Estimation; Expectation-Maximization Algorithm; k-nearest neighbor and Mean imputation
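The Aitchison distance named in this abstract as an evaluation metric can be illustrated with a short sketch. It uses the standard definition (Euclidean distance between centered log-ratio transforms of two strictly positive compositions); how the paper applies it to imputed versus true values is not stated in the abstract, so the function names and the example vectors below are hypothetical.

```python
import math

def clr(composition):
    """Centered log-ratio transform of a strictly positive composition."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

# Hypothetical 3-part compositions: the same proportions on two scales,
# and one genuinely different composition.
same_shape = aitchison_distance([0.2, 0.3, 0.5], [20.0, 30.0, 50.0])
different = aitchison_distance([0.2, 0.3, 0.5], [0.5, 0.3, 0.2])
```

Because only ratios between parts matter, rescaling a composition (fractions versus percentages) leaves the distance unchanged, which is why it suits closed data.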
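Machine learning-based prediction of the distribution ratios of the main components in the 30% TBP/kerosene-nitric acid system
16
Authors: 于婷, 张音音, 张睿志, 金文蕾, 罗应婷, 朱升峰, 何辉, 叶国安, 龚禾林 《原子能科学技术》 PKU Core 2025, Issue 1, pp. 14-23 (10 pages)
To optimize the experimental conditions and process of reprocessing, reduce experimental cost and time, and improve the accuracy of mathematical simulation of the reprocessing flowsheet, this paper builds mathematical models of the distribution ratios of the main components (uranium, plutonium, and nitric acid) in the 30% TBP/kerosene-nitric acid system based on three classical machine learning algorithms: random forest, support vector regression, and K-nearest neighbors. Hyperparameter optimization and model training were carried out on different datasets. Validation and testing showed that the distribution ratio model built with the random forest algorithm was the most accurate; its mean absolute relative error for uranium prediction was 7.73%, an improvement of about 7% over the traditional method. Compared with traditional modeling methods, models built with machine learning are more accurate.
Keywords: distribution ratio mathematical model; random forest; support vector regression; K-nearest neighbors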
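Construction and comparison of machine learning-based risk prediction models for coronary heart disease (cited by: 1)
17
Authors: 岳海涛, 何婵婵, 成羽攸, 张森诚, 吴悠, 马晶 《中国全科医学》 CAS PKU Core 2025, Issue 4, pp. 499-509 (11 pages)
Background: Coronary atherosclerotic heart disease (hereafter coronary heart disease, CHD) is one of the leading causes of death worldwide. Research on CHD risk assessment is growing year by year. However, these studies often overlook the problem of data imbalance, and solving it is essential for improving the accuracy with which classification algorithms identify CHD risk. Objective: To explore the factors influencing CHD, build CHD risk prediction models with five algorithms using two data-balancing methods, and compare the value of these five models for predicting CHD risk. Methods: Based on cross-sectional survey data from the 2021 US Behavioral Risk Factor Surveillance System (BRFSS), 24 variables covering health-related risk behaviors, chronic health conditions, and other information were selected for 112606 participants. The outcome was self-reported CHD, by which participants were divided into a CHD group and a non-CHD group. Univariate analysis and stepwise logistic regression were used to explore the factors influencing CHD and to select the variables entered into the prediction models. Ten percent of the 112606 respondents (11261 in total) were randomly sampled and randomly split 8:2 into training and test sets. Two oversampling methods, random oversampling and the synthetic minority oversampling technique (SMOTE), were used to handle the imbalanced data, and CHD prediction models were built with the k-nearest neighbor algorithm (KNN), logistic regression, support vector machine (SVM), decision tree, and XGBoost. Results: The two groups differed significantly (P<0.05) in age, sex, BMI, race, marital status, education level, income level, number of children in the household, having been told they had hypertension, having been told they had prehypertension, having been told they had gestational hypertension, currently taking antihypertensive medication, having been told they had hyperlipidemia, having been told they had diabetes, smoking status, having drunk alcohol at least once in the past 30 days, being a heavy drinker, being a binge drinker, having exercised in the past 30 days, mental health status, and self-rated health. Stepwise logistic regression showed that age, sex, BMI, race, education level, income level, having been told they had hypertension, having been told they had prehypertension, having been told they had gestational hypertension, currently taking antihypertensive medication, having been told they had hyperlipidemia, having been told they had diabetes, smoking status, having drunk alcohol at least once in the past 30 days, being a heavy drinker, being a binge drinker, and self-rated health were factors influencing CHD (P<0.05). For the risk models built on SMOTE-balanced data, KNN, logistic regression, SVM, decision tree, and XGBoost achieved overall classification accuracies of 59.2%, 67.4%, 66.2%, 69.2%, and 85.9%; recalls of 75.2%, 71.4%, 70.5%, 62.9%, and 34.8%; precisions of 15.4%, 18.2%, 17.5%, 17.6%, and 28.7%; F values of 0.256, 0.290, 0.280, 0.275, and 0.315; and areas under the receiver operating characteristic curve of 0.80, 0.78, 0.72, 0.72, and 0.82, respectively. With random oversampling, the overall classification accuracies were 62.5%, 68.5%, 69.0%, 60.2%, and 70.1%; recalls 70.0%, 69.5%, 71.9%, 69.0%, and 67.6%; precisions 15.8%, 18.4%, 19.1%, 14.8%, and 19.0%; F values 0.258, 0.291, 0.302, 0.244, and 0.297; and areas under the receiver operating characteristic curve 0.80, 0.77, 0.72, 0.72, and 0.83, respectively. Conclusion: This study not only confirmed known factors influencing CHD but also found that self-rated health, income level, and education level have a potential influence on CHD. After the two data-balancing methods were applied, the performance of the five algorithms improved markedly. The XGBoost model performed best and can serve as a reference for future optimization of CHD prediction models. In addition, given the strong performance of the XGBoost model and the ease of use and interpretability of stepwise logistic regression, combining XGBoost on balanced data with stepwise logistic regression analysis is recommended for CHD risk prediction models.
Keywords: coronary heart disease; machine learning; risk prediction model; logistic regression; k-nearest neighbor algorithm; support vector machine; decision tree; XGBoost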
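The F values in this abstract can be related to its precision and recall figures with a small worked check. The sketch below assumes the F value is the usual F1 score, the harmonic mean of precision and recall; the abstract does not state the formula, so this is an assumption, and the function name `f1_score` is illustrative. The numbers are the SMOTE results quoted in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F value) for the SMOTE-balanced models,
# copied from the abstract.
reported = {
    "KNN": (0.154, 0.752, 0.256),
    "Logistic regression": (0.182, 0.714, 0.290),
    "SVM": (0.175, 0.705, 0.280),
    "Decision tree": (0.176, 0.629, 0.275),
    "XGBoost": (0.287, 0.348, 0.315),
}

# Largest gap between the recomputed F1 and the reported F value.
max_gap = max(abs(f1_score(p, r) - f) for p, r, f in reported.values())
```

Under that assumption the recomputed values agree with the reported ones to within rounding. XGBoost's high accuracy alongside low recall also shows why accuracy alone is a weak summary on imbalanced data.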
RecBERT: Semantic recommendation engine with large language model enhanced query segmentation for k-nearest neighbors ranking retrieval (cited by: 1)
18
Author: Richard Wu 《Intelligent and Converged Networks》 EI 2024, Issue 1, pp. 42-52 (11 pages)
The increasing amount of user traffic on Internet discussion forums has led to a huge amount of unstructured natural language data in the form of user comments. Most modern recommendation systems rely on manual tagging, in which administrators label the features of the class, or story, to which a user comment corresponds. Another common approach is to use pre-trained word embeddings to compare class descriptions for textual similarity and then use a distance metric such as cosine similarity or Euclidean distance to find the top k neighbors. However, neither approach is able to fully utilize this user-generated unstructured natural language data, which reduces the scope of these recommendation systems. This paper studies the application of domain adaptation on a transformer for the set of user comments to be indexed, and the use of simple contrastive learning in the sentence transformer fine-tuning process to generate meaningful semantic embeddings for the various user comments that apply to each class. To match a query containing content from multiple user comments belonging to the same class, the construction of a subquery channel for computing class-level similarity is proposed. This channel segments the aggregate query into subqueries and performs a k-nearest neighbors (KNN) search on each individual subquery. RecBERT achieves state-of-the-art performance, outperforming other state-of-the-art models in accuracy, precision, recall, and F1 score for classifying comments between four and eight classes, respectively. RecBERT outperforms the most precise state-of-the-art model (distilRoBERTa) in precision by 6.97% for matching comments between eight classes.
Keywords: sentence transformer; simple contrastive learning; large language models; query segmentation; k-nearest neighbors
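Establishment and performance evaluation of machine learning-based risk prediction models for female stress urinary incontinence
19
Authors: 时欣然, 庞震, 乔婷, 李晶晶, 王勤章 《现代泌尿外科杂志》 2025, Issue 3, pp. 196-206 (11 pages)
Objective: To build prediction models for the onset of female stress urinary incontinence (SUI) using K-nearest neighbors (KNN), support vector machine (SVM), decision tree (DT), and random forest (RF), and to evaluate the performance of each model, providing a reference for the early diagnosis of SUI. Methods: Clinical data of female SUI patients treated in the Department of Urology and the Department of Obstetrics and Gynecology of the First Affiliated Hospital of Shihezi University from October 2019 to October 2023, and of women who underwent health examinations during the same period, were analyzed retrospectively. Women at 42 days postpartum were assigned to the postpartum group (n=611), and perimenopausal and postmenopausal women to the non-postpartum group (n=409). A random seed was set and the data were split 7:3 into training and validation sets. Relevant clinical data were collected for all subjects; univariate analysis and Lasso regression were used to select meaningful variables, which were entered into the KNN, SVM, DT, and RF algorithms to build models. The sensitivity, specificity, accuracy, area under the curve (AUC), and other metrics of each model were calculated to select the best model. Results: In the postpartum group, 352 women had SUI (57.6%). According to univariate analysis and Lasso regression, the meaningful variables selected in the postpartum group were age, body mass index (BMI), maximum value in the fast-twitch muscle phase, gravidity, bladder neck descent (BND), urethral rotation angle (URA), episiotomy, history of urinary incontinence, and constipation. In the postpartum validation set, the AUCs of the KNN, SVM, DT, and RF models were 0.881, 0.878, 0.750, and 0.905, respectively; the RF model had the largest AUC, accuracy, F1 score, and Kappa value. In the non-postpartum group, 260 women had SUI (63.6%). According to univariate analysis and Lasso regression, the meaningful variables selected in the non-postpartum group were age, BMI, maximum value and recovery time in the fast-twitch muscle phase, mean value in the slow-twitch muscle phase, variability in the post-rest phase, vaginal delivery, history of urinary incontinence, and constipation. In the non-postpartum validation set, the AUCs of the KNN, SVM, DT, and RF models were 0.819, 0.805, 0.603, and 0.830, respectively; the RF model had the largest AUC, accuracy, and Kappa value. Conclusion: This study built four machine learning-based models for predicting the onset of SUI in women at 42 days postpartum and in perimenopausal and postmenopausal women; the model using the RF algorithm had the best predictive performance.
Keywords: stress urinary incontinence; prediction model; machine learning; decision tree; random forest; support vector machine; K-nearest neighbors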
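Wind power prediction method based on a Stacking model-fusion algorithm
20
Authors: 张雪原, 蔡思烨, 刘巧宏, 朱坚, 包晓炜, 夏玉剑, 陈极 《电力与能源》 2025, Issue 1, pp. 61-66 (6 pages)
As the penetration of renewable energy in the new-type power system keeps increasing, the accuracy required of wind farm power prediction also keeps rising. To improve the accuracy and reliability of wind power prediction, a Stacking model-fusion algorithm was designed that uses linear regression, K-nearest neighbors, and random forest as the feature extraction layer and a light gradient boosting machine as the regression prediction layer. Using recent operating data from a wind farm as a case study, the prediction method based on the Stacking model-fusion algorithm was verified to achieve higher prediction accuracy than any single machine learning algorithm.
Keywords: wind power generation; Stacking model-fusion algorithm; random forest; K-nearest neighbors; load forecasting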