Abstract
A single logistic regression (LR) model for advertisement click-through rate (CTR) prediction does not account for the effect that differences in feature values have on the prediction. To address this, a three-stage ensemble model for online advertising CTR prediction based on K-means is proposed. According to differences in feature values, the K-means algorithm partitions the advertisements into K data subsets; a CTR prediction model is trained on each subset, and these models jointly predict the click-through rate. A gradient boosting decision tree (GBDT) is used to mine the nonlinear relationships among features, which addresses the limited predictive power of LR. Experimental results show that the proposed K-means three-stage ensemble model improves the evaluation metric by 2% and 6% over the GBDT+LR and LR methods, respectively.
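The three stages described in the abstract (K-means partitioning, GBDT feature transformation, LR prediction) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `KMeansGbdtLr`, the hyperparameters, the synthetic data, and the use of scikit-learn are all assumptions made for illustration. It also assumes that the same features are used for clustering and for prediction, and it trains the GBDT and the LR on the same rows for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder


class KMeansGbdtLr:
    """Hypothetical sketch: route rows by K-means, then GBDT+LR per cluster."""

    def __init__(self, n_clusters=3, n_trees=20, random_state=0):
        self.n_clusters = n_clusters
        self.n_trees = n_trees
        self.random_state = random_state
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10,
                             random_state=random_state)
        self.models = {}

    def _fit_one(self, X, y):
        # Stage 2: GBDT leaf indices act as learned nonlinear features.
        gbdt = GradientBoostingClassifier(
            n_estimators=self.n_trees, max_depth=3,
            random_state=self.random_state).fit(X, y)
        leaves = gbdt.apply(X)[:, :, 0]  # shape (n_samples, n_trees)
        enc = OneHotEncoder(handle_unknown="ignore").fit(leaves)
        # Stage 3: LR on the one-hot encoded leaf indices.
        lr = LogisticRegression(max_iter=1000).fit(enc.transform(leaves), y)
        return gbdt, enc, lr

    def fit(self, X, y):
        # Stage 1: partition the rows into K subsets by feature values.
        labels = self.kmeans.fit_predict(X)
        for k in range(self.n_clusters):
            mask = labels == k
            if not mask.any():
                self.models[k] = float(y.mean())  # empty cluster: global rate
            elif len(np.unique(y[mask])) < 2:
                self.models[k] = float(y[mask][0])  # one class: constant
            else:
                self.models[k] = self._fit_one(X[mask], y[mask])
        return self

    def predict_proba(self, X):
        # Each row is scored only by the model of its nearest centroid.
        labels = self.kmeans.predict(X)
        out = np.empty(len(X))
        for k, model in self.models.items():
            mask = labels == k
            if not mask.any():
                continue
            if isinstance(model, float):
                out[mask] = model
            else:
                gbdt, enc, lr = model
                leaves = gbdt.apply(X[mask])[:, :, 0]
                out[mask] = lr.predict_proba(enc.transform(leaves))[:, 1]
        return out


# Synthetic stand-in for ad data: two feature regimes, nonlinear click rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
X[:300] += 4.0  # shift half the rows so K-means has two groups to find

model = KMeansGbdtLr(n_clusters=2).fit(X, y)
p = model.predict_proba(X)
print("mean predicted click probability:", p.mean())
```

In practice the GBDT and the LR are often fitted on separate portions of the data to limit overfitting, and the number of clusters K would be tuned on held-out data; the abstract does not state how either was handled.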
Authors
DENG Lujia (邓路佳)
LIU Pingshan (刘平山)
School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China; School of Business, Guilin University of Electronic Technology, Guilin 541004, China
Source
Journal of Guilin University of Electronic Technology (《桂林电子科技大学学报》)
2018, No. 3, pp. 215-218 (4 pages)
Funding
National Natural Science Foundation of China (61762029)
Guangxi Natural Science Foundation (2016GXNSFAA380011)
Keywords
gradient boosting decision tree
K-means
logistic regression
feature learning