Abstract
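The ALS (alternating least squares) collaborative filtering recommendation algorithm makes recommendations through matrix factorization. It computes over a large volume of user rating data and stores the many feature matrices produced during the computation. Hadoop HA (High Availability) solves the single-point-of-failure problem of the NameNode in the HDFS distributed file system. Spark is a newer in-memory distributed computing framework for big data with excellent computational performance. This paper builds a Hadoop big-data platform in HA mode on top of QJM (Quorum Journal Manager). It then applies the ALS collaborative filtering algorithm on the Spark computing framework and implements parallel execution of ALS-based collaborative filtering on Spark. The Spark-based version was compared with an ALS collaborative filtering algorithm built on the Hadoop MapReduce model, using the Netflix dataset. The Spark-based ALS algorithm achieves clearly higher parallel computing efficiency and is better suited to processing massive data.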
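The abstract describes ALS only in outline. The following is a minimal single-machine sketch in pure Python that illustrates the technique. It is not the authors' Spark code. It assumes explicit ratings given as (user, item, rating) triples, a small rank `k`, and plain L2 regularization with weight `lam`. The names `solve`, `update`, `als`, `predict`, and `rmse` are hypothetical.

Each half-step holds one factor matrix fixed and solves a small regularized least-squares problem for every user (or every item).

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def update(fixed, observed, k, lam):
    """Recompute one factor vector while the other side's factors stay fixed.

    Solves (F^T F + lam*I) x = F^T r, where F stacks the fixed factor vectors
    of the entities in `observed` (a list of (index, rating) pairs).
    """
    a = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for idx, r in observed:
        v = fixed[idx]
        for i in range(k):
            b[i] += v[i] * r
            for j in range(k):
                a[i][j] += v[i] * v[j]
    return solve(a, b)


def als(ratings, n_users, n_items, k=2, lam=0.1, iters=10, seed=0):
    """Alternate between solving all user factors and all item factors."""
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    V = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    for _ in range(iters):
        # Item factors fixed: each user's solve is independent of the others.
        U = [update(V, by_user[u], k, lam) for u in range(n_users)]
        # User factors fixed: each item's solve is independent of the others.
        V = [update(U, by_item[i], k, lam) for i in range(n_items)]
    return U, V


def predict(U, V, u, i):
    """Predicted rating is the dot product of the user and item factors."""
    return sum(a * b for a, b in zip(U[u], V[i]))


def rmse(ratings, U, V):
    se = sum((predict(U, V, u, i) - r) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5


ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U, V = als(ratings, n_users=3, n_items=3, k=2, lam=0.1, iters=50)
print(round(rmse(ratings, U, V), 3))
```

Within one half-step, the per-user solves do not depend on each other, and the same holds for the per-item solves. That independence is what lets a framework such as Spark distribute the updates across a cluster.

Spark MLlib's own ALS differs from this sketch in two ways. It partitions the ratings into blocks, and it scales the regularization term by the number of ratings per user or item. Its results will therefore not match this sketch exactly.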
ALS (alternating least squares) is a collaborative filtering recommendation algorithm that makes recommendations through matrix factorization. It computes over a large amount of user rating data and stores the many feature matrices generated during the computation. Hadoop HA (High Availability) is used to solve the single-point-of-failure problem of the NameNode. Spark is a new distributed, in-memory computing framework for big data with excellent computing performance. This study uses QJM (Quorum Journal Manager) to build an HA Hadoop big-data platform. It applies the ALS collaborative filtering algorithm within the Spark computing framework and implements parallel execution of the algorithm on Spark. Comparative experiments were run on the Netflix dataset against an ALS collaborative filtering algorithm based on the Hadoop MapReduce model. The Spark-based parallel computation is more efficient and better suited to processing huge amounts of data.
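In QJM, the active and standby NameNodes share the edit log through a group of JournalNodes, typically an odd number such as three. An edit counts as committed once a majority of JournalNodes has written it, so 2N+1 JournalNodes tolerate N failures. Epoch numbers fence off a writer that has lost its role after failover.

The toy in-memory simulation below illustrates only that quorum-and-epoch idea, under simplifying assumptions. `JournalNode`, `QuorumWriter`, `new_epoch`, `journal`, and `log_edit` are hypothetical names, not Hadoop classes or APIs. Real QJM also handles RPC, durable storage, and recovery of in-progress log segments.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Hypothetical stand-in for one JournalNode holding a copy of the edit log."""
    name: str
    alive: bool = True
    promised_epoch: int = 0
    edits: list = field(default_factory=list)

    def new_epoch(self, epoch):
        # Promise to follow a writer only if its epoch is newer than any seen so far.
        if not self.alive or epoch <= self.promised_epoch:
            return False
        self.promised_epoch = epoch
        return True

    def journal(self, epoch, txid, op):
        # Reject writes while down or from a writer with a stale epoch (fencing).
        if not self.alive or epoch < self.promised_epoch:
            return False
        self.edits.append((txid, op))
        return True


class QuorumWriter:
    """Hypothetical NameNode-side writer that needs a majority for every step."""

    def __init__(self, nodes, epoch):
        self.nodes = nodes
        self.epoch = epoch
        self.majority = len(nodes) // 2 + 1
        self.txid = 0
        promises = sum(n.new_epoch(epoch) for n in nodes)
        if promises < self.majority:
            raise RuntimeError("could not obtain a quorum for the new epoch")

    def log_edit(self, op):
        self.txid += 1
        acks = sum(n.journal(self.epoch, self.txid, op) for n in self.nodes)
        # The edit is committed only once a majority has persisted it.
        return acks >= self.majority


nodes = [JournalNode(f"jn{i}") for i in range(3)]
active = QuorumWriter(nodes, epoch=1)
ok1 = active.log_edit("mkdir /data")
nodes[2].alive = False                                # one JournalNode fails
ok2 = active.log_edit("create /data/ratings.csv")     # 2 of 3 acks: still committed
standby = QuorumWriter(nodes, epoch=2)                # failover: new writer, higher epoch
stale = active.log_edit("delete /data")               # old writer is now fenced off
print(ok1, ok2, stale)
```

The majority rule lets the log survive a minority of failed JournalNodes. The epoch check prevents two NameNodes from both writing after a failover, which is the split-brain risk that HA must avoid.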
Source
Computer & Digital Engineering (计算机与数字工程)
2017, No. 11, pp. 2197-2201 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 51467007).