Research on a Parallel ALS Collaborative Filtering Algorithm Based on Spark (cited by: 2)
Abstract: The ALS (alternating least squares) collaborative filtering algorithm makes recommendations through matrix factorization. It computes over large volumes of user rating data and stores the many feature matrices produced during the computation. Hadoop HA (High Availability) addresses the single point of failure of the NameNode in the HDFS distributed file system. Spark is an in-memory distributed computing framework for big data with excellent computational performance. This paper builds a Hadoop big-data platform in HA mode using QJM (Quorum Journal Manager) and, on top of the Spark computing framework, studies the ALS collaborative filtering algorithm and implements its parallel execution on Spark. Comparison experiments on the Netflix dataset against an ALS collaborative filtering algorithm based on Hadoop MapReduce show that the Spark-based implementation is markedly more efficient in parallel computation and better suited to processing massive data.
Source: Computer & Digital Engineering (《计算机与数字工程》), 2017, No. 11, pp. 2197-2201 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant No. 51467007)
Keywords: ALS; collaborative filtering; matrix decomposition; high availability; Spark
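
The alternating scheme named in the abstract can be illustrated with a minimal single-process sketch. This is not the paper's Spark implementation: the function names (`solve`, `update_side`, `objective`, `als`), the plain L2 regularization weight `lam`, and the toy ratings are all assumptions made for illustration. Each pass fixes the item factors and solves one small regularized least-squares problem per user, then does the same for items with the user factors fixed. Because every per-user (or per-item) solve is independent of the others, this is the step a framework such as Spark can distribute across workers.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def update_side(observed_by_key, fixed, rank, lam):
    """Recompute one side's factors with the other side held fixed.

    For each key, solves (lam * I + sum f f^T) x = sum r * f over its observed ratings.
    Each solve is independent, so this loop is the parallelizable part.
    """
    out = {}
    for key, observed in observed_by_key.items():
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in observed:
            f = fixed[other]
            for i in range(rank):
                b[i] += r * f[i]
                for j in range(rank):
                    a[i][j] += f[i] * f[j]
        out[key] = solve(a, b)
    return out


def objective(ratings, users, items, lam):
    """Squared error on observed ratings plus L2 penalty on all factors."""
    err = sum((r - sum(p * q for p, q in zip(users[u], items[i]))) ** 2
              for u, i, r in ratings)
    reg = sum(x * x for v in users.values() for x in v)
    reg += sum(x * x for v in items.values() for x in v)
    return err + lam * reg


def als(ratings, rank=2, lam=0.1, iters=10, seed=0):
    """Factorize (user, item, rating) triples; returns user factors, item factors, objective history."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users, history = {}, []
    for _ in range(iters):
        users = update_side(by_user, items, rank, lam)  # items fixed
        items = update_side(by_item, users, rank, lam)  # users fixed
        history.append(objective(ratings, users, items, lam))
    return users, items, history


if __name__ == "__main__":
    # Hypothetical toy data: (user id, item id, rating).
    toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
    _, _, hist = als(toy, rank=2, lam=0.1, iters=8, seed=1)
    print(hist)
```

Since each half-step exactly minimizes the regularized objective over one side, the recorded objective values should not increase from one sweep to the next. On a cluster, Spark's MLlib provides its own ALS implementation (`pyspark.ml.recommendation.ALS`), which would normally be preferred over a hand-written loop like this one.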
