Abstract
Matrix factorization is a collaborative filtering recommendation technique proposed in recent years. Each predicted rating depends on a large volume of known ratings, and the computation must also store very large feature matrices, so recommendation on a single node runs into bottlenecks in both computation time and computing resources. To address this problem, a matrix factorization recommendation algorithm based on the MapReduce distributed computing framework was designed. Hadoop's distributed cache and the MapFile file structure are used to share the large feature matrices efficiently among multiple nodes, and the algorithm processes multiple regularization factors (multiple values of λ) in parallel. Experiments on the Netflix data set show that the MapReduce algorithm and its data storage scheme achieve a high speedup and thereby improve the computational efficiency of the recommendation algorithm.
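As a rough illustration of the idea in the abstract, the following is a minimal single-process sketch of one way a regularized matrix factorization step can be phrased as a map phase and a reduce phase, with several λ values handled in the same pass. It is not the paper's implementation: the choice of batch gradient descent, the function names (`init_factors`, `map_rating`, `reduce_gradients`, `run_iteration`, `rmse`), the toy ratings, and the learning rate are all assumptions made for illustration. The in-memory `models` dictionary merely stands in for feature matrices that, per the abstract, would be shared through Hadoop's distributed cache and MapFile.

```python
import math
import random

def init_factors(n_users, n_items, k, seed=0):
    """Small positive random user (P) and item (Q) feature matrices."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    return P, Q

def map_rating(record, models):
    """Map step (assumed form): one rating emits gradient pieces for every lambda."""
    u, i, r = record
    for lam, (P, Q) in models.items():
        err = r - sum(p * q for p, q in zip(P[u], Q[i]))
        yield (lam, "P", u), [err * q for q in Q[i]]
        yield (lam, "Q", i), [err * p for p in P[u]]

def reduce_gradients(pairs):
    """Reduce step: sum the gradient pieces that share a key."""
    sums = {}
    for key, vec in pairs:
        if key in sums:
            sums[key] = [a + b for a, b in zip(sums[key], vec)]
        else:
            sums[key] = list(vec)
    return sums

def run_iteration(ratings, models, lr=0.01):
    """One batch gradient step for all lambda values at once."""
    pairs = (kv for rec in ratings for kv in map_rating(rec, models))
    sums = reduce_gradients(pairs)  # consumes every pair before any update
    for (lam, side, idx), grad in sums.items():
        P, Q = models[lam]
        row = (P if side == "P" else Q)[idx]
        for f in range(len(row)):
            row[f] += lr * (grad[f] - lam * row[f])

def rmse(ratings, P, Q):
    """Root mean squared error on the given ratings."""
    se = sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
             for u, i, r in ratings)
    return math.sqrt(se / len(ratings))

# Hypothetical toy data: (user, item, rating) triples.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
models = {lam: init_factors(3, 3, k=2) for lam in (0.01, 0.1)}
before = {lam: rmse(ratings, *models[lam]) for lam in models}
for _ in range(200):
    run_iteration(ratings, models)
after = {lam: rmse(ratings, *models[lam]) for lam in models}
```

Keying each emitted gradient by `(λ, side, index)` is what lets a single pass over the ratings serve every regularization factor; in a real Hadoop job the same key could route each partial sum to a reducer, though how the paper actually partitions the work is not stated in the abstract.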
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2013, Issue 1, pp. 19-21, 36 (4 pages in total)
Computer Science
Funding
Supported by the National Key Technology R&D Program of China (2012BAH15F03)