
Implementation and Optimization of Parallel KNN Algorithm for Sunway Architecture
Cited by: 5
Abstract: The K-Nearest Neighbor (KNN) algorithm is one of the most commonly used classification algorithms in artificial intelligence, and improving its performance is important for the organization and analysis of massive data and for big-data classification. The new generation of Sunway supercomputers is in the initial stage of application development, and exploiting the structural characteristics of the new-generation Sunway heterogeneous many-core processor to make full use of its vast computing resources for an efficient KNN algorithm is a practical need of massive data analysis. Based on the structural characteristics of the SW26010pro processor, a basic parallel KNN algorithm is first implemented with the master-slave acceleration programming model: the compute kernel is offloaded to the slave cores, achieving thread-level parallelism. The key factors limiting the performance of this basic parallel algorithm are then analyzed, and an optimized algorithm, SWKNN, is proposed. Unlike the task-division scheme of the basic parallel KNN algorithm, SWKNN adopts a task re-partitioning strategy to avoid redundant computation. Unnecessary communication overhead is further reduced through data pipelining, inter-slave-core communication optimization, and secondary load balancing, which relieves memory-access pressure and further improves performance. Experimental results show that, compared with the serial KNN algorithm, the basic parallel KNN algorithm for the Sunway architecture achieves a speedup of up to 48 on a single core group of the SW26010pro processor, and at the same data scale, SWKNN achieves a further speedup of up to 399 over the basic parallel KNN algorithm.
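The abstract describes the basic parallel scheme as offloading the distance computation to the slave cores of one SW26010pro core group, each slave core handling a slice of the queries. The fragment below is a minimal, illustrative C sketch of that query-partitioning idea only; the function name knn_partition, the data sizes, and the sequential loop standing in for 64 slave cores are hypothetical, and the authors' actual implementation would use the Sunway athread offload interface, LDM buffering via DMA, and the SWKNN optimizations (task re-partitioning, pipelining, secondary load balancing), none of which are reproduced here.

/*
 * Illustrative sketch only: a block-partitioned 1-NN kernel in plain C.
 * On the real hardware each worker's slice would run on a slave core (CPE)
 * of an SW26010pro core group; here the slices run sequentially on the host.
 */
#include <stdio.h>
#include <float.h>

#define NUM_WORKERS 64   /* slave cores per core group on SW26010pro */

/* Squared Euclidean distance between two feature vectors of length dim. */
static float sq_dist(const float *a, const float *b, int dim)
{
    float s = 0.0f;
    for (int i = 0; i < dim; i++) {
        float d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

/* Work assigned to one (hypothetical) worker: queries [q_begin, q_end). */
static void knn_partition(const float *train, int n_train,
                          const float *query, int q_begin, int q_end,
                          int dim, int *nearest)
{
    for (int q = q_begin; q < q_end; q++) {
        float best = FLT_MAX;
        int best_idx = -1;
        for (int t = 0; t < n_train; t++) {
            float d = sq_dist(&query[q * dim], &train[t * dim], dim);
            if (d < best) { best = d; best_idx = t; }
        }
        nearest[q] = best_idx;   /* k = 1 for brevity; keep a k-sized heap for k > 1 */
    }
}

int main(void)
{
    enum { N_TRAIN = 8, N_QUERY = 4, DIM = 2 };
    float train[N_TRAIN * DIM] = {
        0, 0,  0, 1,  1, 0,  1, 1,
        5, 5,  5, 6,  6, 5,  6, 6
    };
    float query[N_QUERY * DIM] = { 0.2f, 0.1f, 0.9f, 1.1f, 5.5f, 5.4f, 6.2f, 5.9f };
    int nearest[N_QUERY];

    /* Static partition of queries across workers, as in the basic scheme. */
    int per = (N_QUERY + NUM_WORKERS - 1) / NUM_WORKERS;
    for (int w = 0; w < NUM_WORKERS; w++) {
        int b = w * per, e = b + per;
        if (b >= N_QUERY) break;
        if (e > N_QUERY) e = N_QUERY;
        knn_partition(train, N_TRAIN, query, b, e, DIM, nearest);
    }

    for (int q = 0; q < N_QUERY; q++)
        printf("query %d -> training sample %d\n", q, nearest[q]);
    return 0;
}

In this sketch the partition is a fixed block split over the queries; the SWKNN optimizations in the paper exist precisely because such a static split leaves slave cores idle and memory-bound, which the task re-partitioning, pipelining, and secondary load balancing steps address.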
Authors: WANG Qihan (王其涵), PANG Jianmin (庞建民), YUE Feng (岳峰), ZHU Di (祝迪), SHEN Li (沈莉), XIAO Qian (肖谦) (State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University, Zhengzhou 450000, China; University of Science and Technology of China, Hefei 230000, China; Jiangnan Institute of Computing Technology, Wuxi 214000, Jiangsu, China)
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journal list, 2023, Issue 5, pp. 286-294 (9 pages)
Funding: National Natural Science Foundation of China, "Research on Malicious Code Author Attribution Based on Deep Learning and Computational Linguistics" (61802433)
Keywords: heterogeneous many-core processors; K-Nearest Neighbor (KNN) algorithm; parallel computing; algorithm optimization; classification performance