期刊文献+
共找到2篇文章
< 1 >
每页显示 20 50 100
Optimizing Memory Access Efficiency in CUDA Kernel via Data Layout Technique
1
作者 Neda Seifi Abdullah Al-Mamun 《Journal of Computer and Communications》 2024年第5期124-139,共16页
Over the past decade, Graphics Processing Units (GPUs) have revolutionized high-performance computing, playing pivotal roles in advancing fields like IoT, autonomous vehicles, and exascale computing. Despite these adv... Over the past decade, Graphics Processing Units (GPUs) have revolutionized high-performance computing, playing pivotal roles in advancing fields like IoT, autonomous vehicles, and exascale computing. Despite these advancements, efficiently programming GPUs remains a daunting challenge, often relying on trial-and-error optimization methods. This paper introduces an optimization technique for CUDA programs through a novel Data Layout strategy, aimed at restructuring memory data arrangement to significantly enhance data access locality. Focusing on the dynamic programming algorithm for chained matrix multiplication—a critical operation across various domains including artificial intelligence (AI), high-performance computing (HPC), and the Internet of Things (IoT)—this technique facilitates more localized access. We specifically illustrate the importance of efficient matrix multiplication in these areas, underscoring the technique’s broader applicability and its potential to address some of the most pressing computational challenges in GPU-accelerated applications. Our findings reveal a remarkable reduction in memory consumption and a substantial 50% decrease in execution time for CUDA programs utilizing this technique, thereby setting a new benchmark for optimization in GPU computing. 展开更多
关键词 data layout optimization CUDA Performance optimization GPU Memory optimization Dynamic Programming Matrix Multiplication Memory Access Pattern optimization in CUDA
在线阅读 下载PDF
STrans:A Comprehensive Framework for Structure Transformation
2
作者 Jiangzhou He Wenguang Chen Zhizhong Tang 《Tsinghua Science and Technology》 SCIE EI CAS CSCD 2016年第2期231-240,共10页
Structure Data Layout Optimization (SDLO) is a prevailing compiler optimization technique to improve cache efficiency. Structure transformation is a critical step for SDLO. Diversity of transformation methods and ex... Structure Data Layout Optimization (SDLO) is a prevailing compiler optimization technique to improve cache efficiency. Structure transformation is a critical step for SDLO. Diversity of transformation methods and existence of complex data types are major challenges for structure transformation. We have designed and implemented STrans, a well-defined system which provides controllable and comprehensive functionality on structure transformation. Compared to known systems, it has less limitation on data types for transformation. In this paper we give formal definition of the approach STrans transforms data types. We have also designed Transformation Specification Language, a mini language to configure how to transform structures, which can be either manually tuned or generated by compiler. STrans supports three kinds of transformation methods, i.e., splitting, peeling, and pool-splitting, and works well on different combinations of compound data types. STrans is the transformation system used in ASLOP and is well tested for all benchmarks for ASLOR 展开更多
关键词 Structure data layout optimization (SDLO) STrans ASLOP structure transformation
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部