期刊文献+

基于主导值的计算和数据自动划分算法 被引量:5

Automatic Computation and Data Decomposition Algorithm Based on Dominant Value
在线阅读 下载PDF
导出
摘要 计算和数据自动划分是并行化编译中一种自动分配计算和数据到各个处理机的优化技术,划分的结果直接影响程序并行的性能。数组是划分处理的主要对象之一,一些数组分布后的收益不高,但带来的并行约束却能对其它数组的划分产生干扰,导致大量数据重分布通信的产生。现有的划分算法中没有约定数组分布的优先次序,因此无法限制这些数组并行约束的传播,降低了优化编译器后端自动生成并行代码的性能。提出了一种基于主导值的计算和数据自动划分算法:将划分过程中数组对程序并行性的影响量化为主导值,并依据主导值的大小约定数组分布的优先次序,限制干扰数组并行约束的传播速度,提高划分结果的合理性。实验结果表明,算法能够获得良好的划分效果。 Automatic computation and data decomposition are an optimization technique that distributes computations and data onto different processors. The decomposition result has a direct impact on the performance of program's parallelization. Array is one of main targets of treatment for the decomposition algorithm, and some profits of them are not enough after parallelization, but it will bring constraints and disrupt the other distribution of array, leading to large amounts communication of data re-distribution. The decomposition algorithm in existing has no agreement in the order of array distribution,therefore can't restrict propagation of constraint of array's parallelization, reducing performance of optimized parallel code automatically generated by the hack-end compiler. This paper presented an automatic computation and data decomposition based on the" dominant values. Algorithm quantified the impact of array on the programs' parallelism as the dominant value, and agreed priorities of distribution based on the size of the dominant values of array, limited the spread speed of constraints of interference array, improved the reasonableness of decomposition results. Experimental results show that the algorithm can get good decomposition results.
出处 《计算机科学》 CSCD 北大核心 2012年第3期290-294,303,共6页 Computer Science
基金 "核高基"重大专项子课题(863)(2009AA01220 2009zx10036-001-001)资助
关键词 自动并行化 计算划分 数据分布 主导值 约束 Automatic parallelization, Computation decomposition, Data decomposition, Dominant values, Constraint
  • 相关文献

参考文献14

  • 1Anderson J M. Automatic computation and data decomposition for multiproeessors [D]. US: Stanford University, 1997.
  • 2Kennedy K,Kremer U. Automatic Data Layout for Distributed- Memory Machines [J]. ACM Transactions on Programming Languages and Systems, 1998,20 (4) : 869-916.
  • 3韩林.面向分布存储结构的并行分饵一致性优化技术研究[D].郑州:解放军信息工程大学,2008.
  • 4夏军,杨学军.基于数据空间融合的全局计算与数据划分方法[J].软件学报,2004,15(9):1311-1327. 被引量:7
  • 5张为华,王鹏,臧斌宇,朱传琪.一种基于代表元的划分算法[J].计算机学报,2008,31(3):400-410. 被引量:4
  • 6Lee Pei-zong. Automatic data and computation decomposition on distributed memory parallel eomputers[J]. ACM Transactions on Programming Languages and Systems, 2002,24 (1) : 1-50.
  • 7Brian S A. Enabling automatic parallelization of industrial-grade applications [D]. US: Purdue University, 2010.
  • 8Zhen Cao, Yuan Dong, Wang Sheng-yuan. Domain-specific pattern matching based automatic parallelization demonstrated by 2-D prestack migration: Parallel and Distributed Systems(IC- PADS) [C]// 15th International Conference. Shenzhen, China, 2009:973-980.
  • 9Anderson J M, Lam M S. Global optimizations for parallelism and locality on scalable parallel machines [C] // Proceedings of the ACM SIGPLAN' 93 Conference on Programming Language Design and Implementation. Albuquerque, New Mexico, USA, 1993:112-125.
  • 10Anderson J M, Lam M S. Data and computation transformations for mulfiprocessors[C]//Proceedings of the 5th ACM/SIGP- LAN Symposium on Principles and Practice of Parallel Programruing. Santa Barbara,California, USA, 1995 : 166-178.

二级参考文献34

  • 1[1]Chen TS, Chang CY. Skewed data partition and alignment techniques for compiling programs on distributed memory multicomputers. The Journal of Supercomputing 2002,21 (2): 191~211.
  • 2[2]Chang WL, Chu CP, Wu JH. Communication-Free alignment for array references with linear subscripts in three loop index variables or quadratic subscripts. The Journal of Supercomputer; 2001,20(1):67~83.
  • 3[3]Shih KP, Sheu JP, Huang CH. Statement-Level communication-free partitioning technique for parallelizing compilers. The Journal of Supercomputing, 2000,15(3):243~269.
  • 4[4]Lim AW. Improve parallelism and data locality with affine partitioning[Ph.D. Thesis]. Palo Alto: Stanford University, 2001.
  • 5[5]Chen TS, Sheu JP. Communication-Free data allocation techniques for parallelizing compilers on multicomputers. IEEE Trans on Parallel and Distributed Systems, 1994,5(9):924~938.
  • 6[6]Ramanujam J, Sadayappan P. Compile-Time techniques for data distribution in distributed memory machines IEEE Trans. on Parallel and Distributed systems, 1991,2(4):472~482.
  • 7[7]Huang CH, Sadayappan P. Communication-Free hyperplane partitioning of nested loops. Journal of Parallel ard Distributed Computing, 1993,19(2):90~102.
  • 8[8]Wolf M. High Performance Compilers for Parallel Computing. Redwood Addison-Wesley Publishing Company, 1996. 137~510.
  • 9[9]Wolf M, Lam M. A data locality optimizing algorithm. In: Mauney J, ed. Proc. of the SIGPLAN'91 Conf. on Programming Language Design and Implementation. New York: ACM Press, 1991.30~44.
  • 10[10]Shen ZY, Hu ZA, Liao XK, Wu HP, Zhao KJ, Lu YT. Methods of Parallel Compilation. Beijing: National Defence Industry Press,2000.31~127 (in Chinese).

共引文献7

同被引文献72

  • 1夏军,杨学军.基于数据空间融合的全局计算与数据划分方法[J].软件学报,2004,15(9):1311-1327. 被引量:7
  • 2王轶然,陈莉,冯晓兵,张兆庆.全局部分重复计算划分[J].计算机研究与发展,2006,43(12):2158-2165. 被引量:2
  • 3Anderson JM , Lam MS. Global optimizations for parallelism and locality on scalable parallel machines [ C ] //Cartwright R,ed. Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation. Albuquerque :ACM New York,1993 : 112-125.
  • 4Anderson JM. Automatic Computation and Data Decomposition for Multiprocessors[ D]. Stanford University, 1997.
  • 5Lim AW. Improve Parallelism and Data Locality with Affine Partitioning[ D]. Stanford University, 2001.
  • 6Lee PZ, Kedem ZM. Automatic Data and Computation Decomposition on Distributed Memory Parallel Computers[ J]. ACMTransactions on Programming Languages and Systems, 2002,24(1) : 1 -50.
  • 7Han L,Zhao RC,Pang JM. Dynamic decomposition algorithm merging control flow analysis [ C ]//Arabniaed HR, ed. The2007 International Conference on Parallel and Distributed Processing Techniques and Applications. Las Vegas Nevada : CSREApress, 2007: 245-250.
  • 8Han L,Zhao RC , Pang JM. A Consistency Combination Algorithm for Global Dynamic Computation and Data Decompositions[C ] //IEEE Second International Conference on Complex, Intelligent and Software Intensive Systems ( CISIS ’ 08 ) . 2008 :148-154.
  • 9Bondhugula U , Hartono A, Ramanujam J, et al. A practical automatic polyhedral parallelizer and locality optimizer[ C ]//Proceedings of The ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation. 2008 : 101-113.
  • 10Bondhugula U. Automatic Distributed-Memory Parallelization and Code Generation using the Polyhedral Framework [ R ].Technical Report, Report No. IISc-CSA-TR-2011-3 , Bangalore : Indian Institute of Science, 2011.

引证文献5

二级引证文献3

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部