摘要
With the rapid growth of real-world graphs,the size of which can easily exceed the on-chip(board)storage capacity of an accelerator,processing large-scale graphs on a single Field Programmable Gate Array(FPGA)becomes difficult.The multi-FPGA acceleration is of great necessity and importance.Many cloud providers(e.g.,Amazon,Microsoft,and about:blank)now expose FPGAs to users in their data centers,providing opportunities to accelerate large-scale graph processing.In this paper,we present a communication library,called FDGLib,which can easily scale out any existing single FPGA-based graph accelerator to a distributed version in a data center,with minimal hardware engineering efforts.FDGLib provides six APIs that can be easily used and integrated into any FPGA-based graph accelerator with only a few lines of code modifications.Considering the torus-based FPGA interconnection in data centers,FDGLib also improves communication efficiency using simple yet effective torus-friendly graph partition and placement schemes.We interface FDGLib into AccuGraph,a state-of-the-art graph accelerator.Our results on a 32-node Microsoft Catapult-like data center show that the distributed AccuGraph can be 2.32x and 4.77x faster than a state-of-the-art distributed FPGA-based graph accelerator ForeGraph and a distributed CPU-based graph system Gemini,with better scalability.
基金
supported by the National Key Research and Development Program of China under Grant No.2018YFB1003502
the National Natural Science Foundation of China under Grant Nos.62072195,61825202,61832006,and 61628204.