Design of graph computing accelerator based on reconfigurable PE array

doi:10.19682/j.cnki.1005-8885.2024.0013

The Journal of China Universities of Posts and Telecommunications, 2024, Vol. 31, Issue 5: 49-63.

• IC and System Design •

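Jun-Yong DENG¹, Yan-Ting JIA², Bao-Xiang ZHANG², Yu-Chun KANG², Song-Tao LU¹

1. Xi'an Institute of Posts and Telecommunications
2. Xi'an University of Posts and Telecommunications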

Received: 2023-12-04 Revised: 2024-02-24 Published: 2024-10-31 Online: 2024-10-31
Corresponding author: Yan-Ting JIA E-mail: 15339003468@163.com
Supported by:
National Natural Science Foundation of China

Abstract:

Graph computing applications are diverse, graph data follow a power-law distribution, and graph workloads have a high memory-access-to-computation ratio. As a result, traditional architectures suffer from poor flexibility, imbalanced workload distribution, and inefficient memory access when executing graph computing tasks. To address these challenges, a graph computing accelerator, GraphApp, based on a reconfigurable processing element (PE) array is proposed. GraphApp uses 16 reconfigurable PEs for parallel computation and operates on tiled data. Dividing the data into tiles appropriately balances the load across PEs and improves the overall efficiency of parallel computation. In addition, graph data are preprocessed into the compressed sparse columns independently (CSCI) compression format to mitigate the low memory access efficiency caused by the high memory-access-to-computation ratio. GraphApp is evaluated with the triangle counting (TC) and depth-first search (DFS) algorithms. Its execution time on six datasets from the Stanford Network Analysis Project (SNAP) database is compared with that of two typical existing graph frameworks, Ligra and GraphBIG. The results show that GraphApp achieves a maximum performance improvement of 30.86% over Ligra and 20.43% over GraphBIG when processing the same datasets.

Key words:

graph computing, reconfigurable accelerator, parallel computing, triangle counting (TC) algorithm, depth-first search (DFS) algorithm
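
The tiling idea in the abstract can be illustrated in software. The sketch below is not GraphApp's implementation. It assumes plain compressed sparse column (CSC) storage as a stand-in for CSCI, whose layout the abstract does not describe. It also assumes that tiles are contiguous column ranges balanced by edge count, and it runs the 16 tiles sequentially rather than on hardware PEs. The function names (`build_csc`, `partition_columns`, `count_triangles_in_tile`, `count_triangles`) are hypothetical.

```python
from typing import List, Tuple

NUM_PES = 16  # PE count taken from the abstract; everything else is illustrative


def build_csc(num_vertices: int, edges: List[Tuple[int, int]]):
    """Build a plain CSC layout (column pointers + row indices).

    Each undirected edge {u, v} with u < v is stored once, in column v.
    This is standard CSC, assumed here in place of the paper's CSCI format.
    """
    columns = [set() for _ in range(num_vertices)]
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        lo, hi = (u, v) if u < v else (v, u)
        columns[hi].add(lo)
    col_ptr = [0]
    row_idx: List[int] = []
    for col in columns:
        row_idx.extend(sorted(col))
        col_ptr.append(len(row_idx))
    return col_ptr, row_idx


def partition_columns(col_ptr: List[int], num_tiles: int):
    """Split columns into contiguous tiles with roughly equal edge counts.

    Balancing by edges rather than by vertices is one simple way to counter
    power-law skew; the tiling rule used by GraphApp may differ.
    """
    num_cols = len(col_ptr) - 1
    total = col_ptr[-1]
    bounds = [0]
    for t in range(1, num_tiles):
        target = total * t / num_tiles
        c = bounds[-1]
        while c < num_cols and col_ptr[c] < target:
            c += 1
        bounds.append(c)
    bounds.append(num_cols)
    return [(bounds[i], bounds[i + 1]) for i in range(num_tiles)]


def count_triangles_in_tile(col_ptr, row_idx, tile) -> int:
    """Count triangles whose largest vertex id falls in this tile."""
    start, end = tile
    count = 0
    for c in range(start, end):
        lower_c = row_idx[col_ptr[c]:col_ptr[c + 1]]
        lower_c_set = set(lower_c)
        for b in lower_c:
            # A common lower neighbour a of b and c closes triangle a < b < c.
            for a in row_idx[col_ptr[b]:col_ptr[b + 1]]:
                if a in lower_c_set:
                    count += 1
    return count


def count_triangles(num_vertices, edges, num_tiles: int = NUM_PES) -> int:
    col_ptr, row_idx = build_csc(num_vertices, edges)
    tiles = partition_columns(col_ptr, num_tiles)
    # One tile per PE in principle; here the tiles are processed in sequence.
    return sum(count_triangles_in_tile(col_ptr, row_idx, t) for t in tiles)
```

Because each triangle is attributed to its largest vertex, per-tile counts are independent and can simply be summed. This independence is what would let separate PEs work on separate tiles without coordination in this sketch.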


