The Journal of China Universities of Posts and Telecommunications (English edition), 2024, Vol. 31, Issue (5): 49-63. doi: 10.19682/j.cnki.1005-8885.2024.0013

• IC and System Design •

Design of graph computing accelerator based on reconfigurable PE array

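Jun-Yong DENG1, Yan-Ting JIA2, Bao-Xiang ZHANG2, Yu-Chun KANG2, Song-Tao LU1

  1. Xi'an Institute of Posts and Telecommunications
  2. Xi'an University of Posts and Telecommunications
  • Received: 2023-12-04 Revised: 2024-02-24 Online: 2024-10-31 Published: 2024-10-31
  • Contact: Yan-Ting JIA E-mail: 15339003468@163.com
  • Supported by:
    National Natural Science Foundation of China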

Abstract:

Graph computing applications are diverse, graph data follow a power-law distribution, and the memory-access-to-computation ratio is high. As a result, traditional architectures suffer from poor flexibility, imbalanced workload distribution, and inefficient memory access when executing graph computing tasks. To address these challenges, a graph computing accelerator, GraphApp, based on a reconfigurable processing element (PE) array is proposed. GraphApp uses 16 reconfigurable PEs for parallel computation and operates on tiled data: dividing the data into appropriately sized tiles balances the load across PEs and improves the overall efficiency of parallel computation. It also preprocesses graph data with the compressed sparse columns independently (CSCI) compression format to mitigate the low memory access efficiency caused by the high memory-access-to-computation ratio. GraphApp is evaluated with the triangle counting (TC) and depth-first search (DFS) algorithms. The execution time of these algorithms on GraphApp is compared with that on two representative graph frameworks, Ligra and GraphBIG, using six datasets from the Stanford Network Analysis Project (SNAP) database. The results show that GraphApp achieves a maximum performance improvement of 30.86% over Ligra and 20.43% over GraphBIG on the same datasets.


Key words:

graph computing, reconfigurable accelerator, parallel computing, triangle counting (TC) algorithm, depth-first search (DFS) algorithm
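
As a rough illustration of the ideas named in the abstract (column-compressed storage, tiling for load balance, and triangle counting), the following Python sketch may help. It is not the GraphApp design: the abstract does not describe the CSCI layout, so a plain compressed sparse column (CSC) structure is used as a stand-in; splitting tiles by edge count is only one possible reading of dividing the data into tiles; and the tiles are processed sequentially here rather than on 16 hardware PEs. All function names are hypothetical.

```python
from bisect import bisect_left


def build_csc(num_vertices, edges):
    """Build plain CSC arrays (col_ptr, row_idx) for an undirected graph.

    Assumption: each edge {u, v} is stored once, in column max(u, v),
    so column v lists its neighbors with a smaller vertex id.
    """
    cols = [set() for _ in range(num_vertices)]
    for u, v in edges:
        if u != v:  # ignore self-loops
            cols[max(u, v)].add(min(u, v))
    col_ptr, row_idx = [0], []
    for c in cols:
        row_idx.extend(sorted(c))
        col_ptr.append(len(row_idx))
    return col_ptr, row_idx


def split_tiles(col_ptr, num_tiles):
    """Split columns into contiguous tiles holding roughly equal edge counts.

    Balancing by edges rather than by vertices is meant to soften the
    skew of power-law degree distributions. Some tiles may be empty.
    """
    n = len(col_ptr) - 1
    total = col_ptr[-1]
    bounds = [0]
    for t in range(1, num_tiles):
        target = total * t / num_tiles
        # first column boundary whose cumulative edge count reaches the target
        bounds.append(bisect_left(col_ptr, target, lo=bounds[-1], hi=n))
    bounds.append(n)
    return [(bounds[i], bounds[i + 1]) for i in range(num_tiles)]


def sorted_intersection_size(a, b):
    """Count common elements of two sorted lists with a two-pointer merge."""
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


def count_triangles_in_tile(col_ptr, row_idx, start, end):
    """Count triangles w < u < v whose largest vertex v lies in [start, end)."""
    count = 0
    for v in range(start, end):
        nv = row_idx[col_ptr[v]:col_ptr[v + 1]]
        for u in nv:
            nu = row_idx[col_ptr[u]:col_ptr[u + 1]]
            count += sorted_intersection_size(nu, nv)
    return count


def count_triangles(num_vertices, edges, num_tiles=16):
    """Sum per-tile counts; each tile stands in for the work of one PE."""
    col_ptr, row_idx = build_csc(num_vertices, edges)
    tiles = split_tiles(col_ptr, num_tiles)
    return sum(count_triangles_in_tile(col_ptr, row_idx, s, e) for s, e in tiles)


if __name__ == "__main__":
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles(4, k4))
```

Because each triangle is attributed to its largest vertex, the per-tile counts are independent and can simply be added, which is what makes this formulation easy to distribute across parallel workers.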

