The Journal of China Universities of Posts and Telecommunications (English edition), 2024, Vol. 31, Issue (2): 94-104. doi: 10.19682/j.cnki.1005-8885.2024.0009

Topic: Integrated Circuits

• IC and System Design •

Design and implementation of a multi-tile parallel scanning rasterization accelerator

Li-Dong XING, Qiang GUO, Xin-Long PENG, Zhen-Fu FENG

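  1. Xi'an University of Posts and Telecommunications
  • Received: 2023-12-04  Revised: 2024-03-09  Online: 2024-04-30  Published: 2024-04-30
  • Corresponding author: Li-Dong XING  E-mail: zmy_xld@163.com
  • Supported by:
    Scientific Research Program Funded by Shaanxi Provincial Education Department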

Abstract:

In the design of a graphics processing unit (GPU), the speed of triangle rasterization is an important factor in overall GPU performance. This paper proposes the architecture of a multi-tile, parallel-scan rasterization accelerator. The accelerator uses a bounding box algorithm to improve scanning efficiency. It rasterizes multiple tiles in parallel and, within each tile, scans multiple lines simultaneously. This highly parallel approach substantially improves rasterization performance. Synthesized with the 65 nm standard cell library of Semiconductor Manufacturing International Corporation (SMIC), the accelerator reaches a maximum clock frequency of 220 MHz. An implementation on the Genesys2 field programmable gate array (FPGA) board verifies the functionality of the accelerator, shows a significant improvement in rendering speed and efficiency, and demonstrates its suitability for high-performance rasterization.
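The abstract does not describe the accelerator's internal algorithms, so the following Python sketch only illustrates the general technique it names, under stated assumptions. The triangle's bounding box limits which pixels are tested, the box is split into square tiles that are independent of each other (and so could be handed to parallel tile units), and each tile is swept several rows at a time. The per-pixel inside test uses edge functions evaluated at pixel centers, which is an assumption rather than the authors' stated method. The names `rasterize`, `bounding_box`, `inside`, and `edge`, and the parameters `tile=8` and `lines_per_pass=4`, are hypothetical. The loops run sequentially where the hardware would run in parallel, and tie-breaking for samples lying exactly on an edge (such as the top-left rule) is omitted.

```python
import math
from typing import Dict, Set, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def edge(a: Point, b: Point, px: float, py: float) -> float:
    """Edge function (assumed inside test): positive when (px, py) is left of a -> b."""
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])


def inside(tri: Triangle, px: float, py: float) -> bool:
    """True when the sample lies inside or on the triangle, for either winding."""
    a, b, c = tri
    area2 = edge(a, b, c[0], c[1])  # twice the signed area
    if area2 == 0:
        return False  # degenerate triangle covers no samples
    w0 = edge(b, c, px, py)
    w1 = edge(c, a, px, py)
    w2 = edge(a, b, px, py)
    if area2 > 0:
        return w0 >= 0 and w1 >= 0 and w2 >= 0
    return w0 <= 0 and w1 <= 0 and w2 <= 0


def bounding_box(tri: Triangle, width: int, height: int) -> Tuple[int, int, int, int]:
    """Inclusive pixel bounding box of the triangle, clipped to the screen."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    x0 = max(0, math.floor(min(xs)))
    y0 = max(0, math.floor(min(ys)))
    x1 = min(width - 1, math.ceil(max(xs)))
    y1 = min(height - 1, math.ceil(max(ys)))
    return x0, y0, x1, y1


def rasterize(tri: Triangle, width: int, height: int,
              tile: int = 8, lines_per_pass: int = 4):
    """Hypothetical sketch: return (covered pixels, row groups scanned per tile)."""
    bx0, by0, bx1, by1 = bounding_box(tri, width, height)
    covered: Set[Tuple[int, int]] = set()
    groups: Dict[Tuple[int, int], int] = {}
    if bx0 > bx1 or by0 > by1:
        return covered, groups  # triangle lies entirely off screen
    # Only tiles overlapping the bounding box are visited. They are independent,
    # so hardware could give each its own tile unit; here they run in a loop.
    for ty in range(by0 // tile, by1 // tile + 1):
        for tx in range(bx0 // tile, bx1 // tile + 1):
            # Clip the tile to the bounding box.
            x_lo, x_hi = max(bx0, tx * tile), min(bx1, tx * tile + tile - 1)
            y_lo, y_hi = max(by0, ty * tile), min(by1, ty * tile + tile - 1)
            count = 0
            for y_base in range(y_lo, y_hi + 1, lines_per_pass):
                count += 1
                # Rows y_base .. y_base + lines_per_pass - 1 form one group
                # that multi-line hardware would scan at the same time.
                for y in range(y_base, min(y_base + lines_per_pass, y_hi + 1)):
                    for x in range(x_lo, x_hi + 1):
                        if inside(tri, x + 0.5, y + 0.5):  # pixel-center sample
                            covered.add((x, y))
            groups[(tx, ty)] = count
    return covered, groups
```

For example, `rasterize(((1.0, 1.0), (30.0, 4.0), (10.0, 25.0)), 32, 32)` returns the covered pixel set together with the number of row groups each tile needed. With `tile=8` and `lines_per_pass=4`, no tile needs more than two row groups instead of up to eight single-row sweeps, while the covered set is the same as a brute-force scan of the whole screen.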

Key words: GPU, rasterization, multi-tile, multi-line, parallelism
