The Journal of China Universities of Posts and Telecommunications ›› 2022, Vol. 29 ›› Issue (5): 1-9. doi: 10.19682/j.cnki.1005-8885.2022.0026

Special topic: Artificial Intelligence of Things


Design of high parallel CNN accelerator based on FPGA for AIoT

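Lin Zhijian1,2, Gao Xuewei2, Chen Xiaopei2, Zhu Zhipeng2, Du Xiaoyong2, Chen Pingping1

  1. College of Physics and Information Engineering, Fuzhou University
  2. School of Advanced Manufacturing, Fuzhou University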
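  • Received: 2022-05-31 Revised: 2022-09-08 Online: 2022-10-31 Published: 2022-10-28
  • Corresponding author: Chen Pingping E-mail: ppchen.xm@gmail.com
  • Supported by:
    Research on methods and theory of covert secure transmission based on directional modulation in UAV cooperative networks; Research on random multiple access technology based on physical-layer network coding; 2020 Fujian Provincial University Science and Technology Innovation Team (Industrialization Special Project)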

Abstract:

Deploying convolutional neural networks (CNNs) on field-programmable gate arrays (FPGAs) is challenging because of their computational complexity. To address this, a high-performance CNN hardware accelerator was designed in the Verilog hardware description language. It uses a pipelined architecture that exploits parallelism along three dimensions: input channels, output channels, and convolution kernels. First, two multiply-and-accumulate (MAC) operations are packed into one digital signal processing (DSP) block of the FPGA to double the computation rate of the accelerator. Second, feature map block partitioning and a special memory arrangement are proposed to reduce the total amount of off-chip memory access and relieve the pressure on FPGA bandwidth. Finally, an efficient computational array combining a multiply-add tree with the Winograd fast convolution algorithm is designed to balance hardware resource consumption and computational performance. The highly parallel CNN accelerator was deployed on the Alinx ZU3EG platform, with the YOLOv3-tiny algorithm as the test case. The average computing performance of the accelerator is 127.5 giga operations per second (GOPS). The experimental results show that the architecture effectively improves CNN computing performance and outperforms other existing schemes in power consumption and in the efficiency of DSP and block random access memory (BRAM) usage.

Key words: artificial intelligence of things (AIoT) | convolutional neural network (CNN) accelerator | Winograd convolution | field-programmable gate array (FPGA)
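
The abstract states that two MAC operations are packed into one DSP block but does not give operand widths or the packing layout. The following is a minimal software sketch of one common way such packing can work, assuming signed 8-bit operands, a shared multiplicand `c`, and an 18-bit offset between the two packed values. The names `SHIFT`, `to_signed`, and `packed_multiply` are hypothetical and are not taken from the paper. Only the multiplication step is modeled; accumulating several packed products would need additional guard bits.

```python
SHIFT = 18  # assumed bit offset between the two packed operands


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def packed_multiply(a, b, c):
    """Return (a * c, b * c) using a single wide multiplication.

    a, b, c are assumed to be signed 8-bit integers in [-128, 127].
    """
    packed = (a << SHIFT) + b        # a sits in the upper bits, b in the lower bits
    product = packed * c             # the one multiplication a DSP block would perform
    low = to_signed(product, SHIFT)  # b * c fits in 18 signed bits, so it is recovered exactly
    high = (product - low) >> SHIFT  # subtract the low part first to undo any borrow
    return high, low


if __name__ == "__main__":
    print(packed_multiply(-128, 127, -128))
```

The subtraction before the shift matters: when `b * c` is negative it borrows from the upper field, so shifting the raw product alone would return `a * c - 1`.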

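
The abstract names the Winograd fast convolution algorithm without stating which variant the accelerator uses. As an illustration only, the sketch below shows the one-dimensional F(2,3) form, which produces two outputs of a 3-tap filter with four multiplications instead of six. The function names are hypothetical, and the two-dimensional tiling a CNN accelerator would need is not shown.

```python
from fractions import Fraction


def direct_conv_f23(d, g):
    """Direct sliding-window form: 2 outputs, 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs with 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    half = Fraction(1, 2)

    # Filter transform: depends only on the kernel, so it can be precomputed.
    u0 = g0
    u1 = (g0 + g1 + g2) * half
    u2 = (g0 - g1 + g2) * half
    u3 = g2

    # Input transform: additions and subtractions only.
    v0 = d0 - d2
    v1 = d1 + d2
    v2 = d2 - d1
    v3 = d1 - d3

    # Element-wise products: the only four multiplications per tile.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3

    # Output transform: additions and subtractions only.
    return [m0 + m1 + m2, m1 - m2 - m3]


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    kernel = [1, 0, -1]
    print(direct_conv_f23(data, kernel), winograd_f23(data, kernel))
```

`Fraction` keeps the halved filter terms exact in this sketch; a hardware design would instead choose a fixed-point representation for the transformed weights.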