Design of high parallel CNN accelerator based on FPGA for AIoT

doi:10.19682/j.cnki.1005-8885.2022.0026

The Journal of China Universities of Posts and Telecommunications ›› 2022, Vol. 29 ›› Issue (5): 1-9. doi: 10.19682/j.cnki.1005-8885.2022.0026

Special Topic: Artificial Intelligence of Things

Lin Zhijian¹,², Gao Xuewei², Chen Xiaopei², Zhu Zhipeng², Du Xiaoyong², Chen Pingping¹

1. College of Physics and Information Engineering, Fuzhou University
2. School of Advanced Manufacturing, Fuzhou University

Received: 2022-05-31 Revised: 2022-09-08 Online: 2022-10-31 Published: 2022-10-28
Corresponding author: Chen Pingping E-mail: ppchen.xm@gmail.com
Supported by:
Research on covert and secure transmission methods and theory based on directional modulation in cooperative UAV networks; Research on random multiple access technology based on physical-layer network coding; 2020 Fujian Provincial University Science and Technology Innovation Team (Industrialization Special Project)

Abstract:

Applying convolutional neural networks (CNNs) on field-programmable gate arrays (FPGAs) is challenging because of their computational complexity. To address this, a high-performance CNN hardware accelerator was designed in the Verilog hardware description language. It uses a pipeline architecture that is parallel in three dimensions: input channels, output channels, and convolution kernels. Firstly, two multiply-and-accumulate (MAC) operations were packed into one digital signal processing (DSP) block of the FPGA to double the computation rate of the CNN accelerator. Secondly, feature map block partitioning and a special memory arrangement were proposed to reduce the total amount of off-chip memory access and relieve the pressure on FPGA bandwidth. Finally, an efficient computational array combining a multiplier-adder tree and the Winograd fast convolution algorithm was designed to balance hardware resource consumption and computational performance. The high parallel CNN accelerator was deployed on the ZU3EG platform from Alinx, using the YOLOv3-tiny algorithm as the test object. The average computing performance of the CNN accelerator is 127.5 giga operations per second (GOPS). The experimental results show that the hardware architecture effectively improves the computational power of CNN and provides better performance than other existing schemes in terms of power consumption and the efficiency of DSPs and block random access memories (BRAMs).
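The abstract does not spell out how two MAC operations share one DSP block. One commonly used approach, shown here purely as an illustrative sketch, packs two signed weights into a single wide operand so that one multiplication by a shared activation yields both products. The 8-bit operand width, the 18-bit field offset `SHIFT`, and the function name `packed_dual_multiply` are assumptions made for this example and are not taken from the paper.

```python
SHIFT = 18  # assumed field offset; wide enough to hold one int8 x int8 product


def packed_dual_multiply(w_hi: int, w_lo: int, x: int) -> tuple[int, int]:
    """Return (w_hi * x, w_lo * x) using a single wide multiplication.

    All three inputs are assumed to be signed 8-bit values.
    """
    # Place w_hi in the upper field and w_lo in the lower field of one operand.
    packed = (w_hi << SHIFT) + w_lo
    # One multiplication produces w_hi*x in the upper bits and w_lo*x in the lower bits.
    product = packed * x
    # Extract the lower field and sign-extend it from SHIFT bits.
    low = product & ((1 << SHIFT) - 1)
    if low >= 1 << (SHIFT - 1):
        low -= 1 << SHIFT
    # Removing the lower product first undoes the borrow it caused in the upper field.
    high = (product - low) >> SHIFT
    return high, low


if __name__ == "__main__":
    print(packed_dual_multiply(3, -5, 7))
```

In hardware the same correction is usually a one-bit adjustment of the upper field. When several packed products are accumulated before splitting, the lower field needs enough guard bits to hold the running sum, which limits how many terms can be added per split.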

Key words: artificial intelligence of things (AIoT) | convolutional neural network (CNN) accelerator | Winograd convolution | field-programmable gate array (FPGA)
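For readers unfamiliar with the Winograd fast convolution named in the abstract, the following is a minimal one-dimensional sketch of the F(2,3) variant, which computes two outputs of a 3-tap filter with four multiplications instead of six. The abstract does not state which tile size the accelerator uses, so F(2,3) and the function names `winograd_f23` and `direct_f23` are assumptions chosen for illustration.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter g over a 4-sample tile d, using 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: depends only on the weights, so it can be precomputed once.
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Input transform followed by the four element-wise multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference sliding-window computation with 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


if __name__ == "__main__":
    tile, taps = [1, 2, 3, 4], [1, 2, 3]
    print(winograd_f23(tile, taps), direct_f23(tile, taps))
```

The saving comes at the cost of extra additions and a precomputed filter transform, which is the trade-off between multiplier (DSP) usage and other logic that a design combining Winograd with an adder tree has to balance.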


Lin Zhijian, Gao Xuewei, Chen Xiaopei, Zhu Zhipeng, Du Xiaoyong, Chen Pingping. Design of high parallel CNN accelerator based on FPGA for AIoT[J]. The Journal of China Universities of Posts and Telecommunications, 2022, 29(5): 1-9.

