The Journal of China Universities of Posts and Telecommunications (English Edition), 2022, Vol. 29, Issue 5: 1-9. doi: 10.19682/j.cnki.1005-8885.2022.0026
Topic: Special Topic on Artificial Intelligence of Things
Lin Zhijian1,2, Gao Xuewei1, Chen Xiaopei1, Zhu Zhipeng1, Du Xiaoyong1, Chen Pingping2
Deploying convolutional neural networks (CNNs) on field-programmable gate arrays (FPGAs) is challenging because of their computational complexity. To address this, a high-performance CNN hardware accelerator was designed in the Verilog hardware description language. It uses a pipeline architecture with parallelism along three dimensions: input channels, output channels, and convolution kernels. First, two multiply-and-accumulate (MAC) operations were packed into one digital signal processing (DSP) block of the FPGA to double the computation rate of the accelerator. Second, feature map block partitioning and a special memory arrangement were proposed to reduce the total amount of off-chip memory access and relieve the pressure on FPGA bandwidth. Finally, an efficient computational array combining a multiply-add tree with the Winograd fast convolution algorithm was designed to balance hardware resource consumption against computational performance. The highly parallel CNN accelerator was deployed on the ZU3EG platform from Alinx, with the YOLOv3-tiny algorithm as the test workload, and achieved an average computing performance of 127.5 giga operations per second (GOPS). The experimental results show that the hardware architecture effectively improves CNN computational throughput and outperforms other existing schemes in power consumption and in the efficiency of DSP and block random access memory (BRAM) usage.
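The idea of packing two MAC operations into one DSP block can be illustrated in software. The following is a minimal Python sketch, not the authors' Verilog design: it assumes signed 8-bit operands, a shared second operand `c`, and an 18-bit offset between the two packed values. None of these are stated in the abstract, and the function name `packed_dual_multiply` is hypothetical.

```python
# Sketch of computing a*c and b*c with a single wide multiplication.
# Assumptions (not from the abstract): signed 8-bit operands and an
# 18-bit offset between the two packed values.

SHIFT = 18  # assumed offset; wide enough to hold any int8 x int8 product


def packed_dual_multiply(a: int, b: int, c: int) -> tuple:
    """Return (a*c, b*c) using one multiplication on a packed operand."""
    for value in (a, b, c):
        if not -128 <= value <= 127:
            raise ValueError("operands must fit in signed 8 bits")

    packed = (a << SHIFT) + b            # one wide operand carrying a and b
    product = packed * c                 # the single multiplication

    low = product & ((1 << SHIFT) - 1)   # low field, read as unsigned
    if low >= 1 << (SHIFT - 1):          # reinterpret the field as signed
        low -= 1 << SHIFT

    high = (product - low) >> SHIFT      # undo the borrow from the low field
    return high, low
```

The subtraction before the shift is the sign-correction step: when `b*c` is negative it borrows from the upper field, so the upper product has to be recovered from `product - low` rather than from the raw high bits.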