Fast Fourier transform convolutional neural network accelerator based on overlap addition

doi:10.19682/j.cnki.1005-8885.2024.0015

The Journal of China Universities of Posts and Telecommunications, 2024, Vol. 31, Issue 5: 71-84.

Received: 2023-03-20 Revised: 2024-01-02 Online: 2024-10-31 Published: 2024-10-31
Contact: Jizeng Wei E-mail: weijizeng@tju.edu.cn

Abstract

In convolutional neural networks (CNNs), the traditional convolutional layer requires an enormous number of floating-point operations, and this intensive computation limits the execution speed of the network, making it difficult to meet the real-time response requirements of complex applications. To reduce the number of floating-point operations needed for convolution, this work relies on the principle that convolution in the time domain is equivalent to point-wise multiplication in the frequency domain. The input feature map and the convolution kernel are converted to the frequency domain by the fast Fourier transform (FFT) and multiplied point by point; the frequency-domain result is then converted back to the time domain to obtain the convolution output. In a common CNN, the input feature map is much larger than the convolution kernel, which results in many invalid operations. The overlap addition method is proposed to reduce these invalid calculations and further speed up network execution. This work designs a hardware accelerator for frequency-domain convolution and verifies its efficiency on the Xilinx Zynq UltraScale+ MPSoC ZCU102 board. For visual geometry group 16 (VGG16) on the ImageNet dataset, the frequency-domain convolution hardware accelerator is 8.5 times faster in calculation time than traditional time-domain convolution.
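The two ideas in the abstract, frequency-domain convolution and overlap addition, can be illustrated with a short software sketch. The NumPy code below is not the authors' hardware design; the function names and the tile size of 8 are assumptions chosen for illustration. It computes a full 2D linear convolution in three ways: directly, with one large FFT, and with overlap addition, where the input is cut into small tiles so that the kernel only needs to be zero-padded to the tile size rather than to the full feature-map size. Note that CNN layers usually compute cross-correlation, so a real layer would flip the kernel first.

```python
import numpy as np

def direct_conv2d(x, k):
    """Reference full 2D linear convolution computed in the spatial domain."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for i in range(kh):
        for j in range(kw):
            out[i:i + H, j:j + W] += k[i, j] * x
    return out

def fft_conv2d(x, k):
    """Full 2D linear convolution via point-wise multiplication of spectra."""
    H, W = x.shape
    kh, kw = k.shape
    # Zero-pad both operands so the circular convolution equals the linear one.
    shape = (H + kh - 1, W + kw - 1)
    X = np.fft.rfft2(x, s=shape)
    K = np.fft.rfft2(k, s=shape)
    return np.fft.irfft2(X * K, s=shape)

def overlap_add_conv2d(x, k, tile=8):
    """Full 2D convolution by overlap addition over tile x tile input blocks."""
    H, W = x.shape
    kh, kw = k.shape
    fft_shape = (tile + kh - 1, tile + kw - 1)
    K = np.fft.rfft2(k, s=fft_shape)  # kernel spectrum is computed once
    out = np.zeros((H + kh - 1, W + kw - 1))
    for r in range(0, H, tile):
        for c in range(0, W, tile):
            block = x[r:r + tile, c:c + tile]  # edge tiles may be smaller
            bh, bw = block.shape
            y = np.fft.irfft2(np.fft.rfft2(block, s=fft_shape) * K, s=fft_shape)
            # Each tile output extends past the tile by (kernel size - 1);
            # the overlapping borders of neighbouring tiles are summed.
            out[r:r + bh + kh - 1, c:c + bw + kw - 1] += y[:bh + kh - 1, :bw + kw - 1]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((20, 19))   # hypothetical feature map
    k = rng.standard_normal((3, 3))     # hypothetical 3x3 kernel
    ref = direct_conv2d(x, k)
    print(np.allclose(fft_conv2d(x, k), ref))
    print(np.allclose(overlap_add_conv2d(x, k), ref))
```

In this sketch the tile size trades off the number of FFTs against the size of each FFT; a hardware design would fix it to match the available FFT unit, which is a choice the abstract does not specify.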

Key words: convolutional neural network (CNN), fast Fourier transform (FFT), overlap addition
