Fast Fourier transform convolutional neural network accelerator based on overlap addition

doi:10.19682/j.cnki.1005-8885.2024.0015

The Journal of China Universities of Posts and Telecommunications, 2024, Vol. 31, Issue (5): 71-84. doi: 10.19682/j.cnki.1005-8885.2024.0015

• Artificial Intelligence •

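YOU Chen¹, LI Dejian², FENG Xi¹, SHEN Chongfei³, WEI Jizeng⁴, LIU Yu²

1. Tianjin University
2.
3. Beijing Smart-Chip Microelectronics Technology Co., Ltd.
4. College of Intelligence and Computing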

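Received: 2023-03-20  Revised: 2024-01-02  Published: 2024-10-31  Online: 2024-10-31
Corresponding author: WEI Jizeng  E-mail: weijizeng@tju.edu.cn
Funding:
National Natural Science Foundation of China; State Grid Corporation of China 2019 project "Research on Integration Technology and Prototype Development of High-End Controller Chips"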


Abstract

In convolutional neural networks (CNNs), the traditional convolutional layer requires an enormous amount of floating-point computation, and this compute-intensive workload limits the execution speed of the network, making it difficult to meet the real-time response requirements of complex applications. To reduce the floating-point computation of convolution, this work relies on the principle that convolution in the time domain equals point-wise multiplication in the frequency domain. The input feature map and the convolution kernel are converted to the frequency domain by the fast Fourier transform (FFT), the corresponding points are multiplied, and the frequency domain result is converted back to the time domain to obtain the convolution output. In common CNNs, the input feature map is much larger than the convolution kernel, so transforming the kernel at the size of the whole feature map produces many invalid operations. The overlap addition method is proposed to reduce these invalid calculations and further speed up network execution. This work designs a hardware accelerator for frequency domain convolution and verifies its efficiency on the Xilinx Zynq UltraScale+ MPSoC ZCU102 board. For visual geometry group 16 (VGG16) on the ImageNet dataset, the hardware-accelerated frequency domain convolution is 8.5 times faster than traditional time domain convolution.
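The FFT convolution and overlap addition steps described above can be illustrated with a minimal 1D sketch in pure Python. It is not the authors' accelerator design: the helper names (`fft`, `fft_convolve`, `overlap_add`, `direct_convolve`) and the block size parameter `block` are hypothetical, the FFT is a textbook recursive radix-2 Cooley-Tukey transform, and a 2D version for feature maps would apply the same steps along both spatial dimensions. (CNN layers compute cross-correlation, which equals convolution with a flipped kernel.) `fft_convolve` applies the convolution theorem directly, while `overlap_add` splits the long input into blocks so each FFT only covers `block + len(kernel) - 1` samples rather than the full input length, then adds the overlapping tails.

```python
import cmath


def fft(x, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    With invert=True it uses conjugate twiddle factors (caller divides by n)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_convolve(a, b):
    """Linear convolution via the convolution theorem: IFFT(FFT(a) * FFT(b))."""
    out_len = len(a) + len(b) - 1
    n = 1
    while n < out_len:  # pad to a power of two long enough to avoid wrap-around
        n *= 2
    fa = fft(list(a) + [0] * (n - len(a)))
    fb = fft(list(b) + [0] * (n - len(b)))
    prod = [x * y for x, y in zip(fa, fb)]  # point-wise multiplication
    y = fft(prod, invert=True)
    return [v.real / n for v in y[:out_len]]


def overlap_add(signal, kernel, block):
    """Hypothetical overlap-add helper: convolve each block of the long input
    with the short kernel using a small FFT, then sum the overlapping tails
    (each partial result is len(kernel) - 1 samples longer than its block)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for start in range(0, len(signal), block):
        piece = fft_convolve(signal[start:start + block], kernel)
        for i, v in enumerate(piece):
            out[start + i] += v
    return out


def direct_convolve(a, b):
    """Reference time domain convolution for checking the FFT results."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


if __name__ == "__main__":
    print([round(v, 6) for v in overlap_add([1, 2, 3, 4], [1, 1], 2)])
```

Running the script prints `[1.0, 3.0, 5.0, 7.0, 4.0]`, the same values as `direct_convolve([1, 2, 3, 4], [1, 1])`. The rounding only hides floating-point residue from the transforms. The block size trades the number of small FFTs against their length; how the paper chooses it in hardware is not specified here.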

Key words: convolutional neural network (CNN), fast Fourier transform (FFT), overlap addition

