The Journal of China Universities of Posts and Telecommunications (English edition) ›› 2024, Vol. 31 ›› Issue (5): 71-84. doi: 10.19682/j.cnki.1005-8885.2024.0015

• Artificial Intelligence •

Fast Fourier transform convolutional neural network accelerator based on overlap addition

游晨1,李德建2,冯曦1,沈冲飞3,魏继增4,刘昱2,2   

  1. Tianjin University
    2.
    3. Beijing Smartchip Microelectronics Technology Co., Ltd.
    4. College of Intelligence and Computing
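  • Received: 2023-03-20 Revised: 2024-01-02 Online: 2024-10-31 Published: 2024-10-31
  • Corresponding author: 魏继增 E-mail: weijizeng@tju.edu.cn
  • Funding:
    National Natural Science Foundation of China; State Grid Corporation of China 2019 project "Research on High-End Controller Chip Integration Technology and Prototype Development"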


Abstract:

In convolutional neural networks (CNNs), the traditional convolutional layer requires an enormous amount of floating-point computation, and this intensive computing limits the execution speed of the network, which makes it challenging to meet the real-time response requirements of complex applications. To reduce the number of floating-point operations for convolution, this work relies on the principle that convolution in the time domain is equivalent to pointwise multiplication in the frequency domain. The input feature map and the convolution kernel are converted to the frequency domain by the fast Fourier transform (FFT), and the corresponding pointwise multiplication is performed. The frequency domain result is then converted back to the time domain to obtain the output of the convolution. In common CNNs, the input feature map is much larger than the convolution kernel, resulting in many invalid operations. The overlap addition method is proposed to reduce invalid calculations and further speed up network execution. This work designs a hardware accelerator for frequency domain convolution and verifies its efficiency on the Xilinx Zynq UltraScale+ MPSoC ZCU102 board. In terms of the computation time of Visual Geometry Group 16 (VGG16) on the ImageNet dataset, the hardware-accelerated frequency domain convolution is 8.5 times faster than traditional time domain convolution.

Key words:

convolutional neural network (CNN), fast Fourier transform (FFT), overlap addition
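
The frequency domain pipeline summarized in the abstract (FFT of the feature map and kernel, pointwise multiplication, inverse FFT, with overlap addition over small tiles) can be illustrated in software. The block below is a minimal pure-Python sketch and not the authors' hardware design: the function names, the recursive radix-2 FFT, and the tile size of 6 (which gives 8 x 8 transforms for a 3 x 3 kernel) are all assumptions chosen for illustration. It computes true (flipped-kernel) full convolution; CNN layers usually compute cross-correlation, which would correspond to flipping the kernel before calling it.

```python
import cmath


def fft(x, inverse=False):
    """Unnormalized recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2(a, inverse=False):
    """2D transform of a square matrix: 1D FFT on every row, then every column."""
    rows = [fft(r, inverse) for r in a]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


def pad(m, p):
    """Zero-pad matrix m to p x p complex entries."""
    return [[complex(m[i][j]) if i < len(m) and j < len(m[0]) else 0j
             for j in range(p)] for i in range(p)]


def overlap_add_conv2d(image, kernel, tile=6):
    """Full 2D convolution computed tile by tile in the frequency domain."""
    h, w, k = len(image), len(image[0]), len(kernel)
    p = 1
    while p < tile + k - 1:  # smallest power-of-two FFT size that avoids wrap-around
        p *= 2
    fk = fft2(pad(kernel, p))  # kernel spectrum is computed once and reused
    out = [[0.0] * (w + k - 1) for _ in range(h + k - 1)]
    for r0 in range(0, h, tile):
        for c0 in range(0, w, tile):
            block = [row[c0:c0 + tile] for row in image[r0:r0 + tile]]
            fb = fft2(pad(block, p))
            prod = [[fb[i][j] * fk[i][j] for j in range(p)] for i in range(p)]
            part = fft2(prod, inverse=True)
            # Overlap addition: neighbouring tile results overlap by k - 1
            # rows/columns, and the overlapping values are summed.
            for i in range(len(block) + k - 1):
                for j in range(len(block[0]) + k - 1):
                    out[r0 + i][c0 + j] += part[i][j].real / (p * p)
    return out


def direct_conv2d(image, kernel):
    """Reference time domain full convolution, used only for checking."""
    h, w, k = len(image), len(image[0]), len(kernel)
    out = [[0.0] * (w + k - 1) for _ in range(h + k - 1)]
    for i in range(h):
        for j in range(w):
            for u in range(k):
                for v in range(k):
                    out[i + u][j + v] += image[i][j] * kernel[u][v]
    return out


if __name__ == "__main__":
    img = [[(3 * i + 5 * j) % 7 - 3 for j in range(13)] for i in range(10)]
    ker = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    a, b = overlap_add_conv2d(img, ker), direct_conv2d(img, ker)
    err = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    print("max abs difference vs. direct convolution:", err)
```

In this sketch the transform size depends only on the tile and kernel sizes, not on the feature map size, so the kernel is never zero-padded to the full feature map; that is the sense in which tiling with overlap addition avoids wasted work.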
