Design and implementation of reconfigurable CNN accelerator architecture based on elastic storage

doi:10.19682/j.cnki.1005-8885.2025.0004

The Journal of China Universities of Posts and Telecommunications (English edition), 2025, Vol. 32, Issue (1): 74-87. doi: 10.19682/j.cnki.1005-8885.2025.0004

• IC and System Design •

SHAN Rui (山蕊), HUO Ziqing (霍紫晴), XU Jianing (许佳宁)

Xi'an University of Posts and Telecommunications

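Received: 2023-10-09; Revised: 2024-10-15; Online: 2025-02-28; Published: 2025-02-28
Corresponding author: SHAN Rui, E-mail: shanrui0112@163.com
Supported by:
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project; National Natural Science Foundation of China (Key Program); National Natural Science Foundation of China (Young Scientists Fund)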

Abstract

With the rapid iteration of neural network algorithms, higher requirements are being placed on the computational performance and memory access bandwidth of neural network accelerators. Simply increasing bandwidth cannot improve energy efficiency, so improving the data reuse rate has become an active research topic. From the perspective of supporting data reuse, this paper presents a reconfigurable convolutional neural network (CNN) accelerator based on elastic storage (RCAES). By supporting elastic memory access and flexible data flow, the design reduces data movement between the processor and memory, eases bandwidth pressure, and improves CNN acceleration performance. Experimental results indicate that execution speed increased by 25.00% for 1 × 1 convolution and by 61.61% for 3 × 3 convolution, and that 3 × 3 max pooling speed increased by 76.04%.
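
As a rough illustration of why data reuse matters more for 3 × 3 than for 1 × 1 convolution, the sketch below counts input-pixel fetches for a single-channel, stride-1, unpadded convolution under two simplified assumptions: every output refetches its whole window, or every input pixel is fetched once and then reused from a local buffer. The function names and the 8 × 8 feature-map size are hypothetical, and the model is not the RCAES design or its measured results.

```python
def naive_input_reads(height, width, kernel):
    """Input fetches when every output pixel refetches its full window.

    Assumes stride 1, no padding, one input channel (illustrative model only).
    """
    out_h = height - kernel + 1
    out_w = width - kernel + 1
    return out_h * out_w * kernel * kernel


def buffered_input_reads(height, width):
    """Input fetches when each input pixel is fetched once and reused locally."""
    return height * width


def reuse_factor(height, width, kernel):
    """Average number of uses per fetched input pixel under the buffered model."""
    return naive_input_reads(height, width, kernel) / buffered_input_reads(height, width)


if __name__ == "__main__":
    # Hypothetical 8 x 8 feature map, chosen only to keep the numbers small.
    for k in (1, 3):
        print(
            f"{k}x{k}: naive={naive_input_reads(8, 8, k)}, "
            f"buffered={buffered_input_reads(8, 8)}, "
            f"reuse={reuse_factor(8, 8, k):.2f}"
        )
```

In this simplified model a 1 × 1 kernel offers no window overlap to exploit, while a 3 × 3 kernel touches each input pixel several times, which is why keeping data close to the compute units can cut memory traffic for larger kernels.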

Key words: reconfigurable, array processor, distributed storage, neural network accelerator, data reuse
