The Journal of China Universities of Posts and Telecommunications, 2024, Vol. 31, Issue (2): 105-112. doi: 10.19682/j.cnki.1005-8885.2024.0008

Special topic: Integrated circuits

• IC and System Design •

Convolutional neural network adaptation and optimization method in SIMT computing mode

Zhenfu Feng1, Ya-Ying Zhang1,2, Lele Yang1, Li-Dong Xing1

  1. Xi'an University of Posts and Telecommunications
    2.
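  • Received: 2023-12-04 Revised: 2024-03-10 Online: 2024-04-30 Published: 2024-04-30
  • Corresponding author: Zhenfu Feng E-mail: fengzhenfu@xupt.edu.cn
  • Supported by:
    Scientific Research Program Funded by Shaanxi Provincial Education Department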

Abstract:

To study and optimize the performance of neural network applications in general-purpose computing on graphics processing units (GPGPU) with a single instruction multiple threads (SIMT) processor, this work presents a self-developed SIMT processor named Pomelo together with its corresponding assembly programs. The parallel mechanism of the SIMT computing mode and of the Pomelo processor is briefly introduced. A common convolutional neural network (CNN) is built to verify the compatibility and functionality of the Pomelo processor, and a CNN computing flow with task-level and hardware-level optimizations is adopted on it. A dedicated algorithm that organizes data in a Z-shaped memory structure is developed to reduce memory accesses in data-intensive computing tasks. Applying this combined adaptation and optimization strategy, the experimental results show that reducing memory accesses is crucial to improving performance in the SIMT computing mode. A speedup of 6.52 times is achieved with 4 processing elements.
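The abstract does not describe how the Z-shaped memory structure is laid out. As a hedged illustration only, the sketch below assumes it resembles Z-order (Morton) indexing, in which every aligned 2×2 block of a feature map occupies four consecutive addresses. The helper names (`morton_index`, `to_z_order`, `lines_for_2x2_tiles`) and the 4-word memory line are hypothetical and not taken from the paper; the sketch simply counts how many memory lines are touched when each aligned 2×2 tile of an 8×8 map is read once, under row-major and under Z-order storage.

```python
"""Sketch: Z-order (Morton) layout of a feature map vs. row-major layout.

Assumption: "Z-shaped memory structure" is read here as Morton/Z-order
indexing; the paper's actual algorithm may differ.
"""


def part1by1(n: int) -> int:
    """Insert a zero bit between each of the low 16 bits of n."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_index(row: int, col: int) -> int:
    """Z-order address: column bits in even positions, row bits in odd."""
    return part1by1(col) | (part1by1(row) << 1)


def to_z_order(fmap: list[list[int]]) -> list[int]:
    """Copy a square, power-of-two-sized row-major feature map into Z-order."""
    n = len(fmap)
    out = [0] * (n * n)
    for r in range(n):
        for c in range(n):
            out[morton_index(r, c)] = fmap[r][c]
    return out


def lines_for_2x2_tiles(n: int, line_words: int, z_order: bool) -> int:
    """Total memory lines touched when every aligned 2x2 tile is read once.

    line_words is a hypothetical memory-line (burst) size in words.
    """
    total = 0
    for r in range(0, n, 2):
        for c in range(0, n, 2):
            cells = [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
            if z_order:
                addrs = [morton_index(y, x) for y, x in cells]
            else:
                addrs = [y * n + x for y, x in cells]
            # Distinct line numbers = distinct memory lines fetched.
            total += len({a // line_words for a in addrs})
    return total


if __name__ == "__main__":
    n = 8
    fmap = [[r * n + c for c in range(n)] for r in range(n)]
    z = to_z_order(fmap)
    print(z[:8])  # [0, 1, 8, 9, 2, 3, 10, 11]: two 2x2 tiles, contiguous
    print(lines_for_2x2_tiles(n, 4, z_order=False))  # 32
    print(lines_for_2x2_tiles(n, 4, z_order=True))   # 16
```

Under these assumptions, row-major storage touches two lines per tile (32 in total) because each tile spans two rows, while Z-order keeps each tile inside a single line (16 in total). Whether Pomelo's layout and memory system behave this way is not stated in the abstract.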

Key words:

parallel computing, single instruction multiple threads, convolutional neural network, memory optimization
