The Journal of China Universities of Posts and Telecommunications ›› 2024, Vol. 31 ›› Issue (2): 105-112. doi: 10.19682/j.cnki.1005-8885.2024.0008

Special Issue: Integrated Circuits


Convolutional neural network adaptation and optimization method in SIMT computing mode

Zhenfu Feng1, Ya-Ying Zhang1,1, Lele Yang2, Li-Dong Xing1

  1. 1.
    2. Xi'an University of Posts and Telecommunications
  • Received: 2023-12-04 Revised: 2024-03-10 Online: 2024-04-30 Published: 2024-04-30
  • Contact: Zhenfu Feng
  • Supported by:
    Scientific Research Program Funded by Shaanxi Provincial Education Department


To study and optimize the performance of neural network applications in general-purpose computing on graphics processing units (GPGPU) based on single instruction multiple threads (SIMT) processors, this work contributes a self-developed SIMT processor named Pomelo and its accompanying assembly programs. The parallel mechanisms of the SIMT computing mode and of the Pomelo processor are briefly introduced. A common convolutional neural network (CNN) is built to verify the compatibility and functionality of the Pomelo processor. A CNN computing flow with task-level and hardware-level optimization is adopted on the Pomelo processor. A specific algorithm for organizing a Z-shaped memory structure is developed to reduce memory accesses in computing tasks that handle large volumes of data. With this combined adaptation and optimization strategy, the experimental results show that reducing memory access in the SIMT computing mode plays a crucial role in improving performance. A performance gain of 6.52 times is achieved with 4 processing elements.
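
The abstract does not spell out how the Z-shaped memory structure is organized. As a rough illustration only, the sketch below assumes it resembles the well-known Z-order (Morton) layout, in which row and column bits are interleaved so that each aligned 2x2 tile of a feature map occupies four consecutive addresses. The function names (interleave_bits, to_z_order) and the 4x4 example are hypothetical and are not taken from the paper or from the Pomelo toolchain.

```python
def interleave_bits(row, col, bits=16):
    """Z-order (Morton) index: col bits go to even positions, row bits to odd positions."""
    index = 0
    for i in range(bits):
        index |= ((col >> i) & 1) << (2 * i)
        index |= ((row >> i) & 1) << (2 * i + 1)
    return index


def to_z_order(matrix):
    """Flatten a square matrix whose side is a power of two into Z-order.

    Assumption for this sketch: the feature map is square with a
    power-of-two side, so the Morton indices cover 0..n*n-1 exactly.
    """
    n = len(matrix)
    if n == 0 or n & (n - 1) or any(len(r) != n for r in matrix):
        raise ValueError("expected a square matrix with power-of-two side")
    flat = [None] * (n * n)
    for r in range(n):
        for c in range(n):
            flat[interleave_bits(r, c)] = matrix[r][c]
    return flat


# Hypothetical 4x4 feature map whose entries equal their row-major index.
feature_map = [[r * 4 + c for c in range(4)] for r in range(4)]
z_flat = to_z_order(feature_map)
print(z_flat)
```

In this layout the first four stored values are the top-left 2x2 tile, so a thread working on that tile reads one contiguous run instead of two separate rows. Whether the paper's algorithm works this way is an assumption of the sketch.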

Key words:

parallel computing, single instruction multiple threads, convolutional neural network, memory optimization