The Journal of China Universities of Posts and Telecommunications ›› 2020, Vol. 27 ›› Issue (2): 37-45.doi: 10.19682/j.cnki.1005-8885.2020.1005

• Artificial intelligence •

Impact of data set noise on distributed deep learning

Guo Qinghao, Shuai Liguo, Hu Sunying   

  1. School of Mechanical Engineering, Southeast University, Nanjing 211100, China
  • Received: 2019-02-24 Revised: 2020-06-14 Online: 2020-04-30 Published: 2020-07-07
  • Contact: Shuai Liguo, E-mail: liguo.shuai@126.com
  • About author:Shuai Liguo, E-mail: liguo.shuai@126.com

Abstract: Training efficiency and test accuracy are important factors in judging the scalability of distributed deep learning. In this paper, the impact of noise introduced into the modified national institute of standards and technology database (MNIST) and CIFAR-10 datasets, which are selected as benchmarks in distributed deep learning, is explored. The noise in the training set is manually divided into cross-noise and random noise, and each type of noise is injected into the dataset at different ratios. To minimize the influence of parameter interactions in distributed deep learning, we choose a compressed model (SqueezeNet) together with the proposed flexible communication method, which reduces the communication frequency, and we evaluate the influence of noise on distributed deep training under both the synchronous and asynchronous stochastic gradient descent (SGD) algorithms. On the experimental platform TensorFlowOnSpark, we obtain the training accuracy at different noise ratios and the training time for different numbers of nodes. The existence of cross-noise in the training set not only decreases the test accuracy but also increases the time required for distributed training. Such noise therefore degrades the scalability of distributed deep learning.
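The abstract does not spell out how the two noise types are injected into the training labels. The sketch below is a minimal, hedged illustration of one plausible scheme, assuming label-level corruption: the function name inject_label_noise, its mode argument, and the fixed class-pairing rule used for cross-noise are illustrative assumptions, not the authors' actual procedure.

```python
import numpy as np

def inject_label_noise(labels, noise_ratio, num_classes, mode="random", seed=None):
    """Corrupt a fraction of training labels (illustrative assumption).

    mode="cross":  flip each selected label to one fixed paired class,
                   a simple reading of "cross-noise" between classes.
    mode="random": replace each selected label with a uniformly random
                   different class.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    n_noisy = int(noise_ratio * len(labels))
    idx = rng.choice(len(labels), size=n_noisy, replace=False)

    if mode == "cross":
        # Pair class c with class (c + 1) mod num_classes.
        labels[idx] = (labels[idx] + 1) % num_classes
    else:
        # Shift by 1..num_classes-1 so the new label always differs.
        shift = rng.integers(1, num_classes, size=n_noisy)
        labels[idx] = (labels[idx] + shift) % num_classes
    return labels

# Example: corrupt 20% of MNIST-style labels (10 classes) with cross-noise.
clean = np.random.default_rng(0).integers(0, 10, size=60000)
noisy = inject_label_noise(clean, noise_ratio=0.2, num_classes=10, mode="cross")
print((clean != noisy).mean())  # approximately 0.2
```

Under this sketch, the noise ratio is simply the fraction of corrupted labels, which matches how the abstract varies the ratio per noise type; the corrupted dataset would then be fed to the synchronous or asynchronous SGD workers unchanged.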

Key words: distributed deep learning, stochastic gradient descent, parameter server (PS), dataset noise
