The Journal of China Universities of Posts and Telecommunications ›› 2014, Vol. 21 ›› Issue (5): 68-75. doi: 10.1016/S1005-8885(14)60333-2

• Networks •

High-quality voice conversion system based on GMM statistical parameters and RBF neural network

CHEN Xiantong

  1. Nanjing University of Posts and Telecommunications
  • Received: 2014-01-16 Revised: 2014-06-24 Online: 2014-10-31 Published: 2014-10-30
  • Contact: CHEN Xiantong E-mail: chenxt0524@126.com
  • Supported by:

    National Natural Science Foundation of China

Abstract:  A voice conversion (VC) system is designed based on a Gaussian mixture model (GMM) and a radial basis function (RBF) neural network. As a voice conversion model, an RBF network needs a large amount of training data to perform well. For a given utterance, networks trained on different segments of the data yield different conversion results, and testing the segments one by one to find the best conversion is laborious. To address this problem, a conversion method is proposed that uses a GMM to compute statistics of the training data before the RBF network is trained. The speech transformation and representation using adaptive interpolation of weighted spectrum (STRAIGHT) model is used to extract the vocal tract spectrum accurately. A GMM then classifies the large set of spectral parameters, and the resulting mean parameters are used to train the RBF network. Experiments show that the soft classification ability of the GMM quickly reduces and classifies the training data while preserving the training effect, which in turn lowers the complexity of selecting training data. Compared with conventional RBF network training methods, this method transforms spectral parameters more effectively and improves the quality of the converted speech.

Key words: VC system, STRAIGHT, vocal tract spectrum, GMM, RBF
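
The pipeline described in the abstract (cluster the spectral parameters with a GMM, then train the RBF network on the component means) can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: the GMM is fitted on joint source-target vectors, it uses diagonal covariances, the RBF centers are placed at the source halves of the means, and the kernel width is set by a simple distance heuristic. STRAIGHT extraction is not shown; synthetic frames stand in for time-aligned vocal tract spectral parameters. All function names (`fit_gmm_means`, `rbf_features`, `train_rbf`, `rbf_predict`) are hypothetical.

```python
import numpy as np


def fit_gmm_means(data, n_components, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM with EM and return its component means."""
    rng = np.random.default_rng(seed)
    n, _ = data.shape
    means = data[rng.choice(n, n_components, replace=False)].copy()
    variances = np.tile(data.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: soft assignment (responsibility) of every frame to every component.
        diff = data[:, None, :] - means[None, :, :]
        log_p = (np.log(weights)
                 - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
                 - 0.5 * np.sum(diff ** 2 / variances, axis=2))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ data) / nk[:, None]
        diff = data[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + 1e-6
    return means


def rbf_features(x, centers, width):
    """Gaussian RBF activations of each row of x with respect to each center."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * width ** 2))


def train_rbf(x, y, width):
    """Use the training inputs as centers; solve output weights by least squares."""
    phi = rbf_features(x, x, width)
    out_w, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return x, out_w


def rbf_predict(x, centers, out_w, width):
    """Map input frames through the trained RBF network."""
    return rbf_features(x, centers, width) @ out_w


# Synthetic stand-in for aligned source/target spectral frames (assumption).
rng = np.random.default_rng(1)
source = rng.normal(size=(2000, 4))
target = np.tanh(source) + 0.05 * rng.normal(size=source.shape)

# Reduce 2000 frame pairs to 16 mean pairs with a GMM on the joint vectors.
joint_means = fit_gmm_means(np.hstack([source, target]), n_components=16)
src_means, tgt_means = joint_means[:, :4], joint_means[:, 4:]

# Kernel width heuristic (assumption): mean pairwise distance between centers.
dists = np.linalg.norm(src_means[:, None] - src_means[None], axis=2)
width = dists[dists > 0].mean()

centers, out_w = train_rbf(src_means, tgt_means, width)
converted = rbf_predict(source, centers, out_w, width)
print(converted.shape)
```

In this sketch the RBF network is trained on 16 mean pairs instead of 2000 frame pairs, which mirrors the data reduction the abstract attributes to the GMM. The number of components and the width would need tuning on real spectral data.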