The Journal of China Universities of Posts and Telecommunications (English edition) ›› 2017, Vol. 24 ›› Issue (5): 60-67. doi: 10.1016/S1005-8885(17)60234-6

• Signal processing •

Speaker conversion using kernel non-negative matrix factorization

Xu Qinyu, Lu Guanming, Yan Jingjie, Li Haibo, Cheng Xiao   

  1. College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  2. Jiangsu Province Key Laboratory on Image Processing and Image Communication, Nanjing 210003, China
  • Received: 2017-01-23 Revised: 2017-09-29 Online: 2017-10-30 Published: 2017-12-18
  • Contact: Lu Guanming, E-mail: lugm@njupt.edu.cn
  • About author: Lu Guanming, E-mail: lugm@njupt.edu.cn
  • Supported by:
    This work was supported in part by the National Natural Science Foundation of China (61501249, 61071167, 41601601), the Key Research and Development Program of Jiangsu Province (BE2016775), the Natural Science Foundation of Jiangsu Province for Youth (BK20150855), the Research Project of the Science and Technology Department of Jiangsu Province (BY2015011-1), the Natural Science Foundation for Jiangsu Higher Education Institutions (15KJB510022), and the Nanjing University of Posts and Telecommunications Science Foundation (NY214143).

Abstract:

Voice conversion (VC) based on the Gaussian mixture model (GMM) is the classic and most common method for converting a source spectrum to a target spectrum. However, because it converts frame by frame, this method is prone to over-fitting. This paper presents VC with non-negative matrix factorization (NMF), which keeps the spectrum from over-fitting by adjusting the size of the basis-vector set (dictionary). To better realize the non-linear mapping, kernel NMF (KNMF) is adopted for spectrum mapping. In addition, to increase conversion accuracy, KNMF combined with GMM (GKNMF) is also introduced into VC. Finally, KNMF, GKNMF, GMM, principal component regression (PCR), PCR combined with GMM (GPCR), partial least squares regression (PLSR), NMF correlation-based frequency warping (NMF-CFW) and deep neural network (DNN) methods are compared with each other. The proposed GKNMF achieves better performance in both objective and subjective evaluations.

Key words: VC, kernel, NMF, spectrum mapping
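
As an illustration of the NMF-based spectrum mapping idea mentioned in the abstract, the following is a minimal sketch of one common formulation of dictionary-based VC, and not necessarily the authors' method. It assumes paired source and target dictionaries `W_src` and `W_tgt` (hypothetical names) whose columns are aligned basis spectra; the number of columns is the dictionary size. Activations are estimated on the source dictionary with standard multiplicative updates for the Euclidean cost and then reused with the target dictionary. The kernel extension (KNMF) and the GMM combination (GKNMF) are not shown.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-9):
    """Estimate non-negative H such that V ~= W @ H, with W held fixed.

    V: (n_bins, n_frames) non-negative spectra.
    W: (n_bins, n_basis) non-negative dictionary; n_basis is the dictionary size.
    Uses multiplicative updates for the squared Euclidean cost.
    """
    rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        # Element-wise update keeps H non-negative.
        H *= WtV / (WtW @ H + eps)
    return H

def convert(V_src, W_src, W_tgt, n_iter=200):
    """Map source spectra to target spectra through shared activations.

    Assumption: column k of W_src and column k of W_tgt describe the same
    sound for the two speakers (e.g. built from time-aligned parallel data).
    """
    H = nmf_activations(V_src, W_src, n_iter=n_iter)
    return W_tgt @ H
```

In this sketch a smaller `n_basis` gives a coarser reconstruction and a larger one a closer fit to each frame, which is the dictionary-size trade-off the abstract refers to.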

CLC number: