The Journal of China Universities of Posts and Telecommunications (English edition) ›› 2017, Vol. 24 ›› Issue (2): 1-9. doi: 10.1016/S1005-8885(17)60193-6

• Artificial Intelligence •

Noisy speech emotion recognition using sample reconstruction and multiple-kernel learning

Jiang Xiaoqing1,2, Xia Kewen1, Lin Yongliang1,3, Bai Jianchuan1

  1. Hebei University of Technology
  2. University of Jinan
  3. Information Construction and Management Center, Tianjin Chengjian University
  • Received: 2016-09-29 Revised: 2017-04-01 Online: 2017-04-30 Published: 2017-04-30
  • Corresponding author: Xia Kewen E-mail: kwxia@hebut.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (61501204, 61601198), the Hebei Province Natural Science Foundation (E2016202341), the Hebei Province Foundation for Returned Scholars (C2012003038), the Shandong Province Natural Science Foundation (ZR2015FL010), and the Science and Technology Program of University of Jinan (XKY1710).

Abstract: Speech emotion recognition (SER) in noisy environments is a vital issue in artificial intelligence (AI). In this paper, speech samples are reconstructed to remove the added noise. Acoustic features extracted from the reconstructed samples are then selected to build an optimal feature subset with better emotional recognizability. A multiple-kernel (MK) support vector machine (SVM) classifier, solved by semi-definite programming (SDP), is adopted in the SER procedure. The proposed method is demonstrated on the Berlin Database of Emotional Speech. Recognition accuracies of the original, noisy, and reconstructed samples, classified by both single-kernel (SK) and MK classifiers, are compared and analyzed. The experimental results show that the proposed method is effective and robust in the presence of noise.

Key words: speech emotion recognition, compressed sensing, multiple-kernel learning, feature selection
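
As a rough illustration of the multiple-kernel idea named in the abstract, the sketch below builds one combined Gram matrix as a convex combination of two base kernels. It is not the authors' implementation: the choice of linear and RBF kernels, the fixed weights, the `gamma` value, the toy feature vectors, and all function names are assumptions made for illustration. The abstract states only that the MK SVM is solved by SDP, so the weights that are fixed by hand here would, in an SDP formulation, come from the optimization.

```python
import math


def linear_kernel(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed, untuned value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(kernel, samples):
    """Evaluate a kernel on every pair of samples."""
    return [[kernel(xi, xj) for xj in samples] for xi in samples]


def combine_kernels(gram_matrices, weights):
    """Convex combination of Gram matrices: K = sum_m w_m * K_m."""
    if len(gram_matrices) != len(weights):
        raise ValueError("need exactly one weight per Gram matrix")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(gram_matrices[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, gram_matrices))
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy feature vectors standing in for acoustic features.
samples = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

K_lin = gram_matrix(linear_kernel, samples)
K_rbf = gram_matrix(rbf_kernel, samples)

# Hand-picked weights (an assumption); an SDP-based method would learn them.
K_mk = combine_kernels([K_lin, K_rbf], weights=[0.3, 0.7])
```

A combined matrix such as `K_mk` could then be passed to any SVM solver that accepts a precomputed kernel. Because a non-negative weighted sum of positive semi-definite matrices is itself positive semi-definite, the combination remains a valid kernel.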
