Noisy speech emotion recognition using sample reconstruction and multiple-kernel learning

doi:10.1016/S1005-8885(17)60193-6

The Journal of China Universities of Posts and Telecommunications, 2017, Vol. 24, Issue 2: 1-9.

Artificial Intelligence

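Jiang Xiaoqing¹,², Xia Kewen¹, Lin Yongliang¹,³, Bai Jianchuan¹

1. Hebei University of Technology
2. University of Jinan
3. Information Construction and Management Center, Tianjin Chengjian University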

Corresponding author: Xia Kewen, E-mail: kwxia@hebut.edu.cn

Received: 2016-09-29; Revised: 2017-04-01; Online: 2017-04-30; Published: 2017-04-30
Supported by: the National Natural Science Foundation of China (61501204, 61601198), the Hebei Province Natural Science Foundation (E2016202341), the Hebei Province Foundation for Returned Scholars (C2012003038), the Shandong Province Natural Science Foundation (ZR2015FL010), and the Science and Technology Program of University of Jinan (XKY1710).

Abstract: Speech emotion recognition (SER) in noisy environments is a vital issue in artificial intelligence (AI). In this paper, speech samples are reconstructed to remove the added noise. Acoustic features extracted from the reconstructed samples are then selected to build an optimal feature subset with better emotional recognizability. A multiple-kernel (MK) support vector machine (SVM) classifier solved by semi-definite programming (SDP) is adopted in the SER procedure. The proposed method is demonstrated on the Berlin Database of Emotional Speech. Recognition accuracies of the original, noisy, and reconstructed samples classified by both single-kernel (SK) and MK classifiers are compared and analyzed. The experimental results show that the proposed method is effective and robust when noise exists.

Key words: speech emotion recognition, compressed sensing, multiple-kernel learning, feature selection
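
As a rough illustration of the multiple-kernel idea named in the abstract, the sketch below builds a combined Gram matrix as a non-negative weighted sum of base kernels. This is not the authors' implementation. The function names, the choice of linear and RBF base kernels, the `gamma` value, and the hand-fixed weights are all assumptions made for illustration. The abstract states that the MK SVM is solved by SDP, and that optimization is not reproduced here.

```python
import math


def linear_kernel(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative value, not from the paper."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gram_matrix(kernel, samples):
    """Evaluate a kernel on every pair of samples."""
    return [[kernel(a, b) for b in samples] for a in samples]


def combine_kernels(grams, weights):
    """Weighted sum of Gram matrices.

    Weights must be non-negative and are normalized to sum to 1.
    A non-negative combination of valid kernels is itself a valid kernel.
    """
    if len(grams) != len(weights):
        raise ValueError("need one weight per Gram matrix")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    normalized = [w / total for w in weights]
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for w, g in zip(normalized, grams)) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy 2-D feature vectors standing in for selected acoustic features.
    toy_samples = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    base_grams = [
        gram_matrix(linear_kernel, toy_samples),
        gram_matrix(rbf_kernel, toy_samples),
    ]
    combined = combine_kernels(base_grams, [0.5, 0.5])
    for row in combined:
        print(row)
```

The combined matrix could be passed to any SVM solver that accepts a precomputed kernel. In a learned multiple-kernel setting the weights would come out of the optimization rather than being fixed by hand as they are here.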

CLC number: TP18
