Improved vocal effort modeling by exploiting echo state network and radial basis function network

doi:10.19682/j.cnki.1005-8885.2019.0010

The Journal of China Universities of Posts and Telecommunications, 2019, Vol. 26, Issue (3): 98-104.


Received: 2018-09-18; Revised: 2018-12-17; Online: 2019-06-30; Published: 2019-06-30
Contact: Hao CHAO, E-mail: chaohao@hpu.edu.cn
Supported by: Fundamental Research Funds for the Universities of Henan Province; Foundation for University Key Teacher by Henan Province; Foundation for Scientific and Technological Project of Henan Province

Abstract

The assumption that frames are independent, which is common in vocal effort (VE) recognition, makes it difficult for frame-based spectral features to capture the intrinsic temporal correlation and dynamic changes in speech. A novel VE detection method based on the echo state network (ESN) is presented. The reservoir of the ESN maps each input sequence to a fixed-dimensional vector in a high-dimensional coding space. Radial basis function (RBF) networks are then used to fit the probability density function (pdf) of each VE mode from the vectors in this coding space. Finally, the minimum-error-rate Bayesian decision rule is used to determine the VE mode. Experiments on an isolated-word test set achieve an average recognition accuracy of 79.5%, and the results show that the proposed method effectively overcomes the limitation of the frame-independence assumption.
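
The three stages described above (reservoir encoding, per-mode density fitting, Bayesian decision) can be illustrated with a minimal sketch. It is not the authors' implementation: the reservoir size, the weight scaling, the use of the mean reservoir state as the fixed-dimensional code, the mode names "soft" and "loud", and the synthetic data are all assumptions made for illustration. In particular, an equal-weight sum of Gaussian RBFs centred on the training codes stands in for a trained RBF network.

```python
import math
import random

def make_reservoir(n_units, n_inputs, seed=0):
    """Random, fixed (untrained) input and recurrent weights (assumed setup)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_units)]
    # Recurrent entries are bounded by 0.9 / n_units, so each row's absolute
    # sum stays below 1 and the tanh update is a contraction in the state.
    s = 0.9 / n_units
    w = [[rng.uniform(-s, s) for _ in range(n_units)] for _ in range(n_units)]
    return w_in, w

def encode(seq, reservoir):
    """Map a variable-length sequence of frames to a fixed-size code.
    Assumption: the code is the mean reservoir state over the sequence."""
    w_in, w = reservoir
    n = len(w)
    x = [0.0] * n
    mean = [0.0] * n
    for u in seq:
        x = [math.tanh(sum(a * b for a, b in zip(w_in[i], u)) +
                       sum(a * b for a, b in zip(w[i], x)))
             for i in range(n)]
        mean = [m + xi / len(seq) for m, xi in zip(mean, x)]
    return mean

def rbf_density(v, centers, width):
    """Equal-weight sum of Gaussian RBFs centred on the training codes.
    The normalising constant is omitted: it is shared by all modes when
    they use the same width, so it does not change the decision."""
    total = 0.0
    for c in centers:
        d2 = sum((a - b) ** 2 for a, b in zip(v, c))
        total += math.exp(-d2 / (2 * width ** 2))
    return total / len(centers)

def classify(seq, reservoir, class_codes, priors, width=0.5):
    """Minimum-error-rate Bayes rule: pick the mode maximising prior * pdf."""
    v = encode(seq, reservoir)
    scores = {mode: priors[mode] * rbf_density(v, codes, width)
              for mode, codes in class_codes.items()}
    return max(scores, key=scores.get)

def toy_sequence(level, length, rng):
    """Synthetic 3-dimensional frames around a given level (hypothetical data)."""
    return [[level + rng.gauss(0, 0.1) for _ in range(3)] for _ in range(length)]

rng = random.Random(1)
res = make_reservoir(n_units=20, n_inputs=3)
levels = {"soft": 0.2, "loud": 1.0}  # hypothetical VE modes
class_codes = {
    mode: [encode(toy_sequence(level, rng.randint(8, 16), rng), res)
           for _ in range(10)]
    for mode, level in levels.items()
}
priors = {"soft": 0.5, "loud": 0.5}
print(classify(toy_sequence(0.2, 12, rng), res, class_codes, priors))
```

Because the code vector has the same length for every input, sequences of different durations can be compared directly in the coding space, which is the property the density-fitting and decision stages rely on.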

Key words: vocal effort, echo state network, reservoir, radial basis function, support vector machine

CLC Number: TN391.42

