Discriminative tonal feature extraction method in Mandarin speech recognition

doi:1005-8885 (2007) 04-0126-05

The Journal of China Universities of Posts and Telecommunications, 2007, Vol. 14, Issue (4): 126-130. doi: 1005-8885 (2007) 04-0126-05

• Wireless

HUANG Hao; ZHU Jie

Department of Electronic Engineering, Shanghai Jiao Tong University

Received: 2007-04-24 Online: 2007-12-24
Contact: HUANG Hao

Abstract

To exploit the supra-segmental nature of Mandarin tones, this article proposes a feature extraction method for hidden Markov model (HMM) based tone modeling. The method uses linear transforms to project the F0 (fundamental frequency) features of neighboring syllables into compensation terms, which are added to the original F0 features of the current syllable. The transforms are trained discriminatively with an objective function termed "minimum tone error", a smooth approximation of tone recognition accuracy. Experiments show that the new tonal features improve the tone recognition rate by 3.82% over the baseline, which uses maximum-likelihood-trained HMMs on the ordinary F0 features. Further experiments show that discriminative HMM training on the new features is 8.78% better than the baseline.
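
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the matrices `m_prev` and `m_next`, the zero-padding at utterance boundaries, the scale `kappa`, and the N-best form of the objective are all assumptions made for illustration. The first function adds linear projections of the neighboring syllables' F0 features to the current syllable's features. The second computes a posterior-weighted average of tone accuracy over competing hypotheses, one common way to obtain a smooth approximation of accuracy.

```python
import numpy as np


def compensate_f0(features, m_prev, m_next):
    """Add projected neighbor features to each syllable's F0 features.

    features: (num_syllables, dim) array, one F0 feature vector per syllable.
    m_prev, m_next: (dim, dim) transforms for the left and right neighbor
    (hypothetical names; the source only says "linear transforms").
    """
    features = np.asarray(features, dtype=float)
    pad = np.zeros((1, features.shape[1]))
    # Shift rows so that row t holds the features of syllable t-1 / t+1.
    # Assumption: a missing neighbor at an utterance edge contributes zero.
    prev_feats = np.vstack([pad, features[:-1]])
    next_feats = np.vstack([features[1:], pad])
    return features + prev_feats @ m_prev.T + next_feats @ m_next.T


def expected_tone_accuracy(log_scores, accuracies, kappa=1.0):
    """Posterior-weighted tone accuracy over a list of hypotheses.

    log_scores: log-likelihood of each competing tone hypothesis.
    accuracies: number of correct tones in each hypothesis.
    kappa: scaling factor applied to the log scores (assumed).
    """
    scaled = kappa * np.asarray(log_scores, dtype=float)
    # Softmax with max-subtraction for numerical stability.
    posteriors = np.exp(scaled - scaled.max())
    posteriors /= posteriors.sum()
    return float(posteriors @ np.asarray(accuracies, dtype=float))
```

In a discriminative training loop, the transforms would be adjusted to increase the second quantity, since it is differentiable with respect to the hypothesis scores and therefore with respect to the compensated features.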

Key words:

discriminative training; tone recognition; feature extraction; Mandarin speech recognition

HUANG Hao; ZHU Jie. Discriminative tonal feature extraction method in Mandarin speech recognition[J]. The Journal of China Universities of Posts and Telecommunications, 2007, 14(4): 126-130.
