Automatic context induction for tone model integration in Mandarin speech recognition

doi:10.1016/S1005-8885(11)60233-1

Acta Metallurgica Sinica(English letters) ›› 2012, Vol. 19 ›› Issue (1): 94-100. doi: 10.1016/S1005-8885(11)60233-1

Received:2011-03-01 Revised:2011-10-14 Online:2012-02-28 Published:2012-02-21
Contact: HUANG Hao E-mail:hwanghao@gmail.com
Supported by:
This work was supported by the National Natural Science Foundation of China (60965002), the College Research Project of Xinjiang (XJEDU2008S15), and the Start-up Fund Research for Ph. D. in Xinjiang University (BS090143).

Abstract:

Tone model (TM) integration is an important task in Mandarin speech recognition. Discriminatively trained scaling factors have been shown to be effective for integrating TM scores into multi-pass speech recognition, and context-dependent (CD) scaling can further improve the interpolation between the models. One limitation of this approach is that it introduces a large number of parameters, which makes the technique prone to overtraining. In this paper, we propose inducing context-dependent model weights with automatically derived phonetic decision trees. The question at each tree node is chosen to minimize the expected recognition error on the training data. A first-order approximation of the minimum phone error (MPE) objective function is used for question pruning to make tree building efficient. Experimental results on continuous Mandarin speech recognition show that the method is capable of inducing the most crucial phonetic contexts and obtains a significant error reduction with far fewer parameters than manually designed context-dependent scaling parameters.
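The idea of context-dependent scaling can be illustrated with a minimal sketch. It is not the authors' implementation: the `Context` fields, the single tree question, the weight values, and the function names are all assumptions made for illustration. A small decision tree maps the phonetic context of each tonal syllable to a scaling factor, and the scaled TM log-scores are added to the score from the earlier recognition pass.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Context:
    """Hypothetical phonetic context of one tonal syllable."""
    left: str   # phone to the left
    right: str  # phone to the right
    tone: int   # tone label of the syllable


@dataclass
class Node:
    """Tree node: a leaf holds a weight, an internal node holds a question."""
    weight: Optional[float] = None
    question: Optional[Callable[[Context], bool]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def lookup_weight(node: Node, ctx: Context) -> float:
    """Walk the tree from the root to a leaf and return its scaling factor."""
    while node.weight is None:
        node = node.yes if node.question(ctx) else node.no
    return node.weight


def combined_score(base_score: float, tone_scores: List[float],
                   contexts: List[Context], tree: Node) -> float:
    """Add context-weighted TM log-scores to the first-pass log-score."""
    return base_score + sum(
        lookup_weight(tree, ctx) * score
        for score, ctx in zip(tone_scores, contexts)
    )


# Illustrative tree with one question ("is the right context silence?").
# The weights 0.5 and 2.0 are made-up values, not trained parameters.
tree = Node(
    question=lambda c: c.right == "sil",
    yes=Node(weight=0.5),
    no=Node(weight=2.0),
)

hypothesis_contexts = [Context("a", "sil", 3), Context("sil", "b", 1)]
result = combined_score(-10.0, [-1.0, -2.0], hypothesis_contexts, tree)
```

Because every context that reaches the same leaf shares one weight, the number of free parameters equals the number of leaves rather than the number of distinct contexts, which is the motivation the abstract gives for tree-based induction. Choosing the questions to minimize expected recognition error is not shown here.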

Key words:

TM integration, MPE, decision tree, Mandarin speech recognition, context-dependent

CLC Number:

TN912

Copyright © 2020 The Journal of China Universities of Posts and Telecommunications