Automatic context induction for tone model integration in mandarin speech recognition

The Journal of China Universities of Posts and Telecommunications (English edition), 2012, Vol. 19, Issue (1): 94-100. doi: 10.1016/S1005-8885(11)60233-1

HUANG Hao, LI Binghu

School of Information Science and Engineering, Xinjiang University

Received: 2011-03-01 Revised: 2011-10-14 Online: 2012-02-28 Published: 2012-02-21
Contact: HUANG Hao E-mail: hwanghao@gmail.com
Supported by:
This work was supported by the National Natural Science Foundation of China (60965002), the College Research Project of Xinjiang (XJEDU2008S15), and the Start-up Fund Research for Ph.D. in Xinjiang University (BS090143).

Abstract

Tone model (TM) integration is an important task in Mandarin speech recognition. Discriminatively trained scaling factors have proved effective for integrating TM scores into multi-pass speech recognition. Moreover, context-dependent (CD) scaling can be applied for better interpolation between the models. One limitation of this approach is that it introduces a large number of parameters, which makes the technique prone to overtraining. In this paper, we propose to induce context-dependent model weights by using automatically derived phonetic decision trees. The question at each tree node is chosen to minimize the expected recognition error on the training data. A first-order approximation of the minimum phone error (MPE) objective function is used for question pruning to make tree building efficient. Experimental results on continuous Mandarin speech recognition show that the method is capable of inducing the most crucial phonetic contexts and obtains significant error reduction with far fewer parameters than manually designed context-dependent scaling parameters.
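To make the idea of tree-induced context-dependent scaling concrete, the following is a minimal sketch and not the authors' implementation. It assumes a log-linear combination in which each syllable's TM log score is multiplied by a weight before being added to the acoustic log score, and that the weight is found by answering yes/no context questions in a small decision tree. The class names (`Segment`, `Node`), the two questions, and all weights and scores are hypothetical values chosen for illustration; in the paper the questions are selected to minimize expected recognition error, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Segment:
    """One tonal syllable in a hypothesis (all fields are illustrative)."""
    tone: int        # tone label of this syllable
    left_tone: int   # tone of the preceding syllable, 0 if none
    am_score: float  # acoustic-model log score
    tm_score: float  # tone-model log score


@dataclass
class Node:
    """Decision-tree node: internal nodes hold a question, leaves hold a weight."""
    weight: float = 1.0
    question: Optional[Callable[[Segment], bool]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def lookup_weight(node: Node, seg: Segment) -> float:
    """Walk the tree by answering context questions until a leaf is reached."""
    while node.question is not None:
        node = node.yes if node.question(seg) else node.no
    return node.weight


def hypothesis_score(tree: Node, segments: List[Segment]) -> float:
    """Assumed log-linear combination: AM score plus weighted TM score."""
    return sum(s.am_score + lookup_weight(tree, s) * s.tm_score for s in segments)


# Hypothetical tree with three leaves, i.e. three scaling parameters in total.
tree = Node(
    question=lambda s: s.tone == 3,
    yes=Node(
        question=lambda s: s.left_tone == 3,
        yes=Node(weight=0.5),
        no=Node(weight=2.0),
    ),
    no=Node(weight=1.0),
)

# Two made-up competing hypotheses for the same utterance.
hyp_a = [Segment(3, 0, -10.0, -1.0), Segment(1, 3, -12.0, -2.0)]
hyp_b = [Segment(3, 3, -9.0, -4.0), Segment(4, 3, -12.5, -0.5)]

best = max([hyp_a, hyp_b], key=lambda h: hypothesis_score(tree, h))
```

The point of the tree is parameter sharing: every context that reaches the same leaf uses the same weight, so the number of scaling parameters equals the number of leaves rather than the number of distinct contexts.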

Key words:

TM integration, MPE, decision tree, Mandarin speech recognition, context-dependent

CLC number:

TN912
