The Journal of China Universities of Posts and Telecommunications ›› 2019, Vol. 26 ›› Issue (3): 98-104. doi: 10.19682/j.cnki.1005-8885.2019.0010

• Others •

Improved vocal effort modeling by exploiting echo state network and radial basis function network

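Hao CHAO (晁浩), Liang DONG (董亮), Yongli LIU (刘永利)

  1. Henan Polytechnic University
  • Received: 2018-09-18 Revised: 2018-12-17 Online: 2019-06-30 Published: 2019-06-30
  • Contact: Hao CHAO E-mail: chaohao@hpu.edu.cn
  • Supported by:
    National Natural Science Foundation of China; National Natural Science Foundation of China; Fundamental Research Funds for the Universities of Henan Province; Key Scientific Research Project of Higher Education Institutions of Henan Province; Foundation for University Key Teacher by Henan Province; Foundation for scientific and technological project of Henan Province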


Abstract: In vocal effort (VE) recognition, the assumption that frames are independent of one another makes it difficult for frame-based spectral features to describe the intrinsic temporal correlation and dynamic change information in speech. A novel VE detection method based on the echo state network (ESN) is presented. The reservoir of the ESN maps each input sequence to a fixed-dimensional vector in a high-dimensional coding space. Radial basis function (RBF) networks are then used to fit the probability density function (pdf) of each VE mode from the vectors in this coding space. Finally, the minimum error rate Bayesian decision rule is used to determine the VE mode. Experiments on an isolated-word test set achieve an average recognition accuracy of 79.5%, and the results show that the proposed method effectively overcomes the limitation imposed by the frame independence assumption.
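
The three stages named in the abstract (reservoir encoding, per-mode density fitting, Bayesian decision) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the fixed-dimensional code is taken to be the mean reservoir state, each class density is an equal-weight sum of Gaussian RBFs centred on that class's training codes (a stand-in for a trained RBF network), and every name, size, and parameter value (`make_reservoir`, `encode`, `classify`, `mode_a`, `mode_b`, the spectral radius, the kernel width) is hypothetical.

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Random input and recurrent weights; the recurrent matrix is
    rescaled so that its spectral radius equals `spectral_radius`."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def encode(frames, w_in, w):
    """Drive the reservoir with a (T, n_in) frame sequence and return the
    mean state: a fixed-size vector regardless of T (assumed pooling)."""
    x = np.zeros(w.shape[0])
    states = []
    for u in frames:
        x = np.tanh(w_in @ u + w @ x)  # state depends on past frames
        states.append(x)
    return np.mean(states, axis=0)

def rbf_log_density(v, centers, width):
    """Log of an equal-weight sum of Gaussian RBFs centred on the
    training codes of one class (assumed stand-in for an RBF network)."""
    d2 = np.sum((centers - v) ** 2, axis=1)
    dim = centers.shape[1]
    log_k = -d2 / (2 * width ** 2) - 0.5 * dim * np.log(2 * np.pi * width ** 2)
    m = np.max(log_k)  # log-mean-exp for numerical stability
    return m + np.log(np.mean(np.exp(log_k - m)))

def classify(v, class_centers, priors, width=1.0):
    """Minimum error rate Bayes rule: pick the class that maximises
    log prior + log class-conditional density."""
    scores = {c: np.log(priors[c]) + rbf_log_density(v, class_centers[c], width)
              for c in class_centers}
    return max(scores, key=scores.get)

# Toy usage: two hypothetical VE modes whose frames differ only in mean.
rng = np.random.default_rng(1)
w_in, w = make_reservoir(n_in=3, n_res=20)

def toy_sequence(mean):
    length = rng.integers(15, 30)  # sequences have different lengths
    return rng.normal(mean, 0.1, size=(length, 3))

train = {
    "mode_a": np.array([encode(toy_sequence(-1.0), w_in, w) for _ in range(10)]),
    "mode_b": np.array([encode(toy_sequence(+1.0), w_in, w) for _ in range(10)]),
}
priors = {"mode_a": 0.5, "mode_b": 0.5}
predicted = classify(encode(toy_sequence(+1.0), w_in, w), train, priors)
```

Because each reservoir state is a function of the previous state as well as the current frame, the pooled code carries temporal context that independent frame-level features do not, which is the property the abstract relies on.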

Key words: vocal effort, echo state network, reservoir, radial basis function, support vector machine

