References
1. Abe M, Nakamura S, Shikano K, et al. Voice conversion through vector quantization. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’88): Vol 1, Apr 11-14, 1988, New York, NY, USA. Piscataway, NJ, USA: IEEE, 1988: 655-658
2. Valbret H, Moulines E, Tubach J P. Voice transformation using PSOLA technique. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’92): Vol 1, Mar 23-26, 1992, San Francisco, CA, USA. Piscataway, NJ, USA: IEEE, 1992: 145-148
3. Stylianou Y, Cappé O, Moulines E. Continuous probabilistic transform for voice conversion. IEEE Transactions on Speech and Audio Processing, 1998, 6(2): 131-142
4. Kain A, Macon M W. Spectral voice conversion for text-to-speech synthesis. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’98): Vol 1, May 12-15, 1998, Seattle, WA, USA. Piscataway, NJ, USA: IEEE, 1998: 285-288
5. Toda T, Black A W, Tokuda K. Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory. IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(8): 2222-2235
6. Toda T, Ohtani Y, Shikano K. Eigenvoice conversion based on Gaussian mixture model. Proceedings of the 9th International Conference on Spoken Language Processing (INTERSPEECH’06/ICSLP’06), Sept 17-21, 2006, Pittsburgh, PA, USA. 2006: 2446-2449
7. Helander E, Virtanen T, Nurminen J, et al. Voice conversion using partial least squares regression. IEEE Transactions on Audio, Speech, and Language Processing, 2010, 18(5): 912-921
8. Raj B, Singh R, Virtanen T. Phoneme-dependent NMF for speech enhancement in monaural mixtures. Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH’11), Aug 27-31, 2011, Florence, Italy. 2011: 1217-1220
9. Zhu B, Li W, Li R, et al. Multi-stage non-negative matrix factorization for monaural singing voice separation. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(10): 2096-2107
10. Takashima R, Takiguchi T, Ariki Y. Exemplar-based voice conversion in noisy environment. Proceedings of the 2012 IEEE Workshop on Spoken Language Technology (SLT’12), Dec 2-5, 2012, Miami, FL, USA. Piscataway, NJ, USA: IEEE, 2012: 313-317
11. Aihara R, Takashima R, Takiguchi T, et al. Individuality-preserving voice conversion for articulation disorders based on non-negative matrix factorization. Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’13), May 26-31, 2013, Vancouver, Canada. Piscataway, NJ, USA: IEEE, 2013: 8037-8040
12. Aihara R, Nakashika T, Takiguchi T, et al. Voice conversion based on non-negative matrix factorization using phoneme-categorized dictionary. Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’14), May 4-9, 2014, Florence, Italy. Piscataway, NJ, USA: IEEE, 2014: 7894-7898
13. Aihara R, Takiguchi T, Ariki Y. Activity-mapping non-negative matrix factorization for exemplar-based voice conversion. Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’15), Apr 19-24, 2015, Brisbane, Australia. Piscataway, NJ, USA: IEEE, 2015: 4899-4903
14. Wu Z Z, Chng E S, Li H Z. Exemplar-based voice conversion using joint nonnegative matrix factorization. Multimedia Tools and Applications, 2015, 74(22): 9943-9958
15. Berry M W, Browne M, Langville A N, et al. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics and Data Analysis, 2007, 52(1): 155-173
16. Kominek J, Black A W. The CMU arctic speech databases. Proceedings of the 5th ISCA Speech Synthesis Workshop, Jun 14-16, 2004, Pittsburgh, PA, USA. International Speech Communication Association (ISCA), 2004: 223-224
17. Kawahara H, Morise M, Takahashi T, et al. Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation. Proceedings of the 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’08), Mar 31-Apr 4, 2008, Las Vegas, NV, USA. Piscataway, NJ, USA: IEEE, 2008: 3933-3936
18. Speech Signal Processing Toolkit (SPTK). https://sourceforge.net/projects/sp-tk/files/SPTK/
19. Tian X H, Wu Z Z, Lee S W, et al. Sparse representation for frequency warping based voice conversion. Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’15), Apr 19-24, 2015, Brisbane, Australia. Piscataway, NJ, USA: IEEE, 2015: 4235-4239
20. Mohammadi S H, Kain A. Voice conversion using deep neural networks with speaker-independent pre-training. Proceedings of the 2014 IEEE Workshop on Spoken Language Technology (SLT’14), Dec 7-10, 2014, South Lake Tahoe, NV, USA. Piscataway, NJ, USA: IEEE, 2014: 19-23
21. Mohammadi S H, Kain A. Semi-supervised training of a voice conversion mapping function using a joint-autoencoder. Proceedings of the 16th Annual Conference of the International Speech Communication Association (INTERSPEECH’15), Sept 6-10, 2015, Dresden, Germany. Piscataway, NJ, USA: IEEE, 2015: 284-288
22. Inanoglu Z. Transforming pitch in a voice conversion framework. Cambridge, UK: St. Edmund’s College, University of Cambridge, 2003
23. Erro D, Moreno A, Bonafonte A. INCA algorithm for training voice conversion systems from nonparallel corpora. IEEE Transactions on Audio, Speech, and Language Processing, 2010, 18(5): 944-953
24. Chen L H, Ling Z H, Liu L J, et al. Voice conversion using deep neural networks with layer-wise generative training. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(12): 1859-1872