Bilingual Voice Conversion Using Frequency Warping on Formant Space

A Study on Bilingual Voice Conversion Using Frequency Warping in Formant Space (포만트 공간에서의 주파수 변환을 이용한 이중 언어 음성 변환 연구)

  • Received : 2014.11.15
  • Accepted : 2014.12.13
  • Published : 2014.12.31

Abstract

This paper describes several approaches to transforming one speaker's individuality into another's using frequency warping between bilingual formant frequencies in different language environments. The proposed methods are simple, intuitive voice conversion algorithms that do not require training data between the two languages. Each approach finds a warping function from the source speaker's frequencies to the target speaker's frequencies on a formant space composed of four representative monophthongs for each language. The warping functions can be represented by piecewise linear equations or by an inverse matrix. The features used are pure frequency components, including magnitudes and phases, as well as line spectral frequencies (LSF). The experiments show that the LSF-based voice conversion methods give better performance than the other methods.
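For illustration, below is a minimal sketch of the piecewise linear warping idea described above: a warping function is anchored at matched formant frequencies of the source and target speakers (here standing in for the four representative monophthongs) and then applied to an arbitrary frequency axis. The anchor values, sampling rate, and names are assumptions made for the sketch, not values or code from the paper.

import numpy as np

# Hypothetical formant anchors (Hz) for four representative monophthongs of the
# source and target speakers; the numbers are illustrative placeholders, not
# measurements from the paper. 0 Hz and the Nyquist frequency are added so the
# warp covers the full band and stays monotonic.
SOURCE_ANCHORS_HZ = np.array([0.0, 300.0, 800.0, 2300.0, 3000.0, 8000.0])
TARGET_ANCHORS_HZ = np.array([0.0, 350.0, 700.0, 2500.0, 3200.0, 8000.0])

def warp_frequency(freq_hz):
    """Map source-speaker frequencies to target-speaker frequencies with a
    piecewise linear function anchored at the matched formant points."""
    return np.interp(freq_hz, SOURCE_ANCHORS_HZ, TARGET_ANCHORS_HZ)

# Example: warp the frequency axis of one magnitude-spectrum frame.
fft_size, sample_rate = 1024, 16000
bin_freqs = np.fft.rfftfreq(fft_size, d=1.0 / sample_rate)  # 0 .. 8000 Hz
warped_bins = warp_frequency(bin_freqs)                     # positions on the target axis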

