Large Scale Voice Dialling using Speaker Adaptation

Kim, Weon-Goo

doi:10.5302/J.ICROS.2010.16.4.335

Journal of Institute of Control, Robotics and Systems (제어로봇시스템학회논문지)

Volume 16, Issue 4 / Pages 335-338 / 2010 / 1976-5622 (pISSN) / 2233-4335 (eISSN)

Institute of Control, Robotics and Systems (제어로봇시스템학회)

화자 적응을 이용한 대용량 음성 다이얼링

Kim, Weon-Goo (Department of Electrical Engineering, Kunsan National University)

Received : 2010.01.10
Accepted : 2010.02.07
Published : 2010.04.01

Abstract

This paper presents a new method that uses speaker adaptation to improve the performance of a large-scale voice dialling system. Because an SI (speaker-independent) speech recognition system based on phoneme HMMs uses only the phoneme string of the input sentence, the required storage space can be reduced greatly. However, such a system performs worse than a speaker-dependent system because of the mismatch between the input utterance and the SI models. To reduce the mismatch between the training utterances and the set of SI models, the proposed method iteratively estimates the phonetic string and the adaptation vectors using speaker adaptation techniques. Stochastic matching methods are used to estimate the adaptation vectors. Experiments performed over actual telephone lines show that the proposed method performs better than the conventional method with the SI phonetic recognizer.
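
The alternation between estimating the phonetic string and estimating the adaptation vectors can be illustrated with a toy sketch. This is not the implementation from the paper, which is not given in the abstract. It assumes, purely for illustration, that each SI phoneme model is a single mean vector with unit covariance, that decoding is a frame-wise nearest-mean search rather than HMM Viterbi decoding, and that the adaptation vector is a single feature-space bias. Under those assumptions the maximum-likelihood bias reduces to the average residual between the frames and their aligned means. All function names (`decode_frames`, `collapse`, `adapt_and_decode`) are hypothetical.

```python
import numpy as np


def decode_frames(features, means):
    """Label each frame with the index of the nearest phoneme mean (toy decoder)."""
    # Squared Euclidean distance from every frame to every phoneme mean.
    dists = ((features[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def collapse(labels):
    """Merge runs of identical frame labels into a phoneme string."""
    out = [int(labels[0])]
    for lab in labels[1:]:
        if int(lab) != out[-1]:
            out.append(int(lab))
    return out


def adapt_and_decode(features, means, max_iters=10):
    """Alternate between bias estimation and decoding until the labels stop changing."""
    bias = np.zeros(features.shape[1])
    labels = decode_frames(features, means)  # initial, unadapted decoding
    for _ in range(max_iters):
        # Bias estimate: average residual between frames and their aligned means
        # (the unit-covariance simplification assumed in this sketch).
        bias = (features - means[labels]).mean(axis=0)
        # Re-decode after removing the estimated bias from the features.
        new_labels = decode_frames(features - bias, means)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return collapse(labels), bias
```

Calling `adapt_and_decode(features, means)` on a `(frames, dims)` feature array and a `(phonemes, dims)` array of means returns the collapsed phoneme index sequence and the estimated bias vector. A real system would replace the nearest-mean decoder with a phoneme HMM recognizer and weight the residuals by the model covariances.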
