Comparison of Male/Female Speech Features and Improvement of Recognition Performance by Gender-Specific Speech Recognition


  • Chang-Young Lee (Division of Information System Engineering, Dongseo University)
  • Received : 2010.10.01
  • Accepted : 2010.12.10
  • Published : 2010.12.31


In an effort to improve the speech recognition rate, we compared the performance of speaker-independent and gender-specific speech recognition. For this purpose, 20 male and 20 female speakers each pronounced 300 isolated Korean words, and the utterances were divided into four groups: female, male, and two mixed-gender groups. To examine the validity of gender-specific speech recognition, we examined Fourier spectra and MFCC feature vectors averaged separately over the male and female speakers. The results showed a distinction between the two genders, which supports the motivation for gender-specific speech recognition. In the recognition experiments, the error rate of the gender-specific case was less than 50% of that of the speaker-independent case. These results suggest that a hierarchical approach, in which the speaker's gender is recognized first and gender-specific speech recognition is then applied, might yield better performance than the current method of speech recognition.
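
The hierarchical idea suggested above can be illustrated with a minimal sketch. It is not the method used in the paper: the nearest-centroid gender classifier, the function names, the stub recognizers, and the two-dimensional toy vectors are all assumptions made for illustration. Feature vectors are first averaged separately for each gender, an incoming utterance is assigned to the gender whose average is closer, and the utterance is then passed to that gender's recognizer.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Average a list of equal-length feature vectors (e.g. MFCC vectors)."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_gender(feature: Vector, centroids: Dict[str, List[float]]) -> str:
    """Assumed classifier: pick the gender whose averaged vector is nearest."""
    return min(centroids, key=lambda g: squared_distance(feature, centroids[g]))


def hierarchical_recognize(
    feature: Vector,
    centroids: Dict[str, List[float]],
    recognizers: Dict[str, Callable[[Vector], str]],
) -> Tuple[str, str]:
    """Stage 1: decide the gender. Stage 2: run that gender's recognizer."""
    gender = classify_gender(feature, centroids)
    return gender, recognizers[gender](feature)


# Toy data standing in for per-speaker averaged feature vectors (hypothetical).
female_features = [[2.0, 1.0], [4.0, 3.0]]
male_features = [[-1.0, 0.0], [-3.0, -2.0]]

centroids = {
    "female": mean_vector(female_features),
    "male": mean_vector(male_features),
}

# Stub recognizers; a real system would hold models trained per gender.
recognizers = {
    "female": lambda f: "word from female-trained model",
    "male": lambda f: "word from male-trained model",
}

gender, word = hierarchical_recognize([2.5, 1.5], centroids, recognizers)
```

In this sketch the gender decision is a hard switch, so an error in the first stage sends the utterance to the wrong recognizer. Whether that trade-off pays off in practice depends on the accuracy of the gender classifier, which the abstract does not report.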

