Comparison of Male/Female Speech Features and Improvement of Recognition Performance by Gender-Specific Speech Recognition

  • Chang-Young Lee (Division of Information System Engineering, Dongseo University)
  • Received: 2010.10.01
  • Reviewed: 2010.12.10
  • Published: 2010.12.31

Abstract

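As part of the effort to improve recognition rates in speech recognition, this paper compares the performance of conventional speaker-independent speech recognition, which does not distinguish gender, with that of gender-specific speech recognition. For the experiments, 20 male and 20 female speakers each uttered 300 words, and the speech data were divided into four groups: female, male, mixed A, and mixed B. First, to assess whether the rationale for gender-specific speech recognition is valid, gender differences were examined in the frequency analysis of the speech signals and in the MFCC feature vectors. The results confirmed gender differences pronounced enough to support the motivation for gender-specific speech recognition. In the recognition experiments, the error rate of gender-specific speech recognition fell to less than half that of conventional speaker-independent recognition without gender distinction. This suggests that the speaker-independent recognition rate could be raised by performing gender recognition and gender-specific speech recognition hierarchically.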
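
The hierarchical scheme suggested in the abstract (first recognize the speaker's gender, then apply the recognizer trained for that gender) can be illustrated with a minimal sketch. It is not the paper's implementation: the nearest-centroid classifier, the two-dimensional feature vectors, and the names `NearestCentroid` and `recognize_hierarchically` are all assumptions made for illustration, standing in for real MFCC features and trained models.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class NearestCentroid:
    """Toy classifier (hypothetical): picks the label with the closest centroid."""
    centroids: dict  # label -> feature vector (tuple of floats)

    def predict(self, features):
        # Return the label whose centroid has the smallest Euclidean distance.
        return min(self.centroids,
                   key=lambda label: dist(self.centroids[label], features))


def recognize_hierarchically(features, gender_model, word_models):
    """Stage 1: estimate gender. Stage 2: use that gender's word model."""
    gender = gender_model.predict(features)
    word = word_models[gender].predict(features)
    return gender, word


# Made-up 2-D feature vectors; real systems would use MFCC-based features.
gender_model = NearestCentroid({"female": (2.0, 0.0), "male": (-2.0, 0.0)})
word_models = {
    "female": NearestCentroid({"yes": (2.0, 1.0), "no": (2.0, -1.0)}),
    "male": NearestCentroid({"yes": (-2.0, 1.0), "no": (-2.0, -1.0)}),
}

result = recognize_hierarchically((1.8, 0.9), gender_model, word_models)
```

The point of the two-stage design is that each word model only has to cover the variation within one gender group, which is the condition under which the abstract reports the lower error rate.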
