GMM-Based Gender Identification Employing Group Delay

A GMM-Based Gender Identification Algorithm Using Group Delay

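  • Kye-Hwan Lee (이계환; School of Electronic and Electrical Engineering, Inha University) ;
  • Woohyung Lim (임우형; School of Electrical Engineering and Computer Science, Seoul National University) ;
  • Nam Soo Kim (김남수; School of Electrical Engineering and Computer Science, Seoul National University) ;
  • Joon-Hyuk Chang (장준혁; School of Electronic and Electrical Engineering, Inha University)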
  • Published : 2007.08.31

Abstract

We propose an effective voice-based gender identification method using group delay (GD). Features for speech recognition are generally built from magnitude information rather than phase information. In our approach, we examine how GD, defined as the negative derivative of the Fourier transform phase with respect to frequency, differs between male and female speakers. We also propose a feature fusion scheme that combines GD with magnitude-based features such as mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC) coefficients, reflection coefficients, and formants. The experimental results indicate that GD is effective in discriminating gender and that performance improves significantly when the proposed feature fusion technique is applied.
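As an illustration of the quantity the abstract refers to, the group delay of a speech frame can be computed without explicit phase unwrapping through the standard identity τ[k] = Re(Y[k]·X*[k]) / |X[k]|², where X is the DFT of the frame x[n] and Y is the DFT of n·x[n]. The sketch below is a minimal standard-library example under that identity; the function names `dft` and `group_delay`, the `eps` floor, and the test frame are assumptions made for illustration and are not taken from the paper, which may window, smooth, or post-process GD differently.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (stdlib only, for short frames)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def group_delay(frame, eps=1e-12):
    """Group delay in samples for each DFT bin of one frame.

    Uses tau[k] = Re(Y[k] * conj(X[k])) / |X[k]|^2, where X = DFT(x[n]) and
    Y = DFT(n * x[n]). The eps floor (an assumption of this sketch) avoids
    division by zero at spectral nulls.
    """
    X = dft(frame)
    Y = dft([t * v for t, v in enumerate(frame)])
    return [
        (y * x.conjugate()).real / max(abs(x) ** 2, eps)
        for x, y in zip(X, Y)
    ]


# Sanity check: a unit impulse delayed by 3 samples has a group delay of
# approximately 3 samples in every bin.
frame = [0.0] * 16
frame[3] = 1.0
tau = group_delay(frame)
```

For real speech, the raw group delay is spiky near spectral zeros, so practical systems usually apply some form of smoothing before using it as a feature; that step is omitted here.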

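This paper proposes an effective speech-based gender identification system using group delay (GD). The features used in typical speech recognition systems are built from magnitude information alone, with the phase information discarded. In this study, we examine how GD, which is derived from phase information, differs by gender. To improve gender identification further, we apply a combined feature vector that pairs GD with magnitude information such as mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC) coefficients, reflection coefficients, and formants. The experiments confirmed the gender-dependent characteristics of GD, and the proposed combined feature vector yielded excellent identification performance.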
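A GMM-based decision over fused feature vectors can be sketched as follows: concatenate the GD features with the magnitude-based features for each frame, score every frame under one Gaussian mixture model per gender, and pick the gender with the larger total log-likelihood. This is a minimal sketch, not the authors' implementation. The helper names (`fuse`, `diag_gmm_loglik`, `classify`), the diagonal-covariance assumption, and the toy model parameters are all hypothetical; the abstract does not state the mixture order, covariance type, or training procedure, and in practice the model parameters would be estimated from training data (for example with the EM algorithm).

```python
import math


def fuse(gd_feats, mag_feats):
    """Feature fusion by simple concatenation (an assumption of this sketch)."""
    return list(gd_feats) + list(mag_feats)


def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp.append(ll)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(comp)
    return m + math.log(sum(math.exp(c - m) for c in comp))


def classify(frames, gmm_male, gmm_female):
    """Sum frame log-likelihoods per model and return the higher-scoring label."""
    score_m = sum(diag_gmm_loglik(f, *gmm_male) for f in frames)
    score_f = sum(diag_gmm_loglik(f, *gmm_female) for f in frames)
    return "male" if score_m > score_f else "female"


# Toy, hand-picked models as (weights, means, variances); purely hypothetical.
gmm_male = ([1.0], [[0.0, 0.0]], [[1.0, 1.0]])
gmm_female = ([1.0], [[3.0, 3.0]], [[1.0, 1.0]])

# Each frame fuses one hypothetical GD value with one hypothetical magnitude value.
frames = [fuse([0.1], [-0.2]), fuse([0.3], [0.0])]
label = classify(frames, gmm_male, gmm_female)
```

Summing log-likelihoods across frames treats the frames as independent, which is the usual simplification in GMM-based classification of utterances.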
