Comparison of Characteristic Vector of Speech for Gender Recognition of Male and Female

  • 정병구 (Department of Electrical Engineering, Graduate School, Mokpo National University)
  • 최재승 (Department of Electronic Engineering, Silla University)
  • Received : 2012.03.13
  • Accepted : 2012.04.11
  • Published : 2012.07.31

Abstract

This paper proposes a gender recognition algorithm that classifies a speaker as male or female. Characteristic vectors for male and female speakers are analyzed, and recognition experiments for the proposed gender recognition algorithm are performed with a neural network using these characteristic vectors. The input characteristic vectors of the neural network are: 10 LPC (Linear Predictive Coding) cepstrum coefficients; 12 LPC cepstrum coefficients; 12 FFT (Fast Fourier Transform) cepstrum coefficients plus 1 RMS (Root Mean Square) value; and 12 LPC cepstrum coefficients plus 8 low-band FFT spectrum values. In this experiment, the neural network is trained with a 20-20-2 topology using the 12 LPC cepstrum coefficients and 8 FFT spectrum values. The experimental results show that the average recognition rates obtained by the gender recognition algorithm are 99.8% for male speakers and 96.5% for female speakers.
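The LPC cepstrum features named above can be computed directly from the LPC coefficients by the standard LPC-to-cepstrum recursion, with no FFT required. The following is a minimal sketch, not the authors' implementation (the paper does not give its extraction code, and the sign convention depends on how the predictor polynomial is defined); the function name is illustrative:

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """LPC coefficients a[0..p-1] -> first n_ceps LPC cepstrum coefficients.

    Standard recursion (for a predictor s[n] ~ sum_k a_k * s[n-k]):
        c_1 = a_1
        c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k},  with a_m = 0 for m > p
    """
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:                  # a_{n-k} exists only up to order p
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

# Sanity check: for a single-pole model (p = 1, pole at 0.5)
# the closed form is c_n = 0.5**n / n.
ceps = lpc_to_cepstrum(np.array([0.5]), 12)
```

For the paper's 12th-order features, `n_ceps` would be 12; the 10- and 12-coefficient LPC cepstrum sets differ only in the LPC analysis order and the number of cepstrum terms kept.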

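The 20-20-2 topology described in the abstract (20 inputs corresponding to 12 LPC cepstrum coefficients plus 8 low-band FFT spectrum values, 20 hidden units, and 2 outputs for the male/female decision) can be sketched as a sigmoid feed-forward network trained by back-propagation. This is a toy illustration on synthetic stand-in data; the learning rate, epoch count, initialization, and labels are assumptions, not the authors' settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# 20-20-2 topology from the paper: 20 inputs, 20 hidden units, 2 outputs.
n_in, n_hid, n_out = 20, 20, 2

# Hypothetical stand-in data: real inputs would be per-frame speech features.
X = rng.standard_normal((100, n_in))
y = np.zeros((100, n_out))
y[np.arange(100), (X[:, 0] > 0).astype(int)] = 1.0  # toy one-hot labels

# Small random weight initialization.
W1 = rng.standard_normal((n_in, n_hid)) * 0.1
b1 = np.zeros(n_hid)
W2 = rng.standard_normal((n_hid, n_out)) * 0.1
b2 = np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for epoch in range(2000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    o = sigmoid(h @ W2 + b2)
    # backward pass (squared-error loss, classic back-propagation)
    d_o = (o - y) * o * (1 - o)
    d_h = (d_o @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_o / len(X)
    b2 -= lr * d_o.mean(0)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(0)

# Classify by the larger of the two output units (male vs. female).
pred = o.argmax(1)
acc = (pred == y.argmax(1)).mean()
```

The two output units mirror the male/female decision in the paper: the winning unit gives the predicted gender for each input frame.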