Analysis of Feature Extraction Methods for Distinguishing the Speech of Cleft Palate Patients

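  • 김성민 (Department of Computer Science and Engineering, College of Information Technology, Incheon National University)
  • 김우일 (Department of Computer Science and Engineering, College of Information Technology, Incheon National University)
  • 권택균 (Department of Otorhinolaryngology, Seoul National University College of Medicine)
  • 성명훈 (Department of Otorhinolaryngology, Seoul National University College of Medicine)
  • 성미영 (Department of Computer Science and Engineering, College of Information Technology, Incheon National University)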
  • Received : 2015.05.04
  • Accepted : 2015.09.16
  • Published : 2015.11.15

Abstract

This paper analyzes feature extraction methods for distinguishing the speech of patients with cleft palates from that of people with normal palates. The work is a preliminary study toward a software system for the automatic recognition and restoration of disordered speech, aimed at improving the welfare of people with speech disabilities. Monosyllabic speech data were collected from three groups: normal speech, cleft palate speech, and simulated cleft palate speech. The data consist of 14 basic Korean consonants, 5 complex consonants, and 7 vowels. Features are extracted with three well-known methods: LPC, MFCC, and PLP. Pattern recognition is performed with a GMM acoustic model. The experiments show that MFCC is generally the most effective of the three methods for identifying speech distortions. These results may contribute to the automatic detection and correction of the distorted speech of cleft palate patients, as well as to the development of a tool for assessing the level of speech distortion.

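This paper describes experiments that analyze the performance of feature extraction methods that can be used to automatically distinguish the disordered speech of cleft palate patients from normal speech. The study is a foundational step in developing a software system for the automatic recognition and restoration of disordered speech, pursued to improve the welfare of people with speech disabilities. The speech data used in the experiments are Korean monosyllables collected from three groups (normal speakers, cleft palate patients, and simulated patients) and comprise 14 basic consonants, 5 complex consonants, and 7 vowels. Features were extracted with three methods, LPCC, MFCC, and PLP; a GMM acoustic model was trained on them, and recognition experiments were then run on the collected monosyllable data. In the experiments, MFCC was overall the best feature extraction method for distinguishing normal speech from the disordered speech of cleft palate patients. The results are expected to help research on automatically recognizing and restoring the inaccurate pronunciation of cleft palate patients, and research on tools for measuring the severity of cleft palate speech disorders.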
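
The pipeline summarized in the abstract (frame-level feature extraction followed by GMM scoring) can be illustrated with a minimal NumPy-only sketch. It is not the authors' implementation: the MFCC settings (25 ms frames, 10 ms hop, 26 mel filters, 13 coefficients), the two-component diagonal-covariance GMM, and the synthetic tones standing in for the two speaker groups are all assumptions chosen for illustration, and the helper names `mfcc`, `fit_gmm`, `avg_loglik`, `synth`, and `classify` are hypothetical.

```python
import numpy as np

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26, n_ceps=13):
    """Return an (n_frames, n_ceps) MFCC matrix for a 1-D signal (assumed settings)."""
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])   # pre-emphasis
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)                     # windowed frames
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft       # power spectrum
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = mel2hz(np.linspace(0.0, hz2mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))                    # triangular filters
    for m in range(1, n_mels + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[m - 1, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)                                         # DCT-II basis
    dct = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T

def _log_components(X, w, means, var):
    """Per-frame, per-component weighted log densities, shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    return (np.log(w) - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
            - 0.5 * np.sum(diff ** 2 / var, axis=2))

def fit_gmm(X, k=2, n_iter=30, seed=0):
    """Fit a diagonal-covariance GMM with plain EM; returns (weights, means, variances)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, k, replace=False)]
    var = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        log_p = _log_components(X, w, means, var)                 # E-step
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        nk = resp.sum(axis=0) + 1e-10                             # M-step
        w = nk / n
        means = (resp.T @ X) / nk[:, None]
        var = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 1e-6)
    return w, means, var

def avg_loglik(X, gmm):
    """Average per-frame log-likelihood of features X under a fitted GMM."""
    return np.logaddexp.reduce(_log_components(X, *gmm), axis=1).mean()

def synth(freqs, seed, sr=16000, dur=0.5):
    """Synthetic stand-in signal (NOT real speech): a few tones plus noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(sr * dur)) / sr
    return sum(np.sin(2 * np.pi * f * t) for f in freqs) + 0.05 * rng.standard_normal(t.size)

# Hypothetical spectral peaks for two stand-in classes; one GMM is trained per class.
class_a, class_b = [500, 1500], [300, 2500]
gmm_a = fit_gmm(np.vstack([mfcc(synth(class_a, s)) for s in range(3)]))
gmm_b = fit_gmm(np.vstack([mfcc(synth(class_b, s)) for s in range(3)]))

def classify(signal):
    """Pick the class whose GMM gives the higher average log-likelihood."""
    feats = mfcc(signal)
    return "A" if avg_loglik(feats, gmm_a) > avg_loglik(feats, gmm_b) else "B"

print(classify(synth(class_a, 99)), classify(synth(class_b, 98)))
```

Scoring an utterance against one model per class and choosing the higher likelihood is a common way to use GMMs for a two-way decision. Comparing feature types in this setup would amount to swapping `mfcc` for another front end and repeating the same training and scoring steps.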

Acknowledgement

Supported by: Incheon National University
