Classification of muscle tension dysphonia (MTD) female speech and normal speech using cepstrum variables and random forest algorithm

  • Yun, Joowon (Department of Speech & Language Pathology, Chungnam National University) ;
  • Shim, Heejeong (Division of Speech Pathology & Audiology, Hallym University) ;
  • Seong, Cheoljae (Department of Speech & Language Pathology, Chungnam National University)
  • 윤주원 (충남대학교 언어병리학과) ;
  • 심희정 (한림대학교 언어청각학부) ;
  • 성철재 (충남대학교 언어학과)
  • Received : 2020.08.01
  • Accepted : 2020.09.25
  • Published : 2020.12.31

Abstract

This study investigated the acoustic characteristics of the sustained vowel /a/ and sentence utterances produced by patients with muscle tension dysphonia (MTD) using cepstrum-based acoustic variables. Thirty-six women diagnosed with MTD and the same number of women with normal voices participated in the study, and the data were recorded and measured with ADSV™. Among the variables, cepstral peak prominence (CPP) and CPP_F0 were significantly lower in the MTD group than in the control group. On the GRBAS scale, overall severity (G) was the most prominent feature of the MTD patients' voice quality, followed in order by roughness (R), breathiness (B), and strain (S). As these ratings increased, CPP decreased, showing a statistically significant negative correlation. We then attempted to classify the MTD and control groups using the CPP and CPP_F0 variables. Statistical modeling with the Random Forest machine learning algorithm yielded higher classification accuracy in the sentence reading task (100% on training data and 83.3% on test data), and CPP played the more important role in both the vowel and sentence reading tasks.
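The classification step described above can be illustrated with a small, standard-library-only sketch of a random forest (bootstrap samples, one randomly chosen feature per split, majority vote). This is not the authors' implementation: the function names, the number of trees, the tree depth, the 75/25 train/test split, and the synthetic CPP and CPP_F0 values are all assumptions made for illustration, and the printed accuracies have no relation to the figures reported in the abstract.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature):
    """Return (weighted_gini, threshold) for the best cut on one feature, or None."""
    values = sorted(set(r[feature] for r in rows))
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for r, y in zip(rows, labels) if r[feature] <= thr]
        right = [y for r, y in zip(rows, labels) if r[feature] > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, thr)
    return best

def grow_tree(rows, labels, rng, depth=0, max_depth=3):
    """Grow one tree; a leaf is a label, an inner node is (feature, thr, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    feature = rng.randrange(len(rows[0]))  # random feature subset of size 1
    split = best_split(rows, labels, feature)
    if split is None:  # all values identical on this feature
        return majority
    thr = split[1]
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] > thr]
    return (feature, thr,
            grow_tree([r for r, _ in left], [y for _, y in left], rng, depth + 1, max_depth),
            grow_tree([r for r, _ in right], [y for _, y in right], rng, depth + 1, max_depth))

def predict_tree(node, row):
    while isinstance(node, tuple):
        feature, thr, left, right = node
        node = left if row[feature] <= thr else right
    return node

def fit_forest(rows, labels, n_trees=50, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

def accuracy(forest, rows, labels):
    return sum(predict_forest(forest, r) == y for r, y in zip(rows, labels)) / len(labels)

if __name__ == "__main__":
    rng = random.Random(42)
    rows, labels = [], []
    # Hypothetical group means for (CPP in dB, CPP_F0 in Hz); not values from the study.
    for label, cpp_mu, f0_mu in (("MTD", 8.0, 190.0), ("control", 11.0, 210.0)):
        for _ in range(36):  # 36 speakers per group, as in the abstract
            rows.append((rng.gauss(cpp_mu, 1.5), rng.gauss(f0_mu, 20.0)))
            labels.append(label)
    order = list(range(len(rows)))
    rng.shuffle(order)
    cut = int(0.75 * len(order))  # assumed 75/25 split
    train, test = order[:cut], order[cut:]
    forest = fit_forest([rows[i] for i in train], [labels[i] for i in train])
    print("train accuracy:", accuracy(forest, [rows[i] for i in train], [labels[i] for i in train]))
    print("test accuracy:", accuracy(forest, [rows[i] for i in test], [labels[i] for i in test]))
```

In practice a tested library implementation would normally be preferred over hand-written trees; the sketch only shows how two acoustic features per speaker can feed a bagged, majority-voting classifier.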

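We analyzed sustained vowel phonation and a sentence reading task produced by patients with muscle tension dysphonia (MTD) using cepstrum-based variables, examined the correlation between the GRBAS auditory-perceptual characteristics and the acoustic characteristics of the patients' voices, and discussed the feasibility of differential diagnosis of MTD using the Random Forest machine learning classification algorithm. Thirty-six women diagnosed with MTD at their clinic visit and 36 women with normal voices participated in the study, and the collected voice samples were analyzed with ADSV™. Among the acoustic measures, the cepstral spectral index of dysphonia (CSID) was higher in the MTD group than in the control group, while cepstral peak prominence (CPP) and CPP_F0 were significantly lower than in the control group. The same pattern appeared in both the vowel phonation and reading tasks. In terms of voice quality, overall severity (G) was the most prominent feature in MTD patients, followed by roughness (R), breathiness (B), and strain (S). As these ratings increased, CPP decreased (a negative correlation) and CSID increased (a positive correlation). Using CPP and CPP_F0, the cepstral variables that showed significant group differences in both the vowel and sentence reading tasks, we attempted to classify MTD and control voices. Modeling with Random Forest, a machine learning algorithm, yielded slightly higher classification accuracy (83.3%) in the sentence reading task than in sustained vowel phonation, and the CPP variable played the more central role in both tasks.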
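The relationship between perceptual ratings and CPP described above can be checked with a rank correlation. The abstract does not state which correlation coefficient was used; because GRBAS ratings are ordinal, the sketch below assumes Spearman's rank correlation, computed with the standard library only. The function names and the example ratings and CPP values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

if __name__ == "__main__":
    # Hypothetical data: G ratings (0 = normal .. 3 = severe) and CPP in dB.
    g_rating = [0, 1, 1, 2, 2, 3, 3, 3]
    cpp_db = [11.2, 10.4, 9.8, 8.9, 9.1, 7.0, 6.5, 7.4]
    print("Spearman rho (G vs CPP):", round(spearman(g_rating, cpp_db), 3))
```

A negative coefficient corresponds to the pattern reported for CPP (higher perceived severity, lower CPP); the same function applied to CSID values would be expected to give a positive coefficient.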
