Comparison of Classification Performance Between Adult and Elderly Using Acoustic and Linguistic Features from Spontaneous Speech

  • Received : 2023.04.24
  • Accepted : 2023.07.19
  • Published : 2023.08.31

Abstract

This paper compares the performance of classifying speech data into two groups, adult and elderly, based on the acoustic and linguistic characteristics that change with aging, such as changes in respiratory patterns, phonation, pitch, frequency, and language expression ability. For acoustic features, we used attributes related to the frequency, amplitude, and spectrum of the speech signal. For linguistic features, we extracted hidden-state vector representations containing contextual information from the transcriptions of speech utterances using KoBERT, a Korean pre-trained language model that has shown excellent performance on natural language processing tasks. We evaluated the classification performance of models trained on the acoustic and linguistic features, and examined each model's F1 score for the two classes, adult and elderly, after addressing the class imbalance problem by down-sampling. The experimental results show that linguistic features yield better adult/elderly classification performance than acoustic features, and that even when the class proportions are equal, classification performance is higher for the adult class than for the elderly class.
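The evaluation protocol described above can be sketched in a few lines of Python. This is a minimal, self-contained illustration under stated assumptions, not the authors' released code: the majority class is randomly down-sampled until both classes are the same size, and an F1 score is then computed for each class separately.

```python
# Sketch of the abstract's evaluation protocol (illustrative, not the
# authors' implementation): down-sample to balance the classes, then
# compute a per-class F1 score.
import random


def down_sample(samples, labels, seed=0):
    """Randomly drop majority-class items until every class has the
    same number of samples as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n = min(len(items) for items in by_class.values())
    xs, ys = [], []
    for y, items in by_class.items():
        for x in rng.sample(items, n):  # random subset of size n
            xs.append(x)
            ys.append(y)
    return xs, ys


def f1_per_class(y_true, y_pred, classes=("adult", "elderly")):
    """Per-class F1 = 2*P*R/(P+R), treating each class in turn as the
    positive class."""
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores
```

In the paper's setting, `samples` would hold the acoustic feature vectors (or KoBERT hidden-state vectors) and `labels` the adult/elderly tags; the function and variable names here are illustrative.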

Acknowledgement

This research was supported by the Creative Convergence Research Program of the National Research Council of Science & Technology, funded by the Korean government (Ministry of Science and ICT) in 2022 (No. CAP21052-300).

References

  1. J. W. Kim and H. H. Kim, "Communicative ability in normal aging: A Review," Korean Journal of Communication Disorders, Vol.14, No.4, pp.495-513, 2009.
  2. G. Gosztolya, V. Vincze, L. Toth, M. Pakaski, J. Kalman, and I. Hoffmann, "Identifying mild cognitive impairment and mild Alzheimer's disease based on spontaneous speech using ASR and linguistic features," Computer Speech & Language, Vol.53, pp.181-197, 2019. https://doi.org/10.1016/j.csl.2018.07.007
  3. M. R. Morales and R. Levitan, "Speech vs. text: A comparative analysis of features for depression detection systems," 2016 IEEE Spoken Language Technology Workshop (SLT), 2016.
  4. I. Vigo, L. Coelho, and S. Reis, "Speech- and language-based classification of Alzheimer's disease: A systematic review," Bioengineering, Vol.9, No.1, p.27, 2022.
  5. M. Ehghaghi, F. Rudzicz, and J. Novikova, "Data-driven approach to differentiating between depression and dementia from noisy speech and language data," arXiv preprint arXiv:2210.03303, 2022.
  6. S. H. Han, S. H. Dong, and B. O. Kang, "Comparison of classification performance between adult and elderly using acoustic and linguistic features from spontaneous speech," in Proceedings of the Korea Conference on Software Engineering (KCSE) 2023, Vol.25, pp.117-118, 2023.
  7. F. Eyben, M. Wollmer, and B. Schuller, "openSMILE: The Munich versatile and fast open-source audio feature extractor," Proceedings of the 18th ACM International Conference on Multimedia, pp.1459-1462, 2010.
  8. SKTBrain, "KoBERT", github repository, accessed Dec. 16, 2022, [Internet], https://github.com/SKTBrain/KoBERT
  9. F. Rangel, F. Celli, P. Rosso, M. Potthast, B. Stein, and W. Daelemans, "Overview of the 3rd Author Profiling Task at PAN 2015," Conference and Labs of the Evaluation Forum, 2015.
  10. A. Liesenfeld, G. Parti, Y. Y. Hsu, and C. R. Huang, "Predicting gender and age categories in English conversations using lexical, non-lexical, and turn-taking features," arXiv preprint arXiv:2102.13355, 2021.
  11. J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
  12. B. P. R. Guda, A. Garimella, and N. Chhaya, "EmpathBERT: A BERT-based framework for demographic-aware empathy prediction," arXiv preprint arXiv:2102.00272, 2021.
  13. B. Schuller et al., "Cross-corpus acoustic emotion recognition: Variances and strategies," IEEE Transactions on Affective Computing, Vol.1, No.2, pp.119-131, 2010. https://doi.org/10.1109/T-AFFC.2010.8
  14. F. Eyben et al., "The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing," IEEE Transactions on Affective Computing, Vol.7, No.2, pp.190-202, 2015. https://doi.org/10.1109/TAFFC.2015.2457417
  15. B. Schuller et al., "The INTERSPEECH 2010 paralinguistic challenge," 2010.
  16. F. Burkhardt, M. Bruckl, and B. Schuller, "Age classification: Comparison of human vs machine performance in prompted and spontaneous speech," Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2021, pp.35-42, 2021.
  17. T. Pires, E. Schlinger, and D. Garrette, "How multilingual is multilingual BERT?," arXiv preprint arXiv:1906.01502, 2019.
  18. S. Lee, H. Jang, Y. Baik, S. Park, and H. Shin, "KR-BERT: A small-scale Korean-specific language model," arXiv preprint arXiv:2008.03979, 2020.
  19. "자유대화 음성(일반남여)" [Spontaneous Conversation Speech (General Adults)], AI-Hub, accessed Dec. 16, 2022, https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=109
  20. "자유대화 음성(노인남여)" [Spontaneous Conversation Speech (Elderly)], AI-Hub, accessed Dec. 16, 2022, https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=107