Correlation analysis of voice characteristics and speech feature parameters, and classification modeling using SVM algorithm

목소리 특성과 음성 특징 파라미터의 상관관계와 SVM을 이용한 특성 분류 모델링

  • Received : 2017.10.30
  • Accepted : 2017.12.15
  • Published : 2017.12.31

Abstract

This study categorizes several voice characteristics through subjective listening assessment and investigates the correlation between those characteristics and speech feature parameters. A support vector machine (SVM) model was then developed to classify voices into the defined categories. To this end, we extracted a range of speech feature parameters from a speech database of male speakers in their 20s and used analysis of variance (ANOVA) to identify the parameters that are statistically significantly correlated with the voice characteristics. These parameters were then used as inputs to the proposed SVM model. The experimental results show that some speech feature parameters are significantly correlated with the voice characteristics, and that the proposed model achieves an average classification accuracy of 88.5%.
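The ANOVA screening step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the tooling or the feature set, so the feature names (`f0_mean`, `hnr`), the category labels, and all numeric values below are hypothetical and synthetic. The sketch computes the one-way ANOVA F statistic per feature and ranks features by it. A real analysis would also convert F to a p-value before calling a feature significant.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of sample groups."""
    k = len(groups)                      # number of categories
    n = sum(len(g) for g in groups)      # total number of samples
    grand = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        return float("inf")              # groups have no internal spread
    return ms_between / ms_within


def rank_features(samples, labels):
    """Rank features by F statistic, largest first.

    samples: list of dicts mapping feature name -> value
    labels:  list of category labels, one per sample
    """
    scores = []
    for name in samples[0]:
        by_label = {}
        for sample, label in zip(samples, labels):
            by_label.setdefault(label, []).append(sample[name])
        scores.append((name, anova_f(list(by_label.values()))))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Synthetic toy data (hypothetical features and categories, not from the study).
toy_samples = [
    {"f0_mean": 100.0, "hnr": 10.0},
    {"f0_mean": 102.0, "hnr": 14.0},
    {"f0_mean": 104.0, "hnr": 12.0},
    {"f0_mean": 130.0, "hnr": 11.0},
    {"f0_mean": 132.0, "hnr": 13.0},
    {"f0_mean": 134.0, "hnr": 12.0},
]
toy_labels = ["A", "A", "A", "B", "B", "B"]

ranked = rank_features(toy_samples, toy_labels)
for name, f_value in ranked:
    print(name, f_value)
```

In this toy setup, a feature whose group means differ widely relative to its within-group spread receives a large F value, while a feature with identical group means receives zero. Only the high-scoring features would be passed on to the classifier.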
