An analysis of emotional English utterances using the prosodic distance between emotional and neutral utterances

  • Seo-Pae Yi (Humphreys West Elementary School)
  • Received : 2020.07.30
  • Accepted : 2020.09.10
  • Published : 2020.09.30

Abstract

An analysis of emotional English utterances conveying 7 emotions (calm, happy, sad, angry, fearful, disgust, surprised) was conducted by measuring the prosodic distance between 672 emotional utterances and 48 neutral utterances. Applying a technique previously proposed for an automatic evaluation model of English pronunciation, the study measured the Euclidean distance between emotional and neutral utterances for 3 prosodic elements: F0, intensity, and duration. The analysis was furthermore extended with Euclidean distance normalization, z-scores, and z-score normalization, resulting in 4 groups of measurement schemes (sqrF0, sqrINT, sqrDUR; norsqrF0, norsqrINT, norsqrDUR; sqrzF0, sqrzINT, sqrzDUR; norsqrzF0, norsqrzINT, norsqrzDUR). Both the perceptual and the acoustic analyses of the emotional utterances consistently indicated that the group normalizing the Euclidean measurement (norsqrF0, norsqrINT, and norsqrDUR) was the most effective of the 4. According to the effect sizes based on the estimated distance between emotional utterances and their neutral counterparts, emotion changed the prosody most strongly in F0, followed by duration and then intensity. A Tukey post hoc test of norsqrF0 revealed 4 homogeneous subsets (calm
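
The prosodic elements named above can be extracted with Praat (Boersma & Weenink, 2020; reference 2). A minimal sketch of that extraction step follows, using the praat-parselmouth Python wrapper around Praat; the file names are hypothetical, and how the original study sampled and aligned the contours is not specified here.

    import parselmouth  # praat-parselmouth: Python wrapper around Praat

    def prosody(path):
        """Extract the 3 prosodic elements: an F0 contour (Hz),
        an intensity contour (dB), and the duration (sec)."""
        snd = parselmouth.Sound(path)
        f0 = snd.to_pitch().selected_array['frequency']
        f0 = f0[f0 > 0]  # drop unvoiced frames, reported as 0 Hz
        intensity = snd.to_intensity().values.flatten()
        return f0, intensity, snd.duration

    # Hypothetical file names for one emotional/neutral pair.
    f0_e, int_e, dur_e = prosody("angry_01.wav")
    f0_n, int_n, dur_n = prosody("neutral_01.wav")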

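The abstract does not spell out the formulas behind the 4 scheme groups, but the names suggest the construction sketched below: plain Euclidean distance (sqr), that distance normalized (norsqr, read here as dividing by the number of measurement points), and both repeated on z-scored contours (sqrz, norsqrz). This is one plausible reading, not the paper's definitive method, and it assumes each emotional contour has been aligned to its neutral counterpart with the same number of points.

    import numpy as np

    def sqr(emo, neu):
        # Euclidean distance between aligned prosodic sequences,
        # e.g., the F0 contours of an emotional/neutral pair.
        emo, neu = np.asarray(emo, float), np.asarray(neu, float)
        return np.sqrt(np.sum((emo - neu) ** 2))

    def norsqr(emo, neu):
        # Normalized variant (assumption): distance per measurement
        # point, so utterances of different lengths are comparable.
        return sqr(emo, neu) / len(emo)

    def zscore(x):
        x = np.asarray(x, float)
        return (x - x.mean()) / x.std()

    def sqrz(emo, neu):
        # Distance after z-scoring each contour, removing overall
        # level and range differences between speakers.
        return sqr(zscore(emo), zscore(neu))

    def norsqrz(emo, neu):
        return sqrz(emo, neu) / len(emo)

    # Hypothetical 5-point F0 contours (Hz) for one sentence.
    f0_emotional = [210, 260, 295, 240, 205]
    f0_neutral = [180, 200, 215, 190, 170]
    print(norsqr(f0_emotional, f0_neutral))  # norsqrF0 for this pair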

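The statistical comparison reported above (effect sizes over the 3 prosodic elements and a Tukey post hoc test over the 7 emotions) can be reproduced in outline with standard tools. The sketch below runs a Tukey HSD on simulated norsqrF0 values, since the study's actual distances are not available here, and computes eta squared as one common effect-size measure.

    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    emotions = ["calm", "happy", "sad", "angry", "fearful",
                "disgust", "surprised"]
    rng = np.random.default_rng(0)

    # 672 emotional utterances / 7 emotions = 96 distances per
    # emotion in the study; simulated values stand in for them.
    values = np.concatenate([rng.normal(5 + i, 2.0, 96) for i in range(7)])
    labels = np.repeat(emotions, 96)

    # Tukey HSD: emotions whose mean distances from neutral speech
    # do not differ significantly share a homogeneous subset.
    print(pairwise_tukeyhsd(values, labels, alpha=0.05).summary())

    # Eta squared: between-group sum of squares over the total.
    grand = values.mean()
    groups = [values[labels == e] for e in emotions]
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    print("eta squared:", ss_between / ((values - grand) ** 2).sum())
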
References

  1. Bachorowski, J., & Owren, M. J. (1995). Vocal expression of emotion: Acoustic properties of speech are associated with emotional intensity and context. Psychological Science, 6(4), 219-224. https://doi.org/10.1111/j.1467-9280.1995.tb00596.x
  2. Boersma, P., & Weenink, D. (2020). Praat: Doing phonetics by computer (version 6.1.16) [Computer program]. Retrieved from https://www.praat.org/
  3. Breazeal, C. (2003). Emotion and sociable humanoid robots. International Journal of Human-Computer Studies, 59(1), 119-155. https://doi.org/10.1016/S1071-5819(03)00018-1
  4. Cahn, J. (1990). Generating expression in synthesized speech (Technical report). Cambridge, MA: MIT Media Lab.
  5. Carlson, R., Granström, B., & Nord, L. (1992, October). Experiments with emotive speech: Acted utterances and synthesized replicas. Proceedings of the International Conference on Spoken Language Processing (ICSLP-92) (pp. 671-674). Banff, AB, Canada.
  6. Kitahara, Y., & Tohkura, Y. (1992). Prosodic control to express emotions for man-machine interaction. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 75(2), 155-163.
  7. Kitayama, S., & Ishii, K. (2002). Word and voice: Spontaneous attention to emotional utterances in two languages. Cognition and Emotion, 16(1), 29-59. https://doi.org/10.1080/0269993943000121
  8. Livingstone, S. R., & Russo, F. A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLOS ONE, 13(5), e0196391. https://doi.org/10.1371/journal.pone.0196391
  9. Mozziconacci, S. (1998). Speech variability and emotion: Production and perception (Doctoral dissertation). Technical University of Eindhoven, The Netherlands.
  10. Paulmann, S. (2016). The neurocognition of prosody. In G. Hickok & S. Small (Eds.), Neurobiology of language (pp. 1109-1120). San Diego, CA: Elsevier.
  11. Pell, M. D., Paulmann, S., Dara, C., Alasseri, A., & Kotz, S. A. (2009). Factors in the recognition of vocally expressed emotions: A comparison of four languages. Journal of Phonetics, 37(4), 417-435. https://doi.org/10.1016/j.wocn.2009.07.005
  12. Thompson, W. F., & Balkwill, L. L. (2009). Cross-cultural similarities and differences. In P. N. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (1st ed., pp. 755-791). New York, NY: Oxford University Press.
  13. Vroomen, J., Collier, R., & Mozziconacci, S. (1993, September). Duration and intonation in emotional speech. Proceedings of the 3rd European Conference on Speech Communication and Technology (Eurospeech-93) (pp. 577-580). Berlin, Germany.
  14. Williams, C. & Stevens, K. (1972). Emotions and speech: Some acoustical correlates. The Journal of the Acoustical Society of America, 52(4B), 1238-1250. https://doi.org/10.1121/1.1913238
  15. Yi, S. P. (2018). Study on pitch contour extracted from Korean emotional speech using Momel. Journal of Language Sciences, 25(3), 191-209. https://doi.org/10.14384/kals.2018.25.3.191
  16. Yoon, K. (2009a). Building a sentential model for automatic prosody evaluation. Phonetics and Speech Sciences, 1(4), 47-59.
  17. Yoon, K. (2009b). Synthesis and evaluation of prosodically exaggerated utterances. Phonetics and Speech Sciences, 1(3), 73-85.
  18. Yoon, K. (2013). A study on human evaluators using the evaluation model of English pronunciation. Phonetics and Speech Sciences, 5(4), 109-119. https://doi.org/10.13064/KSSS.2013.5.4.109