• Title/Summary/Keyword: Emotional speech


The fundamental frequency (f0) distribution of American speakers in a spontaneous speech corpus

  • Byunggon Yang
    • Phonetics and Speech Sciences / v.16 no.1 / pp.11-16 / 2024
  • The fundamental frequency (f0), an acoustic measure of vocal fold vibration, serves as an indicator of the speaker's emotional state and of language-specific patterns in daily conversation. This study examined the f0 distribution in an English corpus of spontaneous speech to establish normative data for American speakers. The corpus comprised 40 participants engaging in free discussions of daily activities and personal viewpoints. Using Praat, f0 values were collected after removing nonspeech sounds and the interviewer's voice, and outliers were then filtered. Statistical analyses were performed in R. Results indicated a median f0 of 145 Hz across all speakers. The f0 values exhibited a right-skewed, sharply peaked distribution over a 264 Hz span from 75 Hz to 339 Hz. The female f0 range was wider than the male range, with a median of 113 Hz for males and 181 Hz for females. This spontaneous speech corpus offers linguists valuable insight into f0 variation among individuals and groups within a language. Further research is encouraged to develop analytical and statistical measures for establishing reliable f0 standards for the general population.
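
As a rough illustration of the kind of pipeline the abstract describes (Praat extraction followed by outlier filtering and summary statistics), here is a minimal Python sketch; the synthetic f0 arrays, the 75-600 Hz band, and the IQR rule are illustrative assumptions, not the paper's actual data or procedure.

```python
import numpy as np

# Hypothetical f0 tracks (Hz), one array per group; unvoiced frames already
# dropped. Real values would come from Praat pitch tracking of the corpus.
f0_male = np.random.default_rng(0).normal(113, 20, 5000)
f0_female = np.random.default_rng(1).normal(181, 35, 5000)

def filter_outliers(f0, lo=75.0, hi=600.0, k=1.5):
    """Keep values inside [lo, hi], then drop IQR outliers (one common scheme)."""
    f0 = f0[(f0 >= lo) & (f0 <= hi)]
    q1, q3 = np.percentile(f0, [25, 75])
    iqr = q3 - q1
    return f0[(f0 >= q1 - k * iqr) & (f0 <= q3 + k * iqr)]

for label, f0 in [("male", f0_male), ("female", f0_female)]:
    clean = filter_outliers(f0)
    print(f"{label}: median={np.median(clean):.0f} Hz, "
          f"range={clean.min():.0f}-{clean.max():.0f} Hz")
```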

An analysis of emotional English utterances using the prosodic distance between emotional and neutral utterances (영어 감정발화와 중립발화 간의 운율거리를 이용한 감정발화 분석)

  • Yi, So-Pae
    • Phonetics and Speech Sciences / v.12 no.3 / pp.25-32 / 2020
  • An analysis of emotional English utterances covering 7 emotions (calm, happy, sad, angry, fearful, disgust, surprised) was conducted by measuring the prosodic distance between 672 emotional and 48 neutral utterances. Applying a technique from an automatic evaluation model of English pronunciation, Euclidean distances were computed over 3 prosodic elements, F0, intensity, and duration, extracted from the emotional and neutral utterances. The analysis was further extended with Euclidean distance normalization, z-scores, and z-score normalization, yielding 4 groups of measurement schemes (sqrF0, sqrINT, sqrDUR; norsqrF0, norsqrINT, norsqrDUR; sqrzF0, sqrzINT, sqrzDUR; norsqrzF0, norsqrzINT, norsqrzDUR). Both the perceptual and the acoustic analyses of the emotional utterances consistently indicated that the normalized Euclidean schemes (norsqrF0, norsqrINT, norsqrDUR) were the most effective of the 4 groups. By effect size of the distance between emotional utterances and their neutral counterparts, the greatest acoustic change under emotion appeared in F0, followed by duration and then intensity. A Tukey post hoc test revealed 4 homogeneous subsets (calm ...
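
A minimal sketch of one plausible reading of these measurement schemes, assuming per-utterance prosodic contours of equal length; the function name and normalization details below are guesses at the sqr*/norsqr*/sqrz*/norsqrz* variants, not the paper's exact formulas.

```python
import numpy as np

def prosodic_distance(emo, neu, normalize=True, zscore=False):
    """Euclidean distance between an emotional and a neutral contour of equal
    length (e.g., per-syllable F0, intensity, or duration values).
    zscore    -> sqrz* variants: standardize each contour first.
    normalize -> nor* variants: divide by contour length so utterances of
                 different lengths stay comparable."""
    emo, neu = np.asarray(emo, float), np.asarray(neu, float)
    if zscore:
        emo = (emo - emo.mean()) / emo.std()
        neu = (neu - neu.mean()) / neu.std()
    d = np.sqrt(np.sum((emo - neu) ** 2))
    return d / len(emo) if normalize else d

# Hypothetical per-syllable F0 values (Hz) for one sentence.
f0_neutral = [120, 135, 128, 110, 105]
f0_angry   = [180, 210, 190, 160, 150]
print(prosodic_distance(f0_angry, f0_neutral))               # norsqrF0-style
print(prosodic_distance(f0_angry, f0_neutral, zscore=True))  # norsqrzF0-style
```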

Exploring Types of Verbal Violence Through Speech Analysis on Non-face-to-face Channels (비대면 채널에서의 음성분석을 통한 언어폭력 유형 탐색)

  • Kim, Jongseon; Ahn, Seongjin
    • The Journal of Korean Association of Computer Education / v.23 no.3 / pp.71-79 / 2020
  • This study investigates the rising issue of verbal violence on non-face-to-face channels. Focus group interviews (FGI) were conducted to examine verbal violence occurring during emotional labor in real-life cases, and the distribution of verbal violence within conversations was confirmed through a big data technology called Speech Analysis (SA). The findings highlight two points. First, verbal violence occurring in calls falls into personal insult, swearing/verbal abuse, unreasonable demand, (sexual) harassment, and intimidation/threat. Second, the Speech Analysis results showed that the most frequent types were personal insult and swearing/verbal abuse: informal language use and disrespectful speech had the highest rate within the personal-insult category, general cursing had the highest rate within the swearing/verbal-abuse category, and cursing showed the highest rate across all cases of verbal violence. The study summarizes the types of verbal violence occurring on non-face-to-face channels and points to the need for further investigation of how verbal stress affects the working environment of emotional laborers.
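
The abstract does not detail the SA pipeline. Purely as a hypothetical illustration, a keyword lexicon applied to already-transcribed call text could tally hits per category as below; every keyword and category mapping in this sketch is invented for the example.

```python
from collections import Counter

# Hypothetical keyword lexicon per category; the paper's actual SA pipeline
# (speech-to-text plus analytics) is not specified in the abstract.
LEXICON = {
    "personal insult": ["idiot", "useless", "stupid"],
    "swearing/verbal abuse": ["damn", "hell"],
    "unreasonable demand": ["right now or else", "do it anyway"],
    "(sexual) harassment": ["sweetheart"],
    "intimidation/threat": ["i will find you", "you will regret"],
}

def tag_transcript(text):
    """Count lexicon hits per verbal-violence category in one transcript."""
    text = text.lower()
    counts = Counter()
    for category, words in LEXICON.items():
        counts[category] = sum(text.count(w) for w in words)
    return counts

print(tag_transcript("You idiot, do it anyway or you will regret it!"))
```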

The Meaning of Teachers as They Manifest Themselves in the Emotional Regulation of 2-Year-Old Infants (2세 영아의 정서조절 측면에서 나타나는 교사의 의미)

  • Kim, Bo-Young; Kim, Yong-Mi
    • Korean Journal of Child Studies / v.34 no.5 / pp.17-41 / 2013
  • The purpose of this research was to investigate the meaning of teachers as it manifests in the emotional regulation of 2-year-old infants at a daycare center, and to provide basic data that can guide teachers' awareness, roles, attitudes, and classroom management in infants' emotional education. To this end, participatory observation was conducted in a child care center class for infants under 2 years old from January 17 to January 29, 2012. The teachers' meaning emerged as follows: teachers were figures of absolute authority, coupled with a dual role as passive caretakers; they acted as guides leading infants through emotional socialization; and they played the central role in emotional contagion, their expressions, speech, and atmosphere strongly influencing the infants. These results suggest that teachers today do not fully comprehend the importance of their role in the emotion regulation of infants.

Emotional Intelligence System for Ubiquitous Smart Foreign Language Education Based on Neural Mechanism

  • Dai, Weihui; Huang, Shuang; Zhou, Xuan; Yu, Xueer; Ivanović, Mirjana; Xu, Dongrong
    • Journal of Information Technology Applications and Management / v.21 no.3 / pp.65-77 / 2014
  • Ubiquitous learning has aroused great interest and is becoming a new mode of foreign language education in today's society. However, how to increase learners' initiative and community cohesion still deserves deeper research. Emotional intelligence can detect a learner's emotional reactions online and thereby stimulate interest and willingness to participate, by adjusting teaching strategies and creating enjoyable learning experiences; this is the core of the new concept of smart education. Building on previous research, this paper derives a neural mechanism model for analyzing learners' emotional characteristics in a ubiquitous environment, and discusses intelligent monitoring and automatic recognition of emotions from learners' speech signals and behavior data by a multi-agent system. Finally, a framework for an emotional intelligence system for smart foreign language education in ubiquitous learning is proposed.

Emotion Recognition Based on Frequency Analysis of Speech Signal

  • Sim, Kwee-Bo; Park, Chang-Hyun; Lee, Dong-Wook; Joo, Young-Hoon
    • International Journal of Fuzzy Logic and Intelligent Systems / v.2 no.2 / pp.122-126 / 2002
  • As fundamental research toward emotion recognition, this study identifies features of 3 emotions (happiness, anger, surprise). Emotional speech carries several cues: voice quality, pitch, formants, speaking rate, and so on. Most researchers to date have used pitch changes, the short-time average power envelope, or mel-based speech power coefficients. Pitch is an efficient and informative feature, so it is used in this study: because pitch is sensitive to subtle emotions, it shifts whenever a speaker's emotional state changes, and we can observe whether the pitch changes steeply, changes with a gentle slope, or does not change. This paper also extracts formant features from emotional speech; across the vowels, each formant occupies a similar position without large differences. Based on this, for happiness we extract laughter features and use them to separate laughing segments, and we likewise find features for anger and surprise.
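
As a hedged sketch of the pitch-slope idea (steep change vs. gentle slope vs. no change), the following Python code estimates a frame-wise pitch track with a crude autocorrelation method and classifies the contour by the slope of a fitted line; the thresholds and the chirp test signal are illustrative assumptions, not the paper's method.

```python
import numpy as np

def autocorr_pitch(frame, sr, fmin=75, fmax=400):
    """Crude autocorrelation pitch estimate for one frame (Hz), or None."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    if hi >= len(ac):
        return None
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

def pitch_trend(signal, sr, frame_len=1024, hop=512):
    """Label the pitch contour 'steep', 'gentle', or 'flat' from the slope
    of a line fitted to the frame-wise pitch track."""
    track = []
    for start in range(0, len(signal) - frame_len, hop):
        f0 = autocorr_pitch(signal[start:start + frame_len], sr)
        if f0:
            track.append(f0)
    slope = np.polyfit(np.arange(len(track)), track, 1)[0]  # Hz per frame
    if abs(slope) > 2.0:  # thresholds are illustrative assumptions
        return "steep"
    return "gentle" if abs(slope) > 0.5 else "flat"

# Hypothetical rising-pitch signal: a 1-second chirp from 120 Hz to 240 Hz.
sr = 16000
t = np.linspace(0, 1, sr)
sig = np.sin(2 * np.pi * (120 * t + 60 * t ** 2))
print(pitch_trend(sig, sr))  # -> "steep"
```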

Analysis and synthesis of pseudo-periodicity on voice using source model approach (음성의 준주기적 현상 분석 및 구현에 관한 연구)

  • Jo, Cheolwoo
    • Phonetics and Speech Sciences / v.8 no.4 / pp.89-95 / 2016
  • The purpose of this work is to analyze and synthesize the pseudo-periodicity of voice using a source model. A speech signal has periodic characteristics, yet it is not completely periodic. While periodicity contributes significantly to the production of prosody, emotional status, and the like, pseudo-periodicity contributes to the distinction between normal and abnormal status, the naturalness of normal speech, and so on. Pseudo-periodicity is typically measured through parameters such as jitter and shimmer. When studying the pseudo-periodic nature of voice through collected natural speech alone, we can only observe parameter distributions limited by the size of the collected data; generating voice samples in a controlled manner permits more diverse experiments. In this study, probability distributions of vowel pitch variation are obtained from the speech signal, and based on these distributions, vocal fold pulses with a designated jitter value are synthesized. The target and re-analyzed jitter values are then compared to check the validity of the method. The jitter synthesis method was found to be useful for normal voice synthesis.
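
The target-vs-reanalyzed jitter check lends itself to a short numeric sketch. The version below assumes a Gaussian perturbation of the pitch period, whereas the paper derives its distribution from measured vowel pitch variation, so treat this as a toy stand-in.

```python
import numpy as np

def synth_pulse_periods(f0=120.0, n=200, jitter_pct=1.0, seed=0):
    """Generate n glottal-pulse periods (s) whose cycle-to-cycle variation
    matches a designated local jitter, via Gaussian perturbation of T0."""
    t0 = 1.0 / f0
    rng = np.random.default_rng(seed)
    # For i.i.d. Gaussian perturbations, E|T[i+1]-T[i]| = 2*sigma/sqrt(pi),
    # so pick sigma to hit the target local jitter.
    sigma = (jitter_pct / 100.0) * t0 * np.sqrt(np.pi) / 2.0
    return t0 + rng.normal(0.0, sigma, n)

def measure_local_jitter(periods):
    """Local jitter (%): mean absolute difference of consecutive periods
    divided by the mean period."""
    periods = np.asarray(periods)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / periods.mean()

periods = synth_pulse_periods(jitter_pct=1.0)
print(f"re-analyzed jitter: {measure_local_jitter(periods):.2f}%")
```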

Multimodal Emotion Recognition using Face Image and Speech (얼굴영상과 음성을 이용한 멀티모달 감정인식)

  • Lee, Hyeon Gu; Kim, Dong Ju
    • Journal of Korea Society of Digital Industry and Information Management / v.8 no.1 / pp.29-40 / 2012
  • A challenging research issue of growing importance in human-computer interaction is endowing a machine with emotional intelligence. Emotion recognition technology therefore plays an important role in this research area, enabling more natural and human-like communication between humans and computers. In this paper, we propose a multimodal emotion recognition system using face and speech to improve recognition performance. For face-based recognition, a distance score is computed by 2D-PCA over the MCS-LBP image with a nearest-neighbor classifier; for speech-based recognition, a likelihood score is obtained from a Gaussian mixture model over pitch and mel-frequency cepstral coefficient features. The individual matching scores from face and speech are combined by weighted summation, and the fused score is used to classify the emotion. In experiments, the proposed method improves recognition accuracy by about 11.25% to 19.75% over the uni-modal approaches, confirming a significant performance improvement.
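
A minimal sketch of weighted-summation score fusion, assuming min-max normalization to bring the face distance (smaller is better) and the speech log-likelihood (larger is better) onto a common scale; the abstract specifies only the weighted summation, so the normalization step and all scores below are assumptions.

```python
import numpy as np

def fuse_scores(face_distance, speech_loglik, w=0.5):
    """Weighted-sum fusion of per-class scores from two modalities.
    Distances are negated (smaller = better) and both score sets are
    min-max normalized to [0, 1] before the weighted sum."""
    def minmax(x):
        x = np.asarray(x, float)
        return (x - x.min()) / (x.max() - x.min())
    face_score = minmax(-np.asarray(face_distance, float))  # invert distance
    speech_score = minmax(speech_loglik)
    return w * face_score + (1 - w) * speech_score

# Hypothetical per-emotion scores.
emotions = ["happy", "sad", "angry", "neutral"]
face_dist = [12.0, 30.5, 25.1, 18.3]          # nearest-neighbor distances
speech_ll = [-210.0, -190.0, -250.0, -230.0]  # GMM log-likelihoods
fused = fuse_scores(face_dist, speech_ll, w=0.6)
print(emotions[int(np.argmax(fused))])  # -> "happy"
```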

Evaluation of Synthetic Voice Which Is Agreeable to the Ear Using a Sensibility Ergonomics Method (감성 평가를 이용한 듣기 좋은 음성 합성음에 대한 연구)

  • Park, Yong-Kuk; Kim, Jae-Kuk; Jeon, Yong-Woong; Cho, Am
    • Journal of the Ergonomics Society of Korea / v.21 no.1 / pp.51-65 / 2002
  • As information delivery becomes increasingly multimedia-based, synthetic voice is used not only in CTI (Computer Telephony Integration) and information services for the blind but also in Internet applications. However, properties of the synthetic voice, such as speech rate, pitch, and timbre, are usually tuned to the provider's preference rather than the customer's. To reflect customers' preferences, this study derived four subjective voice factors through a sensibility ergonomics evaluation, and the relationship between a voice that is agreeable to the ear and its emotional images was formulated as a fuzzy model. On this basis, the study proposes the speech rate and pitch of a synthetic voice that is agreeable to the ear.
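
As a toy illustration of a fuzzy model relating speech rate and pitch to agreeableness (the paper's fitted model is not given in the abstract), the membership functions and all breakpoints below are invented for the example.

```python
def tri(x, a, b, c):
    """Triangular fuzzy membership: rises over a->b, falls over b->c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def agreeableness(rate_sps, pitch_hz):
    """Toy fuzzy rule: a voice is agreeable to the degree that its speech
    rate is 'moderate' AND its pitch is 'comfortable' (min = fuzzy AND).
    All breakpoints are illustrative, not the paper's fitted model."""
    rate_ok = tri(rate_sps, 3.0, 5.0, 7.0)      # syllables per second
    pitch_ok = tri(pitch_hz, 90.0, 130.0, 200.0)
    return min(rate_ok, pitch_ok)

print(agreeableness(5.5, 125.0))  # near-preferred settings -> high degree
print(agreeableness(8.0, 125.0))  # too fast -> 0.0
```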

The Relationship between Acoustic Characteristics and Voice Handicap Index in Esophageal Speakers (식도발성 환자의 음향학적 특성과 음성장애지수의 상관성)

  • Jang, Hyo-Ryung; Shim, Hee-Jeong; Shin, Hee-Baek; Ko, Do-Heung; Kim, Hyun-Ki
    • Phonetics and Speech Sciences / v.6 no.2 / pp.115-121 / 2014
  • This paper investigates the relationship between acoustic characteristics and the Voice Handicap Index (VHI) for 29 male esophageal speakers. Acoustic characteristics were measured from three repetitions of a sustained vowel /a/; a stable 2-second portion of each vocalization was analyzed with the MDVP program. Specifically, relationships between four VHI scores (total, functional, physical, and emotional) and three acoustic measures (jitter, shimmer, and NHR) were assessed with the Pearson correlation coefficient. We found no significant relationship between NHR and the VHI scores, whereas both jitter and shimmer correlated significantly with all four VHI scores. This research contributes to establishing a baseline for speech characteristics in the voice rehabilitation of esophageal speakers. Further research could examine overall quality-of-life surveys, which are widely used as subjective voice measures for esophageal speakers.
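
The correlation analysis itself is straightforward to reproduce in outline; the sketch below uses scipy.stats.pearsonr on invented values, since the study's MDVP measurements and VHI scores are not reproduced here.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical measurements for a few esophageal speakers (illustrative only).
jitter  = np.array([2.1, 3.5, 1.8, 4.2, 2.9, 3.1])    # %
shimmer = np.array([6.0, 8.2, 5.1, 9.4, 7.0, 7.7])    # %
nhr     = np.array([0.18, 0.22, 0.15, 0.25, 0.20, 0.21])
vhi_total = np.array([48, 70, 41, 82, 60, 65])

for name, x in [("jitter", jitter), ("shimmer", shimmer), ("NHR", nhr)]:
    r, p = pearsonr(x, vhi_total)  # Pearson r and two-sided p-value
    print(f"{name} vs VHI total: r={r:.2f}, p={p:.3f}")
```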