Search | Korea Science

Spontaneous Speech Emotion Recognition Based On Spectrogram With Convolutional Neural Network (CNN 기반 스펙트로그램을 이용한 자유발화 음성감정인식)

Guiyoung Son;Soonil Kwon
- The Transactions of the Korea Information Processing Society / v.13 no.6 / pp.284-290 / 2024
Speech emotion recognition (SER) is a technique that analyzes a speaker's voice patterns, including vibration, intensity, and tone, to determine the speaker's emotional state. Interest in artificial intelligence (AI) techniques has grown, and they are now widely used in medicine, education, industry, and the military. However, previous studies have attained impressive results largely by using acted speech recorded by skilled actors in a controlled environment for various scenarios. There is a mismatch between acted and spontaneous speech, since acted speech includes more explicit emotional expressions than spontaneous speech. For this reason, spontaneous speech emotion recognition remains a challenging task. This paper aims to conduct emotion recognition and improve performance using spontaneous speech data. To this end, we implement deep learning-based speech emotion recognition using VGG (Visual Geometry Group) after converting 1-dimensional audio signals into 2-dimensional spectrogram images. The experimental evaluations are performed on the Korean spontaneous emotional speech database from AI-Hub, which covers 7 emotions: joy, love, anger, fear, sadness, surprise, and neutral. Using time-frequency 2-dimensional spectrograms, we achieved average accuracies of 83.5% for adults and 73.0% for young people. Our findings demonstrate that the suggested framework outperformed current state-of-the-art techniques for spontaneous speech and showed promising performance despite the difficulty of quantifying emotional expression in spontaneous speech.
https://doi.org/10.3745/TKIPS.2024.13.6.284
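
The abstract above mentions converting 1-dimensional audio signals into a 2-dimensional spectrogram before classification. As a rough illustration of that conversion step only, here is a minimal sketch of a magnitude spectrogram computed with a naive short-time Fourier transform. The frame length, hop size, Hann window, and the synthetic test tone are all assumptions made for this example; the abstract does not state the authors' actual settings, and this is not their implementation.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return a 2-D list [frame][frequency_bin] of magnitudes (naive STFT)."""
    # Hann window (assumed choice) to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(n_bins):
            # Direct DFT sum for bin k; slow, but dependency-free.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical input: a 1 kHz sine sampled at 8 kHz, 512 samples long.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]

spec = spectrogram(tone)
# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64
```

Each row of `spec` is one time frame and each column one frequency bin, so the result can be treated as an image. In practice a fast Fourier transform from a numerical library would replace the direct sum used here.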

A study on the predictability of acoustic power distribution of English speech for English academic achievement in a Science Academy (과학영재학교 재학생 영어발화 주파수 대역별 음향 에너지 분포의 영어 성취도 예측성 연구)

Park, Soon;Ahn, Hyunkee
- Phonetics and Speech Sciences / v.14 no.3 / pp.41-49 / 2022
The average acoustic distribution of American English speakers was statistically compared with the English-speaking patterns of gifted students in a Science Academy in Korea. By analyzing speech recordings much longer than those used in previous studies, this research identified the degree of acoustic proximity between the two groups and the predictability of the English academic achievement of gifted high school students. Long-term spectral acoustic power distribution vectors were obtained for 2,048 center frequencies in the range of 20 Hz to 20,000 Hz by applying a long-term average speech spectrum (LTASS) MATLAB code. Three more variables were statistically compared to discover additional indices that can predict future English academic achievement: the receptive vocabulary size test, the cumulative vocabulary scores of English formative assessment, and the English Speaking Proficiency Test scores. Linear regression and correlational analyses between the four variables showed that the receptive vocabulary size test and the low-frequency vocabulary formative assessments, which require both lexical and domain-specific science background knowledge, are relatively more significant variables than basic suprasegmental-level English fluency in predicting gifted students' academic achievement.
https://doi.org/10.13064/KSSS.2022.14.3.041

The f0 distribution of Korean speakers in a spontaneous speech corpus

Yang, Byunggon
- Phonetics and Speech Sciences / v.13 no.3 / pp.31-37 / 2021
The fundamental frequency, or f0, is an important acoustic measure in the prosody of human speech. The current study examined the f0 distribution of a corpus of spontaneous speech in order to provide normative data for Korean speakers. The corpus consists of 40 speakers talking freely about their daily activities and their personal views. Praat scripts were created to collect f0 values, and a majority of obvious errors were corrected manually by watching and listening to the f0 contour on a narrow-band spectrogram. Statistical analyses of the f0 distribution were conducted using R. The results showed that the f0 values of all the Korean speakers were right-skewed, with a pointy distribution. The speakers produced spontaneous speech within a frequency range of 274 Hz (from 65 Hz to 339 Hz), excluding statistical outliers. The mode of the total f0 data was 102 Hz. The female f0 range, with a bimodal distribution, appeared wider than that of the male group. Regression analyses based on age and f0 values yielded negligible R-squared values. As the mode of an individual speaker could be predicted from the median, either the median or the mode could serve as a good reference for the individual f0 range. Finally, an analysis of the continuous f0 points of intonational phrases revealed that the initial and final segments of the phrases yielded several f0 measurement errors. From these results, we conclude that an examination of a spontaneous speech corpus can provide linguists with useful measures to generalize the acoustic properties of f0 variability in a language by individual or group. Further studies on the use of statistical measures to secure reliable f0 values for individual speakers would be desirable.
https://doi.org/10.13064/KSSS.2021.13.3.031
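
The abstract above describes an f0 distribution in terms of its mode, median, and right skew. The following is a minimal sketch of how such summary measures could be computed. The study itself used R; Python is substituted here purely for illustration. The f0 values are hypothetical and are not data from the corpus, and the moment-based skewness formula is one common definition, not necessarily the one the author applied.

```python
import statistics


def skewness(values):
    """Population (moment-based) skewness; positive means right-skewed."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical f0 samples in Hz for one speaker (not from the study).
f0_hz = [98, 102, 102, 102, 105, 110, 118, 131, 160, 210]

f0_mode = statistics.mode(f0_hz)      # most frequent value
f0_median = statistics.median(f0_hz)  # middle of the sorted values
f0_skew = skewness(f0_hz)             # > 0 for a long upper tail
```

For a right-skewed sample like this one, the mean is pulled upward by a few high values, which is one reason the median or mode may be a steadier reference point for a speaker's typical f0.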

Occupational Performance of Hearing-Impaired and Normal-Hearing Workers in Korea

Kim, Jinsook;Shin, Yerim;Lee, Seungwan;Lee, Eunsung;Han, Woojae;Lee, Jihyeon
- Journal of Audiology & Otology / v.25 no.4 / pp.189-198 / 2021
Background and Objectives: This study aimed to investigate the occupational performance of Korean workers with and without hearing loss and to analyze hearing-related difficulties in the working environment. Subjects and Methods: The Amsterdam checklist for hearing and work was used for the analyses, and the occupational environments of the Korean workers were investigated. Of the 129 participants, 86 workers had severe to profound hearing loss and 43 had normal hearing. The hearing-impaired workers were recruited from two leading vocational centers, and the normal-hearing workers were their colleagues. Results: The hearing-impaired workers took fewer sick leaves and had higher rates of permanent job status than the normal-hearing workers. Workers with hearing loss rarely detected background sound; however, they perceived reverberation more frequently. They felt more satisfied with their careers than the normal-hearing workers, as they received social support and needed to put effort into hearing for most hearing activities. Furthermore, the effort in hearing increased with increases in job demand, job control, social support, and career satisfaction. The working hours per week increased with increases in age, education level, job demand, job control, and social support. Different trends were observed in 9 out of 12 variables when the data from the present study were compared with those obtained from hearing-impaired workers in the Netherlands, indicating a large difference between countries. Conclusions: Although hearing-impaired Korean workers work diligently in good job positions, it is necessary to enhance their acoustic environment and provide them with social support. Considering the cultural background of hearing-impaired workers, the development of suitable vocational rehabilitation programs and specific questionnaires is strongly recommended worldwide.
https://doi.org/10.7874/jao.2021.00185

Vocal acoustic characteristics of speakers with depression (우울증 화자 음성의 음향음성학적 특성)

Baek, Yeon-Sook;Kim, Se-Joo;Kim, Eun-Yeon;Choi, Yae-Lin
- Phonetics and Speech Sciences / v.4 no.1 / pp.91-98 / 2012
The purpose of this paper is to compare the voice characteristics of speakers with depression and speakers without depression, and, based on these characteristics, to propose an objective method for diagnosing depression and measuring therapeutic effects. Voice samples obtained from 11 female speakers with depression, aged 20 to 40 and diagnosed with major depressive disorder by a psychiatrist, were compared with those from 12 normal controls matched for sex, age, height, weight, education, smoking, and drinking. The voice samples were recorded with a portable digital recorder (TASCAM DR-07, Japan) and analyzed using the MDVP (Multi-Dimensional Voice Program) software module of the CSL (Computerized Speech Lab, Kay Elemetrics Co., model 4100). The results of the investigation are as follows. First, the average speaking fundamental frequency and loudness range of the depression group were statistically significantly lower than those of the control group. The pitch range of the control group was higher than that of the depression group, but without statistical significance. Overall speech rate showed no statistical difference between the two groups. Second, the average speaking fundamental frequency and loudness range had a statistically significant negative correlation with the Beck Depression Inventory, i.e., more severe depression was associated with lower average speaking fundamental frequency and loudness range. Other vocal parameters, such as pitch range and overall speech rate, had no statistically meaningful correlation with the Beck Depression Inventory.
https://doi.org/10.13064/KSSS.2012.4.1.091

Speech Problems of English Laterals by Korean Learners based on the acoustic Characteristics (한국인 영어 학습자의 설측음 발화의 문제점: 음향음성학적 특성을 중심으로)

Kim, Chong-Gu;Kim, Hyun-Gi;Jeon, Byung-Man
- Speech Sciences / v.7 no.3 / pp.127-138 / 2000
The aim of this paper is to identify the problems Korean learners have in producing English laterals and to contribute to effective pronunciation education by visualizing pronunciation. We analyzed 18 words containing lateral sounds, divided into five positions: initial, initial consonant cluster, intervocalic, final consonant cluster, and final. To analyze the words, we used the High Speed Speech Analysis System. We examined the acoustic characteristics of English laterals on the spectrogram using voice sustained time (ms), FL1, FL2, and FL3. Before the experiment, we had expected the results to show mother-tongue interference in the final sounds, because Korean has similar sounds. The results showed that, in initial position, voice sustained time showed many more differences between Korean and native pronunciation. Korean learners also applied the syllable structure of their mother tongue. For instance, in the initial consonant cluster CCVC, Koreans often produced CC as one syllable and VC as another, which was due to mother-tongue interference. For this reason, differences between Korean and native pronunciation were also seen in the intervocalic and final positions. We therefore recommend adopting a visualized analysis system in pronunciation instruction.

Sums-of-Products Models for Korean Segment Duration Prediction

Chung, Hyun-Song
- Speech Sciences / v.10 no.4 / pp.7-21 / 2003
Sums-of-Products models were built for segment duration prediction of spoken Korean. An experiment for the modelling was carried out to apply the results to Korean text-to-speech synthesis systems. 670 read sentences were analyzed, trained and tested for the construction of the duration models. Traditional sequential rule systems were extended to simple additive, multiplicative and additive-multiplicative models based on Sums-of-Products modelling. The parameters used in the modelling include the properties of the target segment and its neighbors and the target segment's position in the prosodic structure. Two optimisation strategies were used: the downhill simplex method and the simulated annealing method. The performance of the models was measured by the correlation coefficient and the root mean squared prediction error (RMSE) between actual and predicted duration in the test data. The best performance was obtained when the data were trained and tested with the additive-multiplicative models. The correlation for the vowel duration prediction was 0.69 and the RMSE 31.80 ms, while the correlation for the consonant duration prediction was 0.54 and the RMSE 29.02 ms. The results were not good enough to be applied to real-time text-to-speech systems. Further investigation of feature interactions is required to improve the performance of the Sums-of-Products models.
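
To make the structure of a sums-of-products model and its two evaluation measures concrete, here is a minimal sketch under stated assumptions. A prediction is taken to be a sum of terms, where each term is the product of per-factor scale values looked up from the segment's features. The factor names, levels, and scale values below are hypothetical and chosen only for illustration; they are not the parameters fitted in the paper.

```python
import math


def sop_duration(features, terms):
    """Sum over terms of the product of per-factor scale values."""
    total = 0.0
    for term in terms:
        product = 1.0
        for factor, scale in term.items():
            product *= scale[features[factor]]
        total += product
    return total


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def rmse(actual, predicted):
    """Root mean squared prediction error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))


# Hypothetical additive-multiplicative model with two terms:
#   term 1 = segment scale (ms) * position scale, term 2 = neighbor offset (ms)
TERMS = [
    {"segment": {"a": 80.0, "i": 60.0},
     "position": {"medial": 1.0, "final": 1.5}},
    {"neighbor": {"vowel": 0.0, "consonant": 10.0}},
]

example = {"segment": "a", "position": "final", "neighbor": "consonant"}
predicted_ms = sop_duration(example, TERMS)  # 80.0 * 1.5 + 10.0
```

With a single product term the model is purely multiplicative, and with only one-factor terms it is purely additive, which is how the three model families named in the abstract fit the same form. Fitting the scale values (for example with the downhill simplex method or simulated annealing) is outside this sketch.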

Relationship between executive function and cue weighting in Korean stop perception across different dialects and ages

Kong, Eun Jong;Lee, Hyunjung
- Phonetics and Speech Sciences / v.13 no.3 / pp.21-29 / 2021
The present study investigated how one's cognitive resources are related to speech perception by examining Korean speakers' executive function (EF) capacity and its association with voice onset time (VOT) and f0 sensitivity in identifying Korean stop laryngeal categories (/t'/ vs. /t/ vs. /t^h/). Previously, Kong et al. (under revision) reported that Korean listeners (N = 154) in Seoul and Changwon (Gyeongsang) showed differential group patterns in dialect-specific cue weightings across educational institutions (college, high school, and elementary school). We follow up this study by further relating their EF control (working memory, mental flexibility, and inhibition) to their speech perception patterns to examine whether better cognitive ability would control attention to multiple acoustic dimensions. Partial correlation analyses revealed that better EFs in Korean listeners were associated with greater sensitivity to available acoustic details and with greater suppression of irrelevant acoustic information across subgroups, although only a small set of EF components turned out to be relevant. Unlike Seoul participants, Gyeongsang listeners' f0 use was not correlated with any EF task scores, reflecting dialect-specific cue primacy using f0 as a secondary cue. The findings confirm the link between speech perception and general cognitive ability, providing experimental evidence from Korean listeners.
https://doi.org/10.13064/KSSS.2021.13.3.021

An Experimental Research on the Room Acoustical Environment of the Elementary School Classrooms (초등학교 교실의 음환경 평가에 관한 실험적 연구)

Haan, Chan-Hoon;Moon, Kyu-Chun
- Journal of the Korean Institute of Educational Facilities / v.11 no.1 / pp.5-14 / 2004
Since the 1990s, elementary school classrooms in Korea have been designed for open education systems in pursuit of a variety of educational purposes. In addition, schools have been designed individually rather than according to a standard design code. The present paper aims to investigate the acoustic environment of existing classrooms and to compare the sound insulation capacity of ordinary classrooms and newly built classrooms for open education. The current acoustical situation of elementary classrooms was analyzed using field measurements and a questionnaire survey. To this end, three elementary schools were selected, built in 1978, 1996, and 2000, respectively. Room acoustical parameters including reverberation time (RT), definition (D50), speech intelligibility (RASTI), transmission loss (TL), and STC were measured in one classroom in each elementary school. Each measurement was taken with the windows and doors open and closed. As a result, the transmission loss between rooms in open classrooms was found to be 5 to 6 dB lower on average than in ordinary classrooms. The RASTI of 0.70 measured in newly built classrooms is better than that of old classrooms (0.70) and open classrooms (0.73). The same was shown in the speech definition measurements. This results from the sealing and airtightness of the classrooms and from the floor materials. The results indicate that open classrooms have poor acoustic conditions in terms of sound insulation and speech intelligibility.
