• Title/Summary/Keyword: experimental phonetics

Search Result 89, Processing Time 0.022 seconds

A perceptual study on the correlation between the meaning of Korean polysemic ending and its boundary tone (동형다의 종결어미의 의미와 경계성조의 상관성에 대한 지각연구)

  • Youngsook Yune
    • Phonetics and Speech Sciences
    • /
    • v.14 no.4
    • /
    • pp.1-10
    • /
    • 2022
  • The Korean polysemic ending '-(eu)lgeol' can has two different meanings, 'guess' and 'regret'. These are expressed by different boundary-tone types: a rising tone for guess, a falling one for regret. Therefore the sentence-final boundary-tone type is the most salient prosodic feature. However, besides tone type, the pitch difference between the final and penultimate syllables of '-(eu)lgeol' can also affect semantic discrimination. To investigate this aspect, we conducted a perception test using two sentences that were morphologically and syntactically identical. These two sentences were spoken using different boundary-tone types by a Korean native speaker. From these two sentences, the experimental stimuli were generated by artificially raising or lowering the pitch of the boundary syllable by 1Qt while fixing the pitch of the penultimate syllable and boundary-tone type. Thirty Korean native speakers participated in three levels of perceptual test, in which they were asked to mark whether the experimental sentences they listened to were perceived as guess or regret. The results revealed that regardless of boundary-tone types, the larger the pitch difference between the final and penultimate syllable in the positive direction, the more likely it is perceived as guess, and the smaller the pitch difference in the negative direction, the more likely it is perceived as regret.

A STUDY ON THE INFLUENCE OF THE PALATAL PLATES UPON THE DURATION OF KOREAN SOUNDS (구개상 장착에 따른 한국어 어음의 조음시간 변화에 관한 연구)

  • Koh, Yeo-Joon;Kim, Chang-Whe;Kim, Yong-Soo
    • The Journal of Korean Academy of Prosthodontics
    • /
    • v.32 no.1
    • /
    • pp.77-102
    • /
    • 1994
  • Many studies have been made on the masticatory and esthetic effects of prosthodontic treatments, but few on the restoration of pronunciation, especially in complete denture wearers. The purpose of this study is to provide a basis that could be of help to the complete denture wearers' speech adaptation by analyzing the influence of the palatal coverage upon the duration of consonants and vowels with the method of experimental phonetics. For this study, metal plates and resin plates were made for 3 male subjects in their twenties, who have good occlusion, and do not have speech and hearing disorders. Then 8 Korean consonants and 4 Korean vowels were selected, systemically considering phonetic variants such as the place and manner of articulation, lenis/fortis, mutual effect of each phoneme, etc. They were combined into meaningless tested words in the form of /VCV/, and were included in the carrier sentences. Each informant uttered the sentences 1) without the plate, 2) with the metal plate, 3) with the resin plate. The recorded data were analyzed through the waveform of sounds and spectrogram by using the program SoundEdit, Signalize, Statview 512+for the Macintosh computer. The duration of each segment was measured by searching for the boundaries between the preceding vowels and consonants, and between the consonants and the following vowels. The study led to the conclusion that. 1. With the palatal plate, the duration of all the tested words increased and the duration increased more with the resin plate than with the metal plate. 2. With the palatal plate, the duration of all the preceding vowels, consonants, and following vowels increased, but the temporal structure of the tested words was maintained. 3. As for the manner of articulation, fricative /s/(ㅅ) was greatly influenced by both kinds of palatal plates. 4. As for the place of articulation, alveolar sounds /d/(ㄷ), /n/(ㄴ) were greatly influnced by the kinds of palatal plates, and the velar sounds /n/(ㅇ), /g/(ㄱ) were influenced by the platal plates, but the kind of the palatal plates did not show any significance. 5. As for the lenis/fortis, lenis was influenced more by the kind of the palatal plates. 6. As for the influence of vowels upon each segment in the tested words, palatal vowel /i/(ㅣ) had greater influence than pharyngeal vowel /a/(ㅏ), and following vowels than preceding vowels.

  • PDF

Compromised feature normalization method for deep neural network based speech recognition (심층신경망 기반의 음성인식을 위한 절충된 특징 정규화 방식)

  • Kim, Min Sik;Kim, Hyung Soon
    • Phonetics and Speech Sciences
    • /
    • v.12 no.3
    • /
    • pp.65-71
    • /
    • 2020
  • Feature normalization is a method to reduce the effect of environmental mismatch between the training and test conditions through the normalization of statistical characteristics of acoustic feature parameters. It demonstrates excellent performance improvement in the traditional Gaussian mixture model-hidden Markov model (GMM-HMM)-based speech recognition system. However, in a deep neural network (DNN)-based speech recognition system, minimizing the effects of environmental mismatch does not necessarily lead to the best performance improvement. In this paper, we attribute the cause of this phenomenon to information loss due to excessive feature normalization. We investigate whether there is a feature normalization method that maximizes the speech recognition performance by properly reducing the impact of environmental mismatch, while preserving useful information for training acoustic models. To this end, we introduce the mean and exponentiated variance normalization (MEVN), which is a compromise between the mean normalization (MN) and the mean and variance normalization (MVN), and compare the performance of DNN-based speech recognition system in noisy and reverberant environments according to the degree of variance normalization. Experimental results reveal that a slight performance improvement is obtained with the MEVN over the MN and the MVN, depending on the degree of variance normalization.

Phonological phrase boundary and word frequency that influence the phonological word recognition (음운구 경계와 단어빈도가 한국어 음운단어 재인에 미치는 영향)

  • Kim, Jeahong;Shin, Hasun;Kim, Yeseul;Yun, Gwangyeol;Kim, Daseul;Shin, Jiyoung;Nam, Kichun
    • Phonetics and Speech Sciences
    • /
    • v.11 no.2
    • /
    • pp.45-56
    • /
    • 2019
  • This study investigated the interaction between phonological phrase boundary and word frequency variable in Korean speech processing. A word monitoring task was performed to examine the interference caused by the frequency effect of target word depending on whether a phonological phrase is formed within the target word. Frequency of target word (high vs low) and phonological phrase boundary (within target word vs between target words) were applied as between and within subject condition respectively. Our results showed the significant main effect of the phonological phrase boundary and the significant interaction. In the post-hoc analysis, the high-frequency target words were detected significantly faster than the low-frequency target words only in the within phonological phrase boundary condition. Frequency effect in the between phonological phrase boundary condition did not appear. The results indicated that the phonological phrase boundary and word frequency variable played an important role in Korean speech processing. In particular, we discussed the possibility of processing the word frequency at the very early sensory information processing stage based on the interaction of two experimental factors.

Automatic severity classification of dysarthria using voice quality, prosody, and pronunciation features (음질, 운율, 발음 특징을 이용한 마비말장애 중증도 자동 분류)

  • Yeo, Eun Jung;Kim, Sunhee;Chung, Minhwa
    • Phonetics and Speech Sciences
    • /
    • v.13 no.2
    • /
    • pp.57-66
    • /
    • 2021
  • This study focuses on the issue of automatic severity classification of dysarthric speakers based on speech intelligibility. Speech intelligibility is a complex measure that is affected by the features of multiple speech dimensions. However, most previous studies are restricted to using features from a single speech dimension. To effectively capture the characteristics of the speech disorder, we extracted features of multiple speech dimensions: voice quality, prosody, and pronunciation. Voice quality consists of jitter, shimmer, Harmonic to Noise Ratio (HNR), number of voice breaks, and degree of voice breaks. Prosody includes speech rate (total duration, speech duration, speaking rate, articulation rate), pitch (F0 mean/std/min/max/med/25quartile/75 quartile), and rhythm (%V, deltas, Varcos, rPVIs, nPVIs). Pronunciation contains Percentage of Correct Phonemes (Percentage of Correct Consonants/Vowels/Total phonemes) and degree of vowel distortion (Vowel Space Area, Formant Centralized Ratio, Vowel Articulatory Index, F2-Ratio). Experiments were conducted using various feature combinations. The experimental results indicate that using features from all three speech dimensions gives the best result, with a 80.15 F1-score, compared to using features from just one or two speech dimensions. The result implies voice quality, prosody, and pronunciation features should all be considered in automatic severity classification of dysarthria.

Modified AWSSDR method for frequency-dependent reverberation time estimation (주파수 대역별 잔향시간 추정을 위한 변형된 AWSSDR 방식)

  • Min Sik Kim;Hyung Soon Kim
    • Phonetics and Speech Sciences
    • /
    • v.15 no.4
    • /
    • pp.91-100
    • /
    • 2023
  • Reverberation time (T60) is a typical acoustic parameter that provides information about reverberation. Since the impacts of reverberation vary depending on the frequency bands even in the same space, frequency-dependent (FD) T60, which offers detailed insights into the acoustic environments, can be useful. However, most conventional blind T60 estimation methods, which estimate the T60 from speech signals, focus on fullband T60 estimation, and a few blind FDT60 estimation methods commonly show poor performance in the low-frequency bands. This paper introduces a modified approach based on Attentive pooling based Weighted Sum of Spectral Decay Rates (AWSSDR), previously proposed for blind T60 estimation, by extending its target from fullband T60 to FDT60. The experimental results show that the proposed method outperforms conventional blind FDT60 estimation methods on the acoustic characterization of environments (ACE) challenge evaluation dataset. Notably, it consistently exhibits excellent estimation performance in all frequency bands. This demonstrates that the mechanism of the AWSSDR method is valuable for blind FDT60 estimation because it reflects the FD variations in the impact of reverberation, aggregating information about FDT60 from the speech signal by processing the spectral decay rates associated with the physical properties of reverberation in each frequency band.

Machine-learning-based out-of-hospital cardiac arrest (OHCA) detection in emergency calls using speech recognition (119 응급신고에서 수보요원과 신고자의 통화분석을 활용한 머신 러닝 기반의 심정지 탐지 모델)

  • Jong In Kim;Joo Young Lee;Jio Chung;Dae Jin Shin;Dong Hyun Choi;Ki Hong Kim;Ki Jeong Hong;Sunhee Kim;Minhwa Chung
    • Phonetics and Speech Sciences
    • /
    • v.15 no.4
    • /
    • pp.109-118
    • /
    • 2023
  • Cardiac arrest is a critical medical emergency where immediate response is essential for patient survival. This is especially true for Out-of-Hospital Cardiac Arrest (OHCA), for which the actions of emergency medical services in the early stages significantly impact outcomes. However, in Korea, a challenge arises due to a shortage of dispatcher who handle a large volume of emergency calls. In such situations, the implementation of a machine learning-based OHCA detection program can assist responders and improve patient survival rates. In this study, we address this challenge by developing a machine learning-based OHCA detection program. This program analyzes transcripts of conversations between responders and callers to identify instances of cardiac arrest. The proposed model includes an automatic transcription module for these conversations, a text-based cardiac arrest detection model, and the necessary server and client components for program deployment. Importantly, The experimental results demonstrate the model's effectiveness, achieving a performance score of 79.49% based on the F1 metric and reducing the time needed for cardiac arrest detection by 15 seconds compared to dispatcher. Despite working with a limited dataset, this research highlights the potential of a cardiac arrest detection program as a valuable tool for responders, ultimately enhancing cardiac arrest survival rates.

A Study on Speech Recognition Using the HM-Net Topology Design Algorithm Based on Decision Tree State-clustering (결정트리 상태 클러스터링에 의한 HM-Net 구조결정 알고리즘을 이용한 음성인식에 관한 연구)

  • 정현열;정호열;오세진;황철준;김범국
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.2
    • /
    • pp.199-210
    • /
    • 2002
  • In this paper, we carried out the study on speech recognition using the KM-Net topology design algorithm based on decision tree state-clustering to improve the performance of acoustic models in speech recognition. The Korean has many allophonic and grammatical rules compared to other languages, so we investigate the allophonic variations, which defined the Korean phonetics, and construct the phoneme question set for phonetic decision tree. The basic idea of the HM-Net topology design algorithm is that it has the basic structure of SSS (Successive State Splitting) algorithm and split again the states of the context-dependent acoustic models pre-constructed. That is, it have generated. the phonetic decision tree using the phoneme question sets each the state of models, and have iteratively trained the state sequence of the context-dependent acoustic models using the PDT-SSS (Phonetic Decision Tree-based SSS) algorithm. To verify the effectiveness of the above algorithm we carried out the speech recognition experiments for 452 words of center for Korean language Engineering (KLE452) and 200 sentences of air flight reservation task (YNU200). Experimental results show that the recognition accuracy has progressively improved according to the number of states variations after perform the splitting of states in the phoneme, word and continuous speech recognition experiments respectively. Through the experiments, we have got the average 71.5%, 99.2% of the phoneme, word recognition accuracy when the state number is 2,000, respectively and the average 91.6% of the continuous speech recognition accuracy when the state number is 800. Also we haute carried out the word recognition experiments using the HTK (HMM Too1kit) which is performed the state tying, compared to share the parameters of the HM-Net topology design algorithm. In word recognition experiments, the HM-Net topology design algorithm has an average of 4.0% higher recognition accuracy than the context-dependent acoustic models generated by the HTK implying the effectiveness of it.

Comparative Study on Acoustic Characteristics of Vocal Fold Paralysis and Benign Mucosal Disorders of Vocal Fold (성대마비와 양성 성대점막질환의 음향학적 특성비교)

  • Kong, Il-Seung;Cho, Young-Ju;Lee, Myung-Hee;Kim, Jong-Seung;Yang, Yun-Su;Hong, Ki-Hwan
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.18 no.2
    • /
    • pp.122-128
    • /
    • 2007
  • This study aims to analyze the voices of the patients with voice disorders including vocal fold paralysis, vocal fold cyst and vocal nodule/polyp in the aspect of acoustic phonetics. This study intends to collect subsidiary acoustic data in order to make a speech treatment and an standardization of vocal disorders. Subjects and Methods: The subjects of this study were 64 adult patients who underwent indirect laryngoscopy and laryngostroboscopy, and were diagnosed as vocal fold paralysis, vocal fold cyst or vocal nodule/polyp. Experimental group consisted of 20 patients who were diagnosed as vocal fold paralysis, 21 patients who were diagnosed as vocal fold cyst and had the average age of 42.0 $({\pm}10.03)$ ; and 23 patients who were diagnosed as vocal nodule/polyp and had the average age of 40.9 $({\pm}13.75)$. For the methodology of this study, the patients listed above were asked to sit in a comfortable position at intervals of 10cm apart from the patient's mouth and a microphone, and subsequently to phonate a vowel sound /e/ for the maximum phonation time with natural tone and vocal volume then the sound was directly inputted on a computer. During recording, sampling rate was set to 44,100Hz and the 1-second area corresponding to stable zone except the first and the last stage of waveform of the vowel sound /e/ vocalized by the individual patients was analyzed. Results: First, there was no statistically significant difference in jitter and shimmer between vocal fold paralysis and vocal fold cyst, while there was highly statistically significant difference in them between vocal fold paralysis and vocal nodule/polyp. Second, looking into the mean values obtained from NNE, HNR and SNR results associated with noise ratio, the disease showing the most abnormal characteristics was vocal fold paralysis, followed by cyst and nodule/polyp in order. For NNE, there was statistically significant difference between vocal nodule/polyp, and cyst or paralysis. In other words, it was found that the NNE of vocal nodule/polyp was weaker than that of cyst or paralysis. Similarly, HNR and SNR also showed the same characteristics; there was statistically significant difference between vocal fold paralysis and vocal fold cyst or nodule/polyp, and HNR and SNR values of vocal fold paralysis were lower than those of vocal fold cyst or nodule/polyp. Conclusion: For vocal fold paralysis, the abnormal values of acoustic parameters associated with frequency, amplitude and noise ratio were statistically significantly higher than those of vocal fold cyst and nodule/polyp. This finding suggests that the voices of the patients with vocal fold paralysis are the most severely injured due to less stability of vocal fold movement, asymmetry and incomplete glottic closure. In addition, there was no statistically significant difference in the acoustic parameters of tremor among vocal fold paralysis, vocal fold cyst and vocal nodule/polyp. Further studies need to ascertain reasonable acoustic parameters with various vocal disorders as well as to clarify the correlation between acoustics-based objective tools and subjective evaluations.

  • PDF