Search | Korea Science

A Study of Korean Phonetic and Phonological Properties for Speech Recognition and Synthesis (음성 인식/합성을 위한 국어의 음성-음운론적 특성 연구)

Chung, Kook;Koo, Hee-San;Lee, Chan-Do;Kim, Jong-Mi;Han, Sun-Hee
- The Journal of the Acoustical Society of Korea / v.13 no.6 / pp.31-44 / 1994
The paper introduces several studies of Korean phonology and phonetics for speech recognition and synthesis. The studies presented are: i) for segmental phonology, an annotated list of Korean allophones and their corresponding alphabetic symbols for typing into computers; ii) for segmental phonetics, some acoustic regularities of Korean consonants according to their phonological environment within a word; iii) for prosodic phonology, the phonological functions of prosodic features and their acoustic cues; iv) for prosodic phonetics, the characteristic patterns of accent and intonation in Korean; v) finally, some ways of using this phonological and phonetic knowledge to improve speech recognition and synthesis.

Segmental Interpretation of Suprasegmental Properties in Non-native Phoneme Perception

Kim, Miran
- Phonetics and Speech Sciences / v.7 no.3 / pp.117-128 / 2015
This paper investigates the acoustic-perceptual relation between the Korean denti-alveolar fricatives and the English voiceless alveolar fricative /s/ in varied prosodic contexts (e.g., stress, accent, and word-initial position). The denti-alveolar fricatives in Korean show a two-way distinction between plain (lenis) /s/ and fortis /$s^*$/. When Korean listeners hear English /s/, the single English fricative is therefore in a one-to-two non-native phoneme mapping situation. This raises the question of how the single fricative of English perceptually maps onto the two-way distinction in Korean. The paper reports the acoustic-perceptual mapping pattern by investigating spectral properties of the English stimuli that Korean listeners hear as either /s/ or /$s^*$/, in order to answer two questions: first, how prosody influences fricatives acoustically, and second, how the resulting properties lead non-native listeners to interpret them as segmental features rather than as prosodic information. The results indicate that Korean listeners' responses change depending on the prosodic context in which the stimuli are placed. This implies that Korean speakers interpret some of the information provided by prosody as segmental, and that the listeners use this information in judging non-native phonemes.
https://doi.org/10.13064/KSSS.2015.7.3.117

Speech Emotion Recognition on a Simulated Intelligent Robot (모의 지능로봇에서의 음성 감정인식)

Jang Kwang-Dong;Kim Nam;Kwon Oh-Wook
- MALSORI / no.56 / pp.173-183 / 2005
We propose a speech emotion recognition method for an affective human-robot interface. In the proposed method, emotion is classified into 6 classes: angry, bored, happy, neutral, sad, and surprised. Features for an input utterance are extracted from statistics of phonetic and prosodic information. Phonetic information includes log energy, shimmer, formant frequencies, and Teager energy; prosodic information includes pitch, jitter, duration, and rate of speech. Finally, a pattern classifier based on Gaussian support vector machines decides the emotion class of the utterance. We record speech commands and dialogs uttered 2 m away from microphones in 5 different directions. Experimental results show that the proposed method yields 48% classification accuracy while human classifiers give 71% accuracy.
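As a hedged illustration of one feature named in this abstract, the discrete Teager energy operator is commonly defined as psi[n] = x[n]^2 - x[n-1] * x[n+1]. The sketch below computes it over a list of samples. The function name `teager_energy` and the test sinusoid are assumptions made for illustration; the abstract does not say how the authors computed or summarized this feature.

```python
import math

def teager_energy(samples):
    """Discrete Teager energy psi[n] = x[n]^2 - x[n-1] * x[n+1] for interior samples."""
    return [
        samples[n] ** 2 - samples[n - 1] * samples[n + 1]
        for n in range(1, len(samples) - 1)
    ]

# Hypothetical test signal: a pure sinusoid A * cos(w * n).
# For such a signal the operator is constant and equals A^2 * sin(w)^2.
A, w = 0.5, 0.3
signal = [A * math.cos(w * n) for n in range(50)]
psi = teager_energy(signal)
```

Because the operator needs one neighbor on each side, the output is two samples shorter than the input.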

Some Prosodic Characteristics in Apraxia - From a visual task point of view - (실행증 환자의 운율적 특성 연구 - 시각과제 중심으로 -)

Kim Sujung
- Proceedings of the KSPS conference / 2003.10a / pp.125-127 / 2003
The aim of the paper is to analyze prosodic characteristics in apraxia of speech and to establish basic reference data for the diagnosis of motor speech disorders. The sentences are of two types (declarative and interrogative), with the number of constituents ranging from one to three. The stimuli were constructed to assess apraxic speech in articulated-speech and humming tasks. Features of the speech patterns, such as utterance duration and boundary tones, were examined. The results of the analysis are as follows: 1) in the interrogative sentences, rising boundary tones appeared only in the humming tasks; 2) utterance duration was relatively shorter in the humming tasks than in articulated speech.

Synthesis and Evaluation of Prosodically Exaggerated Utterances

Yoon, Kyu-Chul
- Phonetics and Speech Sciences / v.1 no.3 / pp.73-85 / 2009
This paper introduces a technique for synthesizing and evaluating human utterances with exaggerated or atypical prosody. Prosody exaggeration can be implemented by manipulating the fundamental frequency (F0) contour, the segmental durations, or the intensity contour of an utterance. Two or more of these three prosodic elements can be exaggerated at the same time. Algorithms for synthesis and evaluation are proposed. Learner utterances exaggerated in each of the three prosodic features were evaluated against their original native versions in terms of the differences in their F0 contours, segmental durations, and intensity contours. The measure of difference was the Euclidean distance between matching points in their F0 and intensity contours. The measure was calculated after the exaggerated learner utterances were aligned by segment and made identical to their native versions in segmental durations. For the evaluation of segmental durations, no prior modifications were made to the durations and the same measure was used. The results of the pilot experiment suggest the viability of this measure for evaluating learner utterances with atypical prosody against their native versions.
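The distance measure described in this abstract can be sketched as follows, assuming the two contours have already been aligned so that they contain the same number of matching points. The function name `contour_distance` and the F0 values are hypothetical; the alignment and duration-normalization steps the abstract mentions are not shown.

```python
import math

def contour_distance(native, learner):
    """Euclidean distance between two contours sampled at matching points."""
    if len(native) != len(learner):
        raise ValueError("contours must have the same number of points")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(native, learner)))

# Hypothetical F0 values in Hz at matching time points (made-up numbers).
native_f0 = [120.0, 135.0, 150.0, 140.0]
learner_f0 = [118.0, 130.0, 165.0, 140.0]
distance = contour_distance(native_f0, learner_f0)
```

The same function applies unchanged to intensity contours or to lists of segmental durations, since it only assumes paired numeric values.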

Prediction of Prosodic Break Using Syntactic Relations and Prosodic Features (구문 관계와 운율 특성을 이용한 한국어 운율구 경계 예측)

Jung, Youngim;Cho, SunHo;Yoon, Aesun;Kwon, Hyuk-Chul
- Annual Conference on Human and Language Technology / 2007.10a / pp.7-14 / 2007
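To predict natural prosodic phrase boundaries in Korean, this paper (1) subcategorizes sentence constituents, (2) extracts syntactic phrases using the dependency relations among the subcategorized constituents, and (3) defines prosodic phrase boundary prediction rules according to the type of each extracted syntactic phrase. In addition, (4) beyond syntactic information, it determines variable prosodic phrase boundaries according to the lengths of the syntactic phrase and the sentence, the position of the syntactic phrase within the sentence, and the semantic information of the context, resulting in a more natural Korean prosodic phrase boundary prediction system. The system achieved recall and precision above 90% in predicting strong prosodic phrase boundaries, which correlate highly with syntactic phrase boundaries, and in predicting non-boundaries inside prosodic phrases; it also achieved above 87% in overall prosodic phrase boundary prediction.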
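The kind of evaluation reported in this abstract (recall and precision of predicted boundaries) can be sketched as below. This is a minimal illustration, assuming each word position is labeled with a boolean that is true when a prosodic phrase boundary follows it; the function name `precision_recall` and the labels are hypothetical, and the abstract does not describe the authors' exact scoring procedure.

```python
def precision_recall(gold, predicted):
    """Precision and recall of predicted boundaries against gold boundaries.

    gold, predicted: equal-length lists of booleans, True where a boundary follows.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # boundaries correctly predicted
    fp = sum(1 for g, p in pairs if not g and p)    # boundaries predicted but absent
    fn = sum(1 for g, p in pairs if g and not p)    # boundaries missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels for a five-word sentence (made-up data).
gold = [True, False, True, True, False]
pred = [True, True, True, False, False]
precision, recall = precision_recall(gold, pred)
```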

A Note on Prosodic Differences between Korean and English - in loan words from English - (외래어 발음에서 나타난 영어와 한국어의 운율적 차이)

Kim Sunmi;Moon Soo-Mee
- MALSORI / no.35_36 / pp.25-36 / 1998
The prosodic properties of Korean and English stress were examined using loan words, with a focus on syllable duration and pitch. Fourteen loan words were selected according to number of syllables and stress position. Three Korean males speaking the Seoul dialect and three American males speaking General American English served as subjects. Each token was uttered 3 times, and the second one was chosen for analysis with CSL. We measured the duration and F0 of each syllable. In English, duration is the most salient acoustic correlate of stress, and pitch is the second. In Korean, by contrast, our data suggest that neither duration nor pitch is an acoustic feature of stress.

Speech Emotion Recognition by Speech Signals on a Simulated Intelligent Robot (모의 지능로봇에서 음성신호에 의한 감정인식)

Jang, Kwang-Dong;Kwon, Oh-Wook
- Proceedings of the KSPS conference / 2005.11a / pp.163-166 / 2005
We propose a speech emotion recognition method for a natural human-robot interface. In the proposed method, emotion is classified into 6 classes: angry, bored, happy, neutral, sad, and surprised. Features for an input utterance are extracted from statistics of phonetic and prosodic information. Phonetic information includes log energy, shimmer, formant frequencies, and Teager energy; prosodic information includes pitch, jitter, duration, and rate of speech. Finally, a pattern classifier based on Gaussian support vector machines decides the emotion class of the utterance. We record speech commands and dialogs uttered 2 m away from microphones in 5 different directions. Experimental results show that the proposed method yields 59% classification accuracy while human classifiers give about 50% accuracy, which confirms that the proposed method achieves performance comparable to a human.
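Jitter and shimmer, two of the features listed in this abstract, are often computed in their "local" form: the mean absolute difference between consecutive cycle values divided by the mean value, applied to glottal periods for jitter and to peak amplitudes for shimmer. The sketch below assumes that definition; the abstract does not state which variant the authors used, and the function name `local_perturbation` and all numbers are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    mean_diff = sum(abs(a - b) for a, b in zip(values, values[1:])) / (len(values) - 1)
    return mean_diff / (sum(values) / len(values))

# Hypothetical per-cycle measurements (made-up numbers).
periods_ms = [8.0, 8.2, 7.9, 8.1]            # glottal cycle durations
peak_amplitudes = [0.50, 0.48, 0.52, 0.50]   # per-cycle peak amplitudes

jitter = local_perturbation(periods_ms)        # cycle-to-cycle period variation
shimmer = local_perturbation(peak_amplitudes)  # cycle-to-cycle amplitude variation
```

Both values are dimensionless ratios; a perfectly periodic signal with constant amplitude gives zero for each.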

Harmonic Structure Features for Robust Speaker Diarization

Zhou, Yu;Suo, Hongbin;Li, Junfeng;Yan, Yonghong
- ETRI Journal / v.34 no.4 / pp.583-590 / 2012
In this paper, we present a new approach to speaker diarization. First, we use the prosodic information calculated on the original speech to resynthesize new speech data using the spectrum modeling technique. The resynthesized data is modeled with sinusoids based on pitch, vibration amplitude, and phase bias. Then, we use the resynthesized speech data to extract cepstral features and integrate them with the cepstral features from the original speech for speaker diarization. Finally, we show how the two streams of cepstral features can be combined to improve the robustness of speaker diarization. Experiments carried out on standardized datasets (the US National Institute of Standards and Technology Rich Transcription 04-S multiple distant microphone conditions) show a significant improvement in diarization error rate compared to the system based only on the feature stream from the original speech.
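The general idea of modeling speech as sinusoids driven by pitch, amplitude, and phase can be illustrated with a toy harmonic synthesizer. This is only a sketch under strong assumptions (a constant pitch and fixed per-harmonic amplitudes and phases within the frame); it is not the authors' spectrum modeling technique, and the function name `synthesize_harmonics` and all parameter values are hypothetical.

```python
import math

def synthesize_harmonics(f0_hz, amplitudes, phases, sample_rate, num_samples):
    """Sum of cosines at integer multiples of f0, one (amplitude, phase) pair per harmonic."""
    out = []
    for n in range(num_samples):
        t = n / sample_rate
        out.append(sum(
            a * math.cos(2 * math.pi * (k + 1) * f0_hz * t + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases))
        ))
    return out

# Hypothetical frame: 100 Hz pitch, two harmonics, 8 kHz sampling, one pitch period.
frame = synthesize_harmonics(100.0, [1.0, 0.5], [0.0, 0.0], 8000, 80)
```

A resynthesized signal of this kind keeps the harmonic structure and discards everything else, which is why cepstral features extracted from it can complement those from the original speech.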
https://doi.org/10.4218/etrij.12.0111.0455

Analysis of the Timing of Spoken Korean Using a Classification and Regression Tree (CART) Model

Chung, Hyun-Song;Huckvale, Mark
- Speech Sciences / v.8 no.1 / pp.77-91 / 2001
This paper investigates the timing of Korean spoken in a news-reading speech style in order to improve the naturalness of durations used in Korean speech synthesis. Each segment in a corpus of 671 read sentences was annotated with 69 segmental and prosodic features so that the measured duration could be correlated with the context in which it occurred. A CART model based on the features showed a correlation coefficient of 0.79 with an RMSE (root mean squared prediction error) of 23 ms between actual and predicted durations in reserved test data. These results are comparable with recently published results for Korean and similar to results found for other languages. An analysis of the classification tree shows that phrasal structure has the greatest effect on segment duration, followed by syllable structure and the manner features of surrounding segments. The place features of surrounding segments have only small effects. The model has applications in Korean speech synthesis systems.
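The two evaluation figures quoted in this abstract, the correlation coefficient and the RMSE between actual and predicted durations, can be computed as in the sketch below. The function names and the duration values are hypothetical and chosen only for illustration; the CART model itself is not reproduced here.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between paired actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical segment durations in milliseconds (made-up numbers).
actual_ms = [60.0, 85.0, 110.0, 45.0]
predicted_ms = [70.0, 80.0, 100.0, 55.0]

error_ms = rmse(actual_ms, predicted_ms)
correlation = pearson_r(actual_ms, predicted_ms)
```

The RMSE is expressed in the same unit as the durations (milliseconds here), while the correlation coefficient is unitless and lies between -1 and 1.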
