Proceedings of the KSPS conference (대한음성학회:학술대회논문집)
The Korean Society of Phonetic Sciences and Speech Technology
- Semi-annual
Domain
- Linguistics > Linguistics, General
2004.05a
-
We provide a brief overview of speaker recognition, describing its underlying techniques and reviewing the current market. We focus on techniques based on the Gaussian mixture model (GMM), the most prevalent and effective approach. Following the technical overview, we review the speaker recognition market in Korea and abroad.
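As a rough illustration of GMM-based scoring, the Python sketch below evaluates frames of acoustic feature vectors (for example, cepstral coefficients) against per-speaker GMMs with diagonal covariances. It identifies the best-scoring speaker and makes a verification decision from an average log-likelihood ratio against a background model. The class name `DiagonalGMM`, the model parameters, and the threshold are hypothetical; training (usually by EM) is omitted, and this is not presented as the authors' system.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


class DiagonalGMM:
    """Gaussian mixture model with diagonal covariances (parameters assumed pre-trained)."""

    def __init__(self, weights: List[float], means: List[Vector], variances: List[Vector]):
        self.weights = weights
        self.means = means
        self.variances = variances

    def frame_log_likelihood(self, x: Vector) -> float:
        # log p(x) = log sum_k w_k * N(x; mu_k, diag(var_k)), computed with log-sum-exp
        component_logs = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_gauss = 0.0
            for xi, m, v in zip(x, mu, var):
                log_gauss -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            component_logs.append(math.log(w) + log_gauss)
        top = max(component_logs)
        return top + math.log(sum(math.exp(c - top) for c in component_logs))

    def average_log_likelihood(self, frames: List[Vector]) -> float:
        # Averaging over frames makes utterances of different lengths comparable
        return sum(self.frame_log_likelihood(f) for f in frames) / len(frames)


def identify_speaker(models: Dict[str, DiagonalGMM], frames: List[Vector]) -> str:
    """Closed-set identification: pick the model with the highest average log-likelihood."""
    return max(models, key=lambda name: models[name].average_log_likelihood(frames))


def verify_speaker(claimed: DiagonalGMM, background: DiagonalGMM,
                   frames: List[Vector], threshold: float = 0.0) -> bool:
    """Verification: accept if the average log-likelihood ratio exceeds a (hypothetical) threshold."""
    llr = claimed.average_log_likelihood(frames) - background.average_log_likelihood(frames)
    return llr > threshold
```

In practice the threshold would be tuned on development data to trade off false acceptances against false rejections.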
-
Koo, Myoung-Wan; Kim, Jae-In; Jeong, Yeong-Jun; Kim, Mun-Sik; Kim, Won-U; Kim, Hak-Hun; Park, Seong-Jun; Ryu, Chang-Seon; Kim, Hui-Gyeong
In this paper, we explain telecommunication services based on spoken language information technology. There are three kinds of services. The first is based on the Advanced Intelligent Network (AIN): we built an Intelligent Peripheral (IP) with speech recognition, speech synthesis, and a VoiceXML interpreter. The second is based on KT-HUVOIS, a proprietary speech platform based on VoiceXML. The third is based on a VoiceXML interpreter. We explain in detail the various services built on these platforms.
-
This study investigates the spectral characteristics of frication noise in Korean sibilants in terms of center of gravity and skewness. Specifically, it examines these two parameters with an emphasis on place of articulation in different vowel environments. It also examines whether these parameters can discriminate phonation types. The results showed that the fricatives are palatalized before the front vowel /i/, whereas the affricates are articulated at the same place of articulation regardless of the following vowel. The study also suggests that the place of articulation of the fricatives followed by /i/ is the same as that of the Korean affricates. With regard to phonation type, there was a significant difference in center of gravity between the lax and tense series for both fricatives and affricates.
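The two spectral parameters can be made concrete with a short sketch. It treats a power spectrum as a distribution over frequency and computes its first moment (center of gravity) and normalized third moment (skewness). The function name is hypothetical; the input is assumed to be, for example, the FFT power spectrum of a windowed stretch of frication noise, and weighting by power (squared magnitude) is one common convention, not necessarily the one used in this study.

```python
import math
from typing import Sequence, Tuple


def spectral_moments(freqs_hz: Sequence[float], power: Sequence[float]) -> Tuple[float, float, float]:
    """Return (center of gravity in Hz, standard deviation in Hz, skewness) of a power spectrum.

    Each frequency bin is weighted by its power, as if the spectrum were a probability distribution.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    cog = sum(f * p for f, p in zip(freqs_hz, power)) / total
    variance = sum((f - cog) ** 2 * p for f, p in zip(freqs_hz, power)) / total
    sd = math.sqrt(variance)
    third = sum((f - cog) ** 3 * p for f, p in zip(freqs_hz, power)) / total
    # Positive skewness: energy concentrated low with a tail toward high frequencies
    skewness = third / sd ** 3 if sd > 0 else 0.0
    return cog, sd, skewness
```

A spectrum symmetric around its center yields zero skewness, while energy concentrated below the center with a high-frequency tail yields positive skewness.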
-
This study examines the acoustic characteristics of Korean stops in two dialects, Seoul and Daegu. Twenty speakers of these two dialects were asked to read 15 words beginning with stops of different places of articulation and phonation types. The stops in the two dialects show two main acoustic differences. First, the distinctive features for phonation types differed between the two dialects. Second, in the Daegu dialect, lenis stops reveal characteristics of fortis stops.
-
The aim of this paper is to investigate how stem-final consonant clusters beginning with the liquid /ㄹ/ are realized in Korean speech. Most scholars claim that Korean stem-final consonant clusters are simplified and reduced to a stop consonant when pronounced. This paper attempts to verify that claim through a series of listening tests and an acoustic analysis. Contrary to previous claims, the listening tests show that some Koreans actually pronounce the stem-final consonant clusters in full. The spectrographic analysis confirms this auditory observation: the stem-final consonant cluster is clearly longer when both consonants are pronounced than when only the liquid is pronounced. Likewise, the vowel of the preceding syllable is longer in the former case than in the latter.
-
The aim of this paper is to identify the sociolinguistic variables of length loss in the Seoul dialect. A total of 350 people were asked to pronounce 40 words. Of the informants, 152 were male and 198 were female. By age, 49 were in their twenties, 70 in their thirties, 69 in their forties, 71 in their fifties, and 91 were sixty or older. According to our statistics, 18 words show sociolinguistic variation by age, whereas sex was not a significant variable. We can therefore conclude that the Seoul dialect is undergoing length loss, at least as a function of age. However, the number of words and informants should be enlarged, and other variables should be examined.
-
This study concerns the constraints of English poetic fixed meter. In English poems, the metrical pattern does not always match the linguistic stress of the lines, and these mismatches are distributed differently among poets. For lexical stress mismatched with a weak metrical position, the constraint $\ast W \Rightarrow \mathrm{Strength}$ is formulated in terms of the strong syllable. Peaked monosyllabic words mismatched with a weak metrical position are classified according to which side of a phonological domain boundary they are adjacent to. For most poets, $\ast$Peak] is ranked higher than $\ast$[Peak; in Shakespeare, the Adjacency Constraint is ranked higher than $\ast$Peak].
-
The aim of this paper is to study the relation between the eojeol and the prosodic phrase in Korean. Prosodic phrasing differs depending on which of the two Korean adnominal endings, '-ㄴ' or '-ㄹ', is used: (1) In eojeols of one or two syllables, '-ㄴ' has no prosodic phrase at the beginning of the eojeol and an accentual phrase at its end, whereas '-ㄹ' has an accentual phrase at the beginning of the eojeol but none at its end. (2) In eojeols of three or more syllables, '-ㄴ' has accentual phrases at the edges of the eojeol, whereas '-ㄹ' has an accentual phrase at the end of the eojeol.
-
The purpose of this paper is to identify the prosodic characteristics of voice imitation. Speakers change various phonetic features when imitating a voice, and in most cases they change their pitch range. Pitch range is especially important in the word conditions. In addition, when imitators change their voices, the mean f0 tends to move toward a high frequency rather than toward a low or middle level.
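To make "pitch range" and "average f0" concrete, the sketch below summarizes an f0 track by its mean over voiced frames and its range in both hertz and semitones. The function name and the convention that unvoiced frames carry f0 = 0 are assumptions for illustration; the abstract does not say how the measures were computed.

```python
import math
from typing import Sequence, Tuple


def pitch_summary(f0_hz: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean f0 in Hz, pitch range in Hz, pitch range in semitones) over voiced frames.

    Frames with f0 <= 0 are treated as unvoiced and ignored (an assumed convention).
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames")
    lo, hi = min(voiced), max(voiced)
    mean_f0 = sum(voiced) / len(voiced)
    # 12 semitones per doubling of frequency
    range_st = 12.0 * math.log2(hi / lo)
    return mean_f0, hi - lo, range_st
```

Comparing these values for a speaker's natural and imitated utterances would show both the change in range and the direction in which the mean f0 shifts.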
-
The aim of this study is to analyze how the proportion of segment duration is adjusted in voice imitation. When a speaker imitates another's voice, how far is his or her original proportion of segment duration adjusted, and how does this adjustment differ across segments? In this study, I classified segments into consonants and vowels, and further classified consonants into obstruents and sonorants. The results of the analysis are as follows: (1) Individual variation in the proportion of obstruent duration is not significant, and during imitation its distribution does not become typified. (2) Vowels show individual variation in the proportion of segment duration even during imitation. (3) Nasals show the most distinct individual variation during imitation, compared with vowels and obstruents. In further study, I will examine the quantitative and qualitative changes in liquids (among the sonorants) to find out which segment best describes a speaker's personal characteristics in the proportion of segment duration in imitated voices.
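One way to read "proportion of segment duration" is the share of total utterance duration taken by each segment class. The sketch below computes that share from labeled segment boundaries. The label sets, the class names, and the definition of the proportion (class duration divided by total segmented duration) are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical label sets; a real study would use its own transcription conventions.
NASALS = {"m", "n", "ng"}
LIQUIDS = {"l"}
VOWELS = {"a", "e", "i", "o", "u"}


def segment_class(label: str) -> str:
    if label in VOWELS:
        return "vowel"
    if label in NASALS:
        return "nasal"
    if label in LIQUIDS:
        return "liquid"
    return "obstruent"  # every other label is treated as an obstruent in this sketch


def duration_proportions(segments: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """Share of total utterance duration taken by each segment class.

    segments: (label, start_s, end_s) tuples, e.g. read from a segmented recording.
    """
    totals: Dict[str, float] = defaultdict(float)
    for label, start, end in segments:
        totals[segment_class(label)] += end - start
    utterance = sum(totals.values())
    return {cls: dur / utterance for cls, dur in totals.items()}
```

Computing these proportions for each speaker's natural and imitated productions would then allow the individual variation per class to be compared.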
-
The aim of this paper is to analyze vowels in voice imitation and to find the speaker's invariant phonetic features. We examined the formants of the vowels /a, u, i/. The results of the present study are as follows: (1) Speakers change their vocal tract cavity features. (2) F1 changes more easily than F2, F3, and F4. (3) F3-F2 appears to be a constituent cue for speaker identification in the vowel /a/, as does F4-F2 in the vowel /i/.
-
The aim of this paper is to analyze the acoustic features of disguised voices. We examined features such as pitch range and vowel formants (F1, F2, F3, F4). The results of the analysis are as follows: (1) Pitch range and mean pitch are very important cues for speaker verification. (2) F3-F2 is also an important cue for speaker verification. (3) /a/ yields better verification than the other vowels.
-
The purpose of this study was to determine the correlation between acoustic and perceptual parameters of the singing voice in singing students, to compare the results with those of previous studies, and to identify more sensitive parameters for analyzing professional vocal usage. This study measured acoustic and perceptual parameters in 41 singing students. Digital audio recordings of sung vowels were made for acoustic analysis. Each sample was judged by one experienced singing teacher and one voice pathologist on two semantic bipolar 7-point scales (ringing-dull, rich-thin). The results showed that SPP1 (p<0.01), SPP2 (p<0.01), and P1 (p<0.01) had significant correlations with ringing and richness quality.
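The abstract does not state which correlation coefficient was used. As an assumption, the sketch below computes a Pearson correlation between one acoustic parameter and the perceptual ratings, plus the t statistic used to test it against zero with n - 2 degrees of freedom. With 41 students, that would be 39 degrees of freedom. For 7-point ordinal ratings, a rank correlation such as Spearman's is a plausible alternative.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t value for testing r against zero with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

The p-value would then come from the t distribution, for example via `scipy.stats.pearsonr`, which returns both r and p.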
-
This study investigated the use of grammatical markers by Korean-speaking children (3 to 8 years old) and adults. The participants had no speech or language problems. We found that the use of grammatical markers increased from 3 to 8 years of age and continued to increase after age 8. Specifically, subject markers and adverbial markers were used frequently at all ages. A few adjective markers were used in all age groups, including adults. The frequency of object markers, however, increased from age 3 to adulthood, and the pattern of usage became increasingly similar to that of adults. The study was also designed to investigate the characteristics of children's errors with grammatical markers. The results showed small differences among the age groups, and substitution errors occurred most frequently in all age groups.
-
This paper describes an efficient method for improving the noise robustness of speech recognition in a moving car by taking wind noise into account. In a moving car, three main kinds of noise severely affect recognition performance: engine noise, tire noise, and wind noise. Wind noise is especially important when driving with the window open. We analyzed wind noise under various driving conditions: 60, 80, and 100 km/h with the window fully open and half open. We found that the recognition rate is significantly degraded when the wind noise components in the frequency range above 200 Hz are large. We developed a preprocessing method to improve noise robustness despite wind noise: the cutoff frequency of the front-end high-pass filter is adaptively changed between 100 and 200 Hz according to the level of the wind noise components. With this method, the recognition rate improved considerably under all driving conditions.
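The adaptive front end can be sketched as two steps: map a measured wind-noise level to a cutoff between 100 and 200 Hz, then apply a high-pass filter with that cutoff. The dB thresholds, the linear interpolation, and the first-order RC-style filter are hypothetical choices. The abstract does not specify how the wind-noise level is measured or which filter is used, and a real front end would likely use a steeper filter.

```python
import math
from typing import List, Sequence


def adaptive_cutoff_hz(wind_level_db: float, quiet_db: float = 40.0, loud_db: float = 70.0,
                       min_cutoff_hz: float = 100.0, max_cutoff_hz: float = 200.0) -> float:
    """Map a wind-noise level to a high-pass cutoff between 100 and 200 Hz.

    The dB thresholds and the linear interpolation between them are hypothetical.
    """
    if wind_level_db <= quiet_db:
        return min_cutoff_hz
    if wind_level_db >= loud_db:
        return max_cutoff_hz
    fraction = (wind_level_db - quiet_db) / (loud_db - quiet_db)
    return min_cutoff_hz + fraction * (max_cutoff_hz - min_cutoff_hz)


def high_pass(samples: Sequence[float], cutoff_hz: float, sample_rate_hz: float) -> List[float]:
    """First-order (RC-style) high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    a = rc / (rc + dt)
    out: List[float] = []
    prev_x = prev_y = 0.0
    for x in samples:
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

Usage would be along the lines of `high_pass(frame, adaptive_cutoff_hz(level), 8000.0)`, re-estimating `level` as driving conditions change.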
-
Frame erasures degrade speech quality in wireless communication networks and packet networks, and the degradation becomes worse when consecutive frame erasures occur. Speech coders have a frame erasure concealment (FEC) mechanism to compensate for frame erasures, so it is important to evaluate how FEC mechanisms perform for the frame erasures that occur in communication networks. In this paper, various frame erasure patterns are designed, and the FEC algorithms of speech coders are evaluated and analyzed with the Perceptual Evaluation of Speech Quality (PESQ). The results show that performance varies with frame erasure type, frame erasure rate, and utterance length.
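The specific erasure patterns used in the paper are not given. As an illustration of how random and consecutive (bursty) erasures are commonly simulated, the sketch below generates per-frame loss patterns from independent Bernoulli trials and from a two-state Gilbert model. All function and parameter names are hypothetical. In the Gilbert model, the long-run erasure rate is p_good_to_bad / (p_good_to_bad + p_bad_to_good), and the mean burst length is 1 / p_bad_to_good.

```python
import random
from typing import List


def random_erasures(n_frames: int, rate: float, seed: int = 0) -> List[bool]:
    """Independent (Bernoulli) erasures: each frame is lost with probability `rate`."""
    rng = random.Random(seed)
    return [rng.random() < rate for _ in range(n_frames)]


def gilbert_erasures(n_frames: int, p_good_to_bad: float, p_bad_to_good: float,
                     seed: int = 0) -> List[bool]:
    """Bursty erasures from a two-state Gilbert model; frames are lost while in the 'bad' state."""
    rng = random.Random(seed)
    in_bad_state = False
    pattern = []
    for _ in range(n_frames):
        if in_bad_state:
            in_bad_state = rng.random() >= p_bad_to_good  # stay bad unless we transition out
        else:
            in_bad_state = rng.random() < p_good_to_bad
        pattern.append(in_bad_state)
    return pattern


def longest_burst(pattern: List[bool]) -> int:
    """Length of the longest run of consecutive erased frames."""
    best = run = 0
    for erased in pattern:
        run = run + 1 if erased else 0
        best = max(best, run)
    return best
```

For example, `gilbert_erasures(n, 0.05, 0.45)` gives roughly a 10% erasure rate with bursts averaging about 2.2 frames. Such a pattern would decide which frames the decoder treats as lost before the output is scored with PESQ (ITU-T P.862).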
-
An automatic pronunciation correction system provides users with correction guidelines for each pronunciation error. For this purpose, we propose a speech recognition system that automatically classifies the pronunciation errors Koreans make when speaking a foreign language. We also propose machine scoring methods for the automatic assessment of pronunciation quality by the speech recognizer. Scores from an expert human listener are used as the reference for evaluating the machine scores and as targets for training some of the algorithms. We use a log-likelihood score and a normalized log-likelihood score as machine scoring methods. Experimental results show that the normalized log-likelihood score correlates more highly with human scores than the log-likelihood score does.
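The abstract does not define the normalization. Assuming duration normalization, one common choice, the sketch below contrasts a raw log-likelihood score with a normalized score. Each phone segment's log-likelihood is divided by its number of frames, and the per-segment values are averaged. The inputs are assumed to be per-frame log-likelihoods from a forced alignment, grouped by phone segment; the function names are hypothetical.

```python
from typing import Sequence


def log_likelihood_score(segments: Sequence[Sequence[float]]) -> float:
    """Raw score: total log-likelihood of all frames in the aligned utterance."""
    return sum(sum(seg) for seg in segments)


def normalized_log_likelihood_score(segments: Sequence[Sequence[float]]) -> float:
    """Duration-normalized score (assumed definition): each phone segment's log-likelihood
    is divided by its number of frames, and the per-segment values are averaged."""
    per_segment = [sum(seg) / len(seg) for seg in segments if len(seg) > 0]
    return sum(per_segment) / len(per_segment)
```

The raw score grows more negative simply because an utterance is longer, so a slow but accurate speaker is penalized. The normalized score removes that duration effect, which is one plausible reason it would track human judgments more closely.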
-
The aim of this paper is to show the possibilities of proficiency-based, integrated teaching of high school English reading and listening based on sense groups and utterance restructuring. The proficiency-based, integrated listening and reading activities proceed in stages as follows. Step 1: students fill in the blanks with strongly or weakly pronounced words according to their abilities. Step 2: students speak along (track) while listening to the sentence, based on restructuring and post-lexical phenomena. Step 3: students read and directly understand the passage, in which the points where a native speaker of English would be likely to pause are marked differentially. Students need to listen to spoken English so that they recognize words in both written and spoken form, and they must become familiar with suprasegmental features (stress and rhythm) and post-lexical phenomena during reading activities.
-
The purpose of this paper is to examine the effect of using songs for the acquisition of English prosody in elementary school. For this purpose, songs were taught in 8 classes for four months, and listening and reading tests were administered to analyze the effect. The results are as follows: (1) In the listening test, the average scores of the experimental classes were higher than those of the comparison classes, and the effect was greater in lower grades than in upper grades. (2) In the pronunciation tests, the pronunciation of the experimental classes was more similar to that of native speakers than was the pronunciation of the comparison classes in intonation, lexical stress, and sentence stress. (3) Singing songs repeatedly is more important than learning many songs; that is, when teaching pronunciation, it is advisable to give students as many chances to sing as possible.
-
This study aims to define the isochronism of English feet. A total of 275 feet, drawn from 66 lines of 13 poems by 12 modern poets (The Poet Speaks, 1982), were used for durational measurement. To assess the average duration of a foot, the study first sets up a method of measuring the duration of each foot according to its type in English meter. Second, by measuring the average duration of feet in modern poets' English poems with Praat (version 4.119, 2003), it clarifies foot isochronism in fixed meter. Using these two measurements, it shows that foot isochronism tolerates a perceptual range of 555.4 to 974.5 ms per foot in iambic meters in English poetry.
-
The purpose of this paper is to improve the English communicative competence of elementary school students through an acoustic study. For this purpose, this study investigates various postlexical phenomena that apply to the utterances in elementary school English textbooks. It then uses spectrograms to analyze how native speakers and elementary school students apply these phenomena when speaking English. The speech materials were seven sentences containing various postlexical phenomena. This leads to the conclusion that knowing and producing English postlexical phenomena is necessary for successfully improving English communicative competence.