• Title/Abstract/Keywords: phonetic variation

Search results: 61 items (processing time 0.021 s)

Robust Speech Recognition Using Noise Compensation Method Based on Eigen-Environment (Eigen-Environment 잡음 보상 방법을 이용한 강인한 음성인식)

  • Song Hwa Jeon;Kim Hyung Soon
    • MALSORI / No. 52 / pp.145-160 / 2004
  • In this paper, a new noise compensation method based on the eigenvoice framework in feature space is proposed to reduce the mismatch between training and testing environments. The difference between clean and noisy environments is represented by a linear combination of K eigenvectors that represent the variation among environments. In the proposed method, the performance improvement of a speech recognition system depends largely on how the noisy models and the bias vector set are constructed. Two methods are proposed to construct the noisy models: one based on MAP adaptation and the other using a stereo DB. In experiments using the Aurora 2 DB, we obtained a 44.86% relative improvement with the eigen-environment method over the baseline system. In particular, in clean-condition training mode, the proposed method yielded a 66.74% relative improvement, which is better than several methods previously proposed in the Aurora project.

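
The linear-combination idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy vectors, and the choice of estimating the weights by projecting onto orthonormal eigenvectors are all assumptions made for illustration. The abstract does not say how the weights are actually estimated.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_bias(observed_bias, mean_bias, eigenvectors):
    """Approximate a bias vector as mean_bias + sum_k w_k * e_k.

    Assumes (hypothetically) that the K eigenvectors are orthonormal,
    so each weight w_k is the projection of the centered bias on e_k.
    """
    centered = [o - m for o, m in zip(observed_bias, mean_bias)]
    weights = [dot(centered, e) for e in eigenvectors]
    approx = list(mean_bias)
    for w, e in zip(weights, eigenvectors):
        approx = [a + w * x for a, x in zip(approx, e)]
    return weights, approx


# Toy 3-dimensional example with K = 2 (illustrative values only).
mean_bias = [1.0, 0.0, 0.0]
eigenvectors = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights, approx = project_bias([1.0, 2.0, -0.5], mean_bias, eigenvectors)
print(weights, approx)
```

In this toy case the observed bias lies exactly in the span of the two eigenvectors, so the approximation reproduces it. With real data the K-term combination would generally be a lower-dimensional approximation.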

Building a Morpheme-Based Pronunciation Lexicon for Korean Large Vocabulary Continuous Speech Recognition (한국어 대어휘 연속음성 인식용 발음사전 자동 생성 및 최적화)

  • Lee Kyong-Nim;Chung Minhwa
    • MALSORI / Vol. 55 / pp.103-118 / 2005
  • In this paper, we describe a morpheme-based pronunciation lexicon useful for Korean LVCSR. The phonemic-context-dependent multiple pronunciation lexicon improves recognition accuracy when cross-morpheme pronunciation variations are distinguished from within-morpheme pronunciation variations. Since adding all possible pronunciation variants to the lexicon increases the lexicon size and the confusability between lexical entries, we have developed a lexicon pruning scheme that selects pronunciation variants optimally to improve the performance of Korean LVCSR. With the proposed pronunciation lexicon, the cross-morpheme pronunciation variation model with a phonemic-context-dependent multiple pronunciation lexicon achieves an absolute WER reduction of $0.56\%$ from the baseline WER of $27.39\%$. At the best performance, the lexicon size is additionally reduced by $5.36\%$ for the same lexical entries.

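
The abstract above reports an absolute WER reduction, which is easy to confuse with a relative one. The following sketch, using only the two figures quoted in the abstract, shows how the two relate. The function names are hypothetical.

```python
def wer_after_absolute_reduction(baseline_wer, absolute_reduction):
    """New WER (in percent) after subtracting an absolute reduction."""
    return baseline_wer - absolute_reduction


def relative_reduction(baseline_wer, absolute_reduction):
    """Relative reduction (in percent) with respect to the baseline WER."""
    return 100.0 * absolute_reduction / baseline_wer


baseline = 27.39   # baseline WER in percent, from the abstract
reduction = 0.56   # absolute WER reduction in percent, from the abstract

new_wer = wer_after_absolute_reduction(baseline, reduction)
rel = relative_reduction(baseline, reduction)
print(f"new WER: {new_wer:.2f}%, relative reduction: {rel:.2f}%")
```

Under these figures, the reported gain corresponds to a WER of about 26.83%, or roughly a 2% relative reduction.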

Acoustic Characteristics of Vowels in Korean Distant-Talking Speech (한국어 원거리 음성의 모음의 음향적 특성)

  • Lee Sook-hyang;Kim Sunhee
    • MALSORI / Vol. 55 / pp.61-76 / 2005
  • This paper aims to analyze the acoustic effects on vowels produced in a distant-talking environment. The analysis was performed using a statistical method. The influence of gender and speakers on the variation was also examined. The speech data used in this study consist of 500 distant-talking words and 500 normal words from 10 speakers (5 males and 5 females). The acoustic features selected for the analysis were the duration, the formants (F1 and F2), the fundamental frequency, and the total energy. The results showed that the duration, F0, F1, and the total energy increased in distant-talking speech compared to normal speech; female speakers showed a greater increase in all features except the total energy and the fundamental frequency. In addition, speaker differences were observed.


Effects of Vowel Differences on Laryngeal DDK (모음에 따른 후두 교호운동 특성)

  • Han, Ji-Yeon;Lee, Ok-Bun
    • MALSORI / Vol. 68 / pp.1-15 / 2008
  • This study investigated the vowel effect on laryngeal DDK (L-DDK) in terms of rate, regularity, and range. Thirteen normal speakers participated in this experiment. Speakers were asked to repeat the vowels /a, e, i, o, u/ for vocal fold adduction DDK, and /ha, he, hi, ho, hu/ for vocal fold abduction DDK. Acoustic data were analyzed via Motor Speech Profile. There were 6 parameters: DDKavp and DDKavr for the rate of L-DDK, DDKcvp and DDKjit for the regularity of L-DDK, and DDKavi and DDKcvi for the range of L-DDK. The results of MANOVA and Friedman analysis showed no significant vowel effect on the rate and regularity of L-DDK. MANOVA revealed significant effects of vowels and vocal fold ab/adduction on the range of L-DDK. DDK peak intensity (DDKavi) in vowel /i/ production was lower than in vowels /a, e, o, u/. Variation of DDK peak intensity (DDKcvi) was significantly greater for /ha/ than for /a/ production. The implications of these findings for voice and speech pathology are discussed.


On the Simple Speaker Verification System Using Tolerance Interval Analysis Without Background Speaker Models (Tolerance Interval Analysis를 이용한 배경화자 없는 간단한 화자인증시스템에 관한 연구)

  • Choi, Hong-Sub
    • MALSORI / No. 56 / pp.147-158 / 2005
  • In this paper, we focus on developing a simplified speaker verification algorithm without background speaker models, to be adopted in portable speaker verification systems built into portable terminals such as mobile phones and PMPs. According to tolerance interval analysis, the population of a speaker's model can be represented by a suitable number of selected independent samples of the speaker model. We can therefore build the representative speaker model and threshold under a specified confidence level and coverage. Using the proposed algorithm with 40 samples, the experiments show a false rejection rate of $3.0\%$ and a false acceptance rate of $4.3\%$, which compare favorably with the conventional method's results of $5.4\%$ and $5.5\%$, respectively. The next step of this research will address suitable adaptation methods to overcome speech variation problems due to aging effects and operating environments.

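
To make the relationship between sample count, coverage, and confidence in the abstract above more concrete, here is a minimal sketch of a textbook one-sided nonparametric tolerance bound. It rests on the fact that the maximum of n independent samples exceeds the population's p-quantile with probability 1 - p^n. Whether the paper uses this particular bound is an assumption. The function names and the coverage and confidence values are illustrative, and only the sample count of 40 comes from the abstract.

```python
import math


def confidence_one_sided(n, coverage):
    """Confidence that the sample maximum of n i.i.d. draws covers
    at least the given fraction of the population (1 - coverage**n)."""
    return 1.0 - coverage ** n


def min_samples_one_sided(coverage, confidence):
    """Smallest n such that 1 - coverage**n >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(coverage))


# Illustrative values; only n = 40 is taken from the abstract.
conf_40 = confidence_one_sided(40, 0.90)
n_90_95 = min_samples_one_sided(0.90, 0.95)
n_95_95 = min_samples_one_sided(0.95, 0.95)
print(f"{conf_40:.4f}", n_90_95, n_95_95)
```

Under this assumed bound, 40 samples would give roughly 98.5% confidence of covering 90% of the population, which suggests why a few dozen samples can be enough to set a threshold.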

A New Method of Extracting the Filter Characteristics of the Nasal Cavity Using Homorganic Nasal-Stop Sequences: A Preliminary Report (동기관음의 스펙트럼 차이를 이용한 비강 특성 산출: 예비 연구)

  • Park, Han-Sang
    • MALSORI / No. 53 / pp.17-35 / 2005
  • This study provides a new method of extracting the filter characteristics of the nasal cavity. Korean lenis stops are realized as voiced in homorganic nasal-lenis stop sequences between vowels. Since the only difference between the two members of a homorganic nasal-lenis stop sequence, such as [mb], [nd], and [ŋg], is whether the passage to the nasal cavity is open or not, subtracting the LPC spectrum of the voiced stop from that of the preceding nasal yields the filter characteristics of the nasal cavity of an individual speaker regardless of place of articulation. The results suggest that various attempts should be made to extract robust filter characteristics of the nasal cavity by varying the LPC coefficients and by paying particular attention to the speech samples. This study is significant in that it provides a preliminary report on a new method of extracting the filter characteristics of the nasal cavity.


Acoustic Variation in Infant Crying (아기 울음의 음향학적 특성)

  • Choi, Yoon-Mi;Kim, Sun-Jun;Joo, Chan-Uhng;Kim, Hyun-Gi
    • Proceedings of the KSPS conference / Proceedings of the 2007 Joint Conference of the Phonetic Society of Korea and the Korean Association of Speech Sciences / pp.146-148 / 2007
  • Studies of cry characteristics in the newborn infant have aimed to determine whether cry analysis could be successful in the early detection of infants at risk for developmental difficulties. Crying presupposes functioning of the respiratory, laryngeal, and supralaryngeal muscles. The nervous system controls the capacity, stability, and coordination of the movements in these muscles. Hence, the cry provides information about how the nervous system is functioning. Three patients (Down syndrome, Cornelia de Lange syndrome, patent ductus arteriosus) were assessed with the Computerized Speech Lab (CSL). Tests were chosen to assess fundamental frequency (mean, maximum, and minimum values), melody contour, NHR, and energy. We compared the patients' data with healthy-volunteer data. Variations in cry characteristics were documented in a number of medical abnormalities.


Multimodal Dialog System Using Hidden Information State Dialog Manager (Hidden Information State 대화 관리자를 이용한 멀티모달 대화시스템)

  • Kim, Kyung-Duk;Lee, Geun-Bae
    • Proceedings of the KSPS conference / Proceedings of the 2007 Joint Conference of the Phonetic Society of Korea and the Korean Association of Speech Sciences / pp.29-32 / 2007
  • This paper describes a multimodal dialog system that uses the Hidden Information State (HIS) method to manage human-machine dialog. The HIS dialog manager is a variation of the classic partially observable Markov decision process (POMDP), which provides one of the stochastic dialog modeling frameworks. Because dialog modeling with a conventional POMDP requires a very large state space, it has been hard to apply POMDPs to real dialog system domains. In the HIS dialog manager, the system groups the belief states to reduce the size of the state space, so that the HIS dialog manager can be used in real-world dialog system domains. We adapted this HIS method to a multimodal dialog system in the smart-home domain.


Stress Patterns of Compound Nouns in English (영어 복합명사의 강세형)

  • Lee Yeong-Kil
    • MALSORI / No. 42 / pp.25-36 / 2001
  • Stress assignment has been much discussed in the literature on English compound nouns. The general view of the stress pattern of English compound nouns is that a main stress falls on the first element and a secondary stress on the second element; however, a stress pattern is often employed that provides counterevidence to the traditional pedagogical approach. A new idea is suggested by Ladd (1984) that 'compound stress represents the deaccenting of the head of the compound.' Recent studies show that initial stressing does not indicate compounds, and that syntactic phrases are not always characterized by final stressing. In his pilot test, Pennanen comments on the frequent variation of stress patterns on individual items, on the basis of which Bauer confirms Pennanen's results with different informants. This paper is an attempt to justify Bauer's analysis with the same data as Bauer's and different subjects. It turns out that the competences of native-speaker informants do not provide clear-cut answers. Some factors should be taken into account in assigning appropriate stress to compound nouns.


A Neglected Factor of French Prosody: The peak variation at the end of rhythmic groups

  • Claude Roberge;Noriko Hoki
    • MALSORI / No. 31-32 / pp.207-221 / 1996
  • The aim of this research is to study the functioning of the peak variations at the end of rhythmic groups in spoken French. For this purpose, the text '60 Voix, 60 Exercices', published by Hachette in 1988, was selected. This textbook is based on interviews with 60 persons who each speak briefly in monolog form on a subject of their choice. 500 different groups were selected and submitted to the auditory judgment of six informants: three French natives and three Japanese natives who had studied French for at least three years. It was found, first, that there is a tendency toward either rising or falling intonation rather than flat intonation, and second, that rising intonation obtains a fairly good frequency score compared with the other two, even if the examined sentences do not pertain to the strict classical types of interrogative or exclamative sentences or dialogs, where affectivity is so often an important factor.
