• Title/Summary/Keyword: Phonemes

Search results: 227

N-gram Based Robust Spoken Document Retrievals for Phoneme Recognition Errors (음소인식 오류에 강인한 N-gram 기반 음성 문서 검색)

  • Lee, Su-Jang;Park, Kyung-Mi;Oh, Yung-Hwan
    • MALSORI
    • /
    • no.67
    • /
    • pp.149-166
    • /
    • 2008
  • In spoken document retrieval (SDR), subword units (typically phonemes) are used as indexing terms to avoid the out-of-vocabulary (OOV) problem. This makes the indexing and retrieval process independent of any vocabulary and requires only a small corpus to train the acoustic model. However, the subword indexing approach has a major drawback: it suffers from higher error rates than a large vocabulary continuous speech recognition (LVCSR) system. In this paper, we propose a probabilistic slot detection and n-gram based string matching method for phone-based spoken document retrieval to overcome the high error rates of the phone recognizer. Experimental results show a 9.25% relative improvement in mean average precision (mAP) and a 1.7x speed-up compared with the baseline system. (A minimal sketch of the n-gram matching idea follows this entry.)

  • PDF
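
The n-gram matching idea described in this abstract can be illustrated with a short, hedged sketch. The phoneme strings, the trigram order, and the overlap score below are hypothetical choices, not the authors' probabilistic slot-detection method; the point is only that partial n-gram overlap lets a query match a document transcript that contains phone recognition errors.

```python
# Minimal sketch of n-gram matching over phoneme strings for spoken
# document retrieval; phoneme sequences and the scoring rule are
# hypothetical, not the paper's probabilistic slot-detection method.

def phone_ngrams(phones, n=3):
    """Return the list of phoneme n-grams in a recognized sequence."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def ngram_score(query_phones, doc_phones, n=3):
    """Score a document by the fraction of query n-grams it contains.

    Partial n-gram overlap lets a query still match a document whose
    phone recognizer output contains substitution or deletion errors.
    """
    query_grams = phone_ngrams(query_phones, n)
    doc_grams = set(phone_ngrams(doc_phones, n))
    if not query_grams:
        return 0.0
    hits = sum(1 for g in query_grams if g in doc_grams)
    return hits / len(query_grams)

# Hypothetical phone strings: the document transcript has one recognition
# error ("s" recognized as "t"), but most query trigrams still match.
query = ["s", "o", "n", "m", "u", "n", "s", "e"]
doc   = ["k", "a", "s", "o", "n", "m", "u", "n", "t", "e", "k"]
print(round(ngram_score(query, doc), 3))   # 0.667
```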

Some Notational Problems of the translation of Japanese stops [k, t] and affricates [ts, tʃ] into Korean (일본어 파열음 [k, t]과 파찰음 [ts, tʃ]의 국어 표기상의 문제점)

  • Lee, Young-Hee
    • Proceedings of the KSPS conference
    • /
    • 2007.05a
    • /
    • pp.187-192
    • /
    • 2007
  • The purpose of this paper is to show that the current notation of Japanese proper names in Korean has some problems: it cannot represent the distinction between voiced and voiceless sounds. A further purpose is to propose a more accurate notation that is coherent and efficient. After introducing some general knowledge about the phonemes of the Japanese language, I measured the voice onset time (VOT) of the stops [k, t] at the beginning, in the middle, and at the end of a word, and compared the spectrogram of the affricates with that of the fricatives. In conclusion, the Japanese voiceless [k, t, tʃ] should be written as [ㅋ, ㅌ, ㅊ], the voiced [g, d, dʒ] as [ㄱ, ㄷ, ㅈ], and the affricate [ts] as [ㅊ] in Korean. (A small sketch of the VOT computation follows this entry.)

  • PDF
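
The VOT measurement mentioned above can be sketched as a simple computation over annotated times. The annotation values and grouping below are hypothetical illustrations, not the paper's measurements: VOT is just the interval between the stop burst and the onset of voicing, averaged here by segment and word position.

```python
# Sketch of voice onset time (VOT) computation from hand-annotated burst
# and voicing-onset times (in seconds); all values are hypothetical,
# not measurements from the paper.
from collections import defaultdict

# (segment, word position, burst time, voicing onset time)
annotations = [
    ("k", "initial", 0.120, 0.185),
    ("k", "medial",  0.840, 0.872),
    ("t", "initial", 0.095, 0.150),
    ("t", "final",   1.410, 1.428),
]

def mean_vot_by_position(rows):
    """Average VOT (voicing onset minus burst) per (segment, position)."""
    groups = defaultdict(list)
    for seg, pos, burst, voicing in rows:
        groups[(seg, pos)].append(voicing - burst)
    return {key: sum(v) / len(v) for key, v in groups.items()}

for key, vot in mean_vot_by_position(annotations).items():
    print(key, f"{vot * 1000:.1f} ms")
```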

Text-driven Speech Animation with Emotion Control

  • Chae, Wonseok;Kim, Yejin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.8
    • /
    • pp.3473-3487
    • /
    • 2020
  • In this paper, we present a new approach to creating speech animation with emotional expressions using a small set of example models. To generate realistic facial animation, two kinds of example models, called key visemes and key expressions, are used for lip synchronization and facial expressions, respectively. The key visemes represent lip shapes of phonemes such as vowels and consonants, while the key expressions represent basic emotions of a face. Our approach utilizes a text-to-speech (TTS) system to create a phonetic transcript for the speech animation. Based on the phonetic transcript, a sequence of speech animation is synthesized by interpolating the corresponding sequence of key visemes, as sketched below. Using an input parameter vector, the key expressions are blended by a method of scattered data interpolation. During the synthesis process, an importance-based scheme is introduced to combine both lip synchronization and facial expressions into one animation sequence in real time (over 120 Hz). The proposed approach can be applied to diverse types of digital content and applications that use facial animation with high accuracy (over 90%) in speech recognition.
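
The viseme-interpolation step can be illustrated with a rough sketch. The mouth vertices, phoneme timings, and the simple linear cross-fade below are hypothetical stand-ins for the paper's key-viseme interpolation and scattered-data blending, intended only to show how a phonetic transcript drives frame synthesis.

```python
# Sketch of lip-sync frame synthesis by blending key viseme models; vertex
# data, timings, and the linear blend are hypothetical, not the paper's scheme.
import numpy as np

# Each key viseme: a few 3D vertices of a mouth mesh (hypothetical values).
key_visemes = {
    "A": np.array([[0.0, -1.0, 0.0], [0.0, 1.0, 0.0]]),   # open jaw
    "M": np.array([[0.0, -0.1, 0.0], [0.0, 0.1, 0.0]]),   # closed lips
}

# Phonetic transcript from a TTS front end: (viseme, start_time, end_time).
transcript = [("M", 0.00, 0.10), ("A", 0.10, 0.30)]

def frame_at(t, transcript, key_visemes, blend=0.04):
    """Return mouth vertices at time t, cross-fading near viseme boundaries."""
    for i, (vis, start, end) in enumerate(transcript):
        if start <= t < end:
            cur = key_visemes[vis]
            # Linearly blend into the next viseme near the segment boundary.
            if i + 1 < len(transcript) and t > end - blend:
                nxt = key_visemes[transcript[i + 1][0]]
                w = (t - (end - blend)) / blend
                return (1.0 - w) * cur + w * nxt
            return cur
    return key_visemes[transcript[-1][0]]

# Sample the animation at 120 Hz over the utterance.
frames = [frame_at(t, transcript, key_visemes)
          for t in np.arange(0.0, 0.30, 1.0 / 120.0)]
print(len(frames), frame_at(0.09, transcript, key_visemes))
```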

A Study on the Phoneme Recognition in the Restricted Continuously Spoken Korean (제한된 한국어 연속음성에 나타난 음소인식에 관한 연구)

  • 심성룡;김선일;이행세
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.32B no.12
    • /
    • pp.1635-1643
    • /
    • 1995
  • This paper proposes an algorithm for machine recognition of phonemes in continuously spoken Korean. The proposed algorithm is a static-strategy neural network. At the training stage, the algorithm uses features such as the zero-crossing rate, short-term energy, and either PARCOR coefficients or auditory-like perceptual linear prediction (PLP) coefficients (but not both), covering a 171 ms window. Numerical results show that the algorithm with PLP achieves a frame-based phoneme recognition rate of approximately 99% in small-vocabulary recognition experiments. Based on this, it is concluded that the proposed algorithm with PLP analysis is effective for phoneme recognition. (A sketch of frame-level zero-crossing and energy features follows this entry.)

  • PDF
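
Two of the frame-level features named in the abstract, zero-crossing rate and short-term energy, can be sketched as below. The frame length, hop size, and synthetic test signal are hypothetical choices; the PARCOR/PLP analysis and the network itself are not reproduced here.

```python
# Sketch of frame-based zero-crossing rate and short-term energy features,
# the kind of inputs the paper feeds to its neural network; frame length,
# hop, and the synthetic signal are hypothetical.
import numpy as np

def frame_features(signal, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Return (zero-crossing rate, short-term energy) per analysis frame."""
    frame_len = int(sample_rate * frame_ms / 1000.0)
    hop_len = int(sample_rate * hop_ms / 1000.0)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = signal[start:start + frame_len]
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
        energy = float(np.sum(frame ** 2))
        feats.append((zcr, energy))
    return np.array(feats)

# Synthetic 100 Hz tone at 16 kHz, just to exercise the feature extractor;
# the 171 ms span mirrors the analysis window mentioned in the abstract.
sr = 16000
t = np.arange(0, 0.171, 1.0 / sr)
signal = 0.5 * np.sin(2 * np.pi * 100 * t)
print(frame_features(signal, sr).shape)
```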

The Recognition of Unvoiced Consonants Using Characteristic Parameters of the Phonemes (음소 특정 파라미터를 이용한 무성자음 인식)

  • 허만택;이종혁;남기곤;윤태훈;김재창;이양성
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.4
    • /
    • pp.175-182
    • /
    • 1994
  • In this study, we present an unvoiced-consonant recognition system that uses characteristic parameters of the phonemes in each syllable. For recognition, time-domain parameters such as the zero-crossing rate (ZCR), the total energy of the consonant region, and the half-region energy of the consonant region are used, together with frequency-domain parameters such as the frequency spectrum of the transition region. The target unvoiced consonants are /ㄱ/, /ㄷ/, /ㅂ/, /ㅈ/, /ㅋ/, /ㅌ/, /ㅍ/ and /ㅊ/. The parameters extracted from the two regions of each segmented unvoiced consonant are fed into separate region-specific recognizers, and the final output is produced by combining the two complementary outputs (see the sketch after this entry). The recognition system is implemented with a multi-layer perceptron (MLP), which has learning ability. In recognition simulations on 112 unvoiced-consonant samples, the average recognition rates are 96.4% under an 80% learning rate and 93.7% under a 60% learning rate.

  • PDF
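
The two-recognizer structure described above, one classifier on time-domain features and one on frequency-domain features with their outputs combined, can be sketched roughly as follows. The tiny untrained networks, random inputs, and plain score averaging are hypothetical stand-ins for the paper's trained MLPs and combination rule.

```python
# Rough sketch of combining two region-specific classifiers (time-domain
# and frequency-domain features) into one decision; weights, inputs, and
# the averaging rule are hypothetical, not the paper's trained MLPs.
import numpy as np

CONSONANTS = ["ㄱ", "ㄷ", "ㅂ", "ㅈ", "ㅋ", "ㅌ", "ㅍ", "ㅊ"]
rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def tiny_mlp(n_in, n_out):
    """Return an untrained one-hidden-layer MLP as a closure."""
    w1, w2 = rng.normal(size=(n_in, 16)), rng.normal(size=(16, n_out))
    return lambda x: softmax(np.tanh(x @ w1) @ w2)

time_domain_net = tiny_mlp(n_in=3, n_out=len(CONSONANTS))   # ZCR + two energies
freq_domain_net = tiny_mlp(n_in=32, n_out=len(CONSONANTS))  # transition spectrum

def classify(time_feats, freq_feats):
    """Average the two region-specific posteriors and pick the consonant."""
    scores = 0.5 * time_domain_net(time_feats) + 0.5 * freq_domain_net(freq_feats)
    return CONSONANTS[int(np.argmax(scores))]

print(classify(rng.normal(size=3), rng.normal(size=32)))
```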

A study on the voice command recognition at the motion control in the industrial robot (산업용 로보트의 동작제어 명령어의 인식에 관한 연구)

  • 이순요;권규식;김홍태
    • Journal of the Ergonomics Society of Korea
    • /
    • v.10 no.1
    • /
    • pp.3-10
    • /
    • 1991
  • The teach pendant and keyboard have been used as input devices for control commands in human-robot systems, but many problems occur when the user is a novice. A speech recognition system is therefore required for communication between a human and the robot. In this study, Korean voice commands, consisting of eight robot commands and ten digits, are described on the basis of broad phonetic analysis. Applying broad phonetic analysis, the phonemes of the voice commands are divided into phoneme groups with similar features, such as plosives, fricatives, affricates, nasals, and glides. The feature parameters and their ranges for detecting the phoneme groups are then found by the minimax method. The classification rules consist of combinations of the feature parameters, such as zero-crossing rate (ZCR), log energy (LE), up-and-down (UD), and formant frequency, together with their ranges, as illustrated by the sketch after this entry. Voice commands were recognized with these classification rules. The recognition rate was over 90 percent in this experiment, and the recognition rate for digits was better than that for robot commands.

  • PDF
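
The range-based classification described above, where feature ranges assign a segment to a broad phoneme group, might look roughly like this. All feature names and thresholds below are hypothetical illustrations, not the minimax ranges found in the paper.

```python
# Sketch of range-based broad phonetic classification; the feature ranges
# are hypothetical, not the minimax ranges reported in the paper.

# Hypothetical (min, max) ranges per feature for each broad phoneme group.
GROUP_RULES = {
    "plosive":   {"zcr": (0.05, 0.30), "log_energy": (2.0, 5.0)},
    "fricative": {"zcr": (0.30, 0.90), "log_energy": (1.0, 4.0)},
    "nasal":     {"zcr": (0.01, 0.10), "log_energy": (3.0, 6.0)},
    "glide":     {"zcr": (0.01, 0.15), "log_energy": (4.0, 7.0)},
}

def broad_class(features):
    """Return the broad phoneme groups whose ranges contain all features."""
    matches = []
    for group, rules in GROUP_RULES.items():
        if all(lo <= features[name] <= hi for name, (lo, hi) in rules.items()):
            matches.append(group)
    return matches

# A hypothetical segment with a high zero-crossing rate and moderate energy.
print(broad_class({"zcr": 0.45, "log_energy": 3.2}))   # ['fricative']
```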

Selective Adaptation of Speaker Characteristics within a Subcluster Neural Network

  • Haskey, S.J.;Datta, S.
    • Proceedings of the KSPS conference
    • /
    • 1996.10a
    • /
    • pp.464-467
    • /
    • 1996
  • This paper aims to exploit inter/intra-speaker phoneme sub-class variations as criteria for adaptation in a phoneme recognition system based on a novel neural network architecture. The design uses a subcluster neural network built from One-Class-in-One-Network (OCON) feed-forward subnets, similar to those proposed by Kung [2] and Jou [1], joined by a common front-end layer; the idea is to adapt only the neurons within the common front-end layer of the network (a rough sketch follows this entry). This results in an adaptation concentrated primarily on the speaker's vocal characteristics. Since the adaptation occurs in an area common to all classes, convergence on a single class will improve the recognition of the remaining classes in the network. Results show that adaptation towards a phoneme in the vowel sub-class for speakers MDABO and MWBTO improves the recognition of the remaining vowel sub-class phonemes from the same speaker.

  • PDF
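
The architectural idea, per-class OCON subnets sharing a common front-end layer with only that shared layer adapted to a new speaker, can be sketched as below. Layer sizes, the adaptation frame, the loss, and the finite-difference update are hypothetical, not the authors' training setup; the point is that the class subnets stay frozen while the shared representation shifts.

```python
# Sketch of a subcluster network: per-phoneme OCON units share a common
# front-end layer, and speaker adaptation updates only that shared layer.
# Sizes, data, and the finite-difference update are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
N_IN, N_HIDDEN, N_CLASSES = 12, 8, 4

front_end = rng.normal(scale=0.1, size=(N_IN, N_HIDDEN))     # shared layer
subnets = rng.normal(scale=0.1, size=(N_CLASSES, N_HIDDEN))  # one OCON unit per class

def forward(x, front):
    hidden = np.tanh(x @ front)          # common front-end
    return subnets @ hidden              # per-class OCON scores

def adapt_front_end(x, target, front, lr=0.1, eps=1e-4):
    """One adaptation step: numerical gradient w.r.t. the shared layer only.

    The class subnets stay frozen, so improving one phoneme's response
    shifts the shared representation seen by every class.
    """
    def loss(front_w):
        return float(np.sum((forward(x, front_w) - target) ** 2))

    grad = np.zeros_like(front)
    for i in range(front.shape[0]):
        for j in range(front.shape[1]):
            bumped = front.copy()
            bumped[i, j] += eps
            grad[i, j] = (loss(bumped) - loss(front)) / eps
    return front - lr * grad

x = rng.normal(size=N_IN)              # hypothetical adaptation frame
target = np.eye(N_CLASSES)[0]          # adapt towards one vowel class
before = forward(x, front_end)
front_end = adapt_front_end(x, target, front_end)
after = forward(x, front_end)
print(before.round(3), after.round(3))
```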

Acoustic properties of vowels produced by cerebral palsic adults in conversational and clear speech (뇌성마비 성인의 일상발화와 명료한 발화에서의 모음의 음향적 특성)

  • Ko Hyun-Ju;Kim Soo-Jin
    • Proceedings of the KSPS conference
    • /
    • 2006.05a
    • /
    • pp.101-104
    • /
    • 2006
  • The present study examined two acoustic characteristics (duration and intensity) of vowels produced by 4 adults with cerebral palsy and 4 nondisabled adults in conversational and clear speech. In this study, clear speech means (1) slowing one's speech rate a little and (2) articulating all phonemes accurately while increasing vocal volume. The speech material included 10 bisyllabic real words in frame sentences. Temporal-acoustic analysis showed that vowels produced by both speaker groups in clear speech (in this case, more accurate and louder speech) were significantly longer than vowels in conversational speech. In addition, the intensity of vowels produced by the speakers with cerebral palsy in clear speech was higher than in conversational speech. (A small sketch of the duration and intensity measures follows this entry.)

  • PDF
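
The two acoustic measures the study compares, vowel duration and intensity, can be sketched as a simple computation over a segmented waveform. The segment boundaries and the synthetic signal below are hypothetical, not the study's recordings.

```python
# Sketch of measuring vowel duration and RMS intensity (in dB) from
# annotated segment boundaries; boundaries and signal are hypothetical.
import numpy as np

def vowel_measures(signal, sample_rate, start_s, end_s):
    """Return (duration in ms, RMS intensity in dB) of one vowel segment."""
    segment = signal[int(start_s * sample_rate):int(end_s * sample_rate)]
    duration_ms = (end_s - start_s) * 1000.0
    rms = np.sqrt(np.mean(segment ** 2))
    intensity_db = 20.0 * np.log10(rms + 1e-12)   # dB relative to full scale
    return duration_ms, intensity_db

sr = 16000
t = np.arange(0, 1.0, 1.0 / sr)
wave = 0.3 * np.sin(2 * np.pi * 220 * t)          # hypothetical 1 s recording

# Hypothetical vowel boundaries: conversational vs. clear productions.
print(vowel_measures(wave, sr, 0.200, 0.280))     # conversational: shorter
print(vowel_measures(wave, sr, 0.500, 0.640))     # clear: longer
```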

A Study on the Phonetic Discrimination and Acquisition Ability of Korean Language Learners (한국어 학습자의 음성 변별 능력과 음운 습득 능력의 상관성에 관한 연구)

  • Jung, Mi-Ji;Kwon, Sung-Mi
    • Phonetics and Speech Sciences
    • /
    • v.2 no.1
    • /
    • pp.23-32
    • /
    • 2010
  • This study aimed to discover whether Korean language learners who had never been exposed to Korean phones could distinguish them, and whether learners with a comparatively better ability to identify phonetic differences showed better results in acquiring Korean phonemes. The study conducted two experiments with 25 learners. In Experiment I, an oddball (ABX) test was performed on the first day of the course to investigate the learners' ability to discriminate Korean phones. In Experiment II, an identification test was administered to the same learners after three weeks of language instruction to analyze their ability to identify Korean phones. The results revealed that the true-beginner learners demonstrated different phonetic discrimination abilities, but these abilities did not seem to correlate with the rate of acquisition. (A sketch of the test scoring and correlation follows this entry.)

  • PDF
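
The analysis the study performs can be outlined with a small sketch: score the ABX discrimination test, score the identification test, and check the correlation between the two scores across learners. All responses and scores below are hypothetical, not the study's data.

```python
# Sketch of scoring an ABX discrimination test and an identification test,
# then correlating the two scores across learners; all data are hypothetical.
import numpy as np

def abx_score(trials):
    """Fraction of ABX trials where the response matches the correct category."""
    return sum(1 for answer, response in trials if answer == response) / len(trials)

# Example single-learner ABX trial list: (correct category, learner's response).
print(abx_score([("A", "A"), ("B", "A"), ("A", "A"), ("B", "B")]))  # 0.75

# Hypothetical per-learner scores on the two tests (proportions correct).
discrimination = np.array([0.55, 0.70, 0.62, 0.81, 0.58])   # Experiment I (ABX)
identification = np.array([0.48, 0.52, 0.60, 0.57, 0.50])   # Experiment II

r = np.corrcoef(discrimination, identification)[0, 1]
print(f"Pearson r = {r:.2f}")
```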

The Development of Phonological Awareness in Children (아동의 음운인식 발달)

  • Park, Hyang Ah
    • Korean Journal of Child Studies
    • /
    • v.21 no.1
    • /
    • pp.35-44
    • /
    • 2000
  • This study examined the development of phonological awareness in 3-, 5-, and 7-year-old children, with 20 subjects at each age level. The 3-year-olds were given 2 phoneme detection tasks and the 5- and 7-year-olds were given 5 phoneme detection tasks. In each task, the children first heard a target syllable together with 2 other syllables and were asked to tell which of the 2 syllables sounded similar to the target. Children were able to detect relatively large segments (consonant+vowel or vowel+consonant: C1V or VC2) at the age of 3 and gradually progressed to smaller sound segments (e.g., phonemes). The study indicated that Korean children detect C1V segments better than VC2 segments, and detect the initial consonant better than the middle vowel and the final consonant. (The structure of the detection task is sketched after this entry.)

  • PDF
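
The detection task can be sketched as a comparison of syllable parts: the intended "similar" choice is the syllable that shares the target's C1V (onset+vowel) or VC2 (vowel+coda) portion. The syllable decompositions below are hypothetical examples, not the study's stimuli.

```python
# Sketch of the similarity-detection task used in the study: syllables are
# represented as hypothetical (onset, vowel, coda) tuples, and the correct
# choice shares either the C1V or the VC2 portion with the target.

def shares_segment(target, choice):
    """Return which large segment(s) a choice shares with the target."""
    shared = []
    if target[0] == choice[0] and target[1] == choice[1]:
        shared.append("C1V")
    if target[1] == choice[1] and target[2] == choice[2]:
        shared.append("VC2")
    return shared

target = ("k", "a", "n")          # a CVC syllable
choice_a = ("k", "a", "m")        # shares C1V with the target
choice_b = ("p", "o", "n")        # shares only the final consonant

print(shares_segment(target, choice_a))   # ['C1V']
print(shares_segment(target, choice_b))   # []
```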