• Title/Summary/Keyword: Phonemes


Case Study of Auditory Training for an Adult with Acquired Hearing Loss and a Cochlear Implant (후천성 인공와우 이식 성인의 청능훈련 사례 연구)

  • Hong, Ha Na
    • 재활복지 / v.17 no.4 / pp.371-382 / 2013
  • Recently, the number of cochlear implant recipients has grown as health insurance coverage has expanded. From 2005 to 2009, about 3,300 patients received cochlear implant surgery, and the proportion of adult recipients has been increasing. Young children actively participate in auditory training programs after cochlear implant surgery, and studies of auditory training in children are plentiful, but studies of auditory training in adults remain insufficient. In this study, we conducted auditory training for a female adult (age 54) who received a cochlear implant after language acquisition, using the Ling 6 sound test, standardized consonant, vowel, and sentence listening tests, and word recognition and confirmation tests. After 10 weeks of auditory training, she identified all phonemes in the Ling 6 sound test and scored close to 100% on the standardized consonant, vowel, and sentence listening tests. She also improved her identification of real-world environmental sounds and real-world words by 57-95%. These results show the need for auditory training programs for adults that are planned systematically and effectively and that take individual characteristics into account.
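The Ling 6 sound test probes audibility across the speech frequency range with six sounds (/m/, /a/, /u/, /i/, /ʃ/, /s/). As a rough illustration of how such a session might be scored (not the authors' clinical protocol; the trial format below is invented):

```python
# A minimal sketch of scoring a Ling 6 sound identification session.
# The session format is an assumption; the paper does not give its scoring sheet.
LING_SOUNDS = ["m", "a", "u", "i", "sh", "s"]

def ling6_score(responses):
    """responses: list of (presented, identified) pairs for one session."""
    correct = sum(1 for presented, identified in responses
                  if presented == identified)
    return 100.0 * correct / len(responses)

# Hypothetical 12-trial session (each sound presented twice):
session = [("m", "m"), ("a", "a"), ("u", "u"), ("i", "i"),
           ("sh", "s"), ("s", "s"), ("m", "m"), ("a", "a"),
           ("u", "u"), ("i", "i"), ("sh", "sh"), ("s", "s")]
print(f"Identification: {ling6_score(session):.1f}%")  # 91.7%
```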

A Study on the Rejection Capability Based on Anti-phone Modeling (반음소 모델링을 이용한 거절기능에 대한 연구)

  • 김우성;구명완
    • The Journal of the Acoustical Society of Korea / v.18 no.3 / pp.3-9 / 1999
  • This paper presents a study of rejection capability based on anti-phone modeling for a vocabulary-independent speech recognition system. The rejection system detects and rejects out-of-vocabulary words, i.e., words not among the candidate words defined when the recognizer is built. Rejection systems fall into two categories by implementation method: keyword spotting and utterance verification. The keyword spotting method uses an extra filler model as a candidate word alongside the keyword models. The utterance verification method builds an anti-model for every phoneme and uses these anti-models to compute a confidence score. We implemented an utterance verification algorithm suitable for a vocabulary-independent speech recognizer. We also compared three kinds of means for computing the confidence score and found that the geometric mean gave the best result. A sigmoid function is commonly used to normalize the confidence score; we compared the effect of its weight constant and determined the optimal value. We also compared the effect of cohort set size, and larger sets gave better results. Finally, we determined the optimal confidence score threshold; using it, the overall recognition rate including rejection errors was about 76%. These results are to be applied to a stock information system based on speech recognition that Korea Telecom currently provides as an experimental service.
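In utterance verification, each phone is scored by a log-likelihood ratio between its model and its anti-model; the paper finds the geometric mean of the per-phone ratios (equivalently, the exponential of the mean log ratio) works best, normalized by a sigmoid with a tuned weight constant and compared against a threshold. A minimal sketch with illustrative values:

```python
import math

def confidence(phone_llrs, weight=1.0):
    """Utterance confidence from per-phone log-likelihood ratios
    (target phone model vs. anti-phone model).

    The geometric mean of the per-phone likelihood ratios equals the
    exponential of the arithmetic mean of their logs, so averaging the
    LLRs before the sigmoid realizes the geometric-mean combination.
    """
    mean_llr = sum(phone_llrs) / len(phone_llrs)
    # Sigmoid normalization to (0, 1); `weight` plays the role of the
    # paper's tuned weight constant.
    return 1.0 / (1.0 + math.exp(-weight * mean_llr))

def accept(phone_llrs, threshold=0.5, weight=1.0):
    """Accept the hypothesis if confidence clears the tuned threshold."""
    return confidence(phone_llrs, weight) >= threshold

# Hypothetical per-phone LLRs for one recognized word:
print(accept([1.2, 0.4, -0.1, 0.9]))  # True for these illustrative values
```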


A Study on Comparison of Pronunciation Accuracy of Soprano Singers

  • Song, Uk-Jin;Park, Hyungwoo;Bae, Myung-Jin
    • International journal of advanced smart convergence / v.6 no.2 / pp.59-64 / 2017
  • Female vocalists' voices are classified into three types by vocal range: soprano, mezzo-soprano, and contralto; among them, the soprano has the highest range. Since the voice is generated through the human vocal tract, as described by the voice production model, it is strongly shaped by the vocal tract. The structure of the vocal organs differs from person to person, and the formant characteristics of vocalization differ accordingly. A formant is a frequency band that stands out distinctly because of resonance in the vocal tract during phonation. Formant characteristics carry both speaker individuality, arising in the throat, jaw, lips, and teeth, and the phonological properties of phonemes. The first formant is associated with the throat, the second with the jaw, and the third and fourth with resonances at the lips and teeth. Pronunciation is therefore influenced not only by phonological information but also by the jaw, lips, and teeth: when the mouth opens too little or the jaw is stiff, pronunciation becomes unclear. The more accurate the pronunciation, the more clearly the formants appear in the spectrogram. However, many soprano singers cannot open their mouths fully because the jaw, lips, teeth, and facial muscles are held rigid to sustain high tones while singing, which blurs the pronunciation and hence the formant structure. In this paper, to assess the pronunciation accuracy of soprano singers, five Korean soprano singers (A, B, C, D, and E) were selected as the experimental group; we analyzed their spectrograms and conducted a MOS test of pronunciation intelligibility. Soprano singer B showed clear formants from F1 to F5, and her MOS score was the highest at 4.6 points. For singers A, C, and D, F1 through F3 appeared, but formants above 2 kHz were difficult to find. Finally, for singer E formants were difficult to find overall, and her MOS score was the lowest at 2.1 points. We thus confirmed that soprano singer B, who exhibits the most distinct formant characteristics in the spectrogram, has the best pronunciation accuracy.
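The paper reads formants off spectrograms by inspection. A common programmatic alternative (not the authors' method) is LPC root-finding: fit an all-pole model to a voiced frame and take resonances from the roots of the prediction polynomial. A minimal sketch, with illustrative model order and pruning thresholds:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_formants(frame, fs, order=12):
    """Estimate formant frequencies (Hz) from one voiced frame via linear
    prediction: formants appear as roots of the LPC polynomial lying
    close to the unit circle."""
    frame = frame * np.hamming(len(frame))          # taper frame edges
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Autocorrelation normal equations, solved via the Toeplitz structure.
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    roots = np.roots(np.concatenate(([1.0], -a)))   # A(z) = 1 - sum a_k z^-k
    roots = roots[np.imag(roots) > 0]               # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)      # pole angle -> frequency
    bw = -np.log(np.abs(roots)) * fs / np.pi        # pole radius -> bandwidth
    # Keep sharp resonances in the speech range (thresholds are heuristic).
    return sorted(f for f, b in zip(freqs, bw) if f > 90 and b < 400)
```

Called on successive frames of a sustained vowel, the returned list gives F1, F2, F3, ... in ascending order; clear F4/F5 values for a singer would correspond to the distinct high formants reported for singer B.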

Making Human Phantom for X-ray Practice with 3D Printing (3D 프린팅을 활용한 일반 X선 촬영 실습용 인체 팬텀 제작)

  • Choi, Woo Jeon;Kim, Dong Hyun
    • Journal of the Korean Society of Radiology / v.11 no.5 / pp.371-377 / 2017
  • A phantom for general X-ray imaging practice is an indispensable teaching tool in radiology, but commercially available phantoms are expensive imports, so it is difficult for training programs to stock a variety of them. Using 3D printing technology, we aim to produce phantoms for general X-ray practice more cheaply and easily. We used a skeleton model, produced from CT image data on an FDM (Fused Deposition Modeling) 3D printer, as a phantom for general X-ray imaging. The 3D Slicer 4.7.0 program was used to convert the CT DICOM image data into an STL file, which was then converted to G-code and printed on the 3D printer to create the skeleton model. The completed phantom was imaged by X-ray and CT and compared with actual medical images and commercial phantoms; although it differed from actual medical images in fine detail and bone density, it could still serve as a practice phantom. 3D phantoms for general X-ray practice can thus be manufactured at low cost using inexpensive, widely available 3D printers and the free 3D Slicer program. As 3D printing technology diversifies and research advances, it should become applicable to various fields such as health education and medical services.
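The paper's pipeline runs CT DICOM → STL (in 3D Slicer 4.7.0) → G-code → FDM printer. A minimal scripted stand-in for the DICOM-to-STL step, assuming the pydicom, scikit-image, and numpy-stl packages and an assumed bone threshold (voxel spacing is omitted for brevity; real use should pass it to marching_cubes):

```python
import numpy as np
import pydicom
from skimage import measure
from stl import mesh  # pip install numpy-stl

def ct_to_bone_stl(dicom_files, out_path, bone_hu=200):
    """Threshold a CT volume at a bone HU level and export an STL surface.
    A rough stand-in for the paper's 3D Slicer workflow; the HU threshold
    and file layout here are assumptions."""
    # Sort slices along the scan axis and convert pixel values to HU.
    slices = sorted((pydicom.dcmread(f) for f in dicom_files),
                    key=lambda s: float(s.ImagePositionPatient[2]))
    hu = np.stack([s.pixel_array * float(s.RescaleSlope)
                   + float(s.RescaleIntercept) for s in slices])
    # Extract the bone isosurface with marching cubes.
    verts, faces, _, _ = measure.marching_cubes(hu, level=bone_hu)
    surface = mesh.Mesh(np.zeros(faces.shape[0], dtype=mesh.Mesh.dtype))
    for i, face in enumerate(faces):
        surface.vectors[i] = verts[face]
    surface.save(out_path)  # slice the STL to G-code with the printer's slicer
```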

Perception of English Vowels By Korean Learners: Comparisons between New and Similar L2 Vowel Categories (한국인 학습자의 영어 모음 인지: 새로운 L2 모음 범주와 비슷한 L2 모음 범주의 비교)

  • Lee, Kye-Youn;Cho, Mi-Hui
    • The Journal of the Korea Contents Association / v.15 no.8 / pp.579-587 / 2015
  • The purpose of this study is to investigate how Korean learners perceive English vowels, and further to test the Speech Learning Model (SLM), which claims that new L2 vowel categories are acquired more easily than similar L2 vowel categories. Twenty Korean learners participated in an English-to-Korean mapping test and an English vowel identification test with the target vowels /i, ɪ, u, ʊ, ɛ, æ/. The results revealed that the Korean participants mapped the English pairs /i/-/ɪ/ and /u/-/ʊ/ onto the single Korean vowels /i/ and /u/, respectively. In addition, English /ɛ/ and /æ/ were both mapped onto Korean /e/ and /ɛ/ simultaneously. This indicates that the participants had perceptual difficulty with the pairs /i-ɪ/, /u-ʊ/, and /ɛ-æ/. The forced-choice identification test showed that accuracy for /ɪ, ʊ, æ/ (ɪ: 81.3%, ʊ: 62.5%, æ: 60.0%) was significantly higher than for /i, u, ɛ/ (i: 28.8%, u: 28.8%, ɛ: 32.4%). Thus the claim of the SLM is confirmed, given that /ɪ, ʊ, æ/ are new vowel categories whereas /i, u, ɛ/ are similar vowel categories. Furthermore, the conspicuously low accuracy for the similar L2 categories /i, u, ɛ/ is accounted for by over-generalization: while learning the new L2 categories, the participants excessively replaced the similar L2 vowels /i, u, ɛ/ with the new L2 vowels /ɪ, ʊ, æ/. Based on the findings of this study, pedagogical suggestions are provided.
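The identification accuracies above come from forced-choice trials. As a small illustration of the bookkeeping (the trial data below are invented, since the paper reports only the aggregate percentages):

```python
# Per-vowel identification accuracy from a forced-choice response tally.
from collections import Counter

def identification_accuracy(responses):
    """responses: list of (target_vowel, chosen_vowel) trials."""
    total, correct = Counter(), Counter()
    for target, chosen in responses:
        total[target] += 1
        correct[target] += (target == chosen)
    return {v: 100.0 * correct[v] / total[v] for v in total}

# Hypothetical trials showing the /i/->/ɪ/ style confusions the paper reports:
trials = [("i", "ɪ"), ("i", "i"), ("ɪ", "ɪ"), ("ɪ", "ɪ"),
          ("u", "ʊ"), ("ʊ", "ʊ"), ("ɛ", "æ"), ("æ", "æ")]
print(identification_accuracy(trials))
```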

The Design of a Keyword Spotting System Based on Auditory Phonetic Knowledge-Based Phonetic Value Classification (청음 음성학적 지식에 기반한 음가분류에 의한 핵심어 검출 시스템 구현)

  • Kim, Hack-Jin;Kim, Soon-Hyub
    • The KIPS Transactions:PartB / v.10B no.2 / pp.169-178 / 2003
  • This study addresses two issues: the classification of phone-likely units (PLUs), which are the foundation of Korean large-vocabulary speech recognition, and the effectiveness of Chiljongseong (the 7-final-consonant system) and Paljongseong (the 8-final-consonant system) in Korean. PLUs classify phonemes phonetically according to the place and manner of articulation, and about 50 PLUs are typically used in Korean speech recognition. In this study, auditory phonetic knowledge was applied to the classification, yielding 45 PLUs: the vowels 'ㅔ, ㅐ' were classified as the PLU [ee]; 'ㅒ, ㅖ' as [ye]; and 'ㅚ, ㅙ, ㅞ' as [we]. Secondly, the Chiljongseong system of the currently used Draft for the Unified Spelling System and the Paljongseonggajokyong of the Haerye commentary on the Korean script were examined. Whether the phonemes 'ㄷ' and 'ㅅ' have the same phonetic value in the final-consonant position of Korean has long been debated in academia. In this study, the transitional stages of Korean final consonants were investigated, Chiljongseong and Paljongseonggajokyong were applied to speech recognition, and their effectiveness was verified. The experiments were divided into isolated word recognition and continuous speech recognition. For isolated word recognition, the PBW452 word set was used: about 50 men and women, divided into 5 groups, vocalized 50 words each. For the continuous speech recognition experiment, intended for the implemented stock exchange system, a sentence corpus of 71 stock exchange sentences and a matching speech corpus were collected, in which five men and women each vocalized every sentence twice. As a result, when Paljongseonggajokyong was used for the final consonants, recognition performance rose by an average of about 1.45%; when PLUs with Paljongseonggajokyong and auditory phonetics applied simultaneously were used, the recognition rate increased by an average of 1.5% to 2.02%. In the continuous speech recognition experiment, recognition performance rose by an average of about 1% to 2% compared with the existing 49 or 56 PLUs.
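The vowel merges described above amount to a many-to-one lookup from jamo to PLU. A minimal sketch of that table (the romanized PLU names [ee], [ye], [we] follow the paper; treating unmerged jamo as their own PLU is an assumption):

```python
# Vowel jamo that auditory phonetics merges into a single phone-likely unit,
# shrinking the PLU inventory from ~50 to 45.
PLU_MERGE = {
    "ㅔ": "ee", "ㅐ": "ee",             # /e/ and /ɛ/ merged
    "ㅒ": "ye", "ㅖ": "ye",             # /jɛ/ and /je/ merged
    "ㅚ": "we", "ㅙ": "we", "ㅞ": "we",  # /ø/, /wɛ/, /we/ merged
}

def to_plu(jamo):
    """Map a vowel jamo to its phone-likely unit, defaulting to itself."""
    return PLU_MERGE.get(jamo, jamo)

print([to_plu(j) for j in "ㅔㅐㅚㅙㅏ"])  # ['ee', 'ee', 'we', 'we', 'ㅏ']
```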

Coarticulation Model of Hangul Visual Speech for Lip Animation (입술 애니메이션을 위한 한글 발음의 동시조음 모델)

  • Gong, Gwang-Sik;Kim, Chang-Heon
    • Journal of KIISE:Computer Systems and Theory / v.26 no.9 / pp.1031-1041 / 1999
  • The existing lip animation methods for Hangul define a few canonical lip shapes for the phonemes and interpolate between them. However, the real motion of the lips during articulation is neither linear nor a simple nonlinear function, so generating intermediate motion by interpolation cannot reproduce a phoneme's lip movement effectively. These methods also ignore coarticulation, so they cannot express the lip motion that varies between phonemes. In this paper we present a coarticulation model for natural lip animation of Hangul. Using two video cameras, we film the speaker's lips during articulation and extract lip-motion control parameters. Each control parameter is defined as a dominance function, following Löfqvist's speech production gesture theory; each dominance function approximates the real lip motion of a phoneme during its articulation and is used when the lip animation is generated. The dominance functions are combined through blending functions and a demi-syllable-based Hangul composition rule, producing Hangul pronunciation with coarticulation applied. The resulting model approximates real lip motion more closely than the existing interpolation-based methods and shows motion that takes coarticulation into account.
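The paper fits its dominance functions to lip measurements from video, and the abstract does not give their exact form. One widely used form is the negative exponential of the Cohen-Massaro coarticulation model, which suffices to show how dominance-weighted blending pulls each phoneme's lip target toward its neighbors (all constants below are illustrative):

```python
import numpy as np

def dominance(t, t_center, alpha=1.0, theta=0.01, c=1.0):
    """Negative-exponential dominance of one phoneme's lip target at time t
    (Cohen-Massaro form; the paper fits its own functions to video data).
    Time is in milliseconds; theta controls how fast dominance decays."""
    return alpha * np.exp(-theta * np.abs(t - t_center) ** c)

def blend(t, targets, centers):
    """Dominance-weighted average of per-phoneme lip targets:
    the coarticulated parameter value at time t."""
    d = np.array([dominance(t, tc) for tc in centers])
    return float(np.dot(d, targets) / d.sum())

# Lip-opening targets (arbitrary units) for three phonemes centered at
# 0, 120, and 240 ms.
targets, centers = np.array([0.2, 0.9, 0.1]), [0.0, 120.0, 240.0]
print(blend(120.0, targets, centers))  # ~0.62: the open peak is reduced
                                       # by its more closed neighbors
```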

Methods for Video Caption Extraction and Extracted Caption Image Enhancement (영화 비디오 자막 추출 및 추출된 자막 이미지 향상 방법)

  • Kim, So-Myung;Kwak, Sang-Shin;Choi, Yeong-Woo;Chung, Kyu-Sik
    • Journal of KIISE:Software and Applications / v.29 no.4 / pp.235-247 / 2002
  • For efficient indexing and retrieval of digital video data, research on video caption extraction and recognition is required. This paper proposes methods for extracting artificial captions from video data and enhancing their image quality for accurate Hangul and English character recognition. In the proposed methods, we first locate the beginning and ending frames of each caption and combine the multiple frames in each group by a logical operation to remove background noise. During this process, an evaluation step detects integrated results that mix different caption images. After the multiple video frames are integrated, four image enhancement techniques are applied: resolution enhancement, contrast enhancement, stroke-based binarization, and morphological smoothing. Applying these operations to the video frames improves the image quality even for characters composed of complex strokes. Locating the beginning and ending frames of the same caption contents can also be used effectively for digital video indexing and browsing. We tested the proposed methods on video caption images containing both Hangul and English characters from cinema, and obtained improved character recognition results.
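The abstract does not spell out the logical operation used to integrate frames. One plausible reading, since a caption stays fixed while the background moves, is a per-pixel minimum (a logical AND for binary images), with a simple global threshold standing in for the paper's stroke-based binarization:

```python
import numpy as np

def integrate_caption_frames(frames):
    """Combine all frames showing the same caption to suppress the moving
    background: bright caption pixels persist across frames while
    background pixels fluctuate, so the per-pixel minimum keeps the text
    and darkens the noise. The operation and threshold are assumptions."""
    stack = np.stack([f.astype(np.float32) for f in frames])
    integrated = stack.min(axis=0)
    binary = (integrated > integrated.mean() + integrated.std()).astype(np.uint8)
    return integrated, binary

# Synthetic demo: a static bright "caption" over random background noise.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 80, (32, 32)).astype(np.uint8) for _ in range(8)]
for f in frames:
    f[12:20, 4:28] = 255                 # the static caption region
integrated, binary = integrate_caption_frames(frames)
print(binary[16, 16], binary[0, 0])      # 1 0: caption kept, noise removed
```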

Phonological development of children aged 3 to 7 under the condition of sentence repetition (문장 따라말하기 과제에서 3~7세 아동의 말소리발달)

  • Kim, Soo-Jin;Park, Na rae;Chang, Moon Soo;Kim, Young Tae;Shin, Moonja;Ha, Ji-Wan
    • Phonetics and Speech Sciences / v.12 no.1 / pp.85-95 / 2020
  • Sentence repetition is a way of evaluating speech sound production that addresses the limitations of word tests and spontaneous speech analysis. Speech sounds produced by children can be evaluated with several indicators. This study examined the progression of the percentage of correct consonants-revised (PCC-R) and phonological whole-word measures across age and gender groups, using sentence repetition tasks designed so that every phoneme has the chance to appear at least three times, with consonants set in various vowel contexts. Eleven sentence repetition tasks were administered to 535 children aged 3 to 7 across the country, and the resulting PCC-R and whole-word measures were analyzed. All indicators improved with age, with significant differences between age groups; no significant gender differences were found. The data were collected nationwide under sentence repetition conditions, with a six-month interval between age groups. This study is noteworthy because it collected a sufficient amount of data for each group, addressed the limitations of word naming and spontaneous speech analysis, and suggests new evaluation criteria through the analysis of whole-word measures in sentence repetition, which previous studies had not applied.
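PCC-R is a standard index: the share of target consonants produced correctly, with distortions counted as correct (only substitutions and omissions are errors; that is the "revised" part). A minimal scoring sketch with invented scores:

```python
def pcc_r(scored_consonants):
    """Percentage of Consonants Correct-Revised.

    Each consonant attempt is scored 'correct', 'distortion',
    'substitution', or 'omission'. PCC-R counts distortions as correct;
    only substitutions and omissions count as errors.
    """
    ok = sum(s in ("correct", "distortion") for s in scored_consonants)
    return 100.0 * ok / len(scored_consonants)

# Hypothetical scoring of 10 consonant targets from one repeated sentence:
scores = ["correct"] * 7 + ["distortion", "substitution", "omission"]
print(f"PCC-R = {pcc_r(scores):.1f}%")  # 80.0%
```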

Improvement of Naturalness for a HMM-based Korean TTS using the prosodic boundary information (운율경계정보를 이용한 HMM기반 한국어 TTS 자연성 향상 연구)

  • Lim, Gi-Jeong;Lee, Jung-Chul
    • Journal of the Korea Society of Computer and Information / v.17 no.9 / pp.75-84 / 2012
  • HMM-based text-to-speech systems generally use context-dependent triphone units from a large-corpus speech DB to enhance the synthetic speech. To downsize the large-corpus speech DB, acoustically similar triphone units are clustered by a decision tree using context-dependent information. The context includes the phoneme sequence as well as prosodic information, because the naturalness of synthetic speech depends heavily on prosody such as pauses, intonation pattern, and segmental duration. However, if the prosodic information is too complicated, many context-dependent phonemes have no examples in the training data, and clustering yields smoothed features that generate unnatural synthetic speech. In this paper, instead of complicated prosodic information, we propose three simple prosodic boundary types (rising tone, falling tone, and monotonic tone) and matching decision-tree questions to improve naturalness. Experimental results show that the proposed method improves the naturalness of an HMM-based Korean TTS and achieves a high MOS in perception tests.
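The proposal replaces a rich prosodic context with a single three-valued boundary-tone feature that the clustering tree can ask about. A sketch of what the simplified context label and questions might look like (the label format and question names are illustrative, not the paper's notation):

```python
# Each triphone label carries exactly one of three prosodic boundary
# tone types, so every context combination is well populated in training.
BOUNDARY_TONES = ("rising", "falling", "monotonic")

def context_label(triphone, boundary_tone):
    """Build a context-dependent label: left-center+right plus tone tag."""
    assert boundary_tone in BOUNDARY_TONES
    left, center, right = triphone
    return f"{left}-{center}+{right}/T:{boundary_tone}"

# Decision-tree questions over the simplified context, one per tone type:
QUESTIONS = {f"Boundary_{t}": (lambda lab, t=t: lab.endswith(f"/T:{t}"))
             for t in BOUNDARY_TONES}

lab = context_label(("s", "a", "n"), "rising")
print(lab)                                # s-a+n/T:rising
print(QUESTIONS["Boundary_rising"](lab))  # True
```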