• Title/Summary/Keyword: 조음도


Creation and labeling of multiple phonotopic maps using a hierarchical self-organizing classifier (계층적 자기조직화 분류기를 이용한 다수 음성자판의 생성과 레이블링)

  • Chung, Dam;Lee, Kee-Cheol;Byun, Young-Tai
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.21 no.3
    • /
    • pp.600-611
    • /
    • 1996
  • Recently, neural network-based speech recognition has been studied to exploit the adaptivity and learnability of neural network models. However, conventional neural network models have difficulty with co-articulation processing and with detecting the boundaries of similar phonemes in Korean speech. Also, when a single phonotopic map is used, learning speed may increase dramatically and inaccuracies may arise, because a homogeneous learning and recognition method must be applied to heterogeneous data. Hence, in this paper, a neural-net typewriter is designed using a hierarchical self-organizing classifier (HSOC), and related algorithms are presented. During its learning stage, the HSOC distributes phoneme data over hierarchically structured multiple phonotopic maps, using Kohonen's self-organizing feature maps (SOFM). Algorithms are presented and evaluated for deciding the number of maps, the map sizes, the selection and placement of phonemes per map, and an appropriate learning and preprocessing method for each map. If maps were divided according to a priori linguistic knowledge, difficulties would arise in acquiring that knowledge and in deciding how to apply it (e.g., processing extended phonemes). In contrast, the proposed HSOC has the advantage that multiple phonotopic maps suitable for the given input data are self-organized. The resulting three Korean phonotopic maps are optimally labeled, have their own optimal preprocessing schemes, and conform to conventional linguistic knowledge.

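The classifier above builds on Kohonen's SOFM. As a rough illustration of the underlying map training only (not the paper's hierarchical multi-map algorithm), a minimal SOFM update loop might look like the following; map size, epoch count, and decay schedules are arbitrary choices for the sketch:

```python
import numpy as np

def train_som(data, rows=8, cols=8, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small Kohonen self-organizing feature map (SOFM)."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    weights = rng.normal(size=(rows, cols, dim))
    # coordinates of each map unit, used for neighborhood distances
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            # find the best-matching unit (BMU) for this input vector
            d = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            # linearly decay learning rate and neighborhood radius
            frac = step / n_steps
            lr = lr0 * (1 - frac)
            sigma = sigma0 * (1 - frac) + 1e-3
            # Gaussian neighborhood around the BMU on the map grid
            gdist2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
            h = np.exp(-gdist2 / (2 * sigma ** 2))[..., None]
            weights += lr * h * (x - weights)
            step += 1
    return weights
```

After training, nearby map units respond to similar inputs, which is what makes the labeled map usable as a phonotopic "typewriter" surface.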

Statistical Analysis of Korean Phonological Variations Using a Grapheme-to-phoneme System (발음열 자동 생성기를 이용한 한국어 음운 변화 현상의 통계적 분석)

  • 이경님;정민화
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.7
    • /
    • pp.656-664
    • /
    • 2002
  • We present a statistical analysis of Korean phonological variations using a grapheme-to-phoneme (GTP) system. The GTP system used for the experiments generates pronunciation variants by applying rules that model obligatory and optional phonemic changes as well as allophonic changes. These rules are derived from morphophonological analysis and the government standard pronunciation rules. The GTP system is optimized for continuous speech recognition by generating phonetic transcriptions for training and by constructing a pronunciation dictionary for recognition. In this paper, we describe Korean phonological variations by analyzing the statistics of phonemic change rule applications over the 60,000 sentences in the Samsung PBS Speech DB. Our results show that the most frequent obligatory phonemic variations are, in order, liaison, tensification, aspiration, and obstruent nasalization, and that the most frequent optional phonemic variations are, in order, deletion of the initial consonant /h/, insertion of a final consonant with the same place of articulation as the following consonant, and deletion of a final consonant with the same place of articulation as the following consonant. These statistics can be used to improve the performance of speech recognition systems.
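The core mechanism described, applying phonemic change rules to orthographic forms while tallying how often each rule fires, can be sketched in a few lines. Everything here is a toy stand-in: the two rules, the romanization, and the rule names are hypothetical, whereas the real system uses a large rule set derived from morphophonological analysis of Hangul:

```python
import re
from collections import Counter

# Illustrative toy rules over a hypothetical romanization; NOT the
# paper's actual rule set.
RULES = [
    ("obstruent nasalization",            # e.g. hakmun -> hangmun
     re.compile(r"[ktp](?=[nm])"),
     lambda m: {"k": "ng", "t": "n", "p": "m"}[m.group(0)]),
    ("liaison",                           # e.g. pap-i -> pa-pi
     re.compile(r"([ktps])-(?=[aeiou])"),
     lambda m: "-" + m.group(1)),
]

def apply_rules(pron, counts):
    """Apply each rule left-to-right, tallying how often it fires."""
    for name, pattern, repl in RULES:
        pron, n = pattern.subn(repl, pron)
        counts[name] += n
    return pron
```

Running such a counter over a large transcribed corpus yields exactly the kind of rule-application statistics the paper reports.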

Knowledge based Text to Facial Sequence Image System for Interaction of Lecturer and Learner in Cyber Universities (가상대학에서 교수자와 학습자간 상호작용을 위한 지식기반형 문자-얼굴동영상 변환 시스템)

  • Kim, Hyoung-Geun;Park, Chul-Ha
    • The KIPS Transactions:PartB
    • /
    • v.15B no.3
    • /
    • pp.179-188
    • /
    • 2008
  • In this paper, a knowledge-based text-to-facial-sequence-image system for interaction between lecturer and learner in cyber universities is studied. The system synthesizes facial sequence images whose lip movements are synchronized to the text information, based on the grammatical characteristics of Hangul. To implement the system, we propose a method for transforming text information into phoneme codes, deformation rules for mouth shapes that change according to the phoneme codes, and a method for synthesizing facial sequence images using those deformation rules. In the proposed method, all syllables of Hangul are represented by 10 principal mouth shapes and 78 compound mouth shapes, according to the pronunciation characteristics of the basic consonants and vowels and the characteristics of the articulation rules, respectively. To synthesize facial sequence images in real time on a PC, the 88 mouth shapes stored in a database are used instead of synthesizing a mouth shape for each frame. To verify the validity of the proposed method, various facial sequence images transformed from text information were synthesized, and a system that runs on a PC was implemented using the proposed method.
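The text-to-phoneme-code step starts from splitting each Hangul syllable into its lead consonant, vowel, and optional tail consonant. The Unicode arithmetic for this decomposition is standard; mapping the resulting jamo onto the paper's 10 principal and 78 compound mouth shapes would require the paper's own tables, which are not reproduced here:

```python
LEADS, VOWELS, TAILS = 19, 21, 28  # jamo counts in precomposed Hangul

def decompose_hangul(syllable):
    """Split a precomposed Hangul syllable (U+AC00..U+D7A3) into
    (lead consonant, vowel, tail consonant) indices."""
    code = ord(syllable) - 0xAC00
    if not 0 <= code < LEADS * VOWELS * TAILS:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(code, VOWELS * TAILS)
    vowel, tail = divmod(rest, TAILS)
    return lead, vowel, tail  # tail == 0 means no final consonant
```

For example, the syllable 한 decomposes to lead ㅎ (index 18), vowel ㅏ (index 0), and tail ㄴ (index 4); a viseme table would then be indexed by these codes.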

Speech Evaluation Tasks Related to Subthalamic Nucleus Deep Brain Stimulation in Idiopathic Parkinson's Disease: A Review (특발성 파킨슨병의 시상밑부핵 심부뇌자극술 관련 말 평가 과제에 대한 문헌연구)

  • Kim, Sun Woo;Kim, Hyang Hee
    • 재활복지
    • /
    • v.18 no.4
    • /
    • pp.237-255
    • /
    • 2014
  • Idiopathic Parkinson's disease (IPD) is a neurodegenerative disease caused by the loss of dopamine cells in the substantia nigra, a region of the midbrain. Its major symptoms are muscular rigidity, bradykinesia, resting tremor, and postural instability. An estimated 70~90% of patients with IPD also have hypokinetic dysarthria. Subthalamic nucleus deep brain stimulation (STN-DBS) has been reported to be successful in relieving the core motor symptoms of IPD in the advanced stages of the disease. However, data on the effects of STN-DBS on speech performance are inconsistent. A MEDLINE literature search was conducted to retrieve articles published from 1987 to 2012, narrowed down to those assessing speech performance under STN-DBS with perceptual, acoustic, and/or aerodynamic analyses. Among the 32 publications that dealt with speech performance after STN-DBS, the reported outcomes were improvement (42%), deterioration (29%), mixed results (26%), or no change (3%). The most favored method was acoustic analysis using vowel prolongation and the Unified Parkinson's Disease Rating Scale (UPDRS). To verify the effect of STN-DBS, speech evaluation should cover all speech components, such as articulation, resonance, phonation, respiration, and prosody, using a contextual speech task.

Acoustic analysis of Korean affricates produced by dysarthric speakers with cerebral palsy (뇌성마비 마비말장애 성인의 파찰음 실현 양상 분석)

  • Mun, Jihyun;Kim, Sunhee;Chung, Minhwa
    • Phonetics and Speech Sciences
    • /
    • v.13 no.2
    • /
    • pp.45-55
    • /
    • 2021
  • This study aims to analyze the acoustic characteristics of Korean affricates produced by dysarthric speakers with cerebral palsy. Korean fricatives and affricates are the consonants most prone to errors in dysarthric speech, but previous studies have focused only on fricatives. For this study, the three affricates /tɕ, tɕʰ, tɕ͈/ appearing in word-initial and intervocalic positions, produced by six mild-to-moderate male speakers with spastic dysarthria, were selected from the QOLT database constructed in 2014. The parameters representing the acoustic characteristics of Korean affricates were extracted using Praat: frication duration, closure duration, center of gravity, variance, skewness, kurtosis, and central moment. The results are as follows: 1) the frication duration of intervocalic affricates produced by dysarthric speakers was significantly longer than that of non-disordered speakers; 2) the closure duration of dysarthric speakers was significantly longer; 3) for the center of gravity, there was no significant difference between the two groups; 4) the skewness of the dysarthric speakers was significantly larger; and 5) the central moment of dysarthric speakers was significantly larger. This study thus characterizes the affricates produced by dysarthric speakers and their differences from non-disordered speakers.
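The spectral parameters listed (center of gravity, variance, skewness, kurtosis) are the first four moments of the frication spectrum. Praat computes them from the spectrum with a configurable power weighting; a simplified magnitude-weighted version over a single frication segment can be sketched as:

```python
import numpy as np

def spectral_moments(signal, sr):
    """First four spectral moments of a noise segment, analogous to
    Praat's Spectrum queries (simplified: magnitude weighting)."""
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    p = spec / spec.sum()                    # spectral weighting
    cog = (freqs * p).sum()                  # 1st moment: center of gravity
    var = ((freqs - cog) ** 2 * p).sum()     # 2nd central moment
    sd = np.sqrt(var)
    skew = ((freqs - cog) ** 3 * p).sum() / sd ** 3
    kurt = ((freqs - cog) ** 4 * p).sum() / sd ** 4 - 3.0  # excess kurtosis
    return cog, var, skew, kurt
```

A spectrum with energy concentrated low in frequency yields a low center of gravity and positive skewness; the study compares exactly these quantities between speaker groups.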

Development and effects of Nanta program using speech rhythm for children with limited speech sound production (말소리가 제한된 아동을 위한 말리듬을 이용한 난타 프로그램의 개발과 효과)

  • Park, Yeong Hye;Choi, Seong Hee
    • Phonetics and Speech Sciences
    • /
    • v.13 no.2
    • /
    • pp.67-76
    • /
    • 2021
  • Nanta means "tapping" on percussion instruments such as drums, in the rhythm of Samulnori, a traditional Korean music. A Nanta speech-rhythm intervention program was developed, applied to children with limited speech sound production, and investigated for its effect. The program provides audible stimulation with various loudness levels, beats, and rhythms, and consists of three stages: respiration, phonation, and articulation with rhythm. Six children with delayed language development participated in this study. The children were encouraged to explore sounds and beats, to express them freely, and, along with the rhythm, to produce speech sounds by increasing the syllable length of mimetic and imitative words. A total of 15 sessions were conducted, twice a week for 40 minutes per session. To evaluate effectiveness, raw scores from the Preschool Receptive-Expressive Language Scale (PRES) and the Receptive-Expressive Vocabulary Test (REVT) were obtained and compared before and after therapy using the Wilcoxon signed-rank test. The results demonstrated significantly improved receptive (p=.027) and expressive (p=.024) language scores on the PRES, and receptive (p=.028) and expressive (p=.028) vocabulary scores, following the intervention. These findings suggest that the Nanta rhythm program can be useful for improving language development and vocabulary in children with limited speech sound production.
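The pre/post comparison relies on the Wilcoxon signed-rank test, which is well suited to the small paired sample here (n = 6). For samples this small the exact p-value can be obtained by enumerating all sign assignments; the sketch below assumes no zero differences and no tied absolute differences (the score values are hypothetical, not the study's data):

```python
from itertools import product

def wilcoxon_exact(pre, post):
    """Exact two-sided Wilcoxon signed-rank p-value for paired scores.
    Simplified: assumes no zero differences and no tied |differences|."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    # rank the absolute differences from smallest (1) to largest (n)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_small = min(w_plus, n * (n + 1) // 2 - w_plus)
    # enumerate all 2^n sign patterns under the null hypothesis
    hits = sum(
        1 for signs in product((0, 1), repeat=n)
        if sum(r for s, r in zip(signs, range(1, n + 1)) if s) <= w_small
    )
    return min(1.0, 2 * hits / 2 ** n)
```

With n = 6 and every child improving, the smallest attainable two-sided exact p-value is 2/64 ≈ .031, which is consistent with the near-threshold p-values the study reports.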

Occupational Therapists' Perception and Communication of the Elderly (작업치료사를 대상으로 한 노인 인식 및 의사소통 현황)

  • Park, Sung-Ji;Han, Hee-Won;Jung, Yu-Jin;Woo, Hee-Soon
    • The Journal of Korean society of community based occupational therapy
    • /
    • v.10 no.3
    • /
    • pp.15-26
    • /
    • 2020
  • Objective: With rapid population aging, the elderly population is increasing, and a high proportion of face-to-face encounters in major areas of occupational therapy involve the elderly. The purpose of this study was to investigate occupational therapists' overall perception of the elderly and the status of their communication with the elderly. Methods: The survey was conducted from April to June 2020. Responses were collected via online questionnaires from occupational therapists who provide interventions for the elderly in hospitals and public health centers across the country. A total of 91 self-administered questionnaires were collected, of which 82 were used for data analysis, excluding 9 with insufficient responses. Results: Occupational therapists generally held negative perceptions of the elderly. The factors reported to hinder communication with the elderly were, in order, hearing-related, language-understanding-related, and articulation-related factors. Lastly, occupational therapists reported facilitating smooth communication with the elderly through, in order, conversation response management, emotional expression, enhancing understanding, and relationship control. Conclusions: Occupational therapists are health care professionals who play a major role in improving their clients' social participation and quality of life. Therefore, professional therapeutic interventions and related education using improved communication skills are needed, along with an understanding of the elderly's communication abilities and levels.

Personalized Speech Classification Scheme for the Smart Speaker Accessibility Improvement of the Speech-Impaired people (언어장애인의 스마트스피커 접근성 향상을 위한 개인화된 음성 분류 기법)

  • SeungKwon Lee;U-Jin Choe;Gwangil Jeon
    • Smart Media Journal
    • /
    • v.11 no.11
    • /
    • pp.17-24
    • /
    • 2022
  • With the spread of smart speakers based on voice recognition and deep learning technology, not only non-disabled people but also blind or physically handicapped people can easily control home appliances such as lights and TVs by voice through linked home network services, which has greatly improved quality of life. However, speech-impaired people cannot use these useful smart speaker services, because articulation or speech disorders make their pronunciation inaccurate. In this paper, we propose a personalized voice classification technique that lets speech-impaired users access some of the functions provided by a smart speaker. The goal is to increase the recognition rate and accuracy for sentences spoken by speech-impaired people even with a small amount of data and a short learning time, so that the services provided by the smart speaker can actually be used. Data augmentation and the one-cycle learning-rate optimization technique were applied while fine-tuning a ResNet18 model. In an experiment in which each of 30 smart speaker commands was recorded 10 times and the model was trained within 3 minutes, the speech classification recognition rate was about 95.2%.
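The one-cycle learning-rate policy used during fine-tuning ramps the rate up to a maximum and then anneals it within a single training run, which is what enables usable accuracy after only minutes of training. PyTorch exposes this as torch.optim.lr_scheduler.OneCycleLR; a dependency-free sketch of the schedule shape (the defaults below mirror common settings, not necessarily the paper's exact ones):

```python
import math

def one_cycle_lr(step, total_steps, max_lr,
                 pct_start=0.3, div=25.0, final_div=1e4):
    """One-cycle learning-rate schedule: cosine warm-up from max_lr/div
    to max_lr, then cosine anneal down to max_lr/final_div."""
    warm = int(total_steps * pct_start)
    if step < warm:                       # warm-up phase
        t = step / max(1, warm)
        lo = max_lr / div
        return lo + (max_lr - lo) * (1 - math.cos(math.pi * t)) / 2
    t = (step - warm) / max(1, total_steps - warm)   # anneal phase
    lo = max_lr / final_div
    return lo + (max_lr - lo) * (1 + math.cos(math.pi * t)) / 2
```

The brief high-learning-rate middle of the cycle acts as a regularizer, which is one reason the policy pairs well with small, augmented datasets like the 10-recordings-per-command setup described.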

Relationships between rhythm and fluency indices and listeners' ratings of Korean speakers' English paragraph reading (리듬 및 유창성 지수와 한국 화자의 영어 읽기 발화 청취 평가의 관련성)

  • Hyunsong Chung
    • Phonetics and Speech Sciences
    • /
    • v.14 no.4
    • /
    • pp.25-33
    • /
    • 2022
  • This study investigates the relationships between rhythm and fluency indices and listeners' ratings of the rhythm and fluency of Korean college students' English paragraph reading. Seventeen university students read and recorded a passage from "The North Wind and the Sun" twice, before and after three months of English pronunciation instruction. Seven in-service and pre-service English teachers in graduate school assessed the rhythm and fluency of the utterances. In addition, the values of 14 rhythm and fluency indices were extracted from each recording, and the relationships between the indices and the listeners' ratings were analyzed. The speakers' rhythm indices did not differ significantly from those of native English speakers reported in previous studies in %V, VarcoV, and nPVI-V, but were higher in ΔV, ΔC, and VarcoC and lower in speech rate. The rhythm and fluency demonstrated by the Korean college students was thus comparable to native norms, at least in the objective values of certain indices. The fluency indices, such as percentage of pauses, articulation rate, and speech rate, contributed significantly more than the rhythm indices to predicting both rhythm and fluency ratings.
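The rhythm indices named here are standard duration-based metrics computed over the vocalic and consonantal intervals of an utterance. A minimal computation of the vocalic members of the family (the consonantal variants VarcoC and nPVI-C are defined analogously):

```python
import statistics as st

def rhythm_indices(v_intervals, c_intervals):
    """Duration-based rhythm metrics over vocalic and consonantal
    interval durations (in seconds)."""
    total = sum(v_intervals) + sum(c_intervals)
    pct_v = 100 * sum(v_intervals) / total        # %V
    delta_v = st.pstdev(v_intervals)              # ΔV
    delta_c = st.pstdev(c_intervals)              # ΔC
    varco_v = 100 * delta_v / st.mean(v_intervals)   # rate-normalized ΔV
    # normalized pairwise variability of successive vocalic intervals
    npvi_v = 100 * st.mean(
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(v_intervals, v_intervals[1:])
    )
    return {"%V": pct_v, "ΔV": delta_v, "ΔC": delta_c,
            "VarcoV": varco_v, "nPVI-V": npvi_v}
```

VarcoV and nPVI-V normalize for speaking rate, which is why the study can report them as comparable to native values even while raw ΔV and ΔC differ.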

A Study on Korean Speech Animation Generation Employing Deep Learning (딥러닝을 활용한 한국어 스피치 애니메이션 생성에 관한 고찰)

  • Suk Chan Kang;Dong Ju Kim
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.10
    • /
    • pp.461-470
    • /
    • 2023
  • While speech animation generation employing deep learning has been actively researched for English, there has been no prior work for Korean. Given this, this paper employs supervised deep learning to generate Korean speech animation for the first time. In doing so, we identify a significant effect: deep learning reduces speech animation research to speech recognition research, the dominant underlying technique, and we study how best to exploit this effect for Korean speech animation generation. The effect can help revitalize the recently inactive Korean speech animation research efficiently and effectively by clarifying the top-priority research target. This paper proceeds as follows: (i) it chooses the blendshape animation technique, (ii) implements the deep-learning model as a master-servant pipeline of an automatic speech recognition (ASR) module and a facial action coding (FAC) module, (iii) builds a Korean speech facial motion-capture dataset, (iv) prepares two comparison deep-learning models (one adopting an English ASR module, the other a Korean ASR module, with both sharing the same basic FAC structure), and (v) trains the FAC modules of both models dependently on their ASR modules. A user study demonstrates that the model with the Korean ASR module (scoring 4.2/5.0) generates decisively more natural Korean speech animations than the model with the English ASR module (scoring 2.7/5.0). This result confirms the aforementioned effect, showing that the quality of Korean speech animation comes down to the accuracy of Korean ASR.
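The blendshape technique chosen in step (i) represents each animation frame as the neutral face mesh plus a weighted sum of shape offsets, so the FAC module's job reduces to predicting one weight per blendshape per frame. A generic sketch of the blending itself (array names and shapes are illustrative, not the paper's):

```python
import numpy as np

def apply_blendshapes(neutral, blendshapes, weights):
    """Linear blendshape model:
    mesh = neutral + sum_i weights[i] * (blendshapes[i] - neutral).
    neutral: (V, 3) vertices; blendshapes: (B, V, 3); weights: (B,)."""
    deltas = blendshapes - neutral[None, :, :]
    return neutral + np.tensordot(weights, deltas, axes=1)
```

Because the output mesh is linear in the weights, a network predicting per-frame weight vectors from ASR features is enough to drive the full face, which is what makes the master-servant ASR-to-FAC pipeline workable.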