• Title/Summary/Keyword: phonetic analysis


A study on the voice command recognition at the motion control in the industrial robot (산업용 로보트의 동작제어 명령어의 인식에 관한 연구)

  • 이순요;권규식;김홍태
    • Journal of the Ergonomics Society of Korea / v.10 no.1 / pp.3-10 / 1991
  • The teach pendant and keyboard have been used as input devices for control commands in human-robot systems, but many problems occur when the user is a novice. A speech recognition system is therefore required for communication between a human and the robot. In this study, Korean voice commands, eight robot commands and ten digits, are described based on broad phonetic analysis. Applying broad phonetic analysis, the phonemes of the voice commands are divided into phoneme groups with similar features, such as plosive, fricative, affricate, nasal, and glide sounds. The feature parameters and their ranges for detecting the phoneme groups are then found by the minimax method. The classification rules consist of combinations of the feature parameters, such as zero crossing rate (ZCR), log energy (LE), up and down (UD), and formant frequency, and their ranges. Voice commands were recognized by these classification rules. The recognition rate was over 90 percent in this experiment, and the recognition rate for digits was better than that for robot commands.

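A note on the feature parameters named in this abstract: zero crossing rate (ZCR) and log energy (LE) are simple frame-level measures. The sketch below shows one plausible way to compute them, plus a toy classification rule in the spirit of the phoneme-group rules described; the frame handling, thresholds, and group labels are illustrative assumptions, not the parameter ranges found by the minimax method in the paper.

```python
import numpy as np

def frame_features(frame: np.ndarray) -> tuple[float, float]:
    """Return (ZCR, LE) for a single speech frame of raw samples."""
    # Zero crossing rate: fraction of adjacent sample pairs with a sign change.
    zcr = float(np.mean(np.abs(np.diff(np.signbit(frame).astype(int))) > 0))
    # Log energy: log of the summed squared amplitude (epsilon avoids log(0)).
    le = float(np.log(np.sum(frame.astype(float) ** 2) + 1e-12))
    return zcr, le

def classify_phoneme_group(zcr: float, le: float) -> str:
    """Toy rule for illustration only; thresholds are made up, not the paper's."""
    if zcr > 0.3:
        return "fricative"            # noisy, many sign changes
    if le < -5.0:
        return "plosive (closure)"    # near-silent interval before the burst
    return "nasal/glide/vowel-like"   # voiced, low ZCR, higher energy
```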

The Automated Threshold Decision Algorithm for Node Split of Phonetic Decision Tree (음소 결정트리의 노드 분할을 위한 임계치 자동 결정 알고리즘)

  • Kim, Beom-Seung;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.31 no.3 / pp.170-178 / 2012
  • In this paper, a triphone-unit phonetic decision tree was built for phoneme-based speech recognition of the names of 640 stations operated by Korail. The clustering rate used to decide the node-splitting threshold was determined by Pearson correlation and regression analysis. Using the determined clustering rate, thresholds are decided automatically according to the average clustering rate. In recognition experiments verifying the proposed method, performance improved by 1.4-2.3% (absolute) over the baseline system.
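
The core idea here is that the node-split threshold of the phonetic decision tree is derived automatically from an average clustering rate instead of being tuned by hand. The following sketch only illustrates the general mechanism of threshold-gated node splitting; the gain function and the linear mapping from clustering rate to threshold are placeholders, not the regression model estimated in the paper.

```python
def split_gain(parent_loglik: float, yes_loglik: float, no_loglik: float) -> float:
    """Log-likelihood gain obtained by splitting a node with a phonetic question."""
    return (yes_loglik + no_loglik) - parent_loglik

def should_split(best_gain: float, threshold: float) -> bool:
    """Split the node only if the best question's gain exceeds the threshold."""
    return best_gain > threshold

def threshold_from_clustering_rate(avg_clustering_rate: float,
                                   slope: float = 1000.0,
                                   intercept: float = 0.0) -> float:
    """Hypothetical linear mapping from average clustering rate to a split
    threshold; the actual relation in the paper is estimated via Pearson
    correlation and regression analysis, not these placeholder coefficients."""
    return intercept + slope * avg_clustering_rate
```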

A Comparative Study of Aphasics' Abilities in Reading and Writing Hangul and Hanja

  • Kim, Heui-Beom
    • Proceedings of the KSPS conference / 1996.10a / pp.289-293 / 1996
  • In Korean, as with Kana and Kanji in Japanese, two word-writing systems--Hangul (the Korean alphabet) and Hanja (Chinese characters; Kanji in Japanese)--have been and still are being used. Hangul is phonetic while Hanja is ideographic: a phonetic alphabet represents the pronunciation of words, whereas in an ideographic system each character represents a concept. Aphasics suffer from language disorders following brain damage. The reading and writing of Hangul and Hanja by two Korean Broca's aphasics were analyzed with two goals. The first goal was to confirm the functional autonomy of reading and writing systems in the brain that has been argued by other researchers. The second goal was to reveal what differences the subjects show in reading and writing Hangul and Hanja. As experimental materials, 50 monosyllabic words were chosen in Hangul and Hanja respectively. The 50 word pairs of Hangul and Hanja have the same meaning and are also the most familiar monosyllabic words for a group of normal adults in their fifties and sixties. The errors that the aphasic subjects made on the experimental materials are analyzed and discussed here. This analysis has confirmed that the reading and writing systems are located in different parts of the brain. Furthermore, it seems clear that the two writing systems of Hangul and Hanja have their own respective processes.


An Experimental Study on the Degree of Phonetic Similarity between Korean and Japanese Vowels (한국어와 일본어 단모음의 유사성 분석을 위한 실험음성학적 연구)

  • Kwon, Sung-Mi
    • MALSORI / no.63 / pp.47-66 / 2007
  • This study explores the degree of phonetic similarity between Korean and Japanese vowels in terms of acoustic features through a speech production test with Korean and Japanese speakers. For this purpose, speech from 16 Japanese speakers was used for the Japanese data and speech from 16 Korean speakers for the Korean data. The findings in assessing the degree of similarity of the 7 nearest equivalents of the Korean and Japanese vowels are as follows. First, Korean /i/ and /e/ displayed no significant differences in F1 or F2 from their counterparts, Japanese /i/ and /e/, and the distributions of F1 and F2 of Korean /i/ and /e/ in the distributional map completely overlapped with those of Japanese /i/ and /e/; accordingly, Korean /i/ and /e/ were judged to be "identical." Second, Korean /a/, /o/, and /ɨ/ displayed a significant difference in either F1 or F2 but showed a great similarity in the distribution of F1 and F2 with Japanese /a/, /o/, and /ɯ/ respectively, and were therefore categorized as very similar to the Japanese vowels. Third, Korean /u/, whose counterpart is Japanese /ɯ/, showed a significant difference in both F1 and F2, and only half of the distributions overlapped; thus Korean /u/ was analyzed as moderately similar to its Japanese counterpart. Fourth, Korean /ʌ/ did not have a close counterpart in Japanese and was classified as "the least similar vowel."

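The similarity judgments above rest on comparing F1/F2 measurements of vowel tokens from the two speaker groups. A minimal sketch of such a comparison follows; it assumes formant values have already been measured (for example, in Praat), and the per-dimension Welch t-test stands in for, but is not necessarily, the exact statistical test used in the paper.

```python
import numpy as np
from scipy import stats

def compare_vowel(korean_f1f2: np.ndarray, japanese_f1f2: np.ndarray) -> dict:
    """Compare two (n_tokens, 2) arrays of [F1, F2] values in Hz."""
    result = {}
    for dim, name in enumerate(["F1", "F2"]):
        # Welch t-test on each formant dimension separately.
        t, p = stats.ttest_ind(korean_f1f2[:, dim], japanese_f1f2[:, dim],
                               equal_var=False)
        result[name] = {"korean_mean": float(korean_f1f2[:, dim].mean()),
                        "japanese_mean": float(japanese_f1f2[:, dim].mean()),
                        "p_value": float(p)}
    return result

# Example call with made-up measurements for one vowel pair:
# compare_vowel(np.array([[310, 2200], [330, 2150]]),
#               np.array([[320, 2180], [325, 2190]]))
```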

Designing a large recording script for open-domain English speech synthesis

  • Kim, Sunhee;Kim, Hojeong;Lee, Yooseop;Kim, Boryoung;Won, Yongkook;Kim, Bongwan
    • Phonetics and Speech Sciences / v.13 no.3 / pp.65-70 / 2021
  • This paper proposes a method for designing a large recording script for open-domain English speech synthesis. For read-aloud style text, 12 domains and 294 sub-domains were designed using text from five different news media publications. For conversational style text, 4 domains and 36 sub-domains were designed using movie subtitles. The final script consists of 43,013 sentences (27,085 read-aloud style and 15,928 conversational style), comprising 549,683 tokens and 38,356 types. The completed script is analyzed using four criteria: word coverage (type coverage and token coverage), high-frequency vocabulary coverage, phonetic coverage (diphone coverage and triphone coverage), and readability. The type coverage of the script reaches 36.86% despite its low token coverage of 2.97%. The high-frequency vocabulary coverage of the script is 73.82%, and the diphone and triphone coverage of the whole script are 86.70% and 38.92%, respectively. The average readability of all sentences is 9.03. The results of the analysis show that the proposed method is effective in producing a large recording script for English speech synthesis, demonstrating good coverage in terms of unique words, high-frequency vocabulary, phonetic units, and readability.
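
The coverage figures quoted in this abstract are ratios of what the script contains against a reference inventory: types and tokens for word coverage, and attested phone pairs for diphone coverage. The sketch below shows how such figures might be computed; the tokenization and the phone-string representation are simplified assumptions, and the readability measure is omitted.

```python
from collections import Counter

def word_coverage(script_tokens: list[str],
                  corpus_tokens: list[str]) -> tuple[float, float]:
    """Return (type_coverage, token_coverage) of the script against a reference corpus."""
    script_types = set(script_tokens)
    corpus_counts = Counter(corpus_tokens)
    covered_types = script_types & set(corpus_counts)
    # Type coverage: share of distinct corpus words that appear in the script.
    type_coverage = len(covered_types) / len(corpus_counts)
    # Token coverage: share of corpus word occurrences covered by those types.
    covered_tokens = sum(corpus_counts[w] for w in covered_types)
    token_coverage = covered_tokens / sum(corpus_counts.values())
    return type_coverage, token_coverage

def diphone_coverage(script_phones: list[list[str]],
                     all_diphones: set[tuple[str, str]]) -> float:
    """Fraction of the possible diphone inventory that occurs in the script."""
    seen = {(p[i], p[i + 1]) for p in script_phones for i in range(len(p) - 1)}
    return len(seen & all_diphones) / len(all_diphones)
```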

ToBI and beyond: Phonetic intonation of Seoul Korean ani in Korean Intonation Corpus (KICo)

  • Ji-eun Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.1-9 / 2024
  • This study investigated variation in the intonation of the Seoul Korean interjection ani across different meanings ("no" and "really?") and speech levels (Intimate and Polite) using data from the Korean Intonation Corpus (KICo). The investigation was conducted in two stages. First, IP-final tones in the dataset were categorized according to the K-ToBI convention (Jun, 2000). While significant relationships were observed between the meaning of ani and its IP-final tones, substantial overlap between groups was notable. Second, the F0 characteristics of the final syllable of ani were analyzed to elucidate the apparent many-to-many relationships between intonation and meaning/speech level. The results indicated that these seemingly overlapping relationships could be distinguished significantly. Overall, this study advocates a deeper analysis of phonetic intonation beyond ToBI-based categorical labels. By examining the F0 characteristics of the IP-final syllable, previously unclear connections between meaning/speech level and intonation become more comprehensible. Although ToBI remains a valuable tool and framework for studying intonation, it is imperative to explore beyond these categories to grasp the "distinctiveness" of intonation, thereby enriching our understanding of prosody.
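
Going "beyond ToBI" in this abstract means examining the continuous F0 of the IP-final syllable rather than only its categorical tone label. The sketch below reduces an F0 contour for that syllable to a few summary features; the F0 values are assumed to have been extracted beforehand (for example, with Praat or pYIN), and the particular features are illustrative, not the paper's analysis.

```python
import numpy as np

def final_syllable_f0_features(f0_hz: np.ndarray) -> dict:
    """Summarize the F0 contour (in Hz) of an IP-final syllable."""
    f0 = f0_hz[~np.isnan(f0_hz)]           # drop unvoiced frames
    semitones = 12 * np.log2(f0 / f0[0])   # contour relative to its own onset
    return {
        "mean_hz": float(f0.mean()),
        "range_st": float(semitones.max() - semitones.min()),
        # Crude rise/fall indicator: end of the contour relative to its start.
        "final_slope_st": float(semitones[-1] - semitones[0]),
    }
```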

Full mouth rehabilitation accompanied by phonetic analysis of a patient with reduction of vertical dimension of occlusion, and inaccurate pronunciation due to numerous tooth loss: a case report (다수의 치아 상실로 인해 교합수직고경의 감소와 부정확한 발음을 가진 환자의 발음평가를 동반한 전악 수복 증례)

  • Ji-Young Park;Jong-Jin Kim;Jin Baik;Hyun-Suk Cha;Joo-Hee Lee
    • Journal of Dental Rehabilitation and Applied Science / v.39 no.3 / pp.119-132 / 2023
  • The loss of posterior occlusal support due to tooth loss is likely to lead to compensatory protrusion and labial tilt of the anterior teeth, which may be accompanied by a deep bite and a decrease in vertical dimension. The patient may suffer from decreased masticatory efficiency, inaccurate pronunciation, changes in facial appearance, and temporomandibular joint disorder, so stable occlusion with posterior occlusal support and restoration of the vertical dimension is necessary. We report the case of a patient with a reduced vertical dimension and inaccurate pronunciation due to multiple tooth loss who underwent full mouth rehabilitation with an increased vertical dimension, accompanied by phonetic analysis, and achieved satisfactory functional and aesthetic results.

On the primacy of auditory phonetics in tonological analysis and pitch description: In connection with the development of a new pitch scale (성조 분석과 음조 기술에서 청각음성학의 일차성: 반자동 음조 청취 등급 분석기 개발과 관련하여)

  • Gim, Cha-Gyun
    • Proceedings of the KSPS conference / 2007.05a / pp.3-23 / 2007
  • King Sejong the Great, his scholars in the Jiphyeonjeon academy, and Choe Sejin, their successor in the sixteenth century, indicated that Middle Korean had three distinctive pitches: low, high, and rising (pyeong-, geo-, and sang-seong). Thanks to the Hunminjeongeum, its Annotation, and the side-dot literature of the fifteenth and sixteenth centuries, we can compare Middle Korean with the Hamgyeong dialect, the Gyeongsang dialect, and other extant tone dialects as joint preservers of what was probably the tonal system of a unitary mother Korean language. What is most remarkable about the Middle Korean phonetic work is that its conception and execution bear comparison with anything produced in present-day linguistic scholarship. At that stage, prior to the technology and equipment needed for the scientific analysis of sound waves, auditory description was the only possible frame for an accurate and systematic classification. Auditory phonetics still remains fundamental in pitch description, even though modern acoustic categories may supplement and supersede auditory ones in tonological analysis. Auditory phonetics, however, has the serious shortcoming that its theory and practice are too subjective to develop into a science of the present century. With joint researchers, I am developing a new pitch scale: a semiautomatic auditory grade pitch analysis program. The result of our labor will provide a significant breakthrough in this area of linguistics.


Implementation of HMM-Based Speech Recognizer Using TMS320C6711 DSP

  • Bae Hyojoon;Jung Sungyun;Bae Keunsung
    • MALSORI / no.52 / pp.111-120 / 2004
  • This paper focuses on the DSP implementation of an HMM-based speech recognizer that can handle a vocabulary of several hundred words and is speaker independent. First, we develop an HMM-based speech recognition system on a PC that operates on a frame basis, with feature extraction and Viterbi decoding processed in parallel to keep the processing delay as small as possible. Techniques such as linear discriminant analysis, state-based Gaussian selection, and phonetic tied-mixture models are employed to reduce the computational burden and memory size. The system is then optimized and compiled on the TMS320C6711 DSP for real-time operation. The implemented system uses 486 kbytes of memory for data and acoustic models and 24.5 kbytes for program code. A maximum processing time of 29.2 ms for a 32 ms speech frame validates real-time operation of the implemented system.

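The recognizer described above interleaves feature extraction and Viterbi decoding frame by frame so that decoding keeps pace with the incoming audio. The sketch below shows the core frame-synchronous Viterbi update over log probabilities; the HMM topology, the model sizes, and the DSP-specific optimizations mentioned in the abstract (Gaussian selection, tied mixtures, memory layout) are left out.

```python
import numpy as np

def viterbi_step(prev_scores: np.ndarray,
                 log_trans: np.ndarray,
                 frame_log_obs: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """One frame-synchronous Viterbi update.

    prev_scores:   (S,)   best log score per state after the previous frame
    log_trans:     (S, S) log transition probabilities, log_trans[i, j] = i -> j
    frame_log_obs: (S,)   log observation likelihood of the current frame per state
    Returns the updated scores and the backpointers for this frame.
    """
    candidate = prev_scores[:, None] + log_trans   # (S, S): from-state x to-state
    backptr = np.argmax(candidate, axis=0)         # best predecessor per state
    scores = candidate[backptr, np.arange(len(prev_scores))] + frame_log_obs
    return scores, backptr

# Decoding a whole utterance repeats viterbi_step once per feature frame and
# then traces the backpointers from the best-scoring final state.
```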