• Title/Summary/Keyword: Number of Syllables

Korean Head-Tail Tokenization and Part-of-Speech Tagging by using Deep Learning (딥러닝을 이용한 한국어 Head-Tail 토큰화 기법과 품사 태깅)

  • Kim, Jungmin;Kang, Seungshik;Kim, Hyeokman
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.17 no.4
    • /
    • pp.199-208
    • /
    • 2022
  • Korean is an agglutinative language in which one or more morphemes combine to form a single word. Part-of-speech tagging separates each morpheme from a word and attaches a part-of-speech tag to it. In this study, we propose a new Korean part-of-speech tagging method based on the Head-Tail tokenization technique, which divides a word into a lexical morpheme part and a grammatical morpheme part without decomposing compound words. In this method, the Head and the Tail are divided at a syllable boundary without restoring irregularly inflected forms or abbreviated syllables. A Korean part-of-speech tagger was implemented using Head-Tail tokenization and deep learning. Because fine-grained tags generate a large number of complex tags and lower the tagging accuracy, we reduced the tag set to complex tags composed of coarse-grained category tags, which improved the tagging accuracy. The Head-Tail part-of-speech tagger was evaluated using BERT, syllable bigram, and subword bigram embeddings, and both the syllable bigram and subword bigram embeddings improved performance over general BERT. Part-of-speech tagging was performed by integrating the Head-Tail tokenization model and the simplified part-of-speech tagging model, achieving 98.99% word-unit accuracy and 99.08% token-unit accuracy. The experiments also showed that part-of-speech tagging performance improved when the maximum token length was limited to twice the number of words.
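
As a rough illustration of the syllable-level units mentioned in this abstract, the sketch below splits a word at a syllable boundary and lists its syllable bigrams. It assumes that each precomposed Hangul character is one syllable. The function names, the padding symbol, and the example split point are hypothetical and are not taken from the paper.

```python
def head_tail_split(word: str, boundary: int) -> tuple[str, str]:
    """Split a word at a syllable boundary into (head, tail).

    The boundary index is supplied by the caller here (an assumption);
    in a real tagger it would be predicted by a model.
    """
    return word[:boundary], word[boundary:]


def syllable_bigrams(word: str, pad: str = "_") -> list[str]:
    """Return overlapping syllable bigrams, padding both word edges."""
    padded = pad + word + pad
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


# Illustrative word: head "학교" (lexical part), tail "에서" (grammatical part).
print(head_tail_split("학교에서", 2))
print(syllable_bigrams("학교에서"))
```

No irregular form is restored in this sketch; the word is only cut between two syllables, which mirrors the boundary-based division described above.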

A comparison study of the characteristics of pauses and breath groups during paragraph reading for normal female adults with and without voice disorders (정상성인 여성 화자와 음성장애 성인 여성 화자의 문단 낭독 시 휴지 및 호흡단락 특성의 비교)

  • Pyo, Hwa Young
    • Phonetics and Speech Sciences
    • /
    • v.11 no.4
    • /
    • pp.109-116
    • /
    • 2019
  • This study was conducted to identify the characteristics of pauses and breath groups made by normal adults and patients with voice disorders while reading a paragraph. Forty normal female adults and forty female patients with a functional voice disorder (18-45 yrs.) read the "Gaeul" paragraph with the "Running Speech" protocol of the Phonatory Aerodynamic System (PAS), by which the pauses with or without inspiration and between or within syntactic words and breath groups were analyzed. The number of pauses with inspiration was found to be higher in the patient group, but the number of pauses without inspiration was higher in the normal group. The rate of syntactic word boundaries with pauses with inspiration was higher in the patient group, while the number of syllables per breath group was higher in the normal group. As these results can be explained by patients' poor breath support due to glottal insufficiency, the question of whether voice disorder patients use their pauses and breath groups properly should be considered carefully in evaluation and intervention.

Phonological processes of vowels in pronounced phrasal words of the Seoul Corpus by gender and age groups (서울코퍼스의 성별·연령 집단별 말 어절 모음에 나타난 음운변동)

  • Yang, Byunggon
    • Phonetics and Speech Sciences
    • /
    • v.9 no.2
    • /
    • pp.23-29
    • /
    • 2017
  • This paper investigated the phonological processes of monophthongs and diphthongs in pronounced phrasal words of the Seoul Corpus by gender and age groups in order to provide linguists and phoneticians with a clearer understanding of spoken Korean. Both orthographic and pronounced phrasal words were extracted from the transcribed label scripts of the Corpus using Praat. Then, phonological processes of monophthongs and diphthongs were tabulated using an R script after syllabifying the phrasal words into separate components. Results revealed that 97% of the orthographic and pronounced phrasal words had the same number of syllables, while 65.8% differed in syllable structure. 90.5% of the vowels in the orthographic phrasal words were realized in the pronounced phrasal words. A Chi-square test of independence showed a significant dependence in the distribution of phonological process types between the male and female groups, along with a very strong correlation. The female group changed the diphthong yo into yv at the end of the pronounced phrasal words more often than the male group did. Age groups also showed a significant dependence in the distribution of phonological process types, along with a very strong correlation. Females in their 40s produced the diphthong yv and raised vowels at the end of the pronounced phrasal words most often among the gender and age groups. From the results, this paper concludes that an analysis of phonological processes in light of syllable structure can contribute greatly to the understanding of spoken Korean.
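
The syllable-count comparison reported in this abstract can be sketched as follows. This is a minimal Python illustration, not the authors' Praat and R workflow. It assumes that both forms are written in precomposed Hangul, so that one character in the Unicode range U+AC00 to U+D7A3 equals one syllable. The word pairs are invented examples, not corpus data.

```python
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3  # precomposed Hangul syllable block


def count_hangul_syllables(text: str) -> int:
    """Count characters that are precomposed Hangul syllables."""
    return sum(HANGUL_FIRST <= ord(ch) <= HANGUL_LAST for ch in text)


def same_syllable_rate(pairs: list[tuple[str, str]]) -> float:
    """Share of (orthographic, pronounced) pairs with equal syllable counts."""
    same = sum(
        count_hangul_syllables(orth) == count_hangul_syllables(pron)
        for orth, pron in pairs
    )
    return same / len(pairs)


# Hypothetical (orthographic, pronounced) pairs for illustration only.
pairs = [("그런데", "근데"), ("같이", "가치"), ("학교", "학꾜")]
print(same_syllable_rate(pairs))
```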

Maximum Likelihood-based Automatic Lexicon Generation for AI Assistant-based Interaction with Mobile Devices

  • Lee, Donghyun;Park, Jae-Hyun;Kim, Kwang-Ho;Park, Jeong-Sik;Kim, Ji-Hwan;Jang, Gil-Jin;Park, Unsang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.9
    • /
    • pp.4264-4279
    • /
    • 2017
  • In this paper, maximum likelihood-based automatic lexicon generation using mixed-syllables is proposed for unlimited vocabulary voice interface for East Asian languages (e.g. Korean, Chinese and Japanese) in AI-assistant based interaction with mobile devices. The conventional lexicon has two inevitable problems: 1) a tedious repetition of out-of-lexicon unit additions to the lexicon, and 2) the propagation of errors during a morpheme analysis and space segmentation. The proposed method provides an automatic framework to solve the above problems. The proposed method produces a level of overall accuracy similar to one of previous methods in the presence of one out-of-lexicon word in a sentence, but the proposed method provides superior results with the absolute improvements of 1.62%, 5.58%, and 10.09% in terms of word accuracy when the number of out-of-lexicon words in a sentence was two, three and four, respectively.

Acoustical Analysis of Phonological Reduction in Conversational Japanese (일본어 회화문에 나타난 축약형의 음운론적 해석과 음향음성학적 분석)

  • Choi, Young-Sook
    • Speech Sciences
    • /
    • v.8 no.4
    • /
    • pp.229-241
    • /
    • 2001
  • Using eighteen texts from various genres of present-day Japanese, I collected phonologically reduced forms frequently observed in conversational Japanese, and classified them in search of a unified explanation of phonological phenomena. I found 7,516 cases of reduced forms, which I divided into 43 categories according to the types of phonological changes they have undergone. The general tendencies are that deletion and fusion of a phoneme or an entire syllable take place frequently, resulting in a decrease in the number of syllables. From a morphosyntactic point of view, phonological reduction often occurs at the NP and VP morpheme boundaries. The following findings are drawn from phonetic observations of reduction. (1) Vowels are more easily deleted than consonants. (2) Bilabials ([m], [b], and [w]) are the most likely candidates for deletion. (3) In a concatenation of vowels, closed vowels are absorbed into open vowels, or two adjacent vowels come to create another vowel, in which case reconstruction of the original sequence is not always predictable. (4) Alveolars are palatalized under the influence of front vowels. (5) Regressive assimilation takes place in a syllable starting with [r], changing the entire syllable into a phonological choked sound or a syllabic nasal, depending on the voicing of the following phoneme.

Phonological processes of vowels from orthographic to pronounced words in the Buckeye Corpus by sex and age groups

  • Yang, Byunggon
    • Phonetics and Speech Sciences
    • /
    • v.10 no.2
    • /
    • pp.25-31
    • /
    • 2018
  • This paper investigated the phonological processes of monophthongs and diphthongs in the pronounced words present in the Buckeye Corpus and compared the frequency distribution of these processes by sex and age groups to provide a clearer understanding of spoken English to linguists and phoneticians. Both orthographic and pronounced words were extracted from the transcribed label scripts of the Buckeye Corpus using R. Next, the phonological processes of monophthongs and diphthongs in the orthographic and pronounced labels were tabulated using R scripts, and a frequency distribution by vowel process types, as well as sex and age groups, was created. The results revealed that 95% of the orthographic words contained the same number of syllables, whereas 5% had different numbers of vowels, thereby proving that speakers tend to preserve vowels in spontaneous speech. In addition, deletion processes were preferred in natural speech. Most vowel deletions occurred with an unstressed syllable. Chi-square tests were performed to calculate dependence in the distribution of phonological process types for male and female groups and young and old groups. The results showed a very strong correlation. This finding indicates that vowel processes occurred in approximately the same pattern in natural and spontaneous speech data regardless of sex and age, as well as whether or not the vowel processes were identical. Based on these results, the author concludes that an analysis of phonological processes in spontaneous speech corpora can greatly enhance practical understanding of spoken English.

Synchronization of Synthetic Facial Image Sequences and Synthetic Speech for Virtual Reality (가상현실을 위한 합성얼굴 동영상과 합성음성의 동기구현)

  • 최장석;이기영
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.35S no.7
    • /
    • pp.95-102
    • /
    • 1998
  • This paper proposes a method for synchronizing synthetic facial image sequences and synthetic speech. LP-PSOLA synthesizes the speech for each demi-syllable. We provide 3,040 demi-syllables for unlimited synthesis of Korean speech. For synthesis of the facial image sequences, the paper defines a total of 11 fundamental patterns for the lip shapes of the Korean consonants and vowels. The fundamental lip shapes allow us to pronounce all Korean sentences. The image synthesis method assigns the fundamental lip shapes to the key frames according to the initial, the middle, and the final sound of each syllable in the Korean input text. The method interpolates the naturally changing lip shapes in the inbetween frames. The number of inbetween frames is estimated from the duration of each syllable of the synthetic speech. This estimation accomplishes synchronization of the facial image sequences and the speech. In speech synthesis, disk memory is required to store the 3,040 demi-syllables. In synthesis of the facial image sequences, however, disk memory is required to store only one image, because all frames are synthesized from the neutral face. The above method realizes synchronization in a system that can read Korean sentences with synthetic speech and synthetic facial image sequences.
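
The two steps named in this abstract, splitting a syllable into its initial, middle, and final sound and deriving the number of inbetween frames from the syllable duration, can be sketched as below. The decomposition uses the standard Unicode arithmetic for precomposed Hangul syllables. The frame rate, the rule "total frames minus key frames", and all function names are assumptions made for illustration, since the abstract does not give the paper's actual estimation formula.

```python
HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3
N_MEDIAL, N_FINAL = 21, 28  # counts of medial vowels and final slots in Unicode


def decompose(syllable: str) -> tuple[int, int, int]:
    """Return (initial, medial, final) indices; final == 0 means no final sound."""
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(code - HANGUL_BASE, N_MEDIAL * N_FINAL)
    medial, final = divmod(rest, N_FINAL)
    return initial, medial, final


def inbetween_frames(duration_s: float, fps: float, key_frames: int) -> int:
    """Assumed rule: frames spanned by the syllable minus its key frames."""
    total = round(duration_s * fps)
    return max(total - key_frames, 0)


initial, medial, final = decompose("한")
key_frames = 3 if final else 2  # one key frame per sound present (assumption)
print(inbetween_frames(0.2, 30, key_frames))
```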

Korean Intonation Patterns from the Viewpoint of F0 Percentage Change (F0 변화율로 본 한국어 억양 패턴의 음향 특성)

  • Lee, Ji Yeon;Lee, Ho-Young
    • Phonetics and Speech Sciences
    • /
    • v.5 no.1
    • /
    • pp.123-130
    • /
    • 2013
  • Previous research on Korean intonation has mainly focused on $F_0$ target frequencies, $F_0$ slope, and the duration of intonation patterns. This study investigated Korean intonation patterns, both boundary and phrasal tones, in relation to the $F_0$ percentage change between pitch targets. We measured the percentage change between the pitch targets of both boundary and phrasal tones. Additionally, the $F_0$ change between the preceding pitch target and the first pitch target of the boundary tone and the $F_0$ targets of the sequence of two LH phrasal tones ('LH + LH') were also measured. Two phrasal tones, LHLH and HLH, were compared with 'LH + LH' and the 'HLH' in the LHLH pattern, respectively. We found that the percentage change between pitch targets in the phrasal tone is fixed to some extent. This helped explain why the slope of the phrasal tone is closely related to the number of syllables and the duration of the phrasal tone, as discussed in previous studies. Since we analyzed the intonation patterns with utterances from a large speech corpus, the results of this paper are expected to be used in building a larger annotated corpus of Korean.
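
A percentage change between two pitch targets can be computed as in the sketch below. It assumes the usual definition, the difference relative to the first target, which the abstract does not spell out. The function name and the example frequencies are hypothetical.

```python
def f0_percentage_change(f0_from_hz: float, f0_to_hz: float) -> float:
    """Percentage change from one pitch target to the next (assumed definition)."""
    if f0_from_hz <= 0:
        raise ValueError("F0 of the first target must be positive")
    return (f0_to_hz - f0_from_hz) / f0_from_hz * 100.0


# Hypothetical L and H targets of an LH phrasal tone, in Hz.
targets = [200.0, 250.0]
changes = [f0_percentage_change(a, b) for a, b in zip(targets, targets[1:])]
print(changes)
```

Because the measure is relative, it stays the same when both targets are scaled by the same factor, which is one reason a ratio-based measure can be compared across speakers with different pitch ranges.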

The Recognition of Korean Syllables using Parameter Based on Principal Component Analysis (PCA 기반 파라메타를 이용한 숫자음 인식)

  • 박경훈;표창수;김창근;허강인
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2000.12a
    • /
    • pp.181-184
    • /
    • 2000
  • A new feature extraction method is proposed that, unlike conventional voice feature extraction methods, considers the statistical characteristics of the human voice. PCA (Principal Component Analysis) is applied in this new method. PCA removes redundancy in the data after finding the axis direction that has the greatest variance in the input dimensions. The new method is then applied to real voice recognition to assess its performance. When the number recognition results in this paper are compared with those of the conventional Mel-cepstrum voice feature parameter, the recognition rates differ by 0.5%. A better recognition rate is also expected for word or sentence recognition, because extracting the voice features takes less convergence time than with the conventional method. A better recognition rate is likewise expected when the optimum vector is chosen according to the statistical characteristics of the data.
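
The PCA step described in this abstract, finding the axis with the greatest variance and projecting the data onto it, can be sketched in plain Python. This version estimates only the first principal axis by power iteration on the covariance matrix. It is a generic textbook sketch, not the paper's feature extractor, and the function names and toy data are hypothetical.

```python
import math


def first_principal_axis(rows: list[list[float]], iters: int = 200):
    """Return (means, unit axis) for the direction of greatest variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Fixed start vector; it must not be orthogonal to the dominant axis.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(row: list[float], means: list[float], axis: list[float]) -> float:
    """Coordinate of one sample along the principal axis."""
    return sum((x - m) * a for x, m, a in zip(row, means, axis))


# Toy 2-D data lying on the line y = x.
data = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
means, axis = first_principal_axis(data)
print(axis, project(data[-1], means, axis))
```

Keeping only the leading coordinates of such a projection is what reduces the dimension of the feature vector.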

The effect of word length on f0 intervals: Evidence from North Kyungsang children

  • Kim, Jungsun
    • Phonetics and Speech Sciences
    • /
    • v.7 no.1
    • /
    • pp.107-116
    • /
    • 2015
  • The present experiment investigated the effect of word length on the length of f0 intervals for North Kyungsang children. In order to find the lengths of the f0 intervals, the f0 values at the midpoints of vowels in words were measured. F0 estimates were computed as intervals on a logarithmic scale, corresponding to the number of syllables in the words. The results indicated that the mean f0 intervals in words of different lengths showed a significant difference for the HH in HH vs. HHL and the LH in LH vs. LLH for North Kyungsang children. Adult speakers from the North Kyungsang region significantly differed only within the HH in HH vs. HHL. In this respect, the adult speakers differed noticeably from the children. The result of the adult study was presented to confirm whether the children used a North Kyungsang dialect. With respect to individual speaker differences, the North Kyungsang children showed more or less consistent patterns in quantile-quantile plots for the HH vs. HHL, but for the HL vs. LHL and LH vs. LLH, there was more variation than for the HH vs. HHL. The individual speakers' variation was the largest for the HL vs. LHL and the smallest for the HH vs. HHL. Considering these results, the effect of word length on f0 intervals tended to show pitch accent-type-specific characteristics in the process of prosodic acquisition.
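
An f0 interval on a logarithmic scale can be illustrated with semitones. The semitone unit is an assumption here, since the abstract states only that the intervals follow a logarithmic scale. The function name and the vowel-midpoint values are hypothetical.

```python
import math


def f0_interval_semitones(f0_a_hz: float, f0_b_hz: float) -> float:
    """Interval from f0_a to f0_b in semitones (12 per doubling of frequency)."""
    if f0_a_hz <= 0 or f0_b_hz <= 0:
        raise ValueError("f0 values must be positive")
    return 12.0 * math.log2(f0_b_hz / f0_a_hz)


# Hypothetical f0 values (Hz) at the vowel midpoints of a three-syllable word.
midpoints = [220.0, 440.0, 220.0]
intervals = [f0_interval_semitones(a, b) for a, b in zip(midpoints, midpoints[1:])]
print(intervals)
```

A word with n syllables yields n - 1 such intervals between adjacent vowel midpoints, so the number of intervals grows with word length.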