• Title/Summary/Keyword: English Grapheme

Grapheme-to-Phoneme Conversion Regularity Effects among Late Korean-English Bilinguals (후기 한국어-영어 이중언어화자의 자소-음소 변환 규칙에 따른 영어 규칙성 효과)

  • Kim, Dahee;Baik, Yeonji;Ryu, Jaehee;Nam, Kichun
    • Korean Journal of Cognitive Science / v.26 no.3 / pp.323-355 / 2015
  • This study examined the grapheme-to-phoneme regularity effect among late Korean-English bilinguals using one whole-word task (lexical processing) and two meta-phonological tasks (sub-lexical processing): [1] an English word naming task (whole-word level), [2] a rhyme judgement task (rhyme level), and [3] a phoneme deletion task (phoneme level). Forty-three late Korean-English bilinguals participated in all three tasks. In these tasks, participants performed better in the regular word conditions than in the irregular word conditions, demonstrating a clear English regularity effect. Post-hoc correlational analysis revealed a strong correlation between the word naming task and the rhyme judgement task, which differs from the results reported for English monolinguals. The contradictory results might be due to the relatively low English proficiency of late Korean-English bilingual speakers. In conclusion, this study suggests that late Korean-English bilinguals make use of L2 grapheme-to-phoneme conversion (GPC) rules when reading L2 English words.

Perception and Production of English Grapheme <a> by Korean Students (한국 학생들의 영어 철자 <a> 인지와 발화)

Improvements of an English Pronunciation Dictionary Generator Using DP-based Lexicon Pre-processing and Context-dependent Grapheme-to-phoneme MLP (DP 알고리즘에 의한 발음사전 전처리와 문맥종속 자소별 MLP를 이용한 영어 발음사전 생성기의 개선)

  • 김회린;문광식;이영직;정재호
    • The Journal of the Acoustical Society of Korea / v.18 no.5 / pp.21-27 / 1999
  • In this paper, we propose an improved MLP-based English pronunciation dictionary generator for use in a variable vocabulary word recognizer. The variable vocabulary word recognizer can process any word specified in a Korean word lexicon that is determined dynamically according to the current recognition task. To extend the system to tasks involving English words, it is necessary to build a pronunciation dictionary generator that can process words not included in a predefined lexicon, such as proper nouns. To build the English pronunciation dictionary generator, we use a context-dependent grapheme-to-phoneme multi-layer perceptron (MLP) architecture for each grapheme. Training each MLP requires grapheme-to-phoneme training data obtained from a general pronunciation dictionary. To automate this process, we use a dynamic programming (DP) algorithm with some distance metrics. For training and testing the grapheme-to-phoneme MLPs, we use a general English pronunciation dictionary with about 110 thousand words. With 26 MLPs, each having 30 to 50 hidden nodes, and the exception grapheme lexicon, we obtained a word accuracy of 72.8% on the 110 thousand words, which is superior to a rule-based method with a word accuracy of 24.0%.
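
The DP step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance metrics, so `toy_distance`, `skip_cost`, the `_` null symbol, and the ARPAbet-style phoneme labels are all assumptions made for illustration. The sketch assigns each letter either one phoneme or a null label, which is the kind of per-grapheme training pair a per-grapheme classifier would need.

```python
from typing import Callable, List, Tuple

NULL = "_"  # assumed placeholder label for a silent letter


def toy_distance(letter: str, phoneme: str) -> float:
    """Hypothetical metric: 0 if the phoneme symbol starts with the letter, else 1."""
    return 0.0 if phoneme.lower().startswith(letter.lower()) else 1.0


def align(
    letters: str,
    phonemes: List[str],
    dist: Callable[[str, str], float] = toy_distance,
    skip_cost: float = 0.6,  # assumed cost of labelling a letter as silent
) -> List[Tuple[str, str]]:
    """Align every letter to one phoneme or to NULL with minimum total cost."""
    n, m = len(letters), len(phonemes)
    if m > n:
        raise ValueError("this sketch assumes at most one phoneme per letter")
    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i letters with the first j phonemes
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(0, min(i, m) + 1):
            # Option 1: letter i is silent (consumes no phoneme).
            best, move = cost[i - 1][j] + skip_cost, "skip"
            # Option 2: letter i is pronounced as phoneme j.
            if j > 0:
                matched = cost[i - 1][j - 1] + dist(letters[i - 1], phonemes[j - 1])
                if matched < best:
                    best, move = matched, "match"
            cost[i][j], back[i][j] = best, move
    # Trace back from the full alignment to recover the letter/phoneme pairs.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0:
        if back[i][j] == "match":
            pairs.append((letters[i - 1], phonemes[j - 1]))
            j -= 1
        else:
            pairs.append((letters[i - 1], NULL))
        i -= 1
    pairs.reverse()
    return pairs


print(align("knit", ["N", "IH", "T"]))
```

Letters that map to more than one phoneme (for example "x") fall outside this one-label-per-letter sketch; the abstract's exception grapheme lexicon may serve that purpose, but that is a guess.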

An English-to-Korean Transliteration Model based on Grapheme and Phoneme (자소 및 음소 정보를 이용한 영어-한국어 음차표기 모델)

  • Oh Jong-Hoon;Choi Key-Sun
    • Journal of KIISE:Software and Applications / v.32 no.4 / pp.312-326 / 2005
  • There has been increasing interest in English-to-Korean transliteration recently. Previous works use either a direct method (English graphemes → Korean graphemes) or a pivot method (English graphemes → English phonemes → Korean graphemes). Most previous works focus on the direct method, but transliteration is a phonetic process rather than an orthographic one. From this point of view, we present an English-Korean transliteration model that uses both grapheme and phoneme information. Unlike the previous works, our method uses phonetic information such as phonemes and their context. Moreover, we also use the graphemes corresponding to the phonemes. Our method shows about 60% word accuracy.

Orthographic Influence in the Perception and Production of English Intervocalic Consonants: A Pilot Study (영어 모음사이 자음의 인지와 발화에서 철자의 영향: 파일럿 연구)

  • Cho, Mi-Hui;Chung, Ju-Yeon
    • The Journal of the Korea Contents Association / v.9 no.12 / pp.459-466 / 2009
  • While Korean allows the same consonant at the coda of the preceding syllable and at the onset of the following syllable, English does not allow geminate consonants in the same intervocalic position. Due to this difference between Korean and English, Korean learners of English tend to incorrectly produce geminate consonants for English geminate graphemes as in $su\underline{mm}er$. Based on this observation, a pilot study was designed to investigate how Korean learners of English perceive and produce English doubleton graphemes and singleton graphemes. Twenty Korean college students were asked to perform a forced-choice perception test as well as a production test for 36 real word stimuli consisting of (near) minimal pairs of singleton and doubleton graphemes. The results showed that the accuracy rates for the words with singleton graphemes were higher than those for the words with doubleton graphemes in both perception and production, because the subjects misperceived and misproduced the doubleton graphemes as geminates due to orthographic influence. In addition, the low error rates of the words with voiced stops were accounted for by Korean language transfer. Further, spectrographic analyses showed more production errors in doubleton grapheme words than in singleton grapheme words. Finally, pedagogical implications are provided.

Perception and Production of English Geminate Graphemes by Korean Students (한국 학생들의 영어 겹자음 철자 인지와 발화)

  • Cho, Mi-Hui
    • Proceedings of the Korea Contents Association Conference / 2009.05a / pp.1092-1096 / 2009
  • While Korean allows the same consonant at the coda of the preceding syllable and at the onset of the following syllable, English does not allow geminate consonants in the same position. Due to this difference between Korean and English, Korean learners of English tend to incorrectly produce geminate consonants for English geminate graphemes as in summer. Based on this observation, a pilot study was designed to investigate how Korean learners of English perceive and produce English doubleton graphemes and singleton graphemes. Twenty Korean college students were asked to perform a forced-choice perception test as well as a production test for 36 real word stimuli consisting of near minimal pairs of singleton and doubleton graphemes. The results showed that the accuracy rates for the words with singleton graphemes were relatively high in both perception and production (78.6% and 76.1%, respectively), while those for the words with doubleton graphemes were low in both perception and production (55.3% and 61.7%, respectively). Also, spectrographic analyses showed more production errors in doubleton grapheme words than in singleton grapheme words.

Construction of Linearly Aligned Corpus Using Unsupervised Learning (자율 학습을 이용한 선형 정렬 말뭉치 구축)

  • Lee, Kong-Joo;Kim, Jae-Hoon
    • The KIPS Transactions:PartB / v.11B no.3 / pp.387-394 / 2004
  • In this paper, we propose a modified unsupervised linear alignment algorithm for building an aligned corpus. The original algorithm inserts null characters into both of the aligned strings (the source string and the target string), because the two strings differ in length. For applications that use the aligned corpus, these null characters can cause difficulties such as search space explosion, and they make the corpus unusable with several machine learning algorithms. To alleviate these difficulties, we modify the algorithm so that the aligned source strings contain no null characters. We show the usability of our approach by applying it to different areas such as Korean-English back-transliteration, English grapheme-phoneme conversion, and Korean morphological analysis.

Online Recognition of Handwritten Korean and English Characters

  • Ma, Ming;Park, Dong-Won;Kim, Soo Kyun;An, Syungog
    • Journal of Information Processing Systems / v.8 no.4 / pp.653-668 / 2012
  • In this study, an improved HMM-based recognition model is proposed for online English and Korean handwritten characters. The pattern elements of the handwriting model are sub-character strokes and ligatures. To deal with the problem of handwriting style variations, a modified hierarchical clustering approach is introduced to partition different writing styles into several classes. For each of the English letters and each primitive grapheme in Korean characters, one HMM that models the temporal and spatial variability of the handwriting is constructed for each class. Then the HMMs of Korean graphemes are concatenated to form the Korean character models. The recognition of handwritten characters is implemented by a modified level building algorithm, which incorporates the Korean character combination rules within the efficient network search procedure. To address the limitations of the HMM-based method, a post-processing procedure that takes global and structural features into account is proposed. Experiments showed that the proposed recognition system achieved a high writer-independent recognition rate on unconstrained samples of both English and Korean characters. A comparison with other HMM-based recognition schemes was also performed to evaluate the system.

Pronunciation Variation Patterns of Loanwords Produced by Korean and Grapheme-to-Phoneme Conversion Using Syllable-based Segmentation and Phonological Knowledge (한국인 화자의 외래어 발음 변이 양상과 음절 기반 외래어 자소-음소 변환)

  • Ryu, Hyuksu;Na, Minsu;Chung, Minhwa
    • Phonetics and Speech Sciences / v.7 no.3 / pp.139-149 / 2015
  • This paper aims to analyze pronunciation variations of loanwords produced by Korean speakers and to improve the performance of pronunciation modeling of loanwords in Korean by using syllable-based segmentation and phonological knowledge. The loanword text corpus used for our experiment consists of 14.5k words extracted from frequently used words in the set-top box, music, and point-of-interest (POI) domains. First, pronunciations of loanwords in Korean are obtained by manual transcription and used as target pronunciations. The target pronunciations are compared with the standard pronunciations using confusion matrices to analyze pronunciation variation patterns of loanwords. Based on the confusion matrices, three salient pronunciation variations of loanwords are identified, including tensification of the fricative [s] and derounding of the rounded vowels [ɥi] and [wɛ]. In addition, a syllable-based segmentation method that considers phonological knowledge is proposed for loanword pronunciation modeling. Performance of the baseline and the proposed method is measured using phone error rate (PER), word error rate (WER), and F-score at various context spans. Experimental results show that the proposed method outperforms the baseline. We also observe that performance degrades when the training and test sets come from different domains, which implies that loanword pronunciations are influenced by the data domain. It is noteworthy that pronunciation modeling for loanwords is enhanced by reflecting phonological knowledge. The loanword pronunciation modeling in Korean proposed in this paper can be used for automatic speech recognition in application interfaces such as navigation systems and set-top boxes, and for computer-assisted pronunciation training for Korean learners of English.
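
As a rough illustration of the PER and WER metrics named in the abstract above, the sketch below assumes the common definitions used in grapheme-to-phoneme evaluation: PER is the total phone-level edit distance divided by the number of reference phones, and WER is the fraction of words whose predicted pronunciation is not an exact match. The abstract does not state its exact formulas, so these definitions, the function names, and the toy phone symbols are assumptions.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference phone
                cur[j - 1] + 1,          # insertion of a hypothesis phone
                prev[j - 1] + (r != h),  # substitution or exact match
            ))
        prev = cur
    return prev[-1]


def per_and_wer(
    refs: List[List[str]], hyps: List[List[str]]
) -> Tuple[float, float]:
    """Return (PER, WER) over parallel lists of reference and predicted pronunciations."""
    if len(refs) != len(hyps) or not refs:
        raise ValueError("refs and hyps must be non-empty and the same length")
    phone_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_phones = sum(len(r) for r in refs)
    word_errors = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    return phone_errors / total_phones, word_errors / len(refs)


# Toy data with made-up phone symbols, for illustration only.
refs = [["k", "a", "t"], ["d", "o", "g"]]
hyps = [["k", "a", "t"], ["d", "a", "g", "z"]]
per, wer = per_and_wer(refs, hyps)
print(f"PER={per:.3f} WER={wer:.3f}")
```

Because PER is normalized by the reference length, it can exceed 1.0 when a hypothesis contains many insertions.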

Korean Lip-Reading: Data Construction and Sentence-Level Lip-Reading (한국어 립리딩: 데이터 구축 및 문장수준 립리딩)

  • Sunyoung Cho;Soosung Yoon
    • Journal of the Korea Institute of Military Science and Technology / v.27 no.2 / pp.167-176 / 2024
  • Lip-reading is the task of inferring a speaker's utterance from silent video by learning lip movements. It is very challenging because of the inherent ambiguities in lip movement, such as different characters that produce the same lip appearance. Recent advances in deep learning models such as the Transformer and the Temporal Convolutional Network have improved the performance of lip-reading. However, most previous works deal with English lip-reading, which cannot be applied directly to Korean lip-reading, and moreover, there is no large-scale Korean lip-reading dataset. In this paper, we introduce the first large-scale Korean lip-reading dataset, with more than 120k utterances collected from TV broadcasts including news, documentaries, and dramas. We also present a preprocessing method that uniformly extracts a facial region of interest, and we propose a transformer-based model that uses grapheme units for sentence-level Korean lip-reading. We demonstrate that our dataset and model are appropriate for Korean lip-reading through statistics of the dataset and experimental results.