• Title/Summary/Keyword: grapheme recognition

A Study on Grapheme and Grapheme Recognition Using Connected Components Grapheme for Machine-Printed Korean Character Recognition

  • Lee, Kyong-Ho
    • Journal of the Korea Society of Computer and Information / v.21 no.9 / pp.27-36 / 2016
  • Grapheme recognition is a critical step in recognizing Hangul (Korean script) characters by the grapheme-based approach, because the success or failure of grapheme recognition greatly affects character recognition. For this reason, grapheme separation is reported to be the biggest difficulty in grapheme recognition research. This study proposes new graphemes, defined using connected components, that are easier to separate; proposes features for recognizing them; builds a database of these features; and carries out experiments on recognizing graphemes with the proposed features. The separation and recognition experiments used 350 characters, which contained 1,125 of the proposed graphemes. In the separation experiment, 100% of the graphemes were separated, and the recognition experiment achieved a recognition rate of 98%, with only 14 graphemes misrecognized as different ones.
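
The idea of treating each connected component of a printed character as a unit can be illustrated with a small sketch. The code below is not the paper's implementation: the toy bitmap `GLYPH`, the function name `label_components`, and the choice of 8-connectivity are all assumptions made for illustration. It labels the ink pixels of a binary glyph by breadth-first search and returns one pixel list per component.

```python
from collections import deque

def label_components(bitmap):
    """Return one sorted list of (row, col) ink pixels per 8-connected component."""
    rows, cols = len(bitmap), len(bitmap[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] != "#" or seen[r][c]:
                continue
            # Flood-fill a new component starting from this unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and bitmap[ny][nx] == "#" and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(sorted(pixels))
    return components

# Hypothetical toy glyph: two strokes that do not touch each other.
GLYPH = [
    "####..#.",
    "...#..#.",
    "...#..##",
    "...#..#.",
    "...#..#.",
]

components = label_components(GLYPH)
print(len(components), [len(p) for p in components])
```

Because the two strokes in the toy glyph never touch, they come out as separate components without any explicit segmentation step, which is the property that makes connected components attractive as recognition units for printed text.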

Grapheme-based on-line recognition of cursive Korean characters (자소 단위의 온라인 흘림체 한글 인식)

  • 정기철;김상균;이종국;김행준
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.9 / pp.124-134 / 1996
  • Korean has a large set of characters, and has a two-dimensional formation: each character is composed of graphemes in two dimensions. Whereas connections between characters are rare, connections inside a grapheme and between graphemes happen frequently, and these connections generate many cursive strokes. To deal with the large character set and the cursive strokes, using graphemes as the recognition unit is an effective approach, because it naturally accommodates the structural characteristics of the characters. In this paper, we propose a grapheme-based on-line recognition method for cursive Korean characters. Our method uses a TDNN recognition engine to segment cursive strokes into graphemes, and a graph-algorithmic postprocessor based on the Korean grapheme composition rule and the Viterbi search algorithm to find the path with the best recognition score. We tested the method on freely handwritten characters and obtained a recognition rate of 94.5%.

Korean speech recognition based on grapheme (문자소 기반의 한국어 음성인식)

  • Lee, Mun-hak;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.38 no.5 / pp.601-606 / 2019
  • This paper studies Korean speech recognition using grapheme units (Cho-sung [onset], Jung-sung [nucleus], Jong-sung [coda]). We build an ASR (automatic speech recognition) system without a G2P (grapheme-to-phoneme) process and show that deep learning-based ASR systems can learn Korean pronunciation rules without it. The proposed model is shown to reduce the word error rate when sufficient training data is available.
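
The onset/nucleus/coda units mentioned above can be obtained from precomposed Hangul syllables by plain arithmetic on Unicode code points. The sketch below shows that standard decomposition; the abstract does not say how the authors tokenize their transcripts, so the function names `split_syllable` and `to_grapheme_sequence` and the choice of conjoining jamo as output symbols are assumptions for illustration.

```python
import unicodedata

HANGUL_BASE = 0xAC00          # code point of the first precomposed syllable
N_CHO, N_JUNG, N_JONG = 19, 21, 28  # onsets, nuclei, codas (coda 0 = none)

def split_syllable(ch):
    """Return (cho, jung, jong) indices of one precomposed Hangul syllable."""
    offset = ord(ch) - HANGUL_BASE
    if not 0 <= offset < N_CHO * N_JUNG * N_JONG:
        raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
    cho, rest = divmod(offset, N_JUNG * N_JONG)
    jung, jong = divmod(rest, N_JONG)
    return cho, jung, jong

def to_grapheme_sequence(text):
    """Expand each syllable into conjoining jamo; pass other characters through."""
    out = []
    for ch in text:
        if 0xAC00 <= ord(ch) <= 0xD7A3:
            cho, jung, jong = split_syllable(ch)
            out.append(chr(0x1100 + cho))       # Cho-sung (onset)
            out.append(chr(0x1161 + jung))      # Jung-sung (nucleus)
            if jong:                            # Jong-sung (coda) is optional
                out.append(chr(0x11A7 + jong))
        else:
            out.append(ch)
    return out

graphemes = to_grapheme_sequence("한글")
print(len(graphemes))
# The arithmetic agrees with Unicode canonical decomposition (NFD).
print("".join(graphemes) == unicodedata.normalize("NFD", "한글"))
```

A grapheme-level output vocabulary built this way has at most 19 + 21 + 27 symbols for Hangul, compared with 11,172 possible precomposed syllables.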

A study on Machine-Printed Korean Character Recognition by the Character Composition form Information of the Graphemes and Graphemes using the Connection Ingredient and by the Vertical Detection Information in the Weight Center of Graphemes

  • Lee, Kyong-Ho
    • Journal of the Korea Society of Computer and Information / v.22 no.3 / pp.97-105 / 2017
  • This study implements recognition of machine-printed Korean characters in Gothic type. It defines new graphemes using connected components and recognizes them by feature points: isolated points, end points, junction points where two lines meet, and junction points where three or more lines meet. Characters are classified by the arrangement of their graphemes and by the layers the graphemes form within each character, and a character database for recognition is built from this information. The layers and arrangement of the graphemes in an input character are estimated from the center-of-gravity (weight center) position of each extracted grapheme and from the graphemes found by scanning vertically from each center of gravity; the character is then recognized by comparing this information against the character groups in the database. In a test on 350 characters, 338 were recognized correctly, a recognition rate of about 97%.

Ambiguity Types of the Homonymic & Heterographic Units for Improving Korean Voice Recognition System - a Preliminary Research (한국어 음성인식 시스템 향상을 위한 동음이철 단위의 중의성 유형 분류)

  • Yoon, Ae-Sun;Kang, Mi-Young
    • Speech Sciences / v.15 no.4 / pp.67-81 / 2008
  • The accuracy rate of P2G (Phoneme-to-Grapheme) conversion is one of the important factors determining the quality of unlimited voice recognition (VR) systems. Few studies, however, have tried to reduce the ambiguities of a phoneme string that can be segmented into a variety of different linguistic units (i.e. morphemes, words, eo-jeols) and thus transformed into more than one grapheme string. This paper is a preliminary study toward building a large knowledge base of those homonymic & heterographic units (HHUs), which will provide unlimited Korean VR systems with more accurate P2G information. This paper analyzes two main factors generating HHUs: (1) boundary determination of the prosodic unit; (2) its segmentation into linguistic units. The linguistic characteristics determining variable boundaries of a prosodic unit are investigated, and the ambiguity types of HHUs are classified in accordance with their morphological and syntactic structures as well as with the phonological rules governing them.

Handwritten Hangul Graphemes Classification Using Three Artificial Neural Networks

  • Aaron Daniel Snowberger;Choong Ho Lee
    • Journal of information and communication convergence engineering / v.21 no.2 / pp.167-173 / 2023
  • Hangul is unique compared to other Asian writing systems because of its simple letter forms that combine to create syllabic shapes. There are 24 basic letters that can be combined to form 27 additional complex letters. This produces 51 graphemes. Hangul optical character recognition has been a research topic for some time; however, handwritten Hangul recognition continues to be challenging owing to the various writing styles, slants, and cursive-like nature of the handwriting. In this study, a dataset containing thousands of samples of 51 Hangul graphemes was gathered from 110 freshmen university students to create a robust dataset with high variance for training an artificial neural network. The collected dataset included 2200 samples for each consonant grapheme and 1100 samples for each vowel grapheme. The dataset was normalized to match the MNIST digits dataset and used to train three neural networks, and the obtained results were compared.

A Study on Hanguel Character Recognition using GRNN (자소 인식 신경망을 이용한 한글 문자 인식에 관한 연구)

  • 장석진;강선미;김혁구;노우식;김덕진
    • Journal of the Korean Institute of Telematics and Electronics B / v.31B no.1 / pp.81-87 / 1994
  • This paper describes the recognition of printed Hanguel (Korean characters) using neural networks. The neural networks are used only for fine classification; coarse classification of Hanguel characters is done by template matching. Graphemes are segmented by a structural method, and the neural networks are trained on the segmented graphemes. A separate neural network, each a multilayer perceptron, is constructed for each kind and shape of grapheme. The learning algorithm is a modified error backpropagation using the descending epsilon method. On five test character sets, a recognition rate of 94.95% was obtained.

A Study on Character Recognition using Connected Components Grapheme (연결성분 자소를 이용한 문자 인식 연구)

  • Lee, Kyong-Ho
    • Proceedings of the Korean Society of Computer Information Conference / 2017.01a / pp.157-160 / 2017
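  • This study performs Hangul character recognition. It targets Gothic printed characters and recognizes them at the grapheme level, but instead of the consonant and vowel graphemes used in existing Hangul character recognition research, it uses new graphemes defined by connected components. For the new graphemes, features consisting of end points, 2-line junction points, 3-line junction points, and 4-line junction points are extracted, and a database that recognizes graphemes by these features is constructed. In addition, because Gothic printed characters are recognized with the new graphemes that reflect connected components, the extracted graphemes are classified into 6 types, and 92 character structures composed of these 6 grapheme types are proposed. A character database is built according to these structures, and character recognition is performed with the database through the proposed structures, using the distribution of the graphemes' centers of gravity.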

Graphemes Segmentation for Arabic Online Handwriting Modeling

  • Boubaker, Houcine;Tagougui, Najiba;El Abed, Haikal;Kherallah, Monji;Alimi, Adel M.
    • Journal of Information Processing Systems / v.10 no.4 / pp.503-522 / 2014
  • In the cursive handwriting recognition process, script trajectory segmentation and modeling represent an important task for large or open lexicon contexts, one that becomes more complicated in multi-writer applications. In this paper, we present a system for Arabic online handwriting modeling based on grapheme segmentation and the extraction of geometric features. The main contribution consists of adapting Fourier descriptors to model the open trajectory of the segmented graphemes. To segment the trajectory of the handwriting, the system proceeds by first detecting its baseline by checking combined geometric and logic conditions. Then, the detected baseline is used as a topologic reference for the extraction of particular points that delimit the graphemes' trajectories. Each segmented grapheme is then represented by a set of relevant geometric features that include the vector of Fourier descriptors for trajectory shape modeling, normalized metric parameters that model the grapheme dimensions, its position with respect to the baseline, and codes for the description of its associated diacritics.
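
To give a sense of what Fourier descriptors of an open pen trajectory look like, here is a minimal sketch. It is not the adaptation proposed in the paper, whose details the abstract does not give. It assumes one common workaround for open curves (retracing the stroke backwards so that it becomes a closed contour) and a simple normalization by the first harmonic; the function name `fourier_descriptors`, the parameter `n_coeffs`, and the sample stroke are hypothetical.

```python
import cmath

def fourier_descriptors(points, n_coeffs=4):
    """Return magnitudes of harmonics 1..n_coeffs, normalized by harmonic 1.

    Assumption: the open stroke is closed by walking it forth and back.
    """
    closed = points + points[-2:0:-1]          # p0..pN, then pN-1..p1
    z = [complex(x, y) for x, y in closed]     # treat each point as x + iy
    n = len(z)
    coeffs = [
        sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) / n
        for k in range(1, n_coeffs + 1)        # skip k = 0 (position only)
    ]
    scale = abs(coeffs[0])
    if scale == 0:
        raise ValueError("degenerate trajectory: first harmonic is zero")
    return [abs(c) / scale for c in coeffs]

# Hypothetical sampled pen stroke.
stroke = [(0, 0), (1, 0), (2, 1), (3, 3), (3, 5)]
moved = [(2 * x + 10, 2 * y - 3) for x, y in stroke]  # scaled and translated

print(fourier_descriptors(stroke))
print(fourier_descriptors(moved))
```

Dropping the zeroth coefficient removes the dependence on position, and dividing by the first harmonic removes the dependence on size, so the two printed vectors agree up to floating-point error. Taking magnitudes also discards rotation, which may or may not be desirable for handwriting, where orientation relative to the baseline carries information.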

Corpus Based Unrestricted vocabulary Mandarin TTS (코퍼스 기반 무제한 단어 중국어 TTS)

  • Yu Zheng;Ha Ju-Hong;Kim Byeongchang;Lee Gary Geunbae
    • Proceedings of the KSPS conference / 2003.10a / pp.175-179 / 2003
  • To produce high-quality (intelligible and natural) synthesized speech, accurate grapheme-to-phoneme conversion and an accurate prosody model are very important. In this paper, we analyze Chinese texts using segmentation, POS tagging, and unknown word recognition. We present grapheme-to-phoneme conversion using dictionary-based and rule-based methods. We construct a prosody model using a probabilistic method and a decision tree-based error correction method. Based on the results of this analysis, we can successfully select and concatenate the exact syllable synthesis units from the Chinese Synthesis DB.
