• Title/Summary/Keyword: vocabulary data


Implementation of Korean TTS System based on Natural Language Processing (자연어 처리 기반 한국어 TTS 시스템 구현)

  • Kim Byeongchang; Lee Gary Geunbae
    • MALSORI / no.46 / pp.51-64 / 2003
  • To produce high-quality synthesized speech, it is very important to obtain accurate grapheme-to-phoneme conversion and an accurate prosody model from text using natural language processing. Robust preprocessing of non-Korean characters is also required. In this paper, we analyze Korean texts using a morphological analyzer, a part-of-speech tagger, and a syntactic chunker. We present a new grapheme-to-phoneme conversion method for unlimited-vocabulary Korean TTS that combines a phonetic pattern dictionary with CCV (consonant vowel) LTS (letter-to-sound) rules. We construct a prosody model using a probabilistic method and a decision tree-based method. The probabilistic method alone usually suffers from performance degradation due to inherent data sparseness, so we adopt tree-based error correction to overcome these training data limitations.

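
The hybrid grapheme-to-phoneme idea in the abstract above can be sketched as a dictionary lookup with a rule-based fallback. This is a minimal illustration only: the two exception entries, the function names, and the single liaison rule (a final consonant moves onto a following vowel-initial syllable) are assumptions made for the example, not the authors' pattern dictionary or CCV rule set.

```python
# Hybrid G2P sketch: exception dictionary first, one letter-to-sound rule as fallback.
# All names and entries here are illustrative assumptions, not the paper's resources.
HANGUL_BASE = 0xAC00
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")
SILENT_INITIAL = INITIALS.index("ㅇ")

# Words whose pronunciation the single rule below cannot derive.
EXCEPTIONS = {"같이": "가치", "국물": "궁물"}


def decompose(syllable):
    """Split a precomposed Hangul syllable into [initial, medial, final] indices."""
    offset = ord(syllable) - HANGUL_BASE
    return [offset // 588, (offset % 588) // 28, offset % 28]


def compose(initial, medial, final):
    """Inverse of decompose: build a syllable from its three indices."""
    return chr(HANGUL_BASE + (initial * 21 + medial) * 28 + final)


def apply_liaison(word):
    """Move a simple final consonant onto a following syllable that starts with silent ㅇ."""
    if not all(0xAC00 <= ord(ch) <= 0xD7A3 for ch in word):
        return word  # leave non-Hangul input untouched
    syllables = [decompose(ch) for ch in word]
    for cur, nxt in zip(syllables, syllables[1:]):
        final = FINALS[cur[2]]
        movable = final and final in INITIALS and final not in "ㅇㅎ"
        if nxt[0] == SILENT_INITIAL and movable:
            nxt[0] = INITIALS.index(final)
            cur[2] = 0
    return "".join(compose(*s) for s in syllables)


def g2p(word):
    """Dictionary lookup first; fall back to the rule for everything else."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    return apply_liaison(word)
```

Calling `g2p("음악")` exercises the rule path, while `g2p("같이")` is answered from the dictionary because liaison alone would give the wrong form.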

Determining the Optimal Number of Signal Clusters Using Iterative HMM Classification

  • Ernest, Duker Junior; Kim, Yoon Joong
    • International journal of advanced smart convergence / v.7 no.2 / pp.33-37 / 2018
  • In this study, we propose an iterative clustering algorithm that automatically groups a set of unlabeled voice signal data into an optimal number of clusters and generates an HMM for each cluster. In the clustering process, the likelihoods of the clusters are calculated through iterative HMM training and testing while the number of clusters is varied for the given data, and maximum likelihood estimation is used to determine the optimal number of clusters. We tested the effectiveness of this clustering algorithm on a small-vocabulary digit clustering task by mapping the unsupervised decoded output of the optimal cluster to the ground-truth transcription, and found that the two were highly correlated.

Effective Acoustic Model Clustering via Decision Tree with Supervised Decision Tree Learning

  • Park, Jun-Ho; Ko, Han-Seok
    • Speech Sciences / v.10 no.1 / pp.71-84 / 2003
  • In acoustic modeling for large-vocabulary speech recognition, the sparse data problem caused by a huge number of context-dependent (CD) models usually makes the estimated models unreliable. In this paper, we develop a new clustering method based on the C4.5 decision-tree learning algorithm that effectively encapsulates CD modeling. The proposed scheme constructs a supervised decision rule and applies it to the pre-clustered triphones using the C4.5 algorithm, which is known to search effectively through the attributes of the training instances and extract the attribute that best separates the given examples. In particular, a data-driven method is used as the clustering algorithm, and its result is used as the learning target of the C4.5 algorithm. This scheme has been shown to be particularly effective, in terms of recognition performance, on databases with a low unknown-context ratio. On a speaker-independent, task-independent continuous speech recognition task, the proposed method reduced the word error rate (WER) by 3.93% compared to the existing rule-based methods.

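
The attribute selection step that C4.5 performs, as mentioned in the abstract above, can be illustrated with its gain-ratio criterion. The sketch below is a toy example under stated assumptions: the attribute names (`left_nasal`, `right_vowel`) and the cluster labels are hypothetical stand-ins for triphone context features and pre-clustered targets, and none of it is taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, labels, attr):
    """C4.5-style gain ratio: information gain divided by split information."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels):
    """Return the attribute with the highest gain ratio."""
    return max(rows[0], key=lambda attr: gain_ratio(rows, labels, attr))


# Hypothetical triphone contexts; labels play the role of data-driven cluster ids.
rows = [
    {"left_nasal": True, "right_vowel": True},
    {"left_nasal": True, "right_vowel": False},
    {"left_nasal": False, "right_vowel": True},
    {"left_nasal": False, "right_vowel": False},
]
labels = ["c1", "c1", "c2", "c2"]
```

In this toy data `left_nasal` separates the two clusters perfectly while `right_vowel` carries no information, so `best_attribute(rows, labels)` picks `left_nasal`.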

A Basic Study on the Concept and Cases of Healthy Eco Dwelling (친환경 건강주거의 개념과 사례에 대한 기초 연구)

  • Lee, Min-Kyoung; Jung, Jin-Ju; Choi, Hyo-Seung
    • Journal of the Korean Institute of Rural Architecture / v.7 no.3 / pp.68-75 / 2005
  • With rapid economic growth, interest in improving quality of life has risen sharply, and the 'health' trend that has spread through society like the latest fashion has given rise to the term 'well-being' in every field. More recently, a more comprehensive new term, 'LOHAS', has appeared. This study aims to present basic data needed for research on healthy eco dwelling through a comparative review of existing studies, literature analysis, and an examination of the application types and planning elements of healthy dwellings. The study proceeds in the following steps. First, we define the concepts of terms similar to healthy eco dwelling, drawing on existing studies and internet search data, and analyze the comprehensive meaning of 'health' and 'healthy dwelling'. Second, we comprehensively review healthy eco dwelling, well-being houses, and ubiquitous future dwellings in terms of the paradigm of the spread of healthy dwelling. Third, we examine the application types of healthy dwelling that currently exist and analyze the planning elements applied in each type.


Fast Speaker Adaptation and Environment Compensation Based on Eigenspace-based MLLR (Eigenspace-based MLLR에 기반한 고속 화자적응 및 환경보상)

  • Song Hwa-Jeon; Kim Hyung-Soon
    • MALSORI / no.58 / pp.35-44 / 2006
  • Maximum likelihood linear regression (MLLR) adaptation suffers severe performance degradation when only a very small amount of adaptation data is available. Eigenspace-based MLLR, an alternative to MLLR for fast speaker adaptation, has the weakness that it cannot deal with the mismatch between training and testing environments. In this paper, we propose simultaneous fast speaker and environment adaptation based on eigenspace-based MLLR. We also extend sub-stream based eigenspace-based MLLR to generalize eigenspace-based MLLR with bias compensation. A vocabulary-independent word recognition experiment shows that the proposed algorithm is superior to eigenspace-based MLLR regardless of the amount of adaptation data in diverse noisy environments. In particular, the proposed sub-stream eigenspace-based MLLR with bias compensation yields a 67% relative improvement with 10 adaptation words in a 10 dB SNR environment, compared with conventional eigenspace-based MLLR.

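
For orientation, the two adaptation schemes named in the abstract above are commonly written as follows. This is a sketch of the usual textbook formulation, not the paper's exact parameterisation: the symbols $\xi_m$, $W^{(k)}$, $w_k$, and $K$ are notation assumed here, and the bias-compensation and sub-stream extensions the authors propose are not specified in the abstract, so they are left out.

```latex
% Standard MLLR: an affine transform of each Gaussian mean \mu_m
\hat{\mu}_m = A\mu_m + b = W\xi_m,
\qquad
\xi_m = \begin{bmatrix} 1 \\ \mu_m \end{bmatrix},
\qquad
W = \begin{bmatrix} b & A \end{bmatrix}

% Eigenspace-based MLLR (assumed common form): W is restricted to a
% weighted combination of K basis transforms learned from training speakers
W = \sum_{k=1}^{K} w_k \, W^{(k)}
```

Under this form only the $K$ weights $w_k$ are estimated from the adaptation data, rather than every entry of $W$, which is why the eigenspace variant remains usable when very little adaptation data is available.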

Automatic Target Recognition Study using Knowledge Graph and Deep Learning Models for Text and Image data (지식 그래프와 딥러닝 모델 기반 텍스트와 이미지 데이터를 활용한 자동 표적 인식 방법 연구)

  • Kim, Jongmo; Lee, Jeongbin; Jeon, Hocheol; Sohn, Mye
    • Journal of Internet Computing and Services / v.23 no.5 / pp.145-154 / 2022
  • Automatic Target Recognition (ATR) technology is emerging as a core technology of Future Combat Systems (FCS). Conventional ATR is performed on IMINT (image information) collected from SAR sensors, using various image-based deep learning models. However, although advances in IT and sensing technology have expanded ATR-related data and information to include HUMINT (human information) and SIGINT (signal information), ATR still uses only image-oriented IMINT data. In complex and diversified battlefield situations, it is difficult to guarantee high ATR accuracy and generalization performance with image data alone. Therefore, in this paper we propose a knowledge graph-based ATR method that can use image and text data simultaneously. The main idea of the knowledge graph and deep model-based ATR method is to convert the ATR image and text into graphs according to the characteristics of each data type, align them to the knowledge graph, and connect the heterogeneous ATR data through the knowledge graph. To convert an ATR image into a graph, an object-tag graph whose nodes are object tags is generated from the image using a pre-trained image object recognition model and the vocabulary of the knowledge graph. For ATR text, a pre-trained language model, TF-IDF, a co-occurrence word graph, and the vocabulary of the knowledge graph are used to generate a word graph whose nodes are key vocabulary for ATR. The two generated graphs are connected to the knowledge graph using an entity alignment model to improve ATR performance from images and texts. To demonstrate the superiority of the proposed method, 227 web documents and 61,714 RDF triples from DBpedia were collected, and comparison experiments were performed on precision, recall, and F1-score from the perspective of entity alignment.
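
The evaluation metrics named at the end of the abstract above (precision, recall, and F1-score over entity alignments) can be computed as in the following sketch. The function name and the example alignment pairs are hypothetical and chosen only to illustrate the arithmetic; they are not data from the paper.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted (node, entity) alignment pairs against gold pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical alignments between graph nodes and knowledge-graph entities.
predicted = {("tank", "dbr:Tank"), ("radar", "dbr:Radar"), ("jeep", "dbr:Truck")}
gold = {
    ("tank", "dbr:Tank"),
    ("radar", "dbr:Radar"),
    ("jeep", "dbr:Jeep"),
    ("uav", "dbr:Unmanned_aerial_vehicle"),
}
precision, recall, f1 = precision_recall_f1(predicted, gold)
```

Here two of the three predicted pairs are correct and two of the four gold pairs are found, so precision is 2/3, recall is 1/2, and F1 is their harmonic mean, 4/7.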

Protein Ontology: Semantic Data Integration in Proteomics

  • Sidhu, Amandeep S.; Dillon, Tharam S.; Chang, Elizabeth; Sidhu, Baldev S.
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.388-391 / 2005
  • Protein structural and functional conservation needs a common language for data definition. With the common language provided by Protein Ontology, the high level of sequence and functional conservation can be extended to all organisms, with the likelihood that proteins that carry out core biological processes will again be probable orthologues. The structural and functional conservation in these proteins presents both opportunities and challenges. The main opportunity lies in the possibility of automated transfer of protein data annotations from experimentally tractable model organisms to a less tractable organism based on protein sequence similarity. Such information can be used to improve human health or agriculture. The challenge lies in using a common language to transfer protein data annotations among different species of organisms. The first step in meeting this challenge is producing a structured, precisely defined common vocabulary using Protein Ontology. The Protein Ontology described in this paper covers the sequence, structure, and biological roles of protein complexes in any organism.


English-Korean Transfer Dictionary Extension Tool in English-Korean Machine Translation System (영한 기계번역 시스템의 영한 변환사전 확장 도구)

  • Kim, Sung-Dong
    • KIPS Transactions on Software and Data Engineering / v.2 no.1 / pp.35-42 / 2013
  • Developing an English-Korean machine translation system requires constructing information about both languages, and the amount of information in the English-Korean transfer dictionary is especially critical to translation quality. Newly created words are out-of-vocabulary words that appear untranslated in the output sentence, which lowers translation quality. Compound nouns also complicate lexical and syntactic analysis, and they are difficult to translate accurately when the transfer dictionary lacks information about them. To improve the quality of English-Korean machine translation, the English-Korean transfer dictionary must be expanded continuously by collecting out-of-vocabulary words and frequently used compound nouns. This paper proposes a method for expanding the transfer dictionary that consists of constructing a corpus from internet newspapers, extracting words that are not in the existing dictionary along with frequently used compound nouns, attaching meanings to the extracted words, and integrating them into the transfer dictionary. We also develop a tool that supports this expansion. Expanding the dictionary is critical to improving the machine translation system but requires much human effort. The developed tool can be useful for continuously expanding the transfer dictionary and is therefore expected to contribute to better translation quality.
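
The extraction step described in the abstract above (finding corpus words that are missing from the existing dictionary) can be sketched as follows. The tokenizer, the `min_count` threshold, and the sample corpus and dictionary are assumptions made for illustration; the paper's tool is not reproduced here, and compound-noun extraction, which would need part-of-speech information, is not shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words, allowing internal hyphens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def oov_candidates(corpus, dictionary, min_count=2):
    """Count corpus words absent from the dictionary, keeping the frequent ones."""
    counts = Counter(token for doc in corpus for token in tokenize(doc))
    return {
        word: n
        for word, n in counts.items()
        if word not in dictionary and n >= min_count
    }


# Hypothetical newspaper sentences and an existing dictionary's headwords.
corpus = [
    "The metaverse startup raised funds.",
    "Investors like the metaverse.",
    "A chatbot answered.",
]
dictionary = {"the", "a", "startup", "raised", "funds", "investors", "like", "answered"}
candidates = oov_candidates(corpus, dictionary)
```

With these inputs `candidates` contains only `metaverse`, since `chatbot` is also missing from the dictionary but occurs just once; a human editor would then attach a translation before the entry is merged into the transfer dictionary.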

A Study on Expression of Space Emotion by Finishing Materials - According to Evaluation of Emotional Vocabulary and Factor Analysis - (마감재를 통한 공간감성 표현에 관한 연구 - 감성어휘 평가와 요인분석을 통해 -)

  • Seo, Ji-Eun; Park, Eui-Jeong
    • Korean Institute of Interior Design Journal / v.21 no.1 / pp.177-185 / 2012
  • The purpose of this study is to provide basic data for design methods in commercial spaces. To that end, we analyzed which emotions are induced by finishing materials in commercial spaces and suggest ways of using finishing materials to induce emotion in a space. The results are as follows. First, emotional design is needed to enhance consumer satisfaction, and finishing materials play a very important role in emotional expression in commercial spaces. Second, we extracted 14 pairs of adjectives for evaluating space emotion and derived four kinds of space emotion through factor analysis. We also grouped the emotional words that represent each space type (Decoration: 5 pairs, Expand: 4 pairs, Limitation: 3 pairs, Hierarchy: 2 pairs). Third, finishing materials and walls are very effective for inducing emotion in a space, and among the elements of finishing materials, color works well. Fourth, among the types of emotional space, 'center' was induced together with 'boundary' and 'decoration'. Using contrasting colors and accent colors in a commercial space induces center and boundary together, and using colorful or unusual patterns induces center and decoration together. Fifth, to induce 'expand', the space should be finished in a single color. To induce 'center', the space should be finished with one type of color or pattern, with contrasting colors and special patterns used in part. For 'boundary', partial emphasis through color, texture, and materials is a good method. 'Decoration' can be induced with materials and patterns.


Analysis of the English Textbooks in North Korean First Middle School (북한 제1중학교 영어교과서 분석)

  • Hwang, Seo-yeon; Kim, Jeong-ryeol
    • The Journal of the Korea Contents Association / v.17 no.11 / pp.242-251 / 2017
  • For this research, a corpus of words was created from the English textbooks of the "First Middle School" for the gifted in North Korea, and the corpus was used to analyze their linguistic characteristics. Although many studies have identified the traits of English textbooks used in North Korea's general middle schools, little attention has been paid to the English textbooks used at North Korea's First Middle School. First, the structure of the first-, second-, fourth-, and sixth-grade English textbooks, which had been procured from the Information Center on North Korea, was reviewed, and a corpus was built from them. Then, using WordSmith Tools 7.0, the linguistic properties and high-frequency content words of the first-grade English textbook were analyzed in detail. The basic statistics indicated that while the number of vocabulary items did not increase as students progressed through the grades, the words used tended to diversify incrementally. Meanwhile, the distribution of high-frequency content words by grade showed a large difference between the content words used in the English texts of each grade, and this difference was determined by the subject matter of the texts.
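
The kind of basic corpus statistics mentioned in the abstract above (vocabulary size, lexical diversity, and high-frequency content words) can be approximated with a short script. This sketch is an assumption-laden stand-in, not WordSmith Tools: the tokenizer, the stopword list used to separate content words from function words, and the sample sentence are all hypothetical.

```python
import re
from collections import Counter


def corpus_stats(text, stopwords, top_n=3):
    """Return token count, type count, type-token ratio, and top content words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    types = set(tokens)
    content_counts = Counter(t for t in tokens if t not in stopwords)
    return {
        "tokens": len(tokens),
        "types": len(types),
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
        "top_content_words": content_counts.most_common(top_n),
    }


# Hypothetical textbook sentence and a minimal function-word list.
sample = "The boy reads a book. The girl reads a letter."
stats = corpus_stats(sample, stopwords={"the", "a"})
```

Running the same function on each grade's textbook and comparing `types` and `type_token_ratio` across grades is one simple way to check whether vocabulary grows or merely diversifies.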