• Title/Summary/Keyword: language text


Improving Elasticsearch for Chinese, Japanese, and Korean Text Search through Language Detector

  • Kim, Ki-Ju;Cho, Young-Bok
    • Journal of information and communication convergence engineering / v.18 no.1 / pp.33-38 / 2020
  • Elasticsearch is an open source search and analytics engine that can search petabytes of data in near real time. It is designed as a horizontally scalable, highly available distributed system. It provides RESTful APIs, which makes it programming-language agnostic. Full-text search of multilingual text requires language-specific analyzers and field mappings appropriate for indexing and searching that text. Additionally, a language detector can be used in conjunction with the analyzers to improve multilingual text search. Elasticsearch provides more than 40 language analysis plugins that can process text and extract language-specific tokens, as well as language detector plugins that can determine the language of a given text. This study investigates three approaches to indexing and searching Chinese, Japanese, and Korean (CJK) text (single analyzer, multi-fields, and language detector-based) and identifies the advantages of the language detector-based approach over the other two.
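
The multi-fields and language detector-based approaches named in the abstract can be illustrated with a minimal Python sketch. It only builds the JSON bodies that would be sent to Elasticsearch and is not the mapping used in the paper. The analyzer names are assumptions: `cjk` is a built-in analyzer, while `smartcn`, `kuromoji`, and `nori` are provided by the analysis-smartcn, analysis-kuromoji, and analysis-nori plugins. The field name `content` and the helper names are hypothetical, and the detected language code is assumed to come from an external language detector.

```python
import json

# Assumed analyzer names: "cjk" is built in; "smartcn", "kuromoji", and "nori"
# come from the analysis-smartcn, analysis-kuromoji, and analysis-nori plugins.
ANALYZER_BY_LANG = {"zh": "smartcn", "ja": "kuromoji", "ko": "nori"}


def build_multi_field_mapping(field="content"):
    """Return an index mapping with one sub-field per language analyzer."""
    sub_fields = {
        lang: {"type": "text", "analyzer": analyzer}
        for lang, analyzer in ANALYZER_BY_LANG.items()
    }
    return {
        "mappings": {
            "properties": {
                # The parent field uses the generic CJK analyzer as a fallback.
                field: {"type": "text", "analyzer": "cjk", "fields": sub_fields}
            }
        }
    }


def target_field(detected_lang, field="content"):
    """Pick the sub-field for a detected language, else the generic field."""
    if detected_lang in ANALYZER_BY_LANG:
        return f"{field}.{detected_lang}"
    return field


def build_match_query(text, detected_lang, field="content"):
    """Build a match query against only the field for the detected language."""
    return {"query": {"match": {target_field(detected_lang, field): text}}}


if __name__ == "__main__":
    print(json.dumps(build_multi_field_mapping(), indent=2))
    print(json.dumps(build_match_query("검색 엔진", "ko"), ensure_ascii=False))
```

In this sketch the detector's output narrows the query to a single sub-field, which is one plausible reading of why a language detector-based approach could avoid scoring the same text under several unrelated analyzers.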

This study revises Lee Hyo-seok's The Buckwheat Season, utilizing Novel Corpus, intermediate learners' level (소설텍스트의 난이도 조정 방안 연구 -이효석의 「메밀꽃 필 무렵」을 중심으로-)

  • Hwang, Hye ran
    • Journal of Korean language education / v.29 no.4 / pp.255-294 / 2018
  • The Buckwheat Season, regarded as the best of Lee Hyo-seok's works, is one of the short stories that represent Korean literature. However, its vivid literary expressions, such as lyrical and beautiful depictions, figurative expressions, and dialects, which convey a Korean sense of beauty, instead cause learners difficulty and hinder reading comprehension. Thus, it is necessary to revise the text and present a version adapted to the learners' language level. Methods of revising a literary text include revising linguistic elements, such as obscure vocabulary or sentence structure, and revising the composition of the text, e.g. introducing the characters or plot, or inserting illustrations. Methods of revising the language of the text can be divided into simplification and elaboration. However, in practice many revisions depend on the adapter's subjective perception rather than on objective criteria. This paper revised the text, utilizing by the Academy of Korean Studies, , and the by the National Institute of Korean Language to secure objectivity in revising the text.

Sign2Gloss2Text-based Sign Language Translation with Enhanced Spatial-temporal Information Centered on Sign Language Movement Keypoints (수어 동작 키포인트 중심의 시공간적 정보를 강화한 Sign2Gloss2Text 기반의 수어 번역)

  • Kim, Minchae;Kim, Jungeun;Kim, Ha Young
    • Journal of Korea Multimedia Society / v.25 no.10 / pp.1535-1545 / 2022
  • In sign language, the same gesture can have a completely different meaning depending on the direction of the hand or a change in facial expression. It is therefore crucial to capture the spatial-temporal structure of each movement. However, sign language translation studies based on Sign2Gloss2Text convey only comprehensive spatial-temporal information about the entire sign language movement. Consequently, the detailed information of each movement (facial expressions, gestures, etc.) that is important for sign language translation is not emphasized. Accordingly, in this paper we propose Spatial-temporal Keypoints Centered Sign2Gloss2Text Translation, named STKC-Sign2Gloss2Text, to supplement the sequential and semantic information of keypoints, which are the core of recognizing and translating sign language. STKC-Sign2Gloss2Text consists of two steps: Spatial Keypoints Embedding, which extracts 121 major keypoints from each image, and Temporal Keypoints Embedding, which emphasizes sequential information by applying a Bi-GRU to the extracted keypoints. The proposed model outperformed the Sign2Gloss2Text baseline on all Bilingual Evaluation Understudy (BLEU) scores on both the development (DEV) and test (TEST) sets; in particular, it achieved a TEST BLEU-4 of 23.19, an improvement of 1.87, demonstrating the effectiveness of the proposed method.
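
For readers unfamiliar with the BLEU-n scores cited in the abstract, the following Python sketch shows the clipped (modified) n-gram precision that BLEU is built on. It is an illustration only, not the evaluation code used in the paper: full BLEU additionally combines the precisions for n = 1 to 4 with a geometric mean and applies a brevity penalty, and both are omitted here. The function names and the toy sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against a single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        # The candidate is shorter than n tokens.
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference, so repeating a correct word is not rewarded.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


if __name__ == "__main__":
    candidate = "the the the cat".split()
    reference = "the cat sat".split()
    print(modified_precision(candidate, reference, 1))  # 2 of 4 unigrams credited
    print(modified_precision(candidate, reference, 2))  # 1 of 3 bigrams credited
```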

A construction of dictionary for Korean Text to Sign Language Translation (한글문장-수화 번역기를 위한 사전구성)

  • 권경혁;민홍기
    • Proceedings of the IEEK Conference / 1998.10a / pp.841-844 / 1998
  • A Korean text to sign language translator could help deaf and hard-of-hearing people learn written letters and converse with hearing people. This paper describes several dictionaries useful for developing a Korean text to sign language translator: a base sign language dictionary, a compound sign language dictionary, and a resemble sign language dictionary. As Korean sign language consists of only about 6,000 words in total, additional dictionaries are required to match them to written Korean. We design the base sign language dictionary, which is composed of the basic symbols and moving pictures of Korean sign language, and propose a definition of the compound sign language dictionary, which is composed of symbols from the base sign language dictionary. In addition, the resemble sign language dictionary provides sign symbols and letters that are used with the same meaning in conversation. With these dictionaries, signs can be looked up quickly during Korean text to sign language translation, and storage space is saved. They also address the shortage of sign language words encountered during translation.


A Frame-based Approach to Text Generation

  • Le, Huong Thanh
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.192-201 / 2007
  • This paper studies the construction of a natural language interface to a database, concentrating on generating textual answers. It presents TGEN, a system that generates textual answers from query result tables. The TGEN architecture guarantees its portability across domains. Combining a frame-based approach with natural language generation techniques gives TGEN both text fluency and text flexibility. The implementation results show that this approach is feasible, while a deep NLG approach is still far from being achieved.

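
As a rough illustration of what a frame-based step can look like when turning a query result table into text, here is a minimal Python sketch. It is not TGEN's actual design, which the abstract does not detail: the frame layout, the slot names, the example sentence, and the helper functions are all hypothetical, and each result row is assumed to be a dictionary keyed by column name.

```python
# Hypothetical frame: a sentence template plus the slots it needs.
FRAME = {
    "template": "{name} works in the {department} department and earns {salary}.",
    "slots": ("name", "department", "salary"),
}


def fill_frame(frame, row):
    """Fill one frame from one result row, failing loudly on missing slots."""
    missing = [slot for slot in frame["slots"] if slot not in row]
    if missing:
        raise KeyError(f"row is missing slots: {missing}")
    values = {slot: row[slot] for slot in frame["slots"]}
    return frame["template"].format(**values)


def generate_answer(frame, rows):
    """Turn a query result table (a list of row dicts) into a textual answer."""
    if not rows:
        return "No matching records were found."
    return " ".join(fill_frame(frame, row) for row in rows)


if __name__ == "__main__":
    rows = [{"name": "Ann", "department": "sales", "salary": "50000"}]
    print(generate_answer(FRAME, rows))
```

Keeping the frame as data rather than code is one simple way to move such a generator to a new domain: only the template and slot list change.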

Statistical Analysis Between Size and Balance of Text Corpus by Evaluation of the effect of Interview Sentence in Language Modeling (언어모델 인터뷰 영향 평가를 통한 텍스트 균형 및 사이즈간의 통계 분석)

  • Jung Eui-Jung;Lee Youngjik
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.87-90 / 2002
  • This paper statistically analyzes the relationship between the size and balance of a text corpus by evaluating the effect of interview sentences on the language model of a Korean broadcast news transcription system. The ultimate purpose of our system is to recognize not interview speech but the speech of anchors and reporters in broadcast news shows. However, interview sentences make up approximately 15% of the text corpus gathered for constructing the language model. Interview sentences differ in character from the anchors' and reporters' sentences in various respects, and therefore disturb anchor- and reporter-oriented language modeling. In this paper, we evaluate the effect of interview sentences on the language model and statistically analyze the relationship between corpus size and balance by repeating the same experimental procedure while varying the size of the corpus.


Text Categorization for Authorship based on the Features of Lingual Conceptual Expression

  • Zhang, Quan;Zhang, Yun-liang;Yuan, Yi
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.515-521 / 2007
  • Text categorization is an important field of automatic text information processing, and authorship identification can be treated as a special case of text categorization. This paper adopts a conceptual-primitive representation based on the Hierarchical Network of Concepts (HNC) theory, which describes word meanings with hierarchical symbols, in order to avoid the data sparseness caused by natural language surface features in text categorization. The KNN algorithm is used as the classifier. Experiments were conducted on authorship identification of Chinese texts. The results show that the proposed approach achieves high accuracy and is therefore feasible for text authorship identification.

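
The KNN classification step mentioned in the abstract can be sketched in Python as follows. This is an assumption-laden illustration rather than the authors' method: each text is represented as a dictionary of counts over made-up concept symbols (`c1`, `c2`, `c3`) standing in for HNC conceptual primitives, cosine similarity is an assumed choice of similarity measure, and the function names are hypothetical.

```python
from collections import Counter
import math


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_author(query, training, k=3):
    """Return the majority author among the k training texts most similar to query.

    training is a list of (feature_dict, author) pairs.
    """
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(author for _, author in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical concept-symbol counts per text, labeled by author.
    training = [
        ({"c1": 3, "c2": 1}, "A"),
        ({"c1": 2, "c2": 1}, "A"),
        ({"c3": 4}, "B"),
        ({"c3": 2, "c2": 1}, "B"),
    ]
    print(knn_author({"c1": 1}, training))
```

Counting concept symbols instead of surface words shrinks the feature space, which is the intuition behind the abstract's claim about avoiding sparse data.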

The Effects of Inferential Reading Strategy Program on Text Comprehension and Korean Language Academic Achievements of Vocational High School Students (추론적 읽기전략 프로그램이 전문계 고등학생의 텍스트 이해와 국어과 학업성취에 미치는 효과)

  • Kim, Seon-Kyung;Yune, So-Jung;Kim, Jung-Sub
    • Journal of Fisheries and Marine Sciences Education / v.23 no.1 / pp.1-12 / 2011
  • The purpose of this study was to examine the effects of an inferential reading strategy program on the text comprehension and Korean language academic achievement of vocational high school students. We developed an inferential reading strategy program, applied it in an educational setting, and examined its effects on learners' text comprehension ability and Korean language academic achievement. ANCOVA was used for data analysis with the SPSS ver. 12.0 statistics program. The main findings were as follows. First, the experimental group, which received the inferential reading strategy program, showed a statistically significant difference in text comprehension ability from the control group. Second, the experimental group showed a statistically significant difference in Korean language academic achievement from the control group. The study shows that the inferential reading strategy program had an effect on the text comprehension and Korean language academic achievement of vocational high school students.

On the Analysis of Natural Language Processing Morphology for the Specialized Corpus in the Railway Domain

  • Won, Jong Un;Jeon, Hong Kyu;Kim, Min Joong;Kim, Beak Hyun;Kim, Young Min
    • International Journal of Internet, Broadcasting and Communication / v.14 no.4 / pp.189-197 / 2022
  • Today, we are exposed to various text-based media such as newspapers, Internet articles, and SNS, and the amount of text data we encounter has increased exponentially with the recent availability of Internet access on mobile devices such as smartphones. Collecting useful information from large amounts of text is called text analysis, and, with the recent development of artificial intelligence, it is performed using technologies such as Natural Language Processing (NLP). For this purpose, morpheme analyzers based on everyday language have been released and are in use. Pre-trained language models, which acquire natural language knowledge through unsupervised learning on large corpora, have recently become very common in natural language processing, but conventional morpheme analyzers are of limited use in specialized fields. In this paper, as preliminary work toward developing a natural language analysis model specialized for the railway field, a procedure for constructing a corpus specialized in the railway field is presented.

Perspective Coherence in Simultaneous Interpreting - with Reference to German-Korean Interpreting - (동시통역과 시각적 응집성 - 독한 통역을 중심으로 -)

  • Ahn In-Kyoung
    • Koreanische Zeitschrift für Deutsche Sprachwissenschaft / v.9 / pp.169-193 / 2004
  • In simultaneous interpreting, if the syntactic structures of the source language and the target language are very different, interpreters have to wait before they can reformulate the source text segments into a meaningful utterance in the target language. To avoid unduly increasing the memory load and to minimize pauses, it is inevitable to adapt the target language structure to that of the source language. While such adaptation enables simultaneous interpreting, it damages the perspective coherence of the text. Discovering when perspective coherence is impaired, and how the problem can be relieved, will enable interpreters to enhance their performance. This paper analyzes the reasons for damage to perspective coherence by looking at some examples of German-Korean simultaneous interpreting.
