• Title/Summary/Keyword: Language Resources

A Dictionary Constructing System based on a Web-based Object Model of Distributed Language Resources (웹 기반의 언어자원 객체화에 근거한 사전 개발 시스템)

  • 황도삼
    • Korean Journal of Cognitive Science / v.12 no.1_2 / pp.1-9 / 2001
  • In this paper, we present a web-based object model of language resources that are distributed across different locations in varying forms. Language resources organized as objects distributed over web sites can be easily used to build natural language processing applications. The model thus makes the overall language processing environment easier to maintain, since upgrading a language resource automatically upgrades the application systems that use it. We implemented a dictionary construction system for the Korean language (YDK2000). The system can integrate various linguistic dictionaries and also supports the construction of high-quality application-specific dictionaries by connecting them to natural language systems on the Internet.

International Marriage Immigrant Women's Resources for Life Adjustment in Korea (결혼이주여성의 자원체계와 한국생활적응)

  • Hong, Sung-Hee
    • Journal of Family Resource Management and Policy Review / v.17 no.2 / pp.121-145 / 2013
  • The purpose of this study is to understand married female immigrants' life adjustment process in Korea by explaining the resources to which they have access and how they use them. The data were collected through in-depth interviews with ten female participants who have more than one child, have participated in programs of the multicultural family support center, have work experience, can communicate with Koreans, and live in Daegu. The major findings are as follows. The participants' personal resources differed. English language skills were very useful resources for making money and for earning the respect of family members and others. However, the participants without English language skills had sincerely and actively tried to learn the Korean language and gain bilingual competence. The participants obtained diverse family resources from their husbands and parents-in-law after adapting themselves to perform their gender role. Further, the participants used the social resources offered by public support systems as a starting point for learning the Korean language in their early adaptation process, and formed personal networks with staff members at the multicultural family support center. The results show that the participants used many kinds of resources for acculturation by interacting positively with their environment. Moreover, the resources from diverse levels of their environments affected their acculturation process.

Automatic Mapping Between Large-Scale Heterogeneous Language Resources for NLP Applications: A Case of Sejong Semantic Classes and KorLexNoun for Korean

  • Park, Heum;Yoon, Ae-Sun
    • Language and Information / v.15 no.2 / pp.23-45 / 2011
  • This paper proposes a statistics-based linguistic methodology for automatic mapping between large-scale heterogeneous language resources for NLP applications in general. As a particular case, it treats automatic mapping between two large-scale heterogeneous Korean language resources: Sejong Semantic Classes (SJSC) in the Sejong Electronic Dictionary (SJD) and nouns in KorLex. KorLex is a large-scale Korean WordNet, but it lacks syntactic information. SJD contains refined semantic-syntactic information, with semantic labels depending on SJSC, but its list of entry words is much smaller than that of KorLex. The goal of our study is to build a rich language resource by integrating useful information from SJD into KorLex. In this paper, we use both linguistic and statistical methods to construct an automatic mapping methodology. The linguistic aspect of the methodology focuses on three linguistic clues: monosemy/polysemy of word forms, instances (example words), and semantically related words. The statistical aspect of the methodology uses three statistical formulae, $\chi^2$, Mutual Information, and Information Gain, to obtain candidate synsets. Compared with manual mapping, the automatic mapping based on our proposed statistical linguistic methods performs well in terms of correctness, with recall 0.838, precision 0.718, and F1 0.774.

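The three association measures named in the abstract above can be illustrated with a minimal sketch. It assumes each (semantic class, candidate synset) pair is summarized by a 2x2 contingency table; the table layout, the function names, and the toy counts are hypothetical and are not taken from the paper, and pointwise mutual information is used here as one common reading of "Mutual Information".

```python
import math

# Hypothetical 2x2 table for one (semantic class, candidate synset) pair:
#   a = words in the synset and in the class
#   b = words in the synset but not in the class
#   c = words in the class but not in the synset
#   d = words in neither


def chi_square(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def pmi(a, b, c, d):
    """Pointwise mutual information (base 2) of 'in synset' and 'in class'."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")
    return math.log2(a * n / ((a + b) * (a + c)))


def entropy(counts):
    """Shannon entropy (base 2) of a list of counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(x / total * math.log2(x / total) for x in counts if x > 0)


def information_gain(a, b, c, d):
    """Reduction in class entropy once synset membership is known."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    h_class = entropy([a + c, b + d])
    h_given = (a + b) / n * entropy([a, b]) + (c + d) / n * entropy([c, d])
    return h_class - h_given


def rank_candidates(tables, score=chi_square):
    """Order candidate synsets by descending association score."""
    return sorted(tables, key=lambda s: score(*tables[s]), reverse=True)


# Toy counts, invented for illustration only.
tables = {"synset_A": (8, 2, 2, 88), "synset_B": (3, 7, 7, 83)}
ranking = rank_candidates(tables)
```

Passing `score=pmi` or `score=information_gain` to `rank_candidates` ranks the same candidates by the other two measures; how the paper combines the three scores is not stated in the abstract.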

Development of Sensibility Vocabulary Classification System for Sensibility Evaluation of Visitors According to Forest Environment

  • Lee, Jeong-Do;Joung, Dawou;Hong, Sung-Jun;Kim, Da-Young;Park, Bum-Jin
    • Journal of People, Plants, and Environment / v.22 no.2 / pp.209-217 / 2019
  • Human sensibility is generally expressed in language. To discover the sensibility of visitors in relation to the forest environment, it is first necessary to determine the exact meanings of the words they use. Furthermore, it is necessary to sort these terms according to their meanings based on an appropriate classification system. This study attempted to develop a classification system for forest sensibility vocabulary by extracting Korean words used by forest visitors to express their sensibilities in relation to the forest environment, and established the structure of the system to classify the accumulated vocabulary. For this purpose, we extracted forest sensibility words based on a literature review of experiences reported in the past as well as interviews with forest visitors, and categorized the words by meaning using the Standard Korean Language Dictionary maintained by the National Institute of the Korean Language. Next, the classification system for these words was established with reference to the classification system for Korean vocabulary examined in previous studies of Korean language and literature. As a result, 137 forest sensibility words were collected using a documentary survey, and we categorized these words into four types: emotion, sense, evaluation, and existence. Categorizing the collected forest sensibility words based on this Korean language classification system resulted in the extraction of 40 representative sensibility words. This experiment enabled us to determine where the sensibilities expressed in the forest derive from, that is, from sight, hearing, smell, taste, or touch, along with various other aspects of how human sensibilities are expressed, such as whether the subject of a word is person-centered or object-centered. We believe that the results of this study can serve as foundational data about forest sensibility.

Building a Korean-English Parallel Corpus by Measuring Sentence Similarities Using Sequential Matching of Language Resources and Topic Modeling (언어 자원과 토픽 모델의 순차 매칭을 이용한 유사 문장 계산 기반의 위키피디아 한국어-영어 병렬 말뭉치 구축)

  • Cheon, JuRyong;Ko, YoungJoong
    • Journal of KIISE / v.42 no.7 / pp.901-909 / 2015
  • In this paper, we propose a method for finding similar sentences based on language resources and topic modeling, in order to build a Korean-English parallel corpus from Wikipedia. We first applied language resources (a Wiki-dictionary, numbers, and the Daum online dictionary) to match words sequentially. We constructed the Wiki-dictionary from Wikipedia titles. To take advantage of Wikipedia, we used the translation probabilities in the Wiki-dictionary for word matching. In addition, we improved the accuracy of the sentence similarity measure by using word distributions based on topic modeling. In the experiment, a previous study achieved an F1-score of 48.4% with only language resources based on linear combination, and 51.6% when topic modeling over entire word distributions was added. Our proposed sequential matching, which adds translation probability to the language resources, achieved 58.3%, 9.9 percentage points better than the previous study. When sequential matching of language resources was combined with topic modeling that considers important word distributions, the proposed system achieved 59.1%, 7.5 percentage points better than the previous study.
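
The idea of matching words sequentially against several language resources can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the lookup functions, the toy dictionaries, and the matched-fraction score are all hypothetical, and the translation probabilities and topic-model component described in the abstract are left out.

```python
def make_dict_lookup(dictionary):
    """Wrap a {source word: [translations]} mapping as a lookup function."""
    return lambda token: dictionary.get(token, [])


def number_lookup(token):
    """Numbers translate to themselves."""
    return [token] if token.isdigit() else []


def sequential_match_score(src_tokens, tgt_tokens, resources):
    """Fraction of source tokens matched in the target sentence.

    Each token is tried against the resources in order; the first
    resource that yields a translation present in the target wins,
    and later resources are not consulted for that token.
    """
    if not src_tokens:
        return 0.0
    target = {t.lower() for t in tgt_tokens}
    matched = 0
    for token in src_tokens:
        for lookup in resources:
            if any(c.lower() in target for c in lookup(token)):
                matched += 1
                break
    return matched / len(src_tokens)


# Toy resources, invented for illustration only.
wiki_dictionary = {"서울": ["Seoul"]}
online_dictionary = {"인구": ["population"]}
resources = [
    make_dict_lookup(wiki_dictionary),  # tried first
    number_lookup,                      # then literal numbers
    make_dict_lookup(online_dictionary),  # then the fallback dictionary
]

score = sequential_match_score(
    ["서울", "인구", "2015", "증가"],
    ["Seoul", "population", "grew", "in", "2015"],
    resources,
)
```

Here three of the four source tokens find a match, so `score` is 0.75. The order of `resources` encodes which resource is trusted first.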

Automatic Acquisition of Lexical-Functional Grammar Resources from a Japanese Dependency Corpus

  • Oya, Masanori;Genabith, Josef Van
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.375-384 / 2007
  • This paper describes a method for automatic acquisition of wide-coverage treebank-based deep linguistic resources for Japanese, as part of a project on treebank-based induction of multilingual resources in the framework of Lexical-Functional Grammar (LFG). We automatically annotate the Kyoto Text Corpus version 4.0 (KTC4) (Kurohashi and Nagao 1997) and the output of the Kurohashi-Nagao Parser (KNP) (Kurohashi and Nagao 1998), a dependency parser for Japanese, with LFG f-structure functional equations (i.e. labelled dependencies). The original KTC4 and KNP provide unlabelled dependencies. Our method also includes zero pronoun identification. The performance of the f-structure annotation algorithm with zero-pronoun identification for KTC4 is evaluated against a manually corrected Gold Standard of 500 sentences randomly chosen from KTC4 and results in a pred-only dependency f-score of 94.72%. The parsing experiments on KNP output yield a pred-only dependency f-score of 82.08%.

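A dependency f-score of the kind reported above can be computed as sketched below. The sketch assumes that both the gold standard and the system output have been flattened into sets of (relation, head, dependent) triples; that representation, the function name, and the toy triples are assumptions for illustration and do not reproduce the evaluation software used in the paper.

```python
def dependency_f_score(gold, predicted):
    """Precision, recall, and f-score over labelled dependency triples."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy triples, invented for illustration only.
gold = {
    ("subj", "read", "student"),
    ("obj", "read", "book"),
    ("adjunct", "read", "yesterday"),
    ("adjunct", "book", "new"),
}
predicted = {
    ("subj", "read", "student"),
    ("obj", "read", "book"),
    ("adjunct", "read", "yesterday"),
    ("obj", "book", "new"),  # wrong label, so it counts as an error
}

precision, recall, f_score = dependency_f_score(gold, predicted)
```

With three of four triples correct on both sides, precision, recall, and f-score all come out to 0.75. A triple counts as correct only if relation, head, and dependent all match, which is what distinguishes labelled from unlabelled dependency evaluation.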

Collaborative Social Tagging for eBook using External DSL Approach

  • Yoo, Hwan-Soo;Kim, Seong-Whan
    • Proceedings of the Korea Information Processing Society Conference / 2014.11a / pp.1068-1072 / 2014
  • We propose collaborative social tagging for eBooks using an external DSL approach. The goals of this paper are (1) to provide a DSL with which authors can write HTML5 rich-content eBooks and tag resources, (2) to let users enhance a book easily by tagging resources, (3) to let readers read rich books easily regardless of device type, and (4) to provide eBook resources with RESTful-style addresses by which other systems can identify self-descriptive resources of a book. To achieve these goals, we provide the Bukle DSL, with which authors and users can author and enhance eBooks with ease. As a domain-specific language, Bukle provides a simple yet expressive language for authoring and tagging books that would otherwise be more difficult to express in a general-purpose language. Further work includes a visual DSL approach and tools with which unskilled users could tag books easily, as well as a text-to-visual DSL transformation engine. UX research on tagging and authoring books is also required. To tackle these questions, we are looking at visual notation, focusing on visual syntax.

Korean Nominal Bank, Using Language Resources of Sejong Project (세종계획 언어자원 기반 한국어 명사은행)

  • Kim, Dong-Sung
    • Language and Information / v.17 no.2 / pp.67-91 / 2013
  • This paper describes the Korean Nominal Bank, a project that provides argument structure for instances of predicative nouns in the Sejong parsed corpus. We use the language resources of the Sejong project so that the same data set is annotated with more and more levels of annotation, since a new kind of language resource project could otherwise produce new information that is processed separately and in isolation. We base the annotation scheme on the Sejong electronic dictionary, the semantically tagged corpus, and the syntactically analyzed corpus. Our work also involves deep linguistic knowledge of the syntax-semantics interface in general. We consider semantic theories including the Frame Semantics of Fillmore (1976), the argument structure of Grimshaw (1990), and the argument alternations of Levin (1993) and Levin and Rappaport Hovav (2005). Various syntactic theories are needed to explain various sentence types, including empty categories, raising, and left (or right) dislocation. We also need an account of idiosyncratic lexical features such as collocation.

Teaching Chinese through Drama to University Students for Language Skills (드라마 「신조협려(神雕俠侶)」를 활용한 대학 중국어 교육)

  • Choi, Tae-hoon
    • Cross-Cultural Studies / v.31 / pp.415-438 / 2013
  • This paper explores how to teach Chinese using multimedia resources such as Chinese dramas, focusing on one drama based on Jin Yong's work, The Return of the Condor Heroes. The purpose of this study is to develop teaching methodologies that help university students learning Chinese through drama integrate language skills, enhancing communicative competence and understanding of Chinese cultures. First, the overview of previous studies provides several cases of foreign language education using drama. Teaching Chinese through drama can be an integrative form of education because students can develop their communicative competence as well as understand the cultures of the target language. In other words, the contexts of drama may offer rich sources on the history of China, Han Chinese ethnocentrism, and knowledge of Chinese literature as well as geography. Second, this study applies the principles of Tomlinson (2010) for materials development in language teaching to the case of Chinese drama. It concentrates on Jin Yong's The Return of the Condor Heroes, which the author has used in Chinese language courses for three years. It examines the characteristics of the drama in order to develop effective ways of teaching and learning Chinese language and culture. Furthermore, it discusses the impact of using drama on students' widespread perception that Chinese classical literature is unnecessary. Third, this paper presents some sample lessons that may help teachers understand how to organize lessons through drama. Finally, it presents university students' opinions about using drama to learn Chinese.

Construction of Text Summarization Corpus in Economics Domain and Baseline Models

  • Sawittree Jumpathong;Akkharawoot Takhom;Prachya Boonkwan;Vipas Sutantayawalee;Peerachet Porkaew;Sitthaa Phaholphinyo;Charun Phrombut;Khemarath Choke-mangmi;Saran Yamasathien;Nattachai Tretasayuth;Kasidis Kanwatchara;Atiwat Aiemleuk;Thepchai Supnithi
    • Journal of information and communication convergence engineering / v.22 no.1 / pp.33-43 / 2024
  • Automated text summarization (ATS) systems rely on language resources as datasets. However, creating these datasets is a complex and labor-intensive task requiring linguists to extensively annotate the data. Consequently, certain public datasets for ATS, particularly in languages such as Thai, are not as readily available as those for the more popular languages. The primary objective of the ATS approach is to condense large volumes of text into shorter summaries, thereby reducing the time required to extract information from extensive textual data. Owing to the challenges involved in preparing language resources, publicly accessible datasets for Thai ATS are relatively scarce compared to those for widely used languages. The goal is to produce concise summaries and accelerate the information extraction process using vast amounts of textual input. This study introduced ThEconSum, an ATS architecture specifically designed for the Thai language, using economy-related data. An evaluation of this research revealed the significant remaining tasks and limitations for the Thai language.