• Title/Summary/Keyword: natural language


Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0

  • Kim, Sunho;Kim, Royoung;Nam, Hee-Jo;Kim, Ryeo-Gyeong;Ko, Enjin;Kim, Han-Su;Shin, Jihye;Cho, Daeun;Jin, Yurhee;Bae, Soyeon;Jo, Ye Won;Jeong, San Ah;Kim, Yena;Ahn, Seoyeon;Jang, Bomi;Seong, Jiheyon;Lee, Yujin;Seo, Si Eun;Kim, Yujin;Kim, Ha-Jeong;Kim, Hyeji;Sung, Hye-Lynn;Lho, Hyoyoung;Koo, Jaywon;Chu, Jion;Lim, Juwon;Kim, Youngju;Lee, Kyungyeon;Lim, Yuri;Kim, Meongeun;Hwang, Seonjeong;Han, Shinhye;Bae, Sohyeun;Kim, Sua;Yoo, Suhyeon;Seo, Yeonjeong;Shin, Yerim;Kim, Yonsoo;Ko, You-Jung;Baek, Jihee;Hyun, Hyejin;Choi, Hyemin;Oh, Ji-Hye;Kim, Da-Young;Park, Hyun-Seok
    • Genomics & Informatics / v.18 no.3 / pp.33.1-33.7 / 2020
  • This paper describes a community effort to improve earlier versions of the full-text corpus of Genomics & Informatics by semi-automatically detecting and correcting PDF-to-text conversion errors and optical character recognition errors during the first Genomics & Informatics Annotation Hackathon (GIAH) event. Extracting text from multi-column biomedical documents such as Genomics & Informatics is notoriously difficult. The hackathon was piloted as part of a coding competition of the ELTEC College of Engineering at Ewha Womans University to enable researchers and students to create or annotate their own versions of the Genomics & Informatics corpus, to gain and create knowledge about corpus linguistics, and at the same time to acquire tangible and transferable skills. The projects proposed during the hackathon harness an internal database containing different versions of the corpus and annotations.

Comparison of Application Effect of Natural Language Processing Techniques for Information Retrieval (정보검색에서 자연어처리 응용효과 분석)

  • Xi, Su Mei;Cho, Young Im
    • Journal of Institute of Control, Robotics and Systems / v.18 no.11 / pp.1059-1064 / 2012
  • Several natural language processing techniques have been applied to information retrieval, but the results are known to be unsatisfactory. To identify the roles of some classical natural language processing techniques in information retrieval and to determine which performs better, we compared the effects of various natural language techniques on retrieval precision. The experimental results show that basic natural language processing techniques, which have low computational cost and are simple to implement, help information retrieval slightly. More complex natural language processing techniques, which have high computational cost and low precision, do not improve retrieval precision and may even harm it, so the role of natural language understanding may be larger in question answering systems, automatic summarization, and information extraction.

Korean Dependency Parsing Using Sequential Parsing Method Based on Pointer Network (순차적 구문 분석 방법을 반영한 포인터 네트워크 기반의 한국어 의존 구문 분석기)

  • Han, Janghoon;Park, Yeongjoon;Jeong, Younghoon;Lee, Inkwon;Han, Jungwook;Park, Seojun;Kim, Juae;Seo, Jeongyeon
    • Annual Conference on Human and Language Technology / 2019.10a / pp.533-536 / 2019
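  • Dependency parsing is the task of analyzing the dependency relations between the constituents of a sentence, and it is one of the representative tasks in natural language understanding. In this paper, to improve the performance of Korean dependency parsing, we apply a Deep Bi-Affine Network and a Left to Right Dependency Parser, and we propose a new Right to Left Dependency Parser model that reflects the linguistic characteristics of Korean. We apply ELMo and BERT embeddings as the methods for generating word representations in the three dependency parsing models and ensemble several kinds of models, obtaining UAS 94.50 and LAS 92.46 on the Sejong dependency parsing data.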


Analyzing Morpheme of the Natural Language to Express the Symptoms of Korean Medicine (한의학 증상용어의 형태소 분석을 위한 자연어 표기 분석)

  • Kim, Hye-Eun;Sung, Ho-Kyung;Eom, Dong-Myung;Lee, Choong-Yeol;Lee, Byung-Wook
    • Journal of Society of Preventive Korean Medicine / v.17 no.2 / pp.179-187 / 2013
  • Objectives: In many cases, patients' symptoms are recorded in EMRs in natural language rather than in medical terminology. It is possible to build a database by analyzing the Korean Medicine (KM) symptom terms that describe patients' symptoms in natural language. Using the database, when doctors record patients' symptoms in an EMR in natural language, it will conversely be possible to extract KM symptoms from that natural language. The database will enhance the value of the EMR as medical data. Methods: In this study, we aimed to build a data structure for the terminology that represents KM symptoms. The data structure consists of combinations of the smallest units of natural language. We built the database by analyzing the morphemes of the natural language used to express KM symptoms. Results & Conclusions: By classifying the natural language into 15 features, we produced a concept structure and data available for morphological analysis.

A Survey of Automatic Code Generation from Natural Language

  • Shin, Jiho;Nam, Jaechang
    • Journal of Information Processing Systems / v.17 no.3 / pp.537-555 / 2021
  • Many researchers have carried out studies related to programming languages since the beginning of computer science. Besides programming with traditional programming languages (e.g., procedural, object-oriented, and functional languages), a new programming paradigm is emerging: programming with natural language. We expect that programming with natural language will free our expressiveness, in contrast to programming languages, which have strong syntactic constraints. This paper surveys the approaches that generate source code automatically from a natural language description. We also categorize the approaches by their forms of input and output. Finally, we analyze the current trend of approaches and suggest future directions for this research domain to improve automatic code generation with natural language. From the analysis, we state that researchers should work on customizing language models in the domain of source code and explore better representations of source code, such as embedding techniques and pre-trained models, which have been proved to work well on natural language processing tasks.

On the Analysis of Natural Language Processing Morphology for the Specialized Corpus in the Railway Domain

  • Won, Jong Un;Jeon, Hong Kyu;Kim, Min Joong;Kim, Beak Hyun;Kim, Young Min
    • International Journal of Internet, Broadcasting and Communication / v.14 no.4 / pp.189-197 / 2022
  • Today, we are exposed to various text-based media such as newspapers, Internet articles, and SNS, and the amount of text data we encounter has increased exponentially with the recent availability of Internet access on mobile devices such as smartphones. Collecting useful information from large amounts of text is called text analysis, and, with the recent development of artificial intelligence, it is performed using technologies such as Natural Language Processing (NLP). For this purpose, morpheme analyzers based on everyday language have been released and are in use. Pre-trained language models, which acquire natural language knowledge through unsupervised learning on large corpora, have recently become very common in natural language processing, but conventional morpheme analyzers are of limited use in specialized fields. In this paper, as preliminary work toward developing a natural language analysis model specialized in the railway field, we present a procedure for constructing a corpus specialized in the railway field.

A study on Implementation of English Sentence Generator using Lexical Functions (언어함수를 이용한 영문 생성기의 구현에 관한 연구)

  • 정희연;김희연;이웅재
    • Journal of Internet Computing and Services / v.1 no.2 / pp.49-59 / 2000
  • The majority of work done to date on natural language processing has focused on the analysis and understanding of language, so natural language generation has received relatively less attention than understanding. People even tend to regard natural language generation as a simple reverse process of language understanding. However, the need for natural language generation is growing rapidly as application systems, especially multi-language machine translation systems on the web, natural language interface systems, and natural language query systems, need to generate more complex messages. In this paper, we propose an algorithm to generate more flexible and natural sentences using the lexical functions of Igor Mel'čuk (Mel'čuk & Zholkovsky, 1988) and systemic grammar.


Korean Dependency Parsing Using Deep Bi-affine Network and Stack Pointer Network (Deep Bi-affine Network와 스택 포인터 네트워크를 이용한 한국어 의존 구문 분석 시스템)

  • Ahn, Hwijeen;Park, Chanmin;Seo, Minyoung;Lee, Jaeha;Son, Jeongyeon;Kim, Juae;Seo, Jeongyeon
    • Annual Conference on Human and Language Technology / 2018.10a / pp.689-691 / 2018
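  • Dependency parsing is one of the representative tasks in natural language understanding. In this paper, we propose an ensemble model of a Deep Bi-affine Network and a stack pointer network to improve the performance of Korean dependency parsing. The Bi-affine model is graph-based, while the stack pointer network uses the advantages of both graph-based and transition-based approaches, so an ensemble of these different models can be expected to improve performance. Both models use features that take the characteristics of Korean eojeol (word units) into account, and on the Sejong dependency parsing data they achieved UAS 90.60 / LAS 88.26 (Deep Bi-affine Network) and UAS 92.17 / LAS 90.08 (stack pointer network). Applying an ensemble technique to the two models yielded a further performance improvement.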


Korean Natural Language Inference with Natural Language Explanations (Natural Language Explanations 에 기반한 한국어 자연어 추론)

  • Jun-Ho Yoon;Seung-Hoon Na
    • Annual Conference on Human and Language Technology / 2022.10a / pp.170-175 / 2022
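  • In general, large language models have improved their label prediction performance by pre-training on large amounts of data over long periods. As the label prediction accuracy of language models has risen recently, generating highly reliable Natural Language Explanations (NLE) for understanding why a language model made a given decision has become an increasingly important element. In this paper, we introduce Natural-language Inference over Label-specific Explanations (NILE)[1], which presented a novel natural language inference system that generates highly reliable explanations for a language model's predictions while maintaining high label accuracy, and we use a Korean dataset to compare the performance of NILE with that of a general natural language inference task that does not use NLE.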


A Study of Improving the Flexibility and Effectiveness of Natural Language Understanding Considering Natural Language Classification Methodologies (Machine에 의한 자연 언어 이해의 효과성 및 탄력성 중대를 위한 자연언어 이해 기법과 분류 기법과 연결적 통합 사용에 대한 연구)

  • 이현부
    • Journal of the Korean Institute of Intelligent Systems / v.1 no.3 / pp.20-32 / 1991
  • This study seeks a way of dealing with unformatted natural language using fuzzy set theory. The goal of the study is to establish a framework for an effective language understanding system that is linked to a language classification system. This study has found that language understanding is strongly influenced by language classification. It also shows that the precision of language classification depends on how the language is classified in advance. In this study, fuzzy logic was used to improve the precision of language classification. It was considered that fuzzy logic might be able to distinctly classify natural language texts into pertinent homogeneous groups in which the contents of the language were identical. Accordingly, it was expected that the language would be classified precisely by the fuzzy logic. An experimental system was designed to evaluate the performance of a natural language understanding system connected to a fuzzy language classification system. Finally, the experiment suggests that successful language understanding requires real-time interaction between man and machine together with prior fuzzy language classification.
