• Title/Summary/Keyword: 용어추출 (term extraction)


A Method for Extracting Relationships Between Terms Using Pattern-Based Technique (패턴 기반 기법을 사용한 용어 간 관계 추출 방법)

  • Kim, Young Tae;Kim, Chi Su
    • KIPS Transactions on Software and Data Engineering / v.7 no.8 / pp.281-286 / 2018
  • As information grows more complex, varied, and abundant, interest in ontologies as a means of extracting meaningful search results from massive data has been rising. Many methods have been proposed for extracting an ontology from natural-language text, but the output of most current methods is not consistent with the structure of an ontology. In this paper, we propose a method that automatically creates an ontology by identifying the terms needed to build it from domain-specific text and extracting various relationships between those terms with a pattern-based method. To extract the relationships, we propose reducing the size of the search space by taking the matching set of patterns into account and by combining a join-set concept with a pattern array. As a result, the method reduces the search space by 50-95% without removing any useful patterns from it.

A Development of Reference Terminology Subset Editor for effective adaption of Clinical Vocabulary (임상용어의 효율적 적용을 위한 참조용어 Subset 에디터의 개발)

  • Cho, Hune;Kim, Hyung-Hoi;Choi, Byung-Guan;Choi, Young-Yeon;Kim, Hwa-Sun;Hong, Hae-Sook
    • Journal of Korea Multimedia Society / v.11 no.3 / pp.364-372 / 2008
  • No single medical terminology system covers all medical concepts, so in clinical practice it is highly useful to apply appropriate medical terms to every area of the electronic medical record (EMR) and link them effectively. To use standardized terms conveniently and efficiently, they must be categorized according to the purposes of individual departments or physicians, yielding organized subsets of the terms most likely to be used. Such a subset must also allow the standardized terminology system to be changed or corrected, and must continue to be developed and upgraded to meet users' new demands. In this paper, data including chief complaints, symptoms, diagnoses, operations, and histories of previous treatment were collected from the discharge summaries of patients in the Department of Neurosurgery at Busan National University Hospital for analysis. A subset database was then created; for terms that needed to be added, physicians performed the mapping directly through a connection to a reference terminology server, and a subset editor was developed for creating new subset databases. This approach is expected to serve as a practical and effective management method that reduces the problems and inefficiency caused by the existing vast terminology systems.


An Analyses of the Terms used in the Information Boards of Geosites at Jeonbuk West Coast National Geopark (전북 서해안권 국가지질공원 지질명소 안내 표지판에 사용된 용어 분석)

  • Shin, Young-Jun;Cho, Kyu-Seong
    • Journal of the Korean earth science society / v.41 no.1 / pp.40-47 / 2020
  • This study analyzed the terms used on the information boards of geosites at Jeonbuk West Coast National Geopark. Nouns were extracted from the text of the boards, checked against the Standard Korean Language Dictionary, an earth science glossary, and the data for textbook development under the 2015 revised curriculum, and classified into eight types. Seventy-one nouns (10.8%) of the extracted terms were not listed in any glossary. Most of these were compound words formed as [noun]+[noun] or [noun]+[affix], which made them hard to comprehend. In addition, two hundred fifty-six nouns (46%) of the terms were identified as jargon used in specific disciplines. It is therefore strongly suggested that, when National Geopark information boards are created, terms containing academic jargon be annotated so that general visitors and students can understand them without difficulty.

Construction of Test Collection for Extraction of Biomedical PLOT & Relations (생의학분야 PLOT 및 관계추출을 위한 테스트컬렉션 구축)

  • Choi, Yun-Soo;Choi, Sung-Phl;Jeong, Chang-Hoo
    • Proceedings of the Korea Contents Association Conference / 2010.05a / pp.425-427 / 2010
  • Large-scale information extraction consists of named-entity recognition, terminology extraction, and relation extraction. Because these elementary technologies have so far been studied independently, the test collections for the related machine learning models have also been constructed independently. As a result, it is difficult to process scientific documents so as to extract both named entities and technical terms at once. In this study, we integrate named entities and terminologies as PLOT (Person, Location, Organization, Terminology) in the biomedical domain and construct a test collection of PLOTs and the relations between them.


A Study on Keyword Extraction From a Single Document Using Term Clustering (용어 클러스터링을 이용한 단일문서 키워드 추출에 관한 연구)

  • Han, Seung-Hee
    • Journal of the Korean Society for Library and Information Science / v.44 no.3 / pp.155-173 / 2010
  • This study applies a new keyword extraction algorithm based on term clustering to a single document. The document is divided into multiple passages, and two ways of calculating the similarity between two terms are investigated: first-order similarity and second-order distributional similarity. In the first experiment, the best clustering performance was achieved with 50-term passages and second-order distributional similarity. Based on those results, second-order distributional similarity was then applied to various keyword extraction methods that use statistical information about terms. In the second experiment, pf (paragraph frequency) and $tf \times ipf$ (term frequency by inverse paragraph frequency) were found to improve the overall performance of keyword extraction. The results show that the algorithm satisfies the conditions that good keywords should meet.
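
The tf × ipf weighting named in this abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the definition used here, ipf = log(N / pf) by analogy with the usual inverse document frequency, is an assumption, and the function name `tf_ipf` and the sample passages are hypothetical.

```python
import math
from collections import Counter


def tf_ipf(passages):
    """Score each term of a single document by tf x ipf.

    passages: list of token lists, one list per passage of the document.
    Assumption (not stated in the abstract): ipf = log(N / pf), where N is
    the number of passages and pf the number of passages containing the term.
    """
    n = len(passages)
    # tf: total occurrences of each term across the whole document
    tf = Counter(tok for passage in passages for tok in passage)
    # pf: number of passages in which each term appears at least once
    pf = Counter(tok for passage in passages for tok in set(passage))
    return {term: tf[term] * math.log(n / pf[term]) for term in tf}


# Hypothetical three-passage document
passages = [
    ["term", "extraction", "term"],
    ["keyword", "extraction"],
    ["term", "clustering"],
]
scores = tf_ipf(passages)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Under this assumed definition, a term that occurs in every passage scores zero, so the weighting favors terms that are frequent overall but concentrated in a few passages.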

A Study on the Integration of Recognition Technology for Scientific Core Entities (과학기술 핵심개체 인식기술 통합에 관한 연구)

  • Choi, Yun-Soo;Jeong, Chang-Hoo;Cho, Hyun-Yang
    • Journal of the Korean Society for Information Management / v.28 no.1 / pp.89-104 / 2011
  • Large-scale information extraction plays an important role in advanced information retrieval as well as in question answering and summarization. Information extraction can be defined as the process of converting unstructured documents into formalized, tabular information; it consists of named-entity recognition, terminology extraction, coreference resolution, and relation extraction. Because these elementary technologies have so far been studied independently, integrating all the processes needed for information extraction is not trivial, owing to the diversity of their input/output formats and operating environments. As a result, it is difficult to process scientific documents so as to extract both named entities and technical terms at once. To extract these entities automatically and together from scientific documents, we developed a framework for scientific core entity extraction that brings together all the pivotal language processors, a named-entity recognizer, and a terminology extractor.

Ontology construction for Korea Telecom(KT) Terms (KT 용어 온토로지 구축)

  • Roh, Duck-Keun;Byun, Dong-Ryul;Park, Soon-Cheol
    • Proceedings of the Korean Information Science Society Conference / 2007.10d / pp.550-555 / 2007
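  • In this paper, we extracted the key terms used at Korea Telecom (KT) and constructed a term ontology based on the uniqueness of the extracted terms and the relationships among them. Through example search queries over the resulting term ontology, we also explored ways in which it could help manage the various areas of an enterprise. The ontology editor Protege was used as the construction tool, and the ontology was organized under four top-level classes: Organization, Employee, Product, and Technique. Building on this study, KT's diverse knowledge and information can be systematized and the KT database can be managed effectively. We also expect the constructed ontology to serve as a foundation for a future KT semantic search system.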


A Study on Thesaurus Expansion through Definitions of Terms and Extraction of Relationships (용어정의와 관계추출을 통한 시소러스 확장에 관한 연구)

  • Kim, Ji-Hun;Kim, Tae-Soo
    • Journal of the Korean Society for Library and Information Science / v.40 no.1 / pp.293-314 / 2006
  • To keep terms consistent in the information retrieval process, a thesaurus must present the meaning of its terms clearly. Most thesauri have therefore presented the meaning of terms through basic relationships or scope notes. Recently, however, thesauri that include definitions standardized in both content and form have been proposed. This study created standardized definitions and extracted relationships from the content of the definition models. An expanded thesaurus was then constructed by integrating the standardized definitions and extracted relationships into the existing thesaurus, replacing existing entries where appropriate. The results show a possibility for the further development of thesauri.

Machine-Learning Based Biomedical Term Recognition (기계학습에 기반한 생의학분야 전문용어의 자동인식)

  • Oh Jong-Hoon;Choi Key-Sun
    • Journal of KIISE:Software and Applications / v.33 no.8 / pp.718-729 / 2006
  • There has been increasing interest in automatic term recognition (ATR), which recognizes technical terms in domain-specific texts. ATR is composed of 'term extraction', which extracts candidate technical terms, and 'term selection', which decides whether the terms in the list produced by 'term extraction' are technical terms or not. 'Term selection' ranks the term list according to features of technical terms and finds the boundary between technical terms and general terms. Previous work used only statistical features of terms for 'term selection', which limits how effectively technical terms can be selected from a term list. The objective of this paper is to find effective features for 'term selection' by considering various aspects of technical terms. To solve the ranking problem, we derive various features of technical terms and combine them using machine-learning algorithms. To solve the boundary-finding problem, we define it as a binary classification problem that classifies each term in a term list as a technical term or a general term. Experiments show that our method achieves 78-86% precision and 87-90% recall in boundary finding, and 89-92% 11-point precision in ranking. Moreover, our method outperforms previous work by up to about 26%.
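
For readers unfamiliar with the ranking measure reported above, the following is a minimal sketch of 11-point interpolated average precision as it is commonly defined in information retrieval. The abstract does not say exactly how its "11-point precision" was computed, so treating it as this standard measure is an assumption; the function name and the sample ranking are hypothetical.

```python
def eleven_point_precision(ranked_labels):
    """11-point interpolated average precision for one ranked term list.

    ranked_labels: list of booleans in rank order, True where the term at
    that rank is a real technical term. Assumes the standard IR definition:
    interpolated precision at recall level r is the maximum precision seen
    at any rank whose recall is >= r, averaged over r = 0.0, 0.1, ..., 1.0.
    """
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        return 0.0
    points = []  # (recall, precision) measured after each rank
    hits = 0
    for rank, is_relevant in enumerate(ranked_labels, start=1):
        hits += is_relevant
        points.append((hits / total_relevant, hits / rank))
    levels = [i / 10 for i in range(11)]
    interpolated = [
        max((p for r, p in points if r >= level), default=0.0)
        for level in levels
    ]
    return sum(interpolated) / len(levels)


# Hypothetical ranking: technical terms at ranks 1 and 3, general terms at 2 and 4
score = eleven_point_precision([True, False, True, False])
```

A ranking that places every technical term ahead of every general term scores 1.0 under this definition.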

The Analysis on Research Trends for Computational Thinking in Korea : Based on Terminology of CT (Computational Thinking(CT) 관련 국내 연구 동향 분석 : CT 용어 사용을 중심으로)

  • Han, Jeong-Min;Kim, Seong-Won;Lee, Young-Jun
    • Proceedings of the Korean Society of Computer Information Conference / 2017.07a / pp.223-226 / 2017
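  • As software education expands, the importance of computational thinking (CT) is growing, yet CT is referred to by a variety of terms that are used interchangeably. This situation makes CT research difficult. To help standardize CT terminology, this study therefore analyzed the terms for CT used in CT-related research. From prior studies, the following keywords were extracted: '컴퓨팅 사고(력)' (computing thinking), 'computational thinking(CT)', '계산적 사고(력)' (computational thinking), '알고리즘적 사고(력)' (algorithmic thinking), '컴퓨터적 사고(력)' (computer-like thinking), '컴퓨터 과학적 사고(력)' (computer-scientific thinking), '정보적 사고(력)' (informational thinking), and '정보 과학적 사고(력)' (information-scientific thinking). Using these keywords, CT-related papers were collected from the Research Information Sharing Service (RISS), and the 123 papers whose titles contained a CT-related term were selected as the final subjects of the study. The analysis showed that CT-related research has grown steadily from 2008 to the present, with a particularly sharp increase between 2014 and 2015. It also confirmed that, after a period in which several terms for CT coexisted, consensus on the Korean term is converging on '컴퓨팅 사고력'. Taking this as a starting point, we propose that future studies examine CT research trends in terms of research topics, research methods, and research subjects.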
