• Title/Summary/Keyword: terminology extraction

Construction of Test Collection for Extraction of Biomedical PLOT & Relations (생의학분야 PLOT 및 관계추출을 위한 테스트컬렉션 구축)

  • Choi, Yun-Soo;Choi, Sung-Pil;Jeong, Chang-Hoo
    • Proceedings of the Korea Contents Association Conference / 2010.05a / pp.425-427 / 2010
  • Large-scale information extraction consists of named-entity recognition, terminology extraction, and relation extraction. Because these component technologies have so far been studied independently, the test collections for the related machine learning models have also been constructed independently. As a result, it is difficult to extract both named entities and technical terms from scientific documents at once. In this study, we integrate named entities and terminology into PLOT (Person, Location, Organization, Terminology) for the biomedical domain and construct a test collection of PLOTs and the relations between them.
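The abstract does not describe the annotation format of the PLOT test collection. As a minimal sketch, assuming character-offset spans and directed, labeled relations (the class names `Entity`, `Relation`, and `AnnotatedDocument`, the relation labels, and the example sentence are all hypothetical), a PLOT-annotated document could be represented like this:

```python
from dataclasses import dataclass, field

# The four PLOT entity types named in the abstract.
PLOT_TYPES = {"Person", "Location", "Organization", "Terminology"}


@dataclass(frozen=True)
class Entity:
    """A character-offset span in a document, tagged with one PLOT type."""
    start: int
    end: int  # exclusive
    type: str


@dataclass(frozen=True)
class Relation:
    """A directed, labeled relation between two annotated entities."""
    head: Entity
    tail: Entity
    label: str


@dataclass
class AnnotatedDocument:
    """Hypothetical container holding entity and relation layers together."""
    text: str
    entities: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def add_entity(self, start: int, end: int, type_: str) -> Entity:
        if type_ not in PLOT_TYPES:
            raise ValueError(f"unknown PLOT type: {type_}")
        if not (0 <= start < end <= len(self.text)):
            raise ValueError("span lies outside the document text")
        ent = Entity(start, end, type_)
        self.entities.append(ent)
        return ent

    def add_relation(self, head: Entity, tail: Entity, label: str) -> Relation:
        # Both ends of a relation must already be annotated in this document.
        if head not in self.entities or tail not in self.entities:
            raise ValueError("relation endpoints must be annotated entities")
        rel = Relation(head, tail, label)
        self.relations.append(rel)
        return rel

    def surface(self, ent: Entity) -> str:
        """Return the text covered by an entity span."""
        return self.text[ent.start:ent.end]


# Invented example sentence and hypothetical relation labels.
doc = AnnotatedDocument("Smith studied insulin resistance at Seoul National University.")
person = doc.add_entity(0, 5, "Person")
term = doc.add_entity(14, 32, "Terminology")
org = doc.add_entity(36, 61, "Organization")
doc.add_relation(person, term, "studies")
doc.add_relation(person, org, "affiliated_with")
```

Checking span bounds and relation endpoints at insertion time is one way to keep the entity and relation layers consistent when both live in a single collection, which is the kind of integration the abstract motivates.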

Advanced Procedure and Computing System for Standardization of IEC Terminologies (선진화된 IEC 기술용어 표준화 구축절차 및 전산시스템)

  • Hwang, Humor;Kim, Jung-Hoon;Moon, Bong-Hee
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.3 / pp.388-396 / 2016
  • Through correspondence work with the International Electrotechnical Vocabulary (IEV) in the smart grid and power information technology fields, we analyzed cases in which terms and definitions in the IEV were discussed, and then proposed an advanced procedure and computing system for standardizing International Electrotechnical Commission (IEC) terminology. The standardization procedure consists of separate processes, each with a different structure, for existing terminology, new terminology, and corresponding terminology. An example of standardization work on corresponding terminology is given. The standardization computing system is based on processes for terminology extraction, terminology verification, and terminology management, and can provide a Wikipedia-style terminology search function. To prevent duplicate terminology entries in the IEV, a database search system needs to be developed; we propose the 'IEV_Term_Search' program as such a system. Terminology standardization across the technical committees (TCs), and completion of the IEV to promote cooperation between TC 1 and the other TCs, must be followed by revision and standardization using the standardization computing system.
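The abstract does not specify how 'IEV_Term_Search' detects duplicates, so the following is not that program. It is a minimal sketch of the general idea, assuming a hypothetical normalization rule (Unicode NFKC, case-folding, hyphens and underscores treated as spaces) and an in-memory SQLite table with invented column names: a registry that refuses a new entry whose normalized form already exists.

```python
import sqlite3
import unicodedata


def normalize(term: str) -> str:
    """Hypothetical normalization: NFKC, case-folding, hyphens/underscores
    treated as spaces, and runs of whitespace collapsed to one space."""
    t = unicodedata.normalize("NFKC", term).casefold()
    for ch in "-_":
        t = t.replace(ch, " ")
    return " ".join(t.split())


def open_registry() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The UNIQUE constraint on the normalized key rejects a second entry that
    # differs from an existing one only in case, spacing, or hyphenation.
    conn.execute(
        "CREATE TABLE terms (term TEXT NOT NULL, norm TEXT NOT NULL UNIQUE, tc TEXT)"
    )
    return conn


def find_existing(conn: sqlite3.Connection, term: str):
    """Return the (term, tc) row equivalent to `term`, or None."""
    return conn.execute(
        "SELECT term, tc FROM terms WHERE norm = ?", (normalize(term),)
    ).fetchone()


def register(conn: sqlite3.Connection, term: str, tc: str):
    """Insert the term unless an equivalent one exists.
    Returns the existing (term, tc) row on a duplicate, otherwise None."""
    existing = find_existing(conn, term)
    if existing is not None:
        return existing
    conn.execute("INSERT INTO terms VALUES (?, ?, ?)", (term, normalize(term), tc))
    return None


# "TC-A" and "TC-B" are hypothetical committee labels.
conn = open_registry()
first = register(conn, "smart grid", "TC-A")
dup = register(conn, "Smart-Grid", "TC-B")  # differs only in case and hyphen
```

A real registry would also have to catch synonyms and cross-language equivalents, which string normalization alone cannot detect.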

A Study on the Integration of Recognition Technology for Scientific Core Entities (과학기술 핵심개체 인식기술 통합에 관한 연구)

  • Choi, Yun-Soo;Jeong, Chang-Hoo;Cho, Hyun-Yang
    • Journal of the Korean Society for information Management / v.28 no.1 / pp.89-104 / 2011
  • Large-scale information extraction plays an important role in advanced information retrieval as well as in question answering and summarization. Information extraction can be defined as the process of converting unstructured documents into formalized, tabular information; it consists of named-entity recognition, terminology extraction, coreference resolution, and relation extraction. Because these component technologies have so far been studied independently, integrating all the processes that information extraction requires is not trivial, given the diversity of their input/output formats, approaches, and operating environments. As a result, it is difficult to extract both named entities and technical terms from scientific documents at once. To extract these entities automatically from scientific documents in a single pass, we developed a framework for scientific core entity extraction that integrates the pivotal language processors, including a named-entity recognizer and a terminology extractor.

Automatic Extraction and Usage of Terminology Dictionary Based on Definitional Sentences Patterns in Technical Documents (기술문서 정의문 패턴을 이용한 전문용어사전 자동추출 및 활용방안)

  • Han, Hui-Jeong;Kim, Tae-Young;Doo, Hyo-Chul;Oh, Hyo-Jung
    • Journal of the Korean Society for information Management / v.34 no.4 / pp.81-99 / 2017
  • Technical documents are important research outputs of the knowledge and information society. Using them properly requires advanced information processing techniques such as summarization and information extraction. In this paper, to extract core information, we automatically extracted terms and their definitions based on definitional sentence patterns and the structure of technical documents. On this basis, we proposed a system for building a specialized terminology dictionary. We further suggested personalized services so that users can use the terminology dictionary in various ways as a knowledge memory. The results of this study will allow users to find up-to-date information more quickly and easily. In addition, providing users with a personalized terminology dictionary can maximize the dictionary's value, usability, and retrieval efficiency.
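The definitional sentence patterns used in the paper are not given in the abstract, and its target documents are Korean. The sketch below instead uses three hypothetical English patterns ('X is defined as Y', 'X refers to Y', 'X is a/an Y') and a naive sentence splitter, only to show how pattern matching can turn definitional sentences into term/definition pairs for a dictionary:

```python
import re

# Hypothetical English definitional patterns, tried in order; the first
# pattern that matches a sentence wins.
DEF_PATTERNS = [
    re.compile(r"^(?P<term>[A-Z][\w\- ]{1,60}?) (?:is|are) defined as (?P<definition>.+?)\.$"),
    re.compile(r"^(?P<term>[A-Z][\w\- ]{1,60}?) refers to (?P<definition>.+?)\.$"),
    re.compile(r"^(?P<term>[A-Z][\w\- ]{1,60}?) (?:is|are) (?P<definition>an? .+?)\.$"),
]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_definitions(text):
    """Return {term: definition} for sentences matching a definitional pattern.
    If a term is defined more than once, the first definition is kept."""
    glossary = {}
    for sentence in split_sentences(text):
        for pattern in DEF_PATTERNS:
            match = pattern.match(sentence)
            if match:
                term = match.group("term").strip()
                glossary.setdefault(term, match.group("definition").strip())
                break
    return glossary


# Invented sample text; the last sentence is not definitional.
sample = (
    "MapReduce is a programming model for processing large data sets. "
    "Terminology extraction refers to the automatic identification of domain-specific terms. "
    "Unithood is defined as the degree to which a string forms a stable unit. "
    "The results were promising."
)
glossary = extract_definitions(sample)
```

Because the first matching pattern wins, more specific patterns should be listed before generic ones such as 'X is a Y'.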

Science and Technology Terminology Dictionary Building Process and Workbench Development in Defense Area (국방과학기술 전문용어 사전 구축을 위한 프로세스 및 워크벤치 개발)

  • Choi, Jung-Whoan;Park, Jeong-Ho;Kim, Kyung-Sun;Kim, Pyung
    • The Journal of the Korea Contents Association / v.12 no.8 / pp.420-428 / 2012
  • To improve business efficiency, it is important to standardize the meaning of terminology, and terminology dictionaries are accordingly being actively built and used in various fields. In the defense area, publishing a defense terminology dictionary supports information exchange among the armed services and the distribution of standardized terminology. The Defense Agency for Technology and Quality (DTaQ) publishes a terminology dictionary of defense science and technology on a three-year cycle, and seeks to standardize the dictionary construction process and improve service efficiency by using the dictionary in the defense area. The proposed method builds on the results of a previous study on the standardization of terminology dictionaries. We suggest practical steps covering the dictionary construction process, the composition and roles of the organization, the definition of headwords, the selection of target documents from which candidate terms are extracted, terminology extraction, generation of candidate term groups, workbench registration, and the construction and validation of the terminology dictionary. A thesaurus and a workbench are developed to use and support the terminology dictionary effectively.

Development of u-Health standard terminology and guidelines for terminology standardization (유헬스 표준용어 및 용어 표준화 가이드라인 개발)

  • Lee, Soo-Kyoung
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.6 / pp.4056-4066 / 2015
  • To support understanding of u-Health terminology and to activate the u-Health industry, standard u-Health terminology for communication needs to be developed. The purpose of this study is to develop u-Health standard terminology and to provide guidelines for the terminology standardization used to develop it. Through data acquisition, term extraction, term refinement, term selection, and term management, based on reports, glossaries, and Telecommunications Technology Association (TTA) standards on u-Health, we ultimately developed 187 u-Health standard terms. As a result, u-Health standard terminology and guidelines optimized for the domestic environment are proposed; they detail the definition, classification, and components of u-Health standard terminology and the methods and principles of its standardization process. The u-Health standard terminology and standardization guidelines presented in this study should help reduce the cost of using and managing terminology while making information transfer easier, thereby promoting efficient development of the u-Health industry in general.

Optimization and Performance Analysis of Distributed Parallel Processing Platform for Terminology Recognition System (전문용어 인식 시스템을 위한 분산 병렬 처리 플랫폼 최적화 및 성능평가)

  • Choi, Yun-Soo;Lee, Won-Goo;Lee, Min-Ho;Choi, Dong-Hoon;Yoon, Hwa-Mook;Song, Sa-kwang;Jung, Han-Min
    • The Journal of the Korea Contents Association / v.12 no.10 / pp.1-10 / 2012
  • Many statistical methods have been adapted to terminology recognition to improve its accuracy. However, because previous studies were carried out on a single core or a single machine, they have difficulty analyzing the explosively growing volume of documents in real time. In this study, the bottlenecks in terminology recognition are classified into two tasks: linguistic processing during 'candidate terminology extraction' and the collection of statistical information during 'terminology weight assignment'. A terminology recognition system that addresses each task with MapReduce-based distributed parallel processing is implemented and evaluated. Two experiments were performed. In the first, distributed parallel processing on 12 nodes improved processing speed by a factor of 11.27 compared with a single machine. The second compared 1) the default environment, 2) multiple reducers, 3) a combiner, and 4) the combination of 2) and 3); configuration 3), using a combiner, showed the best performance. Our terminology recognition system helps speed up knowledge extraction from large-scale science and technology documents.
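The abstract names the MapReduce stages but gives no code. The following single-process Python simulation (not Hadoop, and with a hypothetical bigram "candidate extractor") illustrates why a combiner helps in the 'terminology weight assignment' step: it pre-aggregates each mapper's (candidate, 1) pairs locally, so fewer pairs cross the shuffle while the final frequencies stay the same.

```python
from collections import Counter, defaultdict
from itertools import chain


def map_phase(doc):
    """Map: emit (candidate, 1) for every candidate term in one document.
    Hypothetical candidate extractor: adjacent lower-cased word bigrams."""
    words = doc.lower().split()
    return [(" ".join(words[i:i + 2]), 1) for i in range(len(words) - 1)]


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output locally before the shuffle."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(all_pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in all_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the counts for each candidate."""
    return {key: sum(values) for key, values in groups.items()}


def run(splits, use_combiner):
    """Each element of `splits` is the list of documents handled by one mapper.
    Returns (candidate frequencies, number of pairs sent through the shuffle)."""
    mapper_outputs = []
    for split in splits:
        pairs = list(chain.from_iterable(map_phase(d) for d in split))
        mapper_outputs.append(combine(pairs) if use_combiner else pairs)
    shuffled = list(chain.from_iterable(mapper_outputs))
    return reduce_phase(shuffle(shuffled)), len(shuffled)


# Two mappers over invented toy documents.
splits = [["term weight assignment", "term weight"], ["term weight assignment"]]
freq_plain, n_plain = run(splits, use_combiner=False)
freq_comb, n_comb = run(splits, use_combiner=True)
```

Here the combiner cuts the shuffled pairs from 5 to 4 with identical counts. With many repeated candidates per mapper the saving grows, which is one plausible reason a combiner configuration could perform best.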

A Study on the Integration of Information Extraction Technology for Detecting Scientific Core Entities based on Large Resources (대용량 자원 기반 과학기술 핵심개체 탐지를 위한 정보추출기술 통합에 관한 연구)

  • Choi, Yun-Soo;Cheong, Chang-Hoo;Choi, Sung-Pil;You, Beom-Jong;Kim, Jae-Hoon
    • Journal of Information Management / v.40 no.4 / pp.1-22 / 2009
  • Large-scale information extraction plays an important role in advanced information retrieval as well as in question answering and summarization. Information extraction can be defined as the process of converting unstructured documents into formalized, tabular information; it consists of named-entity recognition, terminology extraction, coreference resolution, and relation extraction. Because these component technologies have so far been studied independently, integrating all the processes that information extraction requires is not trivial, given the diversity of their input/output formats, approaches, and operating environments. As a result, it is difficult to extract both named entities and technical terms from scientific documents at once. In this study, we define scientific core entities as a set of 10 types of named entities and technical terms in the biomedical domain. To extract these entities automatically from scientific documents in a single pass, we developed a framework for scientific core entity extraction that integrates the pivotal language processors: a named-entity recognizer, a coreference resolver, and a terminology extractor. Each module of the integrated system has been evaluated on various corpora as well as KEEC 2009. The system will be used in various information service areas, such as information retrieval, question answering (Q&A), document indexing, and dictionary construction.
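The abstract stresses that the modules differ in input/output formats but does not describe the framework's internals. One common way to integrate such modules is to have every module read and write a single shared document object; the sketch below assumes that design. The toy tokenizer, dictionary-based recognizer, and term matcher are hypothetical stand-ins for the real components, and the coreference resolver is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Shared container every module reads from and writes to, so modules
    with different native I/O formats only have to adapt to this one type."""
    text: str
    layers: Dict[str, list] = field(default_factory=dict)


Module = Callable[[Document], Document]


def tokenizer(doc: Document) -> Document:
    """Toy tokenizer: split on whitespace after detaching periods."""
    doc.layers["tokens"] = doc.text.replace(".", " .").split()
    return doc


def make_dictionary_ner(gazetteer: Dict[str, str]) -> Module:
    """Hypothetical stand-in for a named-entity recognizer: exact token lookup."""
    def ner(doc: Document) -> Document:
        doc.layers["entities"] = [
            (tok, gazetteer[tok]) for tok in doc.layers["tokens"] if tok in gazetteer
        ]
        return doc
    return ner


def make_term_extractor(term_list: List[str]) -> Module:
    """Hypothetical stand-in for a terminology extractor: match known terms."""
    def extract(doc: Document) -> Document:
        lowered = " ".join(doc.layers["tokens"]).lower()
        doc.layers["terms"] = [t for t in term_list if t in lowered]
        return doc
    return extract


def run_pipeline(text: str, modules: List[Module]) -> Document:
    """Pass one shared Document through each module in order."""
    doc = Document(text)
    for module in modules:
        doc = module(doc)
    return doc


# Invented sentence, gazetteer, and term list.
doc = run_pipeline(
    "Kim visited Daejeon to study gene expression.",
    [
        tokenizer,
        make_dictionary_ner({"Kim": "PER", "Daejeon": "LOC"}),
        make_term_extractor(["gene expression", "protein folding"]),
    ],
)
```

Because every module has the same `Document -> Document` signature, a processor can be added or swapped by changing only the list passed to `run_pipeline`.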

Terminology Recognition System based on Machine Learning for Scientific Document Analysis (과학 기술 문헌 분석을 위한 기계학습 기반 범용 전문용어 인식 시스템)

  • Choi, Yun-Soo;Song, Sa-Kwang;Chun, Hong-Woo;Jeong, Chang-Hoo;Choi, Sung-Pil
    • The KIPS Transactions: Part D / v.18D no.5 / pp.329-338 / 2011
  • Terminology recognition, which underlies text mining, information extraction, information retrieval, the semantic web, and question answering, has been studied intensively in a limited range of domains, especially the biomedical domain. Because the resources used in previous work were domain specific, those approaches proved difficult to apply to the general domain. We therefore propose a domain-independent terminology recognition system based on machine learning that uses a dictionary, syntactic features, and Web search results. The proposed approach achieved an F-score of 80.8, a 6.5% improvement over the related approach C-value, which is widely used and is based on local domain frequencies. In a second experiment with various combinations of unithood features, the combination with NGD (Normalized Google Distance) showed the best performance, an F-score of 81.8. Of the three machine learning methods applied (logistic regression, C4.5, and SVMs), the decision tree method C4.5 achieved the best score.
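C-value and NGD are published measures, so they can be written down independently of this paper's system. The sketch below implements the standard C-value formula for multi-word candidates (Frantzi, Ananiadou, and Mima) and the Normalized Google Distance (Cilibrasi and Vitányi). How the paper turned them into machine-learning features is not described in the abstract and is not reproduced here; the candidate frequencies and hit counts in the example are invented.

```python
import math


def is_nested_in(shorter, longer):
    """True if word tuple `shorter` occurs contiguously inside tuple `longer`."""
    n, m = len(shorter), len(longer)
    return n < m and any(longer[i:i + n] == shorter for i in range(m - n + 1))


def c_value(freq):
    """C-value for multi-word term candidates.

    freq maps each candidate (a tuple of words) to its corpus frequency f(a).
      C(a) = log2|a| * f(a)                                       if a is not nested
      C(a) = log2|a| * (f(a) - (1/|T_a|) * sum_{b in T_a} f(b))   otherwise
    where T_a is the set of longer candidates that contain a.
    """
    scores = {}
    for a, f_a in freq.items():
        containers = [b for b in freq if is_nested_in(a, b)]
        adjusted = f_a
        if containers:
            adjusted = f_a - sum(freq[b] for b in containers) / len(containers)
        scores[a] = math.log2(len(a)) * adjusted
    return scores


def ngd(f_x, f_y, f_xy, n):
    """Normalized Google Distance from hit counts f(x), f(y), f(x, y) and index size N.

    NGD = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y)))
    """
    if f_xy == 0:
        return math.inf  # the terms never co-occur: treat as maximally distant
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Invented candidate frequencies for illustration.
freq = {
    ("stem", "cell"): 10,
    ("adult", "stem", "cell", "line"): 3,
    ("embryonic", "stem", "cell", "line"): 5,
}
scores = c_value(freq)
```

Because log2|a| is zero for a single word, the original C-value scores only multi-word candidates; a domain-independent recognizer that also needs single-word terms has to use a variant or additional features.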

Design and Implementation of an Ontology-based Knowledge Management System

  • Hideki-Mima;Yoon, Tae-Sung;Katsumori-Matsushima
    • Proceedings of the CALSEC Conference / 2004.02a / pp.107-111 / 2004
  • The purpose of the study is to develop an integrated knowledge management system for the genome and nanotechnology domains that combines terminology-based literature mining, knowledge acquisition, knowledge structuring, and knowledge retrieval. The system supports integrating different types of databases (papers and patents, technologies and innovations) and retrieving different types of knowledge simultaneously. Its main objective is to facilitate knowledge acquisition from documents and the discovery of new knowledge through terminology-based similarity calculation and visualization of automatically structured knowledge. Implementation issues of the system are also discussed.
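The abstract mentions terminology-based similarity calculation without giving the measure. As a hedged illustration, assuming each document is represented by the bag of terms extracted from it and compared with cosine similarity (a common choice, not necessarily the one used in the paper), documents of different types, such as papers and patents, can be ranked against a query. The document IDs and terms below are invented.

```python
import math
from collections import Counter


def term_vector(terms):
    """Bag-of-terms vector: how often each extracted term occurs in one document."""
    return Counter(terms)


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_terms, collection):
    """Rank documents (id -> list of extracted terms) by similarity to the query terms."""
    q = term_vector(query_terms)
    scored = [(doc_id, cosine(q, term_vector(terms))) for doc_id, terms in collection.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents of different types, already reduced to extracted terms.
collection = {
    "paper-1": ["gene expression", "microarray", "gene expression"],
    "patent-1": ["carbon nanotube", "microarray"],
    "patent-2": ["carbon nanotube"],
}
ranking = rank_by_similarity(["gene expression", "microarray"], collection)
```

Since the vectors are built only from extracted terms, the quality of the ranking depends directly on the terminology extractor upstream.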
