• Title/Summary/Keyword: Terminology Recognition

Search Result 33, Processing Time 0.03 seconds

Optimization and Performance Analysis of Distributed Parallel Processing Platform for Terminology Recognition System (전문용어 인식 시스템을 위한 분산 병렬 처리 플랫폼 최적화 및 성능평가)

  • Choi, Yun-Soo;Lee, Won-Goo;Lee, Min-Ho;Choi, Dong-Hoon;Yoon, Hwa-Mook;Song, Sa-kwang;Jung, Han-Min
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.10
    • /
    • pp.1-10
    • /
    • 2012
  • Many statistical methods have been adapted for terminology recognition to improve its accuracy. However, since previous studies have been carried out in a single core or a single machine, they have difficulties in real-time analysing explosively increasing documents. In this study, the task where bottlenecks occur in the process of terminology recognition is classified into linguistic processing in the process of 'candidate terminology extraction' and collection of statistical information in the process of 'terminology weight assignment'. A terminology recognition system is implemented and experimented to address each task by means of the distributed parallel processing-based MapReduce. The experiments were performed in two ways; the first experiment result revealed that distributed parallel processing by means of 12 nodes improves processing speed by 11.27 times as compared to the case of using a single machine and the second experiment was carried out on 1) default environment, 2) multiple reducers, 3) combiner, and 4) the combination of 2)and 3), and the use of 3) showed the best performance. Our terminology recognition system contributes to speed up knowledge extraction of large scale science and technology documents.

Terminology Recognition System based on Machine Learning for Scientific Document Analysis (과학 기술 문헌 분석을 위한 기계학습 기반 범용 전문용어 인식 시스템)

  • Choi, Yun-Soo;Song, Sa-Kwang;Chun, Hong-Woo;Jeong, Chang-Hoo;Choi, Sung-Pil
    • The KIPS Transactions:PartD
    • /
    • v.18D no.5
    • /
    • pp.329-338
    • /
    • 2011
  • Terminology recognition system which is a preceding research for text mining, information extraction, information retrieval, semantic web, and question-answering has been intensively studied in limited range of domains, especially in bio-medical domain. We propose a domain independent terminology recognition system based on machine learning method using dictionary, syntactic features, and Web search results, since the previous works revealed limitation on applying their approaches to general domain because their resources were domain specific. We achieved F-score 80.8 and 6.5% improvement after comparing the proposed approach with the related approach, C-value, which has been widely used and is based on local domain frequencies. In the second experiment with various combinations of unithood features, the method combined with NGD(Normalized Google Distance) showed the best performance of 81.8 on F-score. We applied three machine learning methods such as Logistic regression, C4.5, and SVMs, and got the best score from the decision tree method, C4.5.

Design and Implementation of an Ontology-based Knowledge Management System

  • Hideki-Mima;Yoon, Tae-Sung;Katsumori-Matsushima
    • Proceedings of the CALSEC Conference
    • /
    • 2004.02a
    • /
    • pp.107-111
    • /
    • 2004
  • The purpose of the study is to develop an integrated knowledge management system for the domains of genome and nano-technology, in which terminology-based literature mining, knowledge acquisition, knowledge structuring, and knowledge retrieval are combined. The system supports integrating different types of databases (papers and patents, technologies and innovations) and retrieving different types of knowledge simultaneously. The main objective of the system is to facilitate knowledge acquisition from documents and new knowledge discovery through a terminology-based similarity calculation and a visualization of automatically structured knowledge. Implementation issue of the system is also mentioned.

  • PDF

A Study on the Integration of Recognition Technology for Scientific Core Entities (과학기술 핵심개체 인식기술 통합에 관한 연구)

  • Choi, Yun-Soo;Jeong, Chang-Hoo;Cho, Hyun-Yang
    • Journal of the Korean Society for information Management
    • /
    • v.28 no.1
    • /
    • pp.89-104
    • /
    • 2011
  • Large-scaled information extraction plays an important role in advanced information retrieval as well as question answering and summarization. Information extraction can be defined as a process of converting unstructured documents into formalized, tabular information, which consists of named-entity recognition, terminology extraction, coreference resolution and relation extraction. Since all the elementary technologies have been studied independently so far, it is not trivial to integrate all the necessary processes of information extraction due to the diversity of their input/output formation approaches and operating environments. As a result, it is difficult to handle scientific documents to extract both named-entities and technical terms at once. In order to extract these entities automatically from scientific documents at once, we developed a framework for scientific core entity extraction which embraces all the pivotal language processors, named-entity recognizer and terminology extractor.

Automatic Terminology Recognition using the Dictionary Hierarchy (사전간 계층관계를 이용한 전문용어 자동 추출 기법)

  • 오종훈;이경순;최기선
    • Proceedings of the Korean Society for Cognitive Science Conference
    • /
    • 2000.05a
    • /
    • pp.131-136
    • /
    • 2000
  • 기존의 통계에 기반한 용어 자동 추출 기법(Automatic Term Recognition)은 비교적 좋은 성능의 결과를 보여왔다. 하지만 전문용어 사전 등의 정보를 이용하여 성능의 향상을 이룰 수 있는 여지는 여전히 남아있다. 본 논문에서는 이러한 근거에 기반하여 전문용어간의 계층 정보를 전문용어 사전을 통하여 구축하고 이를 이용하여 전문용어를 추출하는 방법을 제안하고자 한다. 본 논문이 제안하는 기법은 기존의 방법에 비해 좋은 성능을 나타내었다.

  • PDF

Control Procedures of Standardization for GIS Terminology (GIS 용어 표준화과정에 대한 고찰)

  • 성효현
    • Spatial Information Research
    • /
    • v.6 no.2
    • /
    • pp.183-199
    • /
    • 1998
  • The motivation of this paper comes from a recognition that GIS educators in the private and public sectors are faced with both an opportunity and a dilemma. As the GIS vendors move to open systems which can be integrated with many traditional operations, the use of spatial data and analysis will become widespread throughout business, government and education. Hence the need for standardization in GIS fields is expanding rapidly. Especially non-standardized terminology of GIS prevents GIS-users from communicating among the GIS application fields. This paper will assist this shifting foundation by providing terminology control procedures for ISO/TC2ll family of standards and KS(Korea Standards) information terminology and make recommendations for the improvement and harmonization of terminology.

  • PDF

Construction of Test Collection for Extraction of Biomedical PLOT & Relations (생의학분야 PLOT 및 관계추출을 위한 테스트컬렉션 구축)

  • Choi, Yun-Soo;Choi, Sung-Phl;Jeong, Chang-Hoo
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2010.05a
    • /
    • pp.425-427
    • /
    • 2010
  • Large-scaled information extraction consists of named-entity recognition, terminology extraction and relation extraction. Since all the elementary technologies have been studied independently so far, test collections for related machine learning models also have been constructed independently. As a result, it is difficult to handle scientific documents to extract both named-entities and technical terms at once. In this study, we integrate named-entities and terminologies with PLOT(Person, Location, Organization, Terminology) in a biomedical domain and construct a test collection of PLOT and relations between PLOTs.

  • PDF

Improving spaCy dependency annotation and PoS tagging web service using independent NER services

  • Colic, Nico;Rinaldi, Fabio
    • Genomics & Informatics
    • /
    • v.17 no.2
    • /
    • pp.21.1-21.6
    • /
    • 2019
  • Dependency parsing is often used as a component in many text analysis pipelines. However, performance, especially in specialized domains, suffers from the presence of complex terminology. Our hypothesis is that including named entity annotations can improve the speed and quality of dependency parses. As part of BLAH5, we built a web service delivering improved dependency parses by taking into account named entity annotations obtained by third party services. Our evaluation shows improved results and better speed.

A Study on the Integration of Information Extraction Technology for Detecting Scientific Core Entities based on Large Resources (대용량 자원 기반 과학기술 핵심개체 탐지를 위한 정보추출기술 통합에 관한 연구)

  • Choi, Yun-Soo;Cheong, Chang-Hoo;Choi, Sung-Pil;You, Beom-Jong;Kim, Jae-Hoon
    • Journal of Information Management
    • /
    • v.40 no.4
    • /
    • pp.1-22
    • /
    • 2009
  • Large-scaled information extraction plays an important role in advanced information retrieval as well as question answering and summarization. Information extraction can be defined as a process of converting unstructured documents into formalized, tabular information, which consists of named-entity recognition, terminology extraction, coreference resolution and relation extraction. Since all the elementary technologies have been studied independently so far, it is not trivial to integrate all the necessary processes of information extraction due to the diversity of their input/output formation approaches and operating environments. As a result, it is difficult to handle scientific documents to extract both named-entities and technical terms at once. In this study, we define scientific as a set of 10 types of named entities and technical terminologies in a biomedical domain. in order to automatically extract these entities from scientific documents at once, we develop a framework for scientific core entity extraction which embraces all the pivotal language processors, named-entity recognizer, co-reference resolver and terminology extractor. Each module of the integrated system has been evaluated with various corpus as well as KEEC 2009. The system will be utilized for various information service areas such as information retrieval, question-answering(Q&A), document indexing, dictionary construction, and so on.

A Survey on the Awareness of Radiation-related Workers and Radiation Workers in the Medical Institutions According to the Dual System (의료기관의 방사선사 중 방사선 관계종사자와 방사선 작업종사자의 이원화 체계에 따른 인식도 조사)

  • Her, Mi;Ahn, Sung-Min
    • Journal of radiological science and technology
    • /
    • v.41 no.5
    • /
    • pp.479-485
    • /
    • 2018
  • Radiologic technologists working at the second and third medical institutions are classified as radiation-related workers and radiation workers according to their working departments, and are subject to double regulation by the Ministry of Health and Welfare and the Nuclear Safety Commission. We will try to understand the system of dualization and to understand the investigation of recognition. The dualized system of radiation-related workers and radiation workers includes the difference in name and terminology, the effective dose limit, the maintenance education and training of radiologic technologists, the period of medical examination, the radiation zone, dose of the woman whose pregnancy is confirmed in radiologic technologists, the qualification criteria of the safety officer, and the period of the regular inspection of the radiological equipment. In the questionnaire survey on the dualization system, there were various items showing significant differences between the radiation-related workers and radiation workers Overall, the radiation workers were more aware of the radiation workers' education and related terms than the radiation-related workers.