• Title/Summary/Keyword: Automatic Information Extraction

Applying Natural Language Processing Techniques to Bioinformatics

  • Park, Hyun-Seok
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.71-73 / 2000
  • Because there are no standards for storing genome-related online documents, Natural Language Processing (NLP) techniques are likely to become increasingly important. Useful information must be extracted from raw text and stored in a computer-readable database format. Recent advances in NLP technologies raise new challenges and opportunities for information extraction from genome-related online text. For example, much useful information about genetic networks or metabolic pathways can be obtained fully automatically simply by analyzing verbs such as 'activate' or 'inhibit' in Medline abstracts. Thus, combining NLP techniques with genome informatics extends beyond the traditional realms of either technology to a variety of emerging applications.
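
The verb-based idea in this abstract can be illustrated with a minimal sketch. It is not the author's system: the regular expression, the function name `extract_relations`, and the sample sentence are assumptions made for illustration, and the pattern only handles single-token arguments on either side of the verb.

```python
import re

# Hypothetical pattern: "<subject> activates|inhibits <object>".
# Only the two verbs named in the abstract are covered, and each
# argument is assumed to be a single token such as a gene symbol.
PATTERN = re.compile(
    r"\b([A-Za-z0-9-]+)\s+(activat|inhibit)(?:es|s|ed)\s+([A-Za-z0-9-]+)"
)


def extract_relations(text):
    """Return (subject, verb, object) triples found in the text."""
    relations = []
    for subject, stem, obj in PATTERN.findall(text):
        verb = "activate" if stem == "activat" else "inhibit"
        relations.append((subject, verb, obj))
    return relations


# Illustrative sentence, not taken from Medline.
sample = "RAF1 activates MEK1. In contrast, PTEN inhibits AKT1 in these cells."
print(extract_relations(sample))
```

A real pipeline would need entity recognition and parsing to handle passives, negation, and multi-word names; the sketch only shows why trigger verbs are a useful starting point.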

A Methodology for Ontology-based Knowledge Acquisition and Structuring in an Industry-Academic-Government Project ″Go Japan!″

  • Hideki-Mima;Yoon, Tae-Sung
    • Proceedings of the CALSEC Conference / 2003.09a / pp.197-203 / 2003
  • The purpose of the study is to develop an integrated knowledge structuring system for the engineering domain, in which ontology-based literature mining, knowledge acquisition, knowledge integration, and knowledge retrieval are combined using XML-based tag information and ontology management. The system supports combining different types of databases (papers and patents, technologies and innovations) and retrieving different types of knowledge simultaneously. Its main objective is to facilitate knowledge acquisition and knowledge retrieval from documents through an ontology-based dynamic similarity calculation and a visualization of automatically structured knowledge. Experiments on 100,000 words of economic documents reported in the "Go! Japan" project, which analyzes the Japanese industrial situation, and on 100,000 words of molecular biology papers show that the system is practical enough to accelerate knowledge acquisition and knowledge discovery from the sea of information.

Automatic detection of pulmonary nodules in X-ray chest images (폐의 X선 영상에서의 노쥴 자동 탐지 기법)

  • Seong, Won;Park, Jong-Won
    • Annual Conference of KIPS / 2002.04a / pp.767-770 / 2002
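  • Radiologists are generally reported to miss pulmonary nodules about 30% of the time. If an automated system could point radiologists to the locations of suspicious nodules in chest images, the number of misjudged nodules could potentially be reduced. We implemented an automated computer processing system that includes morphological filters and two feature-extraction techniques. The system first applies morphological filtering: erosion followed by dilation is applied to the original image, which converts a difficult-to-process X-ray image into a more tractable form. Second, feature-extraction tests are applied to the suspicious regions that the computer initially selects as nodule candidates, in order to reduce false-positive detections, that is, regions detected as nodules that are not actually nodules. Applied to chest X-ray images, in which nodules are difficult to read accurately, the system effectively reduces false positives and thereby enables more efficient detection of pulmonary nodules.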
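
Erosion followed by dilation (a morphological opening) can be sketched as below. The abstract does not state the structuring element or border handling, so the 3x3 square neighborhood, the zero padding, and the tiny test image are assumptions for illustration only.

```python
def _filter3x3(image, combine, pad):
    """Apply `combine` (min or max) over each pixel's 3x3 neighborhood."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else pad
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
            ]
            out[r][c] = combine(window)
    return out


def erode(image):
    # A pixel survives only if its whole neighborhood is bright.
    return _filter3x3(image, min, pad=0)


def dilate(image):
    # A pixel becomes bright if any neighbor is bright.
    return _filter3x3(image, max, pad=0)


def opening(image):
    # Erosion removes small bright specks; dilation restores larger shapes.
    return dilate(erode(image))


# Hypothetical image: a 3x3 blob plus one isolated bright pixel.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
for row in opening(image):
    print(row)
```

Because `min` and `max` work on any numeric values, the same functions also apply to gray-level images, which is closer to the X-ray setting than the binary example.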

Keyword Automatic Extraction Scheme with Enhanced TextRank using Word Co-Occurrence in Korean Document (한글 문서의 단어 동시 출현 정보에 개선된 TextRank를 적용한 키워드 자동 추출 기법)

  • Song, KwangHo;Min, Ji-Hong;Kim, Yoo-Sung
    • Annual Conference on Human and Language Technology / 2016.10a / pp.62-66 / 2016
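  • Extracting keywords that represent a document's content is a very important step, in terms of both accuracy and efficiency, for semantics-based document processing. However, existing studies on extracting keywords from a single document either have low accuracy or were validated only in limited domains, so their results are hard to trust. To provide a keyword extraction method that is accurate and applicable to text from diverse domains, this study proposes a keyword extraction technique that simultaneously applies word co-occurrence information and a new algorithm that modifies TextRank on the basis of a graph model. A performance evaluation of the proposed technique confirmed improved accuracy over existing studies.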
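
For orientation, the baseline that this paper modifies can be sketched as plain TextRank over a word co-occurrence graph. This is not the enhanced algorithm the authors propose; the window size, damping factor, iteration count, and function names are assumptions, and tokenization is assumed to have been done already.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Build an undirected weighted graph from words that appear close together."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window + 1]:
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Standard weighted TextRank: a node is important if important nodes link to it."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        updated = {}
        for node in graph:
            rank = 0.0
            for neighbor, weight in graph[node].items():
                total = sum(graph[neighbor].values())
                rank += weight / total * scores[neighbor]
            updated[node] = (1 - damping) + damping * rank
        scores = updated
    return scores


def top_keywords(tokens, k=3, window=2):
    scores = textrank(cooccurrence_graph(tokens, window))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical token sequence in which "hub" co-occurs with every other word.
tokens = "hub a hub b hub c hub d".split()
print(top_keywords(tokens, k=1, window=1))
```

For Korean text, the tokens would typically come from a morphological analyzer that keeps only nouns, which is outside the scope of this sketch.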

Automatic Extraction of protein-protein interaction information from biological literature (생물학 관련 문헌으로부터 상호작용 정보 자동 추출)

  • 정의헌;김민경;박현석
    • Proceedings of the Korean Information Science Society Conference / 2003.10b / pp.808-810 / 2003
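  • This paper introduces the overall technical trends in methods for extracting protein-protein interactions from biology-related documents and describes the research results of an implemented system for automatically extracting interaction information. In general, three approaches to extracting relationships among known proteins are currently being studied: generating user-defined rules by applying NLP techniques to characterize protein names and to interpret the semantics of expressions; automatically extracting relationships between proteins by applying data mining techniques; and combining these two methods. This paper uses natural language processing techniques and a machine learning technique (SVM) to extract protein-protein interactions from general biological literature and tests the resulting performance.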

Automatic Extraction of Dependencies between Web Components and Database Resources in Java Web Applications

  • Oh, Jaewon;Ahn, Woo Hyun;Kim, Taegong
    • Journal of information and communication convergence engineering / v.17 no.2 / pp.149-160 / 2019
  • Web applications typically interact with databases. Therefore, when maintaining web apps, it is crucial to understand which web components access which database resources. Existing research identifies interactions between Java web components, such as JavaServer Pages and servlets, but does not extract dependencies between the web components and database resources, such as tables and attributes. This paper proposes a dynamic analysis of Java web apps that extracts such dependencies and represents them as a graph. The key responsibility of the analysis method is to identify when web components access database resources. To fulfill this responsibility, the method dynamically observes the database-related objects provided in the Java standard library using the proxy pattern, which can be applied to control access to a desired object. The study also experiments with open source web apps to verify the feasibility of the proposed method.
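
The proxy idea can be sketched as follows. The paper targets Java and its standard database objects; this is a Python analog using the standard `sqlite3` module, and the class name `RecordingConnection`, the component name `LoginServlet`, and the regex-based table detection are all assumptions made for illustration.

```python
import re
import sqlite3


class RecordingConnection:
    """Proxy that forwards to a real connection and records which tables are touched."""

    def __init__(self, real, component, edges):
        self._real = real
        self._component = component
        self._edges = edges  # shared set of (component, table) dependency edges

    def execute(self, sql, params=()):
        # Naive table detection for illustration; a real analysis would parse the SQL.
        for table in re.findall(r"\b(?:FROM|INTO|UPDATE|JOIN)\s+(\w+)", sql, re.IGNORECASE):
            self._edges.add((self._component, table.lower()))
        return self._real.execute(sql, params)

    def __getattr__(self, name):
        # Every other attribute is delegated unchanged to the wrapped connection.
        return getattr(self._real, name)


edges = set()
real = sqlite3.connect(":memory:")
real.execute("CREATE TABLE users (id INTEGER, name TEXT)")

# The hypothetical component only ever sees the proxy, never the real connection.
login = RecordingConnection(real, "LoginServlet", edges)
login.execute("INSERT INTO users VALUES (?, ?)", (1, "kim"))
rows = login.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchall()
print(edges, rows)
```

The collected `edges` set is the raw material for a dependency graph: each pair is one edge from a web component to a database resource.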

The Design and Implementation of OWL Ontology Construction System through Information Extraction of Unstructured Documents (비정형 문서의 정보추출을 통한 OWL 온톨로지 구축 시스템의 설계 및 구현)

  • Jo, Dae Woong;Choi, Ji Woong;Kim, Myung Ho
    • Journal of the Korea Society of Computer and Information / v.19 no.10 / pp.23-33 / 2014
  • The information retrieval field is evolving from quickly finding large amounts of information toward accurately finding the information sought. Personalization and semantic web technology are key technologies in this shift. Automatic indexing technology for web documents, and its throughput, have moved beyond the research stage and appear in practical services. However, there is a lack of research on information retrieval for attached document types other than web documents. In this paper, we describe a method that analyzes the text content of unstructured documents prepared in text, Word, and HWP formats and constructs an OWL ontology from it. We build the TBox of the document ontology, select the resources that can be obtained from the documents, and implement a system that uses them as instances of the constructed document ontology. Automatic ontology construction for such unstructured documents can be used effectively in information retrieval and document management systems that apply semantic technology to the corresponding documents.

Texture Analysis Algorithm and its Application to Leather Automatic Classification Inspection System (텍스처 분석 알고리즘과 피혁 자동 선별 시스템에의 응용)

  • 김명재;이명수;권장우;김광섭;길경석
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2001.10a / pp.363-366 / 2001
  • The current process of grading leather quality by the naked eye is not reliable, because eye strain over long periods causes inconsistent and incorrect grading. It is therefore necessary to automate leather quality grading on the basis of an objective standard. The automatic leather classification system in this paper consists of a process that obtains information about the leather and a process that grades its quality from that information. Leather is graded by information such as texture density and the types and distribution of defects. This paper proposes an algorithm that extracts leather information such as texture density and defects from gray-level images obtained with a digital camera. The density information is extracted from the distribution of the Fourier spectrum obtained after the original image is converted to the frequency domain. The defect information is obtained from the statistics of the pixels within a search window after boundary lines are extracted from the preprocessed images. The information for the entire leather is used as the standard for grading leather quality, and the proposed algorithm is applied in practice to a machine vision system.
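
One way to read "distribution of the Fourier spectrum" as a density measure is sketched below. The abstract does not give the actual statistic, so the power-weighted mean frequency (spectral centroid), the use of a single 1D scanline instead of a full 2D transform, and the sample gray-level rows are all assumptions.

```python
import cmath


def power_spectrum(signal):
    """Power at frequencies 1..n/2 of a 1D gray-level scanline (DC removed)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_centroid(signal):
    """Power-weighted mean frequency; higher values suggest a finer, denser texture."""
    spectrum = power_spectrum(signal)
    total = sum(spectrum)
    if total == 0:
        return 0.0  # a flat scanline has no texture
    return sum((k + 1) * p for k, p in enumerate(spectrum)) / total


# Hypothetical scanlines: a coarse grain (period 8) and a fine grain (period 2).
coarse = [0, 0, 0, 0, 255, 255, 255, 255] * 2
fine = [0, 255] * 8
print(spectral_centroid(coarse), spectral_centroid(fine))
```

The fine-grained row concentrates its energy at the highest frequency, so its centroid is larger than that of the coarse row; a grading system could threshold such a value, although the thresholds themselves would have to come from labeled leather samples.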

The Automatic Extraction System of Application Update Information in Android Smart Device (안드로이드 스마트 기기 내의 애플리케이션 업데이트 정보 자동 추출 시스템)

  • Kim, Hyounghwan;Kim, Dohyun;Park, Jungheum;Lee, Sangjin
    • Journal of the Korea Institute of Information Security & Cryptology / v.24 no.2 / pp.345-352 / 2014
  • As the utilization rate of smart devices increases, various applications for smart devices have been developed. Because these applications can contain important data related to user behavior from a digital forensic perspective, they should be analyzed in advance. However, many applications adopt new data formats or types when they are updated. Therefore, each application must be checked for updates one by one, and if it has been updated, whether its data have changed must also be analyzed. Because repeatedly observing application data is time-consuming, an effective method for dealing with this problem is needed. This paper proposes an automatic system that obtains update information and checks for changed data by collecting application information.

Automatic Information Extraction for Structured Web Documents (구조화된 웹 문서에 대한 자동 정보추출)

  • Yun, Bo-Hyun
    • Journal of Internet Computing and Services / v.6 no.3 / pp.129-145 / 2005
  • This paper proposes a web information extraction system that automatically extracts pre-defined information from web documents (i.e., HTML documents) and integrates the extracted information. The system recognizes entities without labels by a probabilistic entity recognition method and extends the existing domain knowledge semiautomatically by using the extracted data. Moreover, the system extracts the sub-linked information linked to the basic page and integrates similar results extracted from heterogeneous sources. The experimental results show that extracting the sub-linked information and using probabilistic entity recognition significantly improves precision over a system that uses only the domain knowledge. Moreover, the presented system can extract a wider variety of information precisely because it can be applied flexibly across domains. Because both the semiautomatic domain knowledge expansion and the probabilistic entity recognition improve the quality of the information, the system can increase user satisfaction. Thus, the system can serve users of movie sites, performance sites, and restaurant sites, and it can be used to construct various comparison-shopping malls and contribute to the revitalization of e-business.
