• Title/Summary/Keyword: Automatic Information Extraction


Comparative Study of GDPA and Hough Transformation for Automatic Linear Feature Extraction

  • Ryu, Hee-Young;Lee, Ki-Won;Kwon, Byung-Doo
    • Proceedings of the KSRS Conference / 2003.11a / pp.238-240 / 2003
  • Because remote sensing plays an important role in updating GIS data, obtaining spatial information quickly and accurately is essential. In this study, we designed and implemented a program based on two algorithms, GDPA (Gradient Direction Profile Analysis) and the Hough transform, to extract linear features automatically from high-resolution imagery. We applied the software implementing both algorithms to KOMPSAT-EOC, IKONOS, and Landsat-ETM imagery and compared the results.

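As an illustration of the Hough transform named in the abstract above, the following is a minimal sketch of the textbook voting scheme in pure Python. It is not the authors' program and does not cover GDPA; the function name `hough_lines`, its parameters, and the sample edge points are all hypothetical.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, rho_step=1.0, top_k=1):
    """Vote in (theta, rho) space for straight lines through edge points.

    Each point (x, y) votes for every line rho = x*cos(theta) + y*sin(theta).
    Returns the top_k accumulator cells as (theta_degrees, rho, votes).
    """
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            # Quantize rho so nearly identical lines share one accumulator cell.
            votes[(t, round(rho / rho_step))] += 1
    return [(t * 180.0 / n_theta, r * rho_step, v)
            for (t, r), v in votes.most_common(top_k)]

# Hypothetical edge pixels: ten collinear points on the line y = x, plus two outliers.
edge_points = [(10 * i, 10 * i) for i in range(10)] + [(30, 80), (70, 10)]
best_theta, best_rho, best_votes = hough_lines(edge_points)[0]
print(best_theta, best_rho, best_votes)
```

The strongest cell corresponds to the line through the collinear points; in a real pipeline the input points would come from an edge detector run on the imagery rather than from a hand-written list.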

Automatic Extraction of Medical Term Definition from Texts (의학 전문용어의 정의문 자동 추출)

  • 김재호;배선미;신효식;최기선
    • Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.922-924 / 2004
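  • With the spread of knowledge and information, the number of terms not listed in existing domain-specific glossaries is increasing explosively, which makes it necessary to build glossaries automatically. This paper proposes a method for automatically extracting definition sentences for given technical terms from a medical-domain corpus. First, the superordinate concept (hypernym) of a term is estimated using the syntactic patterns of definition sentences and the lexical composition patterns of terms. Then, using lists of characteristic words built for each superordinate concept, the method judges the suitability of the words that appear in sentences selected by the syntactic patterns and extracts the definition sentences. In experiments on 48 terms whose definitions are present in the corpus, the method achieves a precision of 71.43%.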


Automatic Text Summarization with Two Step Sentence Extraction (2단계 문장 추출방법을 이용한 자동 문서 요약)

  • 정운철;고영중;서정연
    • Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.910-912 / 2004
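  • An automatic text summarization system reduces the size of a document while preserving as much of the information it contains as possible. In this paper, text summarization is performed in two broad steps. Sentences that are unnecessary for the summary are removed in advance, and the strengths of several statistical methods are then combined, which yielded improved performance. As comparison systems, we built and used systems based on title, position, frequency, aggregate similarity, and lexical clustering; the proposed system outperformed all of them at both 30% and 10% sentence summarization.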


An Automatic Extraction Algorithm of Structure Boundary from Terrestrial LIDAR Data (지상라이다 데이터를 이용한 구조물 윤곽선 자동 추출 알고리즘 연구)

  • Roh, Yi-Ju;Kim, Nam-Woon;Yun, Kee-Bang;Jung, Kyeong-Hoon;Kang, Dong-Wook;Kim, Ki-Doo
    • 전자공학회논문지 IE / v.46 no.1 / pp.7-15 / 2009
  • This paper proposes automatic structure boundary extraction from 3-dimensional terrestrial LIDAR (Light Detection And Ranging) data. The algorithm uses neither photographs nor pre-processing. It first applies an efficient decimation method that takes into account the size of the object, the amount of LIDAR data, and similar factors. In the decimated data, object points and non-object points are distinguished using distance information, which is a major feature of LIDAR. Next, large and small values are extracted using local variations; these serve as boundary point candidates. Finally, a boundary line is drawn through the candidates, yielding the approximate boundary of the object.

Automated Brain Region Extraction Method in Head MR Image Sets (머리 MR영상에서 자동화된 뇌영역 추출)

  • Cho, Dong-Uk;Kim, Tae-Woo;Shin, Seung-Soo
    • The Journal of the Korea Contents Association / v.2 no.3 / pp.1-15 / 2002
  • A novel automated brain region extraction method for single-channel MR images is presented for visualization and analysis of the human brain. The method generates a volume of brain masks by automatic thresholding using a dual curve fitting technique and by 3D morphological operations. The dual curve fitting can reduce the error in curve fitting to the histogram of MR images. The 3D morphological operations, including erosion, labeling of connected components, max-feature operation, and dilation, are applied to the cubic volume of masks reconstructed from the thresholded brain masks. This method can automatically extract a brain region in any displayed type of sequences, including extreme slices, of SPGR, T1-, T2-, and PD-weighted MR image data sets, which are not required to contain the entire brain. In the experiments, the algorithm was applied to 20 sets of MR images and achieved a similarity index above 0.97 in comparison with manual drawing.


Study on the Generation Methods of Composition Noun for Efficient Index Term Extraction (효율적인 색인어 추출을 위한 합성명사 생성 방안에 대한 연구)

  • Kim, Mi-Jin;Park, Mi-Seong;Choe, Jae-Hyeok;Lee, Sang-Jo
    • The Transactions of the Korea Information Processing Society / v.7 no.4 / pp.1122-1131 / 2000
  • The efficiency of an information retrieval or automatic indexing system depends on how accurately it extracts index terms, so accurate index term extraction is of utmost importance. This paper presents methods of generating composition nouns (compound nouns) from high-frequency words for efficient index term extraction, so that the right documents can be found during retrieval. In this method, compound-noun index terms are extracted by applying composition and decomposition rules to nouns that appear with high frequency in a document, such as those in the upper 30% to 40% by frequency ratio. In addition, to validate composition from high-frequency nouns in the upper 30% to 40%, the paper proposes an adequate frequency ratio for noun composition. In the experiments, short documents of fewer than 300 syllables showed omissions of low-frequency terms when composed from nouns in the upper 30% by frequency ratio, whereas documents of more than 30 syllables, composed from nouns in the upper 40%, showed considerably fewer such omissions. As a result, the total number of index terms decreased to 57.7% of the existing number, and index terms could be extracted with an adequacy ratio of 85.6%.


A Knowledge-based Wrapper Learning Agent for Semi-Structured Information Sources (준구조화된 정보소스에 대한 지식기반의 Wrapper 학습 에이전트)

  • Seo, Hee-Kyoung;Yang, Jae-Young;Choi, Joong-Min
    • Journal of KIISE:Software and Applications / v.29 no.1_2 / pp.42-52 / 2002
  • Information extraction (IE) is the process of recognizing and fetching particular information fragments from a document. In previous work, most IE systems relied on manually generated extraction rules, called wrappers; although manual wrapper generation may achieve more accurate extraction, it has problems in flexibility, extensibility, and efficiency. Other studies that generate wrappers automatically have had difficulty acquiring and representing useful domain knowledge and coping with structural heterogeneity among information sources, and as a result real-world information sources with complex document structures could not be analyzed correctly. To resolve these problems, this paper presents an agent-based information extraction system named XTROS that exploits domain knowledge to learn from documents in a semi-structured information source. The system generates a wrapper for each information source automatically and performs information extraction and information integration by applying this wrapper to the corresponding source. In XTROS, both the domain knowledge and the wrapper are represented as XML-type documents. The wrapper generation algorithm first recognizes the meaning of each logical line of a sample document by using the domain knowledge, and then finds the most frequent pattern in the sequence of semantic representations of the logical lines. The location and structure of this pattern, represented as an XML document, become the wrapper. Based on tests of XTROS on several real-estate information sites, we claim that it creates correct wrappers for most Web sources and consequently facilitates effective information extraction and integration for heterogeneous and complex information sources.
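The two wrapper-generation steps described in the abstract above (label each logical line using domain knowledge, then find the most frequent pattern of labels) can be sketched roughly as follows. This is an illustrative sketch only, not the XTROS implementation: the keyword table, the label names, the n-gram search, and the sample lines are all assumptions, and the XML representation is omitted.

```python
from collections import Counter

# Hypothetical domain knowledge for real-estate pages: cue strings per semantic label.
DOMAIN_KNOWLEDGE = {
    "PRICE": ("$",),
    "BED": ("bed",),
    "BATH": ("bath",),
}

def label_line(line):
    """Return the first label whose cue occurs in the line, else 'OTHER'."""
    lowered = line.lower()
    for label, cues in DOMAIN_KNOWLEDGE.items():
        if any(cue in lowered for cue in cues):
            return label
    return "OTHER"

def most_frequent_pattern(labels, min_len=2, max_len=4):
    """Return the most frequent contiguous label sequence that contains no 'OTHER'.

    Ties in frequency are broken in favor of the longer sequence.
    Returns None when no candidate sequence exists.
    """
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(labels) - n + 1):
            gram = tuple(labels[i:i + n])
            if "OTHER" not in gram:
                counts[gram] += 1
    return max(counts, key=lambda g: (counts[g], len(g)), default=None)

# Hypothetical logical lines taken from one sample listing page.
lines = [
    "Homes for sale",
    "$250,000", "3 beds", "2 baths",
    "$310,000", "4 beds", "3 baths",
    "$199,000", "2 beds", "1 bath",
    "Contact us",
]
labels = [label_line(line) for line in lines]
pattern = most_frequent_pattern(labels)
print(pattern)
```

In this toy setting the repeated record structure of the page emerges as the winning label sequence, which is the role the most frequent pattern plays in the wrapper.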

Automation of Snake for Extraction of Multi-Object Contours from a Natural Scene (자연배경에서 여러 객체 윤곽선의 추출을 위한 스네이크의 자동화)

  • 최재혁;서경석;김복만;최흥문
    • Journal of KIISE:Computing Practices and Letters / v.9 no.6 / pp.712-717 / 2003
  • A novel multi-snake is proposed for efficient extraction of multi-object contours from a natural scene. An NTGST (noise-tolerant generalized symmetry transform) is used as a context-free attention operator to detect and locate multiple objects against a complex background. Before the multiple snakes are introduced, the snake points are automatically initialized near the contour of each detected object using the symmetry map of the NTGST. These procedures solve two difficult problems of conventional snake algorithms: automatic snake initialization and simultaneous extraction of multi-object contours. Because the snake points are initialized as close as possible to the actual contour of each object, contours with high convexity and/or concavity can be extracted easily. The experimental results show that the proposed method can efficiently extract multi-object contours from noisy and complex backgrounds of natural scenes.

Automatic Product Feature Extraction for Efficient Analysis of Product Reviews Using Term Statistics (효율적인 상품평 분석을 위한 어휘 통계 정보 기반 평가 항목 추출 시스템)

  • Lee, Woo-Chul;Lee, Hyun-Ah;Lee, Kong-Joo
    • The KIPS Transactions:PartB / v.16B no.6 / pp.497-502 / 2009
  • In this paper, we introduce an automatic product feature extraction system that improves the efficiency of product review analysis. Our system consists of two parts: a review collection and correction part and a product feature extraction part. The former collects reviews from internet shopping malls and revises spoken-style or ungrammatical sentences. In the latter, product features, that is, items that can serve as evaluation criteria, such as 'size' and 'style' for a skirt, are extracted automatically by using term statistics from reviews and web documents on the Internet. We choose nouns in reviews as candidates for product features and calculate the degree of association between candidate nouns and products by combining an inner association degree and an outer association degree. The inner association degree is calculated from noun frequency in reviews, and the outer association degree is calculated from the co-occurrence frequency of a candidate noun and a product name in web documents. In the evaluation, our extraction method showed an average recall of 90%, which is better than the results of previous approaches.
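The abstract above does not give the scoring formulas, so the following is only a sketch of one way inner and outer association degrees could be combined. The normalizations, the weight `alpha`, the function names, and all counts are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

def inner_association(review_nouns):
    """Noun frequency in reviews, scaled by the most frequent noun (assumed normalization)."""
    counts = Counter(review_nouns)
    top = max(counts.values())
    return {noun: count / top for noun, count in counts.items()}

def outer_association(noun_hits, cooccurrence_hits):
    """Share of web documents mentioning the noun that also mention the product name."""
    return {
        noun: (cooccurrence_hits.get(noun, 0) / hits if hits else 0.0)
        for noun, hits in noun_hits.items()
    }

def rank_features(review_nouns, noun_hits, cooccurrence_hits, alpha=0.5):
    """Rank candidate nouns by a weighted sum of inner and outer association (assumed form)."""
    inner = inner_association(review_nouns)
    outer = outer_association(noun_hits, cooccurrence_hits)
    scores = {
        noun: alpha * inner[noun] + (1 - alpha) * outer.get(noun, 0.0)
        for noun in inner
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical counts for the product "skirt".
review_nouns = ["size"] * 3 + ["style"] * 2 + ["delivery"] * 3
noun_hits = {"size": 1000, "style": 800, "delivery": 5000}       # web documents containing the noun
cooccurrence_hits = {"size": 300, "style": 400, "delivery": 50}  # ... that also contain "skirt"
ranked = rank_features(review_nouns, noun_hits, cooccurrence_hits)
print([noun for noun, _ in ranked])
```

With these made-up counts, 'delivery' is frequent in reviews but rarely co-occurs with the product name on the web, so the outer term pushes it below 'size' and 'style'.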

A Study on a technology of extraction of motion objects (3차원 동작객체 추출기술에 관한 연구)

  • 오영진;박노국
    • Journal of Korea Society of Industrial Information Systems / v.4 no.3 / pp.21-27 / 1999
  • This paper introduces the research and development of automatic generation technology for developing character agents. The R&D of this technology includes three major elements: body model generation, automatic motion generation, and synthetic human generation. The main application areas would be in cyberspace: 3D games, animation, virtual shopping, online chatting, virtual education systems, simulation, and security systems.
