• Title/Summary/Keyword: Automatic Information Extraction

Main Content Extraction from Web Pages Based on Node Characteristics

  • Liu, Qingtang;Shao, Mingbo;Wu, Linjing;Zhao, Gang;Fan, Guilin;Li, Jun
    • Journal of Computing Science and Engineering / v.11 no.2 / pp.39-48 / 2017
  • Main content extraction from web pages is widely used in search engines, web content aggregation, and mobile Internet browsing. However, web pages also contain a large amount of irrelevant information, such as advertisements, unrelated navigation links, and junk content. Such information reduces the efficiency of web content processing in content-based applications. This paper proposes an automatic method for extracting the main content of web pages. The method uses two indicators to describe the characteristics of web pages: text density and hyperlink density. Because similar content is distributed continuously on a page, an estimation algorithm judges whether a node is a content node or a noisy node based on the characteristics of the node and its neighboring nodes. This algorithm makes it possible to filter out advertisement nodes and irrelevant navigation. Experimental results on 10 news websites show that the algorithm achieves an average acceptable rate of 96.34%.
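
As a rough illustration of the two indicators named in the abstract above, the sketch below scores block-level nodes by text density and hyperlink density. The definitions used here (characters per tag, and link characters per character), the set of block tags, and the thresholds are assumptions for illustration and are not taken from the paper; the neighbor-based estimation step is left out.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"div", "p", "td", "li", "section", "article"}  # assumed node granularity


class BlockScorer(HTMLParser):
    """Collects character, link-character, and tag counts per block node."""

    def __init__(self):
        super().__init__()
        self.open_blocks = []  # stack of blocks currently being parsed
        self.nodes = []        # finished blocks, in closing order
        self.link_depth = 0    # > 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.open_blocks.append({"tag": tag, "chars": 0, "link_chars": 0, "tags": 1})
        elif self.open_blocks:
            self.open_blocks[-1]["tags"] += 1
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth:
            self.link_depth -= 1
        if tag in BLOCK_TAGS and self.open_blocks:
            self.nodes.append(self.open_blocks.pop())

    def handle_data(self, data):
        n = len(data.strip())
        if n and self.open_blocks:
            # Text is credited to the innermost open block only.
            self.open_blocks[-1]["chars"] += n
            if self.link_depth:
                self.open_blocks[-1]["link_chars"] += n


def score_blocks(html):
    """Parse an HTML string and return the per-block counters."""
    parser = BlockScorer()
    parser.feed(html)
    parser.close()
    return parser.nodes


def densities(node):
    """Return (text density, hyperlink density) under the assumed definitions."""
    text_density = node["chars"] / node["tags"]
    link_density = node["link_chars"] / node["chars"] if node["chars"] else 0.0
    return text_density, link_density


def is_content(node, min_text_density=20.0, max_link_density=0.3):
    """Hypothetical thresholds: dense text with few links counts as content."""
    text_density, link_density = densities(node)
    return text_density >= min_text_density and link_density <= max_link_density
```

A navigation bar made up almost entirely of anchors ends up with a hyperlink density near 1 and is rejected, while a long paragraph with no links passes.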

A New Temporal Filtering Method for Improved Automatic Lipreading (향상된 자동 독순을 위한 새로운 시간영역 필터링 기법)

  • Lee, Jong-Seok;Park, Cheol-Hoon
    • The KIPS Transactions:PartB / v.15B no.2 / pp.123-130 / 2008
  • Automatic lipreading recognizes speech by observing the movement of a speaker's lips. It has recently received attention as a way to compensate for the performance degradation of acoustic speech recognition in acoustically noisy environments. One of the important issues in automatic lipreading is defining and extracting salient features from the recorded images. In this paper, we propose a feature extraction method that uses a new filtering technique to improve recognition performance. The proposed method applies a band-pass filter to the temporal trajectory of each pixel in the images containing the lip region, which eliminates frequency components that are too slow or too fast compared to the relevant speech information. Features are then extracted by principal component analysis. Speaker-independent recognition experiments show that the proposed method improves performance in both clean and visually noisy conditions.
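
The temporal filtering step described above can be sketched as follows. This is a minimal illustration, assuming an ideal band-pass implemented by zeroing DFT bins; the abstract does not state the filter design or cutoff frequencies, so `low_bin` and `high_bin` are hypothetical parameters. Each frame is a flat list of pixel intensities, and the principal component analysis step is omitted.

```python
import cmath
import math


def bandpass(signal, low_bin, high_bin):
    """Keep only DFT bins whose frequency index lies in [low_bin, high_bin]."""
    n = len(signal)
    # Forward DFT (O(n^2), adequate for a short illustration).
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        freq = min(k, n - k)  # bin k and bin n-k carry the same frequency
        if not (low_bin <= freq <= high_bin):
            spectrum[k] = 0
    # Inverse DFT; the result is real because the kept bins are symmetric.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def filter_pixel_trajectories(frames, low_bin, high_bin):
    """Band-pass the intensity trajectory of every pixel across the frame sequence."""
    n_pixels = len(frames[0])
    filtered = [
        bandpass([frame[p] for frame in frames], low_bin, high_bin)
        for p in range(n_pixels)
    ]
    # Transpose back to a list of frames.
    return [[filtered[p][t] for p in range(n_pixels)] for t in range(len(frames))]
```

A pixel whose intensity never changes, such as static background or constant illumination, is mapped to zero, while a pixel oscillating inside the pass band is kept.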

Automatic Coastline Extraction and Change Detection Monitoring using LANDSAT Imagery (LANDSAT 영상을 이용한 해안선 자동 추출과 변화탐지 모니터링)

  • Kim, Mi Kyeong;Sohn, Hong Gyoo;Kim, Sang Pil;Jang, Hyo Seon
    • Journal of Korean Society for Geospatial Information Science / v.21 no.4 / pp.45-53 / 2013
  • Global warming causes sea levels to rise, and global changes, including coastline changes, are apparently taking place. Coastline change due to sea level rise is one of the most significant phenomena caused by global climate change. Accordingly, coastline change detection can be used as an indicator of global climate change. In general, coastline change results not only from sea level rise but also from artificial factors such as land reclamation of mud flats. Arctic coastal areas, however, have experienced serious change mostly due to sea level rise rather than other factors. The purposes of this study are the automatic extraction of coastlines and the identification of their change. To extract the coastline automatically, the contrast between water and land was maximized using the modified NDWI (Normalized Difference Water Index). Image processing techniques were then applied to the modified NDWI imagery so that an appropriate threshold value for separating water and land could be found automatically. The coastline was then extracted with an edge detection algorithm, and changes were detected using the extracted coastlines. Automatic extraction of coastlines from LANDSAT imagery was possible without the help of other data, and the results were similar to NLCD data used as reference data. In addition, the results for the study area, which is permafrost that always remains frozen below 0°C, showed quantitative changes of the coastline and verified that the change had accelerated.
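
The pipeline in this abstract (water index, automatic threshold, edge detection) can be sketched roughly as below. Several pieces are assumptions: the modified NDWI is taken in its common form, (Green - MIR) / (Green + MIR); Otsu's method stands in for the unspecified automatic thresholding; and a 4-neighbor boundary test stands in for the unspecified edge detection algorithm. Band values are plain nested lists of reflectances.

```python
def mndwi(green, mir):
    """Modified NDWI for one pixel, assuming the (Green - MIR) / (Green + MIR) form."""
    total = green + mir
    return (green - mir) / total if total else 0.0


def otsu_threshold(values, bins=64):
    """Otsu's method on a histogram; a stand-in for the paper's thresholding step."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_bin, best_var = 0, -1.0
    w0 = sum0 = 0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += i * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_bin = between_var, i
    return lo + (best_bin + 1) * width  # upper edge of the best split bin


def coastline_pixels(green, mir):
    """Return (row, col) of water pixels that touch at least one land pixel."""
    index = [[mndwi(g, m) for g, m in zip(g_row, m_row)]
             for g_row, m_row in zip(green, mir)]
    threshold = otsu_threshold([v for row in index for v in row])
    water = [[v >= threshold for v in row] for row in index]
    rows, cols = len(water), len(water[0])
    edge = set()
    for r in range(rows):
        for c in range(cols):
            if not water[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and not water[rr][cc]:
                    edge.add((r, c))
                    break
    return edge
```

Change detection would then compare the pixel sets returned for images of the same area taken at different dates.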

Skin Region Extraction Using Multi-Layer Neural Network and Skin-Color Model (다층 신경망과 피부색 모델을 이용한 피부 영역 검출)

  • Park, Sung-Wook;Park, Jong-Wook
    • Journal of Korea Society of Industrial Information Systems / v.16 no.2 / pp.31-38 / 2011
  • Skin color is important information for automatic face recognition. In this paper, we propose a skin region extraction method that uses an MLP (Multi-Layer Perceptron) and a skin color model. We use an adaptive lighting compensation technique to improve the performance of skin region extraction. In addition, a preprocessing filter eliminates large areas of easily distinguished non-skin pixels from further processing. Experimental results show that the proposed method performs better than conventional methods and reduces processing time by 31-49% on average.
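
The idea of discarding obvious non-skin pixels before the more expensive classifier can be illustrated with a simple color-range test. This is only a sketch under assumptions: the abstract does not specify its skin color model, so the YCbCr color space, the conversion coefficients (the usual full-range JPEG/BT.601 form), and the Cb/Cr ranges below are illustrative defaults, not the paper's values. The MLP stage and lighting compensation are not shown.

```python
def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to the Cb and Cr chroma components (full-range BT.601)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin_candidate(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Assumed chroma box: pixels outside it are treated as clearly non-skin."""
    cb, cr = rgb_to_cbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def prefilter(pixels):
    """Keep only the pixels that a later classifier would still need to examine."""
    return [p for p in pixels if is_skin_candidate(*p)]
```

Because the test is a handful of arithmetic operations per pixel, it is cheap compared with evaluating a neural network on every pixel, which is the kind of saving the abstract attributes to its preprocessing filter.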

Automatic Generation of Digital Elevation Model from 2D Terrain Map Using Graph-theoretic Algorithms (그래픽이론적 알고리즘들을 이용한 2차원 지형도로 부터 DEM 의 자동생성방법)

  • 구자영
    • Korean Journal of Remote Sensing / v.9 no.2 / pp.21-34 / 1993
  • Digitized topographic information is necessary in many areas, such as landscape analysis, civil engineering planning and design, and geographic information systems. If it is stored on a computer in a suitable format, it can also be used in flight simulators and in the automatic navigation of unmanned aircraft. Topographic information is coded with various symbols, including contour lines, and is analyzed by trained personnel. The information must be stored on a computer for automatic analysis, but entering the contours with manual input devices such as a digitizing tablet requires a great deal of time and manpower. This paper deals with the automatic extraction and reconstruction of 3D topographic information from a 2D terrain map. Several algorithms were developed in this work, including a contour segment finding algorithm and a contour segment linking algorithm. The algorithms were tested on a real 2D terrain map.

A Study on Face Component Extraction for Automatic Generation of Personal Avatar (개인아바타 자동 생성을 위한 얼굴 구성요소의 추출에 관한 연구)

  • Choi Jae Young;Hwang Seung Ho;Yang Young Kyu;Whangbo Taeg Ken
    • Journal of Internet Computing and Services / v.6 no.4 / pp.93-102 / 2005
  • Recently, netizens have frequently used virtual characters, or avatars, to present their own identity, so there is a strong need for avatars to resemble the user. This paper proposes a technique for extracting the facial region and facial features used to generate an avatar automatically. To extract facial feature components, the method uses ACM and edge information. In extracting the facial region, the proposed method also reduces the effect of lighting and poor image quality in low-resolution pictures. This is achieved by using the variation in facial area size as the external energy of the ACM. Our experiments show that the success rate of extracting facial regions is 92% and the accuracy of extracting facial feature components is 83.4%. These results provide good evidence that the suggested method can extract facial regions and features accurately. Moreover, the technique could later be used for handling features according to the pattern parts of an automatic avatar generation system.

Automatic Extraction and Usage of Terminology Dictionary Based on Definitional Sentences Patterns in Technical Documents (기술문서 정의문 패턴을 이용한 전문용어사전 자동추출 및 활용방안)

  • Han, Hui-Jeong;Kim, Tae-Young;Doo, Hyo-Chul;Oh, Hyo-Jung
    • Journal of the Korean Society for Information Management / v.34 no.4 / pp.81-99 / 2017
  • Technical documents are important research outputs of the knowledge and information society. To use technical documents properly, it is necessary to apply advanced information processing techniques such as summarization and information extraction. In this paper, to extract core information, we automatically extract terms and their definitions based on definitional sentence patterns and the structure of technical documents. On this basis, we propose a system for building a specialized terminology dictionary. We further suggest personalized services so that users can use the terminology dictionary in various ways as a knowledge memory. The results of this study will allow users to find up-to-date information faster and more easily. In addition, providing a personalized terminology dictionary to users can maximize the value, usability, and retrieval efficiency of the dictionary.
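
Pattern-based extraction of terms and definitions, as described in this abstract, can be sketched with a few regular expressions. The English patterns below are hypothetical examples chosen for illustration; the paper's actual patterns (which target Korean technical documents) and its use of document structure are not reproduced here.

```python
import re

# Hypothetical definitional patterns; each captures a term and its definition.
TERM = r"(?P<term>[A-Z][\w\- ]{1,60}?)"
PATTERNS = [
    re.compile(TERM + r" (?:is|are) defined as (?P<definition>.+?)\.?$"),
    re.compile(TERM + r" refers? to (?P<definition>.+?)\.?$"),
    re.compile(TERM + r" (?:is|are) (?P<definition>(?:a|an|the) .+?)\.?$"),
]


def extract_definition(sentence):
    """Return (term, definition) for the first matching pattern, else None."""
    text = sentence.strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("term"), match.group("definition")
    return None


def build_dictionary(sentences):
    """Collect term -> definition pairs from sentences that match a pattern."""
    dictionary = {}
    for sentence in sentences:
        found = extract_definition(sentence)
        if found:
            term, definition = found
            dictionary.setdefault(term, definition)  # keep the first definition seen
    return dictionary
```

Sentences that match no pattern are skipped, so the resulting dictionary contains only entries backed by an explicit definitional sentence.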

Automatic Extraction Method for Basic Insect Footprint Segments (곤충 발자국 인식을 위한 자동 영역 추출기법)

  • Shin, Bok-Suk;Woo, Young-Woon;Cha, Eui-Young
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2007.06a / pp.275-278 / 2007
  • In this paper, we propose an automatic extraction method as a preprocessing stage for extracting basic insect footprint segments. In general, the sizes and strides of footprints differ according to the type and size of the insect to be recognized. We therefore propose an improved algorithm that extracts basic insect footprint segments regardless of the size and stride of the footprint pattern. In the proposed algorithm, the threshold value for clustering is determined automatically from the contour shape of the graph created by accumulating the distances between all the spots of the footprint pattern. In experiments applying the proposed method, the basic footprint segments were extracted from a whole insect footprint image using significant information, so that appropriate features for classification could be found.

Automatic classification of failure patterns in semiconductor EDS Test using pattern recognition (반도체 EDS공정에서의 패턴인식기법을 이용한 불량 유형 자동 분류 방법 연구)

  • 한영신;황미영;이칠기
    • Proceedings of the IEEK Conference / 2003.07b / pp.703-706 / 2003
  • Yield enhancement is important in semiconductor fabrication. Ideally, all failures would be prevented. When a failure does occur, however, it is important to identify the stage that caused it quickly and to take countermeasures. Automatic extraction of failure patterns from a fail bit map reduces analysis time and facilitates yield enhancement. This paper describes techniques for automatically classifying a failure pattern using a fail bit map, a new and simple scheme that facilitates failure analysis.
