• Title/Summary/Keyword: Query Extraction


Snippet Extraction Method using Fuzzy (퍼지를 이용한 스니핏 추출 방법)

  • Park, Sun;Choi, Myeong Su;Kim, Cheong Ho;Kim, Cheong Uck;Na, Hee Kun;Choi, Seock Whan;Kumar, Shiu;Lee, Seong Ro
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2012.10a
    • /
    • pp.387-388
    • /
    • 2012
  • Users who rely on snippets sometimes visit a page that does not match their intention. To address this problem, this paper proposes a new snippet extraction method using fuzzy association. The proposed method uses pseudo relevance feedback to expand the user's query. It then uses the fuzzy association between the expanded query and the web pages to extract snippets that better reflect the user's semantic intention.

  • PDF
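
The two steps named in this abstract, query expansion by pseudo relevance feedback and snippet selection by association with the expanded query, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define its fuzzy association measure, so `fuzzy_association` here is a stand-in that returns the fraction of expanded-query terms found in a sentence, and the length filter is a crude substitute for a stop-word list. All function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def expand_query(query, top_docs, n_terms=2):
    """Pseudo relevance feedback: add the most frequent new terms
    found in the top-ranked documents to the original query."""
    q = tokenize(query)
    counts = Counter(
        t for doc in top_docs for t in tokenize(doc)
        if t not in q and len(t) > 3  # crude stop-word filter (assumption)
    )
    return q + [t for t, _ in counts.most_common(n_terms)]

def fuzzy_association(terms, sentence):
    """Stand-in membership degree in [0, 1]: share of query terms
    that occur in the sentence. Not the paper's actual measure."""
    if not terms:
        return 0.0
    found = set(tokenize(sentence))
    return sum(1 for t in terms if t in found) / len(terms)

def best_snippet(query, top_docs, page):
    """Return the sentence of `page` most associated with the expanded query."""
    terms = expand_query(query, top_docs)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page) if s.strip()]
    return max(sentences, key=lambda s: fuzzy_association(terms, s))
```

With an ambiguous query such as "jaguar", feedback documents about the animal pull in terms like "wild" and "habitat", so the chosen snippet comes from the sentence about the animal rather than the car.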

Eigen Value Based Image Retrieval Technique (Eigen Value 기반의 영상검색 기법)

  • 김진용;소운영;정동석
    • The Journal of Information Technology and Database
    • /
    • v.6 no.2
    • /
    • pp.19-28
    • /
    • 1999
  • Digital image and video libraries require new algorithms for the automated extraction and indexing of salient image features. Eigen values of an image provide one important cue for the discrimination of image content. In this paper we propose a new approach for automated content extraction that allows efficient database searching using eigen values. The algorithm automatically extracts eigen values from the image matrix represented by the covariance matrix for the image. We demonstrate that the eigen values representing shape information and the skewness of its distribution representing complexity provide good performance in image query response time while providing effective discriminability. We present the eigen value extraction and indexing techniques. We test the proposed algorithm of searching by eigen value and its skewness on a database of 100 images.

  • PDF
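
A rough sketch of the feature described in this abstract: the eigenvalues of an image's covariance matrix plus the skewness of their distribution. Several points are assumptions, not the authors' method: each image row is treated as one observation when forming the covariance matrix, skewness is taken as the population skewness of the eigenvalue list, and a plain Jacobi rotation method is swapped in as the eigenvalue solver so the example needs no third-party library.

```python
import math

def covariance(rows):
    """Sample covariance of the columns; each image row is one observation."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(m)] for i in range(m)]

def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q]; kept within [-pi/4, pi/4].
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5

def image_signature(rows):
    """Index entry for one image: (eigenvalues, skewness of eigenvalues)."""
    eig = jacobi_eigenvalues(covariance(rows))
    return eig, skewness(eig)
```

Two images could then be compared by a distance between their signatures; which distance the paper uses is not stated in the abstract.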

Web Information Extraction using HTML Tag Pattern (HTML 태그페턴을 이용한 웹정보추출시스템)

  • Park, Byung-Kwon
    • Proceedings of the Korea Association of Information Systems Conference
    • /
    • 2005.05a
    • /
    • pp.79-92
    • /
    • 2005
  • To query the vast number of web pages available on the Internet, it is necessary to extract the information encoded in the web pages and convert it into structured data (e.g. relational data for SQL) or semistructured data (e.g. XML data for XQuery). In this paper, we propose a new web information extraction system, PIES, to convert web information into XML documents. PIES is based on a user-specified target schema and HTML tag pattern descriptions. The web information is extracted by the pattern descriptions and validated by the target schema. We designed a new language to describe extraction rules, and a new regular expression to describe HTML tag patterns. We implemented PIES and applied it to the US patent web site to evaluate its correctness. It successfully extracted thousands of US patent records and converted them into XML documents.

  • PDF
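
The extract-then-validate flow described in this abstract can be illustrated with a small sketch. It does not reproduce the PIES rule language or its tag-pattern expressions, which the abstract does not show. The sketch assumes a hypothetical two-column table row as the tag pattern, an ordinary Python regular expression to describe it, and a "target schema" reduced to a tuple of required field names.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tag pattern: a table row with a patent number and a title.
ROW_PATTERN = re.compile(
    r"<tr>\s*<td>(?P<number>[^<]+)</td>\s*<td>(?P<title>[^<]+)</td>\s*</tr>",
    re.IGNORECASE,
)

# Hypothetical target schema: required child elements, in output order.
TARGET_SCHEMA = ("number", "title")

def extract_to_xml(html, root_tag="patents", record_tag="patent"):
    """Extract records matching ROW_PATTERN and emit them as an XML tree.
    Records that fail the schema check (an empty required field) are dropped."""
    root = ET.Element(root_tag)
    for match in ROW_PATTERN.finditer(html):
        fields = {k: v.strip() for k, v in match.groupdict().items()}
        if not all(fields.get(name) for name in TARGET_SCHEMA):
            continue  # validation against the target schema failed
        record = ET.SubElement(root, record_tag)
        for name in TARGET_SCHEMA:
            ET.SubElement(record, name).text = fields[name]
    return root
```

Calling `ET.tostring(extract_to_xml(html), encoding="unicode")` yields the XML text. A regular expression is adequate only for regular, machine-generated markup; irregular pages would need a real HTML parser.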

Similarity based Rotation Invariant Image Retrieval (유사도를 이용한 회전 불변 영상검색)

  • 권동현;장정동;이태홍
    • Proceedings of the IEEK Conference
    • /
    • 1999.11a
    • /
    • pp.581-584
    • /
    • 1999
  • To retrieve rotated images from a database with a content-based image retrieval system, algorithms that are robust to rotation are usually applied during feature extraction. This approach requires considerable calculation time for feature extraction and a large amount of indexed data for feature indexing. In this paper, we therefore propose a rotation-robust algorithm that uses the block variance of the projected vector. The algorithm requires no additional calculation for feature extraction and runs at query time by comparing the extracted data. The proposed method can be applied to databases containing images of various sizes with shape information, and it achieves fast response times in implementation.

  • PDF

Distributed Information Extraction in Wireless Sensor Networks using Multiple Software Agents with Dynamic Itineraries

  • Gupta, Govind P.;Misra, Manoj;Garg, Kumkum
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.1
    • /
    • pp.123-144
    • /
    • 2014
  • Wireless sensor networks are generally deployed for specific applications to accomplish certain objectives over a period of time. To fulfill these objectives, it is crucial that the sensor network continues to function for a long time, even if some of its nodes become faulty. Energy efficiency and fault tolerance are undoubtedly the most crucial requirements for the design of an information extraction protocol for any sensor network application. However, most existing software agent based information extraction protocols are incapable of satisfying these requirements because of static agent itineraries and large agent sizes. This paper proposes an Information Extraction protocol based on Multiple software Agents with Dynamic Itineraries (IEMADI), where multiple software agents are dispatched in parallel to perform tasks based on the query assigned to them. IEMADI decides the itinerary for an agent dynamically at each hop using local information. Through mathematical analysis and simulation, we compare the performance of IEMADI with a well known static itinerary based protocol with respect to energy consumption and response time. The results show that IEMADI provides better performance than the static itinerary based protocols.
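
The idea of choosing an agent's itinerary hop by hop from local information can be sketched as below. The abstract does not say which local information IEMADI uses, so the selection rule here is purely an assumption for illustration: the agent moves to the live, unvisited neighbor with the highest residual energy and stops when none remains. All names are hypothetical.

```python
def next_hop(current, neighbors, energy, alive, visited):
    """Pick the next node using only what the current node can see locally.
    Assumed rule: highest residual energy among live, unvisited neighbors."""
    candidates = [n for n in neighbors[current] if alive[n] and n not in visited]
    if not candidates:
        return None
    return max(candidates, key=lambda n: energy[n])

def run_agent(start, neighbors, energy, alive):
    """Walk the network from `start`, deciding the itinerary at each hop."""
    itinerary, visited = [start], {start}
    node = start
    while True:
        nxt = next_hop(node, neighbors, energy, alive, visited)
        if nxt is None:
            break
        itinerary.append(nxt)
        visited.add(nxt)
        node = nxt
    return itinerary
```

Because the route is not fixed in advance, a faulty node is simply skipped at the hop where it would have been chosen, which is the fault-tolerance property a static itinerary lacks.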

A implementation and evaluation of Rule-Based Reverse-Engineering Tool (규칙기반 역공학 도구의 구현 및 평가)

  • Bae Jin Young
    • Journal of the Korea Society of Computer and Information
    • /
    • v.9 no.3
    • /
    • pp.135-141
    • /
    • 2004
  • As software has diversified and grown, software maintenance has become more complex and difficult, and its cost now takes up the largest portion of the software life cycle. We design a reverse engineering tool for a software restructuring environment that targets object-oriented systems. The tool performs rule-based reverse engineering using class information. It lets the maintainer issue interactive queries in Prolog, a logic language. For class extraction and restructuring, we use a similarity formula based on the relationship between variables and functions in order to extract the most appropriate class. The visibility of the extracted class can be identified automatically. In this way the tool supports practical maintenance. The purpose of this paper is to present the reverse engineering tool and to evaluate it.

  • PDF

Mining of Frequent Structures over Streaming XML Data (스트리밍 XML 데이터의 빈발 구조 마이닝)

  • Hwang, Jeong-Hee
    • The KIPS Transactions:PartD
    • /
    • v.15D no.1
    • /
    • pp.23-30
    • /
    • 2008
  • The Internet and XML are basic technologies for context-awareness research in ubiquitous environments. XML data in the form of continuous streams are common in network applications on the Internet, and query processing for streaming XML data is an active research topic. As basic research toward efficient querying, we propose both a labeled ordered tree model representing XML and a mining method to extract frequent structures from streaming XML data. That is, continuously arriving XML data are modeled by a stream tree called XFP_tree, and we extract exactly the frequent structures from the XFP_tree of the current window in order to mine recent data. The proposed method can serve as a basis for query processing and indexing methods for XML stream data.
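
Mining frequent structures from the current window of an XML stream can be illustrated with a simplified sketch. It does not implement the XFP_tree. As a stated simplification, each document is reduced to its set of root-to-node label paths, and a path counts as frequent when it occurs in at least `min_support` documents of the sliding window. The class and function names are hypothetical.

```python
from collections import Counter, deque
import xml.etree.ElementTree as ET

def label_paths(xml_text):
    """Set of root-to-node label paths, e.g. {'/a', '/a/b'}."""
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths

class WindowMiner:
    """Keeps exact path counts for the most recent `window_size` documents."""

    def __init__(self, window_size, min_support):
        self.window_size = window_size
        self.min_support = min_support
        self.window = deque()
        self.counts = Counter()

    def add(self, xml_text):
        paths = label_paths(xml_text)
        self.window.append(paths)
        self.counts.update(paths)
        if len(self.window) > self.window_size:
            # Expire the oldest document so only recent data is mined.
            self.counts.subtract(self.window.popleft())

    def frequent(self):
        return sorted(p for p, c in self.counts.items() if c >= self.min_support)
```

Counts are updated incrementally on arrival and expiry, so the frequent set for the current window is exact without rescanning older documents.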

The Levelized Schema Extraction in XML Documents (XML 문서에서의 단계화된 스키마 추출)

  • 김성림;윤용익
    • Journal of Korea Multimedia Society
    • /
    • v.5 no.1
    • /
    • pp.105-113
    • /
    • 2002
  • XML documents, which are becoming a new standard for expressing and exchanging data on the Internet, do not have a defined schema. Existing query languages such as SQL or OQL cannot be applied directly to XML documents. Research on how to extract schemas for XML documents and on XML query languages is actively under way. For a user's query, the results could be too many or too few, so it is important to give users adequate results. This paper proposes a way to extract multiple levelized schemas according to the frequency of element occurrence in XML documents. The schema can be reduced or extended to match the user's query more flexibly.

  • PDF
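
Extracting schemas at several levels from element frequency can be sketched as follows. This is one plausible reading, not the authors' algorithm: frequency is taken as the fraction of documents that contain an element path, and each level is defined by a threshold on that fraction. A high threshold gives a reduced schema and a low one gives an extended schema. Function names are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def element_paths(xml_text):
    """Distinct element paths in one document, found with an explicit stack."""
    root = ET.fromstring(xml_text)
    paths, stack = set(), [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths

def levelized_schema(docs, thresholds):
    """Map each threshold to the element paths whose document frequency
    (share of documents containing the path) meets that threshold."""
    counts = Counter()
    for doc in docs:
        counts.update(element_paths(doc))
    n = len(docs)
    return {
        t: sorted(p for p, c in counts.items() if c / n >= t)
        for t in thresholds
    }
```

A query that returns too few results can be retried against a lower-threshold (extended) schema, and one that returns too many against a higher-threshold (reduced) schema.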

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.111-136
    • /
    • 2018
  • In this paper, we propose a methodology to extract answer information for queries from various types of unstructured documents collected from multiple sources on the web in order to expand a knowledge base. The proposed methodology is divided into the following steps. 1) Collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news sources for queries separated into "subject-predicate" form, and classify the proper documents. 2) Determine whether each sentence is suitable for extracting information and derive the confidence. 3) Based on the predicate feature, extract the information from the proper sentence and derive the overall confidence of the information extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from the artificial intelligence speaker of SK-Telecom. The proposed system showed a higher performance index than the baseline model. The contribution of this study is a sequence tagging model based on bi-directional LSTM-CRF that uses the predicate feature of the query; with it we developed a robust model that maintains high recall even on various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of source-specific document types. The proposed methodology extracted information effectively from various types of unstructured documents compared with the baseline model. Previous research has the limitation that performance is poor when extracting information from a document type that differs from the training data.
In addition, by predicting the suitability of documents and sentences for information extraction before the extraction step, this study can prevent unnecessary extraction attempts on documents that do not include the answer information. It is meaningful that we provide a method by which precision can be maintained even in an actual web environment. Information extraction for knowledge base expansion cannot guarantee that a document includes the correct answer, because it targets unstructured documents on the real web. When question answering is performed on the real web, previous machine reading comprehension studies show low precision because they frequently attempt to extract an answer even from a document that contains no correct answer. The policy of predicting the suitability of documents and sentences for information extraction is meaningful in that it helps maintain extraction performance even in a real web environment. The limitations of this study and future research directions are as follows. The first is a problem related to data preprocessing. In this study, the unit of knowledge extraction is classified through morphological analysis based on the open source KoNLPy Python package, and the information extraction result can be incorrect when the morphological analysis is not performed properly. To improve information extraction results, it is necessary to develop a more advanced morpheme analyzer. The second is a problem of entity ambiguity. The information extraction system of this study cannot distinguish identical names that refer to different entities. If several people with the same name appear in the news, the system may not extract information about the one intended by the query.
In future research, it is necessary to take measures to distinguish people who share the same name. The third is a problem of evaluation query data. In this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system. We developed an evaluation data set using 2,800 documents (400 questions * 7 articles per question: 1 Wikipedia, 3 Naver encyclopedia, 3 Naver news) by judging whether each document includes a correct answer. To ensure the external validity of the study, it is desirable to use more queries to determine the performance of the system, but this is a costly activity that must be done manually. Future research needs to evaluate the system on more queries. It is also necessary to develop a Korean benchmark data set for information extraction from multi-source web documents, so that results can be evaluated more objectively.
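
The gating policy described in this abstract, which checks document and sentence suitability before any extraction is attempted, can be sketched as below. The sketch is not the paper's model: the bi-directional LSTM-CRF tagger and the learned suitability classifiers are replaced by a plain term-overlap score, the overall confidence is assumed to be the product of the two scores, and the "answer" is simply the best qualifying sentence. All names and thresholds are hypothetical.

```python
import re

def words(text):
    """Set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def suitability(terms, text):
    """Stand-in confidence: share of query terms present in the text."""
    found = words(text)
    return sum(1 for t in terms if t in found) / len(terms)

def split_sentences(doc):
    """Split on whitespace that follows a period; drop the final period."""
    return [s.strip().rstrip(".") for s in re.split(r"(?<=\.)\s+", doc) if s.strip()]

def extract(subject, predicate, documents, doc_min=0.5, sent_min=1.0):
    """Return (sentence, overall_confidence), or None when no document and
    sentence pass the suitability gates, instead of forcing an answer."""
    terms = [subject.lower(), predicate.lower()]
    best = None
    for doc in documents:
        doc_conf = suitability(terms, doc)
        if doc_conf < doc_min:
            continue  # step 1: skip documents unlikely to hold the answer
        for sentence in split_sentences(doc):
            sent_conf = suitability(terms, sentence)
            if sent_conf < sent_min:
                continue  # step 2: skip unsuitable sentences
            overall = doc_conf * sent_conf  # step 3: assumed combination rule
            if best is None or overall > best[1]:
                best = (sentence, overall)
    return best
```

Returning `None` when nothing passes the gates is what protects precision: the system declines to answer rather than extracting from a document that does not contain the answer.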

Automatic 5 Layer Model construction of Business Process Framework(BPF) with M2T Transformation (모델변환을 이용한 비즈니스 프로세스 프레임워크 5레이어 모델 자동 구축 방안)

  • Seo, Chae-Yun;Kim, R. Youngchul
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.13 no.1
    • /
    • pp.63-70
    • /
    • 2013
  • In previous research, we proposed a business process structured query language (BPSQL) for information extraction and retrieval in the business process framework (BPF), and used an existing query language with tabularization of each layer within the framework. However, the specification of each layer's information in the BPF still had to be built manually. To solve this problem, we propose automatically building the schema-based business process model with a model-to-text conversion technique. The procedure consists of 1) defining meta-models of the entire structure and of the database schema, and 2) defining the model transformation rules between them. With this procedure, an integrated information system defined through meta-modeling can be automatically transformed, using model-to-text conversion, into schema-based table specifications for every layer. This makes it possible to develop an integrated information system efficiently.
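
A model-to-text transformation of the kind described here can be illustrated in a few lines. This is a toy sketch under stated assumptions: the meta-model is a plain dictionary mapping a table name to its columns, the single transformation rule turns each entry into a SQL `CREATE TABLE` statement, and the layer name used in the usage note is hypothetical because the abstract does not name the five BPF layers.

```python
def model_to_sql(model):
    """Model-to-text rule: one CREATE TABLE statement per model element.
    `model` maps a table name to a list of (column_name, sql_type) pairs."""
    statements = []
    for table, columns in model.items():
        cols = ",\n".join(f"  {name} {sqltype}" for name, sqltype in columns)
        statements.append(f"CREATE TABLE {table} (\n{cols}\n);")
    return "\n\n".join(statements)
```

For example, `model_to_sql({"process_layer": [("id", "INTEGER PRIMARY KEY"), ("name", "TEXT")]})` produces the table specification text for one layer, so adding a layer to the model regenerates its schema without manual editing.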