• Title/Summary/Keyword: Text Retrieval System

A study on the efficient extraction method of SNS data related to crime risk factor (범죄발생 위험요소와 연관된 SNS 데이터의 효율적 추출 방법에 관한 연구)

  • Lee, Jong-Hoon; Song, Ki-Sung; Kang, Jin-A; Hwang, Jung-Rae
    • Journal of the Korea Society of Computer and Information / v.20 no.1 / pp.255-263 / 2015
  • In this paper, we propose a way to use SNS data to proactively identify information on crime risk factors and to prevent crime. Recently, SNS (Social Network Service) data have been used to build proactive prevention systems in a variety of fields. However, when SNS data are collected with simple keywords, the results contain a large amount of unrelated data, which can lower accuracy and cause confusion in data analysis. We therefore present a method that extracts relevant data efficiently by improving search accuracy through text mining analysis of SNS data.

A Study of High Speed Retrieval Algorithm of Long Component Keyword (복합키워드의 고속검색 알고리즘에 관한 연구)

  • Lee Jin-Kwan; Jung Kyu-cheol; Lee Tae-hun; Park Ki-hong
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.8 / pp.1769-1776 / 2004
  • Effective keyword extraction is important in information search systems, and there are several ways to select the proper keywords from among many. Among them, the DER structure for the AC algorithm, a single-keyword search method, can search for multiple keywords but suffers from a time complexity problem. In this paper, we develop an algorithm, the "EDER structure", which expands the standalone search table of the DER-structure search method to improve time complexity. We tested the algorithm on 500 text files and found that the EDER structure is more efficient than the DER structure for AC in both keyword posting results and time: 0.2 seconds for EDER versus 0.6 seconds for the DER structure.
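
To make the multi-keyword search idea in this abstract concrete, here is a minimal sketch of a plain Aho-Corasick automaton, assuming that "AC" in the abstract refers to the Aho-Corasick algorithm. It is not the paper's DER or EDER structure, whose table layouts the abstract does not describe; the function names `build_automaton` and `find_all` are hypothetical.

```python
from collections import deque

def build_automaton(keywords):
    """Build a trie with failure links for a set of keywords."""
    goto = [{}]   # goto[node] maps a character to the child node index
    fail = [0]    # fail[node] is the node for the longest proper suffix
    out = [[]]    # out[node] lists the keywords that end at this node
    for word in keywords:
        node = 0
        for ch in word:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(word)
    # Breadth-first pass: failure links of shallower nodes are set first.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit keywords that end at the suffix node.
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out

def find_all(text, automaton):
    """Return (start_index, keyword) pairs for every match in one pass."""
    goto, fail, out = automaton
    node = 0
    matches = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for word in out[node]:
            matches.append((i - len(word) + 1, word))
    return matches

if __name__ == "__main__":
    automaton = build_automaton(["he", "she", "his", "hers"])
    print(find_all("ushers", automaton))
```

The text is scanned once regardless of how many keywords are loaded, which is the property that makes this family of methods attractive for multi-keyword search.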

Clustering of Web Document Exploiting with the Co-link in Hypertext (동시링크를 이용한 웹 문서 클러스터링 실험)

  • 김영기; 이원희; 권혁철
    • Journal of Korean Library and Information Science Society / v.34 no.2 / pp.233-253 / 2003
  • Knowledge organization is the way we humans understand the world. Two types of information organization mechanisms are studied in information retrieval: classification and clustering. Classification organizes entities by pigeonholing them into predefined categories, whereas clustering organizes information by grouping similar or related entities together. Internet information resource systems extract keywords from the words that appear in a web document and build an inverted file. Term clustering based on grouping related terms, however, did not prove overly successful and was mostly abandoned for documents written in different languages or doorway pages composed only of anchor text. This study examines the informetric analysis and clustering potential of web documents based on the co-link topology of web pages.
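
As an illustration of co-link based grouping, the sketch below counts how often two pages are linked from the same source page and then groups pages whose co-link count reaches a threshold. The abstract does not specify the clustering procedure used in the study, so the threshold-based single-link grouping, the `threshold` parameter, and the function names are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

def colink_counts(outlinks):
    """Count, for each pair of pages, how many sources link to both."""
    counts = Counter()
    for targets in outlinks.values():
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return counts

def cluster_by_colink(outlinks, threshold=2):
    """Group pages connected by co-link counts >= threshold (assumed rule)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in colink_counts(outlinks).items():
        find(a)
        find(b)
        if n >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for page in parent:
        clusters.setdefault(find(page), set()).add(page)
    return sorted(sorted(c) for c in clusters.values())

if __name__ == "__main__":
    # Hypothetical link data: source page -> pages it links to.
    outlinks = {"p1": ["a", "b", "c"], "p2": ["a", "b"], "p3": ["c", "d"]}
    print(cluster_by_colink(outlinks))
```

Because the grouping relies only on link structure, it does not depend on the language of the page text, which is the situation the abstract identifies as difficult for term clustering.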

A Study on the Enhancement of Medical Information Service Functions by the Utilization of CD-ROM (CD-ROM을 활용한 의학정보봉사기능의 제고방안에 관한 연구)

  • Yun Hee-Yun
    • Journal of the Korean Society for Library and Information Science / v.27 / pp.183-214 / 1994
  • The purpose of this study is to suggest schemes for enhancing information service functions through the use of CD-ROM in medical school libraries. The results of the study are summarized as follows: 1. Selecting and evaluating CD-ROM databases are necessary steps in planning for a CD-ROM. Before a CD-ROM is selected, therefore, medical libraries must establish practical evaluation criteria, ordered by importance, covering the information services environment, hardware/software characteristics, service requirements, price and cost, etc. 2. If possible, CD-ROM MEDLINE must be suited to the information services environment. 3. For popular core journals, full-text CD-ROMs should be purchased gradually. 4. To reduce the time from searching bibliographic information to receiving original articles, a CD-NET system and a library holdings administration program must be built and developed, and the channels for information search and for ordering and receiving original articles should be diversified. 5. Search education programs for medical librarians and users should be strengthened, and librarians must play an important role as CD-ROM retrieval consultants and intermediaries.

A Comparative Study on the Functional Similarities between Four Commercial IRS's and ISO 8777 (온라인 상용 정보검색시스템의 기능분석 및 ISO 8777과의 유사성 평가 연구)

  • Chung, Young-Mee; Yoo, Jae-Bok
    • Journal of Information Management / v.26 no.2 / pp.1-36 / 1995
  • The purpose of this study is to analyze six fundamental functions of DIALOG, STN, ORBIT and CDP, to investigate the similarities of the four systems in terms of these functions, and to compare the commands of these systems with ISO 8777, an international standard command language for interactive text searching. The study found few differences between the four systems and ISO 8777 in functional aspects, but few similarities in command expression. In addition, the four systems proved more effective than ISO 8777 for some commands.

Clustering Representative Annotations for Image Browsing (이미지 브라우징 처리를 위한 전형적인 의미 주석 결합 방법)

  • Zhou, Tie-Hua; Wang, Ling; Lee, Yang-Koo; Ryu, Keun-Ho
    • Proceedings of the Korean Information Science Society Conference / 2010.06c / pp.62-65 / 2010
  • Image annotations allow users to access a large image database with textual queries. But since the surrounding text of Web images is generally noisy, an efficient image annotation and retrieval system is highly desired, which requires effective image search techniques. Data mining techniques can be adopted to de-noise and figure out salient terms or phrases from the search results. Clustering algorithms make it possible to represent visual features of images with finite symbols. Annotation-based image search engines can obtain thousands of images for a given query, but their results also contain visual noise. In this paper, we present a new algorithm, Double-Circles, that allows a user to remove noisy results and characterize more precise representative annotations. We demonstrate our approach on images collected from Flickr image search. Experiments conducted on real Web images show the effectiveness and efficiency of the proposed model.

A News Video Mining based on Multi-modal Approach and Text Mining (멀티모달 방법론과 텍스트 마이닝 기반의 뉴스 비디오 마이닝)

  • Lee, Han-Sung; Im, Young-Hee; Yu, Jae-Hak; Oh, Seung-Geun; Park, Dai-Hee
    • Journal of KIISE:Databases / v.37 no.3 / pp.127-136 / 2010
  • With the rapid growth of information and computer communication technologies, the number of digital documents, including multimedia data, has recently exploded. In particular, news video databases and news video mining have become the subject of extensive research aimed at developing effective and efficient tools for manipulating and analyzing news videos, because of their information richness. However, much research focuses on browsing, retrieval and summarization of news videos. To date, work on discovering and analyzing the plentiful latent semantic knowledge in news videos is at a relatively early stage. In this paper, we propose a news video mining system based on a multi-modal approach and text mining, which uses the visual-textual information of news video clips and their scripts. The proposed system automatically and systematically constructs a taxonomy of news video stories with a hierarchical clustering algorithm, which is one of the text mining methods. It then analyzes the topics of news video stories from multiple angles by means of a time-cluster trend graph, a weighted cluster growth index, and network analysis. To validate our approach, we analyzed the news videos on "The Second Summit of South and North Korea in 2007".

A Path Storing and Number Matching Method for Management of XML Documents using RDBMS (RDBMS를 이용하여 XML 문서 관리를 위한 경로 저장과 숫자 매칭 기법)

  • Vong, Ha-Ik; Hwang, Byung-Yeon
    • Journal of Korea Multimedia Society / v.10 no.7 / pp.807-816 / 2007
  • Since the W3C proposed XML in 1996, XML has spread widely across internet documents. As a result, the need for XML-related research is increasing. In particular, XML management systems for storing, retrieving, and managing XML documents are being actively studied. Among these studies, XRel is a representative approach to XML management and has become a common baseline for comparison. In this study, we propose an XML document management system based on a Relational DataBase Management System. Unlike XRel, which stores all possible path expressions, this system stores only the filtered path expressions that have a text value or an attribute value. By assigning each node a Node Expression Identifier, we then match against the given Node Expression Identifier. Finally, to demonstrate the efficiency of the proposed technique, this paper presents experimental results comparing XPath query processing performance between the proposed technique and the existing technique, XRel.
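
The path-filtering idea in this abstract can be sketched as follows: walk an XML document and store in a relational table only those path expressions that carry a text value or an attribute value. This is a rough illustration using Python's standard `xml.etree.ElementTree` and `sqlite3` modules; the table name `path_value`, its columns, and the function names are hypothetical, and the paper's Node Expression Identifier and number matching scheme are not reproduced here because the abstract does not define them.

```python
import sqlite3
import xml.etree.ElementTree as ET

def iter_value_paths(xml_text):
    """Yield (path, value) only for paths with a text or attribute value."""
    root = ET.fromstring(xml_text)

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        for name, value in elem.attrib.items():
            yield f"{path}/@{name}", value
        text = (elem.text or "").strip()
        if text:
            yield path, text
        for child in elem:
            yield from walk(child, path)

    yield from walk(root, "")

def store(xml_text, conn):
    """Insert the filtered paths into a simple two-column table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS path_value (path TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO path_value VALUES (?, ?)", iter_value_paths(xml_text)
    )

if __name__ == "__main__":
    doc = '<book id="b1"><title>XML</title><author><name>Kim</name></author></book>'
    conn = sqlite3.connect(":memory:")
    store(doc, conn)
    # A simple path lookup becomes an equality test on the path column.
    print(conn.execute(
        "SELECT value FROM path_value WHERE path = ?", ("/book/author/name",)
    ).fetchall())
```

In this example the structural paths `/book` and `/book/author` are never stored because they hold no value of their own, which is the kind of reduction the abstract contrasts with storing every possible path expression.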

The Design and Implementation of a Traffic Order and Safety Education System for Kid on Web (웹기반 어린이 교통 질서 및 안전 교육 시스템의 설계 및 구현)

  • An, Syung-Og
    • The Journal of Engineering Research / v.3 no.1 / pp.7-20 / 1998
  • With economic development and the growth of GNP, the number of automobiles has increased. But because awareness of traffic safety and traffic order is lacking, many traffic accidents have occurred. The purpose of developing a web-based traffic safety education system is therefore to publicize the importance of and need for traffic order and safety education and to protect pedestrians and drivers from traffic accidents. The contents and scope of the development are as follows: input of text, image and moving-image data for traffic safety education; establishment of hierarchical relations for traffic safety education; analysis of relations among traffic safety education information and design of the hyperlink structure between them; implementation of a thesaurus for the traffic safety education system; design and implementation of an information retrieval engine based on the thesaurus; design and implementation of a database schema for traffic safety education; and implementation of a GUI for users.

A Document Summary System based on Personalized Web Search Systems (개인화 웹 검색 시스템 기반의 문서 요약 시스템)

  • Kim, Dong-Wook; Kang, Soo-Yong; Kim, Han-Joon; Lee, Byung-Jeong; Chang, Jae-Young
    • Journal of Digital Contents Society / v.11 no.3 / pp.357-365 / 2010
  • A personalized web search engine provides personalized results to users through query expansion, re-ranking, or other methods that represent the user's intention. The personalized result page includes the URL, page title, and a small text fragment of each web document, known as a snippet. The snippet is a summary of the document that includes the keywords issued by either the user or the search engine itself. Users can easily judge the relevance of the whole document using only the snippet. The document summary (snippet) is important information that lets users decide whether or not to click the link to the whole document. Hence, if a search engine generates personalized document summaries, it can provide more satisfactory search results to users. In this paper, we propose a personalized document summary system for personalized web search engines. The proposed system increases user satisfaction with marginal overhead.
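
One simple way to picture a personalized snippet is a sliding window over the document that favors query keywords and, to a lesser degree, terms from a user profile. The sketch below is only an assumption-laden illustration: the abstract does not describe the paper's summarization method, and the function name `best_snippet`, the `window` size, and the `profile_weight` value are hypothetical.

```python
import re

def best_snippet(document, query_terms, profile_terms=(),
                 window=12, profile_weight=0.5):
    """Return the word window with the highest keyword score."""
    words = document.split()
    if not words:
        return ""
    query = {t.lower() for t in query_terms}
    profile = {t.lower() for t in profile_terms}

    def weight(word):
        # Strip punctuation so "camera." still matches "camera".
        token = re.sub(r"\W+", "", word).lower()
        if token in query:
            return 1.0
        if token in profile:
            return profile_weight
        return 0.0

    scores = [weight(w) for w in words]
    best_start, best_score = 0, -1.0
    for start in range(max(1, len(words) - window + 1)):
        score = sum(scores[start:start + window])
        if score > best_score:  # ties keep the earliest window
            best_start, best_score = start, score
    return " ".join(words[best_start:best_start + window])

if __name__ == "__main__":
    doc = ("Apple pie recipes are popular. The apple company released "
           "a new phone with a better camera.")
    print(best_snippet(doc, ["apple"], window=6))
    print(best_snippet(doc, ["apple"], ["phone", "camera"], window=6))
```

With the same query, the two calls select different fragments of the same document: the profile terms pull the window toward the passage that matches the user's assumed interests.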