• Title/Summary/Keyword: Text Retrieval

Reconstitution of Compact Binary trie for the Efficient Retrieval of Hangul UniCODE Text (한글 유니코드 텍스트의 효율적인 탐색을 위한 컴팩트 바이너리 트라이의 재구성)

  • Jung, Kyu Cheol;Lee, Jong Chan;Park, Sang Joon;Kim, Byung Gi
    • Journal of Korea Society of Digital Industry and Information Management / v.5 no.2 / pp.21-28 / 2009
  • This paper proposes RCBT (Reduced Compact Binary trie) to correct the faults of CBT (Compact Binary trie). CBT was the first attempt at a compact structure, but as the amount of input data grew, insertion became very difficult because of the dummy nodes used to balance the tree. HCBT, a hierarchical realization, sets a fixed depth to keep the map from growing to the right; when that depth is reached, a new tree is created and linked. This made insertion and search much faster, but it required more storage space because of the dummy nodes, as in CBT, and the many tree links. The RCBT proposed in this paper removes the dummy nodes completely and thereby increases capacity by about 60%.
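
    For orientation, a minimal sketch of a plain binary trie follows. It is not CBT, HCBT, or RCBT, none of which the abstract specifies in detail; it only illustrates the underlying idea of branching on one bit of the key at a time. The class name, the node layout, and the choice of UTF-16 big-endian as the key encoding are assumptions made for illustration.

```python
class BinaryTrie:
    """Plain (uncompacted) binary trie: each edge consumes one bit of the key."""

    def __init__(self):
        # Each node is [child_for_bit_0, child_for_bit_1, is_end_of_key].
        self.root = [None, None, False]

    @staticmethod
    def _bits(key):
        # Assumption: keys are encoded as UTF-16 big-endian, so each Hangul
        # syllable contributes 16 bits, most significant bit first.
        for byte in key.encode("utf-16-be"):
            for shift in range(7, -1, -1):
                yield (byte >> shift) & 1

    def insert(self, key):
        node = self.root
        for bit in self._bits(key):
            if node[bit] is None:
                node[bit] = [None, None, False]
            node = node[bit]
        node[2] = True  # mark the end of a stored key

    def contains(self, key):
        node = self.root
        for bit in self._bits(key):
            node = node[bit]
            if node is None:
                return False
        return node[2]


trie = BinaryTrie()
trie.insert("한글")
print(trie.contains("한글"), trie.contains("한"))
```

    Because this sketch creates nodes only along inserted keys, it has no balancing dummy nodes, but it also applies none of the compaction that the CBT family relies on.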

The Development of Forest Fire Statistical Management System using Web GIS Technology

  • Jo, Myung-Hee;Kim, Joon-Bum;Kim, Hyun-Sik;Jo, Yun-Won
    • Proceedings of the KSRS Conference / 2002.10a / pp.183-190 / 2002
  • In this paper, a forest fire statistical information management system is constructed in a web environment using web-based GIS (Geographic Information System) technology. Through this system, general users with a web browser can easily access forest fire statistical information and view it as maps, graphs, and text. Moreover, officials responsible for forest fires can easily control and manage all domestic information through the input, retrieval, and output interfaces. To implement the system, Microsoft IIS 5.0 is used as the web server, and Oracle 8i and ASP (Active Server Pages) are used for database construction and dynamic web page operation, respectively. ArcIMS from ESRI is used to serve map data, with Java and HTML as the system development languages. Through this system, general users can obtain all information related to forest fires visually and in real time, and can become more aware of forest fire prevention. In addition, forest officials can manage domestic forest resources and control areas at risk of forest fire efficiently and scientifically by analyzing and retrieving large volumes of forest data. As a result, they can save the manpower, time, and cost needed to collect and manage data.

Automatic Extraction of Alternative Words using Parallel Corpus (병렬말뭉치를 이용한 대체어 자동 추출 방법)

  • Baik, Jong-Bum;Lee, Soo-Won
    • Journal of KIISE:Computing Practices and Letters / v.16 no.12 / pp.1254-1258 / 2010
  • In information retrieval, different surface forms of the same object can degrade system performance. In this paper, we propose a method for extracting alternative words that uses translation words as the features of each word extracted from a parallel corpus of Korean/English title pairs in patent information. We also propose an association word filtering method that removes association words from the alternative word list. Evaluation results show that the proposed method outperforms other alternative word extraction methods.

Multimedia Annotation and Retrieval using Semantic Metadata (의미적 메타데이터를 이용한 멀티미디어 주석 및 검색)

  • An, Hyoung-Keun;Koh, Jae-Jin
    • Proceedings of the Korean Information Science Society Conference / 2006.10c / pp.199-204 / 2006
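  • The use of multimedia and the technologies for accessing it have recently grown considerably. In practical systems such as multimedia search engines, however, extracting useful information about multimedia and applying that information remain open problems. In particular, multimedia users build their repositories with intuitive structures for the sake of retrieval efficiency. For example, they create data folders such as "KISS Fall Conference Images" or manage each multimedia item with free-text annotations. These retrieval approaches have limitations, and other intelligent semantic retrieval methods also fall short of the accuracy that people expect. This paper introduces a new approach to solving these problems. To that end, we introduce a new user tool for content acquisition and classification that supports semantic work on multimedia. With this tool, a multimedia user can decompose given content into structural units of the meaning that a person perceives and the content conveys, and attach additional description information based on the MPEG-7 standard to each unit, thereby creating new semantic metadata. We expect this semantic metadata to make multimedia retrieval more efficient for users.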

Movement Search in Video Stream Using Shape Sequence (동영상에서 모양 시퀀스를 이용한 동작 검색 방법)

  • Choi, Min-Seok
    • Journal of Korea Multimedia Society / v.12 no.4 / pp.492-501 / 2009
  • Information on the movement of objects in videos can play an important part in categorizing and separating the contents of a scene. This paper proposes a shape-based movement-matching algorithm to find the movement of an object in video streams effectively. Object movement information is extracted from the object boundaries in the input video frames and expressed as continuous 2D shape information, and each 2D shape is converted into a 1D shape feature using a shape descriptor. Using the sequence of shape descriptors listed in frame order, object movement in video can then be found as simply as searching for a word in a text, without a separate movement segmentation process. A performance comparison with the MPEG-7 shape variation descriptor showed that the proposed method can express the movement information of an object effectively and can be applied to movement search and analysis applications.
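
    The "searching for a word in a text" analogy can be sketched as a sliding-window match over a sequence of per-frame features. This is only an illustration and not the paper's algorithm: it assumes each frame has already been reduced to a single number, and the function name, the tolerance parameter, and the sample values are hypothetical.

```python
def find_movement(stream, query, tolerance=0.1):
    """Return the start indices where `query` matches `stream`.

    Assumption: `stream` and `query` are lists of per-frame scalar shape
    features; two frames match when their features differ by at most
    `tolerance`.
    """
    if not query:
        return []
    hits = []
    for start in range(len(stream) - len(query) + 1):
        window = stream[start:start + len(query)]
        if all(abs(s - q) <= tolerance for s, q in zip(window, query)):
            hits.append(start)
    return hits


# Hypothetical feature values for seven frames and a two-frame query.
frames = [0.1, 0.5, 0.9, 0.5, 0.1, 0.5, 0.9]
print(find_movement(frames, [0.5, 0.9]))
```

    No separate segmentation step is needed in this sketch because every window of the stream is compared directly against the query sequence.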

A Study of High Speed Retrieval Algorithm of Long Component Keyword (복합키워드의 고속검색 알고리즘에 관한 연구)

  • Lee Jin-Kwan;Jung Kyu-cheol;Lee Tae-hun;Park Ki-hong
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.8 / pp.1769-1776 / 2004
  • Effective keyword extraction is important in information search systems, and there are several ways to select the proper keyword from among many. Among them, the DER structure for the AC algorithm, a single-keyword search method, can also search for multiple keywords, but it suffers from a time complexity problem. In this paper, we developed an algorithm, the "EDER structure", by expanding the standalone search table of the DER structure search method to improve time complexity. We tested the algorithm on 500 text files and found that the EDER structure is more efficient than the DER structure for AC in both keyword posting results and time complexity: 0.2 seconds for EDER versus 0.6 seconds for the DER structure.
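
    As background for multi-keyword search, here is a minimal sketch assuming that "AC" refers to the Aho-Corasick algorithm, which the abstract does not state explicitly. It reproduces neither the DER nor the EDER structure; the function names and the sample keywords are illustrative only.

```python
from collections import deque


def build_automaton(keywords):
    # goto[s] maps a character to the next state, fail[s] is the failure
    # link of state s, and out[s] lists the keywords recognized at state s.
    goto, fail, out = [{}], [0], [[]]
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first pass: states at depth 1 keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, keywords):
    """Return (start_index, keyword) for every keyword occurrence in text."""
    goto, fail, out = build_automaton(keywords)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits


print(search("ushers", ["he", "she", "his", "hers"]))
```

    The text is scanned once regardless of how many keywords are loaded, which is the property that makes automaton-based approaches attractive for multi-keyword search.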

StrokeMed: an integrated literature database for stroke and the differentiation of stroke syndrome

  • Kim, Young-Uk;Kim, Jin-Ho;Park, Young-Kyu;Kim, Young-Joo
    • Interdisciplinary Bio Central / v.2 no.2 / pp.2.1-2.4 / 2010
  • Complex diseases, such as stroke and cancer, have two or more genetic influences and are affected by environmental factors, which complicate them. Due to the complex characteristics of these diseases, we must search and study comprehensive literature-based article resources. Some disease-related literature databases have been developed through specialized journal issues or major websites. Most of them, however, are scattered throughout a website, and users encounter difficulties in finding accurate and comprehensive information easily and quickly. We developed StrokeMed, an integrated literature database for stroke and the differentiation of stroke syndrome. The system allows users to explore PubMed search results, categorized by MeSH (Medical Subject Headings), and the differentiation of stroke syndrome in Oriental medicine. StrokeMed collects data from important sites, such as PubMed, Scirus, and Scopus, automatically to maintain higher-quality and updated content. Currently, the system indexes more than 20,000 PubMed abstracts that are related to stroke, stroke etiology, and Oriental medicine. The system provides valuable literature information to the scientific and medical fields in stroke.

Clustering of Web Document Exploiting with the Co-link in Hypertext (동시링크를 이용한 웹 문서 클러스터링 실험)

  • 김영기;이원희;권혁철
    • Journal of Korean Library and Information Science Society / v.34 no.2 / pp.233-253 / 2003
  • Knowledge organization is the way we humans understand the world. Two types of information organization mechanisms are studied in information retrieval: classification and clustering. Classification organizes entities by pigeonholing them into predefined categories, whereas clustering organizes information by grouping similar or related entities together. Internet information resource systems extract keywords from the words that appear in a web document and build an inverted file. Term clustering based on grouping related terms, however, did not prove very successful and was mostly abandoned for documents written in different languages or for doorway pages consisting only of anchor text. This study examines informetric analysis and the feasibility of clustering web documents based on the co-link topology of web pages.
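
    One simple reading of co-link clustering can be sketched as follows, assuming that two pages are "co-linked" whenever the same source page links to both. The threshold, the grouping by connected components, and the page names are assumptions for illustration and are not taken from the study.

```python
from collections import defaultdict
from itertools import combinations


def colink_counts(outlinks):
    """Count, for each pair of pages, how many source pages link to both."""
    counts = defaultdict(int)
    for targets in outlinks.values():
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return dict(counts)


def cluster(outlinks, min_colinks=2):
    """Group pages joined by at least `min_colinks` co-links."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in colink_counts(outlinks).items():
        if n >= min_colinks:
            parent[find(a)] = find(b)
    groups = defaultdict(set)
    for page in list(parent):
        groups[find(page)].add(page)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical link graph: source page -> pages it links to.
links = {"p1": ["a", "b", "c"], "p2": ["a", "b"], "p3": ["c", "d"], "p4": ["c", "d"]}
print(cluster(links))
```

    Because this sketch looks only at link structure, it does not depend on the language of the page text, which is where the abstract says term clustering broke down.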

A Study on the Enhancement of Medical Information Service Functions by the Utilization of CD-ROM (CD-ROM을 활용한 의학정보봉사기능의 제고방안에 관한 연구)

  • Yun Hee-Yun
    • Journal of the Korean Society for Library and Information Science / v.27 / pp.183-214 / 1994
  • The purpose of this study is to suggest schemes for enhancing information service functions through the use of CD-ROM in medical school libraries. The results of the study are summarized as follows: 1. Selecting and evaluating CD-ROM databases are necessary steps in planning a CD-ROM service. Before selecting a CD-ROM, therefore, medical libraries must establish practical evaluation criteria, ranked in order of importance: information services environment, hardware/software characteristics, service requirements, price and cost, etc. 2. If possible, CD-ROM MEDLINE must be suited to the information services environment. 3. For popular core journals, full-text CD-ROMs should be purchased gradually. 4. To reduce the time between searching bibliographic information and receiving the original articles, a CD-NET system and a library holdings administration program must be built and developed, and the channels for searching information and for ordering and receiving original articles should be diversified. 5. Search education programs for medical librarians and users should be strengthened, and librarians must play an important role as CD-ROM retrieval consultants and intermediaries.

Deep Image Annotation and Classification by Fusing Multi-Modal Semantic Topics

  • Chen, YongHeng;Zhang, Fuquan;Zuo, WanLi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.1 / pp.392-412 / 2018
  • Due to the semantic gap across different modalities, automatic retrieval from multimedia information still faces a major challenge. It is desirable to provide an effective joint model that bridges the gap and organizes the relationships between the modalities. In this work, we develop a deep image annotation and classification by fusing multi-modal semantic topics (DAC_mmst) model, which can find visual and non-visual topics by jointly modeling the image and loosely related text for deep image annotation while simultaneously learning and predicting the class label. More specifically, DAC_mmst depends on a non-parametric Bayesian model to estimate the best number of visual topics that can perfectly explain the image. To evaluate the effectiveness of the proposed algorithm, we collect a real-world dataset and conduct various experiments. The experimental results show that the proposed DAC_mmst performs favorably in perplexity, image annotation, and classification accuracy compared with several state-of-the-art methods.