• Title/Summary/Keyword: Text-search

Enhancing the Narrow-down Approach to Large-scale Hierarchical Text Classification with Category Path Information

  • Oh, Heung-Seon;Jung, Yuchul
    • Journal of Information Science Theory and Practice / v.5 no.3 / pp.31-47 / 2017
  • The narrow-down approach, separately composed of search and classification stages, is an effective way of dealing with large-scale hierarchical text classification. Recent approaches introduce methods of incorporating global, local, and path information extracted from web taxonomies in the classification stage. In the case of path information, however, there have been few efforts to address existing limitations and develop more sophisticated methods. In this paper, we propose an expansion method to effectively exploit category path information, based on the observation that the existing method suffers from a term mismatch problem and low discrimination power due to insufficient path information. The key idea of our method is to utilize relevant information not present on category paths by adding more useful words. We evaluate the effectiveness of our method on state-of-the-art narrow-down methods and report the results with in-depth analysis.

Metadata Processing Technique for Similar Image Search of Mobile Platform

  • Seo, Jung-Hee
    • Journal of information and communication convergence engineering / v.19 no.1 / pp.36-41 / 2021
  • Text-based image retrieval is not only cumbersome, as it requires the manual input of keywords by the user, but is also limited in the semantic approach of keywords. Content-based image retrieval, by contrast, enables visual processing by a computer to solve the problems of text retrieval more fundamentally. Vision applications such as extraction and mapping of image characteristics require the processing of a large amount of data in a mobile environment, rendering efficient power consumption difficult. Hence, an effective image retrieval method on mobile platforms is proposed herein. To provide the visual meaning of keywords to be inserted into images, the efficiency of image retrieval is improved by extracting keywords of exchangeable image file format metadata from images retrieved through a content-based similar image retrieval method and then automatically adding keywords to images captured on mobile devices. Additionally, users can manually add or modify keywords in the image metadata.

Evaluating real-time search query variation for intelligent information retrieval service (지능 정보검색 서비스를 위한 실시간검색어 변화량 평가)

  • Chong, Min-Young
    • Journal of Digital Convergence / v.16 no.12 / pp.335-342 / 2018
  • The search service, a core service of portal sites, presents real-time search queries chosen from the inputted queries whose instantaneous search frequency is rising fastest, which makes it difficult to promptly surface queries that attract a high degree of interest over a certain period. To overcome this problem and provide a more intelligent information retrieval service, improved analysis of how search queries change is needed. In this paper, we present criteria for measuring the interest, continuity, and attention of real-time search queries. Using these criteria, we measure and summarize changes in real-time search queries by hour, day, week, and month over a period of time to identify issues of high interest, long-lasting issues of interest, and issues that need attention in the future.

A Study of High Speed Retrieval Algorithm of Long Component Keyword (복합키워드의 고속검색 알고리즘에 관한 연구)

  • Lee Jin-Kwan;Jung Kyu-cheol;Lee Tae-hun;Park Ki-hong
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.8 / pp.1769-1776 / 2004
  • Effective keyword extraction is important in information search systems, and there are several ways to select the proper keywords from among many. Among them, the DER structure for the AC algorithm, designed for single-keyword search, can also search for multiple keywords, but it has a time complexity problem. In this paper, we develop an algorithm, the "EDER structure", by expanding the standalone search table based on the DER structure search method to improve time complexity. We tested the algorithm on 500 text files and found that the EDER structure is more efficient than the DER structure for AC in both keyword posting results and time complexity: 0.2 seconds for EDER versus 0.6 seconds for the DER structure.
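
The abstract above does not describe the DER or EDER structures in detail, so they are not reproduced here. As background only, and assuming "AC" refers to the Aho-Corasick algorithm, the following is a minimal Python sketch of plain multi-keyword matching with a dictionary-based automaton. The function names `build_automaton` and `find_keywords` are illustrative and do not come from the paper.

```python
from collections import deque


def build_automaton(keywords):
    """Build goto, failure, and output tables for a set of keywords."""
    goto, fail, out = [{}], [0], [[]]  # state 0 is the root
    # Phase 1: insert every keyword into a trie.
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_keywords(text, automaton):
    """Return (start_index, keyword) pairs for every match in text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits


automaton = build_automaton(["he", "she", "his", "hers"])
matches = find_keywords("ushers", automaton)
```

The text is scanned once regardless of how many keywords are loaded, which is the property that makes this family of algorithms attractive for multi-keyword search.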

CONSTRUCTION OF KOREAN ASTRONOMICAL JOURNAL DB (국내 천문학 논문 검색 DB 구축)

  • Sung, Hyun-Il;Kim, Soon-Wook;Yim, In-Sung;Sang, Jian
    • Publications of The Korean Astronomical Society / v.21 no.2 / pp.113-119 / 2006
  • The Korean Astronomical Data Center (KADC) at the Korea Astronomy and Space Science Institute (KASI) has developed a database of astronomical journals published by the Korean Astronomical Society and the Korean Space Science Society. It consists of all bibliographic records of the Journal of the Korean Astronomical Society (JKAS), Publications of the Korean Astronomical Society (PKAS), and Journal of Astronomy & Space Sciences (JASS). The KADC search page provides useful search functions with criteria such as bibcode, publication date, author names, title words, and abstract words. The journal name is one of the search criteria, and more than one journal can be designated at the same time. The author name criterion is provided bilingually, in English or Korean. The abstract and full text can be downloaded as PDF files. It is also possible to search for papers related to a specific research topic published in the Korean astronomical journals provided by the KADC, which often cannot be found in the worldwide Astrophysics Data System (ADS) services. The KADC will become basic infrastructure for the systematic construction of bibliographic records and hence make the society of Korean astronomers more interactive and collaborative.

Discovery Layer in Library Retrieval: VuFind as an Open Source Service for Academic Libraries in Developing Countries

  • Roy, Bijan Kumar;Mukhopadhyay, Parthasarathi;Biswas, Anirban
    • Journal of Information Science Theory and Practice / v.10 no.4 / pp.3-22 / 2022
  • This paper provides an overview of the emergence of resource discovery systems and services, along with their advantages, best practices, and current landscapes. It outlines some of the key services and functionalities of a comprehensive discovery model suitable for academic libraries in developing countries. The proposed model (VuFind as a discovery tool) performs like other existing web-scale resource discovery systems, both commercial and open-source, and is capable of providing information resources from different sources in a single-window search interface. The objective of the paper is to provide seamless access to globally distributed subscribed as well as open access resources through its discovery interface, based on a unified index. This model uses Koha, DSpace, and Greenstone as back-ends and VuFind as a discovery layer in the front-end and has also integrated many enhanced search features like Bento-box search, Geodetic search, and full-text search (using Apache Tika). The goal of this paper is to provide the academic community with a one-stop shop for better utilising and integrating heterogeneous bibliographic data sources with VuFind (https://vufind.org/vufind).

Interplay of Text Mining and Data Mining for Classifying Web Contents (웹 컨텐츠의 분류를 위한 텍스트마이닝과 데이터마이닝의 통합 방법 연구)

  • 최윤정;박승수
    • Korean Journal of Cognitive Science / v.13 no.3 / pp.33-46 / 2002
  • Recently, unstructured data such as website logs, texts, and tables have been flooding the internet. Among these unstructured data there are potentially very useful data, such as bulletin boards and e-mails used for customer services and the output from search engines. Various text mining tools have been introduced to deal with those data, but most of them lack accuracy compared to traditional data mining tools that deal with structured data. Hence, ways of applying data mining techniques to these text data have been sought. In this paper, we propose a text mining system that can incorporate existing data mining methods. We use text mining as a preprocessing tool to generate formatted data to be used as input to the data mining system. The output of the data mining system is used as feedback data to the text mining system to guide further categorization. This feedback cycle can enhance the accuracy of the text mining. We apply this method to categorize websites containing adult content as well as illegal content. The results show improvements in categorization performance for previously ambiguous data.

Probabilistic Model for Performance Analysis of a Heuristic with Multi-byte Suffix Matching

  • Choi, Yoon-Ho
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.4 / pp.711-725 / 2013
  • A heuristic with multi-byte suffix matching plays an important role in real pattern matching algorithms. By skipping many characters at a time in the process of comparing a given pattern with the text, a pattern matching algorithm based on a heuristic with multi-byte suffix matching shows a faster average search time than algorithms based on deterministic finite automata. Based on various experimental results and simulations, previous works show that pattern matching algorithms with multi-byte suffix matching perform well. However, there have been limited studies on a mathematical model for analyzing the performance in a standard manner. In this paper, we propose a new probabilistic model, which evaluates the performance of a heuristic with multi-byte suffix matching in an average-case search. When the theoretical analysis results and experimental results were compared, the proposed probabilistic model was found to be sufficient for evaluating the performance of a heuristic with suffix matching in real pattern matching algorithms.
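
The abstract above does not specify the heuristic or the probabilistic model, so neither is reproduced here. To illustrate the general idea of skipping characters by looking at a multi-character suffix of the current window, the following is a minimal single-pattern sketch in the style of a Wu-Manber shift table, assuming a block size of two characters. The names `build_shift` and `find_pattern` and the default `block=2` are illustrative choices, not taken from the paper.

```python
def build_shift(pattern, block=2):
    """Map each block of the pattern to its distance from the pattern end."""
    m = len(pattern)
    shift = {}
    for end in range(block - 1, m):
        # Later (rightmost) occurrences overwrite earlier ones,
        # so each block keeps its smallest safe shift.
        shift[pattern[end - block + 1:end + 1]] = m - 1 - end
    return shift


def find_pattern(text, pattern, block=2):
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m < block:
        raise ValueError("pattern must be at least one block long")
    shift = build_shift(pattern, block)
    default = m - block + 1  # shift when the block is absent from the pattern
    hits = []
    pos = m - 1  # text index aligned with the last pattern character
    while pos < n:
        s = shift.get(text[pos - block + 1:pos + 1], default)
        if s == 0:
            # The window's suffix block matches the pattern's suffix block:
            # verify the whole window, then advance by one.
            start = pos - m + 1
            if text[start:pos + 1] == pattern:
                hits.append(start)
            pos += 1
        else:
            pos += s
    return hits


positions = find_pattern("abracadabra abracadabra", "abracadabra")
```

When the block at the end of the window does not occur in the pattern at all, the window jumps forward by `m - block + 1` characters without examining the rest of the window, which is where the average-case speed-up comes from.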