• Title/Summary/Keyword: Text Retrieval

An Experimental Study on the Performance Improvement of Automatic Classification for the Articles of Korean Journals Based on Controlled Keywords in International Database (해외 데이터베이스의 통제키워드에 기초한 국내 학술지 논문의 자동분류 성능 향상에 관한 실험적 연구)

  • Kim, Pan Jun;Lee, Jae Yun
    • Journal of the Korean Society for Library and Information Science / v.48 no.3 / pp.491-510 / 2014
  • Keywords, a major factor in the efficient management and retrieval of articles in databases, are divided into uncontrolled and controlled keywords. Most Korean scholarly databases do not provide controlled vocabularies for indexing research articles, which would help users retrieve relevant papers exhaustively. In this paper, we carried out automatic descriptor assignment experiments on Korean articles using automatic classifiers trained with descriptors from an international database. The results show that classifiers trained with descriptors from an international database can potentially supply controlled vocabulary to Korean scholarly articles that have English abstracts. We also sought to improve the performance of automatic descriptor assignment by using various classifiers and combinations of them.

Automated Development of Rank-Based Concept Hierarchical Structures using Wikipedia Links (위키피디아 링크를 이용한 랭크 기반 개념 계층구조의 자동 구축)

  • Lee, Ga-hee;Kim, Han-joon
    • The Journal of Society for e-Business Studies / v.20 no.4 / pp.61-76 / 2015
  • Hierarchical concept trees are widely used as a key data structure for indexing large amounts of textual data. This paper proposes a generality rank-based method that automatically builds hierarchical concept structures from Wikipedia data. The method treats each Wikipedia article as a concept and generates hierarchical relationships among concepts. To estimate the generality of concepts, we devised a ranking function that mainly uses the number of hyperlinks among Wikipedia articles. The ranking function is used to compute probabilistic subsumption among concepts, which makes it possible to generate relatively more stable hierarchical structures. Finally, the set of concept pairs with hierarchical relationships is visualized as a DAG (directed acyclic graph). Through an empirical analysis using the concept hierarchy of the Open Directory Project, we show that the proposed method outperforms a representative baseline method and can automatically extract concept hierarchies with high accuracy.

The Online Game Coined Profanity Filtering System by using Semi-Global Alignment (반 전역 정렬을 이용한 온라인 게임 변형 욕설 필터링 시스템)

  • Yoon, Tai-Jin;Cho, Hwan-Gue
    • The Journal of the Korea Contents Association / v.9 no.12 / pp.113-120 / 2009
  • Verbal abuse in text messages in online games is currently a serious problem, yet there is still no effective policy or technical tool to address it. Until now, online game service providers have coped with the problem by accumulating a list of forbidden words and applying it to the text typed in online games, an approach called a 'swear filter'. However, young online game players easily evade this filtering by coining new words that are not in the list, and Korean in particular makes it very easy to create new variations of a vulgar word. In this paper, we propose a filtering algorithm that identifies newly coined profanities. Important features of our method include the canonical form transformation of coined profanities and semi-global alignment at the level of consonant and vowel units. For the experiment, we collected more than 1,000 newly coined vulgar words from online gaming sites and tested them against our method; our system successfully filtered more than 90% of them.
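
The idea of aligning at the level of consonant and vowel units can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring values, the 0.8 threshold, and the names `to_jamo_units`, `semi_global_score`, and `matches_forbidden` are assumptions made for illustration, and the canonical form transformation mentioned in the abstract is not modeled. The sketch decomposes Hangul syllables arithmetically (standard Unicode layout) and then aligns a forbidden word against a message, with gaps at both ends of the message left unpenalized.

```python
# Hangul syllables U+AC00..U+D7A3 decompose arithmetically into
# (initial consonant, vowel, optional final consonant) indices.
HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3
N_VOWELS, N_FINALS = 21, 28


def to_jamo_units(text):
    """Split Hangul syllables into consonant/vowel units; keep other chars."""
    units = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            offset = code - HANGUL_BASE
            units.append(("I", offset // (N_VOWELS * N_FINALS)))
            units.append(("V", (offset % (N_VOWELS * N_FINALS)) // N_FINALS))
            final = offset % N_FINALS
            if final:  # index 0 means "no final consonant"
                units.append(("F", final))
        else:
            units.append(("C", ch.lower()))
    return units


def semi_global_score(pattern, text, match=1, mismatch=-1, gap=-1):
    """Best score for aligning all of `pattern` to any stretch of `text`.

    Leading and trailing gaps in `text` are free (semi-global alignment);
    the scoring values are illustrative assumptions.
    """
    prev = [0] * (len(text) + 1)  # row 0: free leading gaps in text
    for i in range(1, len(pattern) + 1):
        curr = [i * gap] + [0] * len(text)
        for j in range(1, len(text) + 1):
            step = match if pattern[i - 1] == text[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return max(prev)  # free trailing gaps: best cell in the last row


def matches_forbidden(forbidden, message, threshold=0.8):
    """Hypothetical decision rule: normalized alignment score >= threshold."""
    pattern = to_jamo_units(forbidden)
    if not pattern:
        return False
    return semi_global_score(pattern, to_jamo_units(message)) / len(pattern) >= threshold
```

Because the comparison runs on sub-syllable units, a variant that changes only one vowel or inserts one extra unit loses only a small part of the score, whereas an exact dictionary lookup on whole words would miss it entirely.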

A Document Summary System based on Personalized Web Search Systems (개인화 웹 검색 시스템 기반의 문서 요약 시스템)

  • Kim, Dong-Wook;Kang, Soo-Yong;Kim, Han-Joon;Lee, Byung-Jeong;Chang, Jae-Young
    • Journal of Digital Contents Society / v.11 no.3 / pp.357-365 / 2010
  • A personalized web search engine provides personalized results to users through query expansion, re-ranking, or other methods that represent the user's intention. The personalized result page includes the URL, the page title, and a small text fragment of each web document, which is known as a snippet. The snippet is a summary of the document that includes the keywords issued by either the user or the search engine itself. Using only the snippet, users can easily judge the relevance of the whole document. The document summary (snippet) is therefore important information that lets users decide whether to click the link to the whole document. Hence, if a search engine generates personalized document summaries, it can provide more satisfactory search results to users. In this paper, we propose a personalized document summary system for personalized web search engines. The proposed system increases user satisfaction with marginal overhead.
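
The abstract does not describe how the personalized summaries are built, so the following is only a rough sketch of one plausible approach: pick the fixed-size word window whose terms best match the query plus a per-user term profile. The function name `best_snippet`, the `profile_weights` dictionary, and the scoring rule are all assumptions, not the paper's system.

```python
def best_snippet(document, query_terms, profile_weights, window=8):
    """Return the `window`-word span with the highest relevance score.

    A word scores 1.0 if it is a query term, plus whatever weight the
    (hypothetical) user profile assigns to it.
    """
    words = document.split()
    query = {term.lower() for term in query_terms}

    def weight(word):
        token = word.strip(".,;:!?()\"'").lower()
        return (1.0 if token in query else 0.0) + profile_weights.get(token, 0.0)

    best_start, best_score = 0, float("-inf")
    for start in range(max(1, len(words) - window + 1)):
        score = sum(weight(w) for w in words[start:start + window])
        if score > best_score:  # ties keep the earliest window
            best_start, best_score = start, score
    return " ".join(words[best_start:best_start + window])
```

With the same query, two users with different profiles can receive different snippets from the same document, which is the behavior the abstract argues for.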

Development and Operation of Marine Environmental Portal Service System (해양환경 포탈서비스시스템 구축과 운영)

  • 최현우;권순철
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2003.05a / pp.338-341 / 2003
  • In accordance with a long-term master plan for implementing MOMAF's marine environmental informatization, we have developed a marine environment portal web site based on an RDBMS, consisting of 7 main menus and 39 submenus and including various types of content (text, image, and multimedia). The portal site opened in October 2002 (http://www.meps.info). In addition, an integrated retrieval system was developed for the distributed databases of national institutions, which separately archive and manage marine chemical data and biological data. This system is meaningful because it enables collaborative use of real data, and it could be applied to data mining, marine research, marine environmental GIS, and decision making.

An Efficient Retrieval Technique for Spatial Web Objects (공간 웹 객체의 효율적인 검색 기법)

  • Yang, PyoungWoo;Nam, Kwang Woo
    • Journal of KIISE / v.42 no.3 / pp.390-398 / 2015
  • Spatial web objects are web documents that contain geographic information. Recently, services that create spatial web objects have increased greatly because of advances in devices such as smartphones. In services such as Twitter or Facebook, simple text posted by users is stored along with information about the post's location. Searching for such spatial web objects requires a method that uses spatial information and text information simultaneously. Conventional spatial web object search methods mostly use R-tree and inverted file methods. However, these methods have the disadvantage of requiring a large volume of space when building indices. Furthermore, such methods are efficient for searching with many keywords but inefficient for searching with few keywords. In this paper, we propose a spatial web object search method that uses a quad-tree and a patricia-trie. We show that the proposed technique is more effective than existing ones in searching with a small number of keywords. Furthermore, we show through an experiment that the space required by the proposed technique is much smaller than that required by existing ones.

A Path Storing and Number Matching Method for Management of XML Documents using RDBMS (RDBMS를 이용하여 XML 문서 관리를 위한 경로 저장과 숫자 매칭 기법)

  • Vong, Ha-Ik;Hwang, Byung-Yeon
    • Journal of Korea Multimedia Society / v.10 no.7 / pp.807-816 / 2007
  • Since the W3C proposed XML in 1996, XML has spread widely among internet documents, and the need for XML-related research is increasing accordingly. In particular, XML management systems for storing, retrieving, and managing XML documents are being actively studied. Among these studies, XRel is a representative approach to XML management and has become a common point of comparison. In this study, we propose an XML document management system based on a relational database management system (RDBMS). Unlike XRel, the system does not store all possible path expressions; it stores only the filtered path expressions that have a text value or an attribute value. In addition, each node is given a Node Expression Identifier, and the system attempts to match on the given Node Expression Identifier. Finally, to demonstrate the efficiency of the proposed technique, this paper presents the results of an experiment comparing its XPath query processing performance with that of the existing technique, XRel.
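
The filtering idea, storing only those path expressions that actually carry a text or attribute value, can be sketched with Python's standard `xml.etree.ElementTree` and `sqlite3` modules. This is an illustration under assumptions, not the paper's schema: the table layout and the function names `store_value_paths` and `query_path` are hypothetical, and the integer `path_id` is only a simple stand-in for the Node Expression Identifier, whose actual encoding the abstract does not give.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_value_paths(xml_text, conn):
    """Store only paths that end in a text value or an attribute value."""
    conn.execute("CREATE TABLE IF NOT EXISTS path "
                 "(path_id INTEGER PRIMARY KEY, expr TEXT UNIQUE)")
    conn.execute("CREATE TABLE IF NOT EXISTS node "
                 "(node_id INTEGER PRIMARY KEY, path_id INTEGER, value TEXT)")

    def path_id(expr):
        # Each distinct path expression is stored once and reused by number.
        conn.execute("INSERT OR IGNORE INTO path (expr) VALUES (?)", (expr,))
        return conn.execute("SELECT path_id FROM path WHERE expr = ?",
                            (expr,)).fetchone()[0]

    def walk(elem, prefix):
        expr = f"{prefix}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:  # purely structural elements are skipped
            conn.execute("INSERT INTO node (path_id, value) VALUES (?, ?)",
                         (path_id(expr), text))
        for name, value in elem.attrib.items():
            conn.execute("INSERT INTO node (path_id, value) VALUES (?, ?)",
                         (path_id(f"{expr}/@{name}"), value))
        for child in elem:
            walk(child, expr)

    walk(ET.fromstring(xml_text), "")
    conn.commit()


def query_path(conn, expr):
    """Answer a simple absolute path query by matching the path's number."""
    rows = conn.execute(
        "SELECT n.value FROM node n JOIN path p ON p.path_id = n.path_id "
        "WHERE p.expr = ? ORDER BY n.node_id", (expr,))
    return [value for (value,) in rows]
```

For a document such as `<lib><book id='1'><title>XML</title></book></lib>`, only `/lib/book/@id` and `/lib/book/title` are stored; `/lib` and `/lib/book` carry no value and are left out, which is where the space saving over storing every path would come from.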

Reconstructing Web Broadcasting Information based on User Retrieval Pattern (무선 환경에서 사용자 검색 성향을 반영한 웹 방송 정보 재구성 기법)

  • Kim, Won-Cheol;Lee, Soo-Cheol;Hwang, Een-Jun;Byeon, Kwang-Jun
    • The KIPS Transactions:PartD / v.11D no.5 / pp.1149-1158 / 2004
  • Today, the fastest-growing community of web users is mobile visitors who browse web pages with wireless PDAs and cellular phones. However, most web pages are optimized exclusively for desktop clients on broadband networks and are inconvenient for users of small-screen mobile devices. Such devices display only a few lines of text and cannot run client-side programs or scripts because they lack system resources. Even worse, their connections are usually too slow to support most data-intensive applications. In this paper, we propose a pageslet scheme that makes it feasible to browse ordinary web pages on small-screen mobile devices. It extracts the broadcasting sections that match the user's preferences from broadcasting web pages and automatically reorganizes the extracted sections for convenient browsing on mobile devices.

Feature Selection for Anomaly Detection Based on Genetic Algorithm (유전 알고리즘 기반의 비정상 행위 탐지를 위한 특징선택)

  • Seo, Jae-Hyun
    • Journal of the Korea Convergence Society / v.9 no.7 / pp.1-7 / 2018
  • Feature selection, a data preprocessing technique, is a major research area in many applications that deal with large datasets. It has been used in pattern recognition, machine learning, and data mining, and is now widely applied in a variety of fields such as text classification, image retrieval, intrusion detection, and genome analysis. The proposed method is based on a genetic algorithm, which is a meta-heuristic algorithm. There are two approaches to finding feature subsets: the filter method and the wrapper method. In this study, we use a wrapper method, which evaluates feature subsets using a real classifier, to find an optimal feature subset. The training dataset used in the experiment has a severe class imbalance, which makes it difficult to improve classification performance for rare classes. After preprocessing the training dataset with SMOTE, we select features and evaluate them with various machine learning algorithms.
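
A wrapper method driven by a genetic algorithm can be sketched in a few lines of plain Python. Everything specific here is an assumption chosen for brevity rather than taken from the paper: the nearest-centroid classifier used as the fitness function, truncation selection, one-point crossover, the parameter defaults, and the names `centroid_accuracy` and `select_features`. SMOTE preprocessing is omitted, and the sketch needs at least two features.

```python
import random


def centroid_accuracy(train, test, mask):
    """Wrapper fitness: nearest-centroid accuracy using only masked-in features."""
    idx = [i for i, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(idx))
        for k, i in enumerate(idx):
            acc[k] += features[i]
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(features):
        return min(centroids, key=lambda lab: sum(
            (features[i] - c) ** 2 for i, c in zip(idx, centroids[lab])))

    return sum(predict(f) == y for f, y in test) / len(test)


def select_features(train, test, n_features, pop_size=20, generations=30,
                    mutation_rate=0.1, seed=0):
    """Evolve boolean feature masks; each mask is scored by a real classifier."""
    rng = random.Random(seed)

    def fitness(mask):
        return centroid_accuracy(train, test, mask)

    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]        # truncation selection
        children = [ranked[0]]                   # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]             # bit-flip mutation
            children.append(child)
        population = children
    return max(population, key=fitness)
```

The defining property of the wrapper approach is visible in `fitness`: a candidate subset is judged by actually training and testing a classifier on it, which costs more than a filter score but measures what the final model will experience.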

The Design and Implementation of a Traffic Order and Safety Education System for Kid on Web (웹기반 어린이 교통 질서 및 안전 교육 시스템의 설계 및 구현)

  • An, Syung-Og
    • The Journal of Engineering Research / v.3 no.1 / pp.7-20 / 1998
  • With economic development and the growth of GNP, the number of automobiles has increased. However, because awareness of traffic safety and traffic order is lacking, many traffic accidents have occurred. The purpose of developing a web-based traffic safety education system is therefore to publicize the importance of and need for traffic order and safety education and to protect pedestrians and drivers from traffic accidents. The content and scope of the development are as follows: input of text, image, and video data for traffic safety education; establishment of hierarchical relations for traffic safety education; analysis of the relations among traffic safety education information and design of the hyperlink structure among them; implementation of a thesaurus for the traffic safety education system; design and implementation of an information retrieval engine based on the thesaurus; design and implementation of a database schema for traffic safety education; and implementation of a GUI for users.
