• Title/Summary/Keyword: 텍스트 검색

Search Result 679, Processing Time 0.048 seconds

An Experimental Study on the Performance Improvement of Automatic Classification for the Articles of Korean Journals Based on Controlled Keywords in International Database (해외 데이터베이스의 통제키워드에 기초한 국내 학술지 논문의 자동분류 성능 향상에 관한 실험적 연구)

  • Kim, Pan Jun;Lee, Jae Yun
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.48 no.3
    • /
    • pp.491-510
    • /
    • 2014
  • As a major factor for efficient management and retrieval of the articles in databases, keywords are classified into uncontrolled keywords and controlled keywords. Most of Korean scholarly databases fail to provide controlled vocabularies to indexing research articles which help users to retrieve relevant papers exhaustively. In this paper, we carried out automatic descriptor assignment experiments to Korean articles using automatic classifiers learned with descriptors in international database. The results of the experiments show that the classifier learning with descriptors in international database can potentially offer controlled vocabularies to Korean scholarly articles having English s. Also, we sought to improve the performance of automatic descriptor assignment using various classifiers and combination of them.

Developing Subject Headings for Children's Picture Books based on A to Zoo (어린이 그림책을 위한 주제명표 개발 연구: 『A to Zoo』를 바탕으로)

  • Park, Ziyoung
    • Journal of the Korean Society for information Management
    • /
    • v.29 no.4
    • /
    • pp.251-271
    • /
    • 2012
  • Subject headings support the effective access of children's picture books. However, it is difficult to select subject terms from titles or table of contents in children's picture books because of their relatively little textual information. Therefore, it is necessary to assign subject terms to each picture book. However, it is not adequate to use general subject headings because the types and levels of general subject headings are different from special subject headings for the children's materials. For this reason, this study aims to develop subject headings for children's picture books. The subject terms in A to Zoo were selected, and the selected terms were translated into Korean and modified for the Korean culture and language. Other reference books, such as Elementary Korean Dictionary, were also used to determine adequate terms for children. The resulting subject headings were assigned to the recommended picture books for children and used to search by subject, browse, and recommend books.

A Rule-based Approach to Identifying Citation Text from Korean Academic Literature (한국어 학술 문헌의 본문 인용문 인식을 위한 규칙 기반 방법)

  • Kang, In-Su
    • Journal of the Korean Society for information Management
    • /
    • v.29 no.4
    • /
    • pp.43-60
    • /
    • 2012
  • Identifying citing sentences from article full-text is a prerequisite for creating a variety of future academic information services such as citation-based automatic summarization, automatic generation of review articles, sentiment analysis of citing statements, information retrieval based on citation contexts, etc. However, finding citing sentences is not easy due to the existence of implicit citing sentences which do not have explicit citation markers. While several methods have been proposed to attack this problem for English, it is difficult to find such automatic methods for Korean academic literature. This article presents a rule-based approach to identifying Korean citing sentences. Experiments show that the proposed method could find 30% of implicit citing sentences in our test data in nearly 70% precision.

System Implement to Identify Copyright Infringement Based on the Text Reference Point (텍스트 기준점 기반의 저작권 침해 판단 시스템 구현)

  • Choi, Kyung-Ung;Park, Soon-Cheol;Yang, Seung-Won
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.15 no.1
    • /
    • pp.77-84
    • /
    • 2015
  • Most of the existing methods make the index key with every 6 words in every sentence in a document in order to identify copyright infringement between two documents. However, these methods has the disadvantage to take a long time to inspect the copyright infringement because of the long indexing time for the large-scale document. In this paper, we propose a method to select the longest word (called a feature bock) as an index key in the predetermined-sized window which scans a document character by character. This method can be characterized by removing duplicate blocks in the process of scanning a document, dramatically reducing the number of the index keys. The system with this method can find the copyright infringement positions of two documents very accurately and quickly since relatively small number of blocks are compared.

Extracting Method of User's Interests by Using SNS Follower's Relationship and Sequential Pattern Evaluation Indices for Keyword (키워드를 위한 시퀀셜 패턴 평가 지표와 SNS 팔로워의 관계를 이용한 사용자 관심사항 추출방법)

  • Shin, Bong-Hi;Jeon, Hye-Kyoung
    • Journal of the Korea Convergence Society
    • /
    • v.8 no.8
    • /
    • pp.71-75
    • /
    • 2017
  • Due to the spread of SNS, web-based consumer-generated data is increasing exponentially. It is important in many fields to accurately extract what is appropriate for the user's interest in a large amount of data. It is especially important for business mangers to establish marketing policies to find the right customers for them in many users. In this paper, we try to obtain important information centering on customers who are interested in each account through Twitter follow - following relationship. Because Twitter's current follower relationships do not reflect the user's interests, we try to figure out the details of interest using keyword extraction methods for tweets of followers. To do this, we select two domestic commercial Twitter accounts and apply the sequential pattern evaluation index to the mining key phrase of the text data collected from the follower.

Hangul Encoding Standard based on Unicode (유니코드의 한글 인코딩 표준안)

  • Ahn, Dae-Hyuk;Park, Young-Bae
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.12
    • /
    • pp.1083-1092
    • /
    • 2007
  • In Unicode, two types of Hangul encoding schemes are currently in use, namely, the "precomposed modern Hangul syllables" model and the "conjoining Hangul characters" model. The current Unicode Hangul conjoining rules allow a precomposed Hangul syllable to be a member of a syllable which includes conjoining Hangul characters; this has resulted in a number of different Hangul encoding implementations. This unfortunate problem stems from an incomplete understanding of the Hangul writing system when the normalization and encoding schemes were originally designed. In particular, the extended use of old Hangul was not taken into consideration. As a result, there are different ways to represent Hangul syllables, and this cause problem in the processing of Hangul text, for instance in searching, comparison and sorting functions. In this paper, we discuss the problems with the normalization of current Hangul encodings, and suggest a single efficient rule to correctly process the Hangul encoding in Unicode.

Topic-Network based Topic Shift Detection on Twitter (트위터 데이터를 이용한 네트워크 기반 토픽 변화 추적 연구)

  • Jin, Seol A;Heo, Go Eun;Jeong, Yoo Kyung;Song, Min
    • Journal of the Korean Society for information Management
    • /
    • v.30 no.1
    • /
    • pp.285-302
    • /
    • 2013
  • This study identified topic shifts and patterns over time by analyzing an enormous amount of Twitter data whose characteristics are high accessibility and briefness. First, we extracted keywords for a certain product and used them for representing the topic network allows for intuitive understanding of keywords associated with topics by nodes and edges by co-word analysis. We conducted temporal analysis of term co-occurrence as well as topic modeling to examine the results of network analysis. In addition, the results of comparing topic shifts on Twitter with the corresponding retrieval results from newspapers confirm that Twitter makes immediate responses to news media and spreads the negative issues out quickly. Our findings may suggest that companies utilize the proposed technique to identify public's negative opinions as quickly as possible and to apply for the timely decision making and effective responses to their customers.

Features for Author Disambiguation (저자 식별을 위한 자질 비교)

  • Kang, In-Su;Lee, Seungwoo;Jung, Hanmin;Kim, Pyung;Goo, HeeKwan;Lee, MiKyung;Sung, Won-Kyung;Park, DongIn
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2007.11a
    • /
    • pp.107-111
    • /
    • 2007
  • There exists a many-to-many mapping relationship between persons and their names. A person may have multiple names, and different persons may share the same name. These synonymous and homonymous names may severely deteriorate the recall and precision of the person search, respectively. This study addresses the characteristics of features for resolving homonymous author names appearing in citation data. As disambiguation features, previous works have employed citation-internal features such as co-authorship, titles of articles, titles of publications as well as citation-external features such as emails, affiliations, Web evidences. To the best of our knowledge, however, there has been no literature to deal with the influences of features on author disambiguation. This study analyzes the effect of individual features on author resolution using a large-scale test set for Korean.

  • PDF

Stepmother Images through Analyses of Twitter and News Articles (트위터와 뉴스기사 분석을 통해 본 계모에 대한 사회적 인식)

  • Jeong, Su Jeong;Kim, So Eun;Chung, Ick Joong
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.7
    • /
    • pp.665-678
    • /
    • 2018
  • The purpose of this study is to analyze the social perception of stepmother in social media and news. For this purpose, we analyzed the texts that were searched as 'stepmother' in Twitter and news articles. The main research results are as follows. The public perception is divided into two types: strengthening the negative image of the stepmother, and providing a positive alternative message to the negative image. Negative images were reported in the media as negative stereotypes about remarried families, as well as images of negative stepmother that afflicts stepchildren like fairy tales. Positive alternative messages were concerned about the negative perception of remarried families. Based on the results of this study, we discussed alternatives to avoid prejudice against stepmother.

Storage Schemes for XML Query Cache (XML 질의 캐쉬의 저장 기법)

  • Kim, Young-Hyun;Kang, Hyun-Chul
    • Journal of KIISE:Databases
    • /
    • v.33 no.5
    • /
    • pp.551-562
    • /
    • 2006
  • XML query caching for XML database-backed Web applications began to be investigated recently. Despite its practical significance, efficiency of the storage schemes for cached query results has not been addressed. In this paper, we deal with the storage schemes for XML query cache. A fundamental problem that needs to be considered in designing an efficient storage structure for XML query cache is that there exist performance tradeoffs between the two major types of operations on a cached query result. The two are (1) retrieving the whole of it to return the query result and (2) updating just a small portion of it for its incremental refresh against the updates done to its source. We propose eight different storage schemes for XML query cache, which are categorized into three groups: (1) the schemes based on the plain text file, (2) the schemes based on the persistent DOM (PDOM) file, and (3) a scheme employing an RDBMS. We implemented all of them, and compared their performance with each other. We also compared our proposal with a storage scheme based on a current state-of-the-art XML storage scheme, showing that ours is more efficient.