• Title/Summary/Keyword: Automatic Keyword Classification

Search Result 17, Processing Time 0.029 seconds

A Study on Automatic Keyword Classification (용어의 자동분류에 관한 연구)

  • Seo, Eun-Gyoung
    • Journal of the Korean Society for information Management
    • /
    • v.1 no.1
    • /
    • pp.78-99
    • /
    • 1984
  • In this paper, the automatic keyword classification which is one of the automatic construction methods of retrieval thesaurus is experimented to the Korean language on the basis that the use of retrieval thesaurus would increase the efficiency of information retrieval in the natural language retrieval system searching machine-readable data base. Furthermore, this paper proposes the application methods. In this experiment, the automatic keyword classification was based on the assumption that semantic relationships between terms can be found out by the statistical patterns of terms occurring in a text.

  • PDF

A Study on the Reclassification of Author Keywords for Automatic Assignment of Descriptors (디스크립터 자동 할당을 위한 저자키워드의 재분류에 관한 실험적 연구)

  • Kim, Pan-Jun;Lee, Jae-Yun
    • Journal of the Korean Society for information Management
    • /
    • v.29 no.2
    • /
    • pp.225-246
    • /
    • 2012
  • This study purported to investigate the possibility of automatic descriptor assignment using the reclassification of author keywords in domestic scholarly databases. In the first stage, we selected optimal classifiers and parameters for the reclassification by comparing the characteristics of machine learning classifiers. In the next stage, learning the author keywords that were assigned to the selected articles on readings, the author keywords were automatically added to another set of relevant articles. We examined whether the author keyword reclassifications had the effect of vocabulary control just as descriptors collocate the documents on the same topic. The results showed the author keyword reclassification had the capability of the automatic descriptor assignment.

Automatic Music-Story Video Generation Using Music Files and Photos in Automobile Multimedia System (자동차 멀티미디어 시스템에서의 사진과 음악을 이용한 음악스토리 비디오 자동생성 기술)

  • Kim, Hyoung-Gook
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.9 no.5
    • /
    • pp.80-86
    • /
    • 2010
  • This paper presents automated music story video generation technique as one of entertainment features that is equipped in multimedia system of the vehicle. The automated music story video generation is a system that automatically creates stories to accompany musics with photos stored in user's mobile phone by connecting user's mobile phone with multimedia systems in vehicles. Users watch the generated music story video at the same time. while they hear the music according to mood. The performance of the automated music story video generation is measured by accuracies of music classification, photo classification, and text-keyword extraction, and results of user's MOS-test.

Semi-Automatic Management of Classification Scheme with Interoperability (상호운용적 분류체계 관리를 위한 반자동 분류체계 관리방안)

  • Lee, Won-Goo;Shin, Sung-Ho;Kim, Kwang-Young;Jeon, Do-Heon;Yoon, Hwa-Mook;Sung, Won-Kyung;Lee, Min-Ho
    • The Journal of the Korea Contents Association
    • /
    • v.11 no.12
    • /
    • pp.466-474
    • /
    • 2011
  • Under the knowledge-based economy in 21C, the convergence and complexity in science and technology are being more active. Therefore, we have science and technology are classified properly, make not easy to construct the system to new next generation area. Thus we suggest the systematic solution method to flexibly extend classification scheme in order for content management and service organizations. In this way, we expect that the difficult of classification scheme management is minimized and the expense of it is spared.

Hierarchical Automatic Classification of News Articles based on Association Rules (연관규칙을 이용한 뉴스기사의 계층적 자동분류기법)

  • Joo, Kil-Hong;Shin, Eun-Young;Lee, Joo-Il;Lee, Won-Suk
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.6
    • /
    • pp.730-741
    • /
    • 2011
  • With the development of the internet and computer technology, the amount of information through the internet is increasing rapidly and it is managed in document form. For this reason, the research into the method to manage for a large amount of document in an effective way is necessary. The conventional document categorization method used only the keywords of related documents for document classification. However, this paper proposed keyword extraction method of based on association rule. This method extracts a set of related keywords which are involved in document's category and classifies representative keyword by using the classification rule proposed in this paper. In addition, this paper proposed the preprocessing method for efficient keywords creation and predicted the new document's category. We can design the classifier and measure the performance throughout the experiment to increase the profile's classification performance. When predicting the category, substituting all the classification rules one by one is the major reason to decrease the process performance in a profile. Finally, this paper suggested automatically categorizing plan which can be applied to hierarchical category architecture, extended from simple category architecture.

Automatic Email Multi-category Classification Using Dynamic Category Hierarchy and Non-negative Matrix Factorization (비음수 행렬 분해와 동적 분류 체계를 사용한 자동 이메일 다원 분류)

  • Park, Sun;An, Dong-Un
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.5
    • /
    • pp.378-385
    • /
    • 2010
  • The explosive increase in the use of email has made to need email classification efficiently and accurately. Current work on the email classification method have mainly been focused on a binary classification that filters out spam-mails. This methods are based on Support Vector Machines, Bayesian classifiers, rule-based classifiers. Such supervised methods, in the sense that the user is required to manually describe the rules and keyword list that is used to recognize the relevant email. Other unsupervised method using clustering techniques for the multi-category classification is created a category labels from a set of incoming messages. In this paper, we propose a new automatic email multi-category classification method using NMF for automatic category label construction method and dynamic category hierarchy method for the reorganization of email messages in the category labels. The proposed method in this paper, a large number of emails are managed efficiently by classifying multi-category email automatically, email messages in their category are reorganized for enhancing accuracy whenever users want to classify all their email messages.

The Automatic Management of Classification Scheme with Interoperability on Heterogeneous Data (이기종 데이터 간 상호운용적 분류체계 관리를 위한 분류체계 자동화 방안)

  • Lee, Won-Goo;Hwang, Myung-Gwon;Lee, Min-Ho;Shin, Sung-Ho;Kim, Kwang-Young;Yoon, Hwa-Mook;Sung, Won-Kyung;Jeon, Do-Heon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.15 no.12
    • /
    • pp.2609-2618
    • /
    • 2011
  • Under the knowledge-based economy in 21C, the convergence and complexity in science and technology are being more active. Interoperability between heterogeneous domains is a very important point considered in the field of scholarly information service as well information standardization. Thus we suggest the systematic solution method to flexibly extend classification scheme in order for content management and service organizations. Especially, This paper shows that automatic method for interoperability between heterogeneous scholarly classification code structures will be effective in enhancing the information service system.

A Study on Automatic Classification of Newspaper Articles Based on Unsupervised Learning by Departments (비지도학습 기반의 행정부서별 신문기사 자동분류 연구)

  • Kim, Hyun-Jong;Ryu, Seung-Eui;Lee, Chul-Ho;Nam, Kwang Woo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.9
    • /
    • pp.345-351
    • /
    • 2020
  • Administrative agencies today are paying keen attention to big data analysis to improve their policy responsiveness. Of all the big data, news articles can be used to understand public opinion regarding policy and policy issues. The amount of news output has increased rapidly because of the emergence of new online media outlets, which calls for the use of automated bots or automatic document classification tools. There are, however, limits to the automatic collection of news articles related to specific agencies or departments based on the existing news article categories and keyword search queries. Thus, this paper proposes a method to process articles using classification glossaries that take into account each agency's different work features. To this end, classification glossaries were developed by extracting the work features of different departments using Word2Vec and topic modeling techniques from news articles related to different agencies. As a result, the automatic classification of newspaper articles for each department yielded approximately 71% accuracy. This study is meaningful in making academic and practical contributions because it presents a method of extracting the work features for each department, and it is an unsupervised learning-based automatic classification method for automatically classifying news articles relevant to each agency.

Similar Patent Search Service System using Latent Dirichlet Allocation (잠재 의미 분석을 적용한 유사 특허 검색 서비스 시스템)

  • Lim, HyunKeun;Kim, Jaeyoon;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.8
    • /
    • pp.1049-1054
    • /
    • 2018
  • Keyword searching used in the past as a method of finding similar patents, and automated classification by machine learning is using in recently. Keyword searching is a method of analyzing data that is formalized through data refinement. While the accuracy for short text is high, long one consisted of several words like as document that is not able to analyze the meaning contained in sentences. In semantic analysis level, the method of automatic classification is used to classify sentences composed of several words by unstructured data analysis. There was an attempt to find similar documents by combining the two methods. However, it have a problem in the algorithm w the methods of analysis are different ways to use simultaneous unstructured data and regular data. In this paper, we study the method of extracting keywords implied in the document and using the LDA(Latent Semantic Analysis) method to classify documents efficiently without human intervention and finding similar patents.

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems
    • /
    • v.21 no.1
    • /
    • pp.103-122
    • /
    • 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful for them. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors in an effort to guide the users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role for document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask the authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could not benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle; manually assigning keywords to all documents is a daunting task, or even impractical in that it is extremely tedious and time-consuming requiring a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are mainly two approaches to achieving this aim: keyword assignment approach and keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given set of vocabulary, and the aim is to match them to the texts. In other words, the keywords assignment approach seeks to select the words from a controlled vocabulary that best describes a document. Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. On the other hand, in the latter approach, the aim is to extract keywords with respect to their relevance in the text without prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted based on supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems such as Extractor and Kea were developed using keyword extraction approach. Most indicative words in a document are selected as keywords for that document and as a result, keywords extraction is limited to terms that appear in the document. Therefore, keywords extraction cannot generate implicit keywords that are not included in a document. According to the experiment results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Inversely, it also means that 10% to 36% of the keywords assigned by the authors do not appear in the article, which cannot be generated through keyword extraction algorithms. Our preliminary experiment result also shows that 37% of keywords assigned by the authors are not included in the full text. This is the reason why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment namely IVSM(Inverse Vector Space Model). The model is based on a vector space model. which is a conventional information retrieval model that represents documents and queries by vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: IVSM system for Web-based community service and stand-alone IVSM system. Firstly, the IVSM system is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers, and, indeed, it has been tested through a number of academic papers including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precisions of IVSM applied to Web-based community service and academic journals were 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. Also, IVSM shows comparable performance to Extractor that is a representative system of keyword extraction approach developed by Turney. As electronic documents increase, we expect that IVSM proposed in this paper can be applied to many electronic documents in Web-based community and digital library.