• Title/Summary/Keyword: School categorization


Representation of Texts into String Vectors for Text Categorization

  • Jo, Tae-Ho
    • Journal of Computing Science and Engineering / v.4 no.2 / pp.110-127 / 2010
  • In this study, we propose a method for encoding documents into string vectors instead of numerical vectors. Traditional approaches to text categorization usually require encoding documents into numerical vectors, and this encoding causes two main problems: huge dimensionality and sparse distribution. In this study, we modify or create machine learning-based approaches to text categorization that receive string vectors as input instead of numerical vectors. As a result, we can improve text categorization performance by avoiding these two problems.
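As an illustration only, the contrast between the two encodings can be sketched in Python. The sketch assumes that a string vector is simply a document's d most frequent non-stopword words in descending frequency order; the stopword list, the value of d, and the function names are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "as", "for", "in", "is", "of", "on", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def string_vector(text: str, d: int = 5) -> list[str]:
    """Encode a document as its d most frequent non-stopword words,
    in descending frequency order (ties broken alphabetically)."""
    counts = Counter(w for w in tokenize(text) if w not in STOPWORDS)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:d]]


def numerical_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Conventional encoding: one term-frequency slot per vocabulary word,
    so its length equals the vocabulary size and most slots are zero."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]


doc = "Stock prices rose as the market rallied; stock traders cheered the market rally."
print(string_vector(doc))  # → ['market', 'stock', 'cheered', 'prices', 'rallied']
```

The string vector stays at d entries no matter how large the vocabulary grows, whereas the numerical vector has one entry per vocabulary word, which is where the huge dimensionality and sparse distribution come from.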

Table based Matching Algorithm for Soft Categorization of News Articles in Reuter 21578

  • Jo, Tae-Ho
    • Journal of Korea Multimedia Society / v.11 no.6 / pp.875-882 / 2008
  • This research proposes an alternative to machine learning-based approaches for text categorization. To use machine learning-based approaches for any text mining task, documents must be encoded into numerical vectors, which causes two problems: huge dimensionality and sparse distribution. Although text mining includes various tasks such as text categorization, text clustering, and text summarization, the scope of this research is restricted to text categorization. The idea of this research is to avoid the two problems by encoding a document or a set of documents into a table instead of numerical vectors. Therefore, the goal of this research is to improve text categorization performance by proposing approaches that are free from the two problems.

Semantic Word Categorization using Feature Similarity based K Nearest Neighbor

  • Jo, Taeho
    • Journal of Multimedia Information System / v.5 no.2 / pp.67-78 / 2018
  • This article proposes a modified KNN (K Nearest Neighbor) algorithm that considers feature similarity and applies it to word categorization. The texts given as features for encoding words into numerical vectors are semantically related entities rather than independent ones, and combining word categorization with text categorization is expected to produce a synergy effect. In this research, we define a similarity metric between two vectors that includes the feature similarity, modify the KNN algorithm by replacing the existing similarity metric with the proposed one, and apply it to word categorization. The proposed KNN is empirically validated as the better approach for categorizing words in news articles and opinions. The significance of this research is that it improves classification performance by utilizing feature similarities.
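The abstract does not give the paper's similarity formula. One well-known way to let correlated features contribute to similarity is the soft cosine measure, sim(x, y) = x^T S y / (sqrt(x^T S x) * sqrt(y^T S y)), where the hypothetical matrix S holds pairwise feature similarities. The sketch below plugs it into a plain majority-vote KNN; it is a swapped-in technique for illustration, not the author's metric, and all names are hypothetical.

```python
import math
from collections import Counter

Vector = list[float]
Matrix = list[list[float]]


def bilinear(x: Vector, s: Matrix, y: Vector) -> float:
    """Compute x^T S y, so weight on feature i also matches weight on a similar feature j."""
    return sum(x[i] * s[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def soft_cosine(x: Vector, y: Vector, s: Matrix) -> float:
    """Soft cosine similarity. Assumes S is symmetric and positive semidefinite with
    ones on the diagonal; with S equal to the identity it is ordinary cosine similarity."""
    denom = math.sqrt(bilinear(x, s, x)) * math.sqrt(bilinear(y, s, y))
    return bilinear(x, s, y) / denom if denom > 0 else 0.0


def knn_classify(query: Vector, training: list[tuple[Vector, str]], s: Matrix, k: int = 3) -> str:
    """Return the majority label among the k training vectors most similar to the query."""
    ranked = sorted(training, key=lambda item: soft_cosine(query, item[0], s), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Three text features; texts 0 and 1 are assumed to be related (similarity 0.8).
S = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
training = [([1.0, 0.0, 0.0], "finance"), ([0.0, 0.0, 1.0], "sports")]
query = [0.0, 1.0, 0.0]  # this word occurs only in text 1
print(knn_classify(query, training, S, k=1))  # → finance
```

With the identity matrix, the query shares no feature with either training word and both cosine scores are 0; the feature-similarity matrix is what links text 1 to text 0.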

Modified Version of SVM for Text Categorization

  • Jo, Tae-Ho
    • International Journal of Fuzzy Logic and Intelligent Systems / v.8 no.1 / pp.52-60 / 2008
  • This research proposes a new strategy in which documents are encoded into string vectors for text categorization, together with modified versions of SVM adapted to string vectors. Traditionally, when SVM is used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the application area of pattern classification. For example, in text categorization, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and apply the modified version of SVM adapted to string vectors for text categorization.

Neural Text Categorizer for Exclusive Text Categorization

  • Jo, Tae-Ho
    • Journal of Information Processing Systems / v.4 no.2 / pp.77-86 / 2008
  • This research proposes a new neural network for text categorization that uses representations of documents alternative to numerical vectors. Since the proposed neural network is intended only for text categorization, it is called NTC (Neural Text Categorizer) in this research. Numerical vectors representing documents for text mining tasks inherently have two main problems: huge dimensionality and sparse distribution. Although various feature selection methods have been developed to address the first problem, the reduced dimension still remains large. If a feature selection method reduces the dimension excessively, the robustness of text categorization is degraded. Although SVM (Support Vector Machine) tolerates huge dimensionality, it does not tolerate sparse distribution. The goal of this research is to address the two problems at the same time by proposing a new representation of documents and a new neural network that uses this representation as its input vector.

Inverted Index based Modified Version of KNN for Text Categorization

  • Jo, Tae-Ho
    • Journal of Information Processing Systems / v.4 no.1 / pp.17-26 / 2008
  • This research proposes a new strategy in which documents are encoded into string vectors, together with a modified version of KNN adapted to string vectors for text categorization. Traditionally, when KNN is used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the application area of pattern classification. For example, in text categorization, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and modify supervised learning algorithms so that they accept string vectors for text categorization.

An Interactivity-based Framework for Classifying Digital Games

  • Kim, Yong-Young;Kim, Mi-Hye
    • International Journal of Contents / v.6 no.4 / pp.35-38 / 2010
  • The current categorization of digital games is not objective and is unable to assess the latest and more complex digital games. Digital games need to be systematically categorized so that similarities and differences can be identified and analyzed. The fundamental characteristic of digital games is interactivity. This paper addresses the current categorization gaps through the lens of interactivity. Through this lens, a conceptual framework consisting of primary and corresponding participants and controlling characters is developed. Future research topics are then presented based on this framework.

A Study on Handwritten Digit Categorization of RAM-based Neural Network (Korean title: A Study on Handwritten Digit Classification Using a RAM-based Neural Network)

  • Park, Sang-Moo;Kang, Man-Mo;Eom, Seong-Hoon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.3 / pp.201-207 / 2012
  • A RAM-based neural network is a weightless neural network based on the binary neural network (BNN), an efficient neural network with one-shot learning. A RAM-based neural network has multiple information bits and stores training counts in the BNN. Supervised learning with a RAM-based neural network performs excellently in pattern recognition but is unsuitable for pattern categorization with unsupervised learning. In this paper, we propose an unsupervised learning algorithm for the RAM-based neural network to perform pattern categorization. With the proposed unsupervised learning algorithm, the RAM-based neural network creates categories by itself, depending on the input patterns. Therefore, the RAM-based neural network can support both supervised and unsupervised learning. The training data for the experiments come from the MNIST offline handwritten digits, which consist of the ten digit patterns 0 to 9.
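The general idea can be sketched as a WiSARD-style n-tuple network in which each category is a discriminator made of RAM nodes that store training counts and are trained in one shot. The rule for opening a new category (a response threshold) is a hypothetical choice for illustration, not the unsupervised algorithm proposed in the paper, and all names are hypothetical.

```python
import random


class Discriminator:
    """One category: a set of RAM nodes, each addressed by a fixed tuple of input bits."""

    def __init__(self, tuples: list[list[int]]):
        self.tuples = tuples
        self.rams: list[dict[tuple, int]] = [dict() for _ in tuples]  # address -> training count

    def _addresses(self, bits: list[int]):
        for tup in self.tuples:
            yield tuple(bits[i] for i in tup)

    def train(self, bits: list[int]) -> None:
        # One-shot learning: a single pass increments the count at each addressed location.
        for ram, addr in zip(self.rams, self._addresses(bits)):
            ram[addr] = ram.get(addr, 0) + 1

    def response(self, bits: list[int]) -> float:
        # Fraction of RAM nodes that have seen this address at least once.
        hits = sum(1 for ram, addr in zip(self.rams, self._addresses(bits)) if ram.get(addr, 0) > 0)
        return hits / len(self.rams)


def make_tuples(n_bits: int, tuple_size: int, seed: int = 0) -> list[list[int]]:
    """Randomly partition the input bit positions into tuples."""
    order = list(range(n_bits))
    random.Random(seed).shuffle(order)
    return [order[i:i + tuple_size] for i in range(0, n_bits, tuple_size)]


def categorize(patterns: list[list[int]], tuple_size: int = 2, threshold: float = 0.7) -> list[int]:
    """Hypothetical unsupervised rule: train the best-matching category if its response
    reaches `threshold`; otherwise create a new category for the pattern."""
    if not patterns:
        return []
    tuples = make_tuples(len(patterns[0]), tuple_size)
    categories: list[Discriminator] = []
    labels: list[int] = []
    for bits in patterns:
        scores = [d.response(bits) for d in categories]
        if scores and max(scores) >= threshold:
            best = scores.index(max(scores))
        else:
            categories.append(Discriminator(tuples))
            best = len(categories) - 1
        categories[best].train(bits)
        labels.append(best)
    return labels


print(categorize([[1, 1, 1, 1, 0, 0, 0, 0],
                  [1, 1, 1, 1, 0, 0, 0, 1],
                  [0, 0, 0, 0, 1, 1, 1, 1]]))  # → [0, 0, 1]
```

The second pattern differs from the first in one bit, so three of its four RAM nodes respond and it joins category 0; the third pattern matches no trained address and opens category 1.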

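Development of Participatory Ecological Restoration System through Integrative Categorization of Disturbed Areas in BaigDooDaeGahn (Korean title: Development of a Participatory Restoration System through Integrative Categorization of Large-scale Disturbed Areas in BaigDooDaeGahn: Focusing on Introductory Programs (Ecological Education and Ecotourism))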

  • Ahn, Tong Mahn;Kim, In Ho;Lee, Jae Young;Kim, Chan Kook;Chae, Hye Sung;Lee, Young;Min, So Young;Kim, Min Woo
    • Journal of the Korean Society of Environmental Restoration Technology / v.12 no.4 / pp.11-22 / 2009
  • This second-year study aimed to develop the procedure for an alternative system intended to restore not only biophysically disturbed areas but also psychologically and socially damaged communities. It was suggested that this participatory restoration system could be constructed from integrative categorization processes based on damage types and the readiness of local residents to participate. Three case study sites (High-One Resort, Lafarge-Halla Cement, and high-altitude farmland near Gangneung City) were selected to apply the theoretical framework proposed in the first-year work. To develop introductory programs, key concepts such as forest for the future, carbon offset forest, and healing forest were suggested based on an analysis of six system components for each site: human resources, communication, legal and institutional support, financial sources, restoration methods, and activity programs. More detailed processes and procedures can be identified, defined, and refined after the final, third stage of the study ends in April 2010.

A Suggestion of Criteria for Categorizing Libraries into Types: Linking between Library and Information (Korean title: A Review of the Criteria for Categorizing Library Types)

  • Kim, Gi-Yeong;Choi, Yoon-Hee
    • Journal of the Korean Society for Information Management / v.29 no.1 / pp.395-404 / 2012
  • The categorization of libraries into several types supports an understanding of the concept of the library and also provides a framework for library management practice, such as planning and management. Although the four-type categorization into public, academic, special, and school libraries is the most traditional and general approach, each type has been defined enumeratively and inductively, so the categorization lacks clear boundaries between categories and is hard to apply to new environments. In this conceptual paper, deductive and analytical criteria for the four-type categorization are suggested based on the characteristics of information needs. The implications of these suggestions for library management, especially the meaning and impact of stakeholders on library management, are discussed. Additionally, this paper attempts to put forth a conceptual link between library and information.