• Title/Summary/Keyword: Classification Algorithms

Web Page Classification System based upon Ontology (온톨로지 기반의 웹 페이지 분류 시스템)

  • Choi Jaehyuk;Seo Haesung;Noh Sanguk;Choi Kyunghee;Jung Gihyun
    • The KIPS Transactions: Part B / v.11B no.6 / pp.723-734 / 2004
  • In this paper, we present an automated Web page classification system based on an ontology. As a first step, to identify the representative terms for a given set of classes, we compute the product of term frequency and document frequency. Second, the information gain of each term is used to rank it by how well it discriminates between classes. We compile pairs of the selected terms and the corresponding Web page classes into rules using machine learning algorithms. The compiled rules classify any Web page into categories defined in a domain ontology. In the experiments, 78 of 240 terms were identified as representative features for a given set of Web pages. The resulting classification accuracy was 83.52% on average.
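The two term-scoring steps named in this abstract can be illustrated with a minimal Python sketch. It assumes documents are already tokenized into word lists and that information gain is computed on term presence or absence; the toy documents, the labels, and the function names (`entropy`, `information_gain`, `tf_df_score`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction from splitting the documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    remainder = ((len(with_term) / n) * entropy(with_term)
                 + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - remainder


def tf_df_score(docs, term):
    """Product of total term frequency and document frequency.

    Assumption: this follows the abstract's wording literally (TF x DF).
    """
    tf = sum(d.count(term) for d in docs)
    df = sum(1 for d in docs if term in d)
    return tf * df


# Hypothetical toy corpus: four tokenized pages in two classes.
docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal", "goal"],
    ["stock", "market", "price"],
    ["market", "team", "price"],
]
labels = ["sports", "sports", "finance", "finance"]

for term in ("goal", "team"):
    print(term, tf_df_score(docs, term), round(information_gain(docs, labels, term), 4))
```

In this toy corpus, "team" has the higher TF x DF score but the lower information gain, because it also appears in a finance page. This suggests why a frequency-based first pass might be followed by a class-aware ranking.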

A Study of Research on Methods of Automated Biomedical Document Classification using Topic Modeling and Deep Learning (토픽모델링과 딥 러닝을 활용한 생의학 문헌 자동 분류 기법 연구)

  • Yuk, JeeHee;Song, Min
    • Journal of the Korean Society for Information Management / v.35 no.2 / pp.63-88 / 2018
  • This research evaluated differences in classification performance across feature selection methods (the LDA topic model and Doc2Vec, a deep learning method based on word embedding), feature corpus sizes, and classification algorithms. To find the feature corpus with the highest classification performance, experiments were conducted with feature corpora composed from different locations within the document and with the size of the feature corpus adjusted. Finally, the deep learning experiments evaluated training frequency and specifically considered information for context inference. This study constructed a biomedical document dataset, Disease-35083, which consists of biomedical scholarly documents provided by PMC and categorized by disease category. The study verifies which type and size of feature corpus produces the highest performance, and also suggests feature corpora that can be extended to specific features by showing efficiency in training time. Additionally, this research compares deep learning with existing methods and suggests an appropriate method for each classification environment.

Spectral Mixture Analysis Using Modified IEA Algorithm for Forest Classification (수정된 IEA 기반의 분광혼합분석 기법을 이용한 임상분류)

  • Song, Ahram;Han, Youkyung;Kim, Younghyun;Kim, Yongil
    • Korean Journal of Remote Sensing / v.30 no.2 / pp.219-226 / 2014
  • Fractional values resulting from spectral mixture analysis can be used to classify not only urban areas containing various materials but also forest areas at a more detailed spatial scale. South Korea in particular consists largely of mixed forest, so spectral mixture analysis is suitable as a classification method. For successful classification using spectral mixture analysis, extraction of optimal endmembers is a prerequisite. Although geometric endmember selection has been widely used, it is poorly suited to forest areas. Therefore, in this study, we modified Iterative Error Analysis (IEA), one of the best-known image endmember selection algorithms, which extracts pure pixels directly from the image. The endmembers that represent deciduous and coniferous trees are extracted automatically. The experiments were carried out on two Compact Airborne Spectrographic Imager (CASI) sites, and the forest area was classified into two types. The accuracies of the classification results were 86% and 90%, which indicates that the proposed algorithm effectively extracted proper endmembers. For more accurate classification, other materials such as forest gaps should be considered.
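To make the idea of fractional values concrete, the following Python sketch unmixes one pixel under a linear mixing model with two endmembers. It does not implement IEA or the authors' modification. It assumes the endmember spectra are already known, and all reflectance values and names (`unmix_two_endmembers`, `deciduous`, `coniferous`) are hypothetical.

```python
def unmix_two_endmembers(pixel, em_a, em_b):
    """Least-squares fraction of em_a in pixel, assuming
    pixel ~= f * em_a + (1 - f) * em_b with 0 <= f <= 1."""
    diff = [a - b for a, b in zip(em_a, em_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("endmembers must differ")
    # Project (pixel - em_b) onto (em_a - em_b) to get the unconstrained fraction.
    frac_a = sum((p - b) * d for p, b, d in zip(pixel, em_b, diff)) / denom
    # Clip so both fractions stay physically meaningful.
    frac_a = min(1.0, max(0.0, frac_a))
    return frac_a, 1.0 - frac_a


# Hypothetical three-band reflectance spectra for the two forest types.
deciduous = [0.05, 0.08, 0.45]
coniferous = [0.03, 0.05, 0.30]

# Synthetic mixed pixel: 70% deciduous, 30% coniferous.
pixel = [0.7 * d + 0.3 * c for d, c in zip(deciduous, coniferous)]

frac_dec, frac_con = unmix_two_endmembers(pixel, deciduous, coniferous)
label = "deciduous" if frac_dec >= frac_con else "coniferous"
print(round(frac_dec, 3), round(frac_con, 3), label)
```

Assigning each pixel to the endmember with the larger fraction is one simple way to turn fractions into a two-class map. The paper may use a different decision rule.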

A Study on the Performance of Deep learning-based Automatic Classification of Forest Plants: A Comparison of Data Collection Methods (데이터 수집방법에 따른 딥러닝 기반 산림수종 자동분류 정확도 변화에 관한 연구)

  • Kim, Bomi;Woo, Heesung;Park, Joowon
    • Journal of Korean Society of Forest Science / v.109 no.1 / pp.23-30 / 2020
  • The use of increased computing power, machine learning, and deep learning techniques has grown dramatically in various sectors. In particular, image detection algorithms are broadly used in forestry and remote sensing to identify forest types and tree species. However, in South Korea, machine learning has rarely, if ever, been applied to forestry image detection, especially to classify tree species. This study integrates the application of machine learning and forest image detection; specifically, we compared two data collection methods for machine learning, namely image data captured by forest experts (D1) and image data collected by web crawling (D2), in automating the classification of five tree species. In addition, two methods of characterization to train and test the system were investigated. The results indicated a significant difference in classification accuracy between D1 and D2: the classification accuracy of D1 was higher than that of D2. To increase the classification accuracy of D2, additional data filtering techniques were required to reduce the noise in the uncensored image data.

Effective Mood Classification Method based on Music Segments (부분 정보에 기반한 효과적인 음악 무드 분류 방법)

  • Park, Gun-Han;Park, Sang-Yong;Kang, Seok-Joong
    • Journal of Korea Multimedia Society / v.10 no.3 / pp.391-400 / 2007
  • Recent advances in multimedia computing, storage, and search technology have made large volumes of music content widely available. There is also an increasing need for research on efficient categorization and search techniques for managing music content. In this paper, a new classification method using local information of the music content and a music tone feature is proposed. While conventional classification algorithms are based on the entire music content, the algorithm proposed in this paper focuses only on specific local information, which can drastically reduce computing time without losing classification accuracy. To improve classification accuracy, it uses a new classification feature based on music tone. The proposed method has been implemented as a part of MuSE (Music Search/Classification Engine), which was installed on various systems including commercial PDAs and PCs.

A Study on the Improvement of Multitree Pattern Recognition Algorithm (Multitree 형상 인식 기법의 성능 개선에 관한 연구)

  • 김태성;이정희;김성대
    • The Journal of Korean Institute of Communications and Information Sciences / v.14 no.4 / pp.348-359 / 1989
  • The multitree pattern recognition algorithm proposed in [1] and [2] is modified in order to improve its performance. The basic idea of the multitree pattern classification algorithm is that a binary decision tree used to classify an unknown pattern is constructed for each feature and that, at each stage, the classification rule decides whether to classify the unknown pattern or to extract the next feature value according to the feature order. As a result, the feature ordering needed in the classification procedure is simple, and the number of features used in the classification procedure is small compared with other classification algorithms. The algorithm can therefore be applied easily to real pattern recognition problems even when the numbers of features and classes are very large. In this paper, the weighting factor assignment scheme in the decision procedure is modified, and various classification rules are proposed by means of the weighting factor. In addition, the branch and bound method is applied to feature subset selection and feature ordering. Several experimental results show that the proposed scheme improves the performance of the multitree pattern classification algorithm.

An Empirical Study on Improving the Performance of Text Categorization Considering the Relationships between Feature Selection Criteria and Weighting Methods (자질 선정 기준과 가중치 할당 방식간의 관계를 고려한 문서 자동분류의 개선에 대한 연구)

  • Lee Jae-Yun
    • Journal of the Korean Society for Library and Information Science / v.39 no.2 / pp.123-146 / 2005
  • This study aims to find consistent strategies for feature selection and feature weighting that can improve the effectiveness and efficiency of a kNN text classifier. Feature selection criteria and feature weighting methods are as important as classification algorithms in achieving good performance in text categorization systems. Most previous studies chose conflicting strategies for feature selection criteria and weighting methods. In this study, the performance of several feature selection criteria is measured, taking into account the storage space for inverted index records and the classification time. The classification experiments examine the performance of IDF as a feature selection criterion and the performance of conventional feature selection criteria, e.g. mutual information, as feature weighting methods. The results suggest that by using measures that prefer low-frequency features both as the feature selection criterion and as the feature weighting method, we can increase classification speed by a factor of three to five without losing classification accuracy.
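The idea of using one low-frequency-preferring measure for both selection and weighting can be sketched in Python as follows. This is an illustrative assumption about how IDF could play both roles, not the study's actual procedure. The toy corpus and the names (`idf_scores`, `select_features`, `weight_vector`) are hypothetical, and a real system would probably also apply a minimum document-frequency cutoff, which is omitted here.

```python
import math


def idf_scores(docs):
    """Inverse document frequency, log(N / df), for every term in the corpus."""
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    return {term: math.log(n / count) for term, count in df.items()}


def select_features(idf, k):
    """Keep the k terms with the highest IDF (ties broken alphabetically)."""
    return sorted(idf, key=lambda t: (-idf[t], t))[:k]


def weight_vector(doc, idf, features):
    """TF x IDF weights of one document over the selected features."""
    return [doc.count(t) * idf[t] for t in features]


# Hypothetical tokenized corpus.
docs = [
    ["library", "index", "index"],
    ["library", "catalog"],
    ["library", "index"],
    ["library", "thesaurus"],
]

idf = idf_scores(docs)
features = select_features(idf, 2)
print(features)
print(weight_vector(docs[1], idf, features))
```

Because the selected terms have short posting lists, an inverted index built over them is smaller and faster to scan. That is consistent with the storage and speed considerations the abstract mentions.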

Multi-Level based Application Traffic Classification Method (멀티 레벨 기반의 응용 트래픽 분석 방법)

  • Oh, Young-Suk;Park, Jun-Sang;Yoon, Sung-Ho;Park, Jin-Wan;Lee, Sang-Woo;Kim, Myung-Sup
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.8B / pp.1170-1178 / 2010
  • As the number of users and the volume of application traffic on high-speed networks increase, application traffic classification is becoming more and more important for efficient network resource management. Although a number of methods and algorithms for traffic classification have been introduced, they have limitations in terms of accuracy and completeness. In this paper, we propose a multi-level architecture for application traffic classification that integrates several signature-based methods and a behavior algorithm, and analyzes traffic using correlation among traffic flows. By combining the strengths and compensating for the weaknesses of the individual methods, we constructed a flexible and robust multi-level classification system. Experiments with our campus network traffic demonstrated the performance and validity of the proposed mechanism.

A Study on Classification Models for Predicting Bankruptcy Based on XAI (XAI 기반 기업부도예측 분류모델 연구)

  • Jihong Kim;Nammee Moon
    • KIPS Transactions on Software and Data Engineering / v.12 no.8 / pp.333-340 / 2023
  • Efficient prediction of corporate bankruptcy is an important part of making appropriate lending decisions at financial institutions and reducing loan default rates. Many studies have used classification models based on artificial intelligence technology. In the financial industry, even if new predictive models perform excellently, they should be accompanied by an intuitive explanation of the basis on which a result was determined. Recently, the US, the EU, and South Korea have each introduced a right to request explanations of algorithms, so transparency in the use of AI in the financial sector must be secured. In this paper, an interpretable, artificial intelligence-based classification model is proposed using publicly available corporate bankruptcy data. First, data preprocessing, 5-fold cross-validation, and related steps were performed, and classification performance was compared after optimizing 10 supervised learning classification models, including logistic regression, SVM, XGBoost, and LightGBM. As a result, LightGBM was confirmed as the best-performing model, and SHAP, an explainable artificial intelligence technique, was applied to provide a post-hoc explanation of the bankruptcy prediction process.

Stabilization of Power System using Self Tuning Fuzzy controller (자기조정 퍼지제어기에 의한 전력계통 안정화에 관한 연구)

  • 정형환;정동일;주석민
    • Journal of the Korean Institute of Intelligent Systems / v.5 no.2 / pp.58-69 / 1995
  • In this paper, the GFI (Generalized Fuzzy Isodata) and FI (Fuzzy Isodata) algorithms are studied and applied to the tire tread pattern classification problem. The GFI algorithm, which repeatedly groups the partitioned clusters depending on the fuzzy partition matrix, is a general form of the GI algorithm. In constructing the binary tree using the GFI algorithm, cluster validity, namely whether a partitioned cluster is feasible or not, is checked, and the binary tree is constructed by the FDH clustering algorithm. These algorithms show good performance in selecting the prototypes of each pattern and in classifying patterns. The edge directions in the preprocessed image of the tire tread pattern are selected as the pattern features. These features are thought to carry useful information that represents the characteristics of the patterns well.
