• Title/Summary/Keyword: optimal classification method


Using GA based Input Selection Method for Artificial Neural Network Modeling: Application to Bankruptcy Prediction (유전자 알고리즘을 활용한 인공신경망 모형 최적입력변수의 선정: 부도예측 모형을 중심으로)

  • 홍승현;신경식
    • Journal of Intelligence and Information Systems
    • /
    • v.9 no.1
    • /
    • pp.227-249
    • /
    • 2003
  • Prediction of corporate failure using past financial data is a well-documented topic. Early studies of bankruptcy prediction used statistical techniques such as multiple discriminant analysis, logit and probit. Recently, however, numerous studies have demonstrated that artificial intelligence techniques such as neural networks can be an alternative methodology for classification problems to which traditional statistical methods have long been applied. In building a neural network model, the selection of independent and dependent variables should be approached with great care and treated as part of the model construction process. Irrespective of the efficiency of a learning procedure in terms of convergence, generalization and stability, the ultimate performance of the estimator will depend on the relevance of the selected input variables and the quality of the data used. Approaches developed for statistical methods, such as correlation analysis and stepwise selection, are often very useful. These methods, however, may not be optimal for developing a neural network model. In this paper, we propose a genetic algorithm approach to find an optimal or near-optimal set of input variables for neural network modeling. The proposed approach is demonstrated by application to bankruptcy prediction modeling. Our experimental results show that this approach significantly increases the overall classification accuracy.
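The GA-driven search over candidate input variables can be sketched as bitmask evolution. In a minimal sketch like the one below, the expensive network-training step is replaced by a toy fitness function standing in for validation accuracy, so the variable count, the "relevant" index set, and all scores are purely illustrative:

```python
import random

random.seed(0)

N_VARS = 10                      # candidate input variables (toy setup)
RELEVANT = {0, 2, 5, 7}          # hypothetical ground-truth informative variables

def fitness(mask):
    """Toy stand-in for validation accuracy of a network trained on the
    selected variables: reward relevant picks, penalize noise variables."""
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & RELEVANT) - 0.3 * len(chosen - RELEVANT)

def crossover(a, b):
    """Single-point crossover of two selection bitmasks."""
    cut = random.randrange(1, N_VARS)
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ (random.random() < rate) for bit in mask]

def ga_select(pop_size=30, generations=40):
    pop = [[random.randint(0, 1) for _ in range(N_VARS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]            # truncation selection, elitist
        children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)

best = ga_select()
print(sorted(i for i, bit in enumerate(best) if bit))
```

Because the elite half survives unchanged each generation, the best fitness found never decreases; the real method would evaluate each bitmask by training and validating a network on the selected variables.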


An Improved Method of Two-Stage Linear Discriminant Analysis

  • Chen, Yarui;Tao, Xin;Xiong, Congcong;Yang, Jucheng
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.3
    • /
    • pp.1243-1263
    • /
    • 2018
  • Two-stage linear discriminant analysis (TSLDA) is a feature extraction technique for solving the small sample size problem in image recognition. TSLDA retains all subspace information of the between-class and within-class scatter. However, the feature information in the four subspaces may not be entirely beneficial for classification, and the regularization procedure for eliminating singular matrices in TSLDA has high time complexity. To address these drawbacks, this paper proposes an improved two-stage linear discriminant analysis (Improved TSLDA). The Improved TSLDA introduces a selection and compression method that extracts the superior feature information from the four subspaces to constitute an optimal projection space, defining a single Fisher criterion to measure the importance of each individual feature vector. Improved TSLDA also applies an approximation matrix method to eliminate the singular matrices and reduce the time complexity. This paper presents comparative experiments on five face databases and one handwritten digit database to validate the effectiveness of the Improved TSLDA.
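The Fisher criterion used to rank individual feature vectors can be illustrated in one dimension: project the two classes onto a candidate direction and score the squared distance of the class means against their pooled variance. This is a minimal 1-D sketch, not the paper's exact formulation, and the projection values are made up:

```python
def fisher_score(proj_a, proj_b):
    """Fisher criterion for one projection direction: squared distance of the
    class means over the sum of the class variances (1-D illustration)."""
    mean = lambda xs: sum(xs) / len(xs)
    var = lambda xs: sum((x - mean(xs)) ** 2 for x in xs) / len(xs)
    ma, mb = mean(proj_a), mean(proj_b)
    return (ma - mb) ** 2 / (var(proj_a) + var(proj_b))

# A direction that separates the classes well scores higher than one
# on which the classes overlap.
good = fisher_score([0.1, 0.2, 0.15], [1.0, 1.1, 0.9])
bad  = fisher_score([0.1, 0.9, 0.5], [0.2, 1.0, 0.6])
print(good > bad)   # True
```

Directions (feature vectors) with high scores would be kept for the projection space; low-scoring ones would be compressed away.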

A Study on the Improvement of Traffic Information Service Display Standards and the Provision of Services Based on Detailed Traffic Information (교통정보서비스 표출기준 개선 및 상세교통정보 기반 서비스 제공방안 연구)

  • Bae, Kwangsoo;Lee, Seungcheol
    • Journal of Information Technology Services
    • /
    • v.17 no.4
    • /
    • pp.85-100
    • /
    • 2018
  • In this study, we formulated rational criteria for efficiently providing traffic information services. Using these criteria, we present a detailed traffic information service that can overcome the limitations of the existing link-unit information provision system. Three methodologies were applied to formulate rational display standards for the traffic information service: a user survey, data mining, and a method based on the KHCM (Korea Highway Capacity Manual). Each method was designed to establish quantitative criteria for various traffic conditions and to enable a user-oriented traffic information service in consideration of traffic principles and compatibility. Considering the results of each methodological analysis comprehensively, we formulated the basic display standards for the traffic information service, and then presented improvements such as the number of traffic-condition levels by road type, the speed range of each traffic condition, and the display terminology for each traffic condition. To complement the problems of the existing link-unit information provision system based on the derived criteria, we present a detailed traffic information service that uses second-order traffic speed data, and we applied it to two links in Daegu. The method presented in this research can improve the quality of traffic information services, and it can also be used in various fields such as optimal route search and traffic safety services.

Document Classification Using a Deep Neural Network in Text Mining (텍스트 마이닝에서 심층 신경망을 이용한 문서 분류)

  • Lee, Bo-Hui;Lee, Su-Jin;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics
    • /
    • v.33 no.5
    • /
    • pp.615-625
    • /
    • 2020
  • In text mining, a document-term frequency matrix is built from the terms extracted from documents whose group information is known. In this study, we generated a document-term frequency matrix for document classification by research field. We applied the traditional term weighting function, term frequency-inverse document frequency (TF-IDF), to the generated matrix, and also applied term frequency-inverse gravity moment (TF-IGM). In addition, we generated a document-keyword weighted matrix by extracting keywords to improve document classification accuracy. Based on the extracted keyword matrix, we classified documents using a deep neural network. To find the optimal deep neural network model, the accuracy of document classification was verified while varying the number of hidden layers and hidden nodes. Consequently, the model with eight hidden layers showed the highest accuracy, and the classification accuracy with TF-IGM was higher than with TF-IDF across all parameter settings. The deep neural network was also confirmed to be more accurate than the support vector machine. We therefore propose applying TF-IGM and a deep neural network to document classification.
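The TF-IGM weighting can be sketched as follows, assuming the usual inverse-gravity-moment form in which a term's per-class frequencies are ranked in descending order; the adjustment coefficient `lam` and the toy frequencies below are illustrative, not values from the paper:

```python
def tf_igm(tf, class_freqs, lam=7.0):
    """Term weight under a TF-IGM-style scheme (sketch): the inverse gravity
    moment is the top-ranked class frequency divided by the rank-weighted
    sum of all class frequencies; lam is the adjustment coefficient."""
    f = sorted(class_freqs, reverse=True)          # rank class frequencies
    igm = f[0] / sum(fr * r for r, fr in enumerate(f, start=1))
    return tf * (1.0 + lam * igm)

# A term concentrated in one class gets a larger weight than a term spread
# evenly across classes, even at the same raw term frequency.
concentrated = tf_igm(3, [30, 1, 1])
uniform      = tf_igm(3, [11, 11, 10])
print(concentrated > uniform)   # True
```

This class-distribution sensitivity is what distinguishes TF-IGM from TF-IDF, which only looks at how many documents contain the term.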

Optimal Thresholds from Mixture Distributions (혼합분포에서 최적분류점)

  • Hong, Chong-Sun;Joo, Jae-Seon;Choi, Jin-Soo
    • The Korean Journal of Applied Statistics
    • /
    • v.23 no.1
    • /
    • pp.13-28
    • /
    • 2010
  • Assuming a mixture distribution in credit evaluation studies, we discuss methods for estimating thresholds that minimize the errors of predicting default borrowers as non-defaults or regarding non-defaults as defaults. To estimate a threshold, we propose a method based on statistical hypothesis tests (the most powerful test and the generalized likelihood ratio test) applied to the probability density functions defined on the score random variable, with a parameter space consisting of only two elements, the default and non-default states. Other optimal thresholds, which maximize classification accuracy measures (the accuracy and the true rate for ROC and CAP curves), are estimated from equations involving these probability density functions. The three kinds of optimal thresholds, in terms of hypothesis testing, the accuracy and the true rate, are obtained from normal random samples with various means and variances. The sums of the type I and type II errors corresponding to each optimal threshold are obtained and compared. Finally, we discuss their efficiency and draw conclusions.
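For two normal score components, the threshold at which the weighted densities cross (the likelihood-ratio cut that minimizes the sum of the two error probabilities) has a closed form. This is a sketch for the two-normal case only, with illustrative means, variances, and priors:

```python
import math

def optimal_threshold(m0, s0, m1, s1, p0=0.5, p1=0.5):
    """Score threshold where the weighted densities of the non-default
    component (m0, s0) and default component (m1, s1) cross."""
    if s0 == s1:                                   # equal variances: closed form
        return (m0 + m1) / 2 + s0**2 * math.log(p0 / p1) / (m1 - m0)
    # Unequal variances: equate log densities -> quadratic a x^2 + b x + c = 0
    a = 1 / s1**2 - 1 / s0**2
    b = 2 * (m0 / s0**2 - m1 / s1**2)
    c = m1**2 / s1**2 - m0**2 / s0**2 + 2 * math.log((p0 * s1) / (p1 * s0))
    disc = math.sqrt(b * b - 4 * a * c)
    roots = [(-b - disc) / (2 * a), (-b + disc) / (2 * a)]
    # keep the crossing point that lies between the two means
    return next(r for r in roots if min(m0, m1) <= r <= max(m0, m1))

# Equal priors and variances: the threshold is simply the midpoint.
print(optimal_threshold(0.0, 1.0, 2.0, 1.0))   # 1.0
```

Unequal priors shift the threshold toward the less likely class, and unequal variances make the crossing point the root of a quadratic rather than a midpoint.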

Development of Classification Model for hERG Ion Channel Inhibitors Using SVM Method (SVM 방법을 이용한 hERG 이온 채널 저해제 예측모델 개발)

  • Gang, Sin-Moon;Kim, Han-Jo;Oh, Won-Seok;Kim, Sun-Young;No, Kyoung-Tai;Nam, Ky-Youb
    • Journal of the Korean Chemical Society
    • /
    • v.53 no.6
    • /
    • pp.653-662
    • /
    • 2009
  • Developing effective tools for predicting the absorption, distribution, metabolism, excretion and toxicity (ADME/T) properties of new chemical entities in the early stage of drug design is one of the most important tasks in drug discovery and development today. As one of these attempts, support vector machines (SVM) have recently been exploited for the prediction of ADME/T-related properties. However, two problems in SVM modeling, feature selection and parameter setting, are still far from solved. Both have been shown to be crucial to the efficiency and accuracy of SVM classification. In particular, feature selection and optimal SVM parameter setting influence each other, which indicates that they should be dealt with simultaneously. In this account, we present an integrated practical solution in which a genetic algorithm (GA) is used for feature selection and the grid search (GS) method for parameter optimization. hERG ion-channel inhibitor classification models of ADME/T-related properties were built to assess and test the proposed GA-GS-SVM. We generated six different models, three single models and three ensemble models, using a training set of 1,891 compounds, and validated them with an external test set of 175 compounds. To address the data imbalance problem, we compared the single models with the ensemble models; using the ensemble models improved prediction accuracy.
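The grid-search half of the GA-GS pipeline can be sketched as an exhaustive scan over (C, gamma) pairs. The surrogate scoring function below stands in for cross-validated SVM accuracy, and its peak location is a made-up value chosen for illustration:

```python
import itertools

def grid_search(evaluate, c_grid, gamma_grid):
    """Exhaustive (C, gamma) search: evaluate() stands in for
    cross-validated SVM accuracy on the training set."""
    best_params, best_score = None, float("-inf")
    for C, gamma in itertools.product(c_grid, gamma_grid):
        score = evaluate(C, gamma)
        if score > best_score:
            best_params, best_score = (C, gamma), score
    return best_params, best_score

# Toy surrogate: "accuracy" peaks at C=10, gamma=0.01 (hypothetical values).
surrogate = lambda C, gamma: -((C - 10) ** 2) / 100 - ((gamma - 0.01) ** 2) * 1e4
params, score = grid_search(surrogate,
                            c_grid=[0.1, 1, 10, 100],
                            gamma_grid=[0.001, 0.01, 0.1, 1])
print(params)   # (10, 0.01)
```

In the full GA-GS-SVM loop, the GA proposes a feature subset and this grid search then finds the best (C, gamma) for that subset, so the two choices are optimized together rather than in isolation.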

EFFICIENT SPECKLE NOISE FILTERING OF SAR IMAGES (SAR 영상의 SPECKLE 잡음 제거)

  • 김병수;최규홍;원중선
    • Journal of Astronomy and Space Sciences
    • /
    • v.15 no.1
    • /
    • pp.175-182
    • /
    • 1998
  • Any classification process using SAR images presupposes the reduction of multiplicative speckle noise, since the variations caused by speckle make it extremely difficult to distinguish between neighboring classes within the feature space. Therefore, several adaptive filter algorithms have been developed to distinguish between them. These algorithms aim to preserve edges and single scattering peaks while smoothing homogeneous areas as much as possible. This task is rendered more difficult by the multiplicative nature of the speckle noise: the signal variation depends on the signal itself. In this paper, the LEE (Lee 1980) and R-LEE (Lee 1981) filters, which use the local statistics of local mean and variance, are applied to RADARSAT SAR images. A newer speckle filtering method, the EPOS (Edge Preserving Optimal Speckle) filter (Hagg & Sties 1994), based on the statistical properties of speckle noise, is also described and applied. Finally, the results of filtering SAR images with the LEE, R-LEE and EPOS filters are compared with those of mean and median filters.
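The local-statistics idea behind the LEE filter can be sketched as follows: each pixel is replaced by the local mean plus a variance-driven fraction of its deviation, so flat regions are smoothed while strong edges are retained. This is a simplified additive-noise version for illustration; the actual Lee filter is derived for the multiplicative speckle model, and `noise_var` here is an assumed noise level:

```python
def lee_filter(img, win=3, noise_var=0.05):
    """Simplified Lee-style filter: pull each pixel toward the local mean,
    with the weight set by local variance (flat areas smoothed, edges kept)."""
    h, w, r = len(img), len(img[0]), win // 2
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(window) / len(window)
            var = sum((v - mean) ** 2 for v in window) / len(window)
            # weight ~0 in homogeneous areas, ~1 where local variance is high
            k = max(0.0, var - noise_var) / var if var > 0 else 0.0
            out[y][x] = mean + k * (img[y][x] - mean)
    return out

# A homogeneous noisy patch is smoothed toward its mean value.
flat = [[1.0, 1.1, 0.9], [1.05, 0.95, 1.0], [0.9, 1.1, 1.0]]
print(lee_filter(flat)[1][1])
```

Across an edge the local variance is large, so `k` approaches 1 and the pixel keeps close to its original value, which is exactly the edge-preserving behavior the abstract describes.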


Rough Entropy-based Knowledge Reduction using Rough Set Theory (러프집합 이론을 이용한 러프 엔트로피 기반 지식감축)

  • Park, In-Kyoo
    • Journal of Digital Convergence
    • /
    • v.12 no.6
    • /
    • pp.223-229
    • /
    • 2014
  • To retrieve useful information for efficient decision making in a large knowledge system, refined feature selection is generally necessary and important. Rough set theory has difficulty generating optimal reducts and classifying boundary objects. In this paper, we propose a quick reduction algorithm that generates optimal features by rough entropy analysis of the condition and decision attributes, to overcome these limitations. We define a new conditional information entropy for efficient feature extraction and describe a feature selection procedure that ranks the significance of features. Through simulations on five datasets from the UCI repository, we compare our rough set-based feature selection approach with other selection methods. The results show that our method achieves higher classification accuracy for feature selection than the previous approaches.
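A conditional-entropy criterion for ranking attribute significance can be sketched on a toy decision table: partition the rows by the chosen condition attributes and average the decision entropy within each equivalence class. The definition below is the standard H(D|C), offered as an illustration of the idea rather than the paper's exact formula:

```python
import math
from collections import Counter, defaultdict

def conditional_entropy(rows, cond_idx, dec_idx):
    """H(D | C): entropy of the decision attribute within each equivalence
    class induced by the condition attributes, weighted by class size."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[i] for i in cond_idx)].append(row[dec_idx])
    n, h = len(rows), 0.0
    for decisions in groups.values():
        p_group = len(decisions) / n
        counts = Counter(decisions)
        h_group = -sum((c / len(decisions)) * math.log2(c / len(decisions))
                       for c in counts.values())
        h += p_group * h_group
    return h

# Toy decision table: attribute 0 determines the decision, attribute 1 does not.
table = [(0, 'a', 'no'), (0, 'b', 'no'), (1, 'a', 'yes'), (1, 'b', 'yes')]
print(conditional_entropy(table, [0], 2))   # 0.0 -> attribute 0 alone suffices
print(conditional_entropy(table, [1], 2))   # 1.0 -> attribute 1 is uninformative
```

An attribute subset with zero conditional entropy fully determines the decision, which is the property a reduct must preserve; attributes whose removal raises the entropy are significant.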

Identification of Pb-Zn ore under the condition of low count rate detection of slim hole based on PGNAA technology

  • Haolong Huang;Pingkun Cai;Wenbao Jia;Yan Zhang
    • Nuclear Engineering and Technology
    • /
    • v.55 no.5
    • /
    • pp.1708-1717
    • /
    • 2023
  • The grade analysis of lead-zinc ore is the basis for the optimal development and utilization of deposits. In this study, a method combining Prompt Gamma Neutron Activation Analysis (PGNAA) technology and machine learning is proposed for lead-zinc mine borehole logging; it can identify lead-zinc ores of different grades and the gangue in the formation, providing real-time grade information qualitatively and semi-quantitatively. First, Monte Carlo simulation is used to obtain a gamma-ray spectrum data set for training and testing the machine learning classification algorithms. These spectra are broadened, normalized and separated into inelastic scattering and capture spectra, and then used to fit different classifier models. When the comprehensive grade boundary between high- and low-grade ores is set to 5%, the evaluation metrics calculated by 5-fold cross-validation show that the SVM (Support Vector Machine), KNN (K-Nearest Neighbor), GNB (Gaussian Naive Bayes) and RF (Random Forest) models can effectively distinguish lead-zinc ore from gangue. Moreover, the GNB model achieved the best accuracy of 91.45% when identifying high- and low-grade ores, with an F1 score greater than 0.9 for both types of ore.
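A Gaussian Naive Bayes classifier of the kind reported best here can be written from scratch in a few lines: fit a per-class normal distribution to each feature, then classify by maximum log posterior. The two-feature "spectra" below are invented stand-ins for the simulated gamma-ray features:

```python
import math
from collections import defaultdict

class GaussianNB:
    """Minimal Gaussian Naive Bayes: per-class normal per feature,
    prediction by maximum log posterior (sketch, not the paper's code)."""
    def fit(self, X, y):
        by_class = defaultdict(list)
        for xi, yi in zip(X, y):
            by_class[yi].append(xi)
        self.stats, self.priors = {}, {}
        for c, rows in by_class.items():
            cols = list(zip(*rows))
            # (mean, variance) per feature; variance floored to stay positive
            self.stats[c] = [(sum(col) / len(col),
                              max(1e-9, sum((v - sum(col) / len(col)) ** 2
                                            for v in col) / len(col)))
                             for col in cols]
            self.priors[c] = len(rows) / len(X)
        return self

    def predict(self, x):
        def log_post(c):
            lp = math.log(self.priors[c])
            for v, (m, var) in zip(x, self.stats[c]):
                lp += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            return lp
        return max(self.stats, key=log_post)

# Hypothetical 2-feature "spectra": ore samples cluster away from gangue.
X = [(1.0, 0.9), (1.1, 1.0), (0.9, 1.1), (3.0, 3.1), (3.1, 2.9), (2.9, 3.0)]
y = ['ore', 'ore', 'ore', 'gangue', 'gangue', 'gangue']
model = GaussianNB().fit(X, y)
print(model.predict((1.0, 1.0)))   # ore
```

The independence assumption keeps the model cheap to fit and evaluate, which matters under the low-count-rate conditions the paper targets, since each spectrum contributes relatively few, noisy features.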

Novel Category Discovery in Plant Species and Disease Identification through Knowledge Distillation

  • Jiuqing Dong;Alvaro Fuentes;Mun Haeng Lee;Taehyun Kim;Sook Yoon;Dong Sun Park
    • Smart Media Journal
    • /
    • v.13 no.7
    • /
    • pp.36-44
    • /
    • 2024
  • Identifying plant species and diseases is crucial for maintaining biodiversity and achieving optimal crop yields, making it a topic of significant practical importance. Recent studies have extended plant disease recognition from traditional closed-set scenarios to open-set environments, where the goal is to reject samples that do not belong to known categories. In open-world tasks, however, it is essential not only to flag unknown samples as "unknown" but also to classify them further. This task assumes that images and labels of known categories are available and that samples of unknown categories can be accessed; the model classifies unknown samples by learning from the prior knowledge of the known categories. To the best of our knowledge, there is no existing research on this topic in plant-related recognition tasks. To address this gap, this paper uses knowledge distillation to model the category-space relationships between known and unknown categories. Specifically, we identify similarities between different species or diseases, and by leveraging a model fine-tuned on the known categories, we generate pseudo-labels for the unknown categories. Additionally, we enhance the baseline method's performance by using a larger pre-trained model, DINOv2. We evaluate the effectiveness of our method on the large plant specimen dataset Herbarium 19 and the disease dataset Plant Village. Notably, our method outperforms the baseline by 1% to 20% in accuracy on novel category classification. We believe this study will contribute to the community.
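The distillation step of generating pseudo-labels for unknown-category samples can be sketched with a temperature-scaled softmax over a teacher model's logits, which is the standard way soft targets expose inter-class similarity. The logits, temperature, and class count below are illustrative, not taken from the paper:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T flattens the distribution and
    exposes similarity structure between classes."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def pseudo_label(teacher_logits, T=4.0):
    """Soft pseudo-label for an unknown-category sample: the fine-tuned
    teacher's softened distribution over the known categories (sketch)."""
    return softmax(teacher_logits, T)

# Hypothetical teacher logits over three known diseases for one unknown sample.
soft = pseudo_label([4.0, 3.5, -1.0])
print([round(p, 3) for p in soft])
```

A student trained against such soft targets learns not just the teacher's top choice but how similar the unknown sample is to each known category, which is the relationship the paper's distillation models.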