• 제목/요약/키워드: Classification Variables

검색결과 920건 처리시간 0.022초

도산예측을 위한 유전 알고리듬 기반 이진분류기법의 개발 (A GA-based Binary Classification Method for Bankruptcy Prediction)

  • 민재형;정철우
    • 한국경영과학회지
    • /
    • 제33권2호
    • /
    • pp.1-16
    • /
    • 2008
  • The purpose of this paper is to propose a new binary classification method for predicting corporate failure based on genetic algorithm, and to validate its prediction power through empirical analysis. Establishing virtual companies representing bankrupt companies and non-bankrupt ones respectively, the proposed method measures the similarity between the virtual companies and the subject for prediction, and classifies the subject into either bankrupt or non-bankrupt one. The values of the classification variables of the virtual companies and the weights of the variables are determined by the proper model to maximize the hit ratio of training data set using genetic algorithm. In order to test the validity of the proposed method, we compare its prediction accuracy with ones of other existing methods such as multi-discriminant analysis, logistic regression, decision tree, and artificial neural network, and it is shown that the binary classification method we propose in this paper can serve as a premising alternative to the existing methods for bankruptcy prediction.

Application of Random Forests to Assessment of Importance of Variables in Multi-sensor Data Fusion for Land-cover Classification

  • Park No-Wook;Chi kwang-Hoon
    • 대한원격탐사학회지
    • /
    • 제22권3호
    • /
    • pp.211-219
    • /
    • 2006
  • A random forests classifier is applied to multi-sensor data fusion for supervised land-cover classification in order to account for the importance of variable. The random forests approach is a non-parametric ensemble classifier based on CART-like trees. The distinguished feature is that the importance of variable can be estimated by randomly permuting the variable of interest in all the out-of-bag samples for each classifier. Two different multi-sensor data sets for supervised classification were used to illustrate the applicability of random forests: one with optical and polarimetric SAR data and the other with multi-temporal Radarsat-l and ENVISAT ASAR data sets. From the experimental results, the random forests approach could extract important variables or bands for land-cover discrimination and showed reasonably good performance in terms of classification accuracy.

빅데이터 분류 기법에 따른 벤처 기업의 성장 단계별 차이 분석 (The Difference Analysis between Maturity Stages of Venture Firms by Classification Techniques of Big Data)

  • 정병호
    • 디지털산업정보학회논문지
    • /
    • 제15권4호
    • /
    • pp.197-212
    • /
    • 2019
  • The purpose of this study is to identify the maturity stages of venture firms through classification analysis, which is widely used as a big data technique. Venture companies should develop a competitive advantage in the market. And the maturity stage of a company can be classified into five stages. I will analyze a difference in the growth stage of venture firms between the survey response and the statistical classification methods. The firm growth level distinguished five stages and was divided into the period of start-up and declines. A classification method of big data uses popularly k-mean cluster analysis, hierarchical cluster analysis, artificial neural network, and decision tree analysis. I used variables that asset increase, capital increase, sales increase, operating profit increase, R&D investment increase, operation period and retirement number. The research results, each big data analysis technique showed a large difference of samples sized in the group. In particular, the decision tree and neural networks' methods were classified as three groups rather than five groups. The groups size of all classification analysis was all different by the big data analysis methods. Furthermore, according to the variables' selection and the sample size may be dissimilar results. Also, each classed group showed a number of competitive differences. The research implication is that an analysts need to interpret statistics through management theory in order to interpret classification of big data results correctly. In addition, the choice of classification analysis should be determined by considering not only management theory but also practical experience. Finally, the growth of venture firms needs to be examined by time-series analysis and closely monitored by individual firms. And, future research will need to include significant variables of the company's maturity stages.

A Resetting Scheme for Process Parameters using the Mahalanobis-Taguchi System

  • Park, Chang-Soon
    • 응용통계연구
    • /
    • 제25권4호
    • /
    • pp.589-603
    • /
    • 2012
  • Mahalanobis-Taguchi system(MTS) is a statistical tool for classifying the normal group and abnormal group in multivariate data structures. In addition to the classification itself, the MTS uses a method for selecting variables useful for the classification. This method can be used efficiently especially when the abnormal group data are scattered without a specific directionality. When the feedback adjustment procedure through the measurements of the process output for controlling process input variables is not practically possible, the reset procedure can be an alternative one. This article proposes a reset procedure using the MTS. Moreover, a method for identifying input variables to reset is also proposed by the use of the contribution. The identification of the root-cause parameters using the existing dimension-reduced contribution tends to be difficult due to the variety of correlation relationships of multivariate data structures. However, it became possible to provide an improved decision when used together with the location-centered contribution and the individual-parameter contribution.

Improvement of Land Cover Classification Accuracy by Optimal Fusion of Aerial Multi-Sensor Data

  • Choi, Byoung Gil;Na, Young Woo;Kwon, Oh Seob;Kim, Se Hun
    • 한국측량학회지
    • /
    • 제36권3호
    • /
    • pp.135-152
    • /
    • 2018
  • The purpose of this study is to propose an optimal fusion method of aerial multi - sensor data to improve the accuracy of land cover classification. Recently, in the fields of environmental impact assessment and land monitoring, high-resolution image data has been acquired for many regions for quantitative land management using aerial multi-sensor, but most of them are used only for the purpose of the project. Hyperspectral sensor data, which is mainly used for land cover classification, has the advantage of high classification accuracy, but it is difficult to classify the accurate land cover state because only the visible and near infrared wavelengths are acquired and of low spatial resolution. Therefore, there is a need for research that can improve the accuracy of land cover classification by fusing hyperspectral sensor data with multispectral sensor and aerial laser sensor data. As a fusion method of aerial multisensor, we proposed a pixel ratio adjustment method, a band accumulation method, and a spectral graph adjustment method. Fusion parameters such as fusion rate, band accumulation, spectral graph expansion ratio were selected according to the fusion method, and the fusion data generation and degree of land cover classification accuracy were calculated by applying incremental changes to the fusion variables. Optimal fusion variables for hyperspectral data, multispectral data and aerial laser data were derived by considering the correlation between land cover classification accuracy and fusion variables.

공개동굴의 유형분류에 관한 사례연구 (Case Studies Regarding the Classification of Public Caves)

  • 홍현철
    • 동굴
    • /
    • 제93호
    • /
    • pp.13-25
    • /
    • 2009
  • 우리나라의 동굴이 조사되고 개발되기 시작했던 1970년대에는 동굴에 대한 유형분류는 학술적 근거를 중심으로 이루어졌고, 그 결과 성인분류, 규모분류, 형태분류에 의한 국한적 동굴분류 방법을 사용하고 있다. 그러나 동굴이 개방되고 관광자원의 하나로서 자리잡고 있는 요즘, 관광객의 수요자 입장에서는 동굴의 유형이나 구분을 단지 학술적 근거뿐 만아니라 정보선택 요소로서의 활용이 더 필요하다. 따라서 본 연구에서는 동굴의 학술적 분류 요소를 확대하여, 동굴내부의 다양한 변수 선택과 동굴 외부의 인문적 환경을 고려한 변수를 선택하여, 동굴관광자원의 정보제공차원에서의 유형분류를 사례연구를 통하여 고찰해보고자 하였다. 분석기법은 다변량해석 기법의 하나인 군집분석(cluster analysis)을 적용하고 그 결과를 검토한다. 분석결과, 개방동굴은 동굴을 포함한 주변지역의 인문 환경적 여러 요소에 따라 다양한 분류 기준을 제시할 수 있으며, 본 연구에서는 지역구분에 따른 개방동굴의 유형분류가 결과로 도출되었다.

A Classification Method Using Data Reduction

  • Uhm, Daiho;Jun, Sung-Hae;Lee, Seung-Joo
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • 제12권1호
    • /
    • pp.1-5
    • /
    • 2012
  • Data reduction has been used widely in data mining for convenient analysis. Principal component analysis (PCA) and factor analysis (FA) methods are popular techniques. The PCA and FA reduce the number of variables to avoid the curse of dimensionality. The curse of dimensionality is to increase the computing time exponentially in proportion to the number of variables. So, many methods have been published for dimension reduction. Also, data augmentation is another approach to analyze data efficiently. Support vector machine (SVM) algorithm is a representative technique for dimension augmentation. The SVM maps original data to a feature space with high dimension to get the optimal decision plane. Both data reduction and augmentation have been used to solve diverse problems in data analysis. In this paper, we compare the strengths and weaknesses of dimension reduction and augmentation for classification and propose a classification method using data reduction for classification. We will carry out experiments for comparative studies to verify the performance of this research.

용접결함의 패턴분류를 위한 특징변수 유효성 검증 (Availability Verification of Feature Variables for Pattern Classification on Weld Flaws)

  • 김창현;김재열;유홍연;홍성훈
    • 한국공작기계학회논문집
    • /
    • 제16권6호
    • /
    • pp.62-70
    • /
    • 2007
  • In this study, the natural flaws in welding parts are classified using the signal pattern classification method. The storage digital oscilloscope including FFT function and enveloped waveform generator is used and the signal pattern recognition procedure is made up the digital signal processing, feature extraction, feature selection and classifier design. It is composed with and discussed using the distance classifier that is based on euclidean distance the empirical Bayesian classifier. Feature extraction is performed using the class-mean scatter criteria. The signal pattern classification method is applied to the signal pattern recognition of natural flaws.

대용량 자료에서 핵심적인 소수의 변수들의 선별과 로지스틱 회귀 모형의 전개 (Screening Vital Few Variables and Development of Logistic Regression Model on a Large Data Set)

  • 임용빈;조재연;엄경아;이선아
    • 품질경영학회지
    • /
    • 제34권2호
    • /
    • pp.129-135
    • /
    • 2006
  • In the advance of computer technology, it is possible to keep all the related informations for monitoring equipments in control and huge amount of real time manufacturing data in a data base. Thus, the statistical analysis of large data sets with hundreds of thousands observations and hundred of independent variables whose some of values are missing at many observations is needed even though it is a formidable computational task. A tree structured approach to classification is capable of screening important independent variables and their interactions. In a Six Sigma project handling large amount of manufacturing data, one of the goals is to screen vital few variables among trivial many variables. In this paper we have reviewed and summarized CART, C4.5 and CHAID algorithms and proposed a simple method of screening vital few variables by selecting common variables screened by all the three algorithms. Also how to develop a logistics regression model on a large data set is discussed and illustrated through a large finance data set collected by a credit bureau for th purpose of predicting the bankruptcy of the company.

Logistic Regression Classification by Principal Component Selection

  • Kim, Kiho;Lee, Seokho
    • Communications for Statistical Applications and Methods
    • /
    • 제21권1호
    • /
    • pp.61-68
    • /
    • 2014
  • We propose binary classification methods by modifying logistic regression classification. We use variable selection procedures instead of original variables to select the principal components. We describe the resulting classifiers and discuss their properties. The performance of our proposals are illustrated numerically and compared with other existing classification methods using synthetic and real datasets.