• Title/Abstract/Keywords: Classification Variables

Search results: 920 items; processing time: 0.026 s

Combining cluster analysis and neural networks for the classification problem

  • Kim, Kyungsup;Han, Ingoo
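    • Korean Operations Research and Management Science Society: Conference Proceedings / Proceedings of the 1996 Fall Conference of the Korean Operations Research and Management Science Society; Korea University, Seoul; 26 Oct. 1996 / pp.31-34 / 1996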
  • Extensive research has compared the performance of neural networks (NN) with that of various statistical techniques for the classification problem. The empirical results of these comparative studies indicate that neural networks often outperform traditional statistical techniques. Moreover, there have been efforts to combine various classification methods, especially multivariate discriminant analysis with neural networks. While these efforts improve performance, they can violate the strict assumptions of multivariate discriminant analysis, namely multivariate normality of the independent variables and equality of the variance-covariance matrices across groups. In contrast, cluster analysis, like neural networks, relaxes these assumptions. We propose a new approach to classification problems that combines cluster analysis with neural networks. The resulting predictions of the composite model are more accurate than those of each individual technique.

Supervised Learning-Based Collaborative Filtering Using Market Basket Data for the Cold-Start Problem

  • Hwang, Wook-Yeon;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems / Vol. 13, No. 4 / pp.421-431 / 2014
  • The market basket data in the form of a binary user-item matrix or a binary item-user matrix can be modelled as a binary classification problem. The binary logistic regression approach tackles the binary classification problem, where principal components are predictor variables. If users or items are sparse in the training data, the binary classification problem can be considered as a cold-start problem. The binary logistic regression approach may not function appropriately if the principal components are inefficient for the cold-start problem. Assuming that the market basket data can also be considered as a special regression problem whose response is either 0 or 1, we propose three supervised learning approaches: random forest regression, random forest classification, and elastic net to tackle the cold-start problem, comparing the performance in a variety of experimental settings. The experimental results show that the proposed supervised learning approaches outperform the conventional approaches.

A Design of a Classifier Using Modular Neural Networks with Unsupervised Learning

  • 최종원;오경환
    • Korean Journal of Cognitive Science / Vol. 10, No. 1 / pp.13-24 / 1999
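  • This paper proposes an unsupervised-learning classifier that uses modular neural networks. Each module is designed from the results of a statistical analysis of the data, so that it represents an independent cluster of the data. Combining the independent classification results of these networks with a similarity measure based on a nearest-distance criterion enables more accurate classification, and removing the modules that misclassify reduces the amount of computation. Through this process, we built a neural network whose performance is comparable to that of the best-performing neural network, without any particular investigation of the various parameters used in the network.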

A Comparison of Classification Methods for Credit Card Approval Using R

  • 송종우
    • Journal of Korean Society for Quality Management / Vol. 36, No. 1 / pp.72-79 / 2008
  • The policy for credit card approval or disapproval is based on the applicant's personal and financial information. In this paper, we analyze two credit card approval data sets with several classification methods and identify which variables are important factors in deciding credit card approval. Our main tool is R, an open-source statistical programming environment that is freely available from http://www.r-project.org. It has recently become popular because of its flexibility and the many packages (libraries) contributed by R users around the world. For the comparisons, we use the most widely used methods: LDA/QDA, Logistic Regression, CART (Classification and Regression Trees), neural network, and SVM (Support Vector Machines).

Neural Networks and Logistic Models for Classification: A Case Study

  • Hwang, Chang-Ha
    • Journal of the Korean Data and Information Science Society / Vol. 7, No. 1 / pp.13-19 / 1996
  • In this paper, we study and compare two types of methods for classification when both continuous and categorical variables are used to describe each individual. One is the neural network (NN) method using backpropagation learning (BPL). The other is the logistic model (LM) method. Both the NN and the LM are based on projections of the data in directions determined from interconnection weights.

Bias Reduction in Split Variable Selection in C4.5

  • Shin, Sung-Chul;Jeong, Yeon-Joo;Song, Moon Sup
    • Communications for Statistical Applications and Methods / Vol. 10, No. 3 / pp.627-635 / 2003
  • In this short communication we discuss the bias problem of C4.5 in split variable selection and suggest a method to reduce the variable selection bias among categorical predictor variables. A penalty proportional to the number of categories is applied to the splitting criterion gain of C4.5. The results of empirical comparisons show that the proposed modification of C4.5 reduces the size of classification trees.
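The idea of penalizing the splitting criterion by the number of categories can be sketched as follows. This is only an illustration, not the authors' method: the abstract does not give the exact penalty, so the subtractive form, the weight `alpha`, and the use of plain information gain (rather than C4.5's gain ratio) are all assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting on a categorical predictor."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def penalized_gain(values, labels, alpha=0.05):
    """Gain minus a penalty proportional to the number of categories.

    ASSUMPTION: both the subtractive form and the default alpha are
    illustrative choices; the source only says the penalty is proportional
    to the number of categories.
    """
    k = len(set(values))
    return information_gain(values, labels) - alpha * k


# Toy data: a two-category predictor and an ID-like four-category predictor
# that both separate the classes perfectly.
labels = ["yes", "yes", "no", "no"]
binary_predictor = ["a", "a", "b", "b"]
id_like_predictor = ["1", "2", "3", "4"]
```

On this toy data both predictors have the same unpenalized gain (1 bit), so an unpenalized criterion cannot tell them apart. With the assumed penalty, the two-category predictor scores about 0.9 and the four-category one about 0.8, so the predictor with fewer categories is preferred.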

A Study on Somatotype Classification of Muscular Men's Lower Body

  • 정혜진;김소라
    • Journal of the Ergonomics Society of Korea / Vol. 28, No. 1 / pp.21-27 / 2009
  • The purpose of this research is to understand the physiological characteristics of muscular men between the ages of 20 and 34 years, who are distinct from the general population due to their muscular development, and to categorize them according to lower body somatotypes. This research was conducted to provide basic data necessary for developing clothing products for muscular men. The research method and results were as follows: 1. The study carried out factor analysis on the body measurements of 168 muscular men according to the body classification method of Sheldon and Heath-Carter. The study derived muscular men's lower body types statistically by carrying out cluster analysis, using the scores of each factor extracted from the factor analysis as independent variables. The study also carried out discriminant analysis on the results of the cluster analysis so that the morphological characteristics of each type were clearly distinguished. 2. As a result of the factor analysis, the number of factors was set to three. Factor 1, a size factor of the lower body, accounted for 38.149% of the total variance. Factor 2, a height and length factor of the lower body, accounted for 20.417% of the total variance. Factor 3, a length factor of the hip, accounted for 8.466% of the total variance. 3. The study classified the lower body into three types, with the following characteristics. Type 1 was the group with the best-developed lower body muscles, given that their lower body size was the largest. Type 2 consisted of well-balanced muscular males, though their lower body size was smaller than that of the other types. This type did not have abdominal fatness or large hips. Type 3 was a body type in which the length from the waist to the hip was long. 4. The discriminant analysis carried out to distinguish muscular men's lower body types gave an overall discriminant accuracy of 86.3% for the lower body.

A Development of Suicidal Ideation Prediction Model and Decision Rules for the Elderly: Decision Tree Approach

  • 김덕현;유동희;정대율
    • Korea Association of Information Systems: The Journal of Information Systems / Vol. 28, No. 3 / pp.249-276 / 2019
  • Purpose: The purpose of this study is to develop a prediction model and decision rules for suicidal ideation among the elderly based on the Korean Welfare Panel survey data. Using this data, we obtained many decision rules for predicting suicidal ideation among the elderly. Design/methodology/approach: This study used classification analysis based on the decision tree technique to derive decision rules for prediction. Weka 3.8 was used as the data mining tool. The decision tree algorithm was J48, also known as C4.5. In addition, 66.6% of the total data was divided into learning data and verification data. We considered all possible variables based on previous studies on predicting suicidal ideation among the elderly. In the end, 99 variables, including the target variable, were used. Classification analysis was performed by introducing sampling techniques through backward elimination and data balancing. Findings: There were significant differences between the data sets. The selected data sets yielded different decision trees and several rules. Based on the decision tree method, we derived rules for suicide prevention. The decision tree derives not only rules for the suicidal ideation of the depressed group, but also rules for the suicidal ideation of the non-depressed group. In addition, in developing the predictive model, the problem of over-fitting due to data imbalance was directly identified through the application of data balancing. We conclude that it is necessary to balance the data on the target variable in order to perform a correct classification analysis without over-fitting. Furthermore, even when data balancing is applied, the prediction rate is not inferior to that of a biased prediction model.
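A minimal sketch of the two data-preparation steps mentioned above, a percentage split and data balancing, is given here in plain Python rather than Weka. It rests on assumptions: the 66.6% figure is read as the share of learning data, the balancing step is taken to be random undersampling of the majority class (the abstract does not name the technique), and the function names and the `label_of` parameter are hypothetical.

```python
import random


def split_train_test(rows, train_fraction=0.666, seed=0):
    """Shuffle the rows and split them into learning and verification sets.

    ASSUMPTION: 66.6% is interpreted as the learning-data share.
    """
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def undersample(rows, label_of, seed=0):
    """Balance classes by randomly undersampling every class to the size
    of the smallest one.

    ASSUMPTION: random undersampling stands in for the unspecified
    balancing method used in the study.
    """
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Toy data: 100 rows of (id, target), with only 10 positive cases.
rows = [(i, 1 if i < 10 else 0) for i in range(100)]
train, test = split_train_test(rows)
balanced = undersample(rows, label_of=lambda row: row[1])
```

With the toy data, the split gives 66 learning rows and 34 verification rows, and the balanced set holds 10 rows of each class. In practice the balancing would usually be applied to the learning data only, so that the verification data keeps the original class distribution.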

Local Linear Logistic Classification of Microarray Data Using Orthogonal Components

  • 백장선;손영숙
    • The Korean Journal of Applied Statistics / Vol. 19, No. 3 / pp.587-598 / 2006
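  • This paper proposes a nonparametric local linear logistic discriminant analysis that uses orthogonal components as new feature variables, as a solution to the high-dimensionality and small-sample problems that arise when discriminant analysis is applied to microarray data. The proposed method is based on local likelihood and can be applied to multi-class discriminant analysis; the orthogonal components considered are principal components, partial least squares components, and factor analysis components. When applied to two representative real microarray data sets, the method showed better classification ability than classical statistical discriminant analysis when partial least squares components were used as the feature variables.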

A study on forecasting of consumers' choice using artificial neural network

  • 송수섭;이의훈
    • Journal of the Korean Operations Research and Management Science Society / Vol. 26, No. 4 / pp.55-70 / 2001
  • Artificial neural network (ANN) models have been widely used for classification problems in business, such as bankruptcy prediction and credit evaluation. Although the application of ANN to the classification of consumers' choice behavior is a promising research area, there have been only a few studies. In general, most studies have reported that the classification performance of ANN models was better than that of conventional statistical models. Because survey data on consumer behavior may include much noise and missing data, ANN models will be more robust than conventional statistical models, which need various assumptions. The purpose of this paper is to study the potential of the ANN model for forecasting consumers' choice behavior based on survey data. The data were collected through questionnaires given to shoppers at department stores and discount stores. The correct classification rates of the ANN models for the training and test samples were then compared with those of multiple discriminant analysis (MDA) and the logistic regression (Logit) model. The performance of the ANN models was better than that of the MDA and Logit models with respect to correct classification rate. Using the input variables identified as significant in the stepwise MDA improved the performance of the ANN models.
