• Title/Summary/Keyword: Classification Variables

Logistic Regression Classification by Principal Component Selection

  • Kim, Kiho;Lee, Seokho
    • Communications for Statistical Applications and Methods / v.21 no.1 / pp.61-68 / 2014
  • We propose binary classification methods that modify logistic regression classification. Variable selection procedures are applied to the principal components rather than to the original variables. We describe the resulting classifiers and discuss their properties. The performance of our proposals is illustrated numerically and compared with existing classification methods on synthetic and real datasets.
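
As a rough reading of the abstract above, and not necessarily the authors' exact formulation, the classifier can be written as a logistic model on a selected subset of principal component scores. The symbols $v_j$ (the $j$-th principal direction), $\bar{x}$ (the sample mean), $z_j$ (the $j$-th score) and $S$ (the index set chosen by the variable selection procedure) are notation assumed here for illustration:

```latex
z_j = v_j^{\top}(x - \bar{x}), \qquad j = 1, \dots, p,
\qquad
\log \frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)}
  = \beta_0 + \sum_{j \in S} \beta_j z_j,
\qquad S \subseteq \{1, \dots, p\}.
```

Under this reading, $S$ need not consist of the leading components: a component with small variance can enter the model if the selection procedure finds it informative for the class label.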

Robust Variable Selection in Classification Tree

  • Jang Jeong Yee;Jeong Kwang Mo
    • Proceedings of the Korean Statistical Society Conference / 2001.11a / pp.89-94 / 2001
  • In this study we focus on variable selection when growing a decision tree. Several splitting rules and variable selection algorithms are discussed. We propose a competitive variable selection method based on the Kruskal-Wallis test, a nonparametric version of the ANOVA F-test. A Monte Carlo study shows that CART has a serious variable selection bias towards categorical variables with many values, and that QUEST, which uses the F-test, has low power for selecting informative variables under heavy-tailed distributions.
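
The idea of ranking candidate split variables by a Kruskal-Wallis statistic can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function names are hypothetical, the tie correction is omitted, and the statistic itself is compared rather than its p-value (reasonable here only because every variable is tested against the same class labels).

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(values, labels):
    """Kruskal-Wallis H statistic of `values` grouped by `labels` (no tie correction)."""
    n = len(values)
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for rank, group in zip(average_ranks(values), labels):
        rank_sums[group] += rank
        counts[group] += 1
    total = sum(rank_sums[g] ** 2 / counts[g] for g in counts)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def select_split_variable(columns, labels):
    """Return the name of the column with the largest H statistic."""
    return max(columns, key=lambda name: kruskal_wallis_h(columns[name], labels))
```

Because the statistic depends only on ranks, a few extreme observations in a heavy-tailed variable cannot dominate it, which is the property the abstract contrasts with the F-test.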

Characterization of Korean Clays and Pottery by Neutron Activation Analysis(II). Characterization of Korean Potsherds

  • Lee, Chul;Kwun, Oh-Cheun;Kim, Seung-Won;Lee, Ihn-Chong;Kim, Nak-Bae
    • Bulletin of the Korean Chemical Society / v.7 no.5 / pp.347-353 / 1986
  • Fisher's discriminant method has been applied to the problem of classifying Korean potsherds, using their elemental composition as determined by neutron activation analysis. Combining the analytical data by means of linear discriminant analysis resulted in the removal of redundant variables, an optimal linear combination of the meaningful variables, and the formulation of classification rules.

An Empirical Study on the Relationship between Market Feasibility Levels and Technology Variables from Technology Competitiveness Assessment (기술력평가에서 사업성수준과 기술성변수간 연관성에 관한 실증연구)

  • Sung Oong-Hyun
    • Journal of Korean Society for Quality Management / v.32 no.3 / pp.198-215 / 2004
  • Technology competitiveness assessment evaluates environmental and engineered technologies and processes at both the scientific and market levels. There is increasing interest in measuring the effects of technology variables on potential market feasibility levels, yet very few empirical studies address the issue. This study investigates the impact of technology variables on the level of market feasibility using 230 cases obtained from the Korea Technology Transfer Center. The canonical discriminant model, the logit discriminant model, and a classification model were applied and their results compared. The results show that the major technology variables are highly significant in discriminating between the high and low categories of market feasibility. The study should help in building management strategies to raise potential market performance and help financial institutions decide on funding for small technology firms.

Splitting Decision Tree Nodes with Multiple Target Variables (의사결정나무에서 다중 목표변수를 고려한)

  • 김성준
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.05a / pp.243-246 / 2003
  • Data mining is the process of discovering patterns in data that are useful for decision making. It has recently received much attention in a wide range of business and engineering fields. Classifying a group into subgroups is one of the most important subjects in data mining. Tree-based methods, known as decision trees, provide an efficient way of finding classification models. The primary concern in tree learning is to minimize node impurity, which is evaluated using a target variable in the data set. However, there are situations where multiple target variables should be taken into account, such as manufacturing process monitoring, marketing science, and clinical and health analysis. The purpose of this article is to present several methods for measuring node impurity that are applicable to data sets with multiple target variables. Numerical examples are given for illustration, with discussion.
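
The abstract does not say which impurity measures the article proposes. As one simple possibility, assumed here purely for illustration, the impurity of a node with several categorical targets can be taken as the average of the per-target Gini impurities. The function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a single categorical target within a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def multi_target_gini(rows):
    """Average Gini impurity over all targets.

    `rows` holds one tuple per observation in the node, with one label
    per target variable. Averaging is an assumption of this sketch, not
    a method taken from the article.
    """
    if not rows:
        return 0.0
    targets = list(zip(*rows))  # transpose: one sequence of labels per target
    return sum(gini(target) for target in targets) / len(targets)
```

A node that is pure in one target but evenly mixed in a second binary target scores 0.25 under this measure, so a split is rewarded only to the extent that it separates all targets at once.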

A Study on the Prediction Model of the Elderly Depression

  • SEO, Beom-Seok;SUH, Eung-Kyo;KIM, Tae-Hyeong
    • The Journal of Industrial Distribution & Business / v.11 no.7 / pp.29-40 / 2020
  • Purpose: Modern society faces many urban problems, such as aging, the hollowing out of old city centers, and polarization within cities. In this study, we apply big data and machine learning methodologies to predict depression symptoms in the elderly population at an early stage, thus contributing to solving the problem of elderly depression. Research design, data and methodology: A random forest was used as the machine learning technique, and the correlation between the widely used CES-D10 and other variables was analyzed to estimate the important variables. Two dependent variables were set up, one distinguishing normal from depressed and one distinguishing moderate from severe depression. A total of 106 independent variables were included, covering the objective characteristics of the elderly as well as surveys of subjective health, cognitive ability, daily life, employment, household background, income, consumption, assets, subjective expectations, and quality of life. Results: Satisfaction with the residential area, quality of life, and cognitive ability scores had important effects in classifying elderly depression; satisfaction with living quality and economic conditions, and the number of outpatient visits in living areas and clinics, were also important variables. In the random forest performance evaluation, the model classifying whether or not an elderly person is depressed had an accuracy of 86.3%, a sensitivity of 79.5%, and a specificity of 93.3%. The model classifying the degree of elderly depression had an accuracy of 86.1%, a sensitivity of 93.9%, and a specificity of 74.7%. Conclusions: In this study, the important variables of the estimated predictive model were identified using the random forest technique, and the study focused on predictive performance itself. The research has limitations, such as the lack of clear criteria for classifying depression levels and the failure to reflect variables other than the KLoSA data. Nevertheless, if additional variables are secured in the future and high-performance predictive models are estimated with various machine learning techniques, it should be possible to consider ways of improving senior citizens' quality of life through early detection of depression and thereby to support public policy decisions.
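
For reference, the three performance measures reported in the abstract are computed from a binary confusion matrix as in the sketch below. The function name is hypothetical and the counts in the usage note are made up; they are not the study's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp: depressed cases predicted as depressed
    fn: depressed cases predicted as normal
    tn: normal cases predicted as normal
    fp: normal cases predicted as depressed
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
    }
```

With hypothetical counts `tp=8, fn=2, tn=9, fp=1`, the function gives a sensitivity of 0.8, a specificity of 0.9, and an accuracy of 0.85. Sensitivity and specificity can move in opposite directions, as they do between the two models in the abstract, while accuracy stays nearly unchanged.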

A Study on the Optimal Discriminant Model Predicting the likelihood of Insolvency for Technology Financing (기술금융을 위한 부실 가능성 예측 최적 판별모형에 대한 연구)

  • Sung, Oong-Hyun
    • Journal of Korea Technology Innovation Society / v.10 no.2 / pp.183-205 / 2007
  • This study investigates the optimal discriminant model for predicting in advance the likelihood of insolvency of medium-sized firms, based on technology evaluation. The explanatory variables included in the discriminant model were selected both by factor analysis and by discriminant analysis with stepwise selection. Five explanatory variables were selected by factor analysis on the basis of explanatory ratio and communality, and six were selected by stepwise discriminant analysis. The effectiveness of the linear discriminant model and the logistic discriminant model was assessed by the criteria of critical probability and correct classification rate. The results showed that both models had similar correct classification rates and that the linear discriminant model was preferable to the logistic discriminant model in terms of critical probability. For the linear discriminant model with a critical probability of 0.5, the total-group correct classification rate was 70.4%, and the correct classification rates of the insolvent and solvent groups were 73.4% and 69.5%, respectively. The correct classification rate is an estimate of the probability that the estimated discriminant function will correctly classify the present sample, whereas the actual correct classification rate is the probability that it will correctly classify a future observation. Unfortunately, the correct classification rate overestimates the actual correct classification rate, because the data set used to estimate the discriminant function is also used to evaluate it. Cross-validation was used to estimate the bias of the correct classification rate. According to the results, the estimated bias was 2.9% and the predicted actual correct classification rate was 67.5%. A threshold value was also set to establish an in-doubt category. The results of the linear discriminant model can be used by technology financing banks to evaluate the possibility of insolvency and to rank the applicant firms.
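
The bias correction in the abstract amounts to subtracting the estimated bias from the apparent rate: 70.4% - 2.9% = 67.5%. The sketch below illustrates how such a bias can be estimated. It rests on several assumptions that do not come from the study: leave-one-out cross-validation is used, a one-dimensional nearest-class-mean rule stands in for the linear discriminant function, and the data set `toy` is invented.

```python
def nearest_mean_predict(train, x):
    """Predict the label whose class mean is closest to x.

    `train` is a list of (value, label) pairs.
    """
    sums, counts = {}, {}
    for value, label in train:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return min(sums, key=lambda label: abs(x - sums[label] / counts[label]))


def apparent_rate(data):
    """Correct classification rate when the training data is reused for evaluation."""
    hits = sum(nearest_mean_predict(data, v) == label for v, label in data)
    return hits / len(data)


def loo_rate(data):
    """Leave-one-out estimate of the correct classification rate."""
    hits = 0
    for i, (value, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # refit without observation i
        hits += nearest_mean_predict(train, value) == label
    return hits / len(data)


# Invented toy data: (measurement, group), with one borderline case at 5.4.
toy = [(1, 0), (2, 0), (3, 0), (5.4, 0), (6, 1), (8, 1), (9, 1), (10, 1)]
estimated_bias = apparent_rate(toy) - loo_rate(toy)
```

On the toy data the borderline observation is classified correctly only when it helps estimate its own class mean, so the apparent rate exceeds the leave-one-out rate and the estimated bias is positive.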

Optimum Design of Ship Design System Using Neural Network Method in Initial Design of Hull Plate

  • Kim, Soo-Young;Moon, Byung-Young;Kim, Duk-Eun
    • Journal of Mechanical Science and Technology / v.18 no.11 / pp.1923-1931 / 2004
  • The manufacturing of complex surface plates in the stern and stem is a major factor in the cost estimated at the preliminary ship design stage. If these hull plate parts are classified effectively, the processing cost can be computed and ways to reduce it can be found. This paper presents a new method for classifying surface plates effectively in preliminary ship design using a neural network. A neural-network-based ship hull plate classification program was developed and tested for automatic classification in ship design. The input variables are the Gaussian curvature distributions on the plate. Various applicable rules of network topology are applied, and two different numbers of input variables are used in automating hull plate classification. The effectiveness of the proposed method is discussed on the basis of the results, which show a high prediction rate. Accordingly, the ship hull plate classification program can be used at the initial design stage to predict ship production cost, and the proposed method will contribute to reducing it.

Using CART to Evaluate Performance of Tree Model (CART를 이용한 Tree Model의 성능평가)

  • Jung, Yong Gyu;Kwon, Na Yeon;Lee, Young Ho
    • Journal of Service Research and Studies / v.3 no.1 / pp.9-16 / 2013
  • Classification is a universal data analysis technique, but it requires considerable effort. Decision trees yield results that are easy to analyze and understand, and the decision tree developed by Breiman is among the most representative methods. A decision tree has two core components: one is the repeated division of the space of the independent variables, and the other is pruning using evaluation data. In a classification problem, the response variable is categorical. The variable space is repeatedly split into non-overlapping multidimensional rectangles, and the independent variables may be continuous, binary, ordinal, and so on. In this paper, we obtain the precision, reproducibility, and accuracy of the classification tree in order to evaluate its performance in classifying new cases, and evaluate it through experiments.
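
One step of the repeated division described above can be sketched as a search for the threshold on a single continuous variable that minimizes the weighted Gini impurity of the two child nodes. This is a minimal illustration of a CART-style split search, not code from the paper; the function names are hypothetical and pruning is not shown.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of the class labels in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one variable.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, inf) when all values are identical.
    """
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best
```

Applying the search recursively to each child node, over every independent variable, produces the partition into non-overlapping rectangles described in the abstract.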

Theoretical Classification of the Clothing Evaluative Criteria (의복평가기준의 이론적 분류기준)

  • 김미영
    • Journal of the Korean Society of Clothing and Textiles / v.19 no.6 / pp.857-865 / 1995
  • The main purposes of this study were to find a new classification system for the clothing evaluative criteria (CEC) and to clarify the relationships between the new classification system and the existing classification systems. For this purpose, the existing literature on the CEC (the classification systems and the variables) was investigated. The results of the study were as follows: 1. The existing classification systems were 'the intrinsic/non-intrinsic classification', 'the level classification', and 'the purchase process classification'. The new classification system of the CEC is based on 'the viewpoint of subjects'. The system was divided into the viewpoints of the clothing itself, the wearer, others, and the wearing situation. The wearer's viewpoint is divided into the wearer's values and the wearer's physical characteristics. 2. Image was included as a concept of the CEC, and an image classification could be suggested. 3. The relationships among the classification systems were as follows: (1) The intrinsic/non-intrinsic classification system included the level classification, the viewpoint classification, the image classification, and the buying process classification. (2) The level classification, the viewpoint classification, and the image classification were mutually linked, but the buying process classification is separate from these classifications.
