• Title/Summary/Keyword: Impurity Measures

Search results: 7

Feature Selection for Multi-Class Support Vector Machines Using an Impurity Measure of Classification Trees: An Application to the Credit Rating of S&P 500 Companies

  • Hong, Tae-Ho;Park, Ji-Young
    • Asia Pacific Journal of Information Systems / v.21 no.2 / pp.43-58 / 2011
  • Support vector machines (SVMs), a machine learning technique, have been applied not only to binary classification problems such as bankruptcy prediction but also to multi-class problems such as corporate credit rating. However, even though SVMs classify and predict successfully in many dichotomous and multi-class problems, their performance can easily fall below that of the best alternative model, depending on which predictors are selected. To overcome this weakness, this study proposes a feature selection approach for multi-class SVMs that uses the impurity measures of classification trees. To select the input features, we employed the C4.5 and CART algorithms, as well as the stepwise method of discriminant analysis, a well-known feature selection method. We built a multi-class SVM model for credit rating using these methods and present experimental results on data from S&P 500 companies.
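
As a rough sketch of how impurity measures can score candidate predictors before an SVM is trained, the Python below ranks features by the largest impurity decrease that a single binary split on each feature achieves, using either the Gini index (as in CART) or entropy (the quantity behind C4.5's gain ratio). This is a simplified univariate filter built on our own assumptions, not the authors' exact procedure; the function names and the toy ratings data are hypothetical.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split_decrease(values, labels, impurity):
    """Largest impurity decrease over CART-style binary splits `value <= t`."""
    parent = impurity(labels)
    n = len(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:  # skip the max so both sides are non-empty
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        child = len(left) / n * impurity(left) + len(right) / n * impurity(right)
        best = max(best, parent - child)
    return best


def rank_features(X, y, impurity=gini):
    """Return (feature indices by decreasing score, per-feature scores)."""
    scores = [best_split_decrease([row[j] for row in X], y, impurity)
              for j in range(len(X[0]))]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical data: column 0 separates the grades, column 1 is noise
X = [[1, 5], [2, 3], [3, 5], [4, 3]]
y = ["A", "A", "B", "B"]
order, scores = rank_features(X, y)
print(order, scores)  # [0, 1] [0.5, 0.0]
```

The top-ranked features would then be passed to the multi-class SVM; swapping `impurity=entropy` changes only the scoring criterion.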

An Analysis of the Complexity Measurement Factor for a Program (프로그램에 대한 복잡도 측정인자 분석)

  • 이규범;송정영
    • Journal of Internet Computing and Services / v.3 no.4 / pp.61-69 / 2002
  • Measuring conventional characteristics of the object-oriented paradigm, such as objects, messages, clones, encapsulation, and inheritance, has been reported as a way to measure the complexity of object-oriented programs. In this paper, measures that are helpful for designing and coding Java programs (Java being the representative object-oriented language) are examined through the six measures considered in this study: Halstead's Program Volume, Program Level, and Program Impurity; McCabe's Cyclomatic Number; Henderson-Sellers' Lack of Cohesion in Methods; and Sullivan's PVG. These measures are applied to several actual programs as examples for comparative analysis.
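
For reference, three of the classic measures named above reduce to short formulas: Halstead's volume $V = N \log_2 \eta$ with $N = N_1 + N_2$ and $\eta = \eta_1 + \eta_2$, his estimated program level $\hat{L} = (2/\eta_1)(\eta_2/N_2)$, and McCabe's cyclomatic number $V(G) = E - n + 2P$ for a control-flow graph with $E$ edges, $n$ nodes, and $P$ connected components. The sketch below assumes the operator/operand counts and graph sizes have already been extracted from the Java source; the function names and counts are hypothetical, and Program Impurity, LCOM, and PVG are not shown.

```python
from math import log2


def halstead_volume(n1: int, n2: int, N1: int, N2: int) -> float:
    """V = N * log2(eta).

    n1, n2: distinct operators / operands; N1, N2: total occurrences.
    """
    length = N1 + N2
    vocabulary = n1 + n2
    return length * log2(vocabulary)


def halstead_level_estimate(n1: int, n2: int, N2: int) -> float:
    """Estimated program level L^ = (2 / n1) * (n2 / N2)."""
    return (2 / n1) * (n2 / N2)


def cyclomatic_number(edges: int, nodes: int, components: int = 1) -> int:
    """McCabe's V(G) = E - n + 2P for a control-flow graph."""
    return edges - nodes + 2 * components


# Hypothetical counts for a small method
print(halstead_volume(n1=4, n2=4, N1=8, N2=8))    # 16 * log2(8) = 48.0
print(halstead_level_estimate(n1=4, n2=4, N2=8))  # 0.5 * 0.5 = 0.25
print(cyclomatic_number(edges=9, nodes=8))        # 9 - 8 + 2 = 3
```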

Splitting Decision Tree Nodes with Multiple Target Variables (의사결정나무에서 다중 목표변수를 고려한)

  • 김성준
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.05a / pp.243-246 / 2003
  • Data mining is the process of discovering patterns useful for decision making from large amounts of data, and it has recently received much attention in a wide range of business and engineering fields. Classifying a group into subgroups is one of the most important subjects in data mining. Tree-based methods, known as decision trees, provide an efficient way of finding classification models. The primary concern in tree learning is minimizing node impurity, which is evaluated using a target variable in the data set. However, in some situations, such as manufacturing process monitoring, marketing science, and clinical and health analysis, multiple target variables must be taken into account. This article presents several methods for measuring node impurity that are applicable to data sets with multiple target variables. Numerical examples are given with discussion as illustrations.
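
One simple way to extend a single-target impurity to several targets, shown here only as an assumption and not necessarily one of the article's methods, is to take a weighted average of the per-target Gini indices; another is to compute the Gini index of the joint label formed by combining all target values. A minimal Python sketch of both, with hypothetical function names and data:

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of hashable labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def averaged_gini(rows, weights=None):
    """Weighted mean of per-target Gini indices; each row holds one value per target."""
    k = len(rows[0])
    weights = weights if weights is not None else [1.0 / k] * k
    return sum(w * gini([row[j] for row in rows]) for j, w in enumerate(weights))


def joint_gini(rows):
    """Gini index of the combined label (tuple of all target values)."""
    return gini([tuple(row) for row in rows])


# Hypothetical node with two targets: a defect flag and a product grade
node = [(0, "x"), (0, "y"), (1, "x"), (1, "y")]
print(averaged_gini(node))  # 0.5
print(joint_gini(node))     # 0.75
```

The joint version treats every combination of target values as its own class, so it grows quickly with the number of targets, while the averaged version lets the analyst weight targets by importance.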

Selecting the optimal threshold based on impurity index in imbalanced classification (불균형 자료에서 불순도 지수를 활용한 분류 임계값 선택)

  • Jang, Shuin;Yeo, In-Kwon
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.711-721 / 2021
  • In this paper, we propose a method for adjusting classification thresholds using impurity indices when analyzing imbalanced data. Suppose that, in imbalanced binary data, the minority category is Positive and the majority category is Negative. When categories are assigned using the commonly used threshold of 0.5, specificity tends to be high on imbalanced data while sensitivity is relatively low. Increasing sensitivity matters when correctly classifying objects in the minority category is relatively important, so we explore how to increase sensitivity by adjusting the threshold. Existing studies have adjusted thresholds based on measures such as the G-mean and F1-score; in this paper, we instead propose selecting the optimal threshold using the chi-square statistic of CHAID, the Gini index of CART, and the entropy of C4.5. We also describe how to obtain a single value when multiple optimal thresholds are found. An empirical analysis uses classification performance metrics to show the improvements over results based on the 0.5 threshold.
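
The threshold-selection idea can be sketched by treating the classifier's predicted probability like a split variable: each candidate threshold $t$ divides the observations into predicted Positive ($\hat{p} \ge t$) and predicted Negative, and the threshold with the lowest weighted Gini impurity of the true labels is chosen. This sketch uses the Gini index only (an entropy function could be passed as `impurity` instead; CHAID's chi-square criterion is not shown), and its tie-breaking rule, averaging the tied thresholds, is a hypothetical choice rather than the paper's. Function names and scores are likewise hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index; an empty group contributes zero impurity."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_impurity(scores, labels, t, impurity=gini):
    """Weighted impurity of true labels after splitting at score >= t."""
    pos = [y for s, y in zip(scores, labels) if s >= t]
    neg = [y for s, y in zip(scores, labels) if s < t]
    n = len(labels)
    return len(pos) / n * impurity(pos) + len(neg) / n * impurity(neg)


def select_threshold(scores, labels, impurity=gini):
    """Return (chosen threshold, all tied optimal thresholds)."""
    candidates = sorted(set(scores))
    values = [weighted_impurity(scores, labels, t, impurity) for t in candidates]
    best = min(values)
    tied = [t for t, v in zip(candidates, values) if abs(v - best) < 1e-12]
    # Hypothetical tie-break: average the tied thresholds
    return sum(tied) / len(tied), tied


scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45]  # hypothetical P(Positive)
labels = [0, 0, 0, 0, 1, 1]                    # 1 = minority (Positive) class
threshold, tied = select_threshold(scores, labels)
print(threshold, tied)  # 0.35 [0.35]
```

Here both positives score below 0.5, so the default cutoff would label every observation Negative (sensitivity 0), whereas the impurity-minimizing threshold of 0.35 separates the two classes.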

Optimum Design of the Interdigitated CB Structure

  • qiang, Yang-Hong;bi, Chen-Xing
    • JSTS: Journal of Semiconductor Technology and Science / v.2 no.3 / pp.233-236 / 2002
  • Several measures are proposed for optimizing the specific on-resistance $R_{on}$ and breakdown voltage $V_B$ of the interdigitated CB (Composite Buffer) MOSFET, including introducing an opposite-type impurity into the P region near the $N^+$ contact, separating the P region from the N region with an oxide film, and adding a groove in the N region near the $P^+$ contact. The new relationship between $R_{on}$ and $V_B$, verified by numerical device simulation, is more exact and detailed than earlier qualitative results.

Performance Evaluation of a Feature-Importance-based Feature Selection Method for Time Series Prediction

  • Hyun, Ahn
    • Journal of Information and Communication Convergence Engineering / v.21 no.1 / pp.82-89 / 2023
  • Various machine learning models can achieve high predictive power on massive time series. However, these models are prone to instability in terms of computational cost because of the high dimensionality of the feature space and non-optimized hyperparameter settings. Because training a model on a high-dimensional feature set can be time-consuming, we evaluate a feature-importance-based feature selection method that trades off predictive power against computational cost in time series prediction. For the performance evaluation, we used two machine learning techniques to build prediction models from a retail sales dataset. First, we ranked the features in these models using impurity-based and Local Interpretable Model-agnostic Explanations (LIME)-based feature importance measures. Then, we applied recursive feature elimination to remove unimportant features sequentially. As a result, we obtained a subset of features that can reduce model training time while preserving acceptable model performance.
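
The impurity-based ranking followed by recursive feature elimination can be sketched with scikit-learn, assuming scikit-learn and NumPy are installed and assuming a random forest as the model. `RFE` refits the forest on the remaining features and drops the one with the lowest `feature_importances_` (mean decrease in impurity) at each step. The synthetic columns stand in for hypothetical retail features such as lags; the paper's two techniques, its dataset, and its LIME step are not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
n_samples, n_features = 300, 6
X = rng.normal(size=(n_samples, n_features))
# Hypothetical sales target: only features 0 and 1 carry signal
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

forest = RandomForestRegressor(n_estimators=50, random_state=0)
# Eliminate one feature per iteration until two remain
selector = RFE(forest, n_features_to_select=2, step=1)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)
print("selected features:", selected)
print("ranking:", selector.ranking_)  # 1 = kept; larger = eliminated earlier
```

Refitting after each removal lets the remaining features' importances be re-estimated, which is what distinguishes recursive elimination from a single ranking pass; the cost is one model fit per eliminated feature when `step=1`.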

Theoretical Analysis and Effect of Condenser In-leakage in the Secondary Systems of YGN-1, 2 (영광-1, 2호기 2차계통 복수기누설의 이론적 분석 및 영향평가)

  • Suk, Tae-Won;Lee, Yong-Woo;Kim, Hong-Tae;Park, Sang-Hoon
    • Nuclear Engineering and Technology / v.23 no.3 / pp.299-305 / 1991
  • A corrosive environment may develop within steam generators as a result of condenser cooling water in-leakage. A theoretical analysis of the accumulation of chloride, a seawater impurity, is carried out for the condenser cooling water used at the YGN-1, 2 nuclear power stations. Calculations show that a highly concentrated chloride solution would form within the steam generators in the event of seawater in-leakage. At the maximum allowable design condenser leak rate (0.5 gpm), the chloride concentration reaches 2.3 ppm in the steam generator and 0.6 ppm in the hotwell, with the maximum blowdown rate and condensate purification. The concentration factor in the steam generator depends only on the blowdown rate and the condensate purification efficiency, as follows: Concentration Factor (equation omitted) ($B \neq 0$). Blowdown and condensate purification are evaluated as the only effective measures for removing impurities from the secondary systems.