• Title/Summary/Keyword: Classification Variables

A Classification Method of Anthropometric Variables for Improved Usability of Anthropometric Data (인체측정자료의 사용성 제고를 위한 인체측정변수 분류 방법)

  • Yu, Hui-Cheon;Sin, Seung-U;Ryu, Tae-Beom
    • Journal of the Ergonomics Society of Korea / v.23 no.3 / pp.13-24 / 2004
  • Anthropometric data is a fundamental resource in developing ergonomic products and workplaces. However, designers often have difficulty searching for anthropometric data relevant to their designs because of the technical nature of anthropometric terminology, ambiguous descriptions of the measurement methods for some anthropometric variables, and the inefficiency of existing search methods. The present study proposes a method for developing a classification system of anthropometric variables that supports systematic, efficient searches of anthropometric data. The proposed method first classifies anthropometric variables by body segment and variable type, and then orders the variables within each body segment and variable type by comparing the heights of their reference points. The method was applied to establish a classification system for 66 anthropometric variables selected for automotive interior design. The resulting classification system was then used to design the search interface of a web-based anthropometric data retrieval system.
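
The two-step procedure above (group by body segment and variable type, then order within each group by reference-point height) can be sketched as follows. This is a minimal illustration, not the authors' system: the `Variable` record, the segment and type labels, and the reference heights are hypothetical, and listing higher reference points first is an assumed convention.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Variable:
    name: str
    segment: str       # body segment (hypothetical labels, e.g. "whole body", "head")
    var_type: str      # variable type (e.g. "height", "breadth")
    ref_height: float  # height of the reference point in mm (illustrative values)

def classify(variables):
    """Group variables by (segment, type), then order each group by reference-point height."""
    groups = defaultdict(list)
    for v in variables:
        groups[(v.segment, v.var_type)].append(v)
    # Assumption: higher reference points are listed first (top-down order).
    return {key: [v.name for v in sorted(vs, key=lambda v: v.ref_height, reverse=True)]
            for key, vs in groups.items()}

variables = [
    Variable("Elbow height", "whole body", "height", 1050.0),
    Variable("Stature", "whole body", "height", 1700.0),
    Variable("Eye height", "whole body", "height", 1590.0),
    Variable("Head breadth", "head", "breadth", 1650.0),
]
result = classify(variables)
```

Here `result[("whole body", "height")]` lists stature, eye height, and elbow height from top to bottom, which is the kind of ordering a search interface could present within one category.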

Recognition and Classification of Power Quality Disturbances on the basis of Pattern Linguistic Values

  • Liu, XiaoSheng;Liu, Bo;Xu, DianGuo
    • Journal of Electrical Engineering and Technology / v.11 no.2 / pp.309-319 / 2016
  • This paper presents a new method for recognizing and classifying power quality (PQ) disturbances on the basis of pattern linguistic values. The method uses fuzzy logic to address the difficulty of recognizing disturbances rapidly and accurately. It uses the disturbance patterns to be classified to define the linguistic values of the fuzzy input variables, and uses the input variables of each corresponding disturbance pattern to set the membership functions. The fuzzy rules are set by analyzing the distribution regularities of the input-variable values. A distinctive feature of the method is that the linguistic values of the fuzzy input variables and the membership-function settings depend not only on the input variables but also on the characteristics of the disturbances being classified and on the classification results. Furthermore, the number of fuzzy rules equals the number of disturbance patterns. Because the membership functions and the fuzzy rules are designed directly around the classification objective, the method effectively reduces the complexity of the design process and yields accurate classification results. Classification results on both simulated and measured data verify the feasibility and effectiveness of the method.
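
To illustrate the "one fuzzy rule per disturbance pattern" structure, here is a minimal sketch assuming two hypothetical input features (RMS voltage in per unit and total harmonic distortion as a fraction), triangular membership functions, min as the AND operator, and the strongest-firing rule as the class. The feature choice, the pattern set, and every membership parameter are illustrative assumptions, not the paper's settings.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical features: "v_rms" (per unit) and "thd" (fraction).
# One rule per disturbance pattern; the (a, b, c) triples are illustrative only.
RULES = {
    "sag":      {"v_rms": (0.05, 0.50, 0.92), "thd": (-1.0, 0.00, 0.08)},
    "swell":    {"v_rms": (1.08, 1.45, 1.85), "thd": (-1.0, 0.00, 0.08)},
    "normal":   {"v_rms": (0.88, 1.00, 1.12), "thd": (-1.0, 0.00, 0.08)},
    "harmonic": {"v_rms": (0.88, 1.00, 1.12), "thd": (0.05, 0.20, 1.00)},
}

def classify(features):
    """Fire every pattern's rule (AND = min) and return the strongest pattern."""
    strengths = {
        pattern: min(tri(features[name], *abc) for name, abc in rule.items())
        for pattern, rule in RULES.items()
    }
    best = max(strengths, key=strengths.get)
    # Return None when no rule fires at all (input outside every pattern).
    return (best if strengths[best] > 0 else None), strengths

label, strengths = classify({"v_rms": 0.6, "thd": 0.01})
```

Because the rule base is keyed by pattern, adding a disturbance type means adding exactly one rule, which mirrors the abstract's point that the number of rules equals the number of patterns.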

Analyzing Key Variables in Network Attack Classification on NSL-KDD Dataset using SHAP (SHAP 기반 NSL-KDD 네트워크 공격 분류의 주요 변수 분석)

  • Sang-duk Lee;Dae-gyu Kim;Chang Soo Kim
    • Journal of the Society of Disaster Information / v.19 no.4 / pp.924-935 / 2023
  • Purpose: The central aim of this study is to apply machine learning techniques to the classification of Intrusion Detection System (IDS) data, with a specific focus on identifying the variables that contribute most to classification performance. Method: First, we classified 'R2L (Remote to Local)' and 'U2R (User to Root)' attacks in the NSL-KDD dataset, which are difficult to detect because of class imbalance, using seven machine learning models, including Logistic Regression (LR) and K-Nearest Neighbor (KNN). Next, we applied SHapley Additive exPlanations (SHAP) to the two best-performing models, Random Forest (RF) and Light Gradient-Boosting Machine (LGBM), to assess the importance of the variables affecting each model's classification. Result: The most important variable was 'service' for RF and 'dst_host_srv_count' for LGBM. These variables are key factors for improving each model's classification performance. Conclusion: This paper identifies RF and LGBM as the best-performing of the evaluated models for classifying 'R2L' and 'U2R' attacks and identifies the most important variables for each.
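
The quantity SHAP estimates can be shown with a brute-force sketch: the exact Shapley value of each feature, where "absent" features are replaced by a single baseline value, followed by a global ranking by mean absolute Shapley value. This is an illustrative simplification, not the study's pipeline; for tree ensembles such as RF and LGBM the SHAP library's tree explainer computes these values far more efficiently, while the enumeration below is exponential in the number of features. The toy model, data, and feature names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def mean_abs_importance(f, X, baseline):
    """Global importance: mean |Shapley value| of each feature over the rows of X."""
    phis = [shapley_values(f, x, baseline) for x in X]
    return [sum(abs(p[j]) for p in phis) / len(phis) for j in range(len(baseline))]

# Hypothetical toy model and data; feature names are placeholders.
names = ["f0", "f1", "f2"]
model = lambda z: 2.0 * z[0] + z[0] * z[1] + 0.1 * z[2]
X = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
baseline = [0.0, 0.0, 0.0]
ranking = sorted(zip(names, mean_abs_importance(model, X, baseline)), key=lambda t: -t[1])
```

The Shapley values for each row sum to `f(x) - f(baseline)`, so the ranking reflects how each feature moves individual predictions, which is the sense in which a variable such as 'service' can be called most important for a model.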

Discriminant Analysis of Binary Data by Using the Maximum Entropy Distribution

  • Lee, Jung Jin;Hwang, Joon
    • Communications for Statistical Applications and Methods / v.10 no.3 / pp.909-917 / 2003
  • Although many classification models have been used to classify binary data, no single model dominates in all circumstances, which vary with the number of variables and the size of the data (Asparoukhov and Krzanowski (2001)). This paper proposes a classification model that uses information on the marginal distributions of subsets of the variables and the corresponding maximum entropy distribution. Classification experiments using simulation are discussed.
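
One way to realize a maximum-entropy discriminant for binary data is sketched below, assuming the constraints are the smoothed pairwise (two-variable) marginals of each class; the abstract does not say which marginals the authors use. Iterative proportional fitting started from the uniform table converges to the maximum entropy distribution matching those marginals, and a new vector is assigned to the class with the largest prior times fitted probability. The function names, smoothing constant, and toy data are hypothetical.

```python
import itertools
import numpy as np

def pairwise_marginals(X, alpha=0.5):
    """Smoothed empirical 2x2 marginal table for every variable pair (i, j), i < j."""
    margs = {}
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        counts = np.full((2, 2), alpha)          # pseudo-counts avoid zero cells
        for a, b in zip(X[:, i], X[:, j]):
            counts[a, b] += 1
        margs[(i, j)] = counts / counts.sum()
    return margs

def maxent_table(p, margs, n_iter=200):
    """Iterative proportional fitting from the uniform table over {0,1}^p."""
    P = np.full((2,) * p, 1.0 / 2 ** p)
    for _ in range(n_iter):
        for (i, j), target in margs.items():
            other = tuple(k for k in range(p) if k not in (i, j))
            current = P.sum(axis=other)          # current 2x2 marginal, indexed [x_i, x_j]
            shape = [1] * p
            shape[i] = shape[j] = 2
            P = P * (target / current).reshape(shape)
    return P

def fit(X, y):
    """One prior and one maximum entropy table per class."""
    return {c: (np.mean(y == c), maxent_table(X.shape[1], pairwise_marginals(X[y == c])))
            for c in np.unique(y)}

def predict(model, x):
    """Assign x to the class with the largest prior * P_c(x)."""
    return max(model, key=lambda c: model[c][0] * model[c][1][tuple(x)])

# Toy data: class 0 has x0 == x1, class 1 has x0 != x1; x2 is noise.
X0 = np.array([[0, 0, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]] * 5)
X1 = np.array([[0, 1, 0], [1, 0, 0], [0, 1, 1], [1, 0, 1]] * 5)
X = np.vstack([X0, X1])
y = np.array([0] * 20 + [1] * 20)
model = fit(X, y)
```

The full table has $2^p$ cells, so this direct form only suits a modest number of variables; it is meant to show the idea, not to scale.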

Note on classification and regression tree analysis (분류와 회귀나무분석에 관한 소고)

  • Lim, Yong-Bin;Oh, Man-Suk (임용빈;오만숙)
    • Journal of Korean Society for Quality Management / v.30 no.1 / pp.152-161 / 2002
  • Analyzing large data sets with hundreds of thousands of observations and thousands of independent variables is a formidable computational task. A less parametric method that can identify important independent variables and their interactions is the tree-structured approach to regression and classification, which offers a graphical and often illuminating way of looking at data in classification and regression problems. In this paper, we review and summarize the methodology used to construct a single tree and multiple trees, and the sequential strategy for identifying active compounds in large chemical databases.
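
The core step of growing a classification tree is choosing the split that most reduces impurity. The sketch below assumes CART-style binary splits `x <= t` on numeric variables scored by weighted Gini impurity; it is a minimal illustration of that step, not the procedure reviewed in the paper, and the toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(x, y):
    """Best threshold t for the split x <= t; returns (t, weighted Gini after splitting)."""
    pairs = sorted(zip(x, y))
    best = (None, gini(y))                       # no split if nothing improves impurity
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue                             # cannot split between equal values
        t = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best

def best_feature_split(X, y):
    """Search every column of X (a list of rows); return (feature, threshold, score)."""
    candidates = [(j, *best_split([row[j] for row in X], y)) for j in range(len(X[0]))]
    return min(candidates, key=lambda c: c[2])

# Toy data: feature 0 is noise, feature 1 separates the classes.
X = [[1, 1], [2, 10], [3, 2], [4, 11], [5, 3], [6, 12]]
y = [0, 1, 0, 1, 0, 1]
feature, threshold, score = best_feature_split(X, y)
```

Applying the same search recursively to each child node yields a tree, and the variables chosen near the root are the ones the abstract describes as "important independent variables".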

A comparative study of feature screening methods for ultrahigh dimensional multiclass classification (초고차원 다범주분류를 위한 변수선별 방법 비교 연구)

  • Lee, Kyungeun;Kim, Kyoung Hee;Shin, Seung Jun
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.793-808 / 2017
  • We compare various variable screening methods for multiclass classification problems with ultrahigh-dimensional data. Two approaches were considered: (1) extending binary classification pairwise via one-versus-one or one-versus-rest comparisons, and (2) classifying multiclass responses directly. We conducted extensive simulation studies under different conditions: heavy-tailed explanatory variables, correlated signal and noise variables, correlated joint distributions but uncorrelated marginals, and unbalanced response variables. We then analyzed real data to examine the performance of the methods. The results showed that model-free methods perform better for multiclass classification problems, as they do for binary ones.
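
As one concrete example of a model-free screening statistic that handles a multiclass response directly, the sketch below computes the empirical mean-variance (MV) index, $\sum_r p_r\,\mathrm{E}[(F_r(X) - F(X))^2]$, where $F$ is the marginal CDF of a feature and $F_r$ its CDF within class $r$, and keeps the top-ranked features. Whether this particular statistic is among the methods the paper compares is not stated in the abstract; it is shown here only as an illustration, and the simulated data are hypothetical.

```python
import numpy as np

def mv_index(x, y):
    """Empirical MV index: sum_r p_r * mean_i (F_r(x_i) - F(x_i))^2."""
    le = x[None, :] <= x[:, None]          # le[i, k] is True when x_k <= x_i
    F = le.mean(axis=1)                    # marginal empirical CDF at each x_i
    mv = 0.0
    for r in np.unique(y):
        in_r = (y == r)
        F_r = le[:, in_r].mean(axis=1)     # class-r empirical CDF at each x_i
        mv += in_r.mean() * np.mean((F_r - F) ** 2)
    return mv

def screen(X, y, d):
    """Rank features by MV index (largest first) and keep the top d."""
    scores = np.array([mv_index(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:d], scores

# Hypothetical simulation: three classes, 50 features, only feature 7 carries signal.
rng = np.random.default_rng(0)
n, p = 300, 50
y = rng.integers(0, 3, size=n)
X = rng.normal(size=(n, p))
X[:, 7] += 2.0 * y
top, scores = screen(X, y, d=5)
```

The index depends on the data only through ranks, which is why statistics of this kind tolerate heavy-tailed explanatory variables without assuming a model for the response.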

Comparison of Data Mining Classification Algorithms for Categorical Feature Variables (범주형 자료에 대한 데이터 마이닝 분류기법 성능 비교)

  • Sohn, So-Young;Shin, Hyung-Won
    • IE interfaces / v.12 no.4 / pp.551-556 / 1999
  • In this paper, we compare the performance of three data mining classification algorithms (neural network, decision tree, logistic regression) across various characteristics of categorical input and output data. A $2^{4-1} \times 3$ fractional factorial design is used to simulate the comparison situations, where the factors are (1) the categorical ratio of the input variables, (2) the complexity of the functional relationship between the output and input variables, (3) the size of the randomness in the relationship, (4) the categorical ratio of the output variable, and (5) the classification algorithm. The experimental results indicate the following: decision trees perform better than the others when the relationship between the output and input variables is simple, whereas logistic regression is better when the relationship is complex; and neural networks appear to be a better choice when the randomness in the relationship is relatively large. We also use a Taguchi design to improve the practicality of the results by treating the relationship between the output and input variables as a noise factor. As a result, the classification accuracy of neural networks and decision trees turns out to be higher than that of logistic regression when the categorical proportion of the output variable is even.
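
A $2^{4-1} \times 3$ layout can be generated by taking a half fraction of the four two-level data factors and crossing it with the three algorithms. The sketch assumes coded levels of -1/+1 and the common generator D = ABC (defining relation I = ABCD); the abstract does not state which generator the authors used.

```python
from itertools import product

def fractional_factorial_2_4_1():
    """2^(4-1) half fraction: full 2^3 in A, B, C with D = A*B*C (levels coded -1/+1)."""
    return [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]

# Two-level factors (in order): input categorical ratio, relationship complexity,
# randomness size, output categorical ratio. Third factor level set: the algorithms.
ALGORITHMS = ("neural network", "decision tree", "logistic regression")

# Cross the 8-run half fraction with the 3-level algorithm factor: 8 x 3 = 24 runs.
design = [(*run, alg) for run in fractional_factorial_2_4_1() for alg in ALGORITHMS]
```

With this generator every main effect of the two-level factors is estimable and aliased only with three-factor interactions, while each data condition is evaluated with all three algorithms.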

Comparison Study for Data Fusion and Clustering Classification Performances (다구찌 디자인을 이용한 데이터 퓨전 및 군집분석 분류 성능 비교)

  • Shin, Hyung-Won;Sohn, So-Young (신형원;손소영)
    • Proceedings of the Korean Operations and Management Science Society Conference / 2000.04a / pp.601-604 / 2000
  • In this paper, we compare the classification performance of data fusion and clustering algorithms (Data Bagging, Variable Selection Bagging, Parameter Combining, Clustering) with that of logistic regression across various characteristics of the input data. The four factors used to simulate the logistic model are (1) correlation among the input variables, (2) variance of the observations, (3) training data size, and (4) the input-output function. Because the relationship between input and output is typically unknown, we use a Taguchi design to improve the practicality of the results by treating it as a noise factor. The experimental results indicate the following: clustering-based logistic regression provides the highest classification accuracy when the input variables are weakly correlated and the variance of the data is high. When the input variables are highly correlated, variable selection bagging performs better than logistic regression. When there is strong correlation among the input variables and high variance between observations, bagging appears marginally better than logistic regression, but the difference was not significant.
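
Data bagging around logistic regression can be sketched as follows: fit one logistic model per bootstrap resample of the rows and combine them by majority vote. This is a generic illustration under assumed settings (batch gradient descent, 25 bags, ties assigned to class 1), not the authors' implementation, and the one-dimensional data are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Plain logistic regression by batch gradient descent (intercept column added)."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_logistic(w, X):
    """Class 1 when the linear predictor is positive (probability above 0.5)."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return (Xb @ w > 0).astype(int)

def fit_bagged(X, y, n_bags=25, seed=0):
    """Data bagging: one logistic model per bootstrap resample of the rows."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_bags):
        idx = rng.integers(0, len(X), size=len(X))
        models.append(fit_logistic(X[idx], y[idx]))
    return models

def predict_bagged(models, X):
    """Majority vote over the bagged models (ties go to class 1)."""
    votes = np.mean([predict_logistic(w, X) for w in models], axis=0)
    return (votes >= 0.5).astype(int)

# Hypothetical one-dimensional example.
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = (X[:, 0] > 0).astype(int)
models = fit_bagged(X, y)
pred = predict_bagged(models, np.array([[-2.0], [2.0]]))
```

Variable selection bagging would differ only in resampling columns instead of rows before each fit.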

Classification of Forest Vegetation Zone over Southern Part of Korean Peninsula Using Geographic Information Systems (環境因子의 空間分析을 통한 南韓지역의 山林植生帶 구분/지리정보시스템(GIS)에 의한 접근)

  • Lee, Kyu-Sung;Byong-Chun Lee;Joon Hwan Shin
    • The Korean Journal of Ecology / v.19 no.5 / pp.465-476 / 1996
  • Several environmental variables may influence the spatial distribution of forest vegetation. To create a map of forest vegetation zones over the southern part of the Korean Peninsula, digital map layers were produced for each environmental variable, including topography, geographic location, and climate. In addition, an extensive set of field survey data was collected in relatively undisturbed forests and entered into the GIS database with the exact coordinates of the survey sites. Preliminary statistical analysis of the survey data showed that the environmental variables differed significantly among the five previously defined forest vegetation zones. The six digital map layers representing the environmental variables were classified both by a supervised classifier using training statistics from the field survey data and by a clustering algorithm. Although the maps from the two classifiers differed somewhat because of the classification procedures applied, both showed the overall patterns of the vertical and horizontal distribution of forest zones. Considering the spatial content of many ecological studies, GIS can be an important tool for managing and analyzing spatial data. This study focuses more on the generation of the digital maps and the analysis procedure than on the resulting forest vegetation zone map.
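
A supervised classifier driven by training statistics can be illustrated with a Gaussian maximum-likelihood rule over stacked map layers: estimate a mean vector and covariance matrix per zone from the survey-site pixels, then label every pixel with the zone of highest likelihood. The abstract does not name the classifier, so the Gaussian model, the equal priors, and the two-layer toy raster are assumptions.

```python
import numpy as np

def train_stats(samples_by_class):
    """Mean vector and covariance matrix per class from training pixels (n_samples x n_layers)."""
    return [(s.mean(axis=0), np.cov(s, rowvar=False)) for s in samples_by_class]

def ml_classify(raster, stats):
    """Label each pixel of a (layers, rows, cols) raster by Gaussian maximum likelihood,
    assuming equal class priors."""
    layers, rows, cols = raster.shape
    pixels = raster.reshape(layers, -1).T                # one row per pixel
    scores = []
    for mean, cov in stats:
        diff = pixels - mean
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)  # squared Mahalanobis
        scores.append(-0.5 * (logdet + maha))           # log-likelihood up to a constant
    return np.argmax(np.stack(scores), axis=0).reshape(rows, cols)

# Hypothetical two-layer example with two zones standing in for survey training sites.
rng = np.random.default_rng(1)
zone_a = rng.normal(0.0, 1.0, size=(50, 2))
zone_b = rng.normal(10.0, 1.0, size=(50, 2))
stats = train_stats([zone_a, zone_b])
raster = np.array([[[0.0, 10.0], [0.5, 9.0]],
                   [[0.0, 10.0], [-0.5, 11.0]]])         # shape (layers=2, rows=2, cols=2)
labels = ml_classify(raster, stats)
```

With six environmental layers the same code applies unchanged; only the raster's first dimension and the training arrays' column count grow to six.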

Study on Usability of Cave Type Classification using Cluster Analysis (군집분석을 이용한 동굴 유형분류의 유용성에 관한 연구)

  • Hong, Hyun-Cheol
    • Journal of the Speleological Society of Korea / no.84 / pp.1-9 / 2008
  • Because existing cave type classifications lack variety, being limited to structural, genetic, and dimensional schemes, a new cave type classification is needed. An analysis of the theoretical background of cluster analysis shows that caves can be classified using diverse variables, depending on which variables are selected, and that such a classification is highly useful. Taking into account the internal environment of caves and their surrounding environment, three classifications are possible: first, a numerical classification by cave dimensions and form; second, a classification by land use and geographic features outside the cave; and third, a classification by locational features relative to the areas surrounding the cave.
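
A numerical classification by cave dimensions can be sketched with k-means on standardized variables. The abstract does not specify the clustering algorithm or the variables, so k-means, the farthest-point initialization, and the measurements below (length, vertical depth, number of entrances) are all hypothetical choices for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means on standardized columns with farthest-point initialization."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize; assumes no constant column
    centers = [Z[0]]
    while len(centers) < k:                      # next center: point farthest from existing ones
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)                # assign each cave to its nearest center
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Hypothetical cave measurements: length (m), vertical depth (m), number of entrances.
caves = np.array([[120, 10, 1], [150, 12, 1], [130, 8, 1],
                  [2500, 80, 3], [2700, 95, 4], [2300, 70, 3]], dtype=float)
labels = kmeans(caves, k=2)
```

Standardizing first keeps a variable measured in large units, such as length in meters, from dominating the distance; swapping in land-use or location variables would give the second and third classifications the abstract describes.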