• Title/Abstract/Keywords: Classification Variables


인체측정자료의 사용성 제고를 위한 인체측정변수 분류 방법 (A Classification Method of Anthropometric Variables for Improved Usability of Anthropometric Data)

  • 유희천;신승우;류태범
    • 대한인간공학회지 (Journal of the Ergonomics Society of Korea) / Vol. 23, No. 3 / pp.13-24 / 2004
  • Anthropometric data is a fundamental resource in developing ergonomic products and workplaces. However, designers often have difficulty finding anthropometric data relevant to a design. The causes are the technical nature of anthropometric terminology, ambiguous descriptions of the measurement methods for some anthropometric variables, and the inefficiency of existing search methods for anthropometric data. The present study proposes a method for developing a classification system of anthropometric variables that supports systematic, efficient searching of anthropometric data. The proposed method first classifies anthropometric variables by body segment and variable type. It then orders the variables that share a body segment and variable type by comparing the heights of their reference points. The method was applied to build a classification system of 66 anthropometric variables selected for automotive interior design. The resulting classification system was then used to design the search interface of a web-based anthropometric data retrieval system.
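The two-stage ordering described above can be sketched as a sort followed by grouping. The first stage groups variables by body segment and variable type. The second stage orders each group by reference-point height.

This is a minimal illustration, not the authors' system. All of the following are hypothetical: the `AnthroVariable` fields, the segment and type labels, the highest-first ordering direction, and the numeric heights.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class AnthroVariable:
    name: str
    segment: str          # body segment (hypothetical labels, e.g. "whole body", "hand")
    var_type: str         # variable type (hypothetical labels, e.g. "height", "length")
    ref_height_mm: float  # height of the variable's reference point (placeholder values below)


def classify(variables):
    """Group variables by (segment, type); within each group, list them from the
    highest reference point to the lowest (an assumed ordering direction)."""
    ordered = sorted(variables, key=lambda v: (v.segment, v.var_type, -v.ref_height_mm))
    return {
        key: [v.name for v in group]
        for key, group in groupby(ordered, key=lambda v: (v.segment, v.var_type))
    }


# Illustrative input only; these heights are placeholders, not measured data.
sample = [
    AnthroVariable("shoulder height", "whole body", "height", 1400.0),
    AnthroVariable("stature", "whole body", "height", 1700.0),
    AnthroVariable("eye height", "whole body", "height", 1580.0),
    AnthroVariable("hand length", "hand", "length", 800.0),
]
system = classify(sample)
```

Sorting on the composite key before `groupby` matters, because `groupby` only merges adjacent items.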

Recognition and Classification of Power Quality Disturbances on the basis of Pattern Linguistic Values

  • Liu, XiaoSheng;Liu, Bo;Xu, DianGuo
    • Journal of Electrical Engineering and Technology / Vol. 11, No. 2 / pp.309-319 / 2016
  • This paper presents a new method for recognizing and classifying power quality (PQ) disturbances on the basis of pattern linguistic values. Using fuzzy logic, the method addresses the difficulty of recognizing disturbances quickly and accurately. It uses the disturbance patterns to be classified to define the linguistic values of the fuzzy input variables. It then uses the input-variable values of each disturbance pattern to set the membership functions. It also sets the fuzzy rules by analyzing the distribution regularities of the input variable values. One characteristic of the method is how the linguistic values of the fuzzy input variables and the membership-function settings are determined. They depend not only on the input variables but also on the characteristics of the disturbance classes and on the classification results. Furthermore, the number of fuzzy rules equals the number of disturbance patterns. Because the membership functions and fuzzy rules are designed directly around the classification objective, the method reduces the complexity of the design process while yielding accurate classification results. Classification results on simulated and measured data verify the feasibility and effectiveness of the method.
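The abstract's key structural idea is one fuzzy rule per disturbance pattern, with membership functions centered on each pattern's typical input values. A minimal sketch of that idea follows.

The abstract does not give the paper's actual input variables, membership shapes, or parameters. The following are therefore all assumptions for illustration:
- the two features (RMS voltage in p.u. and THD ratio);
- the triangular shapes;
- the pattern centers and half-widths.

```python
# Hypothetical disturbance patterns: typical (RMS voltage in p.u., THD ratio)
PATTERNS = {
    "normal":    (1.0, 0.0),
    "sag":       (0.5, 0.0),
    "swell":     (1.4, 0.0),
    "harmonics": (1.0, 0.15),
}
HALF_WIDTHS = (0.3, 0.1)  # assumed triangle half-widths for (RMS, THD)


def triangular(x, center, half_width):
    """Triangular membership: 1 at the center, falling linearly to 0 at +/- half_width."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


def classify(rms_pu, thd):
    """One rule per pattern: IF rms is <pattern> AND thd is <pattern> THEN <pattern>.
    AND is taken as min; the pattern whose rule fires most strongly wins."""
    strengths = {}
    for name, centers in PATTERNS.items():
        memberships = [
            triangular(x, c, w) for x, c, w in zip((rms_pu, thd), centers, HALF_WIDTHS)
        ]
        strengths[name] = min(memberships)
    best = max(strengths, key=strengths.get)
    return best if strengths[best] > 0 else "unclassified"
```

For example, `classify(0.55, 0.01)` fires only the "sag" rule.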

SHAP 기반 NSL-KDD 네트워크 공격 분류의 주요 변수 분석 (Analyzing Key Variables in Network Attack Classification on NSL-KDD Dataset using SHAP)

  • 이상덕;김대규;김창수
    • 한국재난정보학회 논문집 (Journal of the Korean Society of Disaster Information) / Vol. 19, No. 4 / pp.924-935 / 2023
  • Purpose: The central aim of this study is to apply machine learning techniques to the classification of Intrusion Detection System (IDS) data, with a specific focus on identifying the variables responsible for improving overall performance. Method: First, we classified 'R2L (Remote to Local)' and 'U2R (User to Root)' attacks in the NSL-KDD dataset, which are difficult to detect because of class imbalance. We used seven machine learning models, including Logistic Regression (LR) and K-Nearest Neighbor (KNN). Next, we applied SHapley Additive exPlanations (SHAP) to the two best-performing models, Random Forest (RF) and Light Gradient-Boosting Machine (LGBM). This let us examine the importance of the variables that affect classification in each model. Result: The most important variable was 'service' for RF and 'dst_host_srv_count' for LGBM. These variables are key factors for improving classification performance in the respective models. Conclusion: This paper identifies RF and LGBM as the best-performing models for classifying 'R2L' and 'U2R' attacks and identifies the key variables associated with each model.
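SHAP attributes a prediction to the input features using Shapley values. For a tiny model, these values can be computed exactly by enumerating every coalition of features, which shows what the attribution means.

This sketch makes two simplifying assumptions:
- Features outside a coalition are replaced by a single baseline value.
- The model is a hypothetical toy function.

This brute-force version is exponential in the number of features. Practical SHAP tooling uses faster algorithms, such as TreeSHAP for tree ensembles like RF and LGBM.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x. Features outside a coalition
    are set to their baseline value (a simplifying assumption)."""
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for s in combinations(others, k):
                # Weight of a coalition of size k that excludes feature i
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (value(set(s) | {i}) - value(set(s)))
    return phi


def toy_model(z):
    # Hypothetical model: a main effect on feature 0 plus an interaction of features 1 and 2
    return 2 * z[0] + z[1] * z[2]


phi = shapley_values(toy_model, x=[1, 1, 1], baseline=[0, 0, 0])
```

Feature 0 receives its full main effect. The interaction term is split equally between features 1 and 2. The attributions sum to `toy_model(x) - toy_model(baseline)`.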

Discriminant Analysis of Binary Data by Using the Maximum Entropy Distribution

  • Lee, Jung Jin;Hwang, Joon
    • Communications for Statistical Applications and Methods / Vol. 10, No. 3 / pp.909-917 / 2003
  • Although many classification models have been used to classify binary data, none of them dominates across all circumstances, which vary with the number of variables and the size of the data (Asparoukhov and Krzanowski, 2001). This paper proposes a classification model that uses information on the marginal distributions of sub-variables and the corresponding maximum entropy distribution. Simulation-based classification experiments are discussed.
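The abstract does not say which marginals are used or how the distribution is fitted. One standard way to obtain the maximum-entropy joint distribution that matches given marginal tables is iterative proportional fitting (IPF), started from the uniform distribution. Each class can then be scored with Bayes' rule.

This is a minimal sketch under several assumptions:
- The pairwise subsets `(0, 1)` and `(1, 2)` are assumed.
- The toy data are hypothetical.
- No smoothing is applied, so patterns never seen in a class get probability 0.

```python
import itertools

import numpy as np


def maxent_joint(data, subsets, n_iter=50):
    """Iterative proportional fitting over all 2^p binary states, starting from
    the uniform distribution, so the fit is the maximum-entropy joint distribution
    matching the empirical marginal tables of the given variable subsets."""
    data = np.asarray(data)
    p = data.shape[1]
    states = np.array(list(itertools.product([0, 1], repeat=p)))
    prob = np.full(len(states), 1.0 / len(states))
    for _ in range(n_iter):
        for sub in subsets:
            sub = list(sub)
            for pattern in itertools.product([0, 1], repeat=len(sub)):
                target = np.mean(np.all(data[:, sub] == pattern, axis=1))
                mask = np.all(states[:, sub] == pattern, axis=1)
                current = prob[mask].sum()
                if current > 0:
                    prob[mask] *= target / current  # rescale to match this marginal cell
    return states, prob


def classify(x, class_data, subsets):
    """Bayes rule: pick the class maximizing prior * maxent probability of x."""
    n_total = sum(len(d) for d in class_data.values())
    best_label, best_score = None, -1.0
    for label, d in class_data.items():
        states, prob = maxent_joint(d, subsets)
        idx = np.flatnonzero(np.all(states == np.asarray(x), axis=1))[0]
        score = prob[idx] * len(d) / n_total
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data with three binary variables
class_data = {
    "A": [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0]],
    "B": [[1, 1, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]],
}
subsets = [(0, 1), (1, 2)]
```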

분류와 회귀나무분석에 관한 소고 (Note on classification and regression tree analysis)

  • 임용빈;오만숙
    • 품질경영학회지 (Journal of Korean Society for Quality Management) / Vol. 30, No. 1 / pp.152-161 / 2002
  • The analysis of large data sets with hundreds of thousands of observations and thousands of independent variables is a formidable computational task. A less parametric approach to regression and classification, capable of identifying important independent variables and their interactions, is the tree-structured approach. It gives a graphical and often illuminating view of the data in classification and regression problems. In this paper, we review and summarize three things: the methodology used to construct a tree, multiple trees, and the sequential strategy for identifying active compounds in large chemical databases.
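A tree is grown by repeatedly choosing the split that most reduces impurity. The core step for a single numeric variable is an exhaustive threshold search.

This sketch uses Gini impurity, the usual classification criterion in CART. The abstract does not state which splitting criterion the reviewed methods use, so treat it as an illustrative choice. Recursion, stopping rules, and pruning are omitted.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, ys):
    """Try every midpoint between distinct sorted values of one numeric feature and
    return (threshold, weighted impurity) of the best split x <= threshold."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


threshold, impurity = best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])
```

Here the best threshold falls between 3 and 10, giving two pure child nodes.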

초고차원 다범주분류를 위한 변수선별 방법 비교 연구 (A comparative study of feature screening methods for ultrahigh dimensional multiclass classification)

  • 이경은;김경희;신승준
    • 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 30, No. 5 / pp.793-808 / 2017
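  • This paper presents a comparative study of feature screening methods for multiclass classification of ultrahigh-dimensional data. Screening methods for multiclass classification fall into two groups. Some extend binary-classification methods through one-versus-one or one-versus-rest comparisons, while others apply directly to a multiclass response variable. To examine screening performance for multiclass classification, we ran simulations under several scenarios:
    • heavy-tailed predictors;
    • correlated signal and noise variables;
    • predictors associated with the response jointly but not marginally;
    • an imbalanced multiclass response.
  We also applied the methods to real data. Methods that require no model assumptions showed stable performance.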
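The abstract reports that model-free methods were the most stable. The sketch below implements one model-free screening statistic that applies directly to a multiclass response: the mean-variance index used by MV-SIS. The abstract does not list which methods were compared, so this is an example of the category, not a claim about the paper's method set. The synthetic data are hypothetical.

```python
import numpy as np


def mv_index(x, y):
    """Mean-variance index: sum_r p_r * mean_i (F_r(x_i) - F(x_i))^2, where F is the
    empirical CDF of x and F_r the empirical CDF of x within class r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    F = np.mean(x[None, :] <= x[:, None], axis=1)  # F(x_i) over the whole sample
    total = 0.0
    for r in np.unique(y):
        in_r = y == r
        x_r = x[in_r]
        F_r = np.mean(x_r[None, :] <= x[:, None], axis=1)  # class-r CDF at every x_i
        total += in_r.mean() * np.mean((F_r - F) ** 2)
    return total


def screen(X, y, d):
    """Rank the columns of X by the MV index and keep the indices of the top d."""
    X = np.asarray(X, dtype=float)
    scores = np.array([mv_index(X[:, k], y) for k in range(X.shape[1])])
    return np.argsort(scores)[::-1][:d]


# Synthetic check: only column 0 depends on the 3-class response.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=300)
X = rng.normal(size=(300, 50))
X[:, 0] += 3 * y
top = screen(X, y, d=5)
```

The index uses only ranks via empirical CDFs. It therefore needs no regression model and is insensitive to heavy tails, which matches the "model-free" property the abstract highlights.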

범주형 자료에 대한 데이터 마이닝 분류기법 성능 비교 (Comparison of Data Mining Classification Algorithms for Categorical Feature Variables)

  • 손소영;신형원
    • 산업공학 (IE Interfaces) / Vol. 12, No. 4 / pp.551-556 / 1999
  • In this paper, we compare the performance of three data mining classification algorithms (neural network, decision tree, and logistic regression) for categorical input and output data with various characteristics. A $2^{4-1} \times 3$ fractional factorial design is used to simulate the comparison situations. The factors are (1) the categorical ratio of the input variables, (2) the complexity of the functional relationship between the output and input variables, (3) the amount of randomness in that relationship, (4) the categorical ratio of the output variable, and (5) the classification algorithm. The experimental results indicate the following:
    • The decision tree performs better than the others when the relationship between the output and input variables is simple.
    • Logistic regression is better when that relationship is complex.
    • The neural network appears to be a better choice than the others when the randomness in the relationship is relatively large.
  We also use a Taguchi design to improve the practicality of our results by treating the relationship between the output and input variables as a noise factor. Under this design, the neural network and the decision tree achieve higher classification accuracy than logistic regression when the categorical proportion of the output variable is even.
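The design notation can be read as $2^{4-1} \times 3$: a half fraction of four two-level factors, crossed with the three-level algorithm factor. That gives $8 \times 3 = 24$ runs.

The sketch below builds such a design under the following assumptions:
- Factors (1)–(4) each have two levels, coded -1/+1.
- The half fraction uses the common generator D = ABC.

The abstract does not state the generator actually used.

```python
from itertools import product

ALGORITHMS = ("neural network", "decision tree", "logistic regression")


def half_fraction():
    """2^(4-1) fraction for four two-level factors A-D (coded -1/+1),
    using the generator D = ABC (an assumption; the abstract does not give it)."""
    return [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]


# Cross the 8-run fraction with the 3-level algorithm factor: 8 x 3 = 24 runs.
design = [(*run, alg) for run in half_fraction() for alg in ALGORITHMS]
```

With D = ABC, the fraction is resolution IV: main effects are not aliased with two-factor interactions. Each two-level factor is also balanced across the runs.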


다구찌 디자인을 이용한 데이터 퓨전 및 군집분석 분류 성능 비교 (Comparison Study for Data Fusion and Clustering Classification Performances)

  • 신형원;손소영
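    • 한국경영과학회:학술대회논문집 (Korean Operations Research and Management Science Society: Conference Proceedings) / 대한산업공학회/한국경영과학회 2000년도 춘계공동학술대회 논문집 (Proceedings of the 2000 Spring Joint Conference of the Korean Institute of Industrial Engineers and the Korean Operations Research and Management Science Society) / pp.601-604 / 2000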
  • In this paper, we compare the classification performance of data fusion and clustering algorithms (Data Bagging, Variable Selection Bagging, Parameter Combining, Clustering) with that of logistic regression, considering various characteristics of the input data. The four factors used to simulate the logistic model are (1) correlation among input variables, (2) variance of observations, (3) training data size, and (4) the input-output function. Because the relationship between input and output is typically not known, we use a Taguchi design to improve the practicality of our results by treating it as a noise factor. The experimental results indicate the following:
    • Clustering-based logistic regression provides the highest classification accuracy when input variables are weakly correlated and the variance of the data is high.
    • When there is high correlation among input variables, variable bagging performs better than logistic regression.
    • When there is strong correlation among input variables and high variance between observations, bagging appears to be marginally better than logistic regression, but the difference was not significant.
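Data bagging fits the base model on bootstrap resamples of the training data and combines the fitted models by majority vote. The sketch below shows plain data bagging around a logistic regression base learner.

Several parts of this sketch are assumptions:
- The base learner is a simple gradient-descent logistic regression written with NumPy.
- The learning rate, iteration count, number of models, and synthetic data are arbitrary choices.
- Variable Selection Bagging, Parameter Combining, and the clustering variant are not shown.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Logistic regression with intercept, fitted by batch gradient descent."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_logistic(w, X):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return (Xb @ w > 0).astype(int)


def bagging_fit(X, y, n_models=25, seed=0):
    """Fit one logistic model per bootstrap sample (drawn with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)
        models.append(fit_logistic(X[idx], y[idx]))
    return models


def bagging_predict(models, X):
    """Majority vote; an odd n_models avoids ties."""
    votes = np.mean([predict_logistic(w, X) for w in models], axis=0)
    return (votes >= 0.5).astype(int)


# Synthetic two-feature example
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
models = bagging_fit(X, y)
accuracy = np.mean(bagging_predict(models, X) == y)
```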


環境因子의 空間分析을 통한 南韓지역의 山林植生帶 구분/지리정보시스템(GIS)에 의한 접근 (Classification of Forest Vegetation Zone over Southern Part of Korean Peninsula Using Geographic Information Systems)

  • Lee, Kyu-Sung;Byong-Chun Lee;Joon Hwan Shin
    • The Korean Journal of Ecology / Vol. 19, No. 5 / pp.465-476 / 1996
  • Several environmental variables may influence the spatial distribution of forest vegetation. To create a map of forest vegetation zones over the southern part of the Korean Peninsula, digital map layers were produced for each environmental variable, including topography, geographic location, and climate. In addition, an extensive set of field survey data was collected in relatively undisturbed forests and entered into the GIS database with the exact coordinates of the survey sites. Preliminary statistical analysis of the survey data showed that the environmental variables differed significantly among the five previously defined forest vegetation zones. The six digital map layers representing the environmental variables were classified in two ways. The first was a supervised classifier using training statistics from the field survey data; the second was a clustering algorithm. The maps from the two classifiers differed somewhat because of the classification procedures applied. Nonetheless, both showed the overall patterns of the vertical and horizontal distribution of forest zones. Considering the spatial content of many ecological studies, GIS can serve as an important tool for managing and analyzing spatial data. This study focuses more on the generation of the digital maps and the analysis procedure than on the resulting forest vegetation zone map.
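The abstract does not name the supervised classifier. A minimum-distance-to-means classifier is one common choice that uses exactly "training statistics from field survey data"; maximum-likelihood classification is another.

This sketch makes three assumptions:
- It uses the minimum-distance method.
- Each layer is standardized by its pooled standard deviation.
- The example uses only two hypothetical layers and two zones, rather than the study's six layers and five zones.

```python
import numpy as np


def train(samples):
    """samples: zone name -> (n_sites, n_layers) layer values at field survey sites.
    Returns per-zone mean vectors and a per-layer scale (pooled std)."""
    arrays = {zone: np.asarray(v, dtype=float) for zone, v in samples.items()}
    means = {zone: a.mean(axis=0) for zone, a in arrays.items()}
    scale = np.concatenate(list(arrays.values())).std(axis=0)
    return means, scale


def classify_pixels(pixels, means, scale):
    """Assign each pixel (a row of layer values) to the zone whose training mean is
    nearest in standardized Euclidean distance."""
    zones = list(means)
    M = np.stack([means[z] for z in zones]) / scale   # (n_zones, n_layers)
    P = np.asarray(pixels, dtype=float) / scale       # (n_pixels, n_layers)
    d2 = ((P[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
    return [zones[i] for i in d2.argmin(axis=1)]


# Hypothetical two-layer example (elevation in m, mean annual temperature in deg C)
means, scale = train({
    "montane": [[1200, 8], [1300, 7]],
    "lowland": [[100, 14], [200, 13]],
})
labels = classify_pixels([[1100, 9], [150, 14]], means, scale)
```

Standardizing keeps elevation in metres from dominating temperature in degrees.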


군집분석을 이용한 동굴 유형분류의 유용성에 관한 연구 (Study on Usability of Cave Type Classification using Cluster Analysis)

  • 홍현철
    • 동굴 (Cave) / No. 84 / pp.1-9 / 2008
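  • Existing cave type classifications lack diversity and are limited to genetic (origin-based), morphological, and size-based schemes. More diverse classification methods are needed beyond these criteria. Cluster analysis is one way to address this problem. A review of its theoretical background shows it to be highly useful, because choosing different input variables makes a wide range of classifications possible. In practice, taking the cave's interior and surrounding environments into account, possible classifications include:
    • (1) a numerical typology based on cave size and shape;
    • (2) a typology based on the land-use characteristics of the cave's external location;
    • (3) a typology based on the cave's relational setting with its surroundings.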
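The abstract argues that cluster analysis can yield different cave typologies depending on which input variables are chosen. k-means (Lloyd's algorithm) is one common clustering method that could serve this purpose. The sketch below applies it to z-scored attributes.

The abstract names neither the algorithm nor the variables. The choice of k-means and the two cave attributes below are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means on z-scored columns (each column must vary, or the z-score divides by zero)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest center in squared Euclidean distance
        labels = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Update step: move each center to the mean of its members (keep it if the cluster is empty)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical cave attributes: (total passage length in m, number of entrances)
caves = [[120, 1], [150, 1], [130, 2], [5200, 6], [4800, 5], [5000, 7]]
labels = kmeans(caves, k=2)
```

Swapping in a different attribute set, for example land-use shares around each cave entrance, would produce a different typology from the same procedure. That is the flexibility the abstract points to.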