• Title/Abstract/Keywords: nearest neighbor classification

77 search results (processing time: 0.028 seconds)

A Comparison of Ensemble Methods Combining Resampling Techniques for Class Imbalanced Data (데이터 전처리와 앙상블 기법을 통한 불균형 데이터의 분류모형 비교 연구)

  • Lee, Hee-Jae;Lee, Sungim
    • The Korean Journal of Applied Statistics / Vol. 27, No. 3 / pp. 357-371 / 2014
  • There are many studies related to imbalanced data in which the class distribution is highly skewed. To address the problem of imbalanced data, previous studies have used resampling techniques that correct the skewness of the class distribution in each sampled subset through under-sampling, over-sampling, or hybrid sampling such as SMOTE. Ensemble methods have also alleviated the problem of class-imbalanced data. In this paper, we compare around a dozen algorithms that combine ensemble methods and resampling techniques on simulated data sets generated by the Backbone model, which can handle the imbalance rate. Results on various real imbalanced data sets are also presented to compare the effectiveness of the algorithms. As a result, we highly recommend combining resampling techniques with ensemble methods for imbalanced data in which the proportion of the minority class is less than 10%. We also find that each ensemble method has a well-matched sampling technique. Algorithms that combine bagging or random forest ensembles with random under-sampling tend to perform well, whereas the boosting ensemble appears to perform better with over-sampling. All ensemble methods combined with SMOTE perform well in most situations.
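
The two resampling ideas named in this abstract, random under-sampling and SMOTE-style over-sampling, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the toy data, and the parameters `k` and `seed` are all assumptions made for the example.

```python
import random

def random_undersample(X, y, minority_label, seed=0):
    """Keep every minority sample and an equal-sized random subset of the rest."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    kept = sorted(minority + rng.sample(majority, min(len(minority), len(majority))))
    return [X[i] for i in kept], [y[i] for i in kept]

def smote_like(minority_points, n_new, k=2, seed=0):
    """Create synthetic minority points by interpolating between a minority
    point and one of its k nearest minority neighbors (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_points)
        others = [p for p in minority_points if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic

# Hypothetical toy data: 12 points, 3 of them in the minority class (label 1).
X = [(float(i), float(i % 3)) for i in range(12)]
y = [1 if i in (2, 5, 9) else 0 for i in range(12)]

X_bal, y_bal = random_undersample(X, y, minority_label=1)
minority_points = [x for x, label in zip(X, y) if label == 1]
synthetic = smote_like(minority_points, n_new=4)
```

In an ensemble setting, a resampling step like one of these would typically be applied separately to each bootstrap sample or boosting round before the base classifier is fitted.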

Variational Bayesian multinomial probit model with Gaussian process classification on mice protein expression level data (가우시안 과정 분류에 대한 변분 베이지안 다항 프로빗 모형: 쥐 단백질 발현 데이터에의 적용)

  • Donghyun Son;Beom Seuk Hwang
    • The Korean Journal of Applied Statistics / Vol. 36, No. 2 / pp. 115-127 / 2023
  • The multinomial probit model is a popular model for multiclass classification and choice modeling. The Markov chain Monte Carlo (MCMC) method is widely used for estimating the multinomial probit model, but its computational cost is high. It is well known, however, that variational Bayesian approximation is more computationally efficient than MCMC because it uses subsets of samples. In this study, we describe the multinomial probit model with Gaussian process classification and how to apply variational Bayesian approximation to the model. This study also compares the results of the variational Bayesian multinomial probit model with those of naive Bayes, K-nearest neighbors, and support vector machine on the UCI mice protein expression level data.

Evaluation of Classification Models of Mild Left Ventricular Diastolic Dysfunction by Tei Index (Tei Index를 이용한 경도의 좌심실 이완 기능 장애 분류 모델 평가)

  • Su-Min Kim;Soo-Young Ye
    • Journal of the Korean Society of Radiology / Vol. 17, No. 5 / pp. 761-766 / 2023
  • In this paper, the Tei index (TI) was measured to classify the presence or absence of mild left ventricular diastolic dysfunction. Of the 306 data points in total, 206 were used as training data and 100 as test data, and SVM and KNN were used as the machine learning classification models. The results confirmed that SVM showed relatively higher accuracy than KNN and was more useful in diagnosing the presence of left ventricular diastolic dysfunction. In future research, classification performance is expected to improve further by adding various indicators that evaluate cardiac function in addition to TI and by securing more data. Furthermore, the results are expected to serve as basic data for predicting and classifying other diseases and for addressing the shortage of medical manpower relative to the increasing number of tests.

Cancer Diagnosis System using Genetic Algorithm and Multi-boosting Classifier (Genetic Algorithm과 다중부스팅 Classifier를 이용한 암진단 시스템)

  • Ohn, Syng-Yup;Chi, Seung-Do
    • Journal of the Korea Society for Simulation / Vol. 20, No. 2 / pp. 77-85 / 2011
  • It is believed that anomalies or diseases of human organs can be identified by the analysis of patterns. This paper proposes a new classification technique for identifying cancer using proteome patterns obtained from two-dimensional polyacrylamide gel electrophoresis (2-D PAGE). In the new method, three different classification methods, namely support vector machine (SVM), multi-layer perceptron (MLP), and k-nearest neighbor (k-NN), are extended by a multi-boosting method into an array of subclassifiers, and the results of the subclassifiers are merged by an ensemble method. A genetic algorithm was applied to obtain the optimal feature set for each subclassifier. We applied our method to an empirical data set from cancer research, and it showed better accuracy and more stable performance than a single classifier.

An Algorithm of Curved Hull Plates Classification for the Curved Hull Plates Forming Process (곡가공 프로세스를 고려한 곡판 분류 알고리즘)

  • Noh, Ja-Ckyou;Shin, Jong-Gye
    • Journal of the Society of Naval Architects of Korea / Vol. 46, No. 6 / pp. 675-687 / 2009
  • In general, the forming process of curved hull plates consists of subtasks such as roll bending, line heating, and triangle heating. To complement the automated curved hull forming system, it is necessary to develop an algorithm that classifies the curved hull plates of a ship into standard shapes with respect to the forming techniques required. In this paper, curved hull plates are classified into four standard shapes (saddle, convex, flat, and cylindrical) and their combinations, which are related to the forming tasks needed to form the shapes. In preprocessing, the Gaussian curvature and the mean curvature are calculated at the mid-point of each mesh of the surface modeled by a Coons patch. The nearest neighbor method is then applied to classify the input plate type. The developed algorithm has been verified in tests with sample plates from real ship data.
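
The classification step described in this abstract can be sketched as a nearest neighbor lookup in the plane of Gaussian curvature K and mean curvature H. The reference points below are hypothetical values chosen only for illustration (a flat patch has K = H = 0, a cylinder has K = 0 and nonzero H, a convex patch has K > 0, and a saddle has K < 0); the paper's actual reference data and distance measure are not given in the abstract.

```python
import math
from collections import Counter

# Hypothetical reference points in (Gaussian curvature K, mean curvature H) space.
PROTOTYPES = {
    "flat": (0.0, 0.0),
    "cylindrical": (0.0, 0.5),
    "convex": (0.25, 0.5),
    "saddle": (-0.25, 0.0),
}

def classify_patch(K, H):
    """Return the standard shape whose reference point is nearest to (K, H)."""
    return min(PROTOTYPES, key=lambda name: math.dist((K, H), PROTOTYPES[name]))

def summarize_plate(curvatures):
    """Count the shape labels over all mesh mid-points of one plate."""
    return Counter(classify_patch(K, H) for K, H in curvatures)

# Hypothetical curvature samples at four mesh mid-points of one plate.
plate = [(0.01, 0.02), (-0.2, 0.05), (0.02, 0.45), (0.3, 0.55)]
summary = summarize_plate(plate)
```

A plate whose summary contains more than one label would correspond to a combination of standard shapes in the sense used by the abstract.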

Performance comparison of machine learning classification methods for decision of disc cutter replacement of shield TBM (쉴드 TBM 디스크 커터 교체 유무 판단을 위한 머신러닝 분류기법 성능 비교)

  • Kim, Yunhee;Hong, Jiyeon;Kim, Bumjoo
    • Journal of Korean Tunnelling and Underground Space Association / Vol. 22, No. 5 / pp. 575-589 / 2020
  • In recent years, shield TBM construction has been continuously increasing in domestic tunnels. The main excavation tool in shield TBM construction is the disc cutter, which naturally wears during excavation, and this wear significantly degrades excavation efficiency. Therefore, it is important to know the appropriate time for disc cutter replacement. This study proposes a predictive model that uses machine learning algorithms to determine whether a disc cutter should be replaced. To do this, shield TBM machine data that are highly correlated with disc cutter wear, together with disc cutter replacement records from a shield TBM site where construction has already been completed, are used as the input data of the model. The algorithms used in the study were the support vector machine, the k-nearest neighbor algorithm, and the decision tree algorithm, all of which are classification methods used in machine learning. To construct an optimal predictive model and to evaluate its performance, classification performance evaluation indices were compared and analyzed.

Development of Interactive Content Services through an Intelligent IoT Mirror System (지능형 IoT 미러 시스템을 활용한 인터랙티브 콘텐츠 서비스 구현)

  • Jung, Wonseok;Seo, Jeongwook
    • Journal of Advanced Navigation Technology / Vol. 22, No. 5 / pp. 472-477 / 2018
  • In this paper, we develop interactive content services for preventing depression in users through an intelligent Internet of Things (IoT) mirror system. For the interactive content services, an IoT mirror device measures attention and meditation data from an EEG headset device and also measures facial expression data such as "sad", "angry", "disgust", "neutral", "happy", and "surprise", classified by a multi-layer perceptron algorithm through a webcam. It then sends the measured data to a oneM2M-compliant IoT server. Based on the data collected in the IoT server, a machine learning model is built to classify three levels of depression (RED, YELLOW, and GREEN) given by a proposed merge labeling method. Experimental results verified that the k-nearest neighbor (k-NN) model could achieve about 93% accuracy. In addition, according to the classified level, a social network service agent sent a corresponding alert message to family, friends, and social workers. Thus, we were able to provide an interactive content service between users and caregivers.

Acoustic Emission Source Characterization and Fracture Behavior of Finite-width Plate with a Circular Hole Defect using Artificial Neural Network (인공신경회로망을 이용한 원공결함을 갖는 유한 폭 판재의 음향방출 음원특성과 파괴거동에 관한 연구)

  • Rhee, Zhang-Kyu;Woo, Chang-Ki
    • Transactions of the Korean Society of Machine Tool Engineers / Vol. 18, No. 2 / pp. 170-177 / 2009
  • The objective of this study is to evaluate the acoustic emission (AE) source characterization and fracture behavior of SM45C steel using a back-propagation neural network (BPN). In our previous research on the k-nearest neighbor classifier (k-NNC), Ref. [8], we used the K-means clustering method as an unsupervised learning method to obtain multivariate AE main data sets, such as AE counts, energy, amplitude, rise time, duration, and counts to peak. Similarly, we applied k-NNC and BPN as supervised learning methods to obtain multivariate AE working data sets. With respect to the error of convergence, the determinant criterion Wilks' $\lambda$ and the heuristic criteria D&B(Rij) and Tou values are discussed. The results showed that, for signals in k-NNC before or at the time a fracture signal is detected, some empty classes are produced in BPN. We also confirmed that trouble in AE signal processing could be avoided if a suitable error of convergence or an acceptable encoding error is given to the BPN.

Acoustic Emission Source Classification of Finite-width Plate with a Circular Hole Defect using k-Nearest Neighbor Algorithm (k-최근접 이웃 알고리즘을 이용한 원공결함을 갖는 유한 폭 판재의 음향방출 음원분류에 대한 연구)

  • Rhee, Zhang-Kyu;Oh, Jin-Soo
    • Journal of the Korea Safety Management & Science / Vol. 11, No. 1 / pp. 27-33 / 2009
  • The study of material fracture is attracting interest in the nuclear and aerospace industries from the viewpoint of safety. Acoustic emission (AE) is a non-destructive testing method and a new technology for evaluating the safety of structures. Continuing from previous research, all tensile tests on the pre-defected coupons were performed using a universal testing machine whose crosshead moved at a constant speed of 5 mm/min. This study evaluates the AE source characterization of SM45C steel using a k-nearest neighbor classifier (k-NNC). For this, we used K-means clustering as an unsupervised learning method to obtain multivariate AE main data sets, and we applied k-NNC as a supervised learning pattern recognition algorithm to obtain multivariate AE working data sets. As a result, the criteria of Wilks' $\lambda$, D&B(Rij), and Tou are discussed.
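
The supervised step mentioned in this abstract, a k-nearest neighbor classifier applied to labeled feature vectors, can be sketched as a majority vote among the k closest training points. This is a generic illustration, not the authors' implementation: the two-feature vectors and the labels `"A"` and `"B"` are hypothetical stand-ins for AE features and for class labels such as those a prior clustering step might produce.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points, using Euclidean distance."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled feature vectors (for example, amplitude and duration).
train_X = [(40, 10), (42, 12), (41, 11), (80, 50), (82, 55), (79, 52)]
train_y = ["A", "A", "A", "B", "B", "B"]

pred_low = knn_predict(train_X, train_y, (43, 13))
pred_high = knn_predict(train_X, train_y, (78, 49))
```

In practice the features would usually be scaled before computing distances, since a feature with a large numeric range otherwise dominates the vote.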

Interference Elimination Method of Ultrasonic Sensors Using K-Nearest Neighbor Algorithm (KNN 알고리즘을 활용한 초음파 센서 간 간섭 제거 기법)

  • Im, Hyungchul;Lee, Seongsoo
    • Journal of IKEEE / Vol. 26, No. 2 / pp. 169-175 / 2022
  • This paper introduces an interference elimination method that uses the k-nearest neighbor (KNN) algorithm to reduce interference between ultrasonic sensors for precise distance estimation. Conventional methods compare the current distance measurement with previous distance measurements. If the difference exceeds a threshold, they treat the measurement as interference and exclude it, but they often suffer from imprecise distance prediction. The KNN algorithm classifies input values measured by multiple ultrasonic sensors and predicts outputs with high accuracy. Distance measurement experiments are conducted in settings where interference frequently occurs among multiple ultrasonic sensors of the same type, and the results show that the KNN algorithm significantly reduces distance prediction errors. The results also show that the prediction performance of the KNN algorithm is superior to that of conventional voting methods.
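
The conventional threshold rule that this abstract uses as its baseline can be sketched as follows. The abstract does not specify which previous measurements are compared, so comparing against the last accepted reading is an assumption, as are the function name, the threshold value, and the sample readings.

```python
def reject_interference(readings, threshold):
    """Drop any reading that differs from the last accepted reading by more
    than `threshold`, treating it as interference."""
    accepted = []
    for value in readings:
        if not accepted or abs(value - accepted[-1]) <= threshold:
            accepted.append(value)
    return accepted

# Hypothetical distance readings in cm; 30 and 250 stand in for interference.
readings = [100, 101, 30, 102, 103, 250, 104]
filtered = reject_interference(readings, threshold=5)
```

A rule of this kind depends on the first reading being valid and on the true distance changing slowly relative to the threshold, which may be one reason such methods can be imprecise.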