• Title/Summary/Keyword: classifiers

Genetic classification of various familial relationships using the stacking ensemble machine learning approaches

  • Su Jin Jeong;Hyo-Jung Lee;Soong Deok Lee;Ji Eun Park;Jae Won Lee
    • Communications for Statistical Applications and Methods
    • /
    • v.31 no.3
    • /
    • pp.279-289
    • /
    • 2024
  • Familial searching is a useful technique in a forensic investigation. Using genetic information, it is possible to identify individuals, determine familial relationships, and obtain racial/ethnic information. The total number of shared alleles (TNSA) and likelihood ratio (LR) methods have traditionally been used, and novel data-mining classification methods have recently been applied here as well. However, it is difficult to apply these methods to identify familial relationships above the third degree (e.g., uncle-nephew and first cousins). Therefore, we propose to apply a stacking ensemble machine learning algorithm to improve the accuracy of familial relationship identification. Using real data analysis, we obtain superior relationship identification results when applying meta-classifiers with a stacking algorithm rather than applying traditional TNSA or LR methods and data mining techniques.

A Comparative Study on Neural Network Classifiers for Neutron-Type Security Device (중성자 보안검색 장치를 위한 신경망 분류기 비교 연구)

  • Choi, Chang-Rak;Kim, Ji-Soo;Kim, Soo-Hyung;Sim, Cheul-Muu
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2007.11a
    • /
    • pp.3-6
    • /
    • 2007
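  • Korea currently depends heavily on nuclear power generation, and its technology in this field is excellent. However, the technology for developing explosive-detection systems based on neutron spectra remains underdeveloped. In this paper, we apply neural networks to a system that classifies neutron spectrum patterns from the Korea Atomic Energy Research Institute. We implemented two neural networks that differ in how their data were acquired, and analyzed the results. The first analyzes the neutron spectrum around three elements that are abundant in explosives: C (carbon), N (nitrogen), and O (oxygen). The second is built on data acquired over the entire range of the neutron spectrum, and its recognition rate was measured. In the experiments, the former achieved a recognition rate of 62.5% and the latter 83.48%.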

A sensitivity analysis of machine learning models on fire-induced spalling of concrete: Revealing the impact of data manipulation on accuracy and explainability

  • Mohammad K. al-Bashiti;M.Z. Naser
    • Computers and Concrete
    • /
    • v.33 no.4
    • /
    • pp.409-423
    • /
    • 2024
  • Using an extensive database, a sensitivity analysis across fifteen machine learning (ML) classifiers was conducted to evaluate the impact of various data manipulation techniques, evaluation metrics, and explainability tools. The results of this sensitivity analysis reveal that the examined models can achieve an accuracy ranging from 72-93% in predicting the fire-induced spalling of concrete and denote the light gradient boosting machine, extreme gradient boosting, and random forest algorithms as the best-performing models. Among such models, the six key factors influencing spalling were maximum exposure temperature, heating rate, compressive strength of concrete, moisture content, silica fume content, and the quantity of polypropylene fiber. Our analysis also documents some conflicting results observed with the deep learning model. As such, this study highlights the necessity of selecting suitable models and carefully evaluating the presence of possible outcome biases.

L1-penalized AUC-optimization with a surrogate loss

  • Hyungwoo Kim;Seung Jun Shin
    • Communications for Statistical Applications and Methods
    • /
    • v.31 no.2
    • /
    • pp.203-212
    • /
    • 2024
  • The area under the ROC curve (AUC) is one of the most common criteria used to measure the overall performance of binary classifiers for a wide range of machine learning problems. In this article, we propose an L1-penalized AUC-optimization classifier that directly maximizes the AUC for high-dimensional data. To this end, we employ an AUC-consistent surrogate loss function and combine it with the L1-norm penalty, which enables us to estimate coefficients and select informative variables simultaneously. In addition, we develop an efficient optimization algorithm for the proposed method by adopting k-means clustering and proximal gradient descent, which offers computational advantages. Numerical simulation studies demonstrate that the proposed method shows promising performance in terms of prediction accuracy, variable selectivity, and computational cost.
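To make the two ingredients named in this abstract concrete, here is a minimal Python sketch of the empirical AUC (the quantity being maximized) and of soft-thresholding, which is the standard proximal operator for an L1 penalty inside proximal gradient descent. It is an illustration under assumptions, not the authors' algorithm: the surrogate loss and the k-means step are not reproduced, and the function names and toy numbers are hypothetical.

```python
def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def soft_threshold(weights, lam):
    """Proximal operator of lam * ||w||_1: shrink each weight toward zero by lam."""
    shrunk = []
    for w in weights:
        if w > lam:
            shrunk.append(w - lam)
        elif w < -lam:
            shrunk.append(w + lam)
        else:
            shrunk.append(0.0)  # small coefficients are set exactly to zero
    return shrunk


# Toy data (hypothetical): four scored examples and three coefficients.
auc = empirical_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
shrunk = soft_threshold([0.5, -0.05, -0.3], 0.1)
```

The exact zeros produced by `soft_threshold` are what allow an L1-penalized method to select variables while estimating coefficients.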

Bankruptcy prediction using an improved bagging ensemble (개선된 배깅 앙상블을 활용한 기업부도예측)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.121-139
    • /
    • 2014
  • Predicting corporate failure has been an important topic in accounting and finance. The costs associated with bankruptcy are high, so the accuracy of bankruptcy prediction is of great importance to financial institutions. Many researchers have addressed bankruptcy prediction over the past three decades. The current research attempts to use ensemble models to improve the performance of bankruptcy prediction. Ensemble classification combines individually trained classifiers in order to obtain more accurate predictions than individual models. Ensemble techniques have been shown to be very useful for improving the generalization ability of a classifier. Bagging is the most commonly used method for constructing ensemble classifiers. In bagging, different training data subsets are randomly drawn with replacement from the original training dataset, and base classifiers are trained on the different bootstrap samples. Instance selection selects critical instances while removing irrelevant and harmful instances from the original set. Instance selection and bagging are both well known in data mining. However, few studies have dealt with the integration of instance selection and bagging. This study proposes an improved bagging ensemble based on instance selection using genetic algorithms (GA) to improve the performance of SVM. GA is an efficient optimization procedure based on the theory of natural selection and evolution. GA uses the idea of survival of the fittest by progressively accepting better solutions to the problem. GA searches by maintaining a population of solutions from which better solutions are created, rather than making incremental changes to a single solution. The initial solution population is generated randomly and evolves into the next generation through genetic operators such as selection, crossover, and mutation. The solutions, coded as strings, are evaluated by the fitness function.
The proposed model consists of two phases: GA-based instance selection and instance-based bagging. In the first phase, GA is used to select the optimal instance subset that serves as input data for the bagging model. In this study, the chromosome is encoded as a binary string representing the instance subset. In this phase, the population size was set to 100 and the maximum number of generations to 150. We set the crossover rate and mutation rate to 0.7 and 0.1, respectively. We used the prediction accuracy of the model as the fitness function of the GA. The SVM model is trained on the training data set using the selected instance subset. The prediction accuracy of the SVM model on the test data set is used as the fitness value in order to avoid overfitting. In the second phase, we used the optimal instance subset selected in the first phase as input data for the bagging model. We used the SVM model as the base classifier for the bagging ensemble. Majority voting was used as the combining method. This study applies the proposed model to the bankruptcy prediction problem using a real data set from Korean companies. The research data contain 1832 externally non-audited firms, comprising 916 bankrupt and 916 non-bankrupt cases. Financial ratios categorized as stability, profitability, growth, activity, and cash flow were investigated through literature review and basic statistical methods, and we selected 8 financial ratios as the final input variables. We separated the whole data set into three subsets: training, test, and validation. We compared the proposed model with several comparative models, including the simple individual SVM model, the simple bagging model, and the instance-selection-based SVM model. McNemar tests were used to examine whether the proposed model significantly outperforms the other models. The experimental results show that the proposed model outperforms the other models.
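The bagging step described above (bootstrap samples drawn with replacement, one base classifier per sample, majority voting) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base learner is a hypothetical one-dimensional threshold rule standing in for the SVM, the GA-based instance selection phase is omitted, and the toy data are invented.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (one bootstrap replicate)."""
    return [rng.choice(data) for _ in data]


def train_threshold(sample):
    """Hypothetical stand-in for the SVM base learner: a 1-D threshold
    placed midway between the two class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:
        only = 1 if pos else 0  # degenerate replicate: only one class was drawn
        return lambda x: only
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    mid = (mean_pos + mean_neg) / 2
    if mean_pos >= mean_neg:
        return lambda x: 1 if x > mid else 0
    return lambda x: 0 if x > mid else 1


def bagging_predict(models, x):
    """Combine the base classifiers by majority voting."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): one feature, label 1 = bankrupt, 0 = non-bankrupt.
rng = random.Random(0)
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (1.6, 1), (1.7, 1), (1.8, 1), (1.9, 1)]
# An odd number of models avoids tied votes in the binary case.
models = [train_threshold(bootstrap_sample(data, rng)) for _ in range(11)]
```

In the two-phase design of the abstract, `data` would be the instance subset chosen by the GA rather than the full training set.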

Skin Color Detection Using Partially Connected Multi-layer Perceptron of Two Color Models (두 칼라 모델의 부분연결 다층 퍼셉트론을 사용한 피부색 검출)

  • Kim, Sung-Hoon;Lee, Hyon-Soo
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.46 no.3
    • /
    • pp.107-115
    • /
    • 2009
  • Skin color detection classifies input pixels into skin and non-skin areas, and it requires a classifier with a high classification rate. In previous work, most classifiers used a single color model for skin color detection. However, the classification rate can be increased by using more than one color model, because skin color is distributed differently in different color models. The MLP has also been investigated as a more efficient classifier with fewer parameters than other classifiers. However, using two color models increases the input dimension and the number of parameters the MLP requires, and the additional parameters greatly lengthen the MLP's learning time. In this paper, we propose an MLP-based classifier that uses two color models with fewer parameters. The proposed partially connected MLP based on two color models can reduce the number of weights and improve the classification rate, because the characteristics of each color model can be learned in a separate partial network. In our experiments, we obtained a 91.8% classification rate when testing various images in the RGB and CbCr models.

Human Walking Detection and Background Noise Classification by Deep Neural Networks for Doppler Radars (사람 걸음 탐지 및 배경잡음 분류 처리를 위한 도플러 레이다용 딥뉴럴네트워크)

  • Kwon, Jihoon;Ha, Seoung-Jae;Kwak, Nojun
    • The Journal of Korean Institute of Electromagnetic Engineering and Science
    • /
    • v.29 no.7
    • /
    • pp.550-559
    • /
    • 2018
  • The effectiveness of deep neural networks (DNNs) for detection and classification of micro-Doppler signals generated by human walking and background noise sources is investigated. Previous research included a complex process for extracting meaningful features that directly affect classifier performance, and this feature extraction is based on experiences and statistical analysis. However, because a DNN gradually reconstructs and generates features through a process of passing layers in a network, the preprocess for feature extraction is not required. Therefore, binary classifiers and multiclass classifiers were designed and analyzed in which multilayer perceptrons (MLPs) and DNNs were applied, and the effectiveness of DNNs for recognizing micro-Doppler signals was demonstrated. Experimental results showed that, in the case of MLPs, the classification accuracies of the binary classifier and the multiclass classifier were 90.3% and 86.1%, respectively, for the test dataset. In the case of DNNs, the classification accuracies of the binary classifier and the multiclass classifier were 97.3% and 96.1%, respectively, for the test dataset.

Comparison between Hyperspectral and Multispectral Images for the Classification of Coniferous Species (침엽수종 분류를 위한 초분광영상과 다중분광영상의 비교)

  • Cho, Hyunggab;Lee, Kyu-Sung
    • Korean Journal of Remote Sensing
    • /
    • v.30 no.1
    • /
    • pp.25-36
    • /
    • 2014
  • Multispectral image classification of individual tree species is often difficult because of the spectral similarity among species. In this study, we analyzed the suitability of hyperspectral images for classifying coniferous tree species. Several image sets and classification methods were applied, and the classification results were compared with those from a multispectral image. Two airborne hyperspectral images (AISA, CASI) were obtained over the study area in the Gwangneung National Forest. For comparison, an ETM+ multispectral image with lower spectral resolution was simulated from the hyperspectral images. We also used transformed hyperspectral data to reduce the data volume for classification. Three supervised classification schemes (SAM, SVM, MLC) were applied to thirteen image sets. Overall, the hyperspectral images provided higher accuracies than the multispectral image in discriminating coniferous species. The AISA-dual image, which includes additional SWIR spectral bands, showed the best result compared with the other hyperspectral images, which include only visible and NIR bands. Furthermore, the MNF-transformed hyperspectral image provided higher classification accuracies than the full-band and other band-reduced data. Among the three classifiers, MLC showed higher classification accuracy than the SAM and SVM classifiers.
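Of the three schemes named in this abstract, SAM (spectral angle mapper) is the simplest to illustrate: a pixel is assigned to the reference spectrum with which it forms the smallest angle, which makes the decision insensitive to overall brightness. The Python sketch below assumes invented three-band reference spectra and hypothetical class names; it is not the study's processing chain.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("spectra must be non-zero")
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # guard against rounding
    return math.acos(cos)


def sam_classify(pixel, references):
    """Return the name of the reference spectrum with the smallest angle."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))


# Hypothetical reference spectra (three bands) for two species.
references = {"pine": [0.1, 0.4, 0.5], "fir": [0.5, 0.4, 0.1]}
# A brighter pixel with the same spectral shape as the "pine" reference.
label = sam_classify([0.2, 0.8, 1.0], references)
```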

A Hybrid Under-sampling Approach for Better Bankruptcy Prediction (부도예측 개선을 위한 하이브리드 언더샘플링 접근법)

  • Kim, Taehoon;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.173-190
    • /
    • 2015
  • The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on appropriate data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance problems. The approach proposed in this study is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN can effectively eliminate outliers, while OCSVM contributes to the selection of informative training samples from majority class data. To validate our proposed approach, we have applied it to data from H Bank's non-external auditing companies in Korea, and compared the performances of the classifiers with the proposed under-sampling and random sampling data. The empirical results show that the proposed under-sampling approach generally improves the accuracy of classifiers, such as logistic regression, discriminant analysis, decision tree, and support vector machines. They also show that the proposed under-sampling approach reduces the risk of false negative errors, which lead to higher misclassification costs.
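The outlier-removal role that this abstract assigns to k-RNN can be illustrated by counting reverse nearest neighbors: a sample that appears in almost no other sample's k-nearest-neighbor list is isolated. The Python sketch below is written under assumptions, since the abstract gives neither the distance measure nor the cutoff; Euclidean distance and the toy points are hypothetical choices, and the OCSVM stage is not shown.

```python
import math


def reverse_neighbor_counts(points, k):
    """For each point, count how many other points list it among their k nearest neighbors."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Indices of all other points, ordered by Euclidean distance from point i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            counts[j] += 1
    return counts


# Toy majority-class sample (hypothetical): four clustered points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
counts = reverse_neighbor_counts(points, k=2)
# Points that are nobody's neighbor are candidate outliers.
outliers = [p for p, c in zip(points, counts) if c == 0]
```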

Binary Tree Architecture Design for Support Vector Machine Using Dynamic Time Warping (DTW를 이용한 SVM 기반 이진트리 구조 설계)

  • Kang, Youn Joung;Lee, Jaeil;Bae, Jinho;Lee, Seung Woo;Lee, Chong Hyun
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.6
    • /
    • pp.201-208
    • /
    • 2014
  • In this paper, we propose a classifier structure design algorithm using dynamic time warping (DTW). The proposed algorithm uses DTW results to design a binary tree architecture based on SVMs (SVM-BTA) that classifies multi-class data. The binary tree architecture is designed using a threshold criterion calculated from the column sums of a square matrix whose components are the reference data from each class. To evaluate the performance of the proposed algorithm, we compare it with classifiers whose binary tree structures are designed based on a database and on the k-means algorithm. The data used for classification are 333 signals from 18 classes of underwater transient noise. The proposed classifier improves classification performance compared with the classifier designed from the database, and it improves the probability of detection for non-biological transient signals compared with the classifiers using the k-means algorithm. The proposed SVM-BTA correctly classified 68.77% of biological sound (BO), 92.86% of chain (CHAN) mechanical sound, and 100% for six of the other classes.
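For readers unfamiliar with DTW, the following Python sketch shows the classic dynamic-programming distance and a square matrix of pairwise distances between per-class reference signals, together with its column sums. It is a sketch under assumptions: the absolute difference as local cost, the toy signals, and the helper names are hypothetical, and the paper's threshold criterion and tree construction are not reproduced.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def pairwise_dtw(signals):
    """Square matrix of DTW distances between all reference signals."""
    return [[dtw_distance(s, t) for t in signals] for s in signals]


# Hypothetical reference signals, one per class.
refs = [[1, 2, 3], [1, 2, 2, 3], [5, 5, 5]]
matrix = pairwise_dtw(refs)
# Column sums: a large sum marks a class that is far from all the others.
column_sums = [sum(row[j] for row in matrix) for j in range(len(refs))]
```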