• Title/Summary/Keyword: supervised training

Search Results: 313

Software Fault Prediction using Semi-supervised Learning Methods (세미감독형 학습 기법을 사용한 소프트웨어 결함 예측)

  • Hong, Euyseok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.3 / pp.127-133 / 2019
  • Most studies of software fault prediction have focused on supervised learning models that use only labeled training data. Although supervised learning usually shows high prediction performance, most development groups do not have sufficient labeled data. Unsupervised learning models that use only unlabeled data for training are difficult to build and show poor performance. Semi-supervised learning models that use both labeled and unlabeled data can solve these problems. Among semi-supervised techniques, the self-training technique requires the fewest assumptions and constraints. In this paper, we implemented several models using self-training algorithms and evaluated them with Accuracy and AUC. As a result, YATSI showed the best performance.
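
The self-training procedure referenced in this abstract is simple to prototype. The sketch below is a minimal, hedged illustration of generic self-training (it is not the paper's implementation, and the YATSI algorithm it evaluates is not reproduced): a base classifier is fit on the labeled data, its most confident predictions on unlabeled data become pseudo-labels, and the loop repeats. The base learner and the confidence_threshold parameter are illustrative assumptions.

    # Minimal self-training sketch (illustrative only; not the paper's YATSI model).
    # Assumes scikit-learn; X_lab/y_lab are labeled fault data, X_unlab is unlabeled.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_lab, y_lab, X_unlab, confidence_threshold=0.9, max_rounds=10):
        X_lab, y_lab = np.asarray(X_lab), np.asarray(y_lab)
        X_unlab = np.asarray(X_unlab)
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        for _ in range(max_rounds):
            if len(X_unlab) == 0:
                break
            proba = clf.predict_proba(X_unlab)
            confident = proba.max(axis=1) >= confidence_threshold
            if not confident.any():
                break
            # Promote confident predictions to pseudo-labels and grow the labeled set.
            X_lab = np.vstack([X_lab, X_unlab[confident]])
            y_lab = np.concatenate([y_lab, clf.classes_[proba[confident].argmax(axis=1)]])
            X_unlab = X_unlab[~confident]
            clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        return clf

The resulting model can then be scored with Accuracy and AUC on a held-out labeled test set, as in the abstract's evaluation.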

Semi-supervised Software Defect Prediction Model Based on Tri-training

  • Meng, Fanqi;Cheng, Wenying;Wang, Jingdong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.11 / pp.4028-4042 / 2021
  • To address the difficulty of software defect prediction caused by insufficient labeled defect samples and class imbalance, a semi-supervised software defect prediction model based on the Tri-training algorithm is proposed, combining feature normalization, oversampling, and Tri-training. First, feature normalization is used to smooth the feature data and eliminate the influence of very large or very small feature values on the model's classification performance. Second, oversampling is used to expand the data, which addresses the class imbalance of the labeled samples. Finally, the Tri-training algorithm learns from the training samples and establishes a defect prediction model. The novelty of this model is that it effectively combines feature normalization, oversampling, and the Tri-training algorithm to solve both the scarcity of labeled samples and the class imbalance problem. Simulation experiments using the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning methods in terms of Precision, Recall, and F-Measure.
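
The pipeline this abstract outlines (normalization, oversampling, then Tri-training) can be wired together roughly as below. This is a hedged, simplified outline, not the authors' code: min-max scaling stands in for the paper's feature normalization, SMOTE (from the imbalanced-learn package) for its oversampling step, and a stripped-down Tri-training loop with three decision trees for the full algorithm.

    # Simplified Tri-training pipeline sketch (illustrative; not the paper's implementation).
    # Assumes scikit-learn and imbalanced-learn (pip install imbalanced-learn).
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.utils import resample
    from imblearn.over_sampling import SMOTE

    def tri_train(X_lab, y_lab, X_unlab, rounds=5):
        # Step 1: feature normalization smooths very large or very small feature values.
        scaler = MinMaxScaler().fit(np.vstack([X_lab, X_unlab]))
        X_lab, X_unlab = scaler.transform(X_lab), scaler.transform(X_unlab)
        # Step 2: oversample the minority (defective) class in the labeled set.
        X_lab, y_lab = SMOTE().fit_resample(X_lab, y_lab)
        # Step 3: three classifiers trained on bootstrap replicas of the labeled data.
        clfs = [DecisionTreeClassifier().fit(*resample(X_lab, y_lab)) for _ in range(3)]
        for _ in range(rounds):
            for i in range(3):
                others = [c for j, c in enumerate(clfs) if j != i]
                p1, p2 = (c.predict(X_unlab) for c in others)
                agree = p1 == p2  # pseudo-label where the other two classifiers agree
                if agree.any():
                    clfs[i] = DecisionTreeClassifier().fit(
                        np.vstack([X_lab, X_unlab[agree]]),
                        np.concatenate([y_lab, p1[agree]]))
        return clfs

    def predict_majority(clfs, X):
        # Final prediction is the majority vote of the three classifiers.
        votes = np.stack([c.predict(X) for c in clfs])
        return np.apply_along_axis(lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)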

Supervised Classification Using Training Parameters and Prior Probability Generated from VITD - The Case of QuickBird Multispectral Imagery

  • Eo, Yang-Dam;Lee, Gyeong-Wook;Park, Doo-Youl;Park, Wang-Yong;Lee, Chang-No
    • Korean Journal of Remote Sensing / v.24 no.5 / pp.517-524 / 2008
  • To classify satellite imagery into geospatial features of interest, a supervised classifier must be trained to distinguish these features through training samples. However, even for the same imagery, different classification results can be produced depending on the operator's experience and expertise in the training process. Users who apply classification results in their own applications need consistent results as well as improved accuracy. The experiment examines classification results when VITD polygons are used as prior probabilities and training parameters instead of manual sampling. The results show that classification using VITD polygons as prior probabilities achieves the highest accuracy among the methods compared. Training derived from unsupervised classification with VITD produced classification results similar to manual training and/or training with prior probabilities.
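
The central idea, feeding class prior probabilities derived from existing VITD polygons into a supervised classifier rather than relying solely on manually sampled training data, can be illustrated with a generic maximum likelihood decision rule with class priors, which is one common way priors enter supervised classification of satellite imagery. The sketch below shows only that generic rule, not the authors' QuickBird workflow; the priors dictionary (for example, area fractions of the VITD polygons) is an assumption for illustration.

    # Generic maximum likelihood classification with class priors (illustrative sketch).
    import numpy as np
    from scipy.stats import multivariate_normal

    def ml_classify(pixels, class_samples, priors):
        """pixels: (n, bands) array; class_samples: {class: (m, bands) training spectra};
        priors: {class: prior probability, e.g. derived from VITD polygon areas}."""
        classes = sorted(class_samples)
        scores = []
        for c in classes:
            samples = np.asarray(class_samples[c])
            mu, cov = samples.mean(axis=0), np.cov(samples, rowvar=False)
            # Score proportional to the posterior: class-conditional likelihood times prior.
            scores.append(multivariate_normal(mu, cov, allow_singular=True).logpdf(pixels)
                          + np.log(priors[c]))
        return np.array(classes)[np.argmax(np.stack(scores), axis=0)]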

Semi-supervised Model for Fault Prediction using Tree Methods (트리 기법을 사용하는 세미감독형 결함 예측 모델)

  • Hong, Euyseok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.4 / pp.107-113 / 2020
  • A number of studies have been conducted on predicting software faults, but most of them have been supervised models that use labeled data as training data. Very few studies have examined unsupervised models that use only unlabeled data, or semi-supervised models that use plentiful unlabeled data together with a small amount of labeled data. In this paper, we built new semi-supervised models that use tree algorithms within the self-training technique. In the model performance evaluation experiment, the newly created tree models performed better than the existing models, and CollectiveWoods in particular outperformed the other models. In addition, it showed very stable performance even when very little labeled data was available.
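
For the tree-based self-training models discussed here, an off-the-shelf wrapper gives a quick baseline to compare against. The sketch below is only a hedged approximation: it drops a random forest (standing in for the paper's tree methods; CollectiveWoods itself is a collective-classification algorithm and is not reproduced) into scikit-learn's generic self-training wrapper, which expects unlabeled rows to be marked with -1.

    # Hedged baseline: a tree ensemble inside a generic self-training wrapper.
    # This approximates, but is not, the paper's tree-based models.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.semi_supervised import SelfTrainingClassifier

    def fit_tree_self_training(X, y_with_unlabeled_as_minus1):
        base = RandomForestClassifier(n_estimators=100, random_state=0)
        model = SelfTrainingClassifier(base, threshold=0.9)
        # Rows whose label is -1 are treated as unlabeled and pseudo-labeled iteratively.
        return model.fit(X, y_with_unlabeled_as_minus1)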

Supervised Rank Normalization with Training Sample Selection (학습 샘플 선택을 이용한 교사 랭크 정규화)

  • Heo, Gyeongyong;Choi, Hun;Youn, Joo-Sang
    • Journal of the Korea Society of Computer and Information / v.20 no.1 / pp.21-28 / 2015
  • Feature normalization as a pre-processing step has been widely used to reduce the effect of different scales across feature dimensions and to lower the classification error rate. Most existing normalization methods, however, do not use the class labels of data points and, as a result, do not guarantee optimality of the normalization from a classification standpoint. A supervised rank normalization method, a combination of rank normalization and a supervised learning technique, was previously proposed and demonstrated better results than other methods. In this paper, another technique, training sample selection, is introduced into supervised rank normalization to further reduce classification error. Training sample selection is a common technique for increasing classification accuracy by removing noisy samples and can be applied to the supervised normalization method. Two sample selection measures, based on the classes of neighboring samples and on the distance to neighboring samples, were proposed, and both showed better results than the previous supervised rank normalization method.
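
To make the two ingredients concrete, the sketch below shows (a) plain rank normalization, which replaces each feature value by its within-column rank, and (b) a simple neighbor-based training sample selection step that drops points whose k nearest neighbors mostly belong to another class. This is a hedged illustration of the general ideas only; the paper's supervised rank normalization and its two specific selection measures are not reproduced, and the k and min_agreement parameters are assumptions.

    # Illustrative sketch: rank normalization plus neighbor-based sample selection.
    import numpy as np
    from scipy.stats import rankdata
    from sklearn.neighbors import NearestNeighbors

    def rank_normalize(X):
        # Replace each feature value by its within-column rank, scaled to (0, 1].
        X = np.asarray(X, dtype=float)
        return np.apply_along_axis(rankdata, 0, X) / X.shape[0]

    def select_training_samples(X, y, k=5, min_agreement=0.6):
        # Keep a sample only if enough of its k nearest neighbors share its class.
        X, y = np.asarray(X), np.asarray(y)
        _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
        neighbor_labels = y[idx[:, 1:]]              # drop the point itself
        agreement = (neighbor_labels == y[:, None]).mean(axis=1)
        keep = agreement >= min_agreement
        return X[keep], y[keep]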

Improve the Performance of Semi-Supervised Side-channel Analysis Using HWFilter Method

  • Hong Zhang;Lang Li;Di Li
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.3 / pp.738-754 / 2024
  • Side-channel analysis (SCA) is a cryptanalytic technique that exploits physical leakages, such as power consumption or electromagnetic emanations, from cryptographic devices to extract the secret keys used in cryptographic algorithms. Recent studies have shown that training SCA models with semi-supervised learning can effectively overcome the problem of having few labeled power traces. However, training SCA models with semi-supervised learning generates many pseudo-labels, and some of these pseudo-labels can reduce the model's performance. To solve this issue, we propose the HWFilter method to improve semi-supervised SCA. This method uses a Hamming Weight Pseudo-label Filter (HWPF) to filter the pseudo-labels generated by the semi-supervised SCA model, which enhances the model's performance. Furthermore, we introduce a normal-distribution method for constructing the HWPF: the Hamming weights (HWs) of power traces are obtained from the normal distribution of power points, then filtered and combined into an HWPF. The HWFilter was tested on the ASCADv1 database and the AES_HD dataset. The experimental results demonstrate that the HWFilter method can significantly enhance the performance of semi-supervised SCA models. On the ASCADv1 database, the model with HWFilter requires only 33 power traces to recover the key. On the AES_HD dataset, the model with HWFilter outperforms the current best semi-supervised SCA model by 12%.
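
The filtering idea, keeping a pseudo-labeled power trace only when the Hamming weight implied by its pseudo-label matches the Hamming weight estimated from the trace itself, can be sketched in a few lines. The example below is a deliberately simplified stand-in: estimate_hw_from_trace is a placeholder for the paper's normal-distribution-based HW estimation, and the actual HWPF construction is not reproduced.

    # Simplified Hamming-weight pseudo-label filtering (illustrative stand-in only).
    import numpy as np

    def hamming_weight(byte_values):
        # Number of set bits (0..8) of each byte value.
        bits = np.unpackbits(np.asarray(byte_values, dtype=np.uint8)[:, None], axis=1)
        return bits.sum(axis=1)

    def hw_filter(traces, pseudo_labels, estimate_hw_from_trace):
        # estimate_hw_from_trace is a placeholder for the paper's normal-distribution method.
        traces = np.asarray(traces)
        pseudo_labels = np.asarray(pseudo_labels)
        label_hw = hamming_weight(pseudo_labels)
        trace_hw = np.array([estimate_hw_from_trace(t) for t in traces])
        keep = label_hw == trace_hw   # keep only traces whose pseudo-label HW agrees
        return traces[keep], pseudo_labels[keep]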

Supervised Learning Artificial Neural Network Parameter Optimization and Activation Function Basic Training Method using Spreadsheets (스프레드시트를 활용한 지도학습 인공신경망 매개변수 최적화와 활성화함수 기초교육방법)

  • Hur, Kyeong
    • Journal of Practical Engineering Education / v.13 no.2 / pp.233-242 / 2021
  • In this paper, for a liberal-arts course for non-majors, we propose a method for teaching supervised-learning artificial neural network parameter optimization and activation function basics, as part of designing a basic artificial neural network curriculum. To this end, a method of finding the parameter optimization solution in a spreadsheet, without programming, is applied. With this training method, students can focus on the basic principles of artificial neural network operation and implementation, and the spreadsheet's visualized data can increase the interest and educational effect for non-majors. The proposed contents consist of artificial neurons with sigmoid and ReLU activation functions, supervised-learning data generation, supervised-learning artificial neural network configuration and parameter optimization, supervised-learning artificial neural network implementation and performance analysis using spreadsheets, and an education satisfaction analysis. Considering the optimization of negative parameters for the sigmoid and ReLU neuron artificial neural networks, we propose a training method around four performance analysis results on artificial neural network parameter optimization and conduct a training satisfaction analysis.
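
The spreadsheet exercises described here boil down to adjusting a single neuron's weights and bias so that its output matches labeled examples, once with a sigmoid and once with a ReLU activation. Purely as a hedged companion to the spreadsheet version, the same computation can be written in a few lines of Python; the learning rate, epoch count, and the OR-function toy data below are illustrative assumptions, not the paper's materials.

    # Single-neuron supervised learning with sigmoid or ReLU activation
    # (illustrative companion to the spreadsheet exercise).
    import numpy as np

    def train_neuron(X, y, activation="sigmoid", lr=0.1, epochs=2000):
        rng = np.random.default_rng(0)
        w, b = rng.normal(size=X.shape[1]), 0.0
        for _ in range(epochs):
            z = X @ w + b
            out = 1 / (1 + np.exp(-z)) if activation == "sigmoid" else np.maximum(0.0, z)
            grad_out = out - y                      # gradient of squared error w.r.t. output
            grad_z = grad_out * out * (1 - out) if activation == "sigmoid" \
                     else grad_out * (z > 0)        # chain rule through the activation
            w -= lr * X.T @ grad_z / len(y)         # gradient-descent parameter update
            b -= lr * grad_z.mean()
        return w, b

    # Toy supervised-learning data: the OR function.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 1], dtype=float)
    w, b = train_neuron(X, y)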

Accuracy Assessment of Supervised Classification using Training Samples Acquired by a Field Spectroradiometer: A Case Study for Kumnam-myun, Sejong City (지상 분광반사자료를 훈련샘플로 이용한 감독분류의 정확도 평가: 세종시 금남면을 사례로)

  • Shin, Jung Il;Kim, Ik Jae;Kim, Dong Wook
    • Journal of Korean Society for Geospatial Information Science / v.24 no.1 / pp.121-128 / 2016
  • Many studies focus on image data and classifiers to compare or improve classification accuracy. Studies are therefore needed on the training-sample side of supervised classification, which depends on reference data or the skill of the analyst. This study assesses the usability of field spectra as training samples for supervised classification. Classification accuracies of hyperspectral and multispectral images were assessed using training samples taken from the image itself and from field spectra, respectively. The results showed about 90% accuracy with training samples collected from the image. Using field spectra as training samples, accuracy decreased by about 10 percentage points for the hyperspectral image and about 20 percentage points for the multispectral image. In particular, some classes showed very low accuracies due to their similar spectral characteristics in the multispectral image. Therefore, field spectra can serve as training samples for classifying hyperspectral imagery, although they have limitations for multispectral imagery.
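
The comparison this abstract reports, training a supervised classifier once with samples taken from the image and once with field-measured spectra and then scoring both on the same reference pixels, is easy to express as a hedged sketch. Everything below (the SVM classifier choice, the argument names) is an assumption for illustration, and the resampling of field spectra to the sensor's band configuration, which the real workflow requires, is not shown.

    # Hedged sketch: compare image-derived vs. field-spectra training samples.
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    def compare_training_sources(img_train, img_labels, field_train, field_labels,
                                 test_pixels, test_labels):
        results = {}
        for name, (X, y) in {"image samples": (img_train, img_labels),
                             "field spectra": (field_train, field_labels)}.items():
            clf = SVC(kernel="rbf").fit(X, y)          # same classifier for both sources
            results[name] = accuracy_score(test_labels, clf.predict(test_pixels))
        return results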

Semi-supervised regression based on support vector machine

  • Seok, Kyungha
    • Journal of the Korean Data and Information Science Society / v.25 no.2 / pp.447-454 / 2014
  • In many practical machine learning and data mining applications, unlabeled training examples are readily available but labeled ones are fairly expensive to obtain. Semi-supervised learning algorithms have therefore attracted much attention. However, previous research mainly focuses on classification problems. In this paper, a semi-supervised regression method based on the support vector regression (SVR) formulation is proposed. The estimator is easily obtained via the dual formulation of the optimization problem. Experimental results with simulated and real data suggest superior performance of the proposed method compared with standard SVR.
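
The paper's estimator comes from the dual of the SVR optimization problem, which is not reproduced here. As a much simpler, hedged stand-in for using unlabeled inputs in regression, the sketch below self-trains an ordinary SVR by pseudo-labelling the unlabeled inputs with its own predictions and refitting; it illustrates the semi-supervised regression setting, not the proposed method.

    # Simple pseudo-labelling SVR baseline (a stand-in, not the paper's dual-based method).
    import numpy as np
    from sklearn.svm import SVR

    def semi_supervised_svr(X_lab, y_lab, X_unlab, rounds=3):
        model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X_lab, y_lab)
        for _ in range(rounds):
            pseudo_y = model.predict(X_unlab)        # pseudo-targets from the current model
            X_all = np.vstack([X_lab, X_unlab])
            y_all = np.concatenate([y_lab, pseudo_y])
            model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X_all, y_all)
        return model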

Automatic Text Categorization based on Semi-Supervised Learning (준지도 학습 기반의 자동 문서 범주화)

  • Ko, Young-Joong;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.35 no.5 / pp.325-334 / 2008
  • The goal of text categorization is to classify documents into a certain number of pre-defined categories. Previous studies in this area have used a large number of labeled training documents for supervised learning. One problem is that it is difficult to create the labeled training documents: while it is easy to collect unlabeled documents, it is not so easy to manually categorize them to create training documents. In this paper, we propose a new text categorization method based on semi-supervised learning. The proposed method uses only unlabeled documents and the keywords of each category, and it automatically constructs training data from them. A text classifier then learns from this data and classifies text documents. The proposed method shows a similar degree of performance to traditional supervised learning methods. Therefore, it can be used in areas where low-cost text categorization is needed, and it can also be used to create labeled training documents.
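
The bootstrapping step this abstract describes, turning category keywords plus unlabeled documents into an initial training set and then training an ordinary text classifier on it, can be sketched as follows. This is a hedged outline under simple assumptions (keyword matching for the provisional labels, TF-IDF with Naive Bayes as the classifier), not the authors' method.

    # Hedged sketch: build training data from category keywords, then train a classifier.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    def bootstrap_text_classifier(unlabeled_docs, category_keywords):
        """category_keywords: {category: [keyword, ...]}.  Documents containing a
        category's keywords become its provisional training examples."""
        texts, labels = [], []
        for doc in unlabeled_docs:
            lowered = doc.lower()
            hits = {c: sum(kw in lowered for kw in kws)
                    for c, kws in category_keywords.items()}
            best = max(hits, key=hits.get)
            if hits[best] > 0:                       # keep only documents matching a keyword
                texts.append(doc)
                labels.append(best)
        # Train a standard supervised classifier on the automatically built training set.
        return make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(texts, labels)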