• Title/Summary/Keyword: classification/prediction


Machine learning application to seismic site classification prediction model using Horizontal-to-Vertical Spectral Ratio (HVSR) of strong-ground motions

  • Francis G. Phi;Bumsu Cho;Jungeun Kim;Hyungik Cho;Yun Wook Choo;Dookie Kim;Inhi Kim
    • Geomechanics and Engineering / Vol. 37, No. 6 / pp. 539-554 / 2024
  • This study explores the development of a prediction model for seismic site classification by integrating machine learning techniques with horizontal-to-vertical spectral ratio (HVSR) methodologies. To improve model accuracy, the research employs outlier detection methods and the synthetic minority over-sampling technique (SMOTE) for data balancing, and evaluates seven machine learning models on seismic data from KiK-net. Notably, the light gradient boosting method (LGBM), gradient boosting, and decision tree models exhibit improved performance when coupled with SMOTE, while the multiple linear regression (MLR) and support vector machine (SVM) models show reduced efficacy. Outlier detection techniques significantly enhance accuracy, particularly for LGBM, gradient boosting, and voting boosting. The combination of LGBM with the isolation forest and SMOTE achieves the highest accuracy of 0.91, while LGBM with the local outlier factor yields the highest F1-score of 0.79. Consistently outperforming the other models, LGBM proves most efficient for seismic site classification when supported by appropriate preprocessing procedures. These findings show the significance of outlier detection and data balancing for precise seismic soil classification prediction, offering insights and highlighting the potential of machine learning in optimizing site classification accuracy.
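The SMOTE balancing step named in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline, which pairs SMOTE with outlier detection and LGBM on KiK-net data; the function name `smote`, the parameter `k`, and the toy points are assumptions made only for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (basic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical imbalanced data set: 6 majority points vs. 3 minority points.
majority = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4), (0.4, 0.1)]
minority = [(2.0, 2.0), (2.2, 1.9), (1.9, 2.3)]

new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority samples, the minority class grows without leaving the region it already occupies.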

A Hybrid Value Predictor using Static and Dynamic Classification in Superscalar Processors

  • 김주익;박홍준;조영일
    • 한국정보과학회논문지:시스템및이론 / Vol. 30, No. 10 / pp. 569-578 / 2003
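  • Data dependences are a major obstacle to improving instruction-level parallelism, and several recent papers have studied value prediction as a way to eliminate them. A hybrid value predictor can achieve high prediction accuracy by exploiting the strengths of several predictors, but it has the drawback of high hardware cost because the same instruction holds duplicate entries in multiple predictor tables. This paper proposes a new hybrid value predictor that achieves high performance by using static and dynamic classification information. During the fetch stage, the proposed predictor uses static classification information to assign each instruction to the appropriate predictor, which effectively reduces the table size and improves prediction accuracy. In addition, the proposed predictor uses dynamic classification to select the most suitable prediction method for instructions of the "Unknown" type, which further improves prediction accuracy. Simulations with the SimpleScalar/PISA toolset and the SPECint95 benchmark programs yielded an average prediction accuracy of 85.1% when static classification information was used, and 87.6% when both static and dynamic classification information were used.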
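The allocation idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not name the component predictors, so a last-value and a stride predictor are assumed here, and the 2-bit saturating score used for dynamic classification, the class names, and the toy trace are all hypothetical.

```python
class LastValuePredictor:
    def __init__(self):
        self.table = {}  # pc -> last committed value

    def predict(self, pc):
        return self.table.get(pc)

    def update(self, pc, value):
        self.table[pc] = value


class StridePredictor:
    def __init__(self):
        self.table = {}  # pc -> (last value, stride)

    def predict(self, pc):
        if pc not in self.table:
            return None
        last, stride = self.table[pc]
        return last + stride

    def update(self, pc, value):
        last, _ = self.table.get(pc, (value, 0))
        self.table[pc] = (value, value - last)


class HybridPredictor:
    def __init__(self, static_class):
        self.static_class = static_class  # pc -> "last" | "stride" | "unknown"
        self.predictors = {"last": LastValuePredictor(), "stride": StridePredictor()}
        self.score = {}  # pc -> per-predictor saturating score (dynamic classification)

    def _candidates(self, pc):
        kind = self.static_class.get(pc, "unknown")
        # Statically classified instructions occupy exactly one table.
        return list(self.predictors) if kind == "unknown" else [kind]

    def predict(self, pc):
        scores = self.score.get(pc, {})
        best = max(self._candidates(pc), key=lambda n: scores.get(n, 0))
        return self.predictors[best].predict(pc)

    def update(self, pc, value):
        names = self._candidates(pc)
        for name in names:
            pred = self.predictors[name]
            if len(names) > 1:  # "unknown": track which predictor has been right
                hit = pred.predict(pc) == value
                s = self.score.setdefault(pc, {})
                s[name] = max(0, min(3, s.get(name, 0) + (1 if hit else -1)))
            pred.update(pc, value)


# Hypothetical trace: a constant, a strided sequence, and an unclassified instruction.
static_class = {0x10: "last", 0x20: "stride"}
trace = [(pc, v) for i in range(8) for pc, v in ((0x10, 7), (0x20, 4 * i), (0x30, 2 * i + 1))]

hp = HybridPredictor(static_class)
hits = 0
for pc, value in trace:
    hits += hp.predict(pc) == value
    hp.update(pc, value)
print(f"{hits}/{len(trace)} correct predictions")
```

In this sketch only the "unknown" instruction at `0x30` holds entries in both tables, which mirrors the table-size saving the abstract attributes to static classification.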

Semi-supervised Software Defect Prediction Model Based on Tri-training

  • Meng, Fanqi;Cheng, Wenying;Wang, Jingdong
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 11 / pp. 4028-4042 / 2021
  • To address the difficulty of software defect prediction caused by insufficient labelled defect samples and class imbalance, a semi-supervised software defect prediction model is proposed that combines feature normalization, over-sampling, and the Tri-training algorithm. First, feature normalization is used to smooth the feature data and eliminate the influence of excessively large or small feature values on the model's classification performance. Second, oversampling is used to expand the data, which addresses the class imbalance among the labelled samples. Finally, the Tri-training algorithm learns from the training samples and establishes a defect prediction model. The novelty of this model is that it effectively combines feature normalization, oversampling, and the Tri-training algorithm to solve both the under-labelled sample and class imbalance problems. Simulation experiments using the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning methods in terms of Precision, Recall, and F-Measure.
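The Tri-training step can be sketched as follows. This is a simplified illustration, not the paper's model: the nearest-centroid base classifier, the function names, and the toy data are assumptions, and the error-rate checks that the full Tri-training algorithm applies before accepting pseudo-labels are omitted.

```python
import math
import random
from collections import Counter


def fit(samples):
    """Nearest-centroid model: class label -> mean feature vector."""
    groups = {}
    for x, y in samples:
        groups.setdefault(y, []).append(x)
    return {y: tuple(sum(c) / len(xs) for c in zip(*xs)) for y, xs in groups.items()}


def predict(model, x):
    return min(model, key=lambda y: math.dist(model[y], x))


def tri_train(labelled, unlabelled, rounds=2, seed=0):
    rng = random.Random(seed)
    # Three initial classifiers, each trained on a bootstrap sample of the labelled set.
    models = [fit([rng.choice(labelled) for _ in labelled]) for _ in range(3)]
    for _ in range(rounds):
        updated = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Pseudo-label the unlabelled points on which the other two classifiers agree.
            pseudo = [(x, predict(models[j], x)) for x in unlabelled
                      if predict(models[j], x) == predict(models[k], x)]
            updated.append(fit(labelled + pseudo))
        models = updated
    return models


def vote(models, x):
    """Final prediction: majority vote of the three classifiers."""
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]


# Hypothetical, already normalized module metrics.
labelled = [((0, 0), "clean"), ((1, 0), "clean"), ((0, 1), "clean"),
            ((10, 10), "defective"), ((9, 10), "defective"), ((10, 9), "defective")]
unlabelled = [(0.5, 0.5), (1, 1), (9.5, 9.5), (9, 9)]

models = tri_train(labelled, unlabelled)
print(vote(models, (0.2, 0.2)), vote(models, (9.8, 9.8)))
```

Each classifier is retrained on the labelled set plus the points its two peers agree on, which is how the unlabelled samples contribute to training.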

A Study of Freshman Dropout Prediction Model Using Logistic Regression with Shift-Sigmoid Classification Function

  • 김동형
    • 디지털산업정보학회논문지 / Vol. 19, No. 4 / pp. 137-146 / 2023
  • The dropout of university freshmen is a very important issue for university finances. Moreover, the dropout rate is one of the important indicators in external evaluations of universities. Therefore, universities need to identify students likely to drop out in advance and apply dropout prevention programs targeting them. This paper proposes a method for predicting such students. It selects likely dropouts by applying logistic regression with a shift sigmoid classification function, using only quantitative data from the first semester of the first year, which most universities have. Because the method is based on logistic regression and uses the shift sigmoid function as a classification function, the number of prediction subjects and the prediction accuracy can be selected. In the experiments, when the proposed algorithm was applied, the number of predicted dropout subjects varied from 100% to 20% of the actual number of dropout subjects, and the prediction accuracy ranged from 75% to 98%.
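A minimal sketch of how a shifted sigmoid trades the number of flagged students against selectivity is given here. The abstract does not define the function, so this assumes "shift sigmoid" means a logistic curve translated along the input axis; `shifted_sigmoid`, `predict_dropouts`, and the scores are hypothetical.

```python
import math


def shifted_sigmoid(z, shift=0.0):
    """Logistic function moved `shift` units along the z axis (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(z - shift)))


def predict_dropouts(scores, shift=0.0, threshold=0.5):
    """Flag a student when the shifted sigmoid of the linear score reaches the threshold."""
    return [shifted_sigmoid(z, shift) >= threshold for z in scores]


# Hypothetical linear scores w.x + b from an already fitted logistic regression.
scores = [-2.0, -0.5, 0.3, 1.2, 2.5, 4.0]

counts = {shift: sum(predict_dropouts(scores, shift)) for shift in (0.0, 1.0, 3.0)}
for shift, flagged in counts.items():
    print(f"shift={shift}: {flagged} students flagged")
```

Raising the shift flags fewer students, keeping only those with the strongest scores, which is the kind of control over the number of prediction subjects that the abstract describes.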

Promoter Classification Using Genetic Algorithm Controlled Generalized Regression Neural Network

  • 김성모;김근호;김병환
    • 대한전기학회논문지:시스템및제어부문D / Vol. 53, No. 7 / pp. 531-535 / 2004
  • A new method for constructing a classifier is presented. It combines a generalized regression neural network (GRNN) with a genetic algorithm (GA); the resulting classifier is referred to as a GA-GRNN. The GA controls the training factors simultaneously. The GA-GRNN was applied to classify four different promoter sequences. The training and test data comprised 115 and 58 sequence patterns, respectively. Classifier performance was investigated in terms of classification sensitivity and prediction accuracy. Compared to the conventional GRNN, the GA-GRNN significantly improved both the total classification sensitivity and the total prediction accuracy.
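For readers unfamiliar with the GRNN, its output is a Gaussian-kernel weighted average of the training targets. The sketch below shows that computation used as a classifier with one-hot targets. It covers only the GRNN part; the spread `sigma` is the kind of training factor a GA could tune, but no GA is implemented here, and the data and names are hypothetical.

```python
import math


def grnn_predict(x, train_x, train_y, sigma):
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    weights = [math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2)) for xi in train_x]
    total = sum(weights)
    n_out = len(train_y[0])
    return [sum(w * y[j] for w, y in zip(weights, train_y)) / total for j in range(n_out)]


def classify(x, train_x, train_y, sigma):
    """Pick the class whose one-hot output component is largest."""
    out = grnn_predict(x, train_x, train_y, sigma)
    return out.index(max(out))


# Hypothetical 2-D patterns with one-hot class targets.
train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
train_y = [[1, 0], [1, 0], [0, 1], [0, 1]]

print(classify((0.2, 0.5), train_x, train_y, sigma=1.0))
print(classify((4.8, 5.4), train_x, train_y, sigma=1.0))
```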

Software Quality Classification using Bayesian Classifier

  • 홍의석
    • 한국IT서비스학회지 / Vol. 11, No. 1 / pp. 211-221 / 2012
  • Many metric-based classification models have been proposed to predict the fault-proneness of software modules. This paper presents two prediction models using the Bayesian classifier, one of the most popular modern classification algorithms. A Bayesian model based on Bayesian probability theory can be a promising technique for software quality prediction because it can represent uncertainty using probabilities and can partly incorporate expert knowledge into the training data. The two models, Naïve Bayes (NB) and Bayesian Belief Network (BBN), are constructed, and dimensionality reduction of the training and test data is performed before model evaluation. The prediction accuracy of the models is evaluated using two prediction error measures, Type I error and Type II error, and compared with two well-known prediction models, a backpropagation neural network model and a support vector machine model. The results show that the prediction performance of the BBN model is slightly better than that of NB. For the data set with ambiguity, the BBN model's prediction accuracy is not as good as that of the compared models, but for the data set without ambiguity it achieves better performance than they do.
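The two error measures can be computed as in the sketch below. The abstract does not define them, so this assumes a common convention in fault-proneness studies: a Type I error is a not-fault-prone module flagged as fault-prone, and a Type II error is a fault-prone module that is missed. The function name and data are hypothetical.

```python
def error_rates(actual, predicted):
    """actual/predicted: lists of booleans, True = fault-prone module."""
    on_clean = [p for a, p in zip(actual, predicted) if not a]
    on_faulty = [p for a, p in zip(actual, predicted) if a]
    type1 = sum(on_clean) / len(on_clean)                  # clean modules flagged as fault-prone
    type2 = sum(not p for p in on_faulty) / len(on_faulty)  # fault-prone modules missed
    return type1, type2


# Hypothetical evaluation set: 4 fault-prone and 6 not-fault-prone modules.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

type1, type2 = error_rates(actual, predicted)
print(f"Type I error: {type1:.3f}, Type II error: {type2:.3f}")
```

Reporting the two rates separately matters when the classes are unbalanced, since a model can score well on overall accuracy while still missing most fault-prone modules.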

A Novel Thresholding for Prediction Analytics with Machine Learning Techniques

  • Shakir, Khan;Reemiah Muneer, Alotaibi
    • International Journal of Computer Science & Network Security / Vol. 23, No. 1 / pp. 33-40 / 2023
  • Machine-learning techniques are showing effective performance in data analytics. Classification and regression support prediction on different kinds of data, and various kinds of classification techniques are used depending on the nature of the data. Threshold determination is essential to building a better model for unlabelled data. In this paper, the threshold value is applied as a range, based on the min-max normalization technique, to create labels, and multiclass classification is performed on rainfall data. Binary classification is applied to autism data, and classification techniques are applied to child abuse data. The performance of each technique is analysed with evaluation metrics.
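Range-based labelling after min-max normalization can be sketched as below. The abstract gives neither the ranges nor the class names, so the equal-width bounds, the labels, and the rainfall values are assumptions chosen only to show the mechanism.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1]; assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def label_by_range(values, bounds=(1 / 3, 2 / 3), names=("low", "medium", "high")):
    """Assign each value the class whose normalized range contains it."""
    labels = []
    for v in min_max_normalize(values):
        idx = sum(v >= b for b in bounds)  # number of thresholds reached
        labels.append(names[idx])
    return labels


# Hypothetical unlabelled rainfall measurements in millimetres.
rainfall_mm = [0.0, 12.5, 48.0, 75.0, 110.0, 150.0]
labels = label_by_range(rainfall_mm)
print(list(zip(rainfall_mm, labels)))
```

The generated labels can then serve as targets for a multiclass classifier, as the abstract describes for the rainfall data.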

Truncated Kernel Projection Machine for Link Prediction

  • Huang, Liang;Li, Ruixuan;Chen, Hong
    • Journal of Computing Science and Engineering / Vol. 10, No. 2 / pp. 58-67 / 2016
  • With the large amount of complex network data that is increasingly available on the Web, link prediction has become a popular data-mining research field. The focus of this paper is on a link-prediction task that can be formulated as a binary classification problem in complex networks. To solve this link-prediction problem, a sparse-classification algorithm called "Truncated Kernel Projection Machine" that is based on empirical-feature selection is proposed. The proposed algorithm is a novel way to achieve a realization of sparse empirical-feature-based learning that is different from those of the regularized kernel-projection machines. The algorithm is more appealing than those of the previous outstanding learning machines since it can be computed efficiently, and it is also implemented easily and stably during the link-prediction task. The algorithm is applied here for link-prediction tasks in different complex networks, and an investigation of several classification algorithms was performed for comparison. The experimental results show that the proposed algorithm outperformed the compared algorithms in several key indices with a smaller number of test errors and greater stability.

A GA-based Classification Model for Predicting Consumer Choice

  • 민재형;정철우
    • 한국경영과학회지 / Vol. 34, No. 3 / pp. 29-41 / 2009
  • The purpose of this paper is to develop a new classification method for predicting consumer choice based on a genetic algorithm, and to validate its prediction power over existing methods. To serve this purpose, we propose a hybrid model and discuss its methodological characteristics in comparison with other existing classification methods. Also, we conduct a series of experiments employing survey data on consumer choices of MP3 players to assess the prediction power of the model. The results show that the suggested model is statistically superior to existing methods such as the logistic regression model, the artificial neural network model, and the decision tree model in terms of prediction accuracy. The model is also shown to have the advantage of providing several pieces of strategic information of practical use for consumer choice.

A GA-based Classification Model for Predicting Consumer Choice

  • 민재형;정철우
    • 한국경영과학회:학술대회논문집 / 2008 Fall Conference and General Meeting of the Korean Operations Research and Management Science Society / pp. 1-7 / 2008
  • The purpose of this paper is to develop a new classification method for predicting consumer choice based on a genetic algorithm, and to validate its prediction power over existing methods. To serve this purpose, we propose a hybrid model and discuss its methodological characteristics in comparison with other existing classification methods. Also, to assess the prediction power of the model, we conduct a series of experiments employing survey data on consumer choices of MP3 players. The results show that the suggested model is statistically superior to existing methods such as the logistic regression model, the artificial neural network model, and the decision tree model in terms of prediction accuracy. The model is also shown to have the advantage of providing several pieces of strategic information of practical use for consumer choice.
