• 제목/요약/키워드: Classification and Prediction

검색결과 1,091건 처리시간 0.027초

Classification of Imbalanced Data Based on MTS-CBPSO Method: A Case Study of Financial Distress Prediction

  • Gu, Yuping;Cheng, Longsheng;Chang, Zhipeng
    • Journal of Information Processing Systems
    • /
    • 제15권3호
    • /
    • pp.682-693
    • /
    • 2019
  • The traditional classification methods mostly assume that the data for class distribution is balanced, while imbalanced data is widely found in the real world. So it is important to solve the problem of classification with imbalanced data. In Mahalanobis-Taguchi system (MTS) algorithm, data classification model is constructed with the reference space and measurement reference scale which is come from a single normal group, and thus it is suitable to handle the imbalanced data problem. In this paper, an improved method of MTS-CBPSO is constructed by introducing the chaotic mapping and binary particle swarm optimization algorithm instead of orthogonal array and signal-to-noise ratio (SNR) to select the valid variables, in which G-means, F-measure, dimensionality reduction are regarded as the classification optimization target. This proposed method is also applied to the financial distress prediction of Chinese listed companies. Compared with the traditional MTS and the common classification methods such as SVM, C4.5, k-NN, it is showed that the MTS-CBPSO method has better result of prediction accuracy and dimensionality reduction.

Semi-supervised Software Defect Prediction Model Based on Tri-training

  • Meng, Fanqi;Cheng, Wenying;Wang, Jingdong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제15권11호
    • /
    • pp.4028-4042
    • /
    • 2021
  • Aiming at the problem of software defect prediction difficulty caused by insufficient software defect marker samples and unbalanced classification, a semi-supervised software defect prediction model based on a tri-training algorithm was proposed by combining feature normalization, over-sampling technology, and a Tri-training algorithm. First, the feature normalization method is used to smooth the feature data to eliminate the influence of too large or too small feature values on the model's classification performance. Secondly, the oversampling method is used to expand and sample the data, which solves the unbalanced classification of labelled samples. Finally, the Tri-training algorithm performs machine learning on the training samples and establishes a defect prediction model. The novelty of this model is that it can effectively combine feature normalization, oversampling techniques, and the Tri-training algorithm to solve both the under-labelled sample and class imbalance problems. Simulation experiments using the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning in terms of Precision, Recall, and F-Measure values.

시프트 시그모이드 분류함수를 가진 로지스틱 회귀를 이용한 신입생 중도탈락 예측모델 연구 (A Study of Freshman Dropout Prediction Model Using Logistic Regression with Shift-Sigmoid Classification Function)

  • 김동형
    • 디지털산업정보학회논문지
    • /
    • 제19권4호
    • /
    • pp.137-146
    • /
    • 2023
  • The dropout of university freshmen is a very important issue in the financial problems of universities. Moreover, the dropout rate is one of the important indicators among the external evaluation items of universities. Therefore, universities need to predict dropout students in advance and apply various dropout prevention programs targeting them. This paper proposes a method to predict such dropout students in advance. This paper is about a method for predicting dropout students. It proposes a method to select dropouts by applying logistic regression using a shift sigmoid classification function using only quantitative data from the first semester of the first year, which most universities have. It is based on logistic regression and can select the number of prediction subjects and prediction accuracy by using the shift sigmoid function as an classification function. As a result of the experiment, when the proposed algorithm was applied, the number of predicted dropout subjects varied from 100% to 20% compared to the actual number of dropout subjects, and it was found to have a prediction accuracy of 75% to 98%.

슈퍼스칼라 프로세서에서 정적 및 동적 분류를 사용한 혼합형 결과 값 예측기 (A Hybrid Value Predictor using Static and Dynamic Classification in Superscalar Processors)

  • 김주익;박홍준;조영일
    • 한국정보과학회논문지:시스템및이론
    • /
    • 제30권10호
    • /
    • pp.569-578
    • /
    • 2003
  • 데이타 종속성은 명령어 수준 병렬성을 향상시키는데 중요한 장애요소가 되고 있으며, 최근 여러 논문에서 데이타 종속을 제거하기 위하여 결과 값을 예상하는 방법이 연구되고 있다. 혼합형 결과 값 예측기는 여러 예측기의 장점을 이용하여 높은 예상 정확도를 얻을 수 있지만, 동일한 명령어가 여러 개의 예측기 테이블에 중복 엔트리를 갖게되어 높은 하드웨어의 비용을 필요로 한다는 단점이 있다. 본 논문에서는 정적 및 동적 분류 정보를 이용하여 높은 성능을 얻을 수 있는 새로운 혼합형 결과 값 예측기를 제안한다. 제안된 예측기는 반입 단계 동안 정적 분류 정보를 사용하여 적절한 예측기에 할당함으로써 테이블 크기를 효과적으로 감소시켰고 예상정확도를 향상시켰다. 또한 제안된 예측기는 동적 분류를 사용하여“Unknown”유형의 명령어에 가장 적절한 예측방법을 선택하도록 하여 예상 정확도를 더욱 향상시켰다. SimpleScaiar/PISA 툴셋과 SPECint95 벤치마크 프로그램에서 시뮬레이션 한 결과, 정적 분류 정보를 사용하였을 경우 평균 예상 정확도가 85.1%, 정적 및 동적 분류 정보를 모두 사용하였을 경우 87.6%의 평균 예상 정확도를 얻을 수 있었다.

유전자 알고리즘과 일반화된 회귀 신경망을 이용한 프로모터 서열 분류 (Promoter Classification Using Genetic Algorithm Controlled Generalized Regression Neural Network)

  • 김성모;김근호;김병환
    • 대한전기학회논문지:시스템및제어부문D
    • /
    • 제53권7호
    • /
    • pp.531-535
    • /
    • 2004
  • A new method is presented to construct a classifier. This was accomplished by combining a generalized regression neural network (GRNN) and a genetic algorithm (GA). The classifier constructed in this way is referred to as a GA-GRNN. The GA played a role of controlling training factors simultaneously. The GA-GRNN was applied to classify 4 different Promoter sequences. The training and test data were composed of 115 and 58 sequence patterns, respectively. The classifier performance was investigated in terms of the classification sensitivity and prediction accuracy. Compared to conventional GRNN, GA-GRNN significantly improved the total classification sensitivity as well as the total prediction accuracy. As a result, the proposed GA-GRNN demonstrated improved classification sensitivity and prediction accuracy over the convention GRNN.

베이지안 분류기를 이용한 소프트웨어 품질 분류 (Software Quality Classification using Bayesian Classifier)

  • 홍의석
    • 한국IT서비스학회지
    • /
    • 제11권1호
    • /
    • pp.211-221
    • /
    • 2012
  • Many metric-based classification models have been proposed to predict fault-proneness of software module. This paper presents two prediction models using Bayesian classifier which is one of the most popular modern classification algorithms. Bayesian model based on Bayesian probability theory can be a promising technique for software quality prediction. This is due to the ability to represent uncertainty using probabilities and the ability to partly incorporate expert's knowledge into training data. The two models, Na$\ddot{i}$veBayes(NB) and Bayesian Belief Network(BBN), are constructed and dimensionality reduction of training data and test data are performed before model evaluation. Prediction accuracy of the model is evaluated using two prediction error measures, Type I error and Type II error, and compared with well-known prediction models, backpropagation neural network model and support vector machine model. The results show that the prediction performance of BBN model is slightly better than that of NB. For the data set with ambiguity, although the BBN model's prediction accuracy is not as good as the compared models, it achieves better performance than the compared models for the data set without ambiguity.

Truncated Kernel Projection Machine for Link Prediction

  • Huang, Liang;Li, Ruixuan;Chen, Hong
    • Journal of Computing Science and Engineering
    • /
    • 제10권2호
    • /
    • pp.58-67
    • /
    • 2016
  • With the large amount of complex network data that is increasingly available on the Web, link prediction has become a popular data-mining research field. The focus of this paper is on a link-prediction task that can be formulated as a binary classification problem in complex networks. To solve this link-prediction problem, a sparse-classification algorithm called "Truncated Kernel Projection Machine" that is based on empirical-feature selection is proposed. The proposed algorithm is a novel way to achieve a realization of sparse empirical-feature-based learning that is different from those of the regularized kernel-projection machines. The algorithm is more appealing than those of the previous outstanding learning machines since it can be computed efficiently, and it is also implemented easily and stably during the link-prediction task. The algorithm is applied here for link-prediction tasks in different complex networks, and an investigation of several classification algorithms was performed for comparison. The experimental results show that the proposed algorithm outperformed the compared algorithms in several key indices with a smaller number of test errors and greater stability.

A Novel Thresholding for Prediction Analytics with Machine Learning Techniques

  • Shakir, Khan;Reemiah Muneer, Alotaibi
    • International Journal of Computer Science & Network Security
    • /
    • 제23권1호
    • /
    • pp.33-40
    • /
    • 2023
  • Machine-learning techniques are discovering effective performance on data analytics. Classification and regression are supported for prediction on different kinds of data. There are various breeds of classification techniques are using based on nature of data. Threshold determination is essential to making better model for unlabelled data. In this paper, threshold value applied as range, based on min-max normalization technique for creating labels and multiclass classification performed on rainfall data. Binary classification is applied on autism data and classification techniques applied on child abuse data. Performance of each technique analysed with the evaluation metrics.

A multi-dimensional crime spatial pattern analysis and prediction model based on classification

  • Hajela, Gaurav;Chawla, Meenu;Rasool, Akhtar
    • ETRI Journal
    • /
    • 제43권2호
    • /
    • pp.272-287
    • /
    • 2021
  • This article presents a multi-dimensional spatial pattern analysis of crime events in San Francisco. Our analysis includes the impact of spatial resolution on hotspot identification, temporal effects in crime spatial patterns, and relationships between various crime categories. In this work, crime prediction is viewed as a classification problem. When predictions for a particular category are made, a binary classification-based model is framed, and when all categories are considered for analysis, a multiclass model is formulated. The proposed crime-prediction model (HotBlock) utilizes spatiotemporal analysis for predicting crime in a fixed spatial region over a period of time. It is robust under variation of model parameters. HotBlock's results are compared with baseline real-world crime datasets. It is found that the proposed model outperforms the standard DeepCrime model in most cases.

Molecular Classification of Hepatocellular Carcinoma and Its Impact on Prognostic Prediction and Personized Therapy

  • Dhruba Kadel;Lun-Xiu Qin
    • Journal of Digestive Cancer Research
    • /
    • 제5권1호
    • /
    • pp.5-15
    • /
    • 2017
  • Hepatocellular carcinoma (HCC) is the sixth most common cancer and second leading cause of cancer-related death in the world. The aggressive but not always predictable pattern of HCC causes the limited treatment option and poorer outcome. Many researches had already proven the heterogeneity of HCC is one of the major challenges for treatment option and prognosis prediction. Molecular subtyping of HCC and selection of patient based on molecular profile can provide the optimization in the treatment and prognosis prediction. In this review, we have tried to summarize the molecular classification of HCC proposed by different valuable researches presented in the logistic way.

  • PDF