• Title/Summary/Keyword: classification/prediction


Machine Learning Methods to Predict Vehicle Fuel Consumption

  • Ko, Kwangho
    • Journal of the Korea Society of Computer and Information / v.27 no.9 / pp.13-20 / 2022
  • This study proposes and analyzes machine learning (ML) models for predicting vehicle fuel consumption (FC) in real time. A test drive was conducted with a car to measure vehicle speed, acceleration, road gradient, and FC for the training dataset. The ML models were trained with speed, acceleration, and road gradient as features and FC as the target. Two kinds of ML models were studied: regression models (linear regression and k-nearest neighbors regression) and classification models (k-nearest neighbors classifier, logistic regression, decision tree, random forest, and gradient boosting). The prediction accuracy for real-time FC is low, in the range of 0.5 to 0.6, and the classification models are more accurate than the regression models. The prediction error for total FC is very low, about 0.2 to 2.0%, and the regression models are more accurate than the classification models. This is attributed to the coefficient of determination (R²) used as the accuracy score: as the coefficient decreases, the predicted values become distributed around the mean of the targets. Therefore, regression models are suitable for total FC prediction and classification models for real-time FC prediction.
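
The point about R² and total FC can be illustrated with a minimal sketch. The data below are hypothetical values chosen for illustration, not measurements from the paper, and the helper names `r2_score` and `total_error_pct` are assumptions of this sketch. It shows that predictions sitting at the mean of the targets give an R² of zero while the summed (total) prediction still matches the summed target.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def total_error_pct(y_true, y_pred):
    """Percentage error of the summed prediction against the summed target."""
    return abs(sum(y_pred) - sum(y_true)) / sum(y_true) * 100.0


# Hypothetical instantaneous fuel-consumption samples (arbitrary units).
fc_true = [1.0, 3.0, 2.0, 4.0]
# A model that always predicts the mean of the targets.
fc_pred = [2.5, 2.5, 2.5, 2.5]

r2 = r2_score(fc_true, fc_pred)
total_err = total_error_pct(fc_true, fc_pred)
print(f"R2 = {r2:.2f}, total error = {total_err:.2f}%")
```

In this toy case the per-sample fit is poor but the total is exact, which is one way a low real-time score can coexist with a small total-FC error.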

Optimizing SVM Ensembles Using Genetic Algorithms in Bankruptcy Prediction

  • Kim, Myoung-Jong;Kim, Hong-Bae;Kang, Dae-Ki
    • Journal of information and communication convergence engineering / v.8 no.4 / pp.370-376 / 2010
  • Ensemble learning is a method for improving the performance of classification and prediction algorithms. However, its performance can be degraded by the multicollinearity problem, in which multiple classifiers in an ensemble are highly correlated with one another. This paper proposes genetic algorithm-based optimization techniques for SVM ensembles to solve the multicollinearity problem. Empirical results on bankruptcy prediction for Korean firms indicate that the proposed optimization techniques can improve the performance of SVM ensembles.

On the Fuzzy Membership Function of Fuzzy Support Vector Machines for Pattern Classification of Time Series Data (퍼지서포트벡터기계의 시계열자료 패턴분류를 위한 퍼지소속 함수에 관한 연구)

  • Lee, Soo-Yong
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.6 / pp.799-803 / 2007
  • In this paper, we propose a new fuzzy membership function for fuzzy support vector machines (FSVM). We apply a fuzzy membership to each input point of the SVM and reformulate the SVM into an FSVM such that different input points can make different contributions to the learning of the decision surface. The proposed method enhances the SVM by reducing the effect of outliers and noise in the data points. This paper compares the classification and estimation performance of the SVM, FSVM(1), and FSVM(2) models, which are attracting attention in time series prediction.

The Predictive QSAR Model for hERG Inhibitors Using Bayesian and Random Forest Classification Method

  • Kim, Jun-Hyoung;Chae, Chong-Hak;Kang, Shin-Myung;Lee, Joo-Yon;Lee, Gil-Nam;Hwang, Soon-Hee;Kang, Nam-Sook
    • Bulletin of the Korean Chemical Society / v.32 no.4 / pp.1237-1240 / 2011
  • In this study, we developed a ligand-based in silico prediction model that classifies chemical structures as hERG blockers using Bayesian and random forest modeling methods. These models were built from patch clamp experimental results. The findings presented in this work indicate that Laplacian-modified naive Bayesian classification with diverse selection is useful for predicting hERG inhibitors when a large data set is not available.

Prediction of Hypertension Complications Risk Using Classification Techniques

  • Lee, Wonji;Lee, Junghye;Lee, Hyeseon;Jun, Chi-Hyuck;Park, Il-Su;Kang, Sung-Hong
    • Industrial Engineering and Management Systems / v.13 no.4 / pp.449-453 / 2014
  • Chronic diseases, including hypertension and its complications, are a major cause of rising national medical expenditures. We aim to predict the risk of hypertension complications for hypertension patients using the sample national healthcare database established by the Korean National Health Insurance Corporation. We apply classification techniques, namely logistic regression, linear discriminant analysis, and classification and regression tree, to predict the onset of hypertension complications for each patient. The performance of these three methods is compared in terms of accuracy, sensitivity, and specificity. The results show that the methods perform similarly, although logistic regression performs marginally better than the others.
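
The three comparison measures named in this abstract can be computed from binary labels as in the following minimal sketch. The labels are made up for illustration (1 = complication onset, 0 = no onset) and the function name `binary_metrics` is an assumption of this sketch; none of it is taken from the paper's data.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of actual positives detected
        "specificity": tn / (tn + fp),  # share of actual negatives detected
    }


# Hypothetical ground truth and predictions for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when onset events are rare, because a classifier that predicts "no onset" for everyone can still score a high accuracy.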

Estimating Prediction Errors in Binary Classification Problem: Cross-Validation versus Bootstrap

  • Kim Ji-Hyun;Cha Eun-Song
    • Communications for Statistical Applications and Methods / v.13 no.1 / pp.151-165 / 2006
  • It is important to estimate the true misclassification rate of a given classifier when an independent set of test data is not available. Cross-validation and bootstrap are two possible approaches in this case. In the related literature, bootstrap estimators of the true misclassification rate have been asserted to perform better for small samples than cross-validation estimators. We compare the two estimators empirically when the classification rule is so adaptive to the training data that its apparent misclassification rate is close to zero. We confirm that bootstrap estimators perform better for small samples because of their small variance, and we also find that their bias tends to be significant even for moderate to large samples, in which case cross-validation estimators perform better with less computation.
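
The two kinds of estimator compared in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the classifier is a 1-nearest-neighbor rule (chosen because its apparent error on the training data is zero), the data are synthetic two-class Gaussian samples, and the bootstrap variant is a simple pooled out-of-bag estimate. The paper's own estimators and classifiers may differ.

```python
import random


def nn_predict(train, x):
    """Label of the training sample closest to x (1-nearest neighbor)."""
    nearest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    return nearest[1]


def count_errors(train, test):
    """Number of test samples misclassified by 1-NN fitted on train."""
    return sum(1 for x, y in test if nn_predict(train, x) != y)


def kfold_cv_error(data, k, rng):
    """k-fold cross-validation estimate of the misclassification rate."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    wrong = 0
    for f in range(k):
        held_out = set(idx[f::k])
        train = [data[i] for i in idx if i not in held_out]
        test = [data[i] for i in held_out]
        wrong += count_errors(train, test)
    return wrong / len(data)


def bootstrap_oob_error(data, n_boot, rng):
    """Pooled out-of-bag bootstrap estimate of the misclassification rate."""
    n = len(data)
    wrong = total = 0
    for _ in range(n_boot):
        drawn = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        in_bag = set(drawn)
        train = [data[i] for i in drawn]
        test = [data[i] for i in range(n) if i not in in_bag]
        wrong += count_errors(train, test)
        total += len(test)
    return wrong / total


# Synthetic data: two overlapping 2-D Gaussian classes (hypothetical).
rng = random.Random(0)
data = [([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], 0) for _ in range(20)]
data += [([rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)], 1) for _ in range(20)]

apparent = count_errors(data, data) / len(data)  # error on the training data itself
cv_err = kfold_cv_error(data, k=10, rng=rng)
boot_err = bootstrap_oob_error(data, n_boot=50, rng=rng)
print(f"apparent={apparent:.3f}  10-fold CV={cv_err:.3f}  OOB bootstrap={boot_err:.3f}")
```

Repeating the last three lines over many simulated datasets of different sizes would give the bias and variance of each estimator, which is the kind of comparison the abstract describes.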

Nondestructive Classification between Normal and Artificially Aged Corn (Zea mays L.) Seeds Using Near Infrared Spectroscopy

  • Min, Tai-Gi;Kang, Woo-Sik
    • KOREAN JOURNAL OF CROP SCIENCE / v.53 no.3 / pp.314-319 / 2008
  • Near infrared (NIR) spectroscopy was used to classify normal and artificially aged nonviable corn (Zea mays L., cv. 'Suwon19') seeds. Spectra at 1100-2500 nm were scanned from single normal and artificially aged seeds and analyzed by principal component analysis (PCA). To discriminate normal seeds from artificially aged seeds, a calibration modeling set was developed with a discriminant partial least squares 2 (PLS 2) method. The calibration model derived from PLS 2 gave 100% classification accuracy for normal and artificially aged (aged) seeds from the raw, 1st derivative, and 2nd derivative spectra. The prediction accuracy for unknown normal seeds was 88, 100, and 97% from the raw, 1st derivative, and 2nd derivative spectra, respectively, and that for unknown aged seeds was 100% from all three. The results show that corn seeds can potentially be separated into viable and non-viable using NIR spectroscopy.

Radar and Vision Sensor Fusion for Primary Vehicle Detection (레이더와 비전센서 융합을 통한 전방 차량 인식 알고리즘 개발)

  • Yang, Seung-Han;Song, Bong-Sob;Um, Jae-Young
    • Journal of Institute of Control, Robotics and Systems / v.16 no.7 / pp.639-645 / 2010
  • This paper presents a sensor fusion algorithm that recognizes a primary vehicle by fusing radar and monocular vision data. In general, most commercial radars may lose track of the primary vehicle, i.e., the closest preceding vehicle in the same lane, when it stops or travels alongside other preceding vehicles in the adjacent lane with similar velocity and range. To mitigate this degradation in radar performance, vehicle detection information from the vision sensor and the path predicted by ego vehicle sensors are combined for target classification. The target classification then works with probabilistic association filters to track the primary vehicle. Finally, the performance of the proposed sensor fusion algorithm is validated using field test data from a highway.

Application of data mining and statistical measurement of agricultural high-quality development

  • Yan Zhou
    • Advances in nano research / v.14 no.3 / pp.225-234 / 2023
  • In this study, we aim to use big data resources and statistical analysis to obtain reliable guidance for achieving high-quality, high-yield agricultural production. To this end, soil type, rainfall, and temperature data, as well as annual wheat production, are collected for a specific region. Using statistical methodology, the acquired data were cleaned to remove incomplete and defective records. Afterwards, using several machine learning classification methods, we tried to distinguish between different factors and their influence on the final crop yields. The efficacy of the machine learning methods is discussed by comparing the models' predictions using the correlation factor and mean squared error between predicted and actual crop yields. The results of the analysis show high accuracy of machine learning methods in predicting crop yields. Moreover, the random forest (RF) classification approach provides the best results among the classification methods used in this study.

A Study on the Performance Evaluation of Machine Learning for Predicting the Number of Movie Audiences (영화 관객 수 예측을 위한 기계학습 기법의 성능 평가 연구)

  • Jeong, Chan-Mi;Min, Daiki
    • The Journal of Society for e-Business Studies / v.25 no.2 / pp.49-63 / 2020
  • Accurate early-stage prediction of box office performance is crucial for the film industry to make better managerial decisions. To improve prediction performance, this paper evaluates the use of machine learning methods. We tested both classification-based and regression-based methods, including k-NN, SVM, and Random Forest. We first evaluate the input variables, which shows that reputation-related information generated during the first two weeks after release is significant. Prediction test results show that regression-based methods provide lower prediction error, and Random Forest in particular outperforms the other machine learning methods. Regression-based methods have better predictive power when films have small box office earnings. On the other hand, classification-based methods work better for predicting large box office earnings.