• Title/Summary/Keyword: support vector regression.

Binary Forecast of Asian Dust Days over South Korea in the Winter Season (Korean title: Development of a Categorical Forecast Model for the Number of Winter Asian Dust Days over South Korea)

  • Sohn, Keon-Tae;Lee, Hyo-Jin;Kim, Seung-Bum
    • The Korean Journal of Applied Statistics / v.24 no.3 / pp.535-546 / 2011
  • This study develops statistical models for the binary forecast of Asian dust days over South Korea in the winter season. Three kinds of data were used: first, the observed Asian dust days over a 31-year period (1980 to 2010) as target values; second, four meteorological factors (near-surface temperature, precipitation, snowfall, and ground wind speed) in the source regions of Asian dust, taken from the NCEP reanalysis data; and third, large-scale climate indices. Four kinds of statistical models (multiple regression models, logistic regression models, decision trees, and support vector machines) are applied and compared using skill scores (hit rate, probability of detection, and false alarm rate).
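The abstract names the skill scores without defining them. The following is a minimal sketch, assuming the usual 2×2 forecast-verification table. "Hit rate" is taken as the proportion of correct forecasts, and "false alarm rate" as false alarms divided by all forecast dust events (strictly, the false alarm ratio; some studies instead divide by all non-event days). The paper's exact definitions may differ. The function name and the counts are hypothetical, chosen only so that they sum to 31 winters.

```python
def skill_scores(hits, false_alarms, misses, correct_negatives):
    """Binary-forecast verification scores from a 2x2 contingency table.

    Assumed layout (argument names are hypothetical):
                         observed dust   observed no dust
      forecast dust      hits            false_alarms
      forecast no dust   misses          correct_negatives
    """
    n = hits + false_alarms + misses + correct_negatives
    # Hit rate taken here as the proportion of correct forecasts.
    hit_rate = (hits + correct_negatives) / n
    # Probability of detection: share of observed dust events that were forecast.
    pod = hits / (hits + misses)
    # False alarm ratio: share of forecast dust events that did not occur.
    far = false_alarms / (hits + false_alarms)
    return {"hit_rate": hit_rate, "pod": pod, "far": far}


# Hypothetical counts for 31 winters (event = dust observed / forecast).
scores = skill_scores(hits=8, false_alarms=2, misses=4, correct_negatives=17)
print(scores)
```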

Energy analysis-based core drilling method for the prediction of rock uniaxial compressive strength

  • Qi, Wang;Shuo, Xu;Ke, Gao Hong;Peng, Zhang;Bei, Jiang;Hong, Liu Bo
    • Geomechanics and Engineering / v.23 no.1 / pp.61-69 / 2020
  • The uniaxial compressive strength (UCS) of rock is a basic parameter in underground engineering design. The commonly employed laboratory test for UCS has several disadvantages: results are not available in time, cores from broken rock masses are difficult to test, and the on-site testing process is long and complicated. A fast and simple in situ method for testing rock UCS in the field is therefore urgently needed. In this study, a multi-function digital rock drilling and testing system and a dedicated digital core bit were independently developed and used in digital drilling tests on rock specimens of different strengths. An energy analysis of the rock-cutting process is performed to estimate the energy the drill bit consumes to remove a unit volume of rock. Two quantitative models relating energy analysis-based core drilling parameters (ECD) to rock UCS (ECD-UCS models) are established using regression analysis and a support vector machine (SVM), and their predictive abilities are compared. For the prediction set, the mean differences between the predicted UCS values and those measured in laboratory uniaxial compression tests are 3.76 MPa for the regression-based model and 4.30 MPa for the SVM-based model, with standard deviations of 2.08 MPa and 4.14 MPa, respectively. The regression analysis-based ECD-UCS model therefore has a more stable predictive ability. The proposed energy analysis-based core drilling method enables quick and convenient in situ testing of rock UCS.
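The abstract does not give the formula behind the ECD parameters. As an illustration only, the sketch below uses Teale's classical specific-energy expression (energy per unit volume of rock removed, computed from thrust, torque, rotary speed and penetration rate) with an annular core-bit cutting area. It also includes an ordinary least-squares line as a hypothetical stand-in for the regression-based ECD-UCS model. All function names, parameter names and example values are assumptions, not the authors' method.

```python
import math


def annular_area(outer_d, inner_d):
    """Cutting area (m^2) of a core bit with outer/inner diameters in metres."""
    return math.pi / 4.0 * (outer_d ** 2 - inner_d ** 2)


def specific_energy(thrust, torque, rpm, penetration_rate, area):
    """Teale-type specific energy in J/m^3 (numerically equal to Pa).

    thrust: axial force (N); torque: N*m; rpm: rotary speed (rev/min);
    penetration_rate: m/s; area: cutting area (m^2).
    """
    rev_per_s = rpm / 60.0
    thrust_term = thrust / area  # work done by thrust per unit volume removed
    rotary_term = 2.0 * math.pi * rev_per_s * torque / (area * penetration_rate)
    return thrust_term + rotary_term


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical drilling record for a 75/60 mm core bit.
area = annular_area(0.075, 0.060)
se = specific_energy(thrust=5000.0, torque=40.0, rpm=300.0,
                     penetration_rate=2e-4, area=area)
print(f"specific energy = {se / 1e6:.1f} MJ/m^3")
```

Given specific-energy values and laboratory UCS for a set of specimens, `fit_line(energies, ucs_values)` returns the intercept and slope of a simple linear relation. The paper's actual model form is not stated in the abstract.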

Using Data Mining Techniques to Predict Win-Loss in Korean Professional Baseball Games (Korean title: A Study on Building a Win-Loss Prediction Model for Korean Professional Baseball Using Data Mining)

  • Oh, Younhak;Kim, Han;Yun, Jaesub;Lee, Jong-Seok
    • Journal of Korean Institute of Industrial Engineers / v.40 no.1 / pp.8-17 / 2014
  • In this research, we employed various data mining techniques to build predictive models for win-loss prediction in Korean professional baseball games. Historical data on players and teams were obtained from the official materials provided on the KBO website. From the collected raw data, we prepared two additional datasets, in ratio and binary format, respectively. The ratio dataset was generated by dividing the away team's records by those of the corresponding home team, while the binary dataset was obtained by comparing the two teams' record values. We applied seven classification techniques to the three (raw, ratio, and binary) datasets: decision tree, random forest, logistic regression, neural network, support vector machine, linear discriminant analysis, and quadratic discriminant analysis. Among the 21 (= 3 datasets × 7 techniques) prediction scenarios, the most accurate model was the random forest on the binary dataset, with a prediction accuracy of 84.14%. Using the ratio and binary datasets also produced better prediction models than using the raw data. Using the variable-selection capability of the decision tree, random forest, and stepwise logistic regression, we found that annual salary, earned runs, strikeouts, pitcher's winning percentage, and bases on balls (walks) are important factors for winning a game. This research differs from existing studies in using three different types of datasets and a variety of data mining techniques for win-loss prediction in Korean professional baseball games.
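The following sketches how the ratio and binary datasets described above could be derived from one game's records. The feature names are hypothetical. Returning NaN for a zero home value and coding ties as 0 are assumptions that the abstract does not specify.

```python
def ratio_features(away, home):
    """Away-team record divided by the home-team record, feature by feature."""
    # NaN for a zero denominator is an assumption; the paper's handling is unknown.
    return {k: (away[k] / home[k] if home[k] != 0 else float("nan")) for k in away}


def binary_features(away, home):
    """1 if the away team's value exceeds the home team's value, else 0 (ties -> 0)."""
    return {k: int(away[k] > home[k]) for k in away}


# Hypothetical per-game records (feature names are illustrative only).
away = {"salary": 120.0, "strikeouts": 9, "earned_runs": 3}
home = {"salary": 100.0, "strikeouts": 6, "earned_runs": 4}
print(ratio_features(away, home))
print(binary_features(away, home))
```

For statistics where lower is better, such as earned runs, the comparison direction would need to be fixed consistently before a model is trained on the binary features.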

Optimal Design for Minimizing Weight of Housing of Hydraulic Breaker (Korean title: Optimal Housing Design for Weight Reduction of a Hydraulic Breaker)

  • Park, Gyu-Byung;Park, Chang-Hyun;Park, Yong-Shik;Choi, Dong-Hoon
    • Transactions of the Korean Society of Mechanical Engineers A / v.35 no.2 / pp.207-212 / 2011
  • A hydraulic breaker is an attachment installed at the end of an excavator arm and used for breaking. To the authors' knowledge, no research has addressed reducing the weight of the hydraulic breaker, even though weight reduction is very important for improving excavator performance. In this study, we minimize the weight of the hydraulic breaker housing under normal operating conditions, subject to the constraint that the maximum stress in the housing stays below the allowable stress. The minimization problem is solved with a meta-model generated from CAE results at sampling design points determined by an orthogonal array. The optimal design reduces the housing weight by 4.8% relative to the original design while satisfying the maximum-stress constraint.
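The abstract does not say which orthogonal array was used or how many design variables the housing has. The sketch below assumes the standard L9(3^4) array. It builds the nine sampling runs, checks the pairwise-balance property that makes the array orthogonal, and maps the three levels onto hypothetical lower, middle and upper bounds of each design variable. The CAE analyses at these points and the meta-model fitted to them are not shown.

```python
from itertools import combinations


def l9_orthogonal_array():
    """Standard L9(3^4) array: 9 runs, 4 three-level factors (levels 0, 1, 2)."""
    rows = []
    for i in range(3):
        for j in range(3):
            rows.append((i, j, (i + j) % 3, (i + 2 * j) % 3))
    return rows


def is_orthogonal(rows):
    """For 9 runs of 3-level factors: every pair of columns holds each of the
    9 level combinations exactly once."""
    n_cols = len(rows[0])
    for a, b in combinations(range(n_cols), 2):
        pairs = [(r[a], r[b]) for r in rows]
        counts = {p: pairs.count(p) for p in set(pairs)}
        if len(counts) != 9 or set(counts.values()) != {1}:
            return False
    return True


def to_design_points(rows, bounds):
    """Map levels 0/1/2 to the lower/middle/upper bound of each design variable.
    Only the first len(bounds) columns of the array are used."""
    points = []
    for r in rows:
        point = [lo + level * (hi - lo) / 2.0 for level, (lo, hi) in zip(r, bounds)]
        points.append(point)
    return points


rows = l9_orthogonal_array()
# Hypothetical wall-thickness ranges (mm) for two housing design variables.
points = to_design_points(rows, [(10.0, 20.0), (5.0, 15.0)])
print(is_orthogonal(rows), points)
```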

Predicting Daily Nutrient Water Consumption by Strawberry Plants in a Greenhouse Environment

  • Sathishkumar, VE;Lee, Myeong-Bae;Lim, Jong-Hyun;Shin, Chang-Sun;Park, Chang-Woo;Cho, Yong Yun
    • Annual Conference of KIPS / 2019.10a / pp.581-584 / 2019
  • Food consumption grows worldwide every year along with the population, so sufficient production of good-quality food is needed. Strawberry is one of the world's best-known fruits. To obtain the highest strawberry output, we grew three strawberry varieties supplied with three kinds of nutrient water in a greenhouse and identified the highest-yielding variety from the production results. This study uses the daily nutrient water consumption of that highest-yielding variety. Because a plant's water consumption depends primarily on weather conditions, the air temperature, humidity, and CO2 level inside the greenhouse are measured and used for the prediction. Machine learning techniques have been applied successfully to many problems, including time series and regression. This study proposes predicting the daily nutrient water consumption of strawberry plants with machine learning algorithms. Four algorithms are used: Linear Regression (LR), K-Nearest Neighbour (KNN), Support Vector Machine with a radial kernel (SVM), and Gradient Boosting Machine (GBM). GBM produces the best results.
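KNN is the simplest of the four algorithms listed. The following is a minimal sketch of KNN regression on the three greenhouse inputs named in the abstract (temperature, humidity, CO2). Z-scoring the features and k = 3 are assumptions. The training rows are hypothetical values, not the study's data.

```python
import math


def standardize(rows):
    """Column means and (population) standard deviations for z-scoring."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0  # avoid divide-by-zero
        for c, m in zip(cols, means)
    ]
    return means, stds


def knn_predict(train_x, train_y, query, k=3):
    """Mean target of the k training rows nearest to `query` (Euclidean, z-scored)."""
    means, stds = standardize(train_x)

    def z(row):
        return [(v - m) / s for v, m, s in zip(row, means, stds)]

    q = z(query)
    ranked = sorted((math.dist(z(row), q), y) for row, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical rows: [temperature (deg C), relative humidity (%), CO2 (ppm)]
train_x = [[20, 60, 400], [22, 65, 420], [25, 70, 450], [30, 50, 500]]
train_y = [1.0, 1.2, 1.5, 2.0]  # hypothetical daily nutrient water use (L)
print(knn_predict(train_x, train_y, [23, 66, 430], k=3))
```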

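Quality Prediction Model for Manufacturing Process of Free-Machining 303-series Stainless Steel Small Rolling Wire Rods (Korean title: Production Quality Prediction Model for the Manufacturing Process of Free-Machining 303-Series Stainless Steel Small Rolled Wire Rods)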

  • Seo, Seokjun;Kim, Heungseob
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.4 / pp.12-22 / 2021
  • This article proposes a machine learning model, i.e., a classifier, for predicting the production quality of free-machining 303-series stainless steel (STS303) small rolling wire rods from the operating conditions of the manufacturing process. To develop the classifier, manufacturing data for 37 operating variables were collected from the manufacturing execution system (MES) of Company S, and 12 derived variables were generated based on a literature review and interviews with field experts. The research proceeded through data preprocessing, exploratory data analysis, feature selection, machine learning modeling, and evaluation of alternative models. In the preprocessing stage, missing values and outliers are removed, and SMOTE (Synthetic Minority Over-sampling Technique) oversampling is applied to resolve the class imbalance. Features are selected by the variable importance of LASSO (Least Absolute Shrinkage and Selection Operator) regression, extreme gradient boosting (XGBoost), and random forest models. Finally, logistic regression, support vector machine (SVM), random forest, and XGBoost classifiers are developed to predict whether products made under new operating conditions are adequate or defective. The optimal hyperparameters of each model are found by grid search and random search based on k-fold cross-validation. In the experiments, XGBoost showed higher predictive performance than the other models, with an accuracy of 0.9929, specificity of 0.9372, F1-score of 0.9963, and logarithmic loss of 0.0209. The classifier developed in this study is expected to improve productivity by enabling effective management of the manufacturing process for STS303 small rolling wire rods.
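SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a simplified sketch that assumes Euclidean distance and a random choice of base sample for each new point. The abstract does not say which implementation or settings were used, so `smote` and its parameters are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)  # seeded for reproducibility
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


# Hypothetical defective-product feature vectors (the minority class).
minority = [[1.0, 2.0], [2.0, 3.0], [1.5, 2.5], [3.0, 1.0]]
new_samples = smote(minority, n_new=10, k=2)
print(new_samples[:3])
```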

Study of Machine Learning based on EEG for the Control of Drone Flight (Korean title: A Study on Machine Learning for EEG-Based Drone Control)

  • Hong, Yejin;Cho, Seongmin;Cha, Dowan
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.249-251 / 2022
  • In this paper, we present machine learning for controlling drone flight with EEG signals. Takeoff, forward, backward, left, and right movements were defined as control targets, and EEG signals were measured from the frontal lobe at channels Fp1 and Fp2 with a two-channel dry-electrode device (NeuroNicle FX2) at a sampling rate of 250 Hz. The collected data were band-pass filtered with cutoff frequencies of 6 and 20 Hz. For each control target, EEG was recorded for 5.19 seconds during motor imagery of the associated action, with eyes open. Using MATLAB's Classification Learner on the measured EEG signals, we trained a three-layer neural network, a logistic regression kernel model, and a nonlinear polynomial support vector machine (SVM). Evaluated by the per-class true positive rate (TPR), the logistic regression kernel model achieved the highest accuracy for the drone's takeoff, forward, backward, left, and right movements.
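The abstract gives only the 6 to 20 Hz pass band. The following is a minimal sketch that assumes a linear-phase windowed-sinc FIR band-pass filter (Hamming window, 201 taps) applied separately to each channel sampled at 250 Hz. The filter type and order are assumptions; an IIR design such as a Butterworth filter would be an equally plausible choice. The usage lines measure the gain for pure tones below, inside and above the band.

```python
import math


def bandpass_fir(f_low, f_high, fs, numtaps=201):
    """Windowed-sinc (Hamming) FIR band-pass taps; numtaps must be odd."""
    if numtaps % 2 == 0:
        raise ValueError("numtaps must be odd for a symmetric type-I filter")
    m = numtaps - 1

    def ideal_lowpass(fc, k):
        # Ideal low-pass impulse response; fc in Hz, k = offset from the centre tap.
        wc = fc / fs
        if k == 0:
            return 2.0 * wc
        return math.sin(2.0 * math.pi * wc * k) / (math.pi * k)

    taps = []
    for n in range(numtaps):
        k = n - m // 2
        h = ideal_lowpass(f_high, k) - ideal_lowpass(f_low, k)  # band = LP(high) - LP(low)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / m)
        taps.append(h * window)
    return taps


def fir_filter(x, taps):
    """Direct-form convolution; returns only fully overlapped ('valid') outputs."""
    n_t = len(taps)
    return [sum(taps[j] * x[i - j] for j in range(n_t)) for i in range(n_t - 1, len(x))]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


FS = 250.0
TAPS = bandpass_fir(6.0, 20.0, FS)


def tone_gain(freq_hz, n_samples=1000):
    """Output RMS / input RMS for a pure sine tone sampled at FS."""
    x = [math.sin(2.0 * math.pi * freq_hz * n / FS) for n in range(n_samples)]
    return rms(fir_filter(x, TAPS)) / rms(x)


for f in (2.0, 13.0, 50.0):
    print(f, round(tone_gain(f), 3))
```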

Predictive modeling algorithms for liver metastasis in colorectal cancer: A systematic review of the current literature

  • Isaac Seow-En;Ye Xin Koh;Yun Zhao;Boon Hwee Ang;Ivan En-Howe Tan;Aik Yong Chok;Emile John Kwong Wei Tan;Marianne Kit Har Au
    • Annals of Hepato-Biliary-Pancreatic Surgery / v.28 no.1 / pp.14-24 / 2024
  • This study aims to assess the quality and performance of predictive models for colorectal cancer liver metastasis (CRCLM). A systematic review was performed to identify relevant studies from various databases. Studies that described or validated predictive models for CRCLM were included. The methodological quality of the predictive models was assessed, and model performance was evaluated by the reported area under the receiver operating characteristic curve (AUC). Of the 117 articles screened, seven studies comprising 14 predictive models were included. The distribution of included predictive models was as follows: radiomics (n = 3), logistic regression (n = 3), Cox regression (n = 2), nomogram (n = 3), support vector machine (SVM, n = 2), random forest (n = 2), and convolutional neural network (CNN, n = 2). Age, sex, carcinoembryonic antigen, and tumor staging (T and N stage) were the most frequently used clinicopathological predictors for CRCLM. The mean AUCs ranged from 0.697 to 0.870, with 86% of the models demonstrating clear discriminative ability (AUC > 0.70). A hybrid approach combining clinical and radiomic features with SVM provided the best performance, achieving an AUC of 0.870. The overall risk of bias was high in 71% of the included studies. This review highlights the potential of predictive models to predict the occurrence of CRCLM accurately. Integrating clinicopathological and radiomic features with machine learning algorithms demonstrates superior predictive capability.
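The review ranks models by AUC. The following is a minimal sketch of computing AUC from predicted risk scores through its equivalence to the Mann-Whitney statistic. AUC is the fraction of (metastasis, no-metastasis) pairs in which the positive case receives the higher score, with ties counted as one half. The score lists are hypothetical, and this pairwise version is O(n_pos × n_neg).

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outranks a random negative.

    Equals the Mann-Whitney U statistic divided by n_pos * n_neg;
    tied pairs count as one half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted risks for patients with and without liver metastasis.
print(auc([0.9, 0.7, 0.6], [0.65, 0.2, 0.1]))
```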

Financial Fraud Detection using Data Mining: A Survey

  • Sudhansu Ranjan Lenka;Bikram Kesari Ratha
    • International Journal of Computer Science & Network Security / v.24 no.9 / pp.169-185 / 2024
  • Owing to the rapid growth of e-commerce, most organizations are moving toward cashless transactions. Unfortunately, cashless transactions are used not only by legitimate users but also by illegitimate ones, resulting in losses of billions of dollars worldwide each year. Fraud prevention and fraud detection are the two methods financial institutions use to protect against such fraud. Fraud prevention systems (FPSs) alone are not sufficient to fully secure e-commerce systems, but combining fraud detection systems (FDSs) with FPSs can provide better protection. However, many issues and challenges still degrade the performance of FDSs, such as overlapping data, noisy data, and misclassification. This paper presents a comprehensive survey of financial fraud detection systems that use data mining techniques. More than seventy research papers, mainly from the period 2002-2015, were reviewed and analyzed. The data mining approaches covered include Neural Network, Logistic Regression, Bayesian Belief Network, Support Vector Machine (SVM), Self-Organizing Map (SOM), K-Nearest Neighbor (K-NN), Random Forest, and Genetic Algorithm. The algorithms with high reported success rates in detecting credit card fraud are Logistic Regression (99.2%), SVM (99.6%), and Random Forests (99.6%), while SOM is identified as the most suitable approach because it achieved a reported accuracy of 100%. For financial statement fraud, by contrast, the reported accuracy of the implemented algorithms varies widely, from 71.4% for CDA to 98.1% for a probabilistic neural network. This paper identifies the research gaps and reports the performance of the different algorithms in terms of accuracy, sensitivity, and specificity. Key issues and challenges associated with FDSs are also identified.
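With highly imbalanced transaction data, accuracy alone can be misleading, which is one reason to report sensitivity and specificity alongside it. The sketch below uses hypothetical counts (10,000 transactions, 50 of them fraudulent). A classifier that flags nothing still reaches 99.5% accuracy but detects no fraud. The function name is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity (fraud recall) and specificity from confusion counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical classifier that labels every transaction as legitimate.
all_legit = classification_metrics(tp=0, fp=0, fn=50, tn=9950)
# Hypothetical detector that catches 40 of the 50 frauds with 60 false alarms.
detector = classification_metrics(tp=40, fp=60, fn=10, tn=9890)
print(all_legit)
print(detector)
```

The detector has slightly lower accuracy (99.3%) than the trivial classifier but a sensitivity of 80% instead of 0%.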

Nitrogen Oxide (NOx) Emissions Prediction of Gas Turbine in Coal-Fired Power Plant Using Online Learning Method (Korean title: Prediction of Nitrogen Oxide (NOx) Emissions in the Gas Turbine of a Coal-Fired Power Plant Using Online Learning)

  • Jin Park;Changwan Ko;Young-Seon Jeong
    • Smart Media Journal / v.13 no.8 / pp.58-66 / 2024
  • Nitrogen oxides (NOx) from coal-fired power plants are significant contributors to air pollution; they influence the formation of ozone and fine particulate matter and thereby adversely affect health. Accurate prediction of NOx emissions is therefore essential. Existing research has mainly relied on offline learning methods, which perform poorly when the training dataset is limited. This paper proposes an online support vector regression (SVR) model to predict NOx emissions from coal-fired power plants. The online learning model, which updates itself whenever new observations arrive, achieves high prediction accuracy even when initial data are scarce. The experimental results show that online learning outperformed existing offline learning methods. The results indicate that online learning is a valuable tool for predicting NOx emissions, especially when initial data are limited and data are continuously updated in real time.
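The abstract does not specify the online SVR formulation. As a stand-in, the sketch below implements a linear ε-insensitive SVR trained by stochastic subgradient steps, updating the model each time a new observation arrives. It is not the incremental kernel-based online SVR algorithm. `OnlineLinearSVR`, `eps`, `lam`, `eta0` and the synthetic stream are illustrative assumptions rather than the paper's settings or NOx data.

```python
import math


class OnlineLinearSVR:
    """Linear epsilon-insensitive SVR updated one observation at a time.

    Takes stochastic subgradient steps on
        lam/2 * ||w||^2 + max(0, |y - (w.x + b)| - eps)
    with step size eta0 / sqrt(t). Simplified stand-in, not the
    incremental (accurate online) kernel SVR algorithm.
    """

    def __init__(self, n_features, eps=0.05, lam=1e-3, eta0=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.eps, self.lam, self.eta0 = eps, lam, eta0
        self.t = 0

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, y):
        """One subgradient step for a newly arrived observation; returns the prior residual."""
        self.t += 1
        eta = self.eta0 / math.sqrt(self.t)
        residual = y - self.predict(x)
        # Only points outside the epsilon-tube contribute a loss subgradient.
        s = 0.0
        if residual > self.eps:
            s = 1.0
        elif residual < -self.eps:
            s = -1.0
        self.w = [wi - eta * (self.lam * wi - s * xi) for wi, xi in zip(self.w, x)]
        self.b += eta * s
        return residual


# Hypothetical stream: y = 2x + 1, observations arriving one at a time.
model = OnlineLinearSVR(n_features=1)
for t in range(20000):
    x = (t * 0.6180339887) % 1.0
    model.update([x], 2.0 * x + 1.0)
print(round(model.predict([0.5]), 2))
```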