• Title/Summary/Keyword: support vector regression.

A Hybrid Under-sampling Approach for Better Bankruptcy Prediction (부도예측 개선을 위한 하이브리드 언더샘플링 접근법)

  • Kim, Taehoon;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.173-190 / 2015
  • The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on appropriate data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance problems. The approach proposed in this study is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN can effectively eliminate outliers, while OCSVM contributes to the selection of informative training samples from majority class data. To validate our proposed approach, we have applied it to data from H Bank's non-external auditing companies in Korea, and compared the performances of the classifiers with the proposed under-sampling and random sampling data. The empirical results show that the proposed under-sampling approach generally improves the accuracy of classifiers, such as logistic regression, discriminant analysis, decision tree, and support vector machines. They also show that the proposed under-sampling approach reduces the risk of false negative errors, which lead to higher misclassification costs.
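
The outlier-removal half of the hybrid approach can be illustrated with a small sketch. This is not the authors' implementation: it only shows the general k-RNN idea (a point's reverse neighbours are the points that list it among their own k nearest neighbours), and the rule "drop majority samples with fewer than `min_count` reverse neighbours" as well as the function names are assumptions made for illustration. The OCSVM selection step is not shown.

```python
import math

def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]

def reverse_neighbor_counts(points, k):
    """For each point, count how many other points list it among their k-NN."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in knn_indices(points, i, k):
            counts[j] += 1
    return counts

def drop_krnn_outliers(points, k=2, min_count=1):
    """Keep only points with at least min_count reverse neighbours.

    The threshold is a hypothetical choice, not taken from the paper.
    """
    counts = reverse_neighbor_counts(points, k)
    return [p for p, c in zip(points, counts) if c >= min_count]

# Toy majority-class data: a tight cluster plus one isolated point.
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
kept = drop_krnn_outliers(majority, k=2)
```

In this toy data no cluster point has the isolated point among its two nearest neighbours, so the isolated point has zero reverse neighbours and is the only one removed.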

Performance Evaluation of Attention-inattention Classifiers using Non-linear Recurrence Pattern and Spectrum Analysis (비선형 반복 패턴과 스펙트럼 분석을 이용한 집중-비집중 분류기의 성능 평가)

  • Lee, Jee-Eun;Yoo, Sun-Kook;Lee, Byung-Chae
    • Science of Emotion and Sensibility / v.16 no.3 / pp.409-416 / 2013
  • Attention is an important human cognitive function that governs selective concentration on relevant events and the disregarding of irrelevant ones. Discriminating between attentive and inattentive states is the first step toward managing a person's attentional capability with computer-assisted devices. In this paper, we combine non-linear recurrence pattern analysis and spectrum analysis to extract 13 features from the electroencephalographic signal for use as classifier inputs. We evaluated several types of attention-inattention classifiers, including support vector machine, back-propagation, linear discriminant, gradient descent, and logistic regression classifiers. Among them, the support vector machine classifier performed best, with a classification accuracy of 81%. The spectral band feature set alone (accuracy of 76%) performed better than the non-linear recurrence pattern feature set alone (accuracy of 67%). The support vector machine classifier with the hybrid combination of non-linear and spectral features can be used in the future design of attention-related devices.

Optimization of Support Vector Machines for Financial Forecasting (재무예측을 위한 Support Vector Machine의 최적화)

  • Kim, Kyoung-Jae;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.241-254 / 2011
  • Financial time-series forecasting is an important problem because it is essential to the risk management of financial institutions. Researchers have therefore tried to forecast financial time series using various data mining techniques such as regression, artificial neural networks, decision trees, and k-nearest neighbor. Recently, support vector machines (SVMs) have become popular in this research area because they do not require large amounts of training data and have a low risk of overfitting. However, a user must determine several design factors heuristically in order to use an SVM. For example, the selection of an appropriate kernel function and its parameters, and proper feature subset selection, are major design factors. Beyond these factors, proper selection of an instance subset may also improve the forecasting performance of an SVM by eliminating irrelevant and distorting training instances. Nonetheless, few studies have applied instance selection to SVMs, especially in the domain of stock market prediction. Instance selection tries to choose a proper instance subset from the original training data. It may be considered a method of knowledge refinement that maintains the instance base. This study proposes a novel instance selection algorithm for SVMs. The proposed technique uses a genetic algorithm (GA) to optimize instance selection and kernel parameters simultaneously. We call the model ISVM (SVM with instance selection). Experiments on stock market data are carried out using ISVM. The GA searches for optimal or near-optimal values of the kernel parameters and the relevant instances for the SVM. The GA chromosome therefore needs two sets of parameters: the codes for the kernel parameters and the codes for instance selection.
For the control parameters of the GA search, the population size is set to 50 organisms, the crossover rate to 0.7, and the mutation rate to 0.1. As the stopping condition, 50 generations are permitted. The application data consist of technical indicators and the direction of change in the daily Korea Composite Stock Price Index (KOSPI). The total number of samples is 2218 trading days. We split the data into three subsets (training, test, and hold-out) containing 1056, 581, and 581 samples, respectively. This study compares ISVM with several other models: logistic regression (Logit), backpropagation neural networks (ANN), nearest neighbor (1-NN), conventional SVM (SVM), and SVM with optimized parameters (PSVM). In particular, PSVM uses kernel parameters optimized by the genetic algorithm. The experimental results show that ISVM outperforms 1-NN by 15.32%, ANN by 6.89%, Logit and SVM by 5.34%, and PSVM by 4.82% on the hold-out data. For ISVM, only 556 of the 1056 original training instances are used to produce the result. In addition, the two-sample test for proportions is used to examine whether ISVM significantly outperforms the other models. The results indicate that ISVM outperforms ANN and 1-NN at the 1% statistical significance level, and performs better than Logit, SVM, and PSVM at the 5% statistical significance level.
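
The two-sample test for proportions mentioned above can be sketched as follows. This is a generic pooled z-test, not the authors' code; the abstract does not report the underlying hit counts, so the counts in the example call are hypothetical placeholders, and the choice of a two-sided p-value is likewise an assumption.

```python
import math

def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Pooled two-sample z-test for the difference between two proportions.

    Returns (z, two-sided p-value). Assumes the pooled proportion is strictly
    between 0 and 1, otherwise the standard error would be zero.
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical hit counts on a 581-sample hold-out set (not from the paper).
z, p_value = two_proportion_z_test(380, 581, 350, 581)
```

A positive `z` means the first model has the higher hit ratio; the difference is significant at a given level when `p_value` falls below it.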

Improved Estimation of Hourly Surface Ozone Concentrations using Stacking Ensemble-based Spatial Interpolation (스태킹 앙상블 모델을 이용한 시간별 지상 오존 공간내삽 정확도 향상)

  • KIM, Ye-Jin;KANG, Eun-Jin;CHO, Dong-Jin;LEE, Si-Woo;IM, Jung-Ho
    • Journal of the Korean Association of Geographic Information Studies / v.25 no.3 / pp.74-99 / 2022
  • Surface ozone is produced by photochemical reactions of nitrogen oxides (NOx) and volatile organic compounds (VOCs) emitted from vehicles and industrial sites, and it adversely affects vegetation and the human body. In South Korea, ozone is monitored in real time at stations (i.e., point measurements), but it is difficult to monitor and analyze its continuous spatial distribution. In this study, hourly surface ozone concentrations were interpolated to a spatial resolution of 1.5 km using a stacking ensemble technique, followed by 5-fold cross-validation. The base models for the stacking ensemble were cokriging, multi-linear regression (MLR), random forest (RF), and support vector regression (SVR), while MLR was used as the meta model, taking all base model results as additional input variables. The stacking ensemble model performed better than the individual base models, with an average R of 0.76 and RMSE of 0.0065 ppm over the 2020 study period. Compared with the base models, the surface ozone distribution generated by the stacking ensemble had a wider range and a spatial pattern similar to those of the terrain and urbanization variables. The proposed model is expected not only to produce the hourly spatial distribution of ozone but also to be highly applicable to calculating daily maximum 8-hour ozone concentrations.
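
The stacking scheme described above (base-model predictions fed to a linear meta model alongside the original inputs) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the study's pipeline: only two base models are used, ordinary least squares and a k-nearest-neighbour regressor standing in for the cokriging, RF, and SVR models; the data are synthetic; and all function names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Multi-linear regression via least squares; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def fit_knn(X, y, k=5):
    """k-NN regressor (a stand-in base model, not one used in the paper)."""
    def predict(Xn):
        d = np.linalg.norm(Xn[:, None, :] - X[None, :, :], axis=2)
        idx = np.argsort(d, axis=1)[:, :k]
        return y[idx].mean(axis=1)
    return predict

BASE_FITTERS = [fit_ols, fit_knn]

def out_of_fold_predictions(X, y, n_folds=5, seed=0):
    """Each sample is predicted by base models that never saw it in training."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    oof = np.zeros((len(X), len(BASE_FITTERS)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(X)), held_out)
        for m, fit in enumerate(BASE_FITTERS):
            oof[held_out, m] = fit(X[train], y[train])(X[held_out])
    return oof

def fit_stack(X, y, n_folds=5):
    """Meta model is MLR on [original inputs, base-model predictions]."""
    oof = out_of_fold_predictions(X, y, n_folds)
    meta = fit_ols(np.column_stack([X, oof]), y)
    base = [fit(X, y) for fit in BASE_FITTERS]  # refit on all data
    def predict(Xn):
        base_pred = np.column_stack([b(Xn) for b in base])
        return meta(np.column_stack([Xn, base_pred]))
    return predict

# Synthetic demonstration data (not ozone measurements).
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 1, size=(200, 2))
y_demo = np.sin(3 * X_demo[:, 0]) + 0.5 * X_demo[:, 1] + rng.normal(0, 0.05, 200)
stack_predict = fit_stack(X_demo, y_demo)
y_hat = stack_predict(X_demo[:5])
```

Training the meta model on out-of-fold predictions, rather than in-sample ones, keeps it from simply trusting whichever base model overfits the training data most.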

Analysis of Marine Accident based on Impact of Tidal Stream and Vessel Tracking in VTS Area (VTS 관제 구역 내 조류의 영향과 항적 이동에 따른 해양 사고 분석 방법)

  • Kim, Joo-Sung;Jeong, Jung-Sik;Kang, Seung-Ho;Lim, Se-Wook
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2018.05a / pp.246-247 / 2018
  • Since the routes within VTS areas include the harbour limits of major ports, there are sections where traffic volume increases and where the routes are narrow because of geographical conditions. Ports and VTS areas on the west coast of Korea are also affected by strong currents due to large tidal differences. In this paper, we propose a method for producing useful information as the navigation environment changes by analyzing the characteristics of ship movement according to the tidal stream or current. Support vector regression (the SVR seaway model) and grid search were used to extract the models.

Spatio-temporal Load Forecasting Considering Aggregation Features of Electricity Cells and Uncertainties in Input Variables

  • Zhao, Teng;Zhang, Yan;Chen, Haibo
    • Journal of Electrical Engineering and Technology / v.13 no.1 / pp.38-50 / 2018
  • Spatio-temporal load forecasting (STLF) is a foundation for building the prediction-based power map, which could be a useful tool for the visualization and tendency assessment of urban energy application. Constructing one point-forecasting model for each electricity cell in the geographic space is possible; however, it is inadvisable and insufficient, considering the aggregation features of electricity cells and the uncertainties in input variables. This paper presents a new STLF method with a data-driven framework consisting of three subroutines: multi-level clustering of cells considering their aggregation features, load regression for each category of cells based on SLS-SVRNs (sparse least squares support vector regression networks), and interval forecasting of spatio-temporal load with sampled blind numbers. An area in Pudong, Shanghai is taken as the study region. The results of multi-level clustering show that electricity cells in the same category are clustered in geographic space to some extent, which reveals the spatial aggregation feature of cells. For cellular load regression, a comparison with three other forecasting methods indicates the higher accuracy of the proposed method in point forecasting of spatio-temporal load. Furthermore, the results of interval load forecasting demonstrate that the proposed prediction-interval construction method can effectively convey the uncertainties in input variables.

Prediction of Residual Resistance Coefficient of Low-Speed Full Ships Using Hull Form Variables and Machine Learning Approaches (선형변수 기계학습 기법을 활용한 저속비대선의 잉여저항계수 추정)

  • Kim, Yoo-Chul;Yang, Kyung-Kyu;Kim, Myung-Soo;Lee, Young-Yeon;Kim, Kwang-Soo
    • Journal of the Society of Naval Architects of Korea / v.57 no.6 / pp.312-321 / 2020
  • In this study, machine learning techniques were applied to predict the residual resistance coefficient (Cr) of low-speed full ships. The machine learning methods used were ridge regression, support vector regression, random forest, a neural network, and an ensemble of these models. Nineteen hull form variables were used as input variables. The hull form variables and Cr data obtained from 139 hull forms in the KRISO database were used in the analysis. 80% of the data were used for training the models and the rest for validation. Some non-linear models showed overfitting, and the ensemble model showed better results than the others.

Estimation of lightweight aggregate concrete characteristics using a novel stacking ensemble approach

  • Kaloop, Mosbeh R.;Bardhan, Abidhan;Hu, Jong Wan;Abd-Elrahman, Mohamed
    • Advances in nano research / v.13 no.5 / pp.499-512 / 2022
  • This study investigates the efficiency of ensemble machine learning for predicting the characteristics of lightweight-aggregate concrete (LWC). A stacking ensemble (STEN) approach was proposed to estimate the dry density (DD) and 28-day compressive strength (Fc-28) of LWC using two meta-models, random forest regressor (RFR) and extra tree regressor (ETR), from which two novel ensemble models, STEN-RFR and STEN-ETR, were constructed. Four standalone machine learning models (artificial neural network, gradient boosting regression, K neighbor regression, and support vector regression) were used to compare the performance of the proposed models. For this purpose, a total of 140 LWC mixtures with 21 influencing parameters for producing LWC with a density of less than 1000 kg/m³ were used. Based on the experimental results with multiple performance criteria, it can be concluded that the proposed STEN-ETR model can be used to estimate the DD and Fc-28 of LWC. Moreover, the STEN-ETR approach was found to be a significant technique for predicting the DD and Fc-28 of LWC with minimal prediction error. In the validation phase, the accuracy of the proposed STEN-ETR model in predicting DD and Fc-28 was found to be 96.79% and 81.50%, respectively. In addition, the cement, water-cement ratio, silica fume, and aggregate with expanded glass variables were found to be significant in modeling the DD and Fc-28 of LWC.

Creation of regression analysis for estimation of carbon fiber reinforced polymer-steel bond strength

  • Xiaomei Sun;Xiaolei Dong;Weiling Teng;Lili Wang;Ebrahim Hassankhani
    • Steel and Composite Structures / v.51 no.5 / pp.509-527 / 2024
  • Bonded carbon fiber-reinforced polymer (CFRP) laminates have been extensively employed in the restoration of steel structures. In addition to the mechanical properties of the CFRP, the bond strength (PU) between the CFRP and steel is often important to the eventual strengthened performance. Nonetheless, the bond behavior of the CFRP-steel (CS) interface is exceedingly complicated, with multiple failure causes, making the PU challenging to forecast and the CFRP-enhanced steel structure unsteady. To address this, hybridized Random Forest (RF) and support vector regression (SVR) models were developed on assembled CS single-shear experiment data to predict the PU of CS, with a recently proposed optimization algorithm, the Aquila optimizer (AO), used to tune the RF and SVR hyperparameters. In summary, the practical novelty of the article lies in its development of a reliable and efficient method for predicting bond strength at the CS interface, which has significant implications for structural rehabilitation, design optimization, risk mitigation, cost savings, and decision support in engineering practice. Moreover, the Fourier Amplitude Sensitivity Test was performed to depict each parameter's impact on the target. The order of parameter importance, from largest to smallest, was tc > Lc > EA > tA > Ec > bc > fc > fA, with values of 0.9345, 0.8562, 0.79354, 0.7289, 0.6531, 0.5718, 0.4307, and 0.3657, respectively. In the training, testing, and all-data phases, AO-RF was clearly superior to AO-SVR and MARS. In the training stage, the R2 and VAF values were similar, with a slight advantage for AO-RF over AO-SVR (R2 equal to 0.9977 and VAF equal to 99.772), but both differed markedly from the results of MARS.
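
The two goodness-of-fit measures quoted above can be computed as in the sketch below. The abstract does not state which definitions the authors used, so the commonly used forms are assumed here: R2 as one minus the ratio of residual to total sum of squares, and VAF (variance accounted for) as one minus the ratio of residual variance to target variance, expressed in percent.

```python
from statistics import mean, pvariance

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def vaf(y_true, y_pred):
    """Variance accounted for, in percent: (1 - var(residuals) / var(y)) * 100."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return (1 - pvariance(residuals) / pvariance(y_true)) * 100

# Illustrative values only (not data from the paper).
y_obs = [1.0, 2.0, 3.0, 4.0]
y_shifted = [2.0, 3.0, 4.0, 5.0]  # every prediction is off by a constant +1
r2_shifted = r_squared(y_obs, y_shifted)
vaf_shifted = vaf(y_obs, y_shifted)
```

Under these definitions a constant offset in the predictions lowers R2 but leaves VAF at 100, because VAF depends only on the variance of the residuals and not on their mean. This is one reason the two measures are often reported together.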

Data-mining modeling for the prediction of wear on forming-taps in the threading of steel components

  • Bustillo, Andres;Lopez de Lacalle, Luis N.;Fernandez-Valdivielso, Asier;Santos, Pedro
    • Journal of Computational Design and Engineering / v.3 no.4 / pp.337-348 / 2016
  • An experimental approach is presented for the measurement of wear that is common in the threading of cold-forged steel. In this work, the first objective is to measure wear on various types of roll taps manufactured for tapping holes in microalloyed HR45 steel. Different geometries and levels of wear are tested and measured. Taking their geometry as the critical factor, the types of forming tap with the least wear and the best performance are identified. Abrasive wear was observed on the forming lobes. A higher number of lobes in the chamber zone and around the nominal diameter meant a more uniform load distribution and a more gradual forming process. A second objective is to identify the most accurate data-mining technique for the prediction of form-tap wear. Different data-mining techniques are tested to select the most accurate one, from standard techniques such as Multilayer Perceptrons, Support Vector Machines, and Regression Trees to more recent ones such as Rotation Forest ensembles and Iterated Bagging ensembles. The best results were obtained with ensembles of Rotation Forest with unpruned Regression Trees as base regressors, which reduced the RMS error of the best-tested baseline technique for the lower length output by 33%, and with Additive Regression with unpruned M5P as base regressors, which reduced the RMS errors of the linear fit for the upper and total lengths by 25% and 39%, respectively. However, the lower length was statistically more difficult to model with Additive Regression than with Rotation Forest. Rotation Forest with unpruned Regression Trees as base regressors therefore appeared to be the most suitable regressor for modeling this industrial problem.