• Title/Summary/Keyword: support vector regression.

A Study on the Development of Model for Estimating the Thickness of Clay Layer of Soft Ground in the Nakdong River Estuary (낙동강 조간대 연약지반의 지역별 점성토층 두께 추정 모델 개발에 관한 연구)

  • Seongin, Ahn;Dong-Woo, Ryu
    • Tunnel and Underground Space / v.32 no.6 / pp.586-597 / 2022
  • This study developed a model for estimating the location-specific thickness of the upper clay layer, for use in evaluating consolidation vulnerability in the Nakdong River estuary. To estimate layer thickness, we developed four spatial estimation models: three based on machine learning algorithms, namely RF (Random Forest), SVR (Support Vector Regression), and GPR (Gaussian Process Regression), and one based on a geostatistical technique, Ordinary Kriging. Of the 4,712 borehole records collected in the study area for model development, the 2,948 that contained an upper clay layer were used. The Pearson correlation coefficient and mean squared error were used to evaluate the models quantitatively. For qualitative evaluation, each model was applied across the entire study area to estimate the upper clay layer, and the resulting thickness distributions were compared.
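
The two quantitative measures named in this abstract, the Pearson correlation coefficient and the mean squared error, can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names and the thickness values are hypothetical and are not taken from the study.

```python
import math


def mean_squared_error(observed, predicted):
    """Average of the squared differences between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    if spread_o == 0 or spread_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_o * spread_p)


# Hypothetical clay-layer thicknesses in metres (illustrative values only).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 15.0, 16.0]

print("MSE:", mean_squared_error(observed, predicted))
print("Pearson r:", pearson_r(observed, predicted))
```

A model can score a high correlation while still being biased, which is one reason to report an error measure such as MSE alongside it.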

Predicting rock brittleness indices from simple laboratory test results using some machine learning methods

  • Davood Fereidooni;Zohre Karimi
    • Geomechanics and Engineering / v.34 no.6 / pp.697-726 / 2023
  • Brittleness is an important rock property that plays a crucial role both in the failure process of intact rock and in the response of rock masses to excavation in engineering geological and geotechnical projects. Rock brittleness indices are generally calculated from mechanical properties such as uniaxial compressive strength, tensile strength, and modulus of elasticity, which are determined from complicated, expensive, and time-consuming laboratory tests. For this reason, the present research attempts to predict rock brittleness indices from the results of simple, inexpensive, and quick laboratory tests, namely dry unit weight, porosity, slake-durability index, P-wave velocity, Schmidt rebound hardness, and point load strength index, using multiple linear regression, exponential regression, support vector machine (SVM) with various kernels, generating fuzzy inference system, and regression tree ensemble (RTE) with a boosting framework. This approach may be considered the novelty of the present research. For this purpose, 39 rock samples, comprising five igneous, twenty-six sedimentary, and eight metamorphic rocks, were collected from different regions of Iran. Mineralogical, physical, and mechanical properties, as well as five well-known rock brittleness indices (B1, B2, B3, B4, and B5), were measured for the selected samples before the above-mentioned machine learning techniques were applied. The performance of the developed models was evaluated using several statistical metrics: mean square error, relative absolute error, root relative absolute error, coefficient of determination, variance accounted for, mean absolute percentage error, and standard deviation of the error. A comparison of the results showed that, among the studied methods, SVM is the most suitable for predicting B1, B2, and B5, while RTE predicts B3 and B4 better than the other methods.

Analysis of Regional Fertility Gap Factors Using Explainable Artificial Intelligence (설명 가능한 인공지능을 이용한 지역별 출산율 차이 요인 분석)

  • Dongwoo Lee;Mi Kyung Kim;Jungyoon Yoon;Dongwon Ryu;Jae Wook Song
    • Journal of Korean Society of Industrial and Systems Engineering / v.47 no.1 / pp.41-50 / 2024
  • Korea is facing a significant problem with historically low fertility rates, which is becoming a major social issue affecting the economy, labor force, and national security. This study analyzes the factors contributing to the regional gap in fertility rates and derives policy implications. The government and local authorities are implementing a range of policies to address the issue of low fertility. To establish an effective strategy, it is essential to identify the primary factors that contribute to regional disparities. This study identifies these factors and explores policy implications through machine learning and explainable artificial intelligence. The study also examines the influence of media and public opinion on childbirth in Korea by incorporating news and online community sentiment, as well as sentiment fear indices, as independent variables. To establish the relationship between regional fertility rates and factors, the study employs four machine learning models: multiple linear regression, XGBoost, Random Forest, and Support Vector Regression. Support Vector Regression, XGBoost, and Random Forest significantly outperform linear regression, highlighting the importance of machine learning models in explaining non-linear relationships with numerous variables. A factor analysis using SHAP is then conducted. The unemployment rate, Regional Gross Domestic Product per Capita, Women's Participation in Economic Activities, Number of Crimes Committed, Average Age of First Marriage, and Private Education Expenses significantly impact regional fertility rates. However, the degree of impact of the factors affecting fertility may vary by region, suggesting the need for policies tailored to the characteristics of each region, not just an overall ranking of factors.

Runtime Prediction Based on Workload-Aware Clustering (병렬 프로그램 로그 군집화 기반 작업 실행 시간 예측모형 연구)

  • Kim, Eunhye;Park, Ju-Won
    • Journal of Korean Society of Industrial and Systems Engineering / v.38 no.3 / pp.56-63 / 2015
  • Several fields of science demand large-scale workflow support that requires thousands of CPU cores or more. To support such large-scale scientific workflows, high-capacity parallel systems such as supercomputers are widely used. To increase the utilization of these systems, most schedulers use a backfilling policy: small jobs are moved ahead to fill holes in the schedule, provided that large jobs are not delayed. Because backfilling requires a runtime estimate, most parallel systems rely on the runtime estimated by the user. However, these estimates have been found to be extremely inaccurate because users overestimate the runtime of their jobs. In this paper, we therefore propose a novel runtime prediction system based on workload-aware clustering, with the goal of improving prediction performance. The proposed method for predicting the runtime of parallel applications consists of three main phases. First, feature selection based on factor analysis identifies the important input features. Second, the history data are clustered with a self-organizing map, followed by hierarchical clustering to find the cluster boundaries from the weight vectors. Finally, prediction models are constructed by applying support vector regression to the clustered workload data. Using a separate prediction model for each clustered data pattern can reduce the error rate compared with a single model for the whole data set. In the experiments, we use workload logs from parallel systems (iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Compared with other techniques, the proposed method improves accuracy by up to 69.08%.

An Incremental Regression Model for Time Series Data Prediction (시계열 데이터 예측을 위한 점진적인 회귀분석 모델)

  • Kim Sung-Hyun;Lee Yong-Mi;Jin Long;Seo Sung-Bo;Ryu Keun-Ho
    • Annual Conference of KIPS / 2006.05a / pp.23-26 / 2006
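  • Among existing data mining prediction techniques, regression analysis applies the model built in the training phase to new data without modification. When such a model is applied unchanged to time series data, however, its accuracy declines over time. This paper therefore proposes a technique that incrementally updates the regression model to account for the time-varying nature of time series data. The technique applies every incoming data point to the regression model and updates the model incrementally. The validity of the proposed technique was measured using RME (Relative Mean Error) and RMSE (Root Mean Square Error). In the accuracy experiments, the proposed IMQR (Incremental Multiple Quadratic Regression) technique outperformed MLR (Multiple Linear Regression), MQR (Multiple Quadratic Regression), and SVR (Support Vector Regression) by an average of 2% in RME and about 0.02 in RMSE.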
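
The abstract does not describe how IMQR performs its update, so the following is only a sketch of one standard way to refresh a quadratic regression as samples arrive: accumulate the normal-equation sums and re-solve them on demand. It assumes a single predictor for brevity, whereas the paper's model is a multiple regression. The class name `IncrementalQuadraticRegression` and the helper `_solve` are hypothetical and are not taken from the paper.

```python
class IncrementalQuadraticRegression:
    """Least-squares fit of y = c0 + c1*x + c2*x^2, refreshed as samples arrive.

    Assumption: running sums of the normal equations are kept, so each new
    sample costs constant time and old samples need not be stored.
    """

    def __init__(self):
        self.xtx = [[0.0] * 3 for _ in range(3)]  # running sum of phi * phi^T
        self.xty = [0.0] * 3                      # running sum of phi * y
        self.count = 0

    def update(self, x, y):
        """Fold one new observation into the running sums."""
        phi = [1.0, x, x * x]
        for i in range(3):
            self.xty[i] += phi[i] * y
            for j in range(3):
                self.xtx[i][j] += phi[i] * phi[j]
        self.count += 1

    def coefficients(self):
        """Solve the accumulated normal equations for [c0, c1, c2]."""
        if self.count < 3:
            raise ValueError("need at least three samples")
        return _solve(self.xtx, self.xty)

    def predict(self, x):
        c0, c1, c2 = self.coefficients()
        return c0 + c1 * x + c2 * x * x


def _solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; samples do not determine the fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Stream synthetic samples drawn from y = 1 + 2x + 3x^2, one at a time.
model = IncrementalQuadraticRegression()
for step in range(6):
    model.update(float(step), 1.0 + 2.0 * step + 3.0 * step * step)
print([round(c, 6) for c in model.coefficients()])
```

This version weights all past samples equally. To track a drifting time series, older samples would have to be down-weighted, for example by scaling the running sums before each update. Whether IMQR does this is not stated in the abstract.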

Sparse Multinomial Kernel Logistic Regression

  • Shim, Joo-Yong;Bae, Jong-Sig;Hwang, Chang-Ha
    • Communications for Statistical Applications and Methods / v.15 no.1 / pp.43-50 / 2008
  • Multinomial logistic regression is a well-known multiclass classification method in the field of statistical learning. More recently, sparse multinomial logistic regression models have found application in microarray classification, where explicit identification of the most informative observations is of value. In this paper, we propose a sparse multinomial kernel logistic regression model in which the sparsity arises from the use of a Laplacian prior, and we derive a fast exact algorithm by employing a bound optimization approach. Experimental results are then presented to indicate the performance of the proposed procedure.

A Study on Estimating Construction Cost of Apartment Housing Projects Using Genetic Algorithm-Support Vector Regression (유전 알고리즘 - 서포트 벡터 회귀를 활용한 공동주택 공사비 예측에 관한 연구)

  • Nan, Jun;Choi, Jae-Woong;Choi, Hyemi;Kim, Ju-Hyung
    • Korean Journal of Construction Engineering and Management / v.15 no.4 / pp.68-76 / 2014
  • Accurate estimation of construction cost is important to the success of construction projects. In previous studies, construction cost has been estimated by statistical methods. Among these, support vector regression (SVR) has attracted considerable attention in the field of cost estimation because of its generalization ability. However, although SVR has only a few parameters to adjust, finding their optimal values is not easy. To build an effective SVR model, its parameters must be set properly without additional data-handling load. This study therefore uses a genetic algorithm (GA) to search for the optimal SVR parameters and then applies those parameters to an SVR model for estimating cost in the early stage of apartment housing projects. The aim of the study is to propose a GA-SVR model and examine its feasibility for cost estimation by comparing it with multiple regression analysis (MRA). The experimental results, evaluated by the percentage of estimates within 25%, show that the model can produce accurate estimates without a trial-and-error process.
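
The evaluation criterion mentioned in this abstract, the percentage of estimates within 25%, can be sketched as follows. The sketch assumes that "within 25%" means a relative error of at most 25% of the actual cost, which the abstract does not spell out. The function name `fraction_within` and the cost figures are hypothetical.

```python
def fraction_within(actual, estimated, tolerance=0.25):
    """Share of estimates whose relative error is at most `tolerance`.

    Assumption: relative error is |estimate - actual| / |actual|.
    """
    if len(actual) != len(estimated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = 0
    for a, e in zip(actual, estimated):
        if a == 0:
            raise ValueError("relative error is undefined for a zero actual value")
        if abs(e - a) / abs(a) <= tolerance:
            hits += 1
    return hits / len(actual)


# Hypothetical project costs (arbitrary units, illustrative values only).
actual = [100.0, 200.0, 400.0, 800.0]
estimated = [110.0, 260.0, 390.0, 500.0]

# Relative errors are 0.10, 0.30, 0.025, and 0.375, so two of four pass.
print(fraction_within(actual, estimated))  # 0.5
```

A hit rate of this kind is a natural fitness function for a parameter search such as a GA, because it is a single number to maximize. Whether the study used it that way is not stated in the abstract.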

Evaluation and Predicting PM10 Concentration Using Multiple Linear Regression and Machine Learning (다중선형회귀와 기계학습 모델을 이용한 PM10 농도 예측 및 평가)

  • Son, Sanghun;Kim, Jinsoo
    • Korean Journal of Remote Sensing / v.36 no.6_3 / pp.1711-1720 / 2020
  • Particulate matter (PM) generated artificially during recent rapid industrialization and urbanization moves and disperses according to weather conditions, and adversely affects human skin and the respiratory system. The purpose of this study is to predict the PM10 concentration in Seoul using meteorological factors as the input dataset for multiple linear regression (MLR), support vector machine (SVM), and random forest (RF) models, and to compare and evaluate the performance of the models. First, the PM10 concentration data obtained at 39 air quality monitoring sites (AQMS) in Seoul were divided into training and validation datasets (8:2 ratio). Nine meteorological factors (mean, maximum, and minimum temperature, precipitation, average and maximum wind speed, wind direction, yellow dust, and relative humidity), obtained by the automatic weather system (AWS), formed the input dataset of the models. The coefficients of determination (R2) between the observed PM10 concentration and that predicted by the MLR, SVM, and RF models were 0.260, 0.772, and 0.793, respectively, so the RF model predicted the PM10 concentration best. Among the AQMS used for model validation, the Gwanak-gu and Gangnam-daero AQMS are relatively close to the AWS, and the SVM and RF models were highly accurate there. The Jongno-gu AQMS is relatively far from the AWS, but because the PM10 concentrations of the two adjacent AQMS were used for model training, both models were highly accurate. By contrast, the Yongsan-gu AQMS is relatively far from both the other AQMS and the AWS, and both models performed poorly there.
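
The 8:2 training/validation split and the R2 score reported in this abstract can be illustrated with a short sketch. It assumes the common definition R2 = 1 - SS_res / SS_tot; the abstract does not say whether this definition or the squared correlation coefficient was used. The function names, the random shuffling, and the sample values are hypothetical.

```python
import random


def train_validation_split(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of `records` and cut it into training and validation parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def r_squared(observed, predicted):
    """Coefficient of determination, assuming R2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Ten hypothetical record identifiers split 8:2.
train, validation = train_validation_split(range(10))
print(len(train), len(validation))  # 8 2

# Hypothetical observed and predicted values (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(observed, predicted), 3))
```

Under this definition, a model that always predicts the observed mean scores 0. A value of 0.260, as reported for MLR, therefore indicates only a modest improvement over that baseline.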

Predictive Analysis of Fire Risk Factors in Gyeonggi-do Using Machine Learning (머신러닝을 이용한 경기도 화재위험요인 예측분석)

  • Seo, Min Song;Castillo Osorio, Ever Enrique;Yoo, Hwan Hee
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.39 no.6 / pp.351-361 / 2021
  • Fire is an increasingly serious problem because it causes enormous damage to property and human life. This study therefore aims to predict the risk factors affecting fire, by fire type. The predictive analysis targeted Gyeonggi-do, which has the highest number of fires in the country. Three machine learning methods were used: SVM (Support Vector Machine), RF (Random Forest), and GBRT (Gradient Boosted Regression Tree). The accuracy of each model was assessed with MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error) to identify the best-fitting model, and the predictive analysis of fire factors in Gyeonggi-do was conducted on that basis. In the comparison of the three methods, RF gave the best predictive results, with MAE = 1.765 and RMSE = 1.876; its results on the validation and test data were also very similar, differing by 0.046 in MAE and 0.04 in RMSE. The results of this study are expected to serve as useful data for fire safety management, allowing decision makers to rank the dangers associated with the factors that affect the occurrence of fire.

An Optimized Combination of π-fuzzy Logic and Support Vector Machine for Stock Market Prediction (주식 시장 예측을 위한 π-퍼지 논리와 SVM의 최적 결합)

  • Dao, Tuanhung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.43-58 / 2014
  • As the use of trading systems has increased rapidly, many researchers have become interested in developing effective stock market prediction models using artificial intelligence techniques. Stock market prediction involves multifaceted interactions between market-controlling factors and unknown random processes. A successful stock prediction model achieves the most accurate result from minimum input data with the least complex model. In this research, we develop a model that combines π-fuzzy logic with a support vector machine (SVM), using a genetic algorithm to optimize the parameters of the SVM and the π-fuzzy functions and to select the feature subset, in order to improve the performance of stock market prediction. To evaluate the proposed model, we compare its performance with that of other models, including logistic regression, multiple discriminant analysis, classification and regression tree, artificial neural network, SVM, and fuzzy SVM, on the same data. The results show that our model outperforms all the other models in prediction accuracy as well as return on investment.