• Title/Summary/Keyword: support vector machine regression

Effective Drought Prediction Based on Machine Learning (머신러닝 기반 효과적인 가뭄예측)

  • Kim, Kyosik;Yoo, Jae Hwan;Kim, Byunghyun;Han, Kun-Yeun
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2021.06a
    • /
    • pp.326-326
    • /
    • 2021
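  • Many researchers have made technical and academic attempts to predict droughts, which develop over long periods and across wide areas. Among the methods for forecasting droughts with complex time series, this study produced future drought outlooks using a scenario-based drought outlook method and non-scenario-based methods that predict drought in real time. For the scenario-based method, the 2009 PDSI (Palmer Drought Severity Index) was computed from 3-month GCM (General Circulation Model) forecasts to make short-term predictions of drought severity. In addition, non-scenario-based drought was predicted using statistical methods and deterministic numerical analysis based on physical models. To overcome the forecasting limitations of the ARIMA (Autoregressive Integrated Moving Average) model, a representative method previously tried for statistical drought prediction, SPI was estimated using support vector regression (SVR) and a wavelet neural network. The optimal model structure was selected using RMSE (root mean square error), MAE (mean absolute error), and R (correlation coefficient), and droughts were forecast with lead times of 1 to 6 months. Markov chain and log-linear models were also applied to SPI to verify the accuracy of SPI-based drought prediction, and a neuro-fuzzy model was applied to the Anatolia region of Turkey to predict drought from monthly mean precipitation and SPI for the period 1964-2006. As drought frequency and patterns change irregularly and the regional polarization of precipitation deepens, the demand for more accurate drought prediction is growing. This study aims to develop an improved prediction model for complex, nonlinear drought patterns by applying monthly and daily SPEI (Standardized Precipitation Evapotranspiration Index), which represents the degree of meteorological drought, to machine learning models.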
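
The abstract above selects the model structure by RMSE, MAE, and the correlation coefficient R. As an illustration only, the sketch below computes the three criteria in plain Python and picks the candidate with the lowest RMSE. The SPI values, the candidate names `model_a` and `model_b`, and the choice of RMSE as the tie-breaking criterion are all assumptions for this example and are not taken from the paper.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    """Pearson correlation coefficient R (assumes neither series is constant)."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical observed SPI values and forecasts from two candidate models.
observed = [-1.2, -0.5, 0.3, 0.8, -0.1, -1.6]
candidates = {
    "model_a": [-1.0, -0.6, 0.2, 0.9, 0.0, -1.4],
    "model_b": [-0.5, 0.0, 0.9, 0.2, -0.8, -0.9],
}

# Score every candidate on all three criteria.
scores = {
    name: {
        "rmse": rmse(observed, pred),
        "mae": mae(observed, pred),
        "r": pearson_r(observed, pred),
    }
    for name, pred in candidates.items()
}

# Assumption: the candidate with the lowest RMSE is treated as the best one.
best = min(scores, key=lambda name: scores[name]["rmse"])
print(best, scores[best])
```

In practice the three criteria can disagree, so a study would normally report all of them for each lead time rather than rank on one alone.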

A Success Prediction Model for Debut Webtoon Based on Reader reaction Using Deep Learning and Machine Learning (딥러닝과 머신러닝을 활용한 독자 반응 기반 웹툰 데뷔작 성공 예측 모델)

  • Heo, Eun Yeong;Kim, Seung Hwa;Kim, Hyon Hee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.10a
    • /
    • pp.770-773
    • /
    • 2019
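  • This paper seeks to identify the factors that enable new authors to succeed in a webtoon market that grows every year. We collected 324 debut webtoons from Naver Webtoon, the leading webtoon platform in Korea: 212 completed series and 112 ongoing series. To differentiate this work from prior studies, we included comments, one form of direct reader reaction, among the success factors. We performed sentiment analysis using deep learning to detect the positive or negative subjectivity expressed in the comments. The independent variables assumed to affect success were the comment reaction for each webtoon, the average, the number of 'likes', the genre, the number of comments on the first episode, and the average number of comments through the fifth episode. To check whether comment reaction is an important factor, each model was built both with and without comment reaction and then evaluated. Logistic regression, AdaBoost, and support vector machine models were compared using accuracy and ROC curves, and the logistic regression model that used comment reaction was judged the most suitable. The resulting model showed that the number of 'likes', the number of first-episode comments, and comment reaction, in that order, had the greatest influence on success.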

Convergence Study in Development of Severity Adjustment Method for Death with Acute Myocardial Infarction Patients using Machine Learning (머신러닝을 이용한 급성심근경색증 환자의 퇴원 시 사망 중증도 보정 방법 개발에 대한 융복합 연구)

  • Baek, Seol-Kyung;Park, Hye-Jin;Kang, Sung-Hong;Choi, Joon-Young;Park, Jong-Ho
    • Journal of Digital Convergence
    • /
    • v.17 no.2
    • /
    • pp.217-230
    • /
    • 2019
  • This study was conducted to develop a customized severity-adjustment method for acute myocardial infarction (AMI) patients, and to evaluate its validity, in order to complement the limitations of the existing comorbidity-based severity-adjustment methods. For this purpose, subjects with KCD-7 codes I20.0 ~ I20.9, given as the main diagnosis of acute myocardial infarction, were extracted from the Korean National Hospital Discharge In-depth Injury Survey data from 2006 to 2015. Three tools were used for severity adjustment of comorbidities: CCI (Charlson comorbidity index), ECI (Elixhauser comorbidity index), and the newly proposed CCS (Clinical Classification Software). The results showed that CCS was the best tool for severity adjustment and that the support vector machine model had the best predictive performance. Therefore, we propose using the customized severity-adjustment method and machine learning techniques from this study in future research on severity adjustment, such as the assessment of medical service outcomes.

A study on entertainment TV show ratings and the number of episodes prediction (국내 예능 시청률과 회차 예측 및 영향요인 분석)

  • Kim, Milim;Lim, Soyeon;Jang, Chohee;Song, Jongwoo
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.6
    • /
    • pp.809-825
    • /
    • 2017
  • The number of TV entertainment shows is increasing. Competition among programs in the entertainment market is intensifying because cable channels air many entertainment TV shows. There is now a need for research on program ratings and the number of episodes. This study presents predictive models for entertainment TV show ratings and number of episodes. We use various data mining techniques such as linear regression, logistic regression, LASSO, random forests, gradient boosting, and support vector machine. The analysis results show that the average program rating before the first broadcast is affected by the broadcasting company, the average rating of the previous season, the starting year, and the number of articles. The average program rating after the first broadcast is influenced by the rating of the first broadcast, the broadcasting company, and the program type. We also found that the predicted average rating, starting year, program type, and broadcasting company are important variables in predicting the number of episodes.

A Study for Improving the Performance of Data Mining Using Ensemble Techniques (앙상블기법을 이용한 다양한 데이터마이닝 성능향상 연구)

  • Jung, Yon-Hae;Eo, Soo-Heang;Moon, Ho-Seok;Cho, Hyung-Jun
    • Communications for Statistical Applications and Methods
    • /
    • v.17 no.4
    • /
    • pp.561-574
    • /
    • 2010
  • We studied the performance of 8 data mining algorithms, including decision trees, logistic regression, LDA, QDA, neural network, and SVM, and their combinations with 2 ensemble techniques, bagging and boosting. In this study, we utilized 13 data sets with binary responses. Sensitivity, specificity, and misclassification error were used as criteria for comparison.

Diagnosis Atherosclerosis Model Using Radiomics Approach in Carotid Vessel MRI (경동맥 혈관 MRI에서 라디오믹스를 이용한 동맥경화증 진단 모델)

  • Kim, Jong-hun;Park, Hyunjin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.289-290
    • /
    • 2022
  • Arteriosclerosis is a disease in which the carotid vessel wall becomes thick, and it is important to monitor the thickness of the vessel wall for diagnosis. In this study, we propose a model that extracts 324 radiomics features from carotid MRI images and diagnoses arteriosclerosis using machine learning techniques. We trained a total of four classification models on the radiomics features: logistic regression, support vector machine, random forest, and XGBoost. The XGBoost model, which showed the highest performance in 5-fold cross-validation, achieved an accuracy of 0.9023, sensitivity of 0.9517, specificity of 0.8035, and AUC of 0.8776.
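
For readers unfamiliar with the reported measures, the sketch below shows how accuracy, sensitivity, and specificity follow from the counts of a binary confusion matrix. It is a minimal illustration in plain Python; the labels are made-up values (1 = diseased, 0 = healthy), and the function name `classification_metrics` is hypothetical, not something from the paper.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = positive).

    Assumes both classes occur in y_true; otherwise a denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true cases that are detected
        "specificity": tn / (tn + fp),  # share of healthy cases that are cleared
    }


# Hypothetical ground truth and predictions for ten cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

A gap between sensitivity and specificity, as in the figures reported above, indicates that a model errs more often in one direction than the other, which accuracy alone does not reveal.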

Feature selection and prediction modeling of drug responsiveness in Pharmacogenomics (약물유전체학에서 약물반응 예측모형과 변수선택 방법)

  • Kim, Kyuhwan;Kim, Wonkuk
    • The Korean Journal of Applied Statistics
    • /
    • v.34 no.2
    • /
    • pp.153-166
    • /
    • 2021
  • A main goal of pharmacogenomics studies is to predict an individual's drug responsiveness based on high-dimensional genetic variables. Because the number of variables is large, feature selection is required to reduce it. The selected features are used to construct a predictive model using machine learning algorithms. In the present study, we applied several hybrid feature selection methods, such as combinations of logistic regression, ReliefF, TuRF, random forest, and LASSO, to a next-generation sequencing data set of 400 epilepsy patients. We then applied the selected features to machine learning methods including random forest, gradient boosting, and support vector machine, as well as a stacking ensemble method. Our results showed that the stacking model with a hybrid feature selection of random forest and ReliefF performs better than the other combinations of approaches. Based on a 5-fold cross-validation partition, the mean test accuracy of the best model was 0.727 and its mean test AUC was 0.761. The stacking models also outperformed single machine learning predictive models when using the same selected features.

An Energy Consumption Prediction Model for Smart Factory Using Data Mining Algorithms (데이터 마이닝 기반 스마트 공장 에너지 소모 예측 모델)

  • Sathishkumar, VE;Lee, Myeongbae;Lim, Jonghyun;Kim, Yubin;Shin, Changsun;Park, Jangwoo;Cho, Yongyun
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.5
    • /
    • pp.153-160
    • /
    • 2020
  • Energy consumption prediction for industries has a prominent role to play in energy management and control systems, as dynamic and seasonal changes occur in energy demand and supply. This paper introduces and explores predictive models of energy consumption for the steel industry. The data used include lagging and leading reactive power, lagging and leading current variables, carbon dioxide emissions (tCO2), and load type. Four statistical models are trained and then evaluated on a test set: (a) Linear Regression (LR), (b) Radial Kernel Support Vector Machine (SVM RBF), (c) Gradient Boosting Machine (GBM), and (d) Random Forest (RF). Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) are used to measure the predictive performance of the regression models. When all the predictors are used, the best model, RF, achieves an RMSE of 7.33 on the test set.
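
Of the three error measures named above, MAPE is the one that expresses error relative to the size of the actual value. The sketch below is a minimal plain-Python illustration; the function name `mape` and the sample numbers are hypothetical, and skipping observations whose actual value is zero is one common workaround assumed here, not something stated in the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: observations with an actual value of zero are skipped,
    because the percentage error is undefined for them.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Hypothetical energy readings and model predictions.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mape(actual, predicted))
```

Because each error is divided by the actual value, MAPE weights a fixed absolute error more heavily when consumption is low, which is worth keeping in mind when comparing it with RMSE and MAE.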

Development of the Modified Preprocessing Method for Pipe Wall Thinning Data in Nuclear Power Plants (원자력 발전소 배관 감육 측정데이터의 개선된 전처리 방법 개발)

  • Seong-Bin Mun;Sang-Hoon Lee;Young-Jin Oh;Sung-Ryul Kim
    • Transactions of the Korean Society of Pressure Vessels and Piping
    • /
    • v.19 no.2
    • /
    • pp.146-154
    • /
    • 2023
  • In nuclear power plants, ultrasonic testing is used to measure pipe wall thickness during periodic inspections in order to prevent pipe rupture due to pipe wall thinning. However, when pipe wall thickness is measured by ultrasonic testing, a significant amount of measurement error occurs due to the on-site conditions of the nuclear power plant. If the maximum pipe wall thinning rate is determined from measured pipe wall thickness that contains significant error, the pipe wall thinning rate data carry significant uncertainty and systematic overestimation. This study proposes preprocessing pipe wall thinning measurement data with a support vector machine regression algorithm. With the support vector machine, the pipe wall thinning measurement data can be smoothed, and the uncertainty and systematic overestimation of the estimated pipe wall thinning rate data can be reduced accordingly.

Machine Learning-based Production and Sales Profit Prediction Using Agricultural Public Big Data (농업 공공 빅데이터를 이용한 머신러닝 기반 생산량 및 판매 수익금 예측)

  • Lee, Hyunjo;Kim, Yong-Ki;Koo, Hyun Jung;Chae, Cheol-Joo
    • Smart Media Journal
    • /
    • v.11 no.4
    • /
    • pp.19-29
    • /
    • 2022
  • Recently, with the development of IoT technology, the number of farms using smart farm systems has been increasing. Smart farms monitor the environment and automatically optimize the internal environment to improve crop yield and quality. For optimized crop cultivation, research on predicting crop productivity from collected agricultural digital data is being actively conducted. However, most existing studies rely on statistical models built from existing statistical data, and therefore suffer from low prediction accuracy. In this paper, we use various prediction models to predict production and sales profits, and compare their performance using the agricultural digital data collected in a facility horticulture smart farm. The models compared are multiple linear regression, support vector machine, artificial neural network, recurrent neural network, LSTM, and ConvLSTM. In the performance comparison, ConvLSTM showed the best performance in terms of R2 and RMSE.