• Title/Summary/Keyword: support vector regression machine

Search Result 386, Processing Time 0.029 seconds

Fake News Detection on Social Media using Video Information: Focused on YouTube (영상정보를 활용한 소셜 미디어상에서의 가짜 뉴스 탐지: 유튜브를 중심으로)

  • Chang, Yoon Ho;Choi, Byoung Gu
    • The Journal of Information Systems
    • /
    • v.32 no.2
    • /
    • pp.87-108
    • /
    • 2023
  • Purpose The main purpose of this study is to improve fake news detection performance by using video information to overcome the limitations of extant text- and image-oriented studies that do not reflect the latest news consumption trend. Design/methodology/approach This study collected video clips and related information including news scripts, speakers' facial expression, and video metadata from YouTube to develop fake news detection model. Based on the collected data, seven combinations of related information (i.e. scripts, video metadata, facial expression, scripts and video metadata, scripts and facial expression, and scripts, video metadata, and facial expression) were used as an input for taining and evaluation. The input data was analyzed using six models such as support vector machine and deep neural network. The area under the curve(AUC) was used to evaluate the performance of classification model. Findings The results showed that the ACU and accuracy values of three features combination (scripts, video metadata, and facial expression) were the highest in logistic regression, naïve bayes, and deep neural network models. This result implied that the fake news detection could be improved by using video information(video metadata and facial expression). Sample size of this study was relatively small. The generalizablity of the results would be enhanced with a larger sample size.

Development of a Default Prediction Model for Vulnerable Populations Using Imbalanced Data Analysis (불균형 데이터 처리 기반의 취약계층 채무불이행 예측모델 개발)

  • Lee, Jong Hwa
    • The Journal of Information Systems
    • /
    • v.33 no.3
    • /
    • pp.175-185
    • /
    • 2024
  • Purpose This study aims to analyze the relationship between consumption patterns and default risk among financially vulnerable households in a rapidly changing economic environment. Financially vulnerable households are more susceptible to economic shocks, and their consumption patterns can significantly contribute to an increased risk of default. Therefore, this study seeks to provide a systematic approach to predict and manage these risks in advance. Design/methodology/approach The study utilizes data from the Korea Welfare Panel Study (KOWEPS) to analyze the consumption patterns and default status of financially vulnerable households. To address the issue of data imbalance, sampling techniques such as SMOTE, SMOTE-ENN, and SMOTE-Tomek Links were applied. Various machine learning algorithms, including Logistic Regression, Decision Tree, Random Forest, and Support Vector Machine (SVM), were employed to develop the prediction model. The performance of the models was evaluated using Confusion Matrix and F1-score. Findings The findings reveal that when using the original imbalanced data, the prediction performance for the minority class (default) was poor. However, after applying imbalance handling techniques such as SMOTE, the predictive performance for the minority class improved significantly. In particular, the Random Forest model, when combined with the SMOTE-Tomek Links technique, showed the highest predictive performance, making it the most suitable model for default prediction. These results suggest that effectively addressing data imbalance is crucial in developing accurate default prediction models, and the appropriate use of sampling techniques can greatly enhance predictive performance.

Comparison of Carbon Dioxide Emission Concentration according to the Age of Agricultural Heating Machine (농업용 난방기의 사용 연식에 따른 이산화탄소 배출농도 비교)

  • Na-Eun Kim;Dae-Hyun Kim;Yean-Jung Kim;Hyeon-Tae Kim
    • Journal of Bio-Environment Control
    • /
    • v.32 no.3
    • /
    • pp.190-196
    • /
    • 2023
  • This study was carried out to collect gas emitted from agricultural heaters using kerosene and to identify the emission concentration of carbon dioxide according to the age of agricultural heating machine. As a result of the linear regression analysis, the carbon dioxide emissions according to the year of agricultural heating machine are R2 = 0.84, which follows y = 26.99x+721.98. Distributed analysis was classified into three groups according to the age of agricultural heating machine. As a result of the distributed analysis, it was 2.196×10-13, which was smaller than the 0.05 probability set for the analysis, which means that there is a difference in at least one group. As a result, the age of the agriculture machine was divided into three groups and the difference between groups was tested. A statistical analysis result was derived that there was a difference in the emission concentration of carbon dioxide according to the age of agricultural heating machine. It is thought that it can be used to investigate greenhouse gas emissions by investigating the amount of carbon dioxide generated by agricultural heaters in the agricultural field of Korea.

Feature selection and prediction modeling of drug responsiveness in Pharmacogenomics (약물유전체학에서 약물반응 예측모형과 변수선택 방법)

  • Kim, Kyuhwan;Kim, Wonkuk
    • The Korean Journal of Applied Statistics
    • /
    • v.34 no.2
    • /
    • pp.153-166
    • /
    • 2021
  • A main goal of pharmacogenomics studies is to predict individual's drug responsiveness based on high dimensional genetic variables. Due to a large number of variables, feature selection is required in order to reduce the number of variables. The selected features are used to construct a predictive model using machine learning algorithms. In the present study, we applied several hybrid feature selection methods such as combinations of logistic regression, ReliefF, TurF, random forest, and LASSO to a next generation sequencing data set of 400 epilepsy patients. We then applied the selected features to machine learning methods including random forest, gradient boosting, and support vector machine as well as a stacking ensemble method. Our results showed that the stacking model with a hybrid feature selection of random forest and ReliefF performs better than with other combinations of approaches. Based on a 5-fold cross validation partition, the mean test accuracy value of the best model was 0.727 and the mean test AUC value of the best model was 0.761. It also appeared that the stacking models outperform than single machine learning predictive models when using the same selected features.

Calibration of Portable Particulate Mattere-Monitoring Device using Web Query and Machine Learning

  • Loh, Byoung Gook;Choi, Gi Heung
    • Safety and Health at Work
    • /
    • v.10 no.4
    • /
    • pp.452-460
    • /
    • 2019
  • Background: Monitoring and control of PM2.5 are being recognized as key to address health issues attributed to PM2.5. Availability of low-cost PM2.5 sensors made it possible to introduce a number of portable PM2.5 monitors based on light scattering to the consumer market at an affordable price. Accuracy of light scatteringe-based PM2.5 monitors significantly depends on the method of calibration. Static calibration curve is used as the most popular calibration method for low-cost PM2.5 sensors particularly because of ease of application. Drawback in this approach is, however, the lack of accuracy. Methods: This study discussed the calibration of a low-cost PM2.5-monitoring device (PMD) to improve the accuracy and reliability for practical use. The proposed method is based on construction of the PM2.5 sensor network using Message Queuing Telemetry Transport (MQTT) protocol and web query of reference measurement data available at government-authorized PM monitoring station (GAMS) in the republic of Korea. Four machine learning (ML) algorithms such as support vector machine, k-nearest neighbors, random forest, and extreme gradient boosting were used as regression models to calibrate the PMD measurements of PM2.5. Performance of each ML algorithm was evaluated using stratified K-fold cross-validation, and a linear regression model was used as a reference. Results: Based on the performance of ML algorithms used, regression of the output of the PMD to PM2.5 concentrations data available from the GAMS through web query was effective. The extreme gradient boosting algorithm showed the best performance with a mean coefficient of determination (R2) of 0.78 and standard error of 5.0 ㎍/㎥, corresponding to 8% increase in R2 and 12% decrease in root mean square error in comparison with the linear regression model. Minimum 100 hours of calibration period was found required to calibrate the PMD to its full capacity. Calibration method proposed poses a limitation on the location of the PMD being in the vicinity of the GAMS. As the number of the PMD participating in the sensor network increases, however, calibrated PMDs can be used as reference devices to nearby PMDs that require calibration, forming a calibration chain through MQTT protocol. Conclusions: Calibration of a low-cost PMD, which is based on construction of PM2.5 sensor network using MQTT protocol and web query of reference measurement data available at a GAMS, significantly improves the accuracy and reliability of a PMD, thereby making practical use of the low-cost PMD possible.

Prediction Models for Solitary Pulmonary Nodules Based on Curvelet Textural Features and Clinical Parameters

  • Wang, Jing-Jing;Wu, Hai-Feng;Sun, Tao;Li, Xia;Wang, Wei;Tao, Li-Xin;Huo, Da;Lv, Ping-Xin;He, Wen;Guo, Xiu-Hua
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.14 no.10
    • /
    • pp.6019-6023
    • /
    • 2013
  • Lung cancer, one of the leading causes of cancer-related deaths, usually appears as solitary pulmonary nodules (SPNs) which are hard to diagnose using the naked eye. In this paper, curvelet-based textural features and clinical parameters are used with three prediction models [a multilevel model, a least absolute shrinkage and selection operator (LASSO) regression method, and a support vector machine (SVM)] to improve the diagnosis of benign and malignant SPNs. Dimensionality reduction of the original curvelet-based textural features was achieved using principal component analysis. In addition, non-conditional logistical regression was used to find clinical predictors among demographic parameters and morphological features. The results showed that, combined with 11 clinical predictors, the accuracy rates using 12 principal components were higher than those using the original curvelet-based textural features. To evaluate the models, 10-fold cross validation and back substitution were applied. The results obtained, respectively, were 0.8549 and 0.9221 for the LASSO method, 0.9443 and 0.9831 for SVM, and 0.8722 and 0.9722 for the multilevel model. All in all, it was found that using curvelet-based textural features after dimensionality reduction and using clinical predictors, the highest accuracy rate was achieved with SVM. The method may be used as an auxiliary tool to differentiate between benign and malignant SPNs in CT images.

Improved Estimation of Hourly Surface Ozone Concentrations using Stacking Ensemble-based Spatial Interpolation (스태킹 앙상블 모델을 이용한 시간별 지상 오존 공간내삽 정확도 향상)

  • KIM, Ye-Jin;KANG, Eun-Jin;CHO, Dong-Jin;LEE, Si-Woo;IM, Jung-Ho
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.25 no.3
    • /
    • pp.74-99
    • /
    • 2022
  • Surface ozone is produced by photochemical reactions of nitrogen oxides(NOx) and volatile organic compounds(VOCs) emitted from vehicles and industrial sites, adversely affecting vegetation and the human body. In South Korea, ozone is monitored in real-time at stations(i.e., point measurements), but it is difficult to monitor and analyze its continuous spatial distribution. In this study, surface ozone concentrations were interpolated to have a spatial resolution of 1.5km every hour using the stacking ensemble technique, followed by a 5-fold cross-validation. Base models for the stacking ensemble were cokriging, multi-linear regression(MLR), random forest(RF), and support vector regression(SVR), while MLR was used as the meta model, having all base model results as additional input variables. The results showed that the stacking ensemble model yielded the better performance than the individual base models, resulting in an averaged R of 0.76 and RMSE of 0.0065ppm during the study period of 2020. The surface ozone concentration distribution generated by the stacking ensemble model had a wider range with a spatial pattern similar with terrain and urbanization variables, compared to those by the base models. Not only should the proposed model be capable of producing the hourly spatial distribution of ozone, but it should also be highly applicable for calculating the daily maximum 8-hour ozone concentrations.

Development of a High-Performance Concrete Compressive-Strength Prediction Model Using an Ensemble Machine-Learning Method Based on Bagging and Stacking (배깅 및 스태킹 기반 앙상블 기계학습법을 이용한 고성능 콘크리트 압축강도 예측모델 개발)

  • Yun-Ji Kwak;Chaeyeon Go;Shinyoung Kwag;Seunghyun Eem
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.36 no.1
    • /
    • pp.9-18
    • /
    • 2023
  • Predicting the compressive strength of high-performance concrete (HPC) is challenging because of the use of additional cementitious materials; thus, the development of improved predictive models is essential. The purpose of this study was to develop an HPC compressive-strength prediction model using an ensemble machine-learning method of combined bagging and stacking techniques. The result is a new ensemble technique that integrates the existing ensemble methods of bagging and stacking to solve the problems of a single machine-learning model and improve the prediction performance of the model. The nonlinear regression, support vector machine, artificial neural network, and Gaussian process regression approaches were used as single machine-learning methods and bagging and stacking techniques as ensemble machine-learning methods. As a result, the model of the proposed method showed improved accuracy results compared with single machine-learning models, an individual bagging technique model, and a stacking technique model. This was confirmed through a comparison of four representative performance indicators, verifying the effectiveness of the method.

A Predictive Model of the Generator Output Based on the Learning of Performance Data in Power Plant (발전플랜트 성능데이터 학습에 의한 발전기 출력 추정 모델)

  • Yang, HacJin;Kim, Seong Kun
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.16 no.12
    • /
    • pp.8753-8759
    • /
    • 2015
  • Establishment of analysis procedures and validated performance measurements for generator output is required to maintain stable management of generator output in turbine power generation cycle. We developed turbine expansion model and measurement validation model for the performance calculation of generator using turbine output based on ASME (American Society of Mechanical Engineers) PTC (Performance Test Code). We also developed verification model for uncertain measurement data related to the turbine and generator output. Although the model in previous researches was developed using artificial neural network and kernel regression, the verification model in this paper was based on algorithms through Support Vector Machine (SVM) model to overcome the problems of unmeasured data. The selection procedures of related variables and data window for verification learning was also developed. The model reveals suitability in the estimation procss as the learning error was in the range of about 1%. The learning model can provide validated estimations for corrective performance analysis of turbine cycle output using the predictions of measurement data loss.

A Study of the Feature Classification and the Predictive Model of Main Feed-Water Flow for Turbine Cycle (주급수 유량의 형상 분류 및 추정 모델에 대한 연구)

  • Yang, Hac Jin;Kim, Seong Kun;Choi, Kwang Hee
    • Journal of Energy Engineering
    • /
    • v.23 no.4
    • /
    • pp.263-271
    • /
    • 2014
  • Corrective thermal performance analysis is required for thermal power plants to determine performance status of turbine cycle. We developed classification method for main feed water flow to make precise correction for performance analysis based on ASME (American Society of Mechanical Engineers) PTC (Performance Test Code). The classification is based on feature identification of status of main water flow. Also we developed predictive algorithms for corrected main feed-water through Support Vector Machine (SVM) Model for each classified feature area. The results was compared to estimations using Neural Network(NN) and Kernel Regression(KR). The feature classification and predictive model of main feed-water flow provides more practical methods for corrective thermal performance analysis of turbine cycle.