• Title/Summary/Keyword: Models, statistical

The Credit Information Feature Selection Method in Default Rate Prediction Model for Individual Businesses (개인사업자 부도율 예측 모델에서 신용정보 특성 선택 방법)

  • Hong, Dongsuk;Baek, Hanjong;Shin, Hyunjoon
    • Journal of the Korea Society for Simulation / v.30 no.1 / pp.75-85 / 2021
  • In this paper, we present a deep neural network-based prediction model that processes and analyzes the corporate and personal credit information of individual business owners, as a new method for predicting the default rate of individual businesses more accurately. In modeling research across various fields, feature selection techniques have been actively studied as a way to improve performance, especially in predictive models with many features. We first statistically verified the macroeconomic indicators (macro variables) and credit information (micro variables) used as input variables in the default rate prediction model, and then applied the credit information feature selection method to identify the final feature set that improves prediction performance. The proposed credit information feature selection method is an iterative, hybrid method that combines filter-based and wrapper-based approaches: it builds submodels, constructs subsets by extracting the important variables of the best-performing submodels, and determines the final feature set by analyzing the prediction performance of the subsets and of their combined set.

Development of a Convergence Problem Solving Skill Test Tool (융합적 문제해결력 검사 도구)

  • Lee, Dong-Young;Yoon, Jin-A;Nam, Younkyeong
    • Journal of the Korean earth science society / v.41 no.6 / pp.670-683 / 2020
  • The purpose of this study was to develop a test tool for convergence problem solving skill. To this end, the constructs of convergence problem solving skill were defined in three domains: convergence attributes, convergence thinking, and convergence literacy. Thirty-seven pilot items were developed on the basis of the sub-categories defined for each domain through an intensive literature review: problem solving & convergent thinking and creative thinking for the convergence thinking domain, individual and social propensity for the convergence attributes domain, and convergence literacy for the convergence literacy domain. Through an exploratory factor analysis, 30 items in the constructs of the test tool were confirmed. A confirmatory factor analysis showed that the five-construct model captured the covariance between all the items well. Finally, the reliability of the items and constructs was well established (Cronbach's α = .963). Thus, the test tool for convergence problem solving skill developed in this study was statistically reliable.
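
As a hedged illustration of the reliability statistic reported above, the following minimal sketch computes Cronbach's α from a respondents-by-items score table using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the response data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point responses: four respondents, three items
responses = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 4, 5]]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}")
```

Values close to 1 indicate that the items vary together, which is the sense in which a value such as .963 is read as high internal consistency.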

Optimization of SWAN Wave Model to Improve the Accuracy of Winter Storm Wave Prediction in the East Sea

  • Son, Bongkyo;Do, Kideok
    • Journal of Ocean Engineering and Technology / v.35 no.4 / pp.273-286 / 2021
  • In recent years, as human casualties and property damage caused by hazardous waves have increased in the East Sea, precise wave prediction skills have become necessary. In this study, the Simulating WAves Nearshore (SWAN) third-generation numerical wave model was calibrated and optimized to enhance the accuracy of winter storm wave prediction in the East Sea. We used Source Term 6 (ST6) and physical observations from a large-scale experiment conducted in Australia and compared its results to Komen's formula, the default in SWAN. As input wind data, we used the Korean Meteorological Agency's (KMA's) operational meteorological model, the Regional Data Assimilation and Prediction System (RDAPS); the European Centre for Medium Range Weather Forecasts' newest 5th generation re-analysis data (ERA5); and the Japanese Meteorological Agency's (JMA's) meso-scale forecasting data. We assessed the accuracy of each model's results through statistical comparison with observed wave data from 6 locations operated by KMA and the Korea Hydrographic and Oceanographic Agency (KHOA). The ST6 models had a smaller root mean square error and a higher correlation coefficient than the default model in significant wave height prediction. For peak wave period, however, the results were inconsistent across models and locations. Among the simulations with different wind data, the one using ERA5 showed the most accurate results overall but underestimated the wave height in high wave events compared to the simulations using RDAPS and the JMA meso-scale model. In addition, the results showed that the spatial resolution of the wind data plays a more significant role in predicting high wave events. Nevertheless, the numerical model optimized in this study showed some limitations in predicting high waves that rise rapidly because of meteorological events. This suggests that further research is necessary to enhance the accuracy of wave prediction in various climate conditions, such as extreme weather.
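
The two accuracy measures named in the abstract, root mean square error and the correlation coefficient, can be sketched as follows. This is a minimal illustration on made-up significant wave heights, not the authors' evaluation code; `rmse`, `pearson_r`, and the sample values are assumptions for demonstration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical significant wave heights in metres (buoy vs. model)
observed_hs = [1.0, 2.0, 3.0, 4.0]
modeled_hs = [1.5, 2.5, 3.5, 4.5]

error = rmse(observed_hs, modeled_hs)
corr = pearson_r(observed_hs, modeled_hs)
print(error, corr)
```

In this toy case the model has a constant bias, so the correlation is essentially perfect while the RMSE is nonzero, which is one reason the two measures are usually reported together.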

A Bayesian zero-inflated negative binomial regression model based on Pólya-Gamma latent variables with an application to pharmaceutical data (폴랴-감마 잠재변수에 기반한 베이지안 영과잉 음이항 회귀모형: 약학 자료에의 응용)

  • Seo, Gi Tae;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.35 no.2 / pp.311-325 / 2022
  • Count responses with excess zeros arise in many research fields, and the zero-inflated model is a common choice for modeling such data. Bayesian inference for the zero-inflated model has long been recognized as a hard problem because the conditional posterior distribution is not available in closed form. Recently, however, Pillow and Scott (2012) and Polson et al. (2013) proposed a Pólya-Gamma data-augmentation strategy for logistic and negative binomial models, which facilitates Bayesian inference for the zero-inflated model. We apply a Bayesian zero-inflated negative binomial regression model to longitudinal pharmaceutical data previously analyzed by Min and Agresti (2005). To facilitate posterior sampling for the longitudinal zero-inflated model, we use the Pólya-Gamma data-augmentation strategy.
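
To make the zero-inflated structure concrete, here is a minimal sketch of the zero-inflated negative binomial probability mass function only; it does not implement the Pólya-Gamma sampler described in the abstract. The mean/dispersion parametrization (`mu`, `r`), the mixing weight `pi`, and the function names are assumptions chosen for illustration.

```python
import math


def nb_pmf(y, mu, r):
    """Negative binomial pmf with mean mu and dispersion r (variance mu + mu^2/r)."""
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu))
             + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def zinb_pmf(y, pi, mu, r):
    """Zero-inflated NB: a structural zero with probability pi, otherwise NB(mu, r)."""
    base = (1 - pi) * nb_pmf(y, mu, r)
    return pi + base if y == 0 else base


# Hypothetical parameters: 30% structural zeros, mean 2, dispersion 1.5
p_zero_nb = nb_pmf(0, mu=2.0, r=1.5)
p_zero_zinb = zinb_pmf(0, pi=0.3, mu=2.0, r=1.5)
total = sum(zinb_pmf(y, 0.3, 2.0, 1.5) for y in range(500))
print(p_zero_nb, p_zero_zinb, total)
```

The extra mass at zero is what the plain negative binomial cannot represent. In a regression setting, `pi` and `mu` would each depend on covariates through link functions.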

Predicting a Queue Length Using a Deep Learning Model at Signalized Intersections (딥러닝 모형을 이용한 신호교차로 대기행렬길이 예측)

  • Na, Da-Hyuk;Lee, Sang-Soo;Cho, Keun-Min;Kim, Ho-Yeon
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.20 no.6 / pp.26-36 / 2021
  • In this study, a deep learning model for predicting the queue length was developed using information collected from an image detector. A multiple regression model, a statistical technique, was then derived, and the two models were compared using two indices: mean absolute error (MAE) and root mean square error (RMSE). In the multiple regression analysis, time, day of the week, occupancy, and bus traffic were found to be statistically significant variables, with occupancy showing the strongest impact on the queue length. For the optimal deep learning model, 4 hidden layers and a lookback of 6 were selected, giving an MAE of 6.34 and an RMSE of 8.99. In the evaluation of the two models, the MAE of the multiple regression model and the deep learning model were 13.65 and 6.44, respectively, and the RMSE were 19.10 and 9.11, respectively. The deep learning model reduced the MAE by 52.8% and the RMSE by 52.3% compared to the multiple regression model.
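
The reported reductions follow directly from the evaluation figures quoted above. The short sketch below reproduces that arithmetic; the helper names `mae` and `relative_reduction` are hypothetical, and only the four error values come from the abstract.

```python
def mae(observed, predicted):
    """Mean absolute error between paired observations and predictions."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100


# Evaluation errors quoted in the abstract (regression model vs. deep learning model)
mae_reduction = relative_reduction(13.65, 6.44)
rmse_reduction = relative_reduction(19.10, 9.11)
print(f"MAE reduced by {mae_reduction:.1f}%, RMSE by {rmse_reduction:.1f}%")
```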

Prediction of Alcohol Consumption Based on Biosignals and Assessment of Driving Ability According to Alcohol Consumption (생체 신호 기반 음주량 예측 및 음주량에 따른 운전 능력 평가)

  • Park, Seung Won;Choi, Jun won;Kim, Tae Hyun;Seo, Jeong Hun;Jeong, Myeon Gyu;Lee, Kang In;Kim, Han Sung
    • Journal of Biomedical Engineering Research / v.43 no.1 / pp.27-34 / 2022
  • Drunk driving refers to a state in which a driver cannot drive a vehicle safely because of drinking. To crack down on drunk driving, alcohol concentration is evaluated through breath tests, and enforcement using S-shaped courses is also carried out. Measuring biosignals is one way to assess drunk driving without using BAC or BrAC. Because responses to drinking differ between individuals, studies that evaluate alcohol consumption through various biosignals are needed. In this study, we measured biosignals related to alcohol concentration, predicted BrAC through SVM, and verified the effectiveness of the S-shaped course. Participants were 8 men with a driving license. Subjects completed a d2 test and a driving scenario evaluation on an S-shaped course each time they reached specified BrAC levels. We used SVR to predict BrAC from the biosignals, and a one-way ANOVA test for statistical analysis. As the amount of drinking increased, pupil size, HR, normLF, skin conductivity, body temperature, SE, and speed tended to increase, while normHF tended to decrease. There was no apparent change in the respiratory rate or TN-E. The d2 test results tended to increase from 0.03% and decrease from 0.08%. BrAC predictions from the measured biosignals using SVR models obtained high figures in the primary and secondary cross-validations. In this study, we were able to predict BrAC from alcohol-dependent changes in biosignals using SVMs, and we verified the effectiveness of the S-shaped course as a drunk-driving control method.

Factor augmentation for cryptocurrency return forecasting (암호화폐 수익률 예측력 향상을 위한 요인 강화)

  • Yeom, Yebin;Han, Yoojin;Lee, Jaehyun;Park, Seryeong;Lee, Jungwoo;Baek, Changryong
    • The Korean Journal of Applied Statistics / v.35 no.2 / pp.189-201 / 2022
  • In this study, we propose factor augmentation to improve the forecasting power for cryptocurrency returns. We consider financial and economic variables as well as psychological aspects as possible factors. More specifically, the financial and economic factors are obtained by applying principal factor analysis, and the psychological factor is summarized by news sentiment analysis. We also visualize these factors through impulse response analysis. From a modeling perspective, we consider ARIMAX as the classical model, and random forest and deep learning models to accommodate nonlinear features. We show that factor augmentation reduces prediction error and that the GRU performed best among all models considered.

Comparison of Bond-Slip Behavior and Design Criteria of High Strength Lightweight Concrete with Compressive Strength 50 MPa and Unit Weight 16 kN/m3 (압축강도 50 MPa, 단위중량 16 kN/m3 고강도 경량 콘크리트 부착-슬립 거동의 설계기준과의 비교)

  • Lee, Dong-Kyun;Lee, Do-Kyung;Oh, Jun-Hwan;Yoo, Sung-Won
    • Journal of the Korean Recycled Construction Resources Institute / v.10 no.2 / pp.168-175 / 2022
  • With the recent development of nanotechnology, its application in the field of construction materials is continuously increasing. However, studies on the bond characteristics between concrete and rebar, which are needed to apply high-strength lightweight concrete with a compressive strength of 50 MPa and a unit weight of 16 kN/m3 to structural members, are still lacking. Therefore, in this paper, 81 specimens of high-strength lightweight concrete with a compressive strength of 50 MPa and a unit weight of about 16 kN/m3 were fabricated, and direct pull-out tests were performed. The bond strength given by the ACI-408R design code was relatively similar to the experimental results, and statistical analysis showed that the CEB-FIP and modified CMR bond behavior models describe the experimental results well on average.

A review of gene selection methods based on machine learning approaches (기계학습 접근법에 기반한 유전자 선택 방법들에 대한 리뷰)

  • Lee, Hajoung;Kim, Jaejik
    • The Korean Journal of Applied Statistics / v.35 no.5 / pp.667-684 / 2022
  • Gene expression data present the level of mRNA abundance of each gene, and analyses of gene expression have provided key ideas for understanding the mechanisms of diseases and developing new drugs and therapies. High-throughput technologies such as DNA microarray and RNA-sequencing have enabled the simultaneous measurement of thousands of gene expressions, giving rise to a characteristic of gene expression data known as high dimensionality. Because of this high dimensionality, learning models for gene expression data are prone to overfitting, so dimension reduction or feature selection techniques are commonly used as a preprocessing step. In particular, gene selection methods can remove irrelevant and redundant genes and identify important genes in the preprocessing step. Various gene selection methods have been developed in the context of machine learning. In this paper, we intensively review recent works on gene selection methods using machine learning approaches. In addition, the underlying difficulties with current gene selection methods as well as future research directions are discussed.

Robust estimation of sparse vector autoregressive models (희박 벡터 자기 회귀 모형의 로버스트 추정)

  • Kim, Dongyeong;Baek, Changryong
    • The Korean Journal of Applied Statistics / v.35 no.5 / pp.631-644 / 2022
  • This paper considers robust estimation of the sparse vector autoregressive model (sVAR), which is useful in high-dimensional time series analysis. First, we generalize the result of Xu et al. (2008), showing that the adaptive lasso is indeed robust in sVAR as well. However, the adaptive lasso method in sVAR performs poorly as the number and sizes of outliers increase. Therefore, we propose new robust estimation methods for sVAR based on least absolute deviation (LAD) and Huber estimation. Our simulation results show that the proposed methods provide more accurate estimation, which in turn yields better forecasting performance when outliers exist. In addition, we applied the proposed methods to power usage data and confirmed that the data contain non-negligible outliers and that robust estimation taking such outliers into account improves forecasting.
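
The robustness of LAD and Huber estimation comes from how each loss penalizes large residuals. The sketch below compares the three per-residual losses only; it is not the authors' sVAR estimator, and the Huber threshold `delta=1.345` is a conventional default assumed here rather than a value taken from the paper.

```python
def squared_loss(residual):
    """Least-squares loss: grows quadratically, so outliers dominate the fit."""
    return 0.5 * residual ** 2


def lad_loss(residual):
    """Least absolute deviation loss: grows linearly in the residual."""
    return abs(residual)


def huber_loss(residual, delta=1.345):
    """Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a ** 2
    return delta * (a - 0.5 * delta)


# A small residual and a hypothetical outlier
for r in (0.5, 10.0):
    print(r, squared_loss(r), lad_loss(r), huber_loss(r))
```

For the outlier, the squared loss is several times larger than the LAD or Huber loss, which is why a single extreme observation pulls a least-squares fit much harder than a robust one.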