• Title/Summary/Keyword: Model over-fitting

Image Classification of Damaged Bolts using Convolution Neural Networks (합성곱 신경망을 이용한 손상된 볼트의 이미지 분류)

  • Lee, Soo-Byoung;Lee, Seok-Soon
    • Journal of Aerospace System Engineering
    • /
    • v.16 no.4
    • /
    • pp.109-115
    • /
    • 2022
  • The CNN (Convolutional Neural Network) algorithm, which combines deep learning with computer vision, makes image classification feasible on high-performance computing systems. In this thesis, the CNN algorithm is applied to a classification problem using TensorFlow, a typical deep learning framework, together with machine learning techniques. The data set required for supervised learning is generated from bolts of the same type, some of which have undamaged threads and others damaged threads. The learning model showed good classification performance in detecting damage in a bolt image even with a small quantity of data. Additionally, the model performance is reviewed by altering the number of convolution layers and by selectively applying algorithms that alleviate over-fitting and under-fitting.

Regionalization of Rainfall-Runoff Model Based on Relationship Between Model Parameters and Watershed Characteristics (매개변수와 유역특성인자 사이의 상호연관성을 고려한 강우-유출모형 지역화)

  • Kim, Jin-Guk;Uranchimeg, Sumiya;Kim, Tae-Jeong;Kim, Jang-Gyeong;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2021.06a
    • /
    • pp.293-293
    • /
    • 2021
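  • Natural flow refers to the streamflow of a river in an undeveloped state, free of flow changes caused by human activity; it can be estimated by measuring actual flow or through a long-term runoff model that uses observed data. When building a rainfall-runoff model for ungauged basins, the model should above all be developed in a way that minimizes the problems that can arise when it is actually applied to an ungauged basin. As the number of rainfall-runoff model parameters grows, the likelihood of over-fitting increases, which further adds to the uncertainty of a regionalization model. For this reason, the focus should be on validation rather than calibration of the model, and the rainfall-runoff model used should have few parameters. In selecting a representative rainfall-runoff model, this study emphasized, among the various evaluation criteria, model performance as measured by statistical indices of predictive accuracy. The GR4J (Génie Rural à 4 paramètres Journalier) model, which gives relatively good simulation results despite having few parameters, was selected as the optimal runoff model, and its ability to reproduce natural flow was evaluated for basins upstream of dams. Finally, to regionalize the parameters while accounting for the interrelationship between the optimal parameters of the rainfall-runoff model and watershed characteristics, this study used copula functions, which effectively reproduce the dependence among two or more variables and allow a free choice of marginal distributions and easy estimation of the joint distribution. To assess the suitability of the proposed methodology, the regionalized parameters were examined from a cross-validation perspective. The results suggest that applying regional parameters to a rainfall-runoff model according to watershed characteristics can provide reliable natural flow estimates for ungauged basins.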

A compensation method for the scaling effects in the simulation of a downburst-generated wind-wave field

  • Haiwei Xu;Tong Zheng;Yong Chen;Wenjuan Lou;Guohui Shen
    • Wind and Structures
    • /
    • v.38 no.4
    • /
    • pp.261-275
    • /
    • 2024
  • Before performing an experimental study on the downburst-generated wave, it is necessary to examine the scale effects and corresponding corrections or compensations. Analysis of similarity is conducted to conclude the non-dimensional force ratios that account for the dynamic similarity in the interaction of downburst with wave between the prototype and the scale model, along with the corresponding scale factors. The fractional volume of fluid (VOF) method in association with the impinging jet model is employed to explore the characteristics of the downburst-generated wave numerically, and the validity of the proposed scaling method is verified. The study shows that the location of the maximum radial wind velocity in a downburst-wave field is a little higher than that identified in a downburst over the land, which might be attributed to the presence of the wave which changes the roughness of the underlying surface of the downburst. The impinging airflow would generate a concavity in the free surface of the water around the stagnation point of the downburst, with a diameter of about two times the jet diameter (Djet). The maximum wave height appears at the location of 1.5Djet from the stagnation point. Reynolds number has an insignificant influence on the scale effects, in accordance with the numerical investigation of the 30 scale models with the Reynolds number varying from $3.85 \times 10^{4}$ to $7.30 \times 10^{9}$. The ratio of the inertial force of air to the gravitational force of water, which is denoted by G, is found to be the most significant factor that would affect the interaction of downburst with wave. For the correction or compensation of the scale effects, fitting curves for the measures of the downburst-wave field (e.g., wind profile, significant wave height), along with the corresponding equations, are presented as a function of the parameter G.

Application of a Geographically Weighted Poisson Regression Analysis to Explore Spatial Varying Relationship Between Highly Pathogenic Avian Influenza Incidence and Associated Determinants (공간가중 포아송 회귀모형을 이용한 고병원성 조류인플루엔자 발생에 영향을 미치는 결정인자의 공간이질성 분석)

  • Choi, Sung-Hyun;Pak, Son-Il
    • Journal of Veterinary Clinics
    • /
    • v.36 no.1
    • /
    • pp.7-14
    • /
    • 2019
  • In South Korea, six large outbreaks of highly pathogenic avian influenza (HPAI) have occurred since the first confirmation in chickens in 2003. For the past 15 years, HPAI outbreaks have become an annual phenomenon throughout the country and have extended to wider regions, across rural and urban environments. An understanding of the spatial epidemiology of HPAI occurrence is essential in assessing and managing the risk of the infection; however, local spatial variations in the relationship between HPAI incidence in Korea and related risk factors have rarely been derived. This study examined whether spatial heterogeneity exists in this relationship, using a geographically weighted Poisson regression (GWPR) model. The outcome variable was the number of HPAI-positive farms at the level of 252 Si-Gun-Gu (administrative districts in Korea) notified to the government authority from January 2014 to April 2016. This response variable was regressed on a set of sociodemographic and topographic predictors, including the number of wild birds infected with HPAI virus, the number of wintering birds and their species migrating into Korea, the movement frequency of vehicles carrying animals, the volume of manure treated per day, the number of livestock farms, and mean elevation. Both global and local modeling techniques were employed to fit the model. From 2014 to 2016, a total of 403 HPAI-positive farms were reported, with counts ranging from 0 to 74 and high incidence especially in western coastal regions. The results of this study show that the local model (adjusted R-square = 0.801, AIC = 954.5) has great advantages over the corresponding global model (adjusted R-square = 0.408, AIC = 2323.1) in terms of model fit and performance. The relationships between HPAI incidence in Korea and the seven predictors under consideration were significantly spatially non-stationary, contrary to the assumptions of the global model. The comparison between the global Poisson and GWPR results indicated that a place-specific spatial analysis not only fit the data better but also provided insights into the non-stationarity of the associations between HPAI and its determinants. We demonstrated that an empirically derived GWPR model has the potential to serve as a useful tool for assessing spatially varying characteristics of HPAI incidence for a given local area and for predicting the risk area of HPAI occurrence. Considering the prominent burden of HPAI, this study provides more insights into spatial targeting of enhanced surveillance and control strategies in high-risk regions against HPAI outbreaks.
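
Geographically weighted regression fits a separate local model at each location, weighting nearby observations more heavily than distant ones. The sketch below illustrates only that weighting step. The Gaussian kernel, the bandwidth value, and the coordinates are assumptions chosen for illustration; the abstract does not state which kernel or bandwidth the authors used.

```python
import math

def gaussian_weights(target, sites, bandwidth):
    """Distance-decay weights of every site relative to one regression point.

    Assumes a Gaussian kernel, w = exp(-0.5 * (d / bandwidth)**2), which is one
    common choice in geographically weighted regression (not taken from the paper).
    """
    weights = []
    for x, y in sites:
        d = math.hypot(x - target[0], y - target[1])  # Euclidean distance
        weights.append(math.exp(-0.5 * (d / bandwidth) ** 2))
    return weights

# Hypothetical district centroids (arbitrary units).
sites = [(0.0, 0.0), (3.0, 4.0), (30.0, 40.0)]
w = gaussian_weights(target=(0.0, 0.0), sites=sites, bandwidth=10.0)
print(w)
```

Each local fit would then use these weights in a weighted Poisson likelihood, so a district far from the regression point contributes almost nothing to that point's coefficients.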

Improving Generalization Performance of Neural Networks using Natural Pruning and Bayesian Selection (자연 프루닝과 베이시안 선택에 의한 신경회로망 일반화 성능 향상)

  • 이현진;박혜영;이일병
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.3_4
    • /
    • pp.326-338
    • /
    • 2003
  • The objective of neural network design and model selection is to construct an optimal network with good generalization performance. However, training data include noise, and the number of training data is not sufficient, which results in a difference between the true probability distribution and the empirical one. This difference causes the learned parameters to fit only the training data and to deviate from the true distribution of the data, which is called the overfitting phenomenon. An overfitted neural network approximates the training data well but gives poor predictions on new, unseen data. As the complexity of the neural network increases, overfitting also becomes more severe. In this paper, taking a statistical viewpoint, we propose an integrative process for neural network design and model selection in order to improve generalization performance. First, using natural gradient learning with adaptive regularization, we try to obtain, with fast convergence, optimal parameters that are not overfitted to the training data. By applying natural pruning to the obtained optimal parameters, we generate several candidate network models of different sizes. Finally, we select an optimal model among the candidates based on the Bayesian Information Criterion. Through computer simulation on benchmark problems, we confirm the generalization and structure optimization performance of the proposed integrative process of learning and model selection.
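
The final selection step, choosing among candidate networks of different sizes by the Bayesian Information Criterion, can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a least-squares model with Gaussian errors, for which BIC reduces (up to a constant) to n·ln(RSS/n) + k·ln(n), and the candidate names, residual sums of squares, and parameter counts are hypothetical.

```python
import math

def bic(rss: float, n: int, k: int) -> float:
    """BIC under Gaussian errors: n * ln(RSS / n) + k * ln(n). Lower is better."""
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical candidates: (name, residual sum of squares, free parameters).
candidates = [
    ("small", 12.0, 10),
    ("medium", 7.0, 25),
    ("large", 6.8, 60),
]
n_samples = 200  # assumed number of training examples

scores = {name: bic(rss, n_samples, k) for name, rss, k in candidates}
best = min(scores, key=scores.get)
print(best, scores)
```

The largest candidate fits the training data slightly better, but the k·ln(n) term penalizes its extra parameters, so the criterion prefers the mid-sized model in this toy setting.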

Development of Bond Strength Model for FRP Plates Using Back-Propagation Algorithm (역전파 학습 알고리즘을 이용한 콘크리트와 부착된 FRP 판의 부착강도 모델 개발)

  • Park, Do-Kyong
    • Journal of the Korea institute for structural maintenance and inspection
    • /
    • v.10 no.2
    • /
    • pp.133-144
    • /
    • 2006
  • To determine the bond strength of FRP plates, earlier researchers examined it experimentally under various variables. However, because such experiments are costly in equipment, time-consuming, and difficult to carry out, they have been conducted only to a limited extent. This study aims to develop the most suitable artificial neural network model by applying various neural network models and algorithms to the bond test data of earlier researchers. The model was trained with bond strength as the output layer and, as input-layer variables, the thickness, width, bonded length, modulus of elasticity, and tensile strength of the plate, together with the compressive strength, tensile strength, and width of the concrete. The developed artificial neural network model used back-propagation and was trained until its error converged to within 0.001. In addition, a Bayesian technique was introduced in the generalization process to resolve the over-fitting problem. The developed model was verified by comparison with bond strength results from other researchers that had not been used in training.

A Performance Analysis by Adjusting Learning Methods in Stock Price Prediction Model Using LSTM (LSTM을 이용한 주가예측 모델의 학습방법에 따른 성능분석)

  • Jung, Jongjin;Kim, Jiyeon
    • Journal of Digital Convergence
    • /
    • v.18 no.11
    • /
    • pp.259-266
    • /
    • 2020
  • Researchers have steadily applied knowledge-based expert systems and machine learning algorithms to the financial field. In particular, knowledge-based system trading using stock prices is now common. Recently, deep learning has been applied to real stock trading as GPU performance and large-scale data have become sufficiently available. LSTM in particular has been applied to stock price prediction because it is well suited to time series data. In this paper, we implement stock price prediction using LSTM. In modeling the LSTM, we propose a suitable combination of model parameters and activation functions for best performance. Specifically, we propose suitable methods for selecting weight and bias initializers, regularizers to avoid over-fitting, activation functions, and optimization methods. We also compare model performance across different selections of these modeling factors on real-world stock price data of major global companies. Finally, our experimental work yields a suitable method for applying an LSTM model to stock price prediction.

Spatial Data Analysis for the U.S. Regional Income Convergence, 1969-1999: A Critical Appraisal of $\beta$-convergence (미국 소득분포의 지역적 수렴에 대한 공간자료 분석(1969∼1999년) - 베타-수렴에 대한 비판적 검토 -)

  • Sang-Il Lee
    • Journal of the Korean Geographical Society
    • /
    • v.39 no.2
    • /
    • pp.212-228
    • /
    • 2004
  • This paper is concerned with an important aspect of regional income convergence, ${\beta}$-convergence, which refers to the negative relationship between initial income levels and income growth rates of regions over a period of time. The common research framework on ${\beta}$-convergence which is based on OLS regression models has two drawbacks. First, it ignores spatially autocorrelated residuals. Second, it does not provide any way of exploring spatial heterogeneity across regions in terms of ${\beta}$-convergence. Given that empirical studies on ${\beta}$-convergence need to be edified by spatial data analysis, this paper aims to: (1) provide a critical review of empirical studies on ${\beta}$-convergence from a spatial perspective; (2) investigate spatio-temporal income dynamics across the U.S. labor market areas for the last 30 years (1969-1999) by fitting spatial regression models and applying bivariate ESDA techniques. The major findings are as follows. First, the hypothesis of ${\beta}$-convergence was only partially evidenced, and the trend substantively varied across sub-periods. Second, a SAR model indicated that ${\beta}$-coefficient for the entire period was not significant at the 99% confidence level, which may lead to a conclusion that there is no statistical evidence of regional income convergence in the US over the last three decades. Third, the results from bivariate ESDA techniques and a GWR model report that there was a substantive level of spatial heterogeneity in the catch-up process, and suggested possible spatial regimes. It was also observed that the sub-periods showed a substantial level of spatio-temporal heterogeneity in ${\beta}$-convergence: the catch-up scenario in a spatial sense was least pronounced during the 1980s.
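
The baseline OLS framework that the abstract critiques regresses each region's average growth rate on the log of its initial income; a negative slope is read as $\beta$-convergence. The sketch below shows only that baseline, with hypothetical income figures, and deliberately omits the spatial terms (SAR, GWR) that the paper argues are needed.

```python
import math

def ols_slope_intercept(x, y):
    """Simple least-squares fit y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical per-capita incomes for four regions at the start and end of a period.
initial = [10000.0, 15000.0, 20000.0, 30000.0]
final = [22000.0, 30000.0, 36000.0, 48000.0]
years = 30

log_initial = [math.log(v) for v in initial]
growth = [math.log(f / i) / years for i, f in zip(initial, final)]  # average annual log growth

beta, alpha = ols_slope_intercept(log_initial, growth)
print(f"beta = {beta:.5f}")  # negative here: poorer regions grew faster in this toy data
```

Because this fit treats regions as independent, spatially autocorrelated residuals would go undetected, which is the first drawback the abstract names.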

The Study of the Validity Test on the Self-monitoring Scale (자기 검색척도(Self-Monitoring Scale)의 타당성 검정에 관한 연구)

  • 이선아
    • Journal of Korean Academy of Nursing
    • /
    • v.28 no.3
    • /
    • pp.751-759
    • /
    • 1998
  • A study of the validity of the self-monitoring scale for nurses. In this study, both a literature survey and empirical research were carried out to test the validity of the scales that measure the construct of self-monitoring. The self-monitoring scale could not be classified into five factors as Snyder suggested. Other scholars (Briggs, Cheek and Buss, 1980) suggested a three-factor classification, which was accepted by Snyder and Gangestad (1986). John, Cheek and Klohnen (1996) claimed a two-factor classification. As has been discussed, factor analysis is used to prove convergent validity within a factor and discriminant validity between factors. However, depending on the researchers, many variations in the classification of the factors were found, and a lack of content and discriminant validity was found in previous research findings. It is also important to note that Snyder's self-monitoring scale did not factor-load at over .30 for all 25 items, regardless of how many factors were classified. According to the findings of this study, the self-monitoring scale was neither classified into five, three, or two factors nor factor-loaded as hypothesized. It is also clear that Snyder's self-monitoring scale lacks convergent validity, as the sub-factors of the scale failed to prove its uni-dimensionality. The A self-monitoring scale not only failed to overcome the problems of Snyder's self-monitoring scale but even lost the attractiveness of the self-monitoring scale. In this study it was also found that the A self-monitoring scale was not classified into either a two- or three-factor classification as hypothesized. It is, of course, not desirable to use any scale that lacks convergent and discriminant validity, even though it has been widely used and has had a great deal of influence on the field of social psychology. To overcome the shortcomings of Snyder's self-monitoring scale, Lennox and Wolfe (1984) suggested 13 items. This study was dedicated to testing the validity and reliability of that scale, and the data supported its validity, as the two factors were classified and loaded as expected. Reliability was also confirmed by checking Cronbach's α for each factor and for the total items. In addition, a confirmatory factor analysis was executed for the 13 items using the LISREL 8.12 program to confirm convergent validity in a two-factor classification. The model fit was sound; however, the self-monitoring scale did not fit and was not validated. Thus, it is recommended to use neither the original nor the abbreviated self-monitoring scale but the 13 items in future studies. It should also be noted that items 7 and 13 should be removed to obtain better uni-dimensionality for the 13 items. These items loaded at over .30, too high for the two factors, in the factor analysis results. In addition, it is necessary to double-check the cause of the two-fold loading at over .30 for the two factors. It could be a problem caused by the data or by the scale itself. Therefore, additional studies should follow to clarify this matter.
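
The reliability check mentioned above, Cronbach's α, can be computed from item-level scores as α = k/(k−1) · (1 − Σ item variances / variance of total scores). The following is a minimal sketch with hypothetical responses; the function name and data are illustrative and do not come from the study.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha from per-item score lists (one entry per respondent).

    items[j][i] is respondent i's score on item j; uses sample variances.
    """
    k = len(items)
    item_var_sum = sum(variance(scores) for scores in items)
    totals = [sum(resp) for resp in zip(*items)]  # each respondent's total score
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Hypothetical scores of five respondents on three items of one factor.
items = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [1, 3, 3, 5, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Items that rise and fall together across respondents push α toward 1, while unrelated items pull it toward 0.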

Study of the Validity Test on the Self-monitoring Scale for Primi-Gravida (초임부를 대상으로 한 자가검색도 척도의 타당도 비교)

  • Lee, Seon-Ah
    • Women's Health Nursing
    • /
    • v.4 no.2
    • /
    • pp.173-186
    • /
    • 1998
  • In this study, both a literature survey and empirical research were carried out to test the validity of the scales that measure the construct of self-monitoring. The self-monitoring scale could not be classified into five factors as Snyder suggested. Other scholars (Briggs, Cheek and Buss, 1980) suggested a three-factor classification, which was accepted by Snyder and Gangestad (1986). John, Cheek and Klohnen (1996) claimed a two-factor classification. As has been discussed, factor analysis is used to prove convergent validity within a factor and discriminant validity between factors. However, depending on the researchers, many variations in the classification of the factors were found, and a lack of content and discriminant validity was found in previous research findings. It is also important to note that Snyder's self-monitoring scale did not factor-load at over .30 for all 25 items, regardless of how many factors were classified. According to the findings of this study, the self-monitoring scale was neither classified into five, three, or two factors nor factor-loaded as hypothesized. It is also clear that Snyder's self-monitoring scale lacks convergent validity, as the sub-factors of the scale fail to prove its uni-dimensionality. The A self-monitoring scale not only failed to overcome the problems of Snyder's self-monitoring scale but even lost the attractiveness of the self-monitoring scale. In this study, it was also found that the A self-monitoring scale was not classified as hypothesized in either a two- or three-factor classification. It is, of course, not desirable to use any scale that lacks convergent and discriminant validity, even though it has been widely used and has had a great deal of influence on the field of social psychology. To overcome the shortcomings of Snyder's self-monitoring scale, Lennox and Wolfe (1984) suggested 13 items. This study was dedicated to testing the validity and reliability of that scale, and the data supported its validity, as the two factors were classified and loaded as expected. Reliability was also confirmed by checking Cronbach's alpha for each factor and for the total items. In addition, a confirmatory factor analysis was executed for the 13 items using the LISREL 8.12 program to confirm convergent validity in a two-factor classification. The model fit was sound; however, the self-monitoring scale did not fit and was not validated. Thus, it is recommended to use neither the original nor the abbreviated self-monitoring scale but the 13 items in future studies. It should also be noted that items 7 and 13 should be removed to obtain better uni-dimensionality for the 13 items. These items loaded at over .30, too high for the two factors, in the factor analysis results. In addition, it is necessary to double-check the cause of the two-fold loading at over .30 for the two factors. It could be a problem caused by the data or by the scale itself. Therefore, additional studies should follow to clarify this matter.
