• Title/Summary/Keyword: residual test (잔차 검정)

Search Result: 65

Multi-objective Genetic Algorithm for Variable Selection in Linear Regression Model and Application (선형회귀모델의 변수선택을 위한 다중목적 유전 알고리즘과 응용)

  • Kim, Dong-Il;Park, Cheong-Sool;Baek, Jun-Geol;Kim, Sung-Shick
    • Journal of the Korea Society for Simulation / v.18 no.4 / pp.137-148 / 2009
  • The purpose of this study is to implement a variable selection algorithm that helps construct a reliable linear regression model. If all candidate variables are used to construct a linear regression model, the significance of the model decreases and the 'curse of dimensionality' arises. Moreover, if the number of observations is smaller than the number of variables (dimensions), the regression model cannot be constructed. Because of these problems, we treat variable selection as a combinatorial optimization problem and apply a genetic algorithm (GA) to it. Typical measures of statistical significance are $R^2$, the F-value of the regression model, the t-values of the regression coefficients, and the standard error of the estimates. We design the GA to optimize multiple objective functions, because the statistical significance of a model cannot be assessed by a single measure. We perform experiments on simulated data designed to cover a variety of situations. The results show better performance than LARS (Least Angle Regression), an algorithm for variable selection problems. We also modify the algorithm to solve a portfolio selection problem, in which a portfolio is constructed by selecting stocks. We conclude that the algorithm is able to solve real problems.
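The four significance measures named in the abstract can be computed for any candidate variable subset, which is what a GA would need when scoring a chromosome. The following is a minimal sketch, not the authors' implementation: the function name `regression_measures`, the boolean-mask encoding of a subset, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def regression_measures(X, y, mask):
    """Fit OLS on the columns of X selected by a boolean mask (hypothetical
    chromosome encoding; at least one entry must be True) and return the
    measures listed in the abstract."""
    Xs = np.column_stack([np.ones(len(y)), X[:, mask]])  # intercept + subset
    n, p = Xs.shape
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    rss = float(resid @ resid)                    # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    df_model, df_resid = p - 1, n - p
    mse = rss / df_resid
    r2 = 1.0 - rss / tss
    f_value = ((tss - rss) / df_model) / mse
    cov_beta = mse * np.linalg.inv(Xs.T @ Xs)     # coefficient covariance
    t_values = beta / np.sqrt(np.diag(cov_beta))
    return {"r2": r2, "f": f_value, "t": t_values[1:], "se": np.sqrt(mse)}

# Synthetic data (assumption): only columns 0 and 2 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=60)

good = regression_measures(X, y, np.array([True, False, True, False, False]))
bad = regression_measures(X, y, np.array([False, True, False, True, False]))
```

A multi-objective GA would compare such dictionaries across subsets rather than collapsing them into one score; how the objectives are combined or ranked is not specified in the abstract.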

Statistical Techniques to Detect Sensor Drifts (센서드리프트 판별을 위한 통계적 탐지기술 고찰)

  • Seo, In-Yong;Shin, Ho-Cheol;Park, Moon-Ghu;Kim, Seong-Jun
    • Journal of the Korea Society for Simulation / v.18 no.3 / pp.103-112 / 2009
  • In a nuclear power plant (NPP), periodic sensor calibrations are required to ensure that sensors are operating correctly. However, only a few sensors are found to be faulty and in need of calibration. For the safe operation of an NPP and the reduction of unnecessary calibration, on-line calibration monitoring is needed. In this paper, principal component-based auto-associative support vector regression (PCSVR) is proposed for sensor signal validation in the NPP. It combines the merits of principal component analysis (PCA), which extracts the predominant feature vectors, with those of auto-associative support vector regression (AASVR), which easily represents complicated processes that are difficult to describe with analytical and mechanistic models. Using real plant startup data from Kori Nuclear Power Plant Unit 3, the SVR hyperparameters were optimized by response surface methodology (RSM). Moreover, statistical techniques are integrated with PCSVR for failure detection. The residuals between the estimated and measured signals are tested with the Shewhart control chart, the exponentially weighted moving average (EWMA), the cumulative sum (CUSUM), and the generalized likelihood ratio test (GLRT) to detect whether the sensors have failed. The study shows that the GLRT can be a candidate for the detection of sensor drift.
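Two of the residual tests mentioned in the abstract, EWMA and CUSUM, are simple enough to sketch directly. The code below is an illustration under assumptions: residuals are taken to be standardized (units of one standard deviation), and the smoothing weight `lam`, the reference value `k`, and the decision interval `h` are textbook-style defaults, not values from the paper. The drifting residual series is synthetic.

```python
def ewma(residuals, lam=0.2):
    """Exponentially weighted moving average of a residual sequence."""
    z, out = 0.0, []
    for r in residuals:
        z = lam * r + (1.0 - lam) * z
        out.append(z)
    return out

def cusum_alarm(residuals, k=0.5, h=5.0):
    """Two-sided tabular CUSUM; returns the index of the first alarm or None."""
    hi = lo = 0.0
    for i, r in enumerate(residuals):
        hi = max(0.0, hi + r - k)   # accumulates upward shifts
        lo = max(0.0, lo - r - k)   # accumulates downward shifts
        if hi > h or lo > h:
            return i
    return None

# Synthetic residuals (assumption): small alternating noise, then a slow drift.
drift_start = 50
healthy = [0.3 * (-1) ** i for i in range(drift_start)]
drifting = [0.3 * (-1) ** i + 0.05 * i for i in range(1, 101)]
residuals = healthy + drifting

alarm_index = cusum_alarm(residuals)
smoothed = ewma(residuals)
```

CUSUM accumulates small persistent deviations, which is why it tends to react to slow drift that a Shewhart chart on individual residuals can miss.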

A Statistical model to Predict soil Temperature by Combining the Yearly Oscillation Fourier Expansion and Meteorological Factors (연주기(年週期) Fourier 함수(函數)와 기상요소(氣象要素)에 의(依)한 지온예측(地溫豫測) 통계(統計) 모형(模型))

  • Jung, Yeong-Sang;Lee, Byun-Woo;Kim, Byung-Chang;Lee, Yang-Soo;Um, Ki-Tae
    • Korean Journal of Soil Science and Fertilizer / v.23 no.2 / pp.87-93 / 1990
  • A statistical model to predict soil temperature from ambient meteorological factors (mean, maximum, and minimum air temperatures, precipitation, wind speed, and snow depth) combined with a Fourier time series expansion was developed using data measured at the Suwon Meteorological Service from 1979 to 1988. The stepwise elimination technique was used for the statistical analysis. For the yearly oscillation model of soil temperature with 8 terms of the Fourier expansion, the mean square error decreased with soil depth, from 2.30 for the surface temperature to 1.34-0.42 for the 5 to 500-cm soil temperatures. The $r^2$ ranged from 0.913 to 0.988. The number of lag days of air temperature determined by residual analysis was 0 days for the soil surface temperature, -1 day for the 5 to 30-cm soil temperatures, and -2 days for the 50-cm soil temperature. The number of lag days for precipitation, snow depth, and wind speed was -1 day for the 0 to 10-cm soil temperatures and -2 to -3 days for the 30 to 50-cm soil temperatures. For the statistical soil temperature prediction model that combines the yearly oscillation terms with meteorological factors as residual terms, using the lag days obtained above, the mean square error was 1.64 for the soil surface temperature and ranged from 1.34 to 0.42 for the 5 to 500-cm soil temperatures. A test of the model with 1978 data, independent of model development, showed good agreement, with $r^2$ ranging from 0.976 to 0.996. The magnitudes of the coefficients implied that daily meteorological variables might affect soil temperature down to a soil depth of 30 to 50 cm. Solar radiation was not included in the models as an independent variable; however, a separate analysis under a corn canopy of the relationship between solar radiation ($R_s$; $J\;m^{-2}$) and the difference (${\Delta}T_{mxs}$) between the maximum soil temperature and the maximum air temperature showed linear relationships: $${\Delta}T_{mxs}=0.902+1.924{\times}10^{-3}\,R_s$$ for a leaf area index lower than 2, and $${\Delta}T_{mxs}=0.274+8.881{\times}10^{-4}\,R_s$$ for a leaf area index higher than 2.
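The yearly oscillation component described in the abstract can be fitted by ordinary least squares on sine and cosine terms. This is a minimal sketch under assumptions: the helper `fourier_design`, the 365-day period, the choice of two harmonics, and the synthetic temperature series are illustrative only and do not reproduce the paper's 8-term model or its Suwon data.

```python
import numpy as np

def fourier_design(day_of_year, n_harmonics, period=365.0):
    """Design matrix with an intercept plus cos/sin pairs for each harmonic."""
    cols = [np.ones_like(day_of_year, dtype=float)]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * day_of_year / period
        cols += [np.cos(angle), np.sin(angle)]
    return np.column_stack(cols)

# Synthetic daily soil temperature (assumption): one annual wave plus noise.
rng = np.random.default_rng(42)
days = np.arange(1, 366, dtype=float)
temp = 12.0 + 10.0 * np.cos(2.0 * np.pi * (days - 200.0) / 365.0)
temp = temp + rng.normal(scale=1.0, size=days.size)

A = fourier_design(days, n_harmonics=2)
coef, *_ = np.linalg.lstsq(A, temp, rcond=None)
resid = temp - A @ coef                 # what the oscillation does not explain
mse = float(np.mean(resid ** 2))
r2 = 1.0 - float(np.sum(resid ** 2) / np.sum((temp - temp.mean()) ** 2))
```

In the approach the abstract describes, the residuals from such a fit are what the lagged meteorological factors would then be asked to explain.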


Methods for Genetic Parameter Estimations of Carcass Weight, Longissimus Muscle Area and Marbling Score in Korean Cattle (한우의 도체중, 배장근단면적 및 근내지방도의 유전모수 추정방법)

  • Lee, D.H.
    • Journal of Animal Science and Technology / v.46 no.4 / pp.509-516 / 2004
  • This study investigates the amount of bias in estimates of heritability and genetic correlation according to the data structure of marbling scores in Korean cattle. A breeding population with 5 generations was simulated through selection on carcass weight, longissimus muscle area, and latent values of marbling score, together with random mating. Latent variables of marbling score were categorized into five classes by thresholds of 0, 1, 2, and 3 SD (DS1) or into seven classes by thresholds of -2, -1, 0, 1, 2, and 3 SD (DS2). Variance components and genetic parameters (heritabilities and genetic correlations) were estimated by restricted maximum likelihood (REML) on multivariate linear mixed animal models and by Gibbs sampling on multivariate threshold mixed animal models in DS1 and DS2. The simulation was performed for 10 replicates, and averages and empirical standard deviations were calculated. Using REML, heritabilities of marbling score were underestimated at 0.315 and 0.462 in DS1 and DS2, respectively, compared with the true parameter (0.500). In contrast, with Gibbs sampling in the multivariate threshold animal models, the estimates did not differ significantly from the parameter. Residual correlations of marbling score with other traits were reduced relative to the parameters when REML was used under the assumption of linearity and a normal distribution. This is likely due to loss of information and, consequently, reduced variation in marbling score. In conclusion, genetic variation in marbling would be well defined if the liability concept were adopted for marbling score and a threshold mixed model were implemented for genetic parameter estimation in Korean cattle.
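The categorization of latent marbling values by thresholds can be sketched as follows. This is an illustration under assumptions: liabilities are taken to be standardized (mean 0, SD 1), the threshold lists follow the DS1 and DS2 definitions in the abstract, and the function name `categorize` and the simulated sample are hypothetical.

```python
import bisect
import random

# Thresholds in SD units, as listed in the abstract.
DS1_THRESHOLDS = [0.0, 1.0, 2.0, 3.0]                 # five classes
DS2_THRESHOLDS = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]     # seven classes

def categorize(liability, thresholds):
    """Map a latent value to a 1-based class: one more than the number of
    thresholds at or below the value."""
    return bisect.bisect_right(thresholds, liability) + 1

# Simulated standardized liabilities (assumption, not the paper's simulation).
random.seed(0)
liabilities = [random.gauss(0.0, 1.0) for _ in range(1000)]
ds1_scores = [categorize(v, DS1_THRESHOLDS) for v in liabilities]
ds2_scores = [categorize(v, DS2_THRESHOLDS) for v in liabilities]
```

With standard normal liabilities, about half of the values fall below the lowest DS1 threshold and share a single class, which may help explain why the abstract reports a stronger downward bias for DS1 than for DS2 under a linear model.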

Correlation Analyses on Growth Traits, Body Size Traits and Carcass Traits in Hanwoo Steers (한우 후대검정우 체중, 체척 및 도체형질간 상관분석)

  • Lee, Jae-Gu;Choy, Yun-Ho;Park, Byung-Ho;Choi, Jae-Kwan;Lee, Seung-Su;Na, Jong-Sam;Roh, Seung-Hee;Choi, Tae-Jeong
    • Journal of agriculture & life science / v.46 no.1 / pp.123-131 / 2012
  • This study was conducted to estimate the correlation structure among Hanwoo steer growth traits (body weights at 6, 12, 18, and 24 months of age and average daily gain), carcass traits, and body size traits at 18 months of age. Hanwoo progeny test data (body weight and body size traits) collected from 2004 to 2008 on a total of 1,838 steers at the Hanwoo Improvement Main Center (NACF) were analyzed. Carcass traits were scored at slaughter at 24 months of age. Correlation analyses were performed on the observed scales of the traits and on residuals after accounting for fixed effects in generalized linear models. The correlation coefficient estimated between live weight at slaughter (24 months of age) and cold carcass weight was high, at 0.92. The correlation between beef yield index values and backfat thickness was estimated to be high and negative, at -0.92. Hip height and wither height were highly correlated (0.89). Chest width and chest depth were also highly correlated, at 0.73. Rump width was highly correlated with chest depth (0.75) and chest width (0.74). The correlation between pelvic width and rump width was estimated to be 0.74. Hipbone width was highly correlated with chest depth (0.73), chest width (0.70), rump width (0.75), and pelvic width (0.75). The correlation between wither height and carcass weight was 0.48 on the observed scale. Chest girth was phenotypically correlated (residual correlation) with carcass weight (0.51), an estimate somewhat higher than those with the other carcass traits. The results of this study will be used for the genetic evaluation of Hanwoo steers.
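The difference between a correlation on the observed scale and a residual correlation can be shown with a small example. The sketch below makes simplifying assumptions: a single categorical fixed effect (a hypothetical `groups` array, for instance test batch) stands in for the fixed effects of the paper's models, and subtracting group means is used because it equals the least-squares residual for one categorical factor. The data are synthetic.

```python
import numpy as np

def residuals_after_group(values, groups):
    """Remove one categorical fixed effect by subtracting each group's mean."""
    resid = values.astype(float).copy()
    for g in np.unique(groups):
        sel = groups == g
        resid[sel] -= resid[sel].mean()
    return resid

# Synthetic traits (assumption): both depend on the group, but their
# within-group deviations are independent.
rng = np.random.default_rng(1)
groups = rng.integers(0, 3, size=300)
trait_a = 5.0 * groups + rng.normal(size=300)
trait_b = 5.0 * groups + rng.normal(size=300)

raw_r = np.corrcoef(trait_a, trait_b)[0, 1]
resid_r = np.corrcoef(residuals_after_group(trait_a, groups),
                      residuals_after_group(trait_b, groups))[0, 1]
```

Here the observed-scale correlation is high only because both traits share the group effect, while the residual correlation is near zero, which is the kind of distinction the two analyses in the abstract are meant to separate.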

A Study on the Development of Plural Gravity Models and their Application Method (복수 중력모형의 구축과 적용방법에 관한 연구)

  • Ryu, Yeong-Geun
    • Journal of Korean Society of Transportation / v.31 no.2 / pp.60-68 / 2013
  • This study developed plural gravity models and a method for applying them in order to increase the accuracy of trip distribution estimation. The method first sets a target level for the coefficient of determination ($R^2$). A gravity model is then fitted; if its coefficient of determination meets the target level, model building is complete and the future trip distribution is estimated. If the coefficient of determination falls short of the target level, the zone pair with the largest standardized residual is removed from the model, and this is repeated until the target level is reached. The removed zone pairs are divided into a positive (+) side and a negative (-) side relative to the model. On each side, gravity models are fitted in the same way until the target level is reached. When there are no more zone pairs to remove, the model-building process ends and the future trip distribution is estimated. The plural gravity models and application method were applied to 42 zone pairs as a case study. The existing method, which uses only one gravity model, gave a coefficient of determination ($R^2$) of 51.3%, whereas the new method produced three gravity models with a coefficient of determination ($R^2$) of over 90%. The accuracy of the future trip distribution estimate was also higher than with the existing method.
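The removal loop described in the abstract can be sketched for a single model as follows. This is an illustration under assumptions, not the author's procedure: a generic least-squares fit stands in for the gravity model (whose functional form the abstract does not give), residuals are standardized by dividing by the residual standard deviation, and the function name `fit_with_residual_removal` and the synthetic 42-point data set are hypothetical.

```python
import numpy as np

def fit_with_residual_removal(X, y, target_r2):
    """Refit, dropping the observation with the largest absolute standardized
    residual each round, until R^2 reaches the target or too few points remain."""
    keep = np.arange(len(y))
    removed_pos, removed_neg = [], []       # split by residual sign
    while True:
        A = np.column_stack([np.ones(len(keep)), X[keep]])
        beta, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        resid = y[keep] - A @ beta
        r2 = 1.0 - resid @ resid / np.sum((y[keep] - y[keep].mean()) ** 2)
        if r2 >= target_r2 or len(keep) <= A.shape[1] + 1:
            return beta, float(r2), keep, removed_pos, removed_neg
        z = resid / resid.std(ddof=A.shape[1])   # simple standardization
        worst = int(np.argmax(np.abs(z)))
        (removed_pos if z[worst] > 0 else removed_neg).append(int(keep[worst]))
        keep = np.delete(keep, worst)

# Synthetic data (assumption): 42 observations, six of them shifted by +/-8.
rng = np.random.default_rng(7)
X = rng.uniform(0.0, 5.0, size=(42, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.3, size=42)
y[:3] += 8.0
y[3:6] -= 8.0

beta, r2, keep, pos, neg = fit_with_residual_removal(X, y, target_r2=0.90)
```

Following the abstract, the same routine would then be applied separately to the positive-side and negative-side removals to obtain the additional models.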

BDS Statistic: Applications to Hydrologic Data (BDS 통계: 수문자료에의 응용)

  • Kim, Hyeong-Su;Gang, Du-Seon;Kim, Jong-U;Kim, Jung-Hun
    • Journal of Korea Water Resources Association / v.31 no.6 / pp.769-777 / 1998
  • In this study, various time series are analyzed to check for nonlinearity in the data. The nonlinearity of a system can be investigated by testing the randomness of its time series data. To test randomness, four nonparametric test statistics and a new test statistic, the BDS statistic, are used and the results are compared. The Brock, Dechert, and Scheinkman (BDS) statistic originates from the statistical properties of the correlation integral, which is used in searching for chaos, and has been shown to be very effective in distinguishing nonlinear structure in dynamic systems from random structure. When applied to well-known linear and nonlinear models, the BDS statistic is found to be more effective than the nonparametric test statistics in identifying nonlinear structure in time series. Hydrologic time series data are fitted with ARMA-type models and the statistics are applied to the residuals. The results show that the BDS statistic can distinguish chaotic nonlinearity from randomness and can also be used to verify the validity of the fitted model.
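The correlation integral on which the BDS statistic is built can be sketched in a few lines. The code below is only the building block, not the full BDS test (the variance term and the normalized statistic are omitted). The maximum-norm distance, the function name `correlation_integral`, and the simulated series are assumptions for illustration.

```python
import random

def correlation_integral(x, m, eps):
    """Fraction of pairs of m-dimensional delay vectors that lie within eps
    of each other under the maximum norm."""
    n = len(x) - m + 1
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if max(abs(x[i + k] - x[j + k]) for k in range(m)) < eps:
                count += 1
    return 2.0 * count / (n * (n - 1))

# Simulated independent residuals (assumption).
random.seed(0)
iid = [random.gauss(0.0, 1.0) for _ in range(400)]
c1 = correlation_integral(iid, m=1, eps=1.0)
c2 = correlation_integral(iid, m=2, eps=1.0)
gap = c2 - c1 ** 2      # expected to be close to zero for independent data
```

For an independent series the m-dimensional correlation integral is expected to be close to the m-th power of the one-dimensional value, and the BDS statistic measures how far a series departs from that relation.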


Studies on Derivation of Appropriate Geodetic System Transformation Schemes for Spatial Data (공간정보의 측지기준체계 변환 기법 도출에 관한 연구)

  • Yun, Seonghyeon;Lee, Hungkyu;Song, Jinhun
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.38 no.6 / pp.561-571 / 2020
  • Seven techniques widely used in geodetic transformations are reviewed and compared to identify their theoretical characteristics. A series of numerical tests was performed on four data sets. The results were then analyzed in terms of transformation residuals and accuracies, together with hypothesis tests based on Student's t-distribution to confirm the statistical significance of differences among the techniques. For transformations between geodetic frames implemented in the same system, no statistically significant difference was found among the results of the 3D transformation techniques, even when the test area was as large as the Asia-Oceania region. Among the 2D transformations, the NTv2 grid modeling technique delivered improved transformation accuracy. Finally, based on the results of this study, the Helmert transformation is proposed for geodetic control points and the NTv2 technique for the 2D spatial data transformation of geodetic systems.
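How transformation residuals arise can be illustrated with a least-squares fit of a similarity transformation. The sketch below uses the 2D four-parameter form (scale, rotation, two translations) as a simplified stand-in; the abstract does not say which Helmert variant was used, and the function name `fit_helmert_2d` and the synthetic control points are assumptions.

```python
import numpy as np

def fit_helmert_2d(src, dst):
    """Least-squares fit of x' = a*x - b*y + tx, y' = b*x + a*y + ty,
    where a = s*cos(theta) and b = s*sin(theta).
    Returns (a, b, tx, ty) and the per-point residuals."""
    x, y = src[:, 0], src[:, 1]
    n = len(src)
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([x, -y, np.ones(n), np.zeros(n)])  # x' rows
    A[1::2] = np.column_stack([y, x, np.zeros(n), np.ones(n)])   # y' rows
    obs = dst.reshape(-1)                  # x1', y1', x2', y2', ...
    params, *_ = np.linalg.lstsq(A, obs, rcond=None)
    residuals = (obs - A @ params).reshape(-1, 2)
    return params, residuals

# Synthetic control points (assumption) transformed with known parameters.
rng = np.random.default_rng(3)
src = rng.uniform(0.0, 100.0, size=(8, 2))
scale, theta, tx, ty = 1.5, 0.1, 10.0, -5.0
a, b = scale * np.cos(theta), scale * np.sin(theta)
dst = np.column_stack([a * src[:, 0] - b * src[:, 1] + tx,
                       b * src[:, 0] + a * src[:, 1] + ty])

params, residuals = fit_helmert_2d(src, dst)
```

With real control points the residuals would not vanish, and their size is what accuracy comparisons between techniques, such as the t-tests mentioned in the abstract, would be based on.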

Estimation of Genetic Parameter for Milk Production and Linear Type Traits in Holstein Dairy Cattle in Korea (국내 Holstein 젖소의 유생산 형질과 유방 및 지제 선형심사 형질에 대한 유전모수 추정)

  • Won, J.I.;Dang, C.K.;Lim, H.J.;Jung, Y.S.;Im, S.K.;Yoon, H.B.
    • Journal of agriculture & life science / v.50 no.1 / pp.167-178 / 2016
  • This study was conducted to estimate genetic parameters for milk production and linear type traits in Holstein dairy cattle in Korea. The data comprised milk yield, fat yield, protein yield, fat percent, protein percent, somatic cell score, and 15 linear type traits for 10,218 first-parity cows that calved from January 2009 to April 2013, collected by the Dairy Cattle Improvement Center, National Agricultural Cooperative, Korea. Genetic and error (co)variances between two traits selected from 19 traits were estimated using bi-trait pairwise analyses with the WOMBAT package. The estimated heritabilities for milk yield (MY), fat yield (FY), protein yield (PY), fat percent (FP), protein percent (PP), somatic cell score (SCS), udder depth (UD), udder texture (UT), median suspensory (MS), fore udder attachment (FUA), front teat placement (FTP), rear attachment height (RAH), rear attachment width (RAW), rear teat placement (RTP), front teat length (FTL), foot angle (FA), heel depth (HD), bone quality (BQ), rear legs side view (RLSV), rear legs rear view (RLRV), and locomotion (LC) were 0.128, 0.144, 0.100, 0.273, 0.333, 0.090, 0.179, 0.066, 0.104, 0.109, 0.127, 0.099, 0.059, 0.069, 0.154, 0.014, 0.010, 0.052, 0.065, 0.175, and 0.031, respectively. The genetic correlations of UD, UT, FTP, RAW, FTL, FA, and RLSV with MY were -0.334, 0.271, 0.445, 0.544, 0.076, -0.281, and -0.228, respectively, and those of MS, FTP, RTP, FTL, FA, BQ, RLSV, RLRV, and LC with PP were -0.147, -0.182, -0.262, -0.136, 0.355, 0.311, 0.135, 0.233, and 0.143, respectively. In particular, MY had the highest positive genetic correlation with RAW (0.544), while SCS had the highest negative genetic correlation with LC (-0.603). FP had negative genetic correlations with most udder traits, whereas it had positive genetic correlations with leg and hoof traits (0.056 - 0.355).
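Heritabilities and genetic correlations follow from the estimated (co)variance components by simple ratios. The sketch below assumes a model in which phenotypic variance is the sum of additive genetic and error variance only. The abstract does not list the model terms, and the function names and the numeric components are hypothetical, not estimates from the paper.

```python
import math

def heritability(var_a, var_e):
    """Share of phenotypic variance that is additive genetic, assuming
    phenotypic variance = additive genetic variance + error variance."""
    return var_a / (var_a + var_e)

def genetic_correlation(cov_a12, var_a1, var_a2):
    """Additive genetic covariance scaled by the two genetic standard deviations."""
    return cov_a12 / math.sqrt(var_a1 * var_a2)

# Hypothetical bi-trait components, chosen only to illustrate the arithmetic.
var_a1, var_e1 = 1.0, 3.0
var_a2, var_e2 = 2.0, 6.0
cov_a12 = 0.5

h2_trait1 = heritability(var_a1, var_e1)                 # 1 / (1 + 3)
h2_trait2 = heritability(var_a2, var_e2)                 # 2 / (2 + 6)
r_g = genetic_correlation(cov_a12, var_a1, var_a2)       # 0.5 / sqrt(2)
```

In a pairwise design like the one the abstract describes, each trait appears in several bi-trait runs, so its variance estimates may differ slightly between runs; how those are reconciled is not stated in the abstract.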

Estimation of Genetic Parameters for Milk Production Traits in Holstein Dairy Cattle (홀스타인의 유생산형질에 대한 유전모수 추정)

  • Cho, Chungil;Cho, Kwanghyeon;Choy, Yunho;Choi, Jaekwan;Choi, Taejeong;Park, Byoungho;Lee, Seungsu
    • Journal of Animal Science and Technology / v.55 no.1 / pp.7-11 / 2013
  • The purpose of this study was to estimate (co)variance components of three milk production traits for genetic evaluation using a multiple lactation model. Each of the first five lactations was treated as a different trait. For the parameter estimation, a data set was assembled that included lactations of cows that calved from 2001 to 2009. The total number of raw lactation records in the first to fifth parities reached 1,416,589. At least 10 cows were required for each contemporary group (herd-year-season effect). Sires with fewer than 10 daughters were discarded. Lactations with a 305-d milk yield exceeding 15,000 kg were removed. In total, 1,456 sires of cows remained after all the selection steps. A complete pedigree consisting of 292,382 records was used for the study. A sire model containing herd-year-season, calving age, and sire additive genetic effects was applied to the selected lactation data and pedigree to estimate (co)variance components via VCE. Heritabilities and genetic or residual correlations were then derived from the (co)variance estimates using an R package. Genetic correlations between lactations ranged from 0.76 to 0.98 for milk yield, from 0.79 to 1.00 for fat yield, and from 0.75 to 1.00 for protein yield. On an individual lactation basis, relatively low heritability values of 0.14~0.23, 0.13~0.20, and 0.14~0.19 were obtained for milk, fat, and protein yields, respectively. For the combined lactations, heritability values were 0.29, 0.28, and 0.26 for milk, fat, and protein yields, respectively. The estimated parameters will be used in national genetic evaluations for production traits.
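As a worked formula, heritability under a sire model is commonly derived from the sire and residual variance components as shown below. This is the standard textbook relation, given here as an assumption, since the abstract does not state which expression was used. The symbols $\sigma_s^2$ (sire variance) and $\sigma_e^2$ (residual variance) are introduced for illustration.

$$h^2=\frac{4\,\sigma_s^2}{\sigma_s^2+\sigma_e^2}$$

The factor 4 reflects that a sire passes on half of its additive genetic value, so the sire variance represents one quarter of the additive genetic variance. For example, hypothetical components $\sigma_s^2=0.05$ and $\sigma_e^2=0.95$ would give $h^2=0.20$.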