Search | Korea Science

Detection of Outliers in Constrained Regression

Kim, Myung-Geun
Communications for Statistical Applications and Methods / v.10 no.2 / pp.519-524 / 2003
We suggest a method of identifying outliers, using local influence, in regression when linear constraints are imposed on the regression coefficients. An example is given for illustration.
https://doi.org/10.5351/CKSS.2003.10.2.519

Pitman Nearness for a Generalized Stein-Rule Estimators of Regression Coefficients

R. Karan Singh; N. Rastogi
Journal of the Korean Statistical Society / v.31 no.2 / pp.229-235 / 2002
A generalized Stein-rule estimator of the vector of regression coefficients in a linear regression model is considered, and its properties are analyzed under the Pitman nearness criterion. A comparative study shows that the generalized Stein-rule estimator, which represents a class of estimators, contains particular members that are better than the usual Stein-rule estimator in terms of Pitman nearness.
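
For orientation, the two ingredients named in this abstract can be written out as follows. This is only the textbook form of the usual Stein-rule estimator and of the Pitman nearness criterion; the generalized class studied in the paper is not reproduced here. The symbols $b$ (the OLS estimator), $k$ (a positive characterizing scalar), and $L$ (a loss function) are notation assumed for this sketch.

```latex
% Linear model y = X\beta + u, with OLS estimator b = (X'X)^{-1} X'y.
% Usual Stein-rule estimator (k > 0 is a characterizing scalar):
\hat{\beta}_{SR} = \left[ 1 - k \, \frac{(y - Xb)'(y - Xb)}{b'X'Xb} \right] b

% Pitman nearness: \hat{\beta}_1 is Pitman-nearer to \beta than \hat{\beta}_2
% under a loss function L if
P\left\{ L(\hat{\beta}_1, \beta) < L(\hat{\beta}_2, \beta) \right\} > \tfrac{1}{2}
```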

Study of Selection of Regression Equation for Flow-conditions using Machine-learning Method: Focusing on Nakdonggang Waterbody (머신러닝 기법을 활용한 유황별 LOADEST 모형의 적정 회귀식 선정 연구: 낙동강 수계를 중심으로)

Kim, Jonggun; Park, Youn Shik; Lee, Seoro; Shin, Yongchul; Lim, Kyoung Jae; Kim, Ki-sung
Journal of The Korean Society of Agricultural Engineers / v.59 no.4 / pp.97-107 / 2017
This study determines the coefficients of the regression equations in the LOADEST model and selects the optimal regression equation after classifying the whole study period into 5 flow conditions for 16 watersheds located in the Nakdonggang waterbody. The optimized coefficients were derived using gradient descent as the learning method in TensorFlow, the machine-learning engine used. In South Korea, streamflow variability is relatively high and rainfall is concentrated in summer, which can significantly affect the characteristic analysis of pollutant loads. Thus, unlike previous applications of the LOADEST model (fitted to the whole study period), the study period was classified into 5 flow conditions to estimate the optimized coefficients and regression equations. In the results, equation #9, which has 7 coefficients related to flow and seasonal characteristics, was selected for each flow condition in the study watersheds. When the simulated load (SS) was compared with the observed load, the simulation showed a pattern similar to the observation for the high flow condition, because the flow parameters are directly related to precipitation. On the other hand, although the simulated load showed a similar pattern to the observation in several watersheds, most study watersheds showed large differences for the low flow conditions. This is because the pollutant load during low flow conditions might be significantly affected by baseflow or point-source pollutant loads. These results indicate that, to estimate the continuous pollutant load properly, the regression equations need to be determined with proper coefficients for the various flow conditions in the watersheds. Furthermore, the machine-learning method can be useful for estimating the coefficients of regression equations in the LOADEST model.
https://doi.org/10.5389/KSAE.2017.59.4.097
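
The coefficient-fitting step described in this abstract can be sketched roughly as below. This is a minimal illustration, not the authors' code: it uses plain Python instead of TensorFlow, assumes the commonly documented seven-term form of LOADEST equation #9 (intercept, centered ln Q, its square, sine and cosine seasonal terms, centered decimal time, and its square), and fits entirely synthetic data. The function names, learning rate, and step count are all hypothetical choices.

```python
import math
import random


def model9_features(ln_q, dtime):
    """Seven explanatory terms of the assumed LOADEST equation #9 form."""
    return [
        1.0,                            # a0: intercept
        ln_q,                           # a1: centered log streamflow
        ln_q ** 2,                      # a2: squared log streamflow
        math.sin(2 * math.pi * dtime),  # a3: seasonal sine term
        math.cos(2 * math.pi * dtime),  # a4: seasonal cosine term
        dtime,                          # a5: centered decimal time
        dtime ** 2,                     # a6: squared decimal time
    ]


def mse(rows, targets, coef):
    """Mean squared error of the linear predictor on log-load."""
    total = 0.0
    for row, y in zip(rows, targets):
        pred = sum(c * x for c, x in zip(coef, row))
        total += (pred - y) ** 2
    return total / len(rows)


def fit_gradient_descent(rows, targets, lr=0.05, steps=1500):
    """Batch gradient descent on the MSE, starting from all-zero coefficients."""
    n, p = len(rows), len(rows[0])
    coef = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for row, y in zip(rows, targets):
            err = sum(c * x for c, x in zip(coef, row)) - y
            for j in range(p):
                grad[j] += 2.0 * err * row[j] / n
        coef = [c - lr * g for c, g in zip(coef, grad)]
    return coef


# Synthetic stand-in for one flow condition of one watershed (not real data).
rng = random.Random(0)
assumed_coef = [2.0, 1.2, 0.1, 0.3, -0.2, 0.05, 0.02]
rows, targets = [], []
for _ in range(120):
    row = model9_features(rng.uniform(-1.5, 1.5), rng.uniform(-1.0, 1.0))
    rows.append(row)
    targets.append(sum(c * x for c, x in zip(assumed_coef, row)) + rng.gauss(0.0, 0.1))

initial_mse = mse(rows, targets, [0.0] * 7)
coef = fit_gradient_descent(rows, targets)
final_mse = mse(rows, targets, coef)
print(f"MSE before: {initial_mse:.3f}, after: {final_mse:.3f}")
```

To mirror the study design, the same fit would be repeated separately for each flow condition, giving one coefficient vector per condition rather than one for the whole period.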

Robust Fuzzy Varying Coefficient Regression Analysis with Crisp Inputs and Gaussian Fuzzy Output

Yang, Zhihui; Yin, Yunqiang; Chen, Yizeng
Journal of Computing Science and Engineering / v.7 no.4 / pp.263-271 / 2013
This study presents a fuzzy varying coefficient regression model after deleting the outliers to improve the feasibility and effectiveness of the fuzzy regression model. The objective of our methodology is to allow the fuzzy regression coefficients to vary with a covariate, and simultaneously avoid the impact of data contaminated by outliers. In this paper, fuzzy regression coefficients are represented by Gaussian fuzzy numbers. We also formulate suitable goodness of fit to evaluate the performance of the proposed methodology. An example is given to demonstrate the effectiveness of our methodology.
https://doi.org/10.5626/JCSE.2013.7.4.263

Performing linear regression with responses calculated using Monte Carlo transport codes

Price, Dean; Kochunas, Brendan
Nuclear Engineering and Technology / v.54 no.5 / pp.1902-1908 / 2022
In many of the complex systems modeled in nuclear engineering, linear regression-based analyses are useful for examining relationships between model parameters and responses of interest. When the response of interest is calculated by a simulation that uses Monte Carlo methods, the responses carry some uncertainty, and reducing this uncertainty increases the time needed to run each calculation. This paper discusses how the Monte Carlo error in the response of interest influences the error in the computed linear regression coefficients. A mathematical justification shows that, when performing linear regression in these scenarios, the error in the regression coefficients can be largely independent of the Monte Carlo error in each individual calculation. This holds only if the total number of calculations is scaled so that the total time, or amount of work, across all calculations is constant. An application with a simple pin cell model demonstrates these observations in a practical problem.
https://doi.org/10.1016/j.net.2021.11.003
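
The constant-work argument can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes a one-predictor regression with evenly spaced inputs on [0, 1], and it assumes that the Monte Carlo variance of each response is inversely proportional to the time spent on that run (the usual 1/N behaviour of Monte Carlo estimates). The function name `slope_variance` and the constant `noise_const` are hypothetical.

```python
def slope_variance(total_time, time_per_run, noise_const=1.0):
    """Variance of the OLS slope under a fixed total time budget.

    Assumes each response has variance noise_const / time_per_run and that
    the budget buys total_time / time_per_run evenly spaced runs on [0, 1].
    """
    n_runs = int(total_time / time_per_run)
    xs = [i / (n_runs - 1) for i in range(n_runs)]
    x_mean = sum(xs) / n_runs
    sxx = sum((x - x_mean) ** 2 for x in xs)   # spread of the inputs
    sigma2 = noise_const / time_per_run        # per-run Monte Carlo variance
    return sigma2 / sxx                        # classical Var(slope) = sigma^2 / Sxx


# Same total budget, spent as many noisy runs or as few precise runs.
many_noisy = slope_variance(total_time=1000.0, time_per_run=1.0)
few_precise = slope_variance(total_time=1000.0, time_per_run=10.0)
print(many_noisy, few_precise)
```

Under these assumptions the two variances agree to within a few percent: cutting the per-run time by a factor of ten raises the per-run variance tenfold, but it also buys ten times as many runs, and the two effects nearly cancel in the slope variance.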

QUASI-LIKELIHOOD REGRESSION FOR VARYING COEFFICIENT MODELS WITH LONGITUDINAL DATA

Kim, Choong-Rak; Jeong, Mee-Seon; Kim, Woo-Chul; Park, Byeong-U.
Journal of the Korean Statistical Society / v.33 no.4 / pp.367-379 / 2004
This article deals with the nonparametric analysis of longitudinal data when there exist possible correlations among repeated measurements for a given subject. We consider a quasi-likelihood regression model where a transformation of the regression function through a link function is linear in time-varying coefficients. We investigate the local polynomial approach to estimate the time-varying coefficients, and derive the asymptotic distribution of the estimators in this quasi-likelihood context. A real data set is analyzed as an illustrative example.

LIKELIHOOD DISTANCE IN CONSTRAINED REGRESSION

Kim, Myung-Geun
Journal of applied mathematics & informatics / v.25 no.1_2 / pp.489-493 / 2007
Two diagnostic measures based on the likelihood distance for constrained regression with linear constraints on regression coefficients are derived. They are used for identifying influential observations in constrained regression. A numerical example is provided for illustration.

Influence Analysis of Constrained Regression Models

Kim, Myung-Geun
Communications for Statistical Applications and Methods / v.14 no.2 / pp.281-286 / 2007
Cook's distance is generalized to the multiple linear regression with linear constraints on regression coefficients. It is used for identifying influential observations in constrained regression models. A numerical example is provided for illustration.
https://doi.org/10.5351/CKSS.2007.14.2.281
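
As background for this entry, the standard building blocks are written out below: the least squares estimator under linear constraints and the classical (unconstrained) Cook's distance. The notation ($A$, $c$, $p$, $s^2$) is assumed for this sketch. The last line shows one natural constrained analogue obtained by substituting the constrained estimator; the paper's exact definition and scaling may differ.

```latex
% Model y = X\beta + \varepsilon with q linear constraints A\beta = c.
% Unconstrained and constrained least squares estimators:
\hat{\beta} = (X'X)^{-1} X'y
\hat{\beta}_c = \hat{\beta}
  - (X'X)^{-1} A' \left[ A (X'X)^{-1} A' \right]^{-1} (A\hat{\beta} - c)

% Classical Cook's distance for case i (p coefficients, residual variance
% estimate s^2, \hat{\beta}_{(i)} fitted without case i):
D_i = \frac{(\hat{\beta} - \hat{\beta}_{(i)})' X'X (\hat{\beta} - \hat{\beta}_{(i)})}{p \, s^2}

% A constrained analogue (assumed form) replaces \hat{\beta} by \hat{\beta}_c:
D_i^{c} \propto (\hat{\beta}_c - \hat{\beta}_{c(i)})' X'X (\hat{\beta}_c - \hat{\beta}_{c(i)})
```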

Lasso Regression of RNA-Seq Data based on Bootstrapping for Robust Feature Selection (안정적 유전자 특징 선택을 위한 유전자 발현량 데이터의 부트스트랩 기반 Lasso 회귀 분석)

Jo, Jeonghee; Yoon, Sungroh
KIISE Transactions on Computing Practices / v.23 no.9 / pp.557-563 / 2017
When large-scale gene expression data are analyzed using lasso regression, the estimation of regression coefficients may be unstable because expression values of associated genes are highly correlated. This irregularity, in which the coefficients are reduced by L1 regularization, makes variable selection difficult. To address this problem, we propose a regression model that uses repeated bootstrapping of gene expression values prior to lasso regression. The genes selected with high frequency were used to build each regression model. Our experimental results show that several genes were consistently selected in all regression models, and we verified that these genes were not false positives. We also found that the sign distribution of the regression coefficients of the selected genes from each model was correlated with the real dependent variables.
https://doi.org/10.5626/KTCP.2017.23.9.557
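
One way to read the bootstrap-then-lasso idea is as a selection-frequency count: refit the lasso on many resampled data sets and record how often each feature receives a nonzero coefficient. The sketch below follows that reading and is not the authors' pipeline. It uses a small hand-written coordinate descent solver without an intercept, synthetic data with a single informative feature, and hypothetical names and settings (`penalty`, `n_boot`, `sweeps`).

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty, clipping at zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, sweeps=30):
    """Minimise (1/2n)||y - Xb||^2 + penalty * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(sweeps):
        for j in range(p):
            col_sq = sum(X[i][j] ** 2 for i in range(n)) / n
            if col_sq == 0.0:
                continue
            # Covariance of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, penalty) / col_sq
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
            beta[j] = new_beta
    return beta


def selection_frequency(X, y, penalty, n_boot=30, seed=0):
    """Fraction of bootstrap resamples in which each feature is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        beta = lasso_coordinate_descent([X[i] for i in idx], [y[i] for i in idx], penalty)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Synthetic data: only feature 0 drives the response (not real expression data).
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(60)]
y = [3.0 * row[0] + data_rng.gauss(0.0, 0.5) for row in X]

freqs = selection_frequency(X, y, penalty=0.2)
print(freqs)
```

Features whose frequency stays high across resamples are the candidates to keep; a threshold on the frequency is a further choice that this sketch leaves open.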

Bootstrapping Composite Quantile Regression (복합 분위수 회귀에 대한 붓스트랩 방법의 응용)

Seo, Kang-Min; Bang, Sung-Wan; Jhun, Myoung-Shic
The Korean Journal of Applied Statistics / v.25 no.2 / pp.341-350 / 2012
A composite quantile regression model is considered for the iid error case. Since the regression coefficients are the same across different quantiles, composite quantile regression can combine strength across multiple quantile regression models. For composite quantile regression, the bootstrap method is examined for statistical inference, including the selection of the number of quantiles and confidence intervals for the regression coefficients. The feasibility of the bootstrap method is demonstrated through a simulation study.
https://doi.org/10.5351/KJAS.2012.25.2.341

Search Result 2,279, Processing Time 0.027 seconds
