• Title/Summary/Keyword: Random regression

Semiparametric kernel logistic regression with longitudinal data

  • Shim, Joo-Yong;Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / v.23 no.2 / pp.385-392 / 2012
  • Logistic regression is a well-known binary classification method in the field of statistical learning. Mixed-effect regression models are widely used for the analysis of correlated data such as those found in longitudinal studies. We consider kernel extensions of logistic regression with semiparametric fixed effects and parametric random effects. Estimation is performed through the penalized likelihood method based on the kernel trick, and our focus is on efficient computation and effective hyperparameter selection. Cross-validation techniques are employed to select the optimal hyperparameters. Numerical results are then presented to indicate the performance of the proposed procedure.

Performance Comparison of Machine-learning Models for Analyzing Weather and Traffic Accident Correlations

  • Li Zi Xuan;Hyunho Yang
    • Journal of information and communication convergence engineering / v.21 no.3 / pp.225-232 / 2023
  • Owing to advancements in intelligent transportation systems (ITS) and artificial-intelligence technologies, various machine-learning models can be employed to simulate and predict the number of traffic accidents under different weather conditions. Furthermore, we can analyze the relationship between weather and traffic accidents, allowing us to assess whether the current weather conditions are suitable for travel, which can significantly reduce the risk of traffic accidents. In this study, we analyzed 30,000 traffic flow data points collected by traffic cameras at nearby intersections in Washington, D.C., USA from October 2012 to May 2017, using Pearson's heat map. We then applied several machine-learning algorithms commonly used in ITS, including random forest, decision tree, gradient-boosting regression, and support vector regression, to predict and analyze the correlation between continuous features, and compared their performance. The experimental results indicated that the gradient-boosting regression model had the best performance.

Machine learning model for residual chlorine prediction in sediment basin to control pre-chlorination in water treatment plant (정수장 전염소 공정제어를 위한 침전지 잔류염소농도 예측 머신러닝 모형)

  • Kim, Juhwan;Lee, Kyunghyuk;Kim, Soojun;Kim, Kyunghun
    • Journal of Korea Water Resources Association / v.55 no.spc1 / pp.1283-1293 / 2022
  • The purpose of this study is to predict residual chlorine, using artificial intelligence algorithms, so that a stable residual chlorine concentration can be maintained in the sedimentation basin of a water treatment process employing pre-chlorination. Available water quantity and quality data were collected and analyzed statistically, then applied to a multiple regression model and to artificial intelligence models including a multi-layer perceptron neural network, random forest, and long short-term memory (LSTM). Water temperature, turbidity, pH, conductivity, flow rate, alkalinity, and pre-chlorination dosage were used as input parameters to develop the prediction models. The random forest algorithm showed the most moderate prediction result among the four models (LSTM, multi-layer perceptron, multiple regression, and random forest). In particular, the multiple regression model could not represent the residual chlorine from input parameters that vary independently with seasonal change and differ in numerical scale and dimension between quantity and quality data. For this reason, the random forest model, a decision-tree-type algorithm, is more appropriate for predicting water quality than the other algorithms. Real-time prediction by artificial intelligence models is also expected to contribute to the stable management of residual chlorine in water treatment plants that include a pre-chlorination process.

Longitudinal Analysis of Body Weight and Feed Intake in Selection Lines for Residual Feed Intake in Pigs

  • Cai, W.;Wu, H.;Dekkers, J.C.M.
    • Asian-Australasian Journal of Animal Sciences / v.24 no.1 / pp.17-27 / 2011
  • A selection experiment for reduced residual feed intake (RFI) in Yorkshire pigs consisted of a line selected for lower RFI (LRFI) and a random control line (CTRL). Longitudinal measurements of daily feed intake (DFI) and body weight (BW) from generation 5 of this experiment were used. The objectives of this study were to evaluate the use of random regression (RR) and nonlinear mixed models to predict DFI and BW for individual pigs, accounting for the substantial missing information that characterizes these data, and to evaluate the effect of selection for RFI on BW and DFI curves. Forty RR models with different-order polynomials of age as fixed and random effects, and with homogeneous or heterogeneous residual variance by month of age, were fitted for both DFI and BW. Based on predicted residual sum of squares (PRESS) and residual diagnostics, the quadratic polynomial RR model was identified as best, with heterogeneous residual variance for DFI and homogeneous residual variance for BW. Compared to the simple quadratic and linear regression models for individual pigs, these RR models decreased PRESS by 1% and 2% for DFI and by 42% and 36% for BW on boars and gilts, respectively. Given the same number of random effects as the polynomial RR models, i.e., two for BW and one for DFI, the nonlinear Gompertz model predicted better than the polynomial RR models but not as well as higher-order polynomial RR models. After five generations of selection for reduced RFI, the LRFI line had a lower population curve for DFI and BW than the CTRL line, especially towards the end of the growth period.
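
The PRESS criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain per-animal straight-line regression of body weight on age, refitted with each observation left out in turn; it is not the random regression model of the paper, and the function names and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def press(xs, ys):
    """Predicted residual sum of squares via leave-one-out refitting."""
    total = 0.0
    for i in range(len(xs)):
        # Fit without observation i, then predict the held-out point.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        total += (ys[i] - (intercept + slope * xs[i])) ** 2
    return total


# Illustrative (made-up) ages in days and body weights in kg for one animal.
ages = [90.0, 110.0, 130.0, 150.0, 170.0]
weights = [40.2, 55.1, 71.3, 84.9, 101.0]
print(press(ages, weights))
```

A lower PRESS indicates better out-of-sample prediction, which is why it can be used to rank competing models fitted to the same animals.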

Empirical Study on Test Case Prioritization Techniques of Regression Testing (회귀 테스팅의 테스트 케이스 우선 순위화 기법의 실험적 연구)

  • So Sun Sup;Chae Yigeun
    • The KIPS Transactions:PartD / v.12D no.2 s.98 / pp.283-288 / 2005
  • Test case prioritization methods schedule test cases for execution when it is not practical to run all test cases in regression testing. We propose a new prioritization method based on historical execution and mr detection data. We conducted an experiment comparing the proposed method with the existing Random and LRU methods, using fault age in a long-run environment as the criterion. The experiment shows several interesting results. First, the methods are complementary: the Random method performs well for programs that have many error-detecting test cases, HED is more effective for programs whose errors are detected by only a very small number of test cases, and LRU is more effective for programs with a moderate number of error-detecting test cases. Next, the performance of a prioritization method is affected by the size of the test suite: two experiments with test suites of different sizes show considerably different fault ages and performance ordering. Lastly, executing 20% of the test cases performs considerably well compared with executing the full test suite.

A Bayesian zero-inflated Poisson regression model with random effects with application to smoking behavior (랜덤효과를 포함한 영과잉 포아송 회귀모형에 대한 베이지안 추론: 흡연 자료에의 적용)

  • Kim, Yeon Kyoung;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.31 no.2 / pp.287-301 / 2018
  • It is common to encounter count data with excess zeros in various research fields such as the social sciences, natural sciences, medical science, and engineering. Such count data have been explained mainly by the zero-inflated Poisson model and its extensions. Zero-inflated count data are also often correlated or clustered, in which case random effects should be taken into account in the model. Frequentist approaches have commonly been used to fit such data. However, a Bayesian approach has the advantages of incorporating prior information, avoiding asymptotic approximations, and allowing practical estimation of functions of parameters. We consider a Bayesian zero-inflated Poisson regression model with random effects for correlated zero-inflated count data. We conducted simulation studies to check the performance of the proposed model. We also applied the proposed model to smoking behavior data from the Regional Health Survey (2015) of the Korea Centers for Disease Control and Prevention.
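
As a minimal sketch of the model class named in this abstract, the code below evaluates the standard zero-inflated Poisson probability mass function and a log-link rate with a subject-level random intercept. The random-intercept form, the function names, and the parameter values are assumptions for illustration; the paper's exact specification and priors are not given in the abstract.

```python
import math


def zip_pmf(y, lam, pi):
    """P(Y = y) for a zero-inflated Poisson.

    lam is the Poisson rate; pi is the probability of a structural zero.
    """
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero arises either structurally or from the Poisson part.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def subject_rate(x_beta, b_i):
    """Poisson rate under a log link with random intercept b_i (assumed form)."""
    return math.exp(x_beta + b_i)


# Hypothetical values: linear predictor 0.5, random intercept 0.2, 30% extra zeros.
lam = subject_rate(0.5, 0.2)
print([round(zip_pmf(y, lam, 0.3), 4) for y in range(5)])
```

Sharing `b_i` across all observations from the same subject is what induces the within-cluster correlation the abstract refers to.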

A Study on Predictive Modeling of I-131 Radioactivity Based on Machine Learning (머신러닝 기반 고용량 I-131의 용량 예측 모델에 관한 연구)

  • Yeon-Wook You;Chung-Wun Lee;Jung-Soo Kim
    • Journal of radiological science and technology / v.46 no.2 / pp.131-139 / 2023
  • High-dose I-131 used for the treatment of thyroid cancer causes localized exposure among radiology technologists handling it. There is a delay between the calibration date and when the dose of I-131 is administered to a patient. Therefore, it is necessary to directly measure the radioactivity of the administered dose using a dose calibrator. In this study, we attempted to apply machine learning modeling to measured external dose rates from shielded I-131 in order to predict their radioactivity. External dose rates were measured at distances of 1 m, 0.3 m, and 0.1 m from a shielded container holding the I-131, with a total of 868 sets of measurements taken. For the modeling process, we used the hold-out method to partition the data in a 7:3 ratio (609 for the training set, 259 for the test set). For the machine learning algorithms, we chose linear regression, decision tree, random forest, and XGBoost. To evaluate the models, we calculated root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE) for accuracy and R2 for explanatory power. The evaluation results were as follows: linear regression (RMSE 268.15, MSE 71901.87, MAE 231.68, R2 0.92); decision tree (RMSE 108.89, MSE 11856.92, MAE 19.24, R2 0.99); random forest (RMSE 8.89, MSE 79.10, MAE 6.55, R2 0.99); XGBoost (RMSE 10.21, MSE 104.22, MAE 7.68, R2 0.99). The random forest model achieved the highest predictive ability. Improving the model's performance in the future is expected to contribute to lowering exposure among radiology technologists.
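
The four evaluation metrics named in this abstract can be computed as in the following minimal sketch. The function name and the sample values are hypothetical and are not the study's data; the formulas are the standard definitions, with RMSE being the square root of MSE.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MSE, MAE, and R2 for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation
    ss_res = sum(e * e for e in errors)                 # unexplained variation
    return {
        "rmse": math.sqrt(mse),
        "mse": mse,
        "mae": mae,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Illustrative (made-up) measured and predicted activities.
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(regression_metrics(measured, predicted))
```

Because MSE squares each error, a few large misses raise RMSE far more than MAE, which is one way a model can show a large RMSE alongside a comparatively small MAE.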

Asymptotic Distribution of the LM Test Statistic for the Nested Error Component Regression Model

  • Jung, Byoung-Cheol;Myoungshic Jhun;Song, Seuck-Heun
    • Journal of the Korean Statistical Society / v.28 no.4 / pp.489-501 / 1999
  • In this paper, we consider the panel data regression model in which the disturbances have a nested error component. We derive a Lagrange multiplier (LM) test that jointly tests for the presence of random individual effects and nested effects under the normality assumption on the disturbances. This test extends the earlier work of Breusch and Pagan (1980) and Baltagi and Li (1991). Further, it is shown that this LM test has the same asymptotic distribution without the normality assumption on the disturbances.

Balanced Simultaneous Confidence Intervals in Logistic Regression Models

  • Lee, Kee-Won
    • Journal of the Korean Statistical Society / v.21 no.2 / pp.139-151 / 1992
  • Simultaneous confidence intervals for the parameters in logistic regression models with random regressors are considered. A method based on the bootstrap and its stochastic approximation is developed. A key idea in using the bootstrap method to construct simultaneous confidence intervals is the concept of prepivoting, which transforms a root by its estimated cumulative distribution function. Repeated use of prepivoting makes the overall coverage probability asymptotically correct and the coverage probabilities of the individual confidence statements asymptotically equal. This method is compared with ordinary asymptotic methods based on Scheffé's and Bonferroni's procedures through Monte Carlo simulation.

Asymptotic Consistency of Least Squares Estimators in Fuzzy Regression Model

  • Yoon, Jin-Hee;Kim, Hae-Kyung;Choi, Seung-Hoe
    • Communications for Statistical Applications and Methods / v.15 no.6 / pp.799-813 / 2008
  • This paper deals with the properties of fuzzy least squares estimators for the fuzzy linear regression model. In particular, a fuzzy triangular input-output model including an error term is proposed. The error term is treated as a fuzzy random variable. The asymptotic unbiasedness and consistency of the estimators are proved using a suitable metric.