• Title/Abstract/Keywords: interval regression model

Search results: 349 (processing time: 0.023 s)

A Yield Estimation Model of Forage Rye Based on Climate Data by Locations in South Korea Using General Linear Model

  • Peng, Jing Lun;Kim, Moon Ju;Kim, Byong Wan;Sung, Kyung Il
    • Journal of The Korean Society of Grassland and Forage Science / Vol. 36, No. 3 / pp. 205-214 / 2016
  • The objective of this study was to construct a forage rye (FR) dry matter yield (DMY) estimation model based on climate data by location in South Korea. A data set (n = 549) covering 29 years was used. Six optimal climatic variables were selected through stepwise multiple regression analysis with DMY as the response variable. Then, using a general linear model, the final model, including the six climatic variables and the cultivation locations as dummy variables, was constructed as follows: DMY = 104.166SGD + 1.454AAT + 147.863MTJ + 59.183PAT150 - 4.693SRF + 45.106SRD - 5230.001 + Location, where SGD is spring growing days, AAT is autumnal accumulated temperature, MTJ is the mean temperature in January, PAT150 is the period to accumulated temperature 150, SRF is spring rainfall, and SRD is spring rainfall days. The model explained 24.4% of the variation in the DMY of FR. The assumptions of homoscedasticity and zero-mean residuals were satisfied. The goodness of fit of the model was judged adequate because most predicted DMY values fell within the 95% confidence interval.
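
To make the fitted equation concrete, the sketch below evaluates it in Python. It is only an illustration under stated assumptions: the abstract gives no units for the climatic variables, and the per-site coefficients of the location dummy variables are not reported, so `location_effect` is a hypothetical placeholder (default 0). The names `COEFFICIENTS` and `predict_dmy` are likewise illustrative.

```python
# Coefficients of the forage rye DMY model as reported in the abstract.
COEFFICIENTS = {
    "SGD": 104.166,    # spring growing days
    "AAT": 1.454,      # autumnal accumulated temperature
    "MTJ": 147.863,    # mean temperature in January
    "PAT150": 59.183,  # period to accumulated temperature 150
    "SRF": -4.693,     # spring rainfall
    "SRD": 45.106,     # spring rainfall days
}
INTERCEPT = -5230.001


def predict_dmy(climate: dict[str, float], location_effect: float = 0.0) -> float:
    """Evaluate the reported linear DMY model for one site-year.

    `location_effect` is a hypothetical stand-in for the location dummy
    coefficient, whose per-site values are not given in the abstract.
    """
    missing = set(COEFFICIENTS) - set(climate)
    if missing:
        raise KeyError(f"missing climatic variables: {sorted(missing)}")
    linear_part = sum(coef * climate[name] for name, coef in COEFFICIENTS.items())
    return linear_part + INTERCEPT + location_effect
```

Because the model is additive, each coefficient reads directly as the change in predicted DMY per unit change in that variable with the others held fixed.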

Pressure Drop Predictions Using Multiple Regression Model in Pulse Jet Type Bag Filter Without Venturi

  • 서정민;박정호;조재환;진경호;정문섭;이병인;홍성철;시바쿠마르;최금찬
    • Journal of Environmental Science International / Vol. 23, No. 12 / pp. 2045-2056 / 2014
  • In this study, pressure drop was measured in a pulse jet bag filter without venturi, fitted with 16 filter bags (Ø$140 \times 850\ell$), under varying operating conditions (filtration velocity, inlet dust concentration, pulse pressure, and pulse interval) using coke dust from a steel mill. The 180 pressure drop measurements obtained were used to fit a multiple regression model so that the predicted pressure drop can guide effective operating conditions and serve as basic data for economical design. The predictions showed that a 1% increase in filtration velocity increased pressure drop by 2.2%, indicating that filtration velocity had the largest effect on pressure drop among the operating conditions. A 1% increase in pulse pressure decreased pressure drop by 1.53%, confirming that pulse pressure was the most important factor after filtration velocity. Meanwhile, 1% increases in inlet dust concentration and pulse interval increased pressure drop by 0.3% and 0.37%, respectively, implying that their effects were smaller than those of filtration velocity and pulse pressure. Therefore, the factors affecting the pressure drop of the pulse jet bag filter, from largest to smallest effect, were filtration velocity ($V_f$), pulse pressure ($P_p$), inlet dust concentration ($C_i$), and pulse interval ($P_i$). The predictions over these four operating variables also indicated that stable operation can be achieved with a filtration velocity below 1.5 m/min and an inlet dust concentration below $4\,\mathrm{g/m^3}$. However, pulse pressure and pulse interval need to be adjusted when the inlet dust concentration exceeds $4\,\mathrm{g/m^3}$.
When filtration velocity and pulse pressure were examined together, operation was possible regardless of changes in pulse pressure at a filtration velocity of 1.5 m/min. If filtration velocity was increased to 2 m/min, operation would be possible only with a pulse pressure above $5.8\,\mathrm{kgf/cm^2}$. Likewise, the predicted pressure drop over filtration velocity and pulse interval showed that at a filtration velocity of 1.5 m/min the pulse interval should be kept below 50 s, whereas at 2 m/min it should be set below 11 s. With a filtration velocity below 1 m/min and a pulse pressure above $7\,\mathrm{kgf/cm^2}$, the pressure drop would be lower, but economic feasibility would suffer: the dust collector would have to be larger, raising installation and operating costs, and the high pulse pressure would shorten filter bag life.
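
Constant percentage effects like those reported are what a log-linear (power-law) regression $\Delta P = k\,V_f^{a} P_p^{b} C_i^{c} P_i^{d}$ produces, so the sketch below assumes that form; the abstract does not state the model equation, so this is an interpretation, not the authors' published model. Under that assumption the unknown constant $k$ cancels when comparing two operating points. The function name `pressure_drop_ratio` and the keyword names are hypothetical.

```python
# Reported elasticities: % change in pressure drop per 1% change in each variable.
ELASTICITIES = {
    "filtration_velocity": 2.2,
    "pulse_pressure": -1.53,
    "inlet_dust_concentration": 0.3,
    "pulse_interval": 0.37,
}


def pressure_drop_ratio(**ratios: float) -> float:
    """New/old pressure drop under an ASSUMED power-law model.

    Each keyword is the new/old ratio of one operating variable; variables
    not passed are held constant. The proportionality constant cancels out.
    """
    unknown = set(ratios) - set(ELASTICITIES)
    if unknown:
        raise KeyError(f"unknown variables: {sorted(unknown)}")
    result = 1.0
    for name, ratio in ratios.items():
        if ratio <= 0:
            raise ValueError("ratios must be positive")
        result *= ratio ** ELASTICITIES[name]
    return result
```

For example, `pressure_drop_ratio(filtration_velocity=2.0 / 1.5)` evaluates $(4/3)^{2.2}$, about 1.88, which is one way to see why raising the filtration velocity from 1.5 to 2 m/min demands a higher pulse pressure or a shorter pulse interval.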

Predictive Model of Optimal Continuous Positive Airway Pressure for Obstructive Sleep Apnea Patients with Obesity by Using Machine Learning

  • 김승수;양광익
    • Journal of Sleep Medicine / Vol. 15, No. 2 / pp. 48-54 / 2018
  • Objectives: The aim of this study was to develop a model predicting the optimal continuous positive airway pressure (CPAP) for obstructive sleep apnea (OSA) patients with obesity using machine learning. Methods: We retrospectively investigated the medical records of 162 OSA patients who had obesity [body mass index (BMI) ≥ 25] and had undergone a successful CPAP titration study. We randomly divided the data into a training set (90%) and a test set (10%). We built a random forest model and a least absolute shrinkage and selection operator (lasso) regression model to predict the optimal pressure using the training set, and then applied our models and previously reported equations to the test set. To compare the fit of each model, we used the correlation coefficient (CC) and the mean absolute error (MAE). Results: The random forest model showed the best performance {CC 0.78 [95% confidence interval (CI) 0.43-0.93], MAE 1.20}. The lasso regression model also performed better [CC 0.78 (95% CI 0.42-0.93), MAE 1.26] than the Hoffstein equation [CC 0.68 (95% CI 0.23-0.89), MAE 1.34] and Choi's equation [CC 0.72 (95% CI 0.30-0.90), MAE 1.40]. Conclusions: Our random forest model and lasso model ($26.213 + 0.084 \times \text{BMI} + 0.004 \times \text{apnea-hypopnea index} + 0.004 \times \text{oxygen desaturation index} - 0.215 \times \text{mean oxygen saturation}$) performed better than the previously reported equations. Further studies of other OSA subgroups and phenotypes are required.
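
The sketch below evaluates the reported lasso equation and the two comparison metrics (Pearson CC and MAE) in plain Python. It assumes the inputs use the same units as the study (mean oxygen saturation presumably in percent), which the abstract does not state; the function names are hypothetical, and the random forest model cannot be reproduced from the abstract.

```python
import math


def lasso_cpap(bmi: float, ahi: float, odi: float, mean_spo2: float) -> float:
    """Optimal CPAP from the reported lasso equation (inputs in the study's units)."""
    return 26.213 + 0.084 * bmi + 0.004 * ahi + 0.004 * odi - 0.215 * mean_spo2


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between observed and predicted pressures."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_cc(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For instance, `lasso_cpap(30, 40, 35, 93)` gives about 9.04; note how strongly the mean oxygen saturation term dominates the other predictors at typical values.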

Analysis of Factors for Private Universities Educational Restitution Rate using Data Mining: Focusing on the Panel Fixed Effect Model and Non-parametric Regression Estimation

  • 채동우;이문범;정군오
    • Journal of Information Technology Applications and Management / Vol. 27, No. 6 / pp. 153-170 / 2020
  • The educational restitution rate (ERR) is an important parameter that determines the quality of university education. Using data mining techniques, this paper analyzed data from 148 private universities in Korea over the 10 years from 2009 to 2018. Panel estimation detected a significant causal relationship in the fixed effect model. Faculty expansion and fund management, which are university evaluation indicators, and the size of basic funds each have a positive effect on the ERR within the confidence interval. The analysis also showed that the more private universities improve their tuition dependence rate, the more decisively positive its effect on the ERR. Nonparametric regression estimation showed that strengthening the faculty expansion ratio produces economies of scale in some ranges, and that improvements in the tuition dependence rate yield results only after a certain point in time. We hope that this study can serve universities as a basic indicator for diagnosing basic competencies and for student-centered education policy.
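
A panel fixed effect model removes each university's time-invariant level by demeaning every variable within that university (the "within" transformation) before running least squares. The sketch below shows this for a single regressor only; the paper's model has several regressors and its exact specification is not given in the abstract, so `within_slope` and its arguments are illustrative names, not the authors' code.

```python
from collections import defaultdict


def within_slope(entity: list[str], x: list[float], y: list[float]) -> float:
    """Fixed-effect (within) estimate of the slope of y on one regressor x.

    Each observation is demeaned by its entity's own means, which removes
    entity-specific intercepts; OLS through the origin is then applied.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # entity -> [sum_x, sum_y, count]
    for e, xi, yi in zip(entity, x, y):
        acc = totals[e]
        acc[0] += xi
        acc[1] += yi
        acc[2] += 1
    num = den = 0.0
    for e, xi, yi in zip(entity, x, y):
        sum_x, sum_y, n = totals[e]
        dx = xi - sum_x / n
        dy = yi - sum_y / n
        num += dx * dy
        den += dx * dx
    if den == 0:
        raise ValueError("x has no within-entity variation")
    return num / den
```

Two entities with different intercepts but a common slope recover that slope exactly, whereas pooled OLS would mix between-entity differences into the estimate.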

Incremental Ensemble Learning for The Combination of Multiple Models of Locally Weighted Regression Using Genetic Algorithm

  • 김상훈;정병희;이건호
    • KIPS Transactions on Software and Data Engineering / Vol. 7, No. 9 / pp. 351-360 / 2018
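  • The locally weighted regression (LWR) model, traditionally a form of lazy learning, is a regression equation over a short interval: to obtain a prediction at a query point (the input variable), it is fitted to the training data within a certain range, with each training point weighted according to its distance from the query point. This study proposes an incremental ensemble learning process for LWR, which is a form of memory-based learning. The proposed incremental ensemble method uses a genetic algorithm to sequentially generate and integrate LWR models over time. A limitation of existing LWR is that multiple LWR models can be generated depending on the choice of indicator function and training data, and the quality of the prediction can vary with the model chosen. However, no research has addressed the problem of selecting or combining multiple LWR models. In this study, initial LWR models are generated from indicator functions and training data, and an evolutionary learning process is then repeated to select suitable indicator functions; in addition, LWR models applied to different training data are evaluated and refined to overcome bias caused by the training data. The method follows an eager learning approach: as data arrive for each interval, LWR models are incrementally generated and stored. To obtain a prediction at a given time, an LWR model is built from the data newly generated within an interval and then combined with the existing LWR models for that interval using a genetic algorithm. The proposed method achieved better goodness of fit than the existing approach of selecting among multiple LWR models by simple averaging. Using real data such as hourly traffic volume in a specific area and hourly sales at highway rest areas, the connected patterns of results from the proposed LWR are compared with predictions from multiple regression analysis.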
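
The core LWR step described above (weight training points by distance to the query, then fit a local line) can be sketched in one dimension as follows. The Gaussian weighting and `bandwidth` parameter are assumptions standing in for the paper's indicator function, and `simple_average_ensemble` illustrates only the simple-averaging baseline, not the proposed genetic-algorithm combination. All names are hypothetical.

```python
import math


def lwr_predict(xs: list[float], ys: list[float], query: float, bandwidth: float) -> float:
    """Locally weighted linear regression at `query` with Gaussian distance weights."""
    weights = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights underflowed; increase the bandwidth")
    # Weighted means: the local line passes through this point.
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y + slope * (query - mean_x)


def simple_average_ensemble(xs, ys, query, bandwidths):
    """Baseline: average the predictions of several LWR models (one per bandwidth)."""
    preds = [lwr_predict(xs, ys, query, h) for h in bandwidths]
    return sum(preds) / len(preds)
```

A new regression is solved for every query, which is why LWR is called lazy; the paper's eager variant instead stores models per interval as data arrive.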

Estimation of R factor using hourly rainfall data

  • Risal, Avay;Kum, Donghyuk;Han, Jeongho;Lee, Dongjun;Lim, Kyoungjae
    • Korea Water Resources Association: Conference Proceedings / Korea Water Resources Association 2016 Conference / p. 260 / 2016
  • Soil erosion is a very serious problem from both agricultural and environmental points of view. Various computer models have been used to estimate soil erosion and assess erosion control practices. The Universal Soil Loss Equation (USLE) is a popular model that has been used in many countries around the world. Erosivity (the USLE R factor) is one of the USLE input parameters and reflects the impact of rainfall in computing soil loss. The R factor depends on the energy (E) and the maximum 30-minute rainfall intensity ($I30_{max}$) of each rainfall event, and thus is calculated from rainfall data with high temporal resolution, such as 10-minute intervals. However, 10-minute rainfall data may not be available in every part of the world. In that case, hourly rainfall data can be used to compute the R factor, with the maximum 60-minute rainfall ($I60_{max}$) used instead of the maximum 30-minute rainfall ($I30_{max}$), as suggested by the USLE manual. The average annual R factor computed from hourly rainfall data, however, needs a correction factor before it can be used in the USLE model. The objectives of our study are to derive the relation between average annual R factor values computed from 10-minute and hourly rainfall data and to determine a correction coefficient for the R factor computed from hourly rainfall data. Seventy-five weather stations in Korea were selected for the study. Ten-minute rainfall data for these stations were obtained from the Korea Meteorological Administration (KMA) and aggregated into hourly rainfall data. The R factor and $I60_{max}$ obtained from hourly data were compared with the R factor and $I30_{max}$ obtained from 10-minute data. A linear relation between the average annual R factors obtained from 10-minute and hourly rainfall data was derived with $R^2 = 0.69$, and a correction coefficient was developed for the R factor calculated from hourly rainfall data.
Similarly, a relation was obtained between event-wise $I30_{max}$ and $I60_{max}$ with a higher $R^2$ of 0.91. Thus $I30_{max}$ can be estimated from $I60_{max}$ with higher accuracy, and the R factor can be determined more precisely from hourly rainfall data by multiplying the energy of each rainfall event by this corrected $I60_{max}$.
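
The data-processing step above (10-minute records aggregated to hourly values, then $I30_{max}$ and $I60_{max}$ extracted per event) can be sketched as follows. This is an illustration only: event energy E must be computed separately (the USLE energy formula is not reproduced here), the study's correction coefficients are not given in the abstract, and the function names are hypothetical.

```python
def max_intensity(depths_mm: list[float], interval_min: int, window_min: int) -> float:
    """Largest mean intensity (mm/h) over any window of consecutive recording intervals."""
    if window_min % interval_min != 0:
        raise ValueError("window must be a whole number of intervals")
    k = window_min // interval_min
    if len(depths_mm) < k:
        raise ValueError("series shorter than the window")
    best_depth = max(sum(depths_mm[i:i + k]) for i in range(len(depths_mm) - k + 1))
    return best_depth * 60 / window_min  # depth per window converted to mm/h


def to_hourly(depths_10min: list[float]) -> list[float]:
    """Aggregate 10-minute depths into clock-hour depths (assumes the series starts on the hour)."""
    return [sum(depths_10min[i:i + 6]) for i in range(0, len(depths_10min), 6)]


def event_erosivity(energy: float, max_intensity_mm_h: float) -> float:
    """Event erosivity as total storm energy times maximum intensity (E * I)."""
    return energy * max_intensity_mm_h
```

For one event, $I30_{max}$ is `max_intensity(d, 10, 30)` and the hourly-data $I60_{max}$ is `max_intensity(to_hourly(d), 60, 60)`. Because clock-hour bins can split a burst, the hourly value is often lower than a sliding 60-minute maximum, which is one reason a correction is needed.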


Survival Analysis of Gastric Cancer Patients with Incomplete Data

  • Moghimbeigi, Abbas;Tapak, Lily;Roshanaei, Ghodaratolla;Mahjub, Hossein
    • Journal of Gastric Cancer / Vol. 14, No. 4 / pp. 259-265 / 2014
  • Purpose: Survival analysis of gastric cancer patients requires knowledge of the factors that affect survival time. This paper analyzed the survival of patients with incompletely registered data by using imputation methods. Materials and Methods: Three missing data imputation methods, namely regression imputation, the expectation maximization algorithm, and multiple imputation (MI) using Markov chain Monte Carlo methods, were applied to the data of cancer patients referred to the cancer institute at Imam Khomeini Hospital in Tehran from 2003 to 2008. The data included demographic variables, survival times, and the censoring indicator of 471 patients with gastric cancer. After imputation methods were used to account for missing covariate data, the data were analyzed with a Cox regression model and the results were compared. Results: The mean patient survival time after diagnosis was $49.1 \pm 4.4$ months. In the complete case analysis, which used information from only 100 of the 471 patients, very wide and uninformative confidence intervals were obtained for the chemotherapy and surgery hazard ratios (HRs). After imputation, however, the maximum confidence interval widths for the chemotherapy and surgery HRs were 8.470 and 0.806, respectively. The minimum width was obtained with MI. Furthermore, the minimum Bayesian and Akaike information criterion values were obtained with MI (-821.236 and -827.866, respectively). Conclusions: Missing value imputation increased the precision and accuracy of the estimates. In addition, MI yielded better results than the expectation maximization algorithm and simple regression imputation.
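
With MI, a Cox model is fitted to each imputed data set and the results are pooled. The abstract does not say how pooling was done; Rubin's rules are the standard approach, and the sketch below assumes them, pooling on the log-HR scale. The confidence interval uses a normal approximation instead of Rubin's t reference distribution, a simplification. Function names are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Pool one parameter across m imputed data sets with Rubin's rules.

    Returns (pooled estimate, total variance T = W + (1 + 1/m) * B).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)   # pooled point estimate
    w = mean(variances)       # average within-imputation variance
    b = variance(estimates)   # between-imputation variance (m - 1 denominator)
    return q_bar, w + (1 + 1 / m) * b


def pooled_hazard_ratio_ci(log_hrs: list[float], variances: list[float], z: float = 1.96):
    """Pooled HR with an approximate 95% CI (normal approximation, not Rubin's t df)."""
    q_bar, total_var = pool_rubin(log_hrs, variances)
    half_width = z * math.sqrt(total_var)
    return math.exp(q_bar), math.exp(q_bar - half_width), math.exp(q_bar + half_width)
```

The between-imputation term B is what lets MI reflect uncertainty about the missing covariates, instead of treating imputed values as if they had been observed.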

Age Prediction based on the Transcriptome of Human Dermal Fibroblasts through Interval Selection

  • 석호식
    • Journal of IKEEE / Vol. 26, No. 3 / pp. 494-499 / 2022
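  • This paper introduces a method for predicting age from transcriptome data obtained from human dermal fibroblasts. The proposed method uses trained classifiers and regression models to select the age group to which a sample belongs, and then predicts the specific age using the observations of the training data in the selected age group. When a sample whose age is to be predicted is input, multiple decision rules are executed in sequence; each rule runs a classifier and a regression model together to check whether its selection condition is satisfied. If the condition is satisfied, the age is predicted with a regression model trained on data from the rule's target age group; otherwise, the next rule is executed. Experiments on public data showed a mean prediction error of 5.7 years, better than the 7.7 years achieved in previous work.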
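
The rule cascade described above can be sketched as control flow. The abstract does not define the selection condition, so this sketch assumes a rule fires only when its classifier accepts the sample and a regression estimate falls inside the rule's age range; the fallback model for samples no rule selects is also an assumption. Every callable is a hypothetical stand-in for a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Features = Sequence[float]  # e.g. a transcriptome expression vector (hypothetical layout)


@dataclass
class DecisionRule:
    """One rule of the cascade; every callable stands in for a trained model."""
    target_group: tuple[float, float]                # [min_age, max_age) of the target group
    classifier: Callable[[Features], bool]           # True if the sample looks like the group
    selector_regressor: Callable[[Features], float]  # age estimate checked against the range
    group_regressor: Callable[[Features], float]     # trained only on target-group data

    def selects(self, x: Features) -> bool:
        # Assumed selection condition: classifier and regressor must both agree.
        low, high = self.target_group
        return self.classifier(x) and low <= self.selector_regressor(x) < high


def predict_age(x: Features, rules: Sequence[DecisionRule],
                fallback: Callable[[Features], float]) -> float:
    """Run the rules in order; predict with the first rule whose condition holds."""
    for rule in rules:
        if rule.selects(x):
            return rule.group_regressor(x)
    return fallback(x)  # no rule fired: use a general model (assumption)
```

Ordering matters: a sample is predicted by the first rule that accepts it, so rules with the most reliable selection conditions would naturally go first.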

VALIDATION OF ON-LINE MONITORING TECHNIQUES TO NUCLEAR PLANT DATA

  • Garvey, Jamie;Garvey, Dustin;Seibert, Rebecca;Hines, J. Wesley
    • Nuclear Engineering and Technology / Vol. 39, No. 2 / pp. 133-142 / 2007
  • The Electric Power Research Institute (EPRI) demonstrated a method for monitoring the performance of instrument channels in Topical Report (TR) 104965, 'On-Line Monitoring of Instrument Channel Performance.' This paper presents the results of several models originally developed by EPRI to monitor three nuclear plant sensor sets: Pressurizer Level, Reactor Protection System (RPS) Loop A, and Reactor Coolant System (RCS) Loop A Steam Generator (SG) Level. The sensor sets investigated include one redundant sensor model and two non-redundant sensor models. Each model employs an Auto-Associative Kernel Regression (AAKR) architecture to predict correct sensor behavior. The performance of each model is evaluated using four metrics: accuracy, auto-sensitivity, cross-sensitivity, and the newly developed Error Uncertainty Limit Monitoring (EULM) detectability. The uncertainty estimate for each model is also calculated by two methods: analytic formulas and Monte Carlo estimation. The uncertainty estimates are verified by calculating confidence interval coverage to confirm that 95% of the measured data fall within the confidence intervals. The model performance evaluation identified the Pressurizer Level model as acceptable for on-line monitoring (OLM) implementation. The other two models, RPS Loop A and RCS Loop A SG Level, highlight two common problems that occur in model development and evaluation, namely faulty data and poor signal selection.
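
AAKR predicts the "correct" sensor vector as a kernel-weighted average of stored fault-free memory vectors, weighted by their distance to the current observation; coverage is the fraction of measurements that fall inside their confidence intervals. The sketch below assumes a Gaussian kernel on Euclidean distance (a common choice; the EPRI models' exact kernel and bandwidth are not given here), and the function names are hypothetical.

```python
import math


def aakr_predict(memory: list[list[float]], query: list[float], bandwidth: float) -> list[float]:
    """Auto-associative kernel regression: kernel-weighted average of memory vectors."""
    distances = [math.dist(vector, query) for vector in memory]
    weights = [math.exp(-(d ** 2) / (2 * bandwidth ** 2)) for d in distances]
    total = sum(weights)
    if total == 0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return [sum(w * vector[j] for w, vector in zip(weights, memory)) / total
            for j in range(len(query))]


def coverage(measured: list[float], lower: list[float], upper: list[float]) -> float:
    """Fraction of measurements that lie within their confidence intervals."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(measured, lower, upper))
    return inside / len(measured)
```

A drifting sensor shows up as a growing residual between its measured value and the corresponding component of `aakr_predict`, while a well-calibrated uncertainty estimate should give `coverage` close to 0.95.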

Statistical analysis of recurrent gap time events with incomplete observation gaps

  • 신슬비;김양진
    • Journal of the Korean Data and Information Science Society / Vol. 25, No. 2 / pp. 327-336 / 2014
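  • Recurrent event data arise when study subjects repeatedly experience the same type of event. Such recurrent events occur in many fields, including the social sciences, natural sciences, engineering, and medicine. Depending on the researcher's interest, recurrent event data can be analyzed using either the event times or the gap times between events. This paper analyzes recurrent event data with incomplete observation using the gap times between events. A feature of these data is that some subjects have observation gaps, that is, periods during which they are excluded from the study. These observation gaps are incomplete because their start time is known but their end time is not. An interval censoring approach is applied to estimate the unknown end times. After the end times are estimated, a regression model with frailty is applied to assess the effects of covariates on event recurrence. The proposed method is also applied to real data to compare analyses that account for observation gaps with those that do not.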
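
As background for the gap-time formulation, the sketch below converts one subject's event times into the usual gap-time layout: observed gaps between successive events, plus a final right-censored gap up to the end of follow-up. It deliberately does not model the incomplete observation gaps or the interval-censoring step the paper proposes; `gap_times` is a hypothetical helper.

```python
def gap_times(event_times: list[float], end_of_followup: float) -> list[tuple[float, int]]:
    """Return (gap_time, status) pairs for one subject.

    status 1 = observed gap ending in an event; status 0 = right-censored
    final gap running from the last event to the end of follow-up.
    """
    gaps = []
    previous = 0.0
    for t in sorted(event_times):
        if t > end_of_followup:
            raise ValueError("event after end of follow-up")
        gaps.append((t - previous, 1))
        previous = t
    if end_of_followup > previous:
        gaps.append((end_of_followup - previous, 0))
    return gaps
```

In the setting the paper studies, a gap that overlaps an observation gap with an unknown end cannot be laid out this simply, which is where the interval-censoring estimate of the end time comes in.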