• Title/Summary/Keyword: mean squared error (MSE)


Different penalty methods for assessing interval from first to successful insemination in Japanese Black heifers

  • Setiaji, Asep;Oikawa, Takuro
    • Asian-Australasian Journal of Animal Sciences / v.32 no.9 / pp.1349-1354 / 2019
  • Objective: The objective of this study was to determine the best approach for handling missing records of the interval from first to successful insemination (FS) in Japanese Black heifers. Methods: Of the 2,367 records of heifers born between 2003 and 2015, the records of 206 open heifers (8.7%) were missing. Four penalty methods based on the number of inseminations were defined as follows: C1, the average FS according to the number of inseminations; C2, a constant number of days, 359; C3, the maximum number of FS days for each number of inseminations; and C4, the average of FS at the last insemination and FS of C2. C5 was generated by adding a constant number of days (21 d) to the highest FS value in each contemporary group. The bootstrap method was used to compare the five methods in terms of bias, mean squared error (MSE), and the coefficient of correlation between the estimated breeding values (EBV) from non-censored and censored data. Three percentages of missing records (5%, 10%, and 15%) were investigated using a random censoring scheme. A univariate animal model was used for the genetic analysis. Results: The heritability of FS in the non-censored data was $0.012 \pm 0.016$, slightly lower than the average estimate from the five penalty methods. C1, C2, and C3 showed lower standard errors of estimated heritability but gave inconsistent results across percentages of missing records. C4 showed moderate but more stable standard errors for all percentages of missing records, whereas C5 showed the highest standard errors relative to the non-censored data. The MSE of C4 heritability was $0.633 \times 10^{-4}$, $0.879 \times 10^{-4}$, $0.876 \times 10^{-4}$, and $0.866 \times 10^{-4}$ for 5%, 8.7%, 10%, and 15% missing records, respectively. Thus, C4 showed the lowest and most stable MSE of heritability, and its coefficients of correlation for EBV were 0.88, 0.93, and 0.90 for heifer, sire, and dam, respectively.
Conclusion: C4 demonstrated the highest positive correlation with the non-censored data set and was consistent across different percentages of missing records. We concluded that C4 was the best penalty method for missing records because of the stability of its estimated parameters and its highest coefficient of correlation.
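The bias and MSE criteria used to rank the penalty methods can be illustrated with a short Python sketch. It assumes that each censoring/bootstrap replicate yields one heritability estimate, which is compared against the estimate from the non-censored data; the helper names and the replicate values below are hypothetical, and the animal-model fit itself is not shown.

```python
import numpy as np


def bias_and_mse(estimates, reference):
    """Bias and MSE of replicate estimates relative to a reference value.

    Hypothetical helper: `estimates` stands for heritabilities from repeated
    censoring/bootstrap replicates, `reference` for the non-censored estimate.
    """
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - reference
    mse = np.mean((est - reference) ** 2)  # equals bias**2 + population variance
    return float(bias), float(mse)


def ebv_correlation(ebv_full, ebv_censored):
    """Pearson correlation between EBVs from non-censored and censored data."""
    return float(np.corrcoef(ebv_full, ebv_censored)[0, 1])


# Illustrative numbers only, not results from the study.
bias, mse = bias_and_mse([0.010, 0.020, 0.015], reference=0.012)
print(f"bias={bias:.4f}, mse={mse:.2e}")
```

Because MSE decomposes into squared bias plus variance, a method can have a small bias yet a large MSE when its estimates scatter between replicates, which is why both criteria are reported.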

Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

  • Oh Beom Kwon;Solji Han;Hwa Young Lee;Hye Seon Kang;Sung Kyoung Kim;Ju Sang Kim;Chan Kwon Park;Sang Haak Lee;Seung Joon Kim;Jin Woo Kim;Chang Dong Yeo
    • Tuberculosis and Respiratory Diseases / v.86 no.3 / pp.203-215 / 2023
  • Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models. Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data; and set III, machine learning models imputing the missing data. Six machine learning models were implemented: the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM). The outcome was the forced expiratory volume in 1 second measured 6 months after surgery. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio. In set III, imputation was performed after the dataset was split. Predictive performance in the three sets was evaluated by R² and mean squared error (MSE). Results: A total of 1,487 patients were included in sets I and III, and 896 patients were included in set II. In set I, the R² value was 0.27. In set II, LightGBM was the best model, with the highest R² value of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was the best model, with the highest R² value of 0.56 and the lowest MSE of 174.07. Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.
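Imputing after the split means the imputation statistics come from the training rows only, so no information from the test rows leaks into model fitting. The Python sketch below shows the idea with simple column-mean imputation on a NumPy array; the abstract does not state which imputation method was used, so mean imputation, the 70:30 random split, and the function name `split_then_impute` are assumptions for illustration.

```python
import numpy as np


def split_then_impute(X, train_frac=0.7, seed=0):
    """Split rows 70:30, then fill NaNs using training-set column means only."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(round(train_frac * len(X)))
    train, test = X[idx[:n_train]].copy(), X[idx[n_train:]].copy()
    col_means = np.nanmean(train, axis=0)   # computed after the split, on training rows
    for part in (train, test):
        rows, cols = np.where(np.isnan(part))
        part[rows, cols] = col_means[cols]  # the same training means fill both parts
    return train, test, col_means


X = [[1.0, np.nan], [2.0, 4.0], [3.0, 6.0], [np.nan, 8.0], [5.0, 10.0],
     [6.0, 12.0], [7.0, 14.0], [8.0, 16.0], [9.0, 18.0], [10.0, 20.0]]
train, test, means = split_then_impute(X)
print(train.shape, test.shape, means)
```

Computing the means on the full dataset before splitting would be the leaky alternative that the ordering in set III avoids.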

Real Time ECG Derived Respiratory Extraction from Heart Rate for Single Lead ECG Measurement using Conductive Textile Electrode (전도성 직물을 이용한 단일 리드 심전도 측정 및 실시간 심전도 유도 호흡 추출 방법에 관한 연구)

  • Yi, Kye-Hyoung;Park, Sung-Bin;Yoon, Hyoung-Ro
    • The Transactions of the Korean Institute of Electrical Engineers D / v.55 no.7 / pp.335-343 / 2006
  • We designed a system that measures a single-channel ECG with two conductive-textile electrodes and extracts ECG-derived respiration (EDR) in real time, yielding a signal closely related to respiration while remaining comfortable for the subject. Assuming that the right-leg (RL) electrode and the potential-measurement electrodes are coupled through an RC model, we designed the RL drive output to feed back to the two electrodes in order to reduce the common-mode signal. The conductive textile used for the two ECG electrodes offered more comfort during night sleep in bed than methods using attached electrodes. In traditional single-lead EDR, the R-wave amplitude or the QRS area is used for EDR estimation; this is, in effect, the amplitude modulation (AM) method of EDR. Alternatively, the R-R interval can be used in a frequency modulation (FM) method based on respiratory sinus arrhythmia (RSA). To evaluate the performance of AM EDR and FM EDR, ECG lead III was measured in 14 subjects. Each EDR was compared with both the temperature around the nose (a direct measurement of respiration) and the respiration signal from a thoracic belt (an indirect measurement of respiration) in terms of mean squared error (MSE), cross-correlation (Xcorr), and coherence. Data were interpolated with the upsampling interpolation technique of multirate signal processing instead of cubic spline interpolation. As a result, we showed that real-time EDR extraction can be implemented on a microcontroller.
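A minimal Python sketch of the FM path: R-R intervals are computed from R-peak times, and the beat-indexed series is upsampled by an integer factor using multirate interpolation, that is, zero-stuffing followed by an FIR interpolation filter. The triangular kernel used here makes the filter equivalent to linear interpolation; it is an assumed, simple choice and not necessarily the filter used in the paper. The function names and the R-peak times are hypothetical.

```python
import numpy as np


def rr_intervals(r_peak_times):
    """R-R intervals (s) from R-peak times (s); the RSA (FM) respiratory component lives here."""
    return np.diff(np.asarray(r_peak_times, dtype=float))


def upsample_linear(x, factor):
    """Integer-factor interpolation: zero-stuffing followed by a triangular FIR filter.

    With the triangular kernel this is equivalent to linear interpolation
    between the original samples.
    """
    x = np.asarray(x, dtype=float)
    stuffed = np.zeros(len(x) * factor)
    stuffed[::factor] = x                    # insert factor-1 zeros between samples
    n = np.arange(-(factor - 1), factor)
    kernel = 1.0 - np.abs(n) / factor        # triangular interpolation filter
    y = np.convolve(stuffed, kernel)         # full convolution
    delay = factor - 1                       # delay of the centred kernel
    return y[delay:delay + (len(x) - 1) * factor + 1]


# Hypothetical R-peak times in seconds.
rr = rr_intervals([0.00, 0.82, 1.66, 2.46, 3.31])
fm_edr = upsample_linear(rr, 4)
print(fm_edr)
```

Because the kernel is short and fixed, each output sample needs only a few multiply-adds, which is the kind of cost that suits a microcontroller; a longer low-pass kernel would trade that cost for a smoother result.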

Predictive Growth Models of Bacillus cereus on Dried Laver Pyropia pseudolinearis as Function of Storage Temperature (저장온도에 따른 마른김(Pyropia pseudolinearis)의 Bacillus cereus 성장예측모델 개발)

  • Choi, Man-Seok;Kim, Ji Yoon;Jeon, Eun Bi;Park, Shin Young
    • Korean Journal of Fisheries and Aquatic Sciences / v.53 no.5 / pp.699-706 / 2020
  • Predictive models in food microbiology use mathematical and statistical tools to predict microbial growth or death rates, taking into account the intrinsic and extrinsic factors of food. This study developed predictive growth models for Bacillus cereus on dried laver Pyropia pseudolinearis stored at different temperatures (5, 10, 15, 20, and 25℃). Primary models developed for specific growth rate (SGR), lag time (LT), and maximum population density (MPD) showed a good fit (R² ≥ 0.98) with the Gompertz equation. The SGR values were 0.03, 0.08, and 0.12, and the LT values were 12.64, 4.01, and 2.17 h at storage temperatures of 15, 20, and 25℃, respectively. Secondary models for the same parameters were determined via nonlinear regression as follows: $\mathrm{SGR} = 0.0228 - 0.0069\,T_1 + 0.0005\,T_1^2$; $\mathrm{LT} = 113.0685 - 9.6256\,T_1 + 0.2079\,T_1^2$; $\mathrm{MPD} = 1.6630 + 0.4284\,T_1 - 0.0080\,T_1^2$, where $T_1$ is the storage temperature. The appropriateness of the secondary models was validated using statistical indices such as mean squared error (MSE < 0.01), bias factor (0.99 ≤ Bf ≤ 1.07), and accuracy factor (1.01 ≤ Af ≤ 1.14). External validation was performed at three random temperatures, and the results were consistent with each other. Thus, these models may be useful for predicting the growth of B. cereus on dried laver.
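The quadratic secondary models and the validation indices can be evaluated in a few lines of Python. The bias and accuracy factors are written in their commonly used log-ratio form, Bf = 10^mean(log10(pred/obs)) and Af = 10^mean(|log10(pred/obs)|); that the study computed them exactly this way is an assumption. Pairing the SGR secondary model with the primary SGR values quoted in the abstract is for illustration only, and the printed numbers are not claimed to reproduce the reported ranges.

```python
import numpy as np


def secondary_model(t, b0, b1, b2):
    """Quadratic secondary model: b0 + b1*T + b2*T**2."""
    return b0 + b1 * t + b2 * t ** 2


def sgr_model(t):
    """SGR secondary model with the coefficients reported in the abstract."""
    return secondary_model(t, 0.0228, -0.0069, 0.0005)


def validation_indices(predicted, observed):
    """Return (MSE, bias factor Bf, accuracy factor Af) in the log-ratio form."""
    pred = np.asarray(predicted, dtype=float)
    obs = np.asarray(observed, dtype=float)
    mse = float(np.mean((pred - obs) ** 2))
    log_ratio = np.log10(pred / obs)
    bf = float(10 ** np.mean(log_ratio))          # >1: predictions high on average
    af = float(10 ** np.mean(np.abs(log_ratio)))  # average fold-difference, always >= 1
    return mse, bf, af


temps = np.array([15.0, 20.0, 25.0])
observed_sgr = np.array([0.03, 0.08, 0.12])  # primary-model SGR values from the abstract
print(validation_indices(sgr_model(temps), observed_sgr))
```

Af can never be smaller than Bf or 1/Bf, because over- and under-predictions cancel in Bf but not in Af.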

Statistical Verification of Precipitation Forecasts from MM5 for Heavy Snowfall Events in Yeongdong Region (영동대설 사례에 대한 MM5 강수량 모의의 통계적 검증)

  • Lee, Jeong-Soon;Kwon, Tae-Yong;Kim, Deok-Rae
    • Atmosphere / v.16 no.2 / pp.125-139 / 2006
  • Precipitation forecasts from MM5 were verified over the Yeongdong region for the period 1989-2001 to identify tendencies in the model forecasts. We selected 57 events related to heavy snowfall in the Yeongdong region and classified them into three precipitation types: mountain type, cold-coastal type, and warm type. The threat score (TS), probability of detection (POD), and false-alarm rate (FAR) were computed for categorical verification, and the mean squared error (MSE) was computed as a scalar accuracy measure. The POD values for the warm, mountain, and cold-coastal types were 0.71, 0.69, and 0.55, respectively. In terms of quantitative verification, forecasts and observations matched relatively well for the mountain and cold-coastal types, whereas MM5 tended to overestimate precipitation for the warm type. Twelve events had a POD below 0.2: 2 mountain-type, 7 cold-coastal-type, and 3 warm-type events. In most of these events, precipitation was distributed over the East Sea near the Yeongdong region, and these events also occurred when easterlies in the lower troposphere were absent or very weak. Even when high-resolution sea surface temperature (about 18 km) was used as the boundary condition, the wind direction changed little compared with using low-resolution sea surface temperature (about 100 km).
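The categorical scores and the MSE can be computed from paired forecast and observed amounts. In this Python sketch an "event" is assumed to be precipitation at or above a chosen threshold, and FAR is taken as false alarms divided by all forecast events (the false alarm ratio), which is one common convention; the abstract does not give the threshold or the exact definition, and the sample values are hypothetical.

```python
def contingency(forecast, observed, threshold):
    """Count hits, misses and false alarms; an event is a value >= threshold (assumed)."""
    hits = misses = false_alarms = 0
    for f, o in zip(forecast, observed):
        f_event, o_event = f >= threshold, o >= threshold
        if f_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif f_event:
            false_alarms += 1
    return hits, misses, false_alarms


def categorical_scores(hits, misses, false_alarms):
    """Threat score, probability of detection and false alarm ratio."""
    ts = hits / (hits + misses + false_alarms)
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    return ts, pod, far


def mse(forecast, observed):
    """Mean squared error of forecast amounts."""
    return sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(forecast)


# Hypothetical precipitation amounts (mm) at four stations.
fc, ob = [5.0, 0.0, 3.0, 0.0], [4.0, 2.0, 0.0, 0.0]
print(categorical_scores(*contingency(fc, ob, threshold=1.0)), mse(fc, ob))
```

Correct negatives (no event forecast or observed) do not enter any of the three scores, which is why they suit rare events such as heavy snowfall.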

A Study on the Postprocessing of Channel Estimates in LTE System (LTE 시스템 채널 추정치의 후처리 기법 연구)

  • Yoo, Kyung-Yul
    • The Transactions of The Korean Institute of Electrical Engineers / v.60 no.1 / pp.205-213 / 2011
  • The Long Term Evolution (LTE) system is designed to provide high-quality data services to fast-moving mobile users. It is based on Orthogonal Frequency Division Multiplexing (OFDM) and relies for channel estimation on training samples that are systematically embedded in the transmitted data. Training samples are distributed either as a preamble or in a lattice-type pattern, and the latter is better suited to multipath fading channels whose channel frequency response (CFR) fluctuates rapidly with time. In the lattice-type structure, the CFR is estimated by taking the least squares estimate (LSE) at each pilot sample and then interpolating in both the time and frequency domains to fill in the channel estimates for the subcarriers that carry data samples. All interpolation schemes rely on the pilot estimates alone, so their performance is bounded by the quality of those estimates. However, additive noise causes large fluctuations in the pilot estimates, especially in environments with a low signal-to-noise ratio. These fluctuations appear as alternating large values in the first forward differences (FFD) between pilot estimates. In this paper, we statistically analyze these FFD values and propose a postprocessing algorithm, based on localized adaptive moving-average filtering, that suppresses large fluctuations in noisy pilot estimates. The performance of the proposed technique is verified in a multipath environment defined in the 3GPP LTE specification. It is shown that the mean squared error (MSE) between the actual CFR and the pilot estimates can be reduced by up to 68% relative to the noisy pilot estimates.
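The pipeline described above can be sketched in Python: a least-squares estimate at each pilot (received symbol divided by the known transmitted symbol), the first forward differences between neighboring pilot estimates, and a smoothing step applied only where the differences indicate an isolated spike. The spike rule and the `threshold` parameter are hypothetical simplifications, not the paper's localized adaptive moving-average filter, whose details the abstract does not give.

```python
import numpy as np


def ls_pilot_estimates(rx_pilots, tx_pilots):
    """Least-squares channel estimate at each pilot: H_k = Y_k / X_k."""
    return np.asarray(rx_pilots, dtype=complex) / np.asarray(tx_pilots, dtype=complex)


def suppress_spikes(h, threshold):
    """Hypothetical postprocessing of pilot estimates (not the paper's filter).

    A pilot is flagged when the first forward differences (FFD) on both sides
    exceed `threshold` and point in opposite directions, i.e. an isolated
    spike; flagged pilots are replaced by a 3-point moving average.
    """
    h = np.asarray(h, dtype=complex)
    ffd = np.diff(h)                                  # ffd[k] = h[k+1] - h[k]
    out = h.copy()
    for k in range(1, len(h) - 1):
        left, right = ffd[k - 1], ffd[k]
        opposite = (left * np.conj(right)).real < 0   # direction reverses
        if abs(left) > threshold and abs(right) > threshold and opposite:
            out[k] = h[k - 1:k + 2].mean()
    return out


def mse(estimate, reference):
    """Mean squared error between complex channel vectors."""
    diff = np.asarray(estimate) - np.asarray(reference)
    return float(np.mean(np.abs(diff) ** 2))


true_cfr = np.ones(10, dtype=complex)   # toy flat channel
noisy = true_cfr.copy()
noisy[5] += 1.0                         # one noise-induced spike
print(mse(noisy, true_cfr), mse(suppress_spikes(noisy, 0.5), true_cfr))
```

Smoothing only the flagged pilots, rather than every pilot, avoids blurring genuine rapid CFR variation where no spike is present.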

A Study on the Performance improvement of TEA adaptive equalizer using Precoding (사전 부호화를 이용한 TEA 적응 등화기의 성능 개선에 관한 연구)

  • Lim Seung-Gag
    • The KIPS Transactions:PartC / v.13C no.3 s.106 / pp.369-374 / 2006
  • This paper deals with improving the performance of an adaptive equalizer based on the tricepstrum equalization algorithm (TEA), which uses only the received signal. Adaptive equalizers are used mainly to improve communication performance, such as high-speed transmission, maintenance of synchronization, and bit error rate (BER), at the receiver in channels with additive noise, phase distortion, and frequency-selective fading. Their characteristics are nearly the same as the inverse characteristics of the communication channel. In this paper, the TEA algorithm, which uses higher-order statistics (HOS), was applied to 16-QAM, a two-dimensional signaling method, as the transmitted signal. For precoding the 16-QAM signal, Gray coding was used to assign the signal constellation, and computer simulations showed improved performance in residual intersymbol interference (ISI) and mean squared error, which are representative performance measures of adaptive equalizers.
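Gray coding assigns bit labels so that neighboring constellation points differ in exactly one bit, which limits bit errors when noise pushes a symbol to an adjacent point. The Python sketch below shows one common Gray-coded 16-QAM mapping, with two bits per axis and levels -3, -1, +1, +3; this particular bit-to-level table is an assumption, not necessarily the labeling used in the paper, and the TEA equalizer itself is not shown.

```python
# One common Gray mapping for a 4-level (2-bit) axis: neighbours differ by one bit.
GRAY_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def map_16qam(bits):
    """Map a bit sequence (length multiple of 4) to Gray-coded 16-QAM symbols.

    The first two bits of each group select the in-phase level and the last
    two select the quadrature level.
    """
    if len(bits) % 4:
        raise ValueError("bit sequence length must be a multiple of 4")
    symbols = []
    for i in range(0, len(bits), 4):
        b0, b1, b2, b3 = bits[i:i + 4]
        symbols.append(complex(GRAY_LEVELS[(b0, b1)], GRAY_LEVELS[(b2, b3)]))
    return symbols


print(map_16qam([0, 0, 1, 0, 1, 1, 0, 1]))  # [(-3+3j), (1-1j)]
```

Because each axis is Gray-coded independently, any horizontal or vertical neighbor in the 4x4 grid differs in a single bit.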

Panel data analysis with regression trees (회귀나무 모형을 이용한 패널데이터 분석)

  • Chang, Youngjae
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1253-1262 / 2014
  • A regression tree is a tree-structured model in which a simple regression model is fitted to the data in each node created by recursively partitioning the predictor space. There have been many efforts to apply tree algorithms to various regression problems, such as logistic regression and quantile regression. Recently, these algorithms have been extended to panel data analysis, for example the RE-EM algorithm of Sela and Simonoff (2012) and the extension of GUIDE by Loh and Zheng (2013). This paper briefly introduces these algorithms and compares the prediction accuracy of three methods. In the simulation study, RE-EM generally shows good prediction accuracy, with the smallest MSEs. A RE-EM tree fitted to business survey index (BSI) panel data shows that sales BSI is the main factor affecting business entrepreneurs' economic sentiment. Among the relatively high-sales group, the economic sentiment BSI of non-manufacturing industries is higher than that of manufacturing industries.

Development of Diameter Growth Models by Thinning Intensity of Planted Quercus glauca Thunb. Stands

  • Jung, Su Young;Lee, Kwang Soo;Kim, Hyun Soo
    • Journal of People, Plants, and Environment / v.24 no.6 / pp.629-638 / 2021
  • Background and objective: This study was conducted to develop diameter growth models for thinned Quercus glauca Thunb. (QGT) stands, to inform treatment-specific production goals, and to provide the information necessary for the systematic management of these stands. Methods: The study was conducted in QGT stands where initial thinning had been completed in 2013 to develop a treatment system. To analyze tree growth and trait responses to each thinning treatment, forest surveys were conducted in 2014 and 2021, and a one-way analysis of variance (ANOVA) was performed. In addition, nonlinear least squares regression with the PROC NLIN procedure was used to develop an optimal diameter growth model. Results: Growth and trait analyses showed that height and the height-to-diameter (H/D) ratio did not differ among treatment plots (p > .05). Diameter at breast height (DBH) in the heavy thinning (HT) treatment plot was significantly larger than in the control plot (p < .05). Among the diameter growth models developed for each treatment plot, the Gompertz polymorphic equation had the lowest mean squared error (MSE) in all treatment plots (control: 2.2381; light thinning: 0.8478; heavy thinning: 0.8679), and the Shapiro-Wilk test indicated a normal distribution (p > .95); the equation was therefore selected as the best fit for the diameter growth model. Conclusion: The findings of this study provide basic data for the systematic management of Quercus glauca Thunb. stands. Permanent sample plots (PSP) that consider stand status, site conditions, and climatic environments need to be established.
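PROC NLIN is a SAS procedure; as a stand-in, the same kind of nonlinear least-squares fit can be sketched in Python with SciPy's `curve_fit`. The sketch uses the basic single-curve Gompertz form D(t) = a·exp(-b·exp(-c·t)), not the polymorphic variant used in the study, whose parameterization the abstract does not give. MSE is computed as SSE/(n - p), a common regression convention that may differ from the study's, and the noise-free data are synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit


def gompertz(t, a, b, c):
    """Basic Gompertz growth curve D(t) = a * exp(-b * exp(-c * t))."""
    return a * np.exp(-b * np.exp(-c * t))


def fit_gompertz(t, d, p0):
    """Nonlinear least-squares fit; returns parameters and MSE = SSE / (n - p)."""
    params, _ = curve_fit(gompertz, t, d, p0=p0)
    residuals = d - gompertz(t, *params)
    mse = float(np.sum(residuals ** 2) / (len(t) - len(params)))
    return params, mse


# Synthetic, noise-free diameters (cm) over stand age (years); illustration only.
age = np.arange(1, 31, dtype=float)
dbh = gompertz(age, 20.0, 3.0, 0.1)
params, mse = fit_gompertz(age, dbh, p0=(18.0, 2.5, 0.08))
print(params, mse)
```

Fitting each candidate equation this way and comparing their MSEs per treatment plot mirrors the model-selection step described in the abstract; with real data, a normality check on the residuals (for example `scipy.stats.shapiro`) would follow.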

Machine Learning-based Rapid Seismic Performance Evaluation for Seismically-deficient Reinforced Concrete Frame (기계학습 기반 지진 취약 철근콘크리트 골조에 대한 신속 내진성능 등급 예측모델 개발 연구)

  • Kang, TaeWook;Kang, Jaedo;Oh, Keunyeong;Shin, Jiuk
    • Journal of the Earthquake Engineering Society of Korea / v.28 no.4 / pp.193-203 / 2024
  • Existing reinforced concrete (RC) building frames constructed before seismic design was applied have seismically deficient structural details, and buildings with such details exhibit brittle behavior, failing early because of their low shear capacity. Various reinforcement systems, such as fiber-reinforced polymer (FRP) jacketing systems, are being studied to strengthen seismically deficient RC frames. Because of the step-by-step modeling and analysis process, existing seismic performance assessment and reinforcement design of buildings consume an enormous amount of labor and time. To overcome these shortcomings, various machine learning (ML) models were developed using input and output datasets of seismic loads and reinforcement details generated with the finite element (FE) model developed in previous studies. To assess the seismic performance prediction models developed in this study, the mean squared error (MSE), R-squared (R²), and residuals of each model were compared. Overall, the ML models were found to predict the seismic performance of buildings rapidly and effectively, according to changes in load and reinforcement details, without overfitting. In addition, the best-fit model for each seismic performance class was selected by analyzing the class-wise performance of the ML models.
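The three comparison criteria named above (MSE, R², and residuals) can be computed from any model's predictions on held-out data. This minimal Python sketch uses the standard definitions, R² = 1 - SS_res/SS_tot; the function name and sample values are hypothetical and are not taken from the study.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (MSE, R-squared, residuals) for one model's predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return mse, r2, residuals


# Hypothetical targets and predictions.
mse, r2, res = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(mse, r2, res)
```

MSE measures error in the target's squared units, whereas R² is scale-free, so comparing both, together with residual patterns, helps reveal models that look good on one metric but fit poorly elsewhere.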