• Title/Summary/Keyword: data bias


THE EFFECT OF THE REPEATABILITY FILE IN THE NIRS FATTY ACIDS ANALYSIS OF ANIMAL FATS

  • Perez Marin, M.D.;De Pedro, E.;Garcia Olmo, J.;Garrido Varo, A.
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference
    • /
    • 2001.06a
    • /
    • pp.4107-4107
    • /
    • 2001
  • Previous work has shown the viability of NIRS technology for predicting fatty acids in Iberian pig fat. Although the resulting equations showed high precision, predictions for new samples fluctuated considerably, and the fluctuations grew with the time elapsed between calibration development and NIRS analysis. This makes NIRS calibrations difficult to use in routine analysis. Moreover, the problem appears only in products such as fat, whose spectra show sharply defined absorption peaks at some wavelengths. This causes a high sensitivity to small instrument changes that the normal checks do not detect. To avoid these problems, the WinISI 1.04 software includes a mathematical algorithm that creates a “Repeatability File”. This file is used during calibration development to minimize the sources of variation that can affect NIRS predictions. The objective of the current work is to evaluate the use of a repeatability file in the quantitative NIRS analysis of Iberian pig fat. A total of 188 samples of Iberian pig fat, produced by COVAP, were used. NIR data were recorded using a FOSS NIRSystems 6500 I spectrophotometer equipped with a spinning module. Samples were analysed by folded transmission, using two sample cells of 0.1 mm pathlength and gold surface.
High accuracy calibration equations were obtained, without and with the repeatability file, to determine the content of six fatty acids: myristic (SECV$_{without}$=0.07%, $r^2_{without}$=0.76 and SECV$_{with}$=0.08%, $r^2_{with}$=0.65), palmitic (SECV$_{without}$=0.28, $r^2_{without}$=0.97 and SECV$_{with}$=0.24%, $r^2_{with}$=0.98), palmitoleic (SECV$_{without}$=0.08, $r^2_{without}$=0.94 and SECV$_{with}$=0.09%, $r^2_{with}$=0.92), stearic (SECV$_{without}$=0.27, $r^2_{without}$=0.97 and SECV$_{with}$=0.29%, $r^2_{with}$=0.96), oleic (SECV$_{without}$=0.20, $r^2_{without}$=0.99 and SECV$_{with}$=0.20%, $r^2_{with}$=0.99) and linoleic (SECV$_{without}$=0.16, $r^2_{without}$=0.98 and SECV$_{with}$=0.16%, $r^2_{with}$=0.98). Using a repeatability file as a tool to reduce the sources of variation that can disturb prediction accuracy was very effective. Although the differences in the calibration results are negligible, the effect of the repeatability file is seen mainly when predicting new samples that are not in the calibration set and whose spectra were recorded a long time after the equation was developed. In this case, the bias values of the fatty acid predictions were generally lower in absolute value when the repeatability file was used: myristic (bias$_{without}$=-0.05 and bias$_{with}$=-0.04), palmitic (bias$_{without}$=-0.42 and bias$_{with}$=-0.11), palmitoleic (bias$_{without}$=-0.03 and bias$_{with}$=0.03), stearic (bias$_{without}$=0.47 and bias$_{with}$=0.28), oleic (bias$_{without}$=0.14 and bias$_{with}$=-0.04) and linoleic (bias$_{without}$=0.25 and bias$_{with}$=-0.20).
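
The bias comparison reported in this abstract can be checked with a few lines of arithmetic. The sketch below only re-uses the bias values quoted above. The function name `compare_abs_bias` and the "lower/higher/equal" labels are illustrative choices, not part of the original study.

```python
# Bias values copied from the abstract: (without, with) repeatability file.
bias = {
    "myristic": (-0.05, -0.04),
    "palmitic": (-0.42, -0.11),
    "palmitoleic": (-0.03, 0.03),
    "stearic": (0.47, 0.28),
    "oleic": (0.14, -0.04),
    "linoleic": (0.25, -0.20),
}


def compare_abs_bias(table):
    """Label each acid by whether |bias| fell, rose, or stayed equal with the file."""
    result = {}
    for acid, (without, with_file) in table.items():
        # Round to suppress floating-point noise in the difference.
        diff = round(abs(with_file) - abs(without), 10)
        result[acid] = "lower" if diff < 0 else "higher" if diff > 0 else "equal"
    return result


summary = compare_abs_bias(bias)
for acid, verdict in summary.items():
    print(f"{acid:12s} |bias| with repeatability file: {verdict}")
```

Comparing absolute values rather than signed values matters here, because several predictions change sign when the repeatability file is used.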


A comparison study on the estimation of the relative risk for the unemployed rate in small area (소지역의 실업률에 대한 상대위험도의 추정에 관한 비교연구)

  • Park, Jong-Tae
    • Journal of the Korean Data and Information Science Society
    • /
    • v.20 no.2
    • /
    • pp.349-356
    • /
    • 2009
  • In this study, we suggest a method for estimating the relative risk in the unemployment statistics of small areas such as si, gun, and gu in Korea. The methods considered are the usual pooled estimator, a weighted estimator with the inverse of the log-variance as weights, and the Jackknife estimator. We compare the efficiency of the three estimators by estimating their bias and mean square error using real data from the 2002 Economically Active Population Survey of Gyeonggi-do. We compute the unemployment rates of males and females in small areas, and then estimate the common relative risk of unemployment between males and females. The stability and reliability of the three estimators of the common relative risk were evaluated using their RB (relative bias) and RRMSE (relative root mean square error). The Jackknife estimator turned out to be much more efficient than the other estimators.
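
As a rough illustration of two of the estimators named in this abstract, the sketch below computes a pooled male/female relative risk and a delete-one-area jackknife version of it. The abstract does not give the exact formulas, so the pooling rule, the pseudo-value construction, and all counts in `areas` are assumptions made for illustration only.

```python
def pooled_relative_risk(areas):
    """Pooled relative risk of unemployment, male vs. female.

    areas: list of (male_unemployed, male_labor_force,
                    female_unemployed, female_labor_force), one per small area.
    """
    mu = sum(a[0] for a in areas)
    ml = sum(a[1] for a in areas)
    fu = sum(a[2] for a in areas)
    fl = sum(a[3] for a in areas)
    return (mu / ml) / (fu / fl)


def jackknife_relative_risk(areas):
    """Delete-one-area jackknife: bias-corrected estimate and standard error."""
    n = len(areas)
    full = pooled_relative_risk(areas)
    leave_one_out = [pooled_relative_risk(areas[:i] + areas[i + 1:]) for i in range(n)]
    # Pseudo-values combine the full-sample and leave-one-out estimates.
    pseudo = [n * full - (n - 1) * t for t in leave_one_out]
    estimate = sum(pseudo) / n
    variance = sum((p - estimate) ** 2 for p in pseudo) / (n * (n - 1))
    return estimate, variance ** 0.5


# Hypothetical counts for four small areas (not taken from the survey).
areas = [(12, 300, 8, 250), (20, 410, 9, 330), (7, 150, 6, 140), (15, 280, 10, 260)]
print("pooled:", pooled_relative_risk(areas))
print("jackknife (estimate, s.e.):", jackknife_relative_risk(areas))
```

When every area has identical counts, the leave-one-out estimates all equal the pooled estimate, so the jackknife estimate coincides with it and the standard error is zero.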


Nearest-neighbor Rule based Prototype Selection Method and Performance Evaluation using Bias-Variance Analysis (최근접 이웃 규칙 기반 프로토타입 선택과 편의-분산을 이용한 성능 평가)

  • Shim, Se-Yong;Hwang, Doo-Sung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.10
    • /
    • pp.73-81
    • /
    • 2015
  • The paper proposes a prototype selection method and evaluates the generalization performance of standard algorithms and prototype-based classification learning. The proposed prototype classifier defines multidimensional spheres with variable radii within class areas and generates a small set of training data. The nearest-neighbor classifier uses the new training set to predict the class of test data. By decomposing the mean expected error into bias and variance, we compare the generalization errors of the k-nearest neighbor classifier, the Bayesian classifier, prototype selection using a fixed radius, and the proposed prototype selection method. In experiments, the bias-variance trends of the proposed prototype classifier are similar to those of nearest neighbor classifiers trained on all the training data, and the prototype selection rates are under 27.0% on average.

Assessment of Turbulent Spectral Estimators in LDV (LDV의 난류 스펙트럼 추정치 평가)

  • 이도환;성형진
    • Transactions of the Korean Society of Mechanical Engineers
    • /
    • v.16 no.9
    • /
    • pp.1788-1795
    • /
    • 1992
  • Numerical simulations have been performed to investigate various spectral estimators used in LDV signal processing. To simulate the particle arrival time statistics known as the doubly stochastic Poisson process, an autoregressive vector model was adopted to construct a primary velocity field. The conditional Poisson process with a random rate parameter was generated by rescaling time using the mean value function. The direct transform based on random sampling sequences and the standard periodogram using data periodically resampled by sample-and-hold interpolation were applied to obtain power spectral density functions. For low turbulence intensity flows, the direct transform with a constant Poisson intensity is in good agreement with the theoretical spectrum. For high data density flows, the periodogram using the sample-and-hold sequences is better than the direct transform in terms of stability and the weighting of the velocity bias. High Reynolds stress and high fluctuation of the transverse velocity component affect the velocity bias, which increases the distortion of spectral components in the direct transform.
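
The sample-and-hold periodogram mentioned in this abstract can be sketched as follows. This is a simplified illustration, not the authors' simulation: arrival times come from a Poisson process with a constant rate, so no velocity bias is modelled, and the test signal is a plain sine wave. The names `sample_and_hold` and `periodogram` and all parameter values are assumptions.

```python
import bisect
import cmath
import math
import random


def sample_and_hold(times, values, dt, n):
    """Resample irregular samples onto the grid k*dt by holding the latest value."""
    held = []
    for k in range(n):
        i = bisect.bisect_right(times, k * dt) - 1
        held.append(values[max(i, 0)])  # before the first arrival, use the first sample
    return held


def periodogram(x, dt):
    """One-sided raw periodogram via a direct DFT (O(n^2), fine for small n)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        s = sum(c * cmath.exp(-2j * math.pi * k * j / n) for j, c in enumerate(centered))
        freqs.append(k / (n * dt))
        psd.append(dt * abs(s) ** 2 / n)
    return freqs, psd


random.seed(0)
f0, rate, dt, n = 5.0, 2000.0, 0.01, 200  # high data density: rate much larger than 1/dt
times, t = [], 0.0
while t < n * dt:
    t += random.expovariate(rate)  # exponential gaps give Poisson arrivals
    times.append(t)
values = [math.sin(2 * math.pi * f0 * ti) for ti in times]

freqs, psd = periodogram(sample_and_hold(times, values, dt, n), dt)
peak = freqs[max(range(1, len(psd)), key=psd.__getitem__)]
print(f"spectral peak near {peak:.2f} Hz")
```

At this data density the held signal follows the underlying sine closely, so the spectral peak falls at the signal frequency. Lowering `rate` toward `1/dt` would show the low-pass distortion that sample-and-hold introduces.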

Composite estimation type weighting adjustment for bias reduction of non-continuous response group in panel survey (패널조사에서 비연속 응답 그룹 편향 보정을 위한 복합가중값)

  • Choi, Hyunga;Kim, Youngwon
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.3
    • /
    • pp.375-389
    • /
    • 2019
  • Sample attrition during long-term tracking reduces the representativeness of the sample data in a panel study. Most panel surveys in South Korea and other countries prepare response adjustment weights to address the loss of representativeness caused by sample attrition. In this paper, we divided the panel data into a continuous response group and a non-continuous response group according to response patterns and considered a weighting adjustment method to reduce the bias of the non-continuous response group. A simulation indicated that the proposed composite estimation type weighting method, which reflects the characteristics of non-continuous response groups, could be more efficient than other weighting methods in reducing non-response bias. As a case study, the proposed methods are applied to the Korean Longitudinal Study of Ageing (KLoSA) data of the Korea Employment Information Service.

Evaluation of the equation for predicting dry matter intake of lactating dairy cows in the Korean feeding standards for dairy cattle

  • Lee, Mingyung;Lee, Junsung;Jeon, Seoyoung;Park, Seong-Min;Ki, Kwang-Seok;Seo, Seongwon
    • Animal Bioscience
    • /
    • v.34 no.10
    • /
    • pp.1623-1631
    • /
    • 2021
  • Objective: This study aimed to validate and evaluate the dry matter (DM) intake prediction model of the Korean feeding standards for dairy cattle (KFSD). Methods: The KFSD DM intake (DMI) model was developed using a database containing data from the Journal of Dairy Science from 2006 to 2011 (1,065 observations from 287 studies). The development (458 observations from 103 studies) and evaluation databases (168 observations from 74 studies) were constructed from the database. The body weight (kg; BW), metabolic BW (BW$^{0.75}$, MBW), 4% fat-corrected milk (FCM), forage as a percentage of dietary DM, and the dietary content of nutrients (% DM) were chosen as possible explanatory variables. A random coefficient model with the study as a random variable and a linear model without the random effect were used to select model variables and estimate parameters, respectively, during model development. The best-fit equation was compared to published equations, and a sensitivity analysis of the prediction equation was conducted. The KFSD model was also evaluated using in vivo feeding trial data. Results: The KFSD DMI equation is 4.103 (±2.994)+0.112 (±0.022)×MBW+0.284 (±0.020)×FCM-0.119 (±0.028)×neutral detergent fiber (NDF), explaining 47% of the variation in the evaluation dataset with neither mean nor slope bias (p>0.05). The root mean square prediction error was 2.70 kg/d, the best among the tested equations. The sensitivity analysis showed that the model is most sensitive to FCM, followed by MBW and NDF. With the in vivo data, the KFSD equation showed slightly higher precision ($R^2$ = 0.39) than the NRC equation ($R^2$ = 0.37), with a mean bias of 1.19 kg and no slope bias (p>0.05). Conclusion: The KFSD DMI model is suitable for predicting the DMI of lactating dairy cows in practical situations in Korea.
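
The prediction equation quoted in this abstract can be written as a small function. The coefficients are those reported above. The function name `kfsd_dmi`, the input units (FCM in kg/d, NDF as % of dietary DM), and the example cow are assumptions, since the abstract does not spell out every unit.

```python
def kfsd_dmi(body_weight_kg, fcm_kg_per_d, ndf_pct_dm):
    """Predicted dry matter intake (assumed kg/d) from the reported KFSD equation.

    MBW is metabolic body weight, BW**0.75, as defined in the abstract.
    """
    mbw = body_weight_kg ** 0.75
    return 4.103 + 0.112 * mbw + 0.284 * fcm_kg_per_d - 0.119 * ndf_pct_dm


# Hypothetical cow: 650 kg body weight, 30 kg/d FCM, diet with 35% NDF.
example = kfsd_dmi(650, 30, 35)
print(f"predicted DMI: {example:.2f}")
```

The reported standard errors of the coefficients are left out of the function, so it returns a point prediction only.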

Efficiency of Aggregate Data in Non-linear Regression

  • Huh, Jib
    • Communications for Statistical Applications and Methods
    • /
    • v.8 no.2
    • /
    • pp.327-336
    • /
    • 2001
  • This work concerns estimating a non-linear regression function using aggregate data. In much empirical research, data are aggregated for various reasons before statistical analysis. In a traditional parametric approach, a linear estimation of the non-linear function with aggregate data can result in unstable parameter estimators. A more serious consequence is bias in the estimation of the non-linear function. The approach we employ is kernel regression smoothing. We describe the conditions under which the aggregate data can be used to estimate the regression function efficiently. Numerical examples illustrate our findings.
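
As a loose illustration of kernel regression smoothing applied to aggregated data, the sketch below uses the Nadaraya-Watson estimator with a Gaussian kernel. The abstract does not say which kernel estimator, bandwidth, or aggregation scheme the paper uses. Those choices, along with the names `nadaraya_watson` and `aggregate`, are assumptions.

```python
import math


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys around x0 (Gaussian kernel)."""
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def aggregate(xs, ys, group_size):
    """Replace consecutive groups of observations by their group means."""
    agg_x, agg_y = [], []
    for start in range(0, len(xs), group_size):
        gx, gy = xs[start:start + group_size], ys[start:start + group_size]
        agg_x.append(sum(gx) / len(gx))
        agg_y.append(sum(gy) / len(gy))
    return agg_x, agg_y


# Noise-free non-linear example: y = x^2 on a regular grid over [0, 1].
xs = [i / 100 for i in range(101)]
ys = [x ** 2 for x in xs]
agg_x, agg_y = aggregate(xs, ys, group_size=5)

raw_fit = nadaraya_watson(0.5, xs, ys, bandwidth=0.05)
agg_fit = nadaraya_watson(0.5, agg_x, agg_y, bandwidth=0.05)
print(f"individual data: {raw_fit:.4f}, aggregated data: {agg_fit:.4f}")
```

In this smooth, noise-free case both fits land close to the true value of 0.25 at x = 0.5. Coarser grouping or a more curved function would widen the gap.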


A Comparison of Analysis Methods for Work Environment Measurement Databases Including Left-censored Data (불검출 자료를 포함한 작업환경측정 자료의 분석 방법 비교)

  • Park, Ju-Hyun;Choi, Sangjun;Koh, Dong-Hee;Park, Donguk;Sung, Yeji
    • Journal of Korean Society of Occupational and Environmental Hygiene
    • /
    • v.32 no.1
    • /
    • pp.21-30
    • /
    • 2022
  • Objectives: The purpose of this study is to suggest an optimal method by comparing methods for analyzing work environment measurement datasets that include left-censored data, where one or more measurements are below the limit of detection (LOD). Methods: A computer program was used to generate left-censored datasets for various combinations of censoring rate (1% to 90%) and sample size (30 to 300). For the analysis of the censored data, the simple substitution method (LOD/2), β-substitution method, maximum likelihood estimation (MLE) method, Bayesian method, and regression on order statistics (ROS) were compared. Each method was used to estimate four parameters of the log-normal distribution for the censored dataset: (1) geometric mean (GM), (2) geometric standard deviation (GSD), (3) 95th percentile (X95), and (4) arithmetic mean (AM). The performance of each method was evaluated using relative bias and relative root mean squared error (rMSE). Results: For the largest sample size (n=300), when the censoring rate was less than 40%, the relative bias and rMSE were small for all five methods. When the censoring rate was large (70%, 90%), the simple substitution method was inappropriate because its relative bias was the largest, regardless of the sample size. When the sample size was small and the censoring rate was large, the Bayesian method, the β-substitution method, and the MLE method showed the smallest relative bias. Conclusions: The accuracy and precision of all methods tended to increase as the sample size increased and the censoring rate decreased. The simple substitution method was inappropriate when the censoring rate was high, and the β-substitution method, MLE method, and Bayesian method can be widely applied.
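
The simple substitution method (LOD/2) and the relative bias criterion described in this abstract can be sketched as follows. This is a single illustrative run, not the study's simulation design. The log-normal parameters, the seed, the 70% censoring rate, and the function names are assumptions.

```python
import math
import random


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def substitute_half_lod(values, lod):
    """Simple substitution: replace every value below the LOD with LOD/2."""
    return [v if v >= lod else lod / 2 for v in values]


def relative_bias(estimate, true_value):
    """(estimate - truth) / truth."""
    return (estimate - true_value) / true_value


random.seed(1)
true_gm = 1.0  # the GM of a log-normal with mu = 0 is exp(0) = 1
data = [random.lognormvariate(0.0, 1.0) for _ in range(300)]

# Pick the LOD so that roughly 70% of the measurements fall below it.
lod = sorted(data)[int(0.7 * len(data))]
censored = substitute_half_lod(data, lod)

rb = relative_bias(geometric_mean(censored), true_gm)
print(f"censoring rate: {sum(v < lod for v in data) / len(data):.0%}, relative bias of GM: {rb:+.2f}")
```

Repeating the run over many seeds, sample sizes, and censoring rates, and averaging the relative bias, would come closer to the kind of comparison the abstract describes.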

Estimating Method of Starting Point Bias in Bidding Game (서베이를 이용한 입찰게임에서 출발점 편의의 추정)

  • 박용치
    • Survey Research
    • /
    • v.4 no.2
    • /
    • pp.63-86
    • /
    • 2003
  • The objective of this study was to investigate the existence of starting point bias in the bidding game contingent valuation elicitation technique when determining willingness to pay (WTP) for improving the quality of running water in Seoul and its vicinity. Of all existing contingent valuation techniques, the bidding game most closely mimics normal price-taking behavior in local markets. Three different starting points (low, medium and high) were used to determine WTP and to test for starting point bias. Respondents were randomly assigned to the three starting point groups to ensure homogeneity, so that any variation in WTP could be attributed to starting point effects. A pretested, interviewer-administered questionnaire was used to elicit WTP. A non-parametric test and the logit model were used to analyze the data for evidence of starting point bias. The high starting point group had a high WTP and the low starting point group had a low WTP, which means that starting point bias exists when WTP is estimated by the bidding game in this instance. This finding may signal that people make up their minds about the maximum amount they are willing to pay for running water service while the bidding iterates, and that they are influenced by the starting point used in the bidding game. Starting point bias can be avoided if the respondent is asked directly for the maximum WTP without payment cards or a bidding game. But such a question is perceived as very difficult to answer, which leads to problems of non-response and unrealistic answers.


The Effects of Substrate Bias Voltage on the Formation of $(ZnS)_{1-x}-(SiO_2)_x$ Protective Films in Phase Change Optical Disk by R.F. Sputtering Method. (R.F. 스퍼터링법에 의한 상변화형 광디스크의 $(ZnS)_{1-x}-(SiO_2)_x$ 보호막 제조시 기판 바이어스전압의 영향)

  • Lee, Tae-Yun;Kim, Do-Hun
    • Korean Journal of Materials Research
    • /
    • v.8 no.10
    • /
    • pp.961-968
    • /
    • 1998
  • To investigate the effects of substrate bias voltage on the formation of $ZnS-SiO_2$ protective film in phase change optical disks by the R.F. magnetron sputtering method, a thin dielectric film was formed on Si wafer and Corning glass using a ZnS(80mol%)-$SiO_2$(20mol%) target under argon gas. The Taguchi experimental method was applied to obtain optimum conditions with a reduced number of experiments and to control numerous variables effectively; this method can also assure the reproducibility of experiments. The optimum conditions for film formation obtained by this method were a target RF power of 200 W, a substrate RF power of 20 W, an Ar pressure of 5 mTorr, and a sputtering time of 20 min. The phase of the specimen was determined using XRD and TEM. The compositional analysis of the specimen was performed by XPS. To measure the thermal resistivity of the deposited specimen, annealing tests were carried out at $300^{\circ}C$ and $600^{\circ}C$. To account for the void fraction in the thin film, the Bruggeman EMA (Effective Medium Approximation) method was applied using the optical data obtained by spectroscopic ellipsometry. The results confirmed a strong interaction between bias voltage and sputtering time for the refractive index. According to the XRD and TEM analyses, the film formed under bias voltage had a more refined structure than the film formed without bias voltage, but excess bias voltage resulted in grain growth in the thin film. It was confirmed that applying the optimum bias voltage increased film density by reducing the void fraction by about 3.7%.
