• Title/Abstract/Keywords: Prediction mean squares error

27 search results (processing time: 0.02 seconds)

A modified partial least squares regression for the analysis of gene expression data with survival information

  • Lee, So-Yoon;Huh, Myung-Hoe;Park, Mira
    • Journal of the Korean Data and Information Science Society / Vol. 25, No. 5 / pp. 1151-1160 / 2014
  • In DNA microarray studies, the number of genes far exceeds the number of samples, and the gene expression measures are highly correlated. Partial least squares regression (PLSR) is a popular method for dimension reduction and has been shown in several studies to be useful for classifying microarray data. In this study, we suggest a modified version of partial least squares regression to analyze gene expression data with survival information. The method is designed as a new gene selection method using PLSR with an iterative procedure for imputing censored survival times. The mean square error of prediction criterion is used to determine the dimension of the model. To visualize the data, plots of variables superimposed with samples are used. The method is applied to two microarray data sets, both containing survival times. The results show that the proposed method works well for interpreting gene expression microarray data.

Large-sample comparisons of calibration procedures when both measurements are subject to error

  • Lee, Seung-Hoon;Yum, Bong-Jin
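    • Korean Operations Research and Management Science Society: Conference Proceedings / Proceedings of the 1990 Spring Joint Conference of the Korean Institute of Industrial Engineers and the Korean Operations Research and Management Science Society; Korea Advanced Institute of Science and Technology; 28 Apr. 1990 / pp. 254-262 / 1990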
  • A predictive functional relationship model is presented for the calibration problem in which the standard as well as the nonstandard measurements are subject to error. For the estimation of the relationship between the two measurements, the ordinary least squares and maximum likelihood estimation methods are considered, while for the prediction of unknown standard measurements we consider direct and inverse approaches. The relative performances of these calibration procedures are compared in terms of the asymptotic mean square error of prediction.

Large-Sample Comparisons of Statistical Calibration Procedures When the Standard Measurement is Also Subject to Error: The Replicated Case

  • Lee, Seung-Hoon;Yum, Bong-Jin
    • Journal of the Korean Statistical Society / Vol. 17, No. 1 / pp. 9-23 / 1988
  • The classical theory of statistical calibration assumes that the standard measurement is exact. From a realistic point of view, however, this assumption needs to be relaxed so that more meaningful calibration procedures may be developed. This paper presents a model which explicitly considers errors in both standard and nonstandard measurements. Under the assumption that replicated observations are available in the calibration experiment, three estimation techniques (ordinary least squares, grouping least squares, and maximum likelihood estimation) combined with two prediction methods (direct and inverse prediction) are compared in terms of the asymptotic mean square error of prediction.

Construction of a Ginsenoside Content-predicting Model based on Hyperspectral Imaging

  • Ning, Xiao Feng;Gong, Yuan Juan;Chen, Yong Liang;Li, Hongbo
    • Journal of Biosystems Engineering / Vol. 43, No. 4 / pp. 369-378 / 2018
  • Purpose: The aim of this study was to construct a saponin content-predicting model using shortwave infrared imaging spectroscopy. Methods: The experiment used a shortwave imaging spectrometer and ENVI spectral acquisition software, sampling a spectrum of 910 nm-2500 nm. The corresponding preprocessing and mathematical modeling analysis was performed with Unscrambler 9.7 software to establish a nondestructive spectral prediction model for ginsenoside. Results: The optimal preprocessing method was determined to be a standard normal variate transformation combined with the second-order differential method. The coefficient of determination, $R^2$, of the mathematical model established by the partial least squares method was found to be 0.9999, while the root mean squared error of prediction, RMSEP, was found to be 0.0043, and the root mean squared error of calibration, RMSEC, was 0.0041. The residuals of the majority of the samples used for the prediction were between ${\pm}1$. Conclusion: The experiment showed that the prediction model featured a high correlation with real values and a good prediction result, such that this technique can be appropriately applied for the nondestructive testing of ginseng quality.

Application of AutoFom III equipment for prediction of primal and commercial cut weight of Korean pig carcasses

  • Choi, Jung Seok;Kwon, Ki Mun;Lee, Young Kyu;Joeng, Jang Uk;Lee, Kyung Ok;Jin, Sang Keun;Choi, Yang Il;Lee, Jae Joon
    • Asian-Australasian Journal of Animal Sciences / Vol. 31, No. 10 / pp. 1670-1676 / 2018
  • Objective: This study was conducted to enable on-line prediction of primal and commercial cut weights in Korean slaughter pigs by AutoFom III, which non-invasively scans pig carcasses early after slaughter using ultrasonic sensors. Methods: A total of 162 Landrace, Yorkshire, and Duroc (LYD) pigs and 154 LYD pigs representing the yearly Korean slaughter distribution were included in the calibration and validation datasets, respectively. Partial least squares (PLS) models were developed for prediction of the weight of deboned shoulder blade, shoulder picnic, belly, loin, and ham. In addition, AutoFom III's ability to predict the weight of the commercial cuts of spare rib, jowl, false lean, back rib, diaphragm, and tenderloin was investigated. Each cut was manually prepared by local butchers and then recorded. Results: The cross-validated prediction accuracy ($R^2_{cv}$) of the calibration models for deboned shoulder blade, shoulder picnic, loin, belly, and ham ranged from 0.77 to 0.86. The $R^2_{cv}$ for tenderloin, spare rib, diaphragm, false lean, jowl, and back rib ranged from 0.34 to 0.62. Because the $R^2_{cv}$ values of the latter commercial cuts were less than 0.65, AutoFom III was less accurate for the prediction of those cuts. The root mean square error of cross-validation (RMSECV) of the calibration model was comparable to the root mean square error of prediction (RMSEP), although the RMSECV was numerically higher than the RMSEP for the deboned shoulder blade and belly. Conclusion: AutoFom III predicts the weight of deboned shoulder blade, shoulder picnic, loin, belly, and ham with high accuracy, and is a suitable process analytical tool for sorting pork primals in Korea. However, AutoFom III's prediction of smaller commercial Korean cuts is less accurate, which may be attributed to the lack of anatomical reference points and the lack of a good correlation between the scanned area of the carcass and those traits.

Recursive Least Squares Run-to-Run Control with Time-Varying Metrology Delays

  • Fan, Shu-Kai;Chang, Yuan-Jung
    • Industrial Engineering and Management Systems / Vol. 9, No. 3 / pp. 262-274 / 2010
  • This article investigates how to adaptively predict the time-varying metrology delay that could realistically occur in semiconductor manufacturing practice. Metrology delays pose a great challenge for existing run-to-run (R2R) controllers, driving the process output significantly away from target if not adequately predicted. First, the expected asymptotic double exponentially weighted moving average (DEWMA) control output, using the EWMA and recursive least squares (RLS) prediction methods, is derived. It has been found that the relationships between the expected control output and target in both estimation methods are parallel, and six cases are addressed. Within the context of time-varying metrology delay, this paper presents a modified recursive least squares-linear trend (RLS-LT) controller, in combination with a runs test. Simulated single input-single output (SISO) R2R processes subject to various time-varying metrology delay scenarios are used as a testbed to evaluate the proposed algorithms. The simulation results indicate that the modified RLS-LT controller can keep the process output more accurately on target, with a smaller mean squared error (MSE), than the original RLS-LT controller, which only deals with constant metrology delays.

Glucose Prediction in the Interstitial Fluid Based on Infrared Absorption Spectroscopy Using Multi-component Analysis

  • Kim, Hye-Jeong;Noh, In-Sup;Yoon, Gil-Won
    • Journal of the Optical Society of Korea / Vol. 13, No. 2 / pp. 279-285 / 2009
  • Prediction of glucose concentration in the interstitial fluid (ISF) based on mid-infrared absorption spectroscopy was examined at the glucose fundamental absorption band of 1000-1500 $cm^{-1}$ (10-6.67 µm) using multi-component analysis. Simulated ISF samples were prepared by including four major ISF components. Sodium lactate had absorption spectra that interfere with those of glucose. The remaining components, NaCl, KCl, and $CaCl_2$, did not have any signatures. A preliminary experiment based on Design of Experiment, an optimization method, showed that sodium lactate influenced the prediction accuracy of glucose. For the main experiment, 54 samples were prepared in which the glucose and sodium lactate concentrations varied independently. A partial least squares regression (PLSR) analysis was used to build calibration models. The prediction accuracy was dependent on the spectrum preprocessing method, and Mean Centering produced the best results. Depending on the calibration sample sets, whose sodium lactate had different concentration levels, the standard error of prediction (SEP) of glucose ranged from 17.19 to 21.02 mg/dl.

Partial Least Squares Analysis on Near-Infrared Absorbance Spectra by Air-dried Specific Gravity of Major Domestic Softwood Species

  • Yang, Sang-Yun;Park, Yonggun;Chung, Hyunwoo;Kim, Hyunbin;Park, Se-Yeong;Choi, In-Gyu;Kwon, Ohkyung;Cho, Kyu-Chae;Yeo, Hwanmyeong
    • Journal of the Korean Wood Science and Technology / Vol. 45, No. 4 / pp. 399-408 / 2017
  • Research on the rapid and accurate prediction of physical properties of wood using near-infrared (NIR) spectroscopy has attracted recent attention. In this study, partial least squares analysis was performed between NIR spectra and air-dried specific gravity of five domestic conifer species including larch (Larix kaempferi), Korean pine (Pinus koraiensis), red pine (Pinus densiflora), cedar (Cryptomeria japonica), and cypress (Chamaecyparis obtusa). Fifty different lumbers per species were purchased from the five National Forestry Cooperative Federations of Korea. The air-dried specific gravity of 100 knot- and defect-free specimens of each species was determined by NIR spectroscopy in the range of 680-2500 nm. Spectral data preprocessing including standard normal variate, detrend and forward first derivative (gap size = 8, smoothing = 8) were applied to all the NIR spectra of the specimens. Partial least squares analysis including cross-validation (five groups) was performed with the air-dried specific gravity and NIR spectra. When the performance of the regression model was expressed as $R^2$ (coefficient of determination) and root mean square error of calibration (RMSEC), $R^2$ and RMSEC were 0.63 and 0.027 for larch, 0.68 and 0.033 for Korean pine, 0.62 and 0.033 for red pine, 0.76 and 0.022 for cedar, and 0.79 and 0.027 for cypress, respectively. For the calibration model, which contained all species in this study, the $R^2$ was 0.75 and the RMSEC was 0.37.

Effects of Temporal Aggregation on Hannan-Rissanen Procedure

  • Shin, Dong-Wan;Lee, Jong-Hyup
    • Journal of the Korean Statistical Society / Vol. 23, No. 2 / pp. 325-340 / 1994
  • Effects of temporal aggregation on estimation for ARMA models are studied by investigating the Hannan and Rissanen (1982) procedure. A temporally aggregated autoregressive process has an autoregressive moving average representation. The characteristic polynomials associated with the autoregressive part and the moving average part tend to have roots close to zero or almost identical. This causes a numerical problem in the Hannan and Rissanen procedure for identifying and estimating the temporally aggregated autoregressive model. A Monte Carlo simulation is conducted to show the effects of temporal aggregation in predicting the one-period-ahead realization.

Basal Area-Stump Diameter Models for Tectona grandis Linn. F. Stands in Omo Forest Reserve, Nigeria

  • Chukwu, Onyekachi;Osho, Johnson S.A.
    • Journal of Forest and Environmental Science / Vol. 34, No. 2 / pp. 119-125 / 2018
  • The tropical forests in developing countries are faced with the problem of illegal exploitation of trees. However, the dearth of empirical means of expressing the dimensions, structure, quality and quantity of a removed tree has impeded the conviction of offenders. This study aimed at developing a model that can effectively estimate individual tree basal area (BA) from stump diameter (Ds) for Tectona grandis stands in Omo Forest Reserve, Nigeria, for timber valuation in case of illegal felling. Thirty-six $25m{\times}25m$ temporary sample plots (TSPs) were laid randomly in six age strata: 26, 23, 22, 16, 14, and 12 years. BA, Ds and diameter at breast height were measured in all living T. grandis trees within the 36 TSPs. The least squares method was used to convert the counted stumps into harvested stem cross-sectional areas. Six basal area models were fitted and evaluated. The BA-Ds relationship was best described by the power model, which gave the lowest values of root mean square error (0.0048), prediction error sum of squares (0.0325) and Akaike information criterion (-15391), with a high adjusted coefficient of determination (0.921). This study revealed that basal area estimation was realistic even when the only information available was stump diameter. The power model was validated using independent data obtained from additional plots and was found to be appropriate for estimating the basal area of Tectona grandis stands in Omo Forest Reserve, Nigeria.