• Title/Summary/Keyword: Cross-Validation Approach


THE VALIDITY OF HEALTH ASSESSMENTS: RESOLVING SOME RECENT DIFFERENCES

  • Hyland Michael E.
• The Korean Society for Preventive Medicine: Conference Proceedings
    • /
    • 1994.02b
    • /
    • pp.137-141
    • /
    • 1994
  • The purpose of this paper is to examine what is meant by a valid measure of health. Guyatt, Kirshner and Jaeschke propose that health tests should be designed so as to have one of several kinds of validity: 'longitudinal construct validity' for those used in longitudinal research designs, and 'cross-sectional construct validity' for those used in cross-sectional designs. Williams and Naylor argue that this approach to test classification and validation confuses what a test purports to measure with the purpose for which it is used, and that some tests have multiple uses. A review of the meanings of validity in the psychological test literature shows that both sets of authors use the term validity in an idiosyncratic way. Although the use of a test (evaluated by content validity) should not be conflated with whether the test actually measures a specified construct (evaluated by construct validity), if health is actually made up of several constructs (as suggested in Hyland's interactional model) then there may be an association between types of construct and types of purpose. Evidence is reviewed that people make several independent judgements about their health: cognitive perceptions of health problems are likely to be more sensitive to change in a longitudinal research design, whereas emotional evaluations of health provide less bias in cross-sectional designs. Thus, a classification of health measures in terms of the purpose of the test may parallel a classification in terms of what tests purport to measure.


Modified parity space averaging approaches for online cross-calibration of redundant sensors in nuclear reactors

  • Kassim, Moath;Heo, Gyunyoung
    • Nuclear Engineering and Technology
    • /
    • v.50 no.4
    • /
    • pp.589-598
    • /
    • 2018
  • To maintain the safety and reliability of reactors, redundant sensors are usually used to measure critical variables and estimate their averaged time dependency. Unhealthy sensors can badly influence the estimation of the process variable. Since online condition monitoring was introduced, the online cross-calibration method has been widely used to detect anomalies in sensor readings within a redundant group. The cross-calibration method has four main averaging techniques: simple averaging, band averaging, weighted averaging, and parity space averaging (PSA). PSA weighs redundant signals based on their error bounds and their band consistency. Using the consistency weighting factor (C), PSA assigns more weight to consistent signals that have shared bands, based on how many bands they share, and gives inconsistent signals very low weight. In this article, three approaches are introduced for improving the PSA technique: the first adds another consistency factor, so-called trend consistency (TC), to account for preserving any characteristic edge that reflects the behavior of the equipment/component measured by the process parameter; the second replaces the error bound/accuracy-based weighting factor ($W^a$) with a weighting factor based on the Euclidean distance ($W^d$); and the third applies $W^d$, TC, and C all together. Cold neutron source data sets of four redundant hydrogen pressure transmitters from a research reactor were used to perform the validation and verification. Results showed that the second and third modified approaches lead to reasonable improvement of the PSA technique. All approaches implemented in this study were similar in that they have the capability to (1) identify and isolate a drifted sensor that should undergo calibration, (2) identify faulty sensors with long, continuous ranges of missing data, and (3) identify a healthy sensor.
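The consistency-weighted averaging that PSA performs can be sketched in a few lines. This is a minimal numpy illustration, not the paper's formulation: the consistency count and the inverse-error weighting below are simplified stand-ins for the C and W^a factors, and the proposed TC and W^d modifications are not reproduced.

```python
import numpy as np

def psa_estimate(readings, errors):
    """Consistency-weighted average of redundant sensor readings.

    Hypothetical simplification of PSA: each signal's consistency weight
    counts how many signals (including itself) fall inside its error band,
    and the accuracy weight is the inverse error bound.
    """
    readings = np.asarray(readings, dtype=float)
    errors = np.asarray(errors, dtype=float)
    # Consistency: number of signals sharing each signal's error band
    consistency = np.array([
        np.sum(np.abs(readings - r) <= e) for r, e in zip(readings, errors)
    ])
    weights = consistency / errors    # consistent, tight-bounded signals weigh more
    return float(np.sum(weights * readings) / np.sum(weights))
```

With four redundant readings where one has drifted, the drifted signal shares no band with the others, gets a consistency count of one, and contributes little to the estimate.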

Image Processing-based Validation of Unrecognizable Numbers in Severely Distorted License Plate Images

  • Jang, Sangsik;Yoon, Inhye;Kim, Dongmin;Paik, Joonki
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.1 no.1
    • /
    • pp.17-26
    • /
    • 2012
  • This paper presents an image processing-based validation method for unrecognizable numbers in severely distorted license plate images which have been degraded by various factors including low resolution, low light level, geometric distortion, and periodic noise. Existing vehicle license plate recognition (LPR) methods assume that most of the image degradation factors have been removed before the recognition of printed numbers and letters is performed. If this is not the case, conventional LPR becomes impossible. The proposed method adopts a novel approach in which a set of reference number images is intentionally degraded using the same factors estimated from the input image. After a series of image processing steps, including geometric transformation, super-resolution, and filtering, a cross-correlation comparison between the intentionally degraded references and the input image can successfully identify the visually unrecognizable numbers. The proposed method makes it possible to validate numbers in a license plate image taken under low light-level conditions. In the experiment, using an extended set of test images that are unrecognizable to human vision, the proposed method provides a successful recognition rate of over 95%, whereas most existing LPR methods fail due to the severe distortion.
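The final matching step, cross-correlating the intentionally degraded references against the input, can be sketched with a normalized correlation score. A minimal numpy illustration assuming the earlier geometric transformation and super-resolution steps have already been applied; the inputs are hypothetical small arrays standing in for digit images.

```python
import numpy as np

def best_match(patch, references):
    """Return the reference digit whose degraded image best matches the patch.

    `references` maps a digit label to a 2-D array that has already been
    degraded with the distortions estimated from the input image.
    """
    def ncc(a, b):
        # Zero-mean, unit-variance normalized correlation in [-1, 1]
        a = (a - a.mean()) / (a.std() + 1e-12)
        b = (b - b.mean()) / (b.std() + 1e-12)
        return float((a * b).mean())

    scores = {digit: ncc(patch, ref) for digit, ref in references.items()}
    return max(scores, key=scores.get), scores
```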


Development of kNN QSAR Models for 3-Arylisoquinoline Antitumor Agents

  • Tropsha, Alexander;Golbraikh, Alexander;Cho, Won-Jea
    • Bulletin of the Korean Chemical Society
    • /
    • v.32 no.7
    • /
    • pp.2397-2404
    • /
    • 2011
  • A variable-selection k-nearest-neighbor (kNN) QSAR modeling approach was applied to a data set of 80 3-arylisoquinolines exhibiting cytotoxicity against the human lung tumor cell line (A-549). All compounds were characterized with molecular topology descriptors calculated with the MolconnZ program. Seven compounds were randomly selected from the original dataset and used as an external validation set. The remaining subset of 73 compounds was divided into multiple training (56 to 61 compounds) and test (12 to 17 compounds) sets using a chemical diversity sampling method developed by this group. Highly predictive models characterized by leave-one-out cross-validated $R^2$ ($q^2$) values greater than 0.8 for the training sets and $R^2$ values greater than 0.7 for the test sets have been obtained. The robustness of the models was confirmed by the Y-randomization test: all models built using training sets with randomly shuffled activities were characterized by low $q^2{\leq}0.26$ and $R^2{\leq}0.22$ for training and test sets, respectively. The twelve best models (with the highest values of both $q^2$ and $R^2$) predicted the activities of the external validation set of seven compounds with $R^2$ ranging from 0.71 to 0.93.
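The $q^2$ statistic these models are screened by can be illustrated with a small leave-one-out kNN regressor. A minimal numpy sketch under stated assumptions (Euclidean distance, unweighted neighbor mean); the MolconnZ descriptors and the diversity sampling procedure are outside its scope.

```python
import numpy as np

def q2_loo_knn(X, y, k=1):
    """Leave-one-out cross-validated R^2 (q^2) for a kNN regressor."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    for i in range(len(y)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # leave sample i out
        nn = np.argsort(d)[:k]             # k nearest remaining neighbors
        preds[i] = y[nn].mean()
    ss_res = np.sum((y - preds) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

A $q^2$ near 1 means the activity of each held-out compound is well predicted by its neighbors in descriptor space; values near or below 0 indicate no predictive power.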

Three-dimensional geostatistical modeling of subsurface stratification and SPT-N Value at dam site in South Korea

  • Mingi Kim;Choong-Ki Chung;Joung-Woo Han;Han-Saem Kim
    • Geomechanics and Engineering
    • /
    • v.34 no.1
    • /
    • pp.29-41
    • /
    • 2023
  • The 3D geospatial modeling of geotechnical information can aid in understanding the geotechnical characteristic values of the continuous subsurface at construction sites. In this study, a geostatistical optimization model for the three-dimensional (3D) mapping of subsurface stratification and the SPT-N value based on a trial-and-error rule was developed and applied to a dam emergency spillway site in South Korea. Geospatial database development for a geotechnical investigation, reconstitution of the target grid volume, and detection of outliers in the borehole dataset were implemented prior to the 3D modeling. For the site-specific subsurface stratification of the engineering geo-layer, we developed an integration method for the borehole and geophysical survey datasets based on the geostatistical optimization procedure of ordinary kriging and sequential Gaussian simulation (SGS) by comparing their cross-validation-based prediction residuals. We also developed an optimization technique based on SGS for estimating the 3D geometry of the SPT-N value. This method involves quantitatively testing the reliability of SGS and selecting the realizations with a high estimation accuracy. Boring tests were performed for validation, and the proposed method yielded more accurate prediction results and reproduced the spatial distribution of geotechnical information more effectively than the conventional geostatistical approach.
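The model-selection step, comparing interpolators by their cross-validation prediction residuals, can be sketched generically. The inverse-distance interpolator below is a hypothetical stand-in: ordinary kriging and SGS, which the study actually compares, are considerably more involved.

```python
import numpy as np

def loo_residuals(coords, values, predict):
    """Leave-one-out cross-validation residuals for a spatial interpolator.

    `predict(train_xy, train_v, target_xy)` can be any interpolator; the
    study compares interpolators by residuals of exactly this kind.
    """
    coords, values = np.asarray(coords, dtype=float), np.asarray(values, dtype=float)
    res = []
    for i in range(len(values)):
        mask = np.arange(len(values)) != i
        pred = predict(coords[mask], values[mask], coords[i])
        res.append(values[i] - pred)
    return np.asarray(res)

def idw(train_xy, train_v, target_xy, power=2.0):
    """Inverse-distance weighting, a simple stand-in interpolator."""
    d = np.linalg.norm(train_xy - target_xy, axis=1)
    w = 1.0 / np.maximum(d, 1e-12) ** power
    return float(np.sum(w * train_v) / np.sum(w))
```

Whichever interpolator yields the smaller residuals on held-out boreholes would be preferred, mirroring the optimization procedure described in the abstract.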

Prediction of Protein-Protein Interaction Sites Based on 3D Surface Patches Using SVM (SVM 모델을 이용한 3차원 패치 기반 단백질 상호작용 사이트 예측기법)

  • Park, Sung-Hee;Hansen, Bjorn
    • The KIPS Transactions:PartD
    • /
    • v.19D no.1
    • /
    • pp.21-28
    • /
    • 2012
  • Prediction of protein interaction sites for monomer structures can reduce the search space for protein docking and is regarded as very significant for inferring unknown functions of proteins from interacting proteins whose functions are known. On the other hand, prediction of interaction sites has been hampered by the difficulty of crystallizing weakly interacting complexes, which are transient and do not form complexes stable enough to obtain experimental structures by crystallization, or even NMR, for many of the most important protein-protein interactions. This work reports the calculation of 3D surface patches of complex structures and their properties, and a machine learning approach that builds a predictive model for classifying 3D surface patches into interaction and non-interaction sites using a support vector machine (SVM). To overcome the classification problems posed by class-imbalanced data, we employed an under-sampling technique. Nine properties of the patches were calculated from amino acid compositions and secondary structure elements. With 10-fold cross-validation, the predictive model built with the SVM achieved an accuracy of 92.7% for classifying 3D patches into interaction and non-interaction sites from 147 complexes.
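The class-balancing step can be sketched as random majority-class under-sampling. A minimal numpy version; the SVM training and the nine patch properties are assumed to come from elsewhere.

```python
import numpy as np

def undersample(X, y, rng=None):
    """Random under-sampling: keep as many samples of each class as the
    minority class has, so the classifier sees a balanced training set."""
    rng = np.random.default_rng(rng)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]
```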

Quantitative Analysis of GIS-based Landslide Prediction Models Using Prediction Rate Curve (예측비율곡선을 이용한 GIS 기반 산사태 예측 모델의 정량적 비교)

  • 지광훈;박노욱
    • Korean Journal of Remote Sensing
    • /
    • v.17 no.3
    • /
    • pp.199-210
    • /
    • 2001
  • The purpose of this study is to compare landslide prediction models quantitatively using the prediction rate curve. A case study from the Jangheung area was used to illustrate the methodologies. The landslide locations were detected from remote sensing data and field surveys, and geospatial information related to landslide occurrence was built into a spatial database in GIS. As prediction models, a joint conditional probability model and a certainty factor model were applied. For the cross-validation approach, landslide locations were randomly partitioned into two groups. One group was used to construct the prediction models, and the other was used to validate the prediction results. From the cross-validation analysis, it was possible to compare the two models in this study area. These approaches are expected to be useful for comparing other prediction models and for analyzing the causal factors in prediction models.
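A prediction rate curve of the kind used for the comparison can be computed by ranking cells by susceptibility score and accumulating the validation landslides captured. A minimal numpy sketch with hypothetical flat arrays standing in for the GIS rasters.

```python
import numpy as np

def prediction_rate_curve(scores, landslide_mask):
    """Prediction rate curve: fraction of validation landslides captured as
    increasing portions of the map, ranked most-susceptible first, are flagged.
    """
    scores = np.asarray(scores, dtype=float)
    hits = np.asarray(landslide_mask, dtype=float)
    order = np.argsort(scores)[::-1]               # most susceptible cells first
    captured = np.cumsum(hits[order]) / hits.sum() # y-axis: cumulative landslides
    area = np.arange(1, len(hits) + 1) / len(hits) # x-axis: portion of study area
    return area, captured
```

A model whose curve rises faster (more validation landslides captured in a smaller top-ranked area) is quantitatively the better predictor, which is the basis of the comparison in the abstract.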

Comparison between Neural Network and Conventional Statistical Analysis Methods for Estimation of Water Quality Using Remote Sensing (원격탐사를 이용한 수질평가시의 인공신경망에 의한 분석과 기존의 회귀분석과의 비교)

  • 임정호;정종철
    • Korean Journal of Remote Sensing
    • /
    • v.15 no.2
    • /
    • pp.107-117
    • /
    • 1999
  • A comparison of a neural network approach with conventional statistical methods, multiple regression and band ratio analyses, for the estimation of water quality parameters is presented in this paper. The Landsat TM image of Lake Daechung acquired on March 18, 1996 and the thirty in-situ sampling data sets measured during the satellite overpass were used for the comparison. We employed a three-layered, feedforward network trained by the backpropagation algorithm. Cross-validation was applied because of the small number of training pairs available for this study. The neural network showed much more successful performance than the conventional statistical analyses, although the results of the conventional statistical analyses were significant. The superiority of the neural network over the statistical methods in estimating water quality parameters is largely because the neural network modeled the non-linear behavior of the data sets much better.

Revisiting the Z-R Relationship Using Long-term Radar Reflectivity over the Entire South Korea Region in a Bayesian Perspective

  • Kim, Tae-Jeong;Kim, Jin-Guk;Kim, Ho Jun;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2021.06a
    • /
    • pp.275-275
    • /
    • 2021
  • A fixed Z-R relationship approach, such as the Marshall-Palmer relationship, for an entire year and for different seasons can be problematic in cases where the relationship varies spatially and temporally throughout a region. From this perspective, this study explores the use of long-term radar reflectivity for South Korea to obtain a nationwide calibrated Z-R relationship and the associated uncertainties within a Bayesian regression framework. This study also investigates seasonal differences in the Z-R relationship and their roles in reducing systematic error. Distinct differences in the Z-R parameters in space are identified, and more importantly, an inverse relationship between the parameters is clearly identified with distinct regimes based on the seasons. A spatially structured pattern in the parameters exists, particularly parameter α for the wet season and parameter β for the dry season. A pronounced region of high values during the wet and dry seasons may be partially associated with storm movements in that season. Finally, the radar rainfall estimates through the calibrated Z-R relationship are compared with the existing Z-R relationships for estimating stratiform rainfall and convective rainfall. Overall, the radar rainfall fields based on the proposed modeling procedure are similar to the observed rainfall fields, whereas the radar rainfall fields obtained from the existing Marshall-Palmer Z-R relationship show a systematic underestimation. The obtained Z-R relationships are validated by testing the predictions on unseen radar-gauge pairs in the year 2018, in the context of cross-validation. The cross-validation results are largely similar to those in the calibration process, suggesting that the derived Z-R relationships fit the radar-gauge pairs reasonably well.
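The Z-R power law $Z = aR^b$ that the study calibrates can be fit, in the simplest non-Bayesian form, by least squares in log space. The paper's Bayesian regression additionally yields parameter uncertainties; this numpy sketch only recovers point estimates, here from synthetic pairs generated with the Marshall-Palmer values.

```python
import numpy as np

def fit_zr(Z, R):
    """Least-squares fit of Z = a * R**b in log10 space (point estimates only)."""
    logZ = np.log10(np.asarray(Z, dtype=float))
    logR = np.log10(np.asarray(R, dtype=float))
    b, log_a = np.polyfit(logR, logZ, 1)   # slope = b, intercept = log10(a)
    return 10.0 ** log_a, b

# Synthetic radar-gauge pairs generated from Marshall-Palmer (a=200, b=1.6)
R = np.array([0.5, 1.0, 2.0, 5.0, 10.0])
a, b = fit_zr(200.0 * R ** 1.6, R)
```

On noise-free synthetic data the fit recovers the generating parameters exactly; on real radar-gauge pairs the scatter of the residuals is what motivates the Bayesian treatment of uncertainty.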


The Prediction Ability of Genomic Selection in the Wheat Core Collection

  • Yuna Kang;Changsoo Kim
    • Proceedings of the Korean Society of Crop Science Conference
    • /
    • 2022.10a
    • /
    • pp.235-235
    • /
    • 2022
  • Genomic selection is a promising tool for plant and animal breeding that uses genome-wide molecular marker data to capture both large- and small-effect quantitative trait loci and predict the genetic value of selection candidates. Genomic selection has previously been shown to achieve higher prediction accuracies than conventional marker-assisted selection (MAS) for quantitative traits. In this study, the prediction accuracy of 10 agricultural traits was compared in a wheat core collection of 567 accessions. We used a cross-validation approach to train and validate prediction accuracy and to evaluate the effects of training population size and training model. Regarding prediction accuracy by model, an accuracy of 0.4 or more was obtained for most traits with the six models used (GBLUP, LASSO, BayesA, RKHS, SVN, RF), except for the SVN model. For traits such as days to heading and days to maturity, the prediction accuracy was very high, over 0.8. Regarding prediction accuracy by training group, the prediction accuracy increased with the size of the training population for all traits. It was also confirmed that prediction accuracy differed with the genetic composition of the training population, regardless of its size. All training models were verified through 5-fold cross-validation. To verify the prediction ability of the training population of the wheat core collection, we compared the actual phenotypes and genomic estimated breeding values using a breeding population of 35 individuals. Of the 10 individuals with the earliest days to heading, 5 were selected through genomic selection, and of the 10 individuals with the latest days to heading, 6 were selected through genomic selection. Therefore, we confirmed the possibility of selecting individuals by trait, using only the genotype and over a shorter period of time, through genomic selection.
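The 5-fold cross-validated prediction accuracy used here can be sketched with ridge regression on a marker matrix as a GBLUP-like stand-in (an assumption; none of the abstract's six models is reproduced exactly). Accuracy is the correlation between observed phenotypes and genomic predictions in each held-out fold.

```python
import numpy as np

def gs_accuracy_cv(M, y, lam=1.0, k=5, seed=0):
    """k-fold cross-validated prediction accuracy for ridge regression on a
    marker matrix M (a minimal GBLUP-like stand-in, not the paper's models)."""
    M, y = np.asarray(M, dtype=float), np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        Xtr, ytr = M[train], y[train]
        # Ridge solution: beta = (X'X + lam*I)^-1 X'y
        beta = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(M.shape[1]),
                               Xtr.T @ ytr)
        accs.append(np.corrcoef(M[f] @ beta, y[f])[0, 1])
    return float(np.mean(accs))
```

Increasing the number of training individuals generally raises this accuracy, matching the trend reported for the core collection.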
