• Title/Summary/Keyword: Validation Set

Development of the Algorithm for Optimizing Wavelength Selection in Multiple Linear Regression

  • Hoeil Chung
    • Near Infrared Analysis / v.1 no.1 / pp.1-7 / 2000
  • A convenient algorithm for optimizing wavelength selection in multiple linear regression (MLR) has been developed. MOP (MLR Optimization Program) tests all possible MLR calibration models in a given spectral range and finds an optimal MLR model with external validation capability. MOP generates calibration models from all possible combinations of wavelengths and simultaneously calculates the SEC (Standard Error of Calibration) and the SEV (Standard Error of Validation) by predicting samples in a validation data set. From the SEC and SEV it then calculates another parameter, SAD, the sum of SEC, SEV, and the absolute difference between them: SAD = SEC + SEV + Abs(SEC - SEV). SAD is a useful parameter for finding an optimal calibration model without over-fitting because it simultaneously evaluates SEC, SEV, and the difference in error between calibration and validation. The calibration model with the smallest SAD value is chosen as the optimum because its calibration and validation errors are both minimal and similar in scale. To evaluate MOP, the determination of benzene content in unleaded gasoline was examined. MOP successfully found the optimal calibration model and showed better calibration and independent prediction performance than conventional MLR calibration.
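
The exhaustive search and the SAD criterion described in this abstract can be illustrated with a minimal sketch. It is not the authors' MOP program: the function names are hypothetical, SEC and SEV are assumed to be plain root-mean-square errors (the abstract does not state the degrees-of-freedom convention), and the fit uses ordinary least squares with an intercept.

```python
from itertools import combinations
from math import sqrt


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(rows, y, cols):
    """Least-squares fit of y on the chosen wavelength columns plus an intercept."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    p = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def rmse(rows, y, cols, coef):
    """Root-mean-square prediction error (assumed stand-in for SEC / SEV)."""
    preds = [coef[0] + sum(k * row[c] for k, c in zip(coef[1:], cols)) for row in rows]
    return sqrt(sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y))


def sad(sec, sev):
    """SAD = SEC + SEV + |SEC - SEV|, as defined in the abstract."""
    return sec + sev + abs(sec - sev)


def best_model(cal_x, cal_y, val_x, val_y, n_wavelengths):
    """Try every wavelength combination; return (SAD, columns, SEC, SEV) with the smallest SAD."""
    best = None
    for cols in combinations(range(len(cal_x[0])), n_wavelengths):
        coef = fit_mlr(cal_x, cal_y, cols)
        sec = rmse(cal_x, cal_y, cols, coef)   # error on calibration samples
        sev = rmse(val_x, val_y, cols, coef)   # error on validation samples
        score = sad(sec, sev)
        if best is None or score < best[0]:
            best = (score, cols, sec, sev)
    return best
```

Because SAD adds the gap between SEC and SEV to their sum, a model that fits the calibration set very well but predicts the validation set poorly is penalized twice, which is how the criterion discourages over-fitted wavelength sets.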

Assessment of the Near Real-Time Validation for the AQUA Satellite Level-2 Observation Products

  • Yang Min-Sil;Lee Jeongsoon;Lee Chol;Park Jong-Seo;Kim Hee-Ah
    • Proceedings of the KSRS Conference / 2004.10a / pp.35-38 / 2004
  • We developed a Near Real-Time Validation System (NRVS) for the Level-2 products of the AQUA satellite. AQUA is the second largest project of NASA's Earth Observing System (EOS) mission. The satellite provides information on the water cycle of the entire Earth in many different forms. Among its products, we used five level-2 geophysical parameters: rain rate, sea surface wind speed, skin surface temperature, atmospheric temperature profile, and atmospheric humidity profile. To use these products for scientific purposes, reasonable quantification is indispensable. In this paper we explain the near real-time validation process and its detailed algorithm, and we analyze its simulation results quantitatively. In-situ meteorological measurements, periodically gathered and provided by the Korea Meteorological Administration (KMA), are processed as the reference data set. Both site-specific and time-series analyses of the validation results are presented.

Ground Software Validation Test for Wheel Off-loading of COMS (통신해양기상위성의 휠오프로딩 지상국 소프트웨어 검증시험)

  • Park, Young-Woong;Yang, Koon-Ho
    • Aerospace Engineering and Technology / v.9 no.2 / pp.51-56 / 2010
  • The COMS ground station runs two main software functions in normal-mode operation: stationkeeping and wheel off-loading. This paper summarizes and describes the ground software validation test for wheel off-loading. The wheel off-loading design was changed from the E3000 heritage, and the change was analyzed. The wheel off-loading ground software has two parts: one is wheel off-loading management, which handles parameter changes at the thruster set switching time, and the other is the wheel off-loading set-point, which is sent to the satellite for the reference momentum.

Development of kNN QSAR Models for 3-Arylisoquinoline Antitumor Agents

  • Tropsha, Alexander;Golbraikh, Alexander;Cho, Won-Jea
    • Bulletin of the Korean Chemical Society / v.32 no.7 / pp.2397-2404 / 2011
  • A variable-selection k-nearest-neighbor (kNN) QSAR modeling approach was applied to a data set of 80 3-arylisoquinolines exhibiting cytotoxicity against a human lung tumor cell line (A-549). All compounds were characterized with molecular topology descriptors calculated with the MolconnZ program. Seven compounds were randomly selected from the original data set and used as an external validation set. The remaining subset of 73 compounds was divided into multiple training (56 to 61 compounds) and test (17 to 12 compounds) sets using a chemical diversity sampling method developed in this group. Highly predictive models were obtained, characterized by leave-one-out cross-validated $R^2$ ($q^2$) values greater than 0.8 for the training sets and $R^2$ values greater than 0.7 for the test sets. The robustness of the models was confirmed by the Y-randomization test: all models built using training sets with randomly shuffled activities were characterized by low values, $q^2{\leq}0.26$ and $R^2{\leq}0.22$ for the training and test sets, respectively. The twelve best models (those with the highest values of both $q^2$ and $R^2$) predicted the activities of the external validation set of seven compounds with $R^2$ ranging from 0.71 to 0.93.
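
The leave-one-out $q^2$ statistic and the Y-randomization check mentioned in this abstract can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes an unweighted kNN average over Euclidean distance, omits variable selection entirely, and all function names are hypothetical.

```python
import random


def knn_predict(train_x, train_y, query, k):
    """Predict as the plain mean activity of the k nearest training compounds."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )[:k]
    return sum(train_y[i] for i in nearest) / k


def loo_q2(x, y, k):
    """Leave-one-out cross-validated R^2: q2 = 1 - PRESS / SS_total."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]          # hold compound i out
        rest_y = y[:i] + y[i + 1:]
        press += (y[i] - knn_predict(rest_x, rest_y, x[i], k)) ** 2
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_total


def y_randomization(x, y, k, n_trials, seed=0):
    """Recompute q2 after shuffling activities; a robust model should score far lower."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        shuffled = y[:]
        rng.shuffle(shuffled)
        scores.append(loo_q2(x, shuffled, k))
    return scores
```

If the shuffled-activity scores approach the real $q^2$, the apparent predictivity is likely a chance correlation rather than a structure-activity relationship.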

Face Detection Based on Incremental Learning from Very Large Size Training Data (대용량 훈련 데이타의 점진적 학습에 기반한 얼굴 검출 방법)

  • 박지영;이준호
    • Journal of KIISE:Software and Applications / v.31 no.7 / pp.949-958 / 2004
  • Face detection using a boosting-based algorithm requires a very large amount of face and nonface data. In addition, training data must continually be added to improve detection rates, which demands an efficient incremental learning algorithm. In the design of incremental learning based classifiers, the final classifier should represent the characteristics of the entire training dataset. Conventional methods have a critical problem in combining intermediate classifiers: the weight updates depend solely on performance on the individual dataset. In this paper, for application to face detection, we present a new method that combines an intermediate classifier with previously acquired ones in an optimal manner. Our algorithm creates a validation set by incrementally adding sampled instances from each dataset so that it represents the entire training data. The weight of each classifier is determined by its performance on the validation set. This approach guarantees that the resulting final classifier is learned from the entire training dataset. Experimental results show that the classifier trained by the proposed algorithm performs better than one trained by AdaBoost operating in batch mode, as well as one trained by Learn++.
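
The idea of growing a shared validation set and weighting each intermediate classifier by its performance on it might look roughly like the sketch below. The abstract does not give the weighting formula, so the AdaBoost-style log-odds weight used here is an assumption, as are the function names and the weighted-vote combination rule.

```python
import math
import random


def extend_validation_set(validation, new_data, n_samples, rng):
    """Add a random sample of the newly arrived dataset to the shared validation set."""
    validation.extend(rng.sample(new_data, min(n_samples, len(new_data))))


def classifier_weight(clf, validation):
    """Assumed weighting: log((1 - err) / err) with err measured on the validation set."""
    errors = sum(1 for x, label in validation if clf(x) != label)
    err = min(max(errors / len(validation), 1e-6), 1 - 1e-6)  # keep the log finite
    return math.log((1 - err) / err)


def combined_predict(classifiers, weights, x):
    """Weighted vote: each classifier votes +1 (face) or -1 (nonface)."""
    score = sum(w * (1 if clf(x) == 1 else -1) for clf, w in zip(classifiers, weights))
    return 1 if score >= 0 else 0
```

Because every weight is measured on the same validation set, which samples all datasets seen so far, a classifier trained on one batch cannot dominate the vote merely by doing well on its own batch.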

Net Analyte Signal-based Quantitative Determination of Fusel Oil in Korean Alcoholic Beverage Using FT-NIR Spectroscopy

  • Lohumi, Santosh;Kandpal, Lalit Mohan;Seo, Young Wook;Cho, Byoung Kwan
    • Journal of Biosystems Engineering / v.41 no.3 / pp.208-220 / 2016
  • Purpose: Fusel oil is a potent volatile aroma compound found in many alcoholic beverages. At low concentrations, it makes an essential contribution to the flavor and aroma of fermented alcoholic beverages, while at high concentrations, it induces an off-flavor and is thought to cause undesirable side effects. In this work, we introduce Fourier transform near-infrared (FT-NIR) spectroscopy as a rapid and nondestructive technique for the quantitative determination of fusel oil in the Korean alcoholic beverage "soju". Methods: FT-NIR transmittance spectra in the 1000-2500 nm region were collected for 120 soju samples with fusel oil concentrations ranging from 0 to 1400 ppm. The calibration and validation data sets were built from 75 and 45 samples, respectively. The net analyte signal (NAS) was used as a preprocessing method before applying partial least-squares regression (PLSR) and principal component regression (PCR) to predict fusel oil concentration. A novel variable selection method was adopted to determine the most informative spectral variables and so minimize the effect of nonmodeled interferences. Finally, the efficiency of the developed technique was evaluated with two different validation sets. Results: The results revealed that the NAS-PLSR model with selected variables ($R^2_{\upsilon}=0.95$, RMSEV = 100 ppm) did not outperform the NAS-PCR model ($R^2_{\upsilon}=0.97$, RMSEV = 78.9 ppm). In addition, the NAS-PCR model shows a better recovery for validation set 2 and a lower relative error for validation set 3 than the NAS-PLSR model. Conclusion: The experimental results indicate that the proposed technique could be an alternative to conventional methods for the quantitative determination of fusel oil in alcoholic beverages and has the potential for use in in-line process control.

A Feature Map Compression Method for Multi-resolution Feature Map with PCA-based Transformation (PCA 기반 변환을 통한 다해상도 피처 맵 압축 방법)

  • Park, Seungjin;Lee, Minhun;Choi, Hansol;Kim, Minsub;Oh, Seoung-Jun;Kim, Younhee;Do, Jihoon;Jeong, Se Yoon;Sim, Donggyu
    • Journal of Broadcast Engineering / v.27 no.1 / pp.56-68 / 2022
  • In this paper, we propose a compression method for multi-resolution feature maps for VCM. The proposed method removes the redundancy between the channels and resolution levels of the multi-resolution feature map through a PCA-based transformation. The basis vectors and mean vector used for the transformation, and the transformation coefficients obtained through it, are each compressed according to their characteristics using a VVC-based coder and DeepCABAC. To evaluate the proposed method, object detection performance was measured on the OpenImageV6 and COCO 2017 validation sets, and the BD-rate was compared against the MPEG-VCM anchor and the feature map compression anchor proposed in this paper, using bpp and mAP. In the experiments, the proposed method shows a 25.71% BD-rate improvement over the feature map compression anchor on OpenImageV6. Furthermore, for large objects in the COCO 2017 validation set, the BD-rate is improved by up to 43.72% compared to the MPEG-VCM anchor.

Cross-Validation Probabilistic Neural Network Based Face Identification

  • Lotfi, Abdelhadi;Benyettou, Abdelkader
    • Journal of Information Processing Systems / v.14 no.5 / pp.1075-1086 / 2018
  • In this paper, a cross-validation algorithm for training probabilistic neural networks (PNNs) is presented for application to automatic face identification. Standard PNNs perform well on small and medium-sized databases, but they suffer from serious problems with large databases such as those encountered in biometrics applications. To address this issue, we propose a new training algorithm for PNNs that reduces the hidden layer's size and avoids over-fitting at the same time. The proposed training algorithm generates networks with a smaller hidden layer that contains only representative examples from the training data set. Moreover, adding new classes or samples after training does not require retraining, which is one of the main characteristics of this solution. The results presented in this work show a great improvement in both the processing speed and the generalization of the proposed classifier. This improvement is mainly due to the significantly reduced size of the hidden layer.

Influence of Self-driving Data Set Partition on Detection Performance Using YOLOv4 Network (YOLOv4 네트워크를 이용한 자동운전 데이터 분할이 검출성능에 미치는 영향)

  • Wang, Xufei;Chen, Le;Li, Qiutan;Son, Jinku;Ding, Xilong;Song, Jeongyoung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.6 / pp.157-165 / 2020
  • With neural networks and self-driving data sets both developing rapidly, the way a data set is partitioned is another means of improving a network model's ability to detect moving objects. In the Darknet framework, the YOLOv4 (You Only Look Once v4) network model was trained and tested on the Udacity data set. The Udacity data set was divided into a training set, a validation set, and a test set according to 7 different proportions. The K-means++ algorithm was used for dimensional clustering of the object boxes in each of the 7 groups. By adjusting the hyperparameters of the YOLOv4 network during training, optimal model parameters were obtained for each of the 7 groups. These model parameters were then used to run detection on the 7 corresponding test sets, and the results were compared. The experimental results showed that YOLOv4 can effectively detect the large, medium, and small moving objects represented by Truck, Car, and Pedestrian in the Udacity data set. When the ratio of training set, validation set, and test set is 7:1.5:1.5, the optimal model parameters of YOLOv4 give the highest detection performance: mAP50 reaches 80.89%, mAP75 reaches 47.08%, and the detection speed reaches 10.56 FPS.
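
Partitioning a data set by a ratio such as 7:1.5:1.5 can be sketched as below. The abstract does not say how the Udacity images were assigned to subsets, so the seeded random shuffle and the function name `split_dataset` are assumptions made for illustration.

```python
import random


def split_dataset(items, ratios=(7, 1.5, 1.5), seed=0):
    """Shuffle and split into (train, validation, test) according to the given ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)       # fixed seed keeps the split reproducible
    total = sum(ratios)
    n_train = int(len(items) * ratios[0] / total)
    n_val = int(len(items) * ratios[1] / total)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]           # remainder goes to the test set
    return train, val, test
```

For example, `split_dataset(range(1000))` yields subsets of 700, 150, and 150 items; calling it with other ratio tuples gives one way to generate the several partitions that such a study compares.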

QSPR Study of the Absorption Maxima of Azobenzene Dyes

  • Xu, Jie;Wang, Lei;Liu, Li;Bai, Zikui;Wang, Luoxin
    • Bulletin of the Korean Chemical Society / v.32 no.11 / pp.3865-3872 / 2011
  • A quantitative structure-property relationship (QSPR) study was performed to predict the absorption maxima of azobenzene dyes. The entire set of 191 azobenzenes was divided into a training set of 150 azobenzenes and a test set of 41 azobenzenes according to the Kennard-Stone algorithm. A seven-descriptor model, with a squared correlation coefficient ($R^2$) of 0.8755 and a standard error of estimation (s) of 14.476, was developed by applying stepwise multiple linear regression (MLR) analysis to the training set. The reliability of the proposed model was further illustrated using various evaluation techniques: a leave-many-out cross-validation procedure, randomization tests, and validation through the test set.
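
The Kennard-Stone selection used for the training/test division can be sketched as follows. This is a generic version of the algorithm, assuming Euclidean distance between descriptor vectors and at least two selected samples; the function name is hypothetical and the abstract does not say which descriptors or distance the authors used.

```python
def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone max-min rule."""

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    n = len(points)
    # Step 1: start with the two points that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist2(points[ij[0]], points[ij[1]]),
    )
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}
    # Distance from each remaining point to its nearest already-selected point.
    min_d = {i: min(dist2(points[i], points[s]) for s in selected) for i in remaining}
    # Step 2: repeatedly add the point farthest from everything selected so far.
    while len(selected) < n_select and remaining:
        nxt = max(sorted(remaining), key=lambda i: min_d[i])
        selected.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_d[i] = min(min_d[i], dist2(points[i], points[nxt]))
    return selected
```

The selected indices form the training set and the unselected ones the test set, so the training compounds span the descriptor space as uniformly as possible.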