• Title/Summary/Keyword: Data validation

Multiclass LS-SVM ensemble for large data

  • Hwang, Hyungtae
    • Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1557-1563 / 2015
  • Multiclass classification is typically performed using a voting scheme that combines binary classifications. In this paper we propose a multiclass classification method for large data, which can be regarded as a revised one-vs-all method. The multiclass classification is performed using the hat matrix of a least squares support vector machine (LS-SVM) ensemble, which is obtained by aggregating individual LS-SVMs, each trained on a subset of the whole large dataset. A cross validation function is defined to select the optimal values of the hyperparameters that affect the performance of the proposed multiclass LS-SVM. We also derive a generalized cross validation function to reduce the computational burden of the cross validation function. Experimental results are then presented to show the performance of the proposed method.
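For a linear smoother whose fitted values are $\hat{y} = H(\theta)\,y$, where $H(\theta)$ is the hat matrix and $\theta$ collects the hyperparameters, leave-one-out cross validation and its generalized version are usually written as below. This is the standard textbook form, given here as an assumption; the GCV function actually derived for the multiclass LS-SVM ensemble in the paper may differ in detail.

```latex
\mathrm{CV}(\theta) = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - \hat{y}_i}{1 - h_{ii}(\theta)}\right)^2,
\qquad
\mathrm{GCV}(\theta) = \frac{n\,\lVert (I - H(\theta))\,y \rVert^2}{\left[\operatorname{tr}\bigl(I - H(\theta)\bigr)\right]^2}
```

GCV replaces each diagonal element $h_{ii}$ by their average $\operatorname{tr}(H)/n$; substituting that average into $\mathrm{CV}(\theta)$ gives the $\mathrm{GCV}(\theta)$ expression directly, since $\operatorname{tr}(I - H) = n - \operatorname{tr}(H)$.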

CROSS-VALIDATION OF LANDSLIDE SUSCEPTIBILITY MAPPING IN KOREA

  • LEE SARO
    • Proceedings of the KSRS Conference / 2004.10a / pp.291-293 / 2004
  • The aim of this study was to cross-validate a spatial probabilistic model of landslide likelihood ratios at Boun, Janghung and Yongin, in Korea, using a Geographic Information System (GIS). Landslide locations within the study areas were identified by interpreting aerial photographs and satellite images and by field surveys. Maps of the topography, soil type, forest cover, lineaments and land cover were constructed from the spatial data sets. The 14 factors that influence landslide occurrence were extracted from the database, and the likelihood ratio of each factor was computed. Landslide susceptibility maps were drawn for each of the three areas using likelihood ratios derived not only from that area's own data but also from each of the other two areas (nine maps in all), as a cross-check of the validity of the method. For validation and cross-validation, the results of the analyses were compared, in each study area, with actual landslide locations. The validation and cross-validation results showed satisfactory agreement between the susceptibility maps and the existing landslide locations.
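The likelihood ratio of a factor class is commonly computed as the share of landslide cells falling in that class divided by the share of all cells in that class, and a cell's susceptibility index as the sum of the ratios of its classes over all factors. The sketch below assumes that common formulation (the abstract does not spell out the formula) and a toy grid flattened into lists; the function names are hypothetical. Cross-validation in the abstract's sense amounts to computing the ratios in one area and scoring the cells of another.

```python
from collections import Counter


def likelihood_ratios(classes, landslide):
    """Likelihood ratio per factor class (hypothetical helper).

    classes:   factor class label of every grid cell
    landslide: True where a landslide was mapped in that cell
    Ratio = (share of landslide cells in the class) / (share of all cells in the class);
    values above 1 mark classes where landslides are over-represented.
    """
    total_cells = len(classes)
    total_slides = sum(landslide)
    if total_slides == 0:
        raise ValueError("the training area contains no landslide cells")
    cells_per_class = Counter(classes)
    slides_per_class = Counter(c for c, s in zip(classes, landslide) if s)
    return {
        c: (slides_per_class[c] / total_slides) / (n / total_cells)
        for c, n in cells_per_class.items()
    }


def susceptibility(cell_classes, ratio_tables):
    """Sum the ratio of the cell's class for each factor; unseen classes add 0."""
    return sum(table.get(c, 0.0) for c, table in zip(cell_classes, ratio_tables))


if __name__ == "__main__":
    # Toy "area A": two factors (slope class, soil type) over six cells.
    slope_a = ["steep", "steep", "gentle", "gentle", "steep", "gentle"]
    soil_a = ["clay", "sand", "clay", "sand", "sand", "clay"]
    slides_a = [True, True, False, False, False, False]
    tables = [likelihood_ratios(slope_a, slides_a), likelihood_ratios(soil_a, slides_a)]

    # Cross-check: score cells of another area with ratios learned in area A.
    area_b = [("steep", "clay"), ("gentle", "sand")]
    print([susceptibility(cell, tables) for cell in area_b])  # [3.0, 1.0]
```

The high-scoring cells of the second area would then be compared with the landslides actually mapped there.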

Introduction to the Validation Module Design for CMDPS Baseline Products

  • Kim, Shin-Young;Chung, Chu-Yong;Ou, Mi-Lim
    • Proceedings of the KSRS Conference / 2007.10a / pp.146-148 / 2007
  • CMDPS (COMS Meteorological Data Processing System) is the operational system that extracts meteorological products from data observed by the meteorological imager of COMS (Communication, Ocean and Meteorological Satellite). The CMDPS baseline products consist of 16 parameters, including cloud information, water vapor products, surface information, environmental products and atmospheric motion vectors. CMDPS also includes calibration monitoring and a validation mechanism for the baseline products. The main objective of the CMDPS validation module is near-real-time monitoring of the accuracy and reliability of all CMDPS products. Its long-term validation statistics are also used to upgrade CMDPS, for example through algorithm parameter tuning and retrieval algorithm modification. This paper introduces the preliminary design of the CMDPS validation module.

Diagnostic In Spline Regression Model With Heteroscedasticity

  • Lee, In-Suk;Jung, Won-Tae;Jeong, Hye-Jeong
    • Journal of the Korean Data and Information Science Society / v.6 no.1 / pp.63-71 / 1995
  • We consider local influence for smoothing parameter estimates in a spline regression model with heteroscedasticity. In practice, generalized cross-validation does not work well in the presence of heteroscedasticity. We therefore propose a local influence measure for generalized cross-validation estimates when the errors are heteroscedastic, and examine the diagnostic effect of this measure using the Hyperinflation data.

Development of 2D Data Quality Validation Techniques for Pipe-type Underground Facilities (2차원 관로형 지하시설물 정보 품질검증기술 개발)

  • Sang-Keun Bae;Sang-Min Kim;Eun-Jin Yoo;Keo-Bae Lim;Da-Woon Jeong
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.3 / pp.285-292 / 2023
  • As various accidents have occurred in underground spaces, we aim to improve the quality validation standards and methods specified in the Regulations on Producing Integrated Map of Underground Spaces, devised by the Ministry of Land, Infrastructure and Transport of the Republic of Korea, in order to produce a high-quality integrated map of underground spaces. Specifically, we propose measures to improve the quality assurance of pipe-type underground facilities, the so-called lifelines, given their importance to citizens' daily activities and their having the highest accident risk among the 16 types of underground facilities. We implemented quality validation software based on the developed standards and demonstrated the adequacy of those standards by testing it on two-dimensional water supply facility data from parts of Busan, Korea. This work lays a foundation for reducing the time and manpower required for data quality inspection and for making data quality more reliable, by improving the current quality validation standards and by developing software that can extract errors automatically.
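Automatic error extraction of this kind usually reduces to running rule checks over every facility record. The sketch below is purely illustrative: the record layout (`PipeSegment`), the rules (duplicate IDs, missing attributes, non-positive diameter, zero-length geometry) and the 0.01 m length tolerance are hypothetical stand-ins, not the validation standards developed in the paper.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PipeSegment:
    """Hypothetical record layout for one 2-D pipe segment."""
    seg_id: str
    start: Tuple[float, float]
    end: Tuple[float, float]
    diameter_mm: Optional[float]
    material: Optional[str]


def find_errors(segments, min_length_m=0.01):
    """Return (seg_id, message) pairs for every illustrative rule a segment violates."""
    errors = []
    id_counts = Counter(s.seg_id for s in segments)
    for s in segments:
        if id_counts[s.seg_id] > 1:
            errors.append((s.seg_id, "duplicate identifier"))
        if s.diameter_mm is None or s.material is None:
            errors.append((s.seg_id, "missing mandatory attribute"))
        if s.diameter_mm is not None and s.diameter_mm <= 0:
            errors.append((s.seg_id, "non-positive diameter"))
        # A segment whose endpoints (nearly) coincide has degenerate geometry.
        if math.dist(s.start, s.end) < min_length_m:
            errors.append((s.seg_id, "degenerate geometry"))
    return errors


if __name__ == "__main__":
    segments = [
        PipeSegment("W1", (0.0, 0.0), (10.0, 0.0), 150.0, "steel"),
        PipeSegment("W2", (10.0, 0.0), (10.0, 0.0), 150.0, "PE"),
        PipeSegment("W3", (10.0, 0.0), (20.0, 5.0), None, "PE"),
    ]
    for seg_id, message in find_errors(segments):
        print(seg_id, message)
```

Keeping each rule as an independent check makes it straightforward to add or revise rules as the standards change.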

Case Study of BIM Quality Assurance (BIM 모델의 품질검증 사례연구)

  • Jeong, Yeon-Suk;Park, Sang-Il;Lee, Sang-Ho
    • Proceedings of the Computational Structural Engineering Institute Conference / 2010.04a / pp.379-382 / 2010
  • This study proposes a way to validate BIM data quality in BIM applications. Solibri Model Checker, which is based on the Java programming language, is adopted as the module development platform. The platform allows application developers to implement BIM model checkers for their own purposes. This study developed a BIM validation module for circulation analysis of building designs. The validation module enables end users to automatically detect data that are corrupted or undefined. In the case studies, the module found that an IFC file generated by a BIM software package contained incorrect relationship information between a space and its boundary elements. A building model must satisfy the modeling requirements before domain users can obtain analysis results. A BIM data validation module needs to be developed for each BIM application domain.
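The kind of defect reported in the case study, a space whose boundary relations are wrong, can be illustrated with a check over a simplified in-memory model. This is a hypothetical Python sketch, not the Solibri Model Checker Java API: spaces map to the ids of their bounding elements, elements map to an IFC entity name such as `IfcWall` or `IfcDoor`, and the rule that a space needs a door to take part in circulation analysis is an assumption.

```python
def check_space_boundaries(spaces, elements):
    """Return (space_id, problem) pairs for a simplified BIM model (hypothetical layout).

    spaces:   dict mapping a space id to the ids of its bounding elements
    elements: dict mapping an element id to its IFC entity name, e.g. "IfcWall"
    """
    problems = []
    for space_id, boundary_ids in spaces.items():
        if not boundary_ids:
            problems.append((space_id, "no boundary elements"))
            continue
        # Relations pointing to elements absent from the model are corrupted data.
        for eid in boundary_ids:
            if eid not in elements:
                problems.append((space_id, f"undefined boundary element {eid}"))
        # Assumed circulation rule: a space must be bounded by at least one door.
        if not any(elements.get(eid) == "IfcDoor" for eid in boundary_ids):
            problems.append((space_id, "no door among boundary elements"))
    return problems


if __name__ == "__main__":
    elements = {"W1": "IfcWall", "W2": "IfcWall", "D1": "IfcDoor"}
    spaces = {"Room101": ["W1", "D1"], "Room102": ["W2", "X9"], "Hall": []}
    for space_id, problem in check_space_boundaries(spaces, elements):
        print(space_id, problem)
```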

The Simulation and Research of Information for Space Craft (Autonomous Spacecraft Health Monitoring/Data Validation Control Systems)

  • Kim, H;Jhonson, R.;Zalewski, D.;Qu, Z.;Durrance, S.T.;Ham, C.
    • Journal of the Korea Academia-Industrial cooperation Society / v.2 no.2 / pp.81-89 / 2001
  • Space systems operate in a changing and uncertain space environment and should be capable of operating autonomously for long periods of time without frequent telecommunication from the ground station. At the same time, requirements are emerging for a new set of projects/systems calling for "autonomous" operations over long unattended periods. By the nature of space systems, they are expected to perform their missions flawlessly, and fault-tolerant sensor/actuator subsystems are extremely important for validating science measurement data and thus for mission success. Technology innovations relevant to autonomous data validation and health monitoring are articulated for a growing class of autonomous space system operations. The greatest need is to focus research effort on developing a new class of fault-tolerant space systems, such as attitude actuators and sensors, and on validating measurement data from scientific instruments. The next step in evolving existing control processes toward an autonomous posture is to embed intelligence that actively controls, modifies parameters, and selects sensor/actuator subsystems based on statistical parameters of the measurement errors in real time. This research focuses on identifying and demonstrating critical technology innovations that will be applied to Autonomous Spacecraft Health Monitoring/Data Validation Control Systems (ASHMDVCS).
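One way to select sensor/actuator subsystems from statistical parameters of the measurement errors in real time is to track each redundant sensor's residual (measurement minus model prediction) online and prefer the sensor with the smallest residual variance. The sketch below assumes that scheme, uses Welford's running-variance update, and uses a hypothetical 3-sigma outlier threshold; it is not the ASHMDVCS design itself, and the class and function names are illustrative.

```python
import math


class ResidualMonitor:
    """Running mean/variance of one sensor's residuals (Welford's algorithm),
    so the statistics can be updated one sample at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, residual):
        self.n += 1
        delta = residual - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (residual - self.mean)

    @property
    def variance(self):
        """Sample variance; infinite until at least two residuals have been seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else math.inf

    def is_outlier(self, residual, k=3.0):
        """True if the residual lies more than k standard deviations from the mean."""
        if self.n < 2:
            return False
        return abs(residual - self.mean) > k * math.sqrt(self.variance)


def select_sensor(monitors):
    """Name of the redundant sensor whose residuals have the smallest variance."""
    return min(monitors, key=lambda name: monitors[name].variance)


if __name__ == "__main__":
    monitors = {"gyro_a": ResidualMonitor(), "gyro_b": ResidualMonitor()}
    predicted = [1.0, 1.0, 1.0, 1.0]
    for p, a, b in zip(predicted, [1.01, 0.99, 1.02, 0.98], [1.2, 0.8, 1.3, 0.7]):
        monitors["gyro_a"].update(a - p)
        monitors["gyro_b"].update(b - p)
    print(select_sensor(monitors))
```

In a fuller implementation, residuals flagged by `is_outlier` would normally be kept out of the running statistics so that a single fault does not inflate the variance.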

Application of Time-series Cross Validation in Hyperparameter Tuning of a Predictive Model for 2,3-BDO Distillation Process (시계열 교차검증을 적용한 2,3-BDO 분리공정 온도예측 모델의 초매개변수 최적화)

  • An, Nahyeon;Choi, Yeongryeol;Cho, Hyungtae;Kim, Junghwan
    • Korean Chemical Engineering Research / v.59 no.4 / pp.532-541 / 2021
  • Research on applying artificial intelligence to chemical processes has recently been increasing rapidly. However, overfitting is a significant problem that prevents a model from generalizing well, that is, from predicting unseen test data as well as it fits the observed training data. Cross validation is one way to address overfitting. In this study, time-series cross validation was applied to optimize the number of batches and epochs among the hyperparameters of the prediction model for the 2,3-BDO distillation process, and the result was compared with the commonly used K-fold cross validation. The model tuned with time-series cross validation had an RMSE 9.06% lower and a MAPE 0.61% higher than the model tuned with K-fold cross validation. Its calculation time was also 198.29 s shorter than with the K-fold cross validation method.
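The difference between the two schemes lies in how sample indices are split. In the sketch below (similar in spirit to scikit-learn's `TimeSeriesSplit`, though written from scratch), time-series cross validation always trains on earlier samples and validates on the block that follows, whereas unshuffled K-fold lets the training set contain samples recorded after the validation fold. The fold counts are assumptions; the model, the batch/epoch grid and the RMSE/MAPE scoring are left out.

```python
def kfold_splits(n, k):
    """Unshuffled K-fold: each contiguous fold is validated once and the model is
    trained on all other samples, including samples recorded later in time."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, val))
        start += size
    return splits


def time_series_splits(n, k):
    """Expanding-window splits: each split trains on every sample before its
    validation block, so the model never sees data from the future of what it is scored on."""
    test_size = n // (k + 1)
    if test_size == 0:
        raise ValueError("need at least k + 1 samples")
    splits = []
    for i in range(k):
        val_start = n - (k - i) * test_size
        splits.append((list(range(val_start)), list(range(val_start, val_start + test_size))))
    return splits


if __name__ == "__main__":
    for name, splits in [("K-fold", kfold_splits(10, 4)), ("time series", time_series_splits(10, 4))]:
        print(name)
        for train, val in splits:
            print("  train", train, "val", val)
```

Hyperparameters would then be chosen by training on each `train` index list, scoring on the matching `val` list, and keeping the setting with the lowest mean validation error.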

Anomaly Detection In Real Power Plant Vibration Data by MSCRED Base Model Improved By Subset Sampling Validation (Subset 샘플링 검증 기법을 활용한 MSCRED 모델 기반 발전소 진동 데이터의 이상 진단)

  • Hong, Su-Woong;Kwon, Jang-Woo
    • Journal of Convergence for Information Technology / v.12 no.1 / pp.31-38 / 2022
  • This paper applies MSCRED (Multi-Scale Convolutional Recurrent Encoder-Decoder), an expert-independent, unsupervised neural-network-based model for multivariate time series analysis. Because MSCRED is based on an autoencoder, its training data must not be contaminated; to overcome this limitation, the paper uses a training data sampling technique called Subset Sampling Validation. Using labeled vibration data from power plant equipment, the classification performance of MSCRED is evaluated with the anomaly score in two cases: 1) when abnormal data is mixed into the training data, and 2) when the abnormal data of case 1 is removed from the training data. On this basis, the paper presents an expert-independent anomaly diagnosis framework that is robust to erroneous data and offers a concise and accurate solution for multivariate time series data in various fields.

Bandwidth selections based on cross-validation for estimation of a discontinuity point in density (교차타당성을 이용한 확률밀도함수의 불연속점 추정의 띠폭 선택)

  • Huh, Jib
    • Journal of the Korean Data and Information Science Society / v.23 no.4 / pp.765-775 / 2012
  • Cross-validation is a popular method for selecting the bandwidth in all types of kernel estimation. Maximum likelihood cross-validation, least squares cross-validation and biased cross-validation have been proposed for bandwidth selection in kernel density estimation. For a probability density function with a discontinuity point, Huh (2012) proposed a bandwidth selection method using maximum likelihood cross-validation. In this paper, two forms of cross-validation with a one-sided kernel function are proposed for selecting the bandwidth used to estimate the location and jump size of the discontinuity point of a density. These methods are motivated by least squares cross-validation and biased cross-validation. Using simulated examples, the finite-sample performance of the two proposed methods is compared with that of the method of Huh (2012).