• Title/Abstract/Keywords: Data validation


An Evaluation Study on Artificial Intelligence Data Validation Methods and Open-source Frameworks

  • 윤창희;신호경;추승연;김재일
    • Journal of Korea Multimedia Society, Vol. 24 No. 10, pp. 1403-1413, 2021
  • In this paper, we investigate automated data validation techniques for artificial intelligence training and introduce open-source frameworks, such as Google's TensorFlow Data Validation (TFDV), that support automated data validation in the AI model development process. We also present an experimental study on public data sets that demonstrates the effectiveness of an open-source data validation framework. In particular, we present experimental results for the schema-testing functions and discuss the limitations of current open-source frameworks for semantic data. Finally, we introduce recent studies on semantic data validation using machine learning techniques.
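Schema testing, as described in the abstract above, compares every record against declared expectations. The sketch below is a minimal illustration under stated assumptions and does not use the TFDV API. `FeatureSpec` and `validate` are hypothetical helpers. Each feature declares a type, whether it is required, and an optional value domain. Each row is then checked for missing, mistyped, out-of-domain and unexpected features. TFDV itself infers such a schema from data statistics, but this example writes the schema by hand.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set, Tuple


@dataclass
class FeatureSpec:
    """Hypothetical per-feature expectation: type, presence, allowed values."""
    dtype: type
    required: bool = True
    domain: Optional[Set[Any]] = None


def validate(rows: List[Dict[str, Any]],
             schema: Dict[str, FeatureSpec]) -> List[Tuple[int, str, str]]:
    """Return (row index, feature, anomaly kind) for every schema violation."""
    anomalies = []
    for i, row in enumerate(rows):
        for name, spec in schema.items():
            value = row.get(name)
            if value is None:
                # Missing values are only an anomaly for required features.
                if spec.required:
                    anomalies.append((i, name, "missing"))
                continue
            if not isinstance(value, spec.dtype):
                anomalies.append((i, name, "wrong_type"))
            elif spec.domain is not None and value not in spec.domain:
                anomalies.append((i, name, "out_of_domain"))
        # Features the schema does not know about are flagged as well.
        for name in row:
            if name not in schema:
                anomalies.append((i, name, "unexpected_feature"))
    return anomalies


schema = {
    "age": FeatureSpec(int),
    "sex": FeatureSpec(str, domain={"M", "F"}),
    "note": FeatureSpec(str, required=False),
}
rows = [
    {"age": 34, "sex": "F"},
    {"age": "34", "sex": "X"},
    {"sex": "M", "extra": 1},
]
print(validate(rows, schema))
```

The first row passes. The second row is flagged for a string age and an unknown sex code. The third row is flagged for a missing age and an undeclared `extra` feature. Checks of this kind are syntactic. They cannot tell whether a well-typed value is semantically wrong, which is the limitation the abstract raises.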

Finding Unexpected Test Accuracy by Cross Validation in Machine Learning

  • Yoon, Hoijin
    • International Journal of Computer Science & Network Security, Vol. 21 No. 12spc, pp. 549-555, 2021
  • Machine learning (ML) typically splits data into three parts, usually 60% for training, 20% for validation, and 20% for testing. The split is purely quantitative: no subset is selected by a criterion, even though such a criterion is central to the adequacy of test data. ML measures a model's accuracy on the validation data and revises the model until the validation accuracy reaches a certain level. After validation, the final model is evaluated on the test data, which the model has not yet seen. If the test data cover the model's attributes well, the test accuracy should be close to the validation accuracy. To check whether ML test sets work adequately, we design an experiment that examines whether a model's test accuracy is always close to its validation accuracy, as expected. The experiment builds 100 different SVM models for each of six data sets published in the UCI ML repository. Across the 600 resulting pairs of test and validation accuracy, we find some unexpected cases where the test accuracy differs greatly from the validation accuracy. Consequently, an ML test set is not always adequate to assure a model's quality.
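The abstract above describes two steps: a purely proportional 60/20/20 split, and a search for cases where validation and test accuracy disagree. Both can be sketched in a few lines. This is an illustrative sketch only. The 0.1 threshold in `unexpected_cases` is a hypothetical cut-off, because the abstract does not state how "very different" was measured.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_60_20_20(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle, then split purely by proportion (no selection criterion)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 6 // 10  # integer arithmetic avoids float rounding
    n_val = n * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # any remainder goes to the test set
    return train, val, test


def unexpected_cases(results: Sequence[Tuple[float, float]],
                     threshold: float = 0.1) -> List[int]:
    """Indices of (validation_acc, test_acc) pairs differing by more than threshold."""
    return [i for i, (val_acc, test_acc) in enumerate(results)
            if abs(val_acc - test_acc) > threshold]


train, val, test = split_60_20_20(range(100))
print(len(train), len(val), len(test))  # 60 20 20
print(unexpected_cases([(0.90, 0.88), (0.92, 0.70), (0.85, 0.86)]))  # [1]
```

Because the split ignores content, different seeds can produce test sets that cover the data very differently. That variation is what the experiment probes by training many models per data set.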

Validation Data Augmentation for Improving the Grading Accuracy of Diabetic Macular Edema using Deep Learning

  • 이태수
    • Korean Society of Medical and Biological Engineering: Journal of Biomedical Engineering Research, Vol. 40 No. 2, pp. 48-54, 2019
  • This paper proposes a validation data augmentation method for improving the grading accuracy of diabetic macular edema (DME) using deep learning. Data augmentation is normally applied when preparing the input data of a deep neural network (DNN). It diversifies the data by transforming one image into several images through random translation, rotation, scaling and reflection. In this paper, we apply this technique in the validation process of the trained DNN and improve the grading accuracy by combining the classification results of the augmented images. To verify its effectiveness, 1,200 retinal images from the Messidor dataset were divided into training and validation data at a ratio of 7:3. Applying random augmentation to the 359 validation images with six-fold augmentation (N=6) improved accuracy by 1.61 ± 0.55%. This simple method improved accuracy for N ranging from 2 to 6, with a correlation coefficient of 0.5667. It is therefore expected that the grading information provided by the proposed DNN will help improve the diagnostic accuracy of DME.
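The idea in the abstract above is to classify several randomly augmented copies of each validation image and combine the results. The sketch below is a toy version with several assumptions. It uses flips and 90-degree rotations as stand-ins for the paper's random translation, rotation, scaling and reflection. It combines the N predictions by averaging class probabilities, since the abstract does not say how results were combined. The `brightness_classifier` is a hypothetical placeholder for the trained DNN.

```python
import random
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]


def hflip(img: Image) -> Image:
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    return [row[:] for row in img[::-1]]


def rot90(img: Image) -> Image:
    """Rotate 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def random_augment(img: Image, rng: random.Random) -> Image:
    transform = rng.choice([lambda x: x, hflip, vflip, rot90])
    return transform(img)


def tta_predict(img: Image,
                classify: Callable[[Image], Sequence[float]],
                n: int = 6,
                seed: int = 0) -> Tuple[int, List[float]]:
    """Classify n random augmentations and average their class probabilities."""
    rng = random.Random(seed)
    probs = [classify(random_augment(img, rng)) for _ in range(n)]
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / n for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: mean[c])
    return label, mean


def brightness_classifier(img: Image) -> List[float]:
    """Hypothetical stand-in for the trained DNN: P(class 1) = mean intensity."""
    pixels = [v for row in img for v in row]
    m = sum(pixels) / len(pixels)
    return [1.0 - m, m]


label, mean = tta_predict([[0.8, 0.9], [0.7, 0.6]], brightness_classifier, n=6)
print(label, mean)
```

Averaging probabilities is one common combination rule. Majority voting over predicted grades is another. Which one the paper used is not stated in the abstract.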

Rubber O-ring defect detection system using K-fold cross validation and support vector machine

  • 이용은;최낙준;변영후;김대원;김경천
    • Journal of the Korean Society of Visualization, Vol. 19 No. 1, pp. 68-73, 2021
  • In this study, rubber O-ring defects were detected using k-fold cross validation and a Support Vector Machine (SVM) algorithm. Data processing was carried out in three steps. First, frame alignment removed regions unnecessary for learning. Second, images were converted to grayscale to reduce computation. Finally, image augmentation was applied to prevent overfitting. After processing, the SVM algorithm was used to obtain detection accuracy for normal and defective parts. We also applied the SVM algorithm with k-fold cross validation to compare classification accuracy. The results show better performance when the k-fold cross validation method is applied.
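The k-fold cross validation mentioned above partitions the data into k folds. Each fold serves once as the validation set, and the k scores are averaged. A minimal index-level sketch follows. `train_and_score` is a hypothetical callback where the SVM would be fitted on the training indices and scored on the validation indices. The folds here are contiguous, so the data should be shuffled first. Libraries such as scikit-learn offer the same partitioning via `sklearn.model_selection.KFold`.

```python
from typing import Callable, Iterator, List, Tuple


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists; fold sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def cross_validate(n: int, k: int,
                   train_and_score: Callable[[List[int], List[int]], float]) -> float:
    """Mean score over k folds; train_and_score is a hypothetical callback
    that would fit the classifier on `train` and return accuracy on `val`."""
    scores = [train_and_score(train, val) for train, val in k_fold_indices(n, k)]
    return sum(scores) / len(scores)


for train, val in k_fold_indices(10, 3):
    print(len(train), val)
```

With n=10 and k=3 the validation folds have 4, 3 and 3 items. Every sample is validated exactly once, which is why the averaged score is less sensitive to one lucky or unlucky split.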

A Visual Approach for Data-Intensive Workflow Validation

  • Park, Minjae;Ahn, Hyun;Kim, Kwanghoon Pio
    • Journal of Internet Computing and Services, Vol. 17 No. 5, pp. 43-49, 2016
  • This paper presents a workflow validation method for data-intensive graphical workflow models that uses a real-time workflow tracing mode in a data-intensive workflow designer. To model and validate workflows, the designer is divided into two modes: an editable mode and a tracing mode. In editable mode, a data-intensive workflow can be designed by drag and drop. In tracing mode, the workflow model cannot be edited but can be viewed and traced. We focus on the tracing mode for workflow validation and describe how to use workflow tracing in the data-intensive workflow model designer. In particular, for workflow tracing it supports data-centered operations on control logic and on variables exchanged at workflow runtime.

OVERVIEW OF KOMPSAT APPLICATION PRODUCT VALIDATION SITE AND THE RELATED ACTIVITIES

  • Lee, Kwang-Jae;Youn, Bo-Yeol;Kim, Duk-Jin;Kim, Youn-Soo
    • Korean Society of Remote Sensing: Conference Proceedings, Proceedings of ISRS 2007, pp. 122-125, 2007
  • In recent years, there has been increasing demand for improved accuracy and reliability of Earth Observation Satellite (EOS) data. Most data users in the field of remote sensing need to understand product accuracy and uncertainty. In particular, EOS application products should be validated before practical use in the field. Evaluating the availability and applicability of application products requires a systematic validation system that includes techniques, equipment, ground truth data, and so on. The Product Validation Site (PVS) for generating and validating KOMPSAT application products was designed and established with various in-situ equipment and datasets. This paper presents the status of the PVS and summarizes some results from experimental studies conducted there.


Comparison of the Cluster Validation Techniques using Gene Expression Data

  • 정윤경;백장선
    • Korean Data and Information Science Society: Conference Proceedings, Proceedings of the 2006 Joint Conference of KDISS and KDAS, pp. 63-76, 2006
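  • Various clustering algorithms have been proposed for analyzing gene expression data, along with measures for verifying clustering results, namely cluster validation techniques. However, comparative evaluations of the performance of these cluster validation techniques are very rare. In this paper, we compare cluster validation techniques on simulated data constructed to represent several specific situations. We also compare and evaluate their performance on two real gene expression data sets.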
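One widely used internal cluster validation measure is the Silhouette index. For each point, a is the mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster. The point's score is s = (b - a) / max(a, b). The index averages s over all points, so values near 1 indicate compact, well-separated clusters. The sketch below is a plain-Python illustration of that definition, not the implementation used in the paper. Singleton clusters contribute 0, following the usual convention.

```python
import math
from typing import Hashable, List, Sequence


def silhouette(points: Sequence[Sequence[float]], labels: List[Hashable]) -> float:
    """Mean silhouette s(i) = (b - a) / max(a, b) over all points (needs >= 2 clusters)."""
    n = len(points)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / n


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette(pts, [0, 0, 1, 1]))  # about 0.90: good clustering
print(silhouette(pts, [0, 1, 0, 1]))  # negative: points sit closer to another cluster
```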


Design of an Algorithm for the Validation of SCL in Digital Substations

  • Jang, B.T.;Alidu, A.;Kim, N.D.
    • KEPCO Journal on Electric Power and Energy, Vol. 3 No. 2, pp. 89-97, 2017
  • The substation is a critical node in the power network where power is transformed in the power generation, transmission and distribution system. IEC 61850 is a global standard that enables efficient substation automation by defining interoperable communication and data modelling techniques. To achieve this level of interoperability and automation, IEC 61850 (Part 6) defines the System Configuration description Language (SCL). SCL is an XML-based file format for defining the abstract model of primary and secondary substation equipment, communication systems, and the relationships between them. It enables interoperable data exchange during substation engineering by standardizing the description of applications at different stages of the engineering process. To achieve seamless interoperability, multi-vendor devices must adhere completely to IEC 61850. This paper proposes an efficient algorithm for verifying the interoperability of multi-vendor devices by checking that an SCL file adheres to the specifications of the standard. Our proposed SCL validation algorithm consists of schema validation and other functionalities. These include information model validation using a UML data model, Vendor Defined Extension model validation, User Defined Rule validation, and IED Engineering Table (IET) consistency validation. It also integrates the standard UCAIUG (Utility Communication Architecture International Users Group) procedure validation for quality assurance testing. The proposed algorithm is flexible and efficient in ensuring the interoperable functionality of tested devices, and it is convenient for system integrators and test engineers to use.
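Beyond XSD schema validation, the abstract above lists rule-based checks on the SCL file. The sketch below shows what a small user-defined rule layer might look like, using only Python's standard `xml.etree.ElementTree`. The rules themselves are hypothetical examples: a `Header` element is present, and every `IED` has a non-empty, unique `name`. They are not the paper's rule set. Full XSD validation is out of scope here and would need a schema-aware parser such as lxml's `etree.XMLSchema`.

```python
import xml.etree.ElementTree as ET
from typing import List

SCL_NS = "http://www.iec.ch/61850/2003/SCL"  # SCL namespace from IEC 61850-6
NS = {"scl": SCL_NS}


def check_scl(xml_text: str) -> List[str]:
    """Apply a few illustrative rules to an SCL document; return error messages."""
    errors: List[str] = []
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{SCL_NS}}}SCL":
        return ["root element is not SCL in the IEC 61850 namespace"]
    if root.find("scl:Header", NS) is None:
        errors.append("missing Header element")
    seen = set()
    for ied in root.findall("scl:IED", NS):
        name = ied.get("name")
        if not name:
            errors.append("IED without a name attribute")
        elif name in seen:
            errors.append(f"duplicate IED name: {name}")
        else:
            seen.add(name)
    return errors


good = ('<SCL xmlns="http://www.iec.ch/61850/2003/SCL">'
        '<Header id="demo"/><IED name="IED1"/></SCL>')
bad = ('<SCL xmlns="http://www.iec.ch/61850/2003/SCL">'
       '<IED name="IED1"/><IED name="IED1"/><IED/></SCL>')
print(check_scl(good))
print(check_scl(bad))
```

Keeping each rule as a separate check makes it easy to add vendor- or user-defined rules without touching the schema layer.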

Comparison of the Cluster Validation Methods for High-dimensional (Gene Expression) Data

  • 정윤경;백장선
    • The Korean Journal of Applied Statistics, Vol. 20 No. 1, pp. 167-181, 2007
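  • Gene expression data are typical high-dimensional data. Various clustering algorithms have been proposed to analyze them, along with cluster validation techniques for verifying clustering results. However, comparisons and evaluations of the performance of these cluster validation techniques are very rare. In this paper, we compare the performance of cluster validation techniques on low-dimensional simulated data and on real gene expression data. Among internal measures, the Dunn index performed best, followed by the Silhouette index. Among external measures, the Jaccard index performed best.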
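The two measures singled out above can be stated compactly. The sketch below uses the classic definitions; the paper may have used a variant. The Dunn index is the smallest distance between points in different clusters divided by the largest within-cluster diameter, so higher is better. The pair-counting Jaccard index compares a clustering with reference labels as ss / (ss + sd + ds). Here ss counts pairs grouped together in both, sd counts pairs together only in the clustering, and ds counts pairs together only in the reference.

```python
import math
from itertools import combinations
from typing import Hashable, List, Sequence


def dunn_index(points: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """Min between-cluster point distance / max within-cluster diameter."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    groups = list(clusters.values())
    max_diameter = max(
        (math.dist(a, b) for g in groups for a, b in combinations(g, 2)),
        default=0.0,
    )
    if max_diameter == 0:
        raise ValueError("all clusters have zero diameter")
    min_separation = min(
        math.dist(a, b)
        for g1, g2 in combinations(groups, 2)
        for a in g1 for b in g2
    )
    return min_separation / max_diameter


def jaccard_index(labels_pred: Sequence[Hashable], labels_true: Sequence[Hashable]) -> float:
    """Pair-counting Jaccard index between a clustering and reference labels."""
    ss = sd = ds = 0
    for i, j in combinations(range(len(labels_pred)), 2):
        same_pred = labels_pred[i] == labels_pred[j]
        same_true = labels_true[i] == labels_true[j]
        if same_pred and same_true:
            ss += 1
        elif same_pred:
            sd += 1
        elif same_true:
            ds += 1
    return ss / (ss + sd + ds)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(dunn_index(pts, [0, 0, 1, 1]))             # 10.0
print(jaccard_index([0, 0, 0, 1], [0, 0, 1, 1]))  # 0.25
```

The Dunn index needs only the data, which makes it an internal measure. The Jaccard index needs known reference labels, which makes it an external measure. This matches the internal/external split in the abstract.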

Basic Principles of the Validation for Good Laboratory Practice Institutes

  • Cho, Kyu-Hyuk;Kim, Jin-Sung;Jeon, Man-Soo;Lee, Kyu-Hong;Chung, Moon-Koo;Song, Chang-Woo
    • Toxicological Research, Vol. 25 No. 1, pp. 1-8, 2009
  • Validation specifies and coordinates all relevant activities to ensure compliance with good laboratory practice (GLP) according to suitable international standards. This covers past, present and future validation activities, chosen as the best possible actions to ensure the integrity of non-clinical laboratory data. Recently, validation has become increasingly important not only in good manufacturing practice (GMP) institutions but also in GLP facilities. Under the GLP guidelines, all equipment used to generate, measure, or assess data should undergo validation to ensure that it is of appropriate design and capacity and will consistently function as intended. Implementing validation processes is therefore considered an essential step for a global institution. This review describes the procedures and documentation required for GLP validation. It introduces basic elements such as the validation master plan, risk assessment, gap analysis, design qualification, installation qualification, operational qualification, performance qualification, calibration, traceability, and revalidation.