Integrated Search | Korea Science

Finding Unexpected Test Accuracy by Cross Validation in Machine Learning

Yoon, Hoijin
- International Journal of Computer Science & Network Security
- Vol. 21, No. 12spc
- pp. 549-555
- 2021
Machine learning (ML) splits data into three parts, usually 60% for training, 20% for validation, and 20% for testing. The split is purely quantitative: the data in each set are not selected by a criterion, even though selection criteria are a very important concept for the adequacy of test data. ML measures a model's accuracy by applying a set of validation data and revises the model until the validation accuracy reaches a certain level. After the validation process, the completed model is tested with the set of test data, which the model has not yet seen. If the set of test data covers the model's attributes well, the test accuracy will be close to the validation accuracy of the model. To check whether ML's set of test data works adequately, we design an experiment and see whether the test accuracy of a model is always close to its validation accuracy, as expected. The experiment builds 100 different SVM models for each of six data sets published in the UCI ML repository. Among the test and validation accuracies of the 600 cases, we find some unexpected cases in which the test accuracy is very different from the validation accuracy. Consequently, it is not always true that ML's set of test data is adequate to assure a model's quality.
https://doi.org/10.22937/IJCSNS.2021.21.12.76
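
The procedure described in this abstract can be illustrated with a minimal sketch, assuming a plain random 60/20/20 split and a fixed tolerance for what counts as an "unexpected" gap. The function names, the 0.05 tolerance, and the accuracy pairs are hypothetical placeholders, not values from the paper, and the SVM training step is omitted.

```python
import random

def split_60_20_20(samples, seed=0):
    """Shuffle and split samples into 60% train, 20% validation, 20% test."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 60 // 100   # integer arithmetic avoids float rounding
    n_val = n * 20 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

def unexpected_cases(results, tolerance=0.05):
    """Return (validation_acc, test_acc) pairs whose gap exceeds the tolerance."""
    return [(v, t) for v, t in results if abs(v - t) > tolerance]

# Placeholder data: 100 labelled samples, split purely by quantity.
data = [(i, i % 2) for i in range(100)]
train, val, test = split_60_20_20(data)

# Placeholder accuracy pairs (illustrative only, not results from the paper).
results = [(0.95, 0.94), (0.92, 0.71), (0.88, 0.90)]
flagged = unexpected_cases(results)
```

In a real experiment, each pair in `results` would come from one trained model, and the tolerance would need to be chosen for the data set at hand.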

A Study of Optimal Ratio of Data Partition for Neuro-Fuzzy-Based Software Reliability Prediction

이상운
- 정보처리학회논문지D
- Vol. 8D, No. 2
- pp. 175-180
- 2001
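This paper studies the optimal ratio of validation data to allocate when a neuro-fuzzy system is used to accurately predict the future number of software failures or the failure times. Given training data, early stopping is commonly used to obtain the best generalization ability while avoiding both underfitting and overfitting. However, how much data to allocate to training and to validation must be determined empirically by trial and error, which takes excessive time. To find the optimal amount of validation data, various amounts of validation data were allocated while the number of rules was increased. The experiments showed good predictive ability even with a minimal amount of validation data. This result can provide practical guidance for applying neuro-fuzzy systems to the field of software reliability.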
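
The early stopping rule mentioned in this abstract can be sketched as follows. This is a generic patience-based variant, not necessarily the rule used in the paper; the function name, the `patience` parameter, and the error values are assumptions for illustration.

```python
def early_stopping_epoch(val_errors, patience=3):
    """Return the index of the epoch whose weights would be kept.

    Training stops once the validation error has failed to improve
    for `patience` consecutive epochs.
    """
    best_epoch = 0
    best_error = float("inf")
    waited = 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            # New best validation error: remember it and reset the counter.
            best_epoch, best_error, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop training; later epochs are never evaluated
    return best_epoch

# Illustrative validation errors per epoch (not from the paper).
errors = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.40]
stop = early_stopping_epoch(errors, patience=3)
```

Because the validation set drives the stopping decision, a smaller validation share leaves more data for training, which is the trade-off the paper investigates.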

A Study on Crowd Evacuation Simulation Validation Method using The Safeguard Validation Data Set (SGVDS) 1 and 2

이승현;이재민;김현철
- 한국안전학회지
- Vol. 39, No. 3
- pp. 50-59
- 2024
In recent years, building architecture has become increasingly complex and larger in scale to accommodate many people. In densely populated facilities, the interiors are becoming more intricate and high-rise, with narrow corridors, hallways, and stairs. This poses challenges for evacuating occupants in case of emergencies such as fires, making it crucial to assess the evacuation safety in advance. In evacuation safety research, there are significant limitations to theoretical studies owing to their association with crowd behavior and human evacuation characteristics, as well as the risks associated with experiments involving human participants. Consequently, evacuation experiments conducted using simulation-based methodologies are gaining recognition worldwide. However, crowd simulations face validation difficulties because of variations in crowd movement and evacuation characteristics across different cases and scenarios, as well as the challenge of accurately reflecting human characteristics during evacuations. In this study, we investigated validation methods for evacuation simulations using the SAFEGUARD validation data set (SGVDS) provided by the University of Greenwich, UK. The SGVDS collects data on crowd evacuations through actual evacuation tests conducted on ColorLine's large RO-PAX ferry and Royal Caribbean International's cruise ships. The accuracy of the crowd simulations can be validated by comparing SGVDS and crowd simulation results. This study will contribute to the development of highly accurate crowd simulations by verifying various crowd simulations.
https://doi.org/10.14346/JKOSOS.2024.39.3.50

Nondestructive Quantification of Intact Ambroxol Tablet using Near-infrared Spectroscopy

임현량;우영아;김도형;김효진;강신정;최현철;최한곤
- 약학회지
- Vol. 48, No. 1
- pp. 60-64
- 2004
Near-infrared (NIR) spectroscopy was used to determine rapidly and nondestructively the content of ambroxol in intact ambroxol tablets containing 30 mg (12.5% m/m nominal concentration) by collecting NIR spectra in the range 1100-1750 nm. The laboratory-made samples had 10.3-15.9% m/m nominal ambroxol concentration. The measurements were made by reflection using a fiber-optic probe, and calibration was carried out by partial least squares regression (PLSR) with autoscaling. Model validation was performed by randomly splitting the data set into a calibration data set and a validation data set (7 samples as a calibration data set and 5 samples as a validation data set). The developed NIR method gave results comparable to the known values of tablets in a laboratory manufacturing process, with a standard error of calibration (SEC) and a standard error of prediction (SEP) of 0.49% and 0.49% m/m, respectively. The method showed good accuracy and repeatability. NIR spectroscopic determination in intact tablets allowed the potential use of real-time monitoring for a running production process.

A Study on the Validation Test for Open Set Face Recognition Method with a Dummy Class

안정호;최권택
- 디지털콘텐츠학회 논문지
- Vol. 18, No. 3
- pp. 525-534
- 2017
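Open set recognition addresses the case in which not all classes of the test data can be known at training time. It therefore requires both a classification step and a validation step. Such research is essential for commercializing face recognition modules, yet few results have been published in Korea so far. We propose an open set face recognition method with two validation stages. In the first stage, dummy classes are set up in addition to the training classes and sparse-representation-based classification is performed. If a test sample is classified into a dummy class, it is judged to be invalid; if it is classified into a valid class, it proceeds to the next validation stage. In the second stage, four proposed features are extracted and validation is performed with a discriminant function based on probability distributions. Through experiments, we propose a simulation method for open set recognition, present the performance of the proposed method, and show that it outperforms the validation test based on the SCI index that is widely used in sparse-representation-based classification.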
https://doi.org/10.9728/dcs.2017.18.3.525

Quantification of Naproxen in Pharmaceutical Formulation using Near-Infrared Spectrometry

김도형;우영아;김효진
- 약학회지
- Vol. 49, No. 1
- pp. 1-5
- 2005
Near-infrared (NIR) spectroscopy has been widely applied in various fields, since it is nondestructive and requires no sample preparation. In this paper, NIR spectroscopy was used for the determination of naproxen in a commercial pharmaceutical preparation. NIR spectroscopy was used to determine the content of naproxen in intact naproxen tablets containing 250 mg (65.8% nominal concentration) by collecting NIR spectra in the range of 1100-1750 nm. The laboratory-made samples had 46.1-85.5% nominal naproxen concentration. The measurements were made by reflection using a fiber-optic probe, and calibration was carried out by partial least squares regression (PLSR). Model validation was performed by randomly splitting the data set into a calibration data set and a validation data set (63 samples as a calibration data set and 42 samples as a validation data set). The developed NIR calibration gave results comparable to the known values of tablets in a laboratory manufacturing process, with a standard error of calibration (SEC) and a standard error of prediction (SEP) of 1.06% and 1.04%, respectively. The NIR method showed good accuracy and repeatability. NIR spectroscopic determination in intact tablets allowed the potential use of real-time monitoring for a running production process.
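
The random calibration/validation split and the error figures in this abstract can be illustrated with a short sketch. It assumes the standard error is computed as a root-mean-square difference between reference and predicted concentrations; the abstract does not state the exact formula (some definitions subtract the bias or adjust the degrees of freedom), and the PLSR model itself is omitted. All names and values here are hypothetical.

```python
import math
import random

def standard_error(reference, predicted):
    """Root-mean-square difference between reference and predicted values."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))

def random_split(samples, n_calibration, seed=0):
    """Randomly split samples into calibration and validation sets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[:n_calibration], shuffled[n_calibration:]

# 105 placeholder sample IDs split 63/42, matching the counts in the abstract.
samples = list(range(105))
calibration, validation = random_split(samples, 63)

# Illustrative concentrations in percent (not data from the paper).
reference = [50.0, 60.0, 70.0, 80.0]
predicted = [51.0, 59.0, 71.0, 79.0]
sep = standard_error(reference, predicted)
```

Applying `standard_error` to the calibration samples would give an SEC-style figure, and applying it to the held-out validation samples would give an SEP-style figure.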

Setting an Initial Validation Gate based on Signal Intensity for Target Tracking in IR Image Sequences

양유경;김지은;이부환
- 한국군사과학기술학회지
- Vol. 17, No. 1
- pp. 108-114
- 2014
This paper describes a method for setting an intensity-based initial validation gate for a tracking filter while preserving the ability to track a target moving at maximum speed. First, we collected a real data set of signal versus distance for an airplane target. At each data point, we computed the maximum distance the target can move. A function was then modeled to predict the maximum number of pixels the target can move in the lateral direction, based on the intensity of the detected target in the IR image sequence. The initial prediction error covariance can be computed using this function to decide the size of the initial validation gate. The simulation results show that the proposed method can set appropriate initial validation gates to track targets moving at maximum speed.
https://doi.org/10.9766/KIMST.2014.17.1.108

Deformable image registration in radiation therapy

Oh, Seungjong;Kim, Siyong
- Radiation Oncology Journal
- Vol. 35, No. 2
- pp. 101-111
- 2017
The number of imaging data sets has significantly increased during radiation treatment after introducing a diverse range of advanced techniques into the field of radiation oncology. As a consequence, there have been many studies proposing meaningful applications of imaging data set use. These applications commonly require a method to align the data sets at a reference. Deformable image registration (DIR) is a process which satisfies this requirement by locally registering image data sets into a reference image set. DIR identifies the spatial correspondence in order to minimize the differences between two or among multiple sets of images. This article describes clinical applications, validation, and algorithms of DIR techniques. Applications of DIR in radiation treatment include dose accumulation, mathematical modeling, automatic segmentation, and functional imaging. Validation methods discussed are based on anatomical landmarks, physical phantoms, digital phantoms, and per application purpose. DIR algorithms are also briefly reviewed with respect to two algorithmic components: similarity index and deformation models.
https://doi.org/10.3857/roj.2017.00325

A Study on Training Ensembles of Neural Networks - A Case of Stock Price Prediction

이영찬;곽수환
- 지능정보연구
- Vol. 5, No. 1
- pp. 95-101
- 1999
This paper compares different methods of combining predictions from neural networks: bagging, bumping, and balancing. These methods are based on decomposing the ensemble generalization error into an ambiguity term and a term that incorporates the generalization performance of the individual networks. Neural networks and other machine learning models are prone to overfitting. One strategy to prevent a neural network from overfitting is to stop training at an early stage of the learning process. The complete data set is split into a training set and a validation set, and training is stopped when the error on the validation set starts increasing. The stability of the networks depends strongly on the division into training and validation sets, as well as on the random initial weights and the chosen minimization procedure. This makes early-stopped networks rather unstable: a small change in the data or different initial conditions can produce large changes in the prediction. Therefore, it is advisable to apply the same procedure several times, starting from different initial weights. This technique is often referred to as training ensembles of neural networks. In this paper, we present a comparison of three statistical methods for preventing overfitting of neural networks.
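
Of the three combination methods named in this abstract, bagging is the simplest to sketch: each ensemble member is trained on a bootstrap resample of the data and the predictions are averaged. The example below is a toy illustration under strong assumptions. The "network" is replaced by a stand-in that predicts the mean of its training sample, and all names and values are hypothetical. Bumping and balancing are not shown.

```python
import random
from statistics import mean

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def train_mean_model(sample):
    """Stand-in for a trained network: always predicts the sample mean."""
    m = mean(sample)
    return lambda: m

def bagged_prediction(data, n_models=25, seed=0):
    """Average the predictions of models trained on bootstrap resamples."""
    rng = random.Random(seed)
    models = [train_mean_model(bootstrap_sample(data, rng))
              for _ in range(n_models)]
    return mean(model() for model in models)

# Illustrative target values (not data from the paper).
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
prediction = bagged_prediction(targets)
```

Averaging over resamples reduces the sensitivity of the combined prediction to any single training/validation division, which is the instability the abstract describes.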

Deep Learning Model Validation Method Based on Image Data Feature Coverage

임창남;박예슬;이정원
- 정보처리학회논문지:소프트웨어 및 데이터공학
- Vol. 10, No. 9
- pp. 375-384
- 2021
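Deep learning techniques have demonstrated high performance in image processing and are applied in many fields. The most widely used methods for validating such deep learning models include holdout validation, k-fold cross validation, and the bootstrap method. When splitting a data set, these existing methods consider the balance of the class ratios, but they do not consider the ratios of the various features that exist within the same class. If these features are not considered, the validation results may be biased toward some features. This paper therefore improves on the existing validation methods and proposes a deep learning model validation method based on data feature coverage for image classification. The proposed method introduces data feature coverage, a numerical measure of how well the training data set and the evaluation data set used for training and validating a deep learning model reflect the features of the entire data set. With this approach, a data set can be split so that the coverage guarantees that all features of the entire data set are included, and the model's evaluation results can be analyzed per generated feature cluster. The validation results show that when the data feature coverage of the training data set decreases, the model learns with a bias toward particular features and its performance drops; for Fashion-MNIST, the accuracy differed by up to 8.9%.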
https://doi.org/10.3745/KTSDE.2021.10.9.375
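
One simple way to read the data feature coverage idea in this abstract is as the fraction of feature clusters in the full data set that also appear in a subset. The sketch below uses that reading as an assumption. The abstract does not give the paper's actual metric, and the cluster IDs and function name are hypothetical.

```python
def feature_coverage(subset_clusters, all_clusters):
    """Fraction of the full data set's feature clusters present in the subset."""
    universe = set(all_clusters)
    return len(set(subset_clusters) & universe) / len(universe)

# Hypothetical cluster ID per image, e.g. from clustering extracted features.
all_ids = [0, 0, 1, 1, 2, 2, 3, 3]
train_ids = [0, 0, 1, 2]          # cluster 3 is missing from this split

coverage = feature_coverage(train_ids, all_ids)   # 3 of 4 clusters covered
```

A split with coverage below 1.0 leaves some feature cluster unseen during training, which is the situation the paper associates with biased learning.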
