• Title/Summary/Keyword: cross-validation test

시계열 교차검증을 적용한 2,3-BDO 분리공정 온도예측 모델의 초매개변수 최적화 (Application of Time-series Cross Validation in Hyperparameter Tuning of a Predictive Model for 2,3-BDO Distillation Process)

  • 안나현;최영렬;조형태;김정환
    • Korean Chemical Engineering Research
    • /
    • Vol. 59, No. 4
    • /
    • pp.532-541
    • /
    • 2021
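  • With growing interest in artificial intelligence, research applying AI in the chemical process field has been increasing. However, overfitting occurs frequently: an AI-based model that is not sufficiently generalized predicts poorly on new data that was not used in training. Cross-validation is one way to address overfitting. In this study, time-series cross-validation was applied to tune two hyperparameters, the number of batches and the number of iterations, of a temperature prediction model for a 2,3-BDO separation process, and it was compared with the commonly used K-fold cross-validation. Compared with K-fold cross-validation, time-series cross-validation increased MAPE by 0.61% but decreased RMSE by 9.06%, and training took 198.29 seconds less.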
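
The difference between the two schemes compared in this abstract can be sketched as index splits. This is a minimal illustration, not the authors' code: the function names are hypothetical, the K-fold variant is assumed to be unshuffled, and the time-series variant is assumed to be an expanding window in which validation data always comes after the training data.

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs for contiguous, unshuffled K-fold CV."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        # Training uses every sample outside the validation block,
        # including samples that come later in time.
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, val
        start += size


def time_series_splits(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs for expanding-window time-series CV."""
    block = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = n_samples - (n_folds - k + 1) * block
        # Training uses only samples that precede the validation block.
        yield list(range(train_end)), list(range(train_end, train_end + block))


if __name__ == "__main__":
    for train, val in time_series_splits(12, 3):
        print("train", train, "val", val)
```

With K-fold, all but the last fold are validated by a model that has seen later samples; the expanding window never does, which is why it is often preferred for temporally ordered process data.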

Finding Unexpected Test Accuracy by Cross Validation in Machine Learning

  • Yoon, Hoijin
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 21, No. 12spc
    • /
    • pp.549-555
    • /
    • 2021
  • Machine learning (ML) typically splits data into three parts: 60% for training, 20% for validation, and 20% for testing. The split is purely quantitative rather than selecting each set of data by a criterion, even though such selection is a very important concept for the adequacy of test data. ML measures a model's accuracy by applying a set of validation data and revises the model until the validation accuracy reaches a certain level. After the validation process, the completed model is tested with the set of test data, which the model has not yet seen. If the set of test data covers the model's attributes well, the test accuracy will be close to the validation accuracy of the model. To check whether ML's set of test data works adequately, we design an experiment and see if the test accuracy of a model is always close to its validation accuracy, as expected. The experiment builds 100 different SVM models for each of six data sets published in the UCI ML repository. From the test and validation accuracies of the 600 cases, we find some unexpected cases where the test accuracy is very different from the validation accuracy. Consequently, it is not always true that ML's set of test data is adequate to assure a model's quality.
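
The purely quantitative 60/20/20 split described in this abstract might look like the sketch below. The function name, the shuffling step, and the seed parameter are assumptions made for illustration; the abstract does not say how the samples were assigned to each part.

```python
import random


def split_60_20_20(items, seed=0):
    """Shuffle items and cut them into 60% train, 20% validation, 20% test."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 6 // 10  # integer arithmetic avoids float rounding
    n_val = n * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_60_20_20(range(100))
    print(len(train), len(val), len(test))
```

Nothing in such a split checks whether the test part covers the same regions of the data as the validation part, which is the adequacy concern the abstract raises.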

실제 네트워크 모니터링 환경에서의 ML 알고리즘을 이용한 트래픽 분류 (Traffic Classification Using Machine Learning Algorithms in Practical Network Monitoring Environments)

  • 정광본;최미정;김명섭;원영준;홍원기
    • 한국통신학회논문지
    • /
    • Vol. 33, No. 8B
    • /
    • pp.707-718
    • /
    • 2008
  • Traffic classification methods are shifting from payload-based or port-based approaches to approaches based on ML algorithms in order to cope with dynamically changing applications. However, current research on traffic classification using ML algorithms is geared toward offline environments. In particular, existing studies perform traffic classification using cross validation as the testing method and present classification results on a per-flow basis. In this paper, we compare the accuracy of traffic classification when cross validation and split validation are used as the testing method. We also compare byte-based classification results with flow-based classification results. We classify traffic using ML algorithms such as J48, REPTree, RBFNetwork, Multilayer perceptron, BayesNet, and NaiveBayes together with various feature sets. Finally, we present the ML algorithm and feature set best suited to traffic classification using split validation.

Estimating Prediction Errors in Binary Classification Problem: Cross-Validation versus Bootstrap

  • Kim Ji-Hyun;Cha Eun-Song
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 13, No. 1
    • /
    • pp.151-165
    • /
    • 2006
  • It is important to estimate the true misclassification rate of a given classifier when an independent set of test data is not available. Cross-validation and bootstrap are two possible approaches in this case. In related literature bootstrap estimators of the true misclassification rate were asserted to have better performance for small samples than cross-validation estimators. We compare the two estimators empirically when the classification rule is so adaptive to training data that its apparent misclassification rate is close to zero. We confirm that bootstrap estimators have better performance for small samples because of small variance, and we have found a new fact that their bias tends to be significant even for moderate to large samples, in which case cross-validation estimators have better performance with less computation.
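
The setting in this abstract, a classifier so adaptive that its apparent (training-set) error is near zero, can be illustrated with a 1-nearest-neighbour rule. This is a sketch under stated assumptions: the helper names are hypothetical, features are one-dimensional, and the bootstrap estimator shown is the out-of-bag variant, which may differ from the estimators studied in the paper.

```python
import random


def nn_predict(train, x):
    """1-nearest-neighbour prediction; train is a list of (feature, label)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def error_rate(train, test):
    """Fraction of test points misclassified by 1-NN fitted on train."""
    return sum(nn_predict(train, x) != y for x, y in test) / len(test)


def loo_cv_error(data):
    """Leave-one-out cross-validation estimate of the misclassification rate."""
    errors = [
        nn_predict(data[:i] + data[i + 1:], x) != y
        for i, (x, y) in enumerate(data)
    ]
    return sum(errors) / len(errors)


def oob_bootstrap_error(data, n_boot=200, seed=0):
    """Average error on points left out of each bootstrap resample."""
    rng = random.Random(seed)
    n = len(data)
    rates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(idx)
        oob = [data[i] for i in range(n) if i not in chosen]
        if oob:  # skip the rare resample that contains every point
            rates.append(error_rate([data[i] for i in idx], oob))
    return sum(rates) / len(rates)


if __name__ == "__main__":
    data = [(0.0, 0), (0.1, 0), (0.2, 0), (1.0, 1), (1.1, 1), (1.2, 1)]
    print("apparent:", error_rate(data, data))
    print("LOO-CV:", loo_cv_error(data))
    print("OOB bootstrap:", oob_bootstrap_error(data))
```

Because each training point is its own nearest neighbour, the apparent error is zero whenever the feature values are distinct, so only the resampling estimates carry information about the true error.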

Cross-cultural Validation of Instruments Measuring Health Beliefs about Colorectal Cancer Screening among Korean Americans

  • Lee, Shin-Young;Lee, Eunice E.
    • 대한간호학회지
    • /
    • Vol. 45, No. 1
    • /
    • pp.129-138
    • /
    • 2015
  • Purpose: The purpose of this study was to report the instrument modification and validation processes to make existing health belief model scales culturally appropriate for Korean Americans (KAs) regarding colorectal cancer (CRC) screening utilization. Methods: Instrument translation, individual interviews using cognitive interviewing, and expert reviews were conducted during the instrument modification phase, and a pilot test and a cross-sectional survey were conducted during the instrument validation phase. Data analyses of the cross-sectional survey included internal consistency and construct validity using exploratory and confirmatory factor analysis. Results: The main issues identified during the instrument modification phase were (a) cultural and linguistic translation issues and (b) newly developed items reflecting Korean cultural barriers. Cross-sectional survey analyses during the instrument validation phase revealed that all scales demonstrate good internal consistency reliability (Cronbach's alpha=.72~.88). Exploratory factor analysis showed that susceptibility and severity loaded on the same factor, which may indicate a threat variable. Items with low factor loadings in the confirmatory factor analysis may relate to (a) lack of knowledge about fecal occult blood testing and (b) multiple dimensions of the subscales. Conclusion: Methodological, sequential processes of instrument modification and validation, including translation, individual interviews, expert reviews, pilot testing and a cross-sectional survey, were provided in this study. The findings indicate that existing instruments need to be examined for CRC screening research involving KAs.

자동기계학습 TPOT 기반 저수위 예측 정확도 향상을 위한 시계열 교차검증 기법 연구 (A Study on Time Series Cross-Validation Techniques for Enhancing the Accuracy of Reservoir Water Level Prediction Using Automated Machine Learning TPOT)

  • 배주현;박운지;이서로;박태선;박상빈;김종건;임경재
    • 한국농공학회논문집
    • /
    • Vol. 66, No. 1
    • /
    • pp.1-13
    • /
    • 2024
  • This study assessed the efficacy of improving the accuracy of reservoir water level prediction models by employing automated machine learning models and efficient cross-validation methods for time-series data. Considering the inherent complexity and non-linearity of time-series data related to reservoir water levels, we proposed an optimized approach for model selection and training. The performance of twelve models was evaluated for the Obong Reservoir in Gangneung, Gangwon Province, using the TPOT (Tree-based Pipeline Optimization Tool) and four cross-validation methods, which led to the determination of the optimal pipeline model. The pipeline model consisting of Extra Tree, Stacking Ridge Regression, and Simple Ridge Regression showed outstanding predictive performance for both training and test data, with an R2 (Coefficient of determination) and NSE (Nash-Sutcliffe Efficiency) exceeding 0.93. On the other hand, for predictions of water levels 12 hours later, the pipeline model selected through time-series split cross-validation accurately captured the change pattern of time-series water level data during the test period, with an NSE exceeding 0.99. The methodology proposed in this study is expected to greatly contribute to the efficient generation of reservoir water level predictions in regions with high rainfall variability.
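
For readers unfamiliar with the NSE metric reported in this abstract, the standard definition is one minus the ratio of the squared prediction error to the variance of the observations around their mean. The function below is a minimal sketch of that definition; the function and argument names are illustrative and are not taken from the paper.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


if __name__ == "__main__":
    levels = [1.0, 2.0, 3.0]
    print(nash_sutcliffe_efficiency(levels, levels))           # perfect fit
    print(nash_sutcliffe_efficiency(levels, [2.0, 2.0, 2.0]))  # mean predictor
```

An NSE of 1 means a perfect match, 0 means the model is no better than predicting the observed mean, and negative values mean it is worse.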

지진 취약성 평가 모델 교차검증: 경주(2016)와 포항(2017) 지진을 대상으로 (A Cross-Validation of Seismic Vulnerability Assessment Model: Application to Earthquake of 9.12 Gyeongju and 2017 Pohang)

  • 한지혜;김진수
    • 대한원격탐사학회지
    • /
    • Vol. 37, No. 3
    • /
    • pp.649-655
    • /
    • 2021
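  • Building on a previous study conducted for the city of Gyeongju, this study applies the optimal seismic vulnerability assessment model derived there to another region in order to cross-validate its performance. The test region is the city of Pohang, where the 2017 Pohang Earthquake occurred, and a dataset covering the same influencing factors and damage status as in the previous study was constructed. The validation dataset was built by random sampling and applied to the random forest (RF)-based model for Gyeongju to derive the prediction accuracy. The model (success) and prediction accuracies for Gyeongju were 100% and 94.9%, respectively, while applying the Pohang validation dataset yielded a prediction accuracy of 70.4%.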

A Study on the Land Cover Classification and Cross Validation of AI-based Aerial Photograph

  • Lee, Seong-Hyeok;Myeong, Soojeong;Yoon, Donghyeon;Lee, Moung-Jin
    • 대한원격탐사학회지
    • /
    • Vol. 38, No. 4
    • /
    • pp.395-409
    • /
    • 2022
  • The purpose of this study is to evaluate the classification performance and applicability of land cover datasets constructed for AI training when they are cross-validated against other areas. Gyeongsang-do and Jeolla-do in South Korea were selected as the cross-validation areas, and training datasets were obtained from AI-Hub. The datasets for each region were applied to U-Net, a semantic segmentation algorithm, and accuracy was evaluated on test areas in the same region and in the other region. Overall classification accuracy differed by about 13-15% between the same and other areas. For rice fields, fields, and buildings, accuracy was higher in the Jeolla-do test areas. For roads, accuracy was higher in the Gyeongsang-do test areas. In terms of the difference in accuracy by weight, applying the Gyeongsang-do weights gave high accuracy for forests, while applying the Jeolla-do weights gave high accuracy for dry fields. The land cover classification results showed that the classification performance of existing datasets differs by area. When constructing a land cover map for AI training, higher-quality datasets are expected if the characteristics of various areas are reflected. This study is highly scalable from two perspectives. First, it applies satellite images to AI research and to the field of land cover. Second, because it builds on satellite images, it can be extended to large-scale areas and to areas that are difficult to access.

평균제곱오차를 이용한 크리깅 근사모델의 오차 평가 (An Error Assessment of the Kriging Based Approximation Model Using a Mean Square Error)

  • 주병현;조태민;정도현;이병채
    • 대한기계학회논문집A
    • /
    • Vol. 30, No. 8
    • /
    • pp.923-930
    • /
    • 2006
  • A kriging model is a kind of approximation model and is used as a deterministic model of a computationally expensive analysis or simulation. Although it has various advantages, it is difficult to assess the accuracy of the approximated model. It is generally known that, unlike a response surface method, the mean square error (MSE) obtained from a kriging model cannot yield statistically exact error bounds, so cross validation is mainly used. But cross validation also has many uncertainties. Moreover, cross validation cannot be used when a maximum error is required in a given region. To solve this problem, we first propose a modified mean square error that can account for relative errors. Using the modified mean square error, we develop a strategy of adding a new sample at the location where the MSE is at its maximum when the MSE is used to assess the kriging model. Finally, we offer guidelines for the use of the MSE obtained from the kriging model. Four test problems show that the proposed strategy is a suitable method for assessing the accuracy of the kriging model. Based on the results of the four test problems, a convergence coefficient of 0.01 is recommended for an exact function approximation.

An Availability of Low Cost Sensors for Machine Fault Diagnosis

  • SON, JONG-DUK
    • 한국소음진동공학회:학술대회논문집
    • /
    • Proceedings of the 2012 Autumn Conference of the Korean Society for Noise and Vibration Engineering
    • /
    • pp.394-399
    • /
    • 2012
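  • MEMS sensors have recently attracted attention in machine condition monitoring because of their power consumption, size, cost, mobility, and range of applications. In particular, MEMS sensors can be integrated with smart sensors and can be mass-produced, which makes them inexpensive. Many experimental studies on their use for machine condition monitoring are under way. This paper describes a comparative study that uses MEMS sensors to evaluate the performance of three artificial intelligence classifiers. MEMS accelerometers and current sensors were attached to a rotating machine to acquire data, and cross validation was used for feature extraction and parameter optimization. The results indicate that MEMS sensors are suitable for use with fault classifiers.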
