• Title/Summary/Keyword: Cross validation technique


Deep Learning Model Validation Method Based on Image Data Feature Coverage (영상 데이터 특징 커버리지 기반 딥러닝 모델 검증 기법)

  • Lim, Chang-Nam;Park, Ye-Seul;Lee, Jung-Won
    • KIPS Transactions on Software and Data Engineering / v.10 no.9 / pp.375-384 / 2021
  • Deep learning techniques have proven highly effective in image processing and are applied in many fields. The most widely used methods for validating a deep learning model are holdout validation, k-fold cross-validation, and the bootstrap. These legacy methods balance the ratio between classes when dividing the data set, but they do not consider the ratio of the various features that exist within the same class. If these features are not considered, validation results may be biased toward some features. We therefore propose a deep learning model validation method for image classification, based on data feature coverage, that improves on the legacy methods. The method defines data feature coverage, a numerical measure of how well the data set used for training and validation and the data set used for evaluation reflect the features of the entire data set. With this method, the data set can be divided so that coverage guarantees that all features of the entire data set are included, and the evaluation results of the model can be analyzed per feature cluster. As a result, providing feature cluster information alongside the evaluation results of the trained model reveals which data features affect the trained model.
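The idea of splitting on within-class features as well as on class can be sketched as follows. This is only an illustration, not the authors' method: the abstract does not define the coverage metric, so `coverage` here is an assumed stand-in (the fraction of distinct class/feature-cluster pairs that a subset contains), and `stratified_folds`, the cluster labels, and the sample data are all hypothetical.

```python
from collections import defaultdict

def stratified_folds(keys, k):
    """Assign sample indices to k folds, round-robin within each stratum.

    keys[i] is the stratum of sample i, here a (class, feature_cluster) pair.
    """
    groups = defaultdict(list)
    for idx, key in enumerate(keys):
        groups[key].append(idx)
    folds = [[] for _ in range(k)]
    for members in groups.values():
        for j, idx in enumerate(members):
            folds[j % k].append(idx)
    return folds

def coverage(subset, keys):
    """Assumed metric: share of all distinct strata that appear in the subset."""
    return len({keys[i] for i in subset}) / len(set(keys))

# Hypothetical data: two classes, each with two feature clusters.
keys = [
    ("cat", "dark"), ("cat", "dark"), ("cat", "bright"), ("cat", "bright"),
    ("dog", "dark"), ("dog", "dark"), ("dog", "bright"), ("dog", "bright"),
]
folds = stratified_folds(keys, k=2)
fold_coverage = [coverage(fold, keys) for fold in folds]
```

Stratifying on the class label alone would balance cats and dogs but could still place every "dark" image in one fold; keying on the pair avoids that in this toy case.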

Automatic Detection of Type II Solar Radio Burst by Using 1-D Convolution Neutral Network

  • Kyung-Suk Cho;Junyoung Kim;Rok-Soon Kim;Eunsu Park;Yuki Kubo;Kazumasa Iwai
    • Journal of The Korean Astronomical Society / v.56 no.2 / pp.213-224 / 2023
  • Type II solar radio bursts show frequency drifts from high to low over time. They are known as a signature of coronal shocks associated with coronal mass ejections (CMEs) and/or flares, which cause abrupt changes in the space environment near the Earth (space weather). Early detection of type II bursts is therefore important for space weather forecasting. In this study, we develop a deep-learning (DL) model for the automatic detection of type II bursts. For this purpose, we adopted a 1-D convolutional neural network (CNN), as it is well suited to processing spatiotemporal information within the applied data set. To recognize type II bursts, we used a total of 286 radio burst spectrum images obtained by the Hiraiso Radio Spectrograph (HiRAS) from 1991 to 2012, along with 231 spectrum images without bursts from 2009 to 2015. The burst types were labeled manually according to their spectral features in an answer table. We then applied the 1-D CNN technique to the spectrum images using two filter windows of different sizes along the time axis. To develop the DL model, we randomly selected 412 spectrum images (80%) for training and validation. The training history shows that both training and validation losses dropped rapidly, while training and validation accuracies increased, within approximately 100 epochs. To evaluate the model's performance, we used 105 test images (20%) and a contingency table. The false alarm ratio (FAR) and critical success index (CSI) were 0.14 and 0.83, respectively. Furthermore, we confirmed this result with five-fold cross-validation, in which we randomly re-sampled five groups. The estimated mean FAR and CSI of the five groups were 0.05 and 0.87, respectively. For experimental purposes, we applied the proposed model to 85 HiRAS type II radio bursts listed in the NGDC catalogue from 2009 to 2016 and to 184 quiet (no burst) spectrum images taken before and after the type II bursts. As a result, our model successfully detected 79 (93%) of the type II events. This result demonstrates, for the first time, that the 1-D CNN algorithm is useful for detecting type II bursts.
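The two scores read off the contingency table can be computed as in the minimal sketch below. It assumes the standard forecast-verification definitions, FAR = false alarms / (hits + false alarms) and CSI = hits / (hits + misses + false alarms), which the abstract does not spell out. The function name and the counts are hypothetical and are not the paper's data.

```python
def far_and_csi(hits, misses, false_alarms):
    """Return (FAR, CSI) from the cells of a 2x2 contingency table.

    hits         : bursts present and detected
    misses       : bursts present but not detected
    false_alarms : detections with no burst present
    """
    if hits + false_alarms == 0 or hits + misses + false_alarms == 0:
        raise ValueError("scores are undefined for an empty table")
    far = false_alarms / (hits + false_alarms)
    csi = hits / (hits + misses + false_alarms)
    return far, csi

# Hypothetical counts, chosen only to illustrate the arithmetic.
far, csi = far_and_csi(hits=50, misses=5, false_alarms=5)
```

Correct rejections (quiet images classified as quiet) do not enter either score, so both stay informative when quiet samples dominate the test set.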

Quantile regression using asymmetric Laplace distribution (비대칭 라플라스 분포를 이용한 분위수 회귀)

  • Park, Hye-Jung
    • Journal of the Korean Data and Information Science Society / v.20 no.6 / pp.1093-1101 / 2009
  • Quantile regression has become a widely used technique for describing the distribution of a response variable given a set of explanatory variables. This paper proposes a novel model for quantile regression using a doubly penalized kernel machine with support vector machine iteratively reweighted least squares (SVM-IRWLS). For inference about the shape of a population distribution, the widely popular regression would be inadequate if the distribution is not approximately Gaussian. We present a likelihood-based approach to the estimation of the regression quantiles that uses the asymmetric Laplace density.
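The link between the asymmetric Laplace density and quantile estimation can be sketched in a few lines. The sketch assumes the usual parameterization f(y) = τ(1 − τ)/σ · exp(−ρ_τ((y − μ)/σ)) with check loss ρ_τ(u) = u(τ − 1[u < 0]), and it fits only a constant location μ; the paper's kernel machine and SVM-IRWLS procedure are not reproduced. All function names and the sample are hypothetical.

```python
import math

def check_loss(u, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def ald_neg_log_likelihood(y, mu, sigma, tau):
    """Negative log-likelihood of an i.i.d. sample under the asymmetric Laplace density."""
    log_norm = math.log(tau * (1.0 - tau) / sigma)
    return -len(y) * log_norm + sum(check_loss((yi - mu) / sigma, tau) for yi in y)

def ald_location_estimate(y, tau, sigma=1.0):
    """Minimize the negative log-likelihood over mu, searching the sample points only."""
    return min(y, key=lambda mu: ald_neg_log_likelihood(y, mu, sigma, tau))

# Hypothetical sample with one large value.
sample = [1.0, 2.0, 3.0, 4.0, 10.0]
median_estimate = ald_location_estimate(sample, tau=0.5)
upper_estimate = ald_location_estimate(sample, tau=0.9)
```

For fixed σ and τ, the only term that depends on μ is the summed check loss, so maximizing this likelihood is the same as solving the usual quantile regression problem.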


Quantification of an active ingredient in tablets by NIR transmission measurements

  • Niemoller, Andreas;Schmidt, Angela;Weis, Aaron;Weiler, Helmut
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference / 2001.06a / pp.4114-4114 / 2001
  • For the quality control of tablets, several parameters have to be checked. The most important one is the content of the active ingredient, which has to fall within a narrow range around the designated content. The only useful measurement mode is transmission, which provides information on the complete tablet. A measurement in diffuse reflectance would register only the surface, which is useless especially in the case of a coated tablet. In this work, tablets for a clinical study (placebo/verum studies) with very low concentrations of the active ingredient were measured. The concentration range was 0 to 6 mg with a total tablet weight of 105 mg, giving a highest concentration of the active component of 5.7% by weight. The spectroscopic distinction between the placebo and the low dosage forms with 0.25 and 0.5 mg of active agent in particular requires an extraordinarily accurate sampling technique. Using the VECTOR 22/N-T in transmission mode allows information to be collected from the complete tablet. A quantitative PLS model built on transmission spectra from the tablets described above shows that the active substance can be predicted with an RMSECV (root mean square error of cross validation) of 0.04% absolute for this special application. The results are compared with those of measurements in diffuse reflectance using different accessories.
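How an RMSECV figure is obtained can be shown with a minimal leave-one-out sketch. A one-variable least-squares line stands in for the PLS model, which is a deliberate simplification, and the abstract does not say which cross-validation scheme was used, so leave-one-out is an assumption. The function names and the data are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmsecv_leave_one_out(x, y):
    """Root mean square error of the predictions made for each held-out sample."""
    squared_errors = []
    for i in range(len(x)):
        # Refit the model without sample i, then predict sample i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        squared_errors.append((y[i] - (slope * x[i] + intercept)) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical calibration set: signal versus content (% by weight).
signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
content = [0.1, 1.0, 2.1, 2.9, 4.2, 5.0]
rmsecv = rmsecv_leave_one_out(signal, content)
```

Because every prediction comes from a model that never saw the predicted sample, RMSECV is normally larger than the error of a fit evaluated on its own calibration data.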


A mathematical spatial interpolation method for the estimation of convective rainfall distribution over small watersheds

  • Zhang, Shengtang;Zhang, Jingzhou;Liu, Yin;Liu, Yuanchen
    • Environmental Engineering Research / v.21 no.3 / pp.226-232 / 2016
  • Rainfall is one of the crucial factors that affect our environment. Rainfall data are important in water resources management, flood forecasting, and the design of hydraulic structures. However, such data are not available in some rural watersheds that lack rain gauges, so effective ways of interpolating the available records are needed. Despite the many widely used spatial interpolation methods, few studies have investigated rainfall center characteristics. Based on the theory that the spatial distribution of a convective rainfall event has a definite center with maximum rainfall, we present a mathematical interpolation method that estimates the convective rainfall distribution and indicates the rainfall center location and the center rainfall volume. We apply the method to three convective rainfall events on Santa Catalina Island, where reliable hydrological data are available. A cross-validation technique is used to evaluate the method. The results show that the method suffers from high relative error in two situations: 1) when estimating the minimum rainfall and 2) when estimating an external site. In all other situations, the method's performance is reasonable and acceptable. Since the method is based on a continuous function, it can provide distributed rainfall data for distributed hydrological models and indicate statistical characteristics of given areas via mathematical calculation.

IMPROVING THE ESP ACCURACY WITH COMBINATION OF PROBABILISTIC FORECASTS

  • Yu, Seung-Oh;Kim, Young-Oh
    • Water Engineering Research / v.5 no.2 / pp.101-109 / 2004
  • Aggregating information by combining forecasts from two or more forecasting methods is an alternative to using forecasts from a single method to improve forecast accuracy. This paper describes the development and use of a monthly inflow forecast model based on an optimal linear combination (OLC) of naive, persistence, and Ensemble Streamflow Prediction (ESP) forecasts. Using the cross-validation technique, the OLC model made 1-month-ahead probabilistic forecasts of the Chungju multi-purpose dam inflows for 15 years. For most of the verification months, the skill of the OLC forecast was superior to that of the individual forecast techniques. This study therefore demonstrates that OLC can improve the accuracy of the ESP forecast, especially during the dry season. This study also examined the value of the OLC forecasts in reservoir operations. Stochastic Dynamic Programming (SDP) derived the optimal operating policy for the Chungju multi-purpose dam, and the derived policy was simulated using the 15-year observed inflows. The simulation results showed that the SDP model that updated its probability from the new OLC forecast provided more efficient operation decisions than the conventional SDP model.
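The combination step and its cross-validated use can be sketched for the simplest case. This is a reduced illustration under stated assumptions: it combines two point forecasts rather than the paper's three probabilistic ones, constrains the weights to sum to one, and uses leave-one-year-out cross-validation. The abstract confirms none of these details, and the function names and numbers are hypothetical.

```python
def combination_weight(f1, f2, obs):
    """Least-squares weight w for the combined forecast w * f1 + (1 - w) * f2."""
    numerator = sum((a - b) * (y - b) for a, b, y in zip(f1, f2, obs))
    denominator = sum((a - b) ** 2 for a, b in zip(f1, f2))
    return numerator / denominator

def cross_validated_combination(f1, f2, obs):
    """Forecast each year with a weight fitted on all the other years."""
    combined = []
    for i in range(len(obs)):
        keep = [j for j in range(len(obs)) if j != i]
        w = combination_weight(
            [f1[j] for j in keep], [f2[j] for j in keep], [obs[j] for j in keep]
        )
        combined.append(w * f1[i] + (1.0 - w) * f2[i])
    return combined

# Hypothetical inflow forecasts from two methods and the observed inflows.
method_a = [1.0, 2.0, 3.0, 4.0]
method_b = [2.0, 1.0, 4.0, 3.0]
observed = [1.7, 1.3, 3.7, 3.3]
combined = cross_validated_combination(method_a, method_b, observed)
```

Fitting the weight without the year being forecast keeps the skill estimate honest, since the combination never sees the observation it is scored against.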


Estimating GARCH models using kernel machine learning (커널기계 기법을 이용한 일반화 이분산자기회귀모형 추정)

  • Hwang, Chang-Ha;Shin, Sa-Im
    • Journal of the Korean Data and Information Science Society / v.21 no.3 / pp.419-425 / 2010
  • Kernel machine learning is gaining popularity for analyzing large or high-dimensional nonlinear data. We use this technique to estimate a GARCH model for predicting the conditional volatility of stock market returns. GARCH models are usually estimated using maximum likelihood (ML) procedures, assuming that the data are normally distributed. In this paper, we show that GARCH models can be estimated using kernel machine learning, and that the kernel machine has higher predictive ability than ML methods and the support vector machine when estimating the volatility of fat-tailed financial time series data.
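The maximum likelihood baseline mentioned in the abstract rests on a variance recursion that is easy to sketch. The example assumes a GARCH(1,1) model with zero-mean returns and Gaussian errors; the abstract does not state the model order, and the kernel machine estimator itself is not shown. The function names, parameter values, and returns are hypothetical.

```python
import math

def garch11_variances(returns, omega, alpha, beta, initial_variance):
    """Conditional variances: s2[t] = omega + alpha * r[t-1]**2 + beta * s2[t-1]."""
    variances = [initial_variance]
    for t in range(1, len(returns)):
        variances.append(omega + alpha * returns[t - 1] ** 2 + beta * variances[t - 1])
    return variances

def gaussian_neg_log_likelihood(returns, variances):
    """Objective that ML estimation minimizes over (omega, alpha, beta)."""
    return 0.5 * sum(
        math.log(2.0 * math.pi * v) + r * r / v for r, v in zip(returns, variances)
    )

# Hypothetical returns and parameter values.
returns = [0.1, -0.2, 0.3]
variances = garch11_variances(
    returns, omega=0.01, alpha=0.1, beta=0.8, initial_variance=0.05
)
objective = gaussian_neg_log_likelihood(returns, variances)
```

The Gaussian term in the objective is where the normality assumption enters, which is why fat-tailed returns can degrade ML estimates.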

Bayesian Inversion of Gravity and Resistivity Data: Detection of Lava Tunnel

  • Kwon, Byung-Doo;Oh, Seok-Hoon
    • Journal of the Korean earth science society / v.23 no.1 / pp.15-29 / 2002
  • Bayesian inversion of gravity and resistivity data was performed to investigate a cavity structure appearing as a lava tunnel on Cheju Island, Korea. Dipole-dipole DC resistivity data were proposed as prior information for the gravity data, and we applied geostatistical techniques such as kriging and simulation algorithms to provide prior model information and a covariance matrix in the data domain. The inverted resistivity section was used for indicator variogram modeling at each threshold, which provided the spatial uncertainty needed to build a prior PDF by sequential indicator simulation. We also presented a more objective way to construct a data covariance matrix that reflects the state of the acquired field data, using the geostatistical technique of cross-validation. A Gaussian approximation was then adopted to infer the characteristics of the marginal distributions of the model parameters, and the Broyden update, for simple calculation of the sensitivity matrix, and SVD were applied. Cavity investigation by geophysical exploration is generally difficult, and success is hard to achieve. However, this unusual multiple interpretation approach showed remarkable improvement and stability compared with data-fit-alone results, and it suggests that Bayesian inversion can be applied to diverse geophysical inverse problems.

GS-MARS method for predicting the ultimate load-carrying capacity of rectangular CFST columns under eccentric loading

  • Luat, Nguyen-Vu;Lee, Jaehong;Lee, Do Hyung;Lee, Kihak
    • Computers and Concrete / v.25 no.1 / pp.1-14 / 2020
  • This study applies the multivariate adaptive regression splines (MARS) method to predict the ultimate load-carrying capacity (Nu) of rectangular concrete-filled steel tubular (CFST) columns subjected to eccentric loading. A database of 141 experimental results was collected from the available literature to develop the MARS model with a total of seven variables covering various geometrical and material properties: the width of the rectangular steel tube (B), the depth of the rectangular steel tube (H), the wall thickness of the steel tube (t), the length of the column (L), the cylinder compressive strength of the concrete (f'c), the yield strength of the steel (fy), and the load eccentricity (e). The proposed model combines the MARS algorithm with the grid search cross-validation technique (abbreviated here as GS-MARS) in order to determine the MARS parameters. A new explicit formulation was derived from MARS for the input variables listed above. The estimation accuracy of GS-MARS was compared with four available mathematical methods from current design codes: AISC, ACI-318, AS, and Eurocode 4. The results, in terms of criteria indices, indicated that the MARS model was much better than the available formulae.
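Grid search with cross-validation, the parameter-selection step in GS-MARS, can be sketched independently of MARS. In this illustration a one-variable ridge regression without intercept stands in for the MARS model, and its penalty `lam` stands in for the MARS parameters. The fold scheme, the grid, and the data are hypothetical, and none of this reproduces the paper's setup.

```python
def fit_ridge_1d(x, y, lam):
    """Stand-in model: slope of y = w * x with an L2 penalty lam on w."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

def cv_mean_squared_error(x, y, lam, k):
    """k-fold cross-validated mean squared error for one candidate value of lam."""
    total = 0.0
    for fold in range(k):
        train = [i for i in range(len(x)) if i % k != fold]
        held_out = [i for i in range(len(x)) if i % k == fold]
        w = fit_ridge_1d([x[i] for i in train], [y[i] for i in train], lam)
        total += sum((y[i] - w * x[i]) ** 2 for i in held_out)
    return total / len(x)

def grid_search_cv(x, y, grid, k=3):
    """Score every candidate in the grid and return the best one with all scores."""
    scores = {lam: cv_mean_squared_error(x, y, lam, k) for lam in grid}
    return min(scores, key=scores.get), scores

# Hypothetical noise-free data, so the unpenalized fit should win.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0 * xi for xi in x]
best_lam, scores = grid_search_cv(x, y, grid=[0.0, 1.0, 10.0])
```

Each candidate is scored only on samples held out from its own fit, so the selected value reflects predictive error rather than goodness of fit on the training data.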

Feature Analysis on Industrial Accidents of Manufacturing Businesses Using QUEST Algorithm

  • Leem, Young-Moon;Rogers, K.J.;Hwang, Young-Seob
    • International Journal of Safety / v.5 no.1 / pp.37-41 / 2006
  • The major objective of the statistical analysis of industrial accidents is to determine the safety factors, so that future accidents can be prevented or reduced by educating those who work in a given industrial field in safety management. So far, however, there has been no quantitative method for evaluating the danger related to industrial accidents. As a step toward a quantitative evaluation technique, this study therefore presents a feature analysis of industrial accidents in the manufacturing field using the QUEST algorithm. To analyze the features of industrial accidents, a retrospective analysis was performed on 10,536 subjects (10,313 injured people, 223 deaths). The sample was chosen from data on manufacturing businesses in Korea during a three-year period (2002 to 2004). This study used AnswerTree of SPSS, and the analysis results enabled us to determine the most important variables affecting injured people, such as the occurrence type, the company size, and the time of occurrence. The classification system adopted in the present study using the QUEST algorithm was also found to be quite reliable.