• Title/Summary/Keyword: Ensemble size


Assessment of the Prediction Derived from Larger Ensemble Size and Different Initial Dates in GloSea6 Hindcast (기상청 기후예측시스템(GloSea6) 과거기후 예측장의 앙상블 확대와 초기시간 변화에 따른 예측 특성 분석)

  • Kim, Ji-Yeong;Park, Yeon-Hee;Ji, Heesook;Hyun, Yu-Kyung;Lee, Johan
    • Atmosphere / v.32 no.4 / pp.367-379 / 2022
  • This paper evaluates the performance of the Korea Meteorological Administration (KMA) Global Seasonal forecasting system version 6 (GloSea6) by assessing the effects of a larger ensemble size and by testing different initial conditions for the hindcast on sub-seasonal to seasonal scales. The number of ensemble members is increased from 3 to 7. As the ensemble size increases, the Ratio of Predictable Components (RPC) approaches the appropriate signal magnitude. The interannual variability improves for all basic variables, mainly in the mid-to-high latitudes. Over East Asia, the improvements are most notable in the 500 hPa geopotential height and 850 hPa wind fields, suggesting the potential to improve prediction of the East Asian monsoon. Reliability also tends to improve more with increasing ensemble size in summer than in winter. To assess the effects of different initial conditions, the area-mean normalized bias and correlation coefficients of each basic variable are compared for hindcasts from four initial dates. In summer, performance is better when the initial date closest to the forecast time is used. On the seasonal scale, mainly in winter, it is better to use all four initial dates, which raises the maximum ensemble size to 672. Therefore, with a larger ensemble, it is most efficient to use two initial dates for 60-day predictions and four initial dates for 6-month predictions, similar to the current time-lagged ensemble method.

Assessment of the Prediction Performance of Ensemble Size-Related in GloSea5 Hindcast Data (기상청 기후예측시스템(GloSea5)의 과거기후장 앙상블 확대에 따른 예측성능 평가)

  • Park, Yeon-Hee;Hyun, Yu-Kyung;Heo, Sol-Ip;Ji, Hee-Sook
    • Atmosphere / v.31 no.5 / pp.511-523 / 2021
  • This study explores the optimal ensemble size for improving the prediction performance of the Korea Meteorological Administration's operational climate prediction system, the Global Seasonal forecast system version 5 (GloSea5). GloSea5 produces an ensemble of hindcast data using the stochastic kinetic energy backscattering version 2 (SKEB2) scheme and a time-lagged ensemble. An experiment increasing the hindcast ensemble from 3 to 14 members for four initial dates was performed, and the resulting changes in prediction performance were evaluated using the Root Mean Square Error (RMSE), Anomaly Correlation Coefficient (ACC), ensemble spread, and Ratio of Predictable Components (RPC). As the ensemble size increased, RMSE and ACC improved, most significantly in regions of high variability. The spread and RPC analyses also showed that the prediction accuracy of the system improved as the ensemble size increased. The closer the initial date was to the forecast period, the better the predictive performance. The results show that it is efficient to increase the ensemble to an appropriate size that takes the combination of initial dates into account.
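
The scores named above can be sketched for a single grid point as follows. This is a minimal Python illustration under assumed definitions: RMSE and an uncentered ACC are computed for the ensemble mean, and spread is taken as the square root of the mean within-ensemble (population) variance. The function name and the `ensemble[time][member]` layout are hypothetical, not taken from GloSea5.

```python
import math
import statistics


def ensemble_scores(ensemble, obs, clim):
    """Return (RMSE, ACC, spread) for an ensemble hindcast (assumed definitions).

    ensemble: hypothetical layout ensemble[time][member]; obs, clim: one value per time.
    """
    ens_mean = [statistics.fmean(members) for members in ensemble]
    # RMSE of the ensemble mean against observations.
    rmse = math.sqrt(statistics.fmean((f - o) ** 2 for f, o in zip(ens_mean, obs)))
    # ACC: uncentered correlation of forecast and observed anomalies from climatology.
    fa = [f - c for f, c in zip(ens_mean, clim)]
    oa = [o - c for o, c in zip(obs, clim)]
    acc = sum(x * y for x, y in zip(fa, oa)) / math.sqrt(
        sum(x * x for x in fa) * sum(y * y for y in oa)
    )
    # Spread: square root of the mean within-ensemble variance.
    spread = math.sqrt(statistics.fmean(statistics.pvariance(m) for m in ensemble))
    return rmse, acc, spread
```

To imitate an ensemble-size experiment, the same function can be evaluated on member subsets such as `[m[:k] for m in ensemble]` for increasing k up to 14.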

Typhoon Wukong (200610) Prediction Based on The Ensemble Kalman Filter and Ensemble Sensitivity Analysis (앙상블 칼만 필터를 이용한 태풍 우쿵 (200610) 예측과 앙상블 민감도 분석)

  • Park, Jong Im;Kim, Hyun Mee
    • Atmosphere / v.20 no.3 / pp.287-306 / 2010
  • An ensemble Kalman filter (EnKF) with the Weather Research and Forecasting (WRF) Model is applied to Typhoon Wukong (200610) to investigate how the performance of ensemble forecasts depends on the experimental configuration of the EnKF. In addition, ensemble sensitivity analysis is applied to the forecast and analysis ensembles generated by the EnKF to investigate whether it can serve as guidance for adaptive observations. Various configurations are tested by changing the model error, ensemble size, assimilation time window, covariance relaxation, and covariance localization in the EnKF. First, experiments using a different physical parameterization scheme for each ensemble member show a lower root mean square error than those using a single physics configuration for all forecast members, which implies that accounting for model error leads to better forecasts. A larger ensemble is also more beneficial than a smaller one. For the assimilation time window, the experiment using a less frequent window shows better results than the one using a more frequent window, which is associated with the availability of observational data in this study. Therefore, incorporating model error, a larger ensemble size, and a less frequent assimilation window into the EnKF improves the prediction of Typhoon Wukong (200610). Covariance relaxation and localization are relatively less beneficial to the forecasts than the factors mentioned above. The ensemble sensitivity analysis shows that the sensitive regions for adaptive observations can be determined from the sensitivity of the forecast measure of interest to the initial ensembles. In addition, the sensitivities calculated by the ensemble sensitivity analysis can be explained by the dynamical relationships among wind, temperature, and pressure.

Tree size determination for classification ensemble

  • Choi, Sung Hoon;Kim, Hyunjoong
    • Journal of the Korean Data and Information Science Society / v.27 no.1 / pp.255-264 / 2016
  • Classification is predictive modeling for a categorical target variable. Classification ensemble methods, which achieve better accuracy by combining multiple classifiers, have become a powerful machine learning and data mining paradigm. Well-known classification ensemble methods include boosting, bagging, and random forest. In this article, we assume that decision trees are used as the classifiers in the ensemble, and we hypothesize that tree size affects classification accuracy. To study how tree size influences accuracy, we performed experiments on twenty-eight data sets and compared the performance of four ensemble algorithms (bagging, double-bagging, boosting, and random forest) with different tree sizes.
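
To make the role of tree size concrete, the sketch below fits bagged decision trees whose size is controlled by a hypothetical `max_depth` parameter, using Gini impurity for splits and plurality voting over bootstrap replicates. It is a minimal pure-Python illustration of bagging only; double-bagging, boosting, random forest, and the paper's actual tree-size settings are not reproduced.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, max_depth):
    """Grow a binary classification tree; max_depth limits the tree size.

    Leaves are class labels; internal nodes are (feature, threshold, left, right),
    so labels must not be tuples.
    """
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every feature is constant: no split possible
        return majority
    _, f, t = best
    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_tree([X[i] for i in left_idx], [y[i] for i in left_idx], max_depth - 1),
        build_tree([X[i] for i in right_idx], [y[i] for i in right_idx], max_depth - 1),
    )


def predict_tree(node, x):
    # Descend until a leaf (a plain label) is reached.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def bagging(X, y, n_trees, max_depth, seed=0):
    """Fit n_trees depth-limited trees, each on a bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth))
    return trees


def predict_ensemble(trees, x):
    # Plurality vote over the trees' predictions.
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]
```

Comparing held-out accuracy of `bagging(X, y, n_trees, max_depth)` across several `max_depth` values is one simple way to probe how tree size affects an ensemble.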

Illegal Cash Accommodation Detection Modeling Using Ensemble Size Reduction (신용카드 불법현금융통 적발을 위한 축소된 앙상블 모형)

  • Lee, Hwa-Kyung;Han, Sang-Bum;Jhee, Won-Chul
    • Journal of Intelligence and Information Systems / v.16 no.1 / pp.93-116 / 2010
  • An ensemble approach is applied to detection modeling of illegal cash accommodation (ICA), a well-known type of fraudulent credit card use in Far East nations that has not been addressed in the academic literature. The performance of a fraud detection model (FDM) suffers from the imbalanced data problem, which can be remedied to some extent by using an ensemble of many classifiers. It is generally accepted that ensembles of classifiers produce better accuracy than a single classifier, provided there is diversity in the ensemble. Furthermore, recent research shows that it may be better to ensemble a selected subset of classifiers rather than all of the classifiers at hand. For effective detection of ICA, we adopt an ensemble size reduction technique that prunes the ensemble of all classifiers using accuracy and diversity measures. Diversity in an ensemble manifests itself as disagreement or ambiguity among its members. The data imbalance intrinsic to FDM affects our approach to ICA detection in two ways. First, we propose a training procedure with over-sampling methods to obtain diverse training data sets. Second, we use variants of accuracy and diversity measures that focus on the fraud class. We also calculate the diversity measure dynamically during Forward Addition and Backward Elimination. In our experiments, Neural Networks, Decision Trees, and Logit Regressions serve as the base models of the ensemble members, and the performance of homogeneous ensembles is compared with that of heterogeneous ensembles. The experimental results show that the reduced-size ensemble is, on average over the data sets tested, as accurate as the non-pruned version, which provides benefits in application efficiency and reduced ensemble complexity.
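
Forward Addition can be sketched as a greedy search that repeatedly adds the classifier giving the best combined accuracy-plus-diversity score. In this minimal Python sketch, plain accuracy of a majority vote and mean pairwise disagreement stand in for the paper's fraud-focused measures, and the `diversity_weight` trade-off is a hypothetical parameter; Backward Elimination (starting from all members and removing one at a time) is not shown.

```python
def majority_vote(preds, members):
    """Majority vote of the selected members; preds[m][j] is 0/1 (1 = fraud). Ties go to 0."""
    return [
        1 if 2 * sum(preds[m][j] for m in members) > len(members) else 0
        for j in range(len(preds[0]))
    ]


def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)


def disagreement(preds, members):
    """Mean pairwise disagreement rate among the selected members."""
    pairs = [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]
    if not pairs:
        return 0.0
    n = len(preds[0])
    return sum(
        sum(pa != pb for pa, pb in zip(preds[a], preds[b])) / n for a, b in pairs
    ) / len(pairs)


def _score(preds, y, members, diversity_weight):
    return accuracy(majority_vote(preds, members), y) + diversity_weight * disagreement(preds, members)


def forward_addition(preds, y, max_size, diversity_weight=0.5):
    """Greedily add the member that maximizes accuracy + weight * diversity."""
    selected, remaining = [], list(range(len(preds)))
    while remaining and len(selected) < max_size:
        best = max(remaining, key=lambda m: _score(preds, y, selected + [m], diversity_weight))
        selected.append(best)
        remaining.remove(best)
    return selected
```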

Evaluation of PNU CGCM Ensemble Forecast System for Boreal Winter Temperature over South Korea (PNU CGCM 앙상블 예보 시스템의 겨울철 남한 기온 예측 성능 평가)

  • Ahn, Joong-Bae;Lee, Joonlee;Jo, Sera
    • Atmosphere / v.28 no.4 / pp.509-520 / 2018
  • The performance of the newly designed Pusan National University Coupled General Circulation Model (PNU CGCM) Ensemble Forecast System, which produces 40 ensemble members for 12-month lead predictions, is evaluated and analyzed in terms of boreal winter temperature over South Korea (S. Korea). The influence of ensemble size on prediction skill is examined using the 40 ensemble members, and the results show that the spread of predictability is larger when the ensemble size is smaller. Moreover, the results suggest that more than 20 ensemble members are required to better predict the statistically significant interannual variability of wintertime temperature over S. Korea. The ensemble average (ENS) shows superior forecast skill compared with each individual ensemble member and has a significant temporal correlation with Automated Surface Observing System (ASOS) temperature at the 99% confidence level. In addition to the forecast skill for the interannual variability of wintertime temperature over S. Korea, the winter climatology around East Asia and the synoptic characteristics of warm (above-normal) and cold (below-normal) winters are reasonably captured by the PNU CGCM. For the categorical forecast with a 3×3 contingency table, the deterministic forecast generally performs better than the probabilistic forecast, except for warm winters (hit rate of the probabilistic forecast: 71%). It is also found that when the 40 ensemble members are concentrated in one of the three categories, the probabilistic forecast tends to have relatively high predictability. Conversely, when the ensemble members are distributed evenly across the categories, the predictability of the probabilistic forecast becomes lower.
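
The deterministic and probabilistic categorical forecasts compared above can be sketched as follows, assuming the tercile thresholds are given (in practice they would come from the hindcast climatology). Here the probabilistic forecast is summarized as its most probable category, and `hit_rate` is the proportion of cases on the diagonal of the 3×3 contingency table; both choices, and all function names, are assumptions for illustration rather than the paper's exact procedure.

```python
import statistics

CATEGORIES = ("below", "normal", "above")


def categorize(value, lower, upper):
    """Assign a value to a tercile category given assumed thresholds."""
    if value < lower:
        return "below"
    if value > upper:
        return "above"
    return "normal"


def tercile_forecast(members, lower, upper):
    """Return (category probabilities, most probable category, ensemble-mean category)."""
    counts = {c: 0 for c in CATEGORIES}
    for v in members:
        counts[categorize(v, lower, upper)] += 1
    probs = {c: counts[c] / len(members) for c in CATEGORIES}
    most_probable = max(CATEGORIES, key=probs.get)  # ties go to the earlier category
    deterministic = categorize(statistics.fmean(members), lower, upper)
    return probs, most_probable, deterministic


def hit_rate(forecasts, observed):
    """Fraction of cases on the diagonal of the 3x3 contingency table."""
    table = {f: {o: 0 for o in CATEGORIES} for f in CATEGORIES}
    for f, o in zip(forecasts, observed):
        table[f][o] += 1
    return sum(table[c][c] for c in CATEGORIES) / len(forecasts)
```

When members split evenly across categories, the category probabilities are flat and the most probable category carries little information, which matches the lower probabilistic predictability noted above.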

Double-Bagging Ensemble Using WAVE

  • Kim, Ahhyoun;Kim, Minji;Kim, Hyunjoong
    • Communications for Statistical Applications and Methods / v.21 no.5 / pp.411-422 / 2014
  • A classification ensemble method aggregates different classifiers obtained from training data to classify new data points. Voting algorithms are typical tools for summarizing the outputs of the classifiers in an ensemble. WAVE, proposed by Kim et al. (2011), is a weight-adjusted voting algorithm for ensembles of classifiers with an optimal weight vector. In this study, we applied the WAVE algorithm to the double-bagging method (Hothorn and Lausen, 2003) when constructing an ensemble, to see whether performance improves significantly. The results showed that double-bagging with the WAVE algorithm performs better than other ensemble methods that use plurality voting. In addition, double-bagging with the WAVE algorithm is comparable with the random forest ensemble method when the ensemble size is large.
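
Weighted voting generalizes plurality voting by summing classifier weights per predicted label. The sketch below shows only the voting step, assuming the weights are already available; how WAVE computes its optimal weight vector is not reproduced, and the function names are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one class label per classifier; weights: one nonnegative weight per classifier.
    """
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    return max(totals, key=totals.get)


def plurality_vote(predictions):
    # Plurality voting is the special case with equal weights.
    return weighted_vote(predictions, [1.0] * len(predictions))
```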

An Ensemble Classifier using Two Dimensional LDA

  • Park, Cheong-Hee
    • Journal of Korea Multimedia Society / v.13 no.6 / pp.817-824 / 2010
  • Linear Discriminant Analysis (LDA) has been successfully applied for dimension reduction in face recognition. However, LDA requires transforming a face image into a one-dimensional vector, and this process can discard the correlation information among neighboring pixels. In contrast, 2D-LDA uses 2D images directly without such a transformation and has been shown to be superior to traditional LDA. Nevertheless, 2D-LDA has some problems. First, it is difficult to determine the optimal number of feature vectors in the reduced-dimensional space. Second, the size of the rectangular windows used in 2D-LDA strongly affects classification accuracy, but there is no reliable way to determine an optimal window size. In this paper, we propose a new algorithm to overcome these problems in 2D-LDA. We adopt an ensemble approach that combines several classifiers obtained using various window sizes, and we also present a practical method for determining the number of feature vectors. Experimental results demonstrate that the proposed method overcomes the difficulty of choosing an optimal window size and number of feature vectors.

Sparsity Increases Uncertainty Estimation in Deep Ensemble

  • Dorjsembe, Uyanga;Lee, Ju Hong;Choi, Bumghi;Song, Jae Won
    • Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.373-376 / 2021
  • Deep neural networks have achieved near human-level results in various tasks and have become popular across the broad domain of artificial intelligence. Uncertainty estimation is in demand because deep learning models behave as black-box point estimators. A deep ensemble provides increased accuracy and estimated uncertainty; however, because its size grows linearly with the number of members, a deep ensemble becomes infeasible for memory-intensive tasks. To address this problem, we applied model pruning and quantization to a deep ensemble and analyzed the effect on uncertainty metrics. We empirically showed that the disagreement among ensemble members increases with pruning, which makes models sparser by zeroing irrelevant parameters. Increased disagreement implies increased uncertainty, which helps in making more robust predictions. Accordingly, an energy-efficient compressed deep ensemble is appropriate for memory-intensive and uncertainty-aware tasks.
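
The two ingredients above, pruning by zeroing parameters and measuring member disagreement, can be sketched in Python as follows. Magnitude pruning is assumed here as one common way of zeroing "irrelevant" parameters, and the disagreement rate is taken as the fraction of inputs on which members are not unanimous; the paper's exact pruning scheme, quantization step, and uncertainty metrics may differ, and the function names are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitudes."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:k])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


def disagreement_rate(member_predictions):
    """Fraction of inputs on which the ensemble members are not unanimous.

    member_predictions[m][j] is the predicted class of member m on input j.
    """
    n_inputs = len(member_predictions[0])
    split = sum(
        1 for j in range(n_inputs) if len({p[j] for p in member_predictions}) > 1
    )
    return split / n_inputs
```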

Estimation of bubble size distribution using deep ensemble physics-informed neural network (딥앙상블 물리 정보 신경망을 이용한 기포 크기 분포 추정)

  • Sunyoung Ko;Geunhwan Kim;Jaehyuk Lee;Hongju Gu;Kwangho Moon;Youngmin Choo
    • The Journal of the Acoustical Society of Korea / v.42 no.4 / pp.305-312 / 2023
  • A Physics-Informed Neural Network (PINN) is used to invert bubble size distributions from attenuation losses. Because the bubble population inversion can be posed as a linear system, the Adaptive Learned Iterative Shrinkage Thresholding Algorithm (Ada-LISTA), which has been used to solve linear systems in image processing, is adopted as the neural network architecture of the PINN. Furthermore, a regularization term based on the linear system is added to the PINN loss function, which improves generalization by favoring solutions that satisfy the bubble physics. To evaluate the uncertainty of the bubble estimates, a deep ensemble is adopted: 20 Ada-LISTAs with different initial values are trained on the same training dataset. When testing with attenuation losses different from those in the training dataset, the bubble size distribution and its uncertainty are given by the mean and variance of the 20 estimates, respectively. The deep ensemble Ada-LISTA inverts bubble size distributions better than the conventional convex optimization solver CVX.
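
The deep-ensemble aggregation described above (mean as the estimate, variance as the uncertainty) can be sketched as below. The `estimates[k][b]` layout and the function name are hypothetical choices, and population variance is assumed since the abstract does not say which estimator is used.

```python
import statistics


def deep_ensemble_estimate(estimates):
    """Combine the outputs of K independently trained networks.

    estimates: hypothetical layout estimates[k][b], the k-th network's estimate
    for bubble-size bin b. Returns the per-bin mean and per-bin variance.
    """
    per_bin = list(zip(*estimates))
    mean = [statistics.fmean(values) for values in per_bin]
    variance = [statistics.pvariance(values) for values in per_bin]
    return mean, variance
```

With the 20 trained networks described above, `estimates` would hold 20 rows, one per network.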