• Title/Summary/Keyword: Ensemble Prediction (앙상블 예측)


A Study on the Work-time Estimation for Block Erections Using Stacking Ensemble Learning (Stacking Ensemble Learning을 활용한 블록 탑재 시수 예측)

  • Kwon, Hyukcheon;Ruy, Wonsun
    • Journal of the Society of Naval Architects of Korea / v.56 no.6 / pp.488-496 / 2019
  • The estimation of block erection work time at a dock is one of the important factors in establishing and managing the total shipbuilding schedule. To predict the work time, it is natural to use the existing block erection data. Generally, the work time per unit is the product of a coefficient value, a quantity, and a product value. Previously, the work time per unit was determined statistically from unit load data. In this study, however, we estimate the work time per unit through the work time coefficient value of series ships using machine learning. In machine learning, the outcome depends mainly on how the training data are organized. Therefore, we use feature engineering to determine which items should be used as features and to check their influence on the result. To obtain the coefficient value of each block, we solve the problem with ensemble learning methods, which are widely used nowadays. Among the many ensemble techniques, the final model is constructed by stacking, combining existing ensemble models (Decision Tree, Random Forest, Gradient Boost, Square Loss Gradient Boost, XGBoost), and the accuracy is maximized by selecting three candidates among all the models. Finally, the results of this study are verified against the predicted total work time for one ship of the same series.
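The stacking setup described above can be sketched with scikit-learn. The base learners and meta-learner below are illustrative stand-ins (the paper selects three candidates among several boosted-tree models, and its real features come from block erection data, replaced here by synthetic numbers):

```python
# Minimal stacking-ensemble sketch for a work-time regression task.
# Synthetic data stands in for the paper's block erection features.
import numpy as np
from sklearn.ensemble import (
    StackingRegressor, RandomForestRegressor, GradientBoostingRegressor)
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))            # hypothetical block features
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 1.5]) + rng.normal(scale=0.3, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Three base candidates feed a meta-learner that combines their predictions.
stack = StackingRegressor(
    estimators=[
        ("dt", DecisionTreeRegressor(max_depth=5, random_state=0)),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=Ridge(),
)
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 3))  # R^2 of the stacked model
```

The meta-learner (a ridge regression here) learns how much to trust each base model's out-of-fold predictions, which is the mechanism that lets stacking outperform its best single member.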

Improvement of Disaster Prevention Performance Target Rainfall Considering Climate Change (기후변화를 고려한 방재성능목표 강우량 개선 방향)

  • Lee, Jeonghoon;Kim, Kyungmin;Kim, Sangdan
    • Proceedings of the Korea Water Resources Association Conference / 2018.05a / pp.175-175 / 2018
  • A large share of the major natural disasters occurring in Korea is flood damage caused by rainfall. Recently, such flood damage has developed into a new pattern of disaster driven by the increasing frequency of extreme rainfall events alongside climate change, and accordingly the government now carries out recovery projects under the concept of permanent restoration rather than restoration to the original state. However, because the effects of climate change are not reflected in design, repeated damage from future extreme rainfall caused by climate change is expected, so the existing method of setting disaster prevention performance target rainfall needs to be improved. Worldwide, Global Climate Models (GCMs) and Regional Climate Models (RCMs) are used to simulate these climate change phenomena. The Korea Meteorological Administration adopted HadGEM2-AO, the GCM of the UK Met Office Hadley Centre, to produce global climate change scenarios through the standard experimental framework of the CMIP5 international project. In addition, to produce climate change scenarios for the Korean Peninsula, the global scenarios are dynamically downscaled with the HadGEM3-RA model, and daily data are provided at a 12.5 km spatial resolution over the peninsula. However, spatio-temporal downscaling of these daily data is required for use at the basin or site scale. In this study, to propose a direction for improving disaster prevention performance target rainfall considering climate change, downscaled results produced by various research groups were collected and compared. Examining the future probabilistic projections produced by the various institutions, it was confirmed that different results are obtained depending on the downscaling methodology even when the same GCM data are used. Given the uncertainty of future projections, it is difficult to judge any particular methodology as superior, so an improvement direction using the ensemble mean is proposed. The results of this study consider only the rainfall characteristics of local governments nationwide; for coastal areas, additional measures will be needed that take sea level rise into account.


Evaluation of PNU CGCM Ensemble Forecast System for Boreal Winter Temperature over South Korea (PNU CGCM 앙상블 예보 시스템의 겨울철 남한 기온 예측 성능 평가)

  • Ahn, Joong-Bae;Lee, Joonlee;Jo, Sera
    • Atmosphere / v.28 no.4 / pp.509-520 / 2018
  • The performance of the newly designed Pusan National University Coupled General Circulation Model (PNU CGCM) Ensemble Forecast System, which produces 40 ensemble members for 12-month lead prediction, is evaluated and analyzed in terms of boreal winter temperature over South Korea (S. Korea). The influence of ensemble size on prediction skill is examined with the 40 ensemble members, and the result shows that the spread of predictability is larger when the ensemble size is smaller. Moreover, it is suggested that more than 20 ensemble members are required for better prediction of the statistically significant inter-annual variability of wintertime temperature over S. Korea. The ensemble average (ENS) shows superior forecast skill compared to each individual member and has a significant temporal correlation with Automated Surface Observing System (ASOS) temperature at the 99% confidence level. In addition to forecast skill for the inter-annual variability of wintertime temperature over S. Korea, the winter climatology around East Asia and the synoptic characteristics of warm (above normal) and cold (below normal) winters are reasonably captured by PNU CGCM. For the categorical forecast with a 3×3 contingency table, the deterministic forecast generally shows better performance than the probabilistic forecast, except for warm winters (hit rate of probabilistic forecast: 71%). It is also found that, when the 40 ensemble members are concentrated in one of the three categories, the probabilistic forecast tends to have relatively high predictability, whereas when the members are distributed evenly across the categories, the predictability of the probabilistic forecast becomes lower.
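The core claim above, that the ensemble average beats individual members because averaging cancels member noise, can be illustrated with a toy numpy computation. The numbers are synthetic, not PNU CGCM output:

```python
# Toy illustration: the correlation of the ensemble mean with
# "observations" usually exceeds that of individual members, because
# averaging over members cancels their independent noise.
import numpy as np

rng = np.random.default_rng(1)
years, members = 30, 40
signal = rng.normal(size=years)                  # predictable component
obs = signal + rng.normal(scale=0.5, size=years) # observations = signal + noise
# each member = common signal + its own independent noise
ens = signal + rng.normal(scale=1.0, size=(members, years))

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

member_skill = np.mean([corr(m, obs) for m in ens])
ens_mean_skill = corr(ens.mean(axis=0), obs)
print(member_skill, ens_mean_skill)
```

With 40 members, the noise variance of the ensemble mean shrinks by a factor of 40, which is why a larger ensemble also narrows the spread of predictability estimates noted in the abstract.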

Development of Deep Learning Ensemble Modeling for Cryptocurrency Price Prediction : Deep 4-LSTM Ensemble Model (암호화폐 가격 예측을 위한 딥러닝 앙상블 모델링 : Deep 4-LSTM Ensemble Model)

  • Choi, Soo-bin;Shin, Dong-hoon;Yoon, Sang-Hyeak;Kim, Hee-Woong
    • Journal of Information Technology Services / v.19 no.6 / pp.131-144 / 2020
  • As blockchain technology attracts attention, interest in cryptocurrency, which is received as a reward, is also increasing. Investments and transactions continue with expectations of the rising value of cryptocurrency. Accordingly, cryptocurrency price prediction has been attempted through artificial intelligence techniques and social sentiment analysis. The purpose of this paper is to develop a deep learning ensemble model for predicting the price fluctuations and one-day-lag price of cryptocurrency, based on the design science research method. The paper performs predictive modeling on Ethereum to make predictions more efficiently and accurately than existing models. It collects five years of data related to the Ethereum price and performs pre-processing through customized functions. In the model development stage, four LSTM models, which are efficient for time-series data processing, are used to build an ensemble model with the optimal combination of hyperparameters found during experimentation. Then, based on performance evaluation metrics, the superiority of the model is evaluated through comparison with other deep learning models. The results make a practical contribution as a model with high performance and predictive accuracy for cryptocurrency price and price-fluctuation prediction. They also make an academic contribution in that the work improves research quality by following design science research procedures for solving scientific problems and creating and evaluating new and innovative artifacts in the information systems field.

Development of AI-based Smart Agriculture Early Warning System

  • Hyun Sim;Hyunwook Kim
    • Journal of the Korea Society of Computer and Information / v.28 no.12 / pp.67-77 / 2023
  • This study represents innovative research conducted in a smart farm environment, developing a deep learning-based disease and pest detection model and applying it to an intelligent Internet of Things (IoT) platform to explore new possibilities for implementing digital agricultural environments. The core of the research was the integration of the latest ImageNet models, such as RegNet and EfficientNet, with techniques such as pseudo-labeling and preprocessing methods, to detect various diseases and pests in complex agricultural environments with high accuracy. To this end, ensemble learning techniques were applied to maximize the accuracy and stability of the model, and the model was evaluated using performance indicators such as mean Average Precision (mAP), precision, recall, accuracy, and box loss. Additionally, the SHAP framework was utilized to gain a deeper understanding of the model's prediction criteria, making the decision-making process more transparent. This analysis provided significant insights into how the model considers various variables to detect diseases and pests.

A Real-Time Stock Market Prediction Using Knowledge Accumulation (지식 누적을 이용한 실시간 주식시장 예측)

  • Kim, Jin-Hwa;Hong, Kwang-Hun;Min, Jin-Young
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.109-130 / 2011
  • One of the major problems in data mining is data size, as most data sets these days have huge volumes. Streams of data are normally accumulated into data storages or databases. Transactions on the internet, mobile devices, and ubiquitous environments produce streams of data continuously. Some data sets are simply buried unused inside huge data storage because of their size; others are quickly lost as soon as they are created because, for many reasons, they are not saved. How to use such large data sets, and how to use data on a stream efficiently, are challenging questions in the study of data mining. Stream data is a data set that is accumulated into data storage continuously from a data source, and in many cases its size becomes increasingly large over time. Mining information from such massive data takes too many resources, such as storage, money, and time. These characteristics of stream data make it difficult and expensive to store all the stream data accumulated over time. On the other hand, if one uses only recent data, or part of the data, to mine information or patterns, valuable and useful information can be lost. To avoid these problems, this study suggests a method that efficiently accumulates information or patterns, in the form of a rule set, over time. A rule set is mined from a data set in the stream, and this rule set is accumulated into a master rule set storage, which also serves as a model for real-time decision making. One of the main advantages of this method is that it takes much smaller storage space compared to the traditional method, which saves the whole data set. Another advantage is that the accumulated rule set is used directly as a prediction model: prompt response to user requests is possible at any time, since the rule set is always ready for making decisions. This makes real-time decision making possible, which is the greatest advantage of this method.
Based on theories of ensemble approaches, a combination of many different models can produce a prediction model with better performance. The consolidated rule set actually covers the whole data set, while the traditional sampling approach covers only part of it. This study uses stock market data, a heterogeneous data set whose characteristics vary over time. The indexes in stock market data can fluctuate differently whenever an event influences the stock market index; therefore the variance of the values in each variable is large compared to that of a homogeneous data set. Prediction with a heterogeneous data set is naturally much more difficult than with a homogeneous one, as it is harder to predict in unpredictable situations. This study tests two general mining approaches and compares their prediction performance with that of the suggested method. The first approach induces a rule set from the most recent data to predict a new data set. The second induces a rule set from all the data accumulated from the beginning, every time a new data set must be predicted. Neither is as good as the accumulated rule set method in performance. Furthermore, the study reports experiments with different prediction models: the first builds a prediction model with only the more important rules, while the second uses all the rules, assigning weights based on their performance. The second approach shows better performance. The experiments also show that the suggested method can be an efficient approach for mining information and patterns from stream data. The method has the limitation that its application here is bounded to stock market data; a more dynamic real-time stream data set is desirable for its application.
There is also another open problem in this study: as the number of rules increases over time, special rules such as redundant or conflicting rules have to be managed efficiently.
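A minimal sketch of the accumulation idea follows, using a dictionary as the master rule-set storage. The rule representation (a feature tuple mapped to a majority label) and the accuracy-based weighting are illustrative assumptions, not the paper's exact implementation:

```python
# Accumulate rules mined from each data chunk into a master rule set,
# weighting each rule by its observed accuracy, so the master set can
# be used directly as a prediction model at any time.
from collections import defaultdict

master = defaultdict(lambda: {"weight": 0.0, "label": None})

def mine_rules(chunk):
    """Stand-in miner: map each feature pattern to its majority label."""
    counts = defaultdict(lambda: defaultdict(int))
    for features, label in chunk:
        counts[features][label] += 1
    rules = {}
    for feats, labels in counts.items():
        best = max(labels, key=labels.get)
        rules[feats] = (best, labels[best] / sum(labels.values()))
    return rules

def accumulate(chunk):
    for feats, (label, acc) in mine_rules(chunk).items():
        rule = master[feats]
        rule["weight"] += acc          # reinforce recurring, accurate rules
        rule["label"] = label

def predict(feats, default="hold"):
    return master[feats]["label"] if feats in master else default

# The stream arrives chunk by chunk; only rules, not raw data, are stored.
accumulate([(("up", "high"), "buy"), (("down", "low"), "sell")])
accumulate([(("up", "high"), "buy"), (("up", "low"), "hold")])
print(predict(("up", "high")))
```

Storage grows with the number of distinct rules rather than with the raw stream, which is the space advantage the abstract describes; the open problem of pruning redundant or conflicting rules would live inside `accumulate`.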

Characteristics of Signal-to-Noise Paradox and Limits of Potential Predictive Skill in the KMA's Climate Prediction System (GloSea) through Ensemble Expansion (기상청 기후예측시스템(GloSea)의 앙상블 확대를 통해 살펴본 신호대잡음의 역설적 특징(Signal-to-Noise Paradox)과 예측 스킬의 한계)

  • Yu-Kyung Hyun;Yeon-Hee Park;Johan Lee;Hee-Sook Ji;Kyung-On Boo
    • Atmosphere / v.34 no.1 / pp.55-67 / 2024
  • This paper provides a detailed introduction to the concept of the Ratio of Predictable Component (RPC) and the Signal-to-Noise Paradox, and derives insights by exploring the paradoxical features through a seasonal and regional analysis with ensemble expansion in the KMA's climate prediction system (GloSea). We also explain the ensemble generation method, with a specific focus on stochastic physics. Through this study, we can establish the predictability limits of the forecasting system and find ways to enhance it. On a global scale, RPC reaches a value of 1 when the ensemble is expanded to a maximum of 56 members, underlining the significance of ensemble expansion in the climate prediction system. The paradoxical feature of RPC exceeding 1 becomes particularly evident in the winter North Atlantic and the summer North Pacific. Over the Siberian Continent, predictability is notably low and remains so even as the ensemble size increases; this region, characterized by a low RPC, is considered challenging for reliable prediction, highlighting the need for further improvement in the model and in the initialization of land processes. In contrast, the tropical ocean demonstrates robust predictability while maintaining an RPC of 1. Through this study, we draw attention to the limits of potential predictability within the climate prediction system, emphasizing the necessity of leveraging predictable signals with high RPC values, and we underscore the importance of continuous efforts to improve models and initializations to overcome these limitations.
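The RPC discussed above has a compact form: the actual skill (correlation of the ensemble mean with observations) divided by the model's own estimate of its predictable fraction (the ratio of ensemble-mean variance to total member variance, square-rooted). The toy numbers below are synthetic, not GloSea output:

```python
# Ratio of Predictable Component. RPC > 1 means the model predicts the
# real world better than it predicts itself: the signal-to-noise paradox.
import numpy as np

def rpc(obs, ens):
    """obs: array of shape (years,); ens: array of shape (members, years)."""
    ens_mean = ens.mean(axis=0)
    r = np.corrcoef(obs, ens_mean)[0, 1]          # actual predictability
    predictable_frac = np.sqrt(ens_mean.var() / ens.var())  # model's estimate
    return r / predictable_frac

rng = np.random.default_rng(2)
signal = rng.normal(size=30)
obs = signal + rng.normal(scale=0.3, size=30)
ens = signal + rng.normal(scale=1.0, size=(40, 30))
print(round(rpc(obs, ens), 2))
```

Because the ensemble-mean variance estimate stabilizes as members are added, expanding the ensemble (to 56 members in the paper) is what allows RPC to be measured reliably at all.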

Comparison of the Performance of Machine Learning Models for TOC Prediction Based on Input Variable Composition (입력변수 구성에 따른 총유기탄소(TOC) 예측 머신러닝 모형의 성능 비교)

  • Sohyun Lee;Jungsu Park
    • Journal of the Korea Organic Resources Recycling Association / v.32 no.3 / pp.19-29 / 2024
  • Total organic carbon (TOC) represents the total amount of organic carbon contained in water and is a key water quality parameter used, along with biochemical oxygen demand (BOD) and chemical oxygen demand (COD), to quantify the amount of organic matter in water. In this study, a model to predict TOC was developed using XGBoost (XGB), a representative ensemble machine learning algorithm. Independent variables for model construction included water temperature, pH, electrical conductivity, dissolved oxygen concentration, BOD, COD, suspended solids, total nitrogen, total phosphorus, and discharge. To quantitatively analyze the impact of the water quality parameters used in model construction, the feature importance of the input variables was calculated. Based on the feature importance analysis, items with low importance were sequentially excluded to observe changes in model performance; the resulting models showed a root mean squared error-observation standard deviation ratio (RSR) in the range of 0.53 to 0.55, and the model using all input variables performed best with an RSR of 0.53. To enhance the model's field applicability, models using relatively easily measurable parameters were also built and their performance analyzed. A model constructed using only water temperature, electrical conductivity, pH, dissolved oxygen concentration, and suspended solids had an RSR of 0.72, indicating that stable performance can be achieved using field water quality parameters that are relatively easy to measure.
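The exclusion procedure above can be sketched as follows. Scikit-learn's GradientBoostingRegressor stands in for XGBoost (which may not be installed), and the synthetic data replaces the actual water-quality measurements, so the variable names are only labels:

```python
# Rank input variables by feature importance, then drop the least
# important ones and refit to watch how performance changes.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
names = ["temp", "pH", "EC", "DO", "BOD", "COD", "SS", "TN", "TP", "Q"]
X = rng.normal(size=(400, len(names)))
# synthetic target: mostly driven by BOD and COD, weakly by temperature
y = 2 * X[:, 4] + 1.5 * X[:, 5] + 0.5 * X[:, 0] + rng.normal(scale=0.2, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

order = np.argsort(model.feature_importances_)   # least important first
for n_drop in (0, 3, 6):
    keep = order[n_drop:]
    m = GradientBoostingRegressor(random_state=0).fit(X_tr[:, keep], y_tr)
    print(f"dropped {n_drop}: R^2 = {m.score(X_te[:, keep], y_te):.3f}")
```

As in the paper, dropping low-importance inputs typically changes performance little at first, which is what justifies the cheaper field-measurable variable subset.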

A Characterization of Oil Sand Reservoir and Selections of Optimal SAGD Locations Based on Stochastic Geostatistical Predictions (지구통계 기법을 이용한 오일샌드 저류층 해석 및 스팀주입중력법을 이용한 비투멘 회수 적지 선정 사전 연구)

  • Jeong, Jina;Park, Eungyu
    • Economic and Environmental Geology / v.46 no.4 / pp.313-327 / 2013
  • In this study, three-dimensional geostatistical simulations were performed on the McMurray Formation, the largest oil sand reservoir in the Athabasca area, Canada, and the optimal site for steam assisted gravity drainage (SAGD) was selected based on the predictions. In the selection, the factors related to the vertical extendibility of the steam chamber were considered as the criteria for an optimal site. For the predictions, 110 borehole data acquired from the study area were analyzed in the Markovian transition probability (TP) framework, and three-dimensional distributions of the composing media were predicted stochastically through an existing TP-based geostatistical model. The potential of a specific medium at a position within the prediction domain was estimated from the ensemble probability based on the multiple realizations. From the ensemble map, the cumulative thickness of the permeable media (i.e., breccia and sand) was analyzed, and the locations with the highest potential for SAGD applications were delineated. As a supportive criterion for an optimal SAGD site, the mean vertical extension of a unit permeable medium was also delineated through transition-rate-based computations. The mean vertical extension of the permeable media shows rough agreement with the cumulative thickness in its general distribution. However, the distributions show distinctive disagreement at a few locations where the cumulative thickness was higher due to highly alternating juxtaposition of the permeable and less permeable media. This observation implies that the cumulative thickness alone may not be a sufficient criterion for an optimal SAGD site, and that the mean vertical extension of the permeable media needs to be jointly considered for sound selection.
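The ensemble-probability and cumulative-thickness steps can be sketched with numpy. The categorical realizations here are random stand-ins for the TP-based geostatistical simulations, and the facies coding is an illustrative assumption:

```python
# From multiple categorical realizations, estimate the per-cell
# probability of permeable facies, then sum that probability over depth
# to get an ensemble cumulative-thickness map.
import numpy as np

rng = np.random.default_rng(4)
n_real, nz, ny, nx = 50, 20, 8, 8
# hypothetical facies codes: 0 = mud (less permeable), 1 = sand, 2 = breccia
realizations = rng.integers(0, 3, size=(n_real, nz, ny, nx))

permeable = (realizations == 1) | (realizations == 2)
# ensemble probability of permeable media at each cell
p_perm = permeable.mean(axis=0)               # shape (nz, ny, nx)
# expected cumulative thickness (in cells) at each map location
cum_thickness = p_perm.sum(axis=0)            # shape (ny, nx)
best = np.unravel_index(cum_thickness.argmax(), cum_thickness.shape)
print(best, round(float(cum_thickness[best]), 2))
```

The paper's supporting criterion, the mean vertical extension of a permeable unit, would additionally require measuring run lengths of `True` values along the depth axis rather than just summing them, which is why the two maps can disagree where thin permeable layers alternate rapidly.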

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rest on strict assumptions: linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions of traditional statistics have limited their application to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and leads to high performance in practical applications. SVM implements the structural risk minimization principle and searches to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, so overfitting is unlikely to occur with SVM. SVM also does not require too many data samples for training, since it builds prediction models using only representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can potentially degrade SVM's performance.
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not improve performance in multi-class classification problems as much as SVM does in binary-class classification. Second, approximation algorithms (e.g. decomposition methods, the sequential minimal optimization algorithm) can be used for effective multi-class computation to reduce computation time, but they can deteriorate classification performance. Third, a difficulty in multi-class prediction problems is the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers the number of instances in another. Such data sets often cause a default classifier to be built due to the skewed boundary, and thus reduce classification accuracy. SVM ensemble learning is one machine learning approach to coping with these drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted; thus boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, it can perform the learning process considering the geometric mean-based accuracy and errors across classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. Ten-fold cross validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross validation, the entire data set is first partitioned into ten equal-sized sets, and each set is in turn used as the test set while the classifier trains on the other nine; that is, the cross-validated folds are tested independently for each algorithm. Through these steps, results were obtained for the classifiers in each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different; the results indicate that the performance of MGM-Boost differs significantly from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
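MGM-Boost itself is not in a public library, but the geometric-mean criterion it optimizes is easy to compute. The sketch below scores a standard scikit-learn AdaBoost classifier on an imbalanced synthetic three-class set; the data and metric function are illustrative, not the paper's experiment:

```python
# Geometric mean of per-class recalls: a balanced accuracy measure that
# collapses to 0 if any class is entirely missed, which is why it is a
# useful target on imbalanced multi-class data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

def geometric_mean_accuracy(y_true, y_pred):
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.prod(recalls) ** (1 / len(recalls)))

# imbalanced three-class problem (70% / 20% / 10%)
X, y = make_classification(
    n_samples=600, n_classes=3, n_informative=6,
    weights=[0.7, 0.2, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = AdaBoostClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
print(round(geometric_mean_accuracy(y_te, y_pred), 3))
```

Plain accuracy can look good while the minority class is ignored; the geometric mean penalizes exactly that failure mode, which is the gap MGM-Boost is designed to close.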