• Title/Summary/Keyword: optimal combination forecasts

Search Result 8

IMPROVING THE ESP ACCURACY WITH COMBINATION OF PROBABILISTIC FORECASTS

  • Yu, Seung-Oh;Kim, Young-Oh
    • Water Engineering Research / v.5 no.2 / pp.101-109 / 2004
  • Aggregating information by combining forecasts from two or more forecasting methods is an alternative to relying on a single method and can improve forecast accuracy. This paper describes the development and use of a monthly inflow forecast model based on an optimal linear combination (OLC) of naive, persistence, and Ensemble Streamflow Prediction (ESP) forecasts. Using cross-validation, the OLC model made 1-month-ahead probabilistic forecasts of the Chungju multi-purpose dam inflows for 15 years. For most of the verification months, the skill of the OLC forecast was superior to that of the individual forecast techniques. This study therefore demonstrates that OLC can improve the accuracy of the ESP forecast, especially during the dry season. This study also examined the value of the OLC forecasts in reservoir operations. Stochastic Dynamic Programming (SDP) derived the optimal operating policy for the Chungju multi-purpose dam, and the derived policy was simulated using the 15-year observed inflows. The simulation results showed that the SDP model that updated its probabilities using the new OLC forecast provided more efficient operating decisions than the conventional SDP model.
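
The idea of an optimal linear combination can be illustrated with a minimal sketch. It assumes a deterministic, two-forecast simplification (the paper combines three probabilistic forecasts), weights constrained to sum to one, and a leave-one-out form of cross-validation. The function names and the inflow values are hypothetical and are not taken from the paper.

```python
def optimal_weight(f1, f2, obs):
    """Least-squares weight w for the combination w*f1 + (1 - w)*f2."""
    num = sum((o - b) * (a - b) for a, b, o in zip(f1, f2, obs))
    den = sum((a - b) ** 2 for a, b in zip(f1, f2))
    if den == 0:
        return 0.5  # identical forecasts: every weight gives the same result
    return num / den


def combine(f1, f2, w):
    """Weighted combination of two forecast series."""
    return [w * a + (1 - w) * b for a, b in zip(f1, f2)]


def leave_one_out(f1, f2, obs):
    """Fit the weight without year i, then combine the forecasts for year i."""
    out = []
    for i in range(len(obs)):
        rest = [j for j in range(len(obs)) if j != i]
        w = optimal_weight([f1[j] for j in rest],
                           [f2[j] for j in rest],
                           [obs[j] for j in rest])
        out.append(w * f1[i] + (1 - w) * f2[i])
    return out


def mse(forecast, obs):
    """Mean squared error of a forecast series."""
    return sum((f - o) ** 2 for f, o in zip(forecast, obs)) / len(obs)


# Hypothetical monthly inflows (illustrative only, not data from the paper)
persistence = [120.0, 80.0, 150.0, 95.0, 110.0]
esp = [100.0, 90.0, 130.0, 105.0, 125.0]
observed = [108.0, 84.0, 141.0, 99.0, 121.0]

w = optimal_weight(persistence, esp, observed)
in_sample = combine(persistence, esp, w)
cross_validated = leave_one_out(persistence, esp, observed)
```

In-sample, the combined forecast can never have a larger squared error than either input, because weights of 0 and 1 reproduce the individual forecasts. The leave-one-out series is the fairer comparison, since each weight is fitted without the year being forecast.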


Hierarchical time series forecasting with an application to traffic accident counts (계층적 시계열 분석을 이용한 지역별 교통사고 발생건수 예측)

  • Lee, Jooeun;Seong, Byeongchan
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.181-193 / 2017
  • The paper introduces bottom-up and optimal combination methods that can analyze and forecast hierarchical time series. These methods allow forecasts at lower levels to be summed consistently to upper levels without any ad-hoc adjustment. They can also potentially improve forecast performance in comparison to independent forecasts. We forecast regional traffic accident counts as time series data in order to identify efficiency gains from hierarchical forecasting. We observe that bottom-up or optimal combination methods are superior to independent methods in terms of forecast accuracy.
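
The difference between the bottom-up and optimal combination approaches can be sketched for the smallest possible hierarchy, a total with two regions. This sketch assumes the ordinary least squares form of optimal combination, in which the base forecasts are projected onto the set of coherent forecasts; the abstract does not say which variant the authors used. The function names and numbers are hypothetical.

```python
def bottom_up(total_hat, a_hat, b_hat):
    """Ignore the total-level forecast and sum the regional forecasts."""
    return a_hat + b_hat, a_hat, b_hat


def optimal_combination_ols(total_hat, a_hat, b_hat):
    """OLS reconciliation for the hierarchy total = A + B.

    With summing matrix S = [[1, 1], [1, 0], [0, 1]], the reconciled bottom
    level is (S^T S)^{-1} S^T y_hat, where (S^T S)^{-1} = (1/3) [[2, -1], [-1, 2]].
    """
    u = total_hat + a_hat  # first entry of S^T y_hat
    v = total_hat + b_hat  # second entry of S^T y_hat
    a_tilde = (2 * u - v) / 3
    b_tilde = (2 * v - u) / 3
    return a_tilde + b_tilde, a_tilde, b_tilde


# Hypothetical incoherent base forecasts: 60 + 30 does not equal 100
print(bottom_up(100.0, 60.0, 30.0))
print(optimal_combination_ols(100.0, 60.0, 30.0))
```

Both results are coherent, meaning the regional values sum to the total. The bottom-up result discards the information in the total-level forecast, while the combination spreads the 10-unit discrepancy across all three series.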

A study on time series linkage in the Household Income and Expenditure Survey (가계동향조사 지출부문 시계열 연계 방안에 관한 연구)

  • Kim, Sihyeon;Seong, Byeongchan;Choi, Young-Geun;Yeo, In-kwon
    • The Korean Journal of Applied Statistics / v.35 no.4 / pp.553-568 / 2022
  • The Household Income and Expenditure Survey is a representative survey of Statistics Korea, which aims to measure and analyze national income and consumption levels and their changes by understanding the current state of household balances. Recently, the disconnection in these time series caused by the large-scale reorganization of the survey methods in 2017 and 2019 has become an issue. In this study, we model the characteristics of the time series in the Household Income and Expenditure Survey up to 2016, and use the models to compute forecasts for linking the expenditures in 2017 and 2018. In order to evenly reflect the characteristics across all expenditure item series and to reduce the impact of any specific forecast model, we synthesize a total of 8 models, including regression models, time series models, and machine learning techniques. A noteworthy aspect of this study is that it improves the forecasts by using the optimal combination technique, which can exactly reflect the hierarchical structure of the Household Income and Expenditure Survey without the loss of information that occurs in the top-down or bottom-up methods. Applying the proposed method to forecast the expenditure series from 2017 to 2019 contributed to the recovery of time series linkage and improved the forecasts. In addition, it was confirmed that the hierarchical time series forecasts from the optimal combination method bring the linkage results closer to the actual survey series.

A Baltic Dry Index Prediction using Deep Learning Models

  • Bae, Sung-Hoon;Lee, Gunwoo;Park, Keun-Sik
    • Journal of Korea Trade / v.25 no.4 / pp.17-36 / 2021
  • Purpose - This study provides useful information to stakeholders by forecasting the tramp shipping market, which is a perfectly competitive market with large fluctuations in freight rates due to low barriers to entry. Moreover, this study identifies the most effective parameters for Baltic Dry Index (BDI) prediction and an optimal model by analyzing and comparing deep learning models such as the artificial neural network (ANN), recurrent neural network (RNN), and long short-term memory (LSTM). Design/methodology - This study uses various data models based on big data. The deep learning models considered are specialized for time series data. This study includes three perspectives to verify useful models for time series data by comparing prediction accuracy according to the selection of external variables and by comparing models. Findings - This study uses weekly data from 1995 to 2019 (25 years) so that the BDI analysis reflects the latest trends since 2015. Additionally, we tried to find the best combination of BDI forecasts through the input of external factors such as supply, demand, raw materials, and economic aspects. Moreover, we sought to increase BDI prediction accuracy by combining various unpredictable external variables with the fundamentals of supply and demand. Originality/value - Unlike previous studies, the BDI forecasts reflect the latest stabilizing trends since 2015. Additionally, we look at how the model's predictive accuracy varies with the input of statistically validated variables. Moreover, we seek the optimal model that minimizes the error through parameter adjustment in the ANN model. Thus, this study helps future shipping stakeholders make decisions through BDI forecasts.

Optimal Multi-Model Ensemble Model Development Using Hierarchical Bayesian Model Based (Hierarchical Bayesian Model을 이용한 GCMs 의 최적 Multi-Model Ensemble 모형 구축)

  • Kwon, Hyun-Han;Min, Young-Mi;Hameed, Saji N.
    • Proceedings of the Korea Water Resources Association Conference / 2009.05a / pp.1147-1151 / 2009
  • In this study, we address the problem of producing probability forecasts of summer seasonal rainfall on the basis of hindcast experiments from an ensemble of GCMs (cwb, gcps, gdaps, metri, msc_gem, msc_gm2, msc_gm3, msc_sef and ncep). An advanced hierarchical Bayesian weighting scheme is developed and used to combine the seasonal hindcast ensembles of the nine GCMs. The hindcast period is 23 years, from 1981 to 2003. The simplest approach for combining GCM forecasts is to weight each model equally; this approach is referred to as the pooled ensemble. This study proposes a more complex approach that weights the models spatially and seasonally based on past model performance for rainfall. The Bayesian approach to multi-model combination determines the relative weight of each GCM with climatology as the prior. The weights are chosen to maximize the likelihood score of the posterior probabilities. The individual GCM ensembles, simple poolings of three and six models, and the optimally combined multi-model ensemble are compared.
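
The contrast between a pooled ensemble and likelihood-based weighting can be shown with a much simplified sketch. It is not the hierarchical Bayesian scheme of the paper. It assumes two models, a binary "wet season" event, a fixed climatological probability that receives whatever weight the models do not, and a coarse grid search in place of Bayesian estimation. All names and numbers are hypothetical.

```python
import math


def log_likelihood(probs, outcomes):
    """Sum of log probabilities assigned to what actually happened."""
    eps = 1e-12  # guards against log(0)
    return sum(math.log(max(p if o else 1.0 - p, eps))
               for p, o in zip(probs, outcomes))


def combine(weights, model_probs, clim):
    """Weighted mixture; the weight not given to the models goes to climatology."""
    w_clim = 1.0 - sum(weights)
    n = len(model_probs[0])
    return [sum(w * m[t] for w, m in zip(weights, model_probs)) + w_clim * clim
            for t in range(n)]


def best_weights(model_probs, outcomes, clim, steps=10):
    """Grid search over two model weights with w1 + w2 <= 1."""
    best_w, best_ll = None, -math.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps)
            ll = log_likelihood(combine(w, model_probs, clim), outcomes)
            if ll > best_ll:
                best_w, best_ll = w, ll
    return best_w, best_ll


# Hypothetical hindcast probabilities of a wet season and the observed outcomes
model_a = [0.8, 0.3, 0.6, 0.2, 0.7, 0.4]
model_b = [0.6, 0.5, 0.4, 0.3, 0.9, 0.2]
wet = [1, 0, 1, 0, 1, 0]
climatology = 1.0 / 3.0

weights, score = best_weights([model_a, model_b], wet, climatology)
pooled_score = log_likelihood(combine((0.5, 0.5), [model_a, model_b], climatology), wet)
```

Because equal weights and climatology-only are both points on the grid, the selected weights can never score worse than either of them on the hindcast data. Whether that advantage holds out of sample is a separate question that the grid search does not answer.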


Analyzing effect and importance of input predictors for urban streamflow prediction based on a Bayesian tree-based model

  • Nguyen, Duc Hai;Bae, Deg-Hyo
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.134-134 / 2022
  • Streamflow forecasting plays a crucial role in water resource control, especially in highly urbanized areas that are very vulnerable to flooding during heavy rainfall events. In addition to providing accurate predictions, evaluating the effects and importance of the input predictors can help water managers. Recently, machine learning techniques have been applied to model complex and nonlinear hydrological processes. However, these techniques have not properly considered the importance and uncertainty of the predictor variables. To address these concerns, we applied GA-BART, which integrates a genetic algorithm (GA) with the Bayesian additive regression tree (BART) model, to hourly streamflow forecasting and the analysis of input predictors. The Jungrang urban basin was selected as a case study, and a database was established based on 39 heavy rainfall events between 2003 and 2020 from the rain gauges and monitoring stations. For the goal of this study, we used a combination of inputs that included the areal rainfall of the subbasins at the current and previous time steps and the water level and streamflow of the stations at time step, for multistep-ahead streamflow predictions. An analysis of multiple datasets with different input predictors was performed to define the optimal set for streamflow forecasting. In addition, the GA-BART model could reasonably determine the relative importance of the input variables. The assessment might help water resource managers improve the accuracy of forecasts and early flood warnings in the basin.


Forecasting hierarchical time series for foodborne disease outbreaks (식중독 발생 건수에 대한 계층 시계열 예측)

  • In-Kwon Yeo
    • The Korean Journal of Applied Statistics / v.37 no.4 / pp.499-508 / 2024
  • In this paper, we investigate hierarchical time series forecasting that adheres to a hierarchical structure when deriving predicted values by analyzing segmented data as well as aggregated datasets. The occurrences of food poisoning by a specific pathogen are analyzed using zero-inflated Poisson regression models and negative binomial regression models. The occurrences of major, miscellaneous, and overall food poisoning are analyzed using Poisson regression models and negative binomial regression models. For hierarchical time series forecasting, the MinT estimation proposed by Wickramasuriya et al. (2019) is employed. Negative predicted values resulting from hierarchical adjustments are set to zero, and weights are applied to the remaining lowest-level variables to satisfy the hierarchical structure. Empirical analysis revealed little difference between hierarchical and non-hierarchical adjustments in predictions based on pathogens. However, hierarchical adjustments generally yield superior results for predictions concerning major, miscellaneous, and overall occurrences. Without hierarchical adjustment, the predicted frequencies of the lowest-level variables may exceed those of the major or miscellaneous occurrences. However, the proposed method yields predictions that adhere to the hierarchical structure.
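
The non-negativity step described in the abstract can be sketched as follows. The abstract does not specify how the weights are chosen, so this sketch assumes one particular scheme: negative lowest-level values are set to zero, and the remaining values are multiplied by a single common factor so that they again sum to the parent-level forecast, which is held fixed. The function name and numbers are hypothetical.

```python
def enforce_nonnegative(bottom, parent_total):
    """Clip negative bottom-level forecasts to zero, then rescale to the parent.

    Assumption: one common weight is applied to all remaining values.
    """
    clipped = [max(x, 0.0) for x in bottom]
    remaining = sum(clipped)
    if remaining == 0:
        return clipped  # nothing left to rescale
    weight = parent_total / remaining
    return [x * weight for x in clipped]


# Hypothetical reconciled counts for three pathogens under one parent category
reconciled = [5.0, -1.0, 6.0]  # sums to the parent forecast of 10.0
adjusted = enforce_nonnegative(reconciled, 10.0)
print(adjusted)
```

Clipping alone would raise the sum from 10 to 11 and break the hierarchy. The common weight of 10/11 restores the sum while keeping the clipped value at zero.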

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.141-154 / 2019
  • Internet technology and social media are growing rapidly. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish low-quality from high-quality content through text data about products, and it has proliferated within text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined categories such as positive and negative. It has been studied in various directions in terms of accuracy, from simple rule-based approaches to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active research areas in natural language processing and is widely studied in text mining. When real online reviews are openly available to others, information is not only easy to collect but also affects business. In marketing, real-world information from customers is gathered on websites, not through surveys. Whether a website's posts are positive or negative is reflected in sales through customer response, which motivates efforts to identify this information. However, many reviews on a website are not always good and are difficult to identify. Earlier studies in this area used review data from the Amazon.com shopping mall, whereas recent studies use data on stock market trends, blogs, news articles, weather forecasts, IMDB, Facebook, and so on. However, a lack of accuracy is recognized because sentiment calculations change according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify sentiment polarity into positive and negative categories and to increase the prediction accuracy of the polarity analysis using the pretrained IMDB review data set.
First, as comparative models, popular machine learning algorithms for text classification in sentiment analysis are adopted: NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and Gradient Boost. Second, deep learning has demonstrated the ability to extract complex, discriminative features from data. Representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but it does not consider sequential data attributes. RNN handles ordered data well because it takes into account the time information of the data, but it suffers from the long-term dependency problem. LSTM is used to solve this problem. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination. We also tried to determine how well the models work for sentiment analysis and how they work. This study proposes an integrated CNN and LSTM algorithm to extract the positive and negative features in text analysis. The reasons for combining these two algorithms are as follows. CNN can extract features for classification automatically by applying convolution layers and massively parallel processing. LSTM is not capable of highly parallel processing. Like faucets, the LSTM's input, output, and forget gates can be controlled at a desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the CNN's long-term dependency problem.
Furthermore, when LSTM is used in CNN's pooling layer, the model has an end-to-end structure, so that spatial and temporal features can be designed simultaneously. The combined CNN-LSTM model achieved 90.33% accuracy. It is slower than CNN but faster than LSTM. The presented model was more accurate than the other models. In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can compensate for the weaknesses of each model, and the end-to-end structure of LSTM has the advantage of improving learning layer by layer. For these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.
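
The input, output, and forget gates mentioned in the abstract can be illustrated with a single step of a scalar LSTM cell. This is the standard textbook formulation written in plain Python, not the authors' CNN-LSTM network. The parameter names and values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell.

    Each gate is a value between 0 and 1 that acts like a valve:
    forget decides how much old memory to keep, input decides how much
    new information to store, output decides how much memory to expose.
    """
    forget = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])
    inp = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])
    out = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])
    candidate = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])
    c = forget * c_prev + inp * candidate  # updated memory cell
    h = out * math.tanh(c)                 # updated hidden state
    return h, c


# Hypothetical parameters: forget gate wide open, input gate nearly shut
params = {
    "wf": 0.0, "uf": 0.0, "bf": 20.0,
    "wi": 0.0, "ui": 0.0, "bi": -20.0,
    "wo": 1.0, "uo": 0.5, "bo": 0.0,
    "wc": 1.0, "uc": 0.5, "bc": 0.0,
}
h, c = lstm_step(x=0.7, h_prev=0.1, c_prev=0.9, p=params)
```

With the forget gate close to 1 and the input gate close to 0, the memory cell passes through the step almost unchanged. This is the mechanism that lets an LSTM carry information across long sequences.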