• Title/Summary/Keyword: statistical forecast model

Forecasting daily PM10 concentrations in Seoul using various data mining techniques

  • Choi, Ji-Eun;Lee, Hyesun;Song, Jongwoo
    • Communications for Statistical Applications and Methods / v.25 no.2 / pp.199-215 / 2018
  • Interest in $PM_{10}$ concentrations has increased greatly in Korea due to recent increases in air pollution levels. We therefore consider a forecasting model for the next-day $PM_{10}$ concentration based on the principal elements of air pollution, weather information and Beijing $PM_{2.5}$. If the next-day $PM_{10}$ concentration level can be forecast accurately, the forecast can be useful for policy makers and the public. This paper is intended to help forecast the daily mean $PM_{10}$, the daily max $PM_{10}$ and the four stages of $PM_{10}$ provided by the Ministry of Environment using various data mining techniques. We use seven models to forecast the daily $PM_{10}$: five regression models (linear regression, random forest, gradient boosting, support vector machine, neural network) and two time series models (ARIMA, ARFIMA). The linear regression model performs best in the $PM_{10}$ concentration forecast, and the linear regression and random forest models perform best in the $PM_{10}$ class forecast. The results also indicate that $PM_{10}$ in Seoul is influenced by Beijing $PM_{2.5}$ and by air pollution from power stations on the west coast.

Error Forecasting Using Linear Regression Model

  • Ler, Lian Guey;Kim, Byung-Sik;Choi, Gye-Woon;Kang, Byung-Hwa;Kwang, Jung-Jae
    • Journal of Wetlands Research / v.13 no.1 / pp.13-23 / 2011
  • In this study, MIKE11 is used as the numerical model to which a data assimilation method is applied. This paper aims to gain insight into data assimilation in flood forecasting models. It starts with a general discussion of data assimilation, followed by a description of the methodology and a discussion of the statistical error forecast model used, which in this case is linear regression. This error forecast model is applied to the water level forecast simulated by MIKE11 to produce improved forecasts, which are validated against real measurements. A phase error is found in the improved forecasts. Hence, two general formulas are used to account for this phase error, and both improve the accuracy of the forecasts: one improves the immediate forecast up to 5 hours ahead, while the other improves the estimation of the peak discharge.
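
The error-forecast idea in this abstract can be sketched roughly as follows. This is only an illustration: the abstract does not say which predictors the regression uses, so the sketch assumes the simplest case, where the next error (observed minus simulated water level) is regressed on the previous error. The function names and the numbers are hypothetical.

```python
def fit_error_model(errors):
    """Regress e[t] on e[t-1] by ordinary least squares.

    Assumption: the previous error is the only predictor.
    Returns (intercept, slope).
    """
    x = errors[:-1]
    y = errors[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def corrected_forecast(simulated_next, last_error, intercept, slope):
    """Add the predicted error to the next simulated water level."""
    return simulated_next + intercept + slope * last_error


# Made-up water levels (m): the simulation error halves at every step.
observed = [2.00, 2.40, 2.70, 2.85]
simulated = [1.00, 1.90, 2.45, 2.725]
errors = [o - s for o, s in zip(observed, simulated)]

intercept, slope = fit_error_model(errors)
improved = corrected_forecast(3.0, errors[-1], intercept, slope)
```

With these made-up numbers the fitted slope is about 0.5, so half of the latest error is carried forward into the next corrected forecast.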

Improvement of WRF forecast meteorological data by Model Output Statistics using linear, polynomial and scaling regression methods

  • Jabbari, Aida;Bae, Deg-Hyo
    • Proceedings of the Korea Water Resources Association Conference / 2019.05a / pp.147-147 / 2019
  • Numerical Weather Prediction (NWP) models determine the future state of the weather by feeding current weather conditions into atmospheric models. NWP models approximate the physical dynamics mathematically with nonlinear differential equations; however, these approximations include uncertainties. The errors of NWP estimates can be related to the initial and boundary conditions and to model parameterization. Developments in meteorological forecast models have not solved the issues related to the inevitable biases. In spite of the efforts to incorporate all sources of uncertainty into the forecast, and regardless of the methodologies applied to generate the forecast ensembles, they are still subject to errors and systematic biases. Statistical post-processing increases the accuracy of the forecast data by decreasing the errors. Error prediction for NWP models, that is, updating the NWP model outputs (model output statistics), is one way to improve the model forecast. Regression methods (linear, polynomial and scaling regression) are applied in the present study to improve the real-time forecast skill. Such post-processing consists of two main steps. First, a regression is built between forecasts and measurements available during a certain training period; second, the regression is applied to new forecasts. In this study, the WRF real-time forecast data had systematic biases in comparison with the observed data; the errors related to the NWP model forecasts were reflected in the underestimation of the meteorological data forecast by the WRF model. The promising results will indicate that the post-processing techniques applied in this study improved the meteorological forecast data provided by the WRF model. A comparison between various bias correction methods will show the strengths and weaknesses of each method.
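
The two post-processing steps described in this abstract can be sketched as below. This is a minimal illustration, assuming the linear variant fitted by ordinary least squares. The function names `fit_linear_mos` and `apply_mos` and all numbers are hypothetical and not taken from the study.

```python
def fit_linear_mos(forecasts, observations):
    """Step 1: fit observation = intercept + slope * forecast on training data."""
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_o = sum(observations) / n
    sxx = sum((f - mean_f) ** 2 for f in forecasts)
    sxy = sum((f - mean_f) * (o - mean_o)
              for f, o in zip(forecasts, observations))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_f
    return intercept, slope


def apply_mos(new_forecasts, intercept, slope):
    """Step 2: apply the fitted regression to new raw forecasts."""
    return [intercept + slope * f for f in new_forecasts]


# Made-up training period in which the raw forecast underestimates.
train_forecast = [1.0, 2.0, 3.0, 4.0]
train_observed = [2.5, 4.5, 6.5, 8.5]  # equals 0.5 + 2 * forecast

intercept, slope = fit_linear_mos(train_forecast, train_observed)
corrected = apply_mos([5.0, 6.0], intercept, slope)
```

A polynomial or scaling variant would change only the form of the fitted relation. The fit-then-apply split stays the same.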

The Development of Ensemble Statistical Prediction Model for Changma Precipitation (장마 강수를 위한 앙상블 통계 예측 모델 개발)

  • Kim, Jin-Yong;Seo, Kyong-Hwan
    • Atmosphere / v.24 no.4 / pp.533-540 / 2014
  • Statistical forecast models for the prediction of summertime Changma precipitation have been developed in this study. As effective predictors for Changma precipitation, the springtime sea surface temperature (SST) anomalies over the North Atlantic (NA1), the North Pacific (NPC) and the tropical Pacific Ocean (CNINO) have been suggested in Lee and Seo (2013). To further improve the performance of the statistical prediction scheme, we select other potential predictors and construct two additional statistical models. The selected predictors are the Northern Indian Ocean (NIO) and Bering Sea (BS) SST anomalies and the spring Eurasian snow cover anomaly (EUSC). Then, using all three statistical prediction models, a simple ensemble-mean prediction is performed. The resulting correlation skill score reaches as high as ~0.90 for the last 21 years, a ~16% increase in skill compared to the prediction model of Lee and Seo (2013). The EUSC and BS predictors are related to a strengthening of the Okhotsk high, leading to an enhancement of the Changma front. The NIO predictor induces cyclonic anomalies to the southwest of the Korean peninsula and southeasterly flows toward the peninsula, giving rise to an increase in Changma precipitation.
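
A simple ensemble-mean prediction and its correlation skill can be sketched as follows. The three model series and the observations are made-up numbers chosen only to show the mechanics. The function names are hypothetical, and the skill score is assumed here to be the Pearson correlation between prediction and observation.

```python
import math


def ensemble_mean(predictions_by_model):
    """Average the predictions of several models, year by year."""
    n_models = len(predictions_by_model)
    return [sum(values) / n_models for values in zip(*predictions_by_model)]


def correlation(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up yearly predictions from three statistical models.
model_a = [1.0, 2.0, 3.0, 4.0]
model_b = [2.0, 1.0, 4.0, 3.0]
model_c = [0.0, 3.0, 2.0, 5.0]
observed = [1.0, 2.0, 3.0, 4.0]

combined = ensemble_mean([model_a, model_b, model_c])
skill_combined = correlation(combined, observed)
skill_b_alone = correlation(model_b, observed)
```

In this toy case the errors of `model_b` and `model_c` cancel in the average, so the combined series correlates better with the observations than `model_b` does alone. Real data need not behave this way.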

The roles of differencing and dimension reduction in machine learning forecasting of employment level using the FRED big data

  • Choi, Ji-Eun;Shin, Dong Wan
    • Communications for Statistical Applications and Methods / v.26 no.5 / pp.497-506 / 2019
  • The U.S. employment level is forecast using machine learning methods based on artificial neural networks: deep neural network, long short-term memory (LSTM) and gated recurrent unit (GRU). We consider the Federal Reserve Economic Data (FRED) big data, from which 105 important macroeconomic variables chosen by McCracken and Ng (Journal of Business and Economic Statistics, 34, 574-589, 2016) are used as predictors. We investigate the influence of two statistical issues, dimension reduction and time series differencing, on the machine learning forecast. An out-of-sample forecast comparison shows that LSTM and GRU with differencing perform better than the autoregressive model, and that dimension reduction improves long-term forecasts and some short-term forecasts.
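
Time series differencing, one of the two issues studied here, can be illustrated with a short sketch. It assumes first-order differencing. The function names and numbers are hypothetical, and the forecasting model itself is left out.

```python
def difference(series):
    """First-order differencing: change from each period to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def undifference(last_level, predicted_changes):
    """Turn forecast changes back into forecast levels by cumulative sums."""
    levels = []
    level = last_level
    for change in predicted_changes:
        level += change
        levels.append(level)
    return levels


# Made-up employment levels; a model would be trained on `changes`.
employment = [100, 102, 105, 109]
changes = difference(employment)

# Suppose a model forecasts the next two changes as 5 and 6.
forecast_levels = undifference(employment[-1], [5, 6])
```

A model trained on the differenced series forecasts changes, which are then accumulated from the last observed level to recover level forecasts.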

Forecast of Korea Defense Expenditures based on Time Series Models

  • Park, Kyung Ok;Jung, Hye-Young
    • Communications for Statistical Applications and Methods / v.22 no.1 / pp.31-40 / 2015
  • This study proposes a mathematical model that can forecast national defense expenditures. The ongoing European debt crisis weighs heavily on markets; consequently, government spending in many countries will be constrained. However, a forecasting model to predict military spending is acutely needed for South Korea because security threats still exist and the estimation of military spending at a reasonable level is closely related to economic growth. This study establishes two models: an Autoregressive Integrated Moving Average (ARIMA) model based on past military expenditures and a Transfer Function model with the Gross Domestic Product (GDP), exchange rate and consumer price index as input time series. The proposed models use defense spending data as of 2012 to create defense expenditure forecasts up to 2025.

Improvement of Wave Height Mid-term Forecast for Maintenance Activities in Southwest Offshore Wind Farm (서남권 해상풍력단지 유지보수 활동을 위한 중기 파고 예보 개선)

  • Ji-Young Kim;Ho-Yeop Lee;In-Seon Suh;Da-Jeong Park;Keum-Seok Kang
    • Journal of Wind Energy / v.14 no.3 / pp.25-33 / 2023
  • To secure the safety of increasing offshore activities such as offshore wind farm maintenance and fishing, IMPACT, a mid-term marine weather forecasting system that predicts marine weather up to 7 days in advance, was established. Forecast data from the Korea Hydrographic and Oceanographic Agency (KHOA), which provides the most reliable marine meteorological service in Korea, were used, but wind speed and wave height forecast errors grew with forecast lead time, so the accuracy of the model results needed improvement. The Model Output Statistics (MOS) method, a post-correction method using statistical machine learning, was applied to improve the prediction accuracy of wave height, which is an important factor in forecasting the risk of marine activities. Compared with the observed data, the uncorrected model wave height predictions for 6 to 7 days ahead showed an RMSE of 0.692 m and R of 0.591, with a tendency to underestimate high waves. After correction with the MOS technique, RMSE was 0.554 m and R was 0.732, confirming that accuracy was significantly improved.
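
The RMSE reported here, and the direction of a systematic error such as underestimation, can be computed as in the sketch below. The wave heights are made-up values, not data from the paper. The mean bias is added only as an assumed, simple way to show underestimation.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def mean_bias(predicted, observed):
    """Mean of (predicted - observed); negative means underestimation."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


# Made-up wave heights (m): the two highest waves are underestimated.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [2.0, 2.0, 5.0, 4.0]

error = rmse(predicted, observed)
bias = mean_bias(predicted, observed)
```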

A Combination and Calibration of Multi-Model Ensemble of PyeongChang Area Using Ensemble Model Output Statistics (Ensemble Model Output Statistics를 이용한 평창지역 다중 모델 앙상블 결합 및 보정)

  • Hwang, Yuseon;Kim, Chansoo
    • Atmosphere / v.28 no.3 / pp.247-261 / 2018
  • The objective of this paper is to compare probabilistic temperature forecasts from different regional and global ensemble prediction systems over the PyeongChang area. A statistical post-processing method is used to combine and calibrate forecasts from different numerical prediction systems, placing greater weight on the ensemble model that exhibits the best performance. Temperature observations from 30 stations in PyeongChang and three different ensemble forecasts, derived from the European Centre for Medium-Range Weather Forecasts, the Ensemble Prediction System for Global and the Limited Area Ensemble Prediction System, were obtained between 1 May 2014 and 18 March 2017. Prior to applying the post-processing methods, a reliability analysis was conducted to assess the statistical consistency of the ensemble forecasts and the corresponding observations. Ensemble model output statistics and bias-correction methods were then applied to each raw ensemble model and to the proposed weighted combination of ensembles. The results showed that the proposed methods perform better than the raw ensemble mean. In particular, the multi-model forecast based on ensemble model output statistics was superior to the bias-corrected forecast in terms of deterministic prediction.

Development of statistical forecast model for PM10 concentration over Seoul (서울지역 PM10 농도 예측모형 개발)

  • Sohn, Keon Tae;Kim, Dahong
    • Journal of the Korean Data and Information Science Society / v.26 no.2 / pp.289-299 / 2015
  • The objective of the present study is to develop a statistical quantitative forecast model for PM10 concentration over Seoul. We used three types of data: weather observation data in Korea, China's weather observation data collected by GTS, and air quality numerical model forecasts. To fit the daily forecast system, hourly data are converted to daily data and then lagged. The potential predictors were selected based on correlation analysis and a multicollinearity check. Model validation was performed to check model stability. We applied two models (a multiple regression model and a threshold regression model) separately. The two models were compared based on scatter plots of forecasts and observations, time series plots, RMSE and skill scores. As a result, the threshold regression model performs better than the multiple regression model in high PM10 concentration cases.
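
The preprocessing step mentioned in this abstract, converting hourly data to daily data and then lagging, can be sketched as below. The abstract does not say which daily statistic is used, so the daily mean is an assumption. The function names and values are hypothetical.

```python
def daily_means(hourly, hours_per_day=24):
    """Collapse a flat hourly series into daily means (complete days only)."""
    n_days = len(hourly) // hours_per_day
    return [
        sum(hourly[d * hours_per_day:(d + 1) * hours_per_day]) / hours_per_day
        for d in range(n_days)
    ]


def lagged_pairs(daily, lag=1):
    """Pair each day's value (target) with the value `lag` days earlier."""
    return [(daily[t - lag], daily[t]) for t in range(lag, len(daily))]


# Toy example with 2 "hours" per day to keep the numbers short.
hourly = [1, 3, 5, 7, 9, 11]
daily = daily_means(hourly, hours_per_day=2)
pairs = lagged_pairs(daily, lag=1)
```

Each pair holds a previous-day predictor and the same-day target, which is the shape a daily regression model would be fitted on.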

A Study on an Automatical BKLS Measurement By Programming Technology

  • Shin, YeounOuk;Kim, KiBum
    • International journal of advanced smart convergence / v.7 no.3 / pp.73-78 / 2018
  • This study presents an IT program module that provides the BKLS measure, in order to address the cost-of-capital problem caused by information asymmetry between external investors and corporate executives. Barron et al. (1998) set up the BKLS measure to guide the market through analysts acting as intermediaries. The BKLS measure is computed from changes in the analyst forecast dispersion and the squared error of the analyst mean forecast. This study suggests an algorithm model by which an IT program can provide the BKLS measure to all investors immediately, so that the measure delivers meaningful value in the domestic capital market. In this method, predicted estimates delivered to the Big Data Log Analysis System are transferred through the statistical DB to the statistical forecasting engine, where real-time or non-real-time prediction models are generated and analyzed. Because the BKLS measure has not been implemented in a concrete way, it is practically very difficult to estimate. It is expected that the BKLS measure of Barron et al. (1998) introduced in this study, together with the model of an IT module that provides it in real time, will be a starting point for follow-up studies on introducing and realizing IT technology in the future.
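
The two inputs to the BKLS measure named in this abstract, forecast dispersion and the squared error of the mean forecast, can be computed as in the sketch below. It assumes that dispersion is the sample variance of the individual analyst forecasts, and it does not reproduce the BKLS measure itself. The function names and numbers are hypothetical.

```python
def forecast_dispersion(forecasts):
    """Sample variance of the individual analyst forecasts (assumed definition)."""
    n = len(forecasts)
    mean = sum(forecasts) / n
    return sum((f - mean) ** 2 for f in forecasts) / (n - 1)


def squared_error_of_mean(forecasts, actual):
    """Squared difference between the actual value and the mean forecast."""
    mean = sum(forecasts) / len(forecasts)
    return (actual - mean) ** 2


# Made-up earnings forecasts from three analysts and the realized value.
analyst_forecasts = [1.0, 2.0, 3.0]
actual_earnings = 2.5

dispersion = forecast_dispersion(analyst_forecasts)
squared_error = squared_error_of_mean(analyst_forecasts, actual_earnings)
```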