• Title/Summary/Keyword: Statistical Forecasting

Real-time private consumption prediction using big data (빅데이터를 이용한 실시간 민간소비 예측)

  • Seung Jun Shin;Beomseok Seo
    • The Korean Journal of Applied Statistics / v.37 no.1 / pp.13-38 / 2024
  • As economic uncertainties have increased recently due to COVID-19, there is a growing need to quickly grasp private consumption trends that directly reflect the economic situation of private economic entities. This study proposes a method of estimating private consumption in real-time by comprehensively utilizing big data as well as existing macroeconomic indicators. In particular, it is intended to improve the accuracy of private consumption estimation by comparing and analyzing various machine learning methods that are capable of fitting ultra-high-dimensional big data. As a result of the empirical analysis, it has been demonstrated that when the number of covariates including big data is large, variables can be selected in advance and used for model fit to improve private consumption prediction performance. In addition, as the inclusion of big data greatly improves the predictive performance of private consumption after COVID-19, the benefit of big data that reflects new information in a timely manner has been shown to increase when economic uncertainty is high.

Online news-based stock price forecasting considering homogeneity in the industrial sector (산업군 내 동질성을 고려한 온라인 뉴스 기반 주가예측)

  • Seong, Nohyoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.1-19 / 2018
  • Because forecasting stock movements is important both academically and practically, stock price prediction has been studied actively. Stock price forecasting research is classified by whether it uses structured or unstructured data and, in more detail, is divided into technical analysis, fundamental analysis, and media effect analysis. In the big data era, research that combines big data with stock price prediction is actively underway, and studies based on large volumes of data mainly focus on machine learning techniques. In particular, methods that incorporate media effects have recently attracted attention, and studies that analyze online news to forecast stock prices are becoming mainstream. Previous studies that predict stock prices from online news mostly perform sentiment analysis of the news, building a separate corpus for each company and a dictionary that predicts stock prices by recording responses according to past stock prices. Existing studies have therefore examined the impact of online news on individual companies. For example, stock movements of Samsung Electronics are predicted using only online news about Samsung Electronics. In addition, methods that consider influences among highly relevant companies have also been studied recently. For example, stock movements of Samsung Electronics are predicted using news about Samsung Electronics and a highly related company such as LG Electronics. These previous studies examine the effects of news about a homogeneous industrial sector on an individual company. In these studies, homogeneous industries are classified according to the Global Industrial Classification Standard. In other words, the existing studies assume that industries grouped under the Global Industrial Classification Standard are homogeneous.
However, existing studies are limited in that they neither take into account influential companies with high relevance nor reflect heterogeneity within the same Global Industrial Classification Standard sector. Our examination of various sectors shows that some industrial sectors are not homogeneous groups. To overcome these limitations, our study proposes a methodology that applies k-means clustering to reflect the heterogeneous effects within an industrial sector on stock prices. Multiple Kernel Learning is mainly used to integrate data with various characteristics; it has several kernels, each of which receives different data and makes predictions from it. We used Multiple Kernel Learning to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices from financial news variables of the target firm or of an industrial group identified by k-means cluster analysis. To validate the proposed methodology, experiments were conducted on three years of online news and stock prices. The results of this study are as follows. (1) We confirmed that information about the industrial sectors related to the target company is also meaningful for predicting its stock movements, and that the machine learning algorithm has better predictive power when news about the relevant companies is considered together with news about the target company. (2) It is important to vary the number of clusters used for prediction according to the level of homogeneity in the industrial sector. In other words, when stock prices within an industrial sector are homogeneous, it is important to use the relational effect at the level of the whole industry group, without clustering, or with only a small number of clusters.
When stock prices within an industry group are heterogeneous, it is important to cluster the firms into groups. This study contributes by verifying that firms classified under the Global Industrial Classification Standard are heterogeneous and by suggesting that relevance should be defined through machine learning and statistical analysis rather than simply taken from the Global Industrial Classification Standard. It also contributes by demonstrating the effectiveness of a prediction model that reflects this heterogeneity.

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.1-17 / 2019
  • Because stock price forecasting is important both academically and practically, research on stock price prediction has been actively conducted. Stock price forecasting research is classified into studies that use structured data and studies that use unstructured data. With structured data such as historical stock prices and financial statements, past studies usually used technical analysis and fundamental analysis. In the big data era, the amount of information has increased rapidly, and artificial intelligence methods that extract meaning by quantifying text, an unstructured form of data that accounts for a large share of that information, have developed rapidly. With these developments, many studies apply text mining to online news to forecast stock prices. The methodology adopted in many papers forecasts a company's stock price using news about that target company. However, according to previous research, a company's stock price is affected not only by news about the company itself but also by news about related companies. Finding highly relevant companies is not easy, however, because of market-wide effects and random signals. Existing studies have therefore identified highly relevant companies primarily from predetermined international industry classification standards. According to recent research, however, the degree of homogeneity varies across sectors of the global industry classification standard, so forecasting stock prices by taking all companies in a sector together, rather than considering only the relevant ones, can adversely affect predictive performance. To overcome this limitation, we first used random matrix theory together with text mining for stock prediction. When the dimension of the data is large, the classical limit theorems are no longer suitable because statistical efficiency is reduced.
Therefore, a simple correlation analysis of financial market data does not reveal the true correlation. To solve this issue, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and find the true correlation between companies. With the true correlation, we perform cluster analysis to find relevant companies. Based on the clustering, we then use a multiple kernel learning algorithm, an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices from features of the financial news of the target firm and its relevant firms. The results of this study are as follows. (1) Consistent with existing research, we confirmed that forecasting stock prices using news from relevant companies is effective. (2) Identifying relevant companies in the wrong way can lower AI prediction performance. (3) The proposed approach with random matrix theory performs better than previous studies when cluster analysis is based on the true correlation obtained by removing market-wide effects and random signals. The contributions of this study are as follows. First, this study shows that random matrix theory, which is used mainly in econophysics, can be combined with artificial intelligence to produce good methodologies. This suggests that it is important not only to develop AI algorithms but also to adopt theories from physics. It extends existing research that integrated artificial intelligence with complex system theory through transfer entropy. Second, this study stresses that finding the right companies in the stock market is an important issue. This suggests that it is important to study not only artificial intelligence algorithms but also how to adjust the input values on a theoretical basis.
Third, we confirmed that firms classified under the Global Industrial Classification Standard (GICS) might have low relevance to one another and suggested that relevance should be defined theoretically rather than simply taken from the GICS.
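
The abstract above does not spell out how random matrix theory separates "true" correlation from noise. One common reading in the econophysics literature, sketched here as an assumption rather than as the authors' procedure, compares the eigenvalues of the empirical correlation matrix with the Marchenko-Pastur range expected for purely random data: eigenvalues inside the range are treated as noise, the largest eigenvalue above it as the market-wide mode, and the remaining ones as candidate group structure. The function names and the eigenvalues in the example are hypothetical.

```python
import math


def marchenko_pastur_bounds(n_series, n_obs):
    """Eigenvalue range expected for the correlation matrix of pure noise.

    Assumes n_series standardized, independent series with n_obs
    observations each, in the limit where both are large.
    """
    if n_series <= 0 or n_obs < n_series:
        raise ValueError("need 0 < n_series <= n_obs")
    ratio = n_series / n_obs
    lower = (1.0 - math.sqrt(ratio)) ** 2
    upper = (1.0 + math.sqrt(ratio)) ** 2
    return lower, upper


def split_eigenvalues(eigenvalues, n_series, n_obs):
    """Split eigenvalues into market mode, group modes, and noise."""
    _, upper = marchenko_pastur_bounds(n_series, n_obs)
    ordered = sorted(eigenvalues, reverse=True)
    above = [ev for ev in ordered if ev > upper]
    noise = [ev for ev in ordered if ev <= upper]
    # Assumption: the largest eigenvalue above the bound is read as the
    # market-wide mode, the others as sector or group structure.
    market_mode = above[0] if above else None
    group_modes = above[1:]
    return market_mode, group_modes, noise


# Hypothetical example: 100 stocks observed over 400 trading days.
bounds = marchenko_pastur_bounds(100, 400)
market, groups, noise = split_eigenvalues(
    [0.3, 30.0, 1.9, 4.1, 0.8, 2.6], 100, 400
)
```

In a full pipeline the eigenvalues would come from an eigendecomposition of the return correlation matrix, and the correlation used for clustering would be rebuilt from the group modes only; both steps are outside this sketch.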

Geographical Characteristics of PM2.5, PM10 and O3 Concentrations Measured at the Air Quality Monitoring Systems in the Seoul Metropolitan Area (수도권 지역 도시대기측정소 PM2.5, PM10, O3 농도의 지리적 분포 특성)

  • Kang, Jung-Eun;Mun, Da-Som;Kim, Jae-Jin;Choi, Jin-Young;Lee, Jae-Bum;Lee, Dae-Gyun
    • Korean Journal of Remote Sensing / v.37 no.3 / pp.657-664 / 2021
  • In this study, we investigated the relationships between air pollutant concentrations (PM2.5, PM10, O3) and local geographical characteristics (terrain heights, building area ratios, and population densities in 9 km × 9 km gridded subareas) in the Seoul metropolitan area. To analyze the terrain heights and building area ratios, we used geographic information system data provided by the NGII (National Geographic Information Institute). We also used administrative district and population data provided by KOSIS (Korean Statistical Information Service) to estimate population densities. We analyzed the PM2.5, PM10, and O3 concentrations measured at 146 AQMSs (air quality monitoring systems) within the Seoul metropolitan area. The analysis period is from January 2010 to December 2020, and monthly concentrations were calculated by averaging the hourly concentrations. The terrain is high in the northern and eastern parts of Gyeonggi-do and low near the west coastline. The distributions of building area ratios and population densities were similar to each other. During the analysis period, the monthly PM2.5 and PM10 concentrations at the 146 AQMSs were high from January to March, and the O3 concentrations were high from April to June. The population densities were negatively correlated with PM2.5, PM10, and O3 concentrations (weakly with PM2.5 and PM10 but strongly with O3). On the other hand, the AQMS heights showed no significant correlation with the pollutant concentrations, implying that further studies on the relationship between terrain heights and pollutant concentrations are needed.

Long-term forecasting reference evapotranspiration using statistically predicted temperature information (통계적 기온예측정보를 활용한 기준증발산량 장기예측)

  • Kim, Chul-Gyum;Lee, Jeongwoo;Lee, Jeong Eun;Kim, Hyeonjun
    • Journal of Korea Water Resources Association / v.54 no.12 / pp.1243-1254 / 2021
  • For water resources operation and agricultural water management, it is important to accurately predict evapotranspiration over long lead times on a seasonal or monthly basis. In this study, reference evapotranspiration was forecast up to 12 months in advance for the Han River basin using statistically predicted monthly temperatures and the temperature-based Hamon method. First, daily maximum and minimum temperature data for 15 meteorological stations in the basin were derived by spatio-temporal downscaling of the monthly temperature forecasts. Goodness-of-fit tests of the downscaled temperature data at each site showed that, for the monthly average daily maximum temperature, the percent bias (PBIAS) ranged from 1.3 to 6.9%, the ratio of the root mean square error to the standard deviation of the observations (RSR) from 0.22 to 0.27, the Nash-Sutcliffe efficiency (NSE) from 0.93 to 0.95, and the Pearson correlation coefficient (r) from 0.97 to 0.98. For the monthly average daily minimum temperature, PBIAS was 7.8 to 44.7%, RSR was 0.21 to 0.25, NSE was 0.94 to 0.96, and r was 0.98 to 0.99. Differences among sites were small, and the downscaled results were similar to the observations. When the forecasted reference evapotranspiration calculated from the downscaled data was compared with the observed values for the entire region, PBIAS was 2.2 to 5.4%, RSR was 0.21 to 0.28, NSE was 0.92 to 0.96, and r was 0.96 to 0.98, indicating a very good fit. Because of the characteristics of the statistical models and uncertainty in the downscaling process, the predicted reference evapotranspiration may deviate slightly from the observed values in periods when temperatures very different from those of the past are observed. However, considering that these are forecasts for a future period, they should be sufficiently useful as information for the evaluation or operation of water resources.
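
The four goodness-of-fit statistics quoted in the abstract above can be computed as in the following minimal sketch. It assumes definitions that are common in hydrology (PBIAS as 100 × Σ(sim − obs) / Σobs, RSR as the RMSE divided by the standard deviation of the observations, NSE as one minus the ratio of squared error to observed variance); the abstract does not state which PBIAS sign convention the authors used, and the function name and sample values are illustrative only.

```python
import math


def fit_statistics(obs, sim):
    """Return PBIAS (%), RSR, NSE, and Pearson r for paired series."""
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("obs and sim must have the same length (>= 2)")
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    # Sum of squared errors and sums of squares about the means.
    sse = sum((s - o) ** 2 for o, s in zip(obs, sim))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_sim = sum((s - mean_sim) ** 2 for s in sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    if ss_obs == 0 or ss_sim == 0 or sum(obs) == 0:
        raise ValueError("statistics are undefined for these inputs")
    return {
        # Assumed sign convention: positive means overestimation.
        "pbias": 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs),
        "rsr": math.sqrt(sse) / math.sqrt(ss_obs),
        "nse": 1.0 - sse / ss_obs,
        "r": cov / math.sqrt(ss_obs * ss_sim),
    }


# Illustrative values only, not data from the study.
stats = fit_statistics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 41.0])
```

Under these definitions NSE equals 1 − RSR², which matches the paired ranges reported in the abstract (for example, RSR 0.28 with NSE 0.92).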

Correction of Latent Errors in Pavement Deterioration Data using Statistical Methods (통계기법을 활용한 포장파손자료의 잠재오차 보정)

  • Han, Daeseok;Do, Myungsik
    • KSCE Journal of Civil and Environmental Engineering Research / v.32 no.6D / pp.587-598 / 2012
  • Successful implementation of an infrastructure asset management system starts with rich and reliable data. In practice, however, the data have always contained measurement errors arising from many unknown causes. These errors disrupt agencies' maintenance activities and reduce the reliability of research results on forecasting deterioration processes and life cycle costs. Above all, they create the contradiction that road agencies cannot trust inspection data they collected themselves. The problem is particularly serious in road pavement management. Although road agencies are well aware of this, inspecting without measurement error remains a great challenge. With this in mind, this paper proposes statistical error-processing methods to correct latent errors in pavement surface inspection data. It suggests two alternative methods based on probability distributions that consider the structure of the error and the reliability of the data. The suggested methods were tested empirically using pavement inspection data from the Korean National Highway. The results confirmed that conventional error processing, which removes only visible errors, is not enough to account for uncertainty in the pavement deterioration process. The suggested methods would be useful for improving the reliability of the analysis results required for road infrastructure asset management.

Application and First Evaluation of the Operational RAMS Model for the Dispersion Forecast of Hazardous Chemicals - Validation of the Operational Wind Field Generation System in CARIS (유해화학물질 대기확산 예측을 위한 RAMS 기상모델의 적용 및 평가 - CARIS의 바람장 모델 검증)

  • Kim, C.H.;Na, J.G.;Park, C.J.;Park, J.H.;Im, C.S.;Yoon, E.;Kim, M.S.;Park, C.H.;Kim, Y.J.
    • Journal of Korean Society for Atmospheric Environment / v.19 no.5 / pp.595-610 / 2003
  • Statistical indices such as RMSE (Root Mean Square Error), Mean Bias Error, and IOA (Index of Agreement) are used to evaluate the three-dimensional wind and temperature fields predicted by the operational meteorological model RAMS (Regional Atmospheric Modeling System) implemented in CARIS (Chemical Accident Response Information System) for dispersion forecasts of hazardous chemicals in the event of chemical accidents in Korea. The operational atmospheric model, RAMS in CARIS, is designed to use GDAPS, GTS, and AWS meteorological data obtained from the KMA (Korea Meteorological Administration) to generate three-dimensional initial meteorological fields. The predicted meteorological variables, such as wind speed, wind direction, temperature, and precipitation amount, for 19 to 23 August 2002 are extracted at the grid points nearest to the meteorological monitoring sites and validated against observations from sites across the Korean peninsula. The results show that the mean bias and root mean square error are 0.9 m/s and 1.85 m/s, respectively, for wind speed at 10 m above the ground, and 1.45 °C and 2.82 °C for surface temperature. Of particular interest is the distribution of the RAMS forecast error with altitude: the error is relatively small in the near-surface atmosphere for the wind and temperature fields and grows as altitude increases. Overall, some overprediction relative to the observations is detected for the wind and temperature fields, whereas relatively small errors are found in the near-surface atmosphere. These discrepancies are partly attributed to the oversimplified spacing of soil, soil contents, and initial temperature fields, suggesting that some improvement could probably be gained if the sub-grid-scale nature of the moisture and temperature fields were taken into account.
However, the IOA values for the wind field (0.62) and the temperature field (0.78) are greater than the 'good' criterion (> 0.5) suggested by other studies. The good IOA values, together with the relatively small wind field error in the near-surface atmosphere, imply that, given the current meteorological data for the initial fields, RAMS has good potential as an operational meteorological model for predicting urban- or local-scale three-dimensional wind fields for dispersion forecasts of hazardous chemical releases in Korea.
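
The three verification indices named in the abstract above can be sketched as follows. The abstract does not give formulas, so this assumes the usual definitions: mean bias as the average of prediction minus observation, RMSE as the root of the mean squared error, and IOA in Willmott's original form, 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)². The function name and sample values are hypothetical.

```python
import math


def verification_scores(obs, pred):
    """Return mean bias, RMSE, and index of agreement for paired values."""
    if len(obs) != len(pred) or not obs:
        raise ValueError("obs and pred must be non-empty and equal length")
    n = len(obs)
    mean_obs = sum(obs) / n
    errors = [p - o for o, p in zip(obs, pred)]
    squared_error = sum(e * e for e in errors)
    # Potential error: distances of predictions and observations
    # from the observed mean (assumed Willmott form).
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(obs, pred)
    )
    if potential == 0:
        raise ValueError("IOA is undefined when all values equal the mean")
    return {
        "mean_bias": sum(errors) / n,
        "rmse": math.sqrt(squared_error / n),
        "ioa": 1.0 - squared_error / potential,
    }


# Illustrative wind speeds in m/s, not data from the study.
scores = verification_scores([2.0, 4.0, 6.0], [3.0, 5.0, 7.0])
```

This form suits scalar variables such as wind speed and temperature; wind direction is circular and would need angular differences instead of plain subtraction.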

Forecasting Vacant Technology of Patent Analysis System using Self Organizing Map and Matrix Analysis (자기조직화 지도와 매트릭스분석을 이용한 특허분석시스템의 공백기술 예측)

  • Jun, Sung-Hae;Park, Sang-Sung;Shin, Young-Geun;Jang, Dong-Sik;Chung, Ho-Seok
    • The Journal of the Korea Contents Association / v.10 no.2 / pp.462-480 / 2010
  • Patent analysis extracts the knowledge needed for a company's research and development strategy from accumulated worldwide patent databases. To set the future direction of a technology under development, technology trends and deployment processes are identified by analyzing existing patent applications, and patent analysis provides the results required for that analysis. In this paper, we carry out a technology classification of patent analysis methods and systems, and we investigate and analyze related domestic patents, U.S. patents, and IEEE papers. Because of the characteristics of this technology sector, not only are patents filed but research papers on patent analysis systems are also actively published. We analyze patents according to the technology classification, using the final search results obtained from the search terms selected in this study. To find the niche technologies needed for patent analysis systems, matrix analysis was performed on all valid patents and papers. We identify the technology development trends of registered patent analysis systems and present the future direction of technology development related to them. We also use statistical tests and a self-organizing map to analyze the valid patents and papers quantitatively and to identify technologies that are relatively underdeveloped across domestic patents, U.S. patents, and research papers, and we then present the need to develop these technologies.

A Study on the Statistical Characteristics and Numerical Hindcasts of Storm Waves in East Sea (동해 폭풍파랑의 통계적 특성과 파랑 후측모의 실험에 관한 연구)

  • Chun, Hwusub;Kang, Tae-Soon;Ahn, Kyungmo;Jeong, Weon Mu;Kim, Tae-Rim;Lee, Dong Hwan
    • Journal of Korean Society of Coastal and Ocean Engineers / v.26 no.2 / pp.81-95 / 2014
  • In the present study, a statistical analysis of storm waves in the East Sea was carried out, and several storm wave events were reproduced with the modified WAM as a first step toward accurate and prompt forecasting of, and warning against, swell waves in the East Sea. According to the present study, storm waves from the North were the most probable, while waves from the Northeast were the most frequently observed. The significant wave heights of storm waves from the North and North-northeast were larger than those of storm waves from the Northeast. Because of the long fetch distance, however, the significant wave periods of storm waves from the Northeast were longer than those of waves from the North and North-northeast. In addition to the wave analysis, numerical experiments for storm waves in the East Sea were carried out using the modified WAM, and three periods of storm waves in 2013 were calculated. The numerical results agreed well with the wave measurements. However, the numerical simulation results in the shallow water region were less accurate than those in deep water, which might be due to the low resolution of the wind field and bottom topography caused by the large grid size of 5 minutes adopted in the present study. The overall computational efficiency of the modified WAM was found to be excellent compared to the original WAM. Because the modified WAM adopts an implicit scheme, the present model ran 10 times faster than the original WAM in computation time.

The hierarchical structures of cause-and-effect relationships on the profit factors in overseas construction projects (해외건설공사 수익성 영향인자의 계층구조 및 사레적용에 관한 연구)

  • Han, Seung-Heon;Sun, Seung-Min;Park, Sang-Hyuk;Jung, Do-Young
    • Korean Journal of Construction Engineering and Management / v.7 no.5 / pp.64-76 / 2006
  • Korea's overseas construction industry has been depressed by weakened profitability and a sharp decrease in market share caused by a lack of international competitiveness and a shrinking international market. Overseas construction involves many kinds of risk, and EPC projects in particular, which entail complicated processes across different parts, require sophisticated procurement and management skills. Consequently, to survive in the competitive international market, we need strategies for selecting potentially profitable projects at the initial stage of the bidding process and for mitigating the high degree of risk exposure through contract negotiation and adjustment. This research discusses environmental trends in international construction markets. It then identifies the key factors that significantly affect profitability through structured surveys of 59 actual overseas projects and analyzes those factors using statistical methods. By eliciting profit-influencing factors from the statistical analysis and a literature review and structuring their cause-and-effect relationships, the research provides a basis for profitability evaluation with which overseas construction participants can forecast and analyze risk more systematically. The profitability causal hierarchy structure describes the hierarchy of profitability factors in detail, along with their interrelationships. It also makes it possible to find the critical factors directly related to profitability deterioration through qualitative and quantitative analysis. Ultimately, with this hierarchy structure as the base, the research suggests how to develop a quantitative profitability forecasting model.