Using noise filtering and sufficient dimension reduction method on unstructured economic data

  • Jae Keun Yoo (Department of Statistics, Ewha Womans University) ;
  • Yujin Park (Department of Statistics, Ewha Womans University) ;
  • Beomseok Seo (Office of Economic Modeling and Policy Analysis, Bank of Korea)
  • Received : 2023.07.07
  • Accepted : 2023.10.07
  • Published : 2024.04.30

Abstract

Text indicators are increasingly valuable in economic forecasting, but noise and high dimensionality often limit their usefulness. This study examines two post-processing techniques, noise filtering and dimension reduction, for normalizing text indicators, and evaluates through empirical analysis how they improve the indicators' utility. The forecasting targets are three monthly variables (the cyclical component of the leading index, BSI (business survey index) all-industry sales performance, and BSI all-industry sales outlook) and two quarterly variables (the quarter-on-quarter growth rate of seasonally adjusted (SA) real GDP and the year-on-year (YoY) growth rate of unadjusted real GDP). The Hodrick and Prescott filter, widely used in econometrics, is applied for noise filtering, and sufficient dimension reduction, a nonparametric dimension reduction methodology, is combined with the unstructured text data. The results show that, for both monthly and quarterly variables, noise filtering of the text indicators improves predictive accuracy when the number of observations is large, and that applying dimension reduction yields further gains in predictive performance. These findings imply that post-processing steps such as noise filtering and dimension reduction are important for making text indicators more useful and can improve the accuracy of economic forecasts.
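The two post-processing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which sufficient dimension reduction estimator was used, so sliced inverse regression (Li, 1991) stands in here as one well-known example. The function names `hp_filter_trend` and `sir_directions`, the synthetic data, and the smoothing parameter 129600 (the value commonly suggested for monthly series following Ravn and Uhlig, 2002) are all assumptions made for illustration.

```python
import numpy as np


def hp_filter_trend(y, lam=129600.0):
    """Hodrick-Prescott trend of a 1-D series (hypothetical helper).

    Minimizes sum((y - trend)^2) + lam * sum(second differences of trend^2),
    whose closed-form solution is (I + lam * D'D)^{-1} y.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # (n-2) x n second-difference operator with rows [1, -2, 1]
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)


def sir_directions(X, y, n_slices=5, n_dirs=1):
    """Sliced inverse regression directions (hypothetical helper).

    Returns a p x n_dirs matrix whose unit-length columns estimate
    directions b such that y depends on X mainly through X @ b.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Standardize predictors: Z = (X - mean) @ Sigma^{-1/2}
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(axis=0)) @ inv_sqrt
    # Slice observations by the order of y and average Z within each slice
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of the weighted slice-mean covariance
    _, v = np.linalg.eigh(M)
    B = inv_sqrt @ v[:, ::-1][:, :n_dirs]  # back to the original X scale
    return B / np.linalg.norm(B, axis=0)


# Synthetic stand-in for noisy text indicators (NOT the paper's data)
rng = np.random.default_rng(0)
n_obs, n_ind = 240, 4
signal = np.cumsum(rng.normal(size=(n_obs, n_ind)), axis=0)    # slow-moving components
X_noisy = signal + rng.normal(scale=2.0, size=(n_obs, n_ind))  # add measurement noise
y_target = signal[:, 0] + 0.1 * rng.normal(size=n_obs)         # hypothetical target

# Step 1: noise filtering, one indicator at a time
X_smooth = np.column_stack([hp_filter_trend(col) for col in X_noisy.T])
# Step 2: dimension reduction of the filtered indicators
B = sir_directions(X_smooth, y_target, n_slices=5, n_dirs=1)
reduced = X_smooth @ B  # one-column predictor for a downstream forecasting model
```

In this sketch the filter is applied before the reduction, so the directions are estimated from the smoothed indicators rather than the raw ones. For quarterly targets the conventional smoothing parameter would be 1600 instead of 129600.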

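Acknowledgement

This research was supported by a research grant from the Bank of Korea. Jae Keun Yoo and Yujin Park were supported by the Basic Science Research Program of the National Research Foundation of Korea (NRF), funded by the Korean government (Ministry of Science and ICT) in 2023 (RS-2023-00240564 and RS-2023-00217022).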

References

  1. Chen CC, Huang HH, Huang YL, and Chen HH (2021, October). Constructing noise free economic policy uncertainty index. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Gold Coast, Queensland, Australia, 2915-2919.
  2. Cook RD, Li B, and Chiaromonte F (2007). Dimension reduction in regression without matrix inversion, Biometrika, 94, 569-584. https://doi.org/10.1093/biomet/asm038
  3. Hall P and Li KC (1993). On almost linearity of low dimensional projections from high dimensional data, Annals of Statistics, 21, 867-889. https://doi.org/10.1214/aos/1176349155
  4. Hamilton JD (2018). Why you should never use the Hodrick-Prescott filter, Review of Economics and Statistics, 100, 831-843. https://doi.org/10.1162/rest_a_00706
  5. Hodrick R and Prescott EC (1997). Postwar U.S. business cycles: An empirical investigation, Journal of Money, Credit and Banking, 29, 1-16.
  6. Kim Y, Ahn YH, Yoo JK, and Kwon O (2017). Verifying identities of plant-based multivitamins using phytochemical fingerprinting in combination with multiple bioassays, Plant Foods for Human Nutrition, 72, 288-293. https://doi.org/10.1007/s11130-017-0622-5
  7. Lee K, Choi Y, Um HY, and Yoo JK (2019). On fused dimension reduction in multivariate regression, Chemometrics and Intelligent Laboratory Systems, 193, 103828.
  8. Li B (2018). Sufficient Dimension Reduction Methods and Applications with R, Chapman and Hall/CRC, London, UK.
  9. Li B and Wang S (2007). On directional regression for dimension reduction, Journal of the American Statistical Association, 102, 997-1008. https://doi.org/10.1198/016214507000000536
  10. Li B, Zha H, and Chiaromonte F (2005). Contour regression: A general approach to dimension reduction, Annals of Statistics, 33, 1580-1616. https://doi.org/10.1214/009053605000000192
  11. Li KC (1991). Sliced inverse regression for dimension reduction, Journal of the American Statistical Association, 86, 316-327. https://doi.org/10.1080/01621459.1991.10475035
  12. Ravn MO and Uhlig H (2002). On adjusting the Hodrick-Prescott filter for the frequency of observations, Review of Economics and Statistics, 84, 371-376. https://doi.org/10.1162/003465302317411604
  13. Seo B (2023a). Econometric Forecasting Using Ubiquitous News Text: Text-enhanced Factor Model, Bank of Korea WP, 10, 1-48. https://doi.org/10.2139/ssrn.4466622
  14. Seo B (2023b). Industry analysis using AI algorithms: Text analysis of securities research reports, Bank of Korea Issue Note, 5, 1-26.
  15. Seo B, Lee Y, and Cho H (2024). Measuring News Sentiment of Korea Using Transformer, Korean Economic Review, 40, 149-176.
  16. Sung T and Si G (2020). Research Methodology (3rd ed), Hakjisa, Seoul.
  17. Um HY, Won S, An H, and Yoo JK (2018). Case study: Application of fused sliced average variance estimation to near-infrared spectroscopy of biscuit dough data, The Korean Journal of Applied Statistics, 31, 835-842. https://doi.org/10.5351/KJAS.2018.31.6.835
  18. Yoo JK (2019). Analysis of microarray right-censored data through fused sliced inverse regression, Scientific Reports, 9, 15094.