A statistical journey to DNN, the second trip: Architecture of RNN and image classification

  • Received : 2024.07.31
  • Accepted : 2024.08.12
  • Published : 2024.10.31

Abstract

RNNs play a pivotal role in understanding the various forms of DNNs. They have evolved into Seq2Seq models and subsequently into Transformers, leading to the large language models (LLMs) that are currently the focus of significant interest. Nonetheless, understanding how RNNs operate is not an easy task. In particular, the core RNN models, LSTM and GRU, are challenging to comprehend due to their structural complexity. This paper explores ways to understand the operation of LSTM and GRU. In addition, to demonstrate concrete use cases of LSTM and GRU, we applied them to handwritten digit classification on the MNIST dataset. Each image was segmented into multiple patches, and a bidirectional LSTM and a bidirectional GRU were applied to the resulting patch sequences. The results were then compared with those of a CNN.
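
To make the patch-based setup concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it treats each 28×28 MNIST image as a sequence of 28 row patches of 28 pixels and feeds them to a bidirectional LSTM. The row-wise patching scheme, hidden size, and class names are illustrative assumptions.

```python
# Hedged sketch: a bidirectional LSTM over MNIST images read as sequences of
# row patches. The hyperparameters and row-wise patching are assumptions,
# not the paper's exact configuration.
import torch
import torch.nn as nn

class BiLSTMDigitClassifier(nn.Module):
    def __init__(self, patch_dim=28, hidden_dim=128, num_classes=10):
        super().__init__()
        self.rnn = nn.LSTM(input_size=patch_dim, hidden_size=hidden_dim,
                           batch_first=True, bidirectional=True)
        # The final hidden states of the forward and backward passes are
        # concatenated before the linear classification layer.
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, 1, 28, 28) grayscale images -> (batch, 28, 28),
        # i.e. a length-28 sequence of 28-dimensional row patches.
        seq = x.squeeze(1)
        _, (h_n, _) = self.rnn(seq)              # h_n: (2, batch, hidden_dim)
        h = torch.cat([h_n[0], h_n[1]], dim=1)   # forward + backward states
        return self.fc(h)                        # class logits

if __name__ == "__main__":
    model = BiLSTMDigitClassifier()
    images = torch.randn(32, 1, 28, 28)          # stand-in for an MNIST batch
    print(model(images).shape)                   # torch.Size([32, 10])
```

Replacing nn.LSTM with nn.GRU (whose forward pass returns only a hidden state, with no cell state) gives the bidirectional GRU variant; both can be trained with a standard cross-entropy loss and compared against a CNN baseline as in the paper.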
