• Title/Summary/Keyword: Vision Transformer


Fine-tuning Neural Network for Improving Video Classification Performance Using Vision Transformer (Vision Transformer를 활용한 비디오 분류 성능 향상을 위한 Fine-tuning 신경망)

  • Kwang-Yeob Lee;Ji-Won Lee;Tae-Ryong Park
    • Journal of IKEEE / v.27 no.3 / pp.313-318 / 2023
  • This paper proposes a fine-tuned neural network to improve Vision Transformer-based video classification. The need for real-time, deep-learning-based video analysis has grown recently, but CNN models designed for image classification struggle to capture the associations between consecutive frames. We compare and analyze the Vision Transformer and Non-local neural network models, both built on the attention mechanism, to identify the best-performing model. In addition, we propose an optimal fine-tuned neural network by applying various fine-tuning strategies as forms of transfer learning. In the experiments, the model was trained on the UCF101 dataset, and its performance was then verified by transferring it to the UTA-RLDD dataset.

A Dual-Structured Self-Attention for improving the Performance of Vision Transformers (비전 트랜스포머 성능향상을 위한 이중 구조 셀프 어텐션)

  • Kwang-Yeob Lee;Hwang-Hee Moon;Tae-Ryong Park
    • Journal of IKEEE / v.27 no.3 / pp.251-257 / 2023
  • In this paper, we propose a dual-structured self-attention method that compensates for the Vision Transformer's weak extraction of regional features. Vision Transformers are more computationally efficient than convolutional neural networks for object classification, object segmentation, and video recognition, but they are comparatively poor at extracting local features. Many studies address this with window- or shifted-window-based attention, yet these approaches increase computational complexity through multiple levels of encoders, weakening the advantages of self-attention-based transformers. We instead propose a dual-structured self-attention that combines self-attention with a neighborhood network to improve locality inductive bias over existing methods. The neighborhood network extracts local context information with far lower computational complexity than the window structure. Comparing the proposed dual-structured self-attention transformer with an existing transformer on CIFAR-10 and CIFAR-100, the experiments showed Top-1 accuracy improvements of 0.63% and 1.57%, respectively.
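The dual structure above, a global self-attention branch paired with a cheap local neighborhood branch, can be sketched as follows. This is a simplified stand-in under stated assumptions: projection matrices are omitted, the neighborhood network is reduced to a local mean over adjacent tokens, and the two branches are fused by a simple sum, which may differ from the paper's actual fusion rule:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_self_attention(x):
    # Standard scaled dot-product self-attention over all tokens
    # (query/key/value projections omitted for brevity).
    d = x.shape[-1]
    attn = softmax(x @ x.T / np.sqrt(d))
    return attn @ x

def neighborhood_branch(x, radius=1):
    # Local-context branch: each token aggregates only its immediate
    # neighbors. A stand-in for the paper's neighborhood network; its
    # cost grows linearly with sequence length, unlike full attention.
    n = x.shape[0]
    out = np.empty_like(x)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out[i] = x[lo:hi].mean(axis=0)
    return out

def dual_attention(x):
    # Fuse the global and local branches; a sum is the simplest choice.
    return global_self_attention(x) + neighborhood_branch(x)
```

The point of the structure is that the local branch injects locality inductive bias without the multi-level encoder hierarchy that window-based methods require.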