Regularization Strength Control for Continuous Learning based on Attention Transfer

  • Kang, Seok-Hoon (Dept. of Embedded Systems Engineering, Incheon National University)
  • Park, Seong-Hyeon (Dept. of Embedded Systems Engineering, Incheon National University)
  • Received : 2021.12.28
  • Accepted : 2022.03.19
  • Published : 2022.03.31

Abstract

In this paper, we propose an algorithm that applies a separate, variable lambda to each loss term to address the performance degradation caused by domain differences in LwF, and we show that it improves the retention of past knowledge. The attention-transfer-based knowledge distillation is combined with LwF to strengthen the retention of knowledge from previously learned tasks, and the variable-lambda method adjusts each lambda so that the current task can still be learned well. With the proposed method, accuracy improved by an average of 5% regardless of the scenario; in particular, the retention of past knowledge, the goal of this paper, improved by up to 70%, and the accuracy on previously learned data increased by an average of 22% compared with the original LwF.

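The mechanism described above can be sketched as a weighted sum of loss terms in which each regularization term carries its own variable lambda. The following is a minimal, hedged illustration in PyTorch, not the authors' implementation: the function names (`attention_map`, `total_loss`), the specific loss formulations, and the lambda arguments are assumptions made for illustration, and the abstract does not specify the exact rule by which the lambdas are varied.

```python
# Minimal sketch (assumed names, not the authors' code): LwF-style distillation
# plus attention transfer, with a separate variable lambda on each loss term.
import torch.nn.functional as F

def attention_map(feat):
    # Activation-based attention map in the style of Zagoruyko & Komodakis:
    # square the activations, average over channels, then L2-normalize per sample.
    amap = feat.pow(2).mean(dim=1).flatten(1)      # [B, C, H, W] -> [B, H*W]
    return F.normalize(amap, dim=1)

def total_loss(new_logits, labels,
               old_logits, old_logits_teacher,     # old-task outputs: current model vs. frozen copy
               new_feats, old_feats,               # matching intermediate feature maps (lists of tensors)
               lambda_distill, lambda_attn, temperature=2.0):
    # 1) Current-task loss: plain cross-entropy on the new data.
    loss_new = F.cross_entropy(new_logits, labels)

    # 2) LwF distillation loss: keep old-task outputs close to the frozen
    #    teacher's soft targets (temperature-scaled KL divergence).
    loss_distill = F.kl_div(
        F.log_softmax(old_logits / temperature, dim=1),
        F.softmax(old_logits_teacher / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # 3) Attention-transfer loss: match attention maps layer by layer.
    loss_attn = sum(
        (attention_map(f_new) - attention_map(f_old)).pow(2).mean()
        for f_new, f_old in zip(new_feats, old_feats)
    )

    # Each regularization term is weighted by its own (variable) lambda.
    return loss_new + lambda_distill * loss_distill + lambda_attn * loss_attn
```

Under this sketch, regularization strength control would amount to re-computing `lambda_distill` and `lambda_attn` during training, for example from how well the current task is converging, so that preserving old knowledge does not prevent the new task from being learned; the concrete control rule used in the paper is not given in the abstract.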


References

  1. R. M. French, "Catastrophic forgetting in connectionist networks," Trends in Cognitive Sciences, vol.3, no.4, pp.128-135, 1999. DOI: 10.1016/S1364-6613(99)01294-2
  2. G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, "Continual lifelong learning with neural networks: A review," Neural Networks, vol.113, pp.54-71, 2019. DOI: 10.1016/j.neunet.2019.01.012
  3. F. Zenke, B. Poole, and S. Ganguli, "Continual learning through synaptic intelligence," Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.3987-3995, 2017. DOI: 10.5555/3305890.3306093
  4. Y. Hsu, Y. Liu, A. Ramasamy, and Z. Kira, "Re-evaluating continual learning scenarios: A categorization and case for strong baselines," arXiv:1810.12488, 2019.
  5. J. Yoon, E. Yang, J. Lee, and S. J. Hwang, "Lifelong learning with dynamically expandable networks," arXiv:1708.01547, 2017.
  6. H. Shin, J. K. Lee, J. Kim, and J. Kim, "Continual learning with deep generative replay," arXiv:1705.08690, 2017.
  7. G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," NIPS Workshop, arXiv:1503.02531, 2014.
  8. K. McRae and P. A. Hetherington, "Catastrophic interference is eliminated in pretrained networks," Proceedings of the 15th Annual Conference of the Cognitive Science Society, pp.723-728, 1993.
  9. R. M. French, "Catastrophic forgetting in connectionist networks," Trends in Cognitive Sciences, vol.3, no.4, pp.128-135, 1999. DOI: 10.1016/S1364-6613(99)01294-2
  10. J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell, "Overcoming catastrophic forgetting in neural networks," Proceedings of the National Academy of Sciences, vol.114, no.13, pp.3521-3526, 2017. DOI: 10.1073/pnas.1611835114
  11. Z. Li and D. Hoiem, "Learning without forgetting," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.40, no.12, pp.2935-2947, 2017. DOI: 10.48550/arXiv.1606.09282
  12. S. Zagoruyko and N. Komodakis, "Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer," arXiv:1612.03928, 2016.
  13. B. Heo, M. Lee, S. Yun, and J. Y. Choi, "Knowledge transfer via distillation of activation boundaries formed by hidden neurons," Proceedings of the AAAI Conference on Artificial Intelligence, vol.33, no.1, pp.3779-3787, 2019. DOI: 10.48550/arXiv.1811.03233