A Code Recommendation Method Using RNN Based on Interaction History

A Recommendation Method Based on Interaction-History Mining Using RNN

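  • 조희태 (Department of Informatics, Gyeongsang National University) ;
  • 이선아 (Major in Aerospace and Software Engineering, Gyeongsang National University) ;
  • 강성원 (School of Computing, KAIST)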
  • Received : 2018.08.27
  • Accepted : 2018.11.08
  • Published : 2018.12.31

Abstract

Developers spend a significant amount of time exploring and understanding source code to find the locations they need to modify. To reduce this time, existing studies have recommended source locations using statistical language model techniques. However, these techniques produce no recommendation when the input data does not exactly match the learned data. In this paper, we propose a code location recommendation method that uses Recurrent Neural Networks (RNNs) and interaction histories and does not suffer from this problem. Our method achieved an average precision of 91% and an average recall of 71%, thereby reducing the time spent searching and exploring code more than existing recommendation techniques.
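The exact-match limitation of statistical language models can be illustrated with a minimal sketch. It assumes a plain N-gram lookup table over interaction histories; the function names and the toy histories are hypothetical and are not taken from the paper, whose actual baseline and data format are not shown here.

```python
from collections import Counter, defaultdict


def train_ngram(histories, n=3):
    """Count which element follows each context of n-1 consecutive elements."""
    table = defaultdict(Counter)
    for history in histories:
        for i in range(len(history) - n + 1):
            context = tuple(history[i:i + n - 1])
            next_item = history[i + n - 1]
            table[context][next_item] += 1
    return table


def recommend(table, recent, n=3, top_k=1):
    """Return up to top_k likely next elements, or [] for an unseen context."""
    context = tuple(recent[-(n - 1):])
    if context not in table:
        # No exact match with the learned contexts: nothing is recommended.
        return []
    return [item for item, _ in table[context].most_common(top_k)]


# Hypothetical interaction histories (sequences of visited code elements).
histories = [
    ["Parser.parse", "Lexer.next", "Token.kind", "Parser.error"],
    ["Parser.parse", "Lexer.next", "Token.kind", "Parser.error"],
]
table = train_ngram(histories)

seen = recommend(table, ["Parser.parse", "Lexer.next"])    # context was learned
unseen = recommend(table, ["Parser.parse", "Token.kind"])  # context never learned
print(seen, unseen)
```

In this sketch the second query differs from the learned sequences by a single skipped element, yet the lookup returns nothing. A learned sequence model such as an RNN scores every candidate for any input sequence, which is one plausible reading of why the proposed method avoids this failure mode.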

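During software development and maintenance, developers spend more time navigating and understanding code than actually modifying it. To reduce code navigation time, existing studies have recommended code to modify using data mining and statistical language model techniques. In these approaches, however, no recommendation is produced unless the input data exactly matches the model's training data. In this paper, we propose a method that trains Recurrent Neural Networks, a deep learning technique, on interaction histories and recommends the locations of code to modify without the above problem of existing studies. As a recommendation technique that combines RNNs with interaction histories, the proposed method achieves an average precision of about 91% and a recall of 71%, reducing code navigation time further than existing recommendation methods.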
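For readers unfamiliar with the two reported metrics, the following sketch shows the standard set-based definitions of precision and recall for one recommendation. The helper name and the example element names are hypothetical, and the paper's exact evaluation protocol (for example, how the averages are taken) is not given in the abstract.

```python
def precision_recall(recommended, actually_edited):
    """Set-based precision and recall for a single recommendation."""
    rec = set(recommended)
    act = set(actually_edited)
    hits = len(rec & act)  # recommended elements that were really edited
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(act) if act else 0.0
    return precision, recall


# Hypothetical example: 2 elements recommended, 4 elements actually edited.
recommended = ["A.foo", "B.bar"]
actually_edited = ["A.foo", "C.baz", "D.qux", "E.run"]
precision, recall = precision_recall(recommended, actually_edited)
print(precision, recall)
```

Here one of the two recommended elements was actually edited, and one of the four edited elements was recommended, so precision is higher than recall, the same direction as the reported 91% and 71%.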

Fig. 1. Operation of Simple Recurrent Neural Network

Fig. 2. Example of Interaction History

Fig. 3. Preprocessing of Interaction Histories

Fig. 4. Example of Data Pair

Fig. 5. Model Overview

Fig. 6. Precision of N-gram and RNN

Fig. 7. Recall of N-gram and RNN

Table 1. Number of Data After Preprocessing

Table 2. Model Performance

Table 3. Result of Parameter Tuning
