Applying Token Tagging to Augment Dataset for Automatic Program Repair

  • Received : 2022.08.23
  • Accepted : 2022.09.11
  • Published : 2022.10.31

Abstract

Automatic program repair (APR) techniques, which have been investigated for decades, aim to automatically repair bugs in programs and provide correct patches to developers. However, most existing studies struggle to repair complex bugs. To overcome this limitation, we developed an approach that augments training datasets through token tagging and applies machine learning techniques to APR. First, to alleviate the data insufficiency problem, we augmented the datasets by extracting all methods (both buggy and non-buggy) from the program source code and applying token tagging to the non-buggy methods. Second, we fed the preprocessed code into the model as training input. Finally, we evaluated the proposed approach against the baselines. The results show that the proposed approach augments datasets efficiently using token tagging and is promising for APR.
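The augmentation step described above, tagging the tokens of non-buggy methods to produce additional training samples, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tokenizer, the tag vocabulary (`KW`/`ID`/`NUM`/`SYM`), and the output format are all assumptions made for the example.

```python
import re

# Hypothetical subset of Java keywords; a real system would use the full set.
JAVA_KEYWORDS = {"public", "private", "static", "int", "void",
                 "return", "if", "else", "for", "while"}

def tokenize(code):
    """Crude lexer: identifiers/keywords, integer literals, single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def tag_token(tok):
    """Assign an assumed coarse tag to one token."""
    if tok in JAVA_KEYWORDS:
        return "KW"                        # language keyword
    if re.fullmatch(r"[A-Za-z_]\w*", tok):
        return "ID"                        # identifier
    if re.fullmatch(r"\d+", tok):
        return "NUM"                       # numeric literal
    return "SYM"                           # operator / punctuation

def augment(non_buggy_method):
    """Turn a non-buggy method into one tagged training sample."""
    return " ".join(f"<{tag_token(t)}> {t}" for t in tokenize(non_buggy_method))

sample = "public int add(int a, int b) { return a + b; }"
print(augment(sample))
```

Each tagged string could then be paired with the untagged original (or with a buggy variant) to enlarge the training corpus fed to the repair model.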

Acknowledgement

This work was supported by National Research Foundation of Korea (NRF) grants funded by the Korean government (MSIT) (No. NRF-2020R1A2B5B01002467 and NRF-2022M3J6A1084845).
