Leveraging LLMs for Corporate Data Analysis: Employee Turnover Prediction with ChatGPT

  • Sungmin Kim (aSSIST University)
  • Jee Yong Chung (Duksung Women's University)
  • Received : 2024.04.17
  • Accepted : 2024.06.07
  • Published : 2024.06.30

Abstract

An organization's ability to analyze and use data plays an important role in enterprise-wide knowledge management and decision-making. This study investigates how large language models (LLMs) can be applied to corporate data analysis, focusing on human resources as the test domain. Using the publicly available and widely studied IBM HR dataset, it reproduces the machine learning-based employee turnover prediction analyses of previous research through ChatGPT and compares the resulting predictive performance with theirs. Unlike earlier approaches, which required advanced programming skills, the ChatGPT-based analysis was driven by the analyst's natural-language requests and proved much easier and faster to carry out. Its prediction accuracy was also competitive with that reported in previous studies. These results suggest that LLMs such as ChatGPT could serve as effective and practical alternatives in corporate data analysis, a field that has traditionally demanded advanced programming capabilities. The approach is further expected to contribute to the popularization of data analysis and to the spread of data-driven decision-making (DDDM). The prompts used during the analysis and the program code generated by ChatGPT are included in the appendix so that the results can be verified, providing a foundation for future data analysis research using large language models.
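
Comparing a reproduced turnover model against earlier studies usually comes down to computing the same evaluation metrics on held-out predictions. The following is a minimal sketch of that step, not the code from the paper's appendix: the function name `classification_metrics`, the label encoding (1 = employee left, 0 = employee stayed), and the ten sample labels are all assumptions made for illustration.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels.

    Assumed encoding (hypothetical): 1 = employee left, 0 = employee stayed.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count each (actual, predicted) pair to build the confusion matrix.
    counts = Counter(zip(y_true, y_pred))
    tp = counts[(1, 1)]  # leavers correctly flagged
    tn = counts[(0, 0)]  # stayers correctly passed over
    fp = counts[(0, 1)]  # stayers wrongly flagged
    fn = counts[(1, 0)]  # leavers that were missed

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for ten employees, for illustration only.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision, recall, and F1 alongside accuracy is a design choice of this sketch: when leavers are a minority of the workforce, accuracy alone can look strong even if most leavers are missed.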
