
Research on Optimization Strategies for Random Forest Algorithms in Federated Learning Environments

  • InSeo Song (Dept. of Computer Engineering, IT Convergence Engineering, Gachon University) ;
  • KangYoon Lee (Dept. of Computer Engineering, IT Convergence Engineering, Gachon University)
  • Received : 2024.05.27
  • Accepted : 2024.06.24
  • Published : 2024.06.30

Abstract

Federated learning has garnered attention as an efficient way to train machine learning models in distributed environments while preserving data privacy and security. This study proposes a novel FedRFBagging algorithm to optimize the performance of random forest models in such federated learning environments. By dynamically adjusting the trees of local random forest models according to client-specific data characteristics, the proposed approach reduces communication costs and achieves high prediction accuracy even with many clients. The method adapts to diverse data conditions, significantly improving model stability and training speed. A random forest consists of many decision trees, so transmitting every tree to the server in a federated setting causes communication overhead to grow rapidly with the number of clients, making the naive approach impractical. In addition, differences in data distribution across clients can produce trees of uneven quality. To address this, the FedRFBagging algorithm has each client transmit only its highest-performing trees to the server, which then reselects trees based on their impurity values to construct an optimal global model. This reduces communication overhead while maintaining high prediction performance across diverse data distributions. Although the global model reflects data from many clients, it may not match the characteristics of any single client's data; to compensate, each client trains additional trees on top of the global model, performing a local optimization tailored to its own data. This improves the overall model's prediction accuracy and allows it to adapt to shifting data distributions. Our study demonstrates that the FedRFBagging algorithm effectively addresses the communication-cost and performance issues of random forest models in federated learning, indicating its practicality in such settings.
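The pipeline sketched in the abstract — clients upload only their best trees, the server reselects by impurity, and clients then append locally trained trees — can be illustrated roughly as follows. This is a minimal sketch using scikit-learn, not the authors' implementation: the concrete selection criteria (per-tree accuracy on client data, mean leaf impurity on the server) and all function names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def select_top_trees(forest, X_val, y_val, k):
    """Client side: rank the forest's trees by validation accuracy, keep the top k."""
    ranked = sorted(forest.estimators_,
                    key=lambda t: t.score(X_val, y_val),
                    reverse=True)
    return ranked[:k]

def mean_leaf_impurity(tree):
    """Sample-weighted average impurity of a tree's leaf nodes (lower is better)."""
    t = tree.tree_
    leaves = t.children_left == -1
    return float(np.average(t.impurity[leaves], weights=t.n_node_samples[leaves]))

def server_reselect(candidate_trees, m):
    """Server side: keep the m candidate trees with the lowest mean leaf impurity."""
    return sorted(candidate_trees, key=mean_leaf_impurity)[:m]

def ensemble_predict(trees, X):
    """Majority vote over the selected trees (the global model)."""
    votes = np.stack([t.predict(X) for t in trees]).astype(int)
    return np.array([np.bincount(votes[:, i]).argmax() for i in range(X.shape[0])])

# Simulate three clients whose data come from different distributions.
clients = [make_classification(n_samples=400, n_features=10, random_state=s)
           for s in range(3)]

# Each client trains a local forest and uploads only its 5 best of 20 trees.
uploaded = []
for X, y in clients:
    rf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
    uploaded.extend(select_top_trees(rf, X, y, k=5))

# The server reselects 10 trees by impurity to form the global model.
global_trees = server_reselect(uploaded, m=10)

# Local adaptation: a client appends trees trained on its own data.
X0, y0 = clients[0]
local_rf = RandomForestClassifier(n_estimators=5, random_state=1).fit(X0, y0)
personalized = global_trees + list(local_rf.estimators_)

acc = (ensemble_predict(personalized, X0) == y0).mean()
print(f"{len(personalized)} trees, accuracy on client 0 data: {acc:.2f}")
```

Only 15 of the 60 locally trained trees ever leave the clients in this sketch, which is the communication saving the abstract describes; the local trees added at the end are never transmitted at all.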


Acknowledgement

This work was supported by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (grant: HI22C1651), and by the Basic Science Research Program through the National Research Foundation of Korea (grant number: NRF-2022R1F1A1069069).
