Whisper-Tiny Model with Federated Fine Tuning for Keyword Recognition System

  • Shivani Sanjay Kolekar (Dept. of Artificial Intelligence Convergence, Chonnam National University) ;
  • Kyungbaek Kim (Dept. of Artificial Intelligence Convergence, Chonnam National University)
  • Published: 2024.10.31

Abstract

Fine-tuning is critical to enhancing a model's ability to operate effectively in resource-constrained environments: it incorporates domain-specific data and improves reliability, fairness, and accuracy. Large language models (LLMs) are traditionally trained centrally, since central infrastructure eases the management of vast computational resources and provides direct access to large, aggregated datasets, which simplifies optimization. However, centralized training has notable drawbacks, including substantial delay, high communication costs, and slow convergence, particularly when models are deployed to devices with limited resources. Our proposed framework addresses these challenges by employing a federated fine-tuning strategy with the Whisper-tiny model for a keyword recognition (KWR) system. Federated learning allows edge devices to perform local updates without constantly transmitting data to a central server. By selecting a cluster of clients each round and aggregating their updates with federated averaging, this strategy accelerates convergence, reduces communication overhead, and reaches higher accuracy in less time, making it more suitable than a centralized approach. By the tenth round of federated updates, the fine-tuned model demonstrates notable improvements, achieving over 95.48% test accuracy. We compare the federated fine-tuning method with the centralized strategy; our framework achieves significantly higher accuracy in fewer training rounds.
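To make the per-round procedure concrete, the sketch below illustrates the federated averaging (FedAvg) aggregation step described above: a cluster of clients is sampled each round, each client trains locally on its own data, and the server averages the returned weights in proportion to local dataset size. This is a minimal illustration under simplifying assumptions, not the paper's implementation: the stand-in `TinyKeywordNet` model, the single-step local training, and the random client data are hypothetical placeholders for the actual Whisper-tiny setup.

```python
import random
from typing import Dict, List

import torch
import torch.nn as nn

class TinyKeywordNet(nn.Module):
    """Hypothetical stand-in for Whisper-tiny: one linear layer over mel features."""
    def __init__(self, n_mels: int = 80, n_classes: int = 35):
        super().__init__()
        self.fc = nn.Linear(n_mels, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)

def local_update(global_state: Dict[str, torch.Tensor],
                 data: torch.Tensor, labels: torch.Tensor,
                 lr: float = 0.01):
    """One client's local update: load global weights, train briefly, return weights."""
    model = TinyKeywordNet()
    model.load_state_dict(global_state)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss = nn.functional.cross_entropy(model(data), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return model.state_dict(), len(labels)

def fedavg(updates: List[Dict[str, torch.Tensor]], sizes: List[int]):
    """FedAvg aggregation: average client state dicts weighted by sample counts."""
    total = sum(sizes)
    return {k: sum((n / total) * u[k] for u, n in zip(updates, sizes))
            for k in updates[0]}

# Driver: each round, sample a cluster of clients, collect updates, aggregate.
global_model = TinyKeywordNet()
clients = [(torch.randn(16, 80), torch.randint(0, 35, (16,))) for _ in range(20)]
for rnd in range(10):
    cluster = random.sample(clients, k=5)  # select a subset of clients this round
    results = [local_update(global_model.state_dict(), x, y) for x, y in cluster]
    new_state = fedavg([s for s, _ in results], [n for _, n in results])
    global_model.load_state_dict(new_state)  # broadcast for the next round
```

Only the averaged weights cross the network each round; raw client data never leaves the device, which is what reduces the communication overhead relative to shipping data to a central server.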

Keywords

Acknowledgments

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development grant (IITP-2023-RS-2023-00256629) funded by the Korea government (MSIT). This research was also supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2024-RS-2024-00437718) supervised by the IITP.

References

  1. Brown, T. B., et al. "Language models are few-shot learners." Advances in Neural Information Processing Systems 33 (2020): 1877-1901.
  2. Devlin, J., et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
  3. Reddi, S. J., et al. "Adaptive Federated Optimization." International Conference on Learning Representations (2021).
  4. Kairouz, P., et al. "Advances and open problems in federated learning." arXiv preprint arXiv:1912.04977 (2019).
  5. Hinton, G., et al. "Deep neural networks for acoustic modeling in speech recognition." IEEE Signal Processing Magazine 29.6 (2012): 82-97.
  6. McMahan, H. B., et al. "Communication-efficient learning of deep networks from decentralized data." Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. PMLR, 2017.
  7. Bonawitz, K., et al. "Towards federated learning at scale: System design." Proceedings of the 2nd SysML Conference. 2019.
  8. Li, X., et al. "On the Convergence of FedAvg on Non-IID Data." International Conference on Learning Representations (2020).
  9. Konecny, J., et al. "Federated optimization: Distributed machine learning for on-device intelligence." arXiv preprint arXiv:1610.02527 (2016).
  10. Smith, V., et al. "Federated multi-task learning." Advances in Neural Information Processing Systems 30 (2017): 4424-4434.
  11. Wang, S., et al. "Adaptive Federated Learning in Resource-Constrained Edge Computing Systems." IEEE Journal on Selected Areas in Communications 37.6 (2019): 1205-1221.
  12. Warden, Pete. "Speech commands: A dataset for limited vocabulary speech recognition." arXiv preprint arXiv:1804.03209 (2018).