Acknowledgement
This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) in 2024 (IITP-2024-2021-0-01817, No. RS-2020-II201373, Artificial Intelligence Graduate School Program (Hanyang University)).
References
- Hendrycks, Dan, and Kevin Gimpel. "Gaussian error linear units (GELUs)." arXiv preprint arXiv:1606.08415 (2016).
- Elfwing, Stefan, Eiji Uchibe, and Kenji Doya. "Sigmoid-weighted linear units for neural network function approximation in reinforcement learning." Neural Networks 107 (2018): 3-11.
- Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
- Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.
- Groeneveld, Dirk, et al. "OLMo: Accelerating the science of language models." arXiv preprint (2024). URL https://api.semanticscholar.org/CorpusID:267365485.