Project Information
This study was conducted as part of the project "Pilot Development of a Large Language Model in the Geoscience and Mineral Resources Field (23-7512)," an in-house research program of the Korea Institute of Geoscience and Mineral Resources (KIGAM).
References
- Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I. and Amodei, D. (2020) Language models are few-shot learners. Advances in Neural Information Processing Systems, v.33, p.1877-1901. doi: 10.48550/arXiv.2005.14165
- Deng, C., Zhang, T., He, Z., Xu, Y., Chen, Q., Shi, Y., Fu, L., Zhang, W., Wang, X., Zhou, C., Lin, Z. and He, J. (2024, March) K2: Learning A Foundation Language Model for Geoscience Knowledge Understanding and Utilization. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, Association for Computing Machinery, p.161-170. doi: 10.1145/3616855.3635772
- Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L. and Chen, W. (2021) LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685. doi: 10.48550/arXiv.2106.09685
- Lawley, C.J., Raimondo, S., Chen, T., Brin, L., Zakharov, A., Kur, D., Hui, J., Newton, G., Burgoyne, S.L. and Marquis, G. (2022) Geoscience language models and their intrinsic evaluation. Applied Computing and Geosciences, v.14, 100084. doi: 10.1016/j.acags.2022.100084
- Lee, J. and Choi, T. (2023) Llama-2-KoEn-13B. doi: 10.57967/hf/1280
- Lee, A.N., Hunter, C.J. and Ruiz, N. (2023) Platypus: Quick, cheap, and powerful refinement of LLMs. arXiv preprint arXiv:2308.07317. doi: 10.48550/arXiv.2308.07317
- Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S. and Kiela, D. (2020) Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, v.33, p.9459-9474. doi: 10.48550/arXiv.2005.11401
- Mukherjee, S., Mitra, A., Jawahar, G., Agarwal, S., Palangi, H. and Awadallah, A. (2023) Orca: Progressive learning from complex explanation traces of GPT-4. arXiv preprint arXiv:2306.02707. doi: 10.48550/arXiv.2306.02707
- Nasr, M., Carlini, N., Hayase, J., Jagielski, M., Cooper, A.F., Ippolito, D., Choquette-Choo, C.A., Wallace, E., Tramèr, F. and Lee, K. (2023) Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035. doi: 10.48550/arXiv.2311.17035
- Niederfahrenhorst, A., Hakhamaneshi, K. and Ahmad, R. (2023, September 6) Fine-Tuning LLMs: LoRA or Full-Parameter? An in-depth Analysis with Llama 2. Anyscale. https://www.anyscale.com/blog/fine-tuning-llms-lora-or-full-parameter-an-indepth-analysis-with-llama-2
- Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C.L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J. and Lowe, R. (2022) Training language models to follow instructions with human feedback. In Proceedings of the Advances in Neural Information Processing Systems 35 (NeurIPS 2022), Curran Associates, Inc., v.35, p.27730-27744. doi: 10.48550/arXiv.2203.02155
- Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C.D. and Finn, C. (2023) Direct preference optimization: Your language model is secretly a reward model. arXiv preprint arXiv:2305.18290. doi: 10.48550/arXiv.2305.18290
- Rasley, J., Rajbhandari, S., Ruwase, O. and He, Y. (2020, August) DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Association for Computing Machinery, p.3505-3506. doi: 10.1145/3394486.3406703
- Sanh, V., Webson, A., Raffel, C., Bach, S.H., Sutawika, L., Alyafeai, Z., Chaffin, A., Stiegler, A., Le Scao, T., Raja, A., Dey, M., Bari, M.S., Xu, C., Thakker, U., Sharma, S.S., Szczechla, E., Kim, T., Chhablani, G., Nayak, N., Datta, D., Chang, J., Jiang, M.T., Wang, H., Manica, M., Shen, S., Yong, Z.X., Pandey, H., Bawden, R., Wang, T., Neeraj, T., Rozen, J., Sharma, A., Santilli, A., Fevry, T., Fries, J.A., Teehan, R., Bers, T., Biderman, S., Gao, L., Wolf, T. and Rush, A.M. (2021) Multitask prompted training enables zero-shot task generalization. arXiv preprint arXiv:2110.08207. doi: 10.48550/arXiv.2110.08207
- Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., Bikel, D., Blecher, L., Ferrer, C.C., Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu, J., Fu, W., Fuller, B., Gao, C., Goswami, V., Goyal, N., Hartshorn, A., Hosseini, S., Hou, R., Inan, H., Kardas, M., Kerkez, V., Khabsa, M., Kloumann, I., Korenev, A., Koura, P.S., Lachaux, M., Lavril, T., Lee, J., Liskovich, D., Lu, Y., Mao, Y., Martinet, X., Mihaylov, T., Mishra, P., Molybog, I., Nie, Y., Poulton, A., Reizenstein, J., Rungta, R., Saladi, K., Schelten, A., Silva, R., Smith, E.M., Subramanian, R., Tan, X.E., Tang, B., Taylor, R., Williams, A., Kuan, J.X., Xu, P., Yan, Z., Zarov, I., Zhang, Y., Fan, A., Kambadur, M., Narang, S., Rodriguez, A., Stojnic, R., Edunov, S. and Scialom, T. (2023) Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288. doi: 10.48550/arXiv.2307.09288
- Wang, Y.E., Wei, G.Y. and Brooks, D. (2019) Benchmarking TPU, GPU, and CPU platforms for deep learning. arXiv preprint arXiv:1907.10701. doi: 10.48550/arXiv.1907.10701
- Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., Liu, P., Nie, J. and Wen, J. (2023) A survey of large language models. arXiv preprint arXiv:2303.18223. doi: 10.48550/arXiv.2303.18223