Acknowledgement
This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) in 2023 [RS-2023-00216011, Research on core artificial intelligence technology capable of human-like conceptual understanding and reasoning].