Standardization Trends on Safety and Trustworthiness Technology for Advanced AI

  • J.H. Jeon (Intelligent Information Standards Research Section)
  • Published: 2024.10.01

Abstract

Artificial Intelligence (AI) has evolved rapidly over the past decade, advancing in areas such as language comprehension, image and video recognition, programming, and scientific reasoning. Recent AI technologies based on large language models and foundation models are approaching the capabilities associated with artificial general intelligence. These systems demonstrate superior performance in complex problem solving, natural language processing, and multi-domain tasks, and could potentially transform fields such as science, industry, healthcare, and education. However, these advancements have raised concerns regarding the safety and trustworthiness of advanced AI, including risks related to uncontrollability, ethical conflicts, long-term socioeconomic impacts, and safety assurance. Efforts are underway to develop internationally agreed-upon standards that ensure the safety and reliability of AI. This study analyzes international trends in safety and trustworthiness standardization for advanced AI, identifies key standardization areas, proposes future directions and strategies, and draws policy implications. The goal is to support the safe and trustworthy development of advanced AI and to enhance international competitiveness through effective standardization.

Acknowledgement

This research was supported by the Korea Medical Device Development Fund grant funded by the Korean government (Ministry of Science and ICT; Ministry of Trade, Industry and Energy; Ministry of Health and Welfare; Ministry of Food and Drug Safety) [Project Number: RS-2023-00208294].
