A Knowledge Graph-based Chatbot to Prevent the Leakage of LLM User's Sensitive Information

  • Keedong Yoo (Dept. of Business Administration, Dankook Univ.)
  • Received: 2024.05.01
  • Reviewed: 2024.06.09
  • Published: 2024.06.30

Abstract

As demand for and use of large language models (LLMs) grow, so does the risk that users' sensitive information is entered into an LLM and leaked during use. A knowledge graph, generally known as a tool for mitigating LLM hallucination, is built independently of the LLM and can therefore store and manage users' sensitive information separately, which makes it one way to minimize the possibility of leakage. This study presents a knowledge graph-based chatbot that uses an LLM to convert a user's natural-language question into a query suited to the type of knowledge graph, then executes the query and extracts the results. To assess the functional validity of the chatbot, performance tests are conducted on three points: its comprehension of and adaptability to an existing knowledge graph, its ability to create new entity classes, and whether the LLM can access the content of the knowledge graph.
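The question-to-query flow described in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation. The triple store, the `SCHEMA` string, the `subject|relation` query format, and every function name here are hypothetical, and `stub_llm` is a rule-based stand-in for a real LLM call. The point it shows is that the model receives only the schema and the question, while the query runs locally against data the model never sees.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical knowledge graph held outside the LLM, as (subject, relation, object) triples.
TRIPLES: List[Tuple[str, str, str]] = [
    ("Alice", "HAS_PHONE", "010-0000-0000"),
    ("Alice", "WORKS_AT", "Acme"),
    ("Bob", "WORKS_AT", "Acme"),
]

# Assumed schema description: the only graph information placed in the prompt.
SCHEMA = "Relations: HAS_PHONE (Person -> Phone), WORKS_AT (Person -> Company)"
KNOWN_RELATIONS = {"HAS_PHONE", "WORKS_AT"}


@dataclass(frozen=True)
class Query:
    """A toy structured query; a real system would use a graph query language."""
    subject: str
    relation: str


def build_prompt(question: str) -> str:
    # The prompt carries the schema and the question, never the stored triples.
    return f"{SCHEMA}\nQuestion: {question}\nAnswer as: subject|relation"


def stub_llm(prompt: str) -> str:
    # Stand-in for an LLM call (assumption): picks a relation by keyword and
    # treats the last word of the question as the subject.
    question = prompt.split("Question: ", 1)[1].splitlines()[0]
    relation = "HAS_PHONE" if "phone" in question.lower() else "WORKS_AT"
    subject = question.rstrip("?").split()[-1]
    return f"{subject}|{relation}"


def parse_query(text: str) -> Query:
    # Validate the model output before it touches the graph.
    subject, relation = text.strip().split("|")
    if relation not in KNOWN_RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    return Query(subject=subject, relation=relation)


def run_query(query: Query) -> List[str]:
    # Executed locally; results go to the user, not back to the model.
    return [o for s, r, o in TRIPLES if s == query.subject and r == query.relation]


def answer(question: str, llm: Callable[[str], str] = stub_llm) -> List[str]:
    return run_query(parse_query(llm(build_prompt(question))))
```

Calling `answer("What is the phone number of Alice?")` returns the stored phone number, although that value never appears in the prompt built by `build_prompt`. Replacing `stub_llm` with a real model call would leave the rest of the flow unchanged, provided the model returns the same assumed `subject|relation` format.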
