
A Case Study on Metadata Extraction for Records Management Using ChatGPT

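  • 김민지 (Data Records Major, Graduate School of Records, Archives & Information Science, Myongji University)
  • 강성희 (Data Records Major, Graduate School of Records, Archives & Information Science, Myongji University)
  • 이해영 (Records Management Major, Graduate School of Records, Archives & Information Science, Myongji University)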
  • Received : 2024.04.16
  • Accepted : 2024.05.10
  • Published : 2024.05.31

Abstract

In records management, metadata is one of the essential components of a record and plays a vital role in managing and understanding records properly. Where metadata elements cannot be assigned automatically, records professionals must enter their values manually. To reduce this burden, this study proposes using ChatGPT to extract records management metadata elements, and tests two approaches. First, a Python program built with the LangChain library presented PDF documents to ChatGPT-3.5 turbo and extracted each record's metadata by asking questions about it. Second, multiple PDF documents were attached to the ChatGPT online service (ChatGPT-4) to extract their metadata elements. The LangChain approach with ChatGPT-3.5 turbo was the safer extraction method from a security standpoint, but it had some limitations in retrieving metadata elements accurately. The ChatGPT-4 online service produced relatively accurate results, although security-sensitive documents cannot be attached to it. These results show that ChatGPT can feasibly support metadata extraction in records management, and safer, more accurate extraction should become possible as ChatGPT-related technologies advance. Applying these capabilities is expected to improve the efficiency and productivity of records and metadata management work in archives.
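The question-based extraction described above can be sketched in Python, the language the authors used. This is a minimal sketch under stated assumptions, not the authors' program: the element names in `ELEMENTS` are hypothetical (not the NAK 8:2022 element names), text extraction from the PDF is assumed to have already happened, and `ask_llm` is a placeholder for the model call, which in the study went through LangChain to ChatGPT-3.5 turbo. The hypothetical `fake_llm` returns a made-up reply so the sketch runs offline.

```python
import json
import re
from typing import Any, Callable, Dict, Sequence

# Hypothetical element names chosen for illustration; they are NOT the
# official element names of the NAK 8:2022 (v2.3) metadata standard.
ELEMENTS = ("title", "creator", "date_created", "description")


def build_prompt(record_text: str, elements: Sequence[str]) -> str:
    """Phrase the extraction as a question about the record's text."""
    keys = ", ".join(elements)
    return (
        f"Read the record below and extract these metadata elements: {keys}.\n"
        "Answer with a single JSON object that uses exactly these keys, "
        "and use null for any value the record does not state.\n\n"
        f"Record:\n{record_text}"
    )


def parse_answer(answer: str, elements: Sequence[str]) -> Dict[str, Any]:
    """Take the JSON object from the model's reply, keeping only known keys."""
    empty = {key: None for key in elements}
    # Span from the first "{" to the last "}" in the reply, if any.
    match = re.search(r"\{.*\}", answer, re.DOTALL)
    if match is None:
        return empty
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return empty
    # Missing keys become None; extra keys from the model are dropped.
    return {key: data.get(key) for key in elements}


def extract_metadata(record_text: str,
                     ask_llm: Callable[[str], str],
                     elements: Sequence[str] = ELEMENTS) -> Dict[str, Any]:
    """Ask the model one question per record and parse its answer."""
    return parse_answer(ask_llm(build_prompt(record_text, elements)), elements)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns a made-up reply."""
    return ('Extracted metadata: {"title": "Sample record title", '
            '"creator": "Example Division", "date_created": "2023-10-10", '
            '"description": null}')


if __name__ == "__main__":
    print(extract_metadata("(text extracted from a PDF record)", fake_llm))
```

Requesting one JSON object with fixed keys, and mapping anything missing or unparsable to `None`, leaves empty or failed answers visible for a records professional to review instead of filling in a value.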

Acknowledgement

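This article is a summarized and revised version of the master's thesis of 김민지, 「기록관리 메타데이터 추출을 위한 챗GPT 활용 방안」 ("Ways of Using ChatGPT for Records Management Metadata Extraction", 2024).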