Development of a Ship's Logbook Data Extraction Model Using OCR Program

  • Dain Lee (Department of Maritime Transportation System, Mokpo National Maritime University) ;
  • Sung-Cheol Kim (Division of Cadet Training, Mokpo National Maritime University) ;
  • Ik-Hyun Youn (Division of Navigation & Information Systems, Mokpo National Maritime University)
  • Received : 2024.02.05
  • Reviewed : 2024.02.23
  • Published : 2024.02.28

Abstract

Despite rapid advances in image recognition technology, fully digitizing tabular and handwritten documents remains difficult. This study aims to improve the accuracy of OCR output for a ship's logbook, a handwritten tabular document, by correcting errors using the rules that govern how logbook entries are written. This is expected to enhance the accuracy and reliability of logbook data extracted through OCR programs. The study used logbook data covering 57 days of voyages in 2023 by the training ship "Saenuri" of Mokpo National Maritime University, and sought to improve accuracy by correcting the errors that arose after recognition by an Optical Character Recognition (OCR) program. The model first identifies errors using several rules considered when logbook entries are made, and then corrects the identified errors. To evaluate its effect, the data before and after correction were divided by voyage number, and the same variables within the same voyage were compared. Against an actual cell error rate of about 11.8%, the model identified errors amounting to about 10.6%, and it corrected 56 of 123 errors. A limitation of this study is that it covers only the navigational information in the logbook, namely the entries from Dist.Run to Stand Course. Future research will aim to correct more of the logbook, including weather information.
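The identify-then-correct approach described in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual rules, so everything here is an assumption made for illustration: the range rule (a course is a whole number of degrees from 0 to 360), the character-confusion table, and the function names `is_valid_course`, `correct_course`, and `summarize` are all hypothetical and are not the authors' model.

```python
# Hypothetical OCR confusions between letters and handwritten digits.
# This table is an illustrative assumption, not taken from the study.
OCR_DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)


def is_valid_course(text):
    """Assumed rule: a course is a whole number of degrees from 0 to 360."""
    return text.isascii() and text.isdigit() and 0 <= int(text) <= 360


def correct_course(text):
    """Return (value, status), where status is 'ok', 'corrected', or 'flagged'."""
    cleaned = text.strip()
    if is_valid_course(cleaned):
        return cleaned, "ok"
    # The cell breaks the rule, so it is identified as an error.
    candidate = cleaned.translate(OCR_DIGIT_FIXES)
    if is_valid_course(candidate):
        return candidate, "corrected"
    # Identified as an error, but no rule-based fix was found.
    return cleaned, "flagged"


def summarize(cells):
    """Apply the rule to every cell and count identified and corrected errors."""
    results = [correct_course(cell) for cell in cells]
    identified = sum(1 for _, status in results if status != "ok")
    corrected = sum(1 for _, status in results if status == "corrected")
    return results, identified, corrected


# Made-up OCR output for a course column (not data from the study).
sample = ["045", "l80", "3S0", "4O0", "27O"]
results, identified, corrected = summarize(sample)
print(results)
print(f"identified={identified}, corrected={corrected}")
```

In this sketch a cell can be identified as an error without being corrected, which mirrors the gap the abstract reports between errors identified and errors corrected (56 of 123, about 45.5%).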
