Study on Navigation Data Preprocessing Technology for Efficient Route Clustering

  • Dae-Han Lee (Department of Maritime Transportation System, Graduate School of Mokpo National Maritime University)
  • Received : 2024.06.28
  • Accepted : 2024.08.29
  • Published : 2024.08.31

Abstract

The global maritime industry is developing rapidly with the emergence of autonomous ship technology, and interest in artificial intelligence built on marine data is increasing. Among these developments, ship-route clustering is emerging as an important technology for the commercialization of autonomous ships. Route clustering extracts the patterns of ship routes at sea, which helps identify the fastest and safest routes and provides a basis for developing collision-prevention systems. High-quality, well-processed data are essential to the accuracy and efficiency of route-clustering algorithms. Among the various route-clustering methods, this study focuses on clustering based on ship-route similarity, which can accurately reflect the actual shape and characteristics of a route. To maximize the efficiency of this method, we seek an optimal combination of data-preprocessing techniques. Specifically, we combine four methods of measuring similarity between ship routes with three dimensionality-reduction methods. We perform k-means cluster analysis for each combination and quantitatively evaluate the results using the silhouette index to identify the best-performing preprocessing combination. Beyond identifying the optimal preprocessing technique, this study emphasizes the importance of extracting meaningful information from a wide range of ocean data. It can also serve as foundational research for responding effectively to the digital transformation of the maritime and shipping industry in the Fourth Industrial Revolution era.
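The evaluation loop described in the abstract (route similarity, then dimensionality reduction, then k-means, then the silhouette index) can be sketched for a single combination as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the abstract does not name the four similarity measures or the three reduction methods, so the symmetric Hausdorff distance and classical MDS are stand-ins chosen here, the routes are synthetic, and all function names are hypothetical.

```python
import numpy as np


def hausdorff(a, b):
    """Symmetric Hausdorff distance between routes of shape (n, 2) and (m, 2)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # all point-to-point distances
    return max(d.min(axis=1).max(), d.min(axis=0).max())


def distance_matrix(routes):
    """Pairwise route-to-route distances (symmetric, zero diagonal)."""
    n = len(routes)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = hausdorff(routes[i], routes[j])
    return dist


def classical_mds(dist, dim=2):
    """Embed a distance matrix into `dim` coordinates via classical MDS."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centred squared distances
    vals, vecs = np.linalg.eigh(gram)                  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]                 # indices of the largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def kmeans(x, k, iters=50):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [x[0]]
    while len(centers) < k:
        gap = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[gap.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.linalg.norm(x[:, None] - centers[None], axis=2).argmin(axis=1)
        # keep the previous centre if a cluster happens to become empty
        centers = np.array([x[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                            for c in range(k)])
    return labels


def silhouette(x, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    d = np.linalg.norm(x[:, None] - x[None], axis=2)
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() == 1:  # singleton cluster: coefficient defined as 0
            scores.append(0.0)
            continue
        a = d[i, same].sum() / (same.sum() - 1)  # mean distance inside own cluster
        b = min(d[i, labels == c].mean() for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


# Synthetic data (assumption): two bundles of noisy, roughly parallel routes.
rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 20)


def make_route(offset):
    base = np.column_stack([10 * t, np.full_like(t, offset)])
    return base + rng.normal(scale=0.1, size=base.shape)


routes = [make_route(0.0) for _ in range(5)] + [make_route(5.0) for _ in range(5)]
embedded = classical_mds(distance_matrix(routes), dim=2)
labels = kmeans(embedded, k=2)
score = silhouette(embedded, labels)
print(labels, round(score, 3))
```

Comparing preprocessing combinations would then amount to swapping `hausdorff` and `classical_mds` for other similarity measures and reduction methods, rerunning the same k-means step, and ranking the combinations by `score`.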
