A NODE PREDICTION ALGORITHM WITH THE MAPPER METHOD BASED ON DBSCAN AND GIOTTO-TDA

  • DONGJIN LEE (GRADUATE SCHOOL OF ARTIFICIAL INTELLIGENCE, POHANG UNIVERSITY OF SCIENCE AND TECHNOLOGY)
  • JAE-HUN JUNG (DEPARTMENT OF MATHEMATICS, POHANG UNIVERSITY OF SCIENCE AND TECHNOLOGY)
  • Received : 2023.11.30
  • Accepted : 2023.12.23
  • Published : 2023.12.25

Abstract

Topological data analysis (TDA) is a recently developed data analysis technique that investigates the overall shape of a given dataset. The mapper algorithm is a TDA method that captures the connectivity of the data and converts the data into a graph, called the mapper graph. Persistent homology, another popular TDA tool, focuses mainly on the homological structure of the data; by comparison, the mapper algorithm is closer to a visualization method that represents the data as a graph in a lower dimension. Because the mapper graph visualizes the overall connectivity of the data, it could also serve as a prediction tool that locates new input points on the graph. Existing mapper packages such as Giotto-TDA, Gudhi, and Kepler Mapper implement a descriptive mapper algorithm; that is, their final output is mainly the mapper graph itself. In this paper, we develop a simple predictive algorithm: given a new data point, the proposed algorithm identifies the nodes of the established mapper graph associated with that point. By checking the features of the identified nodes, such as whether they are anomalous, we can infer the corresponding feature of the new data point. As an example, we apply the developed algorithm to credit card fraud transaction data and show how it can be used as a node prediction method.
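The abstract does not spell out the prediction rule, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' implementation and not the Giotto-TDA API. It assumes a one-dimensional filter (the first coordinate), a uniform overlapping interval cover, and, in place of full DBSCAN, connected components of the eps-neighbourhood graph (which coincides with DBSCAN for min_samples=1). The assumed prediction rule assigns a new point to every node whose cover interval contains the point's filter value and which has a member within eps of the point. All function names, parameters, and the toy data are hypothetical.

```python
from math import dist

def interval_cover(lo, hi, n_intervals, overlap):
    """Uniform cover of [lo, hi] by n_intervals intervals with fractional overlap."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    cover = [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]
    cover[-1] = (cover[-1][0], hi)  # guard against floating-point rounding at the end
    return cover

def eps_clusters(points, idx, eps):
    """Connected components of the eps-neighbourhood graph on points[idx].
    Assumption: stands in for DBSCAN (same result as min_samples=1, no noise)."""
    unseen, clusters = set(idx), []
    while unseen:
        stack = [unseen.pop()]
        comp = set(stack)
        while stack:
            p = stack.pop()
            near = {q for q in unseen if dist(points[p], points[q]) <= eps}
            unseen -= near
            comp |= near
            stack.extend(near)
        clusters.append(frozenset(comp))
    return sorted(clusters, key=min)  # deterministic node order

def build_mapper(points, filt, n_intervals, overlap, eps):
    """Return (cover, nodes, edges); a node is (interval index, member indices)."""
    values = [filt(p) for p in points]
    cover = interval_cover(min(values), max(values), n_intervals, overlap)
    nodes = []
    for k, (a, b) in enumerate(cover):
        idx = [i for i, v in enumerate(values) if a <= v <= b]
        nodes += [(k, c) for c in eps_clusters(points, idx, eps)]
    # Two nodes are joined by an edge when they share at least one data point.
    edges = {(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
             if nodes[i][1] & nodes[j][1]}
    return cover, nodes, edges

def predict_nodes(x, points, filt, cover, nodes, eps):
    """Assumed rule: node n is predicted for x if x's filter value lies in the
    node's interval and some member of the node is within eps of x."""
    v = filt(x)
    hits = []
    for n, (k, members) in enumerate(nodes):
        a, b = cover[k]
        if a <= v <= b and any(dist(x, points[i]) <= eps for i in members):
            hits.append(n)
    return hits

def anomaly_rate(node, flagged):
    """Fraction of a node's members that carry the anomaly flag."""
    _, members = node
    return sum(flagged[i] for i in members) / len(members)

# Toy data: 11 regular points on a line plus 3 flagged points away from it.
points = [(float(x), 0.0) for x in range(11)] + [(4.0, 5.0), (5.0, 5.0), (6.0, 5.0)]
flagged = [False] * 11 + [True] * 3
first_coord = lambda p: p[0]
EPS = 1.2

cover, nodes, edges = build_mapper(points, first_coord, n_intervals=3, overlap=0.3, eps=EPS)

near_flagged = predict_nodes((5.2, 4.8), points, first_coord, cover, nodes, EPS)
near_regular = predict_nodes((3.5, 0.2), points, first_coord, cover, nodes, EPS)
far_away = predict_nodes((5.0, 20.0), points, first_coord, cover, nodes, EPS)

print(near_flagged, [anomaly_rate(nodes[n], flagged) for n in near_flagged])  # [3] [1.0]
print(near_regular, [anomaly_rate(nodes[n], flagged) for n in near_regular])  # [0, 2] [0.0, 0.0]
print(far_away)  # []
```

In this sketch a new point that lands only in nodes with a high anomaly rate would be treated as suspicious, while a point that matches no node is left unassigned; how such unmatched points should be handled is a design choice that the abstract does not address.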

Acknowledgement

This work was supported by the National Research Foundation of Korea under grant number 2021R1A2C3009648 and by the POSTECH Basic Science Research Institute under NRF grant number NRF2021R1A6A1A1004294412.

References

  1. G. Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46:255-308, 2009. https://doi.org/10.1090/S0273-0979-09-01249-X
  2. F. Hensel, M. Moor, and B. Rieck. A survey of topological machine learning methods. Frontiers in Artificial Intelligence, 4:681108, 2021.
  3. D. Cohen-Steiner, H. Edelsbrunner, and J. Harer. Stability of persistence diagrams. Discrete & Computational Geometry, 37:103-120, 2007. https://doi.org/10.1007/s00454-006-1276-5
  4. A. Zomorodian and G. Carlsson. Computing persistent homology. In: SCG '04 Proceedings of the Twentieth Annual Symposium on Computational Geometry, pages 347-356, 2004.
  5. C. Bresten and J.-H. Jung. Detection of gravitational waves using topological data analysis and convolutional neural network: An improved approach. arXiv:1910.08245, 2019.
  6. Jose A. Perea and John Harer. Sliding windows and persistence: An application of topological methods to signal analysis. Foundations of Computational Mathematics, 15(3):799-838, Jun 2015. https://doi.org/10.1007/s10208-014-9206-z
  7. Keunsu Kim and Jae-Hun Jung. Exact multi-parameter persistent homology of time-series data: Fast and variable one-dimensional reduction of multi-parameter persistence theory, 2023.
  8. John Nicponski and Jae-Hun Jung. Topological data analysis of vascular disease: A theoretical framework. Frontiers in Applied Mathematics and Statistics, 6, 2020.
  9. Gurjeet Singh, Facundo Memoli, and Gunnar Carlsson. Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition. In M. Botsch, R. Pajarola, B. Chen, and M. Zwicker, editors, Eurographics Symposium on Point-Based Graphics. The Eurographics Association, 2007.
  10. Guillaume Tauzin, Umberto Lupo, Lewis Tunstall, Julian Burella Perez, Matteo Caorsi, Anibal M. Medina-Mardones, Alberto Dassatti, and Kathryn Hess. giotto-tda: A topological data analysis toolkit for machine learning and data exploration. Journal of Machine Learning Research, 22(39):1-6, 2021.
  11. Karl Pearson. LIII. On lines and planes of closest fit to systems of points in space. Philosophical Magazine Series 6, 2:559-572, 1901. https://doi.org/10.1080/14786440109462720
  12. Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86):2579-2605,
  13. Enrique Alvarado, Robin Belton, Emily Fischer, Kang-Ju Lee, Sourabh Palande, Sarah Percival, and Emilie Purvine. g-mapper: Learning a cover in the mapper construction, arXiv:2309.06634, 2023.
  14. Martin Ester, Hans-Peter Kriegel, Jorg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD'96, pages 226-231. AAAI Press, 1996.
  15. Clement Maria, Jean-Daniel Boissonnat, Marc Glisse, and Mariette Yvinec. The gudhi library: Simplicial complexes and persistent homology. In Hoon Hong and Chee Yap, editors, Mathematical Software - ICMS 2014, pages 167-174, Berlin, Heidelberg, 2014. Springer Berlin Heidelberg.
  16. Hendrik Jacob van Veen, Nathaniel Saul, David Eargle, and Sam W. Mangham. Kepler mapper: A flexible python implementation of the mapper algorithm. Journal of Open Source Software, 4(42):1315, 2019.
  17. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
  18. Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining, pages 413-422, 2008.