Classification of Tabular Data using High-Dimensional Mapping and Deep Learning Network

  • Kyeong-Taek Kim (Department of Artificial Intelligence Convergence, Pukyong National University) ;
  • Won-Du Chang (Division of Computer Engineering and Artificial Intelligence, Pukyong National University)
  • Received : 2023.10.26
  • Reviewed : 2023.11.24
  • Published : 2023.12.31

Abstract

Deep learning has recently shown far higher performance than traditional machine learning across diverse domains and has become the standard approach for pattern recognition. For classification problems on tabular data, however, traditional machine learning techniques still dominate. This paper proposes a network module that transforms tabular data into high-dimensional tensors; combined with conventional deep learning networks, the module is applied to the classification of tabular data. The proposed method was trained and validated on four datasets and achieved an average accuracy of 90.22%, 2.55 percentage points higher than TabNet, a recent deep learning model for tabular data. The approach is significant in that it allows diverse network architectures known for their strong performance in computer vision to be applied to tabular data.
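The abstract describes the core idea, a module that maps each tabular sample to a high-dimensional tensor so that conventional vision-style networks can be applied, but not the module's internal design. The sketch below is a minimal illustration only, assuming the mapping is a learned linear projection reshaped into an image-like tensor followed by a small generic CNN head; the module names, tensor sizes, and backbone are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch (not the authors' implementation): map a tabular feature
# vector to an image-like tensor with a learned linear projection, then classify
# it with a generic CNN. Sizes and layer choices are illustrative assumptions.
import torch
import torch.nn as nn

class TabularToTensor(nn.Module):
    """Maps a (batch, n_features) row to a (batch, channels, height, width) tensor."""
    def __init__(self, n_features: int, channels: int = 1, height: int = 16, width: int = 16):
        super().__init__()
        self.shape = (channels, height, width)
        self.project = nn.Linear(n_features, channels * height * width)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Linear projection to C*H*W values, then reshape into a tensor.
        return self.project(x).view(x.size(0), *self.shape)

class TabularCNNClassifier(nn.Module):
    """High-dimensional mapping module followed by a small, generic CNN head."""
    def __init__(self, n_features: int, n_classes: int):
        super().__init__()
        self.mapper = TabularToTensor(n_features)
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(self.mapper(x))

# Usage: 20 tabular features, 2 classes, batch of 8 rows.
model = TabularCNNClassifier(n_features=20, n_classes=2)
logits = model(torch.randn(8, 20))
print(logits.shape)  # torch.Size([8, 2])
```

Under this reading, the practical appeal is that once tabular rows are lifted to image-like tensors, the toy CNN head above could in principle be swapped for any off-the-shelf vision backbone.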

Keywords

Acknowledgments

This research was supported by the Regional Intelligence Innovation Talent Development (Grand ICT Research Center) program of the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation (IITP-2023-2016-0-00318), and by the Innopolis Foundation's Technology Commercialization Capacity Enhancement program funded by the Ministry of Science and ICT (No. 2023-BS-RD-0061 / Development of advancement and commercialization technology for an intelligent security surveillance system).

References

  1. A.Krizhevsky, I.Sutskever and G.E.Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems 25, 2012. 
  2. C.Szegedy, W.Liu, Y.Jia, P.Sermanet, S.Reed, D.Anguelov, D.Erhan, V.Vanhoucke and A.Rabinovich, "Going Deeper with Convolutions," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1-9, 2015. 
  3. A.Dosovitskiy, L.Beyer, A.Kolesnikov, D.Weissenborn, X.Zhai, T.Unterthiner, M.Dehghani, M.Minderer, G.Heigold, S.Gelly, J.Uszkoreit and N.Houlsby, "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," International Conference on Learning Representations, 2021. 
  4. K.He, X.Zhang, S.Ren and J.Sun, "Deep Residual Learning for Image Recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.770-778, 2016. 
  5. B.Lim, S.O.Arik, N.Loeff and T.Pfister, "Temporal Fusion Transformers for Interpretable Multi-Horizon Time Series Forecasting," International Journal of Forecasting, Vol.37, No.4, pp.1748-1764, 2021.  https://doi.org/10.1016/j.ijforecast.2021.03.012
  6. J.Devlin, M.W.Chang, K.Lee and K.Toutanova, "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding," Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol.1, 2019. 
  7. A.Vaswani, N.Shazeer, N.Parmar, J.Uszkoreit, L.Jones, A.N.Gomez, L.Kaiser and I.Polosukhin, "Attention Is All You Need," Advances in Neural Information Processing Systems 30, pp.5998-6008, 2017. 
  8. S.O.Arik and T.Pfister, "Tabnet: Attentive Interpretable Tabular Learning," Proceedings of the AAAI Conference on Artificial Intelligence, Vol.35, No.8, pp.6679-6687, 2021. 
  9. R.Shwartz-Ziv and A.Armon, "Tabular Data: Deep Learning is Not All You Need," Information Fusion, Vol.81, pp.84-90, 2022.  https://doi.org/10.1016/j.inffus.2021.11.011
  10. V.Borisov, T.Leemann, K.Sessler, J.Haug, M.Pawelczyk and G.Kasneci, "Deep Neural Networks and Tabular Data: A Survey," IEEE Transactions on Neural Networks and Learning Systems, 2022. 
  11. I.Shavitt and E.Segal, "Regularization Learning Networks: Deep Learning for Tabular Datasets," Advances in Neural Information Processing Systems 31, pp.1379-1389, 2018. 
  12. G.Somepalli, M.Goldblum, A.Schwarzschild, C.B.Bruss and T.Goldstein, "SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training," arXiv:2106.01342, 2021. 
  13. L.Buturovic and D.Miljkovic, "A Novel Method for Classification of Tabular Data Using Convolutional Neural Network," BioRxiv, 2020. 
  14. M.I.Iqbal, S.H.Mukta and A.R.Hasan, "A Dynamic Weighted Tabular Method for Convolutional Neural Networks," IEEE Access, Vol.10, pp.134183-134198, 2022.  https://doi.org/10.1109/ACCESS.2022.3231102
  15. Y.Zhu, T.Brettin, F.Xia, A.Partin, M.Shukla, H.Yoo, Y.A.Evrard, J.H.Doroshow and R.L.Stevens, "Converting Tabular Data into Images for Deep Learning With Convolutional Neural Networks," Scientific Reports, Vol.11, No.1, 2021. 
  16. T.Chen and C.Guestrin, "XGBoost: A Scalable Tree Boosting System," The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.785-794, 2016.