Structuring of Unstructured SNS Messages on Rail Services using Deep Learning Techniques

  • Park, JinGyu (Dept. of Computer Engineering, Hanbat National University) ;
  • Kim, HwaYeon (Dept. of Computer Engineering, Hanbat National University) ;
  • Kim, Hyoung-Geun (Smart R&D Center, U-CORE System Co.) ;
  • Ahn, Tae-Ki (Smart Station Research Team, Korea Railroad Research Institute) ;
  • Yi, Hyunbean (Dept. of Computer Engineering, Hanbat National University)
  • Received : 2018.06.22
  • Accepted : 2018.07.12
  • Published : 2018.07.31

Abstract

This paper presents a process for structuring unstructured social network service (SNS) messages about rail services. We crawl SNS messages about rail services and extract from each message the keywords indicating the date and time, rail operating company, station name, direction, and rail service type. Of these, the rail service type is classified into predefined categories by machine learning, while the remaining fields are extracted with regular expressions. Words are converted into vector representations with Word2Vec, and a conventional convolutional neural network (CNN) is used for training and classification. To evaluate performance, we experimentally compare our approach with one based on TF-IDF and a support vector machine (SVM). The resulting structured information is stored in a database and can readily be used to provide services for railway users.
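The regular-expression step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the operator and station lists, the English-language sample message, the `RailEvent` record, and the `extract_fields` function are all hypothetical (the real messages are in Korean, and the paper's actual patterns are not given here). The rail service type is left empty because it comes from the Word2Vec/CNN classifier rather than from a regular expression.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical gazetteers; a real system would load complete lists of
# rail operators and station names.
OPERATORS = ["Korail", "Seoul Metro", "AREX"]
STATIONS = ["Seoul Station", "Yongsan", "Incheon", "Suwon"]


def _alternation(names):
    # Escape each name and try longer names first so overlapping names resolve to the longest.
    return "|".join(map(re.escape, sorted(names, key=len, reverse=True)))


DATE_RE = re.compile(r"(\d{4})[-./](\d{1,2})[-./](\d{1,2})")
TIME_RE = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")
OPERATOR_RE = re.compile(_alternation(OPERATORS))
# A station name directly followed by "-bound" is a direction, not the event location.
STATION_RE = re.compile(r"(?:" + _alternation(STATIONS) + r")(?!-bound)")
DIRECTION_RE = re.compile(r"(\w+)-bound", re.IGNORECASE)


@dataclass
class RailEvent:
    date: Optional[str]
    time: Optional[str]
    operator: Optional[str]
    station: Optional[str]
    direction: Optional[str]
    service_type: Optional[str] = None  # set later by the classifier, not by regex


def extract_fields(message: str) -> RailEvent:
    """Extract the regex-based fields from one SNS message (hypothetical sketch)."""
    date_m = DATE_RE.search(message)
    time_m = TIME_RE.search(message)
    op_m = OPERATOR_RE.search(message)
    st_m = STATION_RE.search(message)
    dir_m = DIRECTION_RE.search(message)
    return RailEvent(
        # Normalize dates such as "2018.6.2" to "2018-06-02".
        date=(f"{date_m.group(1)}-{int(date_m.group(2)):02d}-{int(date_m.group(3)):02d}"
              if date_m else None),
        # Normalize times such as "7:05" to "07:05".
        time=f"{int(time_m.group(1)):02d}:{time_m.group(2)}" if time_m else None,
        operator=op_m.group(0) if op_m else None,
        station=st_m.group(0) if st_m else None,
        direction=dir_m.group(1) if dir_m else None,
    )


if __name__ == "__main__":
    msg = "[Korail] 2018-06-22 08:15 Incheon-bound train delayed at Seoul Station"
    print(extract_fields(msg))
```

Building each dictionary pattern from a name list lets one compiled regex cover a whole gazetteer, and the `(?!-bound)` lookahead keeps a destination such as "Incheon-bound" from being mistaken for the station where the event occurred.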
