Introduction and Utilization of Time Series Data Integration Framework with Different Characteristics

  • Received : 2022.09.15
  • Accepted : 2022.11.15
  • Published : 2022.11.30

Abstract

With the growth of the IoT industry, many sectors now generate time series data of different types, and research is moving toward re-integrating these data so that they can be reproduced and reused. In addition, data processing speed and the constraints of the systems that consume the data in real industrial settings mean that time series data are increasingly compressed before being integrated and used. However, there are no clear guidelines for integrating time series data, and their characteristics, such as the recording (sampling) time interval and the time section (period) each series covers, differ from source to source. This makes it difficult to integrate them in a single batch and then use the result. In this paper, we propose two integration methods, based on a method for setting integration criteria and on the problems that arise when time series data are integrated. Building on these methods, we construct an integration framework for heterogeneous time series data that takes the characteristics of time series data into account, and we confirm that compressed heterogeneous time series data can be integrated and used for a variety of machine learning tasks.


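The abstract names two obstacles to batch integration (series recorded at different time intervals and covering different time sections) and mentions compressing data before integrating it, but it does not describe the two proposed methods. The following is therefore only a generic, minimal Python sketch of those obstacles, not the authors' framework. It assumes each stream is a list of `(timestamp_in_seconds, value)` pairs, downsamples every stream onto a shared, hypothetical `step` by averaging (which also shrinks the data), and keeps only the half-open window `[start, end)` that all streams cover. The names `resample_mean` and `integrate` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed representation: (timestamp in seconds, measured value).
Sample = Tuple[float, float]


def resample_mean(series: Sequence[Sample], start: float, end: float,
                  step: float) -> List[Optional[float]]:
    """Average the samples in each bin [start + k*step, start + (k+1)*step).

    Only whole bins inside [start, end) are produced; a bin with no samples
    yields None instead of an imputed value.
    """
    n_bins = int((end - start) // step)
    bins: List[List[float]] = [[] for _ in range(n_bins)]
    for t, v in series:
        if start <= t < end:
            k = int((t - start) // step)
            if k < n_bins:  # skip a trailing partial bin
                bins[k].append(v)
    return [mean(b) if b else None for b in bins]


def integrate(streams: Dict[str, Sequence[Sample]],
              step: float) -> List[Tuple[float, Dict[str, Optional[float]]]]:
    """Align several streams on one time grid over their common time section.

    Returns one row per bin: (bin start time, {stream name: averaged value}).
    """
    if not streams or any(not s for s in streams.values()):
        return []
    # Common time section: latest first sample to earliest last sample.
    start = max(min(t for t, _ in s) for s in streams.values())
    end = min(max(t for t, _ in s) for s in streams.values())
    if end <= start:
        return []  # the streams do not overlap in time
    columns = {name: resample_mean(s, start, end, step)
               for name, s in streams.items()}
    n_bins = int((end - start) // step)
    return [(start + k * step, {name: col[k] for name, col in columns.items()})
            for k in range(n_bins)]


if __name__ == "__main__":
    # Hypothetical data: temperature every 1 s, humidity every 2 s from t=2.
    temp = [(t, 20.0 + t) for t in range(7)]
    humidity = [(2, 50.0), (4, 54.0), (6, 58.0), (8, 60.0)]
    print(integrate({"temp": temp, "humidity": humidity}, step=2))
    # [(2, {'temp': 22.5, 'humidity': 50.0}), (4, {'temp': 24.5, 'humidity': 54.0})]
```

Here 11 raw samples become 2 aligned rows. Mean aggregation is only one possible compression choice, and empty bins are left as `None` because how to impute missing values is a separate decision.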

Acknowledgement

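This research was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) with funding from the Korean government (Ministry of Science and ICT) in 2021 (No. 2021-0-00034, Development of a time-series-based integration platform technology for the active use of fragmented data).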