High-quality data collection for machine learning using block chain

블록체인을 활용한 양질의 기계학습용 데이터 수집 방안 연구

  • Kim, Youngrang (School of Electronics and Information Engineering, Korea Aerospace University) ;
  • Woo, Junghoon (School of Electronics and Information Engineering, Korea Aerospace University) ;
  • Lee, Jaehwan (School of Electronics and Information Engineering, Korea Aerospace University) ;
  • Shin, Ji Sun (Department of Computer and Information Security, Sejong University)
  • Received : 2018.11.12
  • Accepted : 2018.12.04
  • Published : 2019.01.31

Abstract

The accuracy of machine learning depends heavily on both the amount and the quality of the training data. Existing Web-based collection of training data risks gathering data unrelated to the actual learning task, and it cannot guarantee data transparency. In this paper, we propose a method in which the blocks of a blockchain structure collect data directly and in parallel, and each block's data is compared with the data of the other blocks so that only high-quality data is selected. In the proposed system, the blocks share data with one another through the blockchain and use Parallel-SGD with an All-reduce structure to compare each block's data against that of the other blocks, keeping only high-quality data to build the training data set. To verify the performance of the proposed architecture, we run experiments on images from an existing benchmark data set and confirm that, among modulated images, only the original images are identified as high-quality data.
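
The selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact comparison criterion, so everything below is an assumption made for illustration: each worker (block) is represented by one gradient vector, All-reduce is simulated as an element-wise mean, and a worker's data counts as "good" when the cosine similarity between its gradient and the mean gradient of its peers reaches a hypothetical `threshold`. The names `all_reduce_mean`, `cosine`, and `select_good_workers` are not from the paper.

```python
import math


def all_reduce_mean(vectors):
    """Simulate All-reduce averaging: return the element-wise mean of all vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_good_workers(gradients, threshold=0.5):
    """Return indices of workers whose gradient agrees with the mean of their peers.

    Assumption: agreement is measured by cosine similarity against a
    hypothetical threshold; the paper may use a different rule.
    """
    good = []
    for i, grad in enumerate(gradients):
        peers = [g for j, g in enumerate(gradients) if j != i]
        peer_mean = all_reduce_mean(peers)
        if cosine(grad, peer_mean) >= threshold:
            good.append(i)
    return good


# Toy gradients: workers 0-2 saw clean data, worker 3 saw modulated data.
gradients = [
    [1.0, 0.9],
    [0.9, 1.1],
    [1.1, 1.0],
    [-1.0, 0.2],
]
print(select_good_workers(gradients))  # [0, 1, 2]
```

In this toy run the three workers with similar gradients are kept and the outlier is rejected. A real system would compute the gradients from a shared model on each worker's collected data and exchange them with an actual All-reduce implementation.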

Fig. 1 Structure of a parameter server system consisting of a parameter server and workers, each holding a model replica

Fig. 2 Structure of the All-reduce aggregation method, consisting only of workers

Fig. 3 Shifting of parameter values using Parallel-SGD

Fig. 4 Process of determining data quality by calculating the gradient of the data input to each block

Fig. 5 Training with modulated data on each worker

Fig. 6 Number of times workers using different modulations received rewards
