Experimental Analysis of Equilibrization in Binary Classification for Non-Image Imbalanced Data Using Wasserstein GAN

  • Wang, Zhi-Yong (Weifang University of Science and Technology)
  • Kang, Dae-Ki (Department of Computer Engineering, Dongseo University)
  • Received: 2019.09.02
  • Accepted: 2019.09.15
  • Published: 2019.11.30

Abstract

In this paper, we examine three classic data augmentation methods and two generative-model-based oversampling methods. The three classic methods are random sampling (RANDOM), the Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN). The two generative-model-based methods are the Conditional Generative Adversarial Network (CGAN) and the Wasserstein Generative Adversarial Network (WGAN). In imbalanced data, the instances are divided into a majority class, which accounts for most of the instances in the training set, and a minority class, which contains only a few. Generative models have an advantage in this setting because they can generate plausible new samples that follow the distribution of the minority class. We include CGAN so that its data augmentation performance can be compared with that of the other methods. The experimental results show that the WGAN-based oversampling technique is more stable than the other approaches (RANDOM, SMOTE, ADASYN, and CGAN), even with very limited training data. However, when the imbalance ratio is too small, the generative-model-based approaches cannot achieve more satisfactory performance than the conventional data augmentation techniques. This limitation suggests a direction for future research.
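To make the difference between the first two classic methods concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes numeric feature vectors and a toy two-dimensional dataset invented for illustration; the function names `random_oversample` and `smote_like` are hypothetical. RANDOM is shown as duplication of existing minority rows, and SMOTE is shown in simplified form as linear interpolation between a minority row and one of its nearest minority neighbours. ADASYN, CGAN, and WGAN are not covered here.

```python
import random


def random_oversample(minority, n_new, rng):
    """RANDOM: draw existing minority rows with replacement (pure duplication)."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote_like(minority, n_new, k, rng):
    """Simplified SMOTE-style step: interpolate between a minority row
    and one of its k nearest minority neighbours (illustrative only)."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by distance to the base row.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy imbalanced data (an assumption for this sketch): 50 majority rows, 5 minority rows.
rng = random.Random(0)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(5)]

# Generate enough new minority rows to match the majority class size.
n_new = len(majority) - len(minority)
dup = random_oversample(minority, n_new, rng)
syn = smote_like(minority, n_new, k=3, rng=rng)
```

Both functions only reuse or interpolate existing minority rows, so the new samples stay within the coordinate range of the observed minority instances. A generative model, by contrast, is trained to approximate the minority distribution and samples from that approximation.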

Acknowledgement

Supported by: National Research Foundation of Korea (NRF)
