Figure 2.1. Changes in the data set after applying various over-sampling methods.
Figure 2.2. Changes in the data set after applying various CNN-based under-sampling methods.
Figure 2.3. Changes in the data set after applying various under-sampling methods.
Figure 2.4. Changes in the data set after applying two combined methods.
Figure 3.1. Sensitivity, accuracy, AUC, and F1-score of logistic regression for simulation 3.
Figure 3.2. Sensitivity, accuracy, AUC, and F1-score of SVM for simulation 3.
Figure 3.3. Sensitivity, accuracy, AUC, and F1-score of random forest for simulation 3.
Figure 3.4. An example of the original data set and the data set after applying the NM2 method when the rare-class values were distributed at two extremes.
Figure 4.1. Sensitivity, accuracy, AUC, and F1-score of logistic regression, SVM, and random forest for the solar flare m0 data.
Table 2.1. Misclassification table
Table 4.1. Example data sets
References
- Batista, G. E., Prati, R. C., and Monard, M. C. (2004). A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD Explorations Newsletter, 6, 20-29. https://doi.org/10.1145/1007730.1007735
- Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 16, 321-357. https://doi.org/10.1613/jair.953
- Gates, G. (1972). The reduced nearest neighbor rule (Corresp.), IEEE Transactions on Information Theory, 18, 431-433. https://doi.org/10.1109/TIT.1972.1054809
- Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., and Bing, G. (2017). Learning from class-imbalanced data: Review of methods and applications, Expert Systems with Applications, 73, 220-239. https://doi.org/10.1016/j.eswa.2016.12.035
- Han, H., Wang, W. Y., and Mao, B. H. (2005). Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In International Conference on Intelligent Computing (pp. 878-887). Springer, Berlin, Heidelberg.
- Hart, P. (1968). The condensed nearest neighbor rule (Corresp.), IEEE Transactions on Information Theory, 14, 515-516. https://doi.org/10.1109/TIT.1968.1054155
- He, H., Bai, Y., Garcia, E. A., and Li, S. (2008). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) (pp. 1322-1328). IEEE.
- He, H. and Garcia, E. A. (2009). Learning from imbalanced data, IEEE Transactions on Knowledge and Data Engineering, 21, 1263-1284. https://doi.org/10.1109/TKDE.2008.239
- Laurikkala, J. (2001). Improving identification of difficult small classes by balancing class distribution. In Conference on Artificial Intelligence in Medicine in Europe (pp. 63-66).
- Lemaitre, G., Nogueira, F., and Aridas, C. K. (2017). Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning, Journal of Machine Learning Research, 18, 1-5.
- Mani, I. and Zhang, I. (2003). kNN approach to unbalanced data distributions: a case study involving information extraction. In Proceedings of Workshop on Learning from Imbalanced Datasets II, ICML (Vol. 126), Washington.
- Moon, S. Y. (2018). Performance comparison of classification methods based on the random forest in class imbalanced data (Master's thesis), Korea University, Seoul.
- Prati, R. C., Batista, G. E., and Monard, M. C. (2009). Data mining with imbalanced class distributions: concepts and methods. In Proceedings of the 4th Indian International Conference on Artificial Intelligence (pp. 359-376), Tumkur, Karnataka.
- Smith, M. R., Martinez, T., and Giraud-Carrier, C. (2014). An instance level analysis of data complexity, Machine Learning, 95, 225-256. https://doi.org/10.1007/s10994-013-5422-z
- Tomek, I. (1976a). An experiment with the edited nearest-neighbor rule, IEEE Transactions on Systems, Man, and Cybernetics, 6, 448-452. https://doi.org/10.1109/TSMC.1976.4309523
- Tomek, I. (1976b). Two modifications of CNN, IEEE Transactions on Systems, Man, and Cybernetics, 6, 769-772. https://doi.org/10.1109/TSMC.1976.4309452
- Wilson, D. L. (1972). Asymptotic properties of nearest neighbor rules using edited data, IEEE Transactions on Systems, Man, and Cybernetics, 2, 408-421. https://doi.org/10.1109/TSMC.1972.4309137