Development of Evaluation Metrics that Consider Data Imbalance between Classes in Facies Classification

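Korean title (translated): Development of Evaluation Metrics Considering Data Imbalance between Classes in Supervised-Learning-Based Facies Classification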

  • Kim, Dowan (Dept. of Earth Resources and Environmental Engineering, Hanyang Univ.) ;
  • Choi, Junhwan (Dept. of Earth Resources and Environmental Engineering, Hanyang Univ.) ;
  • Byun, Joongmoo (Dept. of Earth Resources and Environmental Engineering, Hanyang Univ.)
  • Received : 2020.06.10
  • Accepted : 2020.08.26
  • Published : 2020.08.31


In training a classification model with machine learning, acquiring the training data is a critical stage, because the amount and quality of the training data strongly influence model performance. However, when data are so costly to obtain that ideal training data cannot be built, the number of samples acquired for each class may differ greatly, causing a serious data-imbalance problem. When the training data are imbalanced, the classes are not learned equally, and classes with relatively few samples show markedly lower recall. In addition, evaluation metrics such as accuracy and precision become less reliable, because both depend on the relative class sizes. This study therefore addresses data imbalance in two stages. First, we introduce weighted accuracy and weighted precision as new evaluation metrics, obtained by modifying conventional accuracy and precision so that they account for the data-imbalance ratio between classes. Second, we apply oversampling so that weighted precision and recall are balanced across classes. We verified the approach on a facies classification problem. As a result, the imbalance between majority and minority classes was greatly reduced, and the boundaries between classes could be identified more clearly.
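The abstract does not state the formulas behind weighted accuracy and weighted precision. The Python sketch below assumes one plausible reading: each row of the confusion matrix (one true class) is divided by that class's sample count before accuracy and precision are computed, so every class contributes equally regardless of its size. Under this assumption, weighted accuracy equals the mean per-class recall. For the second stage it uses plain random oversampling, a simpler stand-in for SMOTE-type methods and not necessarily the oversampling the authors used. All function names, class labels and counts are hypothetical.

```python
import random
from collections import Counter


def confusion_matrix(y_true, y_pred, classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {c: k for k, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def conventional_metrics(cm):
    """Plain accuracy, per-class precision and per-class recall."""
    n = len(cm)
    accuracy = sum(cm[k][k] for k in range(n)) / sum(map(sum, cm))
    recall = [cm[k][k] / sum(cm[k]) for k in range(n)]
    precision = []
    for k in range(n):
        predicted_k = sum(cm[i][k] for i in range(n))
        precision.append(cm[k][k] / predicted_k if predicted_k else 0.0)
    return accuracy, precision, recall


def weighted_metrics(cm):
    """ASSUMPTION (not the paper's published formula): divide each row by its
    class size so every true class carries equal weight, then recompute
    accuracy and precision. Every class must have at least one sample."""
    n = len(cm)
    norm = [[cm[i][j] / sum(cm[i]) for j in range(n)] for i in range(n)]
    weighted_accuracy = sum(norm[k][k] for k in range(n)) / n  # = mean recall
    weighted_precision = []
    for k in range(n):
        predicted_k = sum(norm[i][k] for i in range(n))
        weighted_precision.append(norm[k][k] / predicted_k if predicted_k else 0.0)
    return weighted_accuracy, weighted_precision


def random_oversample(X, y, seed=0):
    """Plain random oversampling (not SMOTE): duplicate randomly chosen samples
    of each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - count):
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out


# Hypothetical two-facies example: 90 shale samples, 10 sand samples.
classes = ["shale", "sand"]
y_true = ["shale"] * 90 + ["sand"] * 10
y_pred = ["shale"] * 85 + ["sand"] * 5 + ["shale"] * 6 + ["sand"] * 4

cm = confusion_matrix(y_true, y_pred, classes)
acc, prec, rec = conventional_metrics(cm)
w_acc, w_prec = weighted_metrics(cm)
print(acc, round(w_acc, 3))  # 0.89 0.672

# Stand-in feature vectors (integers) just to show the class counts balancing.
X_bal, y_bal = random_oversample(list(range(100)), y_true)
print(Counter(y_bal))
```

In this hypothetical case plain accuracy is 0.89, while the class-weighted accuracy is about 0.67, because the sand recall of 0.4 now counts as much as the shale recall of about 0.94. Under the same assumption, the weighted precision of the majority class (shale) drops from about 0.93 to about 0.61, which illustrates how strongly conventional precision depends on the class ratio.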

