Development of Evaluation Metrics that Consider Data Imbalance between Classes in Facies Classification

Development of Evaluation Metrics that Account for Data Imbalance between Classes in Supervised-Learning-Based Facies Classification

  • Kim, Dowan (김도완; Dept. of Earth Resources and Environmental Engineering, Hanyang Univ.)
  • Choi, Junhwan (최준환; Dept. of Earth Resources and Environmental Engineering, Hanyang Univ.)
  • Byun, Joongmoo (변중무; Dept. of Earth Resources and Environmental Engineering, Hanyang Univ.)
  • Received : 2020.06.10
  • Accepted : 2020.08.26
  • Published : 2020.08.31

Abstract

When training a classification model with machine learning, acquiring the training data is a critical stage because the amount and quality of those data strongly influence model performance. However, when data are so costly to obtain that an ideal training set cannot be built, the number of samples can differ greatly from class to class, and a serious data-imbalance problem can occur. When the training data are imbalanced, the classes are not trained equally, and classes with relatively few samples have significantly lower recall. In addition, evaluation indices such as accuracy and precision become less reliable. This study therefore addressed data imbalance in two stages. First, we introduced weighted accuracy and weighted precision, new evaluation indices that modify conventional accuracy and precision to take the data-imbalance ratio into account. Next, oversampling was performed to balance weighted precision and recall among classes. We verified the algorithm by applying it to a facies classification problem. As a result, the imbalance between the majority and minority classes was greatly mitigated, and the boundaries between classes could be identified more clearly.
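The abstract does not give the formulas for weighted accuracy and weighted precision, so the following is only a minimal sketch of one plausible formulation, not the authors' definition. It assumes that each true class is rescaled to the same total weight (a row-normalized confusion matrix) before accuracy and precision are computed. The class names `shale` and `sand`, the function names, and the sample counts are hypothetical.

```python
def confusion_counts(y_true, y_pred, classes):
    """Return counts[t][p]: number of samples of true class t predicted as p."""
    counts = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts


def plain_metrics(counts, classes):
    """Conventional accuracy and per-class precision from raw counts."""
    total = sum(counts[t][p] for t in classes for p in classes)
    accuracy = sum(counts[c][c] for c in classes) / total
    precision = {}
    for c in classes:
        predicted = sum(counts[t][c] for t in classes)
        precision[c] = counts[c][c] / predicted if predicted else 0.0
    return accuracy, precision


def weighted_metrics(counts, classes):
    """ASSUMED formulation: divide each true-class row by its sample count,
    so every class carries equal weight regardless of how many samples it has."""
    rates = {}
    for t in classes:
        support = sum(counts[t].values())
        rates[t] = {p: (counts[t][p] / support if support else 0.0) for p in classes}
    present = [t for t in classes if sum(counts[t].values()) > 0]
    # With equal class weights, accuracy becomes the mean per-class recall.
    weighted_accuracy = sum(rates[c][c] for c in present) / len(present)
    weighted_precision = {}
    for c in classes:
        predicted = sum(rates[t][c] for t in classes)
        weighted_precision[c] = rates[c][c] / predicted if predicted else 0.0
    return weighted_accuracy, weighted_precision


# Hypothetical imbalanced test set: 80 shale samples, 20 sand samples.
classes = ["shale", "sand"]
y_true = ["shale"] * 80 + ["sand"] * 20
y_pred = ["shale"] * 76 + ["sand"] * 4 + ["sand"] * 8 + ["shale"] * 12

counts = confusion_counts(y_true, y_pred, classes)
acc, prec = plain_metrics(counts, classes)
w_acc, w_prec = weighted_metrics(counts, classes)
print("accuracy:", acc, "weighted accuracy:", w_acc)
print("precision:", prec)
print("weighted precision:", w_prec)
```

In this toy case the conventional accuracy is dominated by the majority class, while the equal-weight version drops because the minority class is recalled poorly. Other weighting schemes are possible, and the study may use a different one.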

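When training a classification model with machine learning, the quantity and quality of the training data determine the performance of the trained model, so generating the training data plays a very important role. However, when data generation is costly and ideal training data are difficult to produce, a data-imbalance problem arises between classes. If the exploration data to be used as training data are acquired unevenly across classes, balanced learning across classes is difficult to achieve. Consequently, classes with relatively few data show markedly lower recall. In addition, evaluation metrics such as accuracy and precision become less reliable. This study therefore sought to resolve the data-imbalance problem in two stages. First, we devised weighted accuracy and weighted precision, new evaluation metrics that improve on conventional accuracy and precision so that data imbalance can be taken into account. Next, oversampling was performed to balance weighted precision and recall among classes. The developed algorithm was verified by applying it to the problem of identifying facies and pore fluid from well-log data. As a result, the imbalance between the majority class and the minority classes was substantially mitigated, and the boundaries between classes could be identified more clearly.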
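The abstract does not state which oversampling procedure was used. As a rough illustration of the general idea only, the sketch below applies plain random oversampling with replacement until every class has as many samples as the largest class. This is an assumption and a simplification: the study tunes oversampling to balance weighted precision and recall, whereas this sketch only equalizes class counts. The function name, feature values, and class names are hypothetical.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes
    match the size of the largest class (simple random oversampling)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        # Draw the missing samples with replacement from the class itself.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy data: two made-up well-log features per sample.
features = [(2.40, 0.10), (2.50, 0.12), (2.45, 0.11), (2.55, 0.09), (2.20, 0.25)]
labels = ["shale", "shale", "shale", "shale", "sand"]

bal_x, bal_y = random_oversample(features, labels)
print(Counter(bal_y))
```

Oversampling should be applied to the training split only, so that duplicated samples do not leak into the data used for evaluation.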
