Multi-class Classification of Histopathology Images using Fine-Tuning Techniques of Transfer Learning

  • Received : 2021.05.26
  • Accepted : 2021.07.12
  • Published : 2021.07.30

Abstract

Prostate cancer (PCa) is a fatal disease that occurs in men. In general, PCa cells are found in the prostate gland. Early diagnosis is the key to preventing the spread of the cancer to other parts of the body. In this setting, deep learning-based systems can detect and distinguish histological patterns in microscopy images. The histological grades used for the analysis were benign, grade 3, grade 4, and grade 5. In this study, we use transfer learning and fine-tuning methods with different model architectures to develop and compare models. We implemented MobileNet, ResNet50, and DenseNet121 models and applied three different layer-freezing strategies for fine-tuning, obtaining various pre-trained weights to improve accuracy. Finally, transfer learning using MobileNet with half of its layers frozen showed the best results among the nine models, reaching 90% accuracy on the test data set.

1. INTRODUCTION

Cancer begins when cells in the human body start to grow uncontrollably. Cells in almost any part of the body can become cancerous and can then spread to other areas of the body. PCa is one of the most common cancers in American men, second only to skin cancer. According to American Cancer Society statistics, approximately 1 in 8 men will be diagnosed with PCa in their lifetime [1].

Histological examination of tissues and the detection of cancer by physicians remain the gold standard in cancer diagnosis. The diagnosis of PCa is highly time-consuming and, in addition, based on subjective grading. For example, the study by Ozkan et al. reported that two pathologists disagreed about the presence of cancer in 31 of 407 baseline biopsies and that the total concordance of the assessed Gleason score was only 51.7%, illustrating the challenge of diagnosing PCa consistently [2]. Therefore, the development of computer-assisted decision support tools is essential for saving time, predicting disease outcomes, and improving precision medicine for pathologists.

There has been considerable interest in the development of methods based on digital image processing and machine learning. These methods are used to automatically analyze pathological images to classify tissues and diseases, as well as to improve accuracy and diagnostic standards [3-5]. On top of that, recent advances in deep learning research have succeeded in increasing the performance of such analytics [6-9]. However, the proposed deep learning models often require a significant amount of annotated data to be trained properly. Because cohort sizes can be small and histopathological image annotation takes a long time, transfer learning (training a neural network on an external dataset, primarily ImageNet [10], and then fine-tuning it on the dataset at hand) can prove useful. Such a pre-trained model with fine-tuning is more effective than training the same neural network architecture from scratch in studies involving digital pathological image analysis [11-15]. Transfer learning can also help in adapting to images acquired with different microscopes or staining procedures.

B. Kieffer et al. [16] explored the classification of a medical image dataset based on feature vectors extracted from the deepest layer of a pre-trained Convolutional Neural Network (CNN). They used feature vectors from several pre-trained structures, with and without transfer learning, to compare pre-trained deep features against CNNs trained on that specific dataset. Their results show that pre-trained networks are quite competitive with training from scratch. Moreover, fine-tuning did not add any tangible improvement for VGG16 that would justify the additional training, while they observed considerable improvement in retrieval and classification accuracy, reaching 56.98%, when the Inception structure was fine-tuned.

Nguyen et al. [17] introduced a novel approach to grading prostate malignancy using digitized histopathological specimens of prostate tissue. They extracted tissue structural features from the gland morphology and co-occurrence texture features from 82 regions of interest (ROIs) of 620 × 550 pixels to classify a tissue pattern into three major categories: benign, grade 3 carcinoma, and grade 4 carcinoma. The authors proposed a hierarchical (binary) classification scheme and obtained 85.6% accuracy in classifying an input tissue pattern into one of the three classes.

D. Albashish et al. [18] implemented a new multi-class approach called multi-level (hierarchical) learning architecture (MLA), which decomposes the problem into binary classification tasks arranged hierarchically. It focuses on the three-class classification problem in prostate cancer grading, i.e., grade 3, grade 4, and benign. Their results confirmed the high efficiency of an ensemble framework with the MLA scheme for the multiclass classification problem, achieving 85.9% accuracy.

N. Bayramoglu et al. [19] evaluated the performance of convolutional neural network models for classifying cell nuclei in hematoxylin and eosin (H&E) stained histopathology images of colorectal adenocarcinoma. They compared four CNN architectures (AlexNet, GenderNet, GoogLeNet, and VGG-16) trained on natural and facial images using transfer learning and fine-tuning, achieving a maximum accuracy of 88.03% with the fine-tuned VGG-16 model.

Subrata Bhattacharjee et al. [8] developed a machine learning technique to predict the histological grades in prostate biopsies. To perform multiclass classification, an AI-based deep learning algorithm, a multichannel convolutional neural network (MCCNN), was developed by connecting layers of artificial neurons inspired by the human brain. The histological grades used for the analysis were benign, grade 3, grade 4, and grade 5. The authors aimed to classify multiple patterns of images extracted from the whole slide image (WSI) of a prostate biopsy based on the Gleason grading system. The MCCNN model takes three input channels (red, green, and blue), extracts computational features from each channel, and concatenates them for multiclass classification. Stain normalization was carried out for each histological grade to standardize the intensity and contrast level in the images, and an average accuracy of 95.1% was obtained.

In this study, the MobileNet [20,21], ResNet50 [22,23], and DenseNet121 [24,25] deep convolutional networks were used in a transfer learning framework, pre-trained on the ImageNet dataset. In addition, three different fine-tuning strategies were applied, freezing different subsets of their layers, to compare them and increase accuracy. The PANDA dataset from Kaggle [26] was used for the experiment to classify images into four classes. In this paper, the classification results of the second MobileNet model are visualized using non-normalized and normalized confusion matrices.

2. MATERIALS AND METHODS

2.1 Data Set

In this experiment, we selected 900 high-quality images per class from the publicly available PANDA dataset on the Kaggle repository. The Prostate cANcer graDe Assessment (PANDA) challenge [26] was launched to develop models that detect PCa in histopathology images of prostate tissue samples and estimate the severity of the disease, using the most extensive multi-center dataset on Gleason grading. Our dataset consists of 3600 patched color images of size 256 × 256 pixels extracted from WSIs. There are four classes to predict; sample images for each class are shown in Fig. 1. To train the model, 80 percent of the data samples were chosen for training (2800 images) and the remaining 20 percent (700 images) for validation. Further, a total of 100 unseen data samples (i.e., 25 images per class) were reserved to test the model. Table 1 shows the details of the dataset employed in this work.
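As a concrete illustration, this split can be produced with tf.keras; the following is a minimal sketch assuming the 3600 patches have been sorted into one sub-directory per class (the directory layout, folder names, and batch size are our assumptions, not details given in the paper):

import tensorflow as tf

IMG_SIZE = (256, 256)  # patch size used in this study
BATCH = 32             # assumed; the paper does not report a batch size

# Assumed layout: data/train/<class_name>/ with four class folders,
# e.g., benign/, grade_3/, grade_4/, grade_5/ (hypothetical names).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",
    validation_split=0.2,      # 80/20 train/validation split as described
    subset="training",
    seed=42,                   # fixed seed so both subsets partition consistently
    image_size=IMG_SIZE,
    batch_size=BATCH,
    label_mode="categorical",  # one-hot labels for categorical cross-entropy
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=BATCH,
    label_mode="categorical",
)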

Fig. 1. Sample images of each class of prostate cancer. (a) Benign. (b) Grade 3. (c) Grade 4. (d) Grade 5.

Table 1. The arrangement of the dataset for multiclass classification.

2.2 Transfer Learning Methods

Transfer learning is a deep learning technique that stores the knowledge gained while solving one problem and applies it to a new but related problem. Instead of starting the learning process from scratch, we can start from patterns that were learned when solving a different problem. This way, we can build accurate models in less time. Many pre-trained models used in transfer learning are based on large CNNs, which consist of two main parts: a convolutional base and a classifier.

Nine models based on MobileNet, ResNet50, and DenseNet121 were used in this research study. These networks were originally trained on the ImageNet database, which covers a thousand object categories. To adapt them to our needs, we started by removing the original classifier, then added new classifier layers that fit our purpose, and finally fine-tuned each model according to one of the three strategies described below. Fig. 2 illustrates the transfer learning and fine-tuning technique used in our experiment. Table 2 lists the nine models with their feature extraction layers, classification layers, and depth. The first three models use the 84 layers of MobileNet with their pre-trained weights, with two additional layers added to the classification block; the next three models use the 173 pre-trained layers of ResNet50; and the remaining models are built on the 425 layers of DenseNet121. In all cases, GlobalAveragePooling2D and Dense layers were used for the multi-class classification.
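The construction just described can be sketched in tf.keras for the MobileNet case as follows (variable names are illustrative; the paper specifies only the GlobalAveragePooling2D and Dense layers of the new classification block):

import tensorflow as tf

NUM_CLASSES = 4  # benign, grade 3, grade 4, grade 5

# Backbone pre-trained on ImageNet, with its original classifier removed.
base = tf.keras.applications.MobileNet(
    weights="imagenet", include_top=False, input_shape=(256, 256, 3)
)

# New classification head for the four histological grades.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

The same construction applies to ResNet50 and DenseNet121 by swapping the backbone constructor.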

Fig. 2. Transfer learning and fine-tuning technique.

Table 2. List of the CNN architectures.

2.3 Fine Tuning

Fine-tuning is a way of applying transfer learning. There are several fine-tuning techniques; truncating the last layer, using a smaller learning rate, and freezing the weights of selected layers are among them. In this paper, the freezing technique is used to compare the models. Fig. 3 shows the three freezing strategies used with the three architectures, while Table 3 lists the frozen layers and trainable parameters.

Fig. 3. Three strategies of fine-tuning.

Table 3. List of Frozen layers and trainable parameters.

Freezing the convolutional base is the first strategy, used to train model_1, model_4, and model_7. This case represents the extreme end of the train/freeze trade-off. The main idea is to remove the last fully connected layer, run the pre-trained model as a fixed feature extractor, and then use the resulting features to train a new classifier. We used the pre-trained models (MobileNet, ResNet50, and DenseNet121) as fixed feature extractors, which is useful when computational power is limited or the dataset is small, and the pre-trained model solves a problem very similar to the one at hand.
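In tf.keras terms, this first strategy reduces to a single flag on the backbone; a minimal sketch, reusing the base model defined above:

# Strategy 1: use the backbone purely as a fixed feature extractor.
base.trainable = False  # freeze every layer of the convolutional base
# Only the new GlobalAveragePooling2D/Dense head receives weight updates.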

The second strategy, used for model_2, model_5, and model_8, is to train some layers and keep the others frozen. In general, lower layers capture general, problem-independent features, while higher layers encode specific, problem-dependent features. Here, we exploited that dichotomy by choosing how much of the network's weights to modify (a frozen layer does not change during training). Usually, with a small dataset and a large number of parameters, more layers are left frozen to avoid overfitting. In contrast, if the dataset is large and the number of parameters is small, more layers can be trained on the new task, since overfitting is not an issue. Because our dataset is small and differs from the pre-trained models' ImageNet dataset, we froze more layers to reach a moderate number of trainable parameters. We froze 80, 160, and 250 layers in model_2, model_5, and model_8, respectively, to balance the number of trained and frozen layers.
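A sketch of this partial freezing as it would apply to model_2 (80 frozen MobileNet layers; 160 and 250 would be the corresponding counts for model_5 and model_8):

# Strategy 2: freeze the first 80 (general-feature) layers and leave
# the remaining (problem-specific) layers trainable.
base.trainable = True
for layer in base.layers[:80]:
    layer.trainable = False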

The last strategy, used to train model_3, model_6, and model_9, is to train the entire model. In this case, we used the architecture of the pre-trained model and trained it on our dataset. These models learned from scratch, so they needed far more computational power and time than the previous methods.
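For this last strategy only the architecture is reused; a sketch, where weights=None gives the random initialization described above:

# Strategy 3: same architecture, but trained entirely from scratch.
base = tf.keras.applications.MobileNet(
    weights=None, include_top=False, input_shape=(256, 256, 3)
)
base.trainable = True  # every layer is updated during training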

3. RESULTS AND DISCUSSION

In this section, learning graphs and confusion matrices are used to demonstrate the best result, achieved by model_2. For classification, we trained the MobileNet, ResNet50, and DenseNet121 architectures, each followed by GlobalAveragePooling2D and Dense layers. All hidden layers used ReLU activation functions, and softmax activation was employed in the output layer. The optimizer was Adam [27] with a learning rate of 10^-4, and the loss function was categorical cross-entropy. Nine models were trained, three per architecture: training the whole network from scratch (random weight initialization), fine-tuning with roughly half of the backbone layers frozen, and freezing all backbone layers of the ImageNet pre-trained model, each trained for 20 epochs.
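This configuration can be written directly in tf.keras; a sketch under the stated settings, where model, train_ds, and val_ds are the objects sketched in Section 2 (the names are our assumptions):

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # lr = 10^-4
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
history = model.fit(train_ds, validation_data=val_ds, epochs=20)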

Commonly, strategy 1 above is used when the dataset is small and similar to the pre-trained model's dataset, while strategy 3 is usually applied when the dataset is large but different from the pre-trained model's dataset. Since our dataset is small and differs from the pre-trained models' 1000-class ImageNet dataset, strategy 2 is the most suitable for our proposed model. Figs. 4 and 5 confirm that the re-trained model_2 shows a significant increase in accuracy and a decrease in loss within only 20 epochs, performing well from the early epochs onward. Comparing all re-trained models, model_2, which uses the MobileNet architecture and pre-trained weights with 80 frozen layers, yields the highest accuracies of 93.03% and 91.32% and the lowest losses of 2.52% and 2.78% on the training and validation sets, respectively. As a result, we prevented overfitting by balancing the number of layers to train and freeze.
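Curves like those in Figs. 4 and 5 can be drawn from the history object returned by model.fit(); a minimal matplotlib sketch:

import matplotlib.pyplot as plt

plt.plot(history.history["accuracy"], label="train accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()  # the loss curves use history.history["loss"] / ["val_loss"]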

Fig. 4. Training Accuracy of model_2.

Fig. 5. Training Loss of model_2.

Furthermore, Table 4 reports the predictions on the 100 unseen test images and their accuracy. On the test set, model_2 predicted 90 images correctly, making only 10 mistakes out of 100 and scoring an overall accuracy of 90%. Figs. 6 and 7 show the non-normalized and normalized confusion matrices.
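Both matrices can be computed with scikit-learn; a sketch assuming test_ds is a tf.data pipeline built like the ones above but with shuffle=False, so that prediction order matches label order:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Collect true and predicted class indices over the 100 test images.
y_true = np.concatenate([np.argmax(y, axis=1) for _, y in test_ds])
y_pred = np.argmax(model.predict(test_ds), axis=1)

labels = ["Benign", "Grade 3", "Grade 4", "Grade 5"]
# Non-normalized counts (as in Fig. 6).
ConfusionMatrixDisplay.from_predictions(y_true, y_pred, display_labels=labels)
# Row-normalized rates (as in Fig. 7).
ConfusionMatrixDisplay.from_predictions(
    y_true, y_pred, display_labels=labels, normalize="true"
)
plt.show()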

Table 4. Accuracy of test dataset.

Fig. 6. Non-normalized Confusion Matrix of model_2.

Fig. 7. Normalized Confusion Matrix of model_2.

In Table 5, we compare the accuracy of different multi-class classification methods with our proposed model. B. Kieffer et al. [16] obtained 56.98% accuracy with the Inception classification method, using transfer learning and predicting four different classes. Nguyen et al. [17] extracted tissue structural features from the gland morphology along with co-occurrence texture features and classified three classes, obtaining 85.6% accuracy. D. Albashish et al. [18] used a multi-level learning architecture (MLA) and focused on prostate cancer classification by categorizing the cases into three classes; the authors achieved 85.9% accuracy, showing the high efficiency of the ensemble framework with MLA. N. Bayramoglu et al. [19] reached 88.03% accuracy with transfer learning and fine-tuning on the VGG-16 architecture, adjusting the learning rate at each epoch. S. Bhattacharjee et al. [8], on the other hand, developed the MCCNN to predict the histological grades in prostate biopsies and achieved an excellent accuracy of 95.1% while analyzing four different classes. With our proposed architecture, we achieved a highly satisfying accuracy of 90% by implementing the MobileNet (model_2) classification method, which trains some layers while keeping the others frozen.

Table 5. Comparison between the proposed method and other standard methods for the multi-class classification of prostate cancer grading.

4. CONCLUSION

In this paper, it was found that deep learning models pre-trained on the ImageNet dataset can be fine-tuned to improve the accuracy of PCa dataset classification. Three architectures and their pre-trained weights were used, and three different layer-freezing techniques were applied, yielding nine models overall. After re-training the models on the histology image dataset, they were evaluated by predicting 100 unseen test images. As a result, model_2, based on MobileNet, showed the best accuracy of 90%. This model could be used in real-time environments when sufficient data are not available.

In conclusion, the fine-tuning techniques used in this study are also effective for addressing the over-fitting issue in the model.

References

  1. Key Statistics for Prostate Cancer (2021). https://www.cancer.org/cancer/prostate-cancer/about/key-statistics.html (accessed April 1, 2021).
  2. T.A. Ozkan, A.T. Eruyar, O.O. Cebeci, O. Memik, L. Ozcan, and I. Kuskonmaz, "Interobserver Variability in Gleason Histological Grading of Prostate Cancer," Scandinavian Journal of Urology, Vol. 50, No. 6, pp. 420-424, 2016. https://doi.org/10.1080/21681805.2016.1206619
  3. A. Madabhushi and G. Lee, "Image analysis and machine learning in digital pathology: Challenges and opportunities," Medical Image Analysis, Vol. 33, pp. 170-175, 2016. https://doi.org/10.1016/j.media.2016.06.037
  4. A. Madabhushi, "Digital Pathology Image Analysis: Opportunities and Challenges," Imaging in Medicine, Vol. 1, No. 1, pp. 7-10, 2009. https://doi.org/10.2217/iim.09.9
  5. K. Daisuke and I. Shumpei, "Machine Learning Methods for Histopathological Image Analysis," Computational and Structural Biotechnology Journal, Vol. 16, pp. 34-42, 2018. https://doi.org/10.1016/j.csbj.2018.01.001
  6. Z. Hu, J. Tang, Z. Wang, K. Zhang, L. Zhang, and Q. Sun, "Deep Learning for Image-Based Cancer Detection and Diagnosis-A Survey," Pattern Recognition, Vol. 83, pp. 134-149, 2018. https://doi.org/10.1016/j.patcog.2018.05.014
  7. O. Iizuka, F. Kanavati, K. Kato, R. Michael, A. Koji, and T. Masayuki, "Deep Learning Models for Histopathological Classification of Gastric and Colonic Epithelial Tumours," Scientific Reports, Vol. 10, No. 1504, pp. 1-11, 2020. https://doi.org/10.1038/s41598-019-56847-4
  8. S. Bhattacharjee, D. Prakash, C.-H. Kim, and H.-K. Choi, "Multichannel Convolution Neural Network Classification for the Detection of Histological Pattern in Prostate Biopsy Images," Journal of Korea Multimedia Society, Vol. 23, No. 12, pp. 1486-1495, 2020. https://doi.org/10.9717/KMMS.2020.23.12.1486
  9. S. Bhattacharjee, R.I. Sumon, K. Ikromjanov, Y.-B. Hwang, H.-C. Kim, and H.-K Choi, "Unsupervised Classification of Prostate Cancer Using Deep Learning Based Image Clustering Technique," Proceeding of International Conference on Multimedia Information Technology and Applications, pp. 113-116, 2021.
  10. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "Imagenet: A Large-Scale Hierarchical Image Database," Proceeding of International Conference on Computer Vision and Pattern Recognition, pp. 248-255, 2009.
  11. Shallu and R. Mehra, "Breast Cancer Histology Images Classification: Training from Scratch or Transfer Learning?," ICT Express, Vol. 4, No. 4, pp. 247-254, 2018. https://doi.org/10.1016/j.icte.2018.10.007
  12. R. Mormont, P. Geurts, and R. Maree, "Comparison of Deep Transfer Learning Strategies for Digital Pathology," Proceedings of International Conference on Computer Vision and Pattern Recognition Workshops, pp. 2262-2271, 2018.
  13. B. Kieffer, M. Babaie, S. Kalra, and H.R. Tizhoosh, "Convolutional Neural Networks for Histopathology Image Classification: Training vs. Using Pre-trained Networks," Proceeding of International Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 1-6, 2017.
  14. N. Tajbakhsh, J.Y. Shin, S.R. Gurudu, R.T. Hurst, C.B. Kendall, M.B. Gotway, et al., "Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?," IEEE Transactions on Medical Imaging, Vol. 35, No. 5, pp. 1299-1312, 2016. https://doi.org/10.1109/TMI.2016.2535302
  15. H.-G. Park, S. Bhattacharjee, P. Deekshitha, C.-H. Kim, and H.-K. Choi, "A Study on Deep Learning Binary Classification of Prostate Pathological Images Using Multiple Image Enhancement Techniques," Journal of Korea Multimedia Society, Vol. 23, No. 4, pp. 539-548, 2020.
  16. B. Kieffer, M. Babaie, S. Kalra and H.R. Tizhoosh, "Convolutional Neural Networks for Histopathology Image Classification: Training vs. Using pre-trained Networks," Proceeding of Seventh International Conference on Image Processing Theory, Tools and Application, pp. 1-6, 2017.
  17. K. Nguyen, B. Sabata, and A.K. Jain, "Prostate Cancer Grading: Gland Segmentation and Structural Features," Pattern Recognition Letters, Vol. 33, No. 7, pp. 951-961, 2012. https://doi.org/10.1016/j.patrec.2011.10.001
  18. D. Albashish, S. Sahran, A. Abdullah, M. Alweshah, and A. Adam, "A Hierarchical Classifier for Multiclass Prostate Histopathology Image Gleason Grading," Journal of ICT, Vol. 17, No. 2, pp. 323-346, 2018.
  19. N. Bayramoglu and J. Heikkila, "Transfer Learning for Cell Nuclei Classification in Histopathology Images," Proceeding of European Conference on Computer Vision (ECCV), pp. 532-539, 2016.
  20. H. Pan, Z. Pang, Y. Wang, Y. Wang, and L. Chen, "A New Image Recognition and Classification Method Combining Transfer Learning Algorithm and MobileNet Model for Welding Defects," IEEE Access, Vol. 8, pp. 119951-119960, 2020. https://doi.org/10.1109/ACCESS.2020.3005450
  21. L.G. Falconi, M. Perez, and W.G. Aguilar, "Transfer Learning in Breast Mammogram Abnormalities Classification With Mobilenet and Nasnet," Proceeding of International Conference on Systems, Signals and Image Processing, pp. 109-114, 2019.
  22. K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," Proceeding of International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, 2016.
  23. N.S. Ismail and C. Sovuthy, "Breast Cancer Detection Based on Deep Learning Technique," Proceeding of UNIMAS STEM International Conference, pp. 89-92, 2019.
  24. G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely Connected Convolutional Networks," Proceeding of International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261-2269, 2017.
  25. A. Mahbod, G. Schaefer, C. Wang, R. Ecker, G. Dorffner, and I. Ellinger, "Investigating and Exploiting Image Resolution for Transfer Learning-based Skin Lesion Classification," Proceeding of International Conference on Pattern Recognition (ICPR), pp. 4047-4053, 2020.
  26. Prostate Cancer Grade Assessment (PANDA) Challenge (2020). https://www.kaggle.com/c/prostate-cancer-grade-assessment (accessed March 20, 2021).
  27. D.P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," Proceeding of International Conference for Learning Representations, pp. 1-15, 2014.