1. Introduction
Breast cancer is one of the most common cancers in women worldwide and is considered the second leading cause of female cancer deaths[2]. In addition to mammography, which is the primary imaging modality for screening, ultrasound (US) imaging is performed in the breast imaging protocol as a diagnostic tool[3]. Double reading of the same mammograms by two radiologists independently has been reported to reduce the occurrence of missed cancers, and it is included in most screening programs[4]. Because double reading requires additional workload and costs, computer-aided diagnosis (CADx) is considered a way to provide radiologists with a second opinion for medical image interpretation and diagnosis[5-14]. In many cases, CADx is applied to differentiate between malignant and benign tumors or lesions[7, 8, 15-20]. Because these systems can provide a second opinion to radiologists in a cost-effective way, they can help detect breast cancer at an early stage and reduce the breast cancer death rate among women[21].
A wide variety of machine learning methods have been researched for early detection of breast cancer[22-24]. Recently, deep learning methods have been widely adopted for perception-related problems[25]. Deep learning methods have been introduced to the medical imaging field with promising results in various applications, such as organ segmentation[26] and detection[27-30], tissue classification in histology and histopathology images[31-32], ultrasound standard plane selection[33], knee cartilage segmentation[34], computer-aided prognosis or diagnosis of Alzheimer's disease[35-38], and so forth.
In terms of breast cancer, some previous works have applied deep learning methods to classify identified lesions in breast images[39-40]. In this study, we evaluate the performance of several well-established CNNs in differentiating the distinctive types of lesions and nodules acquired with ultrasound imaging, using a relatively large database. Their performance and accuracy in classifying and discriminating breast lesions were evaluated. The proposed framework is a component algorithm of the S-Detect technology implemented in the RS80A system (Samsung Medison, Inc.). This research is a practical implementation of the authors' previous method[42] and a substantial extension of [43]. Unlike the authors' previous research[42], we employ a segmentation method instead of translation augmentation of the training data.
2. Method and Materials
The proposed procedure is illustrated in Fig. 1.
Figure 1. The conceptual architecture of the proposed deep learning CAD framework
For this research, 7408 ultrasound breast images from 5151 patient cases were gathered. All cases were biopsy proven, and semi-automatically segmented lesion boundaries were associated with the masses. All images were histogram-equalized, and each image was cropped to match the input image size. Of the 7408 ultrasound breast images, 6579 were used for training and 829 were used as the test set. The training dataset consists of 3765 benign and 2814 malignant mass images. The training images were then augmented by varying the margin between the boundary of each lesion and the boundary of the image itself. Optimal parameters were selected based on 10-fold cross validation with the training data.
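A minimal sketch of this preprocessing step (histogram equalization, margin-based cropping, and resizing) is given below. It is written in Python with OpenCV purely for illustration; the function name, the bounding-box input derived from the segmented boundary, and the use of OpenCV are assumptions rather than the actual implementation.

import cv2

def preprocess_roi(image, lesion_box, margin=50, out_size=255):
    # Histogram-equalize, crop the lesion with a fixed pixel margin, and resize.
    equalized = cv2.equalizeHist(image)            # expects an 8-bit grayscale image
    x0, y0, x1, y1 = lesion_box                    # lesion bounding box from segmentation
    h, w = equalized.shape
    # Expand the box by the margin, clamped to the image borders.
    x0, y0 = max(x0 - margin, 0), max(y0 - margin, 0)
    x1, y1 = min(x1 + margin, w), min(y1 + margin, h)
    roi = equalized[y0:y1, x0:x1]
    return cv2.resize(roi, (out_size, out_size))   # the network input size is 255x255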
2.1 Data Preparation
7408 breast ultrasound images were scanned from 5151 patients at Samsung Medical Center (Seoul, South Korea). 5254 images were acquired with an IU22 system (Philips, Inc.) and 2154 images were acquired with an RS80A system (Samsung Medison, Inc.). The histopathological characteristics of all breast lesions were biopsy proven. Some examples of malignant and benign lesions are shown in Fig. 2. All experimental protocols were approved by Samsung Medical Center, Seoul, South Korea. Informed consent was obtained from all patients for the use of their information in the research without violating their privacy. Among the images, 829 lesions (489 benign and 340 malignant) were randomly selected as the test set. The test data were selected so that the training and test sets were separated at the patient level to avoid bias. Table 1 presents an overview of the lesion size attributes of the training and test data.
Figure 2. Examples of malignant and benign breast lesions.
Table 1. Overview of the lesion size attributes of training data and test data.
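As an illustration of the patient-level separation described above, the sketch below assigns whole patients to either the training or the test set so that no patient contributes images to both. The record structure, the test fraction, and the function name are hypothetical.

import random

def patient_level_split(records, test_fraction=0.1, seed=0):
    # Each image record is assumed to carry a patient_id field.
    patient_ids = sorted({r["patient_id"] for r in records})
    random.Random(seed).shuffle(patient_ids)
    n_test = int(len(patient_ids) * test_fraction)
    test_patients = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_patients]
    test = [r for r in records if r["patient_id"] in test_patients]
    return train, test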
The suggested CADx method aims to classify an identified ROI as a benign or malignant lesion. In this research, the ROI location was first provided by six radiologists, and an automatic segmentation method was then applied to draw the boundary of each lesion. Based on the resulting boundary, the ROI was cropped with a margin, defined as the distance between the lesion boundary and the boundary of the cropped image itself.
2.2 Lesion Boundary Segmentation
In classifying the ROI of an input image as a benign or malignant lesion, the shift of each lesion from the image center may affect classification performance. To compensate for this shift, the boundary of each lesion is drawn using the Fully Convolutional Network (FCN) segmentation method[1], based on the radiologist-specified points.
Fig. 3(a) and Fig. 3(b) present examples of breast lesion images. After applying FCN segmentation, the boundary of the lesion can be drawn as in Fig. 3. Using the drawn boundary, the center position of the breast lesion can be estimated as the median of the boundary. The ROI can then be cropped, centered at the estimated center position of the breast lesion. This process matters because, in the real clinical situation, the radiologist provides the input image after pointing a seed point at the area of the potential lesion, which can be anywhere in the acquired US image. After running the FCN segmentation method from the point provided by the radiologist, we can place the lesion at the center of the input image and crop the input image based on the segmentation result.
Figure 3. (a) and (b) are examples of segmented boundaries of breast lesion.
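A minimal sketch of this re-centering step is given below. It assumes a binary lesion mask as a stand-in for the FCN output and uses OpenCV to extract the boundary; the median of the boundary coordinates serves as the lesion center, as described above. The function name and the padding strategy are illustrative assumptions.

import numpy as np
import cv2

def recenter_on_lesion(image, lesion_mask, crop_size=255):
    # Estimate the lesion center from its boundary and crop a window around it.
    contours, _ = cv2.findContours(lesion_mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)  # OpenCV 4 convention
    boundary = np.vstack([c.reshape(-1, 2) for c in contours])   # (x, y) boundary points
    cx, cy = np.median(boundary, axis=0).astype(int)             # median of the boundary
    half = crop_size // 2
    padded = cv2.copyMakeBorder(image, half, half, half, half,
                                cv2.BORDER_CONSTANT, value=0)    # guard against image borders
    return padded[cy:cy + crop_size, cx:cx + crop_size]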
2.3 Data Augmentation by Image Cropping with Margin
In this research, the margin is defined as the distance between the lesion boundary and the boundary of the cropped image itself. To determine how the size of the margin affects overall performance, we created one database without a margin and one database with a margin of 50 pixels. The performance obtained with each database was compared to determine whether image cropping with a margin is better than image cropping without a margin. With a margin, the image contains information about the background as well as the breast lesion. If the dataset with a margin yields better results than the dataset without a margin, it indicates that information in the background also contributes to the classification performance. In addition to the performance comparison, we back-propagated through the network to create a saliency map[41]. The final label estimate is fed back through the network to determine which part of the input image affects it. To do this, the derivative at each layer is obtained by back-propagating the output score layer by layer, and the saliency map is constructed from the derivatives with respect to the input layer. In the saliency map, the portion of the input image that affects the estimated label is enhanced. This result may explain why the margin affects performance: if the information used to classify the lesion exists outside the lesion as well as within the lesion itself, the margin will surely affect performance.

To create the training dataset, images cropped with margins of 30 pixels and 70 pixels were added to the images cropped with a margin of 50 pixels for data augmentation; in the test set, the margin was fixed at 50 pixels. All ROI images were resized to 255x255 because the input size of the network is set to 255x255. Since the input image size is fixed, this augmentation determines the relative size of the breast tumor in the images. Unlike the authors' previous research[42], we did not adopt translation augmentation. With FCN segmentation, the lesion is placed at the center of the training images and the input images. Thus, rather than using translation augmentation, we centered each image using the segmented boundary.
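A minimal sketch of the saliency-map computation described above is shown below. It is written with PyTorch for brevity (the actual implementation used Caffe, as noted in Section 3), and `model` stands for any trained two-class CNN; the function name is an assumption.

import torch

def saliency_map(model, image_tensor):
    # Gradient of the winning class score with respect to the input pixels, after [41].
    model.eval()
    x = image_tensor.clone().requires_grad_(True)      # shape (1, C, H, W)
    scores = model(x)                                   # shape (1, 2): benign, malignant
    scores[0, scores.argmax()].backward()               # back-propagate the top class score
    return x.grad.abs().squeeze(0).max(dim=0).values    # per-pixel maximum over channels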
3. Experimental Results
We employed GoogLeNet, which was introduced in 2014, and modified the network for our purpose. The two auxiliary classifiers were removed, as in the authors' previous research[42]. In this research we have a two-class problem, benign versus malignant lesions; because GoogLeNet has 1000 class outputs, we reduced the output to two classes. All pixels in each patch are treated as input neurons. This modified GoogLeNet was used as the reference network in this research. We evaluated the performance of the proposed deep learning framework for breast cancer classification in terms of accuracy, sensitivity, specificity, and AUC (area under the ROC curve). Optimal parameters were chosen based on 10-fold cross validation with the training data, and the optimized parameters were then applied to evaluate performance on the test dataset.
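The modification described above can be sketched as follows, using the torchvision GoogLeNet implementation as a stand-in for the Caffe model actually used: the auxiliary classifiers are disabled and the classifier is reduced to a two-way output. The weights and training details are not those of this paper.

from torchvision.models import googlenet

# Two-class GoogLeNet without the two auxiliary classifiers (illustrative stand-in).
model = googlenet(weights=None, num_classes=2, aux_logits=False, init_weights=True)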
First, we compared the number of training images and the required training time of the proposed method to those of the authors' previous method[42] in Table 2. As can be seen in Table 2, the proposed method has a much shorter learning time than the previous method. Using the proposed method, we could train the network within a day, whereas the previous method required more than a month, which makes the proposed method much more practical in terms of implementation efficiency.
Table 2. Comparison of the required time for learning and the number of training images.
Fig. 4 shows ROC (receiver operating characteristic) curves for the evaluated CNNs. One neural network was trained and tested on the images without a margin; the other was trained and tested on the images with a 50-pixel margin.
Figure 4. ROC curves of GoogLeNets trained and tested on the images without a margin (black) and with a margin (red).
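The ROC comparison above can be reproduced once the malignancy scores of both models on the test set are available. The sketch below uses scikit-learn and matplotlib purely for illustration; the variable names are hypothetical.

from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

def plot_roc(y_true, scores_no_margin, scores_margin50):
    # y_true: 1 = malignant, 0 = benign; scores are the networks' malignancy outputs.
    for scores, label, color in [(scores_no_margin, "no margin", "black"),
                                 (scores_margin50, "50-pixel margin", "red")]:
        fpr, tpr, _ = roc_curve(y_true, scores)
        plt.plot(fpr, tpr, color=color,
                 label="%s (AUC = %.3f)" % (label, roc_auc_score(y_true, scores)))
    plt.xlabel("False positive rate (1 - specificity)")
    plt.ylabel("True positive rate (sensitivity)")
    plt.legend()
    plt.show()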
As can be seen in Fig. 4 and Table 3, training and testing the neural network on images with a margin appears to improve performance. To determine whether the information used to classify the lesion exists outside the lesion as well as within it, we implemented the saliency map[41] and applied it to the trained network. Four example results are presented in Fig. 5. In each example, the input image is on the right and the corresponding saliency map is on the left. The black region indicates the pixels that affect the label estimation. As can be seen in Fig. 5, the black region appears around the boundary of the lesion as well as within the lesion itself.
Table 3. Performance comparison of a CNN trained on the images with a margin to a network trained on images without a margin. GLN refers to GoogLeNet.
Figure 5. Saliency map examples showing where the important information exists in the image.
Considering the results in Fig. 4 and Fig. 5, we augmented the training dataset by adding images cropped with margins of 30 pixels and 70 pixels. The evaluation results of the GoogLeNet CNNs with and without this data augmentation are presented in Fig. 6.
Figure 6. ROC curves of GoogLeNet without the data augmentation and with the data augmentation.
As can be seen in Table 4, the performance of the CNN is very promising. The networks trained on the data with and without augmentation both show an AUC over 0.95. The network achieved about 90% accuracy, 0.86 sensitivity, and 0.95 specificity. The results show that data augmentation in terms of the margin improves classification performance. Considering the large number of images in the training and test sets, the proposed framework appears to be very helpful for classifying benign and malignant breast tumors in real clinical applications. Although it is assumed that the lesion is centered on the basis of the segmentation result, each radiologist may specify the center point differently, which could affect the performance of the proposed framework. We therefore also simulated this situation: the center position of each test image was perturbed by random vertical and horizontal shifts uniformly distributed between 0 and 32 pixels (one pixel corresponds to about 0.3 mm). Fig. 7 shows the result. As can be seen in Fig. 7 and Table 5, an unstable center position specified by the radiologist may affect performance. In Table 5, the centered images correspond to the result without the perturbation, and the perturbed images correspond to the result with perturbation recovery based on FCN segmentation. Some performance was lost, but the proposed method was able to recover from the perturbation. The modified GoogLeNet was employed for this simulation test.
Table 4. Diagnostic performances of CNN networks.
Figure 7. The perturbation of the center location by the radiologist does not affect the performance much.
Table 5. Diagnostic performances on centered images, and perturbed images.
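The perturbation used in this simulation can be sketched as follows. Only the 0-32 pixel magnitude range is stated above, so the random sign of each shift and the function name are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def perturb_center(cx, cy, max_shift=32):
    # Shift magnitudes drawn uniformly from 0-32 pixels (about 0.3 mm per pixel);
    # the direction of each shift is a random sign (an assumption).
    dx, dy = rng.integers(0, max_shift + 1, size=2)
    sx, sy = rng.choice([-1, 1], size=2)
    return cx + sx * dx, cy + sy * dy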
Fig. 8 presents some implementation examples.
Figure 8. Implementation examples of (a) a possibly benign lesion and (b) a possibly malignant lesion.
In this research, the threshold was set to 0.1 for the high-sensitivity model and 0.6 for the high-specificity model. CNN training was implemented with the Caffe[44] deep learning framework, using an NVIDIA K40 GPU on Ubuntu 14.04. A model snapshot with low validation loss was taken as the final model. The learning hyperparameters were set as follows: momentum 0.9, weight decay 0.0002, and a poly learning-rate policy with a base learning rate of 0.0001. The image batch size was 32, the maximum batch size that fit in our system.
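For reference, the "poly" learning-rate policy listed above decays the rate polynomially toward zero over training, following Caffe's formula. The power exponent is not stated in the text, so the value below is only an assumption.

def poly_lr(iteration, max_iter, base_lr=1e-4, power=1.0):
    # Caffe "poly" policy: base_lr * (1 - iter / max_iter) ** power
    return base_lr * (1.0 - iteration / max_iter) ** power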
4. Discussion
The proposed method could accurately distinguish malignant lesions from benign lesions when the location of the tumor was given by a radiologist. Thus, the proposed framework can help radiologists make accurate decisions about subsequent procedures. This research applied a deep learning method to a dataset collected from more than 5000 patients. Considering the number of images and patients, the results of the proposed framework may be reproducible on different datasets, demonstrating the benefits of the clinical application of deep learning methods. Unlike our previous research[42], we did not adopt translation augmentation: with FCN segmentation, the lesion is centered in the training images as well as in the input images. This reduced the number of training images. In Reference [42], we had to increase the number of training images for translation augmentation as well as margin augmentation. This research is useful and practical for implementation because it reduces the number of training images and makes dataset preparation much easier. The neural networks trained on the data with and without augmentation both showed an AUC over 0.9. We added images cropped with margins of 30, 50, and 70 pixels to the input image data for data augmentation. Trying more data augmentation in terms of margins may further improve the results.
A difficulty comes from the fact that the number of malignant tumor images is smaller than the number of benign tumor images. Increasing the number of malignant tumor images could reduce the loss of accuracy. If the proposed framework is applied by radiologists in the actual clinical situation, it can classify malignant lesions in a short time and support the diagnosis of the radiologist who discriminates malignant lesions. Therefore, it can deliver successful performance in cooperation with the human radiologist as a second opinion, which meets the fundamental purpose of CADx.
5. Conclusion
In this research, we used a deep learning framework to differentiate the distinctive types of lesions and nodules in the breast acquired with ultrasound imaging. Biopsy-proven benchmarking datasets were created to evaluate the proposed method. The proposed framework consists of histogram equalization, image cropping, and margin augmentation. Optimal parameters were selected based on 10-fold cross validation with the training data. The networks showed an AUC of 0.95, about 90% accuracy, 0.86 sensitivity, and 0.95 specificity. Although the proposed framework still requires a radiologist to point to the location of the target ROI, the suggested framework showed promising results. Using this method in conjunction with a radiologist in a clinical setting can assess the malignancy of a lesion so that the radiologist can identify malignant lesions at the right time. Therefore, the proposed framework can support the human radiologist in achieving successful performance and help create a fluent diagnostic workflow that meets the fundamental purpose of CADx.
Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2017R1C1B5077068) and by Korea National University of Transportation in 2018. The study is also supported by a 2017 Research Grant from Kangwon National University (No. 620170073).
References
- Jonathan Long, Evan Shelhamer, and Trevor Darrell, "Fully Convolutional Networks for Semantic Segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, April 2017,pp.640-651. https://doi.org/10.1109/TPAMI.2016.2572683
- https://www.cdc.gov/cancer/dcpc/data/women.htm
- Kornecki, "A 2011 Current Status of Breast Ultrasound," Can. Assoc. Radiol. J., vol.62, 2011,pp.31-40. https://doi.org/10.1016/j.carj.2010.07.006
- L. Tabár, B. Vitak, T.H. Chen, A.M. Yen, A. Cohen, T. Tot, S.Y. Chiu, S.L. Chen, J.C. Fann, J. Rosell, H. Fohlin, R.A. Smith, and S.W. Duffy, "Swedish two-county trial: impact of mammographic screening on breast cancer mortality during 3 decades," Radiology, vol.260 (3), 2011, pp.658-663. https://doi.org/10.1148/radiol.11110469
- K. Doi, "Computer-aided diagnosis in medical imaging: historical review, current status and future potential," Comput Med Imaging Graph, vol.31, 2007, pp.198-211. https://doi.org/10.1016/j.compmedimag.2007.02.002
- B. van Ginneken, C.M. Schaefer-Prokop, and M. Prokop, "Computer-aided diagnosis: how to move from the laboratory to the clinic," Radiology, vol.261, 2011, pp.719-732. https://doi.org/10.1148/radiol.11091710
- M. L. Giger, H. Chan, and J. Boone, "Anniversary paper: history and status of CAD and quantitative image analysis: the role of medical physics and AAPM," Med Phys, vol.35, 2008, pp.5799-5820. https://doi.org/10.1118/1.3013555
- J. Cheng et al., "Computer-aided US diagnosis of breast lesions by using cell-based contour grouping1." Radiology, vol.255, 2010, pp.746-754. https://doi.org/10.1148/radiol.09090001
- M. L. Giger, N. Karssemeijer, and J.A. Schnabel, "Breast image analysis for risk assessment, detection, diagnosis, and treatment of cancer," Annu Rev Biomed Eng, vol.15, 2013, pp.327-357 https://doi.org/10.1146/annurev-bioeng-071812-152416
- S. Joo, Y.S Yang, W.K. Moon, and H.C. Kim, "Computer-aided diagnosis of solid breast nodules: use of an artificial neural network based on multiple sonographic features," IEEE Trans Med Imag, vol.23, 1292-1300 (2004) https://doi.org/10.1109/TMI.2004.834617
- C.M. Chen et al, "Breast Lesions on Sonograms: Computer-aided Diagnosis with Nearly Setting-Independent Features and Artificial Neural Networks," Radiology, vol.226, 2003, pp.504-514. https://doi.org/10.1148/radiol.2262011843
- K. Drukker, C. Sennett and M.L. Giger, "Automated method for improving system performance of computer-aided diagnosis in breast ultrasound," IEEE Trans Med Imag,vol.28, 122-128 (2009). https://doi.org/10.1109/TMI.2008.928178
- K. Awai et al, "Pulmonary Nodules: Estimation of Malignancy at Thin-Section Helical CT: Effect of Computer-aided Diagnosis on Performance of Radiologists," Radiology, vol.239, 2006, pp.276-284. https://doi.org/10.1148/radiol.2383050167
- M.B. McCarville et al, "Distinguishing Benign from Malignant Pulmonary Nodules with Helical Chest CT in Children with Malignant Solid Tumors," Radiology, vol.239, 2006, pp.514-520. https://doi.org/10.1148/radiol.2392050631
- I.C. Sluimer, P.F. van Waes, M.A. Viergever, and B. van Ginneken, "Computer-aided diagnosis in high resolution CT of the lungs," Med Phys, vol.30, 2003, pp.3081-3090. https://doi.org/10.1118/1.1624771
- T. Sun, R. Zhang, J. Wang, X. Li, and X. Guo, "Computer-aided diagnosis for early-stage lung cancer based on longitudinal and balanced data," Plos ONE, vol.8, 2013, pp.e63559. https://doi.org/10.1371/journal.pone.0063559
- T.W. Way et al, "Computer-aided diagnosis of pulmonary nodules on CT scans: improvement of classification performance with nodule surface features," Med Phys, vol.36, 2009, pp.3086-3098. https://doi.org/10.1118/1.3140589
- S.G. Armato III, and W.F. Sensakovic, "Automated lung segmentation for thoracic CT: Impact on computer-aided diagnosis," Acad Radiol, vol.11, 2004, pp.1011-1021. https://doi.org/10.1016/j.acra.2004.06.005
- T.W. Way et al, "Computer-aided diagnosis of pulmonary nodules on CT scans: segmentation and classification using 3D active contours," Med Phys, vol.33, 2006, pp.2323-2337. https://doi.org/10.1118/1.2207129
- J. Wang et al, "Discrimination of Breast Cancer with Microcalcifications on Mammography by Deep Learning," Sci Rep, vol.6, 2016.
- T. Ayer et al, "Computer-aided diagnostic models in breast cancer screening,", Imaging Med., vol.2(3), 2010, pp.313-323. https://doi.org/10.2217/iim.10.24
- J.A. Cruz and D.S. Wishart, "Applications of Machine Learning in Cancer Prediction and Prognosis," Cancer Inform., vol.2, 2006, pp.59-77.
- V. Vishrutha and M. Ravishankar, "Early Detection and Classification of Breast Cancer," Proceedings of the 3rd International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), 2014, pp.413-419.
- M. Krishnan et al, "Statistical analysis of mammographic features and its classification using support vector machine," Expert Systems with Applications, vol.37(1), 2010,pp.470-478. https://doi.org/10.1016/j.eswa.2009.05.045
- Y. Bengio, A. Courville, and P. Vincent, "Representation learning: are view and new perspectives," IEEE Trans Pattern Anal Mach Intell. vol.35(8), 2013, pp.1798-1828. https://doi.org/10.1109/TPAMI.2013.50
- W. Zhang et al, "Deep convolutional neural networks for multi-modality isointense infant brain image segmentation," NeuroImage, vol.108, 2015, pp.214-224. https://doi.org/10.1016/j.neuroimage.2014.12.061
- H. -C. Shin, M. R. Orton, D. J. Collins, S. J. Doran, and M. O. Leach, "Stacked autoencoders for unsupervised feature learning and multiple organ detection in a pilot study using 4D patient data," IEEE Trans Pattern Anal Mach Intell, vol.35, 2013, pp. 1930-1943 https://doi.org/10.1109/TPAMI.2012.277
- H. Roth et al, "Improving Computer-aided Detection using Convolutional Neural Networks and Random View Aggregation," IEEE Trans Med Imag, vol.35(5), 2016, pp.1170-1181. https://doi.org/10.1109/TMI.2015.2482920
- A. Seff et al. "Leveraging Mid-Level Semantic Boundary Cues for Automated Lymph Node Detection, " Med Image Comput Comput Assist Interv(MICCAI), vol.9350, 2015, pp.53-61.
- N. Tajbakhsh, M. B. Gotway, and J. Liang, "Computer-aided pulmonary embolism detection using a novel vessel-aligned multi-planar image representation and convolutional neural networks," Med Image Comput Comput Assist Interv (MICCAI), 2015.
- J. Arevalo, A. Cruz-Roa, and F.A. Gonzalez, "Hybrid image representation learning model with invariant features for basal cell carcinoma detection," Proc SPIE 8922, 2013, pp. 89220M-89220M-6.
- A. A. Cruz-Roa, J. E. A. Ovalle, A. Madabhushi, and F. A. G. Osorio, "A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection," Medical Image Computing and Computer-Assisted Intervention(MICCAI) 2013, 2013, pp.403-410,
- H. Chen et al, "Automatic Fetal Ultrasound Standard Plane Detection Using Knowledge Transferred Recurrent Neural Networks," Med Image Comput Comput Assist Interv(MICCAI), vol.9349, 2015, pp.507-514.
- A. Prasoon, K. Petersen, C. Igel, F. Lauze, E. Dam, and M. Nielsen, "Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network," Medical Image Computing and Computer-Assisted Intervention(MICCAI 2013), Vol. 8150 of Lecture Notes in Computer Science, 2013, pp. 246-253.
- H.I. Suk, and D. Shen, "Deep learning-based feature representation for AD/MCI classification," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8150, 2013, pp.583-590.
- H.-I. Suk, S.-W. Lee, and D. Shen, "Latent feature representation with stacked auto-encoder for AD/MCI diagnosis," Brain Struct Funct, 2013, pp.1-19.
- H.-I. Suk, S.-W. Lee, and D. Shen, "Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis," Neuroimage, vol.101, 2014, pp.569-582. https://doi.org/10.1016/j.neuroimage.2014.06.077
- F. Li, L. Tran, K.-H. Thung, S. Ji, D. Shen, and J. Li, "A Robust deep learning for improved classification of AD/MCI patients," IEEE J. Biomed. Health Inform., 2015, pp.1610-1616
- A. Jalalian, S.B. Mashohor, H.R. Mahmud, M.I.B. Saripan, A.R.B. Ramli, and B. Karasfi, "Computer-aided detection/diagnosis of breast cancer in mammography and ultrasound: a review," Clin Imaging, vol.37(3), 2013, pp.420-426. https://doi.org/10.1016/j.clinimag.2012.09.024
- J. Z. Cheng et al, "Computer-Aided Diagnosis with Deep Learning Architecture: Applications to Breast Lesions in US Images and Pulmonary Nodules in CT Scans," Sci Rep, 2016.
- K Simonyan, A Vedaldi, A Zisserman, "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps," 2013, arXiv.org:1312.6034.
- S.Han, H.K. Kang, J.Y. Jeong, M.H. Park, W. Kim, W.C. Bang, and Y.K. Seong, "A Deep Learning Framework for Supporting the Classification of Breast Lesions in Ultrasound images," Phys. Med. Biol., vol.62, 2017
- S.Han, J.Jeong, H.Kim, "An Implementation of Deep Learning Method of Breast Lesion Classification," Proceedings of Joint Conference on Communications and Information 2019, submitted.
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. B. Girshick, S. Guadarrama, and T. Darrel, "Caffe: Convolutional Architecture for Fast Feature Embedding," in ACM Multimedia, vol.2, 2014.