A Parallel Deep Convolutional Neural Network for Alzheimer's disease classification on PET/CT brain images

  • Baydargil, Husnu Baris (Department of Electric Electronic and Communication Engineering, Kyungsung University) ;
  • Park, Jangsik (Department of Electric Electronic and Communication Engineering, Kyungsung University) ;
  • Kang, Do-Young (Department of Nuclear Medicine, Dong-a University College of Medicine, Dong-A University Hospital) ;
  • Kang, Hyun (Institute of Convergence Bio-Health, Dong-A University) ;
  • Cho, Kook (College of General Education, Dong-A University)
  • Received : 2019.10.23
  • Accepted : 2020.07.29
  • Published : 2020.09.30

Abstract

In this paper, a parallel deep learning model using a convolutional neural network and a dilated convolutional neural network is proposed to classify Alzheimer's disease with high accuracy in PET/CT images. The developed model consists of two pipelines: a conventional CNN pipeline and a dilated convolution pipeline. An input image is sent through both pipelines, and at the end of both pipelines, the extracted features are concatenated and used for classifying Alzheimer's disease. The complementary abilities of both networks provide better overall accuracy than a single conventional CNN on the dataset. Moreover, instead of performing binary classification, the proposed model performs three-class classification among Alzheimer's disease, mild cognitive impairment, and normal control. Using the data received from Dong-A University, the model detects Alzheimer's disease with an accuracy of up to 95.51%.

Keywords

1. Introduction

Alzheimer’s disease (AD) is statistically the most common type of dementia, generally seen in elderly people; it makes up about 70% of the cases in individuals who are 70 years of age or older [1]. In most people, late-onset AD symptoms begin to appear in their mid-60s, while early-onset AD occurs between the 30s and mid-60s, although the latter case is quite rare. It is also estimated that around 640 million people will be diagnosed by 2050 due to the increase in the aging population [2]. During the progress of the disorder, the brain structure progressively changes, with the initial damage occurring in the hippocampus. Mild cognitive impairment (MCI) can be considered an early sign of AD; however, not everyone with MCI develops AD. As the disease progresses, hard plaques and tangles form in the brain, causing debilitating issues such as progressive memory loss and, at later stages, the inability to move. AD also shrinks the hippocampus and cerebral cortex while enlarging the ventricles. Since the hippocampus is responsible for episodic and spatial memory and also works as a relay between the brain and the rest of the body, damage to it prevents neurons from communicating through synapses. This causes issues with remembering, thinking, planning, and judgment. The effects of this process can be seen as low-intensity brain regions in medical imaging. As of the writing of this paper, there is no definite cure for AD, and current treatments help only with prolonging life and alleviating symptoms.


Fig. 1. Amyloid-positive (left) and amyloid-negative (right) images using PET scans are used in the diagnosis of Alzheimer’s disease.

As AD progresses, protein deposits in the brain, namely amyloid-β (Aβ) plaques and hyperphosphorylated tau, accumulate and lead to progressive neuron and axon damage. The changes generally begin in the medial temporal lobe (entorhinal cortex and hippocampus), followed by progressive neocortical damage [3]. These changes start to appear years before AD becomes prevalent. The symptoms of AD appear to develop once the toxic effects of the plaques in the brain reach a certain clinical threshold. For that reason, this paper’s work focuses on PET/CT (Positron Emission Tomography/Computed Tomography) images, in which the prevalence of these plaques is highlighted in patients.

Since the early 1970s, researchers have developed several computer-aided diagnosis systems based on manually extracted feature vectors [4]. These vectors were then used to train a machine learning model, such as a support vector machine, in a supervised manner to perform classification. However, it was soon realized that there were serious limitations and shortcomings to such systems [5]. Recognizing these limitations, researchers turned towards data mining approaches in the 1980s and 1990s in hopes of developing more advanced and flexible systems.

Nowadays, the performance of deep learning models is being compared with that of human experts in computer-aided diagnosis (CAD). Because deep learning methodologies such as convolutional neural networks (CNN) can automatically extract meaningful features from a given image through supervised training and produce highly accurate results, the possibility of human experts being replaced by automated systems in the near future is being discussed in the medical field due to the reduced cost and relatively similar performance [6]. Deep learning models also show promising results for other medical image analysis tasks such as segmentation, disease and tumor detection, and classification of breast, bone, and other tissues using microscopy and ultrasound images.

In this paper, a parallel-pipeline CNN model that uses both the convolution operation and the dilated convolution operation is proposed. The model extracts two distinct but spatially complementary sets of image features for each given brain image. One pipeline consists of conventional convolutional layers, whereas the other pipeline consists of dilated convolutional layers. The extracted features coming from the two pipelines are concatenated and used in the classification stage. The proposed model is trained with a dataset of PET/CT images obtained from the Dong-A University Department of Nuclear Medicine.

The paper is organized as follows: Section 2 discusses related work on AD detection in brain imaging. Section 3 explains the proposed model. Section 4 presents the experimental results and considerations. Section 5 concludes the paper and discusses future work.

2. Alzheimer’s Disease Detection with Computer Vision

There has been a massive effort in the field of medical imaging to better observe the changes that AD causes in the brain over time. Machine learning applications have been proposed to classify different stages of AD using imaging data [7-9]. Structural differences in the brain were identified through such research to show the difference between a healthy brain and a brain affected by AD. There is also a strong correlation between brain connectivity and the AD patient’s behavior [10]. The degeneration of brain cells caused by AD can be seen using a variety of imaging techniques, such as PET/CT, structural and functional magnetic resonance imaging (sMRI, fMRI), and diffusion tensor imaging (DTI). These imaging techniques have been used in AD diagnosis research, for example PET [11], sMRI [12], and fMRI [13]. It has also been shown that combining features from multiple modalities improves accuracy. According to the results of one study, classification accuracy in the AD case increased from 72% to 93.2% when a multi-modal approach was used instead of a single modality [14].

18F-florbetaben (FBB) PET imaging is useful in providing information about Aβ deposition in the brain of a patient with AD. Even though a variety of medical imaging techniques are widely used for AD research, clinical testing is still the default tool in AD diagnosis. However, for prodromal AD and MCI, accurate diagnosis is challenging with existing clinical examinations alone, which prompted the inclusion of biomarkers such as Aβ and/or tau in AD diagnostics. The development of radiotracers to visualize Aβ plaques in the brain has become an active area of AD research. In this work, FBB brain PET imaging data was used to develop and validate the proposed model.

PET/CT combines the cross-sectional anatomic information obtained through CT with the metabolic information obtained through PET; the sequential images from both devices are combined into a single superposed image. This offers advantages over CT or PET alone, such as the ability to accurately localize increased FBB activity in abnormal locations, which may be impossible with PET alone given its lack of anatomical information, complemented by a CT-based attenuation map. The work so far in the literature shows that PET images provide better classification accuracy [15] than MRI images alone [16,17].

Deep learning models are used in many areas such as object recognition, natural language processing, object tracking, and segmentation due to their extraordinary ability to extract features from input data automatically. Their hierarchical and deep structures allow them to learn high-level features, and further down the pipeline, more abstract features are obtained. These models are also capable of learning disease-related features of the brain from images in their hidden or latent representations. Therefore, such models were quickly applied and developed in the field of medical imaging for a more accurate diagnosis of disorders and diseases. Researchers used sparse autoencoders for multi-class classification with 2D CNNs [18] and 3D CNNs [19]. Brosch et al. [20] developed a deep belief network (DBN) using manifold learning to detect AD. Suk et al. [21-23] developed an autoencoder that incorporated SVM kernels for mild cognitive impairment (MCI) classification. Cárdenas-Peña et al. [24] developed a model using kernel alignment and showed that supervised pre-training of a stacked autoencoder provides higher classification accuracy than unsupervised pre-training with plain autoencoders and principal component analysis (PCA).

Unfortunately, AD is typically only identified at a later stage, and since earlier detection can at best help slow the progression of cognitive decline, it is vital to be able to detect AD at its earlier stages. In this paper, a new very deep parallel CNN is proposed that takes in three classes, AD, MCI, and normal control (NC), and is capable of extracting different features from the same input data, concatenating these features, and providing higher classification accuracy. The dataset used was provided by the collaborating Dong-A University Department of Nuclear Medicine.

3. The Proposed Parallel Model for Classification

3.1 Deep Learning Model

The proposed model is a parallel deep CNN designed to extract different abstract color-based spatial information using convolutional and dilated convolutional layers, and to concatenate this extracted abstract information to achieve higher accuracy. While one pipeline is a standard 8-layer-deep CNN with normal convolutions, the other pipeline is an 8-layer-deep CNN with dilated convolutions [25]. Both parallel pipelines consist of operations such as convolution or dilated convolution, batch normalization [26], rectified linear unit (ReLU) activation, max-pooling, and dropout [27].

In both pipelines, each layer performs the first four operations in order, while the eighth layer additionally performs dropout. After these computations are done, the features are flattened into a dense layer, similar to Huang et al.’s work [28], and a dense block is produced by concatenating the outputs of both pipelines. An example of the respective operations can be seen in Fig. 2, and a minimal code sketch of one such block is given after Fig. 2.


Fig. 2. Standard operations performed by the model down both pipelines.
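To make the block in Fig. 2 concrete, the following is a minimal Keras sketch of one such block, assuming the layer ordering described above; the filter count and dropout rate are illustrative assumptions rather than values reported in the paper.

```python
from tensorflow.keras import layers

def pipeline_block(x, filters, dilation_rate=1, use_dropout=False):
    """One block of either pipeline; dilation_rate=2 gives the dilated variant."""
    x = layers.Conv2D(filters, kernel_size=3, padding='same',
                      dilation_rate=dilation_rate)(x)  # convolution or dilated convolution
    x = layers.BatchNormalization()(x)                 # counters internal covariate shift
    x = layers.Activation('relu')(x)                   # ReLU activation
    x = layers.MaxPooling2D(pool_size=2)(x)            # 2 x 2 max-pooling
    if use_dropout:
        x = layers.Dropout(0.5)(x)                     # assumed dropout rate
    return x
```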

The regular convolution operation, while very effective, generally struggles to integrate global context into its activations. If one considers a pure CNN with 𝑘 × 𝑘 convolutions and no pooling, the receptive field of each unit that affects the activation is 𝑙 × (𝑘 − 1) + 𝑘, with 𝑙 being the layer index. Therefore, the effective receptive field grows only linearly through the layers, which is limiting for higher-resolution input images. The dilated convolution, however, works differently: between a signal f and a kernel k with dilation factor 𝑙, it is defined as,

\(\left(k_{* l} f\right)_{t}=\sum_{\tau=-\infty}^{\infty} k_{\tau} \cdot f_{t-l \tau}\)       (1)

where 𝑓𝑡−𝑙𝜏 differs from normal convolution, where it would be 𝑓𝑡−𝜏. In the dilated convolution operation, the kernel comes into contact with the signal at every 𝑙-th entry, which can also be extended to 2D convolutions. This makes the dilated convolution operation computationally more efficient than the plain convolution operation. In the proposed model, the conventional convolution pipeline has roughly 60 million trainable parameters, and the dilated convolution pipeline has around 27 million trainable parameters. The dilated convolution pipeline is thus roughly 25% more computationally lightweight than a pipeline with the same receptive field built from 5 × 5 convolutional filters.
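As a numerical illustration of Eq. (1), the following sketch implements a 1D dilated convolution in NumPy; the signal and kernel values are arbitrary examples.

```python
import numpy as np

def dilated_conv1d(f, k, l):
    """(k *_l f)_t = sum_tau k_tau * f_{t - l*tau}, evaluated only where fully defined."""
    span = l * (len(k) - 1)                      # receptive field of one output sample
    out = np.empty(len(f) - span)
    for t in range(span, len(f)):
        out[t - span] = sum(k[tau] * f[t - l * tau] for tau in range(len(k)))
    return out

signal = np.arange(10, dtype=float)
kernel = np.array([1.0, 0.0, -1.0])
print(dilated_conv1d(signal, kernel, l=1))   # plain convolution: differences two samples apart
print(dilated_conv1d(signal, kernel, l=2))   # dilation factor 2: same kernel, wider window
```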


Fig. 3. The architecture of the proposed model. The top pipeline is a regular CNN with eight layers, with each layer’s operations as highlighted in Fig. 2. The bottom pipeline is a dilated CNN with eight identical layers, except that the dilated convolution operation is used. From both pipelines, fully-connected layers are obtained and concatenated together, and this layer is connected to the classification layer with a softmax operation.

Both the convolutional and dilated convolutional filters used in the pipelines have a size of 3 × 3, and the dilated convolutional filter has a dilation size of 2. As the weights and other parameters change constantly during training, the distribution of intermediate activations can shift, a phenomenon called internal covariate shift. Batch normalization is used to decrease the training time and mitigate this issue. Max-pooling with a 2 × 2 size is used in both pipelines.

Due to its nature, dilated convolution increases the receptive field of the network exponentially with the dilation size. In this work, this fact is taken advantage of, as it makes the network lighter and captures different features than a plain convolution of comparable receptive field, such as a 5 × 5 convolutional layer.

The proposed model takes an initial input of size 160 × 160 × 3, and after each max-pooling operation, the image size decreases by half while the depth doubles. This operation is performed four times until there is a 1024-unit fully-connected layer for each pipeline. Concatenating these fully-connected layers gives a new fully-connected layer with a unit size of 2048, combining both pipelines’ extracted features into one (a minimal code sketch of this assembly is given after Eq. (4)). This layer is connected to the classification layer with the softmax operation, which is defined as,

\(p_{i}=\frac{\exp \left(f_{i}\right)}{\sum_{i} \exp \left(f_{i}\right)}, i=1, \ldots, m\)       (2)

where 𝑓𝑖 is the representation, 𝑝𝑖 is the probability score, and m is the number of classes, which in this case is 3 (AD, MCI, and NC). With this information, L is obtained such that,

\(\mathrm{L}=-\sum_{i} t_{i} \log \left(p_{i}\right)\)       (3)

where L is the cross-entropy loss of the model. This value is used during backpropagation, where the gradients are calculated. With 𝑡𝑖 denoting the ground-truth label of the PET/CT image,

\(\frac{\partial \mathrm{L}}{\partial \mathrm{f}_{i}}=p_{i}-t_{i}\)       (4)
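The following is a minimal Keras sketch of the parallel architecture described above and in Fig. 3, ending in the softmax of Eq. (2). It assumes four downsampling blocks per pipeline and illustrative filter counts, so it is a simplified stand-in for the eight-layer pipelines rather than an exact reproduction.

```python
from tensorflow.keras import layers, models

def pipeline(x, dilation_rate):
    filters = 64                                  # starting depth is an assumption
    for _ in range(4):                            # four downsampling stages (160 -> 10)
        x = layers.Conv2D(filters, 3, padding='same',
                          dilation_rate=dilation_rate)(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)
        x = layers.MaxPooling2D(2)(x)             # image size halves ...
        filters *= 2                              # ... while the depth doubles
    x = layers.Flatten()(x)
    return layers.Dense(1024, activation='relu')(x)   # 1024-unit fully-connected layer

inputs = layers.Input(shape=(160, 160, 3))
merged = layers.Concatenate()([pipeline(inputs, dilation_rate=1),   # conventional pipeline
                               pipeline(inputs, dilation_rate=2)])  # dilated pipeline, 2048 units total
outputs = layers.Dense(3, activation='softmax')(merged)             # AD / MCI / NC, Eq. (2)
model = models.Model(inputs, outputs)
```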

The stochastic gradient descent (SGD) optimization algorithm with momentum was used for the proposed model. The training data was also split into training and cross-validation sets in an 8:2 proportion. Nesterov momentum was used to speed up training and improve convergence. Given an objective function 𝑓(𝛼) to be minimized, the Nesterov momentum update is given as,

\(v_{t}=\mu v_{t-1}-\varepsilon \nabla f\left(\alpha_{t-1}+\mu v_{t-1}\right)\)       (5)

\(\alpha_{t}=\alpha_{t-1}+v_{t}\)       (6)

where 𝑣𝑡 is the velocity, 𝜀 > 0 is the learning rate, µ ∈ [0,1] is the momentum coefficient, and 𝛻𝑓(𝛼𝑡) is the gradient at 𝛼𝑡.
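A toy numerical check of the update in Eqs. (5) and (6) on the quadratic f(α) = α², whose gradient is 2α; the hyperparameter values here are illustrative only.

```python
def nesterov_step(a, v, grad, lr=0.1, mu=0.9):
    v_next = mu * v - lr * grad(a + mu * v)   # Eq. (5): gradient taken at the look-ahead point
    a_next = a + v_next                       # Eq. (6)
    return a_next, v_next

a, v = 5.0, 0.0
for _ in range(50):
    a, v = nesterov_step(a, v, grad=lambda x: 2.0 * x)
print(a)   # approaches the minimizer at 0
```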

Let X = {𝑥𝑖, i = 1, …, N} be a set of PET/CT images, where each 𝑥𝑖 ∈ {0, 1, 2, …, 𝐿 − 1}^(ℎ × 𝑤 × 𝑙) is a 3D image with L grayscale values and h × w × l voxels, and let y ∈ {0, 1, 2} denote the classes in the dataset (AD, MCI, and NC). Next, construct a classifier f,

𝑓: 𝑋 → 𝑌; x → y       (7)

that predicts a label y for an input x with a minimum error rate. We would like to determine this classifier 𝑓 through an optimal parameter set w ∈ ℝ^𝑃 (where P can easily be in the millions) by minimizing the error rate during prediction. Therefore, training is an iterative process of finding the set of parameters 𝑤 that minimizes the classifier loss,

\(L(w, X)=\frac{1}{n} \sum_{i=1}^{n} l\left(f\left(x_{i}, w\right), \hat{c}_{i}\right)\)       (8)

where 𝑥𝑖 is the i-th image of 𝑋, 𝑓(𝑥𝑖, 𝑤) is the classifier function predicting the class 𝑐𝑖 of 𝑥𝑖 for a given 𝑤, 𝑐̂𝑖 is the ground-truth class for that image, and 𝑙(𝑐𝑖, 𝑐̂𝑖) is the loss for a wrong prediction (𝑐𝑖 instead of 𝑐̂𝑖). L is then set to the cross-entropy loss,

\(L=-\sum_{i} \hat{c}_{i} \log c_{i}\)       (9)

3.2 Dataset

The PET/CT image dataset used in this paper is the property of the Dong-A University Hospital Department of Nuclear Medicine. There are three separate classes: AD, MCI, and NC. Detailed information about the dataset is given in Table 1.

Table 1. The dataset information with the total number of images and the total number of patients for each designated class.


In a given PET/CT scan of a person’s brain, there were 110 axial images. However, the top 30 and bottom 30 slices provided no useful information about plaque formations in the brain for classification, so these images were discarded. Therefore, 50 images from each patient’s brain were used for training and testing. Each image has a size of 160 × 160 × 3.


Fig. 4. Sample images from the dataset. The left image is the AD case, the middle one MCI, and the right image the NC case. The image slices were taken from the same slice number of different patients’ scans (60th slice out of 110).

The human brain is a three-dimensional organ; however, the medical images represent only one plane, the axial plane. The other axes, namely the sagittal and coronal axes, also carry spatial information about plaque formations in the brain in their respective planes, which can be used to train the model. Thus, in this work, the proposed model was also tested on the other axes of the brain, sagittal and coronal. Using the bicubic interpolation method, sagittal and coronal images were created. For these axes, the first 15 and the last 15 images were discarded because they did not carry any useful information. This means that for one patient, out of 110 created images, 80 training images contain useful information about AD for both the sagittal and coronal axes. The flowchart of the followed method can be seen in Fig. 5, and a minimal reslicing sketch is given after Fig. 7. The number of images obtained through this process is shown in Table 2. Examples of created coronal and sagittal images can be seen in Fig. 6 and Fig. 7, respectively.


Fig. 5. Flowchart of the method for creating sagittal and coronal axes using the axial images.

Table 2. The total number of images used for training after the bicubic interpolation process.



Fig. 6. An example of created coronal images with AD, MCI, and NC, respectively. The slices are taken from the same i-th image of a person.


Fig. 7. An example of created sagittal images with AD, MCI, and NC, respectively. The slices are taken from the same i-th image of a person.
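A minimal sketch of the reslicing step described above: stack the axial slices into a volume, take coronal and sagittal cross-sections, and resize each with bicubic interpolation. The use of OpenCV, the 160 × 160 target size, and the exact slice handling are assumptions for illustration.

```python
import numpy as np
import cv2  # OpenCV, assumed available

def reslice(axial_slices):
    """axial_slices: list of grayscale axial images from one patient, each 160 x 160."""
    volume = np.stack(axial_slices, axis=0)                # (slices, height, width)
    coronal = [cv2.resize(volume[:, row, :], (160, 160),
                          interpolation=cv2.INTER_CUBIC)   # bicubic interpolation
               for row in range(volume.shape[1])]
    sagittal = [cv2.resize(volume[:, :, col], (160, 160),
                           interpolation=cv2.INTER_CUBIC)
                for col in range(volume.shape[2])]
    # the first and last 15 resliced images carry little plaque information and are discarded
    return coronal[15:-15], sagittal[15:-15]
```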

For a given set of data points (𝑥𝑘, 𝑦𝑘), 𝑘 = 0, …, 𝑁, a cubic spline consists of 𝑁 cubic polynomials 𝑠𝑘(𝑥), one assigned to each subinterval and satisfying the given constraints. The cubic spline is defined as a function 𝑆(𝑥) = 𝑠𝑗(𝑥) on the interval [𝑥𝑗, 𝑥𝑗+1] for 𝑗 = 0, 1, …, 𝑁 − 1, where

\(S_{j}(x)=a_{j}+b_{j}\left(x-x_{j}\right)+c_{j}\left(x-x_{j}\right)^{2}+d_{j}\left(x-x_{j}\right)^{3}\)       (10)
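For completeness, a short SciPy illustration of the piecewise polynomials in Eq. (10); the knot values are arbitrary.

```python
import numpy as np
from scipy.interpolate import CubicSpline

x_knots = np.array([0.0, 1.0, 2.0, 3.0])
y_knots = np.array([0.0, 2.0, 1.0, 3.0])
spline = CubicSpline(x_knots, y_knots)   # one cubic s_j(x) per interval [x_j, x_j+1]
print(spline(1.5))                        # interpolated value between knots
print(spline.c.shape)                     # (4, 3): four coefficients per interval, as in Eq. (10)
```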

In deep learning, data augmentation is defined as enlarging the dataset through different means without affecting the obtainable features, thereby helping to prevent overfitting [29]. Since adding more data in the medical imaging domain would prove costly, augmentation is the most reliable and easiest way to increase the number of images in the dataset. Horizontal flip, width shift, and height shift operations were used to increase the dataset size. Each image was randomly augmented through one or more of the mentioned methods to create a second image, doubling the total number of images in the dataset (a minimal augmentation sketch is given after Table 3). The final number of images used in training after the data augmentation is given in Table 3. 20% of the data was used for testing and 10% for validation. The augmentation process was applied separately to the training, testing, and validation sets.

Table 3. The total number of images in the dataset after the data augmentation is applied.

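A minimal sketch of the augmentation described above (horizontal flip, width shift, height shift) using the Keras image generator; the shift fractions and directory layout are assumptions.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(horizontal_flip=True,
                               width_shift_range=0.1,    # assumed shift fraction
                               height_shift_range=0.1)   # assumed shift fraction

# Stream augmented 160 x 160 images from hypothetical class subfolders AD/, MCI/, NC/.
train_iter = augmenter.flow_from_directory('data/train',
                                           target_size=(160, 160),
                                           batch_size=30,
                                           class_mode='categorical')
```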

4. Experimental Results and Considerations

4.1 Experimental Environment and Training

The implementation was done using TensorFlow [30], Keras [31], and Python 2.7 on an Ubuntu 16.04 computer with an Intel Xeon E5-2650 v3 CPU, dual NVIDIA Quadro M5000 GPUs, and 64 GB of RAM. SGD was used with a mini-batch size of 30, a learning rate of 0.001, a weight decay of 0.06, and a momentum of 0.9 with Nesterov optimization (a minimal sketch of this configuration is given at the end of this subsection). The change in the model’s accuracy and loss during training on sagittal images can be seen in Fig. 8.


Fig. 8. The model’s accuracy and loss value change through the epochs during the training. It can be seen that 120 epochs were sufficient for optimized accuracy and loss value.

The small fluctuations seen in Fig. 8 are due to the SGD optimization algorithm updating the weights during backpropagation to reduce the loss value obtained after each forward pass. After 120 epochs, the proposed model was sufficiently trained, with a validation accuracy close to 100%. Both the validation and training loss dropped below 0.002 after 120 epochs of training. It would be possible to reduce the loss value even further; however, this would also increase the risk of overfitting the dataset instead of generalizing. Overfitting means the model has memorized the dataset, so it would perform less accurately on images it has not been trained with.
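Continuing from the model and data-generator sketches above, the reported optimizer settings could be wired up roughly as follows; the mini-batch size of 30 is set in the generator, and the reported weight decay of 0.06 is omitted here since how it was applied is not specified.

```python
import tensorflow as tf

# `model` is the parallel network sketched in Section 3, and `train_iter` / `val_iter`
# are augmented generators as in Section 3.2 (val_iter is a hypothetical validation generator).
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.001,
                                                momentum=0.9, nesterov=True),
              loss='categorical_crossentropy',   # Eq. (3)
              metrics=['accuracy'])
history = model.fit(train_iter, validation_data=val_iter,
                    epochs=120)                  # 120 epochs were sufficient (Fig. 8)
```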

4.2 Performance of the model

The performance of the model during inference is of utmost importance, and activation maps give clues about what the model has learned. Activation maps in deep learning help visualize where the model’s attention lies in the given data. These fine-grained details are crucial in computer vision tasks, as the model interprets this extracted information in its task, which in this work is classification. The model’s pipeline activation maps are given in Fig. 9, Fig. 10, and Fig. 11. In each figure, the left side is the convolutional pipeline activation map, and the right side is the dilated convolutional pipeline map (a minimal sketch of how such maps can be extracted is given at the end of this subsection). It can be seen in the images that while the convolutional pipeline activations focus on narrower areas, the dilated convolution pipeline activations focus on wider areas, such as the circumference of the brain.


Fig. 9. Activation maps of the model for a test image for the AD case. The left box is the convolutional pipeline, and the right box is the dilated convolution pipeline. It can be seen that the activated areas for the test image complement each other; meaning, the dilated pipeline can capture features not obtained by convolutional pipeline, and vice versa.


Fig. 10. Activation maps of the model for a test image for the MCI case.


Fig. 11. Activation maps of the model for a test image for the NC case.

The complementary nature of the pipeline activations indicates that more thorough and accurate feature extraction is possible with the proposed model. In all three cases, AD, MCI, and NC, the convolutional pipeline and the dilated pipeline each extracted features that the other could not obtain. Combined across the pipelines, these features led the proposed model to achieve higher accuracy than the other models.
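A minimal sketch of how such activation maps can be extracted with Keras: build a probe model that outputs a chosen convolutional layer's activations and average them over channels to obtain a coarse attention map. The layer name is hypothetical.

```python
import numpy as np
from tensorflow.keras import models

def activation_map(model, image, layer_name):
    """image: one preprocessed 160 x 160 x 3 array; returns a 2D mean activation map."""
    probe = models.Model(inputs=model.input,
                         outputs=model.get_layer(layer_name).output)
    feats = probe.predict(image[np.newaxis, ...])   # shape (1, H, W, channels)
    return feats[0].mean(axis=-1)                   # average across channels
```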

4.3 Results

The proposed model was compared with other well-known state-of-the-art models such as VGG16 [32], GoogLeNet Inception v4 [33], ResNet50 [34], and a sparse autoencoder specifically developed for AD classification in [35]. The results show that the proposed model outperforms all the comparison models by a large margin. The results can be seen in Table 4.

Table 4. Accuracy comparison between the developed model and other state-of-the-art classification models.


The total number of trainable parameters and the training time of the benchmark models and the proposed model are given in Table 5. Even though the model has a high number of trainable parameters, it shows higher accuracy than the benchmark models. At inference, the accuracy of the model is calculated as,

Table 5. Trainable parameter comparison between the proposed model and the benchmark models. The model has a large number of parameters.


\(Accuracy = \frac {True \ Positive} {True \ Positive + False \ Positive}\)       (11)

The proposed model was able to extract more useful information from the created sagittal images than from the original axial images. This means that the spatial information carried in the sagittal plane was more useful for the proposed model to perform classification accurately.

5. Conclusion

In this paper, a parallel deep CNN model is proposed that both performs accurately on the given dataset and is computationally more efficient than similar very deep models such as VGG16. The convolutional and dilated convolutional pipelines work complementarily, each extracting features the other pipeline cannot, which provides higher accuracy. Although the majority of the works in the literature focus on binary classification, this paper focuses on multi-class classification for the future development of AD diagnosis. Sagittal and coronal plane images were also tested for relevant spatial information, which showed that the sagittal plane provided the highest accuracy, 91.51%. Even though the model has been tested only for AD diagnosis, it is believed that it can be used, with modifications, in other medical imaging areas, given its simple yet robust approach to the extraction of image features. Future work will focus on increasing the accuracy as well as lowering the computational cost of the model.

References

  1. Alzheimer's Association, "2019 Alzheimer's disease facts and figures," Alzheimer's & Dementia, vol. 15, pp. 321-387, 2019. https://doi.org/10.1016/j.jalz.2019.01.010
  2. R. Brookmeyer, E. Johnson, K. Ziegler-Graham and H.M. Arrighi, "Forecasting the global burden of Alzheimer's disease," Alzheimer's & Dementia, vol. 3, no. 3, pp. 186-191, July 2007. https://doi.org/10.1016/j.jalz.2007.04.381
  3. G. Frisoni, N.C. Fox, C. Jack, P. Scheltens and P. Thompson, "The clinical use of structural MRI in Alzheimer's disease," Nature Reviews Neurology, vol. 6, no.3, pp. 67-77, February 2010.
  4. G. Litjens, T. Kooi, B.E. Bejnordi, A.A.A. Setio, F. Ciompi, M. Ghafoorian, J.A. van der Laak, B. Van Ginneken and C.I. Sanchez, "A survey on deep learning in medical image analysis," Medical Image Analysis, vol. 42, pp. 60-88, December 2017. https://doi.org/10.1016/j.media.2017.07.005
  5. J. Yanase and E. Triantaphyllou, "A systematic survey of computer-aided diagnosis in medicine: Past and present developments," Expert Systems with Applications, vol. 138, pp. 112821, December 2019. https://doi.org/10.1016/j.eswa.2019.112821
  6. A. Tufail, C. Rudisill, C. Egan, V.V. Kapetanakis, S. Vega-Salas, C.G. Owen, A. Lee, V. Louw, J. Anderson, G. Liew, L. Bolter, S. Srivinas, M. Nittala, S. Sadda, P. Taylor, and A.R. Rudnicka, "Automated Diabetic Retinopathy Image Assessment Software: Diagnostic Accuracy and Cost-Effectiveness Compared with Human Graders," Ophthalmology, Vol. 124, no. 3, pp. 343-351, March 2017. https://doi.org/10.1016/j.ophtha.2016.11.014
  7. Y. Fan, S.M. Resnick, X. Wu and C. Davatzikos, "Structural and functional biomarkers of prodromal Alzheimer's disease: a high-dimensional pattern classification study," Neuroimage, vol. 41, no. 2, pp. 277-285, June 2008. https://doi.org/10.1016/j.neuroimage.2008.02.043
  8. K. Hu, Y. Wang, K. Chen, L. Hou and X. Zhang, "Multi-scale features extraction from baseline structure MRI for MCI patient classification and AD early diagnosis," Neurocomputing, vol. 175, pp. 132-145, October 2016. https://doi.org/10.1016/j.neucom.2015.10.043
  9. R. Filipovych, C. Davatzikos and Initiative ADN et al., "Semi-supervised pattern classification of medical images: application to mild cognitive impairment (MCI)," Neuroimage, vol. 55, no. 3, pp. 1109-1119, April 2011. https://doi.org/10.1016/j.neuroimage.2010.12.066
  10. A.K. Ambastha, "Neuroanatomical characterization of Alzheimer's disease using deep learning," Journal of National University of Singapore, October 2015.
  11. K.R. Gray, R. Wolz, R.A. Heckemann, P. Aljabar, A. Hammers, D. Rueckert and Initiative ADN et al., "Multi-region analysis of longitudinal FDG-PET for classification for the Alzheimer's disease," Neuroimage, vol. 60, no. 1, pp. 221-229, March 2012. https://doi.org/10.1016/j.neuroimage.2011.12.071
  12. C. Davatzikos, P. Bhatt, L.M. Shaw, K.N. Batmangehelich and J.Q. Trojanowski, "Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern classification," Neurobiology of Aging, vol. 32 no.12, pp. 2322.e19-2322.e27, December 2011. https://doi.org/10.1016/j.neurobiolaging.2010.05.023
  13. H. Suk, C.Y. Wee and D. Shen, "Discriminative group sparse representation for mild cognitive impairment classification," in Proc. of 4th Int. work-shop on machine learning in medical imaging, vol. 8134, pp.131-138, September 2013.
  14. R.J. Perrin, A.M. Fagan and D.M. Holtzman, "Multimodal techniques for diagnosis and prognosis of Alzheimer's disease," Nature, vol. 461, no. 7266, pp. 916-922, October 2009. https://doi.org/10.1038/nature08538
  15. H. Choi and K.H. Jin, "Predicting cognitive decline with deep learning of brain metabolism and amyloid imaging," Behavioural Brain Research, vol. 344, pp. 103-109, February 2018. https://doi.org/10.1016/j.bbr.2018.02.017
  16. M. Liu, J. Zhang, E. Adeli and D. Shen, "Landmark-based deep multi-instance learning for brain disease diagnosis," Medical Image Analysis, vol. 43, pp. 157-168, October 2017. https://doi.org/10.1016/j.media.2017.10.005
  17. J. Islam and Y. Zhang, "Brain MRI analysis for Alzheimer's disease diagnosis using an ensemble system of deep convolutional neural networks," Brain Informatics, vol. 5, no. 2, May 2018.
  18. A. Gupta, M. Ayhan and A. Maida, "Natural image bases to represent neuroimaging data," in Proc. of the 30th Int. Conf. on machine learning, vol. 28, pp. 987-994, June 2013.
  19. A. Payan and G. Montana, "Predicting Alzheimer's disease: a neuroimaging study with 3D convolutional networks," in Proc. of 4th Int. Conf. on Pattern Recognition Applications and Methods, vol. 2, February 2015.
  20. T. Brosch, R. Tam and Initiative ADN et al., "Manifold learning of brain MRIs by deep learning," in Proc. of the Int. Conf. on Medical Image Computing and Computer-assisted Intervention, vol. 16, pp. 633-640, 2013.
  21. H.I. Suk, S.W. Lee, D. Shen and Initiative ADN et al., "Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis," Neuroimage, vol. 101, pp. 569-582, November 2014. https://doi.org/10.1016/j.neuroimage.2014.06.077
  22. H.I. Suk and D. Shen, "Deep learning-based feature representation for AD/MCI classification," in Proc. of Int. Conf. on Medical Image Computing and Computer-assisted Intervention, vol. 16, pp. 583-590, September 2013.
  23. H.I. Suk, S.W. Lee, D. Shen and Initiative ADN et al., "Latent feature representation with stacked auto-encoder for AD/MCI diagnosis," Brain Structure and Function, vol. 220, no. 2, pp. 841-859, March 2013. https://doi.org/10.1007/s00429-013-0687-3
  24. D. Cardenas-Pena, D. Collazos-Huertas and G. Castellanos-Dominguez, "Centered kernel alignment enhancing neural network pretraining for MRI-based dementia diagnosis," Computational and Mathematical Methods in Medicine, vol. 2016, pp. 1-10, January 2016.
  25. F. Yu and V. Koltun, "Multi-Scale Context Aggregation by Dilated Convolutions," in Proc. of the Int. Conf. on Learning Representations, May 2016.
  26. S. Ioffe and C. Szegedy, "Batch normalization: accelerating deep network training by reducing internal covariate shift," in Proc of the Int. Conf. on Machine Learning, pp. 448-456, July 2015.
  27. N. Srivastava, G.E. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, June 2014.
  28. G. Huang, Z. Liu, L. van der Maaten and K.Q. Weinberger, "Densely connected convolutional networks," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2261-2269, July 2017.
  29. A. Krizhevsky, I. Sutskever and G.E. Hinton, "Imagenet classification with deep convolutional neural networks," Advances in neural information processing systems, vol. 25, no. 2, pp. 1097-1105, December 2012.
  30. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu and X. Zheng, "Tensorflow: Large-scale machine learning on heterogeneous systems," 2015. https://www.tensorflow.org
  31. F. Chollet et al., "Keras," 2015. https://github.com/keras-team/keras
  32. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. of the Int. Conf. on Learning Representations, May 2015.
  33. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, "Going Deeper with Convolutions," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1-9, June 2015.
  34. K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp.770-778, July 2016.
  35. D. Jha and G.R. Kwon, "Alzheimer's Disease Detection Using Sparse Autoencoder, Scale Conjugate Gradient and Softmax Output Layer with Fine Tuning," Int. Journal of Machine Learning and Computing, vol. 7, no. 1, pp. 13-17, February 2017. https://doi.org/10.18178/ijmlc.2017.7.1.612
