Figure 3.1. Illustration of the consistency and complementarity principles (Liu et al., 2015a).
Figure 3.2. Data integration: projection method from each view onto common feature space.
Figure 3.3. Flow chart of co-training learning.
Figure 3.4. Two kinds of multi-view strategy: (a) One-view-one-network strategy, (b) Multi-view-one-network strategy (Kang et al., 2017).
Figure 4.1. The word-level matching CNN (Ma et al., 2015).
Figure 4.2. (a) The simple RNN model, (b) the m-RNN model (Mao et al., 2014).
Figure 5.1. A schematic of DCCA (Andrew et al., 2013).
Figure 5.2. A schematic of DGCCA with deep networks for J views (Benton et al., 2017).
Figure 5.3. A schematic of Multimodal DBM (Srivastava and Salakhutdinov, 2012).
Figure 5.4. Autoencoder model (Jaques et al., 2017).
Figure 5.5. Correspondence Autoencoder (Feng et al., 2014).
Figure 5.6. (a) Graphical model of JMVAE, (b) two approaches to estimating encoders with a single input (Suzuki et al., 2016).
Figure 5.7. Architecture of text-conditional convolutional GAN (Reed et al., 2016).
Table 6.1. R & Python Implementations of Multi-view Learning Models
References
- Adetiba, E. and Olugbara, O. O. (2015). Lung cancer prediction using neural network ensemble with histogram of oriented gradient genomic features, The Scientific World Journal.
- Andrew, G., Arora, R., Bilmes, J., and Livescu, K. (2013). Deep canonical correlation analysis, In International Conference on Machine Learning, 1247-1255.
- Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473.
- Bai, B., Weston, J., Grangier, D., Collobert, R., Sadamasa, K., Qi, Y., and Weinberger, K. (2010). Learning to rank with (a lot of) word features, Information Retrieval, 13, 291-314. https://doi.org/10.1007/s10791-009-9117-9
- Bengio, Y. (2009). Learning deep architectures for AI, Foundations and trends in Machine Learning, 2, 1-127. https://doi.org/10.1561/2200000006
- Benton, A., Khayrallah, H., Gujral, B., Reisinger, D., Zhang, S., and Arora, R. (2017). Deep generalized canonical correlation analysis, arXiv preprint arXiv:1702.02519.
- Blum, A. and Mitchell, T. (1998). Combining labeled and unlabeled data with co-training. In Proceedings of the eleventh annual conference on Computational learning theory, 11, 92-100.
- Bokhari, M. U. and Hasan, F. (2013). Multimodal information retrieval: Challenges and future trends, International Journal of Computer Applications, 74(14).
- Cai, J., Tang, Y., and Wang, J. (2016). Kernel canonical correlation analysis via gradient descent, Neurocomputing, 182, 322-331. https://doi.org/10.1016/j.neucom.2015.12.039
- Cho, K., Van Merrienboer, B., Bahdanau, D., and Bengio, Y. (2014a). On the properties of neural machine translation: Encoder-decoder approaches, arXiv preprint arXiv:1409.1259.
- Cho, K., Van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014b). Learning phrase representations using RNN encoder-decoder for statistical machine translation, arXiv preprint arXiv:1406.1078.
- Cristianini, N., Shawe-Taylor, J., and Lodhi, H. (2002). Latent semantic kernels, Journal of Intelligent Information Systems, 18, 127-152. https://doi.org/10.1023/A:1013625426931
- Dasgupta, S., Littman, M. L., and McAllester, D. A. (2002). PAC generalization bounds for co-training. In Advances in Neural Information Processing Systems, 375-382.
- Dey, A. (2016). Machine learning algorithms: a review, (IJCSIT) International Journal of Computer Science and Information Technologies, 7, 1174-1179.
- Ding, C. and Tao, D. (2015). Robust face recognition via multimodal deep face representation, IEEE Transactions on Multimedia, 17, 2049-2058. https://doi.org/10.1109/TMM.2015.2477042
- Doersch, C. (2016). Tutorial on variational autoencoders, arXiv preprint arXiv:1606.05908.
- Dou, Q., Chen, H., Yu, L., Qin, J., and Heng, P. A. (2017). Multilevel contextual 3-D CNNs for false positive reduction in pulmonary nodule detection, IEEE Transactions on Biomedical Engineering, 64, 1558-1567. https://doi.org/10.1109/TBME.2016.2613502
- Du, J., Ling, C. X., and Zhou, Z. H. (2011). When does co-training work in real data?, IEEE Transactions on Knowledge and Data Engineering, 23, 788-799. https://doi.org/10.1109/TKDE.2010.158
- Elman, J. L. (1990). Finding structure in time, Cognitive science, 14, 179-211. https://doi.org/10.1207/s15516709cog1402_1
- Farquhar, J., Hardoon, D., Meng, H., Shawe-Taylor, J. S., and Szedmak, S. (2006). Two view learning: SVM-2K, theory and practice. In Advances in Neural Information Processing Systems, 355-362.
- Feng, F., Wang, X., and Li, R. (2014). Cross-modal retrieval with correspondence autoencoder. In Proceedings of the 22nd ACM International Conference on Multimedia, 7-16, ACM.
- Freund, Y. and Haussler, D. (1992). Unsupervised learning of distributions on binary vectors using two layer networks. In Advances in neural information processing systems, 912-919.
- Frome, A., Corrado, G. S., Shlens, J., Bengio, S., Dean, J., and Mikolov, T. (2013). Devise: a deep visualsemantic embedding model. In Advances in Neural Information Processing Systems, 2121-2129.
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems, 2672-2680.
- Hardoon, D. R., Szedmak, S., and Shawe-Taylor, J. (2004). Canonical correlation analysis: an overview with application to learning methods, Neural Computation, 16, 2639-2664. https://doi.org/10.1162/0899766042321814
- Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks, Science, 313(5786), 504-507. https://doi.org/10.1126/science.1127647
- Hinton, G. E. and Salakhutdinov, R. R. (2009). Replicated softmax: an undirected topic model. In Advances in Neural Information Processing Systems, 1607-1614.
- Horst, P. (1961). Generalized canonical correlations and their applications to experimental data, Journal of Clinical Psychology, 17, 331-347. https://doi.org/10.1002/1097-4679(196110)17:4<331::AID-JCLP2270170402>3.0.CO;2-D
- Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167.
- Jain, A., Zamir, A. R., Savarese, S., and Saxena, A. (2016). Structural-RNN: Deep learning on spatiotemporal graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5308-5317.
- Jaques, N., Taylor, S., Sano, A., and Picard, R. (2017). Multimodal Autoencoder: A Deep Learning Approach to Filling in Missing Sensor Data and Enabling Better Mood Prediction. In Proceedings of International Conference on Affective Computing and Intelligent Interaction (ACII), San Antonio, Texas.
- Jeni, L. A., Girard, J. M., Cohn, J. F., and De La Torre, F. (2013). Continuous au intensity estimation using localized, sparse facial feature space. In 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 1-7.
- Kang, G., Liu, K., Hou, B., and Zhang, N. (2017). 3D multi-view convolutional neural networks for lung nodule classification, PLoS One, 12(11).
- Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes, arXiv preprint arXiv:1312.6114.
- Kiros, R., Popuri, K., Cobzas, D., and Jagersand, M. (2014). Stacked multiscale feature learning for domain independent medical image segmentation. In International Workshop on Machine Learning in Medical Imaging, 25-32. Springer, Cham.
- Kullback, S. and Leibler, R. A. (1951). On information and sufficiency, The Annals of Mathematical Statistics, 22, 79-86. https://doi.org/10.1214/aoms/1177729694
- Kumari, J., Rajesh, R., and Pooja, K. M. (2015). Facial expression recognition: a survey, Procedia Computer Science, 58, 486-491. https://doi.org/10.1016/j.procs.2015.08.011
- Lahat, D., Adali, T., and Jutten, C. (2015). Multimodal data fusion: an overview of methods, challenges, and prospects, Proceedings of the IEEE, 103, 1449-1477. https://doi.org/10.1109/JPROC.2015.2460697
- Li, Y., Yang, M., and Zhang, Z. (2016). Multi-view representation learning: a survey from shallow methods to deep methods, arXiv preprint arXiv:1610.01206.
- Liu, J., Jiang, Y., Li, Z., Zhou, Z. H., and Lu, H. (2015a). Partially shared latent factor learning with multiview data, IEEE Transactions on Neural Networks and Learning Systems, 26, 1233-1246. https://doi.org/10.1109/TNNLS.2014.2335234
- Liu, S., Liu, S., Cai, W., Che, H., Pujol, S., Kikinis, R., and Fulham, M. J. (2015b). Multimodal neuroimaging feature learning for multiclass diagnosis of Alzheimer's disease, IEEE Transactions on Biomedical Engineering, 62, 1132-1140. https://doi.org/10.1109/TBME.2014.2372011
- Lorincz, A., Jeni, L., Szabo, Z., Cohn, J., and Kanade, T. (2013). Emotional expression classification using time-series kernels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 889-895.
- Ma, L., Lu, Z., Shang, L., and Li, H. (2015). Multimodal convolutional neural networks for matching image and sentence. In Proceedings of the IEEE International Conference on Computer Vision, 2623-2631.
- Mao, J., Xu, W., Yang, Y., Wang, J., Huang, Z., and Yuille, A. (2014). Deep captioning with multimodal recurrent neural networks (m-rnn), arXiv preprint arXiv:1412.6632.
- Neverova, N., Wolf, C., Taylor, G., and Nebout, F. (2016). Moddrop: adaptive multi-modal gesture recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 1692-1706. https://doi.org/10.1109/TPAMI.2015.2461544
- Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., and Ng, A. Y. (2011). Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 689-696.
- Nigam, K. and Ghani, R. (2000). Analyzing the effectiveness and applicability of co-training. In Proceedings of the Ninth International Conference on Information and Knowledge Management, 86-93.
- Pohl, C., Ali, R. M., Chand, S. J. H., Tamin, S. S., Nazirun, N. N. N., and Supriyanto, E. (2014). Interdisciplinary approach to multimodal image fusion for vulnerable plaque detection. In 2014 IEEE Conference on Biomedical Engineering and Sciences (IECBES), 11-16.
- Qi, C. R., Su, H., Nießner, M., Dai, A., Yan, M., and Guibas, L. J. (2016). Volumetric and multi-view CNNs for object classification on 3D data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5648-5656.
- Radford, A., Metz, L., and Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks, arXiv preprint arXiv:1511.06434.
- Ramachandram, D. and Taylor, G. W. (2017). Deep multimodal learning: a survey on recent advances and trends, IEEE Signal Processing Magazine, 34, 96-108. https://doi.org/10.1109/MSP.2017.2738401
- Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., and Lee, H. (2016). Generative adversarial text to image synthesis, arXiv preprint arXiv:1605.05396.
- Schmidhuber, J. (2015). Deep learning in neural networks: An overview, Neural Networks, 61, 85-117. https://doi.org/10.1016/j.neunet.2014.09.003
- Shen, D., Wu, G., and Suk, H. I. (2017). Deep learning in medical image analysis, Annual Review of Biomedical Engineering, 19, 221-248. https://doi.org/10.1146/annurev-bioeng-071516-044442
- Sindhwani, V., Niyogi, P., and Belkin, M. (2005). A co-regularization approach to semi-supervised learning with multiple views. In Proceedings of ICML Workshop on Learning with Multiple Views, 74-79, Citeseer.
- Sitova, Z., Sedenka, J., Yang, Q., Peng, G., Zhou, G., Gasti, P., and Balagani, K. S. (2016). HMOG: New behavioral biometric features for continuous authentication of smartphone users, IEEE Transactions on Information Forensics and Security, 11(5), 877-892. https://doi.org/10.1109/TIFS.2015.2506542
- Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory (No. CU-CS-321-86), University of Colorado at Boulder, Department of Computer Science.
- Sousa, R. T. and Gama, J. (2017). Comparison between Co-training and Self-training for single-target regression in data streams using AMRules.
- Srivastava, N. and Salakhutdinov, R. R. (2012). Multimodal learning with deep Boltzmann machines, In Advances in neural information processing systems, 2222-2230.
- Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E. (2015). Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the IEEE International Conference on Computer Vision, 945-953.
- Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, 3104-3112.
- Suzuki, M., Nakayama, K., and Matsuo, Y. (2016). Joint multimodal learning with deep generative models, arXiv preprint arXiv:1611.01891.
- Suzuki, M., Nakayama, K., and Matsuo, Y. (2018). Improving Bi-directional Generation between Different Modalities with Variational Autoencoders, arXiv preprint arXiv:1801.08702.
- Tatulli, E. and Hueber, T. (2017). Feature extraction using multimodal convolutional neural networks for visual speech recognition. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2971-2975.
- Taylor, G. W., Fergus, R., LeCun, Y., and Bregler, C. (2010). Convolutional learning of spatio-temporal features. In European Conference on Computer Vision, 140-153, Springer, Berlin, Heidelberg.
- Uludag, K. and Roebroeck, A. (2014). General overview on the merits of multimodal neuroimaging data fusion, Neuroimage, 102, 3-10. https://doi.org/10.1016/j.neuroimage.2014.05.018
- Vargas, R., Mosavi, A., and Ruiz, L. (2017). Deep learning: A review, Advances in Intelligent Systems and Computing.
- Vrigkas, M., Nikou, C., and Kakadiaris, I. A. (2015). A review of human activity recognition methods, Frontiers in Robotics and AI, 2, 28.
- Wang, W., Ooi, B. C., Yang, X., Zhang, D., and Zhuang, Y. (2014). Effective multi-modal retrieval based on stacked auto-encoders, Proceedings of the VLDB Endowment, 7, 649-660.
- Xu, C., Tao, D., and Xu, C. (2013). A survey on multi-view learning, arXiv preprint arXiv:1304.5634.
- Yu, Y., Lin, H., Meng, J., Wei, X., Guo, H., and Zhao, Z. (2017). Deep transfer learning for modality classification of medical images, Information, 8, 91. https://doi.org/10.3390/info8030091
- Zhang, H., Xu, T., Li, H., Zhang, S., Huang, X., Wang, X., and Metaxas, D. (2017). StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks, arXiv preprint.
- Zhang, W., Zhang, Y., Ma, L., Guan, J., and Gong, S. (2015). Multimodal learning for facial expression recognition, Pattern Recognition, 48, 3191-3202. https://doi.org/10.1016/j.patcog.2015.04.012
- Zhao, J., Xie, X., Xu, X., and Sun, S. (2017). Multi-view learning overview: recent progress and new challenges, Information Fusion, 38, 43-54. https://doi.org/10.1016/j.inffus.2017.02.007
- Zhou, Z. H. and Li, M. (2005). Tri-training: exploiting unlabeled data using three classifiers, IEEE Transactions on Knowledge and Data Engineering, 17, 1529-1541.
- Zhu, X. (2006). Semi-supervised learning literature survey, Computer Science, University of Wisconsin-Madison, 2, 4.
- Zhu, X. and Goldberg, A. B. (2007). Introduction to semi-supervised learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, 3, 1-130.