References
- Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473.
- Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1-127. doi:10.1561/2200000006
- Bengio, Y., Courville, A., & Vincent, P. (2013). Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828. https://doi.org/10.1109/TPAMI.2013.50
- Bengio, Y., Ducharme, R., & Vincent, P. (2001). A Neural Probabilistic Language Model. In NIPS '00.
- Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A Neural Probabilistic Language Model. Journal of Machine Learning Research, 3, 1137-1155.
- Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum Learning. In ICML '09.
- Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy Layer-Wise Training of Deep Networks. In NIPS '06.
- Brown, P. F., deSouza, P. V., Mercer, R. L., Della Pietra, V. J., & Lai, J. C. (1992). Class-Based n-gram Models of Natural Language. Computational Linguistics, 18(4), 467-479.
- Chen, D., & Manning, C. D. (2014). A Fast and Accurate Dependency Parser using Neural Networks. In EMNLP '14.
- Chen, W., Zhang, Y., & Zhang, M. (2014). Feature Embedding for Dependency Parsing. In COLING '14.
- Cho, K. (2014). Foundations and Advances in Deep Learning. Doctoral dissertation, Aalto University.
- Cho, K., van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014). On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. In SSST-8.
- Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In EMNLP '14.
- Collobert, R., & Weston, J. (2008). A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. In ICML '08.
- Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research, 12, 2493-2537.
- Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6), 391-407. https://doi.org/10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
- Elman, J. L. (1990). Finding Structure in Time. Cognitive Science, 14, 179-211. https://doi.org/10.1207/s15516709cog1402_1
- Gers, F. A., Schraudolph, N. N., & Schmidhuber, J. (2002). Learning Precise Timing with LSTM Recurrent Networks. Journal of Machine Learning Research, 3, 115-143.
- Graves, A. (2013). Generating Sequences with Recurrent Neural Networks. arXiv:1308.0850.
- Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18(7), 1527-1554.
- Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the Dimensionality of Data with Neural Networks. Science, 313, 504-507. https://doi.org/10.1126/science.1127647
- Hinton, G. E., & Zemel, R. S. (1994). Autoencoders, Minimum Description Length, and Helmholtz Free Energy. In NIPS 6.
- Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
- Jean, S., Cho, K., Memisevic, R., & Bengio, Y. (2015). On Using Very Large Target Vocabulary for Neural Machine Translation. In ACL '15.
- Luong, M.-T., Sutskever, I., Le, Q. V., Vinyals, O., & Zaremba, W. (2015). Addressing the Rare Word Problem in Neural Machine Translation. In ACL '15.
- Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. In Workshop at ICLR '13.
- Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). Recurrent Neural Network Based Language Model. In INTERSPEECH '10.
- Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. In NIPS '13.
- Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations. In NAACL-HLT '13.
- Mitchell, J., & Lapata, M. (2008). Vector-based Models of Semantic Composition. In ACL '08.
- Mnih, A., & Hinton, G. (2008). A Scalable Hierarchical Distributed Language Model. In NIPS '08.
- Mnih, A., & Teh, Y. W. (2012). A Fast and Simple Algorithm for Training Neural Probabilistic Language Models. In ICML '12.
- Morin, F., & Bengio, Y. (2005). Hierarchical Probabilistic Neural Network Language Model. In AISTATS '05.
- Pei, W., Ge, T., & Chang, B. (2015). An Effective Neural Network Model for Graph-based Dependency Parsing. In ACL'15.
- Ranzato, M. A., Poultney, C., Chopra, S., & LeCun, Y. (2007). Efficient Learning of Sparse Representations with an Energy-Based Model. In NIPS '06.
- Roark, B., Saraclar, M., & Collins, M. (2007). Discriminative n-gram language modeling. Computer Speech and Language, 21(2), 373-392. https://doi.org/10.1016/j.csl.2006.06.006
- Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408. https://doi.org/10.1037/h0042519
- Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536. https://doi.org/10.1038/323533a0
- Socher, R., Bauer, J., Manning, C. D., & Ng, A. Y. (2013). Parsing with Compositional Vector Grammars. In ACL '13.
- Socher, R., Chen, D., Manning, C. D., & Ng, A. Y. (2013). Reasoning With Neural Tensor Networks for Knowledge Base Completion. In NIPS '13.
- Socher, R., Lin, C. C.-Y., Ng, A. Y., & Manning, C. D. (2011). Parsing Natural Scenes and Natural Language with Recursive Neural Networks. In ICML '11.
- Socher, R., Huang, E. H., Pennington, J., Ng, A. Y., & Manning, C. D. (2011). Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. In NIPS '11.
- Socher, R., Huval, B., Manning, C. D., & Ng, A. Y. (2012). Semantic Compositionality through Recursive Matrix-Vector Spaces. In EMNLP '12.
- Socher, R., Pennington, J., Huang, E. H., Ng, A. Y., & Manning, C. D. (2011). Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. In EMNLP '11.
- Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In EMNLP '13.
- Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. In NIPS '14.
- Turian, J., Ratinov, L., & Bengio, Y. (2010). Word Representations: A Simple and General Method for Semi-Supervised Learning. In ACL '10.
- Weiss, D., Alberti, C., Collins, M., & Petrov, S. (2015). Structured Training for Neural Network Transition-Based Parsing. In ACL '15.
- Werbos, P. J. (1990). Backpropagation through time: What it does and how to do it. Proceedings of the IEEE, 78(10), 1550-1560.
- Zheng, X., Chen, H., & Xu, T. (2013). Deep Learning for Chinese Word Segmentation and POS Tagging. In EMNLP '13.