References
- C. D. Manning and H. Schütze, "Foundations of statistical natural language processing", MIT Press, 1999.
- D. Jurafsky and J. H. Martin, "Speech & language processing", Pearson Education India, 2000.
- B. Arias, N. Bel, B. Fisas, M. Lorente, M. Marimon, C. Morell, and J. Vivaldi, "The IULA Spanish LSP Treebank: building and browsing".
- M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini, "Building a large annotated corpus of English: The Penn Treebank", Computational Linguistics, 19(2), pp. 313-330, 1993.
- H. Tseng, D. Jurafsky, and C. Manning, "Morphological features help POS tagging of unknown words across language varieties", In Proceedings of the fourth SIGHAN workshop on Chinese language processing, pp. 32-39, October 2005.
- L. R. Rabiner and B. H. Juang, "An introduction to hidden Markov models", ASSP Magazine, IEEE, 3(1), pp. 4-16, 1986. https://doi.org/10.1109/MASSP.1986.1165351
- E. Charniak, C. Hendrickson, N. Jacobson, and M. Perkowitz, "Equations for part-of-speech tagging", In AAAI, pp. 784-789, July 1993.
- J. Kupiec, "Robust part-of-speech tagging using a hidden Markov model", Computer Speech & Language, 6(3), pp. 225-242, 1992. https://doi.org/10.1016/0885-2308(92)90019-Z
- A. McCallum, D. Freitag, and F. C. Pereira, "Maximum Entropy Markov Models for Information Extraction and Segmentation", In ICML, Vol. 17, pp. 591-598, June 2000.
- A. Ratnaparkhi, "A maximum entropy model for part-of-speech tagging", In Proceedings of the conference on empirical methods in natural language processing, Vol. 1, pp. 133-142, May 1996.
- J. D. Lafferty, A. McCallum, and F. C. N. Pereira, "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data", In Proceedings of the Eighteenth International Conference on Machine Learning, pp. 282-289, June 28-July 1, 2001.
- I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large Margin Methods for Structured and Interdependent Output Variables", Journal of Machine Learning Research, 6, pp. 1453-1484, December 2005.
- G. D. Forney Jr., "The Viterbi algorithm", Proceedings of the IEEE, 61(3), pp. 268-278, March 1973. https://doi.org/10.1109/PROC.1973.9030
- D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, "A learning algorithm for Boltzmann machines", Cognitive Science, 9(1), pp. 147-169, 1985. https://doi.org/10.1207/s15516709cog0901_7
- Y. Le Cun, "Learning process in an asymmetric threshold network", In Disordered systems and biological organization, Springer Berlin Heidelberg, pp. 233-240, 1986.
- D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation", In Parallel distributed processing: explorations in the microstructure of cognition, vol. 1, MIT Press, Cambridge, MA, USA, pp. 318-362, 1986.
- Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult", IEEE Transactions on Neural Networks, 5(2), pp. 157-166, 1994. https://doi.org/10.1109/72.279181
- G. E. Hinton, S. Osindero, and Y. W. Teh, "A fast learning algorithm for deep belief nets", Neural computation, 18(7), pp. 1527-1554, 2006. https://doi.org/10.1162/neco.2006.18.7.1527
- D. Erhan, Y. Bengio, A. Courville, P. A. Manzagol, P. Vincent, and S. Bengio, "Why does unsupervised pre-training help deep learning?", The Journal of Machine Learning Research, 11, pp. 625-660, 2010.
- C. Dyer, M. Ballesteros, W. Ling, A. Matthews, and N. A. Smith, "Transition-Based Dependency Parsing with Stack Long Short-Term Memory", arXiv preprint arXiv:1505.08075, 2015.
- D. Weiss, C. Alberti, M. Collins, and S. Petrov, "Structured training for neural network transition-based parsing", arXiv preprint arXiv:1506.06158, 2015.
- H. C. Carneiro, F. M. França, and P. M. Lima, "Multilingual part-of-speech tagging with weightless neural networks", Neural Networks, 66, pp. 11-21, 2015. https://doi.org/10.1016/j.neunet.2015.02.012
- R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, "Natural language processing (almost) from scratch", The Journal of Machine Learning Research, 12, pp. 2493-2537, 2011.
- Z. S. Harris, "Distributional structure", Word, 1954.
- Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin, "A neural probabilistic language model", The Journal of Machine Learning Research, 3, pp. 1137-1155, 2003.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition", Proceedings of the IEEE, 86(11), pp. 2278-2324, 1998. https://doi.org/10.1109/5.726791
- V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines", In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807-814, 2010.
- P. Sibi, S. A. Jones, and P. Siddarth, "Analysis of different activation functions using back propagation neural networks", Journal of Theoretical and Applied Information Technology, 47(3), pp. 1264-1268, 2013.
- B. Karlik and A. V. Olgac, "Performance analysis of various activation functions in generalized MLP architectures of neural networks", International Journal of Artificial Intelligence and Expert Systems, 1(4), pp. 111-122, 2011.
- M. T. Luong, I. Sutskever, Q. V. Le, O. Vinyals, and W. Zaremba, "Addressing the rare word problem in neural machine translation", In Proceedings of the Association for Computational Linguistics (ACL), 2015.
- S. Hochreiter and J. Schmidhuber, "Long short-term memory", Neural computation, 9(8), pp. 1735-1780, 1997. https://doi.org/10.1162/neco.1997.9.8.1735
- D. Koller and N. Friedman, "Probabilistic graphical models: principles and techniques", MIT Press, 2009.