References
- Bahdanau D, Cho K, and Bengio Y (2016). Neural machine translation by jointly learning to align and translate. In Proceedings of the Third International Conference on Learning Representations (ICLR 2015), San Diego, CA, Available from: https://arxiv.org/abs/1409.0473
- Ba JL, Kiros JR, and Hinton GE (2016). Layer normalization, Available from: pre-print: arXiv:1607.06450
- Cho K, van Merrienboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, and Bengio Y (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 1724-1734.
- Chorowski JK, Bahdanau D, Serdyuk D, Cho K, and Bengio Y (2015). Attention-based models for speech recognition, Advances in Neural Information Processing Systems, 28 (NIPS 2015), 577-585.
- He K, Zhang X, Ren S, and Sun J (2016). Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 770-778.
- Hwang IJ, Kim HJ, Kim YJ, and Lee YD (2024). Generalized neural collaborative filtering, The Korean Journal of Applied Statistics, 37, 311-322.
- Ioffe S and Szegedy C (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), Lille, 448-456.
- Kim HJ, Kim YJ, Jang K, and Lee YD (2024a). A statistical journey to DNN, the second trip: Architecture of RNN and image classification, The Korean Journal of Applied Statistics, 37, 553-563.
- Kim HJ, Hwang IJ, Kim YJ, and Lee YD (2024b). A statistical journey to DNN, the first trip: From regression to deep neural network, The Korean Journal of Applied Statistics, 37, 541-551.
- Li Y, Si S, Li G, Hsieh CJ, and Bengio S (2021). Learnable Fourier features for multi-dimensional spatial positional encoding, Advances in Neural Information Processing Systems, 34 (NeurIPS 2021), 15816-15829.
- Luong MT, Pham H, and Manning CD (2015). Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, 1412-1421.
- Mikolov T, Chen K, Corrado G, and Dean J (2013a). Efficient estimation of word representations in vector space, Available from: pre-print: arXiv:1301.3781
- Mikolov T, Sutskever I, Chen K, Corrado G, and Dean J (2013b). Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, 26 (NIPS 2013), 3111-3119.
- Park C, Na I, Jo Y et al. (2019). SANVis: Visual analytics for understanding self-attention networks. In Proceedings of the 2019 IEEE Visualization Conference (VIS), Vancouver, BC, 146.
- Parmar N, Vaswani A, Uszkoreit J, Kaiser L, Shazeer N, Ku A, and Tran D (2018). Image transformer. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholmsmässan, Stockholm, 4052-4061.
- Shaw P, Uszkoreit J, and Vaswani A (2018). Self-attention with relative position representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018), New Orleans, LA, 464-468.
- Siu C (2019). Residual networks behave like boosting algorithms. In Proceedings of 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Washington DC, 31-40.
- Su J, Ahmed M, Lu Y, Pan S, Bo W, and Liu Y (2024). RoFormer: Enhanced transformer with rotary position embedding, Neurocomputing, 568, 127063.
- Sutskever I, Vinyals O, and Le QV (2014). Sequence to sequence learning with neural networks, Advances in Neural Information Processing Systems, 27 (NIPS 2014), 3104-3112.
- Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, and Polosukhin I (2017). Attention is all you need, Advances in Neural Information Processing Systems, 30 (NIPS 2017), 5998-6008.
- Veit A, Wilber MJ, and Belongie S (2016). Residual networks behave like ensembles of relatively shallow networks, Advances in Neural Information Processing Systems, 29 (NIPS 2016), 550-558.
- Wang X, Tu Z, Wang L, and Shi S (2019). Self-attention with structural position representations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, 1403-1409.
- Zhou X, Ren Z, Zhou S, Jiang Z, Yu TZ, and Luo H (2024). Rethinking position embedding methods in the transformer architecture, Neural Processing Letters, 56, 41.