Fig. 1. Deep Learning-based Classification System
Fig. 2. The Preprocessing of Model Training
Fig. 3. Model Training
Fig. 4. LSTM Model
Fig. 5. GRU Model
Fig. 6. CNN Model
Fig. 7. Preprocessing of Genre Selection
Fig. 8. Genre Selection
Fig. 9. Word2Vec Visualization using T-SNE
Fig. 10. Deep Learning Model-based Training and Validation Accuracy in Classifying Articles
Fig. 11. Loss Change in Classifying Articles
Fig. 12. Model Training Accuracy and Validation Accuracy Change in Classifying Paragraphs
Fig. 13. Loss Change in Classifying Paragraphs
Table 1. Number of Articles and Paragraphs in COCA
Table 2. COCA Database Sample
Table 3. Hardware Configuration
Table 4. Software Configuration
Table 5. Article and Article Tagging List
Table 6. Paragraph and Paragraph Tagging List
Table 7. Sentence and Tagging List
Table 8. Maximum and Minimum Sequence Length in Article- and Paragraph-based Sentence List
Table 9. Preprocessing of Model Training
Table 10. Parameters of LSTM and GRU
Table 11. Parameters of CNN
Table 12. Training Accuracy, Validation Accuracy, and Genre Test Accuracy in the Articles Experiment
Table 13. Training Accuracy, Validation Accuracy and Genre Test Accuracy in the Paragraphs Experiment
References
- COCA, https://corpus.byu.edu/coca/
- Sejong Corpus, https://ithub.korean.go.kr/user/main.do
- J. Swales, "Genre Analysis: English in Academic and Research Settings," Cambridge University Press, 1990.
- D. Biber, "Variation across Speech and Writing," Cambridge University Press, 1988.
- D. M. Blei, "Probabilistic Topic Models," Communications of the ACM, Vol. 55, No. 4, pp. 77-84, Apr. 2012. https://doi.org/10.1145/2133806.2133826
- Z. S. Harris, "Distributional Structure," pp.775-794, Springer, 1997.
- N. Friedman, D. Geiger, and M. Goldszmidt, "Bayesian Network Classifiers," Machine Learning 29.2-3, pp.131-163, Nov. 1997. https://doi.org/10.1023/A:1007465528199
- H. Jo, J-H. Kim, S. Yoon, K-M. Kim, and B-T. Zhang, "Large-Scale Text Classification with a Convolutional Neural Network," 42nd The Korean Institute of Information Scientists and Engineers Annual Meetings, 2015.
- H. Jo, J-H. Kim, K-M. Kim, J-H. Chang, J-H. Eom, and B-T. Zhang, "Large-Scale Text Classification with Recurrent Neural Networks," 43rd The Korean Institute of Information Scientists and Engineers Annual Meetings, 2016.
- T. Young, D. Hazarika, S. Poria, and E. Cambria, "Recent Trends in Deep Learning Based Natural Language Processing," arXiv:1708.02709, Oct. 2018.
- T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," arXiv:1301.3781, Jan. 2013.
- Q. Le and T. Mikolov, "Distributed representations of sentences and documents," International Conference on Machine Learning, pp. 1188-1196, Jan. 2014.
- C. Goller and A. Kuchler, "Learning task-dependent distributed representations by backpropagation through structure," Neural Networks, IEEE International Conference, Vol. 1, 1996.
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural computation 9.8, pp. 1735-1780, Nov. 1997. https://doi.org/10.1162/neco.1997.9.8.1735
- K. Cho, et al., "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv preprint arXiv:1406.1078, 2014.
- R. Jozefowicz, W. Zaremba, and I. Sutskever, "An empirical exploration of recurrent network architectures," Proceedings of the 32nd International Conference on Machine Learning, 2015.
- Y. LeCun and Y. Bengio, "Convolutional networks for images, speech, and time series," In M. A. Arbib (Ed.), The handbook of brain theory and neural networks, Cambridge, MA: MIT Press, pp. 255-258, 1995.
- Y. Kim, "Convolutional Neural Networks for Sentence Classification," Conference on Empirical Methods in Natural Language Processing, 2014.
- Y. Liu and M. Zhang, "Neural Network Methods for Natural Language Processing," Computational Linguistics, Vol. 44, pp. 193-195, Mar. 2018. https://doi.org/10.1162/COLI_r_00312
- E-S. You, G-H. Choi, and S-H. Kim, "Study on Extraction of Keywords Using TF-IDF and Text Structure of Novels," Journal of The Korea Society of Computer and Information, Vol. 20(2), pp. 121-129, Feb. 2015. https://doi.org/10.9708/jksci.2015.20.2.121
- J. Park, H. Kim, H-G. Kim, T-K. Ahn, and H. Yi, "Structuring of Unstructured SNS Messages on Rail Services using Deep Learning Techniques," Journal of The Korea Society of Computer and Information, Vol. 23(7), pp. 19-26, Jul. 2018. https://doi.org/10.9708/JKSCI.2018.23.07.019