Recommendation System for Research Field of R&D Project Using Machine Learning

  • Kim, Yunjeong (Korea Institute of Science and Technology Information) ;
  • Shin, Donggu (Korea Institute of Science and Technology Information) ;
  • Jung, Hoekyung (Department of Computer Engineering, Paichai University)
  • Received : 2021.09.09
  • Accepted : 2021.10.22
  • Published : 2021.12.31

Abstract

National R&D information services need automatic classification technology in order to identify the latest research trends from national R&D project data and to produce and use meaningful information. We therefore studied how to automatically classify and recommend the research field of an R&D project. About 450,000 national R&D project records from 2013 to 2020 were collected and used for training and evaluation. After preprocessing and analyzing the valid records among the collected data, we selected a model based on experimental performance analysis. To find the best model combination, we compared the performance of Word2vec, GloVe, and fastText. In the experiments, accuracy at the subcategory level alone, which is a required item of project information, was 90.11%. We expect the model to be applicable to automatic classification studies for other classification systems whose hierarchical structure is similar to that of the research fields in the National Science and Technology Standard Classification.
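The abstract reports accuracy at the subcategory level of a hierarchical classification. As a minimal sketch of how accuracy at each level of such a hierarchy might be measured, the example below compares code prefixes. The 6-character code layout (2 characters for the major category, 4 for the middle category, 6 for the subcategory), the function name `level_accuracy`, and the sample labels are all hypothetical illustrations, not the paper's data or evaluation code.

```python
def level_accuracy(true_codes, predicted_codes, prefix_len):
    """Fraction of predictions whose first `prefix_len` characters match the true code."""
    if len(true_codes) != len(predicted_codes):
        raise ValueError("label lists must have the same length")
    if not true_codes:
        raise ValueError("need at least one example")
    hits = sum(
        t[:prefix_len] == p[:prefix_len]
        for t, p in zip(true_codes, predicted_codes)
    )
    return hits / len(true_codes)


# Assumed code layout (hypothetical): prefix length per hierarchy level.
LEVELS = {"major": 2, "middle": 4, "sub": 6}

# Made-up labels for illustration only.
true_codes = ["EE0101", "EE0203", "NA0105", "LC0402"]
predicted = ["EE0101", "EE0201", "NA0105", "LB0402"]

report = {name: level_accuracy(true_codes, predicted, n) for name, n in LEVELS.items()}
print(report)
```

Because a subcategory match requires the full code to agree, subcategory accuracy can never exceed accuracy at the coarser levels, which is why it is the strictest of the three figures.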

Acknowledgement

This research was supported by Construction of NTIS funded by the Ministry of Science and ICT.
