Verifying the Classification Accuracy for Korea's Standardized Classification System of Research F&E by using LDA(Linear Discriminant Analysis)

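Verification of the Classification Accuracy of the National Research Facilities and Equipment Standardized Classification System Using Linear Discriminant Analysis (LDA)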

  • Joung, Seokin (National Research Facilities & Equipment Center, KBSI) ;
  • Sawng, Yeongwha (Department of Management of Technology, Konkuk University) ;
  • Jeong, Euhduck (National Research Facilities & Equipment Center, KBSI)
  • 정석인 (한국기초과학지원연구원 국가연구시설장비진흥센터) ;
  • 송영화 (건국대학교 경영대학 기술경영학과) ;
  • 정의덕 (한국기초과학지원연구원 국가연구시설장비진흥센터)
  • Received : 2020.02.20
  • Accepted : 2020.03.04
  • Published : 2020.03.31

Abstract

Research facilities and equipment (F&E) have recently become very important tools for advancing science and technology. The government has continuously expanded its investment budgets for R&D and research F&E, and the need to operate and manage the research F&E built up nationwide efficiently and systematically has grown. In December 2010, the government completed a standardized classification system for national research F&E. However, the accuracy and reliability of the classification information are in doubt, because the information is collected by having users (researchers) select and register a classification code themselves in NTIS (National Science & Technology Information Service). In this study, we therefore applied linear discriminant analysis (LDA) and analysis of variance (ANOVA) to measure the classification accuracy of the standardized classification system for national research F&E (8 major classes, 54 sub-classes, 410 small classes), which was established in 2010 and revised in 2015. For the analysis, we collected and used the 50,271 records cumulatively registered in NTIS over the past 10 years. This is the first scientific verification of the standardized classification system, which was built from similar domestic and foreign classification systems and a small number of expert reviews. The discriminant accuracy of the major classes, which are organized hierarchically into sub-classes and small classes, was 92.2%, which is very high. However, post hoc verification by ANOVA showed that the discriminating power of two of the eight major classes was rather low. We expect that this study will help improve the standardized classification system for national research F&E.

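As research facilities and equipment came to be regarded as a very important tool and means of driving the development of science and technology, the government continuously expanded national budget investment in R&D and research facilities and equipment. In addition, as the need for efficient operation and systematic management of the national research facilities and equipment already in place grew, the government developed a standardized classification system for national research facilities and equipment in December 2010. In the research field, however, two limitations are still raised: the absence of a scientific verification procedure for the standardized classification system, owing to the lack of accumulated information in the early stage of collecting research facilities and equipment information in NTIS (National Science & Technology Information Service), and inconsistent classification criteria among classes at the same level. This study therefore applied linear discriminant analysis (LDA) and analysis of variance (ANOVA) in a two-stage analysis to measure the classification accuracy of the standardized classification system (8 major classes, 25 sub-classes, 410 small classes), which was established in 2010 and revised in 2015. For the analysis, we collected and used 50,271 records (Big-Data) cumulatively registered in NTIS over the past 10 years. This is the first study to verify empirically the current standardized classification system, which was built simply on the basis of similar domestic and foreign classification systems and expert opinion. The results show that the group-wise discriminant accuracy of the items classified into sub-classes and small classes under the major classes was 92.2%, a very high level. Post hoc verification by ANOVA, however, found that the discriminating power of two of the eight major classes was somewhat low, indicating that part of the current standardized classification system needs improvement. We hope that this study leads to continued improvement of the current standardized classification system for national research facilities and equipment.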
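
The two quantities reported in the abstract, a discriminant (classification) accuracy and an ANOVA-based check of how well classes separate, can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name their software, variables, or discriminant functions, so the function names, class labels, and scores below are hypothetical toy values chosen only to show the arithmetic. The sketch assumes that the predicted class for each record has already been produced by some discriminant model.

```python
from collections import defaultdict


def hit_ratio(actual, predicted):
    """Share of records whose predicted class equals the registered class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


def one_way_anova_f(scores, groups):
    """F statistic of a one-way ANOVA: between-group over within-group variance."""
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)

    n = len(scores)            # total number of records
    k = len(by_group)          # number of classes
    grand_mean = sum(scores) / n
    means = {g: sum(v) / len(v) for g, v in by_group.items()}

    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2
                     for g, v in by_group.items())
    ss_within = sum((x - means[g]) ** 2
                    for g, v in by_group.items() for x in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical toy data: registered class, class predicted by a discriminant
# model, and one discriminant score per record (all values are made up).
actual = ["A", "A", "A", "B", "B", "B"]
predicted = ["A", "A", "B", "B", "B", "B"]
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

accuracy = hit_ratio(actual, predicted)    # 5 of 6 records match
f_stat = one_way_anova_f(scores, actual)   # larger F means better-separated classes

print(f"hit ratio = {accuracy:.3f}, F = {f_stat:.2f}")
```

A low F statistic for a given class pair would correspond to the weak discriminating power that the abstract reports for two of the eight major classes, although the study's actual test setup may differ from this simplified one-variable case.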
