Export Control System based on Case Based Reasoning: Design and Evaluation

Case-Based Intelligent Export Control System: Design and Evaluation

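  • 홍원의 (Department of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology (KAIST))
  • 김의현 (Department of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology (KAIST))
  • 조신희 (Department of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology (KAIST))
  • 김산성 (Technical Research Institute, Korean Broadcasting System (KBS))
  • 이문용 (Department of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology (KAIST))
  • 신동훈 (Korea Institute of Nuclear Nonproliferation and Control (KINAC))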
  • Received : 2014.06.29
  • Accepted : 2014.08.24
  • Published : 2014.09.30

Abstract

As worldwide demand for nuclear power plant equipment continues to grow, the careful handling of nuclear strategic materials is becoming more important. Although the number of applications submitted for the export of nuclear power commodities and technology is increasing sharply, pre-adjudication (or simply prescreening) of strategic materials has so far been performed by experts with long experience and extensive field knowledge. However, there is a severe shortage of experts in this domain, and training a new expert takes a long time. Because human experts must manually evaluate every document submitted for export permission, the current practice of nuclear material export control is neither time-efficient nor cost-effective. To alleviate the problem of relying solely on costly human experts, our research proposes a new system designed to help field experts make their decisions more effectively and efficiently. The proposed system is built upon case-based reasoning, which in essence extracts key features from existing cases, compares them with the features of a new case, and derives a solution for the new case by referencing similar cases and their solutions. Our research proposes a framework for a case-based reasoning system, designs a case-based reasoning system for the control of nuclear material exports, and evaluates the performance of alternative keyword extraction methods (fully automatic, fully manual, and semi-automatic). Keyword extraction is an essential component of the case-based reasoning system because it is used to extract the key features of the cases. The fully automatic method used TF-IDF, a de facto standard method for representative keyword extraction in text mining.
TF (Term Frequency) is based on how often a term occurs within a document and indicates how important the term is within that document, while IDF (Inverse Document Frequency) is based on how rarely the term occurs across the document set and indicates how uniquely the term represents the document. The results show that the semi-automatic approach, which is based on collaboration between machine and human, is the most effective solution regardless of whether the human is a field expert or a student majoring in nuclear engineering. Moreover, we propose a new approach to computing nuclear document similarity along with a new framework for document analysis. The proposed algorithm considers both document-to-document similarity (${\alpha}$) and document-to-nuclear-system similarity (${\beta}$) in order to derive the final score (${\gamma}$) used to decide whether the presented case involves strategic material. The final score (${\gamma}$) represents the document similarity between the past cases and the new case. The score is derived not only from conventional TF-IDF but also from a nuclear system similarity score, which takes the context of the nuclear system domain into account. Finally, the system retrieves the top three documents in the case base that are most similar to the new case and presents them together with a degree of credibility. With the final score and the credibility score, a user can more easily see which documents in the case base are worth consulting and can therefore make a proper decision at relatively low cost. The system was evaluated by developing a prototype and testing it with field data. The system workflows and outcomes were verified by field experts.
This research is expected to contribute to the growth of the knowledge service industry by proposing a new system that can effectively reduce the burden of relying on costly human experts for the export control of nuclear materials and that can serve as a meaningful example of a knowledge service application.
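The TF-IDF weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the exact TF and IDF variants are not given here, so the sketch assumes relative term frequency and an unsmoothed logarithmic IDF. The function names `tf_idf_vectors` and `cosine_similarity` and the toy tokens are hypothetical. Cosine similarity is included as a common way to compare the resulting vectors.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: TF = count / document length,
    IDF = log(N / document frequency), with no smoothing.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Toy, made-up token lists standing in for export application documents.
docs = [
    ["reactor", "coolant", "pump", "seal"],
    ["reactor", "control", "rod", "drive"],
    ["coolant", "pump", "motor", "seal"],
]
vectors = tf_idf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[2]))  # shares three terms
print(cosine_similarity(vectors[0], vectors[1]))  # shares one term
```

With this unsmoothed IDF, a term that occurs in every document receives weight zero, which matches the idea that such a term does not distinguish any one document from the rest.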

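With the recent worldwide increase in demand for nuclear power plant equipment, the handling of nuclear strategic materials is becoming more important, and applications to export nuclear-related commodities and technology are also rising sharply. Pre-adjudication of strategic materials has usually been carried out on the basis of the experience and knowledge of experts well versed in nuclear material control, but the supply of such specialists falls short of the rapidly growing demand. To overcome this problem, we designed and developed a case-based intelligent export control system for strategic materials. The system automates the main steps of the pre-adjudication process for new cases, previously the exclusive task of field experts, thereby reducing the workload borne by experts and the relevant agencies, and it serves as a decision support system that aids fast and accurate adjudication. The system is designed around Case Based Reasoning, a reasoning method that infers the solution to a new case from the characteristics of past cases. In this study, we designed the strategic material export control system by adapting text mining techniques, widely used for processing electronic documents written in natural language, to the nuclear domain. As the basis for the system design, we verified more rigorously the performance of the semi-automatic keyword extraction method proposed in earlier work, and we proposed an algorithm that uses the extracted keywords to retrieve past cases similar to a new case. The proposed method combines a result obtained with the TF-IDF method and cosine similarity scores from text mining (${\alpha}$) with a result derived by classifying conceptual knowledge commonly used in the nuclear field into systems (${\beta}$) to produce the final result (${\gamma}$). The performance of the component techniques was verified through experiments on field data and through feedback from practicing experts. The developed system should help reduce the cost of training a large number of pre-adjudication specialists and, as a meaningful application in the knowledge service industry, can be expected to contribute to the growth of related industries.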
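The abstract states that ${\alpha}$ and ${\beta}$ are combined into ${\gamma}$ and that the three most similar past cases are retrieved, but it does not give the combination rule. The Python sketch below is therefore only one plausible reading, under loudly stated assumptions: ${\gamma}$ is taken to be a weighted average of ${\alpha}$ and ${\beta}$, and ${\beta}$ is approximated by the Jaccard overlap of nuclear-system labels. The names `system_overlap`, `combine_scores`, `top_k_cases`, the `weight` parameter, and the labels `RCS` and `ECCS` are all hypothetical. The credibility score mentioned in the abstract is not modeled.

```python
def system_overlap(systems_a, systems_b):
    """Hypothetical stand-in for beta: Jaccard overlap of system labels."""
    a, b = set(systems_a), set(systems_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combine_scores(alpha, beta, weight=0.5):
    """Assumed rule: gamma = weight * alpha + (1 - weight) * beta."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * alpha + (1.0 - weight) * beta


def top_k_cases(alpha_by_case, beta_by_case, k=3, weight=0.5):
    """Rank past cases by gamma and return the k best as (case_id, gamma)."""
    gamma = {
        case_id: combine_scores(alpha, beta_by_case.get(case_id, 0.0), weight)
        for case_id, alpha in alpha_by_case.items()
    }
    return sorted(gamma.items(), key=lambda item: item[1], reverse=True)[:k]


# Made-up scores of four past cases against one new case.
alpha_by_case = {"A": 0.8, "B": 0.2, "C": 0.6, "D": 0.4}
beta_by_case = {
    "A": 0.6,
    "B": 0.9,
    "C": 0.2,
    "D": system_overlap({"RCS"}, {"ECCS"}),  # no shared system label
}
for case_id, gamma in top_k_cases(alpha_by_case, beta_by_case):
    print(case_id, round(gamma, 3))
```

In this toy data, case B ranks second even though its text similarity is the lowest, because its system-level score is high. That is the kind of effect a domain-aware ${\beta}$ term is meant to have, although the real weighting may differ.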
