Methodology for Issue-related R&D Keywords Packaging Using Text Mining

A Text Mining-Based Methodology for Packaging Issue-Related R&D Keywords

  • Received : 2014.04.07
  • Accepted : 2014.06.03
  • Published : 2015.04.30

Abstract

Considerable research effort is being directed toward analyzing unstructured data such as text files and log files using commercial and noncommercial analytical tools. In particular, researchers are trying to extract meaningful knowledge through text mining not only in business but also in many other areas such as politics, economics, and cultural studies. For instance, several studies have examined national pending issues by analyzing large volumes of text on various social issues. However, it remains difficult to provide information services that identify R&D documents relevant to specific national pending issues. Users may specify keywords relating to a national pending issue, but they usually fail to retrieve appropriate R&D information, primarily because of discrepancies between these terms and the terms actually used in R&D documents. An intermediate logic is therefore needed to overcome these discrepancies and to identify and package appropriate R&D information on specific national pending issues. To address this requirement, this study proposes three methodologies: a hybrid methodology for extracting and integrating keywords pertaining to national pending issues, a methodology for packaging R&D information that corresponds to national pending issues, and a methodology for constructing an associative issue network based on relevant R&D information. The methodologies draw on data analysis techniques such as text mining, social network analysis, and association rule mining. In the experiments, the proposed integration methodology achieved a keyword enhancement rate of about 42.8%. For the second objective, three key analyses were conducted and a number of association rules between national pending issue keywords and R&D keywords were derived. The experiment for the third objective, issue clustering based on R&D keywords, is still in progress and is expected to yield tangible results in the future.
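The association rules between issue keywords and R&D keywords mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the study's data, thresholds, or algorithm, so everything below is an assumption for illustration: the toy records, the function name `issue_to_rd_rules`, and the threshold values are hypothetical, and the rules are restricted to single-keyword pairs scored with the standard support, confidence, and lift measures.

```python
from itertools import product

# Hypothetical toy data (not from the study): each record holds the issue
# keywords and the R&D keywords that co-occur in one document.
records = [
    ({"fine dust"}, {"air filter", "emission sensor"}),
    ({"fine dust"}, {"air filter"}),
    ({"aging society"}, {"care robot"}),
    ({"fine dust", "aging society"}, {"air filter", "care robot"}),
]


def issue_to_rd_rules(records, min_support=0.25, min_confidence=0.6):
    """Derive single-keyword rules 'issue keyword -> R&D keyword'.

    Returns tuples (issue, rd, support, confidence, lift), sorted by
    confidence, then support, then keyword names.
    """
    n = len(records)
    issue_count, rd_count, pair_count = {}, {}, {}
    for issues, rds in records:
        for i in issues:
            issue_count[i] = issue_count.get(i, 0) + 1
        for r in rds:
            rd_count[r] = rd_count.get(r, 0) + 1
        for pair in product(issues, rds):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (i, r), c in pair_count.items():
        support = c / n                       # P(issue and rd)
        confidence = c / issue_count[i]       # P(rd | issue)
        lift = confidence / (rd_count[r] / n) # confidence relative to P(rd)
        if support >= min_support and confidence >= min_confidence:
            rules.append((i, r, support, confidence, lift))
    return sorted(rules, key=lambda x: (-x[3], -x[2], x[0], x[1]))


for issue, rd, s, c, l in issue_to_rd_rules(records):
    print(f"{issue} -> {rd}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

A rule with high confidence and a lift above 1 suggests that the R&D keyword appears with the issue keyword more often than chance alone would predict, which is the kind of link a packaging service could use to map user-facing issue terms to R&D vocabulary.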

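As interest in big data technology surges, attempts to analyze the vast amounts of unstructured data distributed through social media are being actively made. Accordingly, attempts to find meaningful information by analyzing unstructured text data are under way not only in business but also in various areas such as politics, economics, and culture. Recently in particular, there have been active attempts to discover pending issues and use them in decision making. Although there have been steady attempts to discover national pending issues through big data analysis, no means has yet been established for efficiently providing the R&D documents related to those issues. This is because of heterogeneity between the issue keywords that users perceive and the R&D keywords actually in use. An intermediate device is therefore needed to overcome the heterogeneity between issue keywords and R&D keywords, and through it each issue keyword should be properly mapped to R&D keywords. To this end, this study proposes three methodologies: (1) a hybrid methodology for extracting issue keywords, (2) a methodology for packaging R&D information corresponding to issues, and (3) a methodology for constructing a network of associated issues from an R&D perspective. The proposed methodologies were carried out using data analysis techniques such as text mining, social network analysis, and association rule mining. As a result, the keyword enhancement rate under (1) was 42.8%, and for (2) a number of association rules between issue keywords and R&D keywords were found. Work on (3) is in progress and is expected to produce tangible results in the future.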
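The idea of integrating keywords from more than one extraction route and reporting a keyword enhancement rate can be sketched as follows. The abstract does not define how the rate is computed, so this sketch assumes one plausible reading: the number of keywords newly contributed by a second source, divided by the size of the baseline keyword set. The function names `integrate_keywords` and `enhancement_rate` and the sample keywords are hypothetical, and the example does not attempt to reproduce the reported 42.8%.

```python
def integrate_keywords(primary, secondary):
    """Merge two keyword lists, de-duplicating after simple normalization.

    Normalization here is only strip + lowercase; order of first
    appearance is preserved.
    """
    seen = set()
    merged = []
    for kw in list(primary) + list(secondary):
        key = kw.strip().lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(key)
    return merged


def enhancement_rate(baseline, merged):
    """Assumed definition: newly added keywords / baseline keywords."""
    base = {k.strip().lower() for k in baseline if k.strip()}
    if not base:
        raise ValueError("baseline keyword set is empty")
    added = [k for k in merged if k not in base]
    return len(added) / len(base)


# Hypothetical keyword lists from two extraction routes for one issue.
primary = ["Fine Dust", "air quality", "emission", "respiratory disease", "mask"]
secondary = ["fine dust", "air purifier", "Emission ", "yellow dust"]

merged = integrate_keywords(primary, secondary)
rate = enhancement_rate(primary, merged)
print(merged)
print(f"enhancement rate: {rate:.1%}")
```

Under this assumed definition, a rate of 40% would mean that the second source added two new keywords to a baseline of five; a real pipeline would likely need stronger normalization (stemming, synonym handling) than the lowercase comparison used here.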

Cited by

  1. A Topic-Based Methodology for Matching Paper Proofreaders Considering Document Importance, vol.19, pp.4, 2015, https://doi.org/10.7472/jksii.2018.19.4.27