Construction of Research Fronts Using Factor Graph Model in the Biomedical Literature

팩터그래프 모델을 이용한 연구전선 구축: 생의학 분야 문헌을 기반으로

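  • 김혜진 (Department of Library and Information Science, Yonsei University)
  • 송민 (Department of Library and Information Science, Yonsei University)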
  • Received : 2017.02.20
  • Accepted : 2017.03.06
  • Published : 2017.03.30

Abstract

This study infers research fronts (RFs) using a factor graph (FG) model based on heterogeneous features. The proposed model infers research fronts made up of documents that have the potential to be cited many times in the future. To this end, documents are represented by bibliographic, network, and content features. Bibliographic features comprise information such as the number of authors, the number of institutions to which the authors belong, proceedings, the number of author-provided keywords, funds, the number of references, the number of pages, and the journal impact factor. Network features include degree centrality, betweenness, and closeness within the document network. Content features include keywords extracted from the title and abstract using keyphrase extraction techniques. The model learns these features of a publication and infers whether the document belongs to an RF by running the sum-product algorithm and the junction tree algorithm on a factor graph. We experimentally demonstrate that the documents predicted by the FG are more densely connected than those in RFs constructed using a traditional bibliometric approach. Our results also indicate that FG-predicted documents exhibit stronger degree centrality and betweenness among RFs.
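To make the inference step concrete, here is a minimal sketch of sum-product message passing on a toy factor graph with two linked documents. It is not the authors' implementation: the variable layout, the potential values, and the function names are all hypothetical and chosen only for illustration. The sketch computes the marginal for one document and provides a brute-force enumeration for comparison.

```python
from itertools import product

# Hypothetical potentials (illustrative numbers, not taken from the paper).
# State 0 = "not a research-front document", state 1 = "research-front document".
unary_a = [0.7, 0.3]      # evidence from document A's own features
unary_b = [0.2, 0.8]      # evidence from document B's own features
pair_ab = [[0.9, 0.1],    # pair_ab[a][b]: linked documents tend to share a label
           [0.1, 0.9]]


def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def marginal_a_sum_product():
    """Marginal of A obtained by passing messages along the tree B -> pair -> A."""
    # Variable B forwards the message of its unary factor to the pairwise factor.
    msg_b_to_pair = unary_b
    # The pairwise factor multiplies in the incoming message and sums out B.
    msg_pair_to_a = [
        sum(pair_ab[a][b] * msg_b_to_pair[b] for b in (0, 1)) for a in (0, 1)
    ]
    # The belief at A is the product of all messages arriving at A.
    belief = [unary_a[a] * msg_pair_to_a[a] for a in (0, 1)]
    return normalize(belief)


def marginal_a_brute_force():
    """Same marginal, computed by enumerating every joint assignment."""
    totals = [0.0, 0.0]
    for a, b in product((0, 1), repeat=2):
        totals[a] += unary_a[a] * unary_b[b] * pair_ab[a][b]
    return normalize(totals)
```

On a tree-structured graph such as this one, the two functions agree. For graphs with cycles, exact inference typically relies on a junction tree, which is presumably why the abstract names both algorithms.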

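A research front is a research area in which citations among research papers occur frequently and which continues to develop. Early prediction of research fronts that are likely to grow into core fields where research activity concentrates is a useful social resource that can greatly benefit academia, industry, government agencies, and ultimately national science and technology development. This study presents a model that infers research fronts using heterogeneous features. To include documents that are likely to develop into core research areas, documents were represented by heterogeneous features, and the model learned these features in depth to predict the likelihood that newly published documents would be included in a research front. Documents were represented with a heterogeneous feature set consisting of bibliographic, network, and content features, and a probabilistic factor graph model was applied to infer documents that are likely to be highly cited. The extracted features were represented as variables of the factor graph, and research fronts were inferred by applying the sum-product algorithm and the junction tree algorithm. The research fronts inferred and constructed with the probabilistic factor graph model differed greatly from the baseline research fronts constructed with a bibliographic coupling strength of 4 or more. The factor graph-based research front group showed stronger direct connections between documents than the bibliographic coupling-based group, and also stronger betweenness, that is, a stronger tendency to connect two documents that are not otherwise connected.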
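The baseline mentioned above groups documents by bibliographic coupling, that is, by the number of references two documents share, with a threshold of 4. A minimal sketch of that thresholding step follows. The function names and the toy reference lists are hypothetical, and the authors' actual baseline construction may differ in its details.

```python
from itertools import combinations


def coupling_strength(refs_a, refs_b):
    """Bibliographic coupling strength: number of references two documents share."""
    return len(set(refs_a) & set(refs_b))


def coupled_pairs(references, min_strength=4):
    """Return document pairs whose coupling strength is at least min_strength.

    `references` maps a document id to the list of works it cites.
    """
    pairs = {}
    for doc_a, doc_b in combinations(sorted(references), 2):
        strength = coupling_strength(references[doc_a], references[doc_b])
        if strength >= min_strength:
            pairs[(doc_a, doc_b)] = strength
    return pairs


# Toy data (hypothetical): doc1 and doc2 share four references.
toy_references = {
    "doc1": ["r1", "r2", "r3", "r4", "r5"],
    "doc2": ["r2", "r3", "r4", "r5", "r6"],
    "doc3": ["r5", "r6", "r7"],
}
```

Calling `coupled_pairs(toy_references)` keeps only the `doc1` and `doc2` pair, since the other pairs share fewer than four references.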

Acknowledgement

Supported by: National Research Foundation of Korea (한국연구재단)
