Construction of Research Fronts Using Factor Graph Model in the Biomedical Literature

Constructing Research Fronts Using a Factor Graph Model: Based on Literature in the Biomedical Field

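  • 김혜진 (Department of Library and Information Science, Yonsei University) ;
  • 송민 (Department of Library and Information Science, Yonsei University)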
  • Received : 2017.02.20
  • Accepted : 2017.03.06
  • Published : 2017.03.30


This study attempts to infer research fronts (RFs) using a factor graph (FG) model based on heterogeneous features. The proposed model infers RFs composed of documents that are likely to be cited many times in the future. To this end, documents are represented by bibliographic, network, and content features. Bibliographic features comprise bibliographic information such as the number of authors, the number of institutions to which the authors belong, proceedings, the number of author-provided keywords, funding, the number of references, the number of pages, and the journal impact factor. Network features include degree centrality, betweenness, and closeness within the document network. Content features are keywords extracted from the title and abstract using keyphrase extraction techniques. The model learns these features of a publication and infers whether the document will belong to an RF using the sum-product algorithm and the junction tree algorithm on a factor graph. Our experiments show that, when predicting RFs, the FG model selects more densely connected documents than are found in RFs constructed using a traditional bibliometric approach. The results also indicate that FG-predicted documents exhibit higher degree centrality and betweenness among RFs.
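To make the inference step concrete, the following is a minimal sketch of sum-product message passing on a tiny chain-shaped factor graph. It is not the authors' model: the three documents `d1` to `d3`, the unary scores (standing in for what learned bibliographic, network, and content feature weights might produce), and the pairwise factor on citation links are all hypothetical values chosen for illustration. Each document has a binary label (0 = not RF, 1 = RF), and the sketch checks the message-passing marginals against brute-force enumeration. The junction tree algorithm, which handles graphs with cycles, is not covered here.

```python
from itertools import product

# Hypothetical unary factors: unnormalized scores for (not RF, RF) per document.
unary = {
    "d1": [0.3, 0.7],
    "d2": [0.6, 0.4],
    "d3": [0.5, 0.5],
}
# Hypothetical pairwise factor on citation links: linked documents tend to share a label.
pair = [[2.0, 1.0],
        [1.0, 2.0]]
order = ["d1", "d2", "d3"]
edges = [("d1", "d2"), ("d2", "d3")]  # a chain, so the factor graph is a tree


def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def sum_product_chain(unary, pair, order):
    """Exact marginals on a chain via forward and backward sum-product messages."""
    n = len(order)
    fwd = [[1.0, 1.0] for _ in range(n)]  # messages arriving from the left
    bwd = [[1.0, 1.0] for _ in range(n)]  # messages arriving from the right
    for i in range(1, n):
        prev = order[i - 1]
        fwd[i] = [
            sum(unary[prev][t] * fwd[i - 1][t] * pair[t][s] for t in (0, 1))
            for s in (0, 1)
        ]
    for i in range(n - 2, -1, -1):
        nxt = order[i + 1]
        bwd[i] = [
            sum(unary[nxt][t] * bwd[i + 1][t] * pair[s][t] for t in (0, 1))
            for s in (0, 1)
        ]
    # Marginal of each variable: its own factor times both incoming messages.
    return {
        d: normalize([unary[d][s] * fwd[i][s] * bwd[i][s] for s in (0, 1)])
        for i, d in enumerate(order)
    }


def brute_force(unary, pair, order, edges):
    """Reference marginals obtained by enumerating every joint assignment."""
    idx = {d: i for i, d in enumerate(order)}
    marg = {d: [0.0, 0.0] for d in order}
    for states in product((0, 1), repeat=len(order)):
        w = 1.0
        for d in order:
            w *= unary[d][states[idx[d]]]
        for a, b in edges:
            w *= pair[states[idx[a]]][states[idx[b]]]
        for d in order:
            marg[d][states[idx[d]]] += w
    return {d: normalize(v) for d, v in marg.items()}


marginals = sum_product_chain(unary, pair, order)
reference = brute_force(unary, pair, order, edges)
for d in order:
    print(d, "P(RF) =", round(marginals[d][1], 4))
```

On a tree-structured graph such as this chain, the two passes give exact marginals at a cost linear in the number of documents, whereas enumeration grows exponentially; this is the practical reason for using message passing.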


Supported by : National Research Foundation of Korea

