Inferring Undiscovered Public Knowledge by Using Text Mining Analysis and Main Path Analysis: The Case of the Gene-Protein 'brings_about' Chains of Pancreatic Cancer

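Korean title (translated): Inferring Undiscovered Public Knowledge Using Text Mining and Main Path Analysis: The Case of the Pancreatic Cancer Gene-Protein 'brings_about' Chains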

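  • Hyerim Ahn (Department of Library and Information Science, Graduate School, Yonsei University) ;
  • Min Song (Department of Library and Information Science, Yonsei University) ;
  • Go Eun Heo (Department of Library and Information Science, Graduate School, Yonsei University)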
  • Received : 2015.02.16
  • Accepted : 2015.03.19
  • Published : 2015.03.30


This study infers the gene-protein 'brings_about' chains of pancreatic cancer reported in pancreatic cancer research by constructing a gene-protein interaction network for the disease. These chains can help uncover publicly undiscovered knowledge that could be developed into empirical studies investigating the causes of pancreatic cancer. We apply a novel approach that incorporates text mining and main path analysis into Swanson's ABC model, expanding the intermediate concepts to multiple levels and extracting the most significant path. We performed text mining on the full texts of pancreatic cancer research papers published over the last ten years and extracted gene-protein entities and relations. The 'brings_about' network was built from biological relations expressed by biological verbs, and main path analysis was then applied to this network. The resulting main direct 'brings_about' path of pancreatic cancer comprises 14 nodes and 13 arcs. Nine arcs were confirmed as actual relations appearing in the related research, while the other four arose during the network transformation process for main path analysis. We believe that combining text mining with main path analysis can be a useful tool for inferring undiscovered knowledge when either the starting point or the end point is unknown.
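
To make the main path step concrete, the following is a minimal sketch only, not the procedure used in the study. It assumes a small acyclic directed network whose arcs stand for extracted 'brings_about' relations, weights each arc by its search path count (SPC, the number of source-to-sink paths that traverse the arc), and then follows the heaviest outgoing arc from a source until a sink is reached. The node labels, the choice of SPC as the traversal weight, the greedy forward search, and the tie-breaking by node name are all assumptions made for illustration.

```python
from collections import defaultdict


def topological_order(nodes, succ):
    """Return the nodes in topological order; raise if the network has a cycle."""
    indegree = {n: 0 for n in nodes}
    for u in nodes:
        for v in succ[u]:
            indegree[v] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):
        raise ValueError("network contains a cycle; this sketch needs a DAG")
    return order


def spc_weights(edges):
    """Search path count: number of source-to-sink paths through each arc."""
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = sorted({n for edge in edges for n in edge})
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    order = topological_order(nodes, succ)
    from_source = {}  # number of paths from any source to the node
    for n in order:
        from_source[n] = sum(from_source[p] for p in pred[n]) if pred[n] else 1
    to_sink = {}  # number of paths from the node to any sink
    for n in reversed(order):
        to_sink[n] = sum(to_sink[s] for s in succ[n]) if succ[n] else 1
    return {(u, v): from_source[u] * to_sink[v] for u, v in edges}


def main_path(edges):
    """Greedy forward search: follow the heaviest outgoing arc from a source."""
    weights = spc_weights(edges)
    out = defaultdict(list)
    for (u, v), w in weights.items():
        out[u].append((w, v))
    targets = {v for _, v in edges}
    sources = sorted({u for u, _ in edges} - targets)
    # Start at the source with the heaviest outgoing arc (first one wins ties).
    current = max(sources, key=lambda s: max(w for w, _ in out[s]))
    path = [current]
    while out[current]:
        # Ties on weight are broken by node name, an arbitrary choice here.
        _, current = max(out[current])
        path.append(current)
    return path


# Hypothetical toy network; the labels are placeholders, not real genes or proteins.
toy_edges = [("A", "C"), ("B", "C"), ("C", "D"), ("C", "E"),
             ("D", "F"), ("E", "F"), ("B", "E")]
toy_weights = spc_weights(toy_edges)
toy_path = main_path(toy_edges)
print(toy_weights)
print(toy_path)
```

In this toy network the arc from E to F lies on three of the five source-to-sink paths, so it receives the largest weight. Because several arcs tie on weight, the extracted path depends on the tie-breaking rule, so a real analysis would need to state that rule or report all tied paths.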


Literature Based Discovery; Undiscovered Public Knowledge; Text Mining; Main Path Analysis; Bio Network


Grant : Big Data-Based Future Knowledge and Information Service Project Team


  1. Seoul National University Hospital Medical Information. Pancreatic Cancer [online]. [cited 2015.2.5].
  2. Heo, Go Eun and Min Song. 2014. "Inferring Undiscovered Public Knowledge by Using Text Mining-driven Graph Model." Journal of the Korean Society for Information Management, 31(1): 231-250.
  3. Blagosklonny, M. V. and A. B. Pardee. 2002. "Unearthing the gems." Nature, 416(6879): 373.
  4. Cameron, D., O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, and T. C. Rindflesch. 2013. "A graph-based recovery and decomposition of Swanson's hypothesis using semantic predications." Journal of Biomedical Informatics, 46(2): 238-251.
  5. De Nooy, W., A. Mrvar, and V. Batagelj. 2005. Exploratory Social Network Analysis with Pajek. Revised and Expanded Second Edition. New York, USA: Cambridge University Press.
  6. DiGiacomo, R. A., J. M. Kremer, and D. M. Shah. 1989. "Fish oil dietary supplementation in patients with Raynaud's phenomenon: A double-blind, controlled, prospective study." American Journal of Medicine, 86: 158-164.
  7. Gustafsson, M., M. Hornquist, and A. Lombardi. 2005. "Constructing and analyzing a large-scale gene-to-gene regulatory network: Lasso-constrained inference and biological validation." IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2(3): 254-261.
  8. Liu, J. S., and L. Y. Y. Lu. 2011. "An Integrated Approach for Main Path Analysis: Development of the Hirsch Index as an Example." Journal of the American Society for Information Science and Technology, 63(3): 528-542.
  9. Mattiazzi, M., T. Curk, I. Krizaj, B. Zupan, and U. Petrovic. 2010. "Inference of the Molecular Mechanism of Action from Genetic Interaction and Gene Expression Data." OMICS: A Journal of Integrative Biology, 14(4): 357-367.
  10. Natarajan, J., D. Berrar, W. Dubitzky, C. Hack, Y. Zhang, C. Desesa, J. R. Van Brocklyn, and E. G. Bremer. 2006. "Text mining of full-text journal articles combined with gene expression analysis reveals a relationship between sphingosine-1-phosphate and invasiveness of a glioblastoma cell line." BMC Bioinformatics, 7: 373.
  11. NIH. Current Relations in the Semantic Network [online]. [cited 2015.2.15].
  12. Popa, O., E. Hazkani-Covo, G. Landan, W. Martin, and T. Dagan. 2011. "Directed networks reveal genomic barriers and DNA repair bypasses to lateral gene transfer among prokaryotes." Genome Research, 21(4): 599-609.
  13. SecondString Project. Class SoftTFIDF [online]. [cited 2015.2.15].
  14. Selga, E., C. Oleaga, S. Ramirez, M. C. de Almagro, V. Noe, and C. J. Ciudad. 2009. "Networking of differentially expressed genes in human cancer cells resistant to methotrexate." Genome Medicine, 1: 83.
  15. Swanson, D. R. 1986a. "Undiscovered public knowledge." The Library Quarterly, 56(2): 103-118.
  16. Swanson, D. R. 1986b. "Fish oil, Raynaud's syndrome, and undiscovered public knowledge." Perspectives in Biology and Medicine, 30(1): 7-18.
  17. Swanson, D. R. 1988. "Migraine and magnesium: eleven neglected connections." Perspectives in Biology and Medicine, 31(4): 526-557.
  18. Vinayagam, A., U. Stelzl, R. Foulle, S. Plassmann, M. Zenkner, J. Timm, H. E. Assmus, M. A. Andrade-Navarro, and E. E. Wanker. 2011. "A directed protein interaction network for investigating intracellular signal transduction." Science Signaling, 4(189): rs8.
  19. He, X.-Y. and Y.-Z. Yuan. 2014. "Advances in pancreatic cancer research: Moving towards early detection." World Journal of Gastroenterology, 20(32): 11241-11248.
  20. Yan, E., Y. Ding, and C. R. Sugimoto. 2011. "P-Rank: An Indicator Measuring Prestige in Heterogeneous Scholarly Networks." Journal of the American Society for Information Science and Technology, 62(3): 467-477.