A review of drug knowledge discovery using BioNLP and tensor or matrix decomposition

  • Gachloo, Mina (Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University)
  • Wang, Yuxing (Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University)
  • Xia, Jingbo (Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University)
  • Received: 2019.02.23
  • Accepted: 2019.05.30
  • Published: 2019.06.30

Abstract

Predicting the relations between drugs and other molecular or social entities is the central task in drug-related knowledge discovery. Computational approaches combine information from different sources and levels, giving a fuller picture of the relationships among drugs, targets, diseases, and targeted genes at the molecular level, and among drugs, usage, side effects, safety, and user preference at the social level. This review surveys and compares previous work from the BioNLP community and on tensor or matrix decomposition, and then introduces the BioNLP open shared task as a promising case study representing this area.
