A review of drug knowledge discovery using BioNLP and tensor or matrix decomposition

Gachloo, Mina;Wang, Yuxing;Xia, Jingbo;

doi:10.5808/GI.2019.17.2.e18

Genomics & Informatics

Volume 17, Issue 2
/
Pages.18.1-18.10
/
2019
/
1598-866X(pISSN)
/
2234-0742(eISSN)

Korea Genome Organization

Gachloo, Mina (Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University) ;
Wang, Yuxing (Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University) ;
Xia, Jingbo (Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University)

Received: 2019.02.23
Reviewed: 2019.05.30
Published: 2019.06.30

https://doi.org/10.5808/GI.2019.17.2.e18

Abstract

Predicting relations between drugs and other molecular or social entities is the main pattern of drug-related knowledge discovery. Computational approaches combine information from different sources and levels, which provides a detailed understanding of the relationships among drugs, targets, diseases, and targeted genes at the molecular level, and among drugs, usage, side effects, safety, and user preference at the social level. This review surveys, compares, and summarizes previous work from the BioNLP community and on tensor or matrix decomposition, and then introduces the BioNLP open shared task as a promising case study representing this area.
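To make the idea of relation prediction by matrix decomposition concrete, the following is a minimal sketch and not a method taken from the reviewed work. It assumes a small, made-up binary drug-target matrix and uses a truncated SVD (via NumPy) as a stand-in for the decomposition techniques the review covers; the function name `low_rank_scores` and the toy data are illustrative assumptions.

```python
import numpy as np


def low_rank_scores(interactions: np.ndarray, rank: int) -> np.ndarray:
    """Return the rank-`rank` truncated-SVD approximation of a matrix."""
    u, s, vt = np.linalg.svd(interactions, full_matrices=False)
    # Keep only the `rank` largest singular values and their vectors.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


# Toy, made-up data: rows are drugs, columns are targets, 1 = known interaction.
known = np.array(
    [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],  # drug 2 shares target 0 with drugs 0 and 1
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ],
    dtype=float,
)

scores = low_rank_scores(known, rank=2)

# Rank the unobserved (zero) entries by their reconstructed score.
candidates = [
    (int(d), int(t), float(scores[d, t])) for d, t in zip(*np.where(known == 0))
]
candidates.sort(key=lambda item: -item[2])

for drug, target, score in candidates[:3]:
    print(f"drug {drug} - target {target}: score {score:.3f}")
```

In this toy matrix the highest-scoring unobserved pair is drug 2 with target 1, because the low-rank reconstruction pulls drug 2 toward the interaction profile of drugs 0 and 1. Tensor decomposition extends the same idea to more than two entity types, for example drug, target, and disease.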

