MediScore: MEDLINE-based Interactive Scoring of Gene and Disease Associations

  • Cho, Hye-Young (Division of Epidemiology and Bioinformatics, National Genome Research Institute, National Institute of Health) ;
  • Oh, Bermseok (Division of Epidemiology and Bioinformatics, National Genome Research Institute, National Institute of Health) ;
  • Lee, Jong-Keuk (Division of Epidemiology and Bioinformatics, National Genome Research Institute, National Institute of Health) ;
  • Kim, Kuchan (Division of Epidemiology and Bioinformatics, National Genome Research Institute, National Institute of Health) ;
  • Koh, InSong (Division of Epidemiology and Bioinformatics, National Genome Research Institute, National Institute of Health)
  • Published : 2004.09.01

Abstract

MediScore is an information retrieval system that helps users find the set of genes associated with a specific disease, or the set of diseases associated with a specific gene. Despite recent improvements in natural language processing (NLP) and other text mining approaches to finding disease-associated genes, many false positives still arise from the diversity of exceptional cases and from ambiguities in gene names. To overcome these weaknesses of current text mining approaches, MediScore introduces two measures: statistical normalization based on the normal approximation to the binomial distribution, which corrects inaccurate scores caused by common words that do not represent genes, and interactive rescoring by the user, which removes false positive results. Interactive rescoring includes individual alias scoring for each gene to remove false gene synonyms, reference to MEDLINE abstracts, and cross-referencing between OMIM and other related information.
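
The abstract does not give the scoring formula, so the following is only a minimal sketch of how a binomial-to-normal normalization and per-alias rescoring could work, not MediScore's actual implementation. It assumes that, under a null hypothesis of independence, the number of abstracts mentioning both an alias and a disease follows a binomial distribution with success probability equal to the disease's share of all abstracts, and that a gene's score is the best score among its retained aliases. The function names, the max-over-aliases rule, and all counts are hypothetical.

```python
import math


def cooccurrence_z_score(k, n_alias, n_disease, n_total):
    """Normal-approximation z-score for k alias/disease co-occurrences.

    Assumed null model: k ~ Binomial(n_alias, p) with p = n_disease / n_total.
    """
    if n_total <= 0 or not 0 <= n_disease <= n_total or not 0 <= k <= n_alias:
        raise ValueError("counts are inconsistent")
    p = n_disease / n_total
    mean = n_alias * p
    variance = n_alias * p * (1.0 - p)
    if variance == 0.0:
        return 0.0  # degenerate case: no spread under the null
    return (k - mean) / math.sqrt(variance)


def rescore_gene(alias_counts, excluded, n_disease, n_total):
    """Score a gene as the best z-score among aliases the user has kept.

    alias_counts maps alias -> (co-occurrence count, abstracts with alias).
    Taking the maximum over aliases is an assumption made for this sketch.
    """
    scores = {
        alias: cooccurrence_z_score(k, n_alias, n_disease, n_total)
        for alias, (k, n_alias) in alias_counts.items()
        if alias not in excluded
    }
    return max(scores.values()) if scores else None


# Made-up counts for one hypothetical gene and one hypothetical disease.
N_TOTAL = 1_000_000   # abstracts in the corpus
N_DISEASE = 10_000    # abstracts mentioning the disease
alias_counts = {
    "common_word_alias": (2100, 200_000),  # an everyday word, rarely the gene
    "ambiguous_alias": (90, 600),          # also names an unrelated entity
    "specific_alias": (60, 500),           # unambiguous gene symbol
}

z_common = cooccurrence_z_score(2100, 200_000, N_DISEASE, N_TOTAL)
z_specific = cooccurrence_z_score(60, 500, N_DISEASE, N_TOTAL)

# Score before and after the user rejects the ambiguous alias.
score_before = rescore_gene(alias_counts, set(), N_DISEASE, N_TOTAL)
score_after = rescore_gene(alias_counts, {"ambiguous_alias"}, N_DISEASE, N_TOTAL)
```

With these made-up counts, the common-word alias has by far the largest raw co-occurrence count yet the smallest z-score, which is the kind of correction the normalization is meant to provide. Excluding the alias the user judged to be a false synonym lowers the gene's score to that of the remaining specific alias.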
