A One-Size-Fits-All Indexing Method Does Not Exist: Automatic Selection Based on Meta-Learning

  • Jimeno-Yepes, Antonio (National Library of Medicine) ;
  • Mork, James G. (National Library of Medicine) ;
  • Demner-Fushman, Dina (National Library of Medicine) ;
  • Aronson, Alan R. (National Library of Medicine)
  • Received : 2012.02.18
  • Accepted : 2012.03.18
  • Published : 2012.06.30

Abstract

We present a methodology that automatically selects indexing algorithms for each heading in Medical Subject Headings (MeSH), the National Library of Medicine's vocabulary for indexing MEDLINE. While manually comparing indexing methods is manageable for a limited number of MeSH headings, the large number of headings makes automating this selection desirable. Results show that this process can be automated, based on previously indexed MEDLINE citations. We find that AdaBoostM1 is better suited to indexing a group of MeSH headings named Check Tags, and helps improve the micro F-measure from 0.5385 to 0.7157 and the macro F-measure from 0.4123 to 0.5387 (both p < 0.01).
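
To make the two reported measures concrete, the following is a minimal sketch of per-heading method selection followed by micro- and macro-averaged F-measure. It is an illustration under stated assumptions, not the authors' implementation: the method names ("baseline", "boosting"), the confusion counts, and all function names are hypothetical, and in practice the selection would be made on held-out, previously indexed citations rather than on the data used for the final evaluation.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one heading under one indexing method."""
    tp: int  # citations correctly assigned the heading
    fp: int  # citations wrongly assigned the heading
    fn: int  # citations that should have received the heading but did not


def f1(c: Counts) -> float:
    """F-measure (F1) from raw counts; 0.0 when nothing was predicted or expected."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def select_per_heading(results):
    """For each heading, keep the name of the method with the highest F1."""
    return {
        heading: max(by_method, key=lambda m: f1(by_method[m]))
        for heading, by_method in results.items()
    }


def micro_f1(counts):
    """Pool the counts over all headings, then compute a single F1."""
    counts = list(counts)
    return f1(Counts(sum(c.tp for c in counts),
                     sum(c.fp for c in counts),
                     sum(c.fn for c in counts)))


def macro_f1(counts):
    """Compute F1 per heading, then take the unweighted mean."""
    counts = list(counts)
    return sum(f1(c) for c in counts) / len(counts)


# Hypothetical counts for two Check Tags under two hypothetical methods.
results = {
    "Humans": {"baseline": Counts(80, 20, 20), "boosting": Counts(90, 10, 10)},
    "Male": {"baseline": Counts(30, 10, 30), "boosting": Counts(25, 25, 35)},
}

selected = select_per_heading(results)
chosen = [results[h][m] for h, m in selected.items()]
micro = micro_f1(chosen)
macro = macro_f1(chosen)
```

Micro-averaging is dominated by frequent headings because counts are pooled, whereas macro-averaging weights every heading equally, which is why the two figures in the abstract differ.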
