A One-Size-Fits-All Indexing Method Does Not Exist: Automatic Selection Based on Meta-Learning

Jimeno-Yepes, Antonio; Mork, James G.; Demner-Fushman, Dina; Aronson, Alan R.

doi:10.5626/JCSE.2012.6.2.151

Journal of Computing Science and Engineering

Volume 6, Issue 2, pp. 151-160, 2012

pISSN 1976-4677 / eISSN 2093-8020

Korean Institute of Information Scientists and Engineers (한국정보과학회)

Jimeno-Yepes, Antonio (National Library of Medicine) ;
Mork, James G. (National Library of Medicine) ;
Demner-Fushman, Dina (National Library of Medicine) ;
Aronson, Alan R. (National Library of Medicine)

Received : 2012.02.18
Accepted : 2012.03.18
Published : 2012.06.30

Abstract

We present a methodology that automatically selects an indexing algorithm for each heading in Medical Subject Headings (MeSH), the National Library of Medicine's controlled vocabulary for indexing MEDLINE. Comparing indexing methods manually is manageable for a limited number of MeSH headings, but the large number of headings makes automating this selection desirable. Our results show that the selection can be automated using previously indexed MEDLINE citations. We find that AdaBoostM1 is better suited to indexing a group of MeSH headings called Check Tags, improving the micro F-measure from 0.5385 to 0.7157 and the macro F-measure from 0.4123 to 0.5387 (both p < 0.01).
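The abstract does not describe the meta-learning features the authors use, so the following is only a minimal sketch under a simplifying assumption: for each MeSH heading, keep whichever candidate algorithm scores the highest F-measure on previously indexed citations held out for validation. It also shows how the two reported figures differ. Micro F pools true/false positives and false negatives across headings, so frequent headings dominate; macro F averages the per-heading scores, so every heading counts equally. The function names, heading names, algorithm labels, and counts are hypothetical illustration data, not the paper's implementation or results.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical validation counts for one MeSH heading and one algorithm."""
    tp: int  # citations correctly assigned the heading
    fp: int  # citations wrongly assigned the heading
    fn: int  # citations that should have received the heading but did not


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure, 2PR / (P + R) = 2tp / (2tp + fp + fn); 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_per_heading(results: dict[str, dict[str, Counts]]) -> dict[str, str]:
    """For each heading, return the algorithm with the highest held-out F-measure."""
    return {
        heading: max(by_algo, key=lambda a: f_measure(by_algo[a].tp, by_algo[a].fp, by_algo[a].fn))
        for heading, by_algo in results.items()
    }


def micro_macro(counts: list[Counts]) -> tuple[float, float]:
    """Micro F pools counts over headings; macro F averages per-heading F-measures."""
    micro = f_measure(sum(c.tp for c in counts),
                      sum(c.fp for c in counts),
                      sum(c.fn for c in counts))
    macro = sum(f_measure(c.tp, c.fp, c.fn) for c in counts) / len(counts)
    return micro, macro


# Hypothetical held-out results for two Check Tags and two candidate algorithms.
results = {
    "Humans": {"baseline": Counts(80, 20, 20), "AdaBoostM1": Counts(90, 10, 10)},
    "Female": {"baseline": Counts(30, 10, 10), "AdaBoostM1": Counts(25, 5, 15)},
}

chosen = select_per_heading(results)
micro, macro = micro_macro([results[h][a] for h, a in chosen.items()])
print(chosen, round(micro, 3), round(macro, 3))
```

With this toy data the rule picks a different algorithm for each heading, which is the point of selecting per heading rather than applying one method everywhere.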

Cited By

GeneRIF indexing: sentence selection based on machine learning vol.14, pp.1, 2013, https://doi.org/10.1186/1471-2105-14-171
MeSH indexing based on automatically generated summaries vol.14, pp.1, 2013, https://doi.org/10.1186/1471-2105-14-208
USI: a fast and accurate approach for conceptual document annotation vol.16, pp.1, 2015, https://doi.org/10.1186/s12859-015-0513-4
MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence vol.31, pp.12, 2015, https://doi.org/10.1093/bioinformatics/btv237
Using cited references to improve the retrieval of related biomedical documents vol.14, pp.1, 2013, https://doi.org/10.1186/1471-2105-14-113
Leveraging output term co-occurrence frequencies and latent associations in predicting medical subject headings vol.94, 2014, https://doi.org/10.1016/j.datak.2014.09.002
Fusion architectures for automatic subject indexing under concept drift pp.1432-1300, 2020, https://doi.org/10.1007/s00799-018-0240-3
Feature engineering for MEDLINE citation categorization with MeSH vol.16, pp.1, 2015, https://doi.org/10.1186/s12859-015-0539-7