Retrieval of Scholarly Articles with Similar Core Contents

  • Liu, Rey-Long (Department of Medical Informatics, Tzu Chi University)
  • Received : 2017.04.13
  • Accepted : 2017.08.01
  • Published : 2017.09.30

Abstract

Researchers routinely retrieve scholarly articles about a specific research issue in order to cross-validate the evidence on that issue. Two articles that focus on the same research issue should share similar terms in their core contents, that is, their goals, backgrounds, and conclusions. In this paper, we present CCSE (Core Content Similarity Estimation), a technique that, given an article a, recommends articles that share similar core content terms with a. CCSE works on the titles and abstracts of articles, which are publicly available. It estimates and integrates three kinds of similarity: goal similarity, background similarity, and conclusion similarity. Empirical evaluation shows that CCSE performs significantly better than several state-of-the-art techniques in recommending biomedical articles that domain experts judged to focus on the same research issues in their core contents. CCSE works for articles that present research background followed by main results and discussion. It may therefore be used to support the identification of closely related evidence already published in such articles, even when only their titles and abstracts are available.
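
The idea of estimating three similarities and integrating them can be illustrated with a minimal Python sketch. It is not the CCSE algorithm itself, whose term weighting and integration scheme are not given in the abstract. The sketch assumes that the title approximates the goal, that the first half of the abstract approximates the background, and that the remainder approximates the conclusion. It also assumes cosine similarity over raw term counts and an equal-weight average. The names `split_core_parts`, `core_similarity`, and `recommend` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(terms_a, terms_b):
    """Cosine similarity between two bags of terms (raw counts)."""
    ca, cb = Counter(terms_a), Counter(terms_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_core_parts(title, abstract):
    """Return (goal, background, conclusion) term lists.

    Assumption for illustration only: title ~ goal, first half of the
    abstract's sentences ~ background, remaining sentences ~ conclusion.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]
    mid = len(sentences) // 2
    goal = tokenize(title)
    background = tokenize(" ".join(sentences[:mid]))
    conclusion = tokenize(" ".join(sentences[mid:]))
    return goal, background, conclusion


def core_similarity(a, b, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Integrate goal, background, and conclusion similarity.

    The equal weights are an assumption, not a value from the paper.
    """
    parts_a = split_core_parts(a["title"], a["abstract"])
    parts_b = split_core_parts(b["title"], b["abstract"])
    sims = [cosine(pa, pb) for pa, pb in zip(parts_a, parts_b)]
    return sum(w * s for w, s in zip(weights, sims))


def recommend(target, candidates, k=5):
    """Rank candidate articles by integrated similarity to the target."""
    ranked = sorted(candidates,
                    key=lambda c: core_similarity(target, c),
                    reverse=True)
    return ranked[:k]
```

Each article is represented here as a dictionary with `title` and `abstract` keys, which matches the kind of input the abstract says CCSE uses. Calling `recommend(a, candidates)` returns the candidates whose title and abstract parts overlap most with those of `a` under these simplifying assumptions.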
