A Dependency Graph-Based Keyphrase Extraction Method Using Anti-patterns

  • Batsuren, Khuyagbaatar (Doctoral School of Information and Communication Technology, University of Trento) ;
  • Batbaatar, Erdenebileg (Database and Bioinformatics Laboratory, School of Electrical and Computer Engineering, Chungbuk National University) ;
  • Munkhdalai, Tsendsuren (Dept. of Quantitative Health Sciences, University of Massachusetts Medical School) ;
  • Li, Meijing (College of Information Engineering, Shanghai Maritime University) ;
  • Namsrai, Oyun-Erdene (School of Engineering and Applied Science, National University of Mongolia) ;
  • Ryu, Keun Ho (Database and Bioinformatics Laboratory, School of Electrical and Computer Engineering, Chungbuk National University)
  • Received : 2015.04.07
  • Accepted : 2017.02.09
  • Published : 2018.10.31

Abstract

Keyphrase extraction is a fundamental natural language processing (NLP) tool that improves many text-mining applications, such as document summarization and clustering. In this paper, we propose two novel techniques that build on state-of-the-art keyphrase extraction methods. The first is anti-patterns, which aim to recognize non-keyphrase candidates. State-of-the-art methods often use a rich feature set to identify keyphrases, but such feature sets cover only some keyphrases, because keyphrases share very few patterns and stylistic features, whereas non-keyphrase candidates often share many. The second is to use a dependency graph instead of a word co-occurrence graph: a co-occurrence graph cannot connect two words that are syntactically related but placed far from each other in a sentence, whereas a dependency graph can. In our experiments, we compared performance under different graph settings (co-occurrence and dependency) and against the results of existing methods. We found that combining the dependency graph with anti-patterns outperforms the state-of-the-art methods.
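The contrast between the two graph types can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sentence, the window size, the function names, and the hand-written dependency arcs (standing in for a parser's output) are all assumptions made for illustration.

```python
def cooccurrence_edges(tokens, max_distance=2):
    """Undirected edges between tokens at most `max_distance` positions apart."""
    edges = set()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_distance, len(tokens) - 1) + 1):
            edges.add(frozenset((tokens[i], tokens[j])))
    return edges


def dependency_edges(arcs):
    """Undirected edges built from (head, dependent) arcs of a dependency parse."""
    return {frozenset((head, dependent)) for head, dependent in arcs}


# Toy sentence (assumed): "method that we propose extracts keyphrases"
tokens = ["method", "that", "we", "propose", "extracts", "keyphrases"]

# Hand-written (head, dependent) arcs; an assumed parse, not real parser output.
arcs = [
    ("extracts", "method"),      # subject of the main verb
    ("extracts", "keyphrases"),  # object of the main verb
    ("method", "propose"),       # relative clause modifying "method"
    ("propose", "we"),
    ("propose", "that"),
]

co = cooccurrence_edges(tokens, max_distance=2)
dep = dependency_edges(arcs)

# "method" and "extracts" are four positions apart, so the window misses them,
# while the dependency graph links them directly.
pair = frozenset(("method", "extracts"))
print(pair in co, pair in dep)
```

In this toy case the subject and its verb are separated by a relative clause, so only the dependency graph connects them. Widening the window would eventually link the pair as well, at the cost of also linking many unrelated words.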
