Automatic In-Text Keyword Tagging based on Information Retrieval

  • Kim, Jin-Suk (Department of Information Technology Research, Knowledge Information Center, Korea Institute of Science & Technology Information (KISTI)) ;
  • Jin, Du-Seok (Department of Information Technology Research, Knowledge Information Center, Korea Institute of Science & Technology Information (KISTI)) ;
  • Kim, Kwang-Young (Department of Information Technology Research, Knowledge Information Center, Korea Institute of Science & Technology Information (KISTI)) ;
  • Choe, Ho-Seop (Department of Information Technology Research, Knowledge Information Center, Korea Institute of Science & Technology Information (KISTI))
  • Published : 2009.09.30

Abstract

As Wikipedia shows, tagging or cross-linking the major keywords in a document collection improves both the readability of documents and responsive, adaptive navigation among related documents. In recent years, the Semantic Web has increased the importance of social tagging as a key feature of Web 2.0, and the tag cloud, its most visible manifestation, has become familiar to the public. In this paper we present an efficient method for automated in-text keyword tagging based on a large-scale controlled term collection or keyword dictionary. The computational complexity is O(mN) when a pattern matching algorithm is used, and it can be reduced to O(m log N) when an Information Retrieval technique is adopted, where m is the length of the target document and N is the total number of candidate terms to be tagged. The results show that automatic in-text tagging with keywords filtered by Information Retrieval is about 6 to 40 times faster than the fastest pattern matching algorithm.
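The contrast between the two complexities can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not describe their index, so the sorted first-word index, the word-level tokenization, and the function names `tokenize`, `build_index`, `tag_naive`, and `tag_indexed` are all assumptions made for illustration. The baseline checks every dictionary term against the document, while the indexed version performs one binary search per document token and verifies only the terms that start with that token.

```python
import bisect
import re


def tokenize(text):
    """Split text into lowercase word tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"\w+", text.lower())


def build_index(terms):
    """Build a sorted list of (first_word, term_words, term) entries.

    Sorting by first word allows a binary search per document token.
    This layout is a hypothetical stand-in for a real IR index.
    """
    entries = []
    for term in terms:
        words = tuple(tokenize(term))
        if words:
            entries.append((words[0], words, term))
    entries.sort()
    return entries


def tag_naive(text, terms):
    """Baseline: test every term at every document position, roughly O(mN)."""
    doc = tokenize(text)
    found = set()
    for term in terms:
        words = tokenize(term)
        n = len(words)
        if n and any(doc[i:i + n] == words for i in range(len(doc) - n + 1)):
            found.add(term)
    return sorted(found)


def tag_indexed(text, index):
    """Look up candidates by binary search, then verify only those candidates."""
    doc = tokenize(text)
    found = set()
    for i, word in enumerate(doc):
        # (word,) sorts before every entry whose first word equals `word`.
        pos = bisect.bisect_left(index, (word,))
        while pos < len(index) and index[pos][0] == word:
            _, words, term = index[pos]
            if tuple(doc[i:i + len(words)]) == words:
                found.add(term)
            pos += 1
    return sorted(found)


if __name__ == "__main__":
    terms = ["Semantic Web", "tag cloud", "information retrieval", "pattern matching"]
    text = "The Semantic Web made the Tag Cloud popular. Information Retrieval helps."
    print(tag_naive(text, terms))
    print(tag_indexed(text, build_index(terms)))
```

In this sketch each document token costs one O(log N) search plus verification of the few terms sharing its first word, which is where the O(m log N) behaviour would come from under the stated assumptions. A production system would more likely use a full inverted index, and the 6 to 40 times speedup reported in the abstract refers to the authors' own experiments, not to this code.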
