Document Clustering Using Reference Titles

A Study on Document Clustering Using Titles of Cited References

  • Choi, Sang-Hee (Department of Library Science, Catholic University of Daegu)
  • Received : 2010.05.27
  • Accepted : 2010.06.17
  • Published : 2010.06.30

Abstract

Titles are widely regarded as effective clustering features, but they sometimes fail to represent the topic of a document and consequently produce poor document clusters. This study aims to improve title-based document clustering by proposing the titles listed in a document's citation bibliography as a clustering feature. Three feature sets were used to measure clustering performance: titles of the original documents, titles in the citation bibliography, and the combination of both. Each feature set was paired with three hierarchical clustering methods: within-group average linkage, complete linkage, and Ward's method. The best result in the experiment was obtained by clustering documents with features from both kinds of titles using the within-group average linkage method.

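The title of an original document is regarded as an effective feature for representing the document's topic in document clustering. This study recognizes, however, that titles are limited in representing that topic because they contain synonyms and near-synonyms, and it proposes extending the clustering features to the titles of cited references. Title terms of the original documents, title terms of the cited references, and a mixture of both kinds of title terms were applied as clustering features to measure how much cited-reference titles improve clustering performance. Each feature set was combined with three hierarchical clustering techniques (within-group average linkage, complete linkage, and Ward's method), and the resulting clusters were compared and analyzed. Clustering on the mixed title terms of original and cited documents with within-group average linkage produced the best results.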
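
The experimental design described in the abstract (three title-based feature sets, each paired with three hierarchical clustering methods) can be sketched as follows. This is a minimal illustration, not the author's implementation: the four toy documents, the raw term-frequency weighting, the cosine distance, and all function names are assumptions introduced here, since the abstract does not state them. "Within-group average linkage" is interpreted as merging the pair of clusters whose union has the smallest mean pairwise distance among all of its members.

```python
import math
from itertools import combinations

def term_vector(text):
    """Return an L2-normalised term-frequency vector as a dict (assumed weighting)."""
    counts = {}
    for term in text.lower().split():
        counts[term] = counts.get(term, 0) + 1
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}

def cosine_distance(u, v):
    """1 - cosine similarity; both inputs are already unit length."""
    return 1.0 - sum(w * v.get(t, 0.0) for t, w in u.items())

def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def merge_cost(a, b, vecs, method):
    """Cost of merging clusters a and b (lists of document indices)."""
    if method == "complete":
        # Largest distance between a member of a and a member of b.
        return max(cosine_distance(vecs[i], vecs[j]) for i in a for j in b)
    if method == "within_average":
        # Mean distance over all pairs inside the merged cluster.
        pairs = list(combinations(a + b, 2))
        return sum(cosine_distance(vecs[i], vecs[j]) for i, j in pairs) / len(pairs)
    if method == "ward":
        # Increase in within-cluster sum of squares caused by the merge.
        ca = centroid([vecs[i] for i in a])
        cb = centroid([vecs[i] for i in b])
        sq = sum((ca.get(t, 0.0) - cb.get(t, 0.0)) ** 2 for t in set(ca) | set(cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq
    raise ValueError(f"unknown method: {method}")

def cluster(texts, k, method):
    """Agglomerative clustering: merge the cheapest pair until k clusters remain."""
    vecs = [term_vector(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]
    while len(clusters) > k:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]], vecs, method),
        )
        clusters[x] = clusters[x] + clusters[y]  # x < y, so deleting y is safe
        del clusters[y]
    return sorted(sorted(c) for c in clusters)

def features(docs, kind):
    """Build one text per document from its own title, its reference titles, or both."""
    own = [d["title"] for d in docs]
    refs = [" ".join(d["refs"]) for d in docs]
    if kind == "title":
        return own
    if kind == "references":
        return refs
    return [o + " " + r for o, r in zip(own, refs)]

# Hypothetical toy documents, invented purely for illustration.
docs = [
    {"title": "document clustering with titles",
     "refs": ["hierarchical document clustering", "clustering evaluation measures"]},
    {"title": "grouping texts automatically",
     "refs": ["document clustering algorithms", "hierarchical clustering methods"]},
    {"title": "citation analysis of journals",
     "refs": ["journal citation networks", "co-citation analysis"]},
    {"title": "mapping scholarly impact",
     "refs": ["journal citation indicators", "citation analysis methods"]},
]

for kind in ("title", "references", "both"):
    for method in ("within_average", "complete", "ward"):
        print(kind, method, cluster(features(docs, kind), 2, method))
```

In this toy data the four titles share no terms, so title-only distances are all tied and the grouping is decided by tie-breaking order, while the reference titles supply the overlapping vocabulary. The sketch only illustrates the motivation stated in the abstract and says nothing about the performance figures reported in the study.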

Acknowledgement

Supported by: Catholic University of Daegu
