Author Graph Generation based on Author Disambiguation

저자 식별에 기반한 저자 그래프 생성

  • Kang, In-Su (School of Computer, Kyungsung University)
  • 강인수 (경성대학교 컴퓨터학부)
  • Received : 2010.09.03
  • Accepted : 2010.11.24
  • Published : 2011.01.30

Abstract

Ideally, the nodes of an author graph represent individual authors. In practice, most automatically generated author graphs use author names as nodes, because resolving author names to individuals is difficult. Using names as nodes merges namesakes, who should be separate nodes, into a single node, and this can distort the characteristics of the author graph. This study proposes an algorithm that resolves author name ambiguity based on co-authorship and then automatically generates an author graph whose nodes are authors rather than author names. Because of the nature of the co-authorship feature it relies on, the algorithm tends to produce clusterings that minimize over-clustering errors at the expense of under-clustering errors. In experiments, the algorithm is applied to a set of real citation records in which Korean namesakes occur, and the results are discussed.
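
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, whose details are not given here. It assumes one simple rule: two occurrences of the same author name belong to the same person if their records share at least one other co-author name, applied transitively. The function names `disambiguate` and `build_author_graph`, the `name#k` identifier format, and the sample records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def disambiguate(records):
    """Map each (record index, name) occurrence to an author id.

    records: list of author-name lists, one list per paper.
    Assumed rule (not taken from the paper): same-name occurrences are
    merged when their records share at least one other co-author name.
    """
    parent = {}  # union-find forest over name occurrences

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Index the records in which each name occurs.
    by_name = defaultdict(list)
    for i, names in enumerate(records):
        for name in set(names):
            find((i, name))
            by_name[name].append(i)

    # Merge same-name occurrences that share a co-author.
    # Co-author names are compared as plain strings, a simplification.
    for name, idxs in by_name.items():
        for i, j in combinations(idxs, 2):
            shared = (set(records[i]) & set(records[j])) - {name}
            if shared:
                union((i, name), (j, name))

    # Number the clusters of each name in order of first appearance.
    ids = {}
    clusters = defaultdict(dict)  # name -> {root: cluster number}
    for i, names in enumerate(records):
        for name in names:
            root = find((i, name))
            k = clusters[name].setdefault(root, len(clusters[name]))
            ids[(i, name)] = f"{name}#{k}"
    return ids


def build_author_graph(records):
    """Return (ids, edges), where edges link author ids, not names."""
    ids = disambiguate(records)
    edges = set()
    for i, names in enumerate(records):
        authors = sorted({ids[(i, n)] for n in names})
        edges.update(combinations(authors, 2))
    return ids, edges


# Hypothetical sample: "Kim, J." appears in three records.
records = [
    ["Kim, J.", "Lee, S."],
    ["Kim, J.", "Lee, S.", "Park, H."],
    ["Kim, J.", "Choi, M."],
]
ids, edges = build_author_graph(records)
```

In this sample, the first two occurrences of "Kim, J." share the co-author "Lee, S." and fall into one node, while the third has no shared co-author and becomes a separate node. If the third record was in fact written by the same person, the sketch splits that person into two nodes. This is one way to see why relying on co-authorship alone rarely merges distinct people but can leave one person divided.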
