The Impact of Name Ambiguity on Properties of Coauthorship Networks

  • Kim, Jinseok
  • Kim, Heejun
  • Diesner, Jana
  • Received: 2014.06.09
  • Accepted: 2014.06.21
  • Published: 2014.06.30


Initial-based disambiguation of author names is a common data pre-processing step in bibliometrics. It is widely accepted that this procedure can introduce errors into network data and into any subsequent analytical results. What is not sufficiently understood is the precise impact of this step on the data and findings. We present an empirical answer to this question by comparing two commonly used initial-based disambiguation methods against a reasonable proxy for ground truth data. We use DBLP, a database covering major journals and conferences in computer science and information science, as a source. We find that initial-based disambiguation induces strong distortions in network metrics at both the graph and the node level: authors become embedded in ties for which there is no empirical support, which inflates their apparent sphere of influence and diversity of involvement. Consequently, networks generated with initial-based disambiguation are more coherent and interconnected than the actual underlying networks, and individual authors appear to be more productive and more strongly embedded than they actually are.
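The merging effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the author names and papers are hypothetical, the helper names (`first_initial_key`, `build_network`, `density`, `count_components`) are invented for illustration, and only a first-initial variant (surname plus first initial of the given name) is shown.

```python
from itertools import combinations

# Hypothetical toy records (not from DBLP): each paper lists full author names.
PAPERS = [
    ["Kim, Jina", "Lee, Sora"],
    ["Kim, Junho", "Park, Min"],
    ["Kim, Jina", "Choi, Dae"],
]

def first_initial_key(name):
    """Collapse 'Surname, Given' to 'Surname, G' (assumed first-initial method)."""
    surname, given = [part.strip() for part in name.split(",", 1)]
    return f"{surname}, {given[0]}"

def build_network(papers, key=lambda name: name):
    """Return {author_id: set of coauthor ids} after mapping names through key."""
    neighbors = {}
    for authors in papers:
        ids = sorted({key(a) for a in authors})
        for a in ids:
            neighbors.setdefault(a, set())
        for a, b in combinations(ids, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

def density(neighbors):
    """Share of possible undirected ties that are present."""
    n = len(neighbors)
    edges = sum(len(v) for v in neighbors.values()) / 2
    return 0.0 if n < 2 else edges / (n * (n - 1) / 2)

def count_components(neighbors):
    """Count connected components with an iterative depth-first search."""
    seen, components = set(), 0
    for start in neighbors:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(neighbors[node] - seen)
    return components

full = build_network(PAPERS)                           # names kept distinct
merged = build_network(PAPERS, key=first_initial_key)  # "Kim, Jina" and "Kim, Junho" both become "Kim, J"

for label, net in (("full names", full), ("first initial", merged)):
    print(label, len(net), round(density(net), 2), count_components(net))
```

In this toy case the two distinct Kims collapse into a single node "Kim, J", which gains a coauthor it never had. The network shrinks from five authors to four, its density rises, and two separate components fuse into one, which mirrors the direction of the distortion reported in the abstract.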


bibliometrics; name ambiguity; initial-based disambiguation; coauthorship networks; collaboration networks





Supported by: KISTI, Ford Foundation