The Impact of Name Ambiguity on Properties of Coauthorship Networks

  • Kim, Jinseok
  • Kim, Heejun
  • Diesner, Jana
  • Received: 2014.06.09
  • Accepted: 2014.06.21
  • Published: 2014.06.30

Abstract

Initial-based disambiguation of author names is a common data pre-processing step in bibliometrics. It is widely accepted that this procedure can introduce errors into network data and into any subsequent analytical results. What is not sufficiently understood is the precise impact of this step on the data and findings. We present an empirical answer to this question by comparing the impact of two commonly used initial-based disambiguation methods against a reasonable proxy for ground truth data. We use DBLP, a database covering major journals and conferences in computer science and information science, as a source. We find that initial-based disambiguation induces strong distortions in network metrics at both the graph and node levels: authors become embedded in ties for which there is no empirical support, which inflates their sphere of influence and diversity of involvement. Consequently, networks generated with initial-based disambiguation are more coherent and interconnected than the actual underlying networks, and individual authors appear to be more productive and more strongly embedded than they actually are.
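The merging effect described above can be illustrated with a minimal sketch. The abstract does not name the two methods compared, so the code assumes the two variants commonly discussed in the literature: a first-initial key (surname plus first initial) and an all-initials key (surname plus all given-name initials). The function names, the "Surname, Given" name format, and the toy author names are hypothetical and are not taken from the paper or from DBLP.

```python
from itertools import combinations


def first_initial_key(name):
    """Assumed variant 1: surname plus first initial ('Lee, Andrew B.' -> 'lee, a')."""
    surname, _, given = name.partition(",")
    parts = given.split()
    initial = parts[0][0] if parts else ""
    return f"{surname.strip()}, {initial}".lower()


def all_initials_key(name):
    """Assumed variant 2: surname plus all initials ('Lee, Andrew B.' -> 'lee, ab')."""
    surname, _, given = name.partition(",")
    initials = "".join(part[0] for part in given.replace("-", " ").split())
    return f"{surname.strip()}, {initials}".lower()


def coauthor_edges(papers, key=lambda name: name):
    """Build an undirected coauthorship edge set after mapping names through `key`."""
    edges = set()
    for authors in papers:
        ids = sorted({key(author) for author in authors})
        edges.update(combinations(ids, 2))
    return edges


def degrees(edges):
    """Count the distinct coauthors of every node."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


# Hypothetical toy data: three distinct authors named Lee, each with one coauthor.
papers = [
    ["Lee, Anna", "Park, Min"],
    ["Lee, Andrew B.", "Cho, Dana"],
    ["Lee, Amy B.", "Roy, Tara"],
]

truth = degrees(coauthor_edges(papers))                       # full names as ground-truth proxy
first = degrees(coauthor_edges(papers, first_initial_key))    # all three Lees collapse
allin = degrees(coauthor_edges(papers, all_initials_key))     # two of the Lees collapse

for label, deg in (("full names", truth), ("first initial", first), ("all initials", allin)):
    print(label, "nodes:", len(deg), "max degree:", max(deg.values()))
```

In this toy case the three unrelated pairs of coauthors are joined into a single connected group under the first-initial key, and the merged "lee, a" node gains ties that no single real author has. This is the same kind of distortion the abstract reports, although the sketch says nothing about its magnitude in real data.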

Keywords

bibliometrics; name ambiguity; initial-based disambiguation; coauthorship networks; collaboration networks

Acknowledgement

Supported by: KISTI, Ford Foundation