1.1. Statement of the Problem
Results published in Zhu (2017) showed that 55% of respondents at UK universities expected open access (OA) articles to receive more citations; only 8% of the respondents doubted that statement. Similar expectations were expressed by the respondents of a Russian survey conducted in 2018 (Razumova, Litvinova, Shvartsman, & Kuznetsov, 2018). To test whether such expectations hold in the new OA environment, we studied the citation advantage of OA articles using the instruments and methodology recently developed in Web of Science Core Collection (WoS CC) and Dimensions.
Following common definitions (Springer, 2019; Suber, 2006; Swan, 2012), we consider two main reference groups of open-access publications: Gold OA and Green OA. We assign to Gold OA the online journal articles published either in fully accessible OA (Pure-Gold-OA) journals or in Hybrid OA journals. Currently, the Directory of Open Access Journals (DOAJ) is the world's largest database of OA journals. The global citation indexes WoS CC, Scopus, and Dimensions use DOAJ as a source of Pure-Gold-OA articles. Hybrid journals are traditional subscription (Paywall) journals in which some of the articles are moved to OA (Hybrid OA articles). This requires the payment of an article processing charge (APC) to the publisher. Green OA refers to the author self-archiving a preprint or postprint version of the article. Green OA articles are freely available to the general public on the websites of institutional or subject-based OA repositories. For the purpose of this study, we will refer to the above reference groups of articles as Paywall, Gold OA, Pure-Gold-OA, Hybrid OA, and Green OA.
1.3. Literature Review
Since 2001 (Lawrence, 2001), many authors have reported the citation advantage of OA over non-OA or Paywall articles in separate research fields (Antelman, 2004; Eysenbach, 2006; Harnad et al., 2004; Kamat, 2018; Koler-Povh, Južnič, & Turk, 2014). In the first decade of the 21st century, Green OA citation impact was investigated and the OA citation advantage was confirmed (Metcalfe, 2005, 2006; Schwarz & Kennicutt, 2004; Wang, Liu, Mao, & Fang, 2015). However, a number of authors have disputed this conclusion (Craig, Plume, McVeigh, Pringle, & Amin, 2007; Davis, Lewenstein, Simon, Booth, & Connolly, 2008; Davis & Walters, 2011).
No agreement has been reached yet, and some authors have proposed a number of reasons for the correlation between OA and citation advantage (Davis & Fromerth, 2007; Henneken et al., 2006; Kurtz et al., 2005; Moed, 2007).
The reasons were listed in Dorta-González, González-Betancor, & Dorta-González (2017), and can be summarized as follows:
- (a) The OA postulate. Since OA articles are available for a wide audience, they get higher readership and citation.
- (b) The early view postulate (Davis & Fromerth, 2007; Henneken et al., 2006; Kurtz et al., 2005; Moed, 2007). Green OA articles could be available online prior to their publication. They can, therefore, begin accumulating citations earlier than the paid access articles published at the same time and thus will have more citations because they have been available longer.
- (c) The author selection bias postulate (Gaule & Maystre, 2011; McCabe & Snyder, 2014). Authors are more likely to provide OA to their highest quality articles, so OA articles will have more citations than paid-access articles.
- (d) The APC selection bias. An average APC of about EUR 3,000 was listed among the barriers preventing authors from publishing articles in the OA model (Razumova et al., 2018; Zhu, 2017). Only rich and successful universities pay APCs for their authors. Successful universities perform high-quality research and thus provide the OA domain with high-quality articles that collect many citations.
- (e) The grant selection bias. Currently, 108 out of the 148 largest research funders listed on the Sherpa Juliet website (Sherpa Juliett, 2019) require OA publishing (Gold OA) or OA archiving (Green OA) for articles supported by funder grants. Grants are awarded to high-quality research, so this practice brings high-quality articles into OA.
However, all the above-mentioned selection-bias postulates were formulated in the old OA environment. The OA world changes very quickly, and during the last five years the OA environment has changed dramatically. National OA policies and programs have been adopted that aim for 100% of publicly funded research to be published in Gold OA or Green OA. Many of them request or encourage Hybrid OA. Since 2014, OA policies have been launched in leading Western European countries: the United Kingdom, the Netherlands, France, Germany, Austria, Sweden, Denmark, Finland, etc. In July 2014, the Higher Education Funding Council for England (HEFCE) announced the Research Excellence Framework (REF) 2021 OA policy: “The core of the REF 2021 OA policy is that journal articles and conference proceedings must be available in an OA form to be eligible for the next REF. In practice, this means that these outputs must be uploaded to an institutional or subject repository” (HEFCE, 2014). Plan S (2018) of the international consortium of research funders (cOAlition S, 2019) mandates that, from 2020, all articles funded by Plan S signatories will be published in compliant OA journals (Gold OA) or on compliant platforms (Green OA).
As the national OA policies were implemented, new licenses were negotiated with the world's leading publishers: Springer, Wiley, Elsevier, and so on. The list of newly negotiated licenses covers 15–20 publishers. The licenses include provisions that enable national corresponding authors to publish their articles in Gold (Hybrid) OA without paying an APC.
These recent changes are resulting in fast growth of the share of OA articles in the world publication flow. Thus, 28% of the 2016 articles in WoS CC were published in Green or Gold OA. Even higher numbers were reported in Science-Metrix (2018): As measured in Q3 2016, the percentage of OA articles in the WoS CC and science databases varied from 55% to 57% for the publication years within 2009–2014.
The new environment removes the OA selection bias. The author selection bias has no effect, as OA publication becomes a requirement of funding bodies and authors are obliged to publish all their articles in OA irrespective of their subjective choice. As APCs are waived for authors, the APC barrier no longer exists.
2. MATERIALS AND METHODOLOGY
2.1. Citation Impact of Hybrid OA and Paywall Articles in Hybrid Journals of the Royal Society of Chemistry
2.1.1. The Gold-for-Gold Project of the Royal Society of Chemistry in Russia
In 2017–2018, together with the Royal Society of Chemistry (RSC), we analyzed the citation impact of Hybrid OA and Paywall articles by Russian authors accepted for publication in the RSC hybrid journals. The articles were moved to the Hybrid OA mode within the Gold-for-Gold project of the RSC in Russia (G4G-RU).
The G4G-RU Project created a unique situation in which 143 Russian articles accepted for publication in 2016 were moved to Hybrid OA almost at one time, namely in February to March 2017. The APC was waived for the Russian authors. This removed the problem of the APC barrier and the author choice bias, as all participating authors agreed to convert to OA all of their articles compliant with the project.
We created two reference groups: the 143 OA articles (Hybrid OA) and 360 non-OA (Paywall) articles. All of the articles were published in the same hybrid journals indexed in WoS CC: Analyst, Analytical Methods, Catalysis Science & Technology, Chemical Communications, Crystengcomm, Dalton Transactions, Faraday Discussions, Green Chemistry, Journal of Materials Chemistry, Molecular Biosystems, Nanoscale, New Journal of Chemistry, Organic & Biomolecular Chemistry, Photochemical & Photobiological Sciences, Physical Chemistry Chemical Physics, and Soft Matter. At the beginning of the project the journal RSC Advances was converted to the Pure-Gold-OA mode, so we excluded it from the list of analyzed hybrid journals.
Citation impact in each reference group was set to zero at the start of the project in April 2017. We tracked the number of citations of each article and of the package of articles as a whole. Citation impact values were calculated quarterly, and the dependence of citation impact on the citation period was plotted for the first year after the start of the project.
2.1.2. Country-level Citation Impact of Hybrid OA and Paywall articles in RSC Hybrid Journals
In April 2018, in view of the new WoS CC OA functionality, we analyzed the country-level citation impact of the OA and Paywall articles published in 2016 in the whole domain of the RSC hybrid journals included in the WoS CC. Along with Russia, we selected two other countries for which the RSC APCs were waived for corresponding authors: the United Kingdom and the Netherlands. The list of 40 RSC journals was obtained using the publisher filter of the InCites platform, with the Pure-Gold-OA journals excluded from the list. The search for the 2016 articles was performed in WoS CC, and the Gold OA (Yes/No) filters were applied.
2.2. OA Models in WoS CC and Dimensions Databases
In this study, we analyzed the datasets of the WoS CC database (69 million articles) and the Dimensions database (96 million articles). The filters used in WoS CC were: the journal article domain in Science Citation Index Expanded (SCIE), the Social Sciences Citation Index (SSCI), Arts and Humanities Citation Index (AHCI) and Emerging Sources Citation Index (ESCI), fixed publication year in the interval 2009–2017, OA (Yes/No), Gold OA (Bronze) (Yes/No), and Green OA (Yes/No). The filters used in Dimensions were: All publications or articles, Green OA (repositories), and Gold OA (journal publications).
We studied the reference datasets of the Green OA, Hybrid OA, Pure-Gold-OA, and Paywall articles. Newly developed services of WoS CC and Dimensions enable article-level selection and analysis of the groups of Green OA, Gold OA, and All articles. Subtraction of Green OA and Gold OA data from the data on the All-article group gives the Paywall data. The Gold OA datasets in WoS CC and Dimensions include both the Hybrid OA and Pure-Gold-OA articles. The OA dataset of the InCites platform includes the Pure-Gold-OA articles from WoS CC only. To use the InCites instruments, we prepared the Gold OA, Green OA, and Paywall datasets in WoS CC and saved them to InCites. Subtraction of Pure-Gold-OA data obtained in InCites from the Gold OA data of WoS CC saved to InCites gives us the WoS CC Hybrid OA data exclusively.
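The two subtraction steps described above can be sketched as simple arithmetic on article and citation counts. This is only an illustration under stated assumptions: the function names and all numbers are hypothetical placeholders (not values from this study), the Green OA and Gold OA groups are assumed not to overlap, and citation impact is taken as the average number of citations per article.

```python
def paywall_count(all_articles: int, green_oa: int, gold_oa: int) -> int:
    """Paywall = All - Green OA - Gold OA (assumes the OA groups do not overlap)."""
    result = all_articles - green_oa - gold_oa
    if result < 0:
        raise ValueError("OA counts exceed the total; the groups may overlap")
    return result


def hybrid_oa_count(gold_oa: int, pure_gold_oa: int) -> int:
    """Hybrid OA = Gold OA (WoS CC) - Pure-Gold-OA (InCites)."""
    result = gold_oa - pure_gold_oa
    if result < 0:
        raise ValueError("Pure-Gold-OA count exceeds the Gold OA count")
    return result


def derived_citation_impact(citations: int, articles: int) -> float:
    """Average citations per article for a group obtained by subtraction."""
    if articles <= 0:
        raise ValueError("the derived group contains no articles")
    return citations / articles


# Hypothetical counts, for illustration only.
n_paywall = paywall_count(all_articles=1000, green_oa=150, gold_oa=250)
n_hybrid = hybrid_oa_count(gold_oa=250, pure_gold_oa=180)
# Citations are subtracted the same way as article counts.
c_paywall = paywall_count(all_articles=5000, green_oa=1200, gold_oa=1400)
print(n_paywall, n_hybrid, derived_citation_impact(c_paywall, n_paywall))
```

The same subtraction is applied to both the article counts and the citation counts, so the derived group keeps a consistent citation impact.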
2.2.1. Measured Citation Metrics
The reference datasets of the Green OA, Hybrid OA, Pure-Gold-OA, and Paywall articles were analyzed with the InCites functionality. The InCites instruments provide the following indicators:
- Number of articles
- Number of citations
- Citation impact
- Category Normalized Citation Impact
According to the InCites definition, the Category Normalized Citation Impact of a document is calculated by dividing the actual count of citing items by the expected citation rate for documents with the same document type, year of publication, and subject area.
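The definition above can be written compactly as follows. The symbols are introduced here for illustration only: $c_i$ is the actual count of citing items for document $i$, and $e_{t,y,s}$ is the expected citation rate for documents of the same type $t$, publication year $y$, and subject area $s$. The group-level value is shown as the mean over the $N$ documents of a dataset, which is an assumption about how the indicator is aggregated.

```latex
\[
\mathrm{CNCI}_i = \frac{c_i}{e_{t,y,s}},
\qquad
\mathrm{CNCI}_{\text{group}} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{CNCI}_i .
\]
```

A value of 1 means a document is cited at the expected rate for its type, year, and subject area; values above 1 indicate above-expected citation.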
In the Dimensions database, the reference datasets of Green OA, Gold OA, and Paywall articles were selected and studied with the Dimensions functionality. The indicators provided are as follows:
- Number of articles
- Number of citations
- Citation impact
- Field Citation Ratio (FCR)
- Relative Citation Ratio (RCR)
According to the definitions of the Dimensions database (Dimensions, 2019), the FCR is calculated by dividing the number of citations a paper has received by the average number received by documents published in the same year and in the same Fields of Research (FoR) category. The RCR is calculated for all PubMed publications which are at least two years old. It is calculated as the citations of a paper, normalized to the citations received by National Institutes of Health (NIH)-funded publications in the same area of research and year. The area of research is defined by the corpus of publications co-cited with the article of interest (the “co-citation network”).
Datasets of the OA and Paywall articles of a given publication year within the 2009–2017 interval were analyzed as of the third quarter of 2018. We obtained the dependence of citation impact and %Cited on the citation period. The citation period is calculated as the number of years elapsed since publication.
To investigate citation impact in separate research areas we performed the above analysis in major research areas in the schemas of the Italian National Agency for the Evaluation of Universities and Research Institutes (ANVUR) and Global Institutional Profiles Project (GIPP), available in InCites. Articles were grouped either in twelve major research fields of the ANVUR classification (Agricultural and Veterinary Sciences, Biology, Chemistry, Civil Engineering and Architecture, Earth Science, Economics and Statistics, Industrial and Information Engineering, Mathematics and Informatics, Medicine, Multidisciplinary (excluded), Physics, and Psychology) or in six major research fields of the GIPP classification: Arts & Humanities, Clinical, Pre-Clinical & Health, Engineering & Technology, Life Sciences, Physical Sciences, and Social Sciences.
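As a rough illustration of the FCR definition quoted above, the sketch below divides each paper's citation count by the mean citation count of papers with the same publication year and field. It follows only the simplified wording given in the text and is not the Dimensions implementation; the record layout and the sample records are hypothetical.

```python
from collections import defaultdict


def field_citation_ratios(papers):
    """Return {paper id: citations / mean citations of its (year, field) group}.

    `papers` is a list of dicts with the hypothetical keys
    'id', 'year', 'field', and 'citations'.
    """
    totals = defaultdict(lambda: [0, 0])  # (year, field) -> [citation sum, paper count]
    for p in papers:
        group = totals[(p["year"], p["field"])]
        group[0] += p["citations"]
        group[1] += 1

    ratios = {}
    for p in papers:
        citation_sum, count = totals[(p["year"], p["field"])]
        mean = citation_sum / count
        # The ratio is undefined when nobody in the group has been cited.
        ratios[p["id"]] = p["citations"] / mean if mean > 0 else None
    return ratios


# Hypothetical records, for illustration only.
sample = [
    {"id": "a", "year": 2015, "field": "Chemistry", "citations": 6},
    {"id": "b", "year": 2015, "field": "Chemistry", "citations": 2},
    {"id": "c", "year": 2015, "field": "Physics", "citations": 0},
]
print(field_citation_ratios(sample))
```

Normalizing within a year and field is what lets ratios from fields with very different citation habits be compared on one scale.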
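The quantities used in this step can be sketched in a few lines. The sketch assumes the usual InCites-style definitions, which the text does not spell out: citation impact is the average number of citations per article, and %Cited is the share of articles with at least one citation. The function names and the sample counts are hypothetical.

```python
def citation_period(publication_year: int, measurement_year: int = 2018) -> int:
    """Number of years elapsed between publication and measurement."""
    return measurement_year - publication_year


def citation_impact(citation_counts) -> float:
    """Average citations per article (assumed definition)."""
    if not citation_counts:
        raise ValueError("empty dataset")
    return sum(citation_counts) / len(citation_counts)


def percent_cited(citation_counts) -> float:
    """Percentage of articles cited at least once (assumed definition)."""
    if not citation_counts:
        raise ValueError("empty dataset")
    cited = sum(1 for c in citation_counts if c > 0)
    return 100.0 * cited / len(citation_counts)


# Hypothetical citation counts for one publication-year dataset.
counts_2016 = [0, 2, 4, 6]
print(citation_period(2016), citation_impact(counts_2016), percent_cited(counts_2016))
```

Repeating the calculation for each publication year from 2009 to 2017 gives one point per citation period, which is how a dependence on the citation period can be assembled from a single measurement date.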
To avoid the impact of grant selection bias, we studied the Green OA, Gold OA, and Paywall datasets of articles published with the grant support of the NIH, which mandates Green and Gold OA publications.
To eliminate the impact of author selection bias and the APC barrier, we studied articles generated in countries that have an OA policy and waive APCs for authors: the United Kingdom, where APC fees were paid within the block grants provided to the universities (UK Research and Innovation, 2013), and the Netherlands, where APCs were paid by universities upfront together with subscription fees (Openaccess.nl, 2019).
3. RESULTS AND DISCUSSION
3.1. Citation Impact of Hybrid OA and Paywall Articles in Hybrid Journals of the Royal Society of Chemistry
3.1.1. The Gold-for-Gold Project of the Royal Society of Chemistry in Russia
We studied the dependence of citation impact on citation period for two reference groups of the Russian articles published in the hybrid journals of RSC. Results are shown in Fig. 1.
Fig. 1. Dependence of the citation impact on citation period for two groups of Hybrid open access (OA) and Paywall articles published in Royal Society of Chemistry hybrid journals in 2016. Measured in April 2018 in Web of Science Core Collection.
In both cases, the y(x) dependence follows a linear law. For the Hybrid OA articles, y=0.4x with R²=0.997; for the Paywall articles, y=0.3x with R²=0.996. Here, y is the citation impact and x is the citation period in months from the start of the experiment in April 2017. If x is counted in years, we get y=4.8x and y=3.6x for the Hybrid OA and Paywall articles, respectively. Here 4.8 and 3.6 are the growth rates of the Hybrid OA and Paywall citation impact within the first year after the start of the project.
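A minimal sketch of a fit of the form y=kx is given below, assuming an ordinary least-squares line constrained to pass through the origin; the text does not state which fitting procedure was used. The quarterly points are synthetic and lie exactly on y=0.4x, so they only illustrate the mechanics and the month-to-year conversion, not the measured data.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope k of y = k*x (no intercept): k = sum(x*y) / sum(x*x)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x values are zero")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


# Synthetic quarterly points on y = 0.4x (x in months), for illustration only.
months = [3, 6, 9, 12]
impact = [1.2, 2.4, 3.6, 4.8]

k_per_month = slope_through_origin(months, impact)
k_per_year = k_per_month * 12  # a slope per month times 12 months gives a slope per year
print(round(k_per_month, 3), round(k_per_year, 3))
```

Multiplying the monthly slope by 12 is the conversion behind the yearly growth rates quoted in the text (0.4 × 12 = 4.8 and 0.3 × 12 = 3.6).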
3.1.2. Country-level Citation Impact of Hybrid OA and Paywall Articles in RSC Hybrid Journals
Results of the country-level analysis performed in April 2018 are shown in Table 1. The ratio of the citation impact of the Hybrid OA to that of the Paywall articles in the RSC journals is as follows: United Kingdom, 1.37; Netherlands, 1.33; and Russia, 1.36 (Table 1).
Thus we can conclude that the values of the citation impact of Russian articles measured in the domain of all hybrid RSC journals and those we obtained in the G4G-RU experiment are nearly the same. In the first year after publication, the Hybrid OA articles show a 35±2% citation advantage over Paywall articles published in the RSC hybrid journals.
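The 35±2% figure is consistent with the three country-level ratios reported above (1.37, 1.33, and 1.36). One way to see this, assuming the figure is the midpoint of the three advantages with the half-range as the uncertainty (the text does not state how it was derived), is the following sketch.

```python
# Hybrid OA / Paywall citation impact ratios quoted in the text.
ratios = {"United Kingdom": 1.37, "Netherlands": 1.33, "Russia": 1.36}


def advantage_summary(ratio_by_country):
    """Return (midpoint, half-range) of the percentage advantages.

    The midpoint-and-half-range reading is an assumption made for illustration.
    """
    advantages = [(r - 1.0) * 100.0 for r in ratio_by_country.values()]
    midpoint = (max(advantages) + min(advantages)) / 2.0
    half_range = (max(advantages) - min(advantages)) / 2.0
    return midpoint, half_range


midpoint, half_range = advantage_summary(ratios)
print(f"{midpoint:.0f} +/- {half_range:.0f} %")
```

Under this reading the advantages span 33% to 37%, which brackets all three countries.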
3.2. Impact of OA Models on Citation Metrics of Articles in WoS CC and Dimensions Databases
3.2.1. Overall Values in All Subject Areas
In this section we report the overall analysis of citation metrics of the Dimensions articles published within 2009–2017. No bias-free filters were applied in these experiments. All the parameters were obtained in the second quarter of 2018. The retrieved dataset comprises the number of articles, the number of citations, %Cited, citation impact, RCR Mean, and FCR Mean (Table 2).
For the citation impact and %Cited we retrieved the data for OA and Paywall articles published in different publication years and plotted the dependence of both parameters on the depth of the citation period.
The temporal dependence of %Cited values of the OA and Paywall articles in WoS CC and Dimensions is shown in Fig. 2.
The benchmark %Cited values are 99% for the Green OA articles in WoS CC, and 89%, 74%, and 60% for the Green OA, Gold OA, and Paywall articles in Dimensions, respectively.
Table 1. Citation impact of Hybrid OA and Paywall articles published in hybrid Royal Society of Chemistry journals in 2016
|Country||Hybrid OA model||Paywall model||Citation impact ratio (Hybrid OA to Paywall)|
Measured in April 2018 in Web of Science Core Collection.
OA, open access.
Table 2. Citation metrics of the Gold OA, Green OA, and Paywall articles of the Dimensions database published in 2009–2017
|OA/non-OA article model||No. of publications||No. of citations||%Cited||Citation impact||RCR Mean||FCR Mean|
Measured in second quarter 2018.
OA, open access; RCR, Relative Citation Ratio; FCR, Field Citation Ratio.
Fig. 2. Dependence of the %Cited values of the Green open access (OA), Gold OA, and Paywall articles on the depth of citation period. Measured in second quarter 2018 in Web of Science (WoS) Core Collection and Dimensions.
The dependence of citation impacts on the depth of the citation period for different OA models is shown in Fig. 3.
Fig. 3. Dependence of citation impact on citation period for Green open access (OA), Gold OA, and Paywall articles in the Dimensions database. Measured in second quarter 2018.
Within the nine-year period, temporal dynamics of the citation impact fit the linear dependence y=kx with R²=0.99±0.01, where y is the citation impact, x is the number of years passed after the publication year, and k is the growth rate of the citation impact: k=3.6 for the Green OA, 2.4 for the Gold OA, and 1.4 for the Paywall articles.
Similar results were obtained for the WoS CC OA datasets processed with the InCites instruments. The k values for the WoS CC articles are 4.6, 3.6, and 2.3, respectively, for the Green OA, Gold OA, and Paywall articles.
3.2.2. Citation Impact in Separate Research Areas. Avoidance of Author Selection Bias and APC barrier
It was proposed that the high values of citation impact of the Green OA articles could be affected by a large number of articles in the fields of Medicine and Health published in PubMed because of the mandatory OA policy of the NIH. Articles in the field of Medical Sciences get more citations than those in many other research fields and thus affect overall values. That is why we analyzed the citation impact of Green OA, Hybrid OA, Gold OA, Pure-Gold-OA, and Paywall articles in separate research areas of the ANVUR and GIPP schemas available in InCites. The OA and non-OA datasets of articles were selected with the WoS CC functionality and saved to InCites. In each subject area, we retrieved the OA/non-OA citation impact within the 2009–2017 publication years and plotted the dependence of the citation impact on the citation period.
We found that in both schemas, similarly to the results in Fig. 3, the dependence of the value of citation impact (y) on the citation period (x) can be approximated by the linear law y=kx with R² values close to 0.9–1.0.
Table 3. The growth rates (k) of the citation impact in the OA and non-OA groups of articles and the raw data of all Web of Science Documents and Times Cited in different research areas of the GIPP schema
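|Research area||k, Green OA||k, Hybrid OA||k, Gold OA||k, Paywall||Web of Science Documents||Times Cited||Citation impact|
|Clinical, Pre-Clinical & Health||8.2||5.6||4.6||2.9||190,415||1,798,562||9.4|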
|Engineering & Technology||3.9||4.3||3.9||2.8||104,222||720,625||6.9|
|Arts & Humanities||0.9||1.1||0.9||0.3||40,781||35,847||0.9|
The raw data in the 2014–2017 window.
OA, open access; GIPP, Global Institutional Profiles Project.
Table 4. The OA/Paywall ratio of the growth rates of citation impact in different research areas of the GIPP schema
|Research area||Green OA / Paywall||Hybrid OA / Paywall||Gold OA / Paywall|
|Clinical, Pre-Clinical & Health||2.8||1.9||1.6|
|Engineering & Technology||1.4||1.5||1.4|
|Arts & Humanities||3||3.7||3|
OA, open access.
The GIPP Schema. In this case we studied articles published in the United Kingdom (England only) and the Netherlands to avoid author selection bias and the APC barrier. The k values calculated in each research area are listed in Table 3. The raw data for all reference groups and for the whole dataset are given in the Supplemental Materials section. For reference purposes, in Table 3 we present the raw data for all the articles in the 2014–2017 citation window for each research area.
The k values in Table 3 indicate that articles in the different OA reference datasets have a clear citation advantage over the Paywall articles. The OA/Paywall ratios of the growth rates of citation impact in the GIPP schema research areas are given in Table 4.
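The ratios in Table 4 can be reproduced from the growth rates in Table 3 by dividing each OA growth rate by the Paywall growth rate of the same research area. The sketch below does this for the three rows available here. It assumes that the first four numeric columns of Table 3 are, in order, Green OA, Hybrid OA, Gold OA, and Paywall; this column assignment is inferred from its agreement with Table 4 rather than stated in the table itself.

```python
# Growth rates k from Table 3 (column order is an assumption, see the lead-in).
k_values = {
    "Clinical, Pre-Clinical & Health": {"Green OA": 8.2, "Hybrid OA": 5.6, "Gold OA": 4.6, "Paywall": 2.9},
    "Engineering & Technology": {"Green OA": 3.9, "Hybrid OA": 4.3, "Gold OA": 3.9, "Paywall": 2.8},
    "Arts & Humanities": {"Green OA": 0.9, "Hybrid OA": 1.1, "Gold OA": 0.9, "Paywall": 0.3},
}


def oa_to_paywall_ratios(k_by_area, digits=1):
    """Return {area: {OA model: k_OA / k_Paywall}} rounded to `digits` decimals."""
    ratios = {}
    for area, k in k_by_area.items():
        paywall = k["Paywall"]
        ratios[area] = {
            model: round(value / paywall, digits)
            for model, value in k.items()
            if model != "Paywall"
        }
    return ratios


for area, row in oa_to_paywall_ratios(k_values).items():
    print(area, row)
```

Rounded to one decimal, the computed ratios match the corresponding rows of Table 4, which supports the assumed column order.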
Notably, in none of our experiments did we detect any effect of the early view of the Green OA articles on citation impact. The ratio of Green OA/Paywall growth rates of citation impact varies from 1.4 for Physical Sciences and Engineering & Technology to 2.9–3.0 for Life Sciences and Arts & Humanities. The Hybrid OA/Paywall ratio varies from 1.5 in Engineering & Technology to 3.7 in Arts & Humanities. The Pure-Gold-OA data are not reported in this section because the temporal dependence of Pure-Gold-OA citation impact is not linear and the growth rate cannot be obtained. However, the Gold OA group, which comprises both Hybrid OA and Pure-Gold-OA articles, shows lower k values than the Hybrid OA articles, which indicates a lower citation impact of the Pure-Gold-OA articles. It would be of interest to study the behavior of this group in more detail.
Comparison of the Green OA group with the Hybrid OA group shows that the Green OA articles have higher citation impact in the research areas of Life Sciences and Clinical, Pre-Clinical & Health. In Physical Sciences and Engineering & Technology, the citation impact of the Hybrid OA articles prevails.
The ANVUR Schema. In the case of the ANVUR schema, we investigated the overall dataset of the WoS CC articles published all over the world. Using the InCites instruments and the OA (Yes/No) filter, we could reconstruct the results of Dorta-González et al. (2017) and separate the group of Pure-Gold-OA articles, that is, articles published exclusively in the Pure-Gold-OA journals, from the group of non-Pure-Gold-OA articles. In Dorta-González et al. (2017), those groups were defined as OA and non-OA articles. However, the non-Pure-Gold-OA group includes not only the Paywall but also the Green OA and Hybrid OA articles. As follows from Tables 3 and 4, the citation impact of the Green OA and Hybrid OA articles is much higher than that of the truly Paywall articles. Therefore, the citation impact of the group of non-Pure-Gold-OA articles is heavily affected by OA, and its comparison with the citation impact of the Pure-Gold-OA articles is incorrect. Fig. 4 confirms that the comparison made using the InCites instruments leads to conclusions similar to those reported in Dorta-González et al. (2017).
The Green OA group of articles has the highest citation impact among all OA/non-OA groups in all research fields. Except for Biology, Earth Science, and Physics, the values of citation impact of the non-Pure-Gold-OA articles exceed those of the Pure-Gold-OA articles. This fits the results of Dorta-González et al. (2017), in which the authors found no citation advantage of OA articles. However, in our opinion this conclusion will be different if Hybrid OA and Green OA articles are considered. In our next study, we plan to double-check this statement.
Fig. 4. Growth rates of citation impact in the groups of Green open access (OA), Pure-Gold-OA, and non-Pure-Gold-OA articles. The ANVUR (Italian National Agency for the Evaluation of Universities and Research Institutes) classification schema of subject areas. Web of Science Core Collection database and InCites platform. Retrieved in third quarter 2018.
3.3. Eliminating Grant Selection Bias
As indicated in Section 2.2, we investigated the citation impact of the Green OA, Gold OA, and Paywall articles published with the grant support of the National Institutes of Health, which mandates Green OA and Gold OA publications. To eliminate author selection bias in this research, we filtered the WoS CC articles published in the United Kingdom and the Netherlands, which mandate OA publishing.
We obtained the dependence of citation impact of Green OA, Gold OA, and Paywall articles published in WoS CC on citation period, as of the third quarter of 2018. Citation period was calculated as the number of years elapsed since the publication year. Publication year was selected within the 2009–2017 interval. Results are shown in Fig. 5.
Fig. 5. Temporal dependence of citation impact in the groups of Green open access (OA), Gold OA, and Paywall articles published in the United Kingdom and the Netherlands with National Institutes of Health funding. Web of Science Core Collection database and InCites platform. Retrieved in third quarter 2018.
Temporal dependence of citation impact follows the linear law y=kx in all three groups of articles. The growth rates of citation impact equal 12.4, 9.3, and 6.6 for the Green OA, Gold OA, and Paywall articles, respectively. Thus, the results obtained demonstrate the citation advantage of the OA articles.
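The size of the advantage can be read off as ratios of the growth rates quoted above. This is a simple worked calculation on the reported k values (12.4, 9.3, and 6.6), with the subscripted symbols introduced here for illustration only:

```latex
\[
\frac{k_{\text{Green OA}}}{k_{\text{Paywall}}} = \frac{12.4}{6.6} \approx 1.88,
\qquad
\frac{k_{\text{Gold OA}}}{k_{\text{Paywall}}} = \frac{9.3}{6.6} \approx 1.41 .
\]
```

On this reading, citation impact in this NIH-funded subset grows roughly 1.9 times faster for Green OA and 1.4 times faster for Gold OA than for Paywall articles.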
We used several approaches to eliminate author selection bias, the APC barrier, and grant selection bias in studying the citation impact of different groups of OA and non-OA articles in the WoS CC and Dimensions databases. Irrespective of the bias filters, the results of this analysis support the conclusion that OA articles have a higher percentage of cited articles and a higher citation impact than Paywall articles. The Green OA articles demonstrate the highest values of citation metrics among all the OA models. The citation impact fits the linear dependence on the depth of the citation period: y=kx. Here, y is the citation impact, x is the depth of the citation period equal to the number of years elapsed since the publication year, and k is the growth rate of the citation impact. The growth rates of citation impact of the Green OA, Gold OA, and Hybrid OA articles exceed those of the Paywall articles. The values and the growth rates of citation impact vary across research areas. Detailed studies of individual research areas fall beyond the scope of the current article and could be the topic of future research.
No measurable effect of the early view postulate was detected for the citation impact of the Green OA articles. We also dispute the earlier results that reported no citation advantage for Pure-Gold-OA articles (articles published in pure-open-access journals) vs. non-Pure-Gold-OA articles (Dorta-González et al., 2017). In our opinion, the high level of the citation impact of non-Pure-Gold-OA articles measured in Dorta-González et al. (2017) was caused by the high citation impact of the Green OA and Hybrid OA articles, which could not be separated from the Paywall journals at that time.
The authors express their gratitude to Ivan Sterligov, Director of Scientometrics Center of National Research University Higher School of Economics, Moscow, Russia for valuable comments and fruitful ideas.
Supplemental materials are available from https://doi.org/10.1633/JISTaP.2019.7.2.2