Section: Archaeology
Topic: Archaeology

Replication report for Marwick (2025) “Is archaeology a science?”, including new data from OpenAlex

Corresponding author(s): Queffelec, Alain (alain.queffelec@u-bordeaux.fr)

10.24072/pcjournal.710 - Peer Community Journal, Volume 6 (2026), article no. e38


Abstract

This document is a reproduction and replication of the first part of Ben Marwick’s paper published in the Journal of Archaeological Science, which analyzes the hard/soft-science position of archaeology, and its evolution through time, using bibliometric data as a proxy (Marwick, 2025). I confirm the complete computational reproducibility of Marwick (2025) while also pointing to a few problems in the manuscript. As for the replication of the study, while Marwick’s article is based on Web of Science data for archaeological journals and articles, I use data from OpenAlex, a free and open-source database drawing on more diverse sources. The analysis of the OpenAlex data confirms the trends visible in the replicated study for the position of publications in archaeological journals, for their evolution through time, and for the classification of the different journals. Some differences remain visible, mainly because OpenAlex is less influenced by recent trends in the publication process thanks to its more balanced coverage of the second half of the 20th century. This study also shows that the free and open-source OpenAlex database is a suitable alternative to commercial databases for this kind of scientometric study, but that OpenAlex could still be improved, especially in the quality of some metadata and of the cited references.

Metadata
Published online:
DOI: 10.24072/pcjournal.710
Type: Research article
Keywords: open science, archaeology, reproducibility, replication

Queffelec, Alain  1

1 Univ. Bordeaux, CNRS, Ministère de la Culture, PACEA, UMR 5199, F-33600 Pessac, France
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
@article{10_24072_pcjournal_710,
     author = {Queffelec, Alain},
     title = {Replication report for {Marwick} (2025) {{\textquotedblleft}Is} archaeology a science?{\textquotedblright}, including new data from {OpenAlex}},
     journal = {Peer Community Journal},
     eid = {e38},
     year = {2026},
     publisher = {Peer Community In},
     volume = {6},
     doi = {10.24072/pcjournal.710},
     language = {en},
     url = {https://peercommunityjournal.org/articles/10.24072/pcjournal.710/}
}
Queffelec, A. Replication report for Marwick (2025) “Is archaeology a science?”, including new data from OpenAlex. Peer Community Journal, Volume 6 (2026), article no. e38. https://doi.org/10.24072/pcjournal.710

PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.archaeo.100616

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Full text


Introduction

Replication and reproduction of archaeological studies are extremely rare. Aren’t they supposed, though, to be among the pillars of the scientific method (Popper, 1959)?

Following the recommendations of Barba (2018) and the National Academies of Sciences (2019), adopted by Marwick et al. (2020) and Karoune and Plomp (2022), reproduction is defined as “re-creating the results” given that “authors provide all the necessary data and the computer codes to run the analysis again”, while replication is defined as “arriv[ing] at the same scientific findings as another study, collecting new data (possibly with different methods) and completing new analyses”. The former is also called “exact replication” and the latter “direct replication” in the EDCR taxonomy summarized by Matarese (2022). A bibliometric search for either of the words “reproduction” or “replication” together with the word “archaeology” in Google Scholar and OpenAlex did not yield any research article replicating or reproducing another archaeological article. Published articles clearly presenting themselves as reproductions or replications of published results are still absent from the literature, even if some articles contain such reproduction studies (e.g., Foecke et al., 2025) and others evaluate inter-observer errors at the stage of data acquisition, mainly through blind tests (e.g., Atici et al., 2013; Kot et al., 2025; Pargeter et al., 2023). Despite the accelerating use of programming languages in archaeological articles (Schmidt and Marwick, 2020) and the growing awareness of reproducibility issues within the community, as seen with the advent of “Associate Editors for Reproducibility” in some journals (Farahani, 2024; Marwick, 2025), replicability has yet to be embraced in archaeology (Karoune and Plomp, 2022; Marwick, 2022).
Although the article by Marwick (2025) is a scientometric study about archaeology rather than an archaeological study, this manuscript attempts to reproduce and replicate the results published in the first part of his article regarding the hard/soft-science categorization of publication practice in archaeology (sections 2 and 3). In the second part of his article (sections 4-7), Marwick explains the importance of reproducibility in science, describes his first year of work as ‘Associate Editor for Reproducibility’ at the Journal of Archaeological Science, and proposes ways to improve reproducibility in archaeological studies. I hope that in the near future more replications and reproductions of archaeological studies will be published, and that the reproducibility of archaeological articles will be enhanced following the advice provided in his article.

This document uses Marwick’s shared code and data to reproduce the results presented in sections 2 and 3 of the article. This process, originally conceived as a personal opportunity to learn new aspects of R, Quarto documents, and the organization of files in such a research project, and to gain experience with software forges, also served as a means to verify the presented results.

Delving into the data during the manuscript reproduction, the idea of replicating Marwick’s results using OpenAlex occurred to me. I was indeed surprised to read in the article that there were so few archaeological journals with at least 100 papers in the Web of Science (WoS) database that Marwick (2025) had to limit his analysis to just 20 journals. It is also striking that the commercial WoS database includes only 108 journals in its Archaeology category. The replication section of this document therefore applies the same methodology as Marwick to a different but supposedly equivalent and broader dataset, relying mainly on OpenAlex instead of Web of Science, and also on OpenCitations. As open-source projects, OpenAlex and OpenCitations provide free access to bibliometric data for researchers and institutions worldwide, unlike commercial databases such as Web of Science and Scopus (Peroni and Shotton, 2020; Priem et al., 2022). It is therefore crucial to determine whether OpenAlex can yield comparable results in order to evaluate research openly (Rizzetto and Peroni, 2023), especially now that many research institutions have decided to stop relying on commercial databases (e.g. CNRS, 2025, 2024; University of Jyväskylä, 2025; Utrecht University, 2025; Vrije Universiteit Amsterdam, 2025; West Virginia University, 2025). Additionally, it is important to evaluate whether OpenAlex, being a more inclusive database that is less biased toward English-language publications and experimental sciences, can broaden the scope of scientometric research (Andersen, 2023). This replication will assess two key aspects. First, does OpenAlex offer sufficient data for researchers to replicate the analyses conducted by Marwick (2025)? Second, if so, do the results align closely enough that their interpretation would remain consistent?

The core purpose of this article is not to confirm or refute whether archaeology is a hard or soft science based on bibliometric proxies. Instead, it focuses on assessing 1) whether the original study is reproducible from a computational standpoint and 2) whether its findings can be replicated using a different (and open) data source.

Reproduction of Marwick (2025)

In its sections 2 and 3, Marwick (2025) applies to the archaeological literature a methodology proposed by Fanelli and Glänzel (2013). The goal of this method is to evaluate the position of different disciplines on a hard/soft-science scale based on bibliometric proxies. These indices are supposed to organize the disciplines based on their scientific publications, “with papers at the softer end of the spectrum tending to have fewer co-authors, use less substantive titles, have longer texts, cite older literature, and have a higher diversity of sources” (Marwick, 2025). This purely bibliometric analysis and the classification of sciences as hard or soft are, of course, debatable, but that is not the purpose of this document.

To compare archaeology with other disciplines with this methodology, the workflow followed by Marwick (2025) is:

  1. download data from Web of Science for “Archaeology” category,

  2. extract and organize useful variables from Web of Science dataset (authors, title, journal, number of pages, year etc.),

  3. filter the articles for the top 25 h-indices journals and then for journals with at least 100 published papers,

  4. calculate the indices necessary for comparison with other disciplines (number of authors, relative title length, number of pages, age of references (Price’s index), and diversity of the sources of references (Shannon’s index)),

  5. plot indices calculated for archaeology with indices for physics and social sciences by simulating the data to reproduce boxplots visible in Fanelli and Glänzel (2013),

  6. plot the evolution of the indices over time,

  7. compare the different journals on each index and with a multivariate analysis.
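As an illustration of step 4, the age-of-references proxy (Price’s index: the share of cited references no more than five years older than the citing article) can be sketched in a few lines. This is a hedged sketch in Python rather than Marwick’s actual R code, and the function name and input structure are hypothetical:

```python
def prices_index(pub_year, reference_years):
    """Price's index: proportion of cited references published
    within the five years preceding the citing article."""
    if not reference_years:
        return float("nan")
    recent = sum(1 for y in reference_years if pub_year - y <= 5)
    return recent / len(reference_years)

# A 2020 article citing references from 2019, 2016, 2010 and 1995:
# two of the four references are at most 5 years old.
print(prices_index(2020, [2019, 2016, 2010, 1995]))  # 0.5
```

A higher value indicates a discipline citing mostly recent literature, which Fanelli and Glänzel (2013) associate with the harder end of the spectrum.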

By providing well-organized code and data, Marwick (2025) allows a complete computational reproduction.

Nevertheless, reading the article and the code carefully to reproduce it, I identified some concerns with the published version of the manuscript which I will outline here:

  • The manuscript states on page 2 that the selection of the top-ranking 25 journals in the WoS Archaeology category was based on “their h-indices as reported by Clarivate’s Journal Citation Indicator”. However, neither the code of the Quarto document nor the dataset itself mentions h-indices; the filter is actually based on the 2022 Impact Factor of the journals. This is an error in the text of the manuscript, almost a typo, since it changes nothing in the results, but ultimately the only way to realize that the list of journals is based on the 2022 Impact Factor and not on h-indices is by examining the data and code.

  • The WoS Impact Factor dataset contains one “< 0.1” and one “NA” value, which end up among the top 25 rows when the dataset is arranged in descending order on this variable. When the IF values of these two journals are set to 0, the list of 25 journals should have included Journal of African Archaeology and World Archaeology (Table 1). Both journals would have met the criteria for the final list even after applying the threshold of at least 100 papers in WoS, thereby extending this list to 22 journals.

  • Shannon’s index is incorrectly calculated in Marwick’s code and should therefore not be compared with the data presented in Fanelli and Glänzel (2013). Although the index is accurately described in the comments of the code, the code itself computes a Shannon index over the references instead of over their sources. Specifically, it divides p_i, the number of times a reference appears in an article (which is always one, as each reference is listed only once per article), by the total number of citations of that reference in the entire dataset. Instead, it should calculate the Shannon index of the sources of the references. Additionally, the text of the manuscript is also misleading in mentioning “The diversity of references” where it should be “The diversity of sources”, as presented in Fanelli and Glänzel (2013).
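The distinction can be made concrete with a small Python sketch (mine, not Marwick’s R code; the function name and inputs are hypothetical). The Shannon index H = -Σ p_i ln p_i is computed over the journals (sources) from which an article’s references are drawn; if each reference is instead treated as its own category with p_i = 1/N, the index degenerates to ln(N) and no longer reflects source diversity:

```python
import math
from collections import Counter

def shannon_sources(reference_sources):
    """Shannon diversity over the *sources* (journals) of an
    article's cited references: H = -sum(p_i * ln p_i), where
    p_i is the share of references coming from source i."""
    counts = Counter(reference_sources)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Four references drawn from only two journals -> low diversity:
refs = ["J. Archaeol. Sci.", "J. Archaeol. Sci.",
        "J. Archaeol. Sci.", "Antiquity"]
print(round(shannon_sources(refs), 3))  # 0.562

# Treating each *reference* as its own category would always give
# ln(4) ~= 1.386 here, regardless of how diverse the sources are.
```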

  • The shared code does not contain the code to produce Figure 2 of the manuscript. Version 1.3 of the code loads a pre-existing .png from the figures folder. The code to produce a very similar figure is present in the version 1.1 of the code, but it is not exactly the same.

Other issues are small code errors, which I reported by pushing a commit to the GitHub repository of the original article.

Replication of the bibliometric study with data from OpenAlex and OpenCitations

After reproducing the published results, I realized that the Web of Science data for archaeology was limited. Marwick had to use the full dataset for archaeology, from 1975 to 2025, to obtain a sufficient number of articles (9697), because keeping only the year 2012, as in Fanelli and Glänzel (2013), would have meant running the analysis on only 303 articles. He also had to restrict the analysis to just 20 journals to keep only journals with at least 100 published articles. With this in mind, and given that WoS data is not accessible to everyone due to its commercial status, I decided to conduct the same analysis using the larger, open-source dataset provided by OpenAlex (Priem et al., 2022), along with data from OpenCitations (Peroni and Shotton, 2020). This approach would allow me to determine whether similar results could be obtained with a more extensive dataset and whether freely accessible data could support the same type of research.

OpenAlex, as described on the website of its creator, the nonprofit company OurResearch, is an “open and comprehensive catalog of scholarly papers, authors, institutions, and more”. Established in 2021, it is a free, open source, and open access bibliographic database that can serve as an alternative to commercial databases and is already supported by many public institutions (e.g. Badolato, 2024; Jack, 2023; OurResearch team, 2021; Singh Chawla, 2022). The OpenAlex database has a much broader scope than Web of Science and its dataset is significantly larger (Alperin et al., 2024; Culbert et al., 2025). This can be particularly crucial for archaeology, as the vast majority of references cited in publications from the History & Archaeology field (from the OECD classification) are not identifiable in WoS (Figure 5 in Andersen, 2023). However, caution must be exercised with the OpenAlex dataset, as some metadata are still relatively poorly documented (Alperin et al., 2024). This necessitates additional filtering of the OpenAlex dataset to focus on usable data rather than the entire dataset.

OpenCitations is “an infrastructure organization […] dedicated to the publication of open citation data […], thereby providing a disruptive alternative to traditional proprietary citation indexes” (Peroni and Shotton, 2020). It provides the connections between scientific publications and a limited amount of information on each work, unlike OpenAlex: for example, there is no information on the scientific field of a work, and it is not possible to filter publications by journal or by author. The data from OpenCitations will be used in this work only on the subset of articles registered both in OpenAlex and in the Web of Science dataset provided by Marwick (2025), for direct comparison of some of the bibliometric indices accessible in the three datasets.

Table 1 - Table presenting the issues with the list of journals selected in Marwick (2025). In the first column, two journals entered the top 25 only because of data issues (the “< 0.1” and “NA” IF values) and should not have been in the list. In the second column, journals removed for having fewer than 100 papers in WoS are marked with their paper count (n). “—” marks a journal absent from that column’s list.

| Top 25 IF WoS (Marwick 2025) | 20 journals used (Marwick 2025) | 22 journals that should have been used (this work) |
|---|---|---|
| Advances in Archaeological Practice | Advances in Archaeological Practice | Advances in Archaeological Practice |
| African Archaeological Review | African Archaeological Review | African Archaeological Review |
| American Antiquity | American Antiquity | American Antiquity |
| Antiquity | Antiquity | Antiquity |
| Archaeological and Anthropological Sciences | Archaeological and Anthropological Sciences | Archaeological and Anthropological Sciences |
| Archaeological Dialogues | Archaeological Dialogues (94) | — |
| Archaeological Prospection | Archaeological Prospection | Archaeological Prospection |
| Archaeological Research in Asia | Archaeological Research in Asia | Archaeological Research in Asia |
| Archaeometry | Archaeometry | Archaeometry |
| Environmental Archaeology | Environmental Archaeology | Environmental Archaeology |
| European Journal of Archaeology | European Journal of Archaeology | European Journal of Archaeology |
| Geoarchaeology | Geoarchaeology | Geoarchaeology |
| — | — | Journal of African Archaeology |
| Journal of Anthropological Archaeology | Journal of Anthropological Archaeology | Journal of Anthropological Archaeology |
| Journal of Archaeological Method and Theory | Journal of Archaeological Method and Theory | Journal of Archaeological Method and Theory |
| Journal of Archaeological Research | Journal of Archaeological Research | Journal of Archaeological Research |
| Journal of Archaeological Science | Journal of Archaeological Science | Journal of Archaeological Science |
| Journal of Archaeological Science Reports | Journal of Archaeological Science Reports | Journal of Archaeological Science Reports |
| Journal of Cultural Heritage | Journal of Cultural Heritage | Journal of Cultural Heritage |
| Journal of Field Archaeology | Journal of Field Archaeology | Journal of Field Archaeology |
| Journal of Historic Buildings and Places | Journal of Historic Buildings and Places (0) | — |
| Journal of Island & Coastal Archaeology | Journal of Island & Coastal Archaeology | Journal of Island & Coastal Archaeology |
| Journal of World Prehistory | Journal of World Prehistory (63) | — |
| Lithic Technology | Lithic Technology (60) | — |
| Mediterranean Archaeology & Archaeometry | Mediterranean Archaeology & Archaeometry | Mediterranean Archaeology & Archaeometry |
| Transactions of the Ancient Monument Society | Transactions of the Ancient Monument Society (2) | — |
| — | — | World Archaeology |
Data extraction from OpenAlex

Journals’ data extraction

To extract data on journals (and on works in the next section), I used openalexR, “an R package to interface with the OpenAlex API” (Aria et al., 2024).

Unfortunately, obtaining all the journals from a subfield of OpenAlex is not feasible, as ‘journals’ are not categorized by fields or subfields in OpenAlex, unlike ‘works’ (OpenAlex uses the term ‘work’ to encompass all types of scientific production).

Consequently, I used the list of archaeological journals from Web of Science to retrieve the journals’ information from OpenAlex, as this list probably contains the largest journals, i.e. those likely to appear in a top 25 list.

When querying OpenAlex with the list of 108 journal names from WoS, I received only 38 results. This is due to variations in journal names, such as the use of capital letters, dashes, etc. After manually adjusting the names to match those in OpenAlex for the journals used in Marwick’s top 25 and top 20 lists, I received 69 results, including all the journals from these lists.

To gather information from OpenAlex for as many journals as possible, I had to check each journal individually by its name or sometimes its ISSN. This was necessary because special characters have been removed from journal names in the WoS dataset, many journals whose title begins with “The” appear in the WoS dataset without the “The”, and other similar issues. For example, a journal called ‘Hesperia’ in WoS is called ‘Hesperia The Journal of the American School of Classical Studies at Athens’ in OpenAlex. Unfortunately, this task of extracting journals is not straightforward, and it would be much more efficient if journals in OpenAlex were also assigned fields and subfields, so that the information could be gathered in a single request. Ultimately, I successfully retrieved data through openalexR and the API for 85 of the 108 journals listed in WoS. However, increasing the count from 69 to 85 did not alter the top 25, and the still-missing journals, not being major journals, would likely not have entered the top 25 either.
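The kind of name normalization needed to reconcile the two databases can be sketched as follows. This is a hypothetical Python helper illustrating the idea (lowercasing, mapping “&” to “and”, dropping a leading “The”, stripping punctuation), not the matching procedure actually used; ambiguous cases would still need to be resolved by ISSN:

```python
import re

def normalize_title(title):
    """Crude journal-name normalization: lowercase, map '&' to
    'and', drop a leading 'The', strip punctuation, collapse
    whitespace."""
    t = title.lower().strip()
    t = t.replace("&", " and ")
    t = re.sub(r"^the\s+", "", t)
    t = re.sub(r"[^a-z0-9\s]", " ", t)   # drop dashes, dots, etc.
    t = re.sub(r"\s+", " ", t).strip()
    return t

# The WoS and OpenAlex renderings of the same journal then agree:
print(normalize_title("Journal of Island & Coastal Archaeology")
      == normalize_title("The Journal of Island and Coastal Archaeology"))
# True
```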

Articles’ data extraction

Using the API via web browser

I attempted to download all articles from OpenAlex in the subfield of Archaeology (number 1204) by accessing the API through an internet browser at this address: https://api.openalex.org/works?filter=type:article,from_publication_date:1975-01-01,to_publication_date:2025-12-31,topics.subfield.id:1204. This returns a gigantic list of more than 1.8 million references that can only be viewed 25 entries at a time. Downloading it as JSON yields only a single page of the first 25, or at best 100, results. This is not feasible manually. For such large queries, the OpenAlex team recommends downloading their full dataset, a 300 GB JSON file. I did not try this route, however, because of my lack of experience in manipulating large JSON files.
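For reference, the OpenAlex API does support paging through large result sets programmatically via its documented cursor mechanism: each JSON response carries a `meta.next_cursor` value to pass back on the next request. The small Python helper below is my own sketch of this, not code used in the study:

```python
import json
import urllib.parse
import urllib.request

BASE = "https://api.openalex.org/works"

def build_url(filters, cursor="*", per_page=100):
    """Build a works query URL using OpenAlex cursor paging."""
    params = {"filter": filters, "per-page": per_page, "cursor": cursor}
    return BASE + "?" + urllib.parse.urlencode(params)

def iter_works(filters):
    """Yield works page by page until next_cursor is exhausted."""
    cursor = "*"
    while cursor:
        with urllib.request.urlopen(build_url(filters, cursor)) as r:
            page = json.load(r)
        yield from page["results"]
        cursor = page["meta"].get("next_cursor")  # None on last page

# Example usage (performs many network requests when run):
# flt = ("type:article,from_publication_date:1975-01-01,"
#        "to_publication_date:2025-12-31,topics.subfield.id:1204")
# for work in iter_works(flt):
#     ...
```

Even with paging, walking through the full 1.8-million-work result set this way would be slow, which is why the OpenAlex team recommends the snapshot download for queries of this size.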

Using the openalexR package

When requesting works with openalexR, it is possible to use an entire subfield ID, in this case ‘1204’ for ‘Archaeology’. Simply counting the number of articles in the subfield Archaeology in OpenAlex between 1975 and 2025 yields a result of 3,096,906. Given the size of this sample, I downloaded it only once, simplified it to keep only the necessary fields, and cleaned it of duplicates. I then saved this data for potential later use, and I also extracted from this enormous dataset a subset keeping only the articles from the top 25 journals based on their 2-years-mean-citedness. This subset contains 20,551 articles.

To replicate Marwick’s work, I also directly extracted from OpenAlex the metadata of all the articles published in the same list of journals. It is very interesting to observe that this extraction leads to a dataset of 33,395 articles after cleaning, much larger than the previous subset. This is because many articles from these journals are classified not under the subfield ‘Archaeology’ but under ‘Anthropology’, ‘Geophysics’, ‘Classics’, ‘History’, etc. A telling example is to compare the number of articles published in the journal Advances in Archaeological Practice in 2025 according to OpenAlex, which is 43, with the number of articles with the subfield ‘Archaeology’ in this same journal for this same year, which is 5. For the sake of replicating Marwick’s article, I kept the dataset containing all the articles published in the top 25 journals for the rest of this article.

To these article metadata, I added the metadata of the papers cited by each article. This step required approximately 20 hours, as it involves submitting an individual API request for each article, and the duration of each request varies with the amount of information to retrieve, which depends on the number of references in the article. The data extracted this way contains many inconsistencies, errors, special characters, etc., and I had to clean and filter it, as visible in the code used to produce this document.

Despite the availability of other journal-level metrics in OpenAlex, such as the H-index and the i10 index, I did not extract article data or produce the graphics for top 25 journal lists based on these metrics. This is because they are strongly correlated with the seniority of the journals, as when they are used to compare individual researchers (Hirsch, 2005; Kozak and Bornmann, 2012).

Comparison between WoS and OpenAlex

Journals’ comparison

The fact that there are only 108 journals in the Archaeology category of the Web of Science database is unfortunately difficult to compare directly with OpenAlex, since it is not possible to extract all the journals of the subfield ‘Archaeology’ from the open database. I therefore filtered the OpenAlex dataset of all articles published between 1975 and 2025 with the subfield ‘Archaeology’ (ca. 1,800,000 articles) and grouped them by source. This treatment reveals that 2314 sources are registered in OpenAlex with at least 100 articles in the subfield ‘Archaeology’, including some data repositories or public archives such as Zenodo, HAL, etc. The same treatment with limits of 500 and 1000 articles yields 337 and 122 sources respectively. This difference is certainly at least partly due to the fact that being indexed in WoS requires an application by the journal and validation by Clarivate, the company that owns WoS.
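The grouping-and-threshold treatment described above can be sketched as a simple count over simplified article records. The Python below is illustrative only, with a hypothetical 'source' field standing in for OpenAlex’s source metadata:

```python
from collections import Counter

def sources_with_min_articles(articles, threshold):
    """Count articles per source and keep sources with at least
    `threshold` articles. Each article is a dict with a 'source'
    key (hypothetical simplified record)."""
    counts = Counter(a["source"] for a in articles if a.get("source"))
    return {s: n for s, n in counts.items() if n >= threshold}

articles = ([{"source": "Antiquity"}] * 120
            + [{"source": "Zenodo"}] * 150
            + [{"source": "Small Journal"}] * 40)
print(sources_with_min_articles(articles, 100))
# {'Antiquity': 120, 'Zenodo': 150}
```

Note how, as in the real dataset, repositories such as Zenodo pass the threshold alongside journals, which is why the source counts are not directly comparable to WoS journal counts.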

Once the journals’ information is extracted from OpenAlex based on the list of archaeological journals from WoS, the two datasets can be compared at the journal level for 85 of the 108 journals.

The WoS dataset is limited in terms of papers listed when comparing journals present in both databases. It is interesting to demonstrate this by looking at the journals removed from Marwick’s top 25 list because they had fewer than 100 articles. Archaeological Dialogues (94 papers in WoS), Journal of World Prehistory (63 papers in WoS), and Lithic Technology (60 papers in WoS) have respectively 732, 354, and 893 papers in the OpenAlex database. I do not know why the WoS dataset is so small even for journals that are on the list.

The metadata of the two datasets also differ. OpenAlex provides many more variables and much more information on the selected works or journals than WoS. The main issue with both datasets is the lack of information regarding books, book chapters, monographs, and grey literature, both as scientific production recorded in the database and as references cited in the articles.

While the size of the OpenAlex dataset can be seen as an advantage due to its broader representation of diverse sources, it may also introduce noise, such as poor-quality data or duplicates. Since both datasets have their issues, I cleaned the OpenAlex data as much as possible, as Marwick did for his.

Since both datasets provide the same metric, 2-years-mean-citedness (2ymc), which is equivalent to the Impact Factor, the top 25 journals from WoS can be compared with the top 25 from OpenAlex. It is important to note that this metric is calculated by each database from its own data. OpenAlex, for example, calculates the 2ymc of each journal based solely on the articles in its dataset, counting the references in its dataset that cite each article; it does not rely on an external source to provide the 2ymc of a journal or even the citation count of an article. This leads to at least one issue that I was able to spot in the OpenAlex top 25 list of archaeological journals by 2ymc: in the original dataset, the journal Archaeofauna ranks second with a 2ymc above 4, which is very high for such a specialized journal. The 2ymc of Archaeofauna is artificially inflated: OpenAlex extracted the references of each 2023 paper from the PDFs of the articles, which all contain the table of contents of the volume with the DOIs. Each paper of the volume is thus considered to cite every other paper of the volume, creating numerous false citations. I removed this journal from the top 25 and added the journal ranked 26th to the list.
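Under this definition (citations in year Y to articles published in years Y-2 and Y-1, divided by the number of those articles), the 2ymc can be sketched as follows; the record structure and field names are hypothetical simplifications of the database metadata:

```python
def two_year_mean_citedness(articles, year=2024):
    """2-years-mean-citedness: citations received in `year` by
    articles published in the two preceding years, divided by
    the number of such articles. Each article is a dict with
    'pub_year' and 'citations_by_year' (hypothetical records)."""
    window = (year - 2, year - 1)                 # e.g. 2022, 2023
    recent = [a for a in articles if a["pub_year"] in window]
    if not recent:
        return float("nan")
    cites = sum(a["citations_by_year"].get(year, 0) for a in recent)
    return cites / len(recent)

arts = [
    {"pub_year": 2022, "citations_by_year": {2024: 3}},
    {"pub_year": 2023, "citations_by_year": {2024: 5}},
    {"pub_year": 2019, "citations_by_year": {2024: 9}},  # outside window
]
print(two_year_mean_citedness(arts))  # 4.0
```

The sketch also makes clear how false citations, as in the Archaeofauna case, inflate the numerator directly.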

The two lists are at once similar and quite different. Some specific points can be outlined:

  • As in the WoS list, all 25 journals are English-language journals. They are published by 9 different publishers: Springer Nature, Wiley, Elsevier, Cambridge University Press, SAGE, Taylor & Francis, Antiquity Publications, De Gruyter Brill, and Equinox Publishing.

  • American Antiquity is missing from the list because it has a rather low 2ymc in OpenAlex and is therefore ranked 41st. This is due to the fact that all the book reviews published in this journal are counted as articles in OpenAlex, strongly lowering the ratio between citations and the number of published articles. If those book reviews were removed, the 2ymc of American Antiquity would be 2.33 instead of 0.83, placing it in 8th position.

  • All 25 journals have far more than 100 works in the OpenAlex dataset, so all of them can be kept under Marwick’s rule of retaining only journals with more than 100 articles for further analysis. The minimum here is 342 works, for Journal of Archaeological Research.

  • Strong discrepancies (>15-20 ranks) between the 2ymc rankings from OpenAlex and the IF rankings from Web of Science can be detected for 10 journals: The International Journal of Nautical Archaeology, Advances in Archaeological Practice, Ancient Mesoamerica, Cambridge Archaeological Journal, Journal of Social Archaeology, Azania Archaeological Research in Africa, Open Archaeology, Levant, Journal of World Prehistory, and Journal of Mediterranean Archaeology. These differences go in both directions and show that the calculation of the 2ymc varies strongly between databases. As a reminder, this metric is calculated, for a journal, as the number of citations in 2024 of articles published in the journal in 2022 and 2023, divided by the number of articles published in 2022 and 2023. I think these strong differences can be interpreted as issues in the databases concerning the references, rather than the number of published articles, which is a much easier value to record.

  • Since the OpenAlex dataset also contains other journal metrics, the H-index and the i10 index, I also produced top 25 rankings based on these metrics (Table 2). The H-index is defined as the number of papers (h) having each received at least h citations (Hirsch, 2005). The i10 index, initially created by Google Scholar, is the number of articles that have been cited at least 10 times. These rankings are strongly dissimilar to the 2ymc ranking; they are less impacted by recent trends, more representative of long-term publication habits in archaeology, and place the historical journals of archaeology at the top.

  • The top 25 2ymc journals from OpenAlex in this manuscript also differ considerably from the list presented in the previous version of this manuscript (June 2025), showing that these values can change within a few months. Some journals present in this list were completely absent 6 months ago (e.g. Ancient Mesoamerica, Azania Archaeological Research in Africa), and despite a quick check of the data, I did not spot any clear reason for this. Conversely, the top-ranked journal of 6 months ago, in version 3 of this manuscript, Australian Archaeology, has disappeared from the list, and I discovered why it held that position: the journal published a book review of “The Dawn of Everything” (Flexner, 2022) which has 1137 citations in OpenAlex and artificially inflates the 2ymc of this journal. I think most of these citations should relate to the reviewed book itself and not to the review. Here again, we can see that the references in OpenAlex are not as clean as one could hope.

Table 2 - Top 25 journals for 2-years-mean-citedness (2ymc) in the OpenAlex dataset in December 2025, their publisher, and the ranking of the same journals in the WoS dataset used by Marwick (2025). Works counts are given in parentheses.

| Rank (by 2ymc) | Journal (works count) | Publisher | Ranking with WoS IF |
|---|---|---|---|
| 1 | Journal of Archaeological Research (342) | Springer Nature | 1 |
| 2 | Journal of Cultural Heritage (3169) | Elsevier | 2 |
| 3 | The International Journal of Nautical Archaeology (3756) | Wiley | 68 |
| 4 | Journal of Anthropological Archaeology (1548) | Elsevier | 9 |
| 5 | Advances in Archaeological Practice (501) | Cambridge University Press | 23 |
| 6 | Journal of Archaeological Method and Theory (853) | Springer Nature | 5 |
| 7 | Journal of Archaeological Science (7472) | Elsevier | 4 |
| 8 | Ancient Mesoamerica (1086) | Cambridge University Press | 51 |
| 9 | Archaeological and Anthropological Sciences (2315) | Springer Nature | 6 |
| 10 | Cambridge Archaeological Journal (1632) | Cambridge University Press | 27 |
| 11 | Journal of Social Archaeology (434) | SAGE | 28 |
| 12 | Azania Archaeological Research in Africa (1365) | Taylor & Francis | 31 |
| 13 | Antiquity (14505) | Antiquity Publications | 11 |
| 14 | Archaeological Prospection (1262) | Wiley | 12 |
| 15 | Lithic Technology (893) | Taylor & Francis | 14 |
| 16 | Open Archaeology (463) | De Gruyter Brill | 45 |
| 17 | Archaeological Dialogues (732) | Cambridge University Press | 10 |
| 18 | Environmental Archaeology (967) | Taylor & Francis | 21 |
| 19 | Geoarchaeology (2419) | Wiley | 15 |
| 20 | Journal of Archaeological Science Reports (5109) | Elsevier | 16 |
| 21 | Levant (1176) | Taylor & Francis | 41 |
| 22 | Journal of World Prehistory (354) | Springer Nature | 7 |
| 23 | The Journal of Island and Coastal Archaeology (655) | Taylor & Francis | 13 |
| 24 | Journal of Mediterranean Archaeology (544) | Equinox Publishing | 42 |
| 25 | Archaeometry (3196) | Wiley | 18 |

The list of the top cited journals from the OpenAlex dataset (Table 3) shows that the most cited journal, Journal of Archaeological Science, is cited more than twice as often as the second one, American Antiquity, and more than three times as often as the third one, Antiquity. This highlights the importance of this journal in the community and may partly explain the low values of Shannon’s index. It is also interesting to note the presence of highly reputable generalist journals, with Science and Nature in positions 5 and 6 respectively, and even PNAS in position 16.

This table of the most cited journals differs significantly from the same table in Marwick (2025) (Table 4, calculated in Marwick’s code but not shown in the published article). That table also ranks Journal of Archaeological Science first, followed by American Antiquity with half the citations, as in the OpenAlex data. Further down the list, however, the order changes considerably: Archaeometry, for instance, is 4th in OpenAlex but only 10th in WoS. Nature is in 15th position in the WoS table and PNAS in 9th, whereas they rank 6th and 16th respectively in the OpenAlex table; Quaternary International is 5th with the WoS dataset but 9th in the OpenAlex table, etc. This indicates that the differences between both datasets regarding references are relatively significant, which explains the variations in the Diversity of sources results. This discrepancy could be due to the recency of the WoS dataset (70% of the articles are post-2012), as seen for example with the presence of the relatively new outlets PLoS ONE (created in 2006) and Journal of Archaeological Science: Reports (created in 2015) in the top 20 sources.

Table 3 - Top 20 journals from OpenAlex dataset (journal names as normalised in the dataset)

| Rank | Journal | N. citations |
|---|---|---|
| 1 | journalofarchaeologicalscience | 62611 |
| 2 | americanantiquity | 24756 |
| 3 | antiquity | 17691 |
| 4 | archaeometry | 15720 |
| 5 | science | 12520 |
| 6 | nature | 12091 |
| 7 | currentanthropology | 11634 |
| 8 | radiocarbon | 10602 |
| 9 | quaternaryinternational | 10567 |
| 10 | journaloffieldarchaeology | 10557 |
| 11 | man | 10440 |
| 12 | journalofanthropologicalarchaeology | 10298 |
| 13 | americanjournalofphysicalanthropology | 10163 |
| 14 | worldarchaeology | 9855 |
| 15 | journalofhumanevolution | 9746 |
| 16 | proceedingsofthenationalacademyofsciences | 9013 |
| 17 | americananthropologist | 7960 |
| 18 | americanjournalofarchaeology | 6839 |
| 19 | quaternarysciencereviews | 6703 |
| 20 | journalofculturalheritage | 6135 |


Table 4 - Top 20 journals from Web of Science dataset (journal names as abbreviated in the dataset)

| Rank | Journal | N. citations |
|---|---|---|
| 1 | jarchaeolsci | 24814 |
| 2 | amantiquity | 12718 |
| 3 | antiquity | 7447 |
| 4 | janthropolarchaeol | 6100 |
| 5 | quaternint | 4996 |
| 6 | curranthropol | 4983 |
| 7 | worldarchaeol | 4754 |
| 8 | science | 4733 |
| 9 | pnatlacadsciusa | 4615 |
| 10 | archaeometry | 4477 |
| 11 | thesis | 4222 |
| 12 | radiocarbon | 4177 |
| 13 | jfieldarchaeol | 3733 |
| 14 | jarchaeolscirep | 3614 |
| 15 | nature | 3561 |
| 16 | amjphysanthropol | 3444 |
| 17 | jarchaeolmethodth | 3376 |
| 18 | jhumevol | 3296 |
| 19 | amanthropol | 3177 |
| 20 | plosone | 3154 |


Articles’ comparison

At the level of the articles, I compared the datasets from OpenAlex, Web of Science, and OpenCitations, for four subsets: all works in the archaeology field, all articles in the same field, the articles present in the top-20 or top-25 journals ranked by their 2ymc, and the articles represented in all three datasets based on identical DOIs (Table 5). As previously demonstrated at a broader level than archaeology alone, the OpenAlex dataset is significantly larger than the WoS dataset (Alperin et al., 2024; Culbert et al., 2025), at every level.

Only articles from OpenAlex were extracted for this study, for the sake of comparison with Marwick’s work. This, of course, does not fully represent the scientific production of the discipline. A similar study encompassing all the sources of archaeological publications would of course be even more interesting, but is beyond the scope of this article.

The WoS dataset of archaeological articles is much smaller than the OpenAlex dataset for the same field: 28 871 compared to 2 050 903. OpenAlex aggregates data from a much wider diversity of sources. However, even this number of archaeological articles in OpenAlex is largely underestimated, since many articles published in archaeological journals are not indexed with the subfield ‘Archaeology’. I could not evaluate the number of references in OpenCitations for a specific field. The number of articles kept for the analysis in this article, which come only from the top-25 journals for OpenAlex and the top-20 for WoS, is also much larger for OpenAlex: 33 395 vs. 9 697. The two datasets also differ significantly in the distribution of publication years (Wilcoxon test, p << 0.01): OpenAlex integrates both more recent and much older articles (Figure 1).

Table 5 - Summary of basic statistics for the three datasets. Values for authors, pages, and references are given as mean (median).

For Archaeology and for the selected journals:

| Dataset | N documents | N articles | N articles (selected journals) | Authors | Pages | References |
|---|---|---|---|---|---|---|
| OpenAlex | 3 096 906 | 2 050 903 | 33 395 | 2.96 (2) | 12.60 (11) | 38.03 (28) |
| Web of Science | - | 28 871 | 9 697 | 3.42 (3) | 15.96 (15) | 69.00 (59) |
| OpenCitations | - | - | - | - | - | - |

For the articles common to the three datasets:

| Dataset | N articles | Authors | Pages | References |
|---|---|---|---|---|
| OpenAlex | 3 665 | 3.44 (3) | 15.17 (13) | 63.21 (51) |
| Web of Science | 3 665 | 3.43 (3) | 16.17 (14) | 73.34 (60) |
| OpenCitations | 3 665 | 3.42 (3) | 16.19 (14) | 31.06 (25) |


Figure 1 - Comparison of the distribution of articles by year in Web of Science (WoS) and OpenAlex.

As for other basic statistics such as the number of authors and pages, Table 5 shows that the values can differ between datasets, but they are very similar when the dataset is restricted to the articles common to all of them. This last subset, based on shared DOIs and therefore on indexed journals only, shows that good metadata is achievable in specific cases. On the other hand, the OpenAlex dataset also shows in archaeology the same limitations regarding some metadata that have been previously demonstrated (Alperin et al., 2024; Culbert et al., 2025), especially the list of references cited in the articles. I also show here that OpenCitations has even fewer cited references (Table 5). When looking more closely into the datasets, I observe that the WoS dataset is cleaner than the OpenAlex dataset: for example, I had to remove duplicates and papers lacking author information, page numbers, reference lists, etc. from OpenAlex (see line 252 in the quarto document). The WoS dataset is not without its issues either. Upon examining the data produced during the preparation of Marwick’s manuscript, problems remain even after cleaning by his code, due to discrepancies in the references’ structure. For instance, in the first 4 lines, entries like “11swmuspap” or “1964uclaarchsurv” appear as journals, which, of course, will not match other mentions of these same journals because of the remaining numbers. Additionally, even in the top-cited journals used for calculating the Shannon’s indices, there are problems in the WoS dataset: the list includes entries such as “[anonymous]thesis”, “” (empty cells), “notitlecaptured”, etc.

When comparing the data for articles present in the three datasets (based on identical DOIs), the number of articles is only 3665. The similarity between the datasets is very strong for the number of authors, the number of pages, and the year of the article (Figure 2A, C, and D). On the other hand, the length of the title, one of the metrics used later in the study, is significantly longer in WoS than in OpenAlex and OpenCitations (Figure 2B). This is because, for some articles (especially in the journal Advances in Archaeological Practice, but this is also true of some articles from other journals), OpenAlex distinguishes between title and subtitle, while WoS merges the two parts of the title when counting the number of words. The year of publication is also very similar between OpenAlex and OpenCitations, but often slightly earlier than the year of publication registered in Web of Science. This is especially the case for relatively recent articles, published after 2000, for which the article was published online at the end of year X but officially attributed to a volume in year X+1. The main issue is clearly the number of references, which is the weak point of OpenAlex already mentioned in the literature (Culbert et al., 2025), and it is even lower in OpenCitations (Figure 2E).
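The intersection of the three datasets on DOIs can be sketched as follows. The records and DOIs below are hypothetical, for illustration only; the only non-obvious point is that DOIs are case-insensitive and should be normalised before comparison:

```python
# Sketch of matching articles across three bibliographic datasets by DOI.
# All DOIs and reference counts are hypothetical, for illustration only.

openalex = {"10.1016/example.2020.001": 58, "10.1017/example.2019.007": 0}
wos = {"10.1016/EXAMPLE.2020.001": 61, "10.2307/example.123": 40}
opencitations = {"10.1016/example.2020.001": 30}

def normalise(doi):
    # DOIs are case-insensitive, so lower-case and trim before comparing.
    return doi.strip().lower()

common = (
    {normalise(d) for d in openalex}
    & {normalise(d) for d in wos}
    & {normalise(d) for d in opencitations}
)
print(sorted(common))  # ['10.1016/example.2020.001']
```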

Finally, as of the extraction of the data for this analysis (January 2026), and after automatically cleaning the dataset extracted from OpenAlex, there are 33395 unique articles from 1975 to 2025 in the top 25 journals (identified by their 2-years-mean-citedness), for which the necessary variables to replicate Marwick’s results are available.

Among these, 4508 papers have zero references, 519 have only 1, and 534 only 2. Manual checking of some of these articles shows that these values are incorrect. Given the issues with references listed in OpenAlex, I think that some metrics calculated from this dataset will not be accurate. This is particularly the case for the diversity of sources, since so many sources are not considered at all, and the missing references certainly do not all come from a single journal. One can even suspect that big journals have their references correctly recorded with their DOI, while books, book chapters, and conference proceedings are probably less well referenced due to the absence of such permanent identifiers. This will of course lead the final diversity of sources to be strongly underestimated, which is probably also the case for the Web of Science data. Given that the extraction of all this data also allows for ranking the top-cited references, I present in Table 6 the 20 papers that receive the most citations in the dataset. The list indicates that the most cited references are primarily methodological (radiocarbon and isotopes) or theoretical articles, and sourcebooks, rather than case studies.

Figure 2 - Comparison of Web of Science (Wos), OpenAlex (OA), and OpenCitations (OC) data for the articles present in the three datasets. A. Number of authors, B. Length of the title, C. Number of pages, D. Attributed year, E. Number of references. For each plot, a dashed-line represents y = x.

Table 6 - Top 20 references cited in the OpenAlex dataset

| Rank | Article | N. citations |
|---|---|---|
| 1 | IntCal13 and Marine 13 radiocarbon age calibration curves 0-50,000 years cal BP | 1869 |
| 2 | Bayesian Analysis of Radiocarbon Dates | 647 |
| 3 | Postmortem preservation and alteration of in vivo bone collagen isotope ratios in relation to palaeodietary reconstruction | 417 |
| 4 | Willow Smoke and Dogs’ Tails: Hunter-Gatherer Settlement Systems and Archaeological Site Formation | 353 |
| 5 | Influence of diet on the distribution of nitrogen isotopes in animals | 340 |
| 6 | Extended 14C Data Base and Revised CALIB 3.0 14C Age Calibration Program | 335 |
| 7 | Nitrogen and carbon isotopic composition of bone collagen from marine and terrestrial animals | 334 |
| 8 | Bones: Ancient Men and Modern Myths | 331 |
| 9 | Experimental Evidence for the Relationship of the Carbon Isotope Ratios of Whole Diet and Dietary Protein to Those of Bone Collagen and Carbonate | 313 |
| 10 | Pottery Analysis: A Sourcebook | 302 |
| 11 | Preparation and characterization of bone and tooth collagen for isotopic analysis | 293 |
| 12 | IntCal09 and Marine09 Radiocarbon Age Calibration Curves, 0–50,000 Years cal BP | 273 |
| 13 | R: A Language and Environment for Statistical Computing | 270 |
| 14 | Influence of diet on the distribution of carbon isotopes in animals | 264 |
| 15 | The IntCal20 Northern Hemisphere Radiocarbon Age Calibration Curve (0–55 cal kBP) | 260 |
| 16 | Advances in Archaeological Method and Theory | 256 |
| 17 | Bone Collagen Quality Indicators for Palaeodietary and Radiocarbon Measurements | 254 |
| 18 | New Method of Collagen Extraction for Radiocarbon Dating | 252 |
| 19 | Standards for Data Collection from Human Skeletal Remains | 248 |
| 20 | Prehistoric Human Bone: Archaeology at the Molecular Level | 242 |


Replicating Marwick’s results with OpenAlex data

The goal here is to replicate figures 1 to 4 from Marwick (2025) using data from OpenAlex. This requires some effort to prepare the extensive list of papers, including the information on the references they cite. The entire code of Marwick (2025) can then be executed with only a few modifications.

How does archaeology compare to other fields?

Figure 3 presents boxplots for archaeological journals (in black) that are quite similar to those in Marwick (2025). In my study, the number of authors and article length are closer to social sciences than to physics. The relative title length is very similar to the WoS data, although it is again slightly closer to social sciences. The recency of references is more akin to humanities. The diversity of sources, calculated correctly with the OpenAlex data, is lower in archaeological journals than in the physics data from Fanelli and Glänzel (2013). Low values of Shannon’s index indicate that articles from archaeological journals cite a limited number of different sources, which is typically interpreted as a characteristic of hard sciences (Fanelli and Glänzel, 2013). Nevertheless, as written above, this metric is probably strongly underestimated in the OpenAlex dataset, given the many references missing from the articles’ metadata (books, book chapters, conference proceedings, references in languages other than English, etc.).
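For reference, the Shannon diversity index over the sources cited by an article can be sketched as follows. This is a minimal illustration of the index itself, with hypothetical reference counts, not a reproduction of Marwick's full pipeline:

```python
# Sketch of Shannon's diversity index H' = -sum(p_i * ln(p_i)), where p_i is
# the share of an article's references going to source (journal) i.
# The reference counts below are hypothetical, for illustration only.
import math

def shannon_index(source_counts):
    """Shannon diversity of cited sources; higher = more diverse citing."""
    total = sum(source_counts.values())
    return -sum(
        (c / total) * math.log(c / total)
        for c in source_counts.values()
        if c > 0
    )

# An article citing 3 journals unevenly (10 + 5 + 5 references):
refs = {"journalofarchaeologicalscience": 10, "antiquity": 5, "radiocarbon": 5}
print(round(shannon_index(refs), 3))  # 1.04
```

Note how missing references mechanically lower the index: if the 5 citations to books or non-indexed outlets were dropped from the metadata, the remaining citations would be concentrated in fewer sources, which is exactly the underestimation suspected for the OpenAlex data.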

Figure 3 - Replication of figure 1 of Marwick (2025) with OpenAlex data. Distributions of article characteristics hypothesised to reflect the level of consensus. The boxplot shows the distribution of values of archaeology articles. Diversity of sources is shown in red because I suspect this value to be largely underestimated due to the lack of references metadata. The thick line in the middle of the boxplot is the median value, the box represents the inter-quartile range (the range between the 25th and 75th percentiles, where 50% of the data are located), and individual points represent outliers. The smaller coloured boxplots indicate the values computed by Fanelli and Glänzel (2013), where p = physics, s = social sciences, h = humanities. ln denotes the natural logarithm, or logarithm to the base e.

In the WoS dataset, the proportion of articles published after 2012 is 70 %, for only 13 years out of 50, i.e. 26 % of the studied time range. By contrast, post-2012 articles represent only 41 % of the OpenAlex dataset. Data for archaeology in figure 1 of Marwick (2025) is thus strongly skewed towards recent publication habits rather than truly representing trends from 1975 to 2025. In contrast, the OpenAlex data presented in Figure 3 is more representative of the entire time range.

Given that the OpenAlex dataset is larger than the WoS dataset, I replicated figure 1 from Marwick (2025) using only data from 2012 (Figure 4), as in Fanelli and Glänzel (2013). Marwick (2025) did not perform this analysis due to a small sample size (n = 303), but the OpenAlex dataset contains 1241 articles from 2012. I believe this is interesting because the calculated metrics do vary over time (Figure 2 in Marwick, 2025). Thus, comparing the 1975-2025 dataset of WoS with the 2012 data used by Fanelli and Glänzel (2013) could misrepresent archaeological publication tendencies and, consequently, the interpretation of archaeology as a hard/soft science.

Figure 4 shows very minor differences compared to Figure 3. The boxplots for all five calculated metrics only shrink slightly, but the positions relative to the other fields remain the same, with the same means, indicating that the data from 2012 may be representative of the entire 1975-2025 dataset.

Figure 4 - Replication of figure 1 of Marwick (2025) for 2012 articles only, as in Fanelli and Glänzel (2013), based on OpenAlex data. Distributions of article characteristics hypothesised to reflect the level of consensus. Diversity of sources is shown in red because I suspect this value to be largely underestimated due to the lack of references metadata. The thick line in the middle of the boxplot is the median value, the box represents the inter-quartile range (the range between the 25th and 75th percentiles, where 50% of the data are located), and individual points represent outliers. The smaller coloured boxplots indicate the values computed by Fanelli and Glänzel (2013), where p = physics, s = social sciences, h = humanities. ln denotes the natural logarithm, or logarithm to the base e.

In the end, with the OpenAlex dataset, publications of archaeological research follow trends that are quite different from those visible with the Web of Science dataset. They are mostly close to publishing practices in the social sciences, except for the length of articles, which is shorter and therefore closer to publications in the humanities. The diversity of sources is low, as in physics, but the dataset is not complete enough for this index to be calculated accurately.

How has the hardness of archaeology varied over time?

Regarding the evolution of hardness over time, the plots created with the OpenAlex data (Figure 5) are similar to those created with the WoS data (Figure 2 of Marwick, 2025). The only difference is the evolution of the relative length, which I found so close to 0 that I decided not to colour it green but to present it in grey, as a variable that does not evolve over time in the OpenAlex data.

Figure 5 - Replication of figure 2 of Marwick (2025) with OpenAlex data. Distribution of article characteristics for archaeology articles over time. Data points represent individual articles. The colour of the points indicates whether the overall trend is towards softer (orange) or harder (green) publication habits, or does not really change (grey).

How do archaeology journals vary in hardness?

Figure 6 is the equivalent of figure 4 of Marwick (2025) and represents the characteristics of the journals in the OpenAlex top 25 by 2-years-mean-citedness. However, it is worth remembering that the journals are not all the same in both figures. The review journal Journal of Archaeological Research stands out significantly from the other journals, featuring a high diversity of sources and notably long papers. In the same direction on PC1 but in the opposite direction on PC2, Journal of Archaeological Method and Theory, Journal of World Prehistory, and Journal of Social Archaeology are characterized by long articles with fewer authors, and references that are less diverse and less recent. Another part of the PCA represents behaviour more typical of hard science articles, with recent references, more authors, and shorter papers. This group includes Archaeological Dialogues (for which many papers are short comments or replies), Advances in Archaeological Practice, and Antiquity. Large teams of authors publishing rather short articles and citing less diverse sources are typical of Journal of Archaeological Science: Reports, Archaeological and Anthropological Sciences, and Archaeological Prospection. The landscape of journals dedicated to archaeological research is well established, with specific outlets dedicated to specific types of articles.

Figure 6 - Replication of the figure 4 of Marwick (2025) with OpenAlex data. Biplot of the first and second principal components of a PCA computed on the means of the five bibliometric variables for each journal in the sample. The arrows represent the correlation between each original variable and the principal components. The direction and length of the arrows indicate how strongly each variable contributes to each component.

Figure 7 - Replication of the figure 3 of Marwick (2025) with OpenAlex data. Panels A-E: Variation in bibliometric indicators of hardness for 25 archaeological journals. The journals are ordered for each indicator so that within each plot, the harder journals are at the top of the plot and the softer journals are at the base. Panel F shows a bar plot that is the single consensus ranking computed from all five variables, using the Borda Count ranking algorithm.

Figure 7 also presents interesting results even if the differences in the listed journals make it sometimes difficult to compare both this figure and the one produced with the WoS dataset:

  • Figure 7A shows a broadly similar ordering for the journals that appear both in the OpenAlex top 25 and in the WoS top 20. Geoarchaeology, Journal of Archaeological Method and Theory, and Journal of Anthropological Archaeology are the journals whose rankings differ the most.

  • Figure 7B-E illustrate that the data in OpenAlex have a much wider distribution than the data in WoS presented by Marwick (2025). This is largely due to the larger dataset extracted from OpenAlex for each journal.

  • Figure 7B-E show rankings for the length of articles, the relative length of articles, and the recency of references that are quite similar to the ones calculated by Marwick (2025). The journal ranked the most differently between the two datasets is probably The Journal of Island and Coastal Archaeology.

  • Figure 7F also generally matches the results from Marwick (2025), which confirms that, globally, both databases rank the journals in the same way.

  • Figure 7F shows that the journal that is the most distinct from the others, and ranks in top position on the hard/soft spectrum of publishing practices, is the Journal of Cultural Heritage. This confirms the interpretation of Marwick (2025): this journal behaves most like experimental science journals because it “publishes materials science and computational analyses related to conservation and preservation of historic objects in museums and other collections” and therefore behaves more like a chemistry journal than an archaeological one. The journals that behave most like soft science journals are the theory and review journals, which maintain a tradition of long articles, with many references, and few authors.
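The Borda Count consensus ranking mentioned for Figure 7F can be sketched as follows. The per-indicator rankings below are hypothetical (three-letter journal labels invented for the example), and the point allocation is the standard Borda scheme, which may differ in detail from the implementation used in the replicated code:

```python
# Sketch of a Borda Count consensus ranking: each journal earns (n - position)
# points in each per-indicator ranking; the consensus orders journals by total.
# The rankings below are hypothetical, for illustration only.
from collections import defaultdict

def borda_consensus(rankings):
    """Return journals ordered by total Borda points, highest first."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for pos, journal in enumerate(ranking):
            scores[journal] += n - pos
    return sorted(scores, key=scores.get, reverse=True)

# Three hypothetical per-indicator rankings of the same three journals:
r1 = ["JCH", "JAS", "Antiquity"]
r2 = ["JAS", "JCH", "Antiquity"]
r3 = ["JCH", "Antiquity", "JAS"]
print(borda_consensus([r1, r2, r3]))  # ['JCH', 'JAS', 'Antiquity']
```

The appeal of this scheme for Figure 7F is that it merges the five separate hardness indicators into one ordering without requiring the indicators to share a scale.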

Discussion

The well-organized and shared data and scripts allowed me to easily reproduce the published paper with all its figures, confirming full computational reproducibility. Nevertheless, a few errors were identified in the code and have been shared with the author through GitHub. The main issues relate to the selection of the top 20 journals and to the calculation of the Shannon’s index. The article states that the selection of journals is based on the H-index, whereas the code shows that it is actually based on the 2022 Impact Factor. This list is also missing two journals due to a data-sorting problem prior to subsetting. The second issue lies in the calculation of Shannon’s index, which significantly modifies the results for this metric.

The replication of the first part of Marwick (2025), about the hard/soft positioning of archaeology within the sciences, was conducted using the OpenAlex dataset instead of the Web of Science dataset. This open dataset, while significantly larger than WoS, is less curated for certain variables. OpenAlex’s greater size is a major asset in terms of data representativeness (older sources, less English-oriented, less journal-oriented), but it also presents challenges related to metadata quality, particularly regarding citations. This trade-off between coverage and precision is crucial for assessing OpenAlex’s suitability for scientometric analyses. Nevertheless, it confirmed most of the observations from the original study. It is interesting to note that extracting the articles indexed with the subfield ‘Archaeology’ in OpenAlex is very different from extracting all the articles published in archaeological journals. The indexation of articles published in archaeological journals by OpenAlex is apparently very variable, and it would be interesting to know how this indexation is done and why many archaeological articles are not classified as such.

At the journal level, I found some problems in the OpenAlex dataset due to the calculation of the 2ymc index directly on their own citation data (for Archaeofauna) or to the classification as articles of some works which are not really articles (for American Antiquity). Other smaller issues that I did not detect may exist, as the ranks of many journals are rather different between the two datasets (Table 2). At this level again, the OpenAlex dataset is much bigger, and its list of sources publishing archaeological articles is much longer than that recorded in Web of Science.

The basic metadata for articles present in both datasets are highly consistent, with one notable exception: OpenAlex excludes subtitles when calculating title length, whereas WoS includes them. The other key difference lies in the reference lists for each article, a particularly challenging dataset to compile in an open-source, non-commercial project, given that private companies have curated and monetized this data for decades. OpenAlex constructs its own metadata for cited references, which introduces certain issues. For example, journals sometimes cross-reference all articles in a given volume because each PDF includes the volume’s full table of contents with DOIs. Additionally, between versions 3 and 4 of this manuscript, thousands of references were incorrectly attributed to a Japanese database.

The replication nevertheless confirms that the open and free dataset is usable for scientometric analyses if sufficient care is taken with the cited references. It also confirms that the way archaeology is published can be positioned, in terms of hard/soft science publishing habits, as intermediate between physics and humanities, and often close to social sciences, when using the parameters from Fanelli and Glänzel (2013). The only strong difference lies in the diversity of sources as estimated by the Shannon’s index.

As previously explained, the calculation of Shannon’s index for the diversity of sources is incorrect in Marwick (2025). When corrected, and still using the WoS dataset, the result is quite different from the result presented in the article: the corrected diversity of sources is even higher than in social sciences and humanities. With the OpenAlex dataset, the values are lower for archaeology than for physics, social sciences or humanities (Figure 3), and this is also the case when only articles from 2012 are used (Figure 4). Shannon’s index calculated from OpenAlex’s data suggests a very low diversity, which would indicate a hard-science behaviour of archaeologists when citing scientific articles. This could be interpreted as evidence that, in archaeology, “scholars agree on the relative importance of scientific problems, their efforts […] concentrate in specific fields and their findings [are] of more general interest, leading to a greater concentration of the relevant literature in few, high-ranking outlets” (Fanelli and Glänzel, 2013). However, given the issues observed with the cited references for numerous articles, this calculation is to be taken with a grain of salt.

Observing the strong dominance of Journal of Archaeological Science and the presence of Nature, Science, and PNAS in the top 20 cited journals in archaeology (Table 3) could indeed indicate that archaeology relies on a relatively small number of journals, and especially high-ranking ones. Despite this observation, I wonder whether this is due to the agreement of scholars on the relative importance of scientific problems, as proposed by Fanelli and Glänzel (2013), or to archaeological results being relatively easier to publish in these high-ranking journals than results from other disciplines, particularly for the archaeology of ancient periods. It may also indicate that the number of archaeological journals is smaller than in other disciplines, though I am unsure whether this is true and am not aware of any studies on this topic. A recent study focusing on publications in archaeology between 2020 and 2023 also shows quite a high concentration of citations in few journals (Table 2 in Vélaz Ciaurriz, 2023). Finally, it may result from the lack of information about many references in published articles when they cite books, book chapters, conference proceedings, or literature in languages other than English: such references are not recorded correctly in either database, but the problem may be even stronger in OpenAlex, concentrating the recorded citations on journals only and thus artificially reducing the diversity of sources. Further refinement of the OpenAlex dataset for cited references will be necessary to allow further research on this specific subject, or it may be improved by citation-specific projects such as OpenCitations (Peroni and Shotton, 2020), even if currently the number of references for each paper is rather low in this dataset and many articles are still missing.

The comparison of the different journals for each metric measured in this study is also generally similar to the results published in Marwick (2025), although it is sometimes difficult to compare because the lists of journals in the two manuscripts differ. Some journals are positioned in the PCA in the same way with both datasets, particularly at the extremes (Figure 6). Correcting the calculation of Shannon’s index barely alters the PCA of Marwick (2025). The rankings are also quite similar for the journals present in both lists (Figure 7), but the OpenAlex dataset shows much more diversity for each journal across most metrics. The higher number of articles in the OpenAlex dataset offers a more nuanced view of the behaviour of each journal. This may be due to the inclusion of older articles compared to the WoS dataset, which comprises 70% post-2012 articles. Alternatively, it may also result from some data being poorly documented in OpenAlex, especially for the oldest articles.

Conclusion

This work confirmed the computational reproducibility and the partial replicability of the first part of Marwick (2025). The reproducibility was easy to implement, but some errors were identified in the process; it is the replication of the results with another dataset that allowed me to identify them. This process underscores (if necessary) the value of reusing and learning from the code and data of a skilled colleague, a method also employed by the author of the replicated study himself to train his students (Marwick et al., 2020). The results obtained using the OpenAlex dataset, which is entirely free and open source, generally align with those published by Marwick (2025): 1) publication habits in archaeology are generally closer to those in social sciences than in physics or humanities, and 2) there are different kinds of publication venues for different kinds of archaeological research (shorter papers with more authors and more recent references for more experimental research, and longer papers with fewer authors and many references for reviews and more anthropological research). However, this replication also highlights the limits of both datasets for scientometric analysis, in particular because of the poor representation of some types of references and the quality of cited-references data.

The primary difference between the two databases explored with the same methodology lies in the references listed in the articles that can be automatically extracted. The findings indicate that the OpenAlex dataset is less influenced by recent publication trends than the Web of Science dataset, as it maintains a more balanced number of articles over the 50-year period studied. It also provides many more references, integrates many more journals, and covers more diverse types of sources. This replication supports the idea that it is entirely possible to use this extensive, free, and open database for scientometric analyses, particularly considering that it will continue to expand and improve in the future. However, this open dataset remains highly incomplete in terms of citations included in the references, especially citations of works that are not journal articles, even though our discipline relies heavily on such sources.
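For readers who want to query OpenAlex directly rather than through the archived analysis scripts, a minimal sketch of building a filtered works query follows. The study itself used the openalexR package in R (Aria et al., 2024); the filter names and the Journal of Archaeological Science ISSN below follow the public OpenAlex REST API documentation but should be treated as illustrative assumptions:

```python
# Minimal sketch of a filtered query against the OpenAlex works endpoint
# (https://api.openalex.org/works). Filter names and the ISSN are taken
# from the public API documentation and are assumptions of this sketch.
from urllib.parse import urlencode

def works_query(issn, from_year, to_year, per_page=200, cursor="*"):
    """Build a works-API URL restricted to one journal and a date range."""
    params = {
        "filter": (f"primary_location.source.issn:{issn},"
                   f"from_publication_date:{from_year}-01-01,"
                   f"to_publication_date:{to_year}-12-31"),
        "per-page": per_page,   # 200 is the documented maximum page size
        "cursor": cursor,       # cursor paging for result sets of any size
    }
    # keep ':', ',' and '*' readable instead of percent-encoding them
    return "https://api.openalex.org/works?" + urlencode(params, safe=":,*")

# Journal of Archaeological Science (ISSN 0305-4403), 1974-2023
url = works_query("0305-4403", 1974, 2023)
```

Each JSON response carries a `meta.next_cursor` value to feed back into `cursor` for the next page, which is how a full journal corpus can be harvested without hitting offset limits.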

Finally, I wish to highlight the significant work that remains to be done on both bibliometric datasets. A closer examination of the results obtained from both reveals that the for-profit Web of Science dataset includes very few journals, and each journal contains only a limited number of references. This scarcity restricts the ability to conduct broad and representative research based on its data: books are underrepresented and only English-language journals are included, among other limitations and biases. While the OpenAlex dataset demonstrates robust integration of core bibliographic data across a vast number of articles and journals, the quality of cited references remains inconsistent and contains errors due to less curated data. The OpenCitations dataset contains even fewer references, though of higher quality. In their current state, these citation datasets are not yet reliable enough to support conclusive meta-research.

Data and script availability

The data and Quarto document allowing full reproduction of this manuscript are available (Queffelec, 2026): https://doi.org/10.5281/zenodo.19371104. An HTML version of this manuscript, which can be read in a more interactive way, is available at https://aqueff.github.io/replication_Marwick2025_OpenAlex/. Comments, issues, and commits are welcome on GitHub: https://github.com/AQueff/replication_Marwick2025_OpenAlex.

Acknowledgements

I would like to express my gratitude to Ben Marwick for his ongoing efforts to promote transparency and openness in archaeology. Through his influential publications and active participation in professional societies, he consistently advocates for these principles within our community. I have gained significant insights from reading his papers and examining the code he develops and generously shares to produce his research findings. Once again, replicating his work in this paper has been an enriching learning experience.

I also want to thank Mathias Bellat for recommending this manuscript on PCI Archaeology, as well as Zachary Batist, Alan Farahani, and one anonymous reviewer for their comments on versions 3 and 4 of this manuscript.

Additionally, I wish to disclose that I used Large Language Models (LLMs) for assistance in creating and modifying R code, as well as for refining the English language in this document.

Preprint version 5 of this article has been peer-reviewed and recommended by Peer Community In Archaeology (https://doi.org/10.24072/pci.archaeo.100616; Bellat 2026).

Funding

The author declares that they have received no specific funding for this study.

Conflict of interest disclosure

The author declares that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article. The author is the founder of PCI Archaeology and a member of its Managing Board. Ben Marwick, the author of the replicated manuscript, is a recommender at PCI Archaeology. He recommended one of my previous works (Marwick, B. (2021) Open data on beads, pendants, blanks from the Ceramic Age Caribbean. Peer Community in Archaeology, 100009. https://doi.org/10.24072/pci.archaeo.100009) and I was the recommender, with Shanti Pappu, of one of his previous works in 2022 (Queffelec, A. and Pappu, S. (2022) Tektites as chronological markers: after careful geoarchaeological validation only! Peer Community in Archaeology, 100013. https://doi.org/10.24072/pci.archaeo.100013).


References

[1] Alperin, J. P.; Portenoy, J.; Demes, K.; Larivière, V.; Haustein, S. An Analysis of the Suitability of OpenAlex for Bibliometric Analyses, 2024 no. arXiv:2404.17663 | DOI

[2] Andersen, J. P. Field-Level Differences in Paper and Author Characteristics across All Fields of Science in Web of Science, 2000–2020, Quantitative Science Studies, Volume 4 (2023) no. 2, pp. 394-422 | DOI

[3] Aria, M.; Le, T.; Cuccurullo, C.; Belfiore, A.; Choe, J. openalexR: An R-Tool for Collecting Bibliometric Data from OpenAlex, The R Journal, Volume 15 (2024) no. 4, pp. 167-180 | DOI

[4] Atici, L.; Kansa, S. W.; Lev-Tov, J.; Kansa, E. C. Other People's Data: A Demonstration of the Imperative of Publishing Primary Data, Journal of Archaeological Method and Theory, Volume 20 (2013) no. 4, pp. 663-681 | DOI

[5] Badolato, A.-M. Partenariat du ministère de l'Enseignement supérieur et de la Recherche avec OpenAlex pour le développement d'un outil bibliographique entièrement ouvert (Partnership of the French Ministry of Higher Education and Research with OpenAlex for the development of a fully open bibliographic tool), Ouvrir la Science, 2024 (https://www.ouvrirlascience.fr/partenariat-du-ministere-de-lenseignement-superieur-et-de-la-recherche-avec-openalex-pour-le-developpement-dun-outil-bibliographique-entierement-ouvert)

[6] Barba, L. A. Terminologies for Reproducible Research, 2018 no. arXiv:1802.03311 | DOI

[7] Bellat, M. Reproducibility-ception: A reproduction of a bibliometric study with an open dataset, Peer Community in Archaeology (2026) | DOI

[8] CNRS Le CNRS se désabonne de la base de publications Scopus (The CNRS unsubscribes from the Scopus publication database), https://www.cnrs.fr/fr/actualite/le-cnrs-se-desabonne-de-la-base-de-publications-scopus, 2024

[9] CNRS The CNRS Is Breaking Free from the Web of Science, https://www.cnrs.fr/en/update/cnrs-breaking-free-web-science, 2025

[10] Culbert, J. H.; Hobert, A.; Jahn, N.; Haupka, N.; Schmidt, M.; Donner, P.; Mayr, P. Reference Coverage Analysis of OpenAlex Compared to Web of Science and Scopus, Scientometrics, Volume 130 (2025) no. 4, pp. 2475-2492 | DOI

[11] Derksen, M.; Morawski, J. Kinds of Replication: Examining the Meanings of “Conceptual Replication” and “Direct Replication”, Perspectives on Psychological Science, Volume 17 (2022) no. 5, pp. 1490-1505 | DOI

[12] Fanelli, D.; Glänzel, W. Bibliometric Evidence for a Hierarchy of the Sciences, PLOS ONE, Volume 8 (2013) no. 6, p. e66938 | DOI

[13] Farahani, A. Reproducibility and Archaeological Practice in the Journal of Field Archaeology, Journal of Field Archaeology, Volume 49 (2024) no. 6, pp. 391-394 | DOI

[14] Flexner, J. L. The Dawn of Everything: A New History of Humanity: By David Graeber and David Wengrow, New York, Farrar, Straus, and Giroux, 2021, 692 Pp., ISBN 9780374157357 (Hbk), Australian Archaeology, Volume 88 (2022) no. 3, pp. 328-330 | DOI

[15] Foecke, K.; Queffelec, A.; Pickering, R. No Geoarchaeological Evidence for Deliberate Burial by Homo Naledi: On Best Practice for Geochemical Studies in Archaeology and Paleoanthropology, PaleoAnthropology, Volume 2025 (2025) no. 1, pp. 94-115 | DOI

[16] Hirsch, J. E. An Index to Quantify an Individual's Scientific Research Output, Proceedings of the National Academy of Sciences of the United States of America, Volume 102 (2005) no. 46, pp. 16569-16572 | DOI

[17] Jack, A. Sorbonne's Embrace of Free Research Platform Shakes up Academic Publishing, Financial Times, 2023 (https://www.ft.com/content/89098b25-78af-4539-ba24-c770cf9ec7c3?syn-25a6b1a6=1)

[18] Karoune, E.; Plomp, E. Removing Barriers to Reproducible Research in Archaeology, 2022 | DOI

[19] Kot, M.; Tyszkiewicz, J.; Leloch, M.; Gryczewska, N.; Miller, S. Reliability and Validity in Determining the Relative Chronology between Neighbouring Scars on Flint Artefacts, Journal of Archaeological Science, Volume 175 (2025), p. 106156 | DOI

[20] Kozak, M.; Bornmann, L. A New Family of Cumulative Indexes for Measuring Scientific Performance, PLOS ONE, Volume 7 (2012) no. 10, p. e47679 | DOI

[21] Lau, H.; Kansa, S. W. Zooarchaeology in the Era of Big Data: Contending with Interanalyst Variation and Best Practices for Contextualizing Data for Informed Reuse, Journal of Archaeological Science, Volume 95 (2018), pp. 33-39 | DOI

[22] Marwick, B. Is Archaeology a Science? Insights and Imperatives from 10,000 Articles and a Year of Reproducibility Reviews, Journal of Archaeological Science, Volume 180 (2025), p. 106281 | DOI

[23] Marwick, B.; Wang, L.-Y.; Robinson, R.; Loiselle, H. How to Use Replication Assignments for Teaching Integrity in Empirical Archaeology, Advances in Archaeological Practice, Volume 8 (2020) no. 1, pp. 78-86 | DOI

[24] Marwick, B. Three Levels of Reproducible Workflow Remove Barriers for Archaeologists and Increase Accessibility, Peer Community in Archaeology, Volume 1 (2022), p. 100022 | DOI

[25] Matarese, V. Kinds of Replicability: Different Terms and Different Functions, Axiomathes, Volume 32 (2022) no. 2, pp. 647-670 | DOI

[26] National Academies of Sciences, Reproducibility and Replicability in Science, National Academies Press, 2019 (https://www.nationalacademies.org/projects/DBASSE-BBCSS-17-03/publication/25303)

[27] Nosek, B. A.; Errington, T. M. What Is Replication?, PLOS Biology, Volume 18 (2020) no. 3, p. e3000691 | DOI

[28] OurResearch team Open Science Nonprofit OurResearch Receives $4.5M Grant from Arcadia Fund - OurResearch Blog, OurResearch blog, 2021 (https://blog.ourresearch.org/arcadia-2021-grant/)

[29] Pargeter, J.; Brooks, A.; Douze, K.; Eren, M.; Groucutt, H. S.; McNeil, J.; Mackay, A.; Ranhorn, K.; Scerri, E.; Shaw, M.; Tryon, C.; Will, M.; Leplongeon, A. Replicability in Lithic Analysis, American Antiquity, Volume 88 (2023) no. 2, pp. 163-186 | DOI

[30] Peroni, S.; Shotton, D. OpenCitations, an Infrastructure Organization for Open Scholarship, Quantitative Science Studies, Volume 1 (2020) no. 1, pp. 428-444 | DOI

[31] Popper, K. The Logic of Scientific Discovery, Routledge, London, 1959 | DOI

[32] Priem, J.; Piwowar, H.; Orr, R. OpenAlex: A Fully-Open Index of Scholarly Works, Authors, Venues, Institutions, and Concepts, 2022 no. arXiv:2205.01833 | DOI

[33] Queffelec, A. Replication report for Marwick (2025) « Is archaeology a science? », including new data from OpenAlex, Zenodo, 2026 | DOI

[34] Rizzetto, E.; Peroni, S. Mapping Bibliographic Metadata Collections: The Case of OpenCitations Meta and OpenAlex, 2023 no. arXiv:2312.16523 | DOI

[35] Schmidt, S. C.; Marwick, B. Tool-Driven Revolutions in Archaeological Science, Journal of Computer Applications in Archaeology, Volume 3 (2020) no. 1 | DOI

[36] Singh Chawla, D. Massive Open Index of Scholarly Papers Launches, Nature (2022) | DOI

[37] University of Jyväskylä The Subscription to the Web of Science Database Will End on January 1, 2026, https://www.jyu.fi/en/news/the-subscription-to-the-web-of-science-database-will-end-on-january-1-2026, 2025

[38] Utrecht University Reminder: Access to Web of Science Will End on 1 January 2026, https://www.uu.nl/en/news/reminder-access-to-web-of-science-will-end-on-1-january-2026, 2025

[39] Vélaz Ciaurriz, D. Revisión de La Investigación Científica En Arqueología: Un Análisis Bibliométrico (Review of Scientific Research in Archaeology: A Bibliometric Analysis), Arqueologia Iberoamericana, Volume 52 (2023), pp. 37-47 | DOI

[40] Vrije Universiteit Amsterdam Termination of Access to Web of Science as of January 1, 2026, https://vu.nl/en/news/termination-of-access-to-web-of-science-as-of-january-1-2026, 2025

[41] West Virginia University WVU Libraries to Transition to Scopus on January 1, 2026, https://library.wvu.edu/collections/2025/10/07/wvu-libraries-to-transition-to-scopus-on-january-1-2026-frequently-asked-questions, 2025