Section: Archaeology
Topic: Agricultural sciences, Archaeology

Convolutional neural networks and outline analyses for archaeobotanical studies of domestication and subspecific identification

Corresponding author(s): Bonhomme, Vincent (bonhomme.vincent@gmail.com)

10.24072/pcjournal.647 - Peer Community Journal, Volume 5 (2025), article no. e119


Abstract

The identification of archaeological fruits and seeds is crucial for understanding the relationships between humans and plants within the cultural and biological history of both wild and cultivated species. We compared the relative performance of a deep learning approach, namely convolutional neural networks (CNN), and outline analyses via geometric morphometrics using elliptical Fourier transforms (EFT) at identifying pairs of plant taxa. We used seeds and fruit stones, which are the most abundant organs in archaeobotanical assemblages and whose morphological identification, chiefly between wild and domesticated types, makes it possible to document their domestication and biogeographical history. We used existing modern datasets of four plant taxa (barley, olive, date palm and grapevine) corresponding to photographs of two orthogonal views of their seeds that were analysed separately to offer a larger spectrum of shape diversity. Sample sizes ranged from 473 to 1,769 seeds per class, which constitutes a relatively small dataset for training CNN models yet is typical within archaeobotanical research. On these eight datasets, we compared the performance of CNN and EFT coupled with linear discriminant analyses. Our objectives were twofold: i) to test whether CNN can beat geometric morphometrics in taxonomic identification and if so, ii) to test which minimal sample size is required. We ran simulations on the full datasets and also on subsets, starting from 50 images in each binary class. For the CNN, we deliberately used a candid approach relying on a pre-parameterised VGG19 network. For EFT, we used a state-of-the-art morphometrical pipeline. The main difference rests in the data used by each model: our CNN used bare photographs where EFT used outline coordinates. This “pre-distilled” geometrical description of seed outlines is often the most time-consuming part of morphometric studies. Results show that our CNN beats EFT in most cases, even for very small datasets.
We finally discuss the potential of CNNs for archaeobotany, and how bioarchaeological studies could embrace both approaches, used in a complementary way, to better assess and understand the past history of species.

Metadata
DOI: 10.24072/pcjournal.647
Type: Research article
Keywords: convolutional neural networks, elliptical Fourier transforms, archaeobotany, domestication studies, geometric morphometrics

Bonhomme, Vincent 1, 2; Bouby, Laurent 1; Claude, Julien 1, 3; Dham, Camille 1; Gros-Balthazard, Muriel 4; Ivorra, Sarah 1; Jeanty, Angèle 1; Pagnoux, Clémence 5; Pastor, Thierry 1; Terral, Jean-Frédéric 1; Evin, Allowen 1

1 ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
2 Athéna, Lacamp, Roquedur, France
3 Department of Biology, Faculty of Science, University of Chulalongkorn, Bangkok, Thailand
4 DIADE, Univ Montpellier, IRD, CIRAD, Montpellier, France
5 AASPE, CNRS, MNHN, Paris, France
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
Bonhomme, V.; Bouby, L.; Claude, J.; Dham, C.; Gros-Balthazard, M.; Ivorra, S.; Jeanty, A.; Pagnoux, C.; Pastor, T.; Terral, J.-F.; Evin, A. Convolutional neural networks and outline analyses for archaeobotanical studies of domestication and subspecific identification. Peer Community Journal, Volume 5 (2025), article  no. e119. https://doi.org/10.24072/pcjournal.647

PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.archaeo.100502

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Full text

The full text below may contain a few conversion errors compared to the version of record of the published article.

Introduction

From Aristotle to Darwin, the form of organisms has long inspired our understanding of the living world. In some disciplines such as archaeobotany, the shape of plant remains is, most often, the only available datum. Both qualitative and quantitative morphological criteria first made it possible to identify plant remains, particularly seeds and fruit stones, often at the species level (Zohary et al., 2012; Wallace et al., 2019a). Then, purely quantitative tools, chiefly geometric morphometrics, allowed for finer-grained, statistically assessed identifications and further exploration of morphological size and shape variation.

Modern geometric morphometrics (hereafter GMM) is the statistical description of shape and its covariation (Kendall, 1989). It uses generic mathematical transformations to convert shape and size into quantitative variables. Most GMM studies use either configurations of landmark coordinates, the geometry of curves (closed or not) or, more recently, entire surfaces. Curve analyses are often favoured in archaeobotany because botanical organs offer few clear landmarks, if any, and elliptical Fourier transforms (hereafter EFT) are the most popular approach.

By comparing archaeological material to modern reference collections, GMM, and EFT in particular for plants, has allowed fine-grained inferences, in particular to document the emergence of new morphological types, evidence domestication syndromes, reconstruct the dynamics of their diffusion in both time and space, and overall gain insights into the intertwined histories of human societies and domesticated plants (Terral et al., 2004, 2010, 2012; Kost & Heil, 2005; Burger et al., 2011; Bouby et al., 2013; Ros et al., 2014; Pagnoux et al., 2015; Bonhomme et al., 2017; Bourgeon et al., 2018; Wallace et al., 2019a; Tarongi et al., 2021; Jesus et al., 2021; Bonhomme et al., 2021a; Evin et al., 2022; Roushannafas et al., 2022).

Deep learning quickly became a game-changer from academia to industry, through its versatility and cutting-edge achievements. Computer vision in general has largely benefited from the synergy between the massive democratization of computational power and the arrival of software frameworks built on solid mathematical foundations. Convolutional neural networks (hereafter CNN) (Lecun et al., 1998), in particular, have been at the heart of very diverse supervised classification tasks, from autonomous vehicles to plant identification (Alzubaidi et al., 2021; Berganzo-Besga et al., 2022). However, CNN remain relatively rare in paleontological and archaeological studies (Soroush et al., 2020; Romero et al., 2020; Garcia‐Molsosa et al., 2021; Loddo et al., 2021; and the review by Bellat et al., 2025) and in morphometrics (Miele et al., 2020; Le et al., 2020), even though datasets with large numbers of images are now available and can be employed to develop new tools for specialized tasks like seed recognition (Yuan et al., 2024).

Date palm (Phoenix dactylifera L.) and grapevine (Vitis vinifera L.) archaeological seeds, barley (Hordeum vulgare L.) caryopses and olive (Olea europaea L.) stones have been intensively studied in archaeobotany using geometric morphometrics. These four taxa have been important for human subsistence in the Mediterranean basin for millennia. The presence of the wild progenitors of the domestic forms across vast geographic ranges makes the identification of the wild or domestic status of the archaeobotanical remains of date palm, olive and grapevine particularly difficult. In addition, the presence of multiple types of barley in the region, exploited for diverse uses and with different agricultural practices, requires intra-specific identification.

The morphological distinction between wild and domestic types using GMM is now very accurate for olive (Terral et al., 2004, 2021) and grapevine (Terral et al., 2010; Bonhomme et al., 2022). On the other hand, distinguishing between wild and domestic date palm seeds (Terral et al., 2012; Gros-Balthazard et al., 2017), as well as between two- and six-row barley grains remains challenging (Ros et al., 2014; Bonhomme et al., 2017; Wallace et al., 2019b; Jeanty et al., 2024).

In that context, this paper aims to test the potential of a deep learning approach for archaeobotanical identification and asks the following questions: i) can a CNN outperform baselines obtained with GMM and if so, ii) how much data are typically required to train the models? Here, we used four plant models presenting binary challenges below the species level, which are at the core of archaeobotanical studies. More precisely, our aim was to distinguish between wild and domesticated types of date palm, olive and grapevine, and between two- and six-row barley.

A CNN model correctly trained on large datasets is expected to outperform EFT approaches, provided taxonomical differences are reflected in some morphological contrasts, if only because EFT is limited to the geometrical differences of outlines, while a CNN applied to images can capture any discriminant morphological feature beyond shape, such as texture. That being said, several conditions of our models make such an expectation far from granted here:

  • Low inter-class differences: differences tested here, chiefly shape differences, ranged from subtle at best to extremely challenging; the group labelling was certain only because the identification was obtained through molecular markers (for the date palm) or on entire plants cultivated in biological conservation centres (other models).

  • Small sample sizes: the available datasets were particularly small compared to those usually deployed in CNN learning tasks. The datasets used here were obtained through 2D images acquired following rigorous and time-consuming protocols, which limit the number of biological objects that can be analysed in the context of archaeobotanical studies.

  • Challenging baselines: existing baselines obtained through GMM are already good to very good.

  • Accessible models: our intention was to develop CNN-based pipelines, reasonably easy to run by non-expert users using general-purpose computers.

  • Taphonomic bias: charring, desiccation or waterlogging related to the fossilisation of the fruit and seed stones can potentially generate important sampling bias in CNN by comparison to GMM, as the former might focus on texture rather than outline geometry.

We first present the models used and compare their performance to geometric morphometrics. Finally, we discuss the pros and cons of CNN versus EFT and propose an agenda for future research.

Material and methods

Statistical environment

All analyses were run using R 4.1.3 (R Development Core Team, 2024). We used a 2013 MacBook Pro with a 2.6 GHz Intel Core i5 CPU and 16 GB of 1600 MHz DDR3 RAM. Data manipulation and visualization were done using tidyverse 2.0.0 (Wickham et al., 2019). Image manipulation was done using magick 2.7.3 (Ooms, 2016). All morphometric analyses were performed using Momocs 1.4.1 (Bonhomme et al., 2014; Bonhomme et al., 2025a) and linear discriminant analyses using MASS (Venables & Ripley, 2002). CNN models used keras 2.15.0 (Allaire & Chollet, 2017), the R interface to the eponymous Python library (Python 3.7), which here ran on CPU alone.

Datasets used

Among the model species studied by our team, we retained those for which we have enough material, secure identification and an associated publication record: grapevine pips (Pagnoux et al., 2015; Bonhomme et al., 2020), barley grains (Jeanty et al., 2023), olive stones (Bourgeon et al., 2018; Terral et al., 2021) and date seeds (Terral et al., 2012; Gros-Balthazard et al., 2017) (Table 1). All models corresponded to a binary classification task with 2- versus 6-row types of barley (Hordeum vulgare), and wild versus domestic for the three other taxa. These datasets only comprised modern material from reference collections, commonly used for comparison with archaeological material.

All seeds/stones/fruits were photographed in dorsal and lateral views using a stereomicroscope coupled with a digital camera. It is worth noting that GMM identification is usually obtained by combining the information brought by the two orthogonal views but we chose here to not combine these views to increase the number of “independent” datasets and have a larger spectrum of shapes (Figure 1).

Figure 1 - Schematic description of the GMM and CNN models used. For each taxon, an archetypical individual seed is presented.

Convolutional neural networks

Our CNN models used the VGG19 architecture (Simonyan & Zisserman, 2014) with the weights trained on the ImageNet reference dataset (Deng et al., 2009), as available in keras. The convolutional base, with feature hierarchies learnt on ImageNet, was frozen. Since we did not want to predict ImageNet classes, the last three dense layers were replaced with two fully connected dense layers, and only these last layers were fine-tuned. The first has 32 units and a rectified linear unit (ReLU) activation. Because all models were binary classifiers, the last layer has two units and a sigmoid activation (Figure 1).
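The replaced classification head can be illustrated with a minimal pure-Python sketch. It only mirrors the layer sizes and activations stated above (32 ReLU units, then 2 sigmoid units). The feature vector stands in for the flattened output of the frozen VGG19 base; its length (`n_features`), the random weights and the initialisation scheme are hypothetical choices made for illustration, not values taken from the study.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights (Glorot-style uniform range, an illustrative choice)."""
    limit = math.sqrt(6.0 / (n_in + n_out))
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def head_forward(features, hidden, output):
    """32-unit ReLU layer followed by a 2-unit sigmoid layer."""
    h = dense(features, hidden[0], hidden[1], relu)
    return dense(h, output[0], output[1], sigmoid)


rng = random.Random(0)
n_features = 512  # hypothetical length of the flattened convolutional features
hidden = init_layer(n_features, 32, rng)
output = init_layer(32, 2, rng)
features = [rng.random() for _ in range(n_features)]
scores = head_forward(features, hidden, output)  # one score per class
```

In the actual pipeline only the weights of these two layers would be updated during training, while the convolutional features are computed by the frozen base.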

The loss was calculated using binary cross-entropy, as appropriate for binary classification tasks. We used two callbacks to control the training step. The first controlled the learning rate based on loss decrease: it was initially set to 10^-2, with a decay factor of 10, a patience of 10 epochs and a minimal value of 10^-7. The second stopped the training after a patience of 20 epochs with no accuracy improvement. These two callbacks were used to homogenise training among models.
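The loss and the two callbacks can be sketched in plain Python to make their interplay explicit. This is a simplified simulation, not the keras implementation: we assume that a "decay factor of 10" means dividing the learning rate by 10, and the function names and the choice of monitored series (`losses`, `accuracies`) are ours.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy for 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def run_callbacks(losses, accuracies, lr=1e-2, factor=0.1,
                  lr_patience=10, min_lr=1e-7, stop_patience=20):
    """Replay per-epoch metrics; return (last epoch run, learning rate per epoch)."""
    best_loss, best_acc = math.inf, -math.inf
    loss_wait = acc_wait = 0
    lr_history = []
    for epoch, (loss, acc) in enumerate(zip(losses, accuracies), start=1):
        # Callback 1: divide the learning rate by 10 after 10 epochs without loss decrease.
        if loss < best_loss:
            best_loss, loss_wait = loss, 0
        else:
            loss_wait += 1
            if loss_wait >= lr_patience:
                lr = max(lr * factor, min_lr)
                loss_wait = 0
        # Callback 2: stop after 20 epochs without accuracy improvement.
        if acc > best_acc:
            best_acc, acc_wait = acc, 0
        else:
            acc_wait += 1
        lr_history.append(lr)
        if acc_wait >= stop_patience:
            return epoch, lr_history
    return len(lr_history), lr_history


# Hypothetical flat training curves: nothing improves after the first epoch.
stopped_at, lr_history = run_callbacks([1.0] * 40, [0.5] * 40)
```

With these flat curves the learning rate is reduced at epochs 11 and 21, and training stops at epoch 21.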

For each dataset, the number of images was balanced between classes, using random sampling without replacement among available images (Table 1). This allowed us to explore the effect of different sample sizes on final CNN performance. For each sample size tested, 60% of the total number of images was used for the training set, 20% for the validation set and the last 20% for evaluating the model. The training set was used to adjust weights, while the validation set was used to evaluate model performance and back-propagate results to the unfrozen layers at the end of each epoch. The evaluation set was used only once, after the training step, to report the model performance on images never seen before by the model.
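The balanced sampling and the 60/20/20 partition can be sketched as follows. The file names are invented, and applying the split within each class (rather than on the pooled images) is an assumption of this sketch; the function name `balanced_split` is ours.

```python
import random


def balanced_split(paths_by_class, n_per_class, seed, fractions=(0.6, 0.2, 0.2)):
    """Draw n_per_class images per class without replacement, then split 60/20/20."""
    rng = random.Random(seed)
    n_train = round(n_per_class * fractions[0])
    n_val = round(n_per_class * fractions[1])
    split = {"train": [], "validation": [], "evaluation": []}
    for label, paths in paths_by_class.items():
        chosen = rng.sample(paths, n_per_class)  # sampling without replacement
        split["train"] += [(p, label) for p in chosen[:n_train]]
        split["validation"] += [(p, label) for p in chosen[n_train:n_train + n_val]]
        split["evaluation"] += [(p, label) for p in chosen[n_train + n_val:]]
    return split


# Hypothetical file names; class sizes follow the grapevine row of Table 1.
paths_by_class = {
    "wild": [f"wild_{i}.jpg" for i in range(872)],
    "domesticated": [f"domesticated_{i}.jpg" for i in range(1769)],
}
split = balanced_split(paths_by_class, n_per_class=100, seed=1)
```

For 100 images per class this yields 120 training, 40 validation and 40 evaluation images, with both classes equally represented in each partition.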

The RGB images were converted to grayscale with pixel values standardized between 0 and 1 and reduced to a resolution of 120×90 pixels, while maintaining the aspect ratio (Figure 1). The first layer of the VGG19 convolutional base was adapted accordingly.
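As an illustration of this preprocessing, the sketch below converts RGB pixels to grey levels in [0, 1] and computes target dimensions that fit a 120×90 box while preserving the aspect ratio. The luminance weights (ITU-R BT.601) and the reading of 120×90 as width × height are assumptions; the original analysis relied on the magick R package, whose exact conversion may differ.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) pixels in 0..255 to grey levels in 0..1."""
    grey = []
    for row in rgb_rows:
        grey.append([
            min(1.0, (0.299 * r + 0.587 * g + 0.114 * b) / 255.0)  # assumed weights
            for r, g, b in row
        ])
    return grey


def fit_within(width, height, box=(120, 90)):
    """Largest (width, height) fitting inside box with the same aspect ratio."""
    scale = min(box[0] / width, box[1] / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


tiny_image = [[(255, 255, 255), (0, 0, 0)]]  # one white and one black pixel
grey = to_grayscale(tiny_image)
target = fit_within(4000, 3000)   # a 4:3 photograph fills the whole box
narrow = fit_within(3000, 1000)   # a 3:1 photograph is limited by its width
```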

Table 1 - Material used. Each of the four taxa provided two views treated separately.

| Dataset | Taxa | Classification problem | Levels | Sample sizes |
|---|---|---|---|---|
| hordeum | Hordeum vulgare | 2- vs 6-row | 2-row | 981 |
| | | | 6-row | 473 |
| olea | Olea europaea | wild vs domesticated | domesticated | 1589 |
| | | | wild | 630 |
| phoenix | Phoenix dactylifera | wild vs domesticated | domesticated | 776 |
| | | | wild | 662 |
| vitis | Vitis vinifera | wild vs domesticated | domesticated | 1769 |
| | | | wild | 872 |

Geometric morphometrics baseline

We used outline analysis using elliptical Fourier transforms (EFT) (Kuhl & Giardina, 1982; Claude, 2008; Bonhomme et al., 2014). We first converted full-sized images into silhouette masks on which 360 outline coordinates were sampled, equally spaced along the curvilinear abscissa. We then normalized outlines for their size, position, rotation and first point and obtained enough harmonics to gather 95% of the total harmonic power (6 for all datasets). Then, a linear discriminant analysis (LDA) was trained on the same dataset as for CNN yet combining the training and validation sets (Figure 1). The general methodology is detailed elsewhere (Bonhomme et al., 2014, 2020).
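To make the outline step more concrete, here is a minimal pure-Python sketch of elliptical Fourier coefficients following the Kuhl & Giardina (1982) formulation, together with a cumulative harmonic power criterion. It is not the Momocs implementation: normalisation is omitted, the function names are ours, and computing the power share over all retained harmonics (first harmonic included) is an assumption that may differ from the convention used in the study.

```python
import math


def efourier(points, n_harmonics):
    """Elliptical Fourier coefficients (a, b, c, d) per harmonic of a closed outline.

    `points` is a list of distinct (x, y) coordinates along the outline.
    """
    k = len(points)
    # Segment displacements, closing the polygon back onto the first point.
    dx = [points[(i + 1) % k][0] - points[i][0] for i in range(k)]
    dy = [points[(i + 1) % k][1] - points[i][1] for i in range(k)]
    dt = [math.hypot(u, v) for u, v in zip(dx, dy)]
    t = [0.0]  # cumulative curvilinear abscissa
    for step in dt:
        t.append(t[-1] + step)
    perimeter = t[-1]
    coeffs = []
    for n in range(1, n_harmonics + 1):
        scale = perimeter / (2.0 * n * n * math.pi ** 2)
        w = 2.0 * math.pi * n / perimeter
        a = b = c = d = 0.0
        for i in range(k):
            dcos = math.cos(w * t[i + 1]) - math.cos(w * t[i])
            dsin = math.sin(w * t[i + 1]) - math.sin(w * t[i])
            a += dx[i] / dt[i] * dcos
            b += dx[i] / dt[i] * dsin
            c += dy[i] / dt[i] * dcos
            d += dy[i] / dt[i] * dsin
        coeffs.append((scale * a, scale * b, scale * c, scale * d))
    return coeffs


def harmonics_for_power(coeffs, threshold=0.95):
    """Smallest number of harmonics whose cumulative power reaches the threshold."""
    power = [(a * a + b * b + c * c + d * d) / 2.0 for a, b, c, d in coeffs]
    total = sum(power)
    running = 0.0
    for n, p in enumerate(power, start=1):
        running += p
        if running / total >= threshold:
            return n
    return len(power)


# Toy outline: a unit circle sampled with 360 equally spaced points.
circle = [(math.cos(2 * math.pi * i / 360), math.sin(2 * math.pi * i / 360))
          for i in range(360)]
coeffs = efourier(circle, n_harmonics=6)
n_needed = harmonics_for_power(coeffs)
```

A circle is fully described by its first harmonic, so a single harmonic already reaches the threshold here; real seed outlines need more (6 in the datasets above). The resulting coefficients are the quantitative variables passed on to the linear discriminant analysis.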

Model comparisons

Ten replicates were used for each of the eight datasets and each was tested with increasing sample sizes (Table 1 and Table 2). For each of the 560 runs, the very same sets of images (or masks) were submitted to both CNN and EFT. The only difference is that, for EFT, the training and validation sets were combined for training, and the model was then evaluated on the same 20% as for CNN (Figure 1). This cross-validation scheme allowed direct comparisons between the respective performances of each model. Performance was measured with accuracy, that is the proportion of correctly identified individuals. Sensitivity and specificity were also calculated.
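The three performance measures can be written out explicitly. The sketch below uses invented labels and predictions; which class is treated as "positive" is an arbitrary choice here (the source does not state it), and swapping it simply exchanges sensitivity and specificity.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity and specificity for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # share of correct identifications
        "sensitivity": tp / (tp + fn),                # positives correctly identified
        "specificity": tn / (tn + fp),                # negatives correctly identified
    }


# Hypothetical evaluation set of eight seeds.
y_true = ["wild"] * 4 + ["domesticated"] * 4
y_pred = ["wild"] * 4 + ["domesticated", "domesticated", "wild", "wild"]
metrics = binary_metrics(y_true, y_pred, positive="domesticated")
```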

Results

In most cases, CNN beat EFT (419 of 560 cases, that is 75%; Figure 2, Figure 3, Table 2). This is particularly true for larger training sets.

Table 2 - Mean accuracy differences (CNN – EFT) ± standard deviation, expressed as percentages, over the 10 replicates. Sample sizes are expressed as the total number of images used per class (training/validation + evaluation). Cells with ‘-’ could not be calculated due to sample size limitations.
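The number of runs follows from the design reported in Table 2; the short check below spells out the arithmetic. The per-dataset counts of tested sample sizes are read from the non-empty cells of Table 2; the variable names are ours.

```python
# Number of sample sizes tested per dataset (non-empty cells of Table 2).
sizes_tested = {"hordeum": 5, "olea": 7, "phoenix": 7, "vitis": 9}
views = 2        # dorsal and lateral
replicates = 10

model_pairs = sum(sizes_tested.values()) * views  # CNN/EFT pairs per replicate
runs = model_pairs * replicates                   # total number of paired runs
cnn_wins = 419
share = cnn_wins / runs                           # proportion of runs won by CNN

print(model_pairs, runs, round(100 * share, 1))
```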

| Dataset | View | n = 50 | n = 100 | n = 150 | n = 200 | n = 300 | n = 400 | n = 500 | n = 600 | n = 700 |
|---|---|---|---|---|---|---|---|---|---|---|
| hordeum | dorsal | 0.1 ± 8.2 | -1.7 ± 5 | -0.3 ± 7.1 | -0.8 ± 6.7 | -1.2 ± 5.7 | - | - | - | - |
| hordeum | lateral | -0.4 ± 5.9 | -3.2 ± 5.4 | -0.6 ± 5.5 | -5.8 ± 6 | -0.4 ± 6.3 | - | - | - | - |
| olea | dorsal | 4.8 ± 3.9 | 5.2 ± 2.9 | 7.1 ± 2.8 | 7.8 ± 2.5 | 8.9 ± 2.1 | 8.4 ± 1.9 | 8.3 ± 1.7 | - | - |
| olea | lateral | 2.5 ± 3.2 | 4.5 ± 4.1 | 6.7 ± 2.2 | 6.3 ± 1.6 | 6.9 ± 2 | 8.7 ± 2.2 | 7.3 ± 2.3 | - | - |
| phoenix | dorsal | -4.5 ± 5.1 | -0.9 ± 5.4 | 1.3 ± 2.8 | 0.2 ± 2.9 | 1.5 ± 3.7 | 3.1 ± 3 | 4.1 ± 2.1 | - | - |
| phoenix | lateral | -6.2 ± 8.8 | -4.3 ± 4 | 0.3 ± 2.4 | -2.7 ± 5.7 | 0.2 ± 3 | 0.4 ± 2.2 | 0.7 ± 3 | - | - |
| vitis | dorsal | -0.5 ± 10.5 | 5.2 ± 3.9 | 8.5 ± 2.4 | 5.8 ± 3.6 | 7.3 ± 3.1 | 8.4 ± 2.1 | 7.6 ± 3.6 | 8.9 ± 3 | 9.5 ± 2.8 |
| vitis | lateral | -3.7 ± 6.3 | 1.5 ± 3.6 | 1.4 ± 5.2 | 4.1 ± 2.6 | 2.8 ± 2.3 | 3 ± 2.4 | 5 ± 1.5 | 4.5 ± 3.7 | 5.8 ± 2.4 |

Figure 2 - Model performances for each dataset presented using accuracy, training sample sizes and replicates for CNN and EFT. Sample sizes are expressed as the total number of images used per class (training/validation + evaluation). The models are run for the dorsal (VD) and lateral (VL) views. Two-row vs six-row barley (hordeum) are compared, as well as the wild and domestic forms of olive (olea), date palm (phoenix) and grapevine (vitis). The same graphs for precision and recall are available in the supplementary material and show similar trends.

Figure 3 - Model performances for each dataset, training sample size and replicate, presented as absolute CNN - EFT accuracy differences. Sample sizes are expressed as the total number of images used per class (training/validation + evaluation). The models are run for the dorsal (VD) and lateral (VL) views. Two-row vs six-row barley (hordeum) are compared, as well as the wild and domestic forms of olive (olea), date palm (phoenix) and grapevine (vitis).

Among the eight datasets (hereafter referred to by their vernacular names), two groups can be distinguished: olive and grapevine on the one hand, barley and date palm on the other, no matter the view considered. For grapevine and olive, where EFT “already” provided good accuracies, CNN performed even better, particularly for the large sample sizes. For grapevine with 700 images, average CNN accuracies ranged from 94 to 99% for the dorsal view and from 92 to 96% for the lateral view. For olive with 500 images, performances ranged from 98 to 100% for the dorsal view, and from 97 to 99% for the lateral view. For barley and date palm, the results seem more mixed at first glance (Figure 2), yet, on average (Table 2), the CNN also achieved better accuracies when the datasets were large enough. For sample sizes above 150 individuals, CNNs were better in most cases for barley and consistently for date palm. These two groups of results are reflected in the mean differences between models for the largest sample size: olive and grapevine gained ~10% accuracy where date palm and barley gained less than 5%. The patterns observed in sensitivity and specificity mirrored those of accuracy and are available in the electronic supplementary material (ESM).

Finally, to give an idea of computational time, a single iteration of the 56 model pairs took ~17 hours to complete, with less than 1% dedicated to the EFT. On the other hand, the time needed to prepare pictures is virtually zero for CNN and about 1 min per picture for EFT, that is about a full-time week for each dataset here.
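The orders of magnitude quoted above can be checked with a few lines of arithmetic. The sketch assumes one picture per seed and per view, uses the class sizes of Table 1, and treats the 17 hours as spread evenly over the 56 model pairs; both simplifications are ours.

```python
MINUTES_PER_PICTURE = 1  # outline preparation time per picture for EFT, as stated above

# Seeds per dataset (both classes pooled), from Table 1.
seeds = {
    "hordeum": 981 + 473,
    "olea": 1589 + 630,
    "phoenix": 776 + 662,
    "vitis": 1769 + 872,
}

# Hours of outline preparation per dataset and per view.
hours_per_view = {name: n * MINUTES_PER_PICTURE / 60 for name, n in seeds.items()}

# Average computing time per CNN/EFT model pair, in minutes.
minutes_per_pair = 17 * 60 / 56

for name, hours in hours_per_view.items():
    print(f"{name}: {hours:.1f} h per view")
print(f"{minutes_per_pair:.1f} min per model pair")
```

Under these assumptions, preparing one view takes roughly 24 to 44 hours depending on the dataset, which is consistent with the "full-time week" order of magnitude.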

Discussion

Our results show that even a candid CNN approach could outperform state-of-the-art EFT to identify plant seeds and fruits below the species level. Even if the performance boost is not dramatic for all four studied taxa, this was a quite surprising result since the CNN beat our EFT baselines almost consistently, even when the sample sizes were small.

Regarding the four pairs of taxa studied, identifying wild and domestic types for olive and grapevine is relatively easy using the seed shape, but distinguishing between wild and domestic date palm, and between two- and six-row barley, is challenging, not to say troublesome. For hordeum (lateral view), CNN accuracies are even, on average, below those obtained with EFT. Further research using refined CNN architectures would be helpful on that particular dataset.

Here, when geometrical differences between studied pairs are quite obvious macroscopically (olive and grapevine), the CNN clearly beats GMM identification and is close to perfect when the sample sizes of the training sets exceed 500 seeds. For instance, over the 10 replicates, a single olive stone in lateral view was wrongly identified among the 1400 evaluated images (20% of 10*700). Using outline analyses, accuracies around 95% can now be reached for certain taxa (e.g. olive and grapevine), particularly when combining several views (Bonhomme et al., 2020), but here CNN only have raw 120×90 images as inputs.

Perhaps the most surprising result is that CNN also beat the EFT baseline even when trained with only ~100 images in each class, at least for these two “easy” models (here grapevine and olive). Given how costly and time-consuming it is to build a reference collection, this means that CNN can be tested early and possibly cut these costs. Also, the methods applied here could easily be tested on many other archaeological models, whether they are plant or animal organs, or non-biological artefacts, imprints, etc.

One important result here is that CNN can still improve their classification score when the training sample size increases, well after the classification score of GMM can no longer be improved (because of limitations associated with linear discriminant analysis or because the number of available variables is limited). Our results also seem to indicate that GMM combined with linear discriminant analysis quickly reaches its maximal accuracy and then plateaus (at 50 to 150 analysed individuals). With larger sample sizes, it clearly performs less well than CNNs.

Deep learning approaches are now quite common for animal and plant species identification, particularly for citizen science projects (Willi et al., 2019; Picek et al., 2022), but remain so far very new when it comes to archaeological material (but see Miele et al., 2020) or morphometrics (but see Le et al., 2020). To the best of our knowledge, this is the first time CNNs are used for such a sub-specific identification task in plants, a fortiori on four different model taxa. The results shown here call for further studies to test how they could be extended to other archaeological material, to other plant or animal taxa and to the species level. Here we show that, at least in some cases, the diversity at even lower taxonomic levels can be explored. This would be of prime interest for developing tools that can be used not only by archaeobotanists but also by anyone interested in identifying varieties (e.g. for conservation purposes). In palynology, another field that may be developed in an archaeological context, deep learning using CNNs has already proven helpful in the tedious task of pollen and phytolith identification and counting (Sevillano et al., 2020; Berganzo-Besga et al., 2022; Gimenez et al., 2024).

In this paper, our main intention was to take the archaeobotanist’s point of view: “How can my reference collection help me interpret the identity and significance of my unknown seeds?”. Despite these encouraging initial results, it remains important to note the most apparent potential pitfalls.

First, CNN and EFT as implemented here neither work with the same information (CNN use images, EFT uses outline geometry), nor use the same classification method (CNN use a sigmoid activation, EFT uses LDA). EFT is, by construction, limited to the description of shape and form variations, whereas CNN use a number of other variables that can be useful (and also possibly detrimental) for classification, such as colour, texture, or patterns that go beyond the outline geometry. This calls for additional research, for the sake of a more direct comparison between paths. For example: how would CNNs (and other deep learning tools) behave with more or less “distilled” geometric information when given raw images, cropped images, masked images, (x, y) outline coordinates, EFT coefficients or even PC scores? Also, would an intermediate segmentation model that masks seeds (i.e. keeping their texture) be interesting in terms of robustness and performance? Future research will help clarify how CNN, and more broadly deep learning, can really be a game changer in archaeobotanical studies.

Second, the CNN models used here may be highly sensitive to differences in the image acquisition environment, including apparatus, lighting, operator, post-processing, etc. Such biases have already been investigated, sometimes with workarounds (Kothari et al., 2014; Fortin et al., 2018; Da Rin et al., 2022). Datasets used here were obtained in multiple settings and environments over the last twenty years, and further experiments will likely share the same potential pitfall, either directly by combining cross-laboratory acquisition, by using the reference datasets provided with this paper, or by evaluating archaeological material with models trained on modern material. This may also call for other approaches using masks or outlines, which are often already obtained for further outline analyses anyway. Additional work to test for these potential biases is needed.

More generally, should we expect rivalry or synergy to emerge between CNN and geometric morphometrics? For rough identification, CNN will likely become a more popular tool for future archaeobotanical studies, and possibly the next standard toolkit. Here, we insist on the fact that our CNN architecture was deliberately kept simple for both practical and conservative reasons: we had many models to run that needed to be generic and the point was to test if a candid CNN approach could beat state-of-the-art EFT. There is definitely room for improvement by using better models, fine-tuning them, and using larger datasets and larger images, as well as by combining views or even using 3D models of the objects. That being said, with the best will in the world, a model cannot see what simply does not exist. In some cases, a single ratio of lengths can achieve nearly perfect identification when morphological differences are trivial. This is the case, for example, for grapevine (Bonhomme et al., 2022). On the other hand, differences that are meaningful for human use may just not be reflected in the studied organs. Somewhere between these two extremes lies a wide range of real differences that can only be detected by statistical means (Bonhomme et al., 2021a; Bonhomme et al., 2021b). This is where methodological refinement makes the most sense, and it is a natural playground for deep learning approaches.

EFT have the advantage of translating the shape into coefficients that can be directly treated as quantitative variables. Also, the inverse transformations are mathematically defined, so that one can go “back” to shape from coefficients, which allows rich insights into the morphological space of taxa of interest and the comparison of the relative occupancies of taxonomic, diachronic or synchronic assemblages. For that matter, the best equivalent CNN have to offer so far are activation maps where one can visualize, for each image, the regions that triggered the final vote. Even though the reputation of being black boxes is largely erroneous, CNN are and will likely remain less handy in that respect, mostly because they use images, which are more difficult to interpret in terms of model explainability than, say, outline coordinates. More generally, “to predict is not to explain” (Thom, 2010), and in our opinion, CNN and EFT should be seen as complementary approaches rather than competitors. Future studies will explore this assumption but CNN may soon become the go-to tool when identification is of prime interest. Paradoxically, CNN are more computationally intensive than GMM models but may prove easier to deploy as applications and more accessible to a broad audience, as they can be trained and used directly on raw images whereas GMM approaches require meticulously prepared inputs.

Finally, if deep learning was here restricted to identification using convolutional neural networks, it has much more to offer to archaeology and morphometrics: its versatility extends to regression problems (e.g. Reese, 2021), segmentation, i.e. automating and/or improving pre-morphometrics treatment (e.g. Lee et al., 2017), adversarial reconstruction for broken or missing parts (e.g. Hermoza & Sipiran, 2018), and pose and parallax correction for data acquisition (e.g. Zhang et al., 2021). In our view, this also argues for synergy rather than rivalry between CNN and GMM approaches, with future research determining the extent to which this holds true.
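The statement that one can go “back” to shape from coefficients can be illustrated with the inverse elliptical Fourier transform. The sketch below rebuilds outline coordinates from a list of (a, b, c, d) coefficients, one tuple per harmonic; the function name, the `center` argument and the toy coefficients are ours, and any normalisation applied in a real pipeline is ignored.

```python
import math


def inverse_efourier(coeffs, n_points=360, center=(0.0, 0.0)):
    """Rebuild (x, y) outline coordinates from elliptical Fourier coefficients."""
    outline = []
    for k in range(n_points):
        theta = 2.0 * math.pi * k / n_points
        x, y = center
        for n, (a, b, c, d) in enumerate(coeffs, start=1):
            x += a * math.cos(n * theta) + b * math.sin(n * theta)
            y += c * math.cos(n * theta) + d * math.sin(n * theta)
        outline.append((x, y))
    return outline


# Toy example: a single harmonic describes an ellipse with semi-axes 2 and 1.
ellipse = [(2.0, 0.0, 0.0, 1.0)]
outline = inverse_efourier(ellipse, n_points=4)
```

Because every point of the coefficient space maps to a drawable outline, mean shapes or shapes along a discriminant axis can be reconstructed and inspected, which is the interpretability advantage discussed above.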

Acknowledgements

We are grateful to the Centre de Ressources Biologiques de la Vigne, Domaine de Vassal-Montpellier (INRAE) (https://vassal.montpellier.hub.inrae.fr), which provided all the pips from cultivated varieties, and to the OSU-OREME (https://oreme.org/), which helped build the wild grape pip collection. We thank the Melgueil experimental domain (INRA, Mauguio, France) and the Porquerolles (CBNMed, France), Córdoba (IFAPA, Spain) and Tessaout (INRA Marrakech, Morocco) worldwide olive tree collections, which provided the olive stones from cultivated varieties. We also thank the Biological Resource Center-INRAE Clermont Ferrand for providing all the barley grains studied.

Preprint version 4 of this article has been peer-reviewed and recommended by Peer Community In Archaeology (Huet, 2025; https://doi.org/10.24072/pci.archaeo.100502). We warmly thank Mathias Bellat, Marco Cornelli, Lloyd A. Courtenay, Thomas Huet and two anonymous reviewers for their very valuable input during the peer-review process.

Funding

This work has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant Agreement 852573), the French National Research Agency (ANR-16-CE27-0013, ANR-06-BLAN-0212-02-PHOENIX, ANR-22-CE27-0026), the International Research Program EVOLEA (France - Morocco) (CNRS-INEE) and the Défi Clé “Sciences du passé” - Occitanie region / Federal University of Toulouse, France (PATRIMOLEA programme).

Conflict of interest disclosure

The authors declare that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.

Data, scripts, code, and supplementary information availability

The following DOI gives access to all scripts and data needed to rerun and build upon the work presented here (6.79 GB zip archive): https://doi.org/10.6084/m9.figshare.25680390.v2 (Bonhomme et al., 2025b).

It includes:

• Scripts, used and commented, in the folder /R. Raw images in the folder /DATA and .rda files in the folder /rda are primarily intended to be accessed by the .R scripts.

• Model histories showing accuracy, loss and learning rate for both training and validation partitions, for each combination of model × sample size × seed, in the folder /model histories.

• hdf5 images of training weights, available for seed 2329, in the folder /hdf5.

• A version of Figure 2 with the very same presentation but showing precision and recall instead of accuracy.


References

[1] Allaire, J.; Chollet, F. keras: R Interface to “Keras”, Contributed Packages, CRAN, 2017 | DOI

[2] Alzubaidi, L.; Zhang, J.; Humaidi, A.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.; Al-Amidie, M.; Farhan, L. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions, Journal of Big Data, Volume 8 (2021), p. 53 | DOI

[3] Bellat, M.; Orellana Figueroa, J.; Reeves, J.; Taghizadeh-Mehrjardi, R.; Tennie, C.; Scholten, T. Machine learning applications in archaeological practices: a review, https://doi.org/10.48550/arXiv.2501.03840, 2025 | DOI

[4] Berganzo-Besga, I.; Orengo, H.; Lumbreras, F.; Aliende, P.; Ramsey, M. Automated detection and classification of multi-cell Phytoliths using Deep Learning-Based Algorithms, Journal of Archaeological Science, Volume 148 (2022), p. 105654 | DOI

[5] Bonhomme, V.; Bouby, L.; Claude, J.; Gros-Balthazard, M.; Ivorra, S.; Pagnoux, C.; Dham, C.; Jeanty, A.; Pastor, T.; Terral, J.-F.; Evin, A. GMM vs CNN paper by Bonhomme et al, https://doi.org/10.6084/m9.figshare.25680390.v2, 2025 | DOI

[6] Bonhomme, V.; Forster, E.; Wallace, M.; Stillman, E.; Charles, M.; Jones, G. Identification of inter- and intra-species variation in cereal grains through geometric morphometric analysis, and its resilience under experimental charring, Journal of Archaeological Science, Volume 86 (2017), pp. 60-67 | DOI

[7] Bonhomme, V.; Ivorra, S.; Lacombe, T.; Evin, A.; Figueiral, I.; Maghradze, D.; Marchal, C.; Pagnoux, C.; Pastor, T.; Pomarèdes, H.; Bacilieri, R.; Terral, J.; Bouby, L. Pip shape echoes grapevine domestication history, Scientific Reports, Volume 11 (2021) | DOI

[8] Bonhomme, V.; Pagnoux, C.; Bouby, L.; Ivorra, S.; Allen, S.; Valamoti, S.-M. Early viticulture in Neolithic and Bronze Age Greece: looking for the best traditional morphometric method to distinguish wild and domestic grape pips, Cooking with plants in ancient Europe and beyond: interdisciplinary approaches to the archaeology of plant foods (eds Valamoti SM), Sidestone Press, 2022, pp. 57-69

[9] Bonhomme, V.; Picq, S.; Claude, J. Momocs: Morphometrics using R, https://doi.org/10.32614/CRAN.package.Momocs, 2025 | DOI

[10] Bonhomme, V.; Picq, S.; Gaucherel, C.; Claude, J. Momocs: Outline Analysis Using R, Journal of Statistical Software, Volume 56 (2014) | DOI

[11] Bonhomme, V.; Picq, S.; Ivorra, S.; Evin, A.; Pastor, T.; Bacilieri, R.; Lacombe, T.; Figueiral, I.; Terral, J.-F.; Bouby, L. Eco-evo-devo implications and archaeobiological perspectives of trait covariance in fruits of wild and domesticated grapevines, PLOS ONE, Volume 15, 2020, p. 0239863 | DOI

[12] Bonhomme, V.; Terral, J.-F.; Zech-Matterne, V.; Ivorra, S.; Lacombe, T.; Deborde, G.; Kuchler, P.; Limier, B.; Pastor, T.; Rollet, P.; Bouby, L. Seed morphology uncovers 1500 years of vine agrobiodiversity before the advent of the Champagne wine, Scientific Reports, Volume 11 (2021) | DOI

[13] Bouby, L.; Figueiral, I.; Bouchette, A.; Rovira, N.; Ivorra, S.; Lacombe, T.; Pastor, T.; Picq, S.; Marinval, P.; Terral, J.-F. Bioarchaeological Insights into the Process of Domestication of Grapevine (Vitis vinifera L.) during Roman Times in Southern France, PLoS ONE, Volume 8, 2013, p. 63195 | DOI

[14] Bourgeon, O.; Pagnoux, C.; Mauné, S.; Vargas, E.; Ivorra, S.; Bonhomme, V.; Ater, M.; Moukhli, A.; Terral, J.-F. Olive tree varieties cultivated for the great Baetican oil trade between the 1st and the 4th centuries ad: morphometric analysis of olive stones from Las Delicias (Ecija, Province of Seville, Spain), Vegetation History and Archaeobotany, Volume 27 (2018) | DOI

[15] Burger, P.; Terral, J.-F.; Ruas, M.-P.; Ivorra, S.; Picq, S. Assessing past agrobiodiversity of Prunus avium L. (Rosaceae): a morphometric approach focussed on the stones from the archaeological site Hôtel-Dieu (16th century, Tours, France), Vegetation History and Archaeobotany, Volume 20 (2011), pp. 447-458 | DOI

[16] Claude, J. Morphometrics with R, Springer, New York, New York, NY, 2008 | DOI

[17] Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database, 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, pp. 248-255 | DOI

[18] Evin, A.; Bouby, L.; Bonhomme, V.; Jeanty, A.; Jeanjean, M.; Terral, J.-F. Archaeophenomics of ancient domestic plants and animals using geometric morphometrics : a review, Peer Community Journal, Volume 2 (2022) | DOI

[19] Fortin, J.-P.; Cullen, N.; Sheline, Y.; Taylor, W.; Aselcioglu, I.; Cook, P.; Adams, P.; Cooper, C.; Fava, M.; McGrath, P.; McInnis, M.; Phillips, M.; Trivedi, M.; Weissman, M.; Shinohara, R. Harmonization of cortical thickness measurements across scanners and sites, NeuroImage, Volume 167 (2018), pp. 104-120 | DOI

[20] Garcia‐Molsosa, A.; Orengo, H.; Lawrence, D.; Philip, G.; Hopper, K.; Petrie, C. Potential of deep learning segmentation for the extraction of archaeological features from historical map series, Archaeological Prospection, Volume 28 (2021), pp. 187-199 | DOI

[21] Gimenez, B.; Joannin, S.; Pasquet, J.; Beaufort, L.; Gally, Y.; Garidel‐Thoron, T.; Combourieu‐Nebout, N.; Bouby, L.; Canal, S.; Ivorra, S.; Limier, B.; Terral, J.; Devaux, C.; Peyron, O. A user‐friendly method to get automated pollen analysis from environmental samples, New Phytologist, Volume 243 (2024), pp. 797-810 | DOI

[22] Gros-Balthazard, M.; Galimberti, M.; Kousathanas, A.; Newton, C.; Ivorra, S.; Paradis, L.; Vigouroux, Y.; Carter, R.; Tengberg, M.; Battesti, V.; Santoni, S.; Falquet, L.; Pintaud, J.-C.; Terral, J.-F.; Wegmann, D. The Discovery of Wild Date Palms in Oman Reveals a Complex Domestication History Involving Centers in the Middle East and Africa, Current Biology, Volume 27 (2017) | DOI

[23] Hermoza, R.; Sipiran, I. 3D Reconstruction of Incomplete Archaeological Objects Using a Generative Adversarial Network, Proceedings of Computer Graphics International 2018 on - CGI 2018, ACM Press, New York, New York, USA, 2018, pp. 5-11 | DOI

[24] Huet, T. Deep Learning Outperforms Geometric Morphometrics for Archaeobotanical Seed Classification, https://doi.org/10.24072/pci.archaeo.100502, 2025 | DOI

[25] Jeanty, A.; Bouby, L.; Bonhomme, V.; Balfourier, F.; Debiton, C.; Dham, C.; Ivorra, S.; Ros, J.; Evin, A. Barley systematics and taxonomy foreseen by seed morphometric variation, PLoS ONE, Volume 18 (2023), pp. 1-17 | DOI

[26] Jeanty, A.; Ros, J.; Mureau, C.; Dham, C.; Lecomte, C.; Bonhomme, V.; Ivorra, S.; Figueiral, I.; Bouby, L.; Evin, A. Identification of archaeological barley grains using geometric morphometrics and experimental charring, Journal of Archaeological Science, Volume 162 (2024) | DOI

[27] Jesus, A.; Bonhomme, V.; Evin, A.; Ivorra, S.; Soteras, R.; Salavert, A.; Antolín, F.; Bouby, L. A morphometric approach to track opium poppy domestication, Scientific Reports, Volume 11 (2021) | DOI

[28] Kendall, D. A Survey of the Statistical Theory of Shape, Statistical Science, Volume 4 (1989), pp. 81-120 | DOI

[29] Kost, C.; Heil, M. Increased availability of extrafloral nectar reduces herbivory in Lima bean plants (Phaseolus lunatus, Fabaceae), Basic and Applied Ecology, Volume 6 (2005), pp. 237-248 | DOI

[30] Kothari, S.; Phan, J.; Stokes, T.; Osunkoya, A.; Young, A.; Wang, M. Removing Batch Effects From Histopathological Images for Enhanced Cancer Diagnosis, IEEE Journal of Biomedical and Health Informatics, Volume 18 (2014), pp. 765-772 | DOI

[31] Kuhl, F.; Giardina, C. Elliptic Fourier features of a closed contour, Computer graphics and image processing, Volume 18 (1982), pp. 236-258 | DOI

[32] Le, V.-L.; Beurton-Aimar, M.; Zemmari, A.; Marie, A.; Parisey, N. Automated landmarking for insects morphometric analysis using deep neural networks, Ecological Informatics, Volume 60 (2020) | DOI

[33] Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition, Proceedings of the IEEE, Volume 86, 1998, pp. 2278-2324 | DOI

[34] Lee, H.; Troschel, F.; Tajmir, S.; Fuchs, G.; Mario, J.; Fintelmann, F.; Do, S. Pixel-Level Deep Segmentation: Artificial Intelligence Quantifies Muscle on Computed Tomography for Body Morphometric Analysis, Journal of Digital Imaging, Volume 30 (2017), pp. 487-498 | DOI

[35] Loddo, A.; Ruberto, C.; Vale, A.; Ucchesu, M.; Soares, J.; Bacchetta, G. An effective and friendly tool for seed image analysis, The Visual Computer, Volume 39, 2021 | DOI

[36] Miele, V.; Dussert, G.; Cucchi, T.; Renaud, S. Deep learning for species identification of modern and fossil rodent molars, bioRxiv, 2020 | DOI

[37] Ooms, J. magick: Advanced Graphics and Image-Processing in R, Contributed Packages, CRAN, 2016 | DOI

[38] Pagnoux, C.; Bouby, L.; Ivorra, S.; Petit, C.; Valamoti, S.-M.; Pastor, T.; Picq, S.; Terral, J.-F. Inferring the agrobiodiversity of Vitis vinifera L. (grapevine) in ancient Greece by comparative shape analysis of archaeological and modern seeds, Vegetation History and Archaeobotany, Volume 24 (2015), pp. 75-84 | DOI

[39] Picek, L.; Šulc, M.; Patel, Y.; Matas, J. Plant recognition by AI: Deep neural nets, transformers, and kNN in deep embeddings, Frontiers in Plant Science, Volume 13 (2022) | DOI

[40] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2024

[41] Reese, K. Deep learning artificial neural networks for non-destructive archaeological site dating, Journal of Archaeological Science, Volume 132 (2021) | DOI

[42] Da Rin, G.; Seghezzi, M.; Padoan, A.; Pajola, R.; Bengiamo, A.; Fabio, A.; Dima, F.; Fanelli, A.; Francione, S.; Germagnoli, L.; Lorubbio, M.; Marzoni, A.; Pipitone, S.; Rolla, R.; Bagorria Vaca, M. C.; Bartolini, A.; Bonato, L.; Sciacovelli, L.; Buoro, S. Multicentric evaluation of the variability of digital morphology performances also respect to the reference methods by optical microscopy, International Journal of Laboratory Hematology, Volume 44 (2022), pp. 1040-1049 | DOI

[43] Romero, I.; Kong, S.; Fowlkes, C.; Jaramillo, C.; Urban, M.; Oboh-Ikuenobe, F.; D’Apolito, C.; Punyasena, S. Improving the taxonomy of fossil pollen using convolutional neural networks and superresolution microscopy, Proceedings of the National Academy of Sciences, Volume 117 (2020), pp. 28496-28505 | DOI

[44] Ros, J.; Evin, A.; Bouby, L.; Ruas, M.-P. Geometric morphometric analysis of grain shape and the identification of two-rowed barley (Hordeum vulgare subsp. distichum L.) in southern France, Journal of Archaeological Science, Volume 41 (2014), pp. 568-575 | DOI

[45] Roushannafas, T.; Bogaard, A.; Charles, M. Geometric morphometrics sheds new light on the identification and domestication status of ‘new glume wheat’ at Neolithic Çatalhöyük, Journal of Archaeological Science, Volume 142 (2022) | DOI

[46] Sevillano, V.; Holt, K.; Aznarte, J. Precise automatic classification of 46 different pollen types with convolutional neural networks, Plos One, Volume 15 (2020) | DOI

[47] Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition, ArXiv, 2014 | DOI

[48] Soroush, M.; Mehrtash, A.; Khazraee, E.; Ur, J. Deep Learning in Archaeological Remote Sensing: Automated Qanat Detection in the Kurdistan Region of Iraq, Remote Sensing, Volume 12 (2020) | DOI

[49] Tarongi, M.; Bonhomme, V.; Evin, A.; Ivorra, S.; López, D.; Alonso, N.; Bouby, L. A new way of seeing pulses: preliminary results of geometric morphometric analyses of Iron Age seeds from the site of La Font de la Canya (Barcelona, Spain), Vegetation History and Archaeobotany, Volume 30 (2021), pp. 77-87 | DOI

[50] Terral, J.-F.; Alonso, N.; Chatti, N.; i, C. R.; Fabre, L.; Fiorentino, G.; Marinval, P.; Jordá, G.; Pradat, B.; Rovira, N.; Alibert, P. Historical biogeography of olive domestication (Olea europaea L.) as revealed by geometrical morphometry applied to biological and archaeological material, Journal of Biogeography, Volume 31 (2004), pp. 63-77 | DOI

[51] Terral, J.-F.; Bonhomme, V.; Pagnoux, C.; Ivorra, S.; Newton, C.; Paradis, L.; Ater, M.; Kassout, J.; Limier, B.; Bouby, L.; Cornet, F.; Barazani, O.; Dag, A.; Galili, E. The Shape Diversity of Olive Stones Resulting from Domestication and Diversification Unveils Traits of the Oldest Known 6500-Years-Old Table Olives from Hishuley Carmel Site (Israel), Agronomy, Volume 11 (2021) | DOI

[52] Terral, J.-F.; Newton, C.; Ivorra, S.; Gros-Balthazard, M.; Morais, C.; Picq, S.; Tengberg, M.; Pintaud, J.-C. Insights into the historical biogeography of the date palm (Phoenix dactylifera L.) using geometric morphometry of modern and ancient seeds, Journal of Biogeography, Volume 39 (2012), pp. 929-941 | DOI

[53] Terral, J.-F.; Tabard, E.; Bouby, L.; Ivorra, S.; Pastor, T.; Figueiral, I.; Picq, S.; Chevance, J.-B.; Jung, C.; Fabre, L.; Tardy, C.; Compan, M.; Bacilieri, R.; Lacombe, T.; This, P. Evolution and history of grapevine (Vitis vinifera) under domestication: new morphometric perspectives to understand seed domestication syndrome and reveal origins of ancient European cultivars, Annals of Botany, Volume 105 (2010), pp. 443-455 | DOI

[54] Thom, R. To Predict is not To Explain: Conversations on Mathematics, Science, Catastrophe Theory, Semiophysics, Morphogenesis and Natural Philosophy, Thombooks Press, 2010

[55] Venables, W.; Ripley, B. Modern Applied Statistics with S, Springer, New York, New York, NY, 2002 | DOI

[56] Wallace, M.; Bonhomme, V.; Russell, J.; Stillman, E.; George, T.; Ramsay, L.; Wishart, J.; Timpany, S.; Bull, H.; Booth, A.; Martin, P. Searching for the Origins of Bere Barley: a Geometric Morphometric Approach to Cereal Landrace Recognition in Archaeology, Journal of Archaeological Method and Theory, Volume 26 (2019), pp. 1125-1142 | DOI

[57] Wickham, H.; Averick, M.; Bryan, J.; Chang, W.; McGowan, L.; François, R.; Grolemund, G.; Hayes, A.; Henry, L.; Hester, J.; Kuhn, M.; Pedersen, T.; Miller, E.; Bache, S.; Müller, K.; Ooms, J.; Robinson, D.; Seidel, D.; Spinu, V.; Takahashi, K.; Vaughan, D.; Wilke, C.; Woo, K.; Yutani, H. Welcome to the Tidyverse, Journal of Open Source Software, Volume 4 (2019) | DOI

[58] Willi, M.; Pitman, R.; Cardoso, A.; Locke, C.; Swanson, A.; Boyer, A.; Veldthuis, M.; Fortson, L. Identifying animal species in camera trap images using deep learning and citizen science, Methods in Ecology and Evolution, Volume 10 (2019), pp. 80-91 | DOI

[59] Yuan, M.; Lv, N.; Dong, Y.; Hu, X.; Lu, F.; Zhan, K.; Shen, J.; Wu, X.; Zhu, L.; Xie, Y. A dataset for fine-grained seed recognition, Scientific Data, Volume 11 (2024) | DOI

[60] Zhang, S.; Lu, S.; He, R.; Bao, Z. Stereo Visual Odometry Pose Correction through Unsupervised Deep Learning, Sensors, Volume 21 (2021) | DOI

[61] Zohary, D.; Hopf, M.; Weiss, E. Domestication of Plants in the Old World, Oxford University Press, 2012 | DOI