Section: Ecology

A Julia toolkit for species distribution data

Corresponding author(s): Poisot, Timothée (timothee.poisot@umontreal.ca)

10.24072/pcjournal.589 - Peer Community Journal, Volume 5 (2025), article no. e101

Peer reviewed and recommended by PCI

Abstract

(1) Species distribution modeling requires handling varied types of data, and benefits from an integrated approach to programming. (2) We introduce SpeciesDistributionToolkit, a Julia package aiming to facilitate the production of species distribution models. It covers various steps of the data collection and analysis process, and extends to interfaces for integrating additional functionalities. (3) By relying on semantic versioning and strong design choices favoring modularity, we expect that this package will lead to improved reproducibility and long-term maintainability. (4) We illustrate the functionalities of the package through several case studies, accompanied by reproducible code.

Metadata
DOI: 10.24072/pcjournal.589
Type: Software tool
Keywords: species distribution models, biogeography, occurrence data, land use, climatic data, pseudo-absences

Poisot, Timothée 1; Bussières-Fournel, Ariane 1; Dansereau, Gabriel 1; Catchen, Michael D. 1

1 Université de Montréal, Département de Sciences Biologiques, Montréal QC, Canada
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
Poisot, T.; Bussières-Fournel, A.; Dansereau, G.; Catchen, M. D. A Julia toolkit for species distribution data. Peer Community Journal, Volume 5 (2025), article no. e101. https://doi.org/10.24072/pcjournal.589

PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.ecology.100789

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Full text

The full text below may contain a few conversion errors compared to the version of record of the published article.

Introduction

Species Distribution Models (SDMs; Elith and Leathwick, 2009), in addition to being key tools to further our knowledge of biodiversity, are key components of effective conservation decisions (Guisan et al., 2013), planning (McShea, 2014), and ecological impact assessment (Baker et al., 2021). The training and evaluation of an SDM is a complex process, with key decisions to make on design and reporting (Zurell et al., 2020). The ability to use the correct data format and representation at each of these steps is central to the correct interpretation of these models (Araújo et al., 2019). This is particularly true because the choice of data source can significantly affect predictions (Merkenschlager et al., 2023; Arenas-Castro et al., 2022; Booth, 2022), which suggests a need for flexible pipelines in which data sources can be conveniently swapped. In recent years, the number of software packages and tools assisting ecologists with the various steps of developing species distribution models has increased.

As Kass et al. (2024) point out, this increase in the diversity of software tools (most of them in the R language) is a good thing. Because SDMs are a general-purpose methodology, a varied software offering increases the chances that specific decisions can be chained together in the way that best supports a specific use case. By making code available to all users, package developers reduce the need for custom implementations of analytical steps, and contribute to the adoption of good practices in the field. However, because building, validating, and applying SDMs requires a diversity of data types from different sources, many existing packages have been designed independently. As a result, they may suffer from low interoperability, which can create friction when using multiple tools together. As an illustration, Kellner et al. (2025) highlight that, among publications on abundance or distribution models that share code and data, about 20% are not reproducible because of issues in package dependencies.

To promote interoperability and improve reproducibility, tools that provide an integrated environment are important. In this manuscript, we present SpeciesDistributionToolkit (abbreviated as SDT), a meta-package for the Julia programming language offering an integrated environment for the retrieval, formatting, and interpretation of data relevant to the modeling of species distributions. SDT was in part designed to work within the BON-in-a-Box project (Gonzalez et al., 2023; Griffith et al., 2024), a GEO BON initiative to facilitate the calculation and reporting of biodiversity indicators supporting the Kunming-Montréal Global Biodiversity Framework. A leading design consideration for SDT was therefore to maximize interoperability between components and functionalities from the ground up. This is achieved through three mechanisms. First, strict semantic versioning: package releases indicate whether existing code remains compatible. Second, interfaces: separate software components (including ones external to the package) can interact without prior knowledge of each other's implementation, and without dependencies between the components of SDT. Finally, Julia’s extension mechanism, through which packages gain functionalities when they are loaded within the same project.

In this manuscript, we provide a high-level overview of the functionalities of the package(s) forming SDT. We then discuss design principles that facilitate long-term maintenance, development, and integration. We finish by presenting four illustrative case studies: extraction of data at known species occurrences, manipulation of multiple geospatial layers, training and explanation of an SDM, and creation of virtual communities to simulate the spatial distribution of ecological uniqueness. This latter case study is intended to give an impression of what using SDT to support the development of novel analyses feels like. All of the case studies are available as supplementary material, in the form of fully reproducible, self-contained Jupyter notebooks.

Methods

SpeciesDistributionToolkit is released as a package for the Julia programming language (Bezanson et al., 2017), under the MIT license (a license approved by the Open Source Initiative). It has evolved from a previous collection of packages to handle GBIF and raster data (Dansereau and Poisot, 2021), and now provides extended functionalities as well as improved performance. The package is registered in the Julia package repository and can be downloaded and installed anonymously. It is compatible with the current stable and long-term support (LTS) releases of Julia. The full source code, complete commit history, plans for future development, and a forum are available at https://github.com/PoisotLab/SpeciesDistributionToolkit.jl. This page also links to the documentation, which contains a full reference for the package functions, a series of brief how-to examples, and longer vignettes with more integrative tutorials.

An overview of the SDT package is given in Figure 1. The project is organized as a “monorepo”, a single repository containing separate but interoperable packages (meaning that they can be installed independently, but are designed to work cohesively). This allows the scope of the package to expand by moving functionalities into new component packages, without requiring intervention from users. As SDT is registered in the Julia package repository, it can be installed by typing add SpeciesDistributionToolkit in package mode (entered by pressing ] at the Julia prompt). When the package is loaded with using SpeciesDistributionToolkit, all component packages are automatically and transparently loaded, so users do not need to know where a specific method or function resides in order to use it. The monorepo structure has an important advantage for users: the code of all component packages is in the same location, which makes inspecting the internal implementation of any package easier. In addition, users can open an issue describing a problem or desired feature within the monorepo, without needing to know which component package the issue concerns. This lowers the barrier to interacting with the software, and also facilitates the work of contributors, who can review all open issues in a centralized way. Similarly, monorepos lend themselves to integrated documentation, which is the approach we have chosen for the online SDT manual.

Figure 1 - Overview of the packages included in SpeciesDistributionToolkit. The packages are color-coded by intended use (acquisition, representation, and analysis of data). The specific content of each package is presented in the main text. Note that because the package relies on interfaces to facilitate code interoperability, there are few dependency relationships (black arrows). Some packages can interact with data sources, represented on the left side of the figure. When loading SpeciesDistributionToolkit, all public methods from the package are accessible to the user. Packages that are supported through extensions are in dashed boxes.

SDT uses the built-in Julia package manager to keep all dependencies up to date. Furthermore, we use strict semantic versioning: major versions correspond to changes that would break user-developed code; minor versions add functionalities; patch releases cover minor bug fixes or documentation changes. All component packages are versioned independently, and each has its own CHANGELOG file documenting every release. Strict reliance on semantic versioning removes the problem of maintaining compatibility when new functionalities are added: all releases in the v1.x.x branch of SDT depend on component packages in their respective v1.x.x branches, so users can benefit from new functionalities without adapting existing code. This behavior is extensively tested, both through unit tests and through integration tests generated as part of the online documentation.
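The compatibility rule above can be illustrated with a short sketch. It is written in Python rather than Julia, assumes plain MAJOR.MINOR.PATCH strings (no pre-release tags, and none of the special treatment usually given to 0.x versions), and is not how Julia's package manager actually resolves compatibility bounds; `parse_version` and `is_compatible` are hypothetical helpers.

```python
# Sketch of the semantic-versioning rule: same major version, and at least
# the minor/patch level the code was written against. Hypothetical helpers.


def parse_version(version: str) -> tuple:
    """Split a "MAJOR.MINOR.PATCH" string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def is_compatible(required: str, available: str) -> bool:
    """True if code written against `required` should run on `available`."""
    req, avail = parse_version(required), parse_version(available)
    # A new major version may break user code; within a major version,
    # later releases only add functionalities or fix bugs.
    return avail[0] == req[0] and avail >= req


print(is_compatible("1.2.0", "1.5.3"))  # True: same major, newer minor
print(is_compatible("1.2.0", "2.0.0"))  # False: major version changed
print(is_compatible("1.2.0", "1.1.9"))  # False: older than required
```

Under this rule, code written against v1.2.0 keeps working on any later v1.x.x release, whereas a move to v2.0.0 signals that it may need to be revised.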

Component packages

The SDT package primarily provides integration between the other packages via method overloading (reusing method names for intuitive and concise code), which allows packages to be efficiently joined together (Roesch et al., 2023). Additional functionalities that reside in the top-level package are the handling of polygon data and zonal statistics, and various quality-of-life methods. Because of the modular nature of the code, any of these functions can be transparently moved to their own packages without affecting reproducibility. Note that all packages can still be installed (and would be fully functional) independently.
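Zonal statistics summarize the raster cells that fall within a polygon. The Python sketch below illustrates the idea with an even-odd (ray casting) point-in-polygon test applied to cell centres. It assumes planar coordinates, a single polygon ring without holes, and a raster stored as a dictionary of cell centres; the function names are hypothetical, and SDT's own implementation may treat boundary cells and geographic coordinates differently.

```python
# Sketch of zonal statistics: summarize the raster cells whose centre lies
# inside a polygon. Assumes planar coordinates, one ring without holes, and
# a raster stored as {(x, y) cell centre: value}. Names are hypothetical.
from statistics import mean


def point_in_polygon(x, y, ring):
    """Even-odd rule: count crossings of a ray cast from (x, y) towards +x."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_statistic(cells, ring, summary=mean):
    """Apply `summary` to the values of all cells whose centre is in `ring`."""
    values = [v for (x, y), v in cells.items() if point_in_polygon(x, y, ring)]
    return summary(values)


# 4 x 4 raster of unit cells, centres at 0.5, 1.5, ...; value = column + row
cells = {(i + 0.5, j + 0.5): i + j for i in range(4) for j in range(4)}
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(zonal_statistic(cells, square))       # 1 (mean of 0, 1, 1, 2)
print(zonal_statistic(cells, square, max))  # 2
```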

The SimpleSDMLayers package offers a series of types to represent raster data in arbitrary projections defined by a proj string (Evenden et al., 2024). This package provides the main data representation for most spatial functionalities that SDT supports, and handles saving and loading data. It also contains utility functions to deal with raster data, including interpolation to different spatial grids and coordinate reference systems (CRS), rescaling and quantization of data, masking, and most mathematical operations that can be applied to rasters.
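Several of these utilities are simple cell-wise operations. The Python sketch below shows rescaling to [0, 1], quantile-based quantization, and masking on a raster stored as a flat list in which None marks cells without data. It assumes nothing about SimpleSDMLayers beyond the operations named above, and the function names are hypothetical.

```python
# Illustrative raster utilities on a flat list of cell values, where None
# marks cells without data. Function names are hypothetical, not the
# SimpleSDMLayers API.
from bisect import bisect_right


def rescale(values, lo=0.0, hi=1.0):
    """Linearly map the range of the non-missing values onto [lo, hi]."""
    data = [v for v in values if v is not None]
    vmin, vmax = min(data), max(data)
    span = vmax - vmin
    if span == 0:  # constant layer: every cell maps to the lower bound
        return [None if v is None else lo for v in values]
    return [None if v is None else lo + (hi - lo) * (v - vmin) / span for v in values]


def quantize(values, n=4):
    """Replace each value by its quantile bin (1..n), using the empirical CDF."""
    data = sorted(v for v in values if v is not None)
    m = len(data)
    # Integer ceiling of n * ECDF(v), so bins do not depend on float rounding
    return [None if v is None else (n * bisect_right(data, v) + m - 1) // m for v in values]


def mask(values, keep):
    """Set cells to None wherever the mask layer is falsy (False or None)."""
    return [v if k else None for v, k in zip(values, keep)]


layer = [10.0, 20.0, None, 40.0, 30.0]
print(rescale(layer))   # 10 -> 0.0, 40 -> 1.0, the missing cell stays None
print(quantize(layer))  # [1, 2, None, 4, 3]
print(mask(layer, [True, True, True, False, True]))  # [10.0, 20.0, None, None, 30.0]
```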

OccurrencesInterface is a lightweight package providing a common interface for occurrence data. It implements abstract and concrete types to define a single occurrence and a collection thereof, and a series of methods allowing any occurrence data provider (e.g. GBIF) or data representation to become fully interoperable with the rest of SDT. All SDT methods that handle occurrence data do so through the interface provided by the OccurrencesInterface package, allowing future data sources to be integrated without the need for new code.
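The interface idea can be sketched outside Julia: generic code is written once against a few accessor methods, and any record type that provides them (a provider-specific record or a user-defined one) is accepted. The Python sketch below uses a `typing.Protocol`; the accessor names `entity`, `place`, and `presence`, both record types, and the toy coordinates are illustrative assumptions, not the actual OccurrencesInterface API. Julia achieves the same effect through multiple dispatch on abstract types rather than protocols.

```python
# Sketch of an occurrence interface: generic code depends only on a few
# accessors, so any record type providing them works. Accessor names and
# record types are hypothetical, not the OccurrencesInterface.jl API.
from dataclasses import dataclass
from typing import Iterable, Protocol


class Occurrence(Protocol):
    def entity(self) -> str: ...                 # taxon name
    def place(self) -> tuple[float, float]: ...  # (longitude, latitude)
    def presence(self) -> bool: ...              # presence or absence


@dataclass
class ProviderRecord:
    """A record with a provider-specific schema (hypothetical)."""
    scientificName: str
    decimalLongitude: float
    decimalLatitude: float

    def entity(self) -> str:
        return self.scientificName

    def place(self) -> tuple[float, float]:
        return (self.decimalLongitude, self.decimalLatitude)

    def presence(self) -> bool:
        return True  # occurrence databases report presences


@dataclass
class FieldSurveyPoint:
    """A user-defined record from a field campaign (hypothetical)."""
    species: str
    x: float
    y: float
    detected: bool

    def entity(self) -> str:
        return self.species

    def place(self) -> tuple[float, float]:
        return (self.x, self.y)

    def presence(self) -> bool:
        return self.detected


def presence_places(records: Iterable[Occurrence]) -> list:
    """Generic code, written once against the interface."""
    return [r.place() for r in records if r.presence()]


records = [
    ProviderRecord("Akodon montensis", -56.0, -25.5),
    FieldSurveyPoint("Akodon montensis", -55.2, -26.1, detected=False),
    FieldSurveyPoint("Akodon montensis", -57.3, -24.8, detected=True),
]
print(presence_places(records))  # [(-56.0, -25.5), (-57.3, -24.8)]
```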

The GBIF package offers access to the gbif.org streaming API (GBIF: The Global Biodiversity Information Facility, 2025), including the ability to retrieve, filter, and restart downloads. Although this package provides a rich data representation for occurrence data when access to the full GBIF data schema is required, all the objects it returns adhere to the OccurrencesInterface interface. The package also offers the functionality to download datasets from GBIF using their DOI.
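Restartable retrieval from a paged web API can be sketched as a loop that advances an offset until the server signals the last page, and that returns the offset reached so that an interrupted download can resume from it. The Python sketch below follows the offset/limit paging and `endOfRecords` flag used by the GBIF occurrence search API, but it does not contact GBIF: `fetch_page` is a hypothetical stand-in for the HTTP request, and the fake server exists only for illustration. The GBIF package's own retrieval and restart logic is not shown here.

```python
# Sketch of paged, restartable retrieval. `fetch_page` stands in for the
# HTTP call; the fake server below is hypothetical and only mimics the
# shape of an offset/limit paged response with an "endOfRecords" flag.


def fetch_all(fetch_page, limit=300, offset=0, max_records=None):
    """Fetch pages until the end of the records (or until max_records).

    Returns the records and the offset reached, so that an interrupted
    download can be restarted from that offset. Note that max_records can
    be exceeded by up to one page.
    """
    records = []
    while True:
        page = fetch_page(offset=offset, limit=limit)
        records.extend(page["results"])
        offset += len(page["results"])
        finished = page["endOfRecords"] or not page["results"]
        if finished or (max_records is not None and len(records) >= max_records):
            return records, offset


def make_fake_server(n_total):
    """Hypothetical stand-in serving `n_total` numbered records."""
    def fetch_page(offset, limit):
        results = list(range(offset, min(offset + limit, n_total)))
        return {"results": results, "endOfRecords": offset + limit >= n_total}
    return fetch_page


server = make_fake_server(n_total=700)
first, resume_at = fetch_all(server, limit=300, max_records=300)  # stop early
rest, _ = fetch_all(server, limit=300, offset=resume_at)          # restart
print(len(first), resume_at, len(rest))  # 300 300 400
```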

SimpleSDMDatasets implements an interface to retrieve and locally store raster data, which can be extended by users to support additional data sources. It offers access to a series of common data sources for spatial biodiversity modeling, including the biodiversity mapping project (Jenkins et al., 2013), the EarthEnv collection for land cover (Tuanmu and Jetz, 2014) and habitat heterogeneity (Tuanmu and Jetz, 2015), Copernicus land cover 100m data (Buchhorn et al., 2020), PaleoClim (Brown et al., 2018) data, WorldClim 1 and 2 (Fick and Hijmans, 2017) and CHELSA 1 and 2 (Karger et al., 2017) and their projections under various RCPs and SSPs.

SimpleSDMPolygons uses the interface from SimpleSDMDatasets to offer access to geospatial polygons, including the GADM database, the OpenStreetMap polygon API, as well as several providers for georegions, ecoregions, and ecoprovinces (Dinerstein et al., 2017; Olson et al., 2001).

Phylopic offers a wrapper around the phylopic.org API to download silhouettes for taxonomic entities. It also provides utilities for citation of the downloaded images. Its functionalities are similar to the rphylopic package (Gearty and Jones, 2023).

Fauxcurrences is inspired by the work of Osborne et al. (2022), and allows generating a series of simulated occurrence data that have the same statistical structure as observed ones. The package supports multi-species data, with user-specified weights for conserving intra- and inter-specific occurrence distances.
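One way to generate occurrences that share the statistical structure of observed ones is to compare the distributions of pairwise distances between points. The Python sketch below, a simplification for a single species in planar coordinates, redraws one simulated point at a time and keeps the change whenever the Jensen-Shannon divergence between the binned distance distributions does not increase. It is a hypothetical illustration of the idea, not the Fauxcurrences algorithm: the objective, the greedy search, the toy data, and all names are assumptions, and the multi-species weighting described above is omitted.

```python
# Sketch: simulate points whose pairwise-distance distribution resembles
# that of observed points. Single species, planar coordinates, and the
# Jensen-Shannon divergence as the objective are all assumptions here.
import math
import random
from bisect import bisect_right
from itertools import combinations


def distance_histogram(points, edges):
    """Proportion of pairwise distances in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for p, q in combinations(points, 2):
        i = bisect_right(edges, math.dist(p, q)) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return (kl(p, m) + kl(q, m)) / 2


def simulate(observed, n, box, edges, steps=500, seed=1):
    """Greedy search: redraw one point at a time, keep non-worsening moves."""
    rng = random.Random(seed)
    (xmin, xmax), (ymin, ymax) = box

    def draw():
        return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))

    target = distance_histogram(observed, edges)
    sim = [draw() for _ in range(n)]
    score = start = js_divergence(target, distance_histogram(sim, edges))
    for _ in range(steps):
        i = rng.randrange(n)
        candidate = sim[:i] + [draw()] + sim[i + 1:]
        new_score = js_divergence(target, distance_histogram(candidate, edges))
        if new_score <= score:
            sim, score = candidate, new_score
    return sim, start, score


# Toy "observed" data: two tight clusters of 10 points each
obs_rng = random.Random(0)
observed = [(obs_rng.gauss(c, 0.3), obs_rng.gauss(c, 0.3)) for c in (2, 7) for _ in range(10)]
edges = list(range(16))  # unit bins from 0 to 15 (the box diagonal is about 14.1)
sim, start, final = simulate(observed, n=20, box=((0, 10), (0, 10)), edges=edges)
print(f"divergence: {start:.3f} -> {final:.3f}")
```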

PseudoAbsences offers functions to place pseudo-absence points on layers, under various constraints on range and distance to existing observations (Barbet-Massin et al., 2012).
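A distance constraint of this kind can be sketched as follows: background cells are eligible only if they lie at least a given great-circle distance from every presence, and pseudo-absences are drawn at random among eligible cells. The Python code below assumes (longitude, latitude) pairs in degrees and a spherical Earth of radius 6371 km; the function names, the grid, and the presences are toy illustrations, not the PseudoAbsences API or real data.

```python
# Sketch: distance-constrained pseudo-absences. Background cells are
# eligible only if they lie at least `min_km` from every presence, and a
# random subset of eligible cells is kept. Names are hypothetical.
import math
import random


def haversine_km(lon1, lat1, lon2, lat2, radius=6371.0):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def pseudo_absences(presences, candidates, n, min_km, seed=0):
    """Sample up to n candidate cells at least min_km from all presences."""
    eligible = [
        c for c in candidates
        if all(haversine_km(*c, *p) >= min_km for p in presences)
    ]
    rng = random.Random(seed)
    return rng.sample(eligible, min(n, len(eligible)))


# Toy inputs: cell centres on a 0.5-degree grid and three presences
candidates = [(-62 + 0.5 * i, -28 + 0.5 * j) for i in range(17) for j in range(19)]
presences = [(-56.0, -25.5), (-57.0, -24.0), (-55.5, -26.5)]
absences = pseudo_absences(presences, candidates, n=50, min_km=150)
print(len(absences))
```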

Finally, SDeMo provides a high-level interface to the training, validation, and interpretation of species distribution models. The package is built around a series of data transformation steps (PCA, whitening, z-score, which can be chained together) and several classifiers, currently including BIOCLIM (Booth et al., 2014), Naive Bayes, logistic regression, and decision trees. SDeMo offers functions to demonstrate the training and evaluation of SDMs, as well as techniques related to heterogeneous ensembles and bagging, with support for arbitrary consensus (Marmion et al., 2009) and voting (Drake, 2014) functions. SDeMo promotes the use of interpretable techniques: the package supports regular (Elith et al., 2005) and inflated (Zurell et al., 2012) partial responses, as well as the calculation and mapping of Shapley values (Wadoux et al., 2023; Mesgaran, Cousens, and Webber, 2014) using the standard Monte-Carlo approach (Mitchell et al., 2021). Counterfactuals (Van Looveren and Klaise, 2019; Karimi et al., 2019), which represent perturbations of the input data leading to the opposite prediction (i.e. “what environmental conditions would lead to the species being absent”), can also be generated. The API of SDeMo has been designed to (i) enforce the use of best practices, and (ii) be consistent across analyses, so that the package can be used for educational material. Despite this focus on education, SDeMo has been thoroughly tested and may be used for research. As it implements a generic interface to any predictive model, users can expand it with additional classifiers or transformers, either through a contribution to the SDT repository or as part of the code written for a specific analysis.
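BIOCLIM, the simplest of the classifiers listed above, scores a location by how central its environmental conditions are relative to those at known presences. A common formulation (for instance, the one described in the documentation of the R package dismo) computes, for each variable, the percentile p of the value among presences, transforms it to 2 * min(p, 1 - p), and keeps the minimum across variables. The Python sketch below implements that formulation with toy data; whether SDeMo uses exactly the same percentile and tie conventions is an assumption we have not checked, and the function names are hypothetical.

```python
# Sketch of a BIOCLIM-style envelope score. Percentiles use mid-ranks for
# ties; the per-variable score is 2 * min(p, 1 - p), combined by the minimum.
from bisect import bisect_left, bisect_right


def fit_bioclim(presences):
    """Store, for each variable, the sorted values observed at presences."""
    return [sorted(column) for column in zip(*presences)]


def percentile(sorted_values, x):
    """Empirical percentile of x among the training values, in [0, 1]."""
    below = bisect_left(sorted_values, x)
    at_or_below = bisect_right(sorted_values, x)
    return (below + at_or_below) / (2 * len(sorted_values))


def predict_bioclim(model, x):
    """1 at the median of every variable, 0 outside the observed envelope."""
    scores = []
    for sorted_values, value in zip(model, x):
        p = percentile(sorted_values, value)
        scores.append(2 * min(p, 1 - p))  # tails are treated symmetrically
    return min(scores)                    # the most limiting variable wins


# Toy presences: (annual mean temperature, annual precipitation)
presences = [(20, 1000), (22, 1200), (24, 1400), (26, 1600)]
model = fit_bioclim(presences)
print(predict_bioclim(model, (23, 1300)))  # 1.0: central for both variables
print(predict_bioclim(model, (22, 1300)))  # 0.75: limited by temperature
print(predict_bioclim(model, (30, 1300)))  # 0.0: outside the envelope
```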

Case studies

In this section, we provide a series of case studies to illustrate the use of the package. The online manual offers longer tutorials, as well as a series of how-to vignettes illustrating the full scope of what the package allows. As the notebooks accompanying this article contain the full code required to run these case studies, we do not reproduce the code of these case studies in the main text (it is presented with detailed explanations in the Supp. Mat.), but rather focus on explaining how the component packages work together in each example.

Landcover consensus map

In this case study (Supp. Mat. 1), we retrieve the land cover data from Tuanmu and Jetz (2014), clip them to a GeoJSON polygon describing the country of Paraguay (SimpleSDMPolygons can download data directly from several polygon providers), and apply the mosaic operation to identify the most locally abundant class. This case study uses the SimpleSDMDatasets package to download (and locally cache) the raster data, and the SimpleSDMLayers package for basic utility functions on raster data. The results are presented in Figure 2.
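The consensus step can be read as a cell-wise operation over a stack of class layers, assuming (as we understand the land cover product to be organized) one layer per class giving the local cover of that class: at each cell, keep the index of the class with the largest value. The Python sketch below expresses this reading with toy values; `mosaic` and `most_abundant_class` are hypothetical names, and SDT's own mosaic function may differ in signature and in how it handles ties and missing data.

```python
# Sketch of a mosaic-style consensus: apply a function to the stack of
# values found at each cell of aligned layers. Names are hypothetical.


def mosaic(f, layers):
    """Apply f to the stack of values found at each cell of aligned layers."""
    return [f(values) for values in zip(*layers)]


def most_abundant_class(values):
    """1-based index of the largest value (first one wins ties); None if no data."""
    candidates = [(i, v) for i, v in enumerate(values, start=1) if v is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])[0]


# Toy per-class cover (%) for four cells; the last cell has no data
forest = [80, 10, 30, None]
grassland = [15, 70, 30, None]
water = [5, 20, 40, None]
print(mosaic(most_abundant_class, [forest, grassland, water]))  # [1, 2, 3, None]
```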

SimpleSDMDatasets stores raster data locally, to avoid re-downloading them upon repeated use. The location of the data is (i) standardized by the package itself, making the files findable by humans, and (ii) changeable by the user, e.g. to store the data within the project folder rather than in a central location. Whenever possible, SDT only reads the part of the raster data that is required for the user's region of interest. This is done by providing additional context in the form of a bounding box (in WGS84, regardless of the projection of the underlying raster data, in line with the GeoJSON specification). SDT has methods to calculate the bounding box for all the objects it supports.
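A bounding box of this kind is the extent of the coordinates of interest, optionally padded and kept within valid WGS84 limits. The Python sketch below is a minimal version that assumes (longitude, latitude) pairs in degrees and ignores regions crossing the antimeridian; `bounding_box` and its left/right/bottom/top keys are illustrative, not SDT's function, and the points are toy values.

```python
# Sketch of a WGS84 bounding box around (longitude, latitude) points,
# padded and clamped to valid coordinates. Antimeridian crossing ignored.


def bounding_box(coords, padding=0.0):
    """Smallest lon/lat box containing all points, plus optional padding."""
    lons = [lon for lon, _ in coords]
    lats = [lat for _, lat in coords]
    return {
        "left": max(-180.0, min(lons) - padding),
        "right": min(180.0, max(lons) + padding),
        "bottom": max(-90.0, min(lats) - padding),
        "top": min(90.0, max(lats) + padding),
    }


points = [(-58.0, -25.5), (-55.5, -27.0), (-60.0, -22.5)]
print(bounding_box(points, padding=1.0))
# {'left': -61.0, 'right': -54.5, 'bottom': -28.0, 'top': -21.5}
```

Only the part of a raster that overlaps such a box then needs to be read.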

Using data from GBIF

SDT provides strong integration between data on species occurrences and sources of geospatial information. To illustrate this, we collect data on the distribution of Akodon montensis (Rodentia, family Cricetidae), a known host of orthohantaviruses (Burgos et al., 2021; Owen et al., 2010), in Paraguay. In Supp. Mat. 2 we (i) request occurrence data using the GBIF package, (ii) download the silhouette of the species through Phylopic, and (iii) extract temperature and precipitation data at the points of occurrence based on bioclimatic data layers. The results are presented in Figure 3. The full notebook includes information about basic operations on raster data, as well as extraction of data based on occurrence records.
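Extracting a raster value at an occurrence amounts to finding the grid cell that contains the point. For a regular longitude/latitude grid stored row by row from north to south, this is a floored division of the offset from the grid origin by the cell size. The Python sketch below assumes that layout and assigns points on a cell edge to the cell to the east or south; the grids are toy values, not WorldClim or CHELSA data, and the function names are hypothetical.

```python
# Sketch: extract values of regular lon/lat grids at occurrence points.
# Grids are lists of rows, north to south; (left, top) is the grid origin.
import math


def cell_index(left, top, res, lon, lat):
    """Row (from the north) and column (from the west) of the cell holding a point."""
    return math.floor((top - lat) / res), math.floor((lon - left) / res)


def extract(grid, left, top, res, lon, lat):
    """Value of the cell containing (lon, lat), or None outside the grid."""
    row, col = cell_index(left, top, res, lon, lat)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Toy 3 x 3 layers with 1-degree cells, origin at (-60, -22)
temperature = [[21, 22, 23], [22, 23, 24], [23, 24, 25]]
precipitation = [[1500, 1400, 1300], [1450, 1350, 1250], [1400, 1300, 1200]]
occurrences = [(-58.5, -23.5), (-59.9, -22.1), (-50.0, -23.0)]

pairs = [
    (extract(temperature, -60, -22, 1.0, lon, lat),
     extract(precipitation, -60, -22, 1.0, lon, lat))
    for lon, lat in occurrences
]
print(pairs)  # [(23, 1350), (21, 1500), (None, None)]
```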

In practice, although the data are retrieved using the GBIF package, they are used internally by SDT through the OccurrencesInterface package. This package defines a small convention to handle georeferenced occurrence data, which allows additional occurrence sources to be integrated transparently. By defining a handful of methods for a custom data type, or by using the conversions built into the package, users can plug in any occurrence data source or csv file, and enjoy full compatibility with all SDT functionalities.
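Plugging a csv file into an occurrence interface mostly requires mapping its columns onto the few pieces of information such an interface expects (taxon, coordinates, presence). The Python sketch below assumes columns named `species`, `longitude`, and `latitude`, skips rows with missing or malformed coordinates, and returns plain dictionaries; the record layout, function name, and toy rows are hypothetical and do not reproduce the conversions built into OccurrencesInterface.

```python
# Sketch: read occurrences from csv text into minimal records. Column names
# and record layout are assumptions for illustration.
import csv
import io


def occurrences_from_csv(text, lon="longitude", lat="latitude", taxon="species"):
    """Return records with a taxon, a (lon, lat) place, and a presence flag."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            place = (float(row[lon]), float(row[lat]))
        except (TypeError, ValueError):
            continue  # skip rows with missing or malformed coordinates
        records.append({"entity": row.get(taxon, ""), "place": place, "presence": True})
    return records


text = """species,longitude,latitude
Akodon montensis,-56.0,-25.5
Akodon montensis,,-26.0
Akodon montensis,-57.25,-24.75
"""
records = occurrences_from_csv(text)
print(len(records))  # 2: the row without a longitude is skipped
```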

The GBIF package also supports the download of archived GBIF datasets; for this case study, we generated a dataset from this query, which can be accessed online (GBIF.org, 2025).

Figure 2 - Land cover consensus (defined as the class with the strongest local representation) in the country of Paraguay. Only the classes that were most abundant in at least one pixel are represented. The code to produce this figure is available as Supp. Mat. 2.

Figure 3 - Relationship between temperature and precipitation (BIO1 and BIO12) at each georeferenced occurrence known to GBIF for Akodon montensis. The code to produce this figure is available as Supp. Mat. 1.

Training a species distribution model

In this case study, we illustrate the integration of SDeMo and SimpleSDMLayers to train a species distribution model. Specifically, we re-use the data from Figure 3, with additional layers of bioclimatic variables. We train a rotation forest (Bagnall et al., 2018), a homogeneous ensemble of PCA followed by decision trees, in which each model is trained on a subset of features and training data. The results are presented in Figure 4. The model is built by selecting an optimal set of BioClim variables, its predictions are then mapped in space, and the resulting predicted range is finally clipped to the elevational range observed in the occurrence data. The data transformations in SDeMo are always applied in a way that prevents data leakage (Stock et al., 2023). Because SDeMo works through generic functions, these methods can be applied to any model specified by the user. In practice, general-purpose ML frameworks in Julia, notably MLJ (Blaom et al., 2020), can also be used and interfaced with SDT through the classifier and transformer interface.
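Preventing data leakage means that a transformation such as the z-score is learned from the training data only, then applied unchanged to the validation data. A minimal Python sketch follows; the class name and its fit/transform methods are hypothetical and chosen for readability, not taken from SDeMo, and the data are toy values.

```python
# Sketch: a z-score transformer whose statistics are learned from the
# training rows only, then reused for validation rows (no leakage).
from statistics import mean, pstdev


class ZScore:
    """Standardize each variable with statistics learned at fit time."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.mu = [mean(c) for c in columns]
        # Guard against constant variables (zero standard deviation)
        self.sigma = [pstdev(c) or 1.0 for c in columns]
        return self

    def transform(self, rows):
        return [[(v - m) / s for v, m, s in zip(row, self.mu, self.sigma)] for row in rows]


train = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
validation = [[10.0, 200.0]]

z = ZScore().fit(train)         # statistics come from training rows only
print(z.mu)                     # [2.0, 300.0]
print(z.transform(validation))  # validation rows reuse the training statistics
```

Fitting on training and validation rows together would instead shift the first mean to 4.0, letting the validation data influence the model it is meant to evaluate.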

By default, SDeMo always splits data for cross-validation in a way that respects class balance; in other words, the prevalence of the species is always the same in the validation and training sets (this is also true when bootstrapping observations to construct homogeneous ensembles). This behavior can be adjusted, or users may design their own training and validation sets. In the future, PseudoAbsences will be extended to introduce stratified cross-validation (Roberts et al., 2017).
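Class-balanced cross-validation can be sketched by shuffling the indices of each class separately and dealing them round-robin across folds, so that every validation fold (and therefore every training set) keeps the overall prevalence. The Python sketch below is an illustration under these assumptions, not SDeMo's splitting code; the function names are hypothetical.

```python
# Sketch: k-fold splits that preserve class balance (prevalence).
import random


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds with (nearly) equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the members of this class round-robin across the folds
        for position, i in enumerate(members):
            folds[position % k].append(i)
    return folds


def train_validation_splits(folds):
    """Yield (training, validation) index lists, one pair per fold."""
    for j, validation in enumerate(folds):
        training = [i for m, fold in enumerate(folds) if m != j for i in fold]
        yield training, validation


labels = [True] * 20 + [False] * 80  # prevalence of 0.2
folds = stratified_folds(labels, k=5)
for training, validation in train_validation_splits(folds):
    prevalence_t = sum(labels[i] for i in training) / len(training)
    prevalence_v = sum(labels[i] for i in validation) / len(validation)
    print(prevalence_t, prevalence_v)  # 0.2 0.2 for every fold
```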

The full notebook (Supp. Mat. 3) has additional information on routines for variable selection, stratified cross-validation, as well as the construction of the ensemble from a single PCA and decision tree. In addition, we report in Figure 5 the partial and inflated partial responses to the most important variable (highlighting an interpretable effect of the variable in the model), as well as the (Monte-Carlo) Shapley values (Wadoux et al., 2023; Mitchell et al., 2021) for each prediction in the training set. Checking the partial responses, in particular in space, is an important step, as some bioclimatic variables are known to have discontinuities stemming from their interpolation that can bias the predicted range of a species (Booth, 2022).
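The Monte-Carlo approach to Shapley values can be sketched as follows: for a feature j, repeatedly draw a random feature ordering and a random background observation, and average the change in prediction when feature j is switched from its background value to its observed value, with all features preceding j in the ordering already set to their observed values. This Python version is a generic illustration under those assumptions, not SDeMo's code; the model `linear_model`, the background set, and the function name are hypothetical.

```python
# Sketch: Monte-Carlo (permutation sampling) estimate of Shapley values.
import random


def shapley_mc(f, x, background, j, samples=200, seed=0):
    """Monte-Carlo estimate of the Shapley value of feature j for instance x."""
    rng = random.Random(seed)
    p = len(x)
    total = 0.0
    for _ in range(samples):
        z = rng.choice(background)              # random reference observation
        order = list(range(p))
        rng.shuffle(order)                      # random feature ordering
        before = set(order[: order.index(j)])   # features preceding j
        # Features preceding j come from x; the remaining ones come from z.
        with_j = [x[k] if (k in before or k == j) else z[k] for k in range(p)]
        without_j = [x[k] if k in before else z[k] for k in range(p)]
        total += f(with_j) - f(without_j)
    return total / samples


def linear_model(v):
    """A stand-in model; any function of a feature vector works."""
    return 2.0 * v[0] - 3.0 * v[1] + 0.5 * v[2]


x = [1.0, 2.0, 4.0]
background = [[0.0, 0.0, 0.0]]
phi = [shapley_mc(linear_model, x, background, j) for j in range(3)]
print(phi)  # [2.0, -6.0, 2.0]: coefficient times (x_j - reference)
```

For a linear model and a single background row the estimate is exact, which makes the sketch easy to check; in general, the Shapley values of all features sum (approximately, for Monte-Carlo estimates) to the prediction minus the average prediction over the background.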

Figure 4 - Predicted range of Akodon montensis in Paraguay based on a rotation forest trained on GBIF occurrences and the BioClim variables. The predicted range is clipped to the elevational range of the species. The code to produce this figure is available as Supp. Mat. 3.

Figure 5 - Partial responses (red) and inflated partial responses (grey) to the most important variable. In addition, the Shapley values for all training data are presented in the same figure; green points are presences, and pale points are pseudo-absences. Shapley values were added to the average model prediction to be comparable to partial responses. The code to produce this figure is available as Supp. Mat. 3.

Figure 6 - Virtual distribution of normalized (mean of 0 and unit variance) locality contribution to beta-diversity (Legendre and De Cáceres, 2013), based on a pool of 100 virtual species. The inset histogram represents the standardized species contribution to beta-diversity. Red areas represent comparatively more unique areas in terms of simulated species composition. The code to produce this figure is available as Supp. Mat. 4.

Species and location contribution to beta diversity

In the final case study (Supp. Mat. 4), we simulate the distribution of virtual species (Hirzel et al., 2001) with a logistic response to two environmental covariates (Leroy et al., 2016). We then use these simulated distributions to perform the decomposition of β-diversity introduced by Legendre and De Cáceres (2013) and applied by Dansereau et al. (2022) to spatially continuous data. This simulates the potential distribution of hotspots and coldspots of ecological uniqueness. The results are presented in Figure 6.
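The decomposition of Legendre and De Cáceres (2013) works on a sites-by-species matrix Y with n sites. With S_ij = (y_ij - mean_j)^2, the total sum of squares SS_total is the sum of all S_ij, total beta diversity is SS_total / (n - 1), the contribution of site i is LCBD_i = (sum over j of S_ij) / SS_total, and the contribution of species j is SCBD_j = (sum over i of S_ij) / SS_total. The Python sketch below implements these formulas directly on a toy matrix. Community data are often transformed first (for example with the Hellinger transformation, included as an option); the function names are hypothetical, not SDT's implementation.

```python
# Sketch: total beta diversity, LCBD and SCBD from a sites x species matrix.


def beta_decomposition(Y):
    """Return (BD_total, LCBD per site, SCBD per species)."""
    n, p = len(Y), len(Y[0])
    means = [sum(row[j] for row in Y) / n for j in range(p)]
    # Squared deviations of each cell from its species mean
    S = [[(row[j] - means[j]) ** 2 for j in range(p)] for row in Y]
    ss_total = sum(sum(row) for row in S)
    bd_total = ss_total / (n - 1)
    lcbd = [sum(row) / ss_total for row in S]
    scbd = [sum(S[i][j] for i in range(n)) / ss_total for j in range(p)]
    return bd_total, lcbd, scbd


def hellinger(Y):
    """Hellinger transformation: square root of relative abundances per site."""
    out = []
    for row in Y:
        total = sum(row)
        out.append([(v / total) ** 0.5 if total > 0 else 0.0 for v in row])
    return out


# Toy presence-absence matrix: 4 sites x 3 species
Y = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
bd, lcbd, scbd = beta_decomposition(Y)
print(bd)                          # 0.75
print([round(v, 3) for v in lcbd])  # site 3 is the most unique
bd_h, lcbd_h, scbd_h = beta_decomposition(hellinger(Y))
```

By construction, LCBD values sum to 1 across sites and SCBD values sum to 1 across species.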

Because the layers used by SDT are broadcastable, we can rapidly apply a function (here, the logistic response to each environmental covariate) to each layer, and then multiply the resulting suitabilities together. The last step is facilitated by the fact that most basic arithmetic operations are defined for layers, which can for example be added, multiplied, subtracted, and divided by one another.
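In Julia, applying a function to every cell is written by broadcasting with the dot syntax; the Python sketch below spells out the same cell-wise logic with lists, for a virtual species whose suitability is the product of two logistic responses. The logistic parameters, the toy layers, and the helper names (`map_layer`, `multiply`) are hypothetical, chosen only to make the computation easy to follow.

```python
# Sketch: virtual-species suitability as the product of two logistic
# responses, applied cell by cell; None marks cells without data.
import math


def logistic(x, center, slope):
    """Logistic response: 0.5 at `center`, steeper for larger `slope`."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))


def map_layer(f, layer):
    """Apply f to every cell, keeping missing cells missing."""
    return [None if v is None else f(v) for v in layer]


def multiply(a, b):
    """Cell-wise product of two layers defined on the same grid."""
    return [None if u is None or v is None else u * v for u, v in zip(a, b)]


temperature = [10.0, 20.0, 30.0, None]
precipitation = [500.0, 1000.0, 1500.0, 800.0]

r_temp = map_layer(lambda t: logistic(t, center=20.0, slope=0.5), temperature)
r_prec = map_layer(lambda p: logistic(p, center=1000.0, slope=0.01), precipitation)
suitability = multiply(r_temp, r_prec)
print(suitability)  # second cell is 0.5 * 0.5 = 0.25; the last cell stays None
```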

Conclusion

We have presented SpeciesDistributionToolkit, a package for the Julia programming language aiming to facilitate the collection, curation, analysis, and visualisation of data commonly used in species distribution modeling. Through the use of interfaces and a modular design, we have made this package robust to changes, easy to extend with new functionalities, and well integrated with the rest of the Julia ecosystem. All code for the case studies can be found in Supp. Mat. 1-4. Plans for active development of the package focus on (i) additional techniques for pseudo-absence generation, to be incorporated in the PseudoAbsences package, (ii) full compatibility with the MultivariateStats package for data transformations, and (iii) additional SDeMo functionalities enabling cross-validation techniques with biologically relevant structure (Roberts et al., 2017).

The SDT package benefits from close integration with other packages in the Julia ecosystem. Notably, this includes Makie (including GeoMakie; Danisch and Krumbiegel, 2021) for plotting and interactive data visualisation: all relevant plot types are overloaded for layer and occurrence data. Most data handled by SDT can be exported using the Tables interface, which allows them to be consumed by other packages like DataFrames (Bouchet-Valat and Kamiński, 2023) and MLJ (Blaom et al., 2020), or directly saved as csv files. Interfaces to internal Julia methods are implemented whenever they are pertinent: SimpleSDMLayers and OccurrencesInterface objects behave like arrays, and are iterable and broadcastable. The SDeMo package relies in part on the StatsAPI interface, which makes it easy to define new data transformation and classifier types to support additional features. Achieving integration with other packages through method overloading and adherence to well-established interfaces is important, as it increases the chances that functionalities external to SDT can be used directly, or fully supported with minimal additional code. For situations where interfaces are not sufficient to link with other packages, we rely on Julia’s extension mechanism. For instance, SimpleSDMLayers objects can be used with Clustering, MultivariateStats, as well as SpatialBoundaries (Strydom and Poisot, 2023), with strict version bounds ensuring that this integration remains usable regardless of possible changes in external packages.

A key advantage of Julia for species distribution modeling is its emphasis on extensibility and composability. In developing SDT, we leveraged these strengths by ensuring that each component package operates independently, while the top-level package provides additional methods to integrate their functionalities. Through method overloading, we minimize the number of unique function names users must learn: core operations such as arithmetic, dimension queries, and statistical summaries are consistently available across relevant data types. This unified approach not only streamlines the user experience but also makes the code more readable and accessible, which is particularly beneficial in educational settings. SDT is intentionally structured to promote best practices and long-term sustainability. The unified interface for occurrence, raster, and polygon data allows new data sources or representations to be incorporated with minimal changes to existing workflows. Strict adherence to semantic versioning and interface-based design ensures that updates do not compromise reproducibility. Advanced model interpretation tools, including Shapley values and counterfactuals, are built in, which should facilitate their adoption, and users can extend the modeling pipeline with custom classifiers or data transformations via Julia’s multiple dispatch.

Data, script, and code availability

The package can be installed from the general Julia registry, and the version used for this manuscript is archived at https://doi.org/10.5281/zenodo.15926733 (Poisot et al., 2025a); occurrence data for Akodon montensis are published by GBIF.org and available from https://doi.org/10.15468/dl.d3cxpr (GBIF.org, 2025); the code for all supplementary material is available from https://doi.org/10.5281/zenodo.15923830 (Poisot et al., 2025b). All of these resources can be accessed for free and anonymously.

Conflict of interest disclosure

The authors declare that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article. Timothée Poisot is a recommender and member of the managing board of PCI Ecology.

Funding

TP is funded by an NSERC Discovery grant, a Discovery Acceleration Supplement grant, and a Wellcome Trust grant (223764/Z/21/Z). MDC is funded by an IVADO Postdoctoral Fellowship.

Acknowledgements

Preprint version 2 of this article has been peer-reviewed and recommended by Peer Community In Ecology (https://doi.org/10.24072/pci.ecology.100789; Estay, 2025).


References

[1] Araújo, M. B.; Anderson, R. P.; Márcia Barbosa, A.; Beale, C. M.; Dormann, C. F.; Early, R.; Garcia, R. A.; Guisan, A.; Maiorano, L.; Naimi, B.; O'Hara, R. B.; Zimmermann, N. E.; Rahbek, C. Standards for distribution models in biodiversity assessments, Science advances, Volume 5 (2019) no. 1, p. eaat4858 | DOI

[2] Arenas-Castro, S.; Regos, A.; Martins, I.; Honrado, J.; Alonso, J. Effects of input data sources on species distribution model predictions across species with different distributional ranges, Journal of biogeography, Volume 49 (2022) no. 7, pp. 1299-1312 | DOI

[3] Bagnall, A.; Flynn, M.; Large, J.; Line, J.; Bostrom, A.; Cawley, G. Is rotation forest the best classifier for problems with continuous features?, arXiv [cs.LG] (2018)

[4] Baker, D. J.; Maclean, I. M. D.; Goodall, M.; Gaston, K. J. Species distribution modelling is needed to support ecological impact assessments, The journal of applied ecology, Volume 58 (2021) no. 1, pp. 21-26 | DOI

[5] Barbet-Massin, M.; Jiguet, F.; Albert, C. H.; Thuiller, W. Selecting pseudo‐absences for species distribution models: how, where and how many?: How to use pseudo-absences in niche modelling?, Methods in ecology and evolution, Volume 3 (2012) no. 2, pp. 327-338 | DOI

[6] Bezanson, J.; Edelman, A.; Karpinski, S.; Shah, V. B. Julia: A fresh approach to numerical computing, SIAM review. Society for Industrial and Applied Mathematics, Volume 59 (2017) no. 1, pp. 65-98 | DOI

[7] Blaom, A.; Kiraly, F.; Lienart, T.; Simillides, Y.; Arenas, D.; Vollmer, S. MLJ: A Julia package for composable machine learning, Journal of open source software, Volume 5 (2020) no. 55, p. 2704 | DOI

[8] Booth, T. H. Checking bioclimatic variables that combine temperature and precipitation data before their use in species distribution models, Austral ecology, Volume 47 (2022) no. 7, pp. 1506-1514 | DOI

[9] Booth, T. H.; Nix, H. A.; Busby, J. R.; Hutchinson, M. F. bioclim: the first species distribution modelling package, its early applications and relevance to most current MaxEnt studies, Diversity & distributions, Volume 20 (2014) no. 1, pp. 1-9 | DOI

[10] Bouchet-Valat, M.; Kamiński, B. DataFrames.jl: Flexible and fast tabular data in Julia, Journal of statistical software, Volume 107 (2023) no. 4 | DOI

[11] Brown, J. L.; Hill, D. J.; Dolan, A. M.; Carnaval, A. C.; Haywood, A. M. PaleoClim, high spatial resolution paleoclimate surfaces for global land areas, Scientific data, Volume 5 (2018) no. 1, p. 180254 | DOI

[12] Burgos, E. F.; Vadell, M. V.; Bellomo, C. M.; Martinez, V. P.; Salomon, O. D.; Gómez Villafañe, I. E. First evidence of Akodon-borne orthohantavirus in northeastern Argentina, EcoHealth, Volume 18 (2021) no. 4, pp. 429-439 | DOI

[13] Danisch, S.; Krumbiegel, J. Makie.jl: Flexible high-performance data visualization for Julia, Journal of open source software, Volume 6 (2021) no. 65, p. 3349 | DOI

[14] Dansereau, G.; Legendre, P.; Poisot, T. Evaluating ecological uniqueness over broad spatial extents using species distribution modelling, Oikos, Volume 2022 (2022) no. 5, p. e09063 | DOI

[15] Dansereau, G.; Poisot, T. SimpleSDMLayers.jl and GBIF.jl: A framework for species distribution modeling in Julia, Journal of open source software, Volume 6 (2021) no. 57, p. 2872 | DOI

[16] Dinerstein, E.; Olson, D.; Joshi, A.; Vynne, C.; Burgess, N. D.; Wikramanayake, E.; Hahn, N.; Palminteri, S.; Hedao, P.; Noss, R.; Hansen, M.; Locke, H.; Ellis, E. C.; Jones, B.; Barber, C. V.; Hayes, R.; Kormos, C.; Martin, V.; Crist, E.; Sechrest, W.; Price, L.; Baillie, J. E. M.; Weeden, D.; Suckling, K.; Davis, C.; Sizer, N.; Moore, R.; Thau, D.; Birch, T.; Potapov, P.; Turubanova, S.; Tyukavina, A.; de Souza, N.; Pintea, L.; Brito, J. C.; Llewellyn, O. A.; Miller, A. G.; Patzelt, A.; Ghazanfar, S. A.; Timberlake, J.; Klöser, H.; Shennan-Farpón, Y.; Kindt, R.; Lillesø, J.-P. B.; van Breugel, P.; Graudal, L.; Voge, M.; Al-Shammari, K. F.; Saleem, M. An ecoregion-based approach to protecting half the terrestrial realm, Bioscience, Volume 67 (2017) no. 6, pp. 534-545 | DOI

[17] Drake, J. M. Ensemble algorithms for ecological niche modeling from presence‐background and presence‐only data, Ecosphere, Volume 5 (2014) no. 6, pp. 1-16 | DOI

[18] Elith, J.; Ferrier, S.; Huettmann, F.; Leathwick, J. The evaluation strip: A new and robust method for plotting predicted responses from species distribution models, Ecological modelling, Volume 186 (2005) no. 3, pp. 280-289 | DOI

[19] Elith, J.; Leathwick, J. R. Species distribution models: Ecological explanation and prediction across space and time, Annual review of ecology, evolution, and systematics, Volume 40 (2009) no. 1, pp. 677-697 | DOI

[20] Estay, S. Expanding the Software Ecosystem for Species Distribution Modeling: A New Alternative in Julia, Peer Community in Ecology (2025) | DOI

[21] Evenden, G. I.; Rouault, E.; Warmerdam, F.; Evers, K.; Knudsen, T.; Butler, H.; Taves, M. W.; Schwehr, K.; Sales de Andrade, E.; Karney, C.; Couwenberg, S.; Dawson, N.; Snow, A. D.; Jimenez Shaw, J. PROJ, 2024 | DOI

[22] Fick, S. E.; Hijmans, R. J. WorldClim 2: new 1‐km spatial resolution climate surfaces for global land areas, International journal of climatology, Volume 37 (2017) no. 12, pp. 4302-4315 | DOI

[23] GBIF: The Global Biodiversity Information Facility. What is GBIF?, https://www.gbif.org/what-is-gbif , 2025 (accessed on Sept. 16, 2025)

[24] GBIF.org GBIF Occurrence Download, The Global Biodiversity Information Facility, 2025 | DOI

[25] Gearty, W.; Jones, L. A. rphylopic: An R package for fetching, transforming, and visualising PhyloPic silhouettes, Methods in ecology and evolution, Volume 14 (2023) no. 11, pp. 2700-2708 | DOI

[26] Gonzalez, A.; Vihervaara, P.; Balvanera, P.; Bates, A. E.; Bayraktarov, E.; Bellingham, P. J.; Bruder, A.; Campbell, J.; Catchen, M. D.; Cavender-Bares, J.; Chase, J.; Coops, N.; Costello, M. J.; Dornelas, M.; Dubois, G.; Duffy, E. J.; Eggermont, H.; Fernandez, N.; Ferrier, S.; Geller, G. N.; Gill, M.; Gravel, D.; Guerra, C. A.; Guralnick, R.; Harfoot, M.; Hirsch, T.; Hoban, S.; Hughes, A. C.; Hunter, M. E.; Isbell, F.; Jetz, W.; Juergens, N.; Kissling, W. D.; Krug, C. B.; Le Bras, Y.; Leung, B.; Londoño-Murcia, M. C.; Lord, J.-M.; Loreau, M.; Luers, A.; Ma, K.; MacDonald, A. J.; McGeoch, M.; Millette, K. L.; Molnar, Z.; Mori, A. S.; Muller-Karger, F. E.; Muraoka, H.; Navarro, L.; Newbold, T.; Niamir, A.; Obura, D.; O'Connor, M.; Paganini, M.; Pereira, H.; Poisot, T.; Pollock, L. J.; Purvis, A.; Radulovici, A.; Rocchini, D.; Schaepman, M.; Schaepman-Strub, G.; Schmeller, D. S.; Schmiedel, U.; Schneider, F. D.; Shakya, M. M.; Skidmore, A.; Skowno, A. L.; Takeuchi, Y.; Tuanmu, M.-N.; Turak, E.; Turner, W.; Urban, M. C.; Urbina-Cardona, N.; Valbuena, R.; van Havre, B.; Wright, E. A global biodiversity observing system to unite monitoring and guide action, Nature ecology & evolution (2023), pp. 1-5 | DOI

[27] Griffith, J.; Lord, J.-M.; Catchen, M. D.; Arce-Plata, M. I.; Bohorquez, M. F. G.; Chandramohan, M.; Diaz-Corzo, M. C.; Gravel, D.; Gonzalez, L. F. U.; Gutiérrez, C.; Helfenstein, I.; Hoban, S.; Kass, J. M.; Laroque, G.; Laikre, L.; Leigh, D.; Leung, B.; Mastretta-Yanes, A.; Millette, K.; Moreno, D.; Molina-Berbeo, M. A.; Norman, K.; Rincón-Parra, V. J.; Pahls, S.; Peres-Neto, P. R.; Perreira, K.; Poisot, T.; Pollock, L. J.; Rodríguez, M. H. O.; Röösli, C.; Rousseu, F.; Sánchez-Clavijo, L. M.; Schuman, M. C.; Selmoni, O.; da Silva, J.; Surasinghe, T.; Turak, E.; Valencia, E. S.; Valentin, S.; Wightman, N.; Zuloaga, J.; Murcia, M. C. L.; Gonzalez, A. BON in a Box: An Open and Collaborative Platform for Biodiversity Monitoring, Indicator Calculation, and Reporting, EcoEvoRxiv (2024) | DOI

[28] Guisan, A.; Tingley, R.; Baumgartner, J. B.; Naujokaitis-Lewis, I.; Sutcliffe, P. R.; Tulloch, A. I. T.; Regan, T. J.; Brotons, L.; McDonald-Madden, E.; Mantyka-Pringle, C.; Martin, T. G.; Rhodes, J. R.; Maggini, R.; Setterfield, S. A.; Elith, J.; Schwartz, M. W.; Wintle, B. A.; Broennimann, O.; Austin, M.; Ferrier, S.; Kearney, M. R.; Possingham, H. P.; Buckley, Y. M. Predicting species distributions for conservation decisions, Ecology letters, Volume 16 (2013) no. 12, pp. 1424-1435 | DOI

[29] Hirzel, A. H.; Helfer, V.; Metral, F. Assessing habitat-suitability models with a virtual species, Ecological modelling, Volume 145 (2001) no. 2-3, pp. 111-121 | DOI

[30] Jenkins, C. N.; Pimm, S. L.; Joppa, L. N. Global patterns of terrestrial vertebrate diversity and conservation, Proceedings of the National Academy of Sciences of the United States of America, Volume 110 (2013) no. 28, pp. E2602-10 | DOI

[31] Karger, D. N.; Conrad, O.; Böhner, J.; Kawohl, T.; Kreft, H.; Soria-Auza, R. W.; Zimmermann, N. E.; Linder, H. P.; Kessler, M. Climatologies at high resolution for the earth's land surface areas, Scientific data, Volume 4 (2017) no. 1, p. 170122 | DOI

[32] Karimi, A.-H.; Barthe, G.; Balle, B.; Valera, I. Model-agnostic counterfactual explanations for consequential decisions, arXiv [cs.LG] (2019)

[33] Kass, J. M.; Smith, A. B.; Warren, D. L.; Vignali, S.; Schmitt, S.; Aiello-Lammens, M. E.; Arlé, E.; Márcia Barbosa, A.; Broennimann, O.; Cobos, M. E.; Guéguen, M.; Guisan, A.; Merow, C.; Naimi, B.; Nobis, M. P.; Ondo, I.; Osorio-Olvera, L.; Owens, H. L.; Pinilla-Buitrago, G. E.; Sánchez-Tapia, A.; Thuiller, W.; Valavi, R.; Velazco, S. J. E.; Zizka, A.; Zurell, D. Achieving higher standards in species distribution modeling by leveraging the diversity of available software, Ecography (2024) | DOI

[34] Kellner, K. F.; Doser, J. W.; Belant, J. L. Functional R code is rare in species distribution and abundance papers, Ecology, Volume 106 (2025) no. 1, p. e4475 | DOI

[35] Legendre, P.; De Cáceres, M. Beta diversity as the variance of community data: dissimilarity coefficients and partitioning, Ecology letters, Volume 16 (2013) no. 8, pp. 951-963 | DOI

[36] Leroy, B.; Meynard, C. N.; Bellard, C.; Courchamp, F. virtualspecies, an R package to generate virtual species distributions, Ecography, Volume 39 (2016) no. 6, pp. 599-607 | DOI

[37] Marmion, M.; Parviainen, M.; Luoto, M.; Heikkinen, R. K.; Thuiller, W. Evaluation of consensus methods in predictive species distribution modelling, Diversity & distributions, Volume 15 (2009) no. 1, pp. 59-69 | DOI

[38] McShea, W. J. What are the roles of species distribution models in conservation planning?, Environmental conservation, Volume 41 (2014) no. 2, pp. 93-96 | DOI

[39] Merkenschlager, C.; Bangelesa, F.; Paeth, H.; Hertig, E. Blessing and curse of bioclimatic variables: A comparison of different calculation schemes and datasets for species distribution modeling within the extended Mediterranean area, Ecology and evolution, Volume 13 (2023) no. 10, p. e10553 | DOI

[40] Mesgaran, M. B.; Cousens, R. D.; Webber, B. L. Here be dragons: a tool for quantifying novelty due to covariate range and correlation change when projecting species distribution models, Diversity & distributions, Volume 20 (2014) no. 10, pp. 1147-1159 | DOI

[41] Mitchell, R.; Cooper, J.; Frank, E.; Holmes, G. Sampling Permutations for Shapley Value Estimation, arXiv [stat.ML] (2021)

[42] Olson, D. M.; Dinerstein, E.; Wikramanayake, E. D.; Burgess, N. D.; Powell, G. V. N.; Underwood, E. C.; D'amico, J. A.; Itoua, I.; Strand, H. E.; Morrison, J. C.; Loucks, C. J.; Allnutt, T. F.; Ricketts, T. H.; Kura, Y.; Lamoreux, J. F.; Wettengel, W. W.; Hedao, P.; Kassem, K. R. Terrestrial ecoregions of the world: A new map of life on earth, Bioscience, Volume 51 (2001) no. 11, p. 933 | DOI

[43] Osborne, O. G.; Fell, H. G.; Atkins, H.; van Tol, J.; Phillips, D.; Herrera-Alsina, L.; Mynard, P.; Bocedi, G.; Gubry-Rangin, C.; Lancaster, L. T.; Creer, S.; Nangoy, M.; Fahri, F.; Lupiyaningdyah, P.; Sudiana, I. M.; Juliandi, B.; Travis, J. M. J.; Papadopulos, A. S. T.; Algar, A. C. Fauxcurrence: simulating multi‐species occurrences for null models in species distribution modelling and biogeography, Ecography, Volume 2022 (2022) no. 7, p. e05880 | DOI

[44] Owen, R. D.; Goodin, D. G.; Koch, D. E.; Chu, Y.-K.; Jonsson, C. B. Spatiotemporal variation in Akodon montensis (Cricetidae: Sigmodontinae) and hantaviral seroprevalence in a subtropical forest ecosystem, Journal of Mammalogy, Volume 91 (2010) no. 2, pp. 467-481 | DOI

[45] Poisot, T.; Dansereau, G.; Catchen, M.; Borregaard, M.; Stock, M.; Singhvi, A.; Katz, D.; Schouten, R.; spaette. PoisotLab/SpeciesDistributionToolkit.jl: v1.7.0, Zenodo, 2025 | DOI

[46] Poisot, T.; Paperpile Bot (official); Dansereau, G.; Catchen, M. PoisotLab/ms_sdt_software: Recommended version, Zenodo, 2025 | DOI

[47] Roberts, D. R.; Bahn, V.; Ciuti, S.; Boyce, M. S.; Elith, J.; Guillera-Arroita, G.; Hauenstein, S.; Lahoz-Monfort, J. J.; Schröder, B.; Thuiller, W.; Warton, D. I.; Wintle, B. A.; Hartig, F.; Dormann, C. F. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure, Ecography, Volume 40 (2017) no. 8, pp. 913-929 | DOI

[48] Roesch, E.; Greener, J. G.; MacLean, A. L.; Nassar, H.; Rackauckas, C.; Holy, T. E.; Stumpf, M. P. H. Julia for biologists, Nature methods, Volume 20 (2023) no. 5, pp. 655-664 | DOI

[49] Stock, A.; Gregr, E. J.; Chan, K. M. A. Data leakage jeopardizes ecological applications of machine learning, Nature ecology & evolution, Volume 7 (2023) no. 11, pp. 1743-1745 | DOI

[50] Strydom, T.; Poisot, T. SpatialBoundaries.jl: edge detection using spatial wombling, Ecography, Volume 2023 (2023) no. 5 | DOI

[51] Tuanmu, M.-N.; Jetz, W. A global 1‐km consensus land‐cover product for biodiversity and ecosystem modelling, Global ecology and biogeography, Volume 23 (2014) no. 9, pp. 1031-1045 | DOI

[52] Tuanmu, M.-N.; Jetz, W. A global, remote sensing‐based characterization of terrestrial habitat heterogeneity for biodiversity and ecosystem modelling, Global ecology and biogeography, Volume 24 (2015) no. 11, pp. 1329-1339 | DOI

[53] Van Looveren, A.; Klaise, J. Interpretable counterfactual explanations guided by prototypes, arXiv [cs.LG] (2019)

[54] Vasconcelos, R. N.; Cantillo-Pérez, T.; Franca Rocha, W. J. S.; Aguiar, W. M.; Mendes, D. T.; de Jesus, T. B.; de Santana, C. O.; de Santana, M. M. M.; Oliveira, R. P. Advances and challenges in species ecological niche modeling: A mixed review, Earth, Volume 5 (2024) no. 4, pp. 963-989 | DOI

[55] Wadoux, A. M. J.-C.; Saby, N. P. A.; Martin, M. P. Shapley values reveal the drivers of soil organic carbon stock prediction, SOIL, Volume 9 (2023) no. 1, pp. 21-38 | DOI

[56] Zurell, D.; Elith, J.; Schröder, B. Predicting to new environments: tools for visualizing model behaviour and impacts on mapped distributions, Diversity & distributions, Volume 18 (2012) no. 6, pp. 628-634 | DOI

[57] Zurell, D.; Franklin, J.; König, C.; Bouchet, P. J.; Dormann, C. F.; Elith, J.; Fandos, G.; Feng, X.; Guillera-Arroita, G.; Guisan, A.; Lahoz-Monfort, J. J.; Leitão, P. J.; Park, D. S.; Townsend Peterson, A.; Rapacciuolo, G.; Schmatz, D. R.; Schröder, B.; Serra-Diaz, J. M.; Thuiller, W.; Yates, K. L.; Zimmermann, N. E.; Merow, C. A standard protocol for reporting species distribution models, Ecography, Volume 43 (2020) no. 9, pp. 1261-1277 | DOI