Section: Ecology
Topic: Biophysics and computational biology, Ecology, Statistics

Easy, fast and reproducible Stochastic Cellular Automata with chouca

Corresponding author(s): Génin, Alexandre (alexandre.genin@sete.cnrs.fr)

10.24072/pcjournal.466 - Peer Community Journal, Volume 4 (2024), article no. e95.

Get full text PDF Peer reviewed and recommended by PCI
article image

Abstract

Stochastic cellular automata (SCA) are models that describe spatial dynamics using a grid of cells that switch between discrete states over time. They are widely used to understand how small-scale processes scale up to affect ecological dynamics at larger spatial scales, and have been applied to a wide diversity of theoretical and applied problems in all systems, such as arid ecosystems, coral reefs, forests, bacteria, or urban growth. Despite their wide applications, SCA implementations are often ad-hoc, lacking performance, guarantees of correctness and poorly reproducible. De novo implementation of SCA for each specific system and application also represents a major barrier for many practitioners. To provide a unifying, well-tested technical basis to this class of models and facilitate their implementation, we built chouca, an R package that translates definitions of SCA models into compiled code, and runs simulations in an efficient way. chouca supports SCA based on rectangular grids where transition probabilities are defined for each cell, with performance typically two to three orders of magnitude above typical implementations in interpreted languages (e.g. R, Python), all while maintaining an intuitive interface in the R environment. Exact and mean-field simulations can be run, and both numerical and graphical results can be easily exported. Besides providing better reproducibility and accessibility, a fast engine for SCA unlocks novel, computationally intensive statistical approaches, such as simulation-based inference of ecological interactions from field data, which represents by itself an important avenue for research. By providing an easy and efficient entry point to SCAs, chouca lowers the bar to the use of this class of models for ecologists, managers and general practitioners, providing a leveled-off reproducible platform while opening novel methodological approaches.

Metadata
Published online:
DOI: 10.24072/pcjournal.466
Type: Research article
Keywords: spatial ecology, spatial modelling, stochastic cellular automaton, R package, chouca

Génin, Alexandre 1, 2, 3; Dupont, Guillaume 2, 4; Valencia, Daniel 2; Zucconi, Mauro 2, 5; Ávila-Thieme, M. Isidora 6, 7, 8, 9; Navarrete, Sergio A. 2, 5, 10, 8; Wieters, Evie A. 2

1 Copernicus Institute of Sustainable Development, Utrecht University – Utrecht, The Netherlands
2 Estación Costera de Investigaciones Marinas (ECIM) and Núcleo Milenio para la Ecología y Conservación de los Ecosistemas de Arrecifes Mesofóticos Templados (NUTME), Pontificia Universidad Católica de Chile – Las Cruces, Chile
3 Experimental and Theoretical Ecology Station, Centre National de la Recherche Scientifique – Moulis, France
4 International Master of Science in Marine Biological Resources (IMBRSea), Ghent University – Ghent, Belgium
5 Centro de Investigación Oceanográfica en el Pacifico Sur-Oriental (COPAS COASTAL), Universidad de Concepción – Concepción, Chile
6 Escuela de Negocios, Facultad de Ciencias Sociales y Artes, Universidad Mayor – Temuco, Chile
7 Center for Resilience, Adaptation and Mitigation (CReAM), Universidad Mayor – Temuco, Chile
8 Instituto Milenio en Socio-Ecología Costera (SECOS), Pontificia Universidad Católica de Chile – Santiago, Chile
9 Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile – Santiago, Chile
10 Center for Applied Ecology and Sustainability (CAPES), Pontificia Universidad Católica de Chile – Santiago, Chile
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
@article{10_24072_pcjournal_466,
     author = {G\'enin, Alexandre and Dupont, Guillaume and Valencia, Daniel and Zucconi, Mauro and \'Avila-Thieme, M. Isidora and Navarrete, Sergio A. and Wieters, Evie A.},
     title = {Easy, fast and reproducible {Stochastic} {Cellular} {Automata} with chouca},
     journal = {Peer Community Journal},
     eid = {e95},
     publisher = {Peer Community In},
     volume = {4},
     year = {2024},
     doi = {10.24072/pcjournal.466},
     language = {en},
     url = {https://peercommunityjournal.org/articles/10.24072/pcjournal.466/}
}
TY  - JOUR
AU  - Génin, Alexandre
AU  - Dupont, Guillaume
AU  - Valencia, Daniel
AU  - Zucconi, Mauro
AU  - Ávila-Thieme, M. Isidora
AU  - Navarrete, Sergio A.
AU  - Wieters, Evie A.
TI  - Easy, fast and reproducible Stochastic Cellular Automata with chouca
JO  - Peer Community Journal
PY  - 2024
VL  - 4
PB  - Peer Community In
UR  - https://peercommunityjournal.org/articles/10.24072/pcjournal.466/
DO  - 10.24072/pcjournal.466
LA  - en
ID  - 10_24072_pcjournal_466
ER  - 
%0 Journal Article
%A Génin, Alexandre
%A Dupont, Guillaume
%A Valencia, Daniel
%A Zucconi, Mauro
%A Ávila-Thieme, M. Isidora
%A Navarrete, Sergio A.
%A Wieters, Evie A.
%T Easy, fast and reproducible Stochastic Cellular Automata with chouca
%J Peer Community Journal
%D 2024
%V 4
%I Peer Community In
%U https://peercommunityjournal.org/articles/10.24072/pcjournal.466/
%R 10.24072/pcjournal.466
%G en
%F 10_24072_pcjournal_466
Génin, Alexandre; Dupont, Guillaume; Valencia, Daniel; Zucconi, Mauro; Ávila-Thieme, M. Isidora; Navarrete, Sergio A.; Wieters, Evie A. Easy, fast and reproducible Stochastic Cellular Automata with chouca. Peer Community Journal, Volume 4 (2024), article  no. e95. doi : 10.24072/pcjournal.466. https://peercommunityjournal.org/articles/10.24072/pcjournal.466/

PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.ecology.100686

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Full text

The full text below may contain a few conversion errors compared to the version of record of the published article.

Introduction

The analysis of spatial patterns has proven essential to understand ecological system dynamics, and various modelling approaches have helped ground empirical patterns into ecological theory. Among such approaches, models based on Stochastic Cellular Automata (hereafter SCA), also called Probabilistic or Random Cellular Automata, or Locally-interacting Markov Chains, have been a particularly useful, heuristic and widely used approach (Wolfram, 1984; Louis & Nardi, 2018). Cellular automata are based on a grid of cells that switch over time between a finite number of states. Most often, SCA are considered over a rectangular grid, though other geometries can exist (van Baalen, 2000). A famous deterministic cellular automaton (CA) is Conway’s game of life, which is defined by two discrete states (“dead” and “alive”) and a set of deterministic rules to make cells switch between them (Gardner, 1970; Bays, 2010). Stochastic cellular automata follow the same principles, but state transitions occur with a given probability instead of being based on deterministic rules. The probabilities of a cell switching from one state to another is assumed to depend on model parameters, the global state of the system (the proportion of cells in each state), and the local neighborhood of the focal cell. In all cases, the system future dynamics is probabilistically defined by its current state, i.e. dynamics are memoryless.

The use of SCA in ecology is widespread, as they have been used to describe the dynamics of a large array of ecosystems, including mussel beds (Guichard et al., 2003), arid ecosystems (Kéfi et al., 2007), forests (Heinonen & Pukkala, 2007), rocky shores (Wootton, 2001), coral reefs (Muthukrishnan et al., 2016; Génin et al., 2024) and plant communities (Lanzer & Pillar, 2002). SCA are often used to understand how local processes can scale up to affect landscape-wide properties, such as the persistence or extinction of a given species, the type of spatial patterns arising at the scale of a landscape (Pascual et al., 2002), or the spread of fire or epidemics. The latter case is a classical application of SCA in applied ecology, where data on local processes affecting forest stands can be coupled with GIS data to provide guidance on forest fire sensitivity (Yassemi et al., 2008). Though common in ecosystem modelling, and the primary purpose for this package, SCA are useful beyond this sole discipline, and are for example also used to describe epidemics (Yakowitz et al., 1990; Keeling, 2000), tumor growth (Moreira & Deutsch, 2002) or organism development (Manukyan et al., 2017).

The popularity of SCA is probably rooted in their relatively light need for formal mathematics compared to other approaches modelling spatial processes (e.g. partial derivative equations): only the probabilities of transitions between states need to be defined. The drawback of this simplicity is that the numerical simulation of SCA is typically computationally-intensive. Current approaches to do so efficiently rely on approximations, either assuming no spatial structure (mean field approximation), or using approximations to obtain analytical results, such as spatial moment equations (Lion, 2016; of which pair approximation is a specific case; Matsuda et al., 1992; Iwasa, 2000). However, these approximations are inappropriate when long-range correlations occur within a landscape (Iwasa, 2000), or when the full simulated landscape is needed as a model output, for example to compute spatial metrics (Génin et al., 2018). In many cases, the explicit numerical simulation must be run, which is often done on grids that may not be large enough to fully capture emergent spatial patterns (van de Koppel et al., 2011; Majumder et al., 2021). On top of these performance issues, most ecological studies based on SCA come with their own implementation. This opens the possibility of errors in code and often makes it difficult to reproduce model simulations. We aim at alleviating these issues with chouca, which provides a unifying and well-tested technical basis to SCA. Our goal is to improve the performance and accessibility of this class of models, and ultimately allow ecologists to spend more time exploring the behavior of their models, rather than on their implementation.

Supported models

The R package chouca works with 2-dimensional rectangular grids of cells (a “landscape”). Each cell can be in one of a finite set \(S\) of \(n\) elementary states \(S_{1}\ldots S_{n}\). Probabilities of transition are assumed to depend only on (i) the proportion of neighbors in each state, captured by the vector \(\mathbf{q} = \left( q_{1},...,q_{n} \right)\), (ii) the proportion of cells in a given state in the whole landscape, \(\mathbf{p} = \left( p_{1},...,p_{n} \right)\), and (iii) a set of constant model parameters \(\mathbf{\theta}\).

chouca has been primarily designed for modeling the dynamics of sessile organisms over space, which do not move and reproduce through the dispersal of propagules (e.g. gap-models; Shugart & West, 1981). This includes a wide range of organisms, ranging from forests, herbaceous plants or coral reefs, but the supported models may also be useful to describe other ecological situations. For the moment, the types of models that can be implemented exclude cellular automata in which an intermediate distance of interaction is considered (e.g. through a dispersion kernel; Muthukrishnan et al., 2016), or those in which a preferential direction exists (e.g. modeling water redistribution on a slope; Mayor et al., 2013), though these limitations are planned to be lifted in later versions. More importantly, SCA that are based on the dynamics of pairs of cells, for instance in which two cells swap their respective state, cannot be implemented directly. This is typical when representing the movements of an organism in a landscape (Pascual et al., 2002), and we provide links to alternative software at the end of the article for these models.

In the package, the probabilities of transition of a cell from state \(i\) to \(j\), \(P\left( S_{i} \rightarrow S_{j} \right)\), can be any function of \(\mathbf{p}\), \(\mathbf{q}\), and \(\mathbf{\theta}\) – however, it will work best and without approximation if it has the following form:

  1. \(P\left( S_{i} \rightarrow S_{j} \right) = a_{0} + g_{1}\left( q_{1} \right) + \ldots + g_{n}\left( q_{n} \right) + \zeta\left( \mathbf{q},\mathbf{q} \right) + \zeta\left( \mathbf{q},\mathbf{p} \right) + \zeta\left( \mathbf{p},\mathbf{p} \right)\)

where, for any transition \(S_{i} \rightarrow S_{j}\), \(a_{0}\) is a constant, \(g_{s}\) is any univariate function of \(q_{s}\), and \(\zeta\left( \mathbf{x},\mathbf{y} \right)\) is the sum, defined for two vectors \(\mathbf{x} = \left( x_{1},...,x_{n} \right)\) and \(\mathbf{y} = \left( y_{1},...,y_{n} \right)\) as

  1. \(\zeta\left( \mathbf{x},\mathbf{y} \right) = a_{1}x_{1}^{\alpha_{1}}y_{1}^{\beta_{1}} + a_{2}x_{1}^{\alpha_{2}}y_{2}^{\beta_{2}} + a_{3}x_{2}^{\alpha_{3}}y_{1}^{\beta_{3}} + \ldots + a_{K}x_{n}^{\alpha_{K}}y_{n}^{\beta_{K}}\)

in which the \(\left( a_{i} \right)_{i \in \lbrack 1:K\rbrack}\), \(\left( \alpha_{i} \right)_{i \in \lbrack 1:K\rbrack}\) and \(\left( \beta_{i} \right)_{i \in \lbrack 1:K\rbrack}\) are constants and \(K\) is the total number of terms of the sum. In practice, this functional form is flexible enough to approximate the probabilities of transition of many ecological models.

Implementing and working with an SCA in chouca typically consists in four steps, in which the user (1) defines the model states and the transitions between them, (2) creates an initial landscape (grid of cells), (3) runs the model, and (4) displays or extracts the results (Figure 1). We detail in this paper this workflow (Figure 1) – documentation is available throughout the package, individually for each function, or as a whole in an R “vignette”, accessible with the command vignette(“chouca-package”).

Figure 1 - Main tasks (boxes) of the chouca package, and their associated sets of functions

Example applications

A simple model of mussel bed

To illustrate how a stochastic cellular automaton can be defined with chouca, we use the model of Guichard et al. (2003), which describes the dynamics of mussels colonizing rocks exposed to waves. This model has three cell states (i) “disturbed”, (ii) “empty” and (iii) “occupied” (by mussels). During a single time-step, a disturbed cell becomes an empty cell with probability 1. Such transition can be defined by using a call to the R function transition():

transition(from = "disturbed", to = "empty", ~ 1)

This statement declares a transition from a “disturbed” state to an “empty” state, with the last argument being a symbolic expression starting with “~”, that describes how to compute the probability, here being simply equal to the constant “1”.

The model assumes that the establishment of new individuals always occurs next to existing mussels. In other words, for a focal cell in the “bare” state, its probability of switching to the “mussel” state is not constant, but given by \(\alpha q_{mussel}\), where \(\alpha\) is a productivity rate, and \(q_{mussel}\) is the proportion of cell neighbors in the mussel state. Such transition is defined by the following call in R:

transition(from = "empty", to = "mussel", ~ alpha * q["mussel"])

Here, q["mussel"] is used to refer to the proportion of cell neighbors in the “mussel” state, a continuous number between 0 and 1. Similarly, p["mussel"] can be used to refer to the global proportion of cells in the landscape in the "mussel" state.

Mussels can be perturbed by incoming waves, which dislodge them and turn them into "disturbed" cells. In this model, the probability of a mussel cell to become disturbed is the sum of a baseline term \(\delta\), and an additional term \(d\), which is non-zero only if the mussel cell has one or more disturbed neighbors:

transition(from = "mussel", to = "disturbed", ~ delta + d * ( q["disturbed"] > 0 ))

The original model considers that cells are neighbors when they share an edge (current options include a 4-way or von-Neumann neighborhood, the other option being a Moore or 8-way neighborhood), and uses a toric space for simulations, meaning that the up/leftmost cells of the grid are neighbors of the bottom/rightmost cells. Putting everything together, this model can be defined using the following syntax:

musselbed_mod <- camodel(

transition(from = "disturbed", to = "empty", ~ 1),

transition(from = "empty", to = "mussel", ~ alpha * q["mussel"]),

transition(from = "mussel", to = "disturbed", ~ delta + d * ( q["disturbed"] > 0 )),

parms = list(alpha = 1, delta = 0.2, d = 1), # parameters

wrap = TRUE, # toric space

neighbors = 4 # 4-way neighborhood

)

The call to the camodel() function above estimates the constants in the functional form described in equation 1 to match the parameters of the model. If this process yields large residual error, for example because the transition probabilities do not correspond to the functional form in equation 1, a warning is produced. The structure of the model can be displayed on the R console using print(), or as a graph using the generic function plot() (Figure 2).

> print(musselbed_mod)

Stochastic Cellular Automaton

States: disturbed empty mussel

Transition: disturbed -> empty

~ 1

Transition: empty -> mussel

~ alpha * q["mussel"]

Transition: mussel -> disturbed

~ delta + d * (q["disturbed"] > 0)

Neighborhood: 4x4

Wrap: TRUE

Max error: 0 (OK)

Max rel error: 0 (OK)

Figure 2 - Structure of the mussel bed model displayed as a graph using plot(musselbed_mod), in which nodes are states, and arrows represent the possible transitions, with the expression used to compute their probabilities

Once the model is created, an initial landscape can be defined with a given number of rows and columns using generate_initmat(), which creates a landscape with randomly-distributed cell states in space, but following the specified initial covers:

init_landscape <- generate_initmat(musselbed_mod,

pvec = c(disturbed = 0.1, empty = 0.1, mussel = 0.8),

nrow = 64, ncol = 64)

The model can then be simulated for a given number of time steps, below from 0 to 50, using run_camodel(). Standard methods such as plot() or image() can be used to display the global covers after the model has run, or the resulting landscapes:

output <- run_camodel(musselbed_mod, init_landscape, times = seq(0, 50))

plot(output)

image(output)

By default, chouca uses a C++ backend based on Rcpp (Eddelbuettel, 2013), which has reasonable performance. This can be improved by compiling the model code just before the simulation is run, and using memoization so that transition probabilities are computed only once for cells with the same neighborhood configuration. This typically improve performance by two orders of magnitude over a typical implementation (Figure 3; Schneider et al., 2016), which can be further increased by parallelizing computations over multiple cores, though this is a less efficient approach. Enabling these options can be done by passing control arguments as an R list object:

control_args <- list(engine = "compiled",

cores = 4,

precompute_probas = TRUE)

output <- run_camodel(musselbed_mod, init_landscape, niter = 256, control = control_args)

This “control” list defines how the simulation is run, which data to save from the simulation, or the textual output to print while the simulation is running (a complete list of options can be found in the help page by using ?run_camodel on the R console). The user can also supply custom functions that will be run as the simulation is running. This can be useful, for example, to display landscapes, covers, or compute statistics on the 2D landscape as the simulation is running, which we illustrate in the following sections.

Figure 3 - Simulation speeds for a set of three simple ecological models (2-3 states and 3-4 transitions) according the the grid size, for a pure-R implementation (R reference; Schneider et al., 2016), and three backends provided by chouca (blue and green lines). Single-core performance on a 2020 desktop computer.

Graphical explorations

Because an SCA describes dynamics over landscapes, they are particularly well-adapted to pattern-oriented modelling (Grimm et al., 1996), in which a model is defined and revised based on a qualitative or quantitative comparison with empirical patterns. Likelihood-based approaches are increasingly popular to compare models with data (Hartig et al., 2011 and section below), but the qualitative comparison and visual exploration of model dynamics remains an essential phase of the design of spatial models. To make this modeling step more accessible, we made it easy to investigate visually the behavior of models, and illustrate here this approach with an epidemiological example.

Keeling (2000) uses an SCA-based approach to investigate the spread of a parasite over space, with an application to forests. This very simple model uses three states, “host”, “parasitized”, “empty”, and can be defined as follows in chouca:

mod <- camodel(transition(from = "empty", to = "host",

~ 1 - ( 1 - g )^( 4 * q["host"] ) ),

transition(from = "host", to = "parasitized",

~ 1 - ( 1 – T )^( 4 * q["parasitized"] )),

transition(from = "parasitized", to = "empty",

~ 1),

parms = list(g = 0.05, T = 0.5),

wrap = TRUE,

neighbors = 4)

where \(g\) is the growth rate of the host, and \(T\) the transmissibility of the parasite (this model assumes that infection is always fatal, so the transition from “parasitized” to “empty” is equal to 1).

This model can produce interesting “epidemiological fronts”, which stem from the way the parasite spreads to its neighboring hosts, killing them and leaving behind empty, bare patches that are later recolonized by the host, albeit at a slower rate. This phenomenon of fronts propagating through an excitable media may be difficult to quantify formally, but can be easily visualized from model outputs. This can be done in chouca by setting the run_camodel() function to display the results as the simulation is run. To do so, we adjust the control list to include a function that displays the landscapes:

options <- list(custom_output_fun = landscape_plotter(mod),

custom_output_every = 1)

then run the model on a 256×256 grid seeded with 10% of cells in the “parasitized” state:

initmm <- generate_initmat(mod, c(host = 0.9, parasitized = 0.1, empty = 0),

nrow = 256, ncol = 256)

output <- run_camodel(mod, initmm, times = seq(0, 1024), control = options)

The above lines of code will run the model and display the changing landscape, which allows investigating the spreading patterns of the parasite. Once this visualization step is done (Figure 4, and animated version in SM1), the options set to visualize the landscape can be removed to reduce simulation times, for example to investigate the model behavior along a range of parameter values.

Figure 4 - Landscape patterns of the host-parasitized-empty model (respectively green, yellow and blue) as displayed on screen.

Inference of local interactions from landscape-scale patterns

Because SCA are defined on grids, a natural application is to compare their output to empirical raster data, such as remote-sensing images, to infer local-scale ecological interactions from landscape-wide spatial patterns. Arid systems provide a good illustration of this approach: in those systems, interactions between plants are often a balance of negative effects, through competition for nutrients or water, with positive effects, for example with an increased survival of seedlings below the canopy of taller plants (Valiente-Banuet & Ezcurra, 1991). These effects can be strong enough for new plants to mostly establish below existing plants, which results in their aggregation into patches, and has important consequences for the resilience of those systems to changes in aridity (Kéfi et al., 2007). The sizes and numbers of those patches can be readily quantified from remote-sensing images, and such patterns can be used to infer whether facilitation occurs between plants (Xu et al., 2015; Chen et al., 2022). This is traditionally done by summarizing the spatial structure into spatial statistics, such as spatial autocorrelation (Sankaran et al., 2017) or type of patch size distribution (Siteur et al., 2023; Pichon et al., 2024) and linking the observed changes in those metrics to theoretical results (Scanlon et al., 2007; Kéfi et al., 2024). However, such qualitative comparison logically results in a qualitative and corroborative result, i.e. is there or is there not facilitation between plants, rather than a more informative quantitative result, i.e. how strong facilitation is between plants. A quantitative inference must be based on an approach that links quantitatively some aspects of the spatial patterns to the strength of facilitation. This can be done by defining a model that links the strength of facilitation with the expected patterns, and finding model parameters that maximize agreement between model output and data. This is expensive computationally, but this limitation is alleviated by a fast SCA engine such as chouca, as we show below.

We define here a model of an arid ecosystem with two states, “bare” and “vegetated”. A bare cell can become a vegetated cell with the probability \(p_{plant}\). A vegetated cell can become bare (plants die) with the probability \(d(1 - \beta q_{plant})\), where \(d\) is a constant mortality rate, that is reduced by the coefficient \(\beta\). \(\beta\) captures the local effect of plants, either facilitation when they increase the survival of plants near existing vegetation (\(\beta > 0\)), or competition when survival is decreased (\(\beta < 0\)). This model is implemented in chouca using:

facilitation_mod <- camodel(

transition(from = "bare", to = "plant", ~ p["plant"] ),

transition(from = "plant", to = "bare", ~ d * ( 1 – beta * q[“plant”]) ),

parms = list(beta = 0.5, d = 0.85),

wrap = TRUE,

neighbors = 4

)

We ran the model till equilibrium on a 1024×1024 grid, to simulate a landscape that would be obtained from empirical data (e.g. a remote sensing image) using \((d,\beta) = (0.85,\ 0.5)\). From this landscape used as observed data, we computed the distribution of pairs, which summarizes all the possible pairs of neighboring cells present in the landscape \(\mathbf{n}_{\mathbf{obs}} = (n_{p,0},n_{p,p},n_{0,0}\ )\). We then define the likelihood \(P(\mathbf{n}_{\mathbf{obs}}│d,\beta)\) assuming these observed number of pairs follow a multinomial distribution of size \(N_{p}\) (the sum of all pairs, a fixed number given the grid size and neighborhood type), and probabilities \(\mathbf{\mu} = (\mu_{p,0},\mu_{p,p},\mu_{0,0}\ )\). \(\mathbf{\mu}\) defines the relative probabilities of observing each type of pair in the grid, which depend on the particular values of \(d\) and \(\beta\), and can be estimated by simulating the landscape with these parameter values. This estimate of the likelihood formally quantifies the agreement between model and data, and allows exploring the parameter space in terms of \(d\) and \(\beta\) to find the most likely parameter combinations given the observed patterns.

We find that this approach can recover the parameter values used for \(d\) and \(\beta\), with only one global maximum in the likelihood function (Figure 5; code provided in SM2), showing that the distribution of pairs is a sufficient statistic to recover the model parameters. Applying such an approach to empirical data would require further testing of model assumptions, for example, to investigate whether facilitation occurs on the recruitment of new plants instead of on the mortality of adult plants – this can be done by simply changing the model definition above. Because this approach is likelihood-based, model support can be compared using traditional statistics, such as AIC, and a Bayesian approach can be used to estimate credible intervals on parameter values, or use informative priors grounded in knowledge about the system (Hartig et al., 2011).

Using this type of simulation-based inference is very expensive computationally, as this simple test requires running around a thousand simulations. This would require long-running computations with usual SCA implementations, but takes just under five minutes with chouca on a 2018 laptop computer (8 cores). This way of calibrating models has seldom been used in spatial ecology – a fast SCA implementation is essential to make it more accessible and test its relevance to real-world datasets.

Figure 5 - Likelihood surface as a function of the two model parameters d and β. Blue values indicate parameter combinations with high likelihood. The white dot and lines indicate the estimated parameter values.

Conclusion

chouca is an easy-to-use package to model, simulate, and visualize SCA in a reproducible way, that enables an interactive design and revision of models as well as novel methodological approaches. chouca does not support all types of cellular automata, and does not replace more generic modelling frameworks such as NetLogo (Wilensky, 1999) or simecol (Petzoldt & Rinke, 2007), but allows very efficient simulation for the types of SCA it supports. The package focuses on processes defined at the level of a cell, while other SCA define processes at the level of pair of cells, a model representation other software packages can provide (e.g. CellLab-CTS; Tucker et al., 2016). Another limitation is the use of rectangular grids, which may produce discretisation artefacts that are not present in real-world patterns – future improvements may include the ability to run simulations on triangular or hexagonal grids. It is important to note that because chouca splits the definition of an SCA model from its simulation phase, it may use different backends for simulation. This opens the possibility of future improvements to already-implemented models without modification of existing code.

Because of the relatively low bar of entry of SCA compared to other forms of spatial modelling, we hope this work will contribute to make more accessible the testing of hypotheses linking spatial processes to patterns in ecology, as well exploring system-level consequences of specific local management decisions. chouca is available and maintained on CRAN, and welcomes comments, feedback and bug reports on its home page at https://github.com/alexgenin/chouca.

Acknowledgements

This manuscript has received very constructive comments and feedback from Samuel Alizon, Broder Breckling and one anonymous reviewer, to whom we are thankful. Preprint version 6 of this article has been peer-reviewed and recommended by Peer Community In Ecology (https://doi.org/10.24072/pci.ecology.100686; Alizon, 2024).

Data, script and code availability

All code used for this work is freely-available at Zenodo under a CC-BY license (Génin, 2024), available at https://doi.org/10.5281/ZENODO.11567662, along with chouca version v0.1.99 used at the time of writing.

Funding

AG has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement 896159 (INDECOSTAB). MGZ thanks the Pontificia Universidad Católica de Chile for the doctoral student support scholarship, and programs COPAS-COASTAL (FB10021) and Núcleo Milenio NUTME NCN2023_004 for the awarded doctoral thesis fellowships. MIAT acknoweldges support from FONDECYT 3220110, GD from the IsBlue program (ANR-17-EURE-0015), EAW from FONDECYT 1181719, 1241901, and Núcleo Milenio NCN2023_004 (NUTME), and SAN from NUTME ICM_NCN2023_004, SECOS, ICN 2019-015, CAPES, PIA/BASAL FB0002, COPAS COASTAL FB21002, and FONDECYT 1200636.

Conflict of interest disclosure

The authors declare they comply with the PCI rule of having no financial conflicts of interest.


References

[1] Alizon, S. An R package for flexible and fast Stochastic Cellular Automata modeling, Peer Community in Ecology (2024), p. 100686 | DOI

[2] van Baalen, M. Pair approximations for different spatial geometries, The geometry of ecological interactions: simplifying spatial complexity (Cambridge Studies in Adaptive Dynamics), 2000, pp. 359-387 | DOI

[3] Bays, C. Introduction to Cellular Automata and Conway’s Game of Life, Game of Life Cellular Automata, Springer London, London, 2010, pp. 1-7 | DOI

[4] Chen, B. J. W.; Teng, S. N.; Zheng, G.; Cui, L.; Li, S.; Staal, A.; Eitel, J. U. H.; Crowther, T. W.; Berdugo, M.; Mo, L.; Ma, H.; Bialic‐Murphy, L.; Zohner, C. M.; Maynard, D. S.; Averill, C.; Zhang, J.; He, Q.; Evers, J. B.; Anten, N. P. R.; Yizhaq, H.; Stavi, I.; Argaman, E.; Basson, U.; Xu, Z.; Zhang, M.; Niu, K.; Liu, Q.; Xu, C. Inferring plant–plant interactions using remote sensing, Journal of Ecology, Volume 110 (2022) no. 10, pp. 2268-2287 | DOI

[5] Eddelbuettel, D. Seamless R and C++ integration with Rcpp, Use R!, Springer, New York, 2013 no. 64 | DOI

[6] Gardner, M. Mathematical Games: The fantastic combinations of John Conway's new solitaire game "life", Scientific American, Volume 223 (1970) no. 4, pp. 120-123 | DOI

[7] Génin, A. Scripts used in the publication "Easy, fast and reproducible Stochastic Cellular Automata with chouca" , 2024 (https://zenodo.org/doi/10.5281/zenodo.11567662) | DOI

[8] Génin, A.; Majumder, S.; Sankaran, S.; Danet, A.; Guttal, V.; Schneider, F. D.; Kéfi, S. Monitoring ecosystem degradation using spatial data and the R package spatialwarnings, Methods in Ecology and Evolution, Volume 9 (2018) no. 10, pp. 2067-2075 | DOI

[9] Génin, A.; Navarrete, S. A.; Garcia-Mayor, A.; Wieters, E. A. Emergent Spatial Patterns Can Indicate Upcoming Regime Shifts in a Realistic Model of Coral Community, The American Naturalist, Volume 203 (2024) no. 2, pp. 204-218 | DOI

[10] Grimm, V.; Frank, K.; Jeltsch, F.; Brandl, R.; Uchmański, J.; Wissel, C. Pattern-oriented modelling in population ecology, Science of The Total Environment, Volume 183 (1996) no. 1-2, pp. 151-166 | DOI

[11] Guichard, F.; Halpin, P. M.; Allison, G. W.; Lubchenco, J.; Menge, B. A. Mussel disturbance dynamics: signatures of oceanographic forcing from local interactions, The American Naturalist, Volume 161 (2003) no. 6, pp. 889-904 | DOI

[12] Hartig, F.; Calabrese, J. M.; Reineking, B.; Wiegand, T.; Huth, A. Statistical inference for stochastic simulation models - theory and application: Inference for stochastic simulation models, Ecology Letters, Volume 14 (2011) no. 8, pp. 816-827 | DOI

[13] Heinonen, T.; Pukkala, T. The use of cellular automaton approach in forest planning, Canadian Journal of Forest Research, Volume 37 (2007) no. 11, pp. 2188-2200 | DOI

[14] Iwasa, Y. Lattice models and pair approximation in ecology, The geometry of ecological interactions: simplifying spatial complexity (Cambridge Studies in Adaptive Dynamics), Cambridge University Press, 2000, pp. 227-251 | DOI

[15] Keeling, M. J. Evolutionary Dynamics in Spatial Host–Parasite Systems, The Geometry of Ecological Interactions, Cambridge University Press, 2000, pp. 271-291 | DOI

[16] Kéfi, S.; Génin, A.; Garcia-Mayor, A.; Guirado, E.; Cabral, J. S.; Berdugo, M.; Guerber, J.; Solé, R.; Maestre, F. T. Self-organization as a mechanism of resilience in dryland ecosystems, Proceedings of the National Academy of Sciences, Volume 121 (2024) no. 6, p. e2305153121 | DOI

[17] Kéfi, S.; Rietkerk, M.; Alados, C. L.; Pueyo, Y.; Papanastasis, V. P.; El Aich, A.; de Ruiter, P. C. Spatial vegetation patterns and imminent desertification in Mediterranean arid ecosystems, Nature, Volume 449 (2007) no. 7159, pp. 213-217 | DOI

[18] van de Koppel, J.; Gupta, R.; Vuik, C. Scaling-up spatially-explicit ecological models using graphics processors, Ecological Modelling, Volume 222 (2011) no. 17, pp. 3011-3019 | DOI

[19] Lanzer, A.; Pillar, V. Probabilistic cellular automaton: model and application to vegetation dynamics, Community Ecology, Volume 3 (2002) no. 2, pp. 159-167 | DOI

[20] Lion, S. Moment equations in spatial evolutionary ecology, Journal of Theoretical Biology, Volume 405 (2016), pp. 46-57 | DOI

[21] Louis, P.-Y.; Nardi, F. R. Probabilistic Cellular Automata: Theory, Applications and Future Perspectives, Emergence, Complexity and Computation, Springer International Publishing, Cham, 2018 | DOI

[22] Majumder, S.; Das, A.; Kushal, A.; Sankaran, S.; Guttal, V. Finite-size effects, demographic noise, and ecosystem dynamics, The European Physical Journal Special Topics, Volume 230 (2021) no. 16-17, pp. 3389-3401 | DOI

[23] Manukyan, L.; Montandon, S. A.; Fofonjka, A.; Smirnov, S.; Milinkovitch, M. C. A living mesoscopic cellular automaton made of skin scales, Nature, Volume 544 (2017) no. 7649, pp. 173-179 | DOI

[24] Matsuda, H.; Ogita, N.; Sasaki, A.; Sato, K. Statistical Mechanics of Population: The Lattice Lotka-Volterra Model, Progress of Theoretical Physics, Volume 88 (1992) no. 6, pp. 1035-1049 | DOI

[25] Mayor, Á. G.; Kéfi, S.; Bautista, S.; Rodríguez, F.; Cartení, F.; Rietkerk, M. Feedbacks between vegetation pattern and resource loss dramatically decrease ecosystem resilience and restoration potential in a simple dryland model, Landscape Ecology, Volume 28 (2013) no. 5, pp. 931-942 | DOI

[26] Moreira, J.; Deutsch, A. Cellular Automaton Models of Tumor Development: A Critical Review, Advances in Complex Systems, Volume 05 (2002) no. 02n03, pp. 247-267 | DOI

[27] Muthukrishnan, R.; Lloyd-Smith, J. O.; Fong, P. Mechanisms of resilience: empirically quantified positive feedbacks produce alternate stable states dynamics in a model of a tropical reef, Journal of Ecology, Volume 104 (2016) no. 6, pp. 1662-1672 | DOI

[28] Pascual, M.; Roy, M.; Guichard, F.; Flierl, G. Cluster size distributions: signatures of self-organization in spatial ecologies, Philosophical Transactions of the Royal Society B: Biological Sciences, Volume 357 (2002) no. 1421, pp. 657-666 | DOI

[29] Petzoldt, T.; Rinke, K. simecol : An Object-Oriented Framework for Ecological Modeling in R, Journal of Statistical Software, Volume 22 (2007) no. 9 | DOI

[30] Pichon, B.; Donnet, S.; Gounand, I.; Kéfi, S. Estimating distances to desertification points from dryland ecosystem images, bioRxiv (2024) no. 2024.02.20.581244 | DOI

[31] Sankaran, S.; Majumder, S.; Viswanathan, A.; Guttal, V. Patchiness and scale-free correlations: characterising criticality in ecosystems., bioRxiv (2017) no. 233429 | DOI

[32] Scanlon, T. M.; Caylor, K. K.; Levin, S. A.; Rodriguez-Iturbe, I. Positive feedbacks promote power-law clustering of Kalahari vegetation, Nature, Volume 449 (2007) no. 7159, pp. 209-212 | DOI

[33] Schneider, F. D.; Danet, A.; Génin, A.; Guttal, V.; Kéfi, S.; Majumder, S.; Sankaran, S. R-package caspr, 2016 (https://github.com/fdschneider/caspr)

[34] Shugart, H. H. J.; West, D. C. Long-Term Dynamics of Forest Ecosystems: Computer simulation models, which allow for numerous seedlings and the long lives of large trees, predict how forests will respond to different management techniques, American Scientist, Volume 69 (1981) no. 6, pp. 647-652 (https://www.jstor.org/stable/27850716)

[35] Siteur, K.; Liu, Q.-X.; Rottschäfer, V.; van der Heide, T.; Rietkerk, M.; Doelman, A.; Boström, C.; van de Koppel, J. Phase-separation physics underlies new theory for the resilience of patchy ecosystems, Proceedings of the National Academy of Sciences, Volume 120 (2023) no. 2, p. e2202683120 | DOI

[36] Tucker, G. E.; Hobley, D. E. J.; Hutton, E.; Gasparini, N. M.; Istanbulluoglu, E.; Adams, J. M.; Nudurupati, S. S. CellLab-CTS 2015: continuous-time stochastic cellular automaton modeling using Landlab, Geoscientific Model Development, Volume 9 (2016) no. 2, pp. 823-839 | DOI

[37] Valiente-Banuet, A.; Ezcurra, E. Shade as a cause of the association between the cactus Neobuxbaumia tetetzo and the nurse plant Mimosa luisana in the Tehuacan Valley, Mexico, The Journal of Ecology, Volume 79 (1991) no. 4, p. 961 | DOI

[38] Wilensky, U. NetLogo, 1999 (http://ccl.northwestern.edu/netlogo/)

[39] Wolfram, S. Cellular automata as models of complexity, Nature, Volume 311 (1984) no. 5985, pp. 419-424 | DOI

[40] Wootton, J. T. Local interactions predict large-scale pattern in empirically derived cellular automata, Nature, Volume 413 (2001) no. 6858, pp. 841-844 | DOI

[41] Xu, C.; Holmgren, M.; Van Nes, E. H.; Maestre, F. T.; Soliveres, S.; Berdugo, M.; Kéfi, S.; Marquet, P. A.; Abades, S.; Scheffer, M. Can we infer plant facilitation from remote sensing? a test across global drylands, Ecological Applications, Volume 25 (2015) no. 6, pp. 1456-1462 | DOI

[42] Yakowitz, S.; Gani, J.; Hayes, R. Cellular automaton modeling of epidemics, Applied Mathematics and Computation, Volume 40 (1990) no. 1, pp. 41-54 | DOI

[43] Yassemi, S.; Dragicevic, S.; Schmidt, M. Design and implementation of an integrated GIS-based cellular automata model to characterize forest fire behaviour, Ecological Modelling (2008) no. 210, pp. 71-84 | DOI


block.super