Assessing species interactions using integrated predator-prey models

1 Inferring the strength of species interactions from demographic data is a challenging task. The 2 Integrated Population Modelling (IPM) approach, bringing together population counts, capture-3 recapture, and individual-level fecundity data into a unified model framework, has been extended 4 from single species to the community level. This allows to specify IPMs for multiple species with 5 interactions specified as links between vital rates and stage-specific densities. However, there is no 6 evaluation of such models when interactions are actually absent—while any interaction inference 7 method runs the risk of producing false positives. We investigate here whether multispecies 8 IPMs could output interactions where there are in fact none, building on an existing predator-9 prey IPM. We show that interspecific density-dependence estimates are centered on zero when 10 simulated to be zero, and therefore their estimation is unbiased. Their coverage probability, 11 quantifying how many times credible intervals include zero, is also satisfactory. We further 12 confirm that adding random temporal variation to multispecies density-dependent link functions 13 does not alter these results. This study therefore reaffirms the potential of multispecies IPMs 14 to infer correctly how biotic interactions influence demography, although future studies should 15 investigate model misspecifications. 16


Introduction
Estimating ecological interactions between and within species through models of their joint population dynamics is a task which requires large amounts of data.Indeed, with potentially as many as q 2 interaction parameters for q model compartments (combination of species and age classes), the number of parameters to estimate can climb very rapidly.Therefore, ecological statistics searches for improved ways to infer such population-level interaction strengths.A recently developed technique consists in combining data sources in multispecies Integrated Populations Models (IPMs) including interspecific interactions (Péron & Koons, 2012;Barraquand & Gimenez, 2019;Quéroué et al., 2021).Because Integrated Population Models (IPMs, Besbeas et al., 2002) combine data on demographic rates (e.g., capture recapture, breeding data) with data on population size (typically from counts), they allow: (a) estimating both demographic rates and population size (and hence their inter-dependencies) in a joint analysis, (b) an improved precision in parameter estimates, compared to separate analyses of component datasets, since the information contained in several datasets combine into estimated parameters (e.g., count data and capture recapture data both contain information on survival rates), and in some cases (c) to estimate parameters for which there is no dedicated data stream, that can only be estimated through inverse estimation of a demographic model (Kéry & Schaub, 2011;Abadi et al., 2010).This last property is particularly useful to estimate population-level species interactions strengths, since population-level interactions are always indirectly inferred.Although inverse estimation can in theory be performed using a single data source such as population counts, such inverse estimation is a difficult task fraught with identifiability issues.Asking whether multispecies IPMs performed better than classical inverse estimation from count data alone, Barraquand & Gimenez (2019) have shown that better estimates of interaction parameters could be obtained by combining data sources.Additionally, an empirical study in a bird predator-prey system Quéroué et al. (2021) was able to detect the expected bottom-up demographic linkages from prey to predator but not the expected top-down relationships, suggesting that those may be too weak to be detected.
In these multispecies IPM studies estimating interspecific interactions, between-species linkages have always been considered to be present in the simulations or in the underlying reality (based on background knowledge).Other choices are possible: some multispecies IPMs do not assume interspecific interactions to be present a priori (Lahoz-Monfort et al., 2017), but they do not estimate them either and focus instead on environmental effects.However, multispecies IPMs with inter-specific interactions could also be used in situations where it is not clear whether population-level interactions between species are possible.This is all the more true that interactions are specified as links between vital rates and stage-specific densities, and while some of these relationships may be known a priori, others may not.The issue was raised but not tackled by Barraquand & Gimenez (2019): a natural follow-up is therefore to ask what happens whenever we try to estimate interactions that are actually absent, to make sure that multispecies IPMs do not yield false positives.
Let us note that when estimating or predicting interspecific interactions in general-not just with multispecies IPMs-whether methods could output false positives is a key concern (e.g., with multivariate autoregressive models, Mutshinda et al. 2009;Barraquand et al. 2021;dynamic bayesian networks, Sander et al. 2017; or other machine learning tools, Strydom et al. 2021).The fact that all interaction inference methods run the risk of creating false positives of interspecific interactions at exaggerated rates only reinforces the need to evaluate it in multispecies IPMs.
An additional concern is temporal stochasticity in the functions linking vital rates of a given stage of species i to the densities of a given stage of species j.In the simulation-based study of Barraquand & Gimenez (2019), it was assumed that such stochasticity was absent, while empirical studies (Péron & Koons, 2012;Quéroué et al., 2021) assumed its presence in order to partition variation in vital rates due to species densities vs other factors changing over time.We therefore still need to understand whether theoretical performances hold in this more empirically realistic context, where environmental factors can perturb demographic rates, and those are not solely deterministic functions of species densities.
To sum up, we follow-up here on the multispecies IPM study of Barraquand & Gimenez (2019) by asking whether (1) inter-species interactions are truly estimated to be zero when species have in fact independent dynamics and (2) how species interaction strengths estimates can be affected by the absence and presence of environmental stochasticity (random year effects on demographic rates).

General description of the multispecies IPM
The deterministic skeleton can be described as a density-dependent matrix population model (1) Eq. 1 describes in discrete-time the dynamics of abundances of two species and two stages per species, with projection matrix and abundance vector where n J V,t , n A V,t , n J P,t and n A P,t are respectively the abundances of juvenile prey (denoted V as 'victim'), adult prey, juvenile predators and adult predators, at time t.The fecundities f V,t , f P,t are the expected number of juvenile prey and predator produced by an adult prey and predator, respectively.Survival probabilities between t and t + 1 are denoted with ϕ, so that ϕ J V,t , ϕ A V,t , ϕ J P,t and ϕ A P,t are the survival probabilities of the juvenile prey, adult prey, juvenile predator and adult predator.

Count data
To simulate and account for demographic stochasticity, we modelled yearly (st)age specific abundances n t using Binomial and Poisson distributions as in Barraquand & Gimenez (2019) eqs. (2)-(5).
Regarding the observation process for count data, the 2019 model assumed a negligible observation error (σ 2 = 10 −5 ).The reason was that in absence of replicated counts at each time unit, observation error variance is notoriously difficult to disentangle from process error variance (Knape, 2008;Auger-Méthé et al., 2016).While in some cases it could be possible to remove observation error altogether, because total population sizes of each species (summed numbers of juveniles and adults) are the observed count variables (as in most IPMs), they need to appear in the model as drawn from some probability distribution-they need to be a stochastic node in the MCMC representation.It was therefore decided to keep the formulation of the model in its state-space version, but forcing it to observe true population size almost with certainty (negligible process error variance).However, we uncovered in the present work that stage-specific abundances could not be estimated properly.
Because correctly reproducing stage-specific abundances when fitting a stage-structured model is desirable, and that there is in most wildlife surveys some measure of observation error on counts, we assumed in the present article a non-negligible, positive observation error variance.As we do not have replicated counts at any given time, we do not attempt to estimate observation error variance, and assume that it is known and classically set on the logarithmic scale (i.e., the coefficient of variation of observed population size is constant).For predator counts (denoted P ) we have and similarly for prey counts with LN the log-Normal distribution and its associated variance on the log-scale σ 2 obs = 0.1.Other choices of observation model are possible but this one is standard for abundance values that are not too small (Besbeas et al., 2002;Dennis et al., 2006).

Survival data
To increase computational efficiency (particularly true for the scenarios with more individuals captured and a shorter time series), we simulated and fitted the capture-mark-recapture data in the m-array format, using a multinomial likelihood (Burnham, 1987).The data is in the form of two (T −1)×T matrices M J and M A , one for each age class, with M (a) = (m where T is the total number of years of capture recapture history.m  It is important to note that for the matrix of released juveniles M J , R J t corresponds to the number of juveniles newly marked at time t.However, R A t corresponds to the number of newly marked adults (lets denote it R A m,t ), but also of all previously marked juveniles and adults that were released at time t.That is, but Therefore, unless individuals are not released when marked (e.g., killed, or taken to be released outside of the study population), one needs to provide data and model the number of released adults re-sighted, even if no individuals are first marked as adults.As no individuals are marked as adults here, R A m,1 = 0, and so that Equations 5 and 6 can be simplified accordingly.
Note also that in such case were no adults are newly marked, no data on R A t is needed to simulate and fit M A .Since R A 1 = 0, we have: and so on.
For juveniles, diagonal elements of the θ J matrix write: with ϕ J t the first year (i.e.juvenile) survival probability from year t to year t + 1 (for the species considered), and p the recapture (or re-sighting) probability set as constant among years and age classes, and for t < j < T with ϕ A t the adult survival probability from year t to year t + 1 (for the species considered).The last element pertains to individuals never recaptured Similarly for θ A , the above mentioned equations are identical to the exception that ϕ J is replaced by ϕ A , which leads to: for the diagonal elements of the θ A matrix, and for t < j < T : The last element again pertains to individuals never recaptured

Fecundity data
Fecundity was modelled using a Poisson regression: with F t the total number of offspring counted, R t the number of surveyed broods/litters, and f t the expected number of offspring (male + female) per adult female each year t.

Density dependence and random temporal variation on demographic rates
Intra-and inter-species density dependence of survival rates ϕ j i,t (with i ∈ {V, P } and j ∈ {J, A}) and fecundities f i,t were modelled on the logit and log scale, respectively.We initially used the same equations as the 2019 model, which are: where the number of adult predators negatively affects juvenile predator survival (negative intraspe- where the number of adult predators negatively affects juvenile prey survival (predation), (no density dependence on adult survival) where the number of juvenile prey positively affects predator fecundity, and where the number of adult prey individuals negatively affect prey fecundity (negative intraspecific density dependence).Associated results can be found in Supplementary Information Table S1 and Figures S5 to S8.
However, to limit posterior correlation between intercept and slope parameters and improve their estimation, we centered the abundances in the density dependent functions.While centering is typically done and most efficient on mean values, mean abundances varied here from a simulation to the next due to stochasticity.Therefore, intercept parameter values would have to be redefined for each simulation to maintain equivalent mean demographic rate values and asymptotic stage specific abundance equilibria for all simulation.To avoid these complications, we centered by subtracting the corresponding fixed point equilibria estimated in Barraquand & Gimenez (2019) as The new α intercept parameters obey the following centered formulas: To maintain equivalent dynamics to parameter set 1 of the 2019 model, we calculated the intercepts α 1 , α 3 , α 5 and α 7 as their original values plus the original slope multiplied by the estimated fixed point equilibrium of the n responsible for density dependence.For example, we now use whenever simulating α 3 = 0.5 − 0.025 × 21 = −0.025and α 5 = 0 + 0.004 × 101 = 0.404 (Table 1).
In addition, we introduced scenarios with inter-annual random variation in the intercepts of density-dependent links, such that Although mathematically identical, we used a parameterisation of the form µ + ϵσ, ϵ ∼ N (0, σ 2 ) (sometimes called non-centered) for survival estimates and a centered parameterisation (N (µ, σ 2 )) for fecundity estimates as it was found to be optimal for the mixing of the MCMC chains.As we were primarily interested in the ability of multispecies IPMs to estimate species interactions when these were in fact absent, inter species density dependence parameter values for α 2 and α 4 were

Initial values and monitoring setup
For all simulation scenarios in the main text, we used the initial population size vector , a study period of T = 30 years, the yearly number of monitored prey and predator broods/litters respectively R V t = 50 and R P t = 20, and the yearly number of marked juveniles was 100 for both species.Results using the monitoring setups of Barraquand & Gimenez (2019) with either 100 marked juveniles per species per year for T = 10 years, or 20 marked juveniles per species per year for T = 30 years(and the non centered density-dependencies) are also presented in the used outputs from models for which all α i had R < 1.1 and n ef f. > 50, that is, 100/100 models for the scenario without random temporal variation and 94/100 models for the scenario with random temporal variation.The computer code is provided at https://github.com/MatthieuPaquet/multi_species.

Results
Overall, estimates of density dependence curves were unbiased, regarding interspecific density dependence (either absent, Figures 1 and 3, or present, Figures S1 and S3) as well as intraspecific density dependence.This was true without and with temporal stochasticity (Figures 1 to 4).
This absence of bias extends to the alternative data designs with smaller sample sizes considered in Barraquand & Gimenez (2019) (shown in Supplementary Information in Figures S5 to S8).
Estimated α i parameters also did not show sign of bias in any scenario (Table 2 and Table S1).We did not detect more false positive species interactions than expected by chance when investigating the coverage probability of the species interaction parameters at 95% (i.e., the proportion of simulations where 95% CrI of estimated parameter includes the true parameter value).In the scenario with 100 juveniles marked each year for 30 years and no interspecific density dependence nor temporal random variation, this probability was 0.95 for α 4 and 0.92 for α 6 (cf Table 2, see Figure 2 for an example of estimated mean and pointwise 95% CrI density dependent curves).
Coverage probabilities were also satisfactory when interspecific interactions were simulated to be nonzero (0.94 and 0.93).Species interactions parameters were still estimable with no noticeable bias in the presence of random time variation (Figures 3 and 4), in which case the coverage probabilities of the species interaction parameters α 4 and α 6 at 95% were 0.99 and 0.98 respectively in absence of interspecific interactions (Table 2).In the presence of interspecific interactions, coverage values were both 0.96.Moreover, the addition of random time variation did not noticeably alter the precision of the species interaction parameters, both in absence and presence of species interactions (Figure S3, Table 2).

Discussion
Building on the multispecies integrated predator-prey model of Barraquand & Gimenez (2019), we investigated here whether multispecies IPMs could output interactions where there are in fact none.We did so by modelling functions relating vital rates to stage-specific species densities, whose slope parameters are used to model species interactions.We found that when those slopes were simulated as zero, the estimates were centered on zero and therefore unbiased.There was also a good coverage probability of interaction parameters (close to 0.95 for 95% CrIs).We also found that adding temporal variability to these multispecies density-dependent link functions did not alter these results.This confirms that multispecies IPMs are a promising way to estimate species interactions, and in particular, that they could be used to infer whether two species interact or not when such information is missing.
These results are encouraging, though some readers might find our sample sizes relatively large (see Appendix B for slightly lower sample sizes).In a previous version of this work, we inadvertently omitted the θ A CMR array in the code, which transformed the model into a capture-removal model (i.e., individuals were re-captured only once and then removed from the population, as in hunting or fishing data).In this configuration, the lower amount of data on survival and detection provided proper estimation of all quantities for the main text data design but not those of Appendix B for which convergence was not always reached.With live capture-recapture data, all data designs (main text and Appendix B) now provide satisfactory convergence and estimation.Moreover, in field population studies, additional types of data available are likely to improve the estimation of species interactions and we give three examples below.First, when age classes can be determined during the count observation process, including such information explicitly in the model (see e.g., Weegman et al., 2016;Paquet et al., 2019) will increase identifiability and/or precision of survival parameters and age specific abundances, and therefore will likely improve the estimation of density dependence parameters as well.This stage-specific abundance information may also allow, in some cases where counts are provided with little error, to remove the observation process, which we cannot do in our current model formulation because the observed population size sums adult and juvenile densities, and this sum has to arise from a probability distribution (Equations ( 2) and ( 3)).
Second, integrating dead prey recovery data is likely to give extra information on the strength of predator-prey interactions.Dead recoveries are classically implemented in capture-mark-recovery models (Seber, 1972;North & Morgan, 1979) which in some cases can be combined with CMR data (Barker, 1999) and counts (Reynolds et al., 2009).Since the probability to find a dead prey is likely affected by predation rates in the population (e.g., in some systems prey eaten will not be recovered, in others dead recoveries may present signs of predation), taking the predation process into account in the dead recoveries data-generation mechanism could improve the estimation of the strength of predator-prey interactions.Finally, the spatial structure of the data should contain additional information that may help to estimate parameters.The extension to spatially explicit IPMs (Chandler & Clark, 2014;Zhao, 2020) for interacting populations represents a promising way forward for the estimation of species interactions.
We commented above on the amount of data and possible additional data types.However, the efficiency of multispecies IPMs in estimating species interactions may also depend on the parameter set, and thus on the ecological features of the populations studied.For example, the parameters considered here correspond well to vertebrate predator-prey systems with a stable equilibrium in absence of environmental perturbations.Faster life histories, different stage or age structure, and multiple factors contributing to altering the quantity of information encapsulated in the various data streams may alter the sample sizes required for efficient inferences.When applying these models to new systems with different life history parameters and density-dependent structures (e.g., predators also eating adult prey), simulated datasets with plausible ecological features for the empirical system considered (and similar data designs), will help confirm that parameter values can be recovered without bias and with sufficient precision.Tools such as JAGS (Plummer et al., 2003) or Nimble (de Valpine et al., 2017) make it particularly handy to both simulate and fit data with complex dynamic models.
Finally, while using the same model to simulate and fit the data is a necessary first step to (i) assess the identifiability of model parameters (and assess the amount and type of data needed for practical identifiability), (ii) evaluate the coverage of parameter estimates, and (iii) check for bias in the estimates that can still occur, notably because of limited sample sizes (Paquet et al., 2021), an important next step will be to evaluate the sensitivity of multi-species IPM estimates to model mis-specifications (Plard et al., 2021).For example, different functions than the log and logit links chosen here may be used to fit or to simulate intra-and inter-specific density-dependencies. Hence, we encourage future work to try fitting a broader range of plausible models that differ from the model used to simulate the data (or conversely, to simulate from more mechanistic models) in order to assess such sensitivity.
is the number of individuals of released at age class (a) at time t that were re-sighted the following year, and the last column m (a) t,T is the number of individuals released at age class (a) at time t that were never re-sighted.
the number of individuals of age class (a) released at time t.

Figure 1 :Figure 2 :Figure 3 :Figure 4 :
Figure1: Density-dependencies for juvenile survival rates (A for predator and B for prey) as well as prey (C) and predator (D) fecundities in the scenario without random time variation.Purple: simulated relationships, light green: posterior mean relationships for all 100 fitted models, dark green: average of the posterior mean relationships.True inter species density-dependencies (right panels) were set to be absent.

Figure S2 :
Figure S2: Example of posterior mean (blue-green line) and 95% Credible Intervals (grey polygons) of density-dependencies for juvenile survival rates (A for predator and B for prey) as well as prey (C) and predator (D) fecundities estimated by one of the 100 models run in the scenario without random time variation in presence of true inter species density-dependencies. Purple lines indicate the simulated (true) relationships.Points represent estimated mean demographic parameter each year plotted against estimated yearly abundance values and vertical and horizontal error bars their respective 95% Credible Intervals.

Figure S5 :Figure S6 :Figure S7 :Figure S8 :Figure S9 :
FigureS5: Density-dependencies for juvenile survival rates (A for predator and B for prey) as well as prey (C) and predator (D) fecundities in the scenario with 100 juveniles per species marked each year for 10 years without random time variation in absence of true inter species density-dependencies. Purple: simulated relationships, light green: posterior mean relationships for the 99 fitted models that appear to converge satisfactorily, dark green: average of the posterior mean relationships.

Table 1 :
Model parameters with their values.Values of α 4 and α 6 in the scenarios with true presence of species interactions are presented in parentheses.Temporal standard deviations (SD) are only present in the scenarios with random temporal variation.For interpretation, note that α i and temporal SD parameters are within exponential functions.For instance, α 5 = 0.404 corresponds to a mean fecundity of e 0.404 ≈ 1.5.
either set to zero for the simulations, or at the same value as the 2019 model.Parameter values used to simulate data and their interpretation can be found in Table1.

Table 2 :
Summary table of parameter estimates.Value refers to the true values used to simulate the data and values of the interspecific density dependent parameters are highlighted in bold.Estimate (95% quantiles) are the mean and the 95% quantiles of the posterior mean estimates.Coverage 95% is the proportion of 95% Credible Intervals that included the true parameter values.

Table S1 :
Value refers to the true values used to simulate the data and values of the interspecific density dependent parameters are highlighted in bold.Estimate (95% quantiles) are the mean and the 95% quantiles of the posterior mean estimates.Coverage 95% is the proportion of 95% Credible Intervals that included the true parameter values.