Do closely related species interact with similar partners? Testing for phylogenetic signal in bipartite interaction networks

Whether interactions between species are conserved on evolutionary time-scales has spurred the development of both correlative and process-based approaches for testing phylogenetic signal in interspecific interactions: do closely related species interact with similar partners? Here we use simulations to test the statistical performances of the two approaches that are the most widely used in the field: Mantel tests and the Phylogenetic Bipartite Linear Model (PBLM). Mantel tests investigate the correlation between phylogenetic distances and dissimilarities in sets of interacting partners, while PBLM is a process-based approach that relies on strong assumptions about how interactions evolve. We find that PBLM often detects a phylogenetic signal when it should not. Simple Mantel tests instead have infrequent false positives and moderate statistical power; however, they often artifactually detect that closely related species interact with dissimilar partners. Partial Mantel tests, which are used to partial out the phylogenetic signal in the number of partners, actually fail at correcting for this confounding effect, and we instead recommend evaluating the significance of Mantel tests with network permutations constraining the number of partners. We also explore the ability of simple Mantel tests to analyze clade-specific phylogenetic signals. We provide general guidelines and an application on an interaction network between orchids and mycorrhizal fungi.

contemporary interactions is to test for a phylogenetic signal in species interactions, 57 i.e. whether closely related species interact with similar sets of partners (Peralta 2016). 58 59 Testing for a phylogenetic signal in a trait for a given clade, i.e. whether a trait 60 is phylogenetically conserved, is mainstream (Felsenstein 1985;Blomberg et al. 2003; 61 Münkemüller et al. 2012). One approach (the 'correlative' approach) is to perform a 62 Mantel test between phylogenetic and trait distances (Mantel 1967); another approach 63 (the 'process-based' approach) relies on trait evolution models such as Pagel's l (Pagel 64 1999) or Blomberg's K (Blomberg et al. 2003). The process-based approach has a higher 65 ability to detect an existing phylogenetic signal (power) and a lower propensity to infer correlative approach should therefore only be used when the process-based approach 68 is not applicable, e.g. if the 'trait' data is expressed in terms of pairwise distances. 69 70 Testing for a phylogenetic signal in species interactions falls in the category of 71 cases where the 'trait' data are pairwise distances, here the between-species 72 dissimilarity in sets of interacting species. Simple Mantel  Mantel tests and PBLM sometimes provide contradictory conclusions on 102 empirical data and this is difficult to interpret because the statistical performances of 103 the two approaches have never been compared (Peralta 2016). Importantly, the 104 statistical performances of PBLM have not been tested. Here, we use simulations to 105 perform a comparative analysis of the statistical performances of these approaches. 106 We consider both weighted and unweighted bipartite interaction networks between 107 species from two guilds A and B (Fig. 1). Our results lead us to propose alternative 108 approaches for measuring phylogenetic signal in interaction networks. We also 109 investigate the ability of Mantel tests to detect the presence of phylogenetic signal in 110 the different clades of a phylogenetic tree, as phylogenetic signal may be limited to 111 some sub-clades. Finally, we provide general guidelines and illustrate them on an 112 orchid-fungus mycorrhizal network identified across the oceanic island of Réunion 113 (Martos et al. 2012). 114 Toy example of an interaction network between orchids (in green) and 117 mycorrhizal fungi (in brown) with associated phylogenetic trees. The bipartite 118 interaction network between two guilds A (here the orchids) and B (the fungi) is 119 represented by a matrix, which elements indicate either whether or not species interact 120 (i.e. 1 if they do and 0 otherwise, 'unweighted' or 'binary' network) or the frequency 121 of the interaction ('weighted' network; for example here we indicated the number of 122 times a given pairwise interaction has been observed using shades of gray from white 123 (no interaction) to dark gray (many interactions) a normal distribution centered on the parental traits and with a variance of 1. We 206 followed the interacting individuals during N=5.10 7 death events. In the end, we 207 extracted for each guild a species tree from its genealogy by randomly selecting one 208 individual per species (Fig. S1), we also recorded the number of individuals belonging 209 to each species, and counted the number of occurrences of each interspecific 210 interaction; we then reconstructed the corresponding weighted interaction network. Second, we separated the 2,400 simulated networks between those for which 220 we should expect a phylogenetic signal in species interactions and those for which we 221 should not (Figure 2). We did not expect any phylogenetic signal in species interactions 222 in neutral networks and in non-neutral networks with no phylogenetic signal in 223 species traits. Conversely, we expected phylogenetic signal in non-neutral networks 224 with phylogenetic signal in species traits. We evaluated phylogenetic signal in species 225 traits using two approaches. First, for simplicity and consistency with the rest of the  networks; we therefore only tested this approach on networks simulated with 500 291 individuals (i.e. a total of 400 networks). 292

293
PGLMM: We performed analyses of the statistical performances of PGLMM (Rafferty 294 and Ives 2013) using the function pglmm in the R-package phyr (Li et al. 2020).

295
Following the procedure used in Lajoie and Kembel (2021), we fitted for each network 296 different models accounting or not for phylogenetic signal in both the number of 297 partners and in the species interactions in both clades, using restricted maximum 298 likelihood and evaluating significance with likelihood ratio tests. Because fitting these 299 models can require a large amount of memory (e.g. >80 Gb for some networks with >50 300 species per guild) and long computation time (Fig. S2), we only tested this approach 301 on networks simulated with 500 individuals. We fitted the PGLMM using either a 302 Gaussian or a Poisson distribution of abundances for weighted networks, and a 303 binomial distribution (presence/absence data) for unweighted networks (Li et al. 2020 BipartiteEvol simulations resulted in interaction networks with a large range of 451 sizes for guilds A and B, from less than 50 to more than 250 species (Fig. S3). These 452 simulated networks had similar structural properties as empirical networks, including 453 in terms of connectance, nestedness, and modularity (Fig. S4). This means that 454 networks simulated using BipartiteEvol are realistic and cover a large range of 455 structures encountered in natural interaction networks. 456 457 Using Mantel tests, we found a significant phylogenetic signal in species traits 458 for most antagonistic and neutral simulations (Fig. S5A). In contrast, for many 459 mutualistic simulations, closely related species often did not tend to have similar traits, 460 except when αB=0.01 (i.e. mutualistic parameters iii, v, and vi; Fig. S5A). When αB were 461 higher (i.e. mutualistic parameters i, ii, and iv), we suspect stabilizing selection to 462 occur and erase the phylogenetic signal in the traits (Maliet et al. 2020): we therefore 463 do not expect any phylogenetic signal in species interactions for these simulations, 464 which represent ~40% of the mutualistic simulations. In addition, we found a negative 465 phylogenetic signal in species traits (suggesting that closely related species have 466 dissimilar traits) in less than 1% of the simulations (Fig. S5A). Given that we do not 467 expect BipartiteEvol to generate a negative phylogenetic signal in species traits and 468 given that the risk of false positives of a Mantel test is 5%, these 1% of networks with 469 a negative phylogenetic signal in species traits are likely false positives. We removed 470 them when evaluating the performance of the different approaches and we therefore 471 do not expect any negative phylogenetic signal in species interactions for the networks 472 we tested, i.e. closely related species should not tend to associate with dissimilar 473 partners. Results were similar when using Pagel's l, with a significant phylogenetic 474 signal in species traits for almost all antagonistic and neutral simulations, and in ~65% 475 of the mutualistic simulations (Fig. S5B). Mantel tests and Pagel's l lead to identical 476 conclusions for >95% of the simulated networks.  (Table S1), with one notable exception for small 484 networks when using weighted Jaccard distances and Pearson correlations (~8% of 485 false positives). Conversely, we detected a significant unexpected negative 486 phylogenetic signal in more than 10% of the simulated networks, in particular in the 487 small ones (Figs. 3 & S6). based on Pagel's l to measure the phylogenetic signals in species traits (Fig. S11). rows correspond to the 2 tested ecological distances: weighted Jaccard (a, b) or 504 weighted UniFrac (c, d) distances. One-tailed Mantel tests between phylogenetic 505 distances and ecological distances were performed using 10,000 permutations. In each 506 panel, the bars indicate the percentage of simulated networks that present a significant 507 positive correlation (in green; p-value>0.05 for the test of phylogenetic signal), a 508 significant negative correlation (in red; p-value>0.05 for the test of negative 509 phylogenetic signal), or no significant correlation (in yellow; both p-values>0.05). 510 Significant phylogenetic signals (resp. significant negative phylogenetic signals) are 511 shaded from light green (resp. red) to dark green (resp. red) according to the strength 512 of the signal: we arbitrarily considered a "low signal" when R<0.05 (resp. R>-0.05), an 513 "intermediate signal" when 0.05<R<0.15 (resp. -0.05>R>-0.15), and a "strong signal" 514 when R>0.15 (resp. R<-0.15).  were most often not significant unless the phylogenetic signal in species traits was 537 strong (R>0.6; Fig. S7). Even when the phylogenetic signal in species traits was very 538 strong (R>0.9), the phylogenetic signal in species interactions was not significant in 539 many networks. In mutualistic networks, phylogenetic signals in species interactions 540 were present only when there was a large asymmetry in the effects of trait matching 541 on the fitnesses of the species from guilds A or B (case v: αA=1; αB=0.01), i.e. when only 542 one guild was specialized. Conversely, in antagonistic networks, phylogenetic signals 543 were found mainly when trait matching had a strong impact on the fitness of guild B 544 (the obligate parasites/predators; αB≥ 0.1). Additionally, when the phylogenetic signal 545 was significant in one guild, it was generally also significant in the other; in 546 antagonistic networks, the signal was usually higher in guild A compared to guild B 547 (Fig. S6). 548 549 The statistical power of Mantel tests measuring phylogenetic signals in species 550 interactions seems to be modulated by network size, as phylogenetic signals were less 551 often significant but generally stronger in smaller networks (Fig. S6). Moreover, 552 Mantel tests based on Pearson correlations had higher power than Spearman and 553 Kendall correlations (Fig. S6) and weighted UniFrac distances performed better than 554 other ecological distances in terms of power in the context of these simulations (Fig.  555 S6; Table S2). 556 557 When using mean square errors to evaluate the significance of PBLM, we found 558 a significant phylogenetic signal in species interactions in most of the simulated 559 networks including when we did not expect any (Fig. 3e). The strength and the 560 significance of the inferred phylogenetic signals were independent of the strength of 561 the phylogenetic signal in species traits (Fig. S7). The propensity of PBLM to detect 562 phylogenetic signals decreased in large unweighted networks, but the false positives 563 remained >30%, including when using a more stringent significance cutoff (Figs. S8).

564
Similar results were obtained when bootstrapping to evaluate the significance (Fig.  565  S9). PGLMM on weighted networks with a Gaussian or Poisson distribution had 566 slightly lower but still frequent false positives (>25% or 20%, respectively) and 567 intermediate statistical power (<50%) when measuring phylogenetic signals in species 568 interactions (Fig. S10). PGLMM also often artifactually detected phylogenetic signals 569 in the number of partners (Fig. S10). Conversely, PGLMM on unweighted networks 570 never detected any significant signal (Fig. S10). 571

572
We inferred similar statistical performances of both Mantel tests and PBLM 573 when we used Pagel's l to evaluate phylogenetic signals in species traits (Figs. S7 and 574 S11). 575 576 Confounding effect of the phylogenetic signal in the number of partners 577 578 As expected, tests of phylogenetic signals in the number of partners were non-579 significant in the large majority of the BipartiteEvol networks, especially the larger ones 580 (Fig. S12). We did however observe significant correlations between ecological 581 distances and differences in the number of partners (Fig. S13). Partial Mantel tests 582 testing for phylogenetic signal in species interactions while accounting for the 583 phylogenetic signal in the number of partners had similar false positive rates and 584 power as simple Mantel tests (Figs. S6 & S14; Table S2). Sequential Mantel tests 585 decreased the statistical power by less than 2% (Table S2). Partitioning the ecological 586 distances before running the Mantel test reduced the power by only 5% and resulted 587 in less than 1.5% of false positives, but also resulted in an artefactual detection of 588 negative phylogenetic signals in 9% of the simulations (Table S2; Fig. S15). Finally, 589 Mantel tests with network permutations keeping the number of partners constant 590 increased the statistical power by ~5% (Table S2) but resulted in an artefactual 591 detection of (positive or negative) phylogenetic signals in ~10% of the simulations 592 when using Jaccard distances (Fig. S16). 593 594 Networks simulated with phylogenetic conservatism in the number, but not the 595 identity of partners covered a realistic range of sizes (Fig. S17). As expected, Mantel 596 tests revealed significant phylogenetic signals in the number of partners in >60% of 597 these networks, with an increasing percentage of significant tests with decreasing aA 598 (i.e. increasing conservatism in the number of partners; Table 1b; Fig. S18). We found 599 significant correlations between differences in the number of partners and ecological 600 distances in most of these simulated networks (Fig. S19), generating a confounding 601 effect. As a result, simple Mantel tests testing for phylogenetic signal in species 602 interactions without accounting for the phylogenetic signal in the number of partners 603 were frequently significant (>30%; Fig. S20; Table S3). Partial Mantel tests controlling 604 for differences in the number of partners slightly decreased the proportion of false 605 positives, but it remained very high (>25% of false positives; Fig. S21). In addition, 606 partial Mantel tests detected a spurious significant negative phylogenetic signal in 607 species interactions in >15% of the networks (Fig. S21). Conversely, only a few 608 networks with a significant simple Mantel test in species interactions did not produce 609 a significant simple Mantel test in the number of partners, such that sequential Mantel 610 tests had only ~7% of false positives (Table S3). Partitioning the ecological distances 611 before running the Mantel test (Fig. S22) or using Mantel tests with network 612 permutations keeping the number of partners constant (Fig. S23) had even lower false 613 positive rates (~4% and ~5% respectively; Table S3). However, partitioning the 614 ecological distances led to an artefactual detection of negative phylogenetic signals in 615 more than 30% of the simulated networks (Fig. S23). 616

617
Effect of phylogenetic uncertainty, sampling asymmetry, and network 618 heterogeneity on measures of phylogenetic signal in species interactions 619 620 As expected, phylogenetic uncertainty in the partner's tree increased when the 621 length of the simulated DNA sequence used for phylogenetic reconstruction decreased 622 (Fig. S24), resulting in a decreased statistical power of Mantel tests using UniFrac 623 distances (Fig. S25). However, even when the simulated DNA sequences were the 624 shortest (75 base pairs), resulting in very noisy reconstructed partners' trees ( Fig. S26), 625 the statistical power of the Mantel tests using UniFrac distances decreased by less than 626 5% (Fig. S25). 627

628
Our results on the statistical performance of tests of phylogenetic signal were 629 similar when considering sampling asymmetry (Figs. S27-30): PBLM spuriously 630 detected phylogenetic signals when it should not, and Mantel tests had decent 631 statistical performances, especially when using weighted UniFrac distances. In 632 addition, the correlations of the Mantel tests in guild A were generally higher when 633 significant (Fig. S29). 634

635
Our clade-specific tests of phylogenetic signals using Mantel tests while 636 correcting for multiple testing recovered a significant phylogenetic signal in 82% of the 637 nodes where mutualism originated (Fig. S31), as well as in most of the ascending 638 nodes. Conversely, we did not find spurious phylogenetic signals in nodes with only 639 neutrally-evolving lineages (Fig. S31). signal in species interactions for fungi and orchids revealed a significant but low 650 phylogenetic signal (R<0.10) on the orchid side using Jaccard distances; however, the 651 significance disappeared with UniFrac distances (Table S4). Similarly, marginally not-652 significant and low phylogenetic signals were detected on the mycorrhizal fungi side 653 (R<0.04; Table S4). Next (step 2), results were qualitatively similar when using Mantel 654 tests with permutations keeping the number of partners constant, suggesting that the 655 phylogenetic signal in species interaction did not result from a phylogenetic signal in 656 the number of partners. Our investigation of clade-specific phylogenetic signals in 657 species interactions in orchids (option 1) revealed a significant phylogenetic signal in 658 Angraecineae, a sub-tribe composed of 34 epiphytic species (Mantel test: R=0.37; 659 Bonferroni-corrected p-value=0.016; Fig. 5) interacting with 53 fungi, suggesting that 660 closely related Angraecineae tend to interact with more similar mycorrhizal fungi. 661 When we checked the robustness of the significant phylogenetic signal detected in 662 Angraecineae (option 2) by subsampling the Angraecineae clade down to 10 species, 663 we still recovered a significant signal in species interactions (Fig. S32). Similarly, we 664 still recovered a significant signal when changing the arbitrary age of the polytomies 665 corresponding to unresolved orchid genera (Fig. S33). 666 This guideline is composed of two fixed steps followed by two optional ones and can 669 be applied as soon as a bipartite interaction network (with or without abundances) 670 and at least the phylogenetic tree of guild A are available. The phylogenetic tree does 671 not need to be binary, rooted, or ultrametric. For each step, an example of the 672 corresponding function available in the R-package RPANDA is indicated in grey.

673
Step 1: The first step consists of testing for a phylogenetic signal in species interactions 674 for guild A (i.e. whether closely related species from guild A tend to interact with 675 similar partners from guild B) using a one-tailed simple Mantel test. This step requires 676 picking an ecological distance (e.g. UniFrac or Jaccard distances) and a type of 677 correlation (Pearson correlation by default). 678 Step 2: Next, to assess whether a phylogenetic signal in species interactions really 679 comes from the identity of species interactions (and not from a phylogenetic signal in 680 the number of partners), the second step consists of testing whether the phylogenetic 681 signal in guild A remains significant when the significance of the Mantel correlation is 682 evaluated using network permutations keeping the number of partners constant. 683 Option 1: Clade-specific phylogenetic signals in guild A can be tested using simple 684 Mantel tests while correcting for multiple testing (e.g. Bonferroni correction). It can be 685 used to test whether some clades present different intensities of phylogenetic signals 686 (e.g. because of higher specificity). 687 Option 2: The robustness of the findings can be tested by looking at how the 688 conclusions might be affected by phylogenetic uncertainty (e.g. using a Bayesian 689 posterior of tree) or sampling bias. The potential effect of sampling bias can be 690 investigated by subsampling all clades to the same number of species. 691 If a phylogenetic tree for guild B is available, all these steps can be replicated to test 692 for the phylogenetic signal in species interaction in guild B. 693 Step 2: Is there still a phylogenetic signal in species interactions when accounting for the signal in the number of partners ? (simple Mantel test with permutations keeping the number of partners per species constant) RPANDA::phylosignal_network(network, tree_A, tree_B, method = "GUniFrac", correlation = "Pearson", permutations="nbpartners") Step The orchid phylogeny (Martos et al., 2012) is represented with its nodes colored 699 according to the results of the Mantel test performed on the corresponding sub-700 network: in blue if non-significant, in grey when the node has less than 10 descendent 701 species (the Mantel test was not performed), and in red when the phylogenetic signal 702 is significant. Each one-tailed simple Mantel test was performed using the Pearson 703 correlation, UniFrac distances, and 100,000 permutations, and its significance was 704 evaluated while correcting for multiple testing (Bonferroni correction). 705 For each species, its habitat (terrestrial or epiphytic) is indicated at the tips of the tree 706 and the main orchid clades are highlighted in colors. Only the genera are indicated at 707 the tips of the tree (see Supplementary Figure S32 for the species list). 708 709 0.5 < R 0.3 < R < 0.5 0.1 < R < 0.3 R < 0.  Methods 3), but they probably lose information. We also reported that 776 using ecological distances that consider interaction abundances, such as weighted 777 Jaccard or UniFrac distances, significantly improves the detection of phylogenetic 778 signals. Using UniFrac distances, which rely on the phylogenetic relatedness of the 779 partners, can be particularly relevant when species delineation is somewhat arbitrary, 780 e.g. in microbial systems, as it is less sensitive to species delineation than Jaccard 781 distances. In addition, results obtained with UniFrac distances were only moderately 782 influenced by the phylogenetic uncertainty in the partner's tree, which should thus not 783 prevent the use of UniFrac distances. In the context of our BipartiteEvol simulations, 784 which assume that species interactions are mediated by some phylogenetically-785 conserved traits on both sides of the network, we found that UniFrac distances 786 outperform Jaccard distances. We note however that a significant phylogenetic signal 787 in UniFrac or Jaccard distances can reflect different evolutionary processes, such as 788 one where the traits involved in the interaction are evolutionarily conserved on both 789 sides of the networks in the case of UniFrac, and on only one side of the network in 790 the case of Jaccard (Calatayud et al. 2016). Therefore, choosing between one or the 791 other metric (or using both) can also be dictated by the question at stake. Also, if 792 communities of interactors differ mainly in terms of recently diverged species, Jaccard 793 distances may perform better, as UniFrac distances emphasize differences in long 794 branches rather than recent splits (Sanders et al. 2014). 795 796 We also found that multiple simple Mantel tests combined with a Bonferroni 797 correction perform rather well to investigate clade-specific phylogenetic signals. Such 798 an approach can therefore be valuable for detecting the phylogenetic signals in 799 particular sub-clades among large "meta-networks", such as those describing host-800 microbiota phylosymbiosis (Song et al. 2020), which likely have heterogeneous 801 phylogenetic signals across the network. 802

803
While simple Mantel tests have satisfactory statistical performances, these tests do 804 not control for the potential confounding effect of the phylogenetic signal in the 805 number of partners. Partial Mantel tests are frequently used for investigating a 806 phylogenetic signal in species interactions while controlling for the signal in the 807 number of partners; however, we found that they often detected significant signals in 808 species interactions when we simulated signals in only the number of partners. Thus, 809 partial Mantel tests fail at discerning whether evolutionary relatedness strictly affects 810 the identity of partners, independently of the total number of partners associated with 811 each species (Rezende et al. 2007). This corroborates the poor statistical performances 812 of partial Mantel tests frequently observed in other contexts (Harmon and Glor 2010; 813 Guillot and Rousset 2013). Among the alternative possibilities we tested, using 814 sequential Mantel tests, i.e. testing first for the phylogenetic signal in species 815 interactions, and if significant, testing for the phylogenetic signal in the number of 816 partners, has both high statistical power and a low false positive rate. Yet, if both 817 Mantel tests are significant, it does not say whether the signal is entirely due to the 818 signal in the number of partners and therefore, sequential Mantel tests likely have very 819 low power in this case. Alternatively, using methods that can explicitly partition 820 ecological distances into parts due to dissimilarities in the number of partners versus 821 the identity of the partners appears promising, although we detected a slight power 822 decrease in our simulations and >30% of artefactual negative phylogenetic signals 823 when partitioning unweighted Jaccard distances. Other partitioning approaches may 824 give better results and should require further attention, as they offer a direct Therefore, if there is still a signal while constraining the number of partners, then we 831 can safely conclude that evolutionary relatedness affects the identity of partners. We 832 thus recommend using such network permutations to correct for the confounding 833 effect of the phylogenetic signal in the number of partners (Figure 4). 834

835
By definition, phylogenetic signals in species interactions measure general patterns 836 that are not informative of the processes at play (Losos 2008). A better understanding 837 of the ecological and evolutionary processes playing a role in the assembly of 838 interaction networks (Harmon et al. 2019) will require developing integrative process-839 based approaches, for instance, an inference machinery for eco-evolutionary models 840 such as BipartiteEvol. Classical inferences (generalized least-squares or likelihood-841 based approaches) might be challenging for such complex models (Hadfield et al. 842 2014), but strategies such as machine learning provide promising alternatives. 843

844
In the mycorrhizal network from La Réunion, we found non-significant or weak 845 phylogenetic signals in species interactions at the level of the entire orchid-fungus 846 network, suggesting these interactions are generally poorly conserved over long Interaction networks are increasingly being analyzed to unravel the 858 evolutionary processes shaping their structure and to predict their stability. Currently-859 used tools for measuring phylogenetic signals are clearly misleading. The approach 860 we propose based on Mantel tests may have a limited statistical power, but it avoids 861 false positives, and it is flexible as it allows using different ecological distances and/or 862 permutation strategies. By emphasizing the limits of current tests of phylogenetic 863 signal, we hope to stimulate new developments in the statistical adjustment to 864 empirical data of process-based models for the evolution of interaction networks. 865