Section: Evolutionary Biology
Topic: Evolution

The effect of gene tree dependence on summary methods for species tree inference

Corresponding author(s): Chan, Yao-ban (yaoban@unimelb.edu.au)

10.24072/pcjournal.694 - Peer Community Journal, Volume 6 (2026), article no. e25

Peer reviewed and recommended by PCI

When inferring the evolutionary history of species and the genes they contain, the phylogenetic trees of genes can differ from that of the species and from each other, due to a variety of causes, including incomplete lineage sorting. We often wish to infer the species tree, but can only reconstruct gene trees directly from sequences. We then combine the gene trees to produce a species tree; methods that do this are known as summary methods, of which ASTRAL is currently among the most popular. Extensive simulations have shown ASTRAL to be accurate in many practical scenarios. However, these simulations generally assume that the input gene trees are independent of each other (infinite recombination between loci). This is known to be unrealistic, as genes that are close to each other on the chromosome (or are co-evolving) have dependent phylogenies. In this paper, we develop a model for generating dependent gene trees within a species tree, based on the coalescent with recombination. We then use these trees as input to ASTRAL to reassess its accuracy for dependent gene trees. Our results allow us to evaluate the impact of any level of dependence on the accuracy of ASTRAL, both when gene trees are known and when they are estimated from sequences. We find that a fixed amount of dependence reduces the effective sample size by a constant factor. In current phylogenomic datasets, loci are generally sampled at large genomic distances to reduce gene tree dependence, which limits the number of genes available for inference. However, full independence between genes is not required for accurate species tree estimation, and excluding gene trees may reduce inference accuracy. This creates a trade-off between the number of genes used and the degree of gene tree dependence. We therefore propose a method to identify the minimum genomic separation required to maintain satisfactory inference accuracy.
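The dependence effect summarized above can be illustrated with a deliberately simplified toy, sketched here in Python as an assumption-laden illustration rather than the authors' model or code. It considers a single quartet of species, for which the quartet-support criterion that ASTRAL optimizes reduces to choosing the most frequent gene tree topology. Under the multispecies coalescent, a quartet species tree with internal branch length t (in coalescent units) yields the matching gene tree topology with probability 1 - (2/3)e^(-t) and each of the two alternatives with probability (1/3)e^(-t). In place of the coalescent with recombination used in the paper, dependence is introduced by a hypothetical Markov copying rule: each locus repeats the previous locus's topology with probability rho and otherwise draws a fresh one. All function names and parameters (t, rho, reps, seed) are illustrative.

```python
import math
import random


def quartet_topology_probs(t):
    """Gene tree quartet topology probabilities under the multispecies coalescent.

    t is the internal branch length of the species quartet in coalescent units.
    Index 0 is the species tree topology; 1 and 2 are the two alternatives.
    """
    p_alt = math.exp(-t) / 3.0
    return [1.0 - 2.0 * p_alt, p_alt, p_alt]


def simulate_loci(n, t, rho, rng):
    """Hypothetical toy dependence model (NOT the coalescent with recombination):
    each locus copies the previous locus's topology with probability rho,
    otherwise it draws a fresh topology from the MSC distribution."""
    probs = quartet_topology_probs(t)
    topologies = []
    for i in range(n):
        if i > 0 and rng.random() < rho:
            topologies.append(topologies[-1])
        else:
            topologies.append(rng.choices([0, 1, 2], weights=probs)[0])
    return topologies


def majority_vote_correct(topologies):
    """For four taxa, the quartet-support criterion reduces to picking the most
    frequent gene tree topology. Success is counted only if the species
    topology (0) is strictly the most frequent; ties are scored as failures."""
    counts = [topologies.count(k) for k in range(3)]
    return counts[0] > max(counts[1], counts[2])


def accuracy(n, t, rho, reps=2000, seed=1):
    """Fraction of replicates in which the toy summary recovers the species quartet."""
    rng = random.Random(seed)
    hits = sum(majority_vote_correct(simulate_loci(n, t, rho, rng)) for _ in range(reps))
    return hits / reps


def effective_sample_size(n, rho):
    """Large-n effective sample size for a statistic whose lag-k autocorrelation is rho**k."""
    return n * (1.0 - rho) / (1.0 + rho)


if __name__ == "__main__":
    t = 0.1  # short internal branch, i.e. high incomplete lineage sorting
    for rho in (0.0, 0.5, 0.9):
        print(rho, round(effective_sample_size(50, rho), 1), accuracy(50, t, rho))
```

In this toy, every per-locus statistic has lag-k autocorrelation rho^k, so the effective number of loci shrinks by the fixed factor (1 - rho)/(1 + rho) (for example, 90 loci with rho = 0.5 behave roughly like 30 independent ones). That constant-factor reduction holds here by construction and is only meant to make the abstract's finding concrete; it is not an estimate from the paper.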

Published online:
DOI: 10.24072/pcjournal.694
Type: Research article
Keywords: Incomplete Lineage Sorting, Species Tree Inference, ASTRAL, Gene Tree Dependence, Gene Tree Estimation Error, Recombination

He, Wanting¹; Scornavacca, Celine²; Chan, Yao-ban¹

1 School of Mathematics and Statistics/Melbourne Integrative Genomics, The University of Melbourne, Melbourne, Victoria, Australia
2 Institut des Sciences de l’Évolution Montpellier, Université Montpellier, Montpellier, France
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
@article{10_24072_pcjournal_694,
     author = {He, Wanting and Scornavacca, Celine and Chan, Yao-ban},
     title = {The effect of gene tree dependence on summary methods for species tree inference},
     journal = {Peer Community Journal},
     eid = {e25},
     year = {2026},
     publisher = {Peer Community In},
     volume = {6},
     doi = {10.24072/pcjournal.694},
     language = {en},
     url = {https://peercommunityjournal.org/articles/10.24072/pcjournal.694/}
}
He, W.; Scornavacca, C.; Chan, Y.-B. The effect of gene tree dependence on summary methods for species tree inference. Peer Community Journal, Volume 6 (2026), article no. e25. https://doi.org/10.24072/pcjournal.694

PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.evolbiol.100860

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

