Mathematical & Computational Biology

An efficient algorithm for estimating population history from genetic data

10.24072/pcjournal.132 - Peer Community Journal, Volume 2 (2022), article no. e32.


The Legofit statistical package uses genetic data to estimate parameters describing population history. Previous versions used computer simulations to estimate probabilities, an approach that limited both speed and accuracy. This article describes a new deterministic algorithm, which makes Legofit faster and more accurate. The speed of this algorithm declines as model complexity increases. With very complex models, the deterministic algorithm is slower than the stochastic one. In an application to simulated data sets, the estimates produced by the deterministic and stochastic algorithms were essentially identical. Reanalysis of a human data set replicated the findings of a previous study and provided increased support for the hypotheses that (a) early modern humans contributed genes to Neanderthals, and (b) a "superarchaic" population (which separated from all other humans early in the Pleistocene) was either large or deeply subdivided.

DOI: 10.24072/pcjournal.132
Rogers, Alan R.¹

¹ Dept. of Anthropology, University of Utah, USA
@article{10_24072_pcjournal_132,
     author = {Rogers, Alan R.},
     title = {An efficient algorithm for estimating population history from genetic data},
     journal = {Peer Community Journal},
     eid = {e32},
     publisher = {Peer Community In},
     volume = {2},
     year = {2022},
     doi = {10.24072/pcjournal.132},
     url = {https://peercommunityjournal.org/articles/10.24072/pcjournal.132/}
}
Rogers, Alan R. An efficient algorithm for estimating population history from genetic data. Peer Community Journal, Volume 2 (2022), article no. e32. doi: 10.24072/pcjournal.132. https://peercommunityjournal.org/articles/10.24072/pcjournal.132/

Peer reviewed and recommended by PCI: 10.24072/pci.mcb.100003

