Section: Mathematical & Computational Biology
Topic: Statistics, Biophysics and computational biology, Genetics/genomics

A new iterative framework for simulation-based population genetic inference with improved coverage properties of confidence intervals

Corresponding author(s): Rousset, François (francois.rousset@umontpellier.fr)

10.24072/pcjournal.721 - Peer Community Journal, Volume 6 (2026), article no. e43

Peer reviewed and recommended by PCI

Simulation-based methods such as approximate Bayesian computation (ABC) are widely used to infer the evolutionary history of populations from molecular genetic data. We describe and evaluate a new iterative method of statistical inference about model parameters, which revisits the idea of inferring a likelihood surface by simulation when the likelihood function cannot be evaluated. It combines the random forest machine learning method with multivariate Gaussian mixture (MGM) models in an effective inference workflow, here used to fit models with up to 15 variable parameters. In addition to the traditional assessment of precision in terms of bias and mean square error, we also evaluate the coverage of confidence intervals. The method is compared with approximate Bayesian computation using random forests (ABC-RF), a non-iterative method sharing some technical features with the proposed approach, across scenarios of historical demographic inference from population genetic data. It is also compared with another iterative method, sequential neural likelihood estimation (SNLE). These comparisons highlight the importance of an iterative workflow for exploring the parameter space efficiently. For an equivalent simulation effort of the data-generating process, the new summary-likelihood method provides intervals whose coverage is better controlled than the marginal coverage of intervals provided by ABC with random forests, and better controlled than is generally reported for ABC methods. The iterative workflow can also yield greater improvements in estimator precision when larger datasets are used.
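The abstract's notion of interval coverage can be illustrated with a minimal sketch. It is not the authors' workflow and does not use their software: it assumes a toy Gaussian model with known standard deviation, and the function name `empirical_coverage` and all parameter values are hypothetical. It repeatedly simulates datasets under a known parameter value, builds a nominal 95% confidence interval from each, and reports the fraction of intervals that contain the true value.

```python
import random
import statistics


def empirical_coverage(true_mean=2.0, sigma=1.0, n_obs=50,
                       n_replicates=2000, seed=1):
    """Fraction of nominal 95% intervals for a Gaussian mean that contain
    the true value (toy model, assumed for illustration only)."""
    rng = random.Random(seed)
    # Two-sided 95% normal quantile (about 1.96).
    z = statistics.NormalDist().inv_cdf(0.975)
    # Half-width of the known-sigma interval for the mean.
    half_width = z * sigma / n_obs ** 0.5
    hits = 0
    for _ in range(n_replicates):
        # Simulate one dataset under the true parameter value.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n_obs)]
        estimate = statistics.fmean(sample)
        # The interval covers the truth when the estimate is within half_width.
        if abs(estimate - true_mean) <= half_width:
            hits += 1
    return hits / n_replicates


coverage = empirical_coverage()
print(f"empirical coverage of nominal 95% intervals: {coverage:.3f}")
```

For a well-calibrated interval procedure, the reported fraction should be close to the nominal level of 0.95, up to Monte Carlo error; a marked departure in either direction indicates poorly controlled coverage.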

DOI: 10.24072/pcjournal.721
Type: Research article
Keywords: statistical inference; simulation-based inference; demographic history; population genetics

Rousset, François 1; Leblois, Raphaël 2; Estoup, Arnaud 2; Marin, Jean-Michel 3

1 ISEM, Univ Montpellier, CNRS, IRD, 34095 Montpellier, France
2 CBGP, INRAE, CIRAD, IRD, Institut Agro, Univ Montpellier, 34980 Montferrier-sur-Lez, France
3 IMAG, Univ Montpellier, 34095 Montpellier, France
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
Rousset, F.; Leblois, R.; Estoup, A.; Marin, J.-M. A new iterative framework for simulation-based population genetic inference with improved coverage properties of confidence intervals. Peer Community Journal, Volume 6 (2026), article no. e43. https://doi.org/10.24072/pcjournal.721
@article{10_24072_pcjournal_721,
     author = {Rousset, Fran\c{c}ois and Leblois, Rapha\"el and Estoup, Arnaud and Marin, Jean-Michel},
     title = {A new iterative framework for simulation-based population genetic inference with improved coverage properties of confidence intervals},
     journal = {Peer Community Journal},
     eid = {e43},
     year = {2026},
     publisher = {Peer Community In},
     volume = {6},
     doi = {10.24072/pcjournal.721},
     language = {en},
     url = {https://peercommunityjournal.org/articles/10.24072/pcjournal.721/}
}
TY  - JOUR
AU  - Rousset, François
AU  - Leblois, Raphaël
AU  - Estoup, Arnaud
AU  - Marin, Jean-Michel
TI  - A new iterative framework for simulation-based population genetic inference with improved coverage properties of confidence intervals
JO  - Peer Community Journal
PY  - 2026
VL  - 6
PB  - Peer Community In
UR  - https://peercommunityjournal.org/articles/10.24072/pcjournal.721/
DO  - 10.24072/pcjournal.721
LA  - en
ID  - 10_24072_pcjournal_721
ER  - 
%0 Journal Article
%A Rousset, François
%A Leblois, Raphaël
%A Estoup, Arnaud
%A Marin, Jean-Michel
%T A new iterative framework for simulation-based population genetic inference with improved coverage properties of confidence intervals
%J Peer Community Journal
%] e43
%D 2026
%V 6
%I Peer Community In
%U https://peercommunityjournal.org/articles/10.24072/pcjournal.721/
%R 10.24072/pcjournal.721
%G en
%F 10_24072_pcjournal_721

PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.mcb.100426

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
