Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks

10.24072/pcjournal.187 - Peer Community Journal, Volume 2 (2022), article no. e68.

Peer reviewed and recommended by PCI

1. Mutation, the source of genetic diversity, is the raw material of evolution; however, the mutation process remains understudied, especially in plants. Using both a simulation and a reanalysis framework, we set out to explore and demonstrate the improved performance of variant callers developed for cancer research compared to single nucleotide polymorphism (SNP) callers in detecting de novo somatic mutations.

2. In an in silico experiment, we generated Illumina-like sequence reads spiked with simulated mutations at different allelic fractions to compare how well seven commonly used variant callers recall them. More empirically, we then reanalyzed two of the largest datasets available for plants, both developed for identifying within-individual variation in long-lived pedunculate oaks.
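
To give a feel for why allelic fraction matters in such an experiment, here is a minimal sketch, not the authors' workflow. It assumes each read samples the mutant allele independently (a binomial model), and it uses a hypothetical sequencing depth of 60 and a hypothetical evidence threshold of 3 mutant reads; neither value comes from the article.

```python
from math import comb


def prob_at_least(min_alt_reads: int, depth: int, allelic_fraction: float) -> float:
    """Probability that at least `min_alt_reads` out of `depth` reads carry
    the mutation, assuming independent binomial sampling of reads."""
    return sum(
        comb(depth, k) * allelic_fraction**k * (1 - allelic_fraction) ** (depth - k)
        for k in range(min_alt_reads, depth + 1)
    )


# Hypothetical depth (60) and threshold (3 mutant reads), for illustration only.
for af in (0.05, 0.10, 0.25, 0.50):
    p = prob_at_least(min_alt_reads=3, depth=60, allelic_fraction=af)
    print(f"allelic fraction {af:.2f}: P(>=3 mutant reads at depth 60) = {p:.3f}")
```

Under this simplified model, the chance of observing enough supporting reads drops as the allelic fraction decreases, which is the regime in which caller performance is being compared.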

3. Based on the in silico experiment, variant callers developed for cancer research outperform SNP callers regarding plant mutation recall and precision, especially at low allele frequency. Such variants at low allelic fractions are typically expected for within-individual de novo plant mutations, which initially appear in single cells. Reanalysis of published oak data with Strelka2, the best-performing caller based on our simulations, identified up to 3.4x more candidate somatic mutations than reported in the original studies.
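
Recall and precision against a set of simulated mutations can be computed as in the following sketch. The variant keys (chromosome, position, alternate allele) and the toy data are hypothetical and only illustrate the two metrics; they are not taken from the study.

```python
def recall_precision(truth, calls):
    """Return (recall, precision) of called variants against a truth set."""
    truth, calls = set(truth), set(calls)
    true_positives = len(truth & calls)
    recall = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(calls) if calls else 0.0
    return recall, precision


# Hypothetical toy data: variants keyed as (chromosome, position, alternate allele).
simulated = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "G"), ("chr2", 900, "C")}
called = {("chr1", 100, "A"), ("chr2", 40, "G"), ("chr2", 41, "T")}

recall, precision = recall_precision(simulated, called)
print(f"recall={recall:.2f} precision={precision:.2f}")  # recall=0.50 precision=0.67
```

Here two of the four simulated mutations are recovered (recall 0.50) and two of the three calls are correct (precision 0.67).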

4. Our results advocate the use of cancer research callers to boost de novo mutation research in plants, and to reconcile empirical reports with theoretical expectations.

DOI: 10.24072/pcjournal.187
Schmitt, Sylvain 1; Leroy, Thibault 2, 3; Heuertz, Myriam 4; Tysklind, Niklas 5

1 CNRS, UMR EcoFoG (Agroparistech, Cirad, INRAE, Université des Antilles, Université de la Guyane), Campus Agronomique, 97310 Kourou, French Guiana
2 Department of Botany and Biodiversity Research, University of Vienna, Rennweg 14, A-1030 Vienna, Austria
3 IRHS-UMR1345, Université d’Angers, INRAE, Institut Agro, SFR 4207 QuaSaV, 49071 Beaucouzé, France
4 Université Bordeaux, INRAE, BIOGECO, 69 route d’Arcachon, CS 80227, 33612 Cestas Cedex, France
5 INRAE, UMR EcoFoG (Agroparistech, CNRS, Cirad, Université des Antilles, Université de la Guyane), Campus Agronomique, 97310 Kourou, French Guiana
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
@article{10_24072_pcjournal_187,
     author = {Schmitt, Sylvain and Leroy, Thibault and Heuertz, Myriam and Tysklind, Niklas},
     title = {Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks},
     journal = {Peer Community Journal},
     eid = {e68},
     publisher = {Peer Community In},
     volume = {2},
     year = {2022},
     doi = {10.24072/pcjournal.187},
     url = {}
}
Schmitt, Sylvain; Leroy, Thibault; Heuertz, Myriam; Tysklind, Niklas. Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks. Peer Community Journal, Volume 2 (2022), article no. e68. doi: 10.24072/pcjournal.187.

Peer reviewed and recommended by PCI: 10.24072/pci.genomics.100024

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
