Mathematical & Computational Biology

HMMploidy: inference of ploidy levels from short-read sequencing data

10.24072/pcjournal.178 - Peer Community Journal, Volume 2 (2022), article no. e60.

Inferring ploidy levels from genomic data is important for understanding the molecular mechanisms that underpin genome evolution. However, current methods based on allele frequency and sequencing depth variation lack the power to infer ploidy levels from low- and mid-depth sequencing data, because they do not account for data uncertainty. Here we introduce HMMploidy, a novel tool that leverages information from multiple samples and combines sequencing depth with genotype likelihoods. We demonstrate that HMMploidy outperforms existing methods in most tested scenarios, especially at low depth with large sample sizes. We apply HMMploidy to sequencing data from the pathogenic fungus Cryptococcus neoformans and retrieve pervasive patterns of aneuploidy, even when artificially downsampling the sequencing data. We envisage that HMMploidy will have wide applicability to low-depth sequencing data from polyploid and aneuploid species.
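To make the hidden Markov model idea concrete, the following is a minimal sketch and not the HMMploidy implementation. It assumes a single sample, uses only per-window sequencing depth (the tool described above also uses genotype likelihoods and pools multiple samples), models depth as Poisson with mean proportional to ploidy, and decodes the most likely ploidy path with the Viterbi algorithm. The function names, the `depth_per_copy` parameter, and the `stay_prob` transition value are all hypothetical choices made for illustration.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_ploidy(depths, ploidies, depth_per_copy, stay_prob=0.9):
    """Most likely ploidy per genomic window, given per-window depths.

    Assumptions (illustrative only): depth in a window is Poisson with
    mean ploidy * depth_per_copy; the chain stays in the same ploidy with
    probability stay_prob and otherwise switches uniformly at random.
    Requires at least two candidate ploidies.
    """
    n_states = len(ploidies)
    stay_logp = math.log(stay_prob)
    switch_logp = math.log((1.0 - stay_prob) / (n_states - 1))

    def trans(i, j):
        return stay_logp if i == j else switch_logp

    # Initialisation: uniform prior over ploidy states.
    score = [
        -math.log(n_states) + poisson_logpmf(depths[0], p * depth_per_copy)
        for p in ploidies
    ]
    backpointers = []

    # Recursion: best predecessor for every state at every window.
    for d in depths[1:]:
        new_score, pointers = [], []
        for j, p in enumerate(ploidies):
            best_i = max(range(n_states), key=lambda i: score[i] + trans(i, j))
            new_score.append(
                score[best_i]
                + trans(best_i, j)
                + poisson_logpmf(d, p * depth_per_copy)
            )
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Backtracking from the best final state.
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [ploidies[s] for s in path]


if __name__ == "__main__":
    # Toy depths: four windows near 10x followed by four windows near 20x.
    toy_depths = [10, 11, 9, 10, 20, 21, 19, 20]
    print(viterbi_ploidy(toy_depths, ploidies=[1, 2, 3], depth_per_copy=10))
```

In this toy setting the transition penalty discourages spurious switches between adjacent windows, which is the property that lets an HMM smooth noisy low-depth signals along the genome. A depth-only model like this cannot distinguish a change in ploidy from a change in overall coverage, which is one reason to combine depth with genotype likelihoods as the abstract describes.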

Soraggi, Samuele 1, 2; Rhodes, Johanna 3; Altinkaya, Isin 2, 4, 5; Tarrant, Oliver 2; Balloux, Francois 6; Fisher, Matthew C 3; Fumagalli, Matteo 2, 7

1 Bioinformatics Research Center (BiRC), University of Aarhus, 8000 Aarhus, Denmark
2 Department of Life Sciences Silwood Park, Imperial College London, Ascot, SL5 7PY, UK
3 MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, W2 1PG, UK
4 Department of Biology, Hacettepe University, 06800 Beytepe Campus, Ankara, Turkey
5 GLOBE, Section for Geogenetics, Øster Voldgade 5-7, 1350, Copenhagen, Denmark
6 UCL Genetics Institute, University College London, London, WC1E 6BT, UK
7 School of Biological and Behavioural Sciences, Queen Mary University of London, London, E1 4NS, UK
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
Soraggi, Samuele; Rhodes, Johanna; Altinkaya, Isin; Tarrant, Oliver; Balloux, Francois; Fisher, Matthew C; Fumagalli, Matteo. HMMploidy: inference of ploidy levels from short-read sequencing data. Peer Community Journal, Volume 2 (2022), article no. e60. doi: 10.24072/pcjournal.178.

Peer reviewed and recommended by PCI: 10.24072/pci.mcb.100010

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
