Section: Genomics
Topic: Agricultural sciences, Genetics/genomics

HaploCharmer: a Snakemake workflow for read-scale haplotype calling adapted to polyploids

Corresponding author(s): Rio, Simon (simon.rio@cirad.fr)

10.24072/pcjournal.631 - Peer Community Journal, Volume 5 (2025), article no. e106

Peer reviewed and recommended by PCI

The advent of next-generation sequencing (NGS) has revolutionized the study of single nucleotide polymorphisms (SNPs), making such studies increasingly cost-effective. Haplotypes, which combine alleles from adjacent variants, offer several advantages over bi-allelic SNPs, including enhanced information content, reduced dimensionality, and improved statistical power in genomic studies. These benefits are particularly significant for polyploid species, where distinguishing all homologous copies using SNP markers alone can be challenging. This article introduces HaploCharmer, a flexible workflow designed for read-scale haplotype calling from NGS data. HaploCharmer identifies haplotypes within preconfigured genomic regions shorter than a sequencing read, ensuring direct comparability across individuals. It integrates a series of processing steps, including mapping, haplotype identification, filtering, and reporting of haplotype sequences as presence-absence calls across the panel of accessions analyzed. The performance of HaploCharmer was validated by building a genetic map using whole-genome sequencing data from a highly polyploid sugarcane cultivar (R570) and its self-progeny, and by performing a diversity analysis in the polyploid Saccharum genus using targeted sequencing data. The workflow successfully identified a large number of high-quality haplotypes, with fewer than 1% false positives. The dense genetic map obtained using single-dose haplotypes accurately depicted the known genome architecture of the R570 cultivar, including large chromosome rearrangements. The diversity analysis accurately reflected the known genetic structure within the genus. It also allowed ancestral origins to be inferred for mapped haplotypes and for the corresponding chromosome segments in the R570 genetic map. HaploCharmer provides a robust method for diversity, genetic mapping, and quantitative genetics studies in both diploid and polyploid species.
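
The idea of read-scale haplotype calling described above can be illustrated with a small Python sketch. This is not HaploCharmer's code: the function names (`call_haplotypes`, `presence_absence`), the `min_count` filter, and the toy reads are all hypothetical, and reads are assumed to be gapless alignments given as (0-based start, sequence) pairs. The sketch keeps only reads that fully span a fixed region, counts the distinct sequences observed within it, discards rarely seen ones as likely errors, and reports the remaining haplotypes as presence-absence across samples.

```python
from collections import Counter


def call_haplotypes(reads, region_start, region_end, min_count=2):
    """Count distinct sequences seen in [region_start, region_end).

    reads: list of (start, sequence) pairs, assumed gapless alignments
    with 0-based start coordinates (an illustrative simplification).
    Only reads spanning the whole region contribute; sequences seen
    fewer than min_count times are dropped (hypothetical filter).
    """
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq)
        if start <= region_start and end >= region_end:
            hap = seq[region_start - start:region_end - start]
            counts[hap] += 1
    return {hap: n for hap, n in counts.items() if n >= min_count}


def presence_absence(samples, region_start, region_end, min_count=2):
    """Build a haplotype-by-sample presence-absence table (1 or 0)."""
    calls = {
        name: call_haplotypes(reads, region_start, region_end, min_count)
        for name, reads in samples.items()
    }
    haplotypes = sorted({hap for c in calls.values() for hap in c})
    return {
        hap: {name: int(hap in calls[name]) for name in samples}
        for hap in haplotypes
    }


# Toy data: two samples, one 4-bp region covering positions 2..5.
samples = {
    "A": [
        (0, "ACGTAC"), (0, "ACGTAC"),  # two reads carrying GTAC
        (1, "CGAACG"), (1, "CGAACG"),  # two reads carrying GAAC
        (0, "ACGTTC"),                 # singleton, filtered out
        (3, "TACGGG"),                 # does not span the region
    ],
    "B": [(2, "GTAC"), (2, "GTAC")],
}

matrix = presence_absence(samples, region_start=2, region_end=6)
for hap, row in matrix.items():
    print(hap, row)
```

Because every sample is scored on the same fixed region, the resulting haplotype sequences are directly comparable across individuals. In this toy input the singleton sequence GTTC in sample A is removed by the count filter, which stands in for the more elaborate filtering a real workflow would apply.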

DOI: 10.24072/pcjournal.631
Type: Research article
Keywords: Haplotype, Next-generation sequencing, Polyploid, Sugarcane

Rio, Simon 1, 2; Abdallah, Sophie 1, 2; Durand, Théo 1, 2; D'Hont, Angélique 1, 2; Garsmeur, Olivier 1, 2

1 CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
2 UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
@article{10_24072_pcjournal_631,
     author = {Rio, Simon and Abdallah, Sophie and Durand, Th\'eo and D'Hont, Ang\'elique and Garsmeur, Olivier},
     title = {HaploCharmer: a {Snakemake} workflow for read-scale haplotype calling adapted to polyploids},
     journal = {Peer Community Journal},
     eid = {e106},
     publisher = {Peer Community In},
     volume = {5},
     year = {2025},
     doi = {10.24072/pcjournal.631},
     language = {en},
     url = {https://peercommunityjournal.org/articles/10.24072/pcjournal.631/}
}
Rio, S.; Abdallah, S.; Durand, T.; D'Hont, A.; Garsmeur, O. HaploCharmer: a Snakemake workflow for read-scale haplotype calling adapted to polyploids. Peer Community Journal, Volume 5 (2025), article no. e106. https://doi.org/10.24072/pcjournal.631

PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.genomics.100434

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

