Section: Mathematical & Computational Biology
Topic: Biophysics and computational biology
HairSplitter: haplotype assembly from long, noisy reads
Corresponding author(s): Faure, Roland (roland.faure@irisa.fr)
10.24072/pcjournal.481 - Peer Community Journal, Volume 4 (2024), article no. e96.
Peer reviewed and recommended by PCI
Motivation: Long-read assemblers struggle to distinguish closely related viral or bacterial strains and often collapse similar strains into a single sequence. This limitation hampers metagenome analysis, because different strains may carry important functional differences.
Results: We introduce HairSplitter, a new software tool that recovers strains from a partially or fully collapsed assembly and a set of long reads. The method uses a custom variant-calling procedure designed to work with error-prone long reads, and a new read-binning algorithm that recovers a number of strains not known in advance. On noisy long reads, HairSplitter recovers more strains than state-of-the-art tools while running faster, for both viruses and bacteria.
Availability: HairSplitter is freely available on GitHub at https://github.com/RolandFaure/Hairsplitter (https://doi.org/10.5281/zenodo.13753481).
Type: Research article
Faure, Roland 1, 2; Lavenier, Dominique 1; Flot, Jean-François 2, 3
Faure, Roland; Lavenier, Dominique; Flot, Jean-François. HairSplitter: haplotype assembly from long, noisy reads. Peer Community Journal, Volume 4 (2024), article no. e96. doi: 10.24072/pcjournal.481. https://peercommunityjournal.org/articles/10.24072/pcjournal.481/
PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.mcb.100307
Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.