HairSplitter: haplotype assembly from long, noisy reads

Corresponding author(s): Faure, Roland

10.24072/pcjournal.481 - Peer Community Journal, Volume 4 (2024), article no. e96.

Motivation: Long-read assemblers struggle to distinguish closely related viral or bacterial strains and often collapse similar strains into a single sequence. This limitation hampers metagenome analysis, since distinct strains may differ in important functions.

Results: We introduce HairSplitter, a new software tool designed to retrieve strains from a partially or fully collapsed assembly and a set of long reads. The method relies on a custom variant-calling process that copes with error-prone long reads, and on a new read-binning algorithm that recovers a number of strains not known in advance. On noisy long reads, HairSplitter recovers more strains than state-of-the-art tools while running faster, for both viruses and bacteria.

Availability: HairSplitter is freely available on GitHub.
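The abstract names two steps, variant calling that tolerates sequencing errors and binning of reads into an a priori unknown number of strains, without detailing either. As a rough illustration only, the sketch below shows a generic version of that idea: a simple allele-frequency filter stands in for variant calling, and greedy threshold-based clustering opens a new bin whenever a read disagrees with the consensus of every existing bin. This is not HairSplitter's algorithm. The function names, the `min_freq` and `max_dist` thresholds, and the toy input (reads given as pre-aligned, equal-length strings with `-` for uncovered positions) are all hypothetical.

```python
from collections import Counter


def variant_columns(reads, min_freq=0.2):
    """Keep columns where the second most common base reaches min_freq.

    Hypothetical stand-in for variant calling on noisy reads: an isolated
    sequencing error rarely reaches min_freq, whereas a strain-specific
    allele carried by many reads does. Assumes all reads share one length.
    """
    cols = []
    for j in range(len(reads[0])):
        bases = Counter(r[j] for r in reads if r[j] != '-')
        total = sum(bases.values())
        top_two = bases.most_common(2)
        if len(top_two) == 2 and top_two[1][1] / total >= min_freq:
            cols.append(j)
    return cols


def distance(a, b, cols):
    """Fraction of mismatches over variant columns covered by both strings.

    A read that shares no covered variant column with b gets distance 0.0,
    i.e. it is treated as compatible with any bin.
    """
    shared = [j for j in cols if a[j] != '-' and b[j] != '-']
    if not shared:
        return 0.0
    return sum(a[j] != b[j] for j in shared) / len(shared)


def consensus(members):
    """Majority base per column among the member reads ('-' if none cover it)."""
    out = []
    for j in range(len(members[0])):
        bases = Counter(r[j] for r in members if r[j] != '-')
        out.append(bases.most_common(1)[0][0] if bases else '-')
    return ''.join(out)


def bin_reads(reads, max_dist=0.3, min_freq=0.2):
    """Greedy binning: the number of bins is not fixed in advance."""
    cols = variant_columns(reads, min_freq)
    bins = []  # each bin is a list of read indices
    for i, read in enumerate(reads):
        best, best_d = None, None
        for b, members in enumerate(bins):
            d = distance(read, consensus([reads[k] for k in members]), cols)
            if best_d is None or d < best_d:
                best, best_d = b, d
        if best is not None and best_d <= max_dist:
            bins[best].append(i)  # close enough to an existing strain
        else:
            bins.append([i])  # disagrees with every bin: open a new one
    return cols, bins


# Toy data: two hypothetical strains differing at columns 3 and 8;
# read 2 carries a sequencing error at column 6, read 5 stops early.
reads = [
    "ACGTACGTAC",
    "ACGAACGTTC",
    "ACGTACCTAC",
    "ACGAACGTTC",
    "ACGTACGTAC",
    "ACGAACGT--",
]
cols, bins = bin_reads(reads)
print(cols, bins)  # [3, 8] [[0, 2, 4], [1, 3, 5]]
```

On this toy input the error at column 6 is filtered out as too rare to be a variant, and two bins emerge without the number of strains being specified. A greedy pass like this is order-dependent and sensitive to its thresholds, which is one reason dedicated tools use more careful binning strategies.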

Keywords: Metagenomes, Metaviromes, Haplotyping, Genome assembly, Strain separation

Faure, Roland (1, 2); Lavenier, Dominique (1); Flot, Jean-François (2, 3)

1 Univ. Rennes, INRIA RBA, CNRS UMR 6074, Rennes, France
2 Service Evolution Biologique et Ecologie, Université libre de Bruxelles (ULB), Brussels, Belgium
3 Interuniversity Institute of Bioinformatics in Brussels (IB)², Brussels, Belgium
Faure, Roland; Lavenier, Dominique; Flot, Jean-François. HairSplitter: haplotype assembly from long, noisy reads. Peer Community Journal, Volume 4 (2024), article no. e96. doi: 10.24072/pcjournal.481.

