Genomics

CulebrONT: a streamlined long reads multi-assembler pipeline for prokaryotic and eukaryotic genomes

10.24072/pcjournal.153 - Peer Community Journal, Volume 2 (2022), article no. e46.


Using long reads yields more contiguous, higher-quality genome assemblies. However, producing such high-quality sequences from raw reads requires chaining a growing set of tools, and determining the best workflow is a complex task.
To tackle this challenge, we developed CulebrONT, an open-source, scalable, modular and traceable Snakemake pipeline for assembling long-read data. CulebrONT enables users to test multiple samples with multiple long-read assemblers in parallel, and can optionally perform downstream circularization and polishing. It further provides a range of assembly quality metrics summarized in a final user-friendly report. CulebrONT alleviates the difficulty of developing assembly pipelines and allows users to identify the best assembly options.
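The combinatorial design described above (every sample run through every selected assembler, with optional circularization and polishing downstream) can be illustrated with a minimal plain-Python sketch. It assumes hypothetical sample names, example assembler names and a hypothetical `sample/assembler/step.fasta` output layout; none of these reflect CulebrONT's actual configuration keys or directory structure. Snakemake's own `expand()` helper builds a similar Cartesian product of wildcard values when requesting final targets.

```python
from itertools import product

# Hypothetical inputs: illustrative names, not CulebrONT configuration keys.
SAMPLES = ["sampleA", "sampleB"]
ASSEMBLERS = ["flye", "canu", "raven"]  # example long-read assembler names


def planned_outputs(samples, assemblers, circularize=False, polish=False):
    """Return one (hypothetical) final output path per (sample, assembler) pair.

    The last optional step that is enabled determines the file name,
    mimicking how a workflow engine requests different final targets
    depending on which downstream steps are switched on.
    """
    steps = ["assembly"]
    if circularize:
        steps.append("circularized")
    if polish:
        steps.append("polished")
    final_step = steps[-1]
    # Cartesian product: every sample is paired with every assembler.
    return [
        f"{sample}/{assembler}/{final_step}.fasta"
        for sample, assembler in product(samples, assemblers)
    ]


targets = planned_outputs(SAMPLES, ASSEMBLERS, circularize=True, polish=True)
print(len(targets))  # 6 jobs: 2 samples x 3 assemblers
print(targets[0])    # sampleA/flye/polished.fasta
```

Because each (sample, assembler) pair is independent, a workflow engine can schedule these jobs in parallel, which is what makes testing many assemblers on many samples tractable.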

Orjuela, Julie 1, 2, 3; Comte, Aurore 2, 3; Ravel, Sébastien 2, 3; Charriat, Florian 2, 3; Vi, Tram 3, 4; Sabot, François 1, 2; Cunnac, Sébastien 2, 3

1 DIADE Unit, University of Montpellier, CIRAD, IRD – Montpellier Cedex 5, France
2 IFB - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD – Montpellier, France
3 PHIM Plant Health Institute, University of Montpellier, CIRAD, INRAE, Institut Agro, IRD – Montpellier, France
4 Agricultural Genetics Institute, Vietnam Academy of Agricultural Sciences – Hanoi, Vietnam
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
@article{10_24072_pcjournal_153,
     author = {Orjuela, Julie and Comte, Aurore and Ravel, S\'ebastien and Charriat, Florian and Vi, Tram and Sabot, Fran\c{c}ois and Cunnac, S\'ebastien},
     title = {CulebrONT: a streamlined long reads multi-assembler pipeline for prokaryotic and eukaryotic genomes},
     journal = {Peer Community Journal},
     eid = {e46},
     publisher = {Peer Community In},
     volume = {2},
     year = {2022},
     doi = {10.24072/pcjournal.153},
     url = {https://peercommunityjournal.org/articles/10.24072/pcjournal.153/}
}
Orjuela, Julie; Comte, Aurore; Ravel, Sébastien; Charriat, Florian; Vi, Tram; Sabot, François; Cunnac, Sébastien. CulebrONT: a streamlined long reads multi-assembler pipeline for prokaryotic and eukaryotic genomes. Peer Community Journal, Volume 2 (2022), article no. e46. doi: 10.24072/pcjournal.153. https://peercommunityjournal.org/articles/10.24072/pcjournal.153/

Peer reviewed and recommended by PCI: 10.24072/pci.genomics.100018
