A rapid and simple method for assessing and representing genome sequence relatedness

Peer Community Journal, Volume 1 (2021), article no. e24. DOI: 10.24072/pcjournal.37


Coherent genomic groups are frequently used as a proxy for bacterial species delineation through the computation of overall genome relatedness indices (OGRI). Average nucleotide identity (ANI) is a widely employed method for estimating relatedness between genomic sequences. However, pairwise comparisons of genome sequences based on ANI are relatively computationally intensive, which precludes the analysis of large datasets composed of thousands of genome sequences. In this work, we propose a workflow to compute and visualize relationships between genomic sequences. A dataset containing more than 3,500 Pseudomonas genome sequences was successfully classified in a few hours with an alternative OGRI based on k-mer counts, with the same precision as ANI. A new visualization method based on zoomable circle packing was employed to assess relationships among the 350 groups generated. Amending databases with these Pseudomonas groups greatly improved the classification of metagenomic read sets with a k-mer-based classifier. The workflow was integrated into the user-friendly KI-S tool, which is available at the following address:
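The abstract does not detail how the k-mer-based index is computed. Purely as an illustration, the Python sketch below assumes that relatedness is scored as a Jaccard index over sets of canonical k-mers and that groups are formed by single linkage at a fixed similarity threshold. The index, the k-mer size, the threshold and the helper names (`canonical_kmers`, `jaccard`, `group_genomes`) are all assumptions, not the KI-S method, and exact k-mer sets stand in for the counting or sketching a real tool would need for thousands of genomes.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical sketch: NOT the KI-S implementation. Index, k and threshold
# are illustrative assumptions.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def canonical_kmers(seq: str, k: int) -> frozenset[str]:
    """Distinct canonical k-mers of seq (lexicographic min of a k-mer and its
    reverse complement); k-mers containing non-ACGT characters are skipped."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _BASES:
            reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
            kmers.add(min(kmer, reverse_complement))
    return frozenset(kmers)


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Assumed relatedness index: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_genomes(sketches: dict[str, frozenset[str]], threshold: float) -> list[list[str]]:
    """Single-linkage grouping: genomes connected by any pair whose index
    reaches the (hypothetical) threshold end up in the same group."""
    parent = {name: name for name in sketches}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All-against-all comparison, as in a pairwise OGRI analysis.
    for a, b in combinations(sketches, 2):
        if jaccard(sketches[a], sketches[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in sketches:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    genomes = {
        "g1": "ACGTACGTTGCAACGTAGCT",
        "g2": "ACGTACGTTGCAACGTAGCA",
        "g3": "TTTTGGGGCCCCAAAATTTG",
    }
    sketches = {name: canonical_kmers(seq, k=5) for name, seq in genomes.items()}
    print(jaccard(sketches["g1"], sketches["g2"]))  # 0.8
    print(group_genomes(sketches, threshold=0.5))   # [['g1', 'g2'], ['g3']]
```

With single linkage, two genomes can end up in the same group through an intermediate genome even when their own index falls below the threshold; a different clustering rule would generally yield different groups.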

Briand, M¹; Bouzid, M¹; Hunault, G²; Legeay, M³; Fischer-Le Saux, M¹; Barret, M¹

1 Univ Angers, Institut Agro, INRAE, IRHS, SFR QUASAV, F-49000 Angers, France
2 Université d’Angers, Laboratoire d’Hémodynamique, Interaction Fibrose et Invasivité tumorale hépatique, UPRES 3859, IFR 132, F-49045 Angers, France
3 Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
@article{10_24072_pcjournal_37,
     author = {Briand, M and Bouzid, M and Hunault, G and Legeay, M and Fischer-Le Saux, M and Barret, M},
     title = {A rapid and simple method for assessing and representing genome sequence relatedness},
     journal = {Peer Community Journal},
     eid = {e24},
     publisher = {Peer Community In},
     volume = {1},
     year = {2021},
     doi = {10.24072/pcjournal.37}
}
Briand, M; Bouzid, M; Hunault, G; Legeay, M; Fischer-Le Saux, M; Barret, M. A rapid and simple method for assessing and representing genome sequence relatedness. Peer Community Journal, Volume 1 (2021), article no. e24. doi: 10.24072/pcjournal.37.

Peer reviewed and recommended by PCI (recommendation DOI: 10.24072/pci.genomics.100001)

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
