Section: Genomics
Topic: Genetics/Genomics

MacSyFinder v2: Improved modelling and search engine to identify molecular systems in genomes

Corresponding author(s): Rocha, Eduardo P.C. (erocha@pasteur.fr); Abby, Sophie S. (sophie.abby@univ-grenoble-alpes.fr)

10.24072/pcjournal.250 - Peer Community Journal, Volume 3 (2023), article no. e28.


Complex cellular functions are usually encoded by a set of genes in one or a few organized genetic loci in microbial genomes. Macromolecular System Finder (MacSyFinder) is a program that uses these properties to model and then annotate cellular functions in microbial genomes. It does so by integrating the identification of each individual gene at the level of the molecular system. We present a major release of MacSyFinder (version 2), written in Python 3. The code was improved and rationalized to facilitate future maintenance. Several new features were added to allow more flexible modelling of the systems. We introduce a more intuitive and comprehensive search engine that identifies all the best candidate systems, as well as sub-optimal ones, that respect the models' constraints. We also introduce the new macsydata companion tool, which enables easy installation and broad distribution of the models developed for MacSyFinder (macsy-models) from GitHub repositories. Finally, we have updated and improved MacSyFinder's popular models: TXSScan to identify protein secretion systems, TFFscan to identify type IV filaments, CONJscan to identify conjugative systems, and CasFinder to identify CRISPR-associated proteins. MacSyFinder and the updated models are available at https://github.com/gem-pasteur/macsyfinder and https://github.com/macsy-models.
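The idea of annotating a system from co-localized genes, as described in the abstract, can be illustrated with a small Python sketch. It is not MacSyFinder's implementation or API: the names `Hit`, `cluster_hits`, `satisfies_model` and `candidate_systems`, the parameters `max_gap`, `min_mandatory` and `min_total`, and the toy gene names are all assumptions made for illustration. The sketch groups gene hits that lie close together on a replicon, then keeps the clusters that contain enough of the components a hypothetical model requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A gene matched to one component of a hypothetical model."""
    gene: str      # name of the matched component
    position: int  # rank of the gene along the replicon


def cluster_hits(hits, max_gap):
    """Group hits whose consecutive positions differ by at most max_gap."""
    clusters = []
    for hit in sorted(hits, key=lambda h: h.position):
        if clusters and hit.position - clusters[-1][-1].position <= max_gap:
            clusters[-1].append(hit)   # close enough: extend current cluster
        else:
            clusters.append([hit])     # too far: start a new cluster
    return clusters


def satisfies_model(cluster, mandatory, accessory, min_mandatory, min_total):
    """Check that a cluster holds enough distinct model components."""
    found = {h.gene for h in cluster}
    n_mandatory = len(found & mandatory)
    n_total = n_mandatory + len(found & accessory)
    return n_mandatory >= min_mandatory and n_total >= min_total


def candidate_systems(hits, mandatory, accessory, max_gap,
                      min_mandatory, min_total):
    """Return the clusters that respect the (hypothetical) model constraints."""
    return [
        cluster
        for cluster in cluster_hits(hits, max_gap)
        if satisfies_model(cluster, mandatory, accessory,
                           min_mandatory, min_total)
    ]


# Toy input: three neighbouring hits and one isolated hit (made-up data).
example_hits = [
    Hit("geneA", 10), Hit("geneB", 12), Hit("geneC", 13), Hit("geneA", 40),
]
example_systems = candidate_systems(
    example_hits,
    mandatory={"geneA", "geneB"},
    accessory={"geneC"},
    max_gap=3,
    min_mandatory=2,
    min_total=3,
)
print([[h.gene for h in system] for system in example_systems])
```

In this toy example the isolated copy of `geneA` forms its own cluster and is rejected because it lacks a second mandatory component. The actual search engine described in the article is more elaborate, since it also ranks candidate systems and reports sub-optimal ones.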

Type: Software tool
Keywords: genome annotation; modelling; comparative genomics; prokaryotes; microbiology; functional annotation; bioinformatics

Néron, Bertrand 1; Denise, Rémi 2, 3; Coluzzi, Charles 2; Touchon, Marie 2; Rocha, Eduardo P.C. 2; Abby, Sophie S. 4

1 Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics HUB, Paris, France
2 Institut Pasteur, Université Paris Cité, CNRS UMR3525, Microbial Evolutionary Genomics, Paris, France
3 APC Microbiome Ireland & School of Microbiology, University College Cork, Cork, Ireland
4 Univ. Grenoble Alpes, CNRS, UMR 5525, VetAgro Sup, Grenoble INP, TIMC, 38000 Grenoble, France
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
@article{10_24072_pcjournal_250,
     author = {N\'eron, Bertrand and Denise, R\'emi and Coluzzi, Charles and Touchon, Marie and Rocha, Eduardo P.C. and Abby, Sophie S.},
     title = {MacSyFinder v2: {Improved} modelling and search engine to identify molecular systems in genomes},
     journal = {Peer Community Journal},
     eid = {e28},
     publisher = {Peer Community In},
     volume = {3},
     year = {2023},
     doi = {10.24072/pcjournal.250},
     url = {https://peercommunityjournal.org/articles/10.24072/pcjournal.250/}
}
Néron, Bertrand; Denise, Rémi; Coluzzi, Charles; Touchon, Marie; Rocha, Eduardo P.C.; Abby, Sophie S. MacSyFinder v2: Improved modelling and search engine to identify molecular systems in genomes. Peer Community Journal, Volume 3 (2023), article no. e28. doi: 10.24072/pcjournal.250. https://peercommunityjournal.org/articles/10.24072/pcjournal.250/

PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.genomics.100233

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

