RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes

10.24072/pcjournal.244 - Peer Community Journal, Volume 3 (2023), article no. e19.


Compared to eukaryotic genomes, repetitive sequences are rare in bacterial genomes and usually do not persist for long. Yet, there is at least one class of persistent prokaryotic mobile genetic elements: REPINs. REPINs are non-autonomous transposable elements replicated by single-copy transposases called RAYTs. REPIN-RAYT systems are mostly vertically inherited and have persisted in individual bacterial lineages for millions of years. Discovering and analyzing REPIN populations and their corresponding RAYT transposases in bacterial species can be laborious, which hampers progress in understanding REPIN-RAYT biology and evolution. Here we present RAREFAN, a webservice that identifies REPIN populations and their corresponding RAYT transposases in a given set of bacterial genomes. We demonstrate RAREFAN’s capabilities by analyzing a set of 49 Stenotrophomonas maltophilia genomes containing nine different REPIN-RAYT systems. We guide the reader through the process of identifying and analyzing REPIN-RAYT systems across S. maltophilia, highlighting erroneous associations between REPINs and RAYTs and showing how to find the correct associations. RAREFAN enables rapid, large-scale detection of REPINs and RAYTs, and provides insight into the fascinating world of intragenomic sequence populations in bacterial genomes. RAREFAN is available at

DOI: 10.24072/pcjournal.244
Keywords: sequence analysis, mobile genetic elements, bacterial genomes, Stenotrophomonas maltophilia
Fortmann-Grote, Carsten 1; Irmer, Julia von 1; Bertels, Frederic 1

1 Max-Planck-Institute for Evolutionary Biology, Department of Microbial Population Biology, Plön, Germany
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
Fortmann-Grote, Carsten; Irmer, Julia von; Bertels, Frederic. RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes. Peer Community Journal, Volume 3 (2023), article no. e19. doi: 10.24072/pcjournal.244.

Peer reviewed and recommended by PCI: 10.24072/pci.genomics.100166

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
