Section: Mathematical & Computational Biology
Topic: Biophysics and computational biology, Genetics/genomics, Immunology and inflammation

Alignment-free detection and seed-based identification of multi-loci V(D)J recombinations in Vidjil-algo

Corresponding author(s): Salson, Mikaël (contact@vidjil.org)

10.24072/pcjournal.547 - Peer Community Journal, Volume 5 (2025), article no. e44.

Peer reviewed and recommended by PCI

The diversity of the immune repertoire is rooted in V(D)J recombinations at several loci. Many algorithms and software tools detect and designate these recombinations in high-throughput sequencing data. To improve their efficiency, we propose a multi-loci seed identification through an Aho-Corasick-like automaton, as well as a seed-based gene filtration. These algorithms were implemented in Vidjil-algo, which is used routinely by several labs for the analysis of hematologic malignancies. We benchmark the results of Vidjil-algo and of MiXCR on five datasets, evaluating the specificity and sensitivity of the detection, as well as the agreement of the designation with manually curated sequences. Compared to the previous algorithms, the new algorithms implemented in Vidjil-algo bring speedups between 3× and 30×, with a smaller memory footprint and without loss of quality in the results. They make it possible to precisely annotate, in a few minutes, millions of sequences coming from V(D)J recombinations, including incomplete V(D)J-like recombinations, improving our knowledge of immune repertoires.
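
To give a concrete feel for the multi-loci seed identification mentioned above, the following is a minimal sketch of a textbook Aho-Corasick automaton whose patterns are k-mers labelled with a locus. It is an illustration under stated assumptions, not the Vidjil-algo implementation: the germline strings are invented toy sequences, the seeds are contiguous k-mers rather than spaced seeds, and the names `kmers`, `build_automaton` and `count_labels` are hypothetical.

```python
from collections import Counter, deque


def kmers(seq, k):
    """Return all contiguous k-mers of seq (contiguous, not spaced, seeds)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_automaton(labelled_patterns):
    """Build a textbook Aho-Corasick automaton from (pattern, label) pairs."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # labels of all patterns ending at each state
    for pattern, label in labelled_patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(label)
    # Breadth-first traversal: failure links of shallow states are set first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the labels of patterns that are suffixes of this state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def count_labels(read, automaton):
    """Scan the read once and count seed hits per label (here, per locus)."""
    goto, fail, out = automaton
    counts = Counter()
    state = 0
    for ch in read:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        counts.update(out[state])
    return counts


# Toy germline fragments (invented for illustration, not real gene sequences).
germlines = {"IGH": "ACGTAC", "TRG": "TTGACC"}
patterns = [(kmer, locus) for locus, seq in germlines.items()
            for kmer in kmers(seq, 4)]
automaton = build_automaton(patterns)

print(count_labels("GGACGTACTT", automaton))  # Counter({'IGH': 3})
```

In this sketch a read is scanned in a single pass whatever the number of loci, and the locus with the most seed hits is a natural candidate for further analysis. How Vidjil-algo actually scores and filters candidates is described in the full text, not here.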

DOI: 10.24072/pcjournal.547
Type: Research article
Keywords: Spaced seeds, Aho-Corasick Automaton, Alignment-free algorithm, Immune repertoire, VDJ recombinations, Adaptive immune receptor repertoire

Borée, Cyprien 1; Giraud, Mathieu 1; Salson, Mikaël 1

1 Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
@article{10_24072_pcjournal_547,
     author = {Bor\'ee, Cyprien and Giraud, Mathieu and Salson, Mika\"el},
     title = {Alignment-free detection and seed-based identification of multi-loci {V(D)J} recombinations in {Vidjil-algo}},
     journal = {Peer Community Journal},
     eid = {e44},
     publisher = {Peer Community In},
     volume = {5},
     year = {2025},
     doi = {10.24072/pcjournal.547},
     language = {en},
     url = {https://peercommunityjournal.org/articles/10.24072/pcjournal.547/}
}
Borée, C.; Giraud, M.; Salson, M. Alignment-free detection and seed-based identification of multi-loci V(D)J recombinations in Vidjil-algo. Peer Community Journal, Volume 5 (2025), article no. e44. doi: 10.24072/pcjournal.547. https://peercommunityjournal.org/articles/10.24072/pcjournal.547/

PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.mcb.100268

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
