
Section: Mathematical & Computational Biology
Topics: Biophysics and computational biology, Genetics/genomics, Immunology and inflammation
Alignment-free detection and seed-based identification of multi-loci V(D)J recombinations in Vidjil-algo
Corresponding author(s): Salson, Mikaël (contact@vidjil.org)
DOI: 10.24072/pcjournal.547. Peer Community Journal, Volume 5 (2025), article no. e44.
Peer reviewed and recommended by PCI.
The diversity of the immune repertoire is grounded in V(D)J recombinations at several loci. Many algorithms and software tools detect and designate these recombinations in high-throughput sequencing data. To improve their efficiency, we propose a multi-loci seed identification through an Aho-Corasick-like automaton, as well as a seed-based gene filtration. These algorithms were implemented in Vidjil-algo, which is used routinely by several labs for the analysis of hematologic malignancies. We benchmark the results of Vidjil-algo and of MiXCR on five datasets, evaluating the specificity and sensitivity of the detection, as well as the agreement of the designations with manually curated sequences. Compared to the previous algorithms, the new algorithms implemented in Vidjil-algo bring speedups between 3× and 30×, with a smaller memory footprint and without loss of quality in the results. They make it possible to precisely annotate, in a few minutes, millions of sequences coming from V(D)J recombinations, including incomplete V(D)J-like recombinations, thereby improving our knowledge of immune repertoires.
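The idea of multi-loci seed identification can be pictured with a textbook Aho-Corasick automaton built over k-mers taken from germline genes of several loci: a single left-to-right pass over a read reports every k-mer occurrence together with its locus label, and a vote picks the most likely locus. The following is a minimal sketch under explicit assumptions, not Vidjil-algo's implementation: it uses contiguous k-mers rather than whatever seeds the tool actually uses, toy sequences that are not real germline genes, hypothetical helper names (`build_automaton`, `scan`, `kmer_patterns`, `guess_locus`), and a simple majority vote.

```python
from collections import Counter, deque


def build_automaton(patterns):
    """Build a textbook Aho-Corasick automaton.

    patterns: dict mapping each pattern string to a label (here, a locus).
    Returns (goto, fail, out): goto[s] maps a character to the next state,
    fail[s] is the failure link of state s, and out[s] lists the
    (pattern, label) pairs that end when state s is reached.
    """
    goto, fail, out = [{}], [0], [[]]
    # Phase 1: insert every pattern into a trie rooted at state 0.
    for pat, label in patterns.items():
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append((pat, label))
    # Phase 2: compute failure links breadth-first (depth-1 states keep 0).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # A state also reports the matches of its failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(text, automaton):
    """Yield (end_index, pattern, label) for every occurrence, in one pass."""
    goto, fail, out = automaton
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat, label in out[state]:
            yield i, pat, label


def kmer_patterns(genes, k):
    """Label every k-mer of the (toy) germline genes with its locus.

    genes: dict gene_name -> (locus, sequence). A k-mer found in two loci
    is labeled "ambiguous" (a choice made for this sketch only).
    """
    patterns = {}
    for locus, seq in genes.values():
        for j in range(len(seq) - k + 1):
            kmer = seq[j:j + k]
            if patterns.get(kmer, locus) != locus:
                patterns[kmer] = "ambiguous"
            else:
                patterns[kmer] = locus
    return patterns


def guess_locus(read, automaton):
    """Return the locus with the most unambiguous k-mer hits, or None."""
    votes = Counter(label for _, _, label in scan(read, automaton)
                    if label != "ambiguous")
    return votes.most_common(1)[0][0] if votes else None


# Toy sequences, NOT real IMGT germline genes.
toy_genes = {
    "TRGV_toy": ("TRG", "ACGTTGCAAGGT"),
    "IGHV_toy": ("IGH", "TTGACCAGTACA"),
}
automaton = build_automaton(kmer_patterns(toy_genes, k=5))
print(guess_locus("GGGACGTTGCAAGCC", automaton))  # → TRG
```

Because all loci share one automaton, the scan cost grows with the read length and the number of hits, not with the number of loci, which is what makes a multi-loci automaton attractive compared with testing each locus in turn.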
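Seed-based gene filtration can be illustrated as a pre-filter: before running any costly dynamic-programming alignment, count how many k-mers each candidate germline gene shares with the read and keep only the best-scoring genes. This is a hedged sketch, assuming contiguous k-mers, a hypothetical `filter_genes` function, a fixed number `keep` of retained genes, and toy gene sequences; the filtering criterion actually used in Vidjil-algo may differ.

```python
def kmer_set(seq, k):
    """Return the set of all k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_genes(read, genes, k=5, keep=2):
    """Rank germline genes by the number of k-mers they share with the read.

    genes: dict gene_name -> sequence (toy data in this sketch).
    Returns at most `keep` (gene_name, shared_kmers) pairs, best first,
    dropping genes that share no k-mer at all. Only these candidates would
    then be passed to a full alignment step (not shown here).
    """
    read_kmers = kmer_set(read, k)
    scored = []
    for name, seq in genes.items():
        shared = len(read_kmers & kmer_set(seq, k))
        if shared > 0:
            scored.append((name, shared))
    # Sort by decreasing score, then by name for a deterministic order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:keep]


# Toy V genes, NOT real germline sequences.
toy_v_genes = {
    "V1_toy": "ACGTTGCAAGGTCA",
    "V2_toy": "ACGTTGCTTGGTCA",
    "V3_toy": "TTTTCCCCGGGGAA",
}
read = "GGACGTTGCAAGGTCATT"
print(filter_genes(read, toy_v_genes))  # → [('V1_toy', 10), ('V2_toy', 4)]
```

The gene sharing no k-mer (`V3_toy`) is never aligned, which is where such a filter saves time when a germline set holds many genes.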
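The benchmark evaluates detection in terms of sensitivity and specificity. Under the standard definitions, and assuming each read has been labeled beforehand as containing a V(D)J recombination or not (the exact counting protocol of the benchmark is not described in this abstract), both quantities reduce to simple ratios. The function name and the counts below are hypothetical and are not results from the paper.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from detection counts.

    tp: reads with a recombination that were detected
    fn: reads with a recombination that were missed
    tn: reads without a recombination that were correctly not detected
    fp: reads without a recombination that were wrongly detected
    """
    sensitivity = tp / (tp + fn)  # fraction of true recombinations detected
    specificity = tn / (tn + fp)  # fraction of negatives correctly rejected
    return sensitivity, specificity


# Hypothetical counts for illustration only, not benchmark results.
print(sensitivity_specificity(tp=90, fn=10, tn=95, fp=5))  # → (0.9, 0.95)
```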
Type: Research article
Authors: Borée, Cyprien; Giraud, Mathieu; Salson, Mikaël

Borée, C.; Giraud, M.; Salson, M. Alignment-free detection and seed-based identification of multi-loci V(D)J recombinations in Vidjil-algo. Peer Community Journal, Volume 5 (2025), article no. e44. doi: 10.24072/pcjournal.547. https://peercommunityjournal.org/articles/10.24072/pcjournal.547/
PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.mcb.100268
Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.