Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection

10.24072/pcjournal.62 - Peer Community Journal, Volume 1 (2021), article no. e53.


The widespread use of High-Throughput Sequencing (HTS) for detecting plant viruses and sequencing their genomes has generated large amounts of data, along with bioinformatics challenges in processing them. Many bioinformatics pipelines for virus detection are available, which makes choosing a suitable one difficult. Robust benchmarking is needed for an unbiased comparison of these pipelines, but reference datasets that could be used for this purpose are currently lacking. We present 7 semi-artificial datasets composed of real RNA-seq datasets from virus-infected plants spiked with artificial virus reads. Each dataset addresses challenges that could prevent virus detection. We also present 3 real datasets showing a challenging virus composition, as well as 8 completely artificial datasets to test haplotype reconstruction software. With these datasets, which address several diagnostic challenges, we hope to encourage virologists, diagnosticians and bioinformaticians to evaluate and benchmark their pipeline(s).
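To make the idea of a semi-artificial dataset concrete, the following is a minimal sketch of spiking artificial virus reads into a set of real reads. It is an illustration only, not the authors' procedure: the function names (`simulate_reads`, `spike`, `to_fastq`), the toy genome, the read counts and the constant quality string are all assumptions, and the reads are error-free, whereas a dedicated read simulator would also model sequencing errors.

```python
import random


def simulate_reads(genome, n_reads, read_len, rng):
    """Draw n_reads error-free reads of length read_len from genome.

    Hypothetical helper: real simulators also model sequencing errors.
    """
    if read_len > len(genome):
        raise ValueError("read_len exceeds genome length")
    reads = []
    for i in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        seq = genome[start:start + read_len]
        # FASTQ-style record: (header, sequence, quality).
        # 'I' encodes Phred 40 in the Phred+33 scheme (an assumed constant).
        reads.append((f"@artificial_virus_{i}", seq, "I" * read_len))
    return reads


def spike(real_reads, artificial_reads, rng):
    """Mix artificial reads into the real reads and shuffle the result."""
    mixed = list(real_reads) + list(artificial_reads)
    rng.shuffle(mixed)
    return mixed


def to_fastq(records):
    """Serialise (header, sequence, quality) records as FASTQ text."""
    return "".join(f"{h}\n{s}\n+\n{q}\n" for h, s, q in records)


# Toy inputs standing in for a viral reference and a real RNA-seq dataset.
rng = random.Random(42)
toy_genome = "".join(rng.choice("ACGT") for _ in range(500))
real_reads = [
    (f"@real_{i}", "".join(rng.choice("ACGT") for _ in range(50)), "I" * 50)
    for i in range(20)
]

artificial_reads = simulate_reads(toy_genome, n_reads=5, read_len=50, rng=rng)
dataset = spike(real_reads, artificial_reads, rng)
fastq_text = to_fastq(dataset)
```

Because the artificial reads carry a recognisable header prefix, the known spiked-in fraction can later be compared with what a detection pipeline reports, which is the basic premise of benchmarking against such datasets.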

Tamisier, Lucie 1; Haegeman, Annelies 2; Foucart, Yoika 2; Fouillien, Nicolas 1; Al Rwahnih, Maher 3; Buzkan, Nihal 4; Candresse, Thierry 5; Chiumenti, Michela 6; De Jonghe, Kris 2; Lefebvre, Marie 5; Margaria, Paolo 7; Reynard, Jean Sébastien 8; Stevens, Kristian 3, 9; Kutnjak, Denis 10; Massart, Sébastien 1

1 Université de Liège, Terra-Gembloux Agro-Bio Tech, Plant Pathology Laboratory, Passage des Déportés, 2, 5030 Gembloux, Belgium
2 Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Burg. Van Gansberghelaan 96, 9820 Merelbeke, Belgium
3 Department of Plant Pathology, University of California, Davis, California 95616, USA
4 Department of Plant Protection, Faculty of Agriculture, University of Sütçü Imam, Kahramanmaras 46060, Turkey
5 Univ. Bordeaux, INRAE, UMR BFP, CS20032, 33882 Villenave d’Ornon cedex, France
6 Institute for Sustainable Plant Protection, CNR, Via Amendola 122/D, Bari 70126, Italy
7 Leibniz Institute - DSMZ, German Collection of Microorganisms and Cell Cultures GmbH, 38124 Braunschweig, Germany
8 Virology, Agroscope, Nyon, Switzerland
9 Department of Evolution and Ecology, University of California, Davis, California 95616, USA
10 Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
Tamisier, Lucie; Haegeman, Annelies; Foucart, Yoika; Fouillien, Nicolas; Al Rwahnih, Maher; Buzkan, Nihal; Candresse, Thierry; Chiumenti, Michela; De Jonghe, Kris; Lefebvre, Marie; Margaria, Paolo; Reynard, Jean Sébastien; Stevens, Kristian; Kutnjak, Denis; Massart, Sébastien. Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection. Peer Community Journal, Volume 1 (2021), article no. e53. doi: 10.24072/pcjournal.62.

Peer reviewed and recommended by PCI: 10.24072/pci.genomics.100007

