Section: Genomics
Topic: Genetics/Genomics, Plant biology, Microbiology
Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection
Corresponding author(s): Tamisier, Lucie (lucie.tamisier@inrae.fr)
10.24072/pcjournal.62 - Peer Community Journal, Volume 1 (2021), article no. e53.
Peer reviewed and recommended by PCI.
The widespread use of High-Throughput Sequencing (HTS) for the detection of plant viruses and the sequencing of plant virus genomes has led to the generation of large amounts of data and to bioinformatics challenges in processing them. Many bioinformatics pipelines for virus detection are available, which makes choosing a suitable one difficult. Robust benchmarking is needed for an unbiased comparison of the pipelines, but there is currently a lack of reference datasets that could be used for this purpose. We present 7 semi-artificial datasets composed of real RNA-seq datasets from virus-infected plants spiked with artificial virus reads. Each dataset addresses challenges that could prevent virus detection. We also present 3 real datasets showing a challenging virus composition, as well as 8 completely artificial datasets to test haplotype reconstruction software. With these datasets, which address several diagnostic challenges, we hope to encourage virologists, diagnosticians and bioinformaticians to evaluate and benchmark their pipeline(s).
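The principle behind a semi-artificial dataset can be illustrated with a toy sketch. Because the artificial virus reads added to a real RNA-seq background are known, the viruses a pipeline reports can be scored against a ground truth. The Python below is a hypothetical illustration, not the authors' procedure: the function names (`simulate_reads`, `spike`, `score_detection`), the random placeholder sequences and the uniform-substitution error model are all assumptions. Dedicated read simulators such as ART model platform-specific error profiles, quality scores and paired-end reads, which this sketch ignores.

```python
import random
from dataclasses import dataclass

BASES = "ACGT"


@dataclass
class Read:
    read_id: str
    seq: str


def simulate_reads(genome, virus, n_reads, read_len, sub_rate, rng):
    """Draw n_reads reads of read_len bases from uniformly random positions
    on the forward strand of `genome`, substituting each base independently
    with probability sub_rate (a deliberately crude, hypothetical error model)."""
    if read_len > len(genome):
        raise ValueError("read_len exceeds genome length")
    reads = []
    for i in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        bases = list(genome[start:start + read_len])
        for j, base in enumerate(bases):
            if rng.random() < sub_rate:
                # Replace with one of the other nucleotides.
                bases[j] = rng.choice([b for b in BASES if b != base])
        # Tag the read ID with its origin so the ground truth is recoverable.
        reads.append(Read(f"spike|{virus}|{i}", "".join(bases)))
    return reads


def spike(real_reads, artificial_reads, rng):
    """Combine real and artificial reads and shuffle them, so the spiked
    reads are not grouped at the end of the dataset."""
    mixed = list(real_reads) + list(artificial_reads)
    rng.shuffle(mixed)
    return mixed


def score_detection(expected, detected):
    """Score a pipeline's reported viruses against the known composition.
    `expected` should hold the spiked viruses plus any viruses already known
    to be present in the real background dataset."""
    expected, detected = set(expected), set(detected)
    tp = len(expected & detected)
    fp = len(detected - expected)
    fn = len(expected - detected)
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        # Undefined (NaN) when there is nothing to find or nothing reported.
        "sensitivity": tp / len(expected) if expected else float("nan"),
        "precision": tp / len(detected) if detected else float("nan"),
    }


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy inputs: a random 2 kb "virus genome" and 1,000 random
    # "host" reads standing in for a real RNA-seq dataset.
    virus_genome = "".join(rng.choice(BASES) for _ in range(2000))
    host_reads = [
        Read(f"host|{i}", "".join(rng.choice(BASES) for _ in range(150)))
        for i in range(1000)
    ]
    spiked = simulate_reads(virus_genome, "virusA", 50, 150, 0.01, rng)
    dataset = spike(host_reads, spiked, rng)
    print(len(dataset))  # 1050
    print(sum(r.read_id.startswith("spike|") for r in dataset))  # 50
    # Hypothetical pipeline report: virusA found, plus one false positive.
    print(score_detection({"virusA"}, {"virusA", "virusB"}))
    # {'tp': 1, 'fp': 1, 'fn': 0, 'sensitivity': 1.0, 'precision': 0.5}
```

In this toy setting, lowering `n_reads` or raising `sub_rate` is one way to probe how a pipeline copes with a weakly represented or divergent virus; the specific challenges covered by the published datasets are those described in the article itself.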
Type: Software tool
Tamisier, Lucie 1; Haegeman, Annelies 2; Foucart, Yoika 2; Fouillien, Nicolas 1; Al Rwahnih, Maher 3; Buzkan, Nihal 4; Candresse, Thierry 5; Chiumenti, Michela 6; De Jonghe, Kris 2; Lefebvre, Marie 5; Margaria, Paolo 7; Reynard, Jean Sébastien 8; Stevens, Kristian 9, 3; Kutnjak, Denis 10; Massart, Sébastien 1
@article{10_24072_pcjournal_62,
  author = {Tamisier, Lucie and Haegeman, Annelies and Foucart, Yoika and Fouillien, Nicolas and Al Rwahnih, Maher and Buzkan, Nihal and Candresse, Thierry and Chiumenti, Michela and De Jonghe, Kris and Lefebvre, Marie and Margaria, Paolo and Reynard, Jean S\'ebastien and Stevens, Kristian and Kutnjak, Denis and Massart, S\'ebastien},
  title = {Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection},
  journal = {Peer Community Journal},
  eid = {e53},
  publisher = {Peer Community In},
  volume = {1},
  year = {2021},
  doi = {10.24072/pcjournal.62},
  url = {https://peercommunityjournal.org/articles/10.24072/pcjournal.62/}
}
PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.genomics.100007
Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.