MATEdb, a data repository of high-quality metazoan transcriptome assemblies to accelerate phylogenomic studies

10.24072/pcjournal.177 - Peer Community Journal, Volume 2 (2022), article no. e58.


With the advent of high-throughput sequencing, the amount of genomic data available for animal (Metazoa) species has bloomed over the last decade, especially from transcriptomes, owing to lower sequencing costs and a simpler assembly process compared to genomes. Transcriptomic data sets have proven useful for phylogenomic studies, such as inference of phylogenetic interrelationships (e.g., species tree reconstruction) and comparative genomics analyses (e.g., gene repertoire evolutionary dynamics). However, these data sets are often analyzed with different analytical pipelines, frequently including different software versions, which can introduce methodological biases when they are analyzed jointly in a comparative framework. Moreover, these analyses are computationally expensive and not affordable for a large part of the scientific community. More importantly, assembled transcriptomes are usually not deposited in public databases. Furthermore, the quality of these data sets is hardly ever taken into consideration, potentially impacting subsequent analyses such as orthology inference and phylogenetic or gene repertoire evolution inference. To alleviate these issues, we present Metazoan Assemblies from Transcriptomic Ensembles (MATEdb), a curated database of 335 high-quality transcriptome assemblies from different animal phyla analyzed following the same pipeline. The repository is composed, for each species, of (1) a de novo transcriptome assembly, (2) its candidate coding regions within transcripts (at both the nucleotide and amino acid sequence levels), (3) the coding regions filtered using their contamination profile (i.e., only metazoan content), (4) the longest isoform of the amino acid candidate coding regions, (5) the gene content completeness score as assessed against the BUSCO database, and (6) an orthology-based gene annotation.
We complement the repository with gene annotations from high-quality genomes, which are often not straightforward to obtain from individual sequencing projects, totalling 423 high-quality genomic and transcriptomic data sets. We invite the community to suggest new data sets and new annotation features for inclusion in subsequent versions, which will be analyzed following the same pipeline and permanently stored in public repositories. We believe that MATEdb will accelerate research on animal phylogenomics while saving thousands of hours of computational work, in a plea for open and collaborative science.
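
As an illustration of item (4), the following minimal Python sketch keeps the longest amino acid sequence per gene from a FASTA file of candidate coding regions. It is not the authors' pipeline. It assumes a hypothetical header convention in which isoforms of the same gene share a prefix and differ only in a trailing `_i<N>` isoform tag, optionally followed by a `.p<N>` peptide tag; the function names and the toy sequences are likewise invented for the example.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# ASSUMPTION: sequence IDs end in "_i<isoform>" with an optional ".p<peptide>" tag.
ISOFORM_SUFFIX = re.compile(r"_i\d+(\.p\d+)?$")


def gene_id(header: str) -> str:
    """Strip the assumed isoform/peptide suffix from the first header token."""
    seq_id = header.split()[0]
    return ISOFORM_SUFFIX.sub("", seq_id)


def longest_isoforms(
    records: Iterable[Tuple[str, str]]
) -> Dict[str, Tuple[str, str]]:
    """Map each gene ID to the (header, sequence) of its longest isoform.

    Length ignores a trailing stop symbol ("*"); ties keep the first record seen.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for header, seq in records:
        gene = gene_id(header)
        length = len(seq.rstrip("*"))
        if gene not in best or length > len(best[gene][1].rstrip("*")):
            best[gene] = (header, seq)
    return best


# Toy input with two isoforms of one gene and a single isoform of another.
example = """>DN1_c0_g1_i1.p1
MKT
>DN1_c0_g1_i2.p1
MKTAYIAK
>DN2_c0_g1_i1.p1
MSL*
""".splitlines()

selected = longest_isoforms(parse_fasta(example))
for gene, (header, seq) in selected.items():
    print(gene, header, len(seq.rstrip("*")))
```

Reducing each gene to a single representative in this way keeps downstream orthology inference from treating alternative isoforms as separate genes, at the cost of discarding isoform-level information.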

Fernández, Rosa 1; Tonzo, Vanina 1; Simón Guerrero, Carolina 1; Lozano-Fernandez, Jesus 1; Martínez-Redondo, Gemma I. 1; Balart-García, Pau 1; Aristide, Leandro 1; Eleftheriadi, Klara 1; Vargas-Chávez, Carlos 1

1 Metazoa Phylogenomics Lab, Biodiversity Program, Institute of Evolutionary Biology (CSIC-University Pompeu Fabra). Passeig marítim de la Barceloneta 37-49. 08003 Barcelona, Spain
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
@article{10_24072_pcjournal_177,
     author = {Fern\'andez, Rosa and Tonzo, Vanina and Sim\'on Guerrero, Carolina and Lozano-Fernandez, Jesus and Mart{\'\i}nez-Redondo, Gemma I. and Balart-Garc{\'\i}a, Pau and Aristide, Leandro and Eleftheriadi, Klara and Vargas-Ch\'avez, Carlos},
     title = {MATEdb, a data repository of high-quality metazoan transcriptome assemblies to accelerate phylogenomic studies},
     journal = {Peer Community Journal},
     eid = {e58},
     publisher = {Peer Community In},
     volume = {2},
     year = {2022},
     doi = {10.24072/pcjournal.177},
     url = {}
}
Fernández, Rosa; Tonzo, Vanina; Simón Guerrero, Carolina; Lozano-Fernandez, Jesus; Martínez-Redondo, Gemma I.; Balart-García, Pau; Aristide, Leandro; Eleftheriadi, Klara; Vargas-Chávez, Carlos. MATEdb, a data repository of high-quality metazoan transcriptome assemblies to accelerate phylogenomic studies. Peer Community Journal, Volume 2 (2022), article no. e58. doi: 10.24072/pcjournal.177.

Peer reviewed and recommended by PCI: 10.24072/pci.genomics.100022

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
