Section: Zoology
Topic: Applied biological sciences

A pipeline for assessing the quality of images and metadata from crowd-sourced databases

10.24072/pcjournal.205 - Peer Community Journal, Volume 2 (2022), article no. e81.

Peer reviewed and recommended by PCI

Crowd-sourced biodiversity databases provide easy access to data and images for ecological education and research. One concern with using publicly sourced databases, however, is the quality of their images, taxonomic descriptions, and geographical metadata. The method presented in this paper attempts to address this concern with a suite of pipelines that evaluate taxonomic consistency, how well geo-tags fit known distributions, and the image quality of crowd-sourced data acquired from iNaturalist, a crowd-sourced biodiversity database. It also allows researchers who use these datasets to report a quantifiable assessment of taxonomic consistency. The pipeline lets users analyze multiple images from iNaturalist and their associated metadata to determine the level of taxonomic identification (family, genus, or species) for each occurrence; whether the taxonomic label for an image matches the accepted nesting of families, genera, and species; and whether geo-tags match the distribution of the described taxon, using occurrence data from the Global Biodiversity Information Facility (GBIF) as a reference. In addition, image quality is assessed using BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator), an algorithm that evaluates image quality without a reference photo. Entries from the order Araneae (spiders) are used as a case study. Overall, the results suggest that iNaturalist can provide large metadata and image sets for research. Because some low-quality observations are inevitable, this pipeline provides a valuable resource for researchers and educators to evaluate the quality of iNaturalist and other crowd-sourced data.
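The three metadata checks described in the abstract (identification level, taxonomic nesting, and geographic fit) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published pipeline: the `Observation` record, the two lookup tables standing in for an accepted taxonomic backbone, and the rule that a geo-tag is plausible when it lies within a hypothetical 500 km of at least one reference occurrence (for example, GBIF records) of the same taxon are all illustrative choices. BRISQUE scoring needs a trained model and is not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-ins for an accepted taxonomic backbone (illustrative entries only).
GENUS_TO_FAMILY = {"Phidippus": "Salticidae", "Latrodectus": "Theridiidae"}
SPECIES_TO_GENUS = {"Phidippus audax": "Phidippus", "Latrodectus mactans": "Latrodectus"}


@dataclass
class Observation:
    """Hypothetical record for one crowd-sourced occurrence."""
    family: Optional[str]
    genus: Optional[str]
    species: Optional[str]  # binomial, e.g. "Phidippus audax"
    lat: float
    lon: float


def identification_level(obs: Observation) -> str:
    """Return the finest rank to which the observation is identified."""
    if obs.species:
        return "species"
    if obs.genus:
        return "genus"
    if obs.family:
        return "family"
    return "unidentified"


def taxonomy_consistent(obs: Observation) -> bool:
    """True if each supplied rank nests under the supplied next-higher rank.

    Names missing from the lookup tables are treated as inconsistent
    (i.e. not verifiable against the assumed backbone).
    """
    if obs.species and (obs.species not in SPECIES_TO_GENUS
                        or SPECIES_TO_GENUS[obs.species] != obs.genus):
        return False
    if obs.genus and (obs.genus not in GENUS_TO_FAMILY
                      or GENUS_TO_FAMILY[obs.genus] != obs.family):
        return False
    return True


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def geotag_plausible(obs: Observation,
                     reference_points: List[Tuple[float, float]],
                     max_km: float = 500.0) -> bool:
    """True if the observation lies within max_km (assumed threshold) of any
    reference occurrence (lat, lon) of the same taxon."""
    if not reference_points:
        return False
    nearest = min(haversine_km(obs.lat, obs.lon, lat, lon) for lat, lon in reference_points)
    return nearest <= max_km


if __name__ == "__main__":
    audax = Observation("Salticidae", "Phidippus", "Phidippus audax", 40.57, -105.08)
    print(identification_level(audax), taxonomy_consistent(audax),
          geotag_plausible(audax, [(39.74, -104.99)]))
```

A nearest-reference-point distance is only one way to test geographic fit; convex hulls or published range polygons are common alternatives, and any distance threshold would need tuning per taxon.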

Published online:
DOI: 10.24072/pcjournal.205
Type: Software tool
Billotte, Jackie (1, 2)

1 Colorado State University, Fort Collins, CO, USA
2 The Butterfly Pavilion, Brighton, CO, USA
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
@article{10_24072_pcjournal_205,
     author = {Billotte, Jackie},
     title = {A pipeline for assessing the quality of images and metadata from crowd-sourced databases},
     journal = {Peer Community Journal},
     eid = {e81},
     publisher = {Peer Community In},
     volume = {2},
     year = {2022},
     doi = {10.24072/pcjournal.205},
     url = {https://peercommunityjournal.org/articles/10.24072/pcjournal.205/}
}
Billotte, Jackie. A pipeline for assessing the quality of images and metadata from crowd-sourced databases. Peer Community Journal, Volume 2 (2022), article no. e81. doi: 10.24072/pcjournal.205. https://peercommunityjournal.org/articles/10.24072/pcjournal.205/

PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.zool.100017

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
