TaxonWorks. ICEDIG Project Outcomes. Addressing today’s global environmental challenges requires access to significant quantities of data. This holds especially true for the natural sciences, where one rich data trove remains unearthed: The European scientific collections. These jointly hold more than 1.5 billion objects, representing 80% of the world’s bio- and geo-diversity. With only 10% of these objects digitised, their information remains vastly underused, thus impeding potential applications of this critical scientific resource. The EU-funded ICEDIG project – “Innovation and Consolidation for Large Scale Digitisation of Natural Heritage” – aims to support the implementation phase of the new Research Infrastructure DiSSCo (“Distributed System of Scientific Collections”) by designing and addressing the technical, financial, policy and governance aspects necessary to operate such a large distributed initiative for natural sciences collections across Europe.
Debunking reliability myths of PIDs for Digital Specimens – DiSSCoTech. In this post I address an erroneous assertion – a myth perhaps, that the proposed Digital Specimen Architecture relies heavily on a centralized resolver and registry for persistent identifiers that is inherently not distributed and that this makes the proposed “persistent” identifiers (PID) for Digital Specimens unreliable. By unreliable is meant link rot (‘404 not found’) and/or content drift (content today is not the same as content yesterday). This assertion and its concerns (myths) came during a lively Q&A and associated ‘chat’ that took place while I was presenting the recent progress in development of the openDS standard at the virtual TDWG 2020 SYM07 symposium this week.
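The two failure modes named above, link rot and content drift, can be told apart with a simple comparison of two observations of the same identifier. The following is a minimal sketch, not the method used by the post's author or by any PID provider: the `Snapshot` structure and the `classify` function are hypothetical names introduced for illustration, and a byte-exact hash comparison is assumed as a crude stand-in for "content today is not the same as content yesterday".

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """One observation of what a PID resolved to (hypothetical structure)."""
    status: int             # HTTP status code returned after resolution
    body: Optional[bytes]   # response body, or None if nothing was returned


def fingerprint(body: bytes) -> str:
    """SHA-256 digest used to compare content between two observations."""
    return hashlib.sha256(body).hexdigest()


def classify(previous: Snapshot, current: Snapshot) -> str:
    """Label the current observation relative to an earlier one."""
    # Link rot: the identifier no longer leads to any content.
    if current.status == 404 or current.body is None:
        return "link rot"
    # Content drift: something is returned, but it differs from before.
    if previous.body is not None and fingerprint(previous.body) != fingerprint(current.body):
        return "content drift"
    return "stable"


if __name__ == "__main__":
    yesterday = Snapshot(status=200, body=b"specimen record v1")
    print(classify(yesterday, Snapshot(status=404, body=None)))
    print(classify(yesterday, Snapshot(status=200, body=b"specimen record v2")))
    print(classify(yesterday, Snapshot(status=200, body=b"specimen record v1")))
```

A real check would also have to decide which changes count as drift (for example, a new timestamp in a page footer probably should not), which is a curation decision rather than a property of the identifier scheme.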
I want to show that any such issues are not those of the persistent identifier scheme itself or its associated service provider organizations but are usually human failings and inadequacies in the management and procedures adopted by users of such schemes. Myth: doi.org is centralized. Stats - sfg.taxonworks.org. Posters for TDWG 2020 - TDWG. An analysis of data paper templates and guidelines: types of contextual information described by data journals. Introduction Data sharing is an emerging scholarly communication practice that facilitates the progress of science by making data accessible, verifiable, and reproducible. There are several ways of sharing data, including personally exchanging data sets, posting data on researchers’ or laboratories’ websites, and depositing data sets in repositories. A relatively novel means of releasing data sets is the publication of data papers, which describe how data were collected, processed, and verified, thereby improving data provenance.
Data papers are published by data journals, and the publication process is similar to that of conventional journals, in that data papers and data are both peer-reviewed, amended, and publicly accessible under unique identifiers. Since data papers take the form of academic papers and can be cited by primary research articles, credit can be given to data creators. A benchmark dataset of herbarium specimen images with label data.
Costbook of the digitisation infrastructure of DiSSCo. A workflow for standardising and integrating alien species distribution data. 5 essential tools for nature conservation we are still missing (Part 1/2) Small Collections Network. The Impact of Brazil’s Virtual Herbarium in e-Science. By Dora Ann Lange Canhos1, Sidnei de Souza1, Alexandre Marino1, Vanderlei Perez Canhos1, and Leonor Costa Maia2 1Centro de Referência em Informação Ambiental – CRIA 2Universidade Federal de Pernambuco, UFPE Summary: Herbarium, a collection of preserved samples of plants and fungi and associated data, is key documentation of the biodiversity of the past and an important instrument with which to model the biodiversity of the future. If prepared and maintained correctly, these specimens hold their scientific value for centuries. Comparing Brazil’s collection of herbaria with that of Europe or the USA demonstrates a significant difference in the size of their holdings.
A herbarium can be defined as a collection of preserved samples of plants and fungi and associated data. Through herbaria data one can analyze species’ distribution across both time and space. An important indicator is the movement of data (entry and removal) in the network, showing its dynamic nature. New ALA strategy for 2020-2025 – Atlas of Living Australia. Today we release our Atlas of Living Australia Strategy 2020-2025. The Atlas of Living Australia (ALA) strategy has been shaped extensively by input from our national and international partners who contributed so actively to our 2019 ALA Future Directions national consultation process.
As Australia’s national biodiversity data infrastructure and one of the world’s foremost such capabilities, we rely on the strength of our partnerships with data providers, users and stakeholders. Indeed, the genesis of the ALA was built on the strength and richness of existing relationships within the museums, collections and herbaria communities. Australia’s fruitful partnership with the Global Biodiversity Information Facility (GBIF) also provides our community a unique opportunity to ensure that local, regional or national biodiversity data delivers impact globally. The ALA is particularly proud of the relationship we play hosting the Australian node of GBIF. Case Study: Brazilian Virtual Herbarium. Data Management Plan: Brazil's Virtual Herbarium. The Tragedy of #OpenData - Comprehension 360. It Is A Commons Tale (no, that s is not a typ-o) I was asked recently about Open Data initiatives. Off-hand I gave one of my typical rough, if stylized replies — “nonsense”.
Put simply — I have yet to find an Open Data set that contained any real value OR was not readily accessible to me without #OpenData frameworks. Now — if that opinion pisses you off, pay attention. As is most often the case, I tend to follow-up rough statements of opinion with additional research. Did I change my mind? Open Data is not Open Source I am a huge fan of both analogies and Open Source. There is a huge difference between property and intellectual property, between resources and ideas. Data on the other hand, is a resource. The Tragedy Of The Commons Is A Strong Analogy We are coming up on the two century anniversary of William Forster Lloyd’s tragedy of the commons. The grass in the commons and data have much in common. Is Data’s Value Really Used Up? Let’s start with a different question. Is all hope lost? Born-digital collection software. Biologists conducting field research, such as floristic studies, accession thousands of specimens into natural history collections.
Many of these specimens’ digital records are now becoming available through online portals such as iDigBio, the Global Biodiversity Information Facility (GBIF; Global Biodiversity Information Facility, 2018), Symbiota (Gries et al., 2014), and regional consortia (e.g., SouthEast Regional Network of Expertise and Collections [SERNEC]). One major challenge in digitizing these specimens is the accurate transcription of physical labels into digital formats.
Numerous workflows have been presented to address this challenge, whereby citizen scientists, students, or professionals are tasked with transcribing these data (Hill et al., 2012; Ellwood et al., 2015; Harris and Marsico, 2017; Sweeney et al., 2018). Green digitization: Online botanical collections data answering real-world questions - Soltis - 2018 - Applications in Plant Sciences. Recent advances in digital technology, coupled with rapidly increasing interest in the creation and dissemination of digitized specimen data for use in broad-scale research by botanists and other organismal scientists, have encouraged the development of a variety of new research opportunities in the botanical sciences (e.g., Page et al., 2015; Soltis, 2017). It is now increasingly possible to collect, use, re-use, and share data more easily and effectively.
With the advent of the U.S. National Science Foundation's Advancing Digitization of Biodiversity Collections initiative and the establishment of iDigBio (Integrated Digitized Biocollections; www.idigbio.org) as the national resource for specimen digitization and digital data mobilization, researchers now have access to ever larger and varied digital data sets for visualization, analysis, and modeling and have new opportunities for adopting “big data” strategies for facilitating discovery. Herbarium data: Global biodiversity and societal botanical needs for novel research - James - 2018 - Applications in Plant Sciences. Research use of herbarium data Herbarium specimens and their data are, for the most part, verifiable, repeatable, sustainable, and persistent (Page et al., 2015; Holmes et al., 2016). Temporal data across taxonomic groups, communities, and habitats enable assessment of changes in species distributions, dispersal ability, or clade differences.
Interactions within and between taxa can be interpreted, providing information about species associations and community assemblages through space and time. Historical and reliable baseline data from collections are needed to build robust predictive models for various taxon-level or functional-group global change responses (e.g., Willis et al., 2017). Significant and irreparable changes to Earth's ecosystems due to global change can be seen by examining the shifts in species distributions and community structure in space and time (IPCC, 2014). Species and community assemblages can be indicators of habitat health. Herbarium data fitness for use. The Australasian Virtual Herbarium: Tracking data usage and benefits for biological collections - Cantrill - 2018 - Applications in Plant Sciences.
Abstract Premise of the Study Globally, natural history collections are focused on digitizing specimens and information and making these data accessible. Usage information on National Herbarium of Victoria data made available through the Atlas of Living Australia and The Australasian Virtual Herbarium (AVH) is analyzed to understand how and by whom herbarium data are being used. Methods Since 2010, AVH data usage information has been gathered from users and supplied to data custodians as a spreadsheet that includes number of download events, number of records downloaded, and user reasons for downloading data in predefined categories. Results Since 2010, in excess of 268,000 download events of 194 million records (excluding testing events) have been recorded for the National Herbarium of Victoria data set. Discussion Data have primarily been used for ecological research, but there is an emerging trend for use in education including citizen science projects.
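The usage spreadsheet described in the Methods (download events, records downloaded, and a predefined reason category) lends itself to a simple per-reason tally. The sketch below is illustrative only: the column names `reason`, `download_events`, and `records_downloaded` are assumptions, not the actual AVH export format, and the sample rows are made-up values, not figures from the study.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration; not real AVH usage figures.
SAMPLE = """reason,download_events,records_downloaded
ecological research,3,1200
education,2,300
ecological research,1,500
"""


def summarise(csv_text: str) -> dict:
    """Total download events and records per user-supplied reason."""
    totals = defaultdict(lambda: {"events": 0, "records": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[row["reason"]]
        entry["events"] += int(row["download_events"])
        entry["records"] += int(row["records_downloaded"])
    return dict(totals)


if __name__ == "__main__":
    for reason, entry in summarise(SAMPLE).items():
        print(reason, entry["events"], entry["records"])
```

Grouping by reason is what allows a statement such as "data have primarily been used for ecological research" to be backed by counts rather than impressions.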
ePlant: Visualizing and exploring multiple levels of data for hypothesis generation. The application of systems biology is quite phenomenal these days for prediction-based modeling and interactive data visualization. Along with the genome sequencing of the model plant Arabidopsis thaliana, there has been a parallel increase in systems biology tools. Unfortunately, these tools have been developed by different groups or institutions for various purposes, whereas a unifying and comprehensive systems biology toolbox is more convenient for connecting multiple databases and generating prediction models or building hypotheses. ePlant serves that purpose for Arabidopsis thaliana.
In ePlant, Waese et al. incorporated multiple datasets and an integrated search system for analysis and visualization in a single portal. ePlant Steps into the Breach for Plant Researchers. The ever-increasing amount of data available to researchers has come with similarly increasing cognitive loads in efforts to use these data. Even when data sets are stored in well-curated databases, it can be time-consuming to master the specific tools harbored at each site and cumbersome to move between data types. A new tool created by Waese et al. (2017) aims to facilitate hypothesis generation in plant biology by allowing researchers to easily and intuitively move between types and levels of data. ePlant is a web-based tool that integrates Arabidopsis data from more than 10 sources over a scale of more than 10 orders of magnitude.
Waese and coworkers built ePlant to link data on gene expression, subcellular localization (experimental and predicted), protein interactions (with other proteins and with DNA), structural predictions, non-synonymous SNPs, and DNA/RNA sequencing among other types (see figure). ePlant: Data Visualization Tools for Plant Data.
CyVerse: Meeting Those Midnight Computing Needs. Papua New Guinea is home to hundreds of Begonia species - possibly one of the fastest radiations of flowering plants. This species is unidentified, but will soon be sequenced using Hyb-Seq by Hannah Wilson as part of her doctoral project at RBGE. The data will be analyzed with CyVerse resources. Image: RBGE Herbarium. For researcher, lecturer, mother, and CyVerse community member Catherine Kidner, nothing beats having extensive computing power under her finger.
By Shelley Littin. Catherine Kidner, a senior lecturer in the School of Biological Sciences at the University of Edinburgh, doesn’t remember when she started using CyVerse resources, but it was probably sometime around CyVerse’s inception as the iPlant Collaborative in 2008. From partaking in a little late-night genetic analysis, to suggesting software to graduate students, to testing new approaches for analyzing data, Kidner turns to CyVerse resources. “The things we can do now are really exciting,” said Kidner. CyVerse | Cyberinfrastructure for Data Management and Analysis.
Figshare: Research platform for biodiversity discovery. IPBES Data Management Policy. January 24, 2020 Data management plan Open Access Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services Editor(s) Krug, Rainer M.; Aboki Omare, Benedict D Niamir, Aidin Project member(s) Addink, Wouter; Dubois, Gregoire; Krug, Cornelia; Nelson, Howard; Parker-Allie, Fatima; Pignatari Drucker, Debora; Thau, David Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) IPBES Data Management Policy - Version 1.0 Drafting authors: Rainer M.
Editors: Rainer M. Reviewed by: Taskforce on knowledge and data, Taskforce on capacity-building, Taskforce on indigenous and local knowledge, Taskforce on policy tools and methodologies, Taskforce on scenarios and models, and IPBES Secretariat Approved by: Multidisciplinary Expert Panel and Bureau Suggested citation: IPBES (2020): IPBES Data Management Policy ver. 1.0. The policy comes into effect immediately upon approval by the MEP and Bureau at their 14th meetings in January 2020.
Where is Web Science? From 404 to 200. PhyloJive – Integrating biodiversity data with phylogenies | Atlas of Living Australia. Data mining and machine learning to identify collectors and collecting trips. Towards a biodiversity knowledge graph. Pensoft journals integrated with Catalogue of Life to help list the species of the world. Automated pipeline for nomenclatural acts. Confusion: The Biodiversity Informatics Landscape. Management, Archiving, and Sharing for Biologists and the Role of Research Institutions in the Technology-Oriented Age | BioScience.
Imago at Indiana U, links library and natural history databases. Cross-Linking NCBI (DNA) & EMu Records. Integration of Big Data and the Science of the Christmas Tree. Unmet Needs for Analyzing Biological Big Data: A Survey of 704 NSF PIs. RainBio: Using the “Natural History Large Hadron Collider” to tell us about plant diversity.