Project 31: Executable metadata mappings to FAIRify Biodiversity Genome Annotations

Main goals

The FAIRification of Genomic Annotations Working Group (FGA-WG) in the Research Data Alliance will focus on the challenges of harmonising metadata and software solutions to improve the discovery and reuse of publicly available genomic annotation data.

Our Biohackathon project aims to:

Define minimal metadata to support genome annotations as FAIR objects, and
Develop interoperable executable mappings from bioinformatics case-studies to the FAIRtracks model.

Our plan during the biohackathon is to address the following questions and tasks:

What research data / metadata do we have that we can use as a case study?
What do we want in terms of interoperability, and will the FAIRtracks schema provide sufficient coverage for the source metadata in our case-study?
What definitions are missing, or what level of lossiness is "acceptable"? How do we document this loss?
What tools and processes are needed to algorithmically produce a transformation?
Review Omnipy and Whyqd (/wɪkɪd/), two Python-based libraries for data wrangling, as candidates for algorithmically producing a transformation.

Interested in contributing?

We have a diverse group of people participating, both on-site and remotely - including collaborators calling in from Australia - and we would appreciate people with any of the following skills or resources to contribute:

Schema.org / bioschemas familiarity (or metadata for research annotations)
Metadata modelling for interoperability
Bioinformatics research data / metadata to contribute as case-studies for transformation
Python & JSON / JSON-LD

📢 Get hold of us:

Biohackathon Slack community: Sveinung Gundersen
Co-lead emails:
- Gavin Chait
- Sveinung Gundersen

Our project is committed to inclusivity, guided by the ELIXIR code of conduct for events and the ELIXIR RSEc code of conduct. We value inputs from a multitude of perspectives, levels of experience and skill, and across a diversity of professional, personal, cultural, or linguistic backgrounds.

Remote participation is welcome, and we are supporting cross-over time-zones for our Australian contributors. Expect us online from about 7am CET from Tuesday, 6 November.

Resources

Current list of remote and on-site contributors
Resource reading list
Potential genome annotation metadata case-studies
Rolling collaboration notes

Abstract

Advances in sequencing technologies and assembly algorithms have enabled an explosion in diverse reference genomes across the tree of life, together with a need to annotate functional and structural features. There is no current set of minimal metadata to support genome annotations as FAIR objects, limiting their reproducibility and reliability.

The FAIRification of Genomic Annotations Working Group (FGA-WG) in the Research Data Alliance (RDA) will develop a harmonised metadata model and recommended infrastructure to improve discovery and reuse of publicly available genomic annotations/tracks, supporting harmonised metadata for GFF3 files. Such metadata exists in e.g. project-specific databases or spreadsheets, workflow systems, repositories, exchange formats, and linked data.

Harmonising metadata according to a unified data model requires the extraction, transformation and integration of data sourced from different research contexts, including "messy" data, using schema mappings or "crosswalks". These operations are time-consuming and may introduce opaque errors. The FAIR principles emphasise reproducibility and trust in data analyses, supported by data transformation and validation methods that are persisted, shared, accessible, auditable and executable.
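To make the idea of an auditable, executable crosswalk concrete, here is a minimal sketch in plain Python. It does not use the Omnipy or Whyqd APIs, and every field name, the `CROSSWALK` table, and the `apply_crosswalk` helper are hypothetical illustrations rather than part of the FAIRtracks schema or the FGA-WG model. It only shows the general pattern: a declarative mapping is applied to a source record, each step is logged, and anything that cannot be mapped is kept aside rather than silently lost.

```python
from dataclasses import dataclass, field

# Hypothetical crosswalk: source column -> (target field, converter).
# These names are illustrative only and are NOT taken from FAIRtracks.
CROSSWALK = {
    "Species": ("organism_name", str.strip),
    "TaxID": ("taxon_id", int),
    "Annotation file": ("file_name", str.strip),
}


@dataclass
class MappingResult:
    record: dict = field(default_factory=dict)    # harmonised output
    audit_log: list = field(default_factory=list)  # one entry per source field
    unmapped: dict = field(default_factory=dict)   # documented loss


def apply_crosswalk(source: dict, crosswalk: dict = CROSSWALK) -> MappingResult:
    """Apply the crosswalk to one source record and log every decision."""
    result = MappingResult()
    for key, value in source.items():
        if key not in crosswalk:
            # No target field exists: keep the value and record the loss.
            result.unmapped[key] = value
            result.audit_log.append(f"dropped '{key}': no target field")
            continue
        target, convert = crosswalk[key]
        try:
            result.record[target] = convert(value)
            result.audit_log.append(f"mapped '{key}' -> '{target}'")
        except (TypeError, ValueError) as err:
            # Conversion failed: surface the error instead of hiding it.
            result.unmapped[key] = value
            result.audit_log.append(f"failed '{key}' -> '{target}': {err}")
    return result


# Made-up source record, for illustration only.
example = apply_crosswalk(
    {"Species": " Exampleus specius ", "TaxID": "12345", "Sequencer": "unknown"}
)
```

In this sketch, `example.record` holds the harmonised fields, `example.unmapped` holds the source values that had no place in the target, and `example.audit_log` lists what happened to each source field, which is one simple way to document loss alongside the transformation itself.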

Omnipy and Whyqd (/wɪkɪd/) are independently-developed Python libraries offering general functionality for auditable and executable metadata mappings. Each is pragmatically designed to ensure transformations are executable on real-world data, with validation and feedback. They differ in scope and users, and provide complementary functionalities.

In this project, we will integrate Omnipy and Whyqd to develop executable mappings that transform existing metadata from biodiversity projects, such as ERGA, to conform to the FGA-WG metadata model, kickstarting the process of FAIRifying genome annotation GFF3 files.

Lead(s)

Sveinung Gundersen 🇳🇴, Gavin Chait 🇿🇦
