One of the main challenges in the biomedical world is finding mechanisms to integrate heterogeneous data from different sources. Semantic technologies enable this kind of interoperability by means of linked data, shared vocabularies and ontologies. However, in practice these technologies have some challenges: initial data is usually in a relational database or is some structured data like XML or JSON which needs to be converted to semantic data, if the initial data is already using some semantic technologies like RDF, it may be using different properties from different vocabularies, it may be poorly described in some dataset or behind some endpoint or it can have a granularity level which makes too complex for simple queries at the cognitive level required by the experts.
In this project we want to improve the existing technologies that can make biomedical data interoperability at a semantic level. The key objectives are:
-
Analyze different challenges of interoperability of semantic data and propose practical solutions.
-
Create a prototype integrating heterogeneous biomedical data using semantic technologies and Shape Expressions that answers some competency questions.
-
Improve existing tools like the Shapes library that we are currently implementing in Rust adding Python bindings for it.
Our approach is to leverage our work on Shape Expressions, which can provide a concise data model description which can be used to identify proper subsets of large RDF datasets and to map between structured data to RDF or to convert between different RDF data models.
Jose Emilio Labra Gayo