mdrishti/integrate_trydb_globi_enpkg: Integrate TRY-db and GloBI data with a minimal subset of enpkg

This repository contains scripts for integrating plant species and their trait data from TRY-db with taxonomic IDs from GBIF, OTOL (Open Tree of Life), NCBI and Wikidata. At the moment, data for only 25 traits has been downloaded from TRY-db. In addition, the trait metadata was retrieved from the TRY-db website, and a subset of enpkg was also retrieved. The retrieved CSV files were converted to DuckDB (advantage: an on-disk approach for SQL queries).

The TRY-db dataset with 25 traits ('data/trydbtemp_Ontop/trydbAll.csv') has multiple columns with complex relationships, as depicted in the diagram below.

NOTE: the trydbAll table, which contains the datasets from TRY-db, is only a subset of the actual data.
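The README does not show how an individual CSV file becomes a DuckDB table (step III imports a whole directory instead). As an illustration only, one common way to load a single CSV such as trydbAll.csv is DuckDB's read_csv_auto() function, which detects the delimiter, header and column types. The helper name csv_to_duckdb_sql is hypothetical and is not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not from this repository): print a DuckDB statement that
# loads one CSV file into a table. read_csv_auto() is DuckDB's built-in CSV
# reader that auto-detects delimiter, header and column types.
# Assumes the CSV path contains no single quotes.
csv_to_duckdb_sql() {
  csv_path=$1    # e.g. data/trydbtemp_Ontop/trydbAll.csv
  table_name=$2  # e.g. trydbAll
  printf "CREATE OR REPLACE TABLE %s AS SELECT * FROM read_csv_auto('%s');\n" \
    "$table_name" "$csv_path"
}
```

Usage (requires the duckdb CLI on PATH; the database file name is borrowed from step III):

duckdb data/Ontop_input.db -c "$(csv_to_duckdb_sql data/trydbtemp_Ontop/trydbAll.csv trydbAll)"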

I. Prerequisites:

To run the scripts (R and shell), install R (version 4.1.2) and the following R packages:

a) For accessing taxonomic IDs from Wikidata, with mappings from GBIF and NCBI (taxizedb), and from the Open Tree of Life (rotl):

install.packages(c("taxizedb", "rotl"))

b) For data manipulation, dplyr and dbplyr (a database backend for dplyr that translates dplyr code into SQL):

install.packages(c("dplyr", "dbplyr"))

c) For the on-disk approach of accessing and querying databases, DuckDB's R API client:

install.packages("duckdb")

as well as DuckDB itself (the duckdb command-line client).

d) For building a Virtual Knowledge Graph (VKG), download the Ontop-cli/Ontop-protege bundle (version 5.1.2).

e) For converting ontology files between formats (e.g. OWL to TTL), install ROBOT.
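Before running the steps below, it can help to confirm that the command-line tools are reachable. A minimal sketch follows; the function name check_tools is hypothetical, and it assumes the Ontop CLI launcher (ontop) and ROBOT's wrapper script (robot) have been added to PATH, which their bundles do not necessarily do for you.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are on PATH.
# Prints "found: <tool>" or "missing: <tool>" per argument and returns
# the number of missing tools (0 means everything was found).
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Usage: check_tools Rscript duckdb robot ontop. The installed R version can then be compared with 4.1.2 via Rscript -e 'cat(as.character(getRversion()), "\n")'.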

II. Script to map TRY plant species names to GBIF, NCBI, Wikidata and OTOL IDs

Rscript matchTaxonomy.R

To plot the distribution of TRY-db species matched with IDs from OTT, NCBI, GBIF and Wikidata, run:

Rscript distTaxonomicIds.R
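Presumably distTaxonomicIds.R plots the matches produced by matchTaxonomy.R, so the two scripts are run in that order. A small wrapper can stop at the first failure; this is a sketch, assuming it is called from the repository root. The function name run_taxonomy_steps is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run the two R scripts of step II in order and stop
# at the first one that exits with a non-zero status.
run_taxonomy_steps() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH; install R first" >&2
    return 127
  fi
  for script in matchTaxonomy.R distTaxonomicIds.R; do
    echo "running $script"
    Rscript "$script" || { echo "failed: $script" >&2; return 1; }
  done
}
```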

III. Script to build a DuckDB database as input for the Ontop knowledge graph

duckdb data/Ontop_input.db -c "IMPORT DATABASE 'data/trydbtemp_Ontop'"

or

sh run_duckdb.sh

The relations between tables are depicted in this diagram.
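DuckDB's IMPORT DATABASE reads a directory in the layout written by EXPORT DATABASE, which includes schema.sql and load.sql. A quick pre-flight check, sketched below, gives a clearer error than the import itself if data/trydbtemp_Ontop is incomplete. The function name check_import_dir is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check before:
#   duckdb data/Ontop_input.db -c "IMPORT DATABASE 'data/trydbtemp_Ontop'"
# Verifies that the directory contains schema.sql and load.sql, the files
# that IMPORT DATABASE expects (as produced by EXPORT DATABASE).
check_import_dir() {
  dir=$1
  for f in schema.sql load.sql; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f" >&2
      return 1
    fi
  done
  return 0
}
```

Usage: check_import_dir data/trydbtemp_Ontop && duckdb data/Ontop_input.db -c "IMPORT DATABASE 'data/trydbtemp_Ontop'"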

IV. Script to build the knowledge graph in Ontop

First, set the path in data/Ontop_config/duckdb.properties, then run:

sh run_ontop.sh
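The README does not show the contents of duckdb.properties. Ontop properties files typically carry JDBC settings such as jdbc.url and jdbc.driver; for DuckDB the URL has the form jdbc:duckdb:<path> and the driver class is org.duckdb.DuckDBDriver (the DuckDB JDBC jar must also be available to Ontop, e.g. in the jdbc/ folder of the CLI bundle). The sketch below writes such a file with an absolute database path. The helper name write_duckdb_properties is hypothetical, and the keys should be checked against the actual file in data/Ontop_config/.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal Ontop properties file pointing at the
# DuckDB database from step III. Assumes the standard Ontop JDBC keys and the
# DuckDB JDBC driver class; the database's parent directory must exist.
write_duckdb_properties() {
  db_path=$1   # e.g. data/Ontop_input.db
  out_file=$2  # e.g. data/Ontop_config/duckdb.properties
  # Resolve to an absolute path so Ontop finds the database from any working directory
  abs_db="$(cd "$(dirname "$db_path")" && pwd)/$(basename "$db_path")"
  {
    echo "jdbc.url=jdbc:duckdb:$abs_db"
    echo "jdbc.driver=org.duckdb.DuckDBDriver"
  } > "$out_file"
}
```

Usage: write_duckdb_properties data/Ontop_input.db data/Ontop_config/duckdb.properties. Note that this overwrites the file, so any other settings already in it would need to be re-added.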

V. Disclaimer

The mappings in the Ontop virtual knowledge graph are faulty at the moment; therefore, the SPARQL query does not return correct results. Work in progress.
