This repository contains the source code that generates the SSSOM-style mapping files used for the Monarch Initiative knowledge graph.
The pipeline is run via Jenkins and the resulting mapping files are uploaded to Google Cloud Storage, hosted at https://data.monarchinitiative.org/mappings/
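As a rough illustration of how a downloaded mapping file could be read, here is a minimal sketch. It assumes the files follow the usual SSSOM/TSV layout (an optional metadata header on `#`-prefixed lines, followed by tab-separated columns such as subject_id, predicate_id and object_id). The function name `read_sssom_tsv` and the sample content are hypothetical and are not taken from this repository.

```python
import csv


def read_sssom_tsv(text):
    """Parse SSSOM/TSV text into a list of dicts, one per mapping row.

    Lines starting with '#' are treated as the metadata header and skipped;
    the first remaining line is used as the column header.
    """
    data_lines = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    return list(csv.DictReader(data_lines, delimiter="\t"))


# Hypothetical sample content; real files carry real CURIEs and more columns.
sample = (
    "# mapping_set_id: https://example.org/hypothetical.sssom.tsv\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "EX:1\tskos:exactMatch\tEX:2\tsemapv:UnspecifiedMatching\n"
)

mappings = read_sssom_tsv(sample)
for row in mappings:
    print(row["subject_id"], row["predicate_id"], row["object_id"])
```

For real work, a dedicated SSSOM library is likely a better choice than hand-rolled parsing, since it also interprets the metadata header.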
- config/ - configuration files
- project-cruft.json - edit this if you need to change any of the project template values in .cruft.json
- mappings/ - SSSOM mapping files (do not edit these)
- scripts/ - scripts for processing mappings (ok to edit these as needed)
- src/monarch_gene_mapping - source code for Monarch Gene Mapping (ok to edit these as needed). See monarch_gene_mapping/README.md for more information.
- Makefile - default makefile generated by the cookiecutter (do not edit this file)
- monarch_mapping_commons.Makefile - custom makefile with additional targets specific to this project (ok to edit this file)
git clone https://github.com/monarch-initiative/monarch-mapping-commons.git
cd monarch-mapping-commons
poetry install
make mappings
Note:
The first time you run this command, it will take a while to download and process the data.
Subsequent runs will be much faster.
This is because Monarch Gene Mapping depends on a very large (11 GB) file from UniProtKB.
There are plans to cache this file in Google Cloud Storage, or to use the UniProtKB API instead,
but for now the file must be downloaded in its entirety.
To update the mapping registry from OLS:
sh odk.sh make update_registry -B
To update the mappings:
sh odk.sh make mappings
If the run requires a recently published SSSOM or OAK feature, first update ODK:
docker pull obolibrary/odkfull:dev
and then run the dependencies goal together with the mappings goal:
IMAGE=odkfull:dev sh odk.sh make mappings
For Windows, append :dev to obolibrary/odkfull in the odk.bat file.
Note: If running on a Windows machine, replace sh odk.sh with odk.bat in the above commands.
- Only mappings of base entities are extracted. This ensures that we do not import the same UBERON mapping for every species-specific anatomy ontology (such as XAO). It is implemented as a filtering step that relies on the crude assumption that the ontology ID is reflected in some form in the subject_id.
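The base-entity filter described above can be sketched roughly as follows. This is not the project's actual implementation. It assumes that "reflected in the subject_id" means the CURIE prefix of subject_id matches the ontology ID, ignoring case. The function names and the sample rows are hypothetical.

```python
def is_base_mapping(row, ontology_id):
    """Return True if the subject CURIE prefix matches the ontology ID.

    Assumption: subject_id is a CURIE such as 'XAO:0000001', and the
    ontology ID (e.g. 'xao') equals its prefix, compared case-insensitively.
    """
    prefix = row["subject_id"].split(":", 1)[0]
    return prefix.lower() == ontology_id.lower()


def filter_base_mappings(rows, ontology_id):
    """Keep only the mappings whose subject belongs to the given ontology."""
    return [row for row in rows if is_base_mapping(row, ontology_id)]


# Hypothetical rows: the second one has an imported UBERON subject and is dropped.
rows = [
    {"subject_id": "XAO:0000001", "predicate_id": "skos:exactMatch", "object_id": "EX:1"},
    {"subject_id": "UBERON:0000002", "predicate_id": "skos:exactMatch", "object_id": "EX:2"},
]

kept = filter_base_mappings(rows, "xao")
print([row["subject_id"] for row in kept])
```

A prefix comparison like this is as crude as the assumption it encodes: it would wrongly drop base entities whose identifiers do not use the ontology ID as their prefix.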
This project was made with the mapping-commons-cookiecutter.