CZ ID is a hypothesis-free global software platform that helps scientists identify pathogens in metagenomic sequencing data.
- Discover - Identify the pathogen landscape
- Detect - Monitor and review potential outbreaks
- Decipher - Find potential infecting organisms in large datasets
CZ ID is a collaborative open project of Chan Zuckerberg Initiative and Chan Zuckerberg Biohub.
There are currently 5 main workflows. Each workflow folder contains a README with the details of its pipeline.
This repository contains WDL workflows that the CZ ID platform uses in production. See Running WDL workflows locally to get started with them.
System requirements differ by workflow and by the database being used. For example, running the short-read-mngs workflow with the full NT and NR databases requires an instance with >1TB of disk space and >100GB of memory. Other workflows (e.g. consensus-genome, amr) require much less space.
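As a quick sanity check before a large run, the sketch below reports the free disk space for the current directory and compares it with the >1TB figure quoted above. The function name `free_disk_gb` and the `required_gb` threshold are illustrative assumptions, not part of this repository; adjust the threshold for the workflow and database you plan to use.

```shell
#!/bin/sh
# Sketch: report free disk space (in GB) for a directory before a large run.
# free_disk_gb and required_gb are illustrative names, not part of this repo.

free_disk_gb() {
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "${1:-.}" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

required_gb=1000   # mirrors the ">1TB" figure for the full NT and NR databases
available_gb=$(free_disk_gb .)

if [ "$available_gb" -lt "$required_gb" ]; then
    echo "Only ${available_gb} GB free; the full NT and NR databases need more than ${required_gb} GB."
else
    echo "${available_gb} GB free; enough disk space for the full databases."
fi
```

Memory can be checked in a similar way, but the command differs by operating system, so it is left out here.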
If you are using an ARM Mac, try following these setup instructions.
- docker with buildx support (version >= 19.03)
- python3
- virtualenv
- the packages in requirements-dev.txt - to install these automatically, run
make python-dependencies
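The prerequisites above can be checked with a short script. This is a minimal sketch: `check_tool` is a hypothetical helper, not something this repository provides, and it only reports whether each program is on your PATH (it does not verify version numbers).

```shell
#!/bin/sh
# Sketch: report which of the prerequisites listed above are on PATH.
# check_tool is a hypothetical helper, not part of this repository.

check_tool() {
    # command -v is the POSIX way to test whether a program is available.
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

for tool in docker python3 virtualenv; do
    check_tool "$tool"
done

# buildx is a docker CLI plugin; this prints its version if it is installed.
if command -v docker >/dev/null 2>&1; then
    docker buildx version || echo "missing: docker buildx"
fi
```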
To get set up, first set the workflow you want to run with
export WORKFLOW=<workflow-name>
e.g.
export WORKFLOW=amr
You can see available workflows with make ls
Either build or pull the workflow Docker image with
make pull ## The faster option
or
make build ## The slower option, but necessary if you're modifying the docker container
Run a workflow with
make run
This simply runs the miniwdl run path_to_wdl.wdl command with some defaults.
Each workflow has a number of required and optional inputs, and all require at least an input file (usually a fastq). Default inputs are set from the workflows/<workflow-name>/test/local_test.yml file. These defaults may not be appropriate for every analysis. You can override them and add your own with:
make run INPUT='-i your_file_here.yml'
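As an illustration, the sketch below writes a small YAML inputs file for the consensus-genome workflow. The file name `my_inputs.yml` is hypothetical, and the four keys are taken from the consensus-genome example later in this README. Whether these keys alone are sufficient depends on the workflow's required inputs, so treat this as a starting point rather than a complete configuration.

```shell
#!/bin/sh
# Sketch: write a custom inputs file for the consensus-genome workflow.
# my_inputs.yml is a hypothetical file name; the keys are the inputs used
# in the consensus-genome example in this README.

cat > my_inputs.yml <<'EOF'
fastqs_0: workflows/consensus-genome/test/sample_sars-cov-2_paired_r1.fastq.gz
fastqs_1: workflows/consensus-genome/test/sample_sars-cov-2_paired_r2.fastq.gz
technology: Illumina
sample: my_sample_name
EOF

echo "Wrote my_inputs.yml"
```

The file can then be passed to the workflow with make run INPUT='-i my_inputs.yml'.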
If you're happy with the defaults, you can add arguments to the miniwdl command using
make run EXTRA_INPUTS='input_fastq=/path/to/input.fastq'
Let's say I want to run the consensus-genome workflow. I would run the following:
export WORKFLOW=consensus-genome
make pull # pull the latest docker container from github packages
make python-dependencies # create a .venv and install the requirements-dev.txt dependencies
make run EXTRA_INPUTS='fastqs_0=workflows/consensus-genome/test/sample_sars-cov-2_paired_r1.fastq.gz \
fastqs_1=workflows/consensus-genome/test/sample_sars-cov-2_paired_r2.fastq.gz \
technology="Illumina" \
sample="my_sample_name"'
We use GitHub Actions for CI/CD. Lint and unit tests run on GitHub from jobs in .github/workflows/wdl-ci.yml (triggered on every commit).
This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to [email protected].
Please disclose security issues responsibly by contacting [email protected].