
Bio-agents and Precipitation Pipeline

This is a fun project to see if there are any correlations between select agents and precipitation. It is known that dramatic increases in precipitation can increase the likelihood of infectious diseases - Infectious Disease, Weather, and Climate. This project scrapes the select agents webpage, pulls BioSample data from SRA via NCBI's EDirect tool, parses the data with Python, and loads it into BigQuery to be visualized in Looker Studio.
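
As a rough illustration of the EDirect step, the snippet below drives esearch and efetch from Python via subprocess. The organism name, database, and output format here are assumptions for illustration, not the pipeline's actual queries (those live in the bin directory).

import os
import subprocess

def fetch_biosamples(organism: str, api_key: str | None = None) -> str:
    """Return BioSample document summaries (XML) for one organism."""
    env = dict(os.environ)
    if api_key:
        # EDirect reads NCBI_API_KEY from the environment to raise rate limits.
        env["NCBI_API_KEY"] = api_key

    # esearch writes an Entrez Direct envelope to stdout, which efetch reads from stdin.
    search = subprocess.run(
        ["esearch", "-db", "biosample", "-query", f'"{organism}"[Organism]'],
        capture_output=True, text=True, check=True, env=env,
    )
    fetch = subprocess.run(
        ["efetch", "-format", "docsum"],
        input=search.stdout, capture_output=True, text=True, check=True, env=env,
    )
    return fetch.stdout  # XML document summaries to be parsed downstream

if __name__ == "__main__":
    print(fetch_biosamples("Bacillus anthracis")[:500])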

Note: A large chunk of the data was removed during processing because the geographic location was missing, or because the Python tool meteostat had no precipitation data for that location on the specified date.
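
For reference, here is a minimal sketch of a meteostat daily precipitation lookup (the coordinates and date are placeholders). Records that return no prcp value are the ones that end up being dropped.

from datetime import datetime
from meteostat import Point, Daily

def daily_precip(lat: float, lon: float, day: datetime) -> float | None:
    """Return daily precipitation in mm, or None if meteostat has no data."""
    data = Daily(Point(lat, lon), day, day).fetch()  # one-row pandas DataFrame
    if data.empty or data["prcp"].isna().all():
        return None  # no precipitation data for this place/date
    return float(data["prcp"].iloc[0])

print(daily_precip(38.9, -77.0, datetime(2020, 6, 1)))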

Technologies Used

  • Docker
  • Google Cloud Platform
    • Compute Instance
    • Google Cloud Storage
    • BigQuery
    • Looker Studio
  • miniconda3
  • Prefect
  • Python
  • Terraform

Workflow diagram

Prerequisites

  • Python 3

  • GCP service account and Google Cloud Storage bucket. See Google Cloud Platform setup below.

  • Prefect - to set up Prefect, create a conda environment and run pip install -U prefect.

Note: If the scripts are run without the Docker container gregorysprenger/prefect-and-edirect:v1.0.0, the dependency tools listed in docker-requirements.txt will need to be installed.

Usage

  1. Clone repository via git clone https://github.com/gregorysprenger/bio-agents_and_precip.git
  2. Add GCP Service account credentials to ./credentials/creds.json
  3. Start the Prefect Orion server:
prefect orion start

Open another terminal, set the Prefect config, and start the Prefect agent:

# Set config to specified URL to view GUI
prefect config set PREFECT_API_URL=http://127.0.0.1:4200/api

# Start prefect agent to listen for deployments
prefect agent start -q default
  4. Create the Docker block, GCP block, and deployments in Prefect (a rough sketch of this script follows the command below):
python ./bin/deploy.py
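
The sketch below shows roughly what a script like bin/deploy.py could do with Prefect 2.x block and deployment APIs; the block names, flow import, and work queue are assumptions, not the repository's actual code.

from prefect.deployments import Deployment
from prefect.infrastructure import DockerContainer
from prefect_gcp import GcpCredentials

from flows import fetch_and_transform  # hypothetical flow module

# Docker block pointing at the published image
DockerContainer(image="gregorysprenger/prefect-and-edirect:v1.0.0").save(
    "docker-flow", overwrite=True
)

# GCP credentials block backed by the service account key
GcpCredentials(service_account_file="./credentials/creds.json").save(
    "gcp-creds", overwrite=True
)

# Deployment picked up by the agent listening on the "default" queue
Deployment.build_from_flow(
    flow=fetch_and_transform,
    name="docker-flow1",
    infrastructure=DockerContainer.load("docker-flow"),
    work_queue_name="default",
).apply()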
  5. Run the first deployment to scrape the select agents webpage, fetch BioSample information, parse geographic locations, add precipitation data, and upload the results to Google Cloud Storage:
prefect deployment run fetch-and-transform/docker-flow1 -p "start=0" -p "end=65" -p "api_key=<ENTER NCBI API KEY HERE IF YOU HAVE ONE, IF NOT REMOVE PARAMETER>"

A hard limit of 1 hour was set for fetching data for each query; SARS-CoV returned the most data. This step took 12.5 hours to run on a 16-CPU compute instance.
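
Hypothetically, the start/end parameters and the per-query one-hour limit could be wired up in Prefect roughly as below; the function names and the agent list are placeholders, not the pipeline's actual code.

from prefect import flow, task

@task(timeout_seconds=3600)  # hard 1-hour limit per query
def fetch_one_agent(agent: str, api_key: str | None) -> str:
    # Placeholder for the esearch/efetch and parsing work.
    return f"results for {agent}"

@task
def scrape_select_agents() -> list[str]:
    # Placeholder for scraping the select agents webpage.
    return ["Bacillus anthracis", "SARS-CoV", "Yersinia pestis"]

@flow(name="fetch-and-transform")
def fetch_and_transform(start: int = 0, end: int = 65, api_key: str | None = None):
    agents = scrape_select_agents()
    for agent in agents[start:end]:
        fetch_one_agent.submit(agent, api_key)  # queries run concurrently

if __name__ == "__main__":
    fetch_and_transform(start=0, end=3)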

  6. Run the second deployment to pull the data from Google Cloud Storage, concatenate it, and upload it to BigQuery to be visualized in Looker Studio (see the sketch after the command below):
prefect deployment run gcs-to-bq/docker-flow2 -p "start=0" -p "end=65"
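
A minimal sketch of that GCS-to-BigQuery step is shown below, assuming parquet files in the bucket; the bucket name, prefix, and table ID are placeholders.

import io

import pandas as pd
from google.cloud import bigquery, storage

def gcs_to_bq(bucket_name: str = "agents-precip",
              prefix: str = "data/",
              table_id: str = "my-project.agents_precip.observations") -> None:
    """Concatenate parquet files from GCS and load them into one BigQuery table."""
    gcs = storage.Client.from_service_account_json("./credentials/creds.json")
    frames = [
        pd.read_parquet(io.BytesIO(blob.download_as_bytes()))
        for blob in gcs.list_blobs(bucket_name, prefix=prefix)
        if blob.name.endswith(".parquet")
    ]
    combined = pd.concat(frames, ignore_index=True)

    bq = bigquery.Client.from_service_account_json("./credentials/creds.json")
    bq.load_table_from_dataframe(combined, table_id).result()  # wait for the load job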
  7. Visualize the data in Looker Studio (formerly Data Studio).

Visualization

Looker Studio dashboard

Google Cloud Platform setup

  1. Create Google Cloud Project

    • Create service account in IAM & Admin with name agents-precip
      • Add the following roles:
        • Viewer
        • BigQuery Admin
        • Storage Admin
        • Storage Object Admin
    • Go to Manage keys, then add and download a key in JSON format
    • Rename the key to creds.json and place it in the credentials directory
  2. Enable the following APIs:

  3. Create Google Cloud Storage Bucket

    • Go to Cloud Storage and Buckets
    • Create a bucket with name agents-precip

Note: If the names above are taken, choose different names and update the Terraform scripts in the terraform directory and the Python scripts in the bin directory.
