Scooby

Code for the scooby manuscript. Scooby is the first model to predict scRNA-seq coverage and scATAC-seq insertion profiles along the genome at single-cell resolution. For this, it leverages the pre-trained multi-omics profile predictor Borzoi as a foundation model, equips it with a cell-specific decoder, and fine-tunes its sequence embeddings. Specifically, the decoder is conditioned on the cell position in a precomputed single-cell embedding.

This repository contains the model code, the data-loading code, and a training script. The reproducibility repository contains notebooks to reproduce the results of the manuscript.

Hardware requirements

  • NVIDIA GPU (tested on A40), Linux, Python (tested with v3.9)

Installation instructions

Prerequisites

scooby uses a custom version of SnapATAC2, which is built with Rust:

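  • Install Rust with: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  • Install the custom SnapATAC2 build with (the URL is quoted so that the shell does not interpret the "&"): pip install "git+https://github.com/lauradmartens/SnapATAC2.git#egg=snapatac2&subdirectory=snapatac2-python"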

Scooby package installation

  • pip install git+https://github.com/gagneurlab/scooby.git
  • Download the file contents from the Zenodo repository
  • Use examples from the scooby reproducibility repository
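
After running the installation steps above, a quick check that the packages are importable can save a failed training run later. The following is a minimal sketch, not part of the scooby package: the helper name missing_packages is hypothetical, and the import names snapatac2 and scooby are assumptions based on the package names used in the pip commands.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; adjust them if the installed packages differ.
    required = ["snapatac2", "scooby"]
    missing = missing_packages(required)
    print("missing packages:", ", ".join(missing) if missing else "none")
```

If a package is reported as missing, re-run the corresponding pip command from the list above inside the same Python environment.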

Training

We offer one training script for modeling scRNA-seq only and another for multiome modeling. Both require SnapATAC2-preprocessed AnnData objects and embeddings. Training scooby takes 1-2 days on 8 NVIDIA A40 GPUs with 128 GB RAM and 32 CPU cores.

Model architecture

Currently, the model is only tested with a batch size of 1.

[Figure: model architecture]
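
To make the idea of a cell-conditioned decoder concrete, here is a toy sketch in plain Python. It is an illustration only and not the scooby implementation: the function names, the single linear map from cell embedding to readout weights, the softplus output, and all dimensions are assumptions chosen for brevity. It shows only the general pattern in which a cell's coordinates in a precomputed embedding determine the weights that are applied to shared sequence embeddings.

```python
import math
import random


def softplus(x):
    """Numerically stable softplus; keeps predicted values positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def make_decoder(cell_dim, seq_dim, seed=0):
    """Create random toy parameters (hypothetical helper).

    Each of the seq_dim + 1 rows maps the cell embedding to one readout
    weight; the last row produces a cell-specific bias.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(cell_dim)]
            for _ in range(seq_dim + 1)]


def decode_profile(decoder, cell_embedding, seq_embeddings):
    """Predict one value per genomic bin for a single cell."""
    # Step 1: turn the cell's embedding coordinates into cell-specific weights.
    params = [sum(w * c for w, c in zip(row, cell_embedding)) for row in decoder]
    weights, bias = params[:-1], params[-1]
    # Step 2: apply those weights to the shared sequence embedding of each bin.
    return [softplus(sum(w * s for w, s in zip(weights, bin_emb)) + bias)
            for bin_emb in seq_embeddings]


if __name__ == "__main__":
    decoder = make_decoder(cell_dim=3, seq_dim=4)
    # Two genomic bins, each with a 4-dimensional sequence embedding.
    seq_embeddings = [[0.5, -1.0, 0.2, 0.0], [1.5, 0.3, -0.7, 2.0]]
    for cell in ([0.1, 0.9, -0.4], [-1.2, 0.3, 0.8]):
        print(decode_profile(decoder, cell, seq_embeddings))
```

The same sequence embeddings yield a different profile for each cell because only the cell embedding changes between calls. In this toy version the sequence embeddings are fixed inputs, whereas the text above describes them as being fine-tuned.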