Scooby

Code for the scooby manuscript. Scooby is the first model to predict scRNA-seq coverage and scATAC-seq insertion profiles along the genome at single-cell resolution. For this, it leverages the pre-trained multi-omics profile predictor Borzoi as a foundation model, equips it with a cell-specific decoder, and fine-tunes its sequence embeddings. Specifically, the decoder is conditioned on the cell position in a precomputed single-cell embedding.
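To make the idea of a cell-conditioned decoder concrete, here is a minimal sketch in plain Python. It is not scooby's decoder, which is defined in the package. It assumes a hypothetical linear "hypernetwork" (`cell_head`) that maps a cell's embedding coordinates to per-channel weights and a bias. A decoding step (`decode`) applies that head to the sequence embedding of every genomic bin, with a softplus output to keep predicted coverage non-negative. All names, shapes and the softplus choice are illustrative assumptions.

```python
import math
from typing import List


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); keeps the predicted profile non-negative.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cell_head(cell_emb: List[float], w_proj: List[List[float]], b_proj: List[float]) -> List[float]:
    # Hypothetical hypernetwork: a linear map from a d-dimensional cell embedding
    # to C + 1 values (C channel weights followed by one bias term).
    return [sum(w * e for w, e in zip(row, cell_emb)) + b for row, b in zip(w_proj, b_proj)]


def decode(seq_emb: List[List[float]], head: List[float]) -> List[float]:
    # seq_emb holds one C-dimensional sequence embedding per genomic bin;
    # the same cell-specific head is applied to every bin.
    *weights, bias = head
    return [softplus(sum(w * x for w, x in zip(weights, bin_emb)) + bias) for bin_emb in seq_emb]


# Toy example: 3 bins, C = 2 sequence channels, d = 2 embedding dimensions.
seq_emb = [[0.5, -1.0], [1.0, 0.0], [0.0, 2.0]]
w_proj = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]  # shape (C + 1) x d
b_proj = [0.0, 0.0, 0.0]
profile_a = decode(seq_emb, cell_head([1.0, 0.0], w_proj, b_proj))
profile_b = decode(seq_emb, cell_head([0.0, 1.0], w_proj, b_proj))
```

The design point the sketch illustrates is that the sequence embedding is shared across cells, while only the small head changes with the cell's position in the embedding, so two cells receive different profiles over the same sequence.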

This repository contains the model and data loading code as well as a training script. The reproducibility repository contains notebooks to reproduce the results of the manuscript.

Hardware requirements

NVIDIA GPU (tested on A40), Linux, Python (tested with v3.9)
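A quick way to compare a machine against the tested setup is a small standard-library check like the one below. It is a hypothetical helper, not part of scooby. It only reports whether the OS is Linux, whether Python is 3.9, and whether `nvidia-smi` is on the PATH, which is merely a rough proxy for an NVIDIA driver and says nothing about the GPU model.

```python
import platform
import shutil
import sys


def check_environment():
    """Compare the running environment with the documented (tested) setup."""
    return {
        # Linux is the documented operating system.
        "linux": platform.system() == "Linux",
        # Python 3.9 is the tested version; other versions are untested, not necessarily broken.
        "python_3_9": sys.version_info[:2] == (3, 9),
        # nvidia-smi on PATH is only a rough proxy for an installed NVIDIA driver.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for check, passed in check_environment().items():
        print(f"{check}: {'ok' if passed else 'differs from tested setup'}")
```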

Installation instructions

Prerequisites

scooby uses a custom version of SnapATAC2, which is built with Rust:

Install Rust: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Install the custom SnapATAC2 (quote the URL so the shell does not interpret the &): pip install "git+https://github.com/lauradmartens/SnapATAC2.git#egg=snapatac2&subdirectory=snapatac2-python"

Scooby package installation

pip install git+https://github.com/gagneurlab/scooby.git
Download the file contents from the Zenodo repository.
Use the examples from the scooby reproducibility repository.
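After installation, a minimal sketch like the following can confirm that the packages are importable. The top-level module names `snapatac2` and `scooby` are assumptions based on the package names above. Note that finding `snapatac2` does not prove that the custom fork (rather than the upstream release) is installed.

```python
import importlib.util

# Assumed top-level import names for the two pip installs above.
REQUIRED_MODULES = ("snapatac2", "scooby")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("all modules found" if not missing else f"missing: {', '.join(missing)}")
```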

Training

We offer a training script for modeling scRNA-seq only and a script for multiome modeling. Both require SnapATAC2-preprocessed AnnData objects and a precomputed cell embedding. Training scooby takes 1-2 days on 8 NVIDIA A40 GPUs with 128GB RAM and 32 cores.
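Because the decoder is conditioned on each cell's embedding coordinates, every cell in the preprocessed data needs exactly one embedding entry. The hypothetical helper below sketches a pre-training sanity check in plain Python. It assumes cell barcodes are available in matrix order (for an AnnData object, `adata.obs_names`) and that the embedding has been loaded into a dict keyed by barcode. The dict layout is an illustrative assumption, not scooby's input format.

```python
from typing import Dict, List, Sequence


def alignment_problems(cell_ids: Sequence[str], embedding: Dict[str, List[float]]) -> List[str]:
    """List mismatches between the cells in a count matrix and a cell embedding.

    cell_ids: cell barcodes in matrix order (e.g. adata.obs_names).
    embedding: hypothetical mapping from barcode to embedding vector.
    """
    problems = []
    dims = {len(vec) for vec in embedding.values()}
    if len(dims) > 1:
        problems.append(f"inconsistent embedding dimensions: {sorted(dims)}")
    seen = set()
    for cid in cell_ids:
        if cid in seen:
            problems.append(f"duplicate cell id: {cid}")
        seen.add(cid)
        if cid not in embedding:
            problems.append(f"missing embedding for: {cid}")
    extra = set(embedding) - seen
    if extra:
        problems.append(f"{len(extra)} embedded cells absent from the matrix")
    return problems
```

An empty list means the two inputs line up; anything else is worth fixing before starting a multi-day training run.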

Model architecture

Currently, the model is only tested with a batch size of 1.
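If a larger effective batch is desired while keeping the tested batch size of 1, a generic technique is gradient accumulation: run several single-example forward/backward passes, average their gradients, and take one optimizer step. Whether scooby's training scripts support this is not stated here. The toy sketch below (a one-parameter linear model with squared error, all names hypothetical) only shows that accumulating over k examples reproduces one full-batch step on those k examples.

```python
from typing import List, Tuple


def grad_squared_error(w: float, x: float, y: float) -> float:
    # Gradient of 0.5 * (w * x - y) ** 2 with respect to w.
    return (w * x - y) * x


def train_with_accumulation(data: List[Tuple[float, float]], w=0.0, lr=0.1, accum_steps=4) -> float:
    grad_sum, count = 0.0, 0
    for x, y in data:  # batch size 1: one example per forward/backward pass
        grad_sum += grad_squared_error(w, x, y)
        count += 1
        if count == accum_steps:  # one optimizer step per accum_steps examples
            w -= lr * grad_sum / count
            grad_sum, count = 0.0, 0
    if count:  # flush a final partial accumulation window
        w -= lr * grad_sum / count
    return w


def full_batch_step(data: List[Tuple[float, float]], w=0.0, lr=0.1) -> float:
    grad = sum(grad_squared_error(w, x, y) for x, y in data) / len(data)
    return w - lr * grad
```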
