Project 12: Perturb-Bench: large-scale benchmarking of perturbational modelling tools in complex single-cell data
Single-cell perturbation modelling delineates how perturbations affect cellular and molecular physiology, for example the activity of transcription factors, kinases, and signalling pathways. Perturbation modelling aims to characterise the molecular impact of pharmaceutical compounds and cellular stimulants, to dissect disease pathobiology, and to facilitate drug repurposing.
Our BioHackathon project aims to address the current lack of independent benchmarking and best practices for perturbation modelling tools, which hinders their broader adoption by the single-cell community. We will conduct an extensive benchmarking study of diverse perturbation modelling tools, including variational autoencoders, graph-based models of gene-regulatory networks, optimal transport tools for deciphering cell states, and foundation models.
The benchmarking study will focus on out-of-distribution predictions for unseen events, drug synergy scores, and the separation of perturbation effects from confounding sources of variation. We will adopt workflow management systems compatible with community-driven benchmarking frameworks such as OpenEBench.
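Two of these tasks can be made concrete with a short sketch. The Python below is a minimal illustration under stated assumptions: it builds leave-one-perturbation-out splits (one common way to frame out-of-distribution evaluation, in which every cell carrying a given perturbation is withheld from training) and computes a Bliss independence excess, one widely used drug synergy score. The project does not specify which split protocol or synergy model it will adopt; the (label, expression vector) cell representation and the function names leave_one_perturbation_out and bliss_excess are hypothetical.

```python
"""Hypothetical sketch: leave-one-perturbation-out splits and a Bliss synergy score.

Assumption: each cell is a (perturbation_label, expression_vector) pair. This
layout is illustrative only and is not a format defined by the project.
"""
from typing import Iterator, List, Sequence, Tuple

Cell = Tuple[str, Sequence[float]]


def leave_one_perturbation_out(
    cells: List[Cell], control_label: str = "control"
) -> Iterator[Tuple[str, List[Cell], List[Cell]]]:
    """Yield (held_out, train, test) splits for out-of-distribution evaluation.

    For each non-control perturbation, every cell carrying it goes to the test
    set, so that perturbation is never seen during training. Control cells
    always stay in the training set as the reference state.
    """
    labels = sorted({label for label, _ in cells if label != control_label})
    for held_out in labels:
        train = [c for c in cells if c[0] != held_out]
        test = [c for c in cells if c[0] == held_out]
        yield held_out, train, test


def bliss_excess(effect_a: float, effect_b: float, effect_ab: float) -> float:
    """Bliss independence excess for fractional effects in [0, 1].

    Under independence the expected combined effect is
    E_a + E_b - E_a * E_b. A positive excess suggests synergy;
    a negative excess suggests antagonism.
    """
    expected = effect_a + effect_b - effect_a * effect_b
    return effect_ab - expected
```

For example, bliss_excess(0.5, 0.5, 0.8) returns about 0.05: the independent expectation is 0.75, so an observed combined effect of 0.8 is mildly synergistic. Withholding whole perturbations, rather than random cells, keeps the test perturbations genuinely unseen.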
We will use harmonised single-cell datasets from scPerturb (e.g. sci-Plex, Perturb-seq) that contain control/disease samples and CRISPR or compound treatments. The project will standardise emerging metrics (e.g. gene expression correlation, distribution distances, clustering separation) across datasets and perturbation tasks, and will assemble a multidisciplinary group of participants to address both biological and computational-mathematical challenges.
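As an illustration of two of the metric families listed above, the sketch below computes the Pearson correlation between predicted and observed mean expression changes relative to control, and an energy distance between predicted and observed cell populations. It is written in plain Python under the assumption that each population is a list of per-cell expression vectors over the same genes; all function names are hypothetical, the project may settle on different metric definitions, and clustering separation is not covered here.

```python
"""Hypothetical sketch of two metric families: expression-change correlation
and energy distance. Function names are illustrative, not from any specific
benchmarking library. Each population is a list of cells; each cell is a
sequence of per-gene expression values over the same genes (an assumption).
"""
import math
from typing import List, Sequence

Matrix = List[Sequence[float]]  # rows = cells, columns = genes


def mean_profile(cells: Matrix) -> List[float]:
    """Average expression of each gene across all cells."""
    n = len(cells)
    return [sum(col) / n for col in zip(*cells)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def delta_correlation(control: Matrix, observed: Matrix, predicted: Matrix) -> float:
    """Correlate predicted and observed per-gene mean shifts away from control."""
    base = mean_profile(control)
    obs_delta = [o - b for o, b in zip(mean_profile(observed), base)]
    pred_delta = [p - b for p, b in zip(mean_profile(predicted), base)]
    return pearson(obs_delta, pred_delta)


def _mean_pairwise_distance(a: Matrix, b: Matrix) -> float:
    """Mean Euclidean distance over all (u, v) pairs, including self-pairs."""
    return sum(math.dist(u, v) for u in a for v in b) / (len(a) * len(b))


def energy_distance(x: Matrix, y: Matrix) -> float:
    """Energy distance 2E|X-Y| - E|X-X'| - E|Y-Y'| (V-statistic form).

    It is zero when both populations are identical and grows as their
    distributions move apart.
    """
    return (2 * _mean_pairwise_distance(x, y)
            - _mean_pairwise_distance(x, x)
            - _mean_pairwise_distance(y, y))
```

For example, if control cells average [1, 1, 1], observed perturbed cells are [2, 1, 0] and predicted cells are [3, 1, -1], the per-gene shifts are proportional, so delta_correlation returns 1.0 (up to floating-point rounding) even though the predicted magnitude is doubled. energy_distance is positive for the same pair of populations, which illustrates why correlation-based and distribution-based metrics are complementary.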
Another goal is to create a continuously maintained repository so that benchmarking efforts can continue beyond the BioHackathon. The project's feasibility is supported by the expertise of the leads, who are members of the ELIXIR Single Cell Omics Community/Machine-Learning Focus group, and by their ongoing research initiatives, e.g. the Mongoose ELIXIR Staff Exchange Project (GR-DE-NL nodes, Feb-Jul 2024).
Georgios Gavriilidis, Marina Esteban-Medina