This repository implements a benchmark of the performance of differential abundance (DA) testing methods, covering five clustering-free methods:
- Testing for differential abundance in mass cytometry data (Cydar)
- Detection of differentially abundant cell subpopulations in scRNA-seq data (DAseq)
- Quantifying the effect of experimental perturbations at single-cell resolution (MELD)
- Differential abundance testing on single-cell data using k-nearest neighbor graphs (Milo)
- Co-varying neighborhood analysis identifies cell populations associated with phenotypes of interest from single-cell transcriptomics (CNA)

as well as one clustering-based method, Louvain.
Running the benchmarking code requires a number of R and Python packages. The R packages needed are:
The Python packages needed are:
- MELD
- cna
- scanpy
- graphtools
- scikit-learn
- multianndata
The datasets used in the benchmark are available as follows:
- The synthetic datasets and the BCR-XL dataset are available under the `data` directory.
- The COVID-19 PBMC dataset is available at https://www.covid19cellatlas.org/#wilk20.
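Before submitting any jobs, it can help to confirm that the Python dependencies listed above are importable. The sketch below assumes the import names `meld`, `cna`, `scanpy`, `graphtools`, `sklearn` (for scikit-learn) and `multianndata`, and that `python3` is the interpreter the jobs will use (override it through the `PYTHON` variable); both are assumptions to adjust for your environment.

```shell
#!/usr/bin/env bash
# Minimal sketch: report which required Python modules cannot be imported.
# Assumption: these are the import names of the packages listed above
# (scikit-learn is imported as "sklearn").
required_modules=(meld cna scanpy graphtools sklearn multianndata)

# Assumption: python3 is the interpreter used by the benchmarking jobs.
PYTHON="${PYTHON:-python3}"

missing_modules=()
for mod in "${required_modules[@]}"; do
  # Try importing each module in a fresh interpreter and discard its output.
  if ! "$PYTHON" -c "import ${mod}" >/dev/null 2>&1; then
    missing_modules+=("$mod")
  fi
done

if [ "${#missing_modules[@]}" -eq 0 ]; then
  echo "All required Python modules are importable."
else
  echo "Missing Python modules: ${missing_modules[*]}"
fi
```

The check runs each import in a separate interpreter so that one failing package does not hide the others.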
Note: our implementation can only be run on a cluster with the Slurm job scheduler, because the benchmark runs thousands of jobs.
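A quick preflight check for the Slurm requirement is sketched below. It only verifies that the standard Slurm client commands `sbatch` and `squeue` are on `PATH`; the helper name `slurm_available` is hypothetical, and the assumption that the benchmarking scripts submit their work through `sbatch` is not confirmed here.

```shell
#!/usr/bin/env bash
# Minimal sketch: check that the Slurm client commands for submitting and
# monitoring jobs are available before launching a benchmark.
# Assumption: the benchmarking scripts submit their jobs via sbatch.
slurm_available() {
  command -v sbatch >/dev/null 2>&1 && command -v squeue >/dev/null 2>&1
}

if slurm_available; then
  # Standard way to list your own queued and running jobs.
  echo "Slurm found; monitor jobs with: squeue -u \"\$USER\""
else
  echo "sbatch/squeue not found: run the benchmarks on a Slurm cluster." >&2
fi
```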
The benchmarking scripts are all located in the `bin` directory:
bin
├── bm_parameter.sh
├── bm_runtime.sh
├── bm_syn_real.sh
└── make_bm_data.sh
To run a benchmark, execute the corresponding script from the `bin` directory, replacing `{the script}` with `parameter`, `runtime` or `syn_real`:
bash bm_{the script}.sh
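The naming pattern in the command above can be wrapped in a small helper that rejects unknown benchmark names before anything is launched. This is a sketch: `run_benchmark` is a hypothetical name, and it assumes it is called from the repository root and that the scripts are meant to be run from inside `bin`, as the bare `bash bm_{the script}.sh` command suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one benchmarking script by its suffix,
# e.g. "syn_real" -> bin/bm_syn_real.sh.
# Assumption: called from the repository root; scripts run from inside bin/.
run_benchmark() {
  local name="$1"
  case "$name" in
    parameter|runtime|syn_real)
      ;;
    *)
      echo "Unknown benchmark '$name' (expected: parameter, runtime, syn_real)" >&2
      return 2
      ;;
  esac
  if [ ! -f "bin/bm_${name}.sh" ]; then
    echo "Script not found: bin/bm_${name}.sh" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  (cd bin && bash "bm_${name}.sh")
}
```

For example, `run_benchmark syn_real` is equivalent to `cd bin && bash bm_syn_real.sh`, while a typo such as `run_benchmark synreal` fails immediately with exit status 2.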
Our implementation is inspired by the repo https://github.com/MarioniLab/milo_analysis_2020.