coupledNMF

Introduction

This repository contains the source code for "Integrative analysis of single cell genomics data by coupled nonnegative matrix factorizations". When different types of functional genomics data are generated on single cells from different samples drawn from the same heterogeneous population, the clustering of cells in the different samples should be coupled. We formulate this “coupled clustering” problem as an optimization problem and propose the method of coupled nonnegative matrix factorizations (coupled NMF) to solve it. The method is illustrated by the integrative analysis of single cell RNA-seq and single cell ATAC-seq data.
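For readers new to NMF-based clustering, the sketch below shows the generic idea on a toy matrix: factorize a nonnegative matrix X as W times H, then assign each cell (column) to the component with the largest loading in H. This is plain NMF from scikit-learn on made-up data, not the coupled objective implemented in coupleNMF.py; the matrix sizes, the random seed and the argmax assignment rule are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy nonnegative matrix: 6 features (rows) x 8 cells (columns),
# built so that cells 0-3 and cells 4-7 use different features.
rng = np.random.default_rng(0)
X = rng.random((6, 8)) * 0.05
X[:3, :4] += 1.0   # features 0-2 are high in cells 0-3
X[3:, 4:] += 1.0   # features 3-5 are high in cells 4-7

# Factorize X ~ W @ H with k = 2 components (plain NMF, not coupled NMF).
model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(X)   # shape (6, 2): feature loadings
H = model.components_        # shape (2, 8): cell loadings

# Assign each cell to the component with the largest loading.
labels = H.argmax(axis=0)
print(labels)
```

In the coupled setting, two such factorizations (one for expression, one for openness) are fitted jointly so that the resulting clusters correspond across the two data types.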

Other versions

CoupledNMF has also been implemented in the R package sc-compReg. Please visit https://github.com/SUwonglab/sc-compReg

Preprocessing

To preprocess the scRNA-seq data, please follow the standard processing pipeline to obtain the expression matrix, in which each row represents a gene and each column represents a cell.
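Before running the tool, it can help to confirm that the expression matrix has the expected orientation. The sketch below is a minimal check, assuming a tab-delimited file without a header and a gene symbol file with one symbol per line; both formats, and the in-memory stand-in data, are assumptions for illustration rather than a specification of what coupleNMF.py reads.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the expression and gene symbol files:
# tab-delimited, no header, one row per gene and one column per cell.
example_E = io.StringIO("0\t3\t1\n5\t0\t2\n0\t0\t7\n1\t4\t0\n")
example_symbols = io.StringIO("GeneA\nGeneB\nGeneC\nGeneD\n")

E = pd.read_csv(example_E, sep="\t", header=None)
symbols = pd.read_csv(example_symbols, header=None)[0].tolist()

n_genes, n_cells = E.shape
# The gene symbol file should have one entry per row of E.
assert n_genes == len(symbols), "rows of E must match the gene symbol file"
# NMF requires nonnegative input.
assert (E.values >= 0).all(), "expression values must be nonnegative"
print(f"{n_genes} genes x {n_cells} cells")
```

With real data, replace the two `io.StringIO` objects with the paths to your own files.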

To preprocess the scATAC-seq data, please first put the .bam files for all cells into a single folder. Then run the preprocessing script we provide to obtain the openness matrix (PeakO) and the peak names (PeakName).

Running coupleNMF

coupleNMF receives 8 parameters:

-k the number of clusters
-PeakO the location of the PeakO matrix
-E the location of the E matrix
-E_symbol the location of the gene symbol file
-P_symbol the location of the peak symbol file
-pe the location of the pre-calculated peak-gene interactions file
-lambda1 the hyper-parameter lambda1, which controls the NMF term for E
-lambda2 the hyper-parameter lambda2, which controls the coupling term

Note: -k, -PeakO, -E, -E_symbol, -P_symbol and -pe are required parameters; -lambda1 and -lambda2 are optional. If coupleNMF does not receive -lambda1 and -lambda2, it will choose the best values automatically.
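The options above can be sketched with argparse as follows. This is an illustration of the interface as described in this README, not the actual source of coupleNMF.py; the helper name `build_parser`, the help strings and the use of `None` as the "choose automatically" default are assumptions.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the options listed in this README."""
    parser = argparse.ArgumentParser(prog="coupleNMF.py")
    # Required parameters.
    parser.add_argument("-k", type=int, required=True, help="number of clusters")
    parser.add_argument("-PeakO", required=True, help="location of the PeakO matrix")
    parser.add_argument("-E", required=True, help="location of the E matrix")
    parser.add_argument("-E_symbol", required=True, help="location of the gene symbol file")
    parser.add_argument("-P_symbol", required=True, help="location of the peak symbol file")
    parser.add_argument("-pe", required=True, help="location of the peak-gene interactions file")
    # Optional hyper-parameters; None stands for "choose automatically" (assumption).
    parser.add_argument("-lambda1", type=float, default=None, help="weight of the NMF term for E")
    parser.add_argument("-lambda2", type=float, default=None, help="weight of the coupling term")
    return parser

args = build_parser().parse_args([
    "-k", "2",
    "-E", "exampledata/E.txt",
    "-PeakO", "exampledata/PeakO.txt",
    "-E_symbol", "exampledata/symbol.txt",
    "-P_symbol", "exampledata/PeakName.txt",
    "-pe", "exampledata/peak_gene_100k_corr.bed",
])
print(args.k, args.lambda1, args.lambda2)
```

Omitting -lambda1 and -lambda2, as in this call, leaves both values unset so that a downstream step could select them.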

coupleNMF outputs 3 files:

scATAC-result.txt the clustering results for scATAC-seq
scRNA-result.txt the clustering results for scRNA-seq
cluster-specific-peaks-genes-pairs.txt the cluster-specific peak-gene pairs. The first column is the gene name, the second column is the peak name, the third column is the p-value for the gene, and the last column is the p-value for the peak.
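The four-column layout of cluster-specific-peaks-genes-pairs.txt can be loaded and filtered with pandas, as in the sketch below. It assumes the file is tab-delimited with no header row; the column names, the example rows, the peak name format and the 0.05 cutoff are all hypothetical and chosen only for illustration.

```python
import io
import pandas as pd

# Hypothetical contents in the four-column layout described above:
# gene name, peak name, p-value for the gene, p-value for the peak.
example = io.StringIO(
    "GeneA\tchr1_100_200\t0.001\t0.02\n"
    "GeneB\tchr2_300_400\t0.2\t0.0005\n"
    "GeneC\tchr3_500_600\t0.003\t0.004\n"
)

pairs = pd.read_csv(
    example,
    sep="\t",
    header=None,
    names=["gene", "peak", "p_gene", "p_peak"],  # illustrative column names
)

# Keep pairs where both p-values pass an illustrative cutoff.
cutoff = 0.05
significant = pairs[(pairs["p_gene"] < cutoff) & (pairs["p_peak"] < cutoff)]
print(significant["gene"].tolist())  # ['GeneA', 'GeneC']
```

To use it on real output, pass the path of the result file to `pd.read_csv` in place of `example`.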

Example

python coupleNMF.py -k 2 -E exampledata/E.txt -PeakO exampledata/PeakO.txt -E_symbol exampledata/symbol.txt -P_symbol exampledata/PeakName.txt -pe exampledata/peak_gene_100k_corr.bed -lambda1 0.04 -lambda2 25

Requirements

sklearn
pandas
scipy
itertools
argparse
MACS

For any questions about running the software, please contact [email protected].

For any questions about the algorithm, please contact [email protected].
