This repository contains the code for the manuscript "Spatially resolved gene neighborhood networks in single cells". Each notebook is run under a specified Anaconda environment.
To set up an environment, run the following command, replacing filename.yml with the corresponding environment file: conda env create -f filename.yml
Example data can be found at: https://doi.org/10.5281/zenodo.7651412
All code in MERFISH analysis should be run under the scenv environment except for 04_networkVisualization.ipynb.
04_networkVisualization.ipynb should be run under the network environment.
01_merFishPatchAnalyss.ipynb performs the patch correlation analysis and generates the following figures:
RNA visualization:
Single-cell patch detection:
Single-cell patch correlations:
The locations of gene pairs with positive and negative correlations are also plotted:
02_neighborhoodNetworkAnalysis.ipynb performs gene neighborhood network analysis. The code detects gene neighbors as shown below:
After counting the number of times that two genes are neighbors, a permutation analysis is performed to determine the proximity score of the two genes. A higher proximity score means the two genes are more likely to be neighbors given the copy number of each gene. Pairs of genes with different proximity scores are also visualized in a scatter plot:
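The permutation idea described above can be illustrated with a minimal sketch. It is not the notebook's implementation: the function names, the fixed neighbor radius, and the z-score form of the proximity score are all assumptions made for illustration. Gene labels are shuffled across the observed transcript positions, which keeps each gene's copy number fixed, and the observed neighbor count is compared with the shuffled counts.

```python
import math
import random
from statistics import mean, pstdev


def count_neighbor_pairs(points, labels, gene_a, gene_b, radius):
    """Count (gene_a, gene_b) transcript pairs that lie within `radius`."""
    a_pts = [p for p, g in zip(points, labels) if g == gene_a]
    b_pts = [p for p, g in zip(points, labels) if g == gene_b]
    return sum(1 for a in a_pts for b in b_pts if math.dist(a, b) <= radius)


def proximity_score(points, labels, gene_a, gene_b, radius, n_perm=200, seed=0):
    """Hypothetical score: z-score of the observed count against shuffled labels."""
    rng = random.Random(seed)
    observed = count_neighbor_pairs(points, labels, gene_a, gene_b, radius)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # positions stay put; copy numbers are preserved
        null_counts.append(
            count_neighbor_pairs(points, shuffled, gene_a, gene_b, radius)
        )
    spread = pstdev(null_counts)
    if spread == 0:
        return 0.0  # no variation under permutation, so no evidence either way
    return (observed - mean(null_counts)) / spread


# Toy cell: genes A and B share one corner, gene C occupies another.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["A", "B", "A", "B", "C", "C", "C", "C"]
observed_ab = count_neighbor_pairs(points, labels, "A", "B", radius=1.5)
score_ab = proximity_score(points, labels, "A", "B", radius=1.5)
print(observed_ab, score_ab)
```

In this toy example A and B are co-located, so their score is positive. The pairwise loop is quadratic in the number of transcripts, which is acceptable for a sketch but would likely need a spatial index on real data.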
03_coclusteringAnalysis.ipynb clusters cells based on single-cell gene count, patch correlation, or network variability. The clustering results and true cell types are visualized using t-SNE plots.
04_networkVisualization.ipynb uses the networkx package to visualize pairwise gene proximity scores as a network.
05_clustering_eval.ipynb evaluates the mismatch between the clustering results and cell types. The confusion matrices shown below correspond to count-based, patch correlation-based, and network variability-based clustering.
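As a rough illustration of how a confusion matrix relates clusters to cell types, the sketch below tabulates (cell type, cluster) counts and derives a simple mismatch fraction. The function names and the majority-vote definition of mismatch are assumptions for illustration; the notebook may quantify mismatch differently.

```python
from collections import Counter


def confusion_counts(cell_types, cluster_labels):
    """Count how many cells fall into each (cell type, cluster) combination."""
    return Counter(zip(cell_types, cluster_labels))


def mismatch_fraction(cell_types, cluster_labels):
    """Fraction of cells that do not belong to their cluster's majority cell type."""
    table = confusion_counts(cell_types, cluster_labels)
    majority = {}
    for (_, cluster), n in table.items():
        majority[cluster] = max(majority.get(cluster, 0), n)
    return 1 - sum(majority.values()) / len(cell_types)


# Hypothetical labels for six cells.
cell_types = ["T", "T", "T", "B", "B", "B"]
clusters = [0, 0, 1, 1, 1, 1]
table = confusion_counts(cell_types, clusters)
mismatch = mismatch_fraction(cell_types, clusters)
print(dict(table), mismatch)
```

Comparing this fraction across the three clustering inputs gives one simple way to rank them, under the stated assumption about how mismatch is defined.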
The analysis of the MSC seqFISH dataset is similar to that of the MERFISH dataset, with the addition of image analysis using the code in the "image_processing" folder. All image processing is conducted under the skim environment, specified by skimEnv.yml.
00_Registration.ipynb cross-registers images from different cycles.
01_dotDetectionThreCheck.ipynb allows the user to manually identify a threshold for dot detection in each channel. The following plot is generated to help examine the threshold:
02_2dDotDetection.ipynb takes the input directories and detects dots. The detected dots are saved as .hdf5 files. Each file stores a matrix of size length x width x number of genes, in which each layer represents a gene. The value is 1 at the position of a detected dot and 0 otherwise.
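The stored layout can be pictured with a small sketch that builds such a binary stack from per-gene dot coordinates. It uses nested Python lists in place of an array library and does not write an .hdf5 file; the function name and the input format are assumptions for illustration.

```python
def dots_to_stack(dots_per_gene, n_rows, n_cols):
    """Build a rows x cols x genes nested list holding 1 at each detected dot."""
    n_genes = len(dots_per_gene)
    stack = [[[0] * n_genes for _ in range(n_cols)] for _ in range(n_rows)]
    for layer, dots in enumerate(dots_per_gene):
        for row, col in dots:
            stack[row][col][layer] = 1
    return stack


# Two hypothetical genes on a 3 x 4 image: layer 0 has one dot, layer 1 has two.
stack = dots_to_stack([[(0, 1)], [(2, 2), (0, 1)]], n_rows=3, n_cols=4)
print(stack[0][1], stack[2][2])
```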
03_cytokineDetection.ipynb provides more detailed dot detection if the dots detected by 02_2dDotDetection.ipynb need additional tuning.
The spatially resolved gene neighborhood network analysis is performed by the code in the "subcelluar_analysis" folder. All subcellular analysis is performed under the scenv environment specified by scanpyEnv.yml.
00_dotFiles2pkl.ipynb takes the .hdf5 files containing the detected dots and generates a dictionary for each cell. The keys of the dictionary are the gene names, and each value is a list of row and column positions of detected transcripts.
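A minimal sketch of this conversion is shown below, starting from an in-memory binary stack (rows x columns x genes) rather than an .hdf5 file. The function name, the gene names, and the choice to store each position as a (row, column) tuple are assumptions for illustration.

```python
def stack_to_dict(stack, gene_names):
    """Map each gene name to the (row, col) positions where its layer equals 1."""
    positions = {gene: [] for gene in gene_names}
    for row, row_values in enumerate(stack):
        for col, layers in enumerate(row_values):
            for layer, value in enumerate(layers):
                if value == 1:
                    positions[gene_names[layer]].append((row, col))
    return positions


# Hypothetical 2 x 2 cell crop with two gene layers.
cell_stack = [
    [[1, 0], [0, 1]],
    [[0, 0], [1, 1]],
]
cell_dots = stack_to_dict(cell_stack, ["geneA", "geneB"])
print(cell_dots)
```

A dictionary like `cell_dots` could then be saved per cell with the standard `pickle` module, which matches the .pkl suffix in the notebook name.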
01_subcellularPatches.ipynb performs clustering-based subcellular patch detection and patch correlation calculation. Detected patches and patch correlations are shown below:
02_neighborhoodCorrelationAnalyses.ipynb analyzes the combined patch correlations. All pairwise correlations are combined and analyzed for the HBM, HUC, and HCH datasets. PCA is performed, and statistical comparisons are made to identify significant differences.
03_subcellularNetworkInference.ipynb finds local gene neighborhoods as shown below:
The copy number of each gene in each local neighborhood is then counted, and pairwise gene correlations are calculated across neighborhoods. The mean and standard deviation of the pairwise gene neighborhood correlations are computed for each cell, and cells are clustered based on these two values. The clustering results and cell types are visualized in the t-SNE plots shown below:
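The per-cell features described above can be sketched as follows. This is an illustration under assumptions, not the notebook's code: the input format (a dictionary from gene name to per-neighborhood copy numbers), the use of Pearson correlation, and the population standard deviation are all choices made here for the example.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def cell_correlation_features(counts):
    """Return (mean, std) of all pairwise gene correlations within one cell."""
    corrs = [
        pearson(counts[a], counts[b]) for a, b in combinations(sorted(counts), 2)
    ]
    return mean(corrs), pstdev(corrs)


# Hypothetical cell with three genes counted in four local neighborhoods.
counts = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
corr_mean, corr_std = cell_correlation_features(counts)
print(corr_mean, corr_std)
```

Each cell then contributes one (mean, standard deviation) pair, and those pairs are what a clustering step would operate on.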
04_coclusteringAnalysis.ipynb clusters cells based on single-cell RNA count and patch correlations. The clustering results and cell types are visualized on t-SNE plots as shown below:
05_connectivityToNetwork.ipynb visualizes the pairwise gene neighborhood correlations of each subcellular patch in network format. This code should be run under the network environment specified in networkEnv.yml.
06_rnaProteinNetwork.ipynb expands the subcellular gene neighborhood networks to protein markers. Both patch correlations and local gene neighborhood networks are expanded to include protein markers.
07_circularNetworks.ipynb extracts patch correlations and gene neighborhood networks based on distance from the cell edge.
08_clustering_eval.ipynb evaluates the mismatch between the clustering results and cell types. The confusion matrices shown below correspond to count-based, patch correlation-based, and network variability-based clustering.
Please cite: Fang Z, Ford AJ, Hu T, Zhang N, Mantalaris A, Coskun AF. Subcellular spatially resolved gene neighborhood networks in single cells. Cell Reports Methods. 2023;3(5):100476. doi:10.1016/j.crmeth.2023.100476