We are currently utilizing CellxGene VM (https://github.com/Novartis/cellxgene-gateway) to host a substantial spatial transcriptomic dataset comprising roughly 16 million cells. However, we are facing a couple of critical issues that are hampering our analysis workflow:
Dataset Loading:
Incomplete Loading: Loading the dataset is often interrupted or left incomplete. After several attempts we can load the full dataset, which takes around 3m30s, but the inconsistency remains a concern.
Conversion to CXG: After successful conversion of our dataset to CXG format, we realized that it is not being recognized by our self-hosted explorer.
Differential Expression Analysis:
Inconsistent Loading of Gene Details: When using the differential expression function, we noticed that it does not consistently finish loading all gene details.
By comparison, when working with large datasets (over 4 million cells) on CZI, we observed fast data loading and differential expression analysis that completed smoothly within a few seconds. Are there any practices, setups, or approaches that would help us handle and analyze large datasets efficiently on the local CellxGene VM and achieve performance similar to CZI?
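One way to put numbers on the loading inconsistency described above is to time repeated load attempts and count how many succeed. The following is only a minimal sketch: `measure_attempts`, `summarize`, and `fetch_url` are hypothetical helper names, and the URL you pass in would be whatever address your own deployment serves the dataset from.

```python
import time
import urllib.request
from statistics import median


def measure_attempts(load_fn, attempts=5):
    """Call load_fn repeatedly, recording (succeeded, seconds) per attempt."""
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            load_fn()
            ok = True
        except Exception:
            # Any exception counts as a failed or incomplete load.
            ok = False
        results.append((ok, time.perf_counter() - start))
    return results


def summarize(results):
    """Report the success count and the median time of successful attempts."""
    ok_times = [seconds for ok, seconds in results if ok]
    return {
        "attempts": len(results),
        "successes": len(ok_times),
        "median_seconds": median(ok_times) if ok_times else None,
    }


def fetch_url(url, timeout=600):
    """Hypothetical loader: read the full response body from a URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
```

For example, `summarize(measure_attempts(lambda: fetch_url(my_url), attempts=10))`, where `my_url` is a placeholder for your deployment's dataset address, gives a success count and a median time that can be compared before and after a configuration change. Note that this only measures the HTTP response, not rendering in the browser.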
Thanks for the question. The original issue this user was experiencing was partially addressed over private communication, but sharing here for visibility and to continue the public discussion.
With respect to converting to CXG, you can refer to this code in the single cell data portal repo, which is the entry point for the CXG conversion. To provide a bit more context, the CXG file format is an implementation of the TileDB format/data structure that adheres to the SOMA specification, with the goal of catering more specifically to the single-cell use case.
Happy to provide more information if needed, but as a disclaimer - because of the wide range of contexts/requirements of different self-hosting use cases, the CELLxGENE team does not explicitly offer/guarantee support for self-hosting CZ CELLxGENE Annotate.
Thank you so much for your quick response. I am attempting to self-host CellxGene with a setup similar to the original poster's, using CellxGene Gateway (https://github.com/Novartis/cellxgene-gateway). I have been running into similar issues with loading times: even a dataset of 400k cells (4.5 GB) takes nearly 15 minutes to load. I noticed that https://cellxgene.cziscience.com/ loads similarly sized datasets very quickly, and I was wondering if you could provide some guidance on how I might achieve similar load times. I have used the --sparse and --backed flags, and while they do improve performance, the load times are still not comparable to what I see on the CellxGene site.
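For a rough sense of why sparse storage matters at this scale, the back-of-envelope calculation below compares the in-memory size of a dense expression matrix with a CSR sparse one. It is only an estimate under stated assumptions: the gene count and the fraction of non-zero entries are not given in this thread and must be supplied by you, 4-byte values (float32) are assumed, and `dense_bytes` and `csr_bytes` are hypothetical helper names rather than part of any CellxGene API.

```python
def dense_bytes(n_cells, n_genes, itemsize=4):
    """Bytes needed for a dense n_cells x n_genes matrix."""
    return n_cells * n_genes * itemsize


def csr_bytes(n_cells, n_genes, density, itemsize=4):
    """Bytes needed for a CSR matrix with the given fraction of non-zeros.

    CSR stores one value and one column index per non-zero entry,
    plus a row-pointer array of length n_cells + 1.
    """
    nnz = round(n_cells * n_genes * density)
    # Assumption: 4-byte indices unless the non-zero count no longer
    # fits in a signed 32-bit integer, in which case 8-byte indices.
    index_itemsize = 8 if nnz > 2**31 - 1 else 4
    return nnz * (itemsize + index_itemsize) + (n_cells + 1) * index_itemsize


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Placeholder inputs: 400k cells is from the post above; the gene
    # count and density are illustrative assumptions, not measured values.
    n_cells, n_genes, density = 400_000, 20_000, 0.25
    print(f"dense: {to_gib(dense_bytes(n_cells, n_genes)):.1f} GiB")
    print(f"CSR:   {to_gib(csr_bytes(n_cells, n_genes, density)):.1f} GiB")
```

If the dense estimate exceeds the RAM available on the host, the server will likely swap or fail partway through loading, so it is worth checking these numbers against the machine before tuning anything else.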
I completely understand that you are not able to explicitly offer/guarantee support for self-hosting, however I appreciate any guidance you can provide.
If you would like to correspond over private communication, please feel free to reach out to me on my email at [email protected]