Draft Census schema for support of Visium and Spatial data #1092
Comments
First iteration, very likely to change https://drive.google.com/file/d/1_A8YlZsVZrDrt_hhjHIYQ_jVw0M5b_eP/view?usp=sharing |
Second iteration |
Third iteration (changes reflected in text as of today). |
@pablo-gar - Some questions/comments here about the differences in the diagram of Does
Should
There is no
According the Full resolution image of a Scene and High resolution image of a Scene are specified as |
@pablo-gar - I noticed a reference to |
@prathapsridharan answering your questions
No,
I'll bring this proposal to Julia and Aaron. I don't have a strong opinion on it.
I'm proposing to unify everything via the
Yes, thanks for the catch! I will remove it
Yes, thanks for catching all of these! |
@brianraymor Thanks for the catch I've fixed it. |
Fourth iteration with fixes from the comments above. Text has also been updated in the top-level comment. |
Sixth iteration:
|
Seventh iteration:
|
Eighth iteration:
|
LAST EDITED: Aug, 29, 2024
See parent Epic for further information.
chanzuckerberg/single-cell#644
See current draft for spatial support in SOMA https://docs.google.com/document/d/1S48pD5XTzDcaLGlq6YVYCoUjptR93PHHHmG79TiJzsA/edit
TODOs
./census_accepted_assays.csv to include:
- EFO:0010961 - Visium Spatial Gene Expression
- EFO:0030062 - Slide-seqV2
- EFO:0009920 - Slide-seq maybe?
spatial[scene_id].obsl["loc"]["soma_geometry"]
Schema changes
Version: 2.2.0
Last edited: April, 2024.
Data included
All datasets included in the Census MUST be of CELLxGENE dataset schema version 5.1.0. The following data constraints are imposed on top of the CELLxGENE dataset schema.
Assays
[...]
The Census MUST include all cells from the list of accepted assays.
These assays were selected with the following criteria:
Spatial Assays
Of the spatial assays, only observations from Visium and Slide-seq MUST be included in Census, as indicated in the list of accepted assays. Per the CELLxGENE dataset schema, datasets with spatial observations can be identified by the presence of the slot uns["spatial"]. For these assays, only observations from datasets that contain "one Space Ranger output for a single tissue section" MUST be included in Census. The full logic above can be asserted as follows:
- If uns["spatial"] is present and uns["spatial"]["is_single"] is True, then all observations MUST be included.
- If uns["spatial"] is present and uns["spatial"]["is_single"] is False, then all observations MUST be excluded.
Census metadata –
census_obj["census_info"]["summary"]
–SOMADataFrame
[...]
"total_cell_count"
"unique_cell_count"
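The inclusion rule from the Spatial Assays section can be sketched as a small predicate. This is only an illustration: it assumes `uns` behaves like a plain mapping (as the `uns` slot of an H5AD does once loaded), and the function name `is_includable_spatial_dataset` is hypothetical, not part of any Census or CELLxGENE API.

```python
from typing import Any, Mapping


def is_includable_spatial_dataset(uns: Mapping[str, Any]) -> bool:
    """Return True only for spatial datasets whose observations belong in Census.

    Hypothetical helper: a dataset counts as spatial when uns["spatial"] is
    present, and its observations are included only when is_single is True.
    """
    spatial = uns.get("spatial")
    if spatial is None:
        # Not a spatial dataset, so this rule does not apply to it.
        return False
    # bool() also copes with NumPy booleans read back from an H5AD file.
    return bool(spatial["is_single"])
```

A dataset for which this returns False is either non-spatial (and governed by the other assay rules) or a spatial dataset that combines more than one tissue section.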
Data encoding and organization
[...]
Census Non-Spatial Data –
census_obj["census_data"][organism]
–SOMAExperiment
Non-spatial data for Homo sapiens MUST be stored as a SOMAExperiment in census_obj["census_data"]["homo_sapiens"].
Non-spatial data for Mus musculus MUST be stored as a SOMAExperiment in census_obj["census_data"]["mus_musculus"].
Feature dataset presence matrix –
census_obj["census_data"][organism].ms["RNA"]["feature_dataset_presence_matrix"]
–SOMASparseNDArray
[...]
Census Spatial Sequencing Data –
census_obj["census_spatial_sequencing"][organism]
–SOMAExperiment
Only Visium and Slide-seq are supported for spatial data. See the "assays included" section above.
Spatial data for Homo sapiens MUST be stored as a SOMAExperiment in census_obj["census_spatial_sequencing"]["homo_sapiens"].
Spatial data for Mus musculus MUST be stored as a SOMAExperiment in census_obj["census_spatial_sequencing"]["mus_musculus"].
For each organism the SOMAExperiment MUST contain the following:
- census_obj["census_spatial_sequencing"][organism].obs – SOMADataFrame
- census_obj["census_spatial_sequencing"][organism].ms – SOMACollection. This SOMACollection MUST only contain one SOMAMeasurement in census_obj["census_spatial_sequencing"][organism].ms["RNA"] with the following:
  - census_obj["census_spatial_sequencing"][organism].ms["RNA"].X – SOMACollection. It MUST contain exactly two layers:
    - census_obj["census_spatial_sequencing"][organism].ms["RNA"].X["raw"] – SOMASparseNDArray
  - census_obj["census_spatial_sequencing"][organism].ms["RNA"].var – SOMAIndexedDataFrame
  - census_obj["census_spatial_sequencing"][organism].ms["RNA"]["feature_dataset_presence_matrix"] – SOMASparseNDArray
- census_obj["census_spatial_sequencing"][organism].obs_scene. It indicates the link between an observation and a scene. It MUST have two columns: 1) obs_id, corresponding to soma_joinid of obs, and 2) scene_id, corresponding to the associated scene.
- census_obj["census_spatial_sequencing"][organism].spatial – SOMACollection.
  - census_obj["census_spatial_sequencing"][organism].spatial[scene_soma_joinid] – SOMAScene. There will be as many Spatial Scenes as spatial datasets. Each SOMAScene MUST contain the following:
    - census_obj["census_spatial_sequencing"][organism].spatial[scene_soma_joinid].obsl["loc"] – SOMAGeometryNDArray. This will contain the spatial array positions for each observation, the geometry points associated with them, and additional metadata.
    - census_obj["census_spatial_sequencing"][organism].spatial[scene_soma_joinid].img[library_id]["fullres_image"] – SOMAImageNDArray.
    - census_obj["census_spatial_sequencing"][organism].spatial[scene_soma_joinid].img[library_id]["highres_image"] – SOMAImageNDArray.
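To make the required layout above easier to check, here is a minimal sketch that looks for the mandatory members of one organism-level spatial experiment. It deliberately uses a nested `dict` as a stand-in for the SOMA objects (a real `SOMAExperiment` exposes `obs`, `ms` and so on through its own API, which is still in draft for spatial data), and the names `REQUIRED_EXPERIMENT_PATHS`, `has_path` and `missing_paths` are hypothetical.

```python
from typing import Any, List, Mapping, Sequence, Tuple

# Members every organism-level spatial experiment MUST contain, per the list above.
REQUIRED_EXPERIMENT_PATHS: List[Tuple[str, ...]] = [
    ("obs",),
    ("ms", "RNA", "X", "raw"),
    ("ms", "RNA", "var"),
    ("ms", "RNA", "feature_dataset_presence_matrix"),
    ("obs_scene",),
    ("spatial",),
]


def has_path(tree: Mapping[str, Any], path: Sequence[str]) -> bool:
    """Walk a nested mapping and report whether every key in `path` exists."""
    node: Any = tree
    for key in path:
        if not isinstance(node, Mapping) or key not in node:
            return False
        node = node[key]
    return True


def missing_paths(tree: Mapping[str, Any]) -> List[str]:
    """Return the required members that are absent, as slash-joined paths."""
    return ["/".join(p) for p in REQUIRED_EXPERIMENT_PATHS if not has_path(tree, p)]
```

An empty result means the stand-in has every required member; per-scene contents (positions and images) depend on the assay and are not covered by this check.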
Matrix Data, count (raw) matrix –
census_obj["census_spatial_sequencing"][organism].ms["RNA"].X["raw"]
–SOMASparseNDArray
Same as non-spatial data. See the corresponding section here.
Feature metadata –
census_obj["census_spatial_sequencing"][organism].ms["RNA"].var
–SOMADataFrame
Same as non-spatial data. See the corresponding section here.
Feature dataset presence matrix –
census_obj["census_spatial_sequencing"][organism].ms["RNA"]["feature_dataset_presence_matrix"]
–SOMASparseNDArray
Same as non-spatial data. See the corresponding section here.
Cell metadata –
census_obj["census_spatial_sequencing"][organism].obs
–SOMADataFrame
Same as non-spatial data. See the corresponding section here.
Important note: In addition, the following spatial obs columns from the CELLxGENE dataset schema MUST be included in this SOMADataFrame
Obs to spatial mapping –
census_obj["census_spatial_sequencing"][organism].obs_scene
–SOMADataFrame
It indicates the link between an observation and a scene. Each row corresponds to an observation with the following columns:
- soma_joinid from census_obj["census_spatial_sequencing"][organism].obs.
- scene_id from census_obj["census_spatial_sequencing"][organism].spatial.
- True if the scene contains spatial information about the observation, otherwise it MUST be False.
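As an illustration of the mapping table described above, the following sketch builds its rows as plain dictionaries and looks up the observations that belong to one scene. The column names `soma_joinid` and `scene_id` follow the text; the name of the boolean column is not stated here, so `has_spatial_data` is a placeholder, and both function names are hypothetical.

```python
from typing import Dict, List, Mapping


def build_obs_scene(obs_to_scene: Mapping[int, int]) -> List[Dict[str, object]]:
    """Create one row per observation, linking it to the scene holding its data.

    `has_spatial_data` is a placeholder name for the boolean column.
    """
    return [
        {"soma_joinid": obs_id, "scene_id": scene_id, "has_spatial_data": True}
        for obs_id, scene_id in sorted(obs_to_scene.items())
    ]


def obs_in_scene(rows: List[Dict[str, object]], scene_id: int) -> List[int]:
    """Return the soma_joinid values of observations with spatial data in a scene."""
    return [
        row["soma_joinid"]
        for row in rows
        if row["scene_id"] == scene_id and row["has_spatial_data"]
    ]
```

Keeping the link in its own table means a query over `obs` can be joined to scenes by `soma_joinid` without storing scene information in `obs` itself.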
Positions array of a Scene –
census_obj["census_spatial_sequencing"][organism].spatial[scene_id].obsl["loc"]
–SOMAGeometryNDArray
scene_soma_joinid MUST correspond to the values of soma_joinid in census_obj["census_spatial_sequencing"][organism].spatial.scenes.
For each observation in each Scene, spatial array positions, the geometry points associated with them, and additional positional metadata MUST be encoded as a SOMAGeometryNDArray. Each row corresponds to an observation with the following columns:
- If Visium ("EFO:0010961"), the units for the spatial array positions are pixels from the high-resolution image (spatial[scene_soma_joinid].img["highres_image"]). Otherwise TBD.
- obsm["spatial"]. As defined in the CELLxGENE dataset schema.
- obsm["spatial"]. As defined in the CELLxGENE dataset schema.
- diameter/2. If Visium ("EFO:0010961"), diameter MUST be uns["spatial"][library_id]['spot_diameter_fullres'], as defined in the CELLxGENE dataset schema. Otherwise TBD-TODO (else for Slide-seq it should be 0.003% of the radius occupied by the full cloud of points).
Images of a Scene –
census_obj["census_spatial_sequencing"][organism].spatial[scene_soma_joinid].img[library_id] –
SOMAMultiscaleImage
Images of a Visium ("EFO:0010961") scene MUST adhere to the following specifications. Other assays MUST NOT have images, and MUST NOT include the img collection.
library_id MUST be the corresponding value in the source H5AD slot uns["spatial"][library_id], as defined in the CELLxGENE dataset schema.
Full resolution image of a Scene –
census_obj["census_spatial_sequencing"][organism].spatial[scene_soma_joinid].img[library_id]["fullres_image"]
–SOMAImageNDArray
The full resolution image of a Visium ("EFO:0010961") scene MAY be included; if included, it MUST be encoded as a SOMAImageNDArray.
Value: the image from uns["spatial"][library_id]['images']['fullres'] as defined in the CELLxGENE dataset schema.
High resolution image of a Scene –
census_obj["census_spatial_sequencing"][organism].spatial[scene_soma_joinid].img[library_id]["highres_image"]
–SOMAImageNDArray
The high resolution image of a Visium ("EFO:0010961") scene MUST be included and MUST be encoded as a SOMAImageNDArray.
Value: the image from uns["spatial"][library_id]['images']['hires'] as defined in the CELLxGENE dataset schema.
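The per-scene rules above (radius derived from the Visium spot diameter, and image presence depending on the assay) could be checked along these lines. This is a sketch under stated assumptions: `uns` and `scene` are plain nested mappings standing in for the H5AD slot and the SOMA scene, the function names are hypothetical, and the Slide-seq radius is left out because the text marks it as TBD.

```python
from typing import Any, List, Mapping

VISIUM = "EFO:0010961"


def visium_spot_radius(uns: Mapping[str, Any], library_id: str) -> float:
    """Radius of a Visium spot: half of spot_diameter_fullres from the source H5AD."""
    diameter = uns["spatial"][library_id]["spot_diameter_fullres"]
    return diameter / 2


def image_rule_violations(
    assay_term_id: str, scene: Mapping[str, Any], library_id: str
) -> List[str]:
    """List the image rules a scene breaks; an empty list means it conforms."""
    img = scene.get("img")
    if assay_term_id != VISIUM:
        # Non-Visium assays MUST NOT carry an img collection at all.
        return ["non-Visium scene MUST NOT include img"] if img is not None else []
    images = (img or {}).get(library_id, {})
    problems = []
    if "highres_image" not in images:
        problems.append("Visium scene MUST include highres_image")
    # fullres_image is optional (MAY), so its absence is not a violation.
    return problems
```

Returning a list of messages rather than raising on the first problem makes it possible to report every issue in a scene at once.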