SOMA – for “Stack Of Matrices, Annotated” – is a flexible, extensible, and open-source API enabling access to data in a variety of formats. The driving use case of SOMA is for single-cell data in the form of annotated matrices where observations are frequently cells and features are genes, proteins, or genomic regions.
The TileDB-SOMA package is a C++ library with APIs in Python and R, using TileDB Embedded to implement the SOMA specification.
Get started with TileDB-SOMA:
- Quick start.
- Python documentation.
- R documentation.
Designed for single-cell data, TileDB-SOMA provides Python and R APIs that support storage and data access at scale, including larger-than-memory operations:
- Create and write large volumes of data.
- Open and read data at low latency, locally and from the cloud.
- Query and access interconnected arrays efficiently and at low latency.
TileDB-SOMA provides interoperability with existing single-cell toolkits:
TileDB-SOMA provides interoperability with existing Python or R data structures:
- From Python create PyArrow objects, SciPy sparse matrices, NumPy arrays, and pandas data frames.
- From R create R Arrow objects, sparse matrices (via the Matrix package), and standard data frames and (dense) matrices.
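As a rough Python illustration of the interoperability described above, the sketch below writes a small SOMA DataFrame, reads it back as a PyArrow table, and converts it to a pandas data frame. It assumes the `tiledbsoma` and `pyarrow` packages are installed; the column names, values, and the temporary URI are made up for the example, and the import guard only lets the snippet be skipped cleanly where the packages are missing.

```python
import tempfile
from pathlib import Path

try:
    import pyarrow as pa
    import tiledbsoma
except ImportError:  # packages not installed: the sketch is skipped
    tiledbsoma = None


def obs_roundtrip(uri):
    """Write three illustrative rows, then read back a filtered Arrow table."""
    schema = pa.schema(
        [
            ("soma_joinid", pa.int64()),
            ("cell_type", pa.large_string()),  # hypothetical column
            ("n_genes", pa.int64()),           # hypothetical column
        ]
    )
    data = pa.Table.from_pydict(
        {
            "soma_joinid": [0, 1, 2],
            "cell_type": ["B cell", "T cell", "NK cell"],
            "n_genes": [50, 200, 300],
        },
        schema=schema,
    )
    # The domain gives the inclusive range of the index column.
    with tiledbsoma.DataFrame.create(
        uri, schema=schema, index_column_names=["soma_joinid"], domain=[(0, 2)]
    ) as obs:
        obs.write(data)
    # Reads yield Arrow tables in batches; concat() merges them into one table.
    with tiledbsoma.DataFrame.open(uri) as obs:
        return obs.read(value_filter="n_genes > 100").concat()


table = None
if tiledbsoma is not None:
    table = obs_roundtrip(str(Path(tempfile.mkdtemp()) / "obs"))
    print(table.to_pandas())  # Arrow table -> pandas data frame
```

Reading into Arrow first and converting afterwards keeps the choice of in-memory structure with the caller.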
- Please join the TileDB Slack community, with its dedicated #genomics channel.
- Please join the CZI Slack community, with its dedicated #cellxgene-census-users channel.
The TileDB-SOMA doc site (Python | R) contains the reference documentation and tutorials.
Reference documentation can also be accessed directly from Python with help(tiledbsoma) or from R with help(package = "tiledbsoma").
The capabilities of TileDB-SOMA rest on the different read, access, and query patterns provided by each of the main SOMA object types:
- DenseNDArray is a dense, N-dimensional array with offset (zero-based) integer indexing on each dimension.
- SparseNDArray is the same as DenseNDArray but sparse, and supports point indexing (disjoint index access).
- DataFrame is a multi-column table with user-defined column names and value types, with support for point indexing.
- Collection is a persistent container of named SOMA objects.
- Experiment is a class that represents a single-cell experiment. It always contains two objects:
  - obs: a DataFrame with primary annotations on the observation axis.
  - ms: a Collection of measurements, each composed of X matrices and axis annotation matrices or data frames (e.g. var, varm, obsm).
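To make the object hierarchy above more concrete, here is a toy, in-memory model written in plain Python. It is not the tiledbsoma API: the `Toy*` classes, the `read_rows` helper, the "RNA" measurement name, the "data" layer name, and the annotation values are all hypothetical and exist only to show how an experiment nests `obs`, `ms`, `var`, and `X`, and what point indexing on a sparse array looks like.

```python
from dataclasses import dataclass, field


@dataclass
class ToySparseNDArray:
    """Sparse 2-D array: only non-zero cells are stored, keyed by (row, col)."""

    shape: tuple
    cells: dict = field(default_factory=dict)

    def write(self, row, col, value):
        n_rows, n_cols = self.shape
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            raise IndexError(f"({row}, {col}) is outside shape {self.shape}")
        self.cells[(row, col)] = value

    def read_rows(self, rows):
        """Point indexing: fetch stored cells for a disjoint set of row indices."""
        wanted = set(rows)
        return {rc: v for rc, v in sorted(self.cells.items()) if rc[0] in wanted}


@dataclass
class ToyMeasurement:
    var: list  # feature annotations, one dict per feature
    X: dict    # layer name -> ToySparseNDArray


@dataclass
class ToyExperiment:
    obs: list  # observation annotations, one dict per observation
    ms: dict   # measurement name -> ToyMeasurement


# Three observations (cells) by two features (genes), zero-based indices.
counts = ToySparseNDArray(shape=(3, 2))
counts.write(0, 0, 5.0)
counts.write(1, 0, 2.0)
counts.write(2, 1, 1.0)

experiment = ToyExperiment(
    obs=[{"cell_type": "B cell"}, {"cell_type": "B cell"}, {"cell_type": "T cell"}],
    ms={
        "RNA": ToyMeasurement(
            var=[{"gene": "CD19"}, {"gene": "CD3E"}],
            X={"data": counts},
        )
    },
)

# Disjoint index access: rows 0 and 2, skipping row 1.
selected = experiment.ms["RNA"].X["data"].read_rows([0, 2])
print(selected)
```

The real SOMA objects are persistent and backed by TileDB arrays; the dictionaries here only mirror the naming and nesting.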
- CZ CELLxGENE Discover uses TileDB-SOMA to build its Census, which provides efficient access to, and querying of, a corpus containing nearly 50 million cells compiled from 700+ datasets.
If you are interested in listing any projects here please contact us at [email protected].
- Questions, comments, and concerns are all welcome at the GitHub new-issue page; you can also browse existing issues.
- If you believe you have found a security issue, in lieu of filing an issue please responsibly disclose it by contacting [email protected].
This branch, main, implements the updated specification. Please also see the main-old branch, which implements the original specification.
All participants in TileDB spaces are expected to adhere to high standards of professionalism in all interactions. This repository is governed by the specific standards and reporting procedures detailed in depth in the TileDB core repository Code Of Conduct.