TileDB-SOMA

SOMA – for “Stack Of Matrices, Annotated” – is a flexible, extensible, and open-source API enabling access to data in a variety of formats. The driving use case of SOMA is for single-cell data in the form of annotated matrices where observations are frequently cells and features are genes, proteins, or genomic regions.

The TileDB-SOMA package is a C++ library with APIs in Python and R, using TileDB Embedded to implement the SOMA specification.

Get started on using TileDB-SOMA:

What Can TileDB-SOMA Do?

Designed for single-cell data, TileDB-SOMA provides Python and R APIs for storage and data access at scale, including larger-than-memory operations:

Create and write large volumes of data.
Open and read data at low latency, locally and from the cloud.
Query and access interconnected arrays efficiently and at low latency.

TileDB-SOMA provides interoperability with existing single-cell toolkits:

Load and create AnnData objects.
Load and create Seurat objects.

TileDB-SOMA provides interoperability with existing Python or R data structures:

From Python create PyArrow objects, SciPy sparse matrices, NumPy arrays, and pandas data frames.
From R create R Arrow objects, sparse matrices (via the Matrix package), and standard data frames and (dense) matrices.

Community

Please join the TileDB Slack community, which has a dedicated #genomics channel.
Please join the CZI Slack community, which has a dedicated #cellxgene-census-users channel.

APIs Installation and Quick Start

API Documentation

The TileDB-SOMA doc-site (Python|R) contains the reference documentation and tutorials.

Reference documentation can also be accessed directly from Python with help(tiledbsoma) or from R with help(package = "tiledbsoma").

Main SOMA Objects

The capabilities of TileDB-SOMA lie in the read, access, and query patterns that each of the main SOMA object implementations provides:

DenseNDArray is a dense, N-dimensional array, with offset (zero-based) integer indexing on each dimension.
SparseNDArray is the same as DenseNDArray but sparse, and supports point indexing (disjoint index access).
DataFrame is a multi-column table with user-defined column names and value types, with support for point indexing.
Collection is a persistent container of named SOMA objects.
Experiment is a class that represents a single-cell experiment. It always contains two objects:
- obs: a DataFrame with primary annotations on the observation axis.
- ms: a Collection of measurements, each composed of X matrices and axis annotation matrices or data frames (e.g. var, varm, obsm, etc).
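
To make the nesting of these objects concrete, here is a minimal pure-Python sketch of the layout described above. It does not use the tiledbsoma package and is not its API: the classes ToySparseNDArray, ToyDataFrame, ToyMeasurement, and ToyExperiment are hypothetical stand-ins, and the row_id column, the "RNA" measurement name, and the "data" layer name are assumed purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToySparseNDArray:
    """Hypothetical stand-in for a sparse array: only non-zero cells are stored."""
    shape: tuple
    cells: dict = field(default_factory=dict)  # (i, j, ...) -> value

    def read_points(self, coords):
        """Point indexing: read a disjoint set of coordinates; absent cells read as 0."""
        return [self.cells.get(tuple(c), 0) for c in coords]


@dataclass
class ToyDataFrame:
    """Hypothetical stand-in for a multi-column table with a row_id key column."""
    columns: dict  # column name -> list of values

    def read_rows(self, row_ids):
        """Point indexing on rows: return the requested rows in the order asked."""
        position = {rid: pos for pos, rid in enumerate(self.columns["row_id"])}
        return [
            {name: values[position[rid]] for name, values in self.columns.items()}
            for rid in row_ids
        ]


@dataclass
class ToyMeasurement:
    """One measurement: feature annotations (var) plus named X matrices."""
    var: ToyDataFrame
    X: dict  # layer name -> ToySparseNDArray


@dataclass
class ToyExperiment:
    """An experiment always holds obs and ms."""
    obs: ToyDataFrame
    ms: dict  # measurement name -> ToyMeasurement


# Three cells (observations) and two genes (features), with two non-zero counts.
obs = ToyDataFrame({"row_id": [0, 1, 2], "cell_type": ["B cell", "T cell", "NK cell"]})
var = ToyDataFrame({"row_id": [0, 1], "gene": ["CD19", "CD3E"]})
counts = ToySparseNDArray(shape=(3, 2), cells={(0, 0): 5, (1, 1): 7})
experiment = ToyExperiment(obs=obs, ms={"RNA": ToyMeasurement(var=var, X={"data": counts})})

# Read a disjoint selection of observations and matrix cells.
selected_obs = experiment.obs.read_rows([2, 0])
selected_counts = experiment.ms["RNA"].X["data"].read_points([(0, 0), (2, 1)])
```

The sketch keeps everything in memory. The actual library persists these objects and exposes its own read methods, so consult the reference documentation for the real calls.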

Who Is Using SOMA?

CZ CELLxGENE Discover uses SOMA to build its Census, which provides efficient access to and querying of a corpus containing nearly 50 million cells, compiled from 700+ datasets.

If you are interested in listing a project here, please contact us at [email protected].

Issues and Contacts

Questions, comments, and concerns are all welcome at the GitHub new-issue page; you can also browse existing issues.
If you believe you have found a security issue, in lieu of filing an issue please responsibly disclose it by contacting [email protected].

Branches

This branch, main, implements the updated specification. Please also see the main-old branch, which implements the original specification.

Developer Information

Code of Conduct

All participants in TileDB spaces are expected to adhere to high standards of professionalism in all interactions. This repository is governed by the specific standards and reporting procedures detailed in depth in the TileDB core repository Code Of Conduct.

