Merge pull request #3 from imglib/remotes/origin/main
Updated blogposts added 20240227
nornil authored Apr 25, 2024
2 parents 23bdf23 + 9c3a521 commit 5fe8d3e
Showing 7 changed files with 495 additions and 271 deletions.
149 changes: 65 additions & 84 deletions blog/2022-05-02-juliaset-lambda/2022-05-02-juliaset-lambda.ipynb

Large diffs are not rendered by default.

@@ -1,14 +1,26 @@
---
title: "Setup the IJava jupyter kernel"
author: "Stephan Saalfeld"
date: "2022-06-05"
categories: [ jupyter, ijava, jshell , java , kernel]
execute:
echo: true
aliases:
- /jupyter/ijava/jshell/java/kernel/2024/04/03/setup-ijava-jupyter-kernel
author: Stephan Saalfeld
badges: true
branch: master
categories:
- jupyter
- ijava
- jshell
- java
- kernel
date: '2022-06-05'
date-modified: '2024-04-03'
description: Follow these instructions to set up the IJava jupyter kernel by Spencer
Park.
layout: post
title: Setup the IJava jupyter kernel
toc: false

---

In this blog, we will show code snippets and examples to make the best use of [ImgLib2](https://github.com/imglib/imglib2), [BigDataViewer](https://github.com/bigdataviewer/bigdataviewer-core), and friends. ImgLib2 is written to be fast and we will run code that needs to be compiled, so we cannot use any of the various interpreted scripting languages like Python, Groovy, or JavaScript. Instead, we will use the [JShell tool](https://docs.oracle.com/javase/9/jshell/introduction-jshell.htm#JSHEL-GUID-630F27C8-1195-4989-9F6B-2C51D46F52C8) that you can use directly in a terminal or through [Spencer Park's IJava jupyter kernel](https://github.com/SpencerPark/IJava). You can also follow these tutorials in your own Java project and use your preferred IDE, but Jupyter notebooks are a great teaching tool. Since Jupyter is written in Python and most popular with the Python community, let's follow their ways and first create a virtual environment with conda. The lack of version-controlled dependency management for Python projects makes it necessary that practically every project must run in a container or virtual environment because the dependencies of different projects almost inevitably collide. Conda is the most popular of several attempts to address this situation. Conda cannot currently be installed from the default Ubuntu repositories, so much for that, but the [installation instructions](https://docs.conda.io/projects/conda/en/latest/user-guide/install/rpm-debian.html) are tolerable: there is a PPA. Now let's create an environment for jupyter:
In this blog, we will show code snippets and examples to make the best use of [ImgLib2](https://github.com/imglib/imglib2), [BigDataViewer](https://github.com/bigdataviewer/bigdataviewer-core), and friends. ImgLib2 is written to be fast and we will run code that needs to be compiled, so we cannot use any of the various interpreted scripting languages like Python, Groovy, or JavaScript. Instead, we will use the [JShell tool](https://docs.oracle.com/javase/9/jshell/introduction-jshell.htm#JSHEL-GUID-630F27C8-1195-4989-9F6B-2C51D46F52C8) that you can use directly in a terminal or through [Spencer Park's IJava jupyter kernel](https://github.com/saalfeldlab/IJava). You can also follow these tutorials in your own Java project and use your preferred IDE, but Jupyter notebooks are a great teaching tool. Since Jupyter is written in Python and most popular with the Python community, let's follow their ways and first create a virtual environment with conda. The lack of version-controlled dependency management for Python projects makes it necessary that practically every project must run in a container or virtual environment because the dependencies of different projects almost inevitably collide. Conda is the most popular of several attempts to address this situation. Conda cannot currently be installed from the default Ubuntu repositories, so much for that, but the [installation instructions](https://docs.conda.io/projects/conda/en/latest/user-guide/install/rpm-debian.html) are tolerable: there is a PPA. Now let's create an environment for jupyter:

```
conda create -n jshell-jupyter python=3
@@ -32,9 +44,8 @@ git checkout try-upgrade-gradle
./gradlew publishToMavenLocal
cd ..
git clone https://github.com/hanslovsky/IJava.git
git clone https://github.com/saalfeldlab/IJava.git
cd IJava/
git checkout hanslovsky/gradle-7.4.2
./gradlew installKernel
```

@@ -54,7 +65,7 @@ You can now start the jupyter notebook server
jupyter notebook --kernel=java
```

And experiment with the examples. [Spencer Park's IJava jupyter kernel](https://github.com/SpencerPark/IJava) makes it very easy to include dependencies. You can include the relevant snippets from a Maven POM into a tagged code block, e.g.
And experiment with the examples. [Spencer Park's IJava jupyter kernel](https://github.com/saalfeldlab/IJava) makes it very easy to include dependencies. You can include the relevant snippets from a Maven POM into a tagged code block, e.g.

```xml
%%loadFromPOM
@@ -69,6 +80,13 @@ And experiment with the examples. [Spencer Park's IJava jupyter kernel](https:/
</dependency>
```

or in gradle short notation

```
%mavenRepo scijava.public https://maven.scijava.org/content/groups/public
%maven sc.fiji:bigdataviewer-vistools:1.0.0-beta-29
```

If you prefer to run [JShell](https://docs.oracle.com/javase/9/jshell/introduction-jshell.htm#JSHEL-GUID-630F27C8-1195-4989-9F6B-2C51D46F52C8) directly, you can pull in the dependencies from a complete Maven POM with John Pooth's Maven Jshell plugin

```
@@ -110,4 +128,3 @@ jupyter kernelspec list
```

Done.

261 changes: 113 additions & 148 deletions blog/2022-09-27-n5-imglib2/2022-09-27-n5-imglib2.ipynb

Large diffs are not rendered by default.

Empty file.
289 changes: 289 additions & 0 deletions blog/2024-02-27-n5-tutorial-basic/index.qmd
@@ -0,0 +1,289 @@
---
title: "N5 API Basics"
description: "Basics of the N5 API for Java developers. This tutorial shows how to read and write n-dimensional image data and structured metadata into HDF5, N5, and Zarr containers using the N5 API."
author:
- name: John Bogovic
- name: Caleb Hulbert
date: "2/27/2024"
date-modified: "4/23/2024"
notebook-links: global
image: n5-basic-tutorial-thumbnail.png
categories:
- hdf5
- n5
- zarr
- imglib2
- tutorial
format:
html:
toc: true
---

This tutorial for Java developers covers the most basic functionality of the [N5 API](https://github.com/saalfeldlab/n5)
for storing large, chunked n-dimensional image data and structured metadata. The N5 API and documentation refer to n-dimensional images as
"datasets", [terminology inherited from HDF5](https://docs.hdfgroup.org/hdf5/develop/_g_l_s.html#title3). We will use this terminology in this tutorial.
If you are used to working with Python and NumPy, an n-dimensional image or dataset is what you know as an [`ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html).
We will learn about:

* creating readers and writers
* modifying and inspecting the hierarchy ("folder structure")
* saving and loading datasets
* saving and loading metadata

## Readers and writers

[`N5Reader`](https://github.com/saalfeldlab/n5/blob/n5-3.2.0/src/main/java/org/janelia/saalfeldlab/n5/N5Reader.java)s and
[`N5Writer`](https://github.com/saalfeldlab/n5/blob/n5-3.2.0/src/main/java/org/janelia/saalfeldlab/n5/N5Writer.java)s form
the basis of the N5 API and allow you to read and write data, respectively. We generally recommend using an
[`N5Factory`](https://github.com/saalfeldlab/n5-universe/blob/n5-universe-1.4.2/src/main/java/org/janelia/saalfeldlab/n5/universe/N5Factory.java) to create readers and writers:

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#make-reader-writer echo=true >}}

The N5 API gives you access to a number of different storage formats: HDF5, Zarr, and N5's own
format. `N5Factory`'s convenience methods try to infer the storage format from the extension
of the path you provide:

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#factory-types echo=true >}}
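
To make the idea of extension-based inference concrete, here is a minimal sketch in plain Java. It is a toy model, not `N5Factory`'s implementation: the class `StorageFormatGuess`, the enum `Format`, and the extension table (`.h5`, `.hdf5`, `.zarr`, `.n5`) are assumptions made for illustration, and the real factory may recognize other extensions or behave differently when it cannot decide.

```java
import java.util.Locale;

/** Toy model of extension-based format detection (not N5Factory's code). */
final class StorageFormatGuess {

    enum Format { HDF5, ZARR, N5 }

    /** Guesses a format from the path's extension; the table is an assumption. */
    static Format guess(String path) {
        // Lower-case and drop trailing slashes so "data.zarr/" matches ".zarr".
        String p = path.toLowerCase(Locale.ROOT).replaceAll("/+$", "");
        if (p.endsWith(".h5") || p.endsWith(".hdf5")) return Format.HDF5;
        if (p.endsWith(".zarr")) return Format.ZARR;
        if (p.endsWith(".n5")) return Format.N5;
        // Assumption: this sketch refuses to guess for unknown extensions.
        throw new IllegalArgumentException("Cannot infer storage format from: " + path);
    }

    public static void main(String[] args) {
        for (String path : new String[] {"/tmp/a.h5", "/tmp/a.zarr/", "/tmp/a.n5"}) {
            System.out.println(path + " -> " + guess(path));
        }
    }
}
```

The practical takeaway is that the extension you choose for a container path matters when you rely on the convenience methods.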

In fact, it is possible to read with `N5Writer`s since every `N5Writer`
is also an `N5Reader`, so from now on we'll just be using the
`n5Writer`.

::: {.callout-tip}
## Try it!

We use the N5 storage format for the rest of the tutorial, but it will work just as well over either
an HDF5 file or Zarr container.
:::

## Groups

N5 containers form hierarchies of *groups* - think "nested folders on your file system."
It's easy to create groups and test if they exist:

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#make-groups echo=true >}}

The `list` method lists groups that are children of the given group:

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#list echo=true >}}

and `deepList` recursively lists every descendant of the given group:

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#deep-list echo=true >}}

Notice that these methods *only* give information about what groups are
present and do not provide information about metadata or datasets.
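
If the folder analogy helps, the following sketch reproduces the two kinds of listing on an ordinary directory tree using only `java.nio.file`. The class `FolderAnalogy` and its methods are hypothetical and do not call the N5 API; `children` plays the role of `list` and `descendants` the role of `deepList`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** File-system analogy for group listing; plain java.nio, no N5 calls. */
final class FolderAnalogy {

    /** Creates nested folders in a temp directory and returns the root. */
    static Path makeExample() {
        try {
            Path root = Files.createTempDirectory("folder-analogy");
            Files.createDirectories(root.resolve("foo").resolve("bar"));
            Files.createDirectories(root.resolve("lorum").resolve("ipsum").resolve("dolor"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Direct child folders only, sorted by name. */
    static List<String> children(Path group) {
        try (Stream<Path> paths = Files.list(group)) {
            return paths.filter(Files::isDirectory)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Every descendant folder as a '/'-separated relative path, sorted. */
    static List<String> descendants(Path group) {
        try (Stream<Path> paths = Files.walk(group)) {
            return paths.filter(Files::isDirectory)
                    .filter(p -> !p.equals(group)) // skip the starting folder itself
                    .map(p -> group.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = makeExample();
        System.out.println(children(root));
        System.out.println(descendants(root));
    }
}
```

As with the N5 methods, the result only names the folders; it says nothing about what is stored inside them.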

::: {.callout-note}
Some storage and access systems (for example, AWS S3) separate permissions for reading and listing, meaning
it may be possible to access data but not list it.
:::

## Datasets

N5 stores datasets (n-dimensional arrays) in particular groups in the hierarchy.

::: {.callout-warning}
Datasets must be terminal (leaf) nodes in the container hierarchy - i.e. a dataset cannot contain
another group or dataset. (Is this strictly true? May be confusing with names like multiscale "datasets")
:::

We recommend using code from [n5-ij](https://github.com/saalfeldlab/n5-ij) or [n5-imglib2](https://github.com/saalfeldlab/n5-imglib2)
to write datasets. The examples in this post will use the latter.

The [`N5Utils`](https://github.com/saalfeldlab/n5-imglib2/blob/241dc2b503d01007ec6aec72dacecc9706f023ab/src/main/java/org/janelia/saalfeldlab/n5/imglib2/N5Utils.java)
class in n5-imglib2 has many useful methods, but in this post, we'll cover simple methods for reading and writing. First,
[`N5Utils.save`](https://github.com/saalfeldlab/n5-imglib2/blob/241dc2b503d01007ec6aec72dacecc9706f023ab/src/main/java/org/janelia/saalfeldlab/n5/imglib2/N5Utils.java#L1440)
writes a dataset and required metadata to the container at a group that you specify. The group will be created if it does
not already exist. The parameters will be discussed in more detail below.

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#n5-imglib2-save echo=true >}}

You can write in parallel by providing an [`ExecutorService`](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html) to this variant of
[`N5Utils.save`](https://github.com/saalfeldlab/n5-imglib2/blob/241dc2b503d01007ec6aec72dacecc9706f023ab/src/main/java/org/janelia/saalfeldlab/n5/imglib2/N5Utils.java#L1514)

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#n5-imglib2-save-exec echo=true >}}
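
The general pattern behind block-parallel writing can be sketched with a plain `ExecutorService`: submit one task per block, then wait for all of them. This is an assumption about the overall shape, not the code inside `N5Utils.save`; `ParallelBlockSketch` and its methods are hypothetical, and the "write" is a stand-in that only counts blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of block-parallel writing; the "write" only counts blocks. */
final class ParallelBlockSketch {

    /** Submits one task per 2D block, waits for all, returns blocks handled. */
    static int processBlocks(long[] dims, int[] blockSize, ExecutorService exec) {
        AtomicInteger handled = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (long y = 0; y < dims[1]; y += blockSize[1]) {
            for (long x = 0; x < dims[0]; x += blockSize[0]) {
                // A real writer would serialize the pixels of this block here.
                futures.add(exec.submit(() -> { handled.incrementAndGet(); }));
            }
        }
        try {
            for (Future<?> f : futures) {
                f.get(); // block until this task is done, surfacing failures
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return handled.get();
    }

    /** Convenience wrapper that owns the thread pool and shuts it down. */
    static int run(long[] dims, int[] blockSize, int numThreads) {
        ExecutorService exec = Executors.newFixedThreadPool(numThreads);
        try {
            return processBlocks(dims, blockSize, exec);
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(new long[] {64, 64}, new int[] {32, 32}, 4));
    }
}
```

Whoever creates the executor is responsible for shutting it down, which is why the wrapper uses `try`/`finally`.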

Reading the dataset from the container is also easy with
[`N5Utils.open`](https://github.com/saalfeldlab/n5-imglib2/blob/241dc2b503d01007ec6aec72dacecc9706f023ab/src/main/java/org/janelia/saalfeldlab/n5/imglib2/N5Utils.java#L428) :

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#n5-imglib2-open echo=true >}}

::: {.callout-warning}

## Overwriting data is possible

This save method *DOES NOT* perform any checks prior to writing data and will overwrite data that exists in the specified location.
Be sure to check and take appropriate action if it is possible that data could already be at a particular location and
container to avoid data loss or corruption.
:::

This example shows that data can be overwritten:

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#n5-imglib2-overwrite echo=true >}}

### Parameter details

#### `groupPath`

is the location inside the container that will store the dataset. You can store a dataset at the
root of a container by specifying `""` or `"/"` as the `groupPath`. In this case, the container
will only be able to store one dataset ([see the warning above](#datasets)).

#### `blockSize`

is a very important parameter. HDF5, N5, and Zarr all break up the datasets they store
into equally sized blocks or "chunks". The block size parameter specifies the size of these blocks.

For the example above, we stored an image of size `64 x 64` using blocks sized `32 x 32`. As a result, N5 uses
four blocks to store the entire image:

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#four-blocks echo=true >}}

*Quiz:* How many blocks would there be if the block size was `64 x 8`?

<details>
<summary>Click here to show the answer.</summary>

There would be eight blocks.

One block covers the first dimension, but it takes 8 blocks to cover the second dimension ($8 \times 8 = 64$).
Also demonstrated by the code below:

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#eight-blocks echo=true >}}

</details>
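
The block counts discussed above follow from a ceiling division per dimension, multiplied across dimensions. The helper below is a hypothetical `BlockGrid` class written for this post, not part of the N5 API; it counts a partially filled block at the edge of the image as one block.

```java
/** Hypothetical helper that computes how many blocks cover an image. */
final class BlockGrid {

    /** Blocks along each dimension: ceil(dimension / blockSize). */
    static long[] gridDimensions(long[] dims, int[] blockSize) {
        long[] grid = new long[dims.length];
        for (int d = 0; d < dims.length; d++) {
            // Integer ceiling division without floating point.
            grid[d] = (dims[d] + blockSize[d] - 1) / blockSize[d];
        }
        return grid;
    }

    /** Total number of blocks: product of the per-dimension counts. */
    static long numBlocks(long[] dims, int[] blockSize) {
        long n = 1;
        for (long g : gridDimensions(dims, blockSize)) {
            n *= g;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(numBlocks(new long[] {64, 64}, new int[] {32, 32}));  // 2 * 2
        System.out.println(numBlocks(new long[] {64, 64}, new int[] {64, 8}));   // 1 * 8
        System.out.println(numBlocks(new long[] {100, 70}, new int[] {32, 32})); // 4 * 3
    }
}
```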

::: {.callout-tip}
## Try it!

N5 lets you store your image in a single file if you want - just provide a block size that
is equal to or larger than the image size.
:::

#### `compression`

Each block is compressed independently, using the specified compression.
Use [`RawCompression`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/RawCompression.java)
to store blocks without compression.

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#no-compression echo=true >}}

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#no-compression-blk-sizes echo=true >}}

Notice that blocks were previously ~1700-2000 bytes and are now ~4100 without compression.
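
To get a feel for the size difference, the sketch below serializes one block and compresses it on its own. It uses `java.util.zip.GZIPOutputStream` from the JDK rather than N5's compression classes, and the block contents (a 32 x 32 ramp of 4-byte integers, 4096 bytes raw) are an assumption chosen for illustration. Sizes on disk will differ from these numbers, since they depend on the image content and on whatever header bytes the format stores with each block.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.zip.GZIPOutputStream;

/** Compares the raw and gzip-compressed size of a single synthetic block. */
final class BlockCompressionSketch {

    /** Serializes a 32 x 32 block of ints holding a smooth ramp (4096 bytes). */
    static byte[] rawBlock() {
        ByteBuffer buf = ByteBuffer.allocate(32 * 32 * Integer.BYTES);
        for (int y = 0; y < 32; y++) {
            for (int x = 0; x < 32; x++) {
                buf.putInt(x + y);
            }
        }
        return buf.array();
    }

    /** Compresses one block independently of any other block. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the gzip trailer has been written.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = rawBlock();
        System.out.println("raw bytes:  " + raw.length);
        System.out.println("gzip bytes: " + gzip(raw).length);
    }
}
```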

The available compression options at the time of this writing are:

* [`BloscCompression`](https://github.com/saalfeldlab/n5-blosc/blob/n5-blosc-1.1.1/src/main/java/org/janelia/saalfeldlab/n5/blosc/BloscCompression.java)
* [`Bzip2Compression`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/Bzip2Compression.java)
* [`GzipCompression`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/GzipCompression.java)
* [`Lz4Compression`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/Lz4Compression.java)
* [`RawCompression`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/RawCompression.java)
* [`XzCompression`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/XzCompression.java)
* [`ZstandardCompression`](https://github.com/JaneliaSciComp/n5-zstandard/blob/n5-zstandard-1.0.2/src/main/java/org/janelia/scicomp/n5/zstandard/ZstandardCompression.java)

## Metadata

N5 can also store rich structured metadata in addition to array data. This tutorial will discuss basic, low-level metadata operations.
Advanced operations and metadata standards may be described in a future tutorial.

### Basics

`N5Writer`s have a
[`setAttribute`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/N5Writer.java#L55)
method for writing metadata to the storage backend. It takes three arguments:

```java
<T> void setAttribute(String groupPath, String attributePath, T attribute)
```

* `groupPath` : the group in which to store this metadata
* `attributePath` : the name of this attribute
* `attribute` : the metadata attribute to be stored. Can be an arbitrary type (denoted `T`).

::: {.callout-note}
There are differences between an attribute "name" and an attribute "path", but attribute "paths" are an advanced topic
and will be covered elsewhere.
:::

Similarly, `N5Reader`s have a
[`getAttribute`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/N5Reader.java#L241-L244)
method:

```java
<T> T getAttribute(String groupPath, String attributePath, Class<T> clazz)
```

The last argument (`Class<T>`) lets you specify the type that `getAttribute` should return.
An `N5Exception` will be thrown if the requested type cannot be created from the requested attribute.
If an attribute does not exist, `null` will be returned (see the last example of this section).
Consider these examples:

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#attributes-1 echo=true >}}
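
The contract described above (a value of the requested type, `null` for a missing attribute, an exception on a type mismatch) can be modeled with a small in-memory store. This is a toy written for this post: `ToyAttributes` is hypothetical, it keeps plain Java objects in a map instead of JSON, it has no group path, and it throws `IllegalArgumentException` where N5 would throw an `N5Exception`.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory attribute store that mimics the get/set contract. */
final class ToyAttributes {

    private final Map<String, Object> values = new HashMap<>();

    /** Stores a value of arbitrary type under the given name. */
    <T> void setAttribute(String name, T value) {
        values.put(name, value);
    }

    /** Returns null if missing; throws if the value is not of the requested type. */
    <T> T getAttribute(String name, Class<T> clazz) {
        Object value = values.get(name);
        if (value == null) {
            return null; // missing attributes are not an error
        }
        if (!clazz.isInstance(value)) {
            throw new IllegalArgumentException(
                    "Attribute '" + name + "' is not a " + clazz.getSimpleName());
        }
        return clazz.cast(value);
    }

    public static void main(String[] args) {
        ToyAttributes attrs = new ToyAttributes();
        attrs.setAttribute("answer", 42);
        System.out.println(attrs.getAttribute("answer", Integer.class));
        System.out.println(attrs.getAttribute("missing", String.class));
    }
}
```

Because a missing attribute yields `null` rather than an exception, callers should check the result before using it.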

Sometimes it is possible to interpret an attribute as multiple different types:

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#attr-types echo=true >}}

### Rich metadata

It is possible to save attributes of arbitrary types, enabling you to structure your
metadata into classes that are easy to save and load directly. For example, if we define a metadata class `FunWithMetadata`:

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#fun-with-metadata echo=true >}}

then make an instance and save it:

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#rich-metadata echo=true >}}

To retrieve all the metadata in a group as JSON:

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#all-metadata echo=true >}}

### Removing metadata

You can remove attributes by their name as well. To return the element that was removed, just provide the class for that element
(this mirrors the [remove method](https://docs.oracle.com/javase/8/docs/api/java/util/List.html#remove-int-) for `List`s in Java).

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#remove-attrs echo=true >}}
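
For comparison, this is the `List` behavior that attribute removal mirrors: `List.remove(int)` deletes the element and hands it back to the caller. The class `ListRemoveDemo` is only an illustration and uses the JDK collections, not the N5 API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Shows that List.remove(int) returns the element it removed. */
final class ListRemoveDemo {

    /** Removes and returns the first element of the list. */
    static String removeFirst(List<String> list) {
        return list.remove(0);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("a", "b", "c"));
        String removed = removeFirst(names);
        System.out.println("removed:   " + removed);
        System.out.println("remaining: " + names);
    }
}
```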

### Working with Dataset Metadata

Metadata that describes datasets can be read and written like any other metadata.
However, there are special [`DatasetAttributes`](https://github.com/saalfeldlab/n5/blob/8e14d529276b57e1817ff21df9cac9fb1a517d59/src/main/java/org/janelia/saalfeldlab/n5/DatasetAttributes.java)
methods to safely work with dataset metadata.
[`N5Reader.getDatasetAttributes`](https://github.com/saalfeldlab/n5/blob/8e14d529276b57e1817ff21df9cac9fb1a517d59/src/main/java/org/janelia/saalfeldlab/n5/N5Reader.java#L276) and
[`N5Writer.setDatasetAttributes`](https://github.com/saalfeldlab/n5/blob/8e14d529276b57e1817ff21df9cac9fb1a517d59/src/main/java/org/janelia/saalfeldlab/n5/N5Writer.java#L134)
ensure the metadata is always a valid representation of dataset metadata.
Setting `DatasetAttributes`, however, should only be done when the dataset is initially saved. This ensures that the required metadata is tightly coupled with the data.
For example, `set`ting dataset metadata should be done through the
[N5Writer.createDataset](https://github.com/saalfeldlab/n5/blob/8e14d529276b57e1817ff21df9cac9fb1a517d59/src/main/java/org/janelia/saalfeldlab/n5/N5Writer.java#L200)
methods (or indirectly through the `N5Utils.save` [methods mentioned above](#datasets)).

{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#array-metadata echo=true >}}

::: {.callout-warning}
## Warning

The attributes that N5 uses to read datasets can be set with `setAttribute`, and modifying them could corrupt your data.
**Do not manually set these attributes unless you absolutely know what you're doing!**

* `dimensions`
* `blockSize`
* `dataType`
* `compression`

The attributes that describe datasets are also accessible using `getAttribute`, try running:

```java
n5Writer.getAttribute("data", "dimensions", long[].class);
```

though using `getDatasetAttributes().getDimensions()` is generally recommended.
:::

## What to try next

* [How to work with the N5 API and ImgLib2](https://imglib.github.io/imglib2-blog/posts/2022-09-27-n5-imglib2.html)

28 changes: 0 additions & 28 deletions ecosystem/mastodon-sc_mastodon/index.qmd

This file was deleted.
