initial prefetch for simple single chunked dim #161
base: main
Conversation
In another attempt to simplify an example and profile transferring data from multiple workers to a single worker (where an ML task would iterate over batches), I have created this example:
@jhamman @maxrjones this is sort of the approach I am considering developing. I think 2 Gbps should be fine, but I was able to get 8+ Gbps with https://github.com/NVlabs/tensorcom using basic k8s pods and a manifest, which uses msgpack with pyzmq. I am trying to avoid using that and stick with the dask mechanics, but I am tempted to mock up a quick profile script that uses zmq to bypass dask entirely, while still running within dask tasks. This all might not belong in xbatcher, but I wanted to put it out there to get any feedback people might have.
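A rough sketch of what that zmq-within-dask-tasks profile could look like, purely as an illustration: the `zmq_sender`/`zmq_receiver` functions, the port, and the worker addresses are all hypothetical, and the payload is random data rather than real batches.

```python
# Hypothetical profile sketch: move raw numpy bytes over a ZeroMQ PUSH/PULL
# socket between two dask tasks, bypassing dask's own comms for the payload.
import time

import numpy as np
import zmq
from dask.distributed import Client


def zmq_sender(address: str, n_batches: int, shape=(32, 256, 256)) -> int:
    """Bind a PUSH socket and stream `n_batches` random float32 arrays."""
    sock = zmq.Context.instance().socket(zmq.PUSH)
    sock.bind(address)
    for _ in range(n_batches):
        batch = np.random.random(shape).astype("float32")
        sock.send(batch.tobytes(), copy=False)
    sock.close()
    return n_batches


def zmq_receiver(address: str, n_batches: int) -> float:
    """Connect a PULL socket and report observed throughput in Gbps."""
    sock = zmq.Context.instance().socket(zmq.PULL)
    sock.connect(address)
    start, total_bytes = time.perf_counter(), 0
    for _ in range(n_batches):
        total_bytes += len(sock.recv())
    sock.close()
    return total_bytes * 8 / (time.perf_counter() - start) / 1e9


if __name__ == "__main__":
    client = Client()  # or point at the helm-deployed scheduler
    # Worker addresses/hostnames below are placeholders; each task is pinned
    # to a specific worker so the transfer crosses the network.
    send = client.submit(zmq_sender, "tcp://0.0.0.0:5555", 100,
                         workers="tcp://worker-a:40000")
    recv = client.submit(zmq_receiver, "tcp://worker-a:5555", 100,
                         workers="tcp://worker-b:40000")
    print(f"{recv.result():.2f} Gbps observed,", send.result(), "batches sent")
```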
Here is an example of using the prefetch generator with tf.data.Dataset
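The general pattern looks roughly like the sketch below; the `batch_generator` stand-in and the batch shapes are illustrative, not the PR's actual API.

```python
# Illustrative only: wrap a generator of numpy batches in a tf.data.Dataset.
# A real pipeline would yield batches from xbatcher's prefetching generator.
import numpy as np
import tensorflow as tf


def batch_generator():
    """Stand-in for the dask-backed prefetch generator; yields numpy batches."""
    for _ in range(10):
        yield np.random.random((32, 64, 64)).astype("float32")


ds = tf.data.Dataset.from_generator(
    batch_generator,
    output_signature=tf.TensorSpec(shape=(32, 64, 64), dtype=tf.float32),
)
# tf.data can layer its own pipelining on top of the dask-side prefetch.
ds = ds.prefetch(tf.data.AUTOTUNE)

for batch in ds.take(2):
    print(batch.shape)
```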
Can the test at the bottom be wrapped as a function? I'm guessing it's not supposed to run for everyone.
@cmdupuis3 I am not sure I understand what you are asking by "wrapped as a function". Do you mean being able to submit it to dask? The BatchGenerator should be available on this branch if you check it out and install it in editable mode.
Actually, I think I was confused. I read
POC Prefetch Generator:
This is a draft PR to articulate one possible approach to "prefetching" dask arrays or xarray arrays backed by dask.
The goals were to simultaneously:
- as_completed
I also tried one approach using a Queue on the workers. This felt weird, and I found myself reinventing features that dask already has.
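To make the as_completed idea concrete, here is a hedged sketch of the general pattern: submit a bounded window of batch-loading tasks and backfill as futures complete. `load_batch`, `prefetch_batches`, and `depth` are illustrative names, not the API in this PR.

```python
# Illustrative sketch of bounded prefetching with dask's as_completed.
import numpy as np
from dask.distributed import Client, as_completed


def load_batch(i: int) -> np.ndarray:
    """Stand-in for slicing one batch out of a chunked xarray/dask array."""
    return np.random.random((32, 64, 64)).astype("float32")


def _take(iterator, n):
    """Pull up to `n` items from an iterator."""
    return [item for _, item in zip(range(n), iterator)]


def prefetch_batches(client: Client, n_batches: int, depth: int = 4):
    """Yield batches while keeping at most `depth` load tasks in flight."""
    batch_ids = iter(range(n_batches))
    in_flight = as_completed(
        [client.submit(load_batch, i) for i in _take(batch_ids, depth)]
    )
    for future in in_flight:
        yield future.result()
        for i in _take(batch_ids, 1):  # backfill one task per completed batch
            in_flight.add(client.submit(load_batch, i))


if __name__ == "__main__":
    client = Client()  # local cluster; swap in the helm-deployed scheduler
    for batch in prefetch_batches(client, n_batches=16):
        pass  # a training loop would consume `batch` here
```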
Results
Using Helm to deploy a cluster on Kubernetes with 8 workers (4 CPUs and 16 GB each, with relatively standard network configurations), I am able to see:
What Next?
No clue. I would like to investigate
note: