initial prefetch for simple single chunked dim #161
base: main
Conversation
In another attempt to simplify an example and profile transferring data from multiple workers to a single worker (where an ML task would iterate over batches), I have created this example:
@jhamman @maxrjones this is sort of the approach I am considering developing. I think 2 Gbps should be fine, but I was able to get 8+ Gbps with https://github.com/NVlabs/tensorcom using basic k8s pods and a manifest, which uses msgpack with pyzmq. I am trying to avoid using that and stick with the dask mechanics, but I am tempted to mock up a quick profile script that uses zmq to bypass dask entirely, while still running within dask tasks. This all might not belong in xbatcher, but I wanted to put it out there to get any feedback people might have.
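A rough sketch of what that zmq-within-dask-tasks profile could look like, purely as an illustration: the `zmq_sender`/`zmq_receiver` functions, the port, and the worker addresses are all hypothetical, and the payload is random data rather than real batches.

```python
# Hypothetical profile sketch: move raw numpy bytes over a ZeroMQ PUSH/PULL
# socket between two dask tasks, bypassing dask's own comms for the payload.
import time

import numpy as np
import zmq
from dask.distributed import Client


def zmq_sender(address: str, n_batches: int, shape=(32, 256, 256)) -> int:
    """Bind a PUSH socket and stream `n_batches` random float32 arrays."""
    sock = zmq.Context.instance().socket(zmq.PUSH)
    sock.bind(address)
    for _ in range(n_batches):
        batch = np.random.random(shape).astype("float32")
        sock.send(batch.tobytes(), copy=False)
    sock.close()
    return n_batches


def zmq_receiver(address: str, n_batches: int) -> float:
    """Connect a PULL socket and report observed throughput in Gbps."""
    sock = zmq.Context.instance().socket(zmq.PULL)
    sock.connect(address)
    start, total_bytes = time.perf_counter(), 0
    for _ in range(n_batches):
        total_bytes += len(sock.recv())
    sock.close()
    return total_bytes * 8 / (time.perf_counter() - start) / 1e9


if __name__ == "__main__":
    client = Client()  # or point at the helm-deployed scheduler
    # Worker addresses/hostnames below are placeholders; each task is pinned
    # to a specific worker so the transfer crosses the network.
    send = client.submit(zmq_sender, "tcp://0.0.0.0:5555", 100,
                         workers="tcp://worker-a:40000")
    recv = client.submit(zmq_receiver, "tcp://worker-a:5555", 100,
                         workers="tcp://worker-b:40000")
    print(f"{recv.result():.2f} Gbps observed,", send.result(), "batches sent")
```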
Here is an example of using the prefetch generator with tf.data.Dataset
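The general pattern looks roughly like the sketch below; the `batch_generator` stand-in and the batch shapes are illustrative, not the PR's actual API.

```python
# Illustrative only: wrap a generator of numpy batches in a tf.data.Dataset.
# A real pipeline would yield batches from xbatcher's prefetching generator.
import numpy as np
import tensorflow as tf


def batch_generator():
    """Stand-in for the dask-backed prefetch generator; yields numpy batches."""
    for _ in range(10):
        yield np.random.random((32, 64, 64)).astype("float32")


ds = tf.data.Dataset.from_generator(
    batch_generator,
    output_signature=tf.TensorSpec(shape=(32, 64, 64), dtype=tf.float32),
)
# tf.data can layer its own pipelining on top of the dask-side prefetch.
ds = ds.prefetch(tf.data.AUTOTUNE)

for batch in ds.take(2):
    print(batch.shape)
```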
Can the test at the bottom be wrapped as a function? I'm guessing it's not supposed to run for everyone.
@cmdupuis3 I am not sure I understand what you are asking by "wrapped as a function". Do you mean being able to submit it to dask? The BatchGenerator should be available on this branch if you check it out and install it in editable mode.
Actually, I think I was confused. I read
POC Prefetch Generator:
This is a draft PR to articulate one possible approach to "prefetching" dask arrays or xarray arrays backed by dask.
The goals were to simultaneously:
- as_completed
I also tried one approach using a Queue on the workers. This felt weird, and I found myself reinventing features that dask already has.
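To make the as_completed idea concrete, here is a hedged sketch of the general pattern: submit a bounded window of batch-loading tasks and backfill as futures complete. `load_batch`, `prefetch_batches`, and `depth` are illustrative names, not the API in this PR.

```python
# Illustrative sketch of bounded prefetching with dask's as_completed.
import numpy as np
from dask.distributed import Client, as_completed


def load_batch(i: int) -> np.ndarray:
    """Stand-in for slicing one batch out of a chunked xarray/dask array."""
    return np.random.random((32, 64, 64)).astype("float32")


def _take(iterator, n):
    """Pull up to `n` items from an iterator."""
    return [item for _, item in zip(range(n), iterator)]


def prefetch_batches(client: Client, n_batches: int, depth: int = 4):
    """Yield batches while keeping at most `depth` load tasks in flight."""
    batch_ids = iter(range(n_batches))
    in_flight = as_completed(
        [client.submit(load_batch, i) for i in _take(batch_ids, depth)]
    )
    for future in in_flight:
        yield future.result()
        for i in _take(batch_ids, 1):  # backfill one task per completed batch
            in_flight.add(client.submit(load_batch, i))


if __name__ == "__main__":
    client = Client()  # local cluster; swap in the helm-deployed scheduler
    for batch in prefetch_batches(client, n_batches=16):
        pass  # a training loop would consume `batch` here
```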
Results
Using Helm to deploy a cluster on Kubernetes with 8 workers (4 CPUs and 16 GB each, with relatively standard network configurations), I am able to see:
What Next?
No clue. I would like to investigate
note: