Shared initialization step? #8

simonbyrne · 2020-07-31T03:25:28Z

In our old slurmci we have a separate init job that installs the packages and does a round of precompilation. This is less of an issue with Buildkite, since agents have a cache they can reuse (reducing the need to reinstall packages), but if we wanted to do something similar we would need a way to share the initialized cache between the downstream agents.

simonbyrne · 2020-07-31T19:02:18Z

One idea from the JuliaCon BoF on Julia in Production: build a system image of all the dependencies, and invalidate it only if the Manifest changes.
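A minimal sketch of the "invalidate only if the Manifest changes" part, assuming the system image is stored under a name derived from a hash of Manifest.toml. The function names, the cache directory and the `deps-<hash>.so` naming scheme are all hypothetical, and the image build itself (for example with PackageCompiler.jl) is not shown.

```shell
# Sketch only: key a dependency system image on the Manifest contents.
# The cache directory layout and file naming are assumptions, not an existing setup.

# Print the path where the system image for the given Manifest would be stored.
sysimage_path_for() {
    local manifest="$1"
    local cache_dir="$2"
    local hash
    # Any change to the Manifest changes the hash, and therefore the image path.
    hash="$(sha256sum "$manifest" | cut -d' ' -f1)"
    echo "${cache_dir}/deps-${hash}.so"
}

# Succeed (status 0) if no image exists yet for this Manifest, i.e. a rebuild is needed.
needs_rebuild() {
    local image
    image="$(sysimage_path_for "$1" "$2")"
    [ ! -f "$image" ]
}
```

An init step could call `needs_rebuild Manifest.toml "$cache_dir"` and build the image only when it succeeds; downstream jobs would then start Julia with `--sysimage` pointing at the path printed by `sysimage_path_for`.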

simonbyrne · 2020-07-31T19:11:55Z

We could even build this into a Singularity container?

jakebolewski · 2020-07-31T20:00:51Z

I think that would work as long as nothing has to be written to the Singularity container at runtime (after the build step).

simonbyrne · 2020-07-31T22:14:20Z

I think Singularity containers are immutable.

simonbyrne · 2020-09-09T21:15:12Z

How about this:

We have two shared depots, one for each configuration (e.g. /groups/esm/buildkite/depot/cpu and /groups/esm/buildkite/depot/gpu).
We have one special agent (e.g. central-init), of which only one instance can run at a time (either via --dependency=singleton, or simply by running the agent on the login node), and which is only used for the buildkite-agent pipeline upload steps.
This agent is the only one able to write to the shared depots: during init, it runs a command that instantiates the repository into each depot, e.g. something like:

module load openmpi/4.0.4
JULIA_DEPOT_PATH=/groups/esm/buildkite/depot/cpu julia --project -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()'
module purge
module load openmpi/4.0.4_cuda-10.2 cuda/10.2
JULIA_DEPOT_PATH=/groups/esm/buildkite/depot/gpu julia --project -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()'
module purge

I checked, and it does look like we can instantiate and precompile with a CUDA-aware MPI on a non-GPU node.
All subsequent jobs add the shared depot to their depot stack, with their top depot on scratch (which should typically remain empty), i.e.

export JULIA_DEPOT_PATH="$(dirname "$BUILDKITE_BUILD_CHECKOUT_PATH")/.julia:/groups/esm/buildkite/depot/cpu"
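Since the proposal has one shared depot per configuration, the export above could be wrapped in a small helper that picks the right one. This is only a sketch: `stacked_depot_path` and `SHARED_DEPOT_ROOT` are hypothetical names, and the paths simply mirror the ones quoted above.

```shell
# Sketch only: build the stacked JULIA_DEPOT_PATH for a given configuration.
# SHARED_DEPOT_ROOT and the function name are hypothetical.
SHARED_DEPOT_ROOT="${SHARED_DEPOT_ROOT:-/groups/esm/buildkite/depot}"

stacked_depot_path() {
    local config="$1"      # "cpu" or "gpu"
    local checkout="$2"    # e.g. "$BUILDKITE_BUILD_CHECKOUT_PATH"
    case "$config" in
        cpu|gpu) ;;
        *) echo "unknown configuration: $config" >&2; return 1 ;;
    esac
    # The first entry is the job's own writable depot on scratch;
    # the shared depot comes second and is only read by the job.
    echo "$(dirname "$checkout")/.julia:${SHARED_DEPOT_ROOT}/${config}"
}
```

A GPU job would then run something like `export JULIA_DEPOT_PATH="$(stacked_depot_path gpu "$BUILDKITE_BUILD_CHECKOUT_PATH")"`. Julia writes new packages and precompile caches to the first depot in the list, which is why the shared depot goes second.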

jakebolewski · 2020-09-10T15:32:49Z

If we could do this with a singleton Slurm job instead of as extra logic in the agent, I think that would be preferable, as it is a bit more flexible. Is this possible with Slurm?

simonbyrne · 2020-09-10T15:34:51Z

On second thoughts, I agree an explicit extra step would be better.

simonbyrne · 2020-09-10T15:41:32Z

> Is this possible with slurm?

Yes, you give it a unique job name (say --job-name=buildkite-init), and then use --dependency=singleton, and it will only run one instance at a time.
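As a rough sketch, the submission could look like the following. `init.sh` is a hypothetical script holding the instantiate and precompile commands, and the `DRY_RUN` switch is only there so the command can be inspected without a Slurm installation. Note that Slurm applies the singleton rule per job name and user, so all init jobs would need to be submitted by the same user.

```shell
# Sketch only: submit the init step as a singleton Slurm job.
# The script name passed in and the DRY_RUN switch are assumptions for illustration.
submit_init_job() {
    local script="$1"
    # ${DRY_RUN:+echo} expands to "echo" when DRY_RUN is set and non-empty, and to
    # nothing otherwise, so the same line either prints or runs the sbatch command.
    ${DRY_RUN:+echo} sbatch --job-name=buildkite-init --dependency=singleton "$script"
}
```

With `DRY_RUN=1 submit_init_job init.sh` the command is printed rather than submitted. Without it, a second submission stays pending until the earlier `buildkite-init` job has finished.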

jakebolewski · 2020-09-10T17:59:44Z

OK, I think this should be straightforward then.

simonbyrne · 2023-03-07T18:34:17Z

This might be a nice solution: https://github.com/JuliaCI/DepotCompactor.jl/

simonbyrne mentioned this issue Jul 31, 2020: Standard instance types #9 (Closed)

jakebolewski self-assigned this Sep 10, 2020
