Shared initialization step? #8

Open
simonbyrne opened this issue Jul 31, 2020 · 10 comments

@simonbyrne
Member

In our old slurmci we have a separate init job that installs the packages and does a round of precompilation. This is less of an issue with Buildkite, since agents have a cache they can use (reducing the need to reinstall packages), but if we wanted to do something like this we would need a way to share the initialized cache with the downstream agents.

@simonbyrne
Member Author

One idea from the JuliaCon BoF on Julia in Production: build a system image of all the dependencies, and invalidate it only if the Manifest changes.
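As a rough illustration of the "invalidate only if the Manifest changes" idea, the sketch below keys the system image file on a SHA-256 digest of `Manifest.toml`, so a changed Manifest maps to a new file name and triggers a rebuild. The directory `SYSIMAGE_DIR` and the function names are hypothetical, and the build step assumes PackageCompiler.jl (2.x, where `create_sysimage` without a package list includes every package in the project) is installed in the active environment.

```shell
# Sketch only: SYSIMAGE_DIR and both function names are assumptions.
SYSIMAGE_DIR="${SYSIMAGE_DIR:-/tmp/sysimages}"

# Print the sysimage path keyed on the SHA-256 digest of the Manifest.
sysimage_path() {
    local manifest key
    manifest="$1"
    key="$(sha256sum "$manifest" | cut -d' ' -f1)"
    echo "$SYSIMAGE_DIR/deps-$key.so"
}

# Build the sysimage unless one already exists for this exact Manifest.
ensure_sysimage() {
    local manifest img
    manifest="$1"
    img="$(sysimage_path "$manifest")"
    if [ ! -f "$img" ]; then
        mkdir -p "$SYSIMAGE_DIR"
        # Assumes PackageCompiler 2.x is available to this project.
        julia --project="$(dirname "$manifest")" -e '
            using PackageCompiler
            create_sysimage(; sysimage_path=ARGS[1])' "$img"
    fi
    echo "$img"
}
```

A job could then start Julia with `julia --sysimage "$(ensure_sysimage Manifest.toml)" --project ...`. Old images are never deleted in this sketch, so some cleanup policy would still be needed.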

@simonbyrne
Member Author

We could even build this into a Singularity container?

@jakebolewski
Contributor

I think that would work as long as nothing has to be written to the Singularity container at runtime (after the build step).

@simonbyrne
Member Author

I think Singularity containers are immutable.

@simonbyrne
Member Author

simonbyrne commented Sep 9, 2020

How about this:

  • We have two shared depots, one for each configuration (e.g. /groups/esm/buildkite/depot/cpu and /groups/esm/buildkite/depot/gpu)
  • We have one special agent (e.g. central-init), of which only one instance can run at a time (either via --dependency=singleton, or simply by running the agent on the login node), and which is used only for the buildkite-agent pipeline upload steps.
  • This agent is the only one able to write to the shared depots: during init, we have a command that instantiates the repository with both depots, e.g. something like
      module load openmpi/4.0.4
      JULIA_DEPOT_PATH=/groups/esm/buildkite/depot/cpu julia --project -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()'
      module purge
      module load openmpi/4.0.4_cuda-10.2 cuda/10.2
      JULIA_DEPOT_PATH=/groups/esm/buildkite/depot/gpu julia --project -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()'
      module purge
  • I checked, and it does look like we can instantiate and precompile a CUDA-aware MPI on a non-GPU node.
  • All subsequent jobs add the matching shared depot to their depot stack, with their top depot on scratch (which should typically remain empty), e.g. for the CPU configuration:
      export JULIA_DEPOT_PATH="$(dirname "$BUILDKITE_BUILD_CHECKOUT_PATH")/.julia:/groups/esm/buildkite/depot/cpu"
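
The export above covers only the CPU configuration. A minimal sketch of how a downstream job might pick the stacked depot path for either configuration follows; the helper name `depot_path_for` and the `SHARED_DEPOT_ROOT` variable are hypothetical, while `BUILDKITE_BUILD_CHECKOUT_PATH` is the variable already used in the proposal.

```shell
# Sketch only: depot_path_for and SHARED_DEPOT_ROOT are assumed names.
SHARED_DEPOT_ROOT="${SHARED_DEPOT_ROOT:-/groups/esm/buildkite/depot}"

# Print a depot stack: a writable per-build depot on scratch first,
# then the shared depot for the requested configuration.
depot_path_for() {
    local config
    config="$1"   # expected to be "cpu" or "gpu"
    case "$config" in
        cpu|gpu) ;;
        *) echo "unknown configuration: $config" >&2; return 1 ;;
    esac
    echo "$(dirname "$BUILDKITE_BUILD_CHECKOUT_PATH")/.julia:$SHARED_DEPOT_ROOT/$config"
}
```

A GPU job would then run `export JULIA_DEPOT_PATH="$(depot_path_for gpu)"` before starting Julia. Julia writes new files to the first entry of the depot path, so the downstream jobs themselves would not write into the shared depot.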

@jakebolewski jakebolewski self-assigned this Sep 10, 2020
@jakebolewski
Contributor

If we could do this with a singleton Slurm job instead of extra logic in the agent, I think that would be preferable, as it is a bit more flexible. Is this possible with Slurm?

@simonbyrne
Member Author

On second thought, I agree that an explicit extra step would be better.

@simonbyrne
Member Author

> Is this possible with slurm?

Yes, you give the job a unique name (say --job-name=buildkite-init), and then use --dependency=singleton, and Slurm will only run one instance of it at a time.
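
A minimal sketch of such a submission, assuming the init commands live in a batch script (the wrapper name `submit_init` and the script name `init.sh` are hypothetical):

```shell
# Sketch only: submit_init and INIT_JOB_NAME are assumed names.
INIT_JOB_NAME="${INIT_JOB_NAME:-buildkite-init}"

submit_init() {
    # --dependency=singleton holds this job until every earlier job with
    # the same job name and user has finished, so init runs never overlap.
    sbatch --job-name="$INIT_JOB_NAME" --dependency=singleton "$@"
}

# Usage (not run here): submit_init init.sh
```

Because the serialization depends only on the shared job name, any number of builds can submit an init job this way and Slurm would run them one after another.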

@jakebolewski
Contributor

Ok I think this should be straightforward then

@simonbyrne
Member Author

This might be a nice solution: https://github.com/JuliaCI/DepotCompactor.jl/
