Buildkite on Slurm

Run Buildkite pipelines on a Slurm cluster.

Design

The basic idea is that each Buildkite job runs inside its own Slurm job. The Slurm job starts the Buildkite agent with the --acquire-job option, which makes the agent run only that specific Buildkite job and exit once the job is complete.
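
As a minimal sketch of the agent invocation described above: the helper name `agent_command` is hypothetical, and how bin/slurmjob.sh actually receives the Buildkite job id is not shown in this README, so treat this as an illustration rather than the script's real contents.

```python
def agent_command(buildkite_job_id: str) -> list[str]:
    """Build the command a Slurm job could run to execute one Buildkite job.

    Hypothetical helper: the real logic lives in bin/slurmjob.sh.
    """
    # --acquire-job tells the agent to run exactly this job and then exit,
    # so the Slurm job ends when the Buildkite job does.
    return ["buildkite-agent", "start", "--acquire-job", buildkite_job_id]


if __name__ == "__main__":
    # Placeholder id, for illustration only.
    print(" ".join(agent_command("example-job-id")))
```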

Our Slurm cluster is not web-accessible, so we cannot use webhooks to schedule the Slurm jobs. Instead, we poll the Buildkite API (via bin/poll.py) from a cron job running on the cluster login node (bin/cron.sh) at a regular interval (currently every minute). Each poll does the following:

1. Get a list of the Buildkite jobs that are currently queued or running on the cluster via squeue. We identify these by a specific Slurm job name (buildkite) and by storing the Buildkite job id in the Slurm job comment.
2. Query the Buildkite API to get a list of all builds for the organization that are currently scheduled. For each job in each build, if the job is not already scheduled in Slurm, submit a new Slurm job that runs bin/slurmjob.sh.
3. Query the Buildkite API for a list of all builds that are cancelled. For each job in each build, cancel any Slurm jobs with the matching job id.
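
The reconciliation in these steps can be sketched as pure functions over already-fetched data. This is an illustration, not the contents of bin/poll.py: the function names are hypothetical, the squeue output is assumed to be lines of the form `<slurm id>,<comment>` (for example from `squeue --name=buildkite --noheader --format=%i,%k`), and each build is assumed to be a dict with a `jobs` list whose entries carry an `id`.

```python
def parse_squeue(output: str) -> dict[str, str]:
    """Map Buildkite job id -> Slurm job id.

    Assumes each line looks like "<slurm id>,<comment>", where the comment
    holds the Buildkite job id.
    """
    tracked = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        slurm_id, _, comment = line.partition(",")
        if comment:
            tracked[comment.strip()] = slurm_id.strip()
    return tracked


def job_ids(builds: list[dict]) -> list[str]:
    """Collect the ids of every job in every build, in order."""
    return [job["id"] for build in builds
            for job in build.get("jobs", []) if "id" in job]


def plan(squeue_output: str, scheduled_builds: list[dict],
         cancelled_builds: list[dict]) -> tuple[list[str], list[str]]:
    """Return (Buildkite job ids to submit, Slurm job ids to cancel)."""
    tracked = parse_squeue(squeue_output)
    # Step 2: submit jobs that Slurm does not know about yet.
    to_submit = [j for j in job_ids(scheduled_builds) if j not in tracked]
    # Step 3: cancel Slurm jobs whose Buildkite job belongs to a cancelled build.
    to_cancel = [tracked[j] for j in job_ids(cancelled_builds) if j in tracked]
    return to_submit, to_cancel
```

Keeping the planning step free of subprocess and network calls makes it easy to check against sample data; the caller would then run sbatch for each id in the first list and scancel for each id in the second.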

Unlike regular Buildkite builds, we don't run each job in an isolated environment, so the checkout only happens on the first job (usually the pipeline upload) and the state is shared between all jobs in the build.

Passing options to Slurm

Any options in the agent metadata block that are prefixed with slurm_ are passed to sbatch: the slurm_ prefix is dropped, underscores (_) are converted to hyphens, and the value can be left blank for options that don't take a value. For example:

agents:
  queue: new-central
  slurm_nodes: 1
  slurm_tasks_per_node: 2

would pass the options --nodes=1 --tasks-per-node=2.
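
The conversion rule can be sketched as follows. The function name `sbatch_options` is hypothetical, and the agent metadata is assumed to arrive as a plain dict; the actual parsing in this repository may differ.

```python
def sbatch_options(agents: dict) -> list[str]:
    """Translate slurm_-prefixed agent metadata into sbatch flags."""
    options = []
    for key, value in agents.items():
        if not key.startswith("slurm_"):
            continue  # e.g. "queue" is not forwarded to sbatch
        # Drop the prefix and convert underscores to hyphens.
        flag = "--" + key[len("slurm_"):].replace("_", "-")
        if value is None or value == "":
            options.append(flag)  # blank value: flag without an argument
        else:
            options.append(f"{flag}={value}")
    return options


if __name__ == "__main__":
    agents = {"queue": "new-central", "slurm_nodes": 1, "slurm_tasks_per_node": 2}
    print(" ".join(sbatch_options(agents)))
```

A key with a blank value, such as `slurm_exclusive:`, would become the bare flag `--exclusive` under this rule.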

