build-attributes: [enable-patchelf] uses incorrect dynamic linker when setting binary ELF interpreter on core24 #4508

Closed
NucciTheBoss opened this issue Jan 5, 2024 · 6 comments · Fixed by #4523 · May be fixed by canonical/craft-parts#623
Labels
bug Actual bad behavior that don't fall into maintenance or documentation

Comments

@NucciTheBoss

Bug Description

I am using core24 as the base snap for my classic SLURM snap because core24 has the necessary packages that I need for enabling AMD GPU support in the workload scheduler: https://packages.ubuntu.com/search?keywords=librocm&searchon=names

I need to patch the binaries and shared libraries in the SLURM snap because it is classically confined. However, build-attributes: [enable-patchelf] sets an incorrect dynamic linker as the ELF interpreter for binaries. This causes core dumps and segfaults when I execute the binaries on a host with an older libc, such as Ubuntu 22.04 or 20.04 LTS. The automatic patching mechanism does set the rpath correctly. Also, on core24 but not on core22, the classic linter warns me about a staged libc package. Several of SLURM's dependencies require libc6 >= 2.38, but the snap should be using the libc6 provided by the core24 base.
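
A quick way to confirm the symptom is to compare the interpreter recorded in a built binary against the base snap's path. The following is a minimal sketch, not part of snapcraft: the helper name `interp_in_base` is hypothetical, and the example assumes the built binary is reachable at `prime/bin/squeue` and that `patchelf` is installed.

```shell
# Hypothetical helper: report whether an ELF interpreter path points
# into the given base snap (for example /snap/core24/current/...).
# usage: interp_in_base <interpreter-path> <base-name>
interp_in_base() {
    case "$1" in
        "/snap/$2/current/"*) echo "ok" ;;
        *) echo "mismatch"; return 1 ;;
    esac
}

# Example (not run here; needs patchelf and a built snap tree):
#   interp_in_base "$(patchelf --print-interpreter prime/bin/squeue)" core24
#   patchelf --print-rpath prime/bin/squeue
```

If the first command prints `mismatch`, the binary was patched to use a dynamic linker from somewhere other than the core24 base.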

snapcraft-20240105-110231.695714.log

To Reproduce

  1. Download attached snapcraft.yaml file
  2. snapcraft -v pack #=> ensure that you have snapcraft 8.0.1 installed
  3. lxc launch ubuntu:22.04 snap-tester --vm
  4. lxc file push slurm_23.11.1_amd64.snap snap-tester/root/slurm_23.11.1_amd64.snap
  5. lxc shell snap-tester
  6. snap install core24 --edge
  7. snap install slurm_23.11.1_amd64.snap --dangerous --classic
  8. snap logs slurm.munged #=> See core dump error
  9. slurm.squeue -h #=> See core dump error

Environment

OS: Ubuntu 23.10 Mantic Minotaur
LXD: 5.19 rev 20600
Snap test environment: 22.04 and 20.04

snapcraft.yaml

# Copyright 2023 Canonical Ltd.
# See LICENSE file for licensing details

name: slurm
version: 23.11.1
summary: "Slurm: A Highly Scalable Workload Manager"
description: |
  Slurm is an open source, fault-tolerant, and highly scalable cluster management and
  job scheduling system for large and small Linux clusters.

  Contains Slurm services slurmctld, slurmd, slurmdbd, and slurmrestd. See `slurm-clients`
  snap for Slurm CLI commands.
license: Apache-2.0
website: "https://slurm.schedmd.com"

base: core24
build-base: devel
confinement: classic
compression: lzo

apps:
  slurmctld:
    command: sbin/slurmctld
    daemon: simple
    install-mode: disable
    after:
      - munged
  slurmd:
    command: sbin/slurmd
    daemon: simple
    install-mode: disable
    after:
      - munged
  slurmdbd:
    command: sbin/slurmdbd
    daemon: simple
    install-mode: disable
    after:
      - munged
  slurmrestd:
    command: sbin/slurmrestd
    daemon: simple
    install-mode: disable
    after:
      - munged
  sacct:
    command: bin/sacct
  sacctmgr:
    command: bin/sacctmgr
  salloc:
    command: bin/salloc
  sattach:
    command: bin/sattach
  sbatch:
    command: bin/sbatch
  sbcast:
    command: bin/sbcast
  scancel:
    command: bin/scancel
  scontrol:
    command: bin/scontrol
  scrontab:
    command: bin/scrontab
  scrun:
    command: bin/scrun
  sdiag:
    command: bin/sdiag
  sh5util:
    command: bin/sh5util
  sinfo:
    command: bin/sinfo
  sprio:
    command: bin/sprio
  squeue:
    command: bin/squeue
  sreport:
    command: bin/sreport
  srun:
    command: bin/srun
  sshare:
    command: bin/sshare
  sstat:
    command: bin/sstat
  strigger:
    command: bin/strigger
  sview:
    command: bin/sview

  munged:
    command: usr/sbin/munged
    daemon: simple
  munge:
    command: usr/bin/munge
  unmunge:
    command: usr/bin/unmunge
  remunge:
    command: usr/bin/remunge
  mungekey:
    command: usr/sbin/mungekey

parts:
  slurm:
    plugin: autotools
    build-attributes:
      - enable-patchelf
    source: "https://download.schedmd.com/slurm/slurm-${SNAPCRAFT_PROJECT_VERSION}.tar.bz2"
    source-type: tar
    build-packages:
      - patchelf
      - libmunge-dev
      - libncurses-dev
      - libgtk2.0-dev
      - default-libmysqlclient-dev
      - libpam0g-dev
      - libperl-dev
      - libpam0g-dev
      - liblua5.4-dev
      - libhwloc-dev
      - librrd-dev
      - libipmimonitoring-dev
      - hdf5-helpers
      - libfreeipmi-dev
      - libhdf5-dev
      - man2html
      - libcurl4-openssl-dev
      - libpmix-dev
      - libhttp-parser-dev
      - libyaml-dev
      - libjson-c-dev
      - libjwt-dev
      - liblz4-dev
      - bash-completion
      - libdbus-1-dev
      - librdkafka-dev
      - librocm-smi-dev
      - libibmad-dev
      - libibumad-dev
      - libnuma-dev
      - libaec-dev
    stage-packages:
      - munge
      - libncurses6
      - libgtk2.0-0
      - libmysqlclient21
      - libpam0g
      - libperl5.36
      - liblua5.4-0
      - libhwloc15
      - librrd8
      - libipmimonitoring6
      - hdf5-helpers
      - libfreeipmi17
      - hdf5-helpers
      - man2html
      - libcurl4
      - libpmix2
      - libhttp-parser2.9
      - libyaml-0-2
      - libjson-c5
      - libjwt0
      - liblz4-1
      - bash-completion
      - libdbus-1-3
      - librdkafka1
      - librocm-smi64-1
      - libibmad5
      - libibumad3
      - libnuma1
      - libaec0
      - libsz2
      - libhdf5-hl-100
      - libhdf5-103-1
    autotools-configure-parameters:
      - --prefix=/
      - --localstatedir=/var
      - --runstatedir=/var/run/slurm
      - --disable-developer
      - --disable-debug
      - --enable-slurmrestd
      - --enable-multiple-slurmd
      - --with-munge
      - --with-libcurl
      - --with-http-parser
      - --with-yaml
      - --with-json
      - --with-jwt
      - --with-hdf5=yes
      - --with-rdkafka
      - --with-freeipmi
      - --with-ofed

Relevant log output

I attached the build log to the bug description because the log is too big to be copy & pasted into the text box.

Additional context

The binaries work if you manually set the ELF interpreter to the dynamic linker in core24:

override-prime: |
      craftctl default

      set -eu
      # enable-patchelf attempts to use libc implementation
      # from stage-packages which will cause failures. This
      # experiment attempts to circumvent by using the ELF
      # interpreter provided by core24.
      export PATH=${CRAFT_PART_BUILD}/usr/bin:$PATH
      patchelf --force-rpath --set-rpath \$ORIGIN/../lib/slurm:\$ORIGIN/../lib/x86_64-linux-gnu bin/squeue
      patchelf --set-interpreter /snap/core24/current/lib64/ld-linux-x86-64.so.2 bin/squeue
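
The workaround above patches only bin/squeue. As a rough sketch, the same idea could be extended to every ELF file in the command directories. The helper name `list_elf_files` is hypothetical, and the commented loop assumes amd64 and the core24 interpreter path used in the workaround.

```shell
# Hypothetical helper: print each regular file under the given
# directories whose first four bytes are the ELF magic (0x7f 'E' 'L' 'F').
list_elf_files() {
    find "$@" -type f | while IFS= read -r f; do
        magic=$(head -c 4 "$f" | od -An -tx1 | tr -d ' \n')
        if [ "$magic" = "7f454c46" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Example for an override-prime scriptlet (not run here; needs patchelf):
#   list_elf_files bin sbin | while IFS= read -r f; do
#       patchelf --set-interpreter /snap/core24/current/lib64/ld-linux-x86-64.so.2 "$f"
#   done
```

Filtering on the magic bytes keeps shell scripts and other non-ELF files out of the patchelf loop. Shared libraries normally have no interpreter to set, which is why the example is limited to bin and sbin.
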
@NucciTheBoss NucciTheBoss added the bug Actual bad behavior that don't fall into maintenance or documentation label Jan 5, 2024
@mr-cal
Collaborator

mr-cal commented Jan 12, 2024

Some things I need to investigate for this:

  1. In this example, something is staging libc6 and snapcraft is using it (as designed).
    • I'm not sure how libc6 is getting staged or why this is only occurring in core24. We should follow Sergio's advice, "Check also if the dpkg.list we use to filter stage-packages is in place for core24, remember core24 is most likely moving to chisel, so would be good to sync with the Ubuntu Core team on that"
  2. patchelf stops running when an override-prime script is defined (even if it calls craftctl default)
    • I have not been able to reproduce this with a simple reproducer. I need to try this with the slurm snap.
  3. We have a related issue reported here. Building this snap works locally but produces linter errors when using snapcraft remote-build
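
For the first item, one way to check whether a libc6 copy ended up in the snap is to search the snap contents for the C library and dynamic linker files. This is only a sketch: the helper name `find_staged_libc` is hypothetical, and it assumes the snap contents are available in a local directory (for example, after unpacking with `unsquashfs`).

```shell
# Hypothetical helper: list any C library or dynamic linker files
# found under the given directory tree.
find_staged_libc() {
    find "$1" \( -name 'libc.so.6' -o -name 'ld-linux*.so*' \) -print | sort
}

# Example (not run here; needs squashfs-tools and the built snap):
#   unsquashfs -d slurm-contents slurm_23.11.1_amd64.snap
#   find_staged_libc slurm-contents
```

Any output suggests a libc or dynamic linker was staged into the snap rather than left to the base.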

@tomponline
Member

@mr-cal to clarify my comment on Element, I did not see the issue when building LXD inside an ubuntu-daily:24.04 container using SNAPCRAFT_BUILD_ENVIRONMENT=host snapcraft --verbose. I tested this snap on an Ubuntu 22.04 system and it worked fine.

However I did see the issue when using the normal snapcraft invocation that manages its own LXD container and image.

@mr-cal
Collaborator

mr-cal commented Jan 16, 2024

This bug appears to have the same problem @tomponline experienced, where the linter disagrees with what patchelf did.

@tomponline
Member

@mr-cal thanks, do you know when this will land in a snap channel?

@mr-cal
Collaborator

mr-cal commented Jan 24, 2024

This should be in edge later today.

It should be included in the next hotfix release 8.0.3, but @sergiusens would know the timing for that.

@nhathaway

Still a problem in 8.3.2
