Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove code paths that depend on RMM_STATIC_CUDART #1667

Open
wants to merge 14 commits into
base: branch-24.12
Choose a base branch
from

Conversation

robertmaynard
Copy link
Contributor

@robertmaynard robertmaynard commented Sep 4, 2024

Description

We can remove the optimizations around CUDA_STATIC_RUNTIME and instead see if the function is already in the process space so that RMM doesn't need to have any build context to run properly

Fixes #1679

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@robertmaynard robertmaynard requested a review from a team as a code owner September 4, 2024 19:43
@github-actions github-actions bot added the cpp Pertains to C++ code label Sep 4, 2024
@harrism
Copy link
Member

harrism commented Sep 4, 2024

Are existing tests sufficient to cover this?

@robertmaynard robertmaynard requested a review from a team as a code owner September 5, 2024 13:42
@github-actions github-actions bot added the CMake label Sep 5, 2024
@robertmaynard
Copy link
Contributor Author

Are existing tests sufficient to cover this?

I will add explicit tests that control the type of cudart we are using to verify these chages

@robertmaynard robertmaynard force-pushed the remove_code_dependency_on_RMM_STATIC_CUDART branch 2 times, most recently from 326fd58 to 4866138 Compare September 5, 2024 14:34
@robertmaynard robertmaynard added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Sep 6, 2024
@robertmaynard
Copy link
Contributor Author

Did more testing today and realize that more work is needed. With this PR we are getting all the symbols required at link time, but the dlsym with RTLD_DEFAULT is failing for a couple of the use cases. The tests still pass in CI since a libcudart.so exists on the machine.

Need more time to figure out what is going wrong.

@vyasr
Copy link
Contributor

vyasr commented Sep 20, 2024

In what contexts is this failing? I assume it's a statically linked case, so the symbols cannot have been (dynamically) loaded with RTLD_LOCAL in a way that would obscure them from this scope.

@robertmaynard
Copy link
Contributor Author

In what contexts is this failing? I assume it's a statically linked case, so the symbols cannot have been (dynamically) loaded with RTLD_LOCAL in a way that would obscure them from this scope.

That is correct it is the static linking use case the is failing. The tests have the textual entries for the symbol, but the dlsym returns null.

@robertmaynard robertmaynard changed the base branch from branch-24.10 to branch-24.12 October 28, 2024 18:48
@robertmaynard robertmaynard force-pushed the remove_code_dependency_on_RMM_STATIC_CUDART branch from 91e325d to 6dc3ee3 Compare October 28, 2024 19:46
@robertmaynard robertmaynard added feature request New feature or request non-breaking Non-breaking change and removed 5 - DO NOT MERGE Hold off on merging; see PR for details labels Oct 28, 2024
include/rmm/detail/dynamic_load_runtime.hpp Outdated Show resolved Hide resolved
include/rmm/detail/dynamic_load_runtime.hpp Outdated Show resolved Hide resolved
include/rmm/detail/dynamic_load_runtime.hpp Outdated Show resolved Hide resolved
tests/CMakeLists.txt Show resolved Hide resolved
include/rmm/detail/dynamic_load_runtime.hpp Outdated Show resolved Hide resolved
@bdice
Copy link
Contributor

bdice commented Oct 31, 2024

This introduces a bit of complexity that we may be able to avoid. RMM declares its minimum supported CUDA version is 11.4 (this has been true since November 2023). We have required a minimum of at least 11.2, usually 11.4, everywhere I can think of for a long time. I think we can remove the shims for CUDA < 11.2 in this PR.

@robertmaynard robertmaynard requested a review from a team as a code owner October 31, 2024 17:43
@github-actions github-actions bot added the Python Related to RMM Python API label Oct 31, 2024
@robertmaynard robertmaynard force-pushed the remove_code_dependency_on_RMM_STATIC_CUDART branch from 5f17e74 to 22d2b7f Compare October 31, 2024 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp Pertains to C++ code feature request New feature or request non-breaking Non-breaking change Python Related to RMM Python API
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

[FEA] Remove code paths that depend on RMM_STATIC_CUDART
5 participants