Device function pointers #2450
Errors:
Error for similar code failing on AMDGPU:
Just quickly documenting my recommendation here.
If this leads to function blowup, then one might need to use:
@maleadt said the following on Slack, and I would like to repeat it here:

So, my understanding: there seem to be three (related) issues here.

To solve these issues, we would basically need Julia to change, either by being more generic with functions / function pointers or by being more clever with type introspection. If that were possible, then we could get around the compilation issue by allowing more flexibility in when certain code is compiled (for example, we could compile code from the DSL into …). Anyway, long story short: no way this is going to be fixed any time soon, but it was good to at least finally document the issue. It seems like some people are working on this from the Vulkan side (as an extension).
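To make the underlying limitation concrete, here is a minimal host-side sketch (hypothetical names, no GPU required): every Julia function is its own singleton type, so a tuple of functions is heterogeneous, and indexing it with a runtime value cannot be statically resolved.

```julia
f(x) = x + 1
g(x) = x * 2.0

fxs = (f, g)   # Tuple{typeof(f), typeof(g)}: each function is a distinct type

function apply_nth(fxs, i, x)
    # The callee (and hence the return type) depends on the runtime value
    # of i, so this call cannot be resolved at compile time.
    return fxs[i](x)
end

apply_nth(fxs, 1, 1)   # 2
apply_nth(fxs, 2, 1)   # 2.0
```

For small tuples Julia can union-split such a call (which is why simple MWEs sometimes still compile for the GPU), but past the splitting limit it becomes a true dynamic dispatch, and device code has no runtime method lookup to fall back on.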
Also x-ref #1853.
Realistically, we can only truly fix this (i.e., without having to re-specialize the entire module and thus not save any compile time) if we ever get proper cross-module function pointers, which is up to NVIDIA. Lacking that, we can only make the ergonomics slightly better, but I'm not sure it's going to be much better than just passing a tuple of functions. As noted, that doesn't entirely work because of JuliaGPU/GPUCompiler.jl#607, but with some Julia-level unrolling of the for loop it should be possible to get specialized (GPU-compatible) code for that.

Note that the situation in C isn't much better; the entire GPU module contains all host and device functions, so you don't really get function pointers.
For example, to make the example from JuliaGPU/GPUCompiler.jl#607 work:

```julia
using Metal, Unrolled

@unroll function kernel(a, t)
    @unroll for x in t
        @inbounds a[1] = x
    end
    return
end

function main()
    a = Metal.ones(1)
    @metal kernel(a, (1, 1f0))
end
```

I'd apply that to the MWE posted here, but that one already works fine...

```julia
julia> using CUDA

julia> f(x) = x+1

julia> g(x) = x*2

julia> function call_fxs!(fxs)
           x = 1
           for i = 1:length(fxs)
               x = fxs[1](x)
               @cuprintf("%g\n", x)
           end
       end

julia> @cuda threads = 1 call_fxs!((f, g))
9.88131e-324
1.4822e-323
```

@leios Does that sufficiently cover your needs?
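An aside on the odd output above: `@cuprintf` appears to forward its arguments unconverted, and `%g` expects a C double, so the `Int64` values 2 and 3 are read back bit-for-bit as the subnormals printed; `%ld` (or passing `Float64(x)`) would print the expected numbers. Note also that the loop body indexes `fxs[1]` on every iteration, so `f` is applied twice (1 → 2 → 3).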
Yeah, loop unrolling was another thing I tried for my "real" application, but I really needed something more general. That said, I think we have enough information here for anyone who stumbles across these errors to find a solution / workaround for their problem.
Right, so simply put: I want the following code to work:
This is what the code looks like in CUDA C:
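For reference, the standard CUDA C pattern for this is a statically initialized device-side table of function pointers, along these lines (illustrative names, not the issue's actual snippet):

```cuda
#include <cstdio>

typedef int (*fx_t)(int);

__device__ int f(int x) { return x + 1; }
__device__ int g(int x) { return x * 2; }

// Device-side function-pointer table; taking the address of a __device__
// function is only meaningful in device code, so the table lives on device.
__device__ fx_t fxs[] = { f, g };

__global__ void call_fxs(int n)
{
    int x = 1;
    for (int i = 0; i < n; ++i) {
        x = fxs[i](x);        // indirect call through a device function pointer
        printf("%d\n", x);
    }
}

int main()
{
    call_fxs<<<1, 1>>>(2);    // prints 2, then 4
    cudaDeviceSynchronize();
    return 0;
}
```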
I've been banging my head against it for a long time (a few months before this post: leios/Fable.jl#64 (comment))
My current solution involves @generated loops on loops, which ends up generating functions that are quite large and take a significant amount of time to compile (sometimes up to 70 s for a kernel that runs in 0.0001 s); see the sketch below. Mentioned here: https://discourse.julialang.org/t/is-there-any-good-way-to-call-functions-from-a-set-of-functions-in-a-cuda-kernel/102051/3?u=leios

Solutions that exist in other languages:
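The @generated route is essentially a way to make every call site statically known. A hedged sketch of what it boils down to (recursive tuple peeling; not the actual Fable.jl code):

```julia
# Apply every function in the tuple in sequence, with each callee known at
# compile time. The compiler emits one fully specialized branch per element,
# which is GPU-compatible -- and is also why compile time blows up as the
# set of functions (or combinations of them) grows.
@inline apply_all(fxs::Tuple{}, x) = x
@inline function apply_all(fxs::Tuple, x)
    x = first(fxs)(x)                 # statically known callee
    return apply_all(Base.tail(fxs), x)
end

f(x) = x + 1
g(x) = x * 2
apply_all((f, g), 1)                  # (1 + 1) * 2 == 4
```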
I have had this discussion throughout the years with @vchuravy, @jpsamaroo, and @maleadt, but never documented it because I'm apparently the only one actually hitting the issue.
To be honest, I think we are approaching something that might not be fundamentally possible with Julia, but I would like to be able to pass in arbitrary functions to a kernel without forcing recompilation of any kind.
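To make the recompilation point concrete, a minimal sketch (hypothetical kernel, assuming CUDA.jl's usual type-based specialization): every distinct tuple of functions is a distinct type, so swapping in a different function triggers a fresh kernel compilation.

```julia
using CUDA

f(x) = x + 1
g(x) = x * 2

function call_first!(a, fxs)
    a[1] = fxs[1](a[1])
    return
end

a = CUDA.ones(Int, 1)
@cuda call_first!(a, (f,))   # compiles a specialization for Tuple{typeof(f)}
@cuda call_first!(a, (g,))   # different tuple type => a full recompilation
```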
I am not sure if it is best to put this here or in GPUCompiler.
related discussions: