[BUG] Warnings from methods that raise exceptions and are marked nogil. #1365
Comments
I don't know how to reconcile this warning with `rmm/python/rmm/_cuda/stream.pyx`, lines 65 to 78 (at commit 596ccf9). It says the function must be called as …
tl;dr: I think we should replace the pattern:

```cython
cdef void c_foo(self) except * nogil:
    do_foo()

def foo(self):
    with nogil:
        self.c_foo()
```

with:

```cython
def foo(self):
    with nogil:
        do_foo()
```

Let's think about what the intention is here, to understand why the function should be called in a `nogil` block. In this code, there are two different locks in play:

- the Python GIL, and
- the CUDA driver lock.
So what we're trying to avoid is a lock-ordering problem, where one thread holds the GIL and is trying to acquire the driver lock while another holds the driver lock and is trying to acquire the GIL.
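To make the hazard concrete, here is a small pure-Python model of it. It is only an analogy: `gil` and `driver_lock` are hypothetical `threading.Lock` objects standing in for the real GIL and CUDA driver lock, and a `threading.Barrier` forces the unlucky interleaving so the outcome is deterministic. Lock acquisition uses a timeout, so a deadlock shows up as a `False` result instead of hanging the process.

```python
import threading


def simulate(release_gil_first: bool, timeout: float = 1.0) -> bool:
    """Return True if both threads finish, False if they hit the lock cycle.

    `gil` and `driver_lock` are plain threading.Lock objects used as
    hypothetical stand-ins for the Python GIL and the CUDA driver lock.
    """
    gil = threading.Lock()
    driver_lock = threading.Lock()
    # Both threads wait here so the problematic interleaving always happens:
    # A holds (or has already dropped) the GIL, B holds the driver lock.
    rendezvous = threading.Barrier(2)
    finished = []

    def thread_a():
        # Python code that holds the GIL and calls into RMM / CUDA.
        gil.acquire()
        if release_gil_first:
            gil.release()  # the `with nogil:` pattern: drop the GIL first
        rendezvous.wait()
        if driver_lock.acquire(timeout=timeout):
            driver_lock.release()
            finished.append("A")
        if not release_gil_first:
            gil.release()

    def thread_b():
        # A CUDA call holding the driver lock whose user callback needs the GIL.
        driver_lock.acquire()
        rendezvous.wait()
        if gil.acquire(timeout=timeout):
            gil.release()
            finished.append("B")
        driver_lock.release()

    threads = [threading.Thread(target=thread_a), threading.Thread(target=thread_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(finished) == ["A", "B"]
```

With `release_gil_first=False`, thread A holds the "GIL" while waiting for the "driver lock" and thread B holds the "driver lock" while waiting for the "GIL", so at most one of them can finish. Dropping the "GIL" before the driver call breaks the cycle, and `simulate(release_gil_first=True)` returns `True`.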
What could be happening on a different thread? A CUDA call could be in progress that (due to a user callback) needs to acquire the Python GIL while the driver lock is held. So we have a potential for deadlock (thread A holds the GIL and needs the driver lock; thread B holds the driver lock and needs the GIL). So the solution (in RMM) must be to release the lock we don't need (the GIL) when we call into RMM's C++ layer. By hand, this would be something like:

```cpp
PyObject* py_synchronize(...)
{
    PyThreadState* state = PyEval_SaveThread();  // release the GIL
    try {
        synchronize(...);
    } catch (std::exception const& e) {
        PyEval_RestoreThread(state);  // reacquire the GIL before touching Python
        PyErr_SetString(PyExc_RuntimeError, e.what());
        return nullptr;
    }
    PyEval_RestoreThread(state);  // reacquire the GIL on the success path too
    Py_RETURN_NONE;
}
```

Translating the C++ exception to a Python one can happily take the GIL, because the assumption is that we've dropped whatever locks we took in the driver by the time the C++-level call returns.

Here's a sketch of the different options we have (AIUI):

```cython
cdef extern from "foo.h" nogil:
    void do_foo() except + nogil


# Variant 1:
# - Call to C++ function is done with GIL released.
# - _Requires_ that the caller holds the GIL on entry.
# - _Reacquires_ the GIL on exit.
# - Caller checks for exception by doing PyErr_Occurred().
cdef void c_foo1() except *:
    with nogil:
        do_foo()


# Variant 2: current status quo, produces warnings.
# - Call to C++ function is done with current GIL state (caller must release GIL).
# - Acquires GIL if an exception was thrown in C++.
# - Releases GIL on exit (if acquired due to exception).
# - Caller checks for exception by reacquiring the GIL and doing PyErr_Occurred().
cdef void c_foo2() except * nogil:
    do_foo()


# Variant 3: no warnings, but an exception does not produce a useful traceback.
# Functionally identical to the status quo, except for the worse traceback.
# - Call to C++ function is done with current GIL state (caller must release GIL).
# - Acquires GIL if an exception was thrown in C++.
# - Releases GIL on exit (if acquired due to exception).
# - Caller does not check for exception (although PyErr is set, so a
#   later check will pick it up, but without a traceback).
cdef void c_foo3() noexcept nogil:
    do_foo()


# Variant 4: no warnings.
# - Call to C++ function is done with current GIL state (caller must release GIL).
# - Acquires GIL if an exception was thrown in C++.
# - Releases GIL on exit (if acquired due to exception).
# - Indicates to the caller that an error occurred by returning -1.
cdef int c_foo4() except -1 nogil:
    do_foo()
    return 0  # success; Cython returns -1 if do_foo() raised


def foo(*, variant):
    if variant == 1:
        c_foo1()
    elif variant == 2:
        with nogil:
            c_foo2()
    elif variant == 3:
        with nogil:
            c_foo3()
    elif variant == 4:
        with nogil:
            c_foo4()
    elif variant == 5:
        # Variant 5:
        # - Call to C++ is done with GIL released.
        # - We're in Python, so we hold the GIL on entry and reacquire it
        #   after this block.
        # - Exceptions are translated with the GIL acquired.
        with nogil:
            do_foo()
```

Of these, I think variant 5 is the best option. What we're kind of fighting here is that we have Cython-level functions in which we want to catch and re-throw C++ exceptions (so that they can propagate up the call stack). But that's not possible as soon as you leave …

This performance hint is kind of silly for our use case: all of the different approaches must acquire the GIL to translate the exception to a Python one. The only minor differences are in how the exception information is propagated to the caller. I presume that we have …

Consequently, what we need is for the Python-exported functions on the RMM classes, which by definition hold the GIL on entry, to arrange to drop it, at which point they should directly call the …

Unfortunately, there is no way to mark an extern cdef function as one that must be called with the GIL dropped, so we must laboriously go through the codebase and ensure we're doing so manually.
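Variant 5, like the by-hand C++ sketch, comes down to three steps: drop the GIL only around the C++ call, always reacquire it on the way out, and translate any exception only once the GIL is held again. The toy Python model below mirrors that control flow under stated assumptions: `gil` is a hypothetical `threading.Lock` standing in for the GIL, `nogil_block` is a hypothetical context manager imitating `with nogil:`, and `CppError`/`CudaRuntimeError` are made-up exception types standing in for the C++ exception and its translated Python counterpart.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-in for the GIL (the real GIL is not a threading.Lock).
gil = threading.Lock()


class CppError(Exception):
    """Stands in for a C++ exception thrown by the extern function."""


class CudaRuntimeError(RuntimeError):
    """Stands in for the Python exception the C++ error is translated into."""


@contextmanager
def nogil_block(lock):
    """Imitates `with nogil:`: release on entry, always reacquire on exit."""
    lock.release()
    try:
        yield
    finally:
        lock.acquire()  # runs on both the success and the error path


def do_foo(fail):
    """Hypothetical C++-level call that must run with the 'GIL' released."""
    assert not gil.locked(), "C++ call made while holding the GIL"
    if fail:
        raise CppError("simulated failure")


def foo(fail=False):
    """Models variant 5: a Python-exported function entered with the 'GIL' held."""
    assert gil.locked(), "caller must hold the GIL"
    try:
        with nogil_block(gil):
            do_foo(fail)
    except CppError as exc:
        # The 'GIL' is held again by the time we get here, so building and
        # raising a Python exception is safe.
        raise CudaRuntimeError(str(exc)) from exc


if __name__ == "__main__":
    with gil:  # simulate Python code, which always runs holding the GIL
        foo()
        try:
            foo(fail=True)
        except CudaRuntimeError as exc:
            print("translated:", exc, "| GIL held:", gil.locked())
```

In the model, `foo()` asserts that the "GIL" is held on entry and `do_foo()` asserts that it is not held during the call, so the assertions spell out the contract that the real `with nogil:` block provides.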
Another option is to pin to Cython 0.29 until we have a chance to make these changes.
They're not errors at the moment, so I think pinning would be a backward step (and we'd have to go through the rest of RAPIDS and pin there too...).
I don't think this will ever be an error, because it's perfectly valid code. It's just telling you that it might be slower than you anticipated because of acquiring the GIL. I wouldn't consider that grounds for reversion, because all that reversion would do is silence the warning, not change the behavior.
Agreed, though I think we should consider restructuring the code so that we don't get this warning, so that we can keep track of cases where warnings might actually be important.
I agree. I haven't had a chance to read through your very thorough list of proposals, but I imagine we'll want to apply one of them to fix the issue properly.
Describe the bug
RMM is raising warnings during builds due to `except` being used on methods marked as `nogil`. These declarations are not compatible because the GIL must be acquired for exception checks. See build logs below (also linked here):