Is your feature request related to a problem? Please describe.
This problem has annoyed me for years: pycuda runs extremely slowly on Windows but not on Linux. My program contains ~20 `ElementwiseKernel`s and `ReductionKernel`s. I found that `SourceModule` is used to compile the code and that it saves the cubin files to `cache_dir`. This works well on every Linux machine I tested, with only ~1 s of overhead to load the functions on later runs. On Windows, however, the first run of my code takes ~2 min, and later runs still take ~1 min. The reason is that the code is preprocessed on every run, because the source always contains `#include <pycuda-complex.hpp>` (see pycuda/pycuda/compiler.py, lines 89 to 90 at commit 96aab3f).
In my tests, running `nvcc --preprocess "empty_file.cu" --compiler-options -EP` takes several seconds on every Windows computer I tried. In other words, just deciding whether the cache can be used takes a very long time.
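For readers without the source open, the caching decision described above can be sketched roughly as follows. This is a simplified stand-in, not the actual code from `compiler.py`: `cache_key` and the injected `preprocess` callable are hypothetical names, and the real checksum may use a different hash and cover more inputs.

```python
import hashlib


def cache_key(source, options, preprocess):
    """Hash the kernel source into a cache key (simplified sketch).

    `preprocess` stands in for the expensive `nvcc --preprocess` call.
    """
    checksum = hashlib.sha256()
    if "#include" in source:
        # Included headers may change, so hash the preprocessed text instead.
        # This branch costs one preprocessor run per kernel, even on a cache hit.
        checksum.update(preprocess(source, options).encode("utf-8"))
    else:
        checksum.update(source.encode("utf-8"))
    for option in options:
        checksum.update(option.encode("utf-8"))
    return checksum.hexdigest()


calls = []


def fake_preprocess(source, options):
    # Records each call so the demo can show when preprocessing is triggered.
    calls.append(source)
    return source


cache_key("__global__ void f() {}", [], fake_preprocess)
cache_key("#include <pycuda-complex.hpp>\n__global__ void g() {}", [], fake_preprocess)
print(len(calls))  # only the source containing #include triggered preprocessing
```

With ~20 kernels that all include the complex header, the slow branch is taken ~20 times per run, which matches the overhead reported here.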
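To reproduce the measurement on another machine, a small timing script along these lines may help. It is only a sketch: `time_nvcc_preprocess` is a hypothetical helper, it assumes `nvcc` is on `PATH`, and the `-P` flag for non-Windows host compilers is an assumption (the report above only covers `-EP` on Windows).

```python
import os
import shutil
import subprocess
import sys
import tempfile
import time


def time_nvcc_preprocess():
    """Return seconds spent preprocessing an empty .cu file, or None without nvcc."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    fd, path = tempfile.mkstemp(suffix=".cu")
    os.close(fd)
    # -EP is the MSVC flag from the report; -P is assumed for other host compilers.
    host_flag = "-EP" if sys.platform == "win32" else "-P"
    cmdline = [nvcc, "--preprocess", path, "--compiler-options", host_flag]
    try:
        start = time.perf_counter()
        subprocess.run(cmdline, capture_output=True, check=False)
        return time.perf_counter() - start
    finally:
        os.remove(path)


elapsed = time_nvcc_preprocess()
print("nvcc not found" if elapsed is None else f"{elapsed:.2f} s")
```

Multiplying the printed time by the number of kernels gives a rough estimate of the per-run overhead.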
Describe the solution you'd like
I monkey-patched pycuda to remove the preprocess call above, and it works well. I'd like to find a better way to do it. The easiest approach I can think of is adding an option that forces the `#include` check to be skipped (off by default, since the user must understand the potential risk).
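For reference, the monkey patch amounts to something like the sketch below. It assumes the preprocessing is done by a module-level function `preprocess_source(source, options, nvcc)` in `pycuda.compiler`; please check this against your installed version. `disable_include_preprocessing` is a hypothetical helper name, and a stand-in namespace is used so the snippet runs without pycuda or a GPU.

```python
from types import SimpleNamespace


def disable_include_preprocessing(compiler_module):
    """Replace `preprocess_source` with a pass-through and return the original.

    Risk: edits to included headers no longer invalidate cached binaries.
    """
    original = compiler_module.preprocess_source

    def passthrough(source, options, nvcc):
        # Hand back the raw source instead of running `nvcc --preprocess`.
        return source

    compiler_module.preprocess_source = passthrough
    return original


# Stand-in for `pycuda.compiler` (assumption: same attribute name and signature).
fake_compiler = SimpleNamespace(
    preprocess_source=lambda source, options, nvcc: "<expanded> " + source
)
original = disable_include_preprocessing(fake_compiler)
result = fake_compiler.preprocess_source("#include <x.hpp>", [], "nvcc")
print(result)
```

With a real installation the call would be `disable_include_preprocessing(pycuda.compiler)`, made before any kernel is created. Keeping the returned `original` makes it possible to restore the default behavior later.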
Describe alternatives you've considered
Are there any nvcc options that speed up the preprocessing? I don't know.
Thanks for the report, I had no idea. Continuing along those lines, I'm not sure I have a good idea for how to approach this. We could introduce a flag so you don't have to monkeypatch, but that sacrifices correctness to an extent.
#463
Does this make sense: we check `include_dirs` and ignore `#include` when it is empty? This should cover most of the simple cases, though it may still be incorrect if the user upgrades CUDA or pycuda and keeps the cache (appending these version numbers to the cache folder or file name would invalidate the cache in that case). We could also do this only for the poor Windows users, to minimize the potential problems.
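To make the proposal concrete, here is a minimal sketch of the rule. None of this exists in pycuda: `should_preprocess` and `versioned_key` are hypothetical names, and the version strings in the demo are made up.

```python
import hashlib
import sys


def should_preprocess(source, include_dirs, platform=sys.platform):
    """Decide whether the cache key needs the preprocessed source (sketch)."""
    if "#include" not in source:
        return False
    if platform == "win32" and not include_dirs:
        # Proposed shortcut: with no user include directories, assume that
        # only headers shipped with CUDA/pycuda are included.
        return False
    return True


def versioned_key(source, cuda_version, pycuda_version):
    """Mix version strings into the key so an upgrade invalidates old entries."""
    checksum = hashlib.sha256()
    for part in (source, cuda_version, pycuda_version):
        checksum.update(part.encode("utf-8"))
        checksum.update(b"\0")  # separator so adjacent parts cannot run together
    return checksum.hexdigest()


src = "#include <pycuda-complex.hpp>\n__global__ void g() {}"
print(should_preprocess(src, [], platform="win32"))        # shortcut applies
print(should_preprocess(src, ["my_inc"], platform="win32"))  # user headers: keep check
print(versioned_key(src, "11.8", "2022.1") != versioned_key(src, "12.0", "2022.1"))
```

The shortcut is still unsafe if a header in the default search path is edited by hand, which is why restricting it to Windows or putting it behind an opt-in flag seems reasonable.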
Additional context
The link below is one of the examples I worked on, but I guess any simple functionality of `GPUArray` that relies on `SourceModule` is impacted by this.
https://github.com/bu-cisl/SSNP-IDT/blob/master/examples/forward_model.py