WIP: Support for NVIDIA's Run-Time-Compiler NVRTC
I have the NVIDIA Run-Time-Compiler (NVRTC) up and running. It needs only a few API calls and doesn't touch the file system at all, which makes it very fast.
NVRTC provides better performance in situations where the host generates its .cu sources dynamically at run-time such that caching doesn't help (caching might even be counterproductive in the special case where each generated .cu source is unique).
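The caching argument can be made concrete with a small pure-Python sketch (not PyCUDA code) of a compile cache keyed by a hash of the source. CompileCache and its compile_fn argument are hypothetical stand-ins for whatever actually compiles the source. When every generated source is unique, every lookup misses, so the cache only adds hashing and storage overhead.

```python
import hashlib


class CompileCache:
    """Hypothetical source-keyed compile cache (illustration only)."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn  # stand-in for the real compiler call
        self.store = {}               # source hash -> compiled result
        self.hits = 0
        self.misses = 0

    def get(self, source):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = self.compile_fn(source)
        return self.store[key]


def fake_compile(source):
    # Pretend compiler so the sketch runs without CUDA.
    return "ptx for " + source


# Case 1: every generated source is unique, so nothing is ever reused.
unique = CompileCache(fake_compile)
for i in range(100):
    unique.get("__global__ void k%d() {}" % i)

# Case 2: the same source is requested repeatedly; only the first call compiles.
repeated = CompileCache(fake_compile)
for _ in range(100):
    repeated.get("__global__ void same() {}")
```

In the unique case the hit count stays at zero while the store grows by one entry per kernel, which is the counterproductive situation described above.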
Differences between NVCC and NVRTC observed so far:
- #include directives inside .cu sources compiled with NVRTC work only for a subset of header files. The STL appears not to be supported and generates compiler errors, but, for example, both pycuda-helpers.hpp and pycuda-complex.hpp (shipped with PyCUDA) work; note that they include files from <CUDA>/include/*.h, which obviously also work. The printf() function is also available in kernel functions (due to some opaque include magic in NVRTC). Work on this is not yet finished.
- The kernel functions that are to be exported from the module need to be declared to NVRTC before compiling
- Compiler command-line options are much more limited
- PyCUDA needs to be linked against nvrtc.lib/libnvrtc.a, a simple change in setup.py (currently a small hack)
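Given the #include limitation, a cheap pre-flight check can flag sources that are likely to fail before they reach NVRTC. This is a hypothetical helper, not part of PyCUDA: the set of STL header names it treats as suspect is an assumption based only on the observation that the STL generates errors, not an authoritative list, and it only recognizes the usual #include <...> and #include "..." forms.

```python
import re

# Assumption: a non-exhaustive sample of STL headers. The observation in this
# merge request is only that "the STL" fails under NVRTC, not this exact list.
SUSPECT_STL_HEADERS = {
    "vector", "string", "map", "algorithm", "iostream", "memory", "utility",
}

# Matches "#include <name>" and '#include "name"' at the start of a line.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def find_includes(source):
    """Return the header names named by #include directives, in order."""
    return INCLUDE_RE.findall(source)


def suspect_includes(source):
    """Return included headers that are likely to be rejected by NVRTC."""
    return [h for h in find_includes(source) if h in SUSPECT_STL_HEADERS]


example_source = """
#include <pycuda-complex.hpp>
#include <vector>
__global__ void kernel_func() { }
"""
```

Here suspect_includes(example_source) flags only vector, while pycuda-complex.hpp (reported above to work) passes.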
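A minimal example of the current API:

```python
import pycuda.autoinit  # creates a CUDA context on the default device
from pycuda.compiler import DynamicModule

kernel_func_source = '__global__ void kernel_func() { /* ... */ }'
mod = DynamicModule()
# New in this merge request: compile with NVRTC, declaring the exported kernels
mod.add_source_nvrtc(kernel_func_source, exports=['kernel_func'])
mod.link()
kernel_func = mod.get_function('kernel_func')
```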
Only add_source_nvrtc() and exports=[...] are new; exports contains the list of all function names that will later be passed to mod.get_function().
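Because exports has to list every kernel that will later be fetched with mod.get_function(), a mismatch between the two only shows up at run time. The declaration step is presumably needed because NVRTC compiles the source as C++ and mangles kernel names, which is what NVRTC's nvrtcAddNameExpression()/nvrtcGetLoweredName() pair addresses; whether this merge request uses them is an assumption. The validate_exports helper below is hypothetical (not part of PyCUDA), and its regex is a rough assumption that handles simple definitions like the one in the example, not templates or macros.

```python
import re

# Rough pattern for simple kernel definitions such as
# "__global__ void kernel_func() { ... }" (assumption: no templates or macros).
GLOBAL_RE = re.compile(r"__global__\s+void\s+([A-Za-z_]\w*)\s*\(")


def validate_exports(source, exports):
    """Hypothetical helper: compare exports with __global__ definitions.

    Returns (missing, unexported): names listed in exports that the source
    does not define, and kernels the source defines that are not exported
    (presumably not reachable via get_function() later).
    """
    defined = set(GLOBAL_RE.findall(source))
    wanted = set(exports)
    return sorted(wanted - defined), sorted(defined - wanted)


kernel_func_source = (
    "__global__ void kernel_func() { }\n"
    "__global__ void helper_kernel(float *x) { }\n"
)
# "kernel_fnuc" is a deliberate typo to show what gets reported.
missing, unexported = validate_exports(
    kernel_func_source, ["kernel_func", "kernel_fnuc"]
)
```

Running such a check before the compile call turns a late lookup failure into an immediate, descriptive error.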
mentioned in merge request !2 (merged)
Also wanted to say: Thank you for working on this (and !2 (merged)). This is shaping up nicely!
Since yesterday's commit, the Python 3.5/Titan test machine segfaults at the end of the test run.
To narrow down the problem, I disabled everything NVRTC-related with a preprocessor macro (so only the changes from my previous jit-link patch remain), yet the crash persists. I think I need help to understand this better.
I am most suspicious of the way I declare my C++ types to Python using Boost::Python, for both classes NvrtcCompiler and Linker. I have no prior experience with Boost::Python and could not find proper documentation on it, so maybe someone with more experience could review that part thoroughly. I am developing under Windows 8.1 64-bit with a GeForce GTX 980 and Visual Studio 2015 and haven't had any issues so far.
added 1 commit
- dad4bdba - 2nd temporary test to investigate crash: removed nvrtc library
(If you'd like that, just send me an SSH public key at andreask@illinois.edu.)
added 1 commit
- b33a5080 - Reverted temporary tests, modified static linking of nvrtc.lib in setup.py to pr…
added 1 commit
- 784f8c2f - Minor modifications to a few prototypes, removed shared_ptr<> from Boost::Python…
Attached are gdb's backtrace and full-backtrace logs. That's one hell of a stack dump...
The cause of the crash is a SIGABRT, which smells like a memory inconsistency. Here's the gdb command I used and its output (python3 is the correct interpreter, 3.5.3rc1):
$ gdb -q -n -ex bt -batch python3 core > core-bt.log
54	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
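As a complement to the gdb core-dump backtrace, Python's standard faulthandler module (available since Python 3.3) dumps the Python-level traceback of every thread when the process receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL. That may help pin down which teardown step triggers the abort at the end of the tests. Enabling it at the top of the test run (for example in a conftest.py, if the suite runs under pytest) is an assumption about how the tests are organized; running the tests with python3 -X faulthandler has the same effect, with output on stderr.

```python
import faulthandler

# Keep the file object alive and open: faulthandler writes to its file
# descriptor directly when the signal arrives, so closing it would lose output.
crash_log = open("faulthandler.log", "w")

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL that dump
# the Python traceback of all threads into crash_log.
faulthandler.enable(file=crash_log, all_threads=True)
```

Unlike the gdb backtrace, which shows C frames, this shows which Python code (for example an atexit handler or a destructor running during interpreter shutdown) was active when the abort happened.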
What exactly does this "No such file or directory" (on stderr) mean? Is this just gdb complaining about not being able to find the file raise.c on the local system?
While looking through it (most of the visited modules and functions are new to me), one idea crossed my mind: shouldn't my two C++ classes (Linker and NvrtcCompiler) be made context-specific? What is the best practice in PyCUDA for making C++ classes local to a CUDA device context (like thread-local)?
That's one hell of a stack dump...
Honestly, that looks like a fairly typical Python stack dump to me...
Is this just gdb complaining about not being able to find this file raise.c on the local system?
Yes.
While looking through it (most of the visited modules and functions are new to me) one idea crossed my mind: shouldn't my two C++ classes (Linker and NvrtcCompiler) be made context-specific?
They don't seem context-specific. That is, if a context went away while compilation is going on, there would not appear to be an issue (since, for example, you never seem to pass a context to the compilation process, right?). I haven't looked at the linker source code; does that receive a context anywhere?
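The distinction drawn here can be sketched in pure Python. NVRTC's output (PTX) depends only on the source and the compile options and needs no CUDA context, so it could be cached globally; loading a compiled image into a module (cuModuleLoadData in the driver API) happens in the current context, so only that step would need per-context bookkeeping. TwoLevelCache, compile_to_ptx, load_module and the string context keys are hypothetical stand-ins; with real PyCUDA, deriving the key from pycuda.driver.Context.get_current() is an assumption about how one might wire it up.

```python
class TwoLevelCache:
    """Hypothetical sketch: context-free PTX cache plus per-context modules."""

    def __init__(self, compile_to_ptx, load_module):
        self.compile_to_ptx = compile_to_ptx  # stand-in for the NVRTC step
        self.load_module = load_module        # stand-in for loading into a context
        self.ptx_cache = {}                   # (source, options) -> PTX
        self.module_cache = {}                # (context, source, options) -> module

    def get_module(self, context_key, source, options=()):
        ptx_key = (source, tuple(options))
        if ptx_key not in self.ptx_cache:
            # No context involved: the same PTX can serve every context.
            self.ptx_cache[ptx_key] = self.compile_to_ptx(*ptx_key)
        mod_key = (context_key,) + ptx_key
        if mod_key not in self.module_cache:
            # Context-specific: a loaded module belongs to exactly one context.
            self.module_cache[mod_key] = self.load_module(
                context_key, self.ptx_cache[ptx_key])
        return self.module_cache[mod_key]

    def drop_context(self, context_key):
        """Forget modules of a context that went away; PTX stays cached."""
        for key in [k for k in self.module_cache if k[0] == context_key]:
            del self.module_cache[key]


compile_calls = []


def fake_compile(source, options):
    # Pretend NVRTC call that records how often it runs.
    compile_calls.append(source)
    return "ptx:" + source


def fake_load(context_key, ptx):
    # Pretend module load that just tags the PTX with its context.
    return (context_key, ptx)


cache = TwoLevelCache(fake_compile, fake_load)
src = "__global__ void kernel_func() { }"
m_a = cache.get_module("ctx-A", src)
m_b = cache.get_module("ctx-B", src)  # reuses the PTX, loads a new module
cache.drop_context("ctx-A")
```

In this model a context disappearing only invalidates the loaded modules, never the compiled PTX, which matches the point that the compilation step itself does not depend on a context.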
- Resolved by Christian Schnell
added 1 commit
- 715ef3d2 - Final tuning of prototypes, changed C++ class names to properly match Boost class naming convention