WIP: Support for NVIDIA's Run-Time-Compiler NVRTC
I have the NVIDIA Run-Time-Compiler (NVRTC) up and running. It takes just a few API calls and never touches the file system, which makes it very fast.
NVRTC provides better performance in situations where the host generates its .cu sources dynamically at run-time such that caching doesn't help (caching might even be counterproductive in the special case where each generated .cu source is unique).
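To make the caching argument concrete, here is a minimal, GPU-free sketch. It is not PyCUDA's actual compiler cache: `SourceCache`, `fake_compile` and `make_kernel_source` are hypothetical names made up for illustration, and the "compiler" is a stand-in that just returns bytes. It only shows that a cache keyed on the source text never hits when every generated source differs, so each lookup pays for hashing and storage on top of the compile.

```python
import hashlib


class SourceCache:
    """Toy compile cache keyed by the SHA-256 of the source text (hypothetical)."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    def compile(self, source):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            # Cache miss: pay for the compile and for storing the result.
            self.misses += 1
            self._store[key] = self._compile_fn(source)
        return self._store[key]


def fake_compile(source):
    # Stand-in for a real compiler call; returns a dummy "binary".
    return b"BIN:" + source.encode("utf-8")


def make_kernel_source(scale):
    # Hypothetical generator that bakes a run-time constant into the source.
    return "__global__ void k(float *x) { x[threadIdx.x] *= %r; }" % (scale,)


# Case 1: every generated source is unique, so the cache never hits.
cache = SourceCache(fake_compile)
for i in range(100):
    cache.compile(make_kernel_source(0.5 + i))
unique_stats = (cache.hits, cache.misses)

# Case 2: the same source is requested repeatedly, so the cache pays off.
cache = SourceCache(fake_compile)
for _ in range(100):
    cache.compile(make_kernel_source(0.5))
repeated_stats = (cache.hits, cache.misses)

print("unique sources:  ", unique_stats)
print("repeated sources:", repeated_stats)
```

In the first case the cache only adds work, which is the situation where a fast in-memory compile path is most attractive.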
Differences between NVCC and NVRTC observed so far:
- #include directives inside .cu sources compiled with NVRTC work only for a subset of header files. It appears that the STL is not supported and generates compiler errors, but for example both pycuda-helpers.hpp and pycuda-complex.hpp (shipped with PyCUDA) work. Note that these include files from <CUDA>/include/*.h, which therefore also work. The printf() function is also available in kernel functions (due to some opaque include magic from NVRTC). Work on this is not yet finished.
- The kernel functions which are to be exported from the module need to be declared to NVRTC before compiling
- Compiler command line options are much more limited
- PyCUDA needs to be linked against nvrtc.lib/libnvrtc.a, which is a simple change in setup.py (currently a small hack)
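Since the exported kernel functions have to be named before compiling, a host that generates its sources dynamically may want to derive that list from the source text. The following is only a sketch under that assumption: `find_kernel_names` is a hypothetical helper, not part of PyCUDA or of this PR, and the regular expressions handle only plain `__global__ void name(` declarations (no templates, no `__launch_bounds__`, and `//` inside string literals would be mistaken for a comment).

```python
import re

# Strip // and /* ... */ comments so commented-out kernels are ignored.
_COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
# Match plain kernel declarations of the form `__global__ void name(`.
_KERNEL_RE = re.compile(r"__global__\s+void\s+([A-Za-z_]\w*)\s*\(")


def find_kernel_names(source):
    """Return the names of __global__ functions in declaration order."""
    stripped = _COMMENT_RE.sub(" ", source)
    names = []
    for name in _KERNEL_RE.findall(stripped):
        if name not in names:  # keep each name once
            names.append(name)
    return names


source = """
// __global__ void commented_out() {}
__global__ void scale(float *x, float a) { x[threadIdx.x] *= a; }
__device__ float helper(float v) { return v * v; }
__global__ void square(float *x) { x[threadIdx.x] = helper(x[threadIdx.x]); }
"""

exports = find_kernel_names(source)
print(exports)
```

The resulting list could then be passed as the exports argument when the source is added to the module; passing it by hand remains the more robust choice for anything beyond simple kernels.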
A minimal example of the current API:
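import pycuda.autoinit  # creates a CUDA context on the default device
from pycuda.compiler import DynamicModule

kernel_func_source = '__global__ void kernel_func() { /* ... */ }'
mod = DynamicModule()
# exports lists every kernel later fetched with get_function()
mod.add_source_nvrtc(kernel_func_source, exports=['kernel_func'])
mod.link()
kernel_func = mod.get_function('kernel_func')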
Only add_source_nvrtc() and exports=[...] are new; exports contains the list of all function names later used with mod.get_function().