WIP: Support for NVIDIA's Run-Time-Compiler NVRTC

I have the NVIDIA Run-Time-Compiler (NVRTC) up and running. It takes only a few API calls and never touches the file system, which makes it very fast.
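
To give a feel for how few calls an in-memory compile takes, here is a rough sketch that drives the NVRTC C API through ctypes. It is not the code in this merge request (which lives in the C++ wrapper); compile_to_ptx is a hypothetical helper, and it assumes the NVRTC shared library can be located with ctypes.util.find_library("nvrtc"), which may not hold on every platform. If the library cannot be found, the helper returns None.

```python
import ctypes
import ctypes.util


def compile_to_ptx(source, name="kernel.cu"):
    """Compile CUDA source to PTX entirely in memory.

    Hypothetical helper: returns the PTX as bytes, or None when the
    NVRTC shared library cannot be found or loaded on this machine.
    """
    libname = ctypes.util.find_library("nvrtc")  # assumption: library is discoverable
    if libname is None:
        return None
    try:
        nvrtc = ctypes.CDLL(libname)
    except OSError:
        return None

    def check(status, what):
        if status != 0:  # 0 is NVRTC_SUCCESS
            raise RuntimeError(f"{what} failed with NVRTC status {status}")

    prog = ctypes.c_void_p()  # opaque nvrtcProgram handle
    check(
        nvrtc.nvrtcCreateProgram(
            ctypes.byref(prog),
            ctypes.c_char_p(source.encode()),
            ctypes.c_char_p(name.encode()),
            0, None, None,  # no in-memory headers
        ),
        "nvrtcCreateProgram",
    )
    try:
        check(nvrtc.nvrtcCompileProgram(prog, 0, None), "nvrtcCompileProgram")
        size = ctypes.c_size_t()
        check(nvrtc.nvrtcGetPTXSize(prog, ctypes.byref(size)), "nvrtcGetPTXSize")
        buf = ctypes.create_string_buffer(size.value)
        check(nvrtc.nvrtcGetPTX(prog, buf), "nvrtcGetPTX")
        return buf.value
    finally:
        nvrtc.nvrtcDestroyProgram(ctypes.byref(prog))


ptx = compile_to_ptx('extern "C" __global__ void kernel_func() { }')
print("NVRTC not available" if ptx is None else f"got {len(ptx)} bytes of PTX")
```

The source string goes in and the PTX comes out as a byte string; no temporary files are written at any point.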

NVRTC performs better than the NVCC path when the host generates its .cu sources dynamically at run time, because caching doesn't help there. Caching can even be counterproductive in the special case where every generated .cu source is unique.
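
The caching argument can be illustrated with a toy model. The sketch below is not PyCUDA's actual cache; SourceCache, fake_compile and make_kernel are hypothetical names, and the "compiler" is a stand-in that returns a string. It only shows that a cache keyed by source hash never hits when every generated source differs, so the hashing and storage are pure overhead in that case.

```python
import hashlib


class SourceCache:
    """Toy compile cache keyed by a hash of the source text (hypothetical)."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, source):
        key = hashlib.sha256(source.encode()).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._compile(source)
        return self._store[key]


def fake_compile(source):
    # Stand-in for a real compiler invocation.
    return f"binary({len(source)} bytes)"


def make_kernel(scale):
    # Host-side code generation: the constant is baked into the source.
    return f"__global__ void k(float *x) {{ x[threadIdx.x] *= {scale}f; }}"


# Same source requested repeatedly: the cache pays off.
repeated = SourceCache(fake_compile)
for _ in range(100):
    repeated.get(make_kernel(2.0))

# Every source is unique: the cache only grows and never hits.
unique = SourceCache(fake_compile)
for i in range(100):
    unique.get(make_kernel(float(i)))

print(repeated.hits, repeated.misses)  # 99 1
print(unique.hits, unique.misses)      # 0 100
```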

Differences between NVCC and NVRTC observed so far:

  • Any #include directives inside .cu sources compiled with NVRTC work only for a subset of header files. The STL appears to be unsupported and produces compiler errors. However, both pycuda-helpers.hpp and pycuda-complex.hpp (shipped with PyCUDA) work, as do the files they include from <CUDA>/include/*.h. The printf() function is also available in kernel functions (due to some opaque include magic in NVRTC). Work on this is not yet finished.
  • The kernel functions that are to be exported from the module need to be declared to NVRTC before compiling.
  • Compiler command-line options are much more limited.
  • PyCUDA needs to be linked against nvrtc.lib/libnvrtc.a, a simple change in setup.py (currently a small hack).
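
Since only some headers work under NVRTC, it can help to list the #include directives of a source before handing it over. The following is a minimal sketch under stated assumptions: list_includes and suspicious_includes are hypothetical helpers, and ASSUMED_UNSUPPORTED is a small illustrative sample of C++ standard library headers, not a verified list of what NVRTC rejects.

```python
import re

# Matches both #include <header> and #include "header".
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)

# Assumption: an illustrative, incomplete sample of standard library headers.
ASSUMED_UNSUPPORTED = {"vector", "string", "iostream", "map", "algorithm"}


def list_includes(source):
    """Return the header names of all #include directives, in order."""
    return INCLUDE_RE.findall(source)


def suspicious_includes(source):
    """Return the included headers that appear in ASSUMED_UNSUPPORTED."""
    return [h for h in list_includes(source) if h in ASSUMED_UNSUPPORTED]


source = '''
#include <pycuda-complex.hpp>
#include <vector>
#include "pycuda-helpers.hpp"
__global__ void kernel_func() { }
'''

print(list_includes(source))
print(suspicious_includes(source))
```

The check is purely textual; it does not follow nested includes or evaluate preprocessor conditionals, so it can only serve as an early warning.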

A minimal example of the current API:

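import pycuda.autoinit  # creates a CUDA context
# Assumed import path; released PyCUDA exposes DynamicModule in pycuda.compiler
from pycuda.compiler import DynamicModule

kernel_func_source = '__global__ void kernel_func() { /* ... */ }'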

mod = DynamicModule()
mod.add_source_nvrtc(kernel_func_source, exports=['kernel_func'])
mod.link()

kernel_func = mod.get_function('kernel_func')

Only add_source_nvrtc() and exports=[...] are new. The exports argument lists every function name that is later passed to mod.get_function().
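
When sources are generated at run time, the exports list could be derived from the source text instead of being maintained by hand. This is a sketch only: find_kernel_names is a hypothetical helper that is not part of this merge request, and its regular expression assumes plain `__global__ void name(` declarations (no templates, no attributes between the keywords, no kernels hidden in comments or macros).

```python
import re

# Assumption: kernels are declared as "__global__ void <name>(".
GLOBAL_FUNC_RE = re.compile(r'__global__\s+void\s+([A-Za-z_]\w*)\s*\(')


def find_kernel_names(source):
    """Return the names of __global__ functions in order, without duplicates."""
    names = []
    for name in GLOBAL_FUNC_RE.findall(source):
        if name not in names:
            names.append(name)
    return names


source = '''
__device__ float helper(float x) { return 2.0f * x; }
__global__ void scale(float *x) { x[threadIdx.x] = helper(x[threadIdx.x]); }
__global__ void add(float *x, float *y) { x[threadIdx.x] += y[threadIdx.x]; }
'''

exports = find_kernel_names(source)
print(exports)  # ['scale', 'add']

# With the API from this merge request, the result would be passed on as:
#   mod.add_source_nvrtc(source, exports=exports)
```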

Merge request reports

Pipeline #14065 failed for 3be2c980 on jit-compile

Activity

  • added 1 commit

    • 715ef3d2 - Final tuning of prototypes, changed C++ class names to properly match Boost class naming convention

  • added 1 commit

    • c2a5f761 - Temporarily disabled dynamic compiler tests

  • added 1 commit

    • b75c2d0c - Temporarily disabled all JIT C++ sources
