
Enable kernel execution with local arguments (was: Local arguments)

Nick Curtis requested to merge arghdos/loopy:localarg into master

Update: I've refactored this PR to account for the new ArrayArg class, and I've improved the cl.LocalMemory validation so that it properly checks sizes (i.e., it ensures that any input argument is correctly sized).


This solves a problem I've been hacking my way around for some time: local kernel arguments.

The issue mainly arises in OpenCL (see here), because __local variables must be declared in the top-level kernel and then passed down to any sub-kernels.
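To make the constraint concrete, here is a minimal sketch (Python that builds OpenCL C source as a string) of the pattern in question: the `__local` buffer is a parameter of the top-level `__kernel`, and the helper function only receives a pointer to it. The names `make_opencl_source`, `fill_scratch` and `top_level` are hypothetical illustrations, not the code loopy generates. On the host side, the size of such a buffer is supplied at launch; in pyopencl that is done by passing a `pyopencl.LocalMemory(nbytes)` as the corresponding argument.

```python
def make_opencl_source(dtype: str = "float") -> str:
    """Return OpenCL C source in which a __local buffer is a kernel argument.

    Hypothetical illustration: the helper function may not declare __local
    storage itself, so the top-level __kernel receives the buffer as an
    argument and passes the pointer down to the helper.
    """
    # Non-kernel helper: it only *uses* the local buffer it is handed.
    helper = (
        f"void fill_scratch(__local {dtype} *scratch,\n"
        f"                  __global const {dtype} *src)\n"
        "{\n"
        "    scratch[get_local_id(0)] = src[get_global_id(0)];\n"
        "}\n"
    )
    # Top-level kernel: the __local buffer appears in its argument list.
    kernel = (
        f"__kernel void top_level(__global const {dtype} *src,\n"
        f"                        __global {dtype} *dst,\n"
        f"                        __local {dtype} *scratch)\n"
        "{\n"
        "    fill_scratch(scratch, src);\n"
        "    barrier(CLK_LOCAL_MEM_FENCE);\n"
        "    dst[get_global_id(0)] = scratch[get_local_id(0)];\n"
        "}\n"
    )
    return helper + "\n" + kernel


if __name__ == "__main__":
    print(make_opencl_source())
```

The same shape applies whatever element type is used; only the `dtype` string changes.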

I had previously used some chicanery to get around this (I created a class that derived from both TemporaryVariable and KernelArgument and then fiddled with the argdecls), but this broke with 2018.1, so here we are!

The main thrusts of this are:

  1. Define a LocalArg. It differs from GlobalArg essentially only in that it defines an nbytes property, for compatibility with TemporaryVariable.
  2. Implement get_local_arg_decl for the OpenCL and CUDA targets (I believe those are the only targets with a "local" memory equivalent?).
  3. Implement LocalArg "creation" / validation in the pyopencl executor: this checks that any argument passed in as a LocalArg is either a) None or b) an instance of pyopencl.LocalMemory. In case a), the executor simply creates the LocalMemory itself.
  4. Add a basic test (test_loopy.py::test_local_args). We probably want more tests; any suggestions?
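Points 1 and 3 can be sketched roughly as follows. This is a simplified, hypothetical model, not the PR's code: `LocalArgSketch` and `resolve_local_arg` are made-up names, the element size is passed as a plain integer instead of a numpy dtype, and `LocalMemory` is a stand-in for `pyopencl.LocalMemory(size)` (which records a byte size) so that the example runs without an OpenCL installation. The rule that a user-supplied buffer must be *at least* `nbytes` large is an assumption; the description only says the argument must be "correctly sized".

```python
from dataclasses import dataclass
from math import prod
from typing import Optional, Tuple


class LocalMemory:
    """Stand-in for pyopencl.LocalMemory(size): only records a size in bytes."""

    def __init__(self, size: int):
        self.size = size


@dataclass(frozen=True)
class LocalArgSketch:
    """Hypothetical, simplified LocalArg: name, shape and bytes per element."""

    name: str
    shape: Tuple[int, ...]
    itemsize: int  # e.g. 4 for float32

    @property
    def nbytes(self) -> int:
        # Total array size in bytes (assumed to match TemporaryVariable.nbytes).
        return prod(self.shape) * self.itemsize


def resolve_local_arg(arg: LocalArgSketch, value: Optional[object]) -> LocalMemory:
    """Turn the value passed for a local argument into a LocalMemory.

    a) None        -> allocate a LocalMemory of exactly arg.nbytes.
    b) LocalMemory -> accept it if it holds at least arg.nbytes (assumed rule).
    Anything else raises TypeError.
    """
    if value is None:
        return LocalMemory(arg.nbytes)
    if isinstance(value, LocalMemory):
        if value.size < arg.nbytes:
            raise ValueError(
                f"local argument {arg.name!r} needs {arg.nbytes} bytes, "
                f"but the LocalMemory passed in has only {value.size}"
            )
        return value
    raise TypeError(
        f"local argument {arg.name!r} must be None or LocalMemory, "
        f"got {type(value).__name__}"
    )


if __name__ == "__main__":
    scratch = LocalArgSketch("scratch", shape=(16, 16), itemsize=4)
    print(resolve_local_arg(scratch, None).size)  # 1024
```

Keeping `nbytes` as a derived property means the executor never has to be told the buffer size separately; it falls out of the argument's shape and dtype, just as it does for a temporary.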
