Enable kernel execution w/ Local Arguments (Was: Local arguments)
Update: I've refactored this PR to account for the new ArrayArg class, and updated / improved the cl.LocalMemory
validation so that it checks inputs properly (i.e., it ensures that any input argument is correctly sized).
This solves a problem I've been hacking my way around for some time, namely the concept of local
kernel arguments.
The issue mainly arises in OpenCL (see here), as __local
variables need to be defined in the top-level kernel call (and then passed around to sub-kernels).
I had previously implemented some chicanery to get around this (I created a class that derived from TemporaryVariable and KernelArgument and then fiddled with the argdecls), but this broke w/ 2018.1, so here we are!
The main thrusts of this are:
- Define a LocalArg. This differs from GlobalArg only in that it defines an nbytes property, for compatibility with TemporaryVariable.
- Implement a get_local_arg_decl for the OpenCL and CUDA targets (I believe those are the only targets with "local" equivalents?).
- Implement LocalArg "creation" / validation in the pyopencl executor: this checks that any argument passed in as a LocalArg is either a) None or b) an instance of pyopencl.LocalMemory. In case a), it simply creates the LocalMemory itself.
- Add a basic test (test_loopy.py::test_local_args). We probably want some more tests; any suggestions?
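The first two bullets can be pictured with the minimal sketch below. It is not loopy's actual code. LocalArg here is a hypothetical stand-in dataclass, and the shape/itemsize fields, the c_type default and the exact declaration string are assumptions. The sketch shows an nbytes property computed the way a TemporaryVariable's storage size would be (element count times bytes per element). It also shows an OpenCL-style declaration that passes the buffer as a pointer in the __local address space.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalArg:
    """Hypothetical stand-in for a local-memory kernel argument (not loopy's class)."""
    name: str
    shape: tuple
    itemsize: int          # bytes per element, e.g. 4 for float32 (assumed field)
    c_type: str = "float"  # C element type used in the declaration (assumed field)

    @property
    def nbytes(self) -> int:
        # Total storage in bytes: number of elements times bytes per element.
        return math.prod(self.shape) * self.itemsize


def get_local_arg_decl(arg: LocalArg) -> str:
    """Sketch of an OpenCL parameter declaration for a local-memory argument."""
    # OpenCL kernels receive local buffers as pointers in the __local address space.
    return f"__local {arg.c_type} *{arg.name}"


tmp = LocalArg("tmp", (16, 16), 4)
print(tmp.nbytes)               # 1024
print(get_local_arg_decl(tmp))  # __local float *tmp
```

Putting nbytes on the argument lets code that sizes TemporaryVariable storage treat a LocalArg the same way.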
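The executor-side check described in the third bullet might look roughly like the sketch below. It is hypothetical throughout. The LocalMemory class is a stand-in that mirrors only the size-in-bytes argument of pyopencl.LocalMemory, so the sketch runs without an OpenCL device. The function name resolve_local_arg is made up for illustration. Requiring an exact size match is an assumed reading of "correctly sized".

```python
class LocalMemory:
    """Hypothetical stand-in for pyopencl.LocalMemory: records a size in bytes."""

    def __init__(self, size: int):
        self.size = size


def resolve_local_arg(name: str, value, nbytes: int) -> LocalMemory:
    """Return a LocalMemory for a local argument (hypothetical helper).

    Case a): value is None, so allocate LocalMemory of the argument's nbytes.
    Case b): value must already be a LocalMemory of exactly nbytes (assumed rule).
    """
    if value is None:
        return LocalMemory(nbytes)
    if not isinstance(value, LocalMemory):
        raise TypeError(
            f"local argument {name!r} must be None or LocalMemory, "
            f"got {type(value).__name__}")
    if value.size != nbytes:
        raise ValueError(
            f"local argument {name!r} needs {nbytes} bytes, got {value.size}")
    return value


auto = resolve_local_arg("tmp", None, 1024)
print(auto.size)  # 1024
```

Letting None through and allocating on the caller's behalf keeps the common case simple. An explicitly passed buffer is still checked, so a wrongly sized one fails on the host rather than inside the kernel.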
Edited by Nick Curtis