Enable kernel execution w/ Local Arguments (Was: Local arguments)
Update: I've refactored this PR to take the new `ArrayArg` class into account. I've also updated and improved the `cl.LocalMemory` validation so that it properly checks that any input argument is correctly sized.
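As a rough illustration of that size check, the sketch below compares the byte count carried by a `pyopencl.LocalMemory`-like object against the footprint implied by a local argument's shape and item size. `FakeLocalMemory` and `check_local_size` are hypothetical stand-ins so the snippet runs without an OpenCL device. It assumes that "correctly sized" means "at least as large as the array"; the validation in this PR may apply a different rule.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class FakeLocalMemory:
    """Hypothetical stand-in for pyopencl.LocalMemory: it only records a byte size."""
    size: int


def check_local_size(name, shape, itemsize, mem):
    """Return the required byte count, raising ValueError if `mem` is too small.

    Assumption: a buffer larger than required is accepted; only a too-small
    buffer is rejected.
    """
    required = prod(shape) * itemsize
    if mem.size < required:
        raise ValueError(
            f"local argument '{name}' needs {required} bytes, "
            f"but only {mem.size} were supplied")
    return required
```

For a 16x16 `float32` array, `check_local_size("tmp", (16, 16), 4, FakeLocalMemory(1024))` succeeds and returns 1024. With NumPy available, `np.dtype(dtype).itemsize` would supply the item size.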
This solves a problem I've been hacking my way around for some time: local kernel arguments.
The issue mainly arises in OpenCL (see here), as `__local` variables need to be declared at top-level kernel scope (and then passed around to sub-kernels).
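To make the constraint concrete, an OpenCL kernel can take a `__local` pointer as a parameter and forward it to a helper function, while the host only specifies the buffer's size. The Python sketch below just assembles such OpenCL C source as a string. `local_arg_decl` and `make_kernel_source` are hypothetical helpers for illustration, not loopy's `get_local_arg_decl`, and the generated kernel is only an example.

```python
def local_arg_decl(name, ctype):
    """Return an OpenCL C parameter declaration for a __local array argument."""
    return f"__local {ctype} *{name}"


def make_kernel_source(ctype="float"):
    """Build OpenCL C source in which the top-level kernel receives a __local
    buffer and forwards it to a non-kernel helper (a "sub-kernel").

    The helper cannot declare its own __local storage, because OpenCL C only
    allows __local variables at kernel function scope, so it takes a pointer.
    """
    tmp = local_arg_decl("tmp", ctype)
    return f"""
void helper({tmp}, __global const {ctype} *in)
{{
    int lid = get_local_id(0);
    tmp[lid] = in[get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE);
}}

__kernel void top(__global const {ctype} *in, {tmp})
{{
    helper(tmp, in);
}}
"""
```

On the host side, pyopencl would fill the `tmp` slot with a `cl.LocalMemory(nbytes)` object rather than a device buffer.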
I had previously implemented some chicanery to get around this (I created a class that derived from TemporaryVariable and KernelArgument and then fiddled with the argdecls), but this broke w/ 2018.1, so here we are!
The main thrusts of this are:
- Define a `LocalArg`. This differs from `GlobalArg` essentially only in that it defines an `nbytes` property for compatibility with `TemporaryVariable`.
- Implement a `get_local_arg_decl` for the OpenCL and CUDA targets (I believe those are the only targets with a "local" equivalent?).
- Implement `LocalArg` "creation"/validation in the pyopencl executor. This checks that any argument passed in as a `LocalArg` is either a) `None` or b) an instance of `pyopencl.LocalMemory`. In case a), it simply creates the `LocalMemory` itself.
- Add a basic test (`test_loopy.py::test_local_args`). We probably want some more tests; any suggestions?
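A minimal sketch of the `LocalArg` behaviour listed above, assuming a simplified argument class rather than loopy's real `KernelArgument` hierarchy. `nbytes` mirrors the property `TemporaryVariable` provides, and `resolve_local_arg` mimics the executor's two accepted cases: create the buffer when `None` is passed, otherwise accept a `LocalMemory` instance. `LocalArgSketch`, `resolve_local_arg` and `FakeLocalMemory` are hypothetical names; with pyopencl installed, `pyopencl.LocalMemory` could be passed as `local_memory_cls` in place of the stand-in.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class FakeLocalMemory:
    """Hypothetical stand-in for pyopencl.LocalMemory(size)."""
    size: int


@dataclass
class LocalArgSketch:
    """Hypothetical simplified local argument: a name, a shape and an item size."""
    name: str
    shape: tuple
    itemsize: int

    @property
    def nbytes(self):
        # Total footprint in bytes, analogous to TemporaryVariable.nbytes.
        return prod(self.shape) * self.itemsize


def resolve_local_arg(arg, value, local_memory_cls=FakeLocalMemory):
    """Return a local-memory object for `arg`, creating one if `value` is None."""
    if value is None:
        # Case a): nothing was passed, so allocate the local buffer ourselves.
        return local_memory_cls(arg.nbytes)
    if isinstance(value, local_memory_cls):
        # Case b): the caller already supplied a local-memory object.
        return value
    raise TypeError(
        f"{arg.name}: expected None or {local_memory_cls.__name__}, "
        f"got {type(value).__name__}")
```

Keeping the `None` case lets callers omit local arguments entirely and have their size derived from the kernel's own shape information.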
Edited by Nick Curtis