WIP: Noncontiguous array support for elementwise operations
Adds support for noncontiguous arrays when performing elementwise operations.
The old behavior (no noncontiguous support) remains the default when creating a new elementwise kernel. To request a kernel with noncontiguous array support, pass use_strides=True into get_elwise_kernel, and modify calls to func.prepared_async_call to include the new device_shape_and_strides property of gpuarrays, like so:
    func.prepared_async_call(self._grid, self._block, None,
            self.gpudata, other.gpudata, result.gpudata,
            self.mem_size,
            self.device_shape_and_strides,
            other.device_shape_and_strides,
            result.device_shape_and_strides)
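For reviewers less familiar with strided indexing, here is a minimal pure-Python sketch of the kind of computation that shape and stride information makes possible: mapping a flat element index to a memory offset. The helper name strided_offset is hypothetical and is not part of this PR, and strides are counted in elements here (NumPy reports them in bytes). The generated kernel code in this PR may compute the offset differently.

```python
def strided_offset(flat_index, shape, strides):
    """Map a flat (row-major) element index to an offset in elements.

    Hypothetical helper for illustration only. `shape` and `strides` are
    per-dimension sequences; strides are in elements, not bytes.
    """
    offset = 0
    # Peel off indices starting from the last (fastest-varying) dimension.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        flat_index, idx = divmod(flat_index, dim)
        offset += idx * stride
    return offset


# A 3x4 row-major buffer viewed through a transpose:
# logical shape (4, 3), element strides (1, 4).
buffer = list(range(12))
shape, strides = (4, 3), (1, 4)
transposed = [buffer[strided_offset(i, shape, strides)] for i in range(12)]
```

Because the loop runs over however many dimensions are passed in, the same routine covers the 2D case and higher-dimensional ones.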
This PR uses a regex to rewrite the operator passed into get_elwise_kernel, inserting the stride- and shape-related computations. There are two versions of this regex: one that uses the re module from the Python standard library, and a more robust one that uses the regex module. The first version fails when array accesses are nested. To switch between them, change the recursive_match_outer_brackets=True setting on line 43 of elementwise.py.
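To illustrate the nested-access problem, here is a small sketch that uses only the standard library. It is not the regex from this PR: FLAT_ACCESS is a stand-in for a non-recursive pattern, and outer_accesses is a hypothetical depth-counting scanner showing what matching the outer brackets means. The third-party regex module supports recursive patterns, which is presumably why it copes with this case.

```python
import re

# Non-recursive pattern: the subscript may not contain brackets, so only
# innermost accesses can match in full.
FLAT_ACCESS = re.compile(r"(\w+)\[([^\[\]]*)\]")

# Finds the start of an array access; the closing bracket is located by hand.
NAME_OPEN = re.compile(r"(\w+)\[")


def outer_accesses(expr):
    """Return (name, subscript) for each outermost `name[...]` in expr."""
    results = []
    pos = 0
    while True:
        match = NAME_OPEN.search(expr, pos)
        if match is None:
            return results
        depth = 1
        end = match.end()
        # Walk forward until the bracket opened by this access is closed.
        while end < len(expr) and depth > 0:
            if expr[end] == "[":
                depth += 1
            elif expr[end] == "]":
                depth -= 1
            end += 1
        if depth != 0:
            raise ValueError("unbalanced brackets in %r" % expr)
        results.append((match.group(1), expr[match.end():end - 1]))
        pos = end


operation = "z[i] = x[idx[i]] + y[i]"
flat = FLAT_ACCESS.findall(operation)
outer = outer_accesses(operation)
```

Here flat contains an entry for idx but none for x, so a rewrite driven by the flat pattern would leave the access to x untouched. outer reports x with the full subscript idx[i].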
For now, only 2D arrays are supported. The number of dimensions can be increased trivially by changing the max_dims parameter in get_elwise_module_noncontig and the __max_dims__ class attribute of gpuarray. This should probably be determined more intelligently in the future, ideally in a way that doesn't require compiling a new kernel for each combination of array shapes.