WIP: Noncontiguous array support for elementwise operations
Adds support for noncontiguous arrays when performing elementwise operations.
By default, new elementwise kernels keep the old behavior (no noncontiguous support). To request a kernel with noncontiguous array support, pass use_strides=True to get_elwise_kernel, and update calls to func.prepared_async_call to pass the new device_shape_and_strides property of each gpuarray, like so:
    func.prepared_async_call(self._grid, self._block, None,
            self.gpudata, other.gpudata, result.gpudata,
            self.mem_size,
            self.device_shape_and_strides,
            other.device_shape_and_strides,
            result.device_shape_and_strides)
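For reviewers unfamiliar with strided indexing, here is a minimal pure-Python sketch of the arithmetic a noncontiguous kernel has to do for each element: turn a flat element index into a memory offset using the array's shape and strides. The function name strided_offset is hypothetical, and strides are assumed to be counted in elements (NumPy reports them in bytes). It is not the code this PR generates.

```python
def strided_offset(i, shape, strides):
    """Map flat element index i (row-major order) to an offset in elements.

    Hypothetical helper for illustration; strides are assumed to be in
    elements, not bytes.
    """
    offset = 0
    # Peel off one dimension at a time, from the last (fastest varying)
    # dimension to the first.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        i, idx = divmod(i, dim)
        offset += idx * stride
    return offset


# A 2x3 view into a 4x6 row-major buffer that takes every other row and
# every other column. The buffer's element strides are (6, 1), so the
# view's strides are (12, 2).
view_shape = (2, 3)
view_strides = (12, 2)
offsets = [strided_offset(i, view_shape, view_strides) for i in range(6)]
print(offsets)  # [0, 2, 4, 12, 14, 16]
```

For a contiguous array the same function returns i unchanged, which is why the old kernels can index with the flat index directly.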
This PR uses a regex to rewrite the operation string passed into get_elwise_kernel, inserting the stride- and shape-related index computations. There are two versions of this regex: one that uses the re module from the Python standard library, and a more robust one that uses the third-party regex module. The re-based version fails when array accesses are nested. To switch between them, change the recursive_match_outer_brackets=True setting on line 43 of elementwise.py.
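The nested-access problem can be reproduced with a small standalone sketch. The patterns, the strided_index helper name, and the _info suffix below are all hypothetical and are not the ones used in elementwise.py. The second function is a stand-in that uses an explicit bracket scanner, not the recursive pattern from the regex module, to show what correct handling of nesting looks like.

```python
import re

# Matches name[...] only when the brackets contain no further brackets.
SIMPLE_ACCESS = re.compile(r"(\w+)\[([^\[\]]*)\]")
ACCESS_START = re.compile(r"(\w+)\[")


def rewrite_simple(operation):
    """Rewrite array accesses with a single non-recursive pattern."""
    return SIMPLE_ACCESS.sub(
        r"\g<1>[strided_index(\g<2>, \g<1>_info)]", operation)


def rewrite_nested(operation):
    """Rewrite array accesses, matching outer brackets by counting depth."""
    out = []
    pos = 0
    while pos < len(operation):
        m = ACCESS_START.search(operation, pos)
        if m is None:
            out.append(operation[pos:])
            break
        # Scan forward to the bracket that closes this access.
        depth = 1
        end = m.end()
        while end < len(operation) and depth:
            if operation[end] == "[":
                depth += 1
            elif operation[end] == "]":
                depth -= 1
            end += 1
        if depth:
            raise ValueError("unbalanced brackets in %r" % operation)
        name = m.group(1)
        # Rewrite any accesses inside the index expression first.
        inner = rewrite_nested(operation[m.end():end - 1])
        out.append(operation[pos:m.start()])
        out.append("%s[strided_index(%s, %s_info)]" % (name, inner, name))
        pos = end
    return "".join(out)


op = "z[i] = x[y[i]]"
print(rewrite_simple(op))
# z[strided_index(i, z_info)] = x[y[strided_index(i, y_info)]]
print(rewrite_nested(op))
# z[strided_index(i, z_info)] = x[strided_index(y[strided_index(i, y_info)], x_info)]
```

In the first output the outer access to x is left unrewritten, which is the failure mode described above. Both functions treat every name[ as an array access, so a real implementation would also need to skip things like local array declarations.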
For now, only 2D arrays are supported. The number of dimensions can be increased by changing the max_dims parameter in get_elwise_module_noncontig and the __max_dims__ class attribute of gpuarray. In the future this limit should probably be determined automatically, ideally in a way that doesn't require compiling a new kernel for each combination of array shapes.
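One way to keep a single compiled kernel usable for several array ranks is to pad every array's shape and strides up to a fixed max_dims. The sketch below assumes a flat layout of max_dims shape entries followed by max_dims stride entries; this PR description does not specify the actual layout of device_shape_and_strides, so the layout and the pack_shape_and_strides name are illustrative only.

```python
def pack_shape_and_strides(shape, strides, max_dims=2):
    """Pad shape and strides to max_dims and concatenate them.

    Hypothetical helper; the layout (shape first, then strides) is an
    assumption made for this example.
    """
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same length")
    if len(shape) > max_dims:
        raise ValueError("array has more than max_dims dimensions")
    pad = max_dims - len(shape)
    # A leading dimension of size 1 with stride 0 never changes an offset,
    # so lower-rank arrays can share a kernel built for max_dims.
    return [1] * pad + list(shape) + [0] * pad + list(strides)


print(pack_shape_and_strides((3,), (1,)))        # [1, 3, 0, 1]
print(pack_shape_and_strides((2, 3), (12, 2)))   # [2, 3, 12, 2]
```

With this kind of padding the kernel only depends on max_dims, not on the shapes themselves, at the cost of a few wasted index computations for lower-rank arrays.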