test_global_parallel_reduction does not work on GPUs
@zweiner2 reported that the following command fails on GPUs (e.g. the TITAN X on porter) on current master (b7d1c38a):
=PYOPENCL_TEST='nvi:0' python test_reduction.py 'test_global_parallel_reduction(cl._csc, 100000)'=
It works just fine on pocl, though.
This may be my fault. In acc42897, after discussing the example with @haogao2, I tweaked it to get a linear stride.
cc @mattwala