Added extra barriers to the sum and inner product OpenCL kernels.
An additional barrier is now placed after the reduction step. In principle it should not be needed, but rare failures of the test suite have been observed. Performance is not affected, since this is only a single, cheap local-memory barrier.
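The kernels themselves are not shown here. The following is a hypothetical CPU analogue of a work-group tree reduction, meant only to illustrate where a barrier "after the reduction step" sits. POSIX threads stand in for work-items, a shared array stands in for `__local` memory, and `pthread_barrier_wait` stands in for `barrier(CLK_LOCAL_MEM_FENCE)`. The names `WG_SIZE`, `work_group_sum`, and the exact placement of the extra barrier are assumptions, not the actual kernel code.

```c
#define _POSIX_C_SOURCE 200112L
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define WG_SIZE 8  /* hypothetical work-group size; must be a power of two */

/* Shared state standing in for one work-group and its local memory. */
typedef struct {
    double scratch[WG_SIZE];     /* plays the role of a __local buffer */
    const double *in;            /* "global" input */
    size_t n;                    /* number of input elements */
    double result;               /* written by work-item 0 */
    pthread_barrier_t barrier;   /* stands in for barrier(CLK_LOCAL_MEM_FENCE) */
} work_group;

typedef struct {
    work_group *wg;
    size_t lid;                  /* local id, like get_local_id(0) */
} work_item;

static void *work_item_main(void *arg) {
    work_item *wi = arg;
    work_group *wg = wi->wg;
    size_t lid = wi->lid;

    /* Each work-item accumulates a strided partial sum of the input. */
    double partial = 0.0;
    for (size_t i = lid; i < wg->n; i += WG_SIZE)
        partial += wg->in[i];
    wg->scratch[lid] = partial;
    pthread_barrier_wait(&wg->barrier);

    /* Tree reduction in local memory: every step halves the active range.
       All work-items reach every barrier, so control flow stays uniform. */
    for (size_t stride = WG_SIZE / 2; stride > 0; stride /= 2) {
        if (lid < stride)
            wg->scratch[lid] += wg->scratch[lid + stride];
        pthread_barrier_wait(&wg->barrier);
    }

    /* Work-item 0 publishes the work-group result. */
    if (lid == 0)
        wg->result = wg->scratch[0];

    /* Extra barrier after the reduction step. In this sketch nothing reuses
       scratch[] afterwards, so it is defensive rather than required. */
    pthread_barrier_wait(&wg->barrier);
    return NULL;
}

/* Runs one simulated work-group over in[0..n) and returns the sum. */
double work_group_sum(const double *in, size_t n) {
    work_group wg = { .in = in, .n = n, .result = 0.0 };
    pthread_t threads[WG_SIZE];
    work_item items[WG_SIZE];

    if (pthread_barrier_init(&wg.barrier, NULL, WG_SIZE) != 0)
        abort();
    for (size_t i = 0; i < WG_SIZE; ++i) {
        items[i] = (work_item){ &wg, i };
        if (pthread_create(&threads[i], NULL, work_item_main, &items[i]) != 0)
            abort();  /* a missing work-item would deadlock the barrier */
    }
    for (size_t i = 0; i < WG_SIZE; ++i)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&wg.barrier);
    return wg.result;
}
```

An inner product follows the same pattern, with `partial += x[i] * y[i]` in the first loop.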