SpGEMM: Workaround for bug in NVIDIA OpenCL compiler.
The if (buffer_size == get_local_size(0)) { ... } block caused problems
with NVIDIA drivers 34x.yz. The error could not be reproduced with
simpler kernels.
Moving the operations on index_in_C and buffer_size out of the block
resolves the issues.
Also introduces a thread-private variable 'local_id' to replace the
calls to get_local_id(0) in the same kernel.
This might improve performance slightly.
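
For illustration only, here is a minimal host-side sketch in plain C of what
such a hoist can look like. It is not the actual kernel code. Only the names
index_in_C, buffer_size and local_id come from this message; LOCAL_SIZE,
item_state, flush_inside, flush_hoisted and variants_agree are hypothetical
stand-ins, and the store into C is an assumed example of work that stays
inside the block.

```c
#include <assert.h>
#include <string.h>

#define LOCAL_SIZE 4u  /* hypothetical stand-in for get_local_size(0) */

/* Per-work-item (thread-private) state; field names follow the commit message. */
typedef struct {
    unsigned index_in_C;
    unsigned buffer_size;
} item_state;

/* Assumed shape before the workaround: the updates of index_in_C and
 * buffer_size sit inside the conditional block. In a kernel, local_id
 * would be read once via get_local_id(0) and kept in a private variable. */
static void flush_inside(item_state *s, unsigned local_id,
                         const unsigned *buffer, unsigned *C)
{
    if (s->buffer_size == LOCAL_SIZE) {
        C[s->index_in_C + local_id] = buffer[local_id];
        s->index_in_C += s->buffer_size;
        s->buffer_size = 0u;
    }
}

/* Assumed shape after the workaround: only the store stays in the block,
 * the two updates are hoisted out and made conditional on a flag. */
static void flush_hoisted(item_state *s, unsigned local_id,
                          const unsigned *buffer, unsigned *C)
{
    const unsigned full = (s->buffer_size == LOCAL_SIZE);
    if (full) {
        C[s->index_in_C + local_id] = buffer[local_id];
    }
    s->index_in_C += full ? s->buffer_size : 0u;  /* uses the old buffer_size */
    s->buffer_size = full ? 0u : s->buffer_size;
}

/* Runs both variants for every emulated work-item; returns 1 if the
 * resulting state and output agree, 0 otherwise. */
static int variants_agree(unsigned start_size)
{
    const unsigned buffer[LOCAL_SIZE] = {10u, 11u, 12u, 13u};
    unsigned C_a[2 * LOCAL_SIZE] = {0};
    unsigned C_b[2 * LOCAL_SIZE] = {0};
    int same = 1;

    assert(start_size <= LOCAL_SIZE);
    for (unsigned local_id = 0; local_id < LOCAL_SIZE; ++local_id) {
        item_state a = {LOCAL_SIZE, start_size};
        item_state b = a;
        flush_inside(&a, local_id, buffer, C_a);
        flush_hoisted(&b, local_id, buffer, C_b);
        same = same && a.index_in_C == b.index_in_C
                    && a.buffer_size == b.buffer_size;
    }
    return same && memcmp(C_a, C_b, sizeof C_a) == 0;
}
```

Calling variants_agree(LOCAL_SIZE) exercises the full-buffer path and
variants_agree(2u) the path where the block is skipped. Both shapes are
meant to be semantically equivalent, so the hoist only changes what the
compiler sees inside the conditional.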