SpGEMM: Workaround for bug in NVIDIA OpenCL compiler.
The if (buffer_size == get_local_size(0)) { ... } block caused problems with NVIDIA drivers 34x.yz. The error could not be reproduced with simpler kernels. Moving the operations on index_in_C and buffer_size out of the block resolves the issue. This commit also introduces a thread-private variable 'local_id' that replaces the calls to get_local_id(0) in the same kernel, which might improve performance slightly.
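
For illustration only, the restructuring can be sketched in plain C by emulating the work-items of one work-group with a serial loop. This is not the actual kernel code: LOCAL_SIZE, flush_state, flush_if_full, and the buffer layout are hypothetical names and assumptions. Only buffer_size, index_in_C, and local_id come from the description above. The loop counter stands in for the cached get_local_id(0) value, and LOCAL_SIZE stands in for get_local_size(0).

```c
#include <stddef.h>

#define LOCAL_SIZE 4u  /* hypothetical work-group size, stands in for get_local_size(0) */

/* Hypothetical container for the two variables named in the commit message. */
typedef struct {
    unsigned int index_in_C;
    unsigned int buffer_size;
} flush_state;

/* Emulates one flush step of a work-group. Only the store to C stays inside
   the conditional block; the updates of index_in_C and buffer_size are moved
   out of it and expressed without a branch. */
static flush_state flush_if_full(unsigned int *C, const unsigned int *buffer,
                                 unsigned int index_in_C, unsigned int buffer_size)
{
    unsigned int full = (buffer_size == LOCAL_SIZE);

    /* local_id plays the role of the thread-private copy of get_local_id(0). */
    for (unsigned int local_id = 0; local_id < LOCAL_SIZE; ++local_id) {
        if (full) {
            C[index_in_C + local_id] = buffer[local_id];
        }
    }

    /* State updates outside the block: no change unless the buffer was full. */
    index_in_C += full ? buffer_size : 0u;
    buffer_size = full ? 0u : buffer_size;

    flush_state result = { index_in_C, buffer_size };
    return result;
}
```

Calling flush_if_full with a full buffer writes LOCAL_SIZE entries and resets buffer_size, while a partially filled buffer leaves both C and the state unchanged. In a real kernel each work-item would hold its own private copies of these variables.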