SpGEMM: Added OpenCL implementation of RMerge.
Fully replaces the old OpenCL implementation. Uses shared memory rather than warp shuffles. Merge kernels use a fixed workgroup size of 32 to avoid the cost of barriers on AMD devices. Likely to perform better on AMD devices than on NVIDIA devices, but performance tests still need to be run.