Commit d1685c1e authored by Karl Rupp

SpGEMM: Added OpenCL implementation of RMerge.

Replaces old OpenCL implementation.
Uses shared memory rather than warp shuffles.
Uses fixed workgroup size of 32 for merge kernels in order to
get rid of the cost of barriers on AMD devices.
Likely to perform better on AMD devices than on NVIDIA devices,
but performance tests still need to be run.
Fully replaces old OpenCL implementation.
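
The merge step described above can be illustrated with a minimal host-side sketch. This is not the kernel from this commit: it is a sequential C++ simulation, assuming that each of up to 32 lanes holds the front entry of one sorted sparse row and that the smallest column index is found by a tree reduction over a small array standing in for OpenCL local (shared) memory. The names `SparseRow`, `merge_rows`, and `shared_col` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One sparse row: column indices sorted ascending, with matching values.
struct SparseRow {
  std::vector<int> cols;
  std::vector<double> vals;
};

// Hypothetical helper: merges up to 32 scaled sparse rows into one row.
// Lane i owns rows[i]; entries with equal column index are summed.
SparseRow merge_rows(const std::vector<SparseRow>& rows,
                     const std::vector<double>& scale) {
  const int LANES = 32;  // mirrors the fixed workgroup size of 32
  const int DONE = std::numeric_limits<int>::max();  // marks exhausted lanes
  assert(rows.size() <= static_cast<std::size_t>(LANES));
  assert(scale.size() == rows.size());

  std::vector<std::size_t> pos(rows.size(), 0);  // read position per lane
  int shared_col[LANES];  // stands in for a local-memory buffer
  SparseRow out;

  for (;;) {
    // Each lane publishes the column index at the front of its row.
    for (int lane = 0; lane < LANES; ++lane) {
      const std::size_t l = static_cast<std::size_t>(lane);
      const bool active = l < rows.size() && pos[l] < rows[l].cols.size();
      shared_col[lane] = active ? rows[l].cols[pos[l]] : DONE;
    }
    // Tree reduction for the minimum column index. On a device, the lanes
    // of each step run in parallel and the steps must be synchronized.
    for (int stride = LANES / 2; stride > 0; stride /= 2)
      for (int lane = 0; lane < stride; ++lane)
        shared_col[lane] = std::min(shared_col[lane], shared_col[lane + stride]);

    const int min_col = shared_col[0];
    if (min_col == DONE) break;  // every row is exhausted

    // Lanes whose front entry matches contribute their value and advance.
    double sum = 0.0;
    for (std::size_t l = 0; l < rows.size(); ++l) {
      if (pos[l] < rows[l].cols.size() && rows[l].cols[pos[l]] == min_col) {
        sum += scale[l] * rows[l].vals[pos[l]];
        ++pos[l];
      }
    }
    out.cols.push_back(min_col);
    out.vals.push_back(sum);
  }
  return out;
}
```

In a sparse product C = A*B, `rows` would be the rows of B selected by the nonzeros of one row of A, and `scale` the corresponding values from A. A warp-shuffle variant would exchange the front column indices through registers instead of `shared_col`; with a local-memory buffer, the reduction steps need synchronization, which is presumably why the workgroup size matters for barrier cost.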
parent fa7ba2fe