compressed_matrix: Added first host-based prototype for SpGEMM.
Reuses ideas from several papers on SpGEMM for GPUs. Should also work fairly well on Xeon Phi; benchmarks are pending. Each row of the result matrix is computed independently, so rows can be processed concurrently and the approach is not restricted to sequential execution. A similar algorithm should be used for the OpenCL and CUDA backends.
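
For illustration, a minimal sketch of a row-wise SpGEMM on CSR data follows. It is not the code added in this commit: the `Csr` struct and `spgemm_rowwise` function are hypothetical names, and the dense per-row accumulator is an assumption chosen for brevity rather than performance. It only shows why each result row can be handled on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR container (not the library's compressed_matrix).
struct Csr {
  std::size_t rows;
  std::size_t cols;
  std::vector<std::size_t> row_ptr;  // size rows + 1
  std::vector<std::size_t> col_idx;  // size nnz
  std::vector<double> values;        // size nnz
};

// Computes C = A * B one result row at a time.
// Row i of C depends only on row i of A and the rows of B it references,
// so iterations of the first loop share no mutable state.
Csr spgemm_rowwise(const Csr& A, const Csr& B) {
  assert(A.cols == B.rows);

  std::vector<std::vector<std::size_t> > row_cols(A.rows);
  std::vector<std::vector<double> > row_vals(A.rows);

  // Independent per-row work: a candidate for a parallel loop.
  for (std::size_t i = 0; i < A.rows; ++i) {
    // Dense scratch space per row; simple, but O(B.cols) per row.
    std::vector<double> acc(B.cols, 0.0);
    std::vector<char> used(B.cols, 0);

    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
      const std::size_t j = A.col_idx[k];
      const double a = A.values[k];
      for (std::size_t l = B.row_ptr[j]; l < B.row_ptr[j + 1]; ++l) {
        const std::size_t c = B.col_idx[l];
        if (!used[c]) {
          used[c] = 1;
          row_cols[i].push_back(c);
        }
        acc[c] += a * B.values[l];
      }
    }

    // Keep column indices sorted within each row.
    std::sort(row_cols[i].begin(), row_cols[i].end());
    for (std::size_t n = 0; n < row_cols[i].size(); ++n) {
      row_vals[i].push_back(acc[row_cols[i][n]]);
    }
  }

  // Sequential assembly of the per-row results into one CSR matrix.
  Csr C;
  C.rows = A.rows;
  C.cols = B.cols;
  C.row_ptr.assign(A.rows + 1, 0);
  for (std::size_t i = 0; i < A.rows; ++i) {
    C.row_ptr[i + 1] = C.row_ptr[i] + row_cols[i].size();
    C.col_idx.insert(C.col_idx.end(), row_cols[i].begin(), row_cols[i].end());
    C.values.insert(C.values.end(), row_vals[i].begin(), row_vals[i].end());
  }
  return C;
}
```

Only the final assembly step needs the row sizes of all rows; the per-row loop before it is where concurrency would apply.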