SpGEMM: Added profiling information to CUDA implementation, using __ldg()
__ldg() loads data through the read-only data cache, which is supposed to improve cache utilization. Profiling output is enabled by defining VIENNACL_WITH_SPGEMM_CUDA_TIMINGS.
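
A rough sketch of the two ideas in this commit, not the actual ViennaCL code. It assumes a wrapper macro that falls back to a plain load wherever __ldg() is unavailable, and a timing block that is compiled in only when VIENNACL_WITH_SPGEMM_CUDA_TIMINGS is defined. The names SKETCH_LOAD, sketch_max_row_length and sketch_timed_stage are hypothetical and exist only for illustration. The workload (longest row of a CSR matrix) is a stand-in for a real SpGEMM stage.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical wrapper (not part of ViennaCL): in device code on compute
// capability 3.5 or newer, load via __ldg(); everywhere else, plain load.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 350)
  #define SKETCH_LOAD(ptr) __ldg(ptr)
#else
  #define SKETCH_LOAD(ptr) (*(ptr))
#endif

// Illustrative workload: longest row of a CSR matrix, computed from its
// row-pointer array. All reads of row_ptr go through SKETCH_LOAD.
std::size_t sketch_max_row_length(const unsigned int* row_ptr, std::size_t rows)
{
  std::size_t max_len = 0;
  for (std::size_t i = 0; i < rows; ++i)
  {
    std::size_t len = SKETCH_LOAD(row_ptr + i + 1) - SKETCH_LOAD(row_ptr + i);
    if (len > max_len)
      max_len = len;
  }
  return max_len;
}

// Runs the stage and, only if the timing macro is defined, measures and
// prints the elapsed wall-clock time. Returns 0.0 when timings are disabled.
double sketch_timed_stage(const std::vector<unsigned int>& row_ptr, std::size_t& result)
{
  double elapsed = 0.0;
  std::size_t rows = row_ptr.empty() ? 0 : row_ptr.size() - 1;
#ifdef VIENNACL_WITH_SPGEMM_CUDA_TIMINGS
  auto start = std::chrono::steady_clock::now();
#endif
  result = sketch_max_row_length(row_ptr.data(), rows);
#ifdef VIENNACL_WITH_SPGEMM_CUDA_TIMINGS
  // For real CUDA kernels, synchronize the device before reading the clock.
  elapsed = std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
  std::cout << "Stage time: " << elapsed << " s" << std::endl;
#endif
  return elapsed;
}
```

With this layout, building with -DVIENNACL_WITH_SPGEMM_CUDA_TIMINGS turns the timing output on, and a build without the define carries no timing overhead.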