Commit 33d66246 authored by Karl Rupp

Removed static temporaries for inner_prod() and norm_X() in the CUDA and OpenCL backends.

These optimizations caused race conditions in a multithreaded setting.
The drawback is now a higher 'launch' overhead in these routines.
Benchmarking is required to quantify this overhead and to decide on further steps (e.g. keeping the temporaries in the context).
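To illustrate the trade-off, here is a minimal host-side C++ sketch. It is not ViennaCL's actual code: `std::vector<double>` stands in for a device buffer of per-work-group partial sums, a loop stands in for the first reduction kernel, and `num_groups`, `context`, `inner_prod_per_call` and `inner_prod_in_context` are hypothetical names. The removed pattern kept that partial-sum buffer in a function-local `static`, so concurrent callers from different threads wrote into the same memory. The first function below allocates a fresh buffer on every call (the approach this commit switches to); the second keeps the buffer in a per-context object (the "temporaries in context" option), which is only race-free under the assumption that each thread uses its own context.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical number of work groups in the first reduction stage.
constexpr std::size_t num_groups = 128;

// Per-call temporary: every call owns its scratch buffer, so concurrent calls
// cannot interfere, but each call pays for an allocation ("launch" overhead).
double inner_prod_per_call(const std::vector<double>& x, const std::vector<double>& y) {
  assert(x.size() == y.size());  // inner product requires equal lengths
  std::vector<double> partial(num_groups, 0.0);
  for (std::size_t i = 0; i < x.size(); ++i)
    partial[i % num_groups] += x[i] * y[i];  // stage 1: each "group" sums a strided slice
  return std::accumulate(partial.begin(), partial.end(), 0.0);  // stage 2: combine partials
}

// Hypothetical context object that owns reusable scratch memory.
struct context {
  std::vector<double> scratch;
};

// Per-context temporary: the buffer is allocated once per context and reused.
// Thread-safe only if no two threads share the same context.
double inner_prod_in_context(context& ctx, const std::vector<double>& x, const std::vector<double>& y) {
  assert(x.size() == y.size());
  ctx.scratch.resize(num_groups);  // allocates on first use only; same size afterwards
  std::fill(ctx.scratch.begin(), ctx.scratch.end(), 0.0);
  for (std::size_t i = 0; i < x.size(); ++i)
    ctx.scratch[i % num_groups] += x[i] * y[i];
  return std::accumulate(ctx.scratch.begin(), ctx.scratch.end(), 0.0);
}
```

For `x = {1, 2, 3}` and `y = {4, 5, 6}` both functions return 32. The per-call version allocates on every invocation, while the per-context version allocates once and then reuses the same storage; benchmarking would show whether that difference matters in practice.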
parent db34bc39