* Reduced startup overhead of the generic vector kernels (av, avbv, avbv_v) by 10-20 percent by packing arguments
* Matrix-matrix operations for CUDA now functional. Performance is lower than with OpenCL, though...