Commit 575300ac authored Nov 05, 2012 by Karl Rupp

* Reduced the startup overhead of the generic vector kernels (av, avbv, avbv_v) by 10-20 percent by packing arguments

* Matrix-matrix operations for CUDA are now functional, though performance is lower than with OpenCL
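
The argument packing mentioned in the first item can be illustrated with a minimal sketch. It is not taken from this commit: the struct, the function names, and the bit layout below are hypothetical, and it runs on the host only. The idea is that several small per-operand options are folded into one 32-bit word, so the kernel receives one argument instead of three.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-operand options that would otherwise be passed
// to a kernel as three separate arguments.
struct ScalarOptions {
    std::uint32_t length;  // number of entries the scalar applies to
    bool reciprocal;       // use 1/alpha instead of alpha
    bool flip_sign;        // use -alpha instead of alpha
};

// Assumed layout: bit 0 = flip_sign, bit 1 = reciprocal, bits 2..31 = length.
inline std::uint32_t pack_options(const ScalarOptions& o) {
    assert(o.length < (1u << 30));  // length must fit into the upper 30 bits
    return (o.length << 2) | (o.reciprocal ? 2u : 0u) | (o.flip_sign ? 1u : 0u);
}

// Inverse of pack_options; a kernel would do this decoding on the device.
inline ScalarOptions unpack_options(std::uint32_t packed) {
    return ScalarOptions{packed >> 2, (packed & 2u) != 0, (packed & 1u) != 0};
}

// Host-side stand-in for what a kernel might do with the packed word:
// first flip the sign if requested, then take the reciprocal if requested.
inline float apply_alpha(float alpha, std::uint32_t packed) {
    const ScalarOptions o = unpack_options(packed);
    const float a = o.flip_sign ? -alpha : alpha;
    return o.reciprocal ? 1.0f / a : a;
}
```

With this layout, an operand of length 5 with only the reciprocal flag set packs to the single value 22. Fewer kernel arguments means fewer values to set up per launch, which is one plausible source of the reduced startup cost; the actual mechanism used in the commit may differ.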

parent a844270f
