Commit 9f683d58 authored Mar 28, 2013 by Karl Rupp

Reverted to old CPU work size deduction, which is better for simple vector kernels.

The 128x128 thread configuration for the CSR matrix-vector product is now applied directly to that kernel. This gives the best of both worlds.

parent f9d9ddb8
