Reverted to the old CPU work size deduction, which works better for simple vector kernels.
The 128x128 thread configuration for the CSR matrix-vector product is now applied directly in that operation, which gives the best of both worlds.
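
For illustration only, a minimal CPU-side sketch of what a fixed 128x128 configuration could mean for a CSR matrix-vector product. It assumes "128x128" means 128 work groups of 128 threads each, with every logical thread striding over the rows. The names `kNumGroups`, `kGroupSize` and `csr_matvec` are hypothetical and not taken from this code base, and the loop only emulates the row assignment rather than launching a device kernel.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical constants: an assumed reading of "128x128" as
// 128 work groups with 128 threads per group.
constexpr std::size_t kNumGroups = 128;
constexpr std::size_t kGroupSize = 128;

// CSR matrix-vector product y = A * x.
// The outer loop emulates the logical threads of a fixed launch; each
// thread handles rows tid, tid + total_threads, tid + 2 * total_threads, ...
std::vector<double> csr_matvec(const std::vector<std::size_t>& row_ptr,
                               const std::vector<std::size_t>& col_idx,
                               const std::vector<double>& values,
                               const std::vector<double>& x) {
    const std::size_t num_rows = row_ptr.empty() ? 0 : row_ptr.size() - 1;
    const std::size_t total_threads = kNumGroups * kGroupSize;
    std::vector<double> y(num_rows, 0.0);

    // Threads with tid >= num_rows would have no rows, so they are skipped.
    for (std::size_t tid = 0; tid < total_threads && tid < num_rows; ++tid) {
        for (std::size_t row = tid; row < num_rows; row += total_threads) {
            double sum = 0.0;
            // Accumulate the nonzeros of this row.
            for (std::size_t k = row_ptr[row]; k < row_ptr[row + 1]; ++k) {
                sum += values[k] * x[col_idx[k]];
            }
            y[row] = sum;
        }
    }
    return y;
}
```

Because the configuration is fixed in the operation itself, the generic work size deduction used for simple vector kernels does not have to account for it.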