* Eliminated old vector kernels.
* Extended support for interplay of vector-ranges and vector-slices with vector.
* Tweaked kernel infrastructure. Kernel launch overhead on an NVIDIA GTX 285 is now about 15us, which is fairly close to the optimum of 13us (minimal kernel for v1 = alpha * v2, alpha being a GPU scalar).
* Added support for up to 20 arguments in custom_operation class.