Commit 7e57ded8 authored Aug 05, 2015 by Karl Rupp

CUDA: Runtime selection of best SpMV kernel for Maxwell devices.

The previous attempt dispatched on __CUDA_ARCH__, which turned out
to be insufficient: __CUDA_ARCH__ is only defined during the device
(kernel) compilation stage, not during the host compilation stage,
so a host-side dispatch based on it does not work.
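
To illustrate the problem, here is a minimal sketch of a host-side dispatch that relies on __CUDA_ARCH__. The function name and kernel labels are hypothetical and not taken from the commit; the threshold 500 assumes that "Maxwell" means compute capability 5.x. Because the macro is undefined when host code is compiled, the preprocessor always takes the fallback branch.

```cpp
#include <cassert>
#include <string>

// Hypothetical host-side selector (illustrative name, not from the commit).
// __CUDA_ARCH__ is only defined while device code is being compiled, so in
// host code the first branch is discarded by the preprocessor.
std::string select_spmv_kernel_at_compile_time()
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 500)
  return "maxwell";   // not reachable from host code
#else
  return "generic";   // what the host compilation pass always sees
#endif
}

// Small usage check: on the host, the generic path is selected regardless
// of which GPU is installed.
void check_host_dispatch()
{
  assert(select_spmv_kernel_at_compile_time() == "generic");
}
```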

The new code queries the CUDA architecture on the first SpMV run
and reuses the result afterwards. This may lead to non-optimal
selections if a user switches the CUDA device after the first SpMV
has been run, but this is likely to be rare. Repeating the query in
each SpMV, however, is too costly, as the device query has about the
same overhead as a kernel launch.
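
The query-once approach described above could look roughly like the following sketch. It is not the code from this commit: all names are hypothetical, and the device query is replaced by a stub (in real CUDA code this would be a runtime call such as cudaGetDeviceProperties()) so that the example runs without a GPU. Selecting the Maxwell kernel for major version 5 only is likewise an assumption.

```cpp
#include <cassert>

// Stub standing in for the expensive CUDA device query (assumption).
static int g_query_count = 0;   // counts how often the "device" is queried
static int g_fake_major  = 5;   // pretend compute capability major version

int query_device_major()
{
  ++g_query_count;
  return g_fake_major;
}

enum class SpmvKernel { Generic, Maxwell };

// Hypothetical selector: the query runs on the first call only, and the
// cached result is reused by every later call.
SpmvKernel select_spmv_kernel()
{
  static const int cached_major = query_device_major();  // initialized once
  assert(cached_major > 0);
  return (cached_major == 5) ? SpmvKernel::Maxwell : SpmvKernel::Generic;
}
```

Calling select_spmv_kernel() repeatedly triggers a single query. The flip side is the trade-off named in the message: if the active device changes after the first call, the cached value is stale and the selection may no longer be optimal.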

parent 1a30cfa5
