Commit 7e57ded8 authored by Karl Rupp

CUDA: Runtime selection of best SpMV kernel for Maxwell devices.

The previous attempt dispatched on __CUDA_ARCH__, which turned out
to be insufficient: __CUDA_ARCH__ is defined only in the kernel
(device) compilation stage, not in the host compilation stage -> BOOM.
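To make the failure mode concrete, here is a minimal sketch of a host-side
check written the way the previous attempt is described. The function name
host_sees_maxwell is hypothetical and is not taken from the actual code;
the threshold 500 assumes Maxwell corresponds to compute capability 5.x.

```cpp
// Hypothetical host-side check mirroring the failed approach.
// __CUDA_ARCH__ is defined only while device code is being compiled,
// so inside a host function the first branch is never taken.
bool host_sees_maxwell()
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 500)
  return true;   // reachable only in a device compilation pass
#else
  return false;  // what the host compilation pass always gets
#endif
}
```

Called from host code, host_sees_maxwell() returns false regardless of
the installed GPU, so a host-side dispatcher built on it can never select
the Maxwell kernel.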

The new code queries the CUDA architecture once, on the first SpMV run.
This may lead to a suboptimal kernel selection if a user switches the
CUDA device after the first SpMV has been run, but this is likely
to be rare. Repeating the query in each SpMV, however, is too costly,
as the device query has about the same overhead as a kernel launch.
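
The query-once pattern described above can be sketched as follows. This is
an illustration under stated assumptions, not the code from this commit:
query_compute_capability_major, SpmvKernel, select_spmv_kernel and
g_query_count are hypothetical names, and the device query is replaced by
a stub so the sketch runs without a GPU. In real code the stub would be
replaced by a call such as cudaGetDeviceProperties(), reading the major
field of cudaDeviceProp.

```cpp
// Counts how often the (stubbed) device query runs; for illustration only.
static int g_query_count = 0;

// Stand-in for the real device query. Assumption: it returns the major
// compute capability; here it pretends to see a Maxwell device (5.x).
int query_compute_capability_major()
{
  ++g_query_count;
  return 5;
}

enum class SpmvKernel { Generic, Maxwell };

SpmvKernel select_spmv_kernel()
{
  // Initialized on the first call only; later calls reuse the cached value.
  // A device switch after the first call is therefore not noticed.
  static const int cached_major = query_compute_capability_major();
  return (cached_major == 5) ? SpmvKernel::Maxwell : SpmvKernel::Generic;
}
```

Calling select_spmv_kernel() repeatedly triggers the query only once,
which is the trade-off stated above: no per-call query overhead, at the
price of a stale result if the active device changes later.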
parent 1a30cfa5