CUDA: Runtime selection of best SpMV kernel for Maxwell devices.
The previous attempt dispatched on __CUDA_ARCH__, which turned out to be insufficient: __CUDA_ARCH__ is only defined during device (kernel) compilation, not during host compilation, so the host-side dispatch failed. The new code queries the CUDA architecture on the first run and reuses the result. This may lead to non-optimal selections if a user switches the CUDA device after the first SpMV has been run, but this is likely to be rare. Repeating the query in each SpMV, however, is too costly, as the device query has about the same overhead as a kernel launch.
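A minimal host-side sketch of the query-once pattern described above, not the code from this commit. The names query_compute_capability_major, select_spmv_kernel and SpmvKernel are hypothetical, and the query is a stub standing in for a real device query (for example via cudaGetDeviceProperties) so that the sketch compiles without the CUDA toolkit. It assumes Maxwell is identified by compute capability major version 5.

```cpp
// Hypothetical stand-in for the real device query. It counts its calls so
// that the effect of caching is observable.
static int g_query_count = 0;

static int query_compute_capability_major() {
    ++g_query_count;
    return 5;  // assumption: pretend the active device is Maxwell (5.x)
}

// Hypothetical kernel variants the host code can choose between.
enum class SpmvKernel { Generic, Maxwell };

// Runs the device query on the first call only and reuses the cached value
// afterwards. The cache is not keyed by device, so switching the CUDA device
// after the first call keeps the old selection.
static SpmvKernel select_spmv_kernel() {
    static const int cached_major = query_compute_capability_major();
    return cached_major == 5 ? SpmvKernel::Maxwell : SpmvKernel::Generic;
}
```

Calling select_spmv_kernel() before every SpMV launch then costs only a comparison after the first call. The function-local static is initialized once (and thread-safely since C++11), which is one way to get the "first run" behaviour; keying the cache by device id would avoid the stale selection after a device switch, at the price of a cudaGetDevice call per SpMV.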