Merge branch 'karlrupp/sparse-matrix-matrix-product'
karlrupp/sparse-matrix-matrix-product: Fast implementations of sparse matrix-matrix products. About 1.5x faster than MKL on Haswell if AVX2 is enabled. About 1.5x faster than CUSP and CUBLAS on NVIDIA GPUs. About the same performance on MIC. Faster on a FirePro W9100 with OpenCL than on a Tesla K20m with CUDA. A few more tweaks are possible, but they will be applied in a separate feature branch.