SpMdM: Added CUDA implementation for hyb_matrix, thus resolving #22
C = prod(A, B) and C = prod(A, trans(B)) are now fully supported, where A is sparse and B, C are dense (each either row- or column-major). Some kernels can still be tuned for better-coalesced memory accesses, or to avoid memory transfers altogether.
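For reference, a minimal host-side sketch of what the two products compute for a matrix in HYB (ELL + CSR) storage. This is not the CUDA kernel from this commit: the HybMatrix and Dense structs, their field names, and the column-major ELL layout are assumptions made for illustration only, and the dense operands are taken as row-major.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical HYB container: a fixed-width ELL part plus a CSR part that
// holds the entries of rows with more than ell_width nonzeros.
struct HybMatrix {
  std::size_t rows, cols, ell_width;
  std::vector<std::size_t> ell_col;      // entry (r, k) stored at k * rows + r
  std::vector<double>      ell_val;      // padding entries carry the value 0
  std::vector<std::size_t> csr_row_ptr;  // rows + 1 offsets into csr_col/csr_val
  std::vector<std::size_t> csr_col;
  std::vector<double>      csr_val;
};

// Hypothetical dense matrix, row-major.
struct Dense {
  std::size_t rows, cols;
  std::vector<double> data;
  double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// C = A * B, or C = A * trans(B) when transpose_B is true.
Dense prod(const HybMatrix& A, const Dense& B, bool transpose_B = false) {
  const std::size_t inner = transpose_B ? B.cols : B.rows;
  const std::size_t n     = transpose_B ? B.rows : B.cols;
  assert(A.cols == inner);

  // Read op(B)(k, j) without forming the transpose explicitly.
  auto b = [&](std::size_t k, std::size_t j) {
    return transpose_B ? B.at(j, k) : B.at(k, j);
  };

  Dense C{A.rows, n, std::vector<double>(A.rows * n, 0.0)};
  for (std::size_t r = 0; r < A.rows; ++r) {
    for (std::size_t j = 0; j < n; ++j) {
      double sum = 0.0;
      // ELL part: at most ell_width entries per row, padding is skipped.
      for (std::size_t k = 0; k < A.ell_width; ++k) {
        const std::size_t idx = k * A.rows + r;
        if (A.ell_val[idx] != 0.0) sum += A.ell_val[idx] * b(A.ell_col[idx], j);
      }
      // CSR part: overflow entries of row r.
      for (std::size_t p = A.csr_row_ptr[r]; p < A.csr_row_ptr[r + 1]; ++p)
        sum += A.csr_val[p] * b(A.csr_col[p], j);
      C.data[r * n + j] = sum;
    }
  }
  return C;
}
```

In a GPU version, one plausible mapping is one thread per (row, column) pair of C; with the column-major ELL layout assumed here, neighbouring rows read neighbouring ELL entries, which is the kind of coalescing the note above refers to.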