SpGEMM: Reduced memory footprint of CUDA implementation.
The previous bound on the maximum number of elements in the scratchpad was too pessimistic. The estimate now depends only on the maximum number of nonzeros in any row of B processed by the block.
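A minimal sketch of how such a per-block estimate could be computed on the host. It assumes that both A and B are stored in CSR format, that each block handles a contiguous range of rows of A, and that the rows of B a block processes are the ones selected by the column indices of those A rows. The helper name `max_b_row_nnz_for_block` and its parameters are hypothetical. They are not taken from the CUDA implementation.

```python
def max_b_row_nnz_for_block(a_row_ptrs, a_col_idxs, b_row_ptrs, row_begin, row_end):
    """Hypothetical helper: largest nonzero count among the rows of B that are
    referenced by the rows [row_begin, row_end) of A (all matrices in CSR)."""
    max_nnz = 0
    for a_row in range(row_begin, row_end):
        # Each nonzero A(a_row, k) means row k of B is processed by this block.
        for nz in range(a_row_ptrs[a_row], a_row_ptrs[a_row + 1]):
            b_row = a_col_idxs[nz]
            b_row_nnz = b_row_ptrs[b_row + 1] - b_row_ptrs[b_row]
            max_nnz = max(max_nnz, b_row_nnz)
    return max_nnz


# A: row 0 -> cols {0, 2}, row 1 -> col {1}, row 2 -> cols {0, 1, 2}
a_row_ptrs = [0, 2, 3, 6]
a_col_idxs = [0, 2, 1, 0, 1, 2]
# B: row 0 has 1 nonzero, row 1 has 3, row 2 has 2
b_row_ptrs = [0, 1, 4, 6]

print(max_b_row_nnz_for_block(a_row_ptrs, a_col_idxs, b_row_ptrs, 0, 1))  # 2
print(max_b_row_nnz_for_block(a_row_ptrs, a_col_idxs, b_row_ptrs, 0, 2))  # 3
```

Under these assumptions, a block covering only row 0 of A touches B rows 0 and 2, so it needs room for 2 elements instead of a bound derived from the whole product.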