Commit 4381e000 authored by Karl Rupp

GMRES: Improved kernel for first stage of pipelined orthogonalization.

Using thread-local variables is substantially slower than using
shared memory directly in this case: a 2x difference on a Tesla C2050
for this particular kernel. Overall performance gains depend on the
sparsity pattern of the matrix (as always).
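
For illustration only, here is a minimal CUDA sketch of the pattern described above. It is not the kernel from this commit. It assumes the first orthogonalization stage computes per-block partial inner products of several Krylov vectors with the current vector, and it accumulates them directly in shared memory instead of in a per-thread array. The names `partial_dots_shared`, `check_partial_dots`, `BLOCK_SIZE` and `CHUNK`, the storage layout and the launch configuration are all hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr unsigned int BLOCK_SIZE = 128; // threads per block (assumed, power of two)
constexpr unsigned int CHUNK = 4;        // vectors handled per pass (assumed)

// Hypothetical kernel: partial[j * gridDim.x + b] holds block b's share of <v_j, w>.
// V stores CHUNK vectors of length n back to back.
// Must be launched with blockDim.x == BLOCK_SIZE.
__global__ void partial_dots_shared(const float *V, const float *w,
                                    float *partial, unsigned int n)
{
  // Accumulators live in shared memory from the start, not in a thread-local array.
  __shared__ float acc[CHUNK * BLOCK_SIZE];
  for (unsigned int j = 0; j < CHUNK; ++j)
    acc[j * BLOCK_SIZE + threadIdx.x] = 0.0f;

  // Grid-stride loop: each thread only touches its own shared-memory slots here.
  for (unsigned int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += gridDim.x * blockDim.x)
  {
    float wi = w[i];
    for (unsigned int j = 0; j < CHUNK; ++j)
      acc[j * BLOCK_SIZE + threadIdx.x] += V[j * n + i] * wi;
  }

  // Tree reduction within the block, all CHUNK sums at once.
  for (unsigned int s = BLOCK_SIZE / 2; s > 0; s >>= 1)
  {
    __syncthreads();
    if (threadIdx.x < s)
      for (unsigned int j = 0; j < CHUNK; ++j)
        acc[j * BLOCK_SIZE + threadIdx.x] += acc[j * BLOCK_SIZE + threadIdx.x + s];
  }

  if (threadIdx.x == 0)
    for (unsigned int j = 0; j < CHUNK; ++j)
      partial[j * gridDim.x + blockIdx.x] = acc[j * BLOCK_SIZE];
}

// Host-side check: v_j is filled with (j+1), w with 1, so <v_j, w> = (j+1) * n.
bool check_partial_dots()
{
  const unsigned int n = 1000, blocks = 8;
  std::vector<float> hV(CHUNK * n), hw(n, 1.0f), hp(CHUNK * blocks);
  for (unsigned int j = 0; j < CHUNK; ++j)
    for (unsigned int i = 0; i < n; ++i)
      hV[j * n + i] = float(j + 1);

  float *dV, *dw, *dp;
  cudaMalloc(&dV, hV.size() * sizeof(float));
  cudaMalloc(&dw, hw.size() * sizeof(float));
  cudaMalloc(&dp, hp.size() * sizeof(float));
  cudaMemcpy(dV, hV.data(), hV.size() * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(dw, hw.data(), hw.size() * sizeof(float), cudaMemcpyHostToDevice);

  partial_dots_shared<<<blocks, BLOCK_SIZE>>>(dV, dw, dp, n);
  bool ok = (cudaGetLastError() == cudaSuccess);
  // cudaMemcpy on the default stream waits for the kernel to finish.
  cudaMemcpy(hp.data(), dp, hp.size() * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(dV); cudaFree(dw); cudaFree(dp);

  // Second stage (here on the host): sum the per-block partial results.
  for (unsigned int j = 0; j < CHUNK; ++j)
  {
    float sum = 0.0f;
    for (unsigned int b = 0; b < blocks; ++b)
      sum += hp[j * blocks + b];
    ok = ok && (sum == float(j + 1) * float(n));
  }
  return ok;
}

int main()
{
  std::printf("%s\n", check_partial_dots() ? "ok" : "mismatch");
  return 0;
}
```

In the thread-local variant, each thread would keep `float local[CHUNK]` and copy it to shared memory only before the reduction. One plausible reason for the slowdown, not confirmed by this commit, is that such an array can be placed in off-chip local memory when the compiler cannot keep it in registers.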
parent 9d8bae24