Commit 297def3b authored by Karl Rupp

Modified inner_prod() such that summation after multi-group reduction is...

Modified inner_prod() such that summation after multi-group reduction is performed on CPU, unless LHS is a GPU scalar.
This yields a performance gain of a few percent for CG and BiCGStab and eliminates messy temporaries.
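
The two-stage pattern described above can be sketched on the host alone. This is only an illustration, not the code from this commit: `partial_inner_prod` and `inner_prod_host_sum` are hypothetical names, and the first stage is a plain loop standing in for the multi-group GPU kernel. Each "work group" writes one partial result, and the short final summation over those partials is done on the CPU instead of in a second kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Stage 1 (hypothetical stand-in for the GPU kernel): each "work group"
// reduces its own contiguous chunk of x and y to a single partial result.
std::vector<double> partial_inner_prod(const std::vector<double>& x,
                                       const std::vector<double>& y,
                                       std::size_t num_groups)
{
    assert(x.size() == y.size());
    assert(num_groups > 0);

    std::vector<double> partial(num_groups, 0.0);
    // Chunk size rounded up so that all entries are covered.
    const std::size_t chunk = (x.size() + num_groups - 1) / num_groups;

    for (std::size_t g = 0; g < num_groups; ++g) {
        const std::size_t begin = std::min(g * chunk, x.size());
        const std::size_t end   = std::min(begin + chunk, x.size());
        for (std::size_t i = begin; i < end; ++i)
            partial[g] += x[i] * y[i];
    }
    return partial;
}

// Stage 2: sum the per-group partial results on the CPU. In a real GPU
// setting the partials would first be copied back to host memory.
double inner_prod_host_sum(const std::vector<double>& x,
                           const std::vector<double>& y,
                           std::size_t num_groups)
{
    const std::vector<double> partial = partial_inner_prod(x, y, num_groups);
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

The number of partial results equals the number of work groups, which is typically small, so the host-side sum is cheap. When the result must stay in a GPU scalar, the final summation would instead remain on the device, as the commit message notes.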
parent 9950b23b