ILU: Added CUDA implementation of parallel Chow-Patel version.
Proper testing is still pending. The two transpositions are currently carried out on the host, which might be a bottleneck (probably not, because there are still lots of solver iterations to hide the overhead).