- Dec 11, 2014

Karl Rupp authored

Karl Rupp authored
Visual Studio 2012 is an excellent test case.

Karl Rupp authored

Karl Rupp authored
The old workgroup size of 128 did not work on low-end hardware.

Karl Rupp authored
Writing the last value wasn't handled correctly. Now it is (cf. cuda-memcheck).

Karl Rupp authored
The previous value of 256 might be too large for some weaker hardware. Since coordinate_matrix provides only poor performance anyway, we are better off using a more portable value here without losing anything.

Karl Rupp authored
Empty columns at the end of restriction or prolongation operators might have resulted in an incorrect detection of the matrix dimensions. With the sparse matrix adapter now using an explicit size specification, this is no longer an issue.
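The underlying pitfall can be illustrated without any ViennaCL types: if the number of columns is deduced from the largest column index that actually occurs, trailing empty columns are silently dropped, whereas an explicitly supplied size is not affected. A minimal C++ sketch, using plain STL containers rather than the actual adapter class:

```cpp
#include <cstddef>
#include <map>
#include <vector>

// One row of a sparse operator: column index -> value (plain STL, illustrative only).
typedef std::vector<std::map<std::size_t, double> > sparse_rows;

// Deduces the column count from the stored entries alone; this is the fragile approach.
std::size_t deduced_cols(sparse_rows const & A)
{
  std::size_t cols = 0;
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::map<std::size_t, double>::const_iterator it = A[i].begin(); it != A[i].end(); ++it)
      if (it->first + 1 > cols)
        cols = it->first + 1;
  return cols;   // too small if the last column(s) hold no entries
}

int main()
{
  sparse_rows P(2);             // conceptually a 2-by-4 prolongation operator
  P[0][0] = 1.0;
  P[1][2] = 1.0;                // column 3 stays empty
  std::size_t explicit_cols = 4;
  return (deduced_cols(P) == explicit_cols) ? 0 : 1;   // returns 1: deduction yields only 3 columns
}
```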
- Dec 10, 2014

Karl Rupp authored

Karl Rupp authored
Resulted in some complaints from -fsanitize=undefined.

Karl Rupp authored
Thanks to AddressSanitizer in GCC 4.9.

Karl Rupp authored
-Wall -Wextra -pedantic -Wconversion: a tricky detail is that the product of two chars or shorts gets promoted to 'int', so some extra logic was required.
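The promotion detail is easy to reproduce in isolation. A minimal, generic C++ example (not the actual library code): multiplying two shorts yields an int, so assigning the product back to a short needs an explicit cast to satisfy -Wconversion.

```cpp
#include <iostream>

// Integer promotion: operands of type short (or char) are promoted to int
// before the multiplication, so the product has type int.
short product(short a, short b)
{
  // return a * b;                   // -Wconversion: conversion from 'int' to 'short' may change value
  return static_cast<short>(a * b);  // explicit cast documents the narrowing and silences the warning
}

int main()
{
  std::cout << product(12, 34) << std::endl;  // prints 408
  return 0;
}
```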
- Dec 09, 2014

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored
Resolves #112 (together with the previous commit).

Karl Rupp authored
This is in response to issue #112, which reports that the use of OpenMP variables derived from templates is unspecified/undefined. Problems have been observed with the Fujitsu compiler.
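A rough sketch of the kind of construct at issue (illustrative only, not the ViennaCL source; the helper below is hypothetical): a parallel loop whose counter type is derived from the template context is replaced by a loop over a plain built-in integer type, which every OpenMP implementation handles.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: scales a vector in parallel. The commented-out variant
// uses a loop counter whose type depends on the template context, which some
// OpenMP implementations (e.g. the Fujitsu compiler, per issue #112) mishandle.
template<typename NumericT>
void scale(std::vector<NumericT> & x, NumericT alpha)
{
  // for (typename std::vector<NumericT>::size_type i = 0; i < x.size(); ++i)  // problematic counter type
  #pragma omp parallel for
  for (long i = 0; i < static_cast<long>(x.size()); ++i)  // portable built-in counter type
    x[static_cast<std::size_t>(i)] *= alpha;
}
```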
Karl Rupp authored
Resulted in complaints when called from GMRES, so it is now passed in as a pointer. This should not affect performance, but verification is desired.

Karl Rupp authored
Without these, some user code may accidentally use the solvers without pipelining, which is not what we want...

- Dec 06, 2014

- Dec 05, 2014

Karl Rupp authored
Used to be a reference, for which initialization with NULL doesn't work. Now using copies, which is fine due to smart-pointer semantics.
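A small self-contained sketch of the difference (the class and member names are made up, and std::shared_ptr stands in for the library's smart pointer): a reference member must be bound to an existing object at construction and cannot be 'NULL', while a copied smart-pointer handle can start out empty and be assigned later, with copies only bumping a reference count.

```cpp
#include <memory>

// Illustrative type only, not the ViennaCL class.
struct solver_setup
{
  // std::shared_ptr<int> & handle;  // reference member: must be bound at construction, cannot be null
  std::shared_ptr<int> handle;       // copied handle: default-constructs to empty, cheap to copy
};

int main()
{
  solver_setup s;                         // fine: s.handle is simply empty for now
  s.handle = std::make_shared<int>(42);   // assigned later, when actually needed
  return s.handle ? 0 : 1;
}
```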
- Dec 04, 2014

Karl Rupp authored
Use of thread-local variables is substantially slower than using shared memory directly in this case: a 2x difference on a Tesla C2050 for this particular kernel. Overall performance gains depend on the sparsity pattern of the matrix (as always).
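The trade-off can be sketched with a toy CUDA kernel (illustrative only, not the actual ViennaCL kernel; the kernel name and data layout are made up, and a launch with 128 threads per block is assumed): each block sums the entries of a row, either by accumulating in a thread-local register and writing to shared memory once, or by accumulating in shared memory directly. Per the observation above, the direct shared-memory variant was the faster one here.

```cuda
// Toy CSR-style row-sum kernel; assumes blockDim.x == 128 (a power of two).
__global__ void row_sums(const double * values,
                         const unsigned int * row_starts,
                         double * result,
                         unsigned int num_rows)
{
  __shared__ double buffer[128];

  for (unsigned int row = blockIdx.x; row < num_rows; row += gridDim.x)
  {
    // Variant A (thread-local accumulator, one shared-memory write at the end):
    //   double tmp = 0;
    //   for (...) tmp += values[i];
    //   buffer[threadIdx.x] = tmp;
    //
    // Variant B (accumulate in shared memory directly):
    buffer[threadIdx.x] = 0;
    for (unsigned int i = row_starts[row] + threadIdx.x; i < row_starts[row + 1]; i += blockDim.x)
      buffer[threadIdx.x] += values[i];
    __syncthreads();

    // Standard in-block tree reduction of the partial sums:
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride /= 2)
    {
      if (threadIdx.x < stride)
        buffer[threadIdx.x] += buffer[threadIdx.x + stride];
      __syncthreads();
    }

    if (threadIdx.x == 0)
      result[row] = buffer[0];
  }
}
```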
Karl Rupp authored
This has been a problem for anyone wanting to use compressed_matrix outside the default memory domain.

Karl Rupp authored
Results in mild (about 10 percent) performance gains.

Karl Rupp authored
Provides about 10 percent better performance on average for a mix of typical matrices from the Florida Sparse Matrix Collection.
- Nov 20, 2014

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored
Based on experiments on a GTX 470. Kepler and Maxwell GPUs might behave differently.

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored
Warnings were due to conversion of floats to bools.
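A tiny, generic example of this warning category (not taken from the library): returning a float where a bool is expected converts the value implicitly, which some compilers flag; an explicit comparison states the intent and avoids the warning.

```cpp
// Implicit float-to-bool conversion: flagged by some compilers at strict warning levels.
bool is_nonzero_implicit(float x) { return x; }

// Explicit comparison: same behavior, no conversion warning.
bool is_nonzero_explicit(float x) { return x != 0.0f; }
```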
- Nov 19, 2014

Karl Rupp authored