- Nov 20, 2014
- Nov 19, 2014
-
-
Karl Rupp authored
-
Karl Rupp authored
See paper "Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Storage Format" by Greathouse and Daga, presented at SC 2014.
-
Karl Rupp authored
Problems were only found in the matrix-times-matrix case. Resolves #2.
-
Karl Rupp authored
Now a user can directly provide std::vector< std::map<IndexType, NumericType> > to populate the sparse matrix. This was already possible for the other sparse matrix types.
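As a rough illustration of this row-wise input format, the sketch below flattens a std::vector< std::map<IndexType, NumericType> > into CSR-style arrays. The commit does not say which sparse format is meant, so CSR is used here only as an example; `CsrArrays` and `to_csr` are hypothetical names and not part of ViennaCL. The sketch assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical container for illustration only (not a ViennaCL type).
struct CsrArrays
{
  std::vector<std::size_t> row_start;  // offset of the first entry of each row, plus one end marker
  std::vector<std::size_t> col_index;  // column index of each stored entry
  std::vector<double>      values;     // value of each stored entry
};

// Hypothetical helper: each map holds the (column -> value) pairs of one row.
// std::map iterates in ascending key order, so columns come out sorted per row.
template <typename IndexType, typename NumericType>
CsrArrays to_csr(const std::vector< std::map<IndexType, NumericType> >& rows)
{
  CsrArrays csr;
  csr.row_start.reserve(rows.size() + 1);
  csr.row_start.push_back(0);

  for (const auto& row : rows)
  {
    for (const auto& entry : row)
    {
      csr.col_index.push_back(static_cast<std::size_t>(entry.first));
      csr.values.push_back(static_cast<double>(entry.second));
    }
    // An empty row simply repeats the previous offset.
    csr.row_start.push_back(csr.col_index.size());
  }
  return csr;
}
```

The map-per-row layout is convenient for assembly because entries can be inserted in any order, while the flattened arrays are what a device kernel would typically consume.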
-
- Nov 17, 2014
-
-
Karl Rupp authored
Resolves #6. A little bit of input-dependent tuning is certainly possible, yet the overall control flow is now fixed. Performance is largely dependent on the performance of matrix-vector and matrix-matrix products, respectively.
-
- Nov 16, 2014
-
-
Karl Rupp authored
Disable coveralls. Takes too long to run and hence fails.
-
Karl Rupp authored
Old generator tests and OpenCL random number generation. Both superseded.
-
Dominic Meiser authored
lcov takes much too long, making the Travis builds time out and fail.
-
Karl Rupp authored
Resolves #95. A completely typesafe interface to .set() is not possible, because OpenCL is not fully typesafe.
-
Karl Rupp authored
These routines need substantial refactoring anyway (very experimental) and are not even documented properly. Include them back into the test suite only after the refactoring is completed.
-
Karl Rupp authored
Transposed forward solves accessed invalid shared memory, potentially causing runtime failures on some GPUs. This is now fixed, thanks to oclgrind for locating the problem quickly. Addresses the failures Philippe mentioned in #105.
-
Karl Rupp authored
Definitely needs better testing; this was only caught in the triangular solvers.
-
- Nov 15, 2014
- Nov 14, 2014
-
-
Karl Rupp authored
Parts of the test consist only of taking timings. No need to spend a lot of time on these for nightly tests; better to integrate them into a separate benchmark suite.
-
Karl Rupp authored
-
Karl Rupp authored
Running with local work size 1 apparently leads to failures on both AMD and Intel SDKs (maybe due to barriers being ignored?). With a local work size of 2 everything is back to normal. Strange...
-
Karl Rupp authored
The slow uBLAS doesn't need to bring down all the good debugging features globally.
-
Karl Rupp authored
Reported by: Andreas Rost (IHU GmbH)
-
Karl Rupp authored
This test did not return an error code and was hence incorrectly flagged as successful. What a shame! :-/
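A minimal sketch of the pattern this fix relies on: the check yields a process exit code, so a failing comparison cannot be reported as success. `check_close` is a hypothetical helper for illustration and is not taken from the ViennaCL test suite.

```cpp
#include <cmath>
#include <cstdlib>

// Hypothetical helper: compares a computed value against a reference and
// returns an exit code suitable for forwarding from main().
int check_close(double computed, double reference, double tolerance)
{
  // Fall back to an absolute error if the reference is exactly zero.
  double scale = (std::fabs(reference) > 0) ? std::fabs(reference) : 1.0;
  double rel_error = std::fabs(computed - reference) / scale;

  // Written as a negated "<=" so that a NaN error also counts as a failure
  // (every comparison involving NaN evaluates to false).
  if (!(rel_error <= tolerance))
    return EXIT_FAILURE;

  return EXIT_SUCCESS;
}
```

A test's `main()` would then end with something like `return check_close(result, expected, 1e-10);`, so the test runner sees a nonzero exit status whenever the check fails.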
-
- Nov 13, 2014
-
-
Karl Rupp authored
Changes:
  - correctly handle an all-zero right hand side
  - iteration count only reflects the true Krylov dimension used. If orthogonality is lost, the extra basis vectors are not counted.
  - don't run into NaN on residual norm estimator if orthogonality is lost.
Reported-by: Andreas Rost (IHU GmbH)
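The all-zero right hand side case can be sketched as an early exit before any Krylov vectors are built. This is only an illustration under the assumption that the solver returns the zero vector with an iteration count of zero in that case; `solve_trivial_if_zero_rhs` is a hypothetical name and this is not the actual ViennaCL implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical guard: returns true if rhs is identically zero. In that case the
// solution of A x = rhs (for nonsingular A) is x = 0, so x is set to zero and the
// iteration count to 0. Skipping the iteration also avoids dividing by a zero
// norm when normalizing the first basis vector, which would produce NaN.
bool solve_trivial_if_zero_rhs(const std::vector<double>& rhs,
                               std::vector<double>& x,
                               std::size_t& iterations)
{
  for (std::size_t i = 0; i < rhs.size(); ++i)
  {
    if (rhs[i] != 0.0)
      return false;  // nontrivial right hand side: caller runs the iterative solver
  }

  x.assign(rhs.size(), 0.0);
  iterations = 0;
  return true;
}
```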
-
Karl Rupp authored
Savings on my Ivy Bridge laptop are in the range of 15 percent (GCC) to 50 percent (Clang) in fully optimized mode. Memory consumption reduced by a similar amount.
-
Karl Rupp authored
-
Karl Rupp authored
Add travis and coveralls support for continuous integration.
-
- Nov 10, 2014
-
-
Philippe Tillet authored
-
- Nov 08, 2014
-
-
Karl Rupp authored
-
Karl Rupp authored
Delicate balance with warnings in Clang.
-
Karl Rupp authored
Moved OpenCL-related code for Darwin to ViennaCLCommon.cmake to avoid code duplication.
-
Karl Rupp authored
-
Karl Rupp authored
These files are not meant to be included by the user -> hide them :-)
-
Karl Rupp authored
Was no longer in use.
-
Karl Rupp authored
-
Karl Rupp authored
Andreas, Denis, and Juraj provided substantial amounts of new code and should also receive appropriate credits. :-)
-
Karl Rupp authored
Makes it much easier to e.g. only use OpenCL even though CUDA is enabled. Since this relies on a singleton, the mechanism is not thread-safe.
-
Karl Rupp authored
-