- Aug 10, 2014
-
-
Toby Smithe authored
-
Toby Smithe authored
-
- Aug 09, 2014
-
-
Toby Smithe authored
-
- Aug 08, 2014
-
-
Toby Smithe authored
-
Karl Rupp authored
-
Toby Smithe authored
-
Karl Rupp authored
Thanks to Toby St Clere Smithe for reporting on viennacl-devel.
-
Toby Smithe authored
More satisfactory patch for the NMF zero-matrix bug: compute the norms of the initial W and H, and if a norm is 0, set the affected matrix (or both matrices) to all entries equal to 1.0.
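The guard described above can be sketched as follows. This is a minimal illustration, not ViennaCL's code: the helper names `frobenius_norm` and `guard_against_zero` are hypothetical, and the choice of the Frobenius norm is an assumption (the commit only says "norms"). The motivation is that an all-zero initial factor is a poor starting point; with multiplicative-update schemes, for instance, a zero factor stays zero.

```python
import math


def frobenius_norm(M):
    """Frobenius norm of a dense matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in M for v in row))


def guard_against_zero(M):
    """Return M unchanged unless its norm is 0; then return an all-1.0 matrix
    of the same shape (hypothetical helper mirroring the commit description)."""
    if frobenius_norm(M) == 0.0:
        return [[1.0] * len(row) for row in M]
    return M


# Usage: apply the guard to both initial factors before iterating.
W = guard_against_zero([[0.0, 0.0], [0.0, 0.0]])  # replaced by ones
H = guard_against_zero([[0.0, 2.0]])              # left as-is
```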
-
Karl Rupp authored
karlrupp/feature-pipelined-bicgstab: Implements a pipelined version of BiCGStab with substantially improved performance, especially for smaller matrices. The algorithm is a slightly rearranged version of BiCGStab as given in the book by Y. Saad and requires only a single host<->device transfer in each step. A fairly similar algorithm has been proposed in the paper by Jacques et al., "Electromagnetic scattering with the boundary integral method on MIMD systems", LNCS 1593, Springer, 1999.
-
Karl Rupp authored
Some missing const qualifiers in the OpenCL kernels resulted in an almost 4x performance drop. With equal const qualifiers, performance is now within a margin of roughly ten percent.
-
Karl Rupp authored
-
Karl Rupp authored
-
- Aug 07, 2014
-
-
Toby Smithe authored
-
Karl Rupp authored
-
Karl Rupp authored
Implementation based on the BiCGStab listed in the book by Saad and then optimized for minimum memory traffic and synchronization points similar to the paper by Jacques et al.
-
- Aug 06, 2014
-
-
Karl Rupp authored
karlrupp/feature-pipelined-cg: Improvements of the conjugate gradient algorithm by using custom kernels for full use of pipelining and avoiding kernel launch overheads to the extent possible. Follows algorithm 2.2 in the paper by Chronopoulos and Gear: "s-step Iterative Methods for Symmetric Linear Systems", J. Comp. and Appl. Math 25 (1989).
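To illustrate why this rearrangement helps with pipelining, here is a small dense-matrix sketch of the CG variant commonly attributed to Chronopoulos and Gear. It is an assumption that this matches the paper's algorithm 2.2 step for step, and it is not ViennaCL's kernel code; the function names are hypothetical. The point to notice is that both inner products of an iteration (`gamma_new` and `delta`) depend only on `r` and `w`, so they can be computed in one fused reduction instead of two separate synchronization points.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(A, v):
    return [dot(row, v) for row in A]


def cg_chronopoulos_gear(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned CG, rearranged so that both inner products of an
    iteration are taken right after the single matrix-vector product."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                # residual for the initial guess x = 0
    w = matvec(A, r)
    gamma = dot(r, r)
    if gamma == 0.0:
        return x
    delta = dot(w, r)
    alpha, beta = gamma / delta, 0.0
    p = [0.0] * n
    s = [0.0] * n              # s tracks A*p without an extra product
    for _ in range(max_iter):
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        s = [wi + beta * si for wi, si in zip(w, s)]
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, s)]
        w = matvec(A, r)
        # Both reductions use only r and w: one fused reduction on a device.
        gamma_new = dot(r, r)
        delta = dot(w, r)
        if gamma_new ** 0.5 <= tol:
            break
        beta = gamma_new / gamma
        alpha = gamma_new / (delta - beta * gamma_new / alpha)
        gamma = gamma_new
    return x


# Usage on a small symmetric positive definite system.
solution = cg_chronopoulos_gear([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

For this 2x2 system the exact solution is (1/11, 7/11), which the sketch reaches after two iterations up to round-off.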
-
Karl Rupp authored
-
Karl Rupp authored
-
- Aug 05, 2014
-
-
Karl Rupp authored
Support for other sparse matrices to be added later.
-
- Aug 04, 2014
-
-
Karl Rupp authored
Provides a routine which generates a sparse matrix obtained from the discretization of the Laplace equation using finite differences and lexicographical ordering of the unknowns. This is one of the simplest symmetric positive definite matrices.
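A minimal sketch of such a generator is shown below. The commit does not state the dimension, stencil, or boundary treatment, so the following are assumptions: a 2D grid, the standard 5-point stencil with unit mesh width, homogeneous Dirichlet boundaries, and unknown (i, j) mapped to index `i + j * nx`. The function name `laplace_2d` and the row-of-dicts storage are hypothetical and chosen only for readability.

```python
def laplace_2d(nx, ny):
    """Return the 5-point finite-difference Laplacian on an nx-by-ny grid
    as a list of {column: value} dicts, one dict per matrix row."""
    rows = [dict() for _ in range(nx * ny)]
    for j in range(ny):
        for i in range(nx):
            k = i + j * nx              # lexicographic index of unknown (i, j)
            rows[k][k] = 4.0
            if i > 0:
                rows[k][k - 1] = -1.0   # left neighbor
            if i < nx - 1:
                rows[k][k + 1] = -1.0   # right neighbor
            if j > 0:
                rows[k][k - nx] = -1.0  # neighbor in the previous grid row
            if j < ny - 1:
                rows[k][k + nx] = -1.0  # neighbor in the next grid row
    return rows


# Usage: a 3-by-2 grid gives a 6-by-6 matrix.
A = laplace_2d(3, 2)
```

Each row has at most five nonzeros, and the matrix is symmetric by construction, which makes it a convenient test case for CG-type solvers.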
-
Karl Rupp authored
Implementation for OpenCL and CUDA to come. Reference for the implementation: Algorithm 2.2 in Chronopoulos and Gear: "s-step Iterative Methods for Symmetric Linear Systems", Journal of Comp. and Appl. Math, 1989.
-
Karl Rupp authored
wip-sliced-ell-matrix: Support for the Sliced ELL matrix format. Currently only matrix-vector products are supported; sparse-matrix-times-dense-matrix products are not.
-
Karl Rupp authored
So far "-arch=sm_13" was hardcoded. Now the user can override this through the CUDA_ARCH_FLAG in CMake.
-
Karl Rupp authored
This format is an improved version of the ELL format, aiming at maximized memory bandwidth for reading matrix values. Currently only matrix-vector products are supported; sparse-matrix-by-dense-matrix products are to be added at some later time. See the paper "A unified sparse matrix data format for efficient general sparse matrix-vector multiply on modern processors with wide SIMD units" by Kreutzer et al. for details. We don't use any sorting of rows here.
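The idea of the format can be sketched as follows: rows are grouped into slices of fixed height, each slice is padded only to the length of its own longest row, and entries are stored column-major within a slice so that neighboring rows read neighboring memory locations. This is an illustrative sketch, not ViennaCL's data layout; the function names, the slice height of 2, and the padding convention (column 0, value 0.0) are assumptions. In line with the commit, rows are not sorted.

```python
def to_sliced_ell(rows, slice_height=2):
    """Convert a list of {column: value} row dicts to a sliced ELL layout.
    Returns (slice_start, cols, vals); slice_start[s] is the offset of slice s."""
    n = len(rows)
    slice_start, cols, vals = [0], [], []
    for first in range(0, n, slice_height):
        block = [sorted(rows[r].items()) if r < n else []
                 for r in range(first, first + slice_height)]
        width = max(len(entries) for entries in block)
        for k in range(width):          # column-major within the slice
            for entries in block:
                if k < len(entries):
                    cols.append(entries[k][0])
                    vals.append(entries[k][1])
                else:                   # padding entry contributes nothing
                    cols.append(0)
                    vals.append(0.0)
        slice_start.append(len(vals))
    return slice_start, cols, vals


def sliced_ell_matvec(n, slice_height, slice_start, cols, vals, x):
    """Compute y = A * x for a matrix stored by to_sliced_ell."""
    y = [0.0] * n
    for s in range(len(slice_start) - 1):
        base = slice_start[s]
        width = (slice_start[s + 1] - base) // slice_height
        for local in range(slice_height):
            row = s * slice_height + local
            if row >= n:                # last slice may contain padding rows
                continue
            acc = 0.0
            for k in range(width):
                idx = base + k * slice_height + local
                acc += vals[idx] * x[cols[idx]]
            y[row] = acc
    return y


# Usage: a 3-by-3 matrix with rows of different lengths.
rows = [{0: 1.0, 2: 2.0}, {1: 3.0}, {0: 4.0, 1: 5.0, 2: 6.0}]
slice_start, cols, vals = to_sliced_ell(rows, slice_height=2)
y = sliced_ell_matvec(3, 2, slice_start, cols, vals, [1.0, 2.0, 3.0])
```

Compared with plain ELL, the short rows in the first slice are padded to width 2 rather than to the global maximum of 3, which is where the savings in memory traffic come from.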
-
- Aug 03, 2014
-
-
Karl Rupp authored
Also works in the case that A and B are the same matrix. Currently this operation is fairly slow, because the transposition is performed in main memory. On the other hand, it's certainly better to have a slow implementation than no implementation at all...
-
- Aug 02, 2014
-
-
Karl Rupp authored
A subnamespace was ambiguous, at least according to MS' interpretation of the standard. Reported-by: Matthew Musto <matthew.musto@gmail.com> via viennacl-devel
-
- Aug 01, 2014
-
-
Karl Rupp authored
-
Karl Rupp authored
Some nightly tests failed for element_pow() because numerical round-off errors could become too severe. This commit reduces the exponent, which should ultimately lead to no more spurious test failures.
-
Karl Rupp authored
For some reason the template argument is explicitly required here. Doesn't hurt us, so let's use the more verbose variant here.
-
Toby Smithe authored
-
- Jul 27, 2014
-
-
Philippe Tillet authored
Probably requires more investigation: it may decrease performance on some platforms and increase it on others...
-
Philippe Tillet authored
-
- Jul 26, 2014
-
-
Philippe Tillet authored
-
Philippe Tillet authored
Device-specific / GEMM : Implemented the new FETCH_GLOBAL_CONTIGUOUS policy. Changed use_{A,B}_local from bool to an enum...
-
- Jul 24, 2014
-
-
Philippe Tillet authored
Device-specific : Added accessor to template::parameters(); rename template::parameters to template::parameters_type
-
Karl Rupp authored
-
- Jul 22, 2014
-
-
Karl Rupp authored
Fixes #72. Under MinGW it is required to explicitly link with 'gomp'.
-
Philippe Tillet authored
Could cause compilation errors on some platforms.
-
Philippe Tillet authored
-
Philippe Tillet authored
-