- Jul 31, 2015
- Jul 30, 2015
-
-
Karl Rupp authored
AddCCompilerFlagIfSupported.cmake and AddCLinkerFlagIfSupported.cmake were not included in the dist and dist-src targets.
-
Karl Rupp authored
Visual Studio 2012 ran into ambiguities with respect to conversions. Adding tie-breaker overloads of operator= fixed the problems.
-
Karl Rupp authored
Most notably:
  - Added fine-grained ILU
  - Described custom compilation via command line
  - Better iterative solver description (including mixed-precision-CG)
  - Updated the GPU support table
  - Sparse matrix-matrix products
-
Karl Rupp authored
Improves performance on my laptop by a factor of 3.
-
Karl Rupp authored
Resulted in build failures on Visual Studio 2012.
-
Karl Rupp authored
Improves performance on NVIDIA GPUs by about 10 percent on average. Also reduces memory footprint a little.
-
Karl Rupp authored
AMG: Fix a typo: coarseing -> coarsening.
-
Karl Rupp authored
OpenCL context handle was accidentally not set.
-
- Jul 29, 2015
-
-
Bruno Turcksin authored
-
Karl Rupp authored
The same problem showed up with OpenCL earlier in 216a6ac4. I assume that we are hitting a bug in the CUDA stack here, since the problem only shows up on some CUDA devices (e.g. K20m) and only with certain build configurations. A debug build, for example, does not show any issues. See also the follow-up discussion in #147.
-
Karl Rupp authored
-
Karl Rupp authored
Since no standalone PDF manual is available anymore, this option became obsolete.
-
Karl Rupp authored
The respective tuning code for ViennaProfiler is no longer in ViennaCL, so this optional dependency is obsolete.
-
Karl Rupp authored
This is to also support lines such as compressed_matrix<T> A = prod(B, C); So far only operator= was supported.
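The difference between the two forms can be sketched with a toy expression-template setup. This is not ViennaCL code: `Matrix`, `ProductExpression`, and `evaluate` are hypothetical stand-ins for illustration, and only the name `prod` mirrors the commit message. The point is that `A = prod(B, C);` needs an assignment operator taking the expression, while `Matrix A = prod(B, C);` needs a converting constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Matrix;  // forward declaration for the expression proxy

// Hypothetical proxy returned by prod(); it only records the operands.
struct ProductExpression {
  const Matrix& lhs;
  const Matrix& rhs;
};

// Hypothetical dense matrix standing in for a sparse matrix type.
struct Matrix {
  std::size_t rows = 0, cols = 0;
  std::vector<double> data;  // row-major storage

  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

  // Converting constructor: enables `Matrix A = prod(B, C);`
  Matrix(const ProductExpression& e) { evaluate(e); }

  // Assignment from an expression: enables `A = prod(B, C);`
  Matrix& operator=(const ProductExpression& e) { evaluate(e); return *this; }

  double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }

private:
  void evaluate(const ProductExpression& e) {
    assert(e.lhs.cols == e.rhs.rows);
    const std::size_t m = e.lhs.rows, n = e.rhs.cols, inner = e.lhs.cols;
    // Compute into a temporary so that `A = prod(A, B)` stays correct.
    std::vector<double> result(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i)
      for (std::size_t k = 0; k < inner; ++k)
        for (std::size_t j = 0; j < n; ++j)
          result[i * n + j] += e.lhs(i, k) * e.rhs(k, j);
    rows = m;
    cols = n;
    data.swap(result);
  }
};

// Builds the lazy product; no arithmetic happens here.
inline ProductExpression prod(const Matrix& a, const Matrix& b) {
  return ProductExpression{a, b};
}
```

Sharing one `evaluate` routine between the constructor and `operator=` keeps both entry points consistent.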
-
Karl Rupp authored
-
Karl Rupp authored
The old API is still supported. The new API uses solver objects, on which the initial guess as well as the monitor callbacks are registered. New tutorial for usage: iterative-custom. Resolves #97.
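The solver-object pattern described here can be sketched as follows. This is a self-contained illustration, not the ViennaCL interface: `CgSolver`, `set_initial_guess`, `set_monitor`, and the monitor signature are assumptions chosen for the example, and the dense `std::vector` types stand in for library matrices and vectors. See the iterative-custom tutorial for the actual API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;  // dense, row-major stand-in

inline double dot(const Vector& x, const Vector& y) {
  assert(x.size() == y.size());
  double s = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
  return s;
}

inline Vector multiply(const Matrix& A, const Vector& x) {
  Vector y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i) y[i] = dot(A[i], x);
  return y;
}

// Hypothetical conjugate gradient solver object for symmetric positive definite A.
class CgSolver {
public:
  // The monitor gets the iteration index and the residual norm;
  // returning true stops the iteration early.
  using Monitor = std::function<bool(std::size_t, double)>;

  CgSolver(double tolerance, std::size_t max_iterations)
    : tolerance_(tolerance), max_iterations_(max_iterations) {}

  void set_initial_guess(const Vector& x0) { initial_guess_ = x0; }
  void set_monitor(Monitor monitor) { monitor_ = std::move(monitor); }

  Vector operator()(const Matrix& A, const Vector& b) const {
    // Fall back to a zero vector if no initial guess was registered.
    Vector x = initial_guess_.empty() ? Vector(b.size(), 0.0) : initial_guess_;
    assert(x.size() == b.size());
    Vector r = b;
    const Vector Ax = multiply(A, x);
    for (std::size_t i = 0; i < r.size(); ++i) r[i] -= Ax[i];
    Vector p = r;
    double rr = dot(r, r);
    for (std::size_t it = 0; it < max_iterations_; ++it) {
      const double residual_norm = std::sqrt(rr);
      if (residual_norm <= tolerance_) break;  // absolute tolerance
      if (monitor_ && monitor_(it, residual_norm)) break;
      const Vector Ap = multiply(A, p);
      const double alpha = rr / dot(p, Ap);
      for (std::size_t i = 0; i < x.size(); ++i) {
        x[i] += alpha * p[i];
        r[i] -= alpha * Ap[i];
      }
      const double rr_new = dot(r, r);
      const double beta = rr_new / rr;
      for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
      rr = rr_new;
    }
    return x;
  }

private:
  double tolerance_;
  std::size_t max_iterations_;
  Vector initial_guess_;
  Monitor monitor_;
};
```

Keeping the initial guess and the monitor on the solver object leaves the call itself as short as `solver(A, b)`, which is presumably the motivation for moving this state out of the function arguments.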
-
- Jul 28, 2015
- Jul 27, 2015
-
-
Karl Rupp authored
These routines compute the sum of a vector as well as the row- and column-sums of a dense matrix, respectively. The implementation reuses inner products for vectors and matrix-vector products for matrices. Thus, there is some overhead compared to dedicated, highly optimized routines, but this should be acceptable in almost all cases. Replaces former attempts at a reduce<>() function. Resolves #127.
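The reuse described above amounts to multiplying with a vector of ones: sum(x) = <x, 1>, the row sums are A * 1, and the column sums are A^T * 1. A minimal sketch with plain `std::vector` containers follows; the type aliases and function names are illustrative assumptions, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;  // dense, row-major stand-in

// Inner product <x, y>.
inline double inner_prod(const Vector& x, const Vector& y) {
  assert(x.size() == y.size());
  double result = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) result += x[i] * y[i];
  return result;
}

// Matrix-vector product A * x.
inline Vector prod(const Matrix& A, const Vector& x) {
  Vector y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i) y[i] = inner_prod(A[i], x);
  return y;
}

// Transposed matrix-vector product A^T * x.
inline Vector trans_prod(const Matrix& A, const Vector& x) {
  assert(A.size() == x.size());
  const std::size_t cols = A.empty() ? 0 : A[0].size();
  Vector y(cols, 0.0);
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < cols; ++j)
      y[j] += A[i][j] * x[i];
  return y;
}

// Sum of all entries: inner product with a vector of ones.
inline double sum(const Vector& x) {
  return inner_prod(x, Vector(x.size(), 1.0));
}

// Row sums: A times a vector of ones.
inline Vector row_sum(const Matrix& A) {
  const std::size_t cols = A.empty() ? 0 : A[0].size();
  return prod(A, Vector(cols, 1.0));
}

// Column sums: A^T times a vector of ones.
inline Vector column_sum(const Matrix& A) {
  return trans_prod(A, Vector(A.size(), 1.0));
}
```

The extra cost relative to a dedicated reduction is the allocation of the ones vector and one redundant multiplication per entry.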
-
Karl Rupp authored
Based on a request by Sumit Kumar on viennacl-devel.
-
Karl Rupp authored
-
- Jul 25, 2015
-
-
Karl Rupp authored
The previous default of sm_13 is no longer supported with CUDA 7. Not setting a default does not give a hint to the user about the value to set.
-
Philippe Tillet authored
CMake: commented out default architecture options in ViennaCLCommon.cmake. ViennaCL now relies on the default behavior of nvcc for the target architecture. Couldn't find a portable way to detect the architecture at build time with CMake.
-
- Jul 23, 2015
- Jul 22, 2015
-
-
Karl Rupp authored
karlrupp/feature-refurbished-amg: Full rewrite of AMG functionality. Fine-grained AMG as described by Bell et al. now available. Includes GPU-accelerated setup. New implementation available for all three backends. Setup on host and application on device also possible. Resolves #16.
Conflicts:
  viennacl/linalg/detail/amg/amg_coarse.hpp
  viennacl/linalg/detail/amg/amg_debug.hpp
  viennacl/linalg/detail/amg/amg_interpol.hpp
-
Karl Rupp authored
When randomly feeding matrices from Matrix Market, one rarely gets good convergence at the first attempt.
-
Karl Rupp authored
Rather than running into an assertion, the user now encounters an exception explaining the reason why the AMG setup failed. For additional diagnostics it is also possible to query the problem sizes at each level.
-
Karl Rupp authored
karlrupp/feature-chow-patel-ilu: Implements a parallel incomplete LU preconditioner. Implements a parallel incomplete Cholesky factorization preconditioner. Both preconditioners use the same sparsity pattern as A. Proposed by Chow and Patel (Algorithms 2 and 3) in SIAM J. Sci. Comp.
-
Karl Rupp authored
Proposed by Chow and Patel in SIAM J. Sci. Comp. Vol. 37, No. 2, pp. C169–C193, 2015 in Algorithm 3. Rather than a column-major computation of U, we compute the row-major L. This saves at least one (costly) transposition.
-
Karl Rupp authored
Only works with compressed_matrix.
-
- Jul 21, 2015
- Jul 20, 2015
-
-
Karl Rupp authored
These are left-overs from earlier coding.
-
Karl Rupp authored
Arguments now passed to getter/setter members rather than all squeezed into the constructor. Interpolation weight is now the same as the Jacobi weight, because the interpolation is just constructed such that it complements the action of the smoother.
-
Karl Rupp authored
Improves performance on some NVIDIA GPUs by up to a factor of two. Only kicks in if the matrix carries more than 12 nonzeros per row on average.
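A selection rule of this kind can be sketched on the host side as below. Only the threshold of 12 nonzeros per row comes from the commit message; the `CsrMatrix` layout and the function names are hypothetical, and the kernel that is actually selected is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container: row_ptr has (rows + 1) entries,
// and row_ptr.back() equals the total number of nonzeros.
struct CsrMatrix {
  std::vector<std::size_t> row_ptr;
  std::vector<std::size_t> col_idx;
  std::vector<double> values;
};

// Average number of stored entries per row.
inline double average_nnz_per_row(const CsrMatrix& A) {
  assert(!A.row_ptr.empty());
  const std::size_t rows = A.row_ptr.size() - 1;
  if (rows == 0) return 0.0;
  return static_cast<double>(A.row_ptr.back()) / static_cast<double>(rows);
}

// Returns true if the alternative kernel should be used, i.e. if the
// matrix carries more than `threshold` nonzeros per row on average.
inline bool use_alternative_kernel(const CsrMatrix& A, double threshold = 12.0) {
  return average_nnz_per_row(A) > threshold;
}
```

Because the check only reads the row pointer array, it costs constant time per matrix.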
-