- Jul 27, 2015
Karl Rupp authored
These routines compute the sum of a vector as well as the row- and column-sums of a dense matrix, respectively. The implementation reuses inner products for vectors and matrix-vector products for matrices. Thus, there is some overhead compared to dedicated, highly optimized routines, but this should be acceptable in almost all cases. Replaces former attempts at a reduce<>() function. Resolves #127.
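The reuse idea can be sketched on the host in plain C++. The sum of a vector is its inner product with a vector of ones, the row sums of A are A * 1, and the column sums are A^T * 1. This sketch does not show ViennaCL's code or API: the `DenseMatrix` type and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical dense row-major matrix: element (i, j) is data[i * cols + j].
struct DenseMatrix {
  std::size_t rows, cols;
  std::vector<double> data;
};

// Sum of all entries of x, expressed as the inner product <x, 1>.
double vector_sum(const std::vector<double>& x) {
  std::vector<double> ones(x.size(), 1.0);
  return std::inner_product(x.begin(), x.end(), ones.begin(), 0.0);
}

// y = A * v
std::vector<double> matvec(const DenseMatrix& A, const std::vector<double>& v) {
  assert(v.size() == A.cols);
  std::vector<double> y(A.rows, 0.0);
  for (std::size_t i = 0; i < A.rows; ++i)
    for (std::size_t j = 0; j < A.cols; ++j)
      y[i] += A.data[i * A.cols + j] * v[j];
  return y;
}

// y = A^T * v
std::vector<double> matvec_transposed(const DenseMatrix& A, const std::vector<double>& v) {
  assert(v.size() == A.rows);
  std::vector<double> y(A.cols, 0.0);
  for (std::size_t i = 0; i < A.rows; ++i)
    for (std::size_t j = 0; j < A.cols; ++j)
      y[j] += A.data[i * A.cols + j] * v[i];
  return y;
}

// Row sums: A * 1 (one entry per row).
std::vector<double> row_sums(const DenseMatrix& A) {
  return matvec(A, std::vector<double>(A.cols, 1.0));
}

// Column sums: A^T * 1 (one entry per column).
std::vector<double> column_sums(const DenseMatrix& A) {
  return matvec_transposed(A, std::vector<double>(A.rows, 1.0));
}
```

The extra work compared to a dedicated reduction is the ones vector and the multiplications by one. That is the overhead the commit message accepts.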
-
Karl Rupp authored
Based on a request by Sumit Kumar on viennacl-devel.
-
Karl Rupp authored
-
- Jul 25, 2015
Karl Rupp authored
The previous default of sm_13 is no longer supported by CUDA 7. Not setting a default, however, gives the user no hint about which value to set.
-
Philippe Tillet authored
CMake: commented out the default architecture options in ViennaCLCommon.cmake. ViennaCL now relies on nvcc's default architecture setting. Couldn't find a portable way to detect the architecture at build time with CMake.
-
- Jul 23, 2015
- Jul 22, 2015
Karl Rupp authored
karlrupp/feature-refurbished-amg: Full rewrite of the AMG functionality. Fine-grained AMG as described by Bell et al. is now available, including a GPU-accelerated setup. The new implementation is available for all three backends. Setup on the host and application on the device is also possible. Resolves #16. Conflicts: viennacl/linalg/detail/amg/amg_coarse.hpp viennacl/linalg/detail/amg/amg_debug.hpp viennacl/linalg/detail/amg/amg_interpol.hpp
-
Karl Rupp authored
When feeding arbitrary matrices from Matrix Market, one rarely gets good convergence on the first attempt.
-
Karl Rupp authored
Rather than running into an assertion, the user now encounters an exception explaining the reason why the AMG setup failed. For additional diagnostics it is also possible to query the problem sizes at each level.
-
Karl Rupp authored
karlrupp/feature-chow-patel-ilu: Implements parallel incomplete LU and parallel incomplete Cholesky factorization preconditioners. Both preconditioners use the same sparsity pattern as A. Proposed by Chow and Patel (Algorithms 2 and 3) in SIAM J. Sci. Comp.
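The ILU variant can be illustrated with a dense, host-side sketch of its fixed-point idea. Each entry of L and U that lies in the sparsity pattern of A is recomputed from the previous iterate, so all entries of one sweep are independent and could be updated in parallel. Dense storage, the simple initial guess and the function names are assumptions for illustration only. ViennaCL's preconditioner works on compressed_matrix and has its own interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// One synchronous (Jacobi-style) sweep of the fixed-point ILU(0) iteration.
// L is unit lower triangular (its diagonal of ones is implicit, not stored),
// U is upper triangular. Only positions with A[i][j] != 0 are updated, and
// every update reads the previous iterate only.
void ilu0_sweep(const Matrix& A, Matrix& L, Matrix& U) {
  const std::size_t n = A.size();
  Matrix L_new = L, U_new = U;
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      if (A[i][j] == 0.0) continue;           // outside the sparsity pattern of A
      const std::size_t m = (i < j) ? i : j;  // sum over k < min(i, j)
      double s = A[i][j];
      for (std::size_t k = 0; k < m; ++k) s -= L[i][k] * U[k][j];
      if (i > j) {
        assert(U[j][j] != 0.0);
        L_new[i][j] = s / U[j][j];            // strictly lower part of L
      } else {
        U_new[i][j] = s;                      // upper part of U, including diagonal
      }
    }
  }
  L.swap(L_new);
  U.swap(U_new);
}

// Runs a fixed number of sweeps from a simple initial guess:
// U = upper triangle of A, L = strictly lower triangle of A scaled by diag(A).
void ilu0_fixed_point(const Matrix& A, unsigned sweeps, Matrix& L, Matrix& U) {
  const std::size_t n = A.size();
  L.assign(n, std::vector<double>(n, 0.0));
  U.assign(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      if (i <= j) {
        U[i][j] = A[i][j];
      } else {
        assert(A[j][j] != 0.0);
        L[i][j] = A[i][j] / A[j][j];
      }
    }
  for (unsigned s = 0; s < sweeps; ++s) ilu0_sweep(A, L, U);
}
```

For a tridiagonal matrix, the exact LU factors have no fill-in outside the pattern of A. After a few sweeps, L * U therefore reproduces A.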
-
Karl Rupp authored
Proposed by Chow and Patel in SIAM J. Sci. Comp. Vol. 37, No. 2, pp. C169–C193, 2015 (Algorithm 3). Rather than computing U column by column, we compute L row by row. This saves at least one (costly) transposition.
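The incomplete Cholesky counterpart can be sketched the same way. The sketch computes the lower factor L of A ≈ L * L^T directly, row by row, so no transposition is needed. It is a dense, host-side illustration under assumed names and an assumed initial guess, not ViennaCL's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// One synchronous sweep of the fixed-point incomplete Cholesky iteration.
// Only the lower triangle of L is stored, updated row by row and restricted
// to positions where A[i][j] != 0. All updates read the previous iterate.
void ic0_sweep(const Matrix& A, Matrix& L) {
  const std::size_t n = A.size();
  Matrix L_new = L;
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j <= i; ++j) {
      if (A[i][j] == 0.0) continue;  // outside the pattern of A
      double s = A[i][j];
      for (std::size_t k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
      if (i == j) {
        assert(s > 0.0);             // otherwise the factorization breaks down
        L_new[i][i] = std::sqrt(s);
      } else {
        assert(L[j][j] != 0.0);
        L_new[i][j] = s / L[j][j];
      }
    }
  }
  L.swap(L_new);
}

// Initial guess: lower triangle of A with column j scaled by 1/sqrt(A[j][j]).
Matrix ic0_fixed_point(const Matrix& A, unsigned sweeps) {
  const std::size_t n = A.size();
  Matrix L(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j <= i; ++j) {
      assert(A[j][j] > 0.0);
      L[i][j] = A[i][j] / std::sqrt(A[j][j]);
    }
  for (unsigned s = 0; s < sweeps; ++s) ic0_sweep(A, L);
  return L;
}
```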
-
Karl Rupp authored
Only works with compressed_matrix.
-
- Jul 21, 2015
- Jul 20, 2015
Karl Rupp authored
These are leftovers from earlier coding.
-
Karl Rupp authored
Arguments are now passed to getter/setter members rather than all being squeezed into the constructor. The interpolation weight is now the same as the Jacobi weight, because the interpolation is constructed such that it complements the action of the smoother.
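The Jacobi weight is the damping factor omega of a weighted Jacobi smoother. The following is a minimal dense sketch of one smoothing step with hypothetical names, not ViennaCL's smoother. One plausible reading of the shared weight: smoothed aggregation, for example, commonly smooths the tentative prolongator with the same damped-Jacobi operator (I - omega D^{-1} A).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// One damped (weighted) Jacobi sweep: x_new = x + omega * D^{-1} * (b - A * x),
// where D is the diagonal of A and omega is the Jacobi weight.
std::vector<double> weighted_jacobi_sweep(const Matrix& A, const std::vector<double>& b,
                                          const std::vector<double>& x, double omega) {
  const std::size_t n = A.size();
  assert(b.size() == n && x.size() == n);
  std::vector<double> x_new(n);
  for (std::size_t i = 0; i < n; ++i) {
    double r = b[i];  // residual entry r_i = b_i - (A x)_i
    for (std::size_t j = 0; j < n; ++j) r -= A[i][j] * x[j];
    assert(A[i][i] != 0.0);
    x_new[i] = x[i] + omega * r / A[i][i];
  }
  return x_new;
}
```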
-
Karl Rupp authored
Improves performance on some NVIDIA GPUs by up to a factor of two. Only kicks in if the matrix carries more than 12 nonzeros per row on average.
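The commit does not say which kernel is selected (presumably a variant of the sparse matrix-vector product). The following only sketches the threshold test it describes, computing the average number of nonzeros per row from a CSR row-pointer array. The enum and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel identifiers; the commit does not name the kernels involved.
enum class SpmvKernel { Default, ManyNonzerosPerRow };

// Selects the alternative kernel only if the CSR matrix has more than
// `threshold` nonzeros per row on average (12 in the commit above).
// Assumes row_ptr[0] == 0, so row_ptr.back() equals the number of nonzeros.
SpmvKernel select_spmv_kernel(const std::vector<std::size_t>& row_ptr, double threshold = 12.0) {
  assert(!row_ptr.empty());  // a CSR row pointer has rows + 1 entries
  const std::size_t rows = row_ptr.size() - 1;
  if (rows == 0) return SpmvKernel::Default;
  const double avg_nnz = static_cast<double>(row_ptr.back()) / static_cast<double>(rows);
  return avg_nnz > threshold ? SpmvKernel::ManyNonzerosPerRow : SpmvKernel::Default;
}
```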
-
- Jul 19, 2015
Karl Rupp authored
Thrown strings are painfully hard to catch, whereas now it is a lot easier to simply catch std::exception (or a more specific derived class as needed).
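The following sketch illustrates the difference. It uses hypothetical names, not ViennaCL's actual exception classes. An exception type derived from std::runtime_error is caught by a handler for std::exception, whereas a thrown string literal would only match a handler for const char*.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical error type: deriving from std::runtime_error (and thus from
// std::exception) lets callers catch it generically or specifically.
class setup_error : public std::runtime_error {
public:
  explicit setup_error(const std::string& what) : std::runtime_error(what) {}
};

// Hypothetical check in the spirit of an AMG setup: coarsening must shrink the problem.
void check_coarse_level(std::size_t fine_size, std::size_t coarse_size) {
  assert(fine_size > 0);
  if (coarse_size >= fine_size)
    throw setup_error("coarsening failed: coarse level (" + std::to_string(coarse_size) +
                      ") is not smaller than fine level (" + std::to_string(fine_size) + ")");
}
```

A caller can then write `catch (const std::exception& e)` and report `e.what()`, without knowing the concrete type.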
-
Karl Rupp authored
Makes it much easier to handle errors/exceptions. Strings are still thrown as exceptions at some other locations; these will be fixed in a separate commit.
-
Karl Rupp authored
Checked with Clang 3.0.
-
Karl Rupp authored
-
- Jul 18, 2015
Karl Rupp authored
Allows one to convert between {(u)int, (u)long, float, double} as needed. Adds support for vectors and dense matrices (including proxies). Support for viennacl::scalar<> already available via casts on host. No support for sparse matrices for now, as no use case in sight. Resolves #80. Partially addresses #124: It is now easier to convert to the same types.
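The following host-side analogue of element-wise type conversion uses std::vector. It does not show ViennaCL's syntax for converting device vectors or matrices, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Element-wise numeric conversion between vectors of different scalar types.
// Floating-point to integer conversion via static_cast truncates toward zero
// (and is undefined if the value does not fit into the target type).
template <typename DstT, typename SrcT>
std::vector<DstT> convert_vector(const std::vector<SrcT>& src) {
  std::vector<DstT> dst(src.size());
  std::transform(src.begin(), src.end(), dst.begin(),
                 [](const SrcT& v) { return static_cast<DstT>(v); });
  return dst;
}
```

Usage: `convert_vector<int>(std::vector<double>{1.9, -2.5})` yields `{1, -2}`.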
-
Karl Rupp authored
Affected OpenCL backend in generator: "Unsupported reduction operator : no neutral element known"
-
Karl Rupp authored
Otherwise test fails on OpenCL devices without double precision support.
-
- Jul 17, 2015
Karl Rupp authored
Based on a pull request by cdeterman on GitHub. See discussion at #146. A similar approach could also be applied to the non-symmetric case, but that is not considered stable enough (complex?).
-
Karl Rupp authored
Problem was introduced with parent commit.
-
Karl Rupp authored
Includes updates to the checker-script in auxiliary-folder.
-
Karl Rupp authored
Addresses remaining issues in #145.
-
Karl Rupp authored
This code has not been used or tested in years. Time to clean up.
-
Karl Rupp authored
Also extended test suite such that this problem cannot show up again. The cause was one declaration where 'float' was accidentally hard-coded. Resolves #145.
-
- Jul 16, 2015
Karl Rupp authored
Now provides the following:
- coarsening: classical RS, aggregation
- interpolation: direct interpolation, aggregation, smoothed aggregation
All are available for all three backends and no longer require uBLAS. The former RS0 and RS3 were dropped due to a lack of fine-grained parallelism. The implementations are mostly based on the paper "Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods" by Bell et al. The initial implementation in branch karlrupp/refurbish-amg became too messy over time, hence this cleanup. New operations are available via viennacl/linalg/amg_operations.hpp. These include assign_to_dense() and amg_transpose(), which should at some point become generally available with a nicer API. Still to be added:
- diagnostic information from the preconditioner object
- documentation in the manual
- more convenience for amg_tag
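amg_transpose() is only named here, and its signature is not shown. For orientation, the following sketches a standard counting-sort transpose of a CSR matrix on the host. Transposes of this kind are typically used in AMG to form the restriction as the transpose of the prolongation. The `Csr` struct and the function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR matrix (hypothetical type for this sketch).
struct Csr {
  std::size_t rows, cols;
  std::vector<std::size_t> row_ptr;  // size rows + 1
  std::vector<std::size_t> col_idx;  // size nnz
  std::vector<double> val;           // size nnz
};

// Transpose via counting sort over column indices: O(nnz + rows + cols).
// Column indices within each row of the result come out sorted.
Csr transpose(const Csr& A) {
  assert(A.row_ptr.size() == A.rows + 1);
  const std::size_t nnz = A.val.size();
  Csr T{A.cols, A.rows, std::vector<std::size_t>(A.cols + 1, 0),
        std::vector<std::size_t>(nnz), std::vector<double>(nnz)};
  // Count the entries of each column of A (= each row of T).
  for (std::size_t k = 0; k < nnz; ++k) ++T.row_ptr[A.col_idx[k] + 1];
  // Prefix sum turns the counts into row offsets of T.
  for (std::size_t j = 0; j < A.cols; ++j) T.row_ptr[j + 1] += T.row_ptr[j];
  // Scatter the entries, tracking the next free slot in each row of T.
  std::vector<std::size_t> next(T.row_ptr.begin(), T.row_ptr.end() - 1);
  for (std::size_t i = 0; i < A.rows; ++i)
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
      const std::size_t dst = next[A.col_idx[k]]++;
      T.col_idx[dst] = i;
      T.val[dst] = A.val[k];
    }
  return T;
}
```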
-
Karl Rupp authored
Parameters were not obtained from a full-fledged optimizer run, but from careful manual tweaking. The obtained memory bandwidths for BLAS levels 1 and 2 are about 70 GB/sec, which is okay given that vector operations on the Xeon Phi are slow with OpenCL. BLAS level 3 improves only mildly, peaking at about 40 GFLOP/sec. Given that OpenCL for the Xeon Phi (KNC) is of limited use and that everybody is eager for KNL, further tuning efforts are suspended. Resolves #26.
-
- Jul 15, 2015
Karl Rupp authored
Use viennacl/tools/timer.hpp instead.
-
Karl Rupp authored
New location: viennacl/tools/random.hpp
-
Karl Rupp authored
karlrupp/feature-improve-lanczos: Extends the interface such that eigenvectors are also computed and returned. Removes all uBLAS dependencies (which caused problems with some CUDA/Boost combinations). Improves performance for partial reorthogonalization.
-
Karl Rupp authored
Removed a couple of unnecessary host-device copies, removed unused counters. Partial reorthogonalization now also computes eigenvectors if specified.
-
Karl Rupp authored
BLAS2 benchmark: Corrected calculation of bandwidth
-
w2z43t5 authored
Changed "BLAS3_M" and "BLAS3_N" to "BLAS2_M" and "BLAS2_N", respectively.
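For reference, the following sketch shows one common way to count the memory traffic of a dense matrix-vector product y = A * x with an M x N matrix: each matrix entry and each input entry is read once, and each output entry is written once. Whether the BLAS2 benchmark counts exactly these terms is an assumption, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Effective memory bandwidth in GB/s of y = A * x for an M x N matrix A,
// counting M*N + N reads and M writes of sizeof(T) bytes each.
template <typename T>
double gemv_bandwidth_gb_per_s(std::size_t M, std::size_t N, double seconds) {
  assert(seconds > 0.0);
  const double bytes = static_cast<double>(M * N + M + N) * sizeof(T);
  return bytes / seconds / 1e9;
}
```

With M = N = 1000 in single precision and a runtime of 1 ms, this gives about 4.008 GB/s. Using the BLAS3 sizes instead of BLAS2_M and BLAS2_N in such a formula would silently distort the reported figure.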
-