- Jul 20, 2015
-
-
Karl Rupp authored
Improves performance on some NVIDIA GPUs by up to a factor of two. Only kicks in if the matrix carries more than 12 nonzeros per row on average.
-
- Jul 19, 2015
-
-
Karl Rupp authored
Thrown strings are painfully hard to catch, whereas it is now much easier to catch std::exception (or a more specific derived class as needed).
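A minimal sketch of the pattern described above, not taken from the ViennaCL sources. The names sketch_error, check_sizes, and run_and_report are hypothetical and exist only for illustration. The sketch shows how a single std::exception handler also catches a more specific derived type.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical error type for illustration; not a ViennaCL class.
class sketch_error : public std::runtime_error
{
public:
  explicit sketch_error(std::string const & what_arg) : std::runtime_error(what_arg) {}
};

// Throws a typed exception instead of a bare string literal.
inline void check_sizes(std::size_t expected, std::size_t actual)
{
  if (expected != actual)
    throw sketch_error("size mismatch");
}

// One handler for std::exception also covers sketch_error.
inline std::string run_and_report(std::size_t expected, std::size_t actual)
{
  try
  {
    check_sizes(expected, actual);
  }
  catch (std::exception const & e)
  {
    return e.what();
  }
  return "ok";
}
```

With a thrown string literal, the caller would instead need a separate catch (const char *) handler, which is easy to forget.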
-
Karl Rupp authored
Makes it much easier to handle errors/exceptions. Strings are still thrown as exceptions at some other locations; these will be fixed in a separate commit.
-
Karl Rupp authored
Checked with Clang 3.0.
-
Karl Rupp authored
-
- Jul 18, 2015
-
-
Karl Rupp authored
Allows one to convert between {(u)int, (u)long, float, double} as needed. Adds support for vectors and dense matrices (including proxies). Support for viennacl::scalar<> already available via casts on host. No support for sparse matrices for now, as no use case in sight. Resolves #80. Partially addresses #124: It is now easier to convert to the same types.
-
Karl Rupp authored
Affected OpenCL backend in generator: "Unsupported reduction operator : no neutral element known"
-
Karl Rupp authored
Otherwise test fails on OpenCL devices without double precision support.
-
- Jul 17, 2015
-
-
Karl Rupp authored
Based on pull request by cdeterman on GitHub. See discussion at #146. A similar approach could also be applied for the non-symmetric case, but that is not considered stable enough (complex?).
-
Karl Rupp authored
Problem was introduced with parent commit.
-
Karl Rupp authored
Includes updates to the checker script in the auxiliary folder.
-
Karl Rupp authored
Addresses remaining issues in #145.
-
Karl Rupp authored
This code has not been used or tested in years. Time to clean up.
-
Karl Rupp authored
Also extended test suite such that this problem cannot show up again. The cause was one declaration where 'float' was accidentally hard-coded. Resolves #145.
-
- Jul 16, 2015
-
-
Karl Rupp authored
Parameters not obtained from a full-fledged optimizer run, but from careful manual tweaking. Obtained memory bandwidths for BLAS levels 1 and 2 are about 70 GB/sec, which is okay given that vector operations on the Xeon Phi are slow with OpenCL. BLAS level 3 improves very mildly, peaks at about 40 GFLOP/sec. Given that OpenCL for Xeon Phi (KNC) has limited use and that everybody is eager for KNL, further tuning efforts are suspended. Resolves #26.
-
- Jul 15, 2015
-
-
Karl Rupp authored
Use viennacl/tools/timer.hpp instead.
-
Karl Rupp authored
New location: viennacl/tools/random.hpp
-
Karl Rupp authored
karlrupp/feature-improve-lanczos: Extends the interface so that eigenvectors are also computed and returned. Removes all uBLAS dependencies (caused problems with some CUDA/Boost combinations). Improves performance for partial reorthogonalization.
-
Karl Rupp authored
Removed a couple of unnecessary host-device copies, removed unused counters. Partial reorthogonalization now also computes eigenvectors if specified.
-
Karl Rupp authored
BLAS2 benchmark: Corrected calculation of bandwidth
-
w2z43t5 authored
Changed "BLAS3_M" and "BLAS3_N" to "BLAS2_M" and "BLAS2_N", respectively.
-
- Jul 14, 2015
-
-
Karl Rupp authored
A library user can now call CG, BiCGStab, and GMRES directly with STL types (vector<map<T, U> >, vector<U>). Internally calculations are forwarded to the corresponding ViennaCL types. Requires the user to include compressed_matrix.hpp and vector.hpp before the respective solver implementation. This is not great, but the best we can reasonably provide at this point. Resolves #12.
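For readers unfamiliar with the STL-based format, the following host-only sketch shows how a vector<map<T, U> > can be read as a row-wise sparse matrix. It does not call any ViennaCL solver. The typedef StlSparseMatrix and the function stl_prod are hypothetical names, and fixing the index type to unsigned int and the value type to double is an assumption made for brevity.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Row-wise sparse matrix: entry A[i] maps column index -> value for row i.
typedef std::vector<std::map<unsigned int, double> > StlSparseMatrix;

// Computes y = A * x on the host for the STL-based format (illustration only).
inline std::vector<double> stl_prod(StlSparseMatrix const & A, std::vector<double> const & x)
{
  std::vector<double> y(A.size(), 0.0);
  for (std::size_t row = 0; row < A.size(); ++row)
  {
    for (std::map<unsigned int, double>::const_iterator it = A[row].begin();
         it != A[row].end(); ++it)
      y[row] += it->second * x[it->first];   // value * x[column]
  }
  return y;
}
```

A matrix is filled with statements such as A[0][1] = 1.0; entries that are never assigned are simply absent from the map.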
-
Karl Rupp authored
size() now makes use of size1(), but was declared before size1(). Inverting the order fixes compilation problems with GCC 4.8 and Clang 3.3.
-
- Jul 13, 2015
-
-
Karl Rupp authored
Simple random number generators now available in viennacl::tools:
  - uniform in closed interval [0, 1]
  - normally distributed
Both implementations are based on rand(), hence not thread-safe. However, they suffice for the time being.
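A sketch of how two such generators can be built on rand(). This is not the code in viennacl/tools/random.hpp: the function names uniform_sample and normal_sample are hypothetical, and the use of the Box-Muller transform for the normal distribution is an assumption.

```cpp
#include <cmath>
#include <cstdlib>

// Uniform sample in the closed interval [0, 1], built on rand().
// Not thread-safe, because rand() uses hidden global state.
inline double uniform_sample()
{
  return static_cast<double>(std::rand()) / static_cast<double>(RAND_MAX);
}

// Normally distributed sample via the Box-Muller transform (assumed method).
inline double normal_sample(double mean, double sigma)
{
  double u1 = uniform_sample();
  while (u1 <= 0.0)              // reject 0 to avoid log(0)
    u1 = uniform_sample();
  double const u2 = uniform_sample();
  double const two_pi = 6.283185307179586;
  return mean + sigma * std::sqrt(-2.0 * std::log(u1)) * std::cos(two_pi * u2);
}
```

Seeding with std::srand() before the first call makes runs reproducible.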
-
Karl Rupp authored
Now demonstrating how to compute only eigenvalues and how to also include eigenvectors.
-
Karl Rupp authored
Example works for all three backends. Required a couple of internal reorganizations of prod(), handle(), and size().
Interface requirement for user-provided operators:
  - member function apply(x, y); to compute y = A * x;
  - member function size1(); to return the length of y.
The second interface requirement is technical, but will be needed for matrix-free applications of rectangular matrices later. Resolves #74.
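The two interface requirements can be sketched on the host as follows. The class laplace_operator and the helper apply_operator are hypothetical, and std::vector<double> stands in for the library's vector type; the exact argument types expected by ViennaCL are not stated in this log.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical user-provided operator: 1D stencil [-1, 2, -1], never stored as a matrix.
class laplace_operator
{
public:
  explicit laplace_operator(std::size_t n) : n_(n) {}

  // Computes y = A * x.
  void apply(std::vector<double> const & x, std::vector<double> & y) const
  {
    for (std::size_t i = 0; i < n_; ++i)
    {
      double value = 2.0 * x[i];
      if (i > 0)      value -= x[i - 1];
      if (i + 1 < n_) value -= x[i + 1];
      y[i] = value;
    }
  }

  // Returns the length of the result vector y.
  std::size_t size1() const { return n_; }

private:
  std::size_t n_;
};

// Generic helper that relies only on apply() and size1().
template<typename OperatorT>
std::vector<double> apply_operator(OperatorT const & A, std::vector<double> const & x)
{
  std::vector<double> y(A.size1());   // size1() tells the caller how large y must be
  A.apply(x, y);
  return y;
}
```

The helper illustrates why size1() is needed: without it, generic code cannot allocate the result vector, which matters once the operator is rectangular.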
-
Karl Rupp authored
The use of y -= prod(A, x) is troublesome for matrix-free applications, hence the residual calculation
  residual = rhs;
  residual -= prod(A, current_guess);
was modified to
  residual = prod(A, current_guess);
  residual = rhs - residual;
to circumvent the problem. As an extra benefit, this also improves performance since our current implementation of y -= prod(A, x) for sparse matrices A relies on a temporary for prod(A, x).
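The reordered residual computation can be sketched with plain host containers. The names Vec, DenseMatrix, host_prod, and compute_residual are hypothetical, and the dense matrix is only a stand-in for whatever operator provides the product.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec>    DenseMatrix;   // stand-in for any operator A

// Stand-in for prod(A, x): plain matrix-vector product on the host.
inline Vec host_prod(DenseMatrix const & A, Vec const & x)
{
  Vec y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j)
      y[i] += A[i][j] * x[j];
  return y;
}

// Residual in the reordered form: first r = A * x, then r = rhs - r.
inline Vec compute_residual(DenseMatrix const & A, Vec const & rhs, Vec const & current_guess)
{
  Vec residual = host_prod(A, current_guess);   // only a plain product is required from A
  for (std::size_t i = 0; i < residual.size(); ++i)
    residual[i] = rhs[i] - residual[i];
  return residual;
}
```

In this form the operator only ever has to provide a plain product, so no fused "subtract a product" operation is needed.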
-
- Jul 10, 2015
-
-
Karl Rupp authored
karlrupp/fix-and-improve-ilu: Resolves a dangling const-ref issue with preconditioner tags. Fixes a bug in ILU0 (values in U not computed correctly). Improves performance for ILUT by about three-fold. Removes unnecessary temporaries for Block-ILU.
-
Karl Rupp authored
Fixed errors in bisection algorithm
-
Karl Rupp authored
Improves performance by about a factor of three to four. Most of the optimization potential is now leveraged: Each of the block matrices could also be removed by extending the ILU0 and ILUT preconditioners accordingly. This, however, requires larger refactoring and in view of the parallel Chow-Patel ILU may not be worth the effort. Resolves #40.
-
Karl Rupp authored
Now available via the solver tag, member function abs_tolerance(). Handy in cases where the first iterate is close to the solution already. Resolves #123.
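One way to read the benefit: if the first iterate is already close to the solution, the initial residual is tiny and a purely relative reduction may never be reached. The sketch below combines both criteria. The function has_converged is hypothetical, and the exact way ViennaCL combines the two tolerances is an assumption here.

```cpp
// Hypothetical stopping test combining an absolute and a relative tolerance.
inline bool has_converged(double residual_norm,
                          double initial_residual_norm,
                          double rel_tolerance,
                          double abs_tolerance)
{
  if (residual_norm < abs_tolerance)   // absolute criterion: residual is small enough as is
    return true;
  // relative criterion: residual reduced sufficiently with respect to the start
  return residual_norm < rel_tolerance * initial_residual_norm;
}
```

Setting abs_tolerance to zero recovers the purely relative test.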
-
Karl Rupp authored
Now viennacl::cuda_arg() instead of viennacl::linalg::cuda::detail::cuda_arg. Template argument for scalar, vector and matrix no longer required. Reduces code volume and makes it easier (and more convenient) for library users to inject their own custom kernels. Resolves #132. Resolves #133.
-
Karl Rupp authored
The old interface returning only the eigenvalue is still available. A more MATLAB-like interface, e.g. viennacl::tie(eigenvalue, eigenvector) = eig(A, tag); would be nice, but would pollute the code with a lot of template hackery. Resolves request by Charles Determan on the viennacl-devel list.
-
- Jul 09, 2015
-
-
Karl Rupp authored
Allows users to write e.g. A = prod(B+C, x + y); Any expressions passed to prod() are converted to temporaries. This is certainly desirable from a performance point of view for GEMM. One could do better with expression templates for GEMV, but that would not work with OpenCL. Fixes #126.
-
Andi authored
-
Andi authored
Changed std::size_t to unsigned int. Added an exception when there are more than 256 eigenvalues. Removed unnecessary line breaks.
-
- Jul 07, 2015