Commits · c05884d8fd56ef1b3fb41fce72fb84f530220cfb · Kaushik Kulkarni / viennacl-dev

Jul 21, 2015
- Changelog: Prepared for 1.7.0 release. · c05884d8
  Karl Rupp authored Jul 21, 2015
  
  Supposedly pretty complete, might receive minor final updates.
  c05884d8
Jul 20, 2015

CG, BiCGStab, GMRES: Improved CSR SpMV for pipelined implementations. · 56e45397

Karl Rupp authored Jul 20, 2015

Improves performance on some NVIDIA GPUs by up to a factor of two.
Only kicks in if the matrix carries more than 12 nonzeros per row on average.

56e45397
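
The threshold mentioned in 56e45397 can be pictured as a check on the CSR row-pointer array. The sketch below is only an assumption about how such a heuristic might look, not the code from the commit; the helper name `exceeds_avg_nnz_per_row` and its signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reports whether a CSR matrix carries more than
// `threshold` nonzeros per row on average. `row_ptr` is the usual CSR
// row-pointer array with (rows + 1) entries.
inline bool exceeds_avg_nnz_per_row(std::vector<unsigned int> const & row_ptr,
                                    double threshold = 12.0)
{
  if (row_ptr.size() < 2)   // no rows, nothing to decide
    return false;

  std::size_t rows = row_ptr.size() - 1;
  assert(row_ptr.back() >= row_ptr.front());
  double nnz = static_cast<double>(row_ptr.back() - row_ptr.front());
  return nnz / static_cast<double>(rows) > threshold;
}
```

A solver could consult such a check once per matrix and select the SpMV variant accordingly.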

Jul 19, 2015
- Exceptions: Replaced all string throws with proper exceptions. · e8dc2595
  Karl Rupp authored Jul 19, 2015
  
  Strings are painfully hard to catch, whereas now it is a lot easier to just catch std::exception (or a more specific inherited class as needed).
  e8dc2595
- CUDA: Using proper exceptions rather than throwing strings. · 6c39b19d
  Karl Rupp authored Jul 19, 2015
  
  Makes it much easier to handle errors/exceptions. Some more strings are thrown as exceptions at other locations, will fix them in a separate commit.
  6c39b19d
- Code quality: Removed warnings for -Wall -pedantic -Wextra -Wconversion · c7fb6d77
  Karl Rupp authored Jul 19, 2015
  
  Checked with Clang 3.0.
  c7fb6d77
- Mixed precision: Fixed flaws in OpenCL kernels. · 8ac81d63
  Karl Rupp authored Jul 19, 2015
  
  8ac81d63
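
The motivation given in the two exception-related commits of Jul 19 (e8dc2595 and 6c39b19d) can be illustrated with a minimal, self-contained sketch. The class `backend_error` and all function names below are hypothetical and not taken from ViennaCL. The point is that a thrown string literal bypasses a `std::exception` handler and needs its own `catch` clause.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical exception type, derived from the standard hierarchy.
class backend_error : public std::runtime_error
{
public:
  explicit backend_error(std::string const & msg) : std::runtime_error(msg) {}
};

inline void throws_string()    { throw "device not found"; }                // old style
inline void throws_exception() { throw backend_error("device not found"); } // new style
inline void throws_nothing()   {}

// Runs f and reports which handler caught the error.
inline std::string classify(void (*f)())
{
  try
  {
    f();
    return "no error";
  }
  catch (std::exception const & e)  // catches backend_error and any other std type
  {
    return std::string("std::exception: ") + e.what();
  }
  catch (char const * msg)          // only needed for thrown string literals
  {
    return std::string("string literal: ") + msg;
  }
}
```

Once every throw site uses a class derived from `std::exception`, the second handler becomes unnecessary.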
Jul 18, 2015

Mixed precision: Added conversion routines for vectors and matrices. · 9666d3a3

Karl Rupp authored Jul 18, 2015

Allows one to convert between {(u)int, (u)long, float, double} as needed.
Adds support for vectors and dense matrices (including proxies).
Support for viennacl::scalar<> already available via casts on host.
No support for sparse matrices for now, as no use case in sight.

Resolves #80.
Partially addresses #124: It is now easier to convert to the same types.

9666d3a3

inner_prod: Fixed regression in release mode for multiple products. · e3bbb42c

Karl Rupp authored Jul 18, 2015

Affected OpenCL backend in generator:
"Unsupported reduction operator : no neutral element known"

e3bbb42c

qr_method: Added guard in tests for checking double precision support. · 5afd16f8

Karl Rupp authored Jul 18, 2015

Otherwise test fails on OpenCL devices without double precision support.

5afd16f8

Jul 17, 2015
- qr_method: Extended interface to also accept viennacl::vector. · 44ecee28
  Karl Rupp authored Jul 17, 2015
  
  Based on pull request by cdeterman on GitHub. See discussion at #146. A similar approach could also be applied for the non-symmetric case, but that is not considered stable enough (complex?).
  44ecee28
- inner_prod: Fixed compilation problem with multiple inner products. · 0c57d3e1
  Karl Rupp authored Jul 17, 2015
  
  Problem was introduced with parent commit.
  0c57d3e1
- Check includes: All headers are again self-sufficient. · efa8ff30
  Karl Rupp authored Jul 17, 2015
  
  Includes updates to the checker-script in auxiliary-folder.
  efa8ff30
- qr_method: Fixed problems when including matrix.hpp · 3a2a9c7f
  Karl Rupp authored Jul 17, 2015
  
  Addresses remaining issues in #145.
  3a2a9c7f
- SSE: Removed unused implementations. · 3add3d12
  Karl Rupp authored Jul 17, 2015
  
  This code has not been used or tested in years. Time to clean up.
  3add3d12
- qr_method: Fixed compilation problems for double. · 2ad0dfbd
  Karl Rupp authored Jul 17, 2015
  
  Also extended test suite such that this problem cannot show up again. The cause was one declaration where 'float' was accidentally hard-coded. Resolves #145.
  2ad0dfbd
Jul 16, 2015

MIC: Enhanced kernel parameters for BLAS levels 1, 2, 3. · d2ef9b25

Karl Rupp authored Jul 16, 2015

Parameters not obtained from a full-fledged optimizer run,
but from careful manual tweaking. Obtained memory bandwidths
for BLAS levels 1 and 2 are about 70 GB/sec, which is okay given
that vector operations on the Xeon Phi are slow with OpenCL.
BLAS level 3 improves very mildly, peaks at about 40 GFLOP/sec.

Given that OpenCL for Xeon Phi (KNC) has limited use and that
everybody is eager for KNL, further tuning efforts are suspended.

Resolves #26.

d2ef9b25

Jul 15, 2015
- Timer: Removed deprecated examples/benchmarks/benchmark-utils.hpp · 45901648
  Karl Rupp authored Jul 15, 2015
  
  Use viennacl/tools/timer.hpp instead.
  45901648
- Random: Removed Random.hpp in tutorials/ and tests/ folders. · 7e060bae
  Karl Rupp authored Jul 15, 2015
  
  New location: viennacl/tools/random.hpp
  7e060bae
- Merge branch 'karlrupp/feature-improve-lanczos' · 0818dae5
  Karl Rupp authored Jul 15, 2015
  
  karlrupp/feature-improve-lanczos: Extends interface such that also eigenvectors are computed and returned. Removes all uBLAS dependencies (caused problems with some CUDA/Boost combinations). Improves performance for partial reorthogonalization.
  0818dae5
- Lanczos: Improved implementation of partial reorthogonalization, eigenvectors. · 8eabb354
  Karl Rupp authored Jul 15, 2015
  
  Removed a couple of unnecessary host-device copies, removed unused counters. Partial reorthogonalization now also computes eigenvectors if specified.
  8eabb354
- Merge pull request #144 from w2z43t5/master · ddb3ecdf
  Karl Rupp authored Jul 15, 2015
  
  BLAS2 benchmark: Corrected calculation of bandwidth
  ddb3ecdf
- BLAS2 benchmark: Corrected calculation of bandwidth · 1ecfbc8a
  w2z43t5 authored Jul 15, 2015
  
  Changed "BLAS3_M" and "BLAS3_N" to "BLAS2_M" and "BLAS2_N", respectively.
  1ecfbc8a
Jul 14, 2015

Iterative: Added convenience overload for STL types. · 44878475

Karl Rupp authored Jul 14, 2015

A library user can now call CG, BiCGStab, and GMRES directly
with STL types (vector<map<T, U> >, vector<U>).
Internally calculations are forwarded to the corresponding ViennaCL types.
Requires the user to include compressed_matrix.hpp and vector.hpp before
the respective solver implementation. This is not great, but the
best we can reasonably provide at this point.

Resolves #12.

44878475
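
As a rough illustration of what forwarding STL types to a compressed format in 44878475 might involve, the sketch below converts a `vector<map<T, U> >` into CSR arrays. The struct `csr_sketch` and the function `to_csr` are hypothetical stand-ins; the actual conversion inside ViennaCL may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical CSR container used only for this illustration.
struct csr_sketch
{
  std::vector<std::size_t> row_ptr;  // (rows + 1) offsets into col_idx/values
  std::vector<std::size_t> col_idx;  // column index of each nonzero
  std::vector<double>      values;   // value of each nonzero
};

// Converts a row-wise map representation into CSR.
// std::map keeps keys sorted, so column indices come out in ascending order.
inline csr_sketch to_csr(std::vector<std::map<std::size_t, double> > const & A)
{
  csr_sketch out;
  out.row_ptr.reserve(A.size() + 1);
  out.row_ptr.push_back(0);

  for (std::size_t i = 0; i < A.size(); ++i)
  {
    typedef std::map<std::size_t, double>::const_iterator iter_t;
    for (iter_t it = A[i].begin(); it != A[i].end(); ++it)
    {
      out.col_idx.push_back(it->first);
      out.values.push_back(it->second);
    }
    out.row_ptr.push_back(out.col_idx.size());
  }
  return out;
}
```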

size()/size1(): Fixed declaration order. · 2a3a0606

Karl Rupp authored Jul 14, 2015

size() now makes use of size1(), but was declared before size1().
Inverting the order fixes compilation problems with GCC 4.8 and Clang 3.3.

2a3a0606
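
One way such an ordering problem can arise is with function templates: an unqualified call inside a template is resolved against the declarations visible at the point of definition, plus argument-dependent lookup. The sketch below assumes that situation; the namespace `sketch` and both functions are hypothetical, and whether this matches the exact code touched by 2a3a0606 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{
  // size1() is declared first, so it is visible when size() is defined.
  template<typename T>
  std::size_t size1(T const & obj) { return obj.size(); }

  // If this template appeared before size1(), the call below would fail for
  // T = std::vector<double> on conforming compilers: ordinary lookup would
  // find nothing yet, and argument-dependent lookup only searches namespace
  // std, which has no size1().
  template<typename T>
  std::size_t size(T const & obj) { return size1(obj); }
}
```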

Jul 13, 2015

Lanczos: Removed Boost dependency by replacing random number generator. · 16a44bdf

Karl Rupp authored Jul 13, 2015

Simple random number generators now available in viennacl::tools:
 - uniform in closed interval [0, 1]
 - normally distributed
Both implementations based on rand(), hence not thread-safe.
However, they suffice for the time being.

16a44bdf
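
A minimal sketch of two rand()-based generators of the kind described in 16a44bdf is given below. The function names are hypothetical, and the use of the Box-Muller transform for the normal distribution is an assumption; the commit message does not state which method viennacl/tools/random.hpp uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Uniform sample in the closed interval [0, 1].
// Built on std::rand(), which is not guaranteed to be thread-safe.
inline double uniform_closed01()
{
  return static_cast<double>(std::rand()) / static_cast<double>(RAND_MAX);
}

// Normally distributed sample via the Box-Muller transform (assumed method).
inline double normal_sample(double mean = 0.0, double sigma = 1.0)
{
  assert(sigma >= 0.0);
  // u1 lies in (0, 1] so that log(u1) stays finite.
  double u1 = (static_cast<double>(std::rand()) + 1.0)
            / (static_cast<double>(RAND_MAX) + 1.0);
  double u2 = uniform_closed01();
  double const two_pi = 6.283185307179586;
  return mean + sigma * std::sqrt(-2.0 * std::log(u1)) * std::cos(two_pi * u2);
}
```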

Lanczos: Improved structure of tutorial. · 922432e0

Karl Rupp authored Jul 13, 2015

Now demonstrating how to compute only eigenvalues and how to also
include eigenvectors.

922432e0

Tutorials: Added matrix-free use of iterative solvers. · bb970f32

Karl Rupp authored Jul 13, 2015

Example works for all three backends.
Required a couple of internal reorganizations of prod(), handle(), and size().
Interface requirement for user-provided operators:
 - member function apply(x, y); to compute y = A * x;
 - member function size1(); to return the length of y.
The second interface requirement is technical, but will be needed
for matrix-free applications of rectangular matrices later.

Resolves #74.

bb970f32
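
The interface requirement from bb970f32 can be sketched without any ViennaCL headers. In the example below, `std::vector<double>` stands in for the library's vector type, and `laplace_1d`, `dot`, and `cg_sketch` are hypothetical names. The solver is a plain textbook conjugate gradient loop, not the library's pipelined implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical user operator: 1D Laplace stencil (2 on the diagonal,
// -1 on the off-diagonals). The matrix is never stored.
struct laplace_1d
{
  std::size_t n;
  explicit laplace_1d(std::size_t n_) : n(n_) {}

  // Computes y = A * x.
  void apply(std::vector<double> const & x, std::vector<double> & y) const
  {
    assert(x.size() == n && y.size() == n);
    for (std::size_t i = 0; i < n; ++i)
    {
      double v = 2.0 * x[i];
      if (i > 0)     v -= x[i - 1];
      if (i + 1 < n) v -= x[i + 1];
      y[i] = v;
    }
  }

  // Returns the length of y.
  std::size_t size1() const { return n; }
};

inline double dot(std::vector<double> const & a, std::vector<double> const & b)
{
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Textbook CG that only relies on apply() and size1() of the operator.
template<typename OperatorT>
std::vector<double> cg_sketch(OperatorT const & A, std::vector<double> const & rhs,
                              double tol = 1e-10, std::size_t max_iters = 1000)
{
  std::vector<double> x(rhs.size(), 0.0), r(rhs), p(rhs), Ap(A.size1());
  double rr = dot(r, r);
  for (std::size_t it = 0; it < max_iters && std::sqrt(rr) > tol; ++it)
  {
    A.apply(p, Ap);
    double alpha = rr / dot(p, Ap);
    for (std::size_t i = 0; i < x.size(); ++i)
    {
      x[i] += alpha * p[i];
      r[i] -= alpha * Ap[i];
    }
    double rr_new = dot(r, r);
    double beta = rr_new / rr;
    for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
    rr = rr_new;
  }
  return x;
}
```

Because the solver touches the operator only through `apply()` and `size1()`, any type providing these two members can be passed in.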

BiCGStab, GMRES: Rearranged residual calculation to remove temporaries. · c6597025

Karl Rupp authored Jul 13, 2015

The use of y -= prod(A, x) is troublesome for matrix-free applications,
hence the residual calculation
 residual = rhs;
 residual -= prod(A, current_guess);
was modified to
 residual = prod(A, current_guess);
 residual = rhs - residual;
to circumvent the problem.
As an extra benefit, this also improves performance since our current
implementation of y -= prod(A, x) for sparse matrices A relies
on a temporary for prod(A, x).

c6597025
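
The rearranged residual computation from c6597025 can be sketched for a matrix-free operator as follows. Here `std::vector<double>` replaces the library's vector type, and `diagonal_scaling` and `compute_residual` are hypothetical names introduced for illustration. The operator writes A * x directly into the residual vector, so no separate temporary for the product is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical matrix-free operator: A = factor * I.
struct diagonal_scaling
{
  double factor;
  std::size_t n;
  diagonal_scaling(double f, std::size_t n_) : factor(f), n(n_) {}

  // Computes y = A * x.
  void apply(std::vector<double> const & x, std::vector<double> & y) const
  {
    for (std::size_t i = 0; i < n; ++i) y[i] = factor * x[i];
  }

  std::size_t size1() const { return n; }
};

// residual = prod(A, current_guess); residual = rhs - residual;
template<typename OperatorT>
std::vector<double> compute_residual(OperatorT const & A,
                                     std::vector<double> const & rhs,
                                     std::vector<double> const & current_guess)
{
  assert(rhs.size() == A.size1());
  std::vector<double> residual(A.size1());
  A.apply(current_guess, residual);            // residual = A * current_guess
  for (std::size_t i = 0; i < residual.size(); ++i)
    residual[i] = rhs[i] - residual[i];        // residual = rhs - residual
  return residual;
}
```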

Jul 10, 2015

Merge branch 'karlrupp/fix-and-improve-ilu' · b8d58253