Commits · 12c5ed2cedf64f21465c066e486b8cf0417caaa3 · Kaushik Kulkarni / viennacl-dev

Dec 10, 2014
- Updated version to 1.6.2 · 12c5ed2c
  Karl Rupp authored Dec 10, 2014
  
  12c5ed2c
- Device database: Removed unused member in template class. · 54450e8b
  Karl Rupp authored Dec 10, 2014
```
Resulted in some complaints of -fsanitize=undefined
```
  54450e8b
- compressed_compressed_matrix: Fixed wrong buffer size in clear() · 384185b0
  Karl Rupp authored Dec 10, 2014
```
Thanks to AddressSanitizer in GCC 4.9
```
  384185b0
- GCC: Fixed all conversion warnings. · 3881ea8b
  Karl Rupp authored Dec 10, 2014
```
-Wall -Wextra -pedantic -Wconversion.
A tricky detail is that the product of two chars or shorts gets
promoted to 'int', so some extra logic was required.
```
  3881ea8b
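
The promotion detail mentioned in the commit above can be illustrated with a minimal sketch. It assumes a C++11 compiler for `static_assert` and `decltype`; the helper `narrow_product` and the function `promotion_demo` are hypothetical names used only for illustration and are not taken from ViennaCL.

```cpp
#include <cassert>
#include <type_traits>

// Integral promotion: operands of type 'short' or 'char' are promoted to 'int'
// before the multiplication, so the product has type 'int'.
static_assert(std::is_same<decltype(short(1) * short(1)), int>::value,
              "short * short is computed as int");
static_assert(std::is_same<decltype(char(1) * char(1)), int>::value,
              "char * char is computed as int");

// Hypothetical helper: multiply in the promoted type and convert back
// explicitly, so the narrowing is visible and -Wconversion has nothing to report.
template<typename T>
T narrow_product(T a, T b)
{
  return static_cast<T>(a * b);
}

// Small usage example.
inline void promotion_demo()
{
  short const a = 3;
  short const b = 4;
  short const c = narrow_product(a, b);  // 'short c = a * b;' would trigger -Wconversion
  assert(c == 12);
}
```

Writing `short c = a * b;` directly is what `-Wconversion` flags, because the `int` result is narrowed implicitly on assignment.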
Dec 09, 2014
- GCC, Clang: Fixed compiler warnings · 390c47c1
  Karl Rupp authored Dec 09, 2014
  
  390c47c1
- Solver Bench: Added sliced_ell_matrix and added pipelined runs. · ecb1088a
  Karl Rupp authored Dec 09, 2014
  
  ecb1088a
- FFT: Fixed flag passed to CUDA kernel. · c09831bb
  Karl Rupp authored Dec 09, 2014
```
Issue introduced here: 4ebc5c60
```
  c09831bb
- OpenMP: Fixed unspecified behavior for operations using reductions. · 71e46368
  Karl Rupp authored Dec 09, 2014
```
Resolves #112 (together with previous commit).
```
  71e46368
- OpenMP: Removed use of private and shared clauses. · 0a8e2999
  Karl Rupp authored Dec 09, 2014
```
This is in response to issue #112, which reports that the use
of OpenMP-'variables' derived from templates is unspecified/undefined.
Problems have been observed with the Fujitsu compiler.
```
  0a8e2999
- Iterative: Moved local array declaration out of CSR-adaptive kernel. · 40686dbf
  Karl Rupp authored Dec 09, 2014
```
Resulted in complaints if called from GMRES, therefore passing it in as a pointer.
Should not affect performance, but verification desired.
```
  40686dbf
- Iterative: Fixed overloads for pipelined iterative solvers. · b825e5d2
  Karl Rupp authored Dec 09, 2014
```
Without these, some user code may accidentally use the solvers
without pipelining, which is not what we want...
```
  b825e5d2
Dec 06, 2014
- Direct solve bench: Removed accidental uBLAS dependency. · 8682205a
  Karl Rupp authored Dec 06, 2014
  
  8682205a
- CUDA: Fixing uses of 'uint' and performance warnings. · 4ebc5c60
  Karl Rupp authored Dec 06, 2014
  
  4ebc5c60
- CUDA: Fixed complaints about destructor in Visual Studio. · 0e2a3758
  Karl Rupp authored Dec 06, 2014
```
Although the original code is perfectly legal, we have to introduce
some preprocessor magic to solve this issue.
```
  0e2a3758
Dec 05, 2014

vector_iterator: Fixed internal handling of smart-pointer. · 55b05254

Karl Rupp authored Dec 05, 2014

Used to be a reference, for which initialization with NULL doesn't work.
Now using copies, which is fine due to smart-pointer semantics.

55b05254
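
The difference between holding a reference and holding a copy of a smart pointer can be sketched as follows. This is an illustration under stated assumptions, not ViennaCL's `vector_iterator`: `std::shared_ptr` stands in for the library's own handle type, and `buffer_handle`, `buffer_iterator`, and `iterator_demo` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for a reference-counted memory handle (assumption for this sketch).
typedef std::shared_ptr< std::vector<double> > buffer_handle;

// The iterator stores a *copy* of the handle. A reference member
// (buffer_handle & handle_) would have to bind to an existing object and
// could not be initialized from NULL, so no "empty" iterator could exist.
class buffer_iterator
{
public:
  buffer_iterator() : handle_(), index_(0) {}  // empty handle is a valid state
  buffer_iterator(buffer_handle const & handle, std::size_t index)
    : handle_(handle), index_(index) {}

  bool valid() const { return static_cast<bool>(handle_); }
  double & operator*() const { return (*handle_)[index_]; }
  buffer_iterator & operator++() { ++index_; return *this; }
  long owners() const { return handle_.use_count(); }

private:
  buffer_handle handle_;  // copy: shares ownership with the original handle
  std::size_t   index_;
};

// Small usage example.
inline void iterator_demo()
{
  buffer_iterator empty;  // possible only because the member is not a reference
  assert(!empty.valid());

  buffer_handle buf = std::make_shared< std::vector<double> >(3, 0.0);
  buffer_iterator it(buf, 0);
  ++it;
  *it = 4.5;              // writes through to the shared buffer
  assert((*buf)[1] == 4.5);
}
```

Copying the handle only increments a reference count, which is why the copy is acceptable here.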

Dec 04, 2014

GMRES: Reverted to previous kernel in first pipelined stage for non-NVIDIA GPUs. · a4b7354a
Karl Rupp authored Dec 04, 2014
```
This reverts commit 4381e000 for non-NVIDIA devices, providing better performance portability.
```
a4b7354a

GMRES: Improved kernel for first stage of pipelined orthogonalization. · 4381e000

Karl Rupp authored Dec 04, 2014

Use of thread-local variables is substantially slower than using
shared memory directly in this case. 2x difference on a Tesla C2050
for this particular kernel. Overall performance gains depend on sparsity
pattern of the matrix (as always).

4381e000

compressed_matrix: Fixed missing context switch for CSR-adaptive metainfo. · 9d8bae24
Karl Rupp authored Dec 04, 2014
```
This has been a problem if one wanted to use compressed_matrix outside the
default memory domain.
```
9d8bae24
Pipelined solvers: Added better parameters for NVIDIA GPUs. · b6758fb9
Karl Rupp authored Dec 04, 2014
```
Results in mild (about 10 percent) performance gains.
```
b6758fb9

sliced_ell_matrix: Setting defaults for NVIDIA GPUs to 256. · acb1ca0c

Karl Rupp authored Dec 04, 2014

Provides about 10 percent better performance on average for a
mix of typical matrices from the Florida Sparse Matrix Collection.

acb1ca0c

Nov 20, 2014
- Doxygen: Added symbolic link to changelog, did not work with 1.8.8. · 7a0f5794
  Karl Rupp authored Nov 20, 2014
  
  7a0f5794
- Doxygen: Now taking version number directly from CMakeLists.txt · f6856df3
  Karl Rupp authored Nov 20, 2014
  
  f6856df3
- CUDA: Added CSR-adaptive to pipelined iterative solvers. · e0d55f9e
  Karl Rupp authored Nov 20, 2014
  
  e0d55f9e
- OpenCL: Added CSR-adaptive for pipelined iterative solvers. · 8678f02f
  Karl Rupp authored Nov 20, 2014
  
  8678f02f
- CUDA: Cleanup of CSR-adaptive implementation, adjustment of block sizes. · bac9d4ab
  Karl Rupp authored Nov 20, 2014
```
Based on experiments on a GTX 470. Kepler and Maxwell GPUs might
behave differently.
```
  bac9d4ab
- Tests: Fixed incorrect test code in libviennacl-blas1. · 9eed5e20
  Karl Rupp authored Nov 20, 2014
  
  9eed5e20
- Changelog: Fixed incorrect co-author name of CSR-adaptive paper. · 61792c08
  Karl Rupp authored Nov 20, 2014
  
  61792c08
- Changelog: Added notes for 1.6.1 release. · bdd0ddd9
  Karl Rupp authored Nov 20, 2014
  
  bdd0ddd9
- Doxygen: Fixed warnings. · 078328ff
  Karl Rupp authored Nov 20, 2014
  
  078328ff
- Updated version to 1.6.1. · 5bf0373b
  Karl Rupp authored Nov 20, 2014
  
  5bf0373b
- Visual Studio 2012: Fixed performance warnings and a test compilation error. · 1f51ee98
  Karl Rupp authored Nov 20, 2014
```
Warnings were due to conversion of floats to bools.
```
  1f51ee98
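
The float-to-bool conversions mentioned in the Visual Studio 2012 commit above can be avoided with an explicit comparison. The following is a minimal sketch of that general pattern, not the actual change made in the commit; `is_nonzero` and `bool_conversion_demo` are hypothetical names.

```cpp
#include <cassert>

// An implicit conversion such as 'bool flag = value;' with a floating-point
// 'value' is the kind of code that draws the warning. Comparing against zero
// states the intent explicitly and yields a 'bool' directly.
template<typename NumericT>
bool is_nonzero(NumericT value)
{
  return value != NumericT(0);
}

// Small usage example.
inline void bool_conversion_demo()
{
  float const alpha = 0.5f;
  bool const scale_needed = is_nonzero(alpha);  // instead of: bool scale_needed = alpha;
  assert(scale_needed);
}
```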
Nov 19, 2014
- Direct solve: Fixed errors obtained after resolution of self-assignment problems. · df29d5f3
  Karl Rupp authored Nov 19, 2014
  
  df29d5f3
- CUDA: Fixed compilation error in triangular solve kernels. · f321e151
  Karl Rupp authored Nov 19, 2014
```
Introduced by 0ba719f3.
```
  f321e151
- compressed_matrix: Implemented CSR-adaptive in CUDA and OpenCL. · 7d212433
  Karl Rupp authored Nov 19, 2014
```
See paper "Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Storage Format"
by Greathouse and Daga, presented at SC 2014.
```
  7d212433
- Self-assignment: Added test and fixed all bugs found. · 403b7c87
  Karl Rupp authored Nov 19, 2014
```
Problems were only found in the matrix-times-matrix case.
Resolves #2.
```
  403b7c87
- (sliced_)ell_matrix, hyb_matrix: Added overload for STL-emulated sparse matrix. · e98a3685
  Karl Rupp authored Nov 19, 2014
```
Now a user can directly provide std::vector< std::map<IndexType, NumericType> >
to populate the sparse matrix. This was already possible for the other sparse matrix
types.
```
  e98a3685
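
The STL-emulated sparse matrix mentioned in the last commit above stores one `std::map` per row, mapping column index to value. The sketch below builds such a container and flattens it into plain CSR arrays to show what the layout carries. It assumes `unsigned int` indices and `double` values; `csr_arrays`, `to_csr`, and `stl_matrix_demo` are hypothetical names, and the ViennaCL call that consumes the container is deliberately not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// One map per row: column index -> value. std::map keeps columns sorted.
typedef std::vector< std::map<unsigned int, double> > stl_sparse_matrix;

// Plain CSR arrays; a stand-in for a library sparse matrix type.
struct csr_arrays
{
  std::vector<unsigned int> row_start;  // size: rows + 1
  std::vector<unsigned int> col_index;  // size: number of nonzeros
  std::vector<double>       values;     // size: number of nonzeros
};

// Flatten the row-wise maps into CSR by visiting rows in order.
inline csr_arrays to_csr(stl_sparse_matrix const & A)
{
  csr_arrays result;
  result.row_start.push_back(0);
  for (std::size_t row = 0; row < A.size(); ++row)
  {
    for (std::map<unsigned int, double>::const_iterator it = A[row].begin();
         it != A[row].end(); ++it)
    {
      result.col_index.push_back(it->first);
      result.values.push_back(it->second);
    }
    // Entry row+1 marks where the next row begins.
    result.row_start.push_back(static_cast<unsigned int>(result.values.size()));
  }
  return result;
}

// Small usage example: a 3x3 matrix with an empty second row.
inline void stl_matrix_demo()
{
  stl_sparse_matrix A(3);
  A[0][0] =  2.0;
  A[0][2] = -1.0;
  A[2][1] =  4.0;

  csr_arrays csr = to_csr(A);
  assert(csr.values.size() == 3);
  assert(csr.row_start[1] == 2 && csr.row_start[2] == 2);  // second row is empty
}
```

The container is convenient for assembling a matrix entry by entry, since `A[row][col] = value` inserts in any order and the map sorts the columns.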
Nov 17, 2014

inplace_solve, dense: Simplified code and improved performance. · 0ba719f3

Karl Rupp authored Nov 17, 2014

Resolves #6. A little bit of input-dependent tuning is certainly possible,
yet the overall control flow is now fixed. Performance largely dependent
on the performance of matrix-vector and matrix-matrix products, respectively.

0ba719f3

Nov 16, 2014
- Merge pull request #110 from d-meiser/disable-coveralls · 8c848791
  Karl Rupp authored Nov 16, 2014
```
Disable coveralls. Takes too long to run and hence fails.
```
  8c848791
- Cleanup: Removed unused source files. · 45058f42
  Karl Rupp authored Nov 16, 2014
```
Old generator tests and OpenCL random number generation. Both superseded.
```
  45058f42
- Disable coveralls. · 49a6a336
  Dominic Meiser authored Nov 16, 2014
```
lcov takes much too long making the travis builds time out and fail.
```
  49a6a336