- Dec 11, 2014

Karl Rupp authored

Karl Rupp authored
Visual Studio 2012 is an excellent test case.

Karl Rupp authored

Karl Rupp authored
The old workgroup size of 128 did not work on low-end hardware.

Karl Rupp authored
Writing the last value wasn't handled correctly. Now it is (cf. cuda-memcheck).

Karl Rupp authored
The previous value of 256 might be too large for some weaker hardware. Since coordinate_matrix provides only poor performance anyway, we are better off using a more portable value here without losing anything.

Karl Rupp authored
Empty columns at the end of restriction or prolongation operators might have resulted in an incorrect detection of the matrix dimensions. With the sparse matrix adapter now using an explicit size specification, this is no longer an issue.
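The underlying pitfall can be illustrated without any ViennaCL types: if the number of columns is deduced from the largest column index that actually occurs, trailing empty columns are silently dropped, whereas an explicitly supplied size is not affected. A minimal C++ sketch, using plain STL containers rather than the actual adapter class:

```cpp
#include <cstddef>
#include <map>
#include <vector>

// One row of a sparse operator: column index -> value (plain STL, illustrative only).
typedef std::vector<std::map<std::size_t, double> > sparse_rows;

// Deduces the column count from the stored entries alone; this is the fragile approach.
std::size_t deduced_cols(sparse_rows const & A)
{
  std::size_t cols = 0;
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::map<std::size_t, double>::const_iterator it = A[i].begin(); it != A[i].end(); ++it)
      if (it->first + 1 > cols)
        cols = it->first + 1;
  return cols;   // too small if the last column(s) hold no entries
}

int main()
{
  sparse_rows P(2);             // conceptually a 2-by-4 prolongation operator
  P[0][0] = 1.0;
  P[1][2] = 1.0;                // column 3 stays empty
  std::size_t explicit_cols = 4;
  return (deduced_cols(P) == explicit_cols) ? 0 : 1;   // returns 1: deduction yields only 3 columns
}
```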
- Dec 10, 2014

Karl Rupp authored

Karl Rupp authored
Resulted in some complaints from -fsanitize=undefined.

Karl Rupp authored
Thanks to AddressSanitizer in GCC 4.9.

Karl Rupp authored
-Wall -Wextra -pedantic -Wconversion: a tricky detail is that the product of two chars or shorts gets promoted to 'int', so some extra logic was required.
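The promotion detail is easy to reproduce in isolation. A minimal, generic C++ example (not the actual library code): multiplying two shorts yields an int, so assigning the product back to a short needs an explicit cast to satisfy -Wconversion.

```cpp
#include <iostream>

// Integer promotion: operands of type short (or char) are promoted to int
// before the multiplication, so the product has type int.
short product(short a, short b)
{
  // return a * b;                   // -Wconversion: conversion from 'int' to 'short' may change value
  return static_cast<short>(a * b);  // explicit cast documents the narrowing and silences the warning
}

int main()
{
  std::cout << product(12, 34) << std::endl;  // prints 408
  return 0;
}
```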
- Dec 09, 2014

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored
Resolves #112 (together with the previous commit).

Karl Rupp authored
This is in response to issue #112, which reports that the use of OpenMP variables derived from templates is unspecified/undefined. Problems have been observed with the Fujitsu compiler.
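A rough sketch of the kind of construct at issue (illustrative only, not the ViennaCL source; the helper below is hypothetical): a parallel loop whose counter type is derived from the template context is replaced by a loop over a plain built-in integer type, which every OpenMP implementation handles.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: scales a vector in parallel. The commented-out variant
// uses a loop counter whose type depends on the template context, which some
// OpenMP implementations (e.g. the Fujitsu compiler, per issue #112) mishandle.
template<typename NumericT>
void scale(std::vector<NumericT> & x, NumericT alpha)
{
  // for (typename std::vector<NumericT>::size_type i = 0; i < x.size(); ++i)  // problematic counter type
  #pragma omp parallel for
  for (long i = 0; i < static_cast<long>(x.size()); ++i)  // portable built-in counter type
    x[static_cast<std::size_t>(i)] *= alpha;
}
```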
Karl Rupp authored
Resulted in complaints when called from GMRES, so it is now passed in as a pointer. This should not affect performance, but verification is desired.

Karl Rupp authored
Without these, some user code may accidentally use the solvers without pipelining, which is not what we want...

- Dec 06, 2014

- Dec 05, 2014

Karl Rupp authored
Used to be a reference, for which initialization with NULL doesn't work. Now using copies, which is fine due to smart-pointer semantics.
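A small self-contained sketch of the difference (the class and member names are made up, and std::shared_ptr stands in for the library's smart pointer): a reference member must be bound to an existing object at construction and cannot be 'NULL', while a copied smart-pointer handle can start out empty and be assigned later, with copies only bumping a reference count.

```cpp
#include <memory>

// Illustrative type only, not the ViennaCL class.
struct solver_setup
{
  // std::shared_ptr<int> & handle;  // reference member: must be bound at construction, cannot be null
  std::shared_ptr<int> handle;       // copied handle: default-constructs to empty, cheap to copy
};

int main()
{
  solver_setup s;                         // fine: s.handle is simply empty for now
  s.handle = std::make_shared<int>(42);   // assigned later, when actually needed
  return s.handle ? 0 : 1;
}
```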
- Dec 04, 2014

Karl Rupp authored
Use of thread-local variables is substantially slower than using shared memory directly in this case: a 2x difference on a Tesla C2050 for this particular kernel. Overall performance gains depend on the sparsity pattern of the matrix (as always).
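The trade-off can be sketched with a toy CUDA kernel (illustrative only, not the actual ViennaCL kernel; the kernel name and data layout are made up, and a launch with 128 threads per block is assumed): each block sums the entries of a row, either by accumulating in a thread-local register and writing to shared memory once, or by accumulating in shared memory directly. Per the observation above, the direct shared-memory variant was the faster one here.

```cuda
// Toy CSR-style row-sum kernel; assumes blockDim.x == 128 (a power of two).
__global__ void row_sums(const double * values,
                         const unsigned int * row_starts,
                         double * result,
                         unsigned int num_rows)
{
  __shared__ double buffer[128];

  for (unsigned int row = blockIdx.x; row < num_rows; row += gridDim.x)
  {
    // Variant A (thread-local accumulator, one shared-memory write at the end):
    //   double tmp = 0;
    //   for (...) tmp += values[i];
    //   buffer[threadIdx.x] = tmp;
    //
    // Variant B (accumulate in shared memory directly):
    buffer[threadIdx.x] = 0;
    for (unsigned int i = row_starts[row] + threadIdx.x; i < row_starts[row + 1]; i += blockDim.x)
      buffer[threadIdx.x] += values[i];
    __syncthreads();

    // Standard in-block tree reduction of the partial sums:
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride /= 2)
    {
      if (threadIdx.x < stride)
        buffer[threadIdx.x] += buffer[threadIdx.x + stride];
      __syncthreads();
    }

    if (threadIdx.x == 0)
      result[row] = buffer[0];
  }
}
```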
Karl Rupp authored
This has been a problem for anyone wanting to use compressed_matrix outside the default memory domain.

Karl Rupp authored
Results in mild (about 10 percent) performance gains.

Karl Rupp authored
Provides about 10 percent better performance on average for a mix of typical matrices from the Florida Sparse Matrix Collection.
- Nov 20, 2014

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored
Based on experiments on a GTX 470. Kepler and Maxwell GPUs might behave differently.

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored

Karl Rupp authored
Warnings were due to conversion of floats to bools.
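A tiny, generic example of this warning category (not taken from the library): returning a float where a bool is expected converts the value implicitly, which some compilers flag; an explicit comparison states the intent and avoids the warning.

```cpp
// Implicit float-to-bool conversion: flagged by some compilers at strict warning levels.
bool is_nonzero_implicit(float x) { return x; }

// Explicit comparison: same behavior, no conversion warning.
bool is_nonzero_explicit(float x) { return x != 0.0f; }
```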
- Nov 19, 2014

Karl Rupp authored