Commits · bdd0ddd9d55605f8fbe9e6e44d3be99ed1718074 · Kaushik Kulkarni / viennacl-dev

Nov 20, 2014
- Changelog: Added notes for 1.6.1 release. · bdd0ddd9
  Karl Rupp authored Nov 20, 2014
  
  bdd0ddd9
- Doxygen: Fixed warnings. · 078328ff
  Karl Rupp authored Nov 20, 2014
  
  078328ff
- Updated version to 1.6.1. · 5bf0373b
  Karl Rupp authored Nov 20, 2014
  
  5bf0373b
- Visual Studio 2012: Fixed performance warnings and a test compilation error. · 1f51ee98
  Karl Rupp authored Nov 20, 2014
```
Warnings were due to conversion of floats to bools.
```
  1f51ee98
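
The float-to-bool warnings mentioned in 1f51ee98 typically come from code that relies on an implicit conversion of a floating point value to `bool`. A minimal sketch of the usual remedy, an explicit comparison against zero, is given below. The helper names `is_nonzero` and `count_nonzeros` are hypothetical and not taken from the ViennaCL sources.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ViennaCL): an explicit comparison replaces
// the implicit floating-point-to-bool conversion that the compiler warns about.
template<typename NumericT>
bool is_nonzero(NumericT value)
{
  return value != NumericT(0);   // explicit comparison, result is already a bool
}

// Counts the nonzero entries of a vector using the explicit comparison above.
template<typename NumericT>
std::size_t count_nonzeros(std::vector<NumericT> const & v)
{
  std::size_t count = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
  {
    if (is_nonzero(v[i]))
      ++count;
  }
  return count;
}
```

Writing `if (is_nonzero(x))` instead of `if (x)` keeps the intent visible and avoids the implicit conversion altogether.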
Nov 19, 2014
- Direct solve: Fixed errors obtained after resolution of self-assignment problems. · df29d5f3
  Karl Rupp authored Nov 19, 2014
  
  df29d5f3
- CUDA: Fixed compilation error in triangular solve kernels. · f321e151
  Karl Rupp authored Nov 19, 2014
```
Introduced by 0ba719f3.
```
  f321e151
- compressed_matrix: Implemented CSR-adaptive in CUDA and OpenCL. · 7d212433
  Karl Rupp authored Nov 19, 2014
```
See paper "Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Storage Format"
by Greathouse and Daga, presented at SC 2014.
```
  7d212433
- Self-assignment: Added test and fixed all bugs found. · 403b7c87
  Karl Rupp authored Nov 19, 2014
```
Problems were only found in the matrix-times-matrix case.
Resolves #2.
```
  403b7c87
- (sliced_)ell_matrix, hyb_matrix: Added overload for STL-emulated sparse matrix. · e98a3685
  Karl Rupp authored Nov 19, 2014
```
Now a user can directly provide std::vector< std::map<IndexType, NumericType> >
to populate the sparse matrix. This was already possible for the other sparse matrix
types.
```
  e98a3685
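
The STL-emulated sparse matrix from e98a3685 stores one `std::map` per row, mapping column index to value. The sketch below only illustrates that layout by flattening it into CSR-style arrays; `csr_arrays` and `to_csr` are hypothetical names introduced here and are not ViennaCL's own conversion routines for `ell_matrix`, `sliced_ell_matrix`, or `hyb_matrix`.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical container for illustration only; not a ViennaCL type.
template<typename IndexT, typename NumericT>
struct csr_arrays
{
  std::vector<IndexT>   row_start;   // rows + 1 offsets into col_index/values
  std::vector<IndexT>   col_index;   // column index of each stored entry
  std::vector<NumericT> values;      // value of each stored entry
};

// Flattens a row-wise std::vector< std::map<column, value> > into CSR arrays.
template<typename IndexT, typename NumericT>
csr_arrays<IndexT, NumericT> to_csr(std::vector< std::map<IndexT, NumericT> > const & A)
{
  csr_arrays<IndexT, NumericT> csr;
  csr.row_start.push_back(IndexT(0));
  for (std::size_t row = 0; row < A.size(); ++row)
  {
    // std::map iterates in ascending key order, so columns come out sorted.
    for (typename std::map<IndexT, NumericT>::const_iterator it = A[row].begin();
         it != A[row].end(); ++it)
    {
      csr.col_index.push_back(it->first);
      csr.values.push_back(it->second);
    }
    csr.row_start.push_back(static_cast<IndexT>(csr.values.size()));
  }
  return csr;
}
```

Entries are set with `A[row][col] = value;`, so rows without entries simply stay empty maps and produce repeated offsets in `row_start`.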
Nov 17, 2014

- inplace_solve, dense: Simplified code and improved performance. · 0ba719f3
  Karl Rupp authored Nov 17, 2014

  Resolves #6. A little bit of input-dependent tuning is certainly possible,
  yet the overall control flow is now fixed. Performance now depends largely
  on the performance of the matrix-vector and matrix-matrix products, respectively.

  0ba719f3

Nov 16, 2014

- Merge pull request #110 from d-meiser/disable-coveralls · 8c848791
  Karl Rupp authored Nov 16, 2014
```
Disable coveralls. Takes too long to run and hence fails.
```
  8c848791
- Cleanup: Removed unused source files. · 45058f42
  Karl Rupp authored Nov 16, 2014
```
Old generator tests and OpenCL random number generation. Both superseded.
```
  45058f42
- Disable coveralls. · 49a6a336
  Dominic Meiser authored Nov 16, 2014
```
lcov takes much too long, making the Travis builds time out and fail.
```
  49a6a336

- compressed_matrix: Improved documentation. · 91ca8382
  Karl Rupp authored Nov 16, 2014

  Resolves #95. A completely type-safe interface to .set() is not possible,
  because OpenCL is not fully type-safe.

  91ca8382
- Tests: Disabled non-symmetric eigenvalue routines. · a249b804
  Karl Rupp authored Nov 16, 2014

  These routines need substantial refactoring anyway (very experimental)
  and are not even documented properly. Add them back to the test suite
  only after the refactoring is completed.

  a249b804
- compressed_matrix: Fixed invalid memory access for triangular solves. · 6087a180
  Karl Rupp authored Nov 16, 2014

  Transposed forward solves accessed invalid shared memory, potentially
  causing runtime failures on some GPUs. This is now fixed;
  thanks to oclgrind for locating the problem quickly.
  Addresses the failures Philippe mentioned in #105.

  6087a180

- Matrix: Fixed kernels for transposition in OpenCL and CUDA. · 1dea3deb
  Karl Rupp authored Nov 16, 2014
```
Definitely needs better testing; this was only caught in the triangular solvers.
```
  1dea3deb

Nov 15, 2014
- OpenCL: Another attempt to work around the problems with older SDKs on CPUs. · 4a68e0b2
  Karl Rupp authored Nov 15, 2014
```
Reset local work size to 1 as CPU default.
Moved a barrier outside an if-conditional in order to simplify the job for the compiler.
```
  4a68e0b2
- Triangular solve: Using local work size 128 on OpenCL for GPUs and CPUs. · 92997fda
  Karl Rupp authored Nov 15, 2014
```
Local work sizes of 1 or 2 for CPUs seem to cause problems with some SDKs.
```
  92997fda
Nov 14, 2014
- Tests: Reduced SVD test time. · f3c1bb02
  Karl Rupp authored Nov 14, 2014
```
Parts of the test consist of only taking timings.
No need to spend a lot of time on these for nightly tests;
they are better integrated into a separate benchmark suite.
```
  f3c1bb02
- SVD: Fixed bug introduced by cleaning warnings before 1.6.0 release. · 3d19f65a
  Karl Rupp authored Nov 14, 2014
  
  3d19f65a
- OpenCL: Changed default workgroup sizes for CPUs. · f0f8e7c9
  Karl Rupp authored Nov 14, 2014
```
Running with local work size 1 apparently leads to failures on
both AMD and Intel SDKs (maybe due to barriers being ignored?).
With a local work size of 2 everything is back to normal. Strange...
```
  f0f8e7c9
- Tutorials: Now using BOOST_UBLAS_NDEBUG instead of NDEBUG. · cb2a78c4
  Karl Rupp authored Nov 14, 2014
```
The slow uBLAS doesn't need to bring down all the good debugging features globally.
```
  cb2a78c4
- SPAI: Fixed wrong size of copy to device for dynamic update. · a7912dcd
  Karl Rupp authored Nov 14, 2014
```
Reported-by: Andreas Rost (IHU GmbH)
```
  a7912dcd
- Tests: Aborting SVD test after first failure. · 04911530
  Karl Rupp authored Nov 14, 2014
```
This test did not return an error code and was hence incorrectly flagged as successful.
What a shame! :-/
```
  04911530
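
The problem fixed in 04911530, a test that reports failures but still exits with code 0, can be avoided by letting the first failing check decide the process exit code. The following is a minimal sketch under that assumption; `check_fn` and `run_checks` are hypothetical names and do not reflect how the actual SVD test is organized.

```cpp
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <vector>

// Hypothetical signature for a single check: returns true on success.
typedef bool (*check_fn)();

// Runs the checks in order and stops at the first failure.
// The result is meant to be returned from main(), so that the test
// driver sees a nonzero exit code whenever a check fails.
inline int run_checks(std::vector<check_fn> const & checks)
{
  for (std::size_t i = 0; i < checks.size(); ++i)
  {
    if (!checks[i]())
    {
      std::cerr << "Check " << i << " failed, aborting." << std::endl;
      return EXIT_FAILURE;
    }
  }
  return EXIT_SUCCESS;
}
```

A test executable would then end with `return run_checks(checks);` in `main()`, so a failure can no longer be flagged as a success.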
Nov 13, 2014

- GMRES: Improved robustness of pipelined implementation. · 0ac1d107
  Karl Rupp authored Nov 13, 2014

  Changes:
   - Correctly handle an all-zero right-hand side.
   - The iteration count only reflects the true Krylov dimension used.
     If orthogonality is lost, the extra basis vectors are not counted.
   - Don't run into NaN in the residual norm estimator if orthogonality is lost.

  Reported-by: Andreas Rost (IHU GmbH)

  0ac1d107

- Tests: Removed use of uBLAS from floating point vector tests. · 86d20fd2
  Karl Rupp authored Nov 13, 2014

  Savings on my Ivy Bridge laptop are in the range of 15 percent (GCC)
  to 50 percent (Clang) in fully optimized mode.
  Memory consumption reduced by a similar amount.

  86d20fd2
- README: Fixed links to Travis CI and Coveralls. · b174e820
  Karl Rupp authored Nov 13, 2014
  
  b174e820
- Merge pull request #108 from d-meiser/add-travis-ci · fb7e1926
  Karl Rupp authored Nov 13, 2014
```
Add travis and coveralls support for continuous integration.
```
  fb7e1926
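
For the GMRES commit 0ac1d107 in this group, the all-zero right-hand side is a natural special case: x = 0 already solves A x = b, and any residual estimate relative to ||b|| = 0 would evaluate 0/0, which is NaN in IEEE arithmetic. The sketch below shows one way such a guard could look before the Krylov iteration starts. The function names are hypothetical, and the actual check in the pipelined implementation may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean norm of a vector.
inline double norm_2(std::vector<double> const & v)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < v.size(); ++i)
    sum += v[i] * v[i];
  return std::sqrt(sum);
}

// Hypothetical pre-check (not ViennaCL's code): if b is exactly zero,
// set x to zero and report that no iteration is needed.
inline bool handle_zero_rhs(std::vector<double> const & b, std::vector<double> & x)
{
  if (norm_2(b) > 0.0)
    return false;             // regular case: run the solver, x is left untouched
  x.assign(b.size(), 0.0);    // x = 0 solves A x = 0 exactly
  return true;
}
```

A solver wrapper would call `handle_zero_rhs(b, x)` first and return immediately when it yields `true`.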

Nov 10, 2014
- Devices-DB: Fixed initialization of the accelerator AXPY profile · 05b8040f
  Philippe Tillet authored Nov 10, 2014
  
  05b8040f
Nov 08, 2014
- CMake: Last fixes for src-release. 1.6.0 completed. · d1a7d82d
  Karl Rupp authored Nov 08, 2014
  
  d1a7d82d
- Visual Studio 2012: Fixed performance warnings (float/double to bool) · 861a221f
  Karl Rupp authored Nov 08, 2014
```
Delicate balance with warnings in Clang.
```
  861a221f
- CMake: Fixes to packaging system. · a8dec52d
  Karl Rupp authored Nov 08, 2014
```
Moved OpenCL-related code for Darwin to ViennaCLCommon.cmake to avoid
code duplication.
```
  a8dec52d
- Doxygen: Added documentation of sliced_ell_matrix · 7138f6a5
  Karl Rupp authored Nov 08, 2014
  
  7138f6a5
- Sources: Moved matrix_def.hpp and vector_def.hpp into detail/ folder. · 820f645c
  Karl Rupp authored Nov 08, 2014
```
These files are not meant to be included by the user -> hide them :-)
```
  820f645c
- Random: Removed unused code from viennacl/rand/ · 7f625da0
  Karl Rupp authored Nov 08, 2014
```
Was no longer in use.
```
  7f625da0
- Doxygen: Added Juraj Kabzan to list of contributors and refined wording. · 0f47481c
  Karl Rupp authored Nov 08, 2014
  
  0f47481c
- Changelog: Made code contributions for 1.6.0 more explicit. · bb6e2304
  Karl Rupp authored Nov 08, 2014
```
Andreas, Denis, and Juraj provided substantial amounts of new code
and should also receive appropriate credits. :-)
```
  bb6e2304
- Memory: Added overload of default_memory_type() to switch default memory domain. · 9da8e4e6
  Karl Rupp authored Nov 08, 2014
```
Makes it much easier to e.g. only use OpenCL even though CUDA is enabled.
Since this relies on a singleton, the mechanism is not thread-safe.
```
  9da8e4e6
- CUDA: Fixed incomplete refactoring from Doxygen-fixes two commits earlier. · 2ea16808
  Karl Rupp authored Nov 08, 2014
  
  2ea16808
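
The note in 9da8e4e6 about the singleton not being thread-safe can be illustrated with a small getter/setter overload pair. This is a sketch of the general pattern only: the enum values, the `detail` helper, and the initial default are assumptions made for illustration and are not copied from the ViennaCL sources.

```cpp
// Hypothetical memory domains, for illustration only.
enum memory_type { MAIN_MEMORY, OPENCL_MEMORY, CUDA_MEMORY };

namespace detail
{
  // Function-local static acting as the singleton that holds the current default.
  // The initial value CUDA_MEMORY is an arbitrary assumption of this sketch.
  inline memory_type & default_memory_type_storage()
  {
    static memory_type current = CUDA_MEMORY;
    return current;
  }
}

// Getter: returns the current default memory domain.
inline memory_type default_memory_type()
{
  return detail::default_memory_type_storage();
}

// Setter overload: switches the default memory domain and returns the new value.
// The write is unsynchronized, so concurrent calls from several threads race.
inline memory_type default_memory_type(memory_type new_type)
{
  detail::default_memory_type_storage() = new_type;
  return new_type;
}
```

Calling `default_memory_type(OPENCL_MEMORY);` once at program start, before any threads are spawned, sidesteps the race in this sketch.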