Commits · eff2ace32db9696fd73315bf8c915108469e0605 · Kaushik Kulkarni / viennacl-dev

Dec 02, 2012

* Updated 'Experimental'-status in source files. · eff2ace3

Karl Rupp authored Dec 01, 2012

* Added Doxygen comments to all namespaces.
* Updated comments on host-based implementations to clearly state (optional) OpenMP usage.

eff2ace3

Dec 01, 2012

Further polishing: · 94900385

Karl Rupp authored Dec 01, 2012

* Removed MSVC-switch in tutorials and benchmarks for reading files (users are now required to run them from build/ on all operating systems)
* Updated old Eigen-code to version 3.x
* Fixed a few more warnings in Visual Studio, added /wd4996 flag to get rid of VC iterator advertisements
* Fixed an overly strict assert() on vector-reductions with OpenCL, including a clean initialization of reduction vector
* Changed STL overload of norm_X from enable-if to plain overloading, otherwise MSVC has problems.

94900385

* Transition of converter from Boost.filesystem2 to Boost.filesystem3 · c1e00f54

Karl Rupp authored Dec 01, 2012

* Added finish() before copy() in tests in order to resolve issues with AMD APP SDK

c1e00f54

Release postponed by a day: · cde5bc1a

Karl Rupp authored Nov 30, 2012

* Fixed all warnings obtained in Visual Studio 2005 and 2010
* Reverted SFINAE in CTOR for vector to separate overloads for vector_range and vector_slice (does not work with VS 2005)
* Moved default-implementation for predicates to forwards.h, otherwise Visual Studio does not recognize forward declarations properly
* Removed unnecessary Boost.filesystem and Boost.system components check from dist-package
* Adjusted version number in Doxyfile and CMakeLists.txt

cde5bc1a

Nov 30, 2012
- * Removed a warning regarding uninitialized members in vector_operations.hpp · c776231b
  Karl Rupp authored Nov 30, 2012
  
  * Added least_squares and iterative to CUDA-examples
  * Fixed a minor flaw in viennacl-info
  c776231b
- * Eliminated almost all warnings with GCC at -Wextra. Only exception: SFINAE in vector-CTOR. · 60586fd8
  Karl Rupp authored Nov 30, 2012
  
  * viennacl-info now prints information for all available platforms.
  * User-provided OpenCL context is no longer freed at exit (inc() on handle after assignment).
  * Added Philippe's input to changelogs
  60586fd8
- * Fixed remaining doxygen warnings. · 1e7310c0
  Karl Rupp authored Nov 29, 2012
  
  * Reformatted SPAI implementation.
  1e7310c0
Nov 29, 2012

* Updated manual · a66b8b78

Karl Rupp authored Nov 29, 2012

* Fixed several doxygen warnings, still some more left

a66b8b78

* Updated license header in source files (adding Argonne) · 356107bc

Karl Rupp authored Nov 29, 2012

* Split manual into three parts: Core Functionality, Addon Functionality, Miscellaneous

356107bc

* Dense matrix-vector product now accepts matrix-ranges/slices and vector-ranges/slices as well · e233b999

Karl Rupp authored Nov 29, 2012

* Improved matrix-vector-test. Now checks all combinations of matrix/matrix-range/matrix-slice and vector/vector-range/vector-slice on rank-1-updates, matrix-vector products and triangular solves
* Removed redundant prod_impl(A, b)

e233b999

* Renamed viennacl::linalg::single_threaded to viennacl::linalg::host_based · 9cf4f1f2

Karl Rupp authored Nov 29, 2012

* Reduced execution time of sparse-test by speeding up the reference uBLAS calculations (double-transpose-trick)

9cf4f1f2

Nov 28, 2012

* Added least-squares example · 11b7e84f

Karl Rupp authored Nov 28, 2012

* Added inplace_qr_apply_trans_Q() to compute the right-hand side Q^T b of the least-squares system R x = Q^T b without setting up Q
* Fixed overloads for inplace_solve(A, b). More tests required, though.
* Unified use of viennacl::traits::clear(result) in CG and BiCGStab
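The idea behind computing Q^T b without setting up Q can be illustrated with a standalone sketch: an in-place Householder QR keeps each reflector below the diagonal, and those reflectors are then applied to b one after another. The `Dense` type and all function names below are hypothetical; this is not ViennaCL's inplace_qr_apply_trans_Q() or its storage layout.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major dense m x n matrix in a flat vector (hypothetical helper type).
struct Dense {
  std::size_t rows, cols;
  std::vector<double> a;
  double& operator()(std::size_t i, std::size_t j) { return a[i * cols + j]; }
};

// In-place Householder QR (rows >= cols). Afterwards R sits in the upper triangle and
// reflector v_k (with v_k[k] = 1 implied) sits below the diagonal; H_k = I - beta_k v_k v_k^T.
std::vector<double> householder_qr(Dense& A) {
  assert(A.rows >= A.cols);
  std::vector<double> betas(A.cols, 0.0);
  for (std::size_t k = 0; k < A.cols; ++k) {
    double norm = 0.0;
    for (std::size_t i = k; i < A.rows; ++i) norm += A(i, k) * A(i, k);
    norm = std::sqrt(norm);
    if (norm == 0.0) continue;                    // zero column: H_k = I
    double alpha = A(k, k) > 0 ? -norm : norm;    // sign choice avoids cancellation
    double v0 = A(k, k) - alpha;
    for (std::size_t i = k + 1; i < A.rows; ++i) A(i, k) /= v0;  // scale so v_k[k] = 1
    double vtv = 1.0;
    for (std::size_t i = k + 1; i < A.rows; ++i) vtv += A(i, k) * A(i, k);
    betas[k] = 2.0 / vtv;
    A(k, k) = alpha;
    for (std::size_t j = k + 1; j < A.cols; ++j) {  // apply H_k to trailing columns
      double s = A(k, j);
      for (std::size_t i = k + 1; i < A.rows; ++i) s += A(i, k) * A(i, j);
      s *= betas[k];
      A(k, j) -= s;
      for (std::size_t i = k + 1; i < A.rows; ++i) A(i, j) -= s * A(i, k);
    }
  }
  return betas;
}

// Overwrites b with Q^T b = H_{n-1} ... H_0 b using only the stored reflectors.
void apply_trans_Q(Dense& QR, const std::vector<double>& betas, std::vector<double>& b) {
  for (std::size_t k = 0; k < QR.cols; ++k) {
    double s = b[k];
    for (std::size_t i = k + 1; i < QR.rows; ++i) s += QR(i, k) * b[i];
    s *= betas[k];
    b[k] -= s;
    for (std::size_t i = k + 1; i < QR.rows; ++i) b[i] -= s * QR(i, k);
  }
}

// Solves min ||A x - b||_2 via R x = Q^T b (back substitution on R).
std::vector<double> least_squares(Dense A, std::vector<double> b) {
  std::vector<double> betas = householder_qr(A);
  apply_trans_Q(A, betas, b);
  std::vector<double> x(A.cols);
  for (std::size_t k = A.cols; k-- > 0;) {
    double s = b[k];
    for (std::size_t j = k + 1; j < A.cols; ++j) s -= A(k, j) * x[j];
    x[k] = s / A(k, k);
  }
  return x;
}
```

Fitting the line y = 1 + 2t through (0,1), (1,3), (2,5), (3,7) with this sketch recovers x = (1, 2); only n reflector applications touch b, and the m x m matrix Q is never stored.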

11b7e84f

* Fixed a minor bug in lu_factorize that showed up with the CUDA backend only · 8f971c47

Karl Rupp authored Nov 28, 2012

* Added missing include directives for lu.hpp in one example and one test

8f971c47

* Reimplementation of LU factorization in viennacl/linalg/lu.hpp. Better... · e8a6e5b3

Karl Rupp authored Nov 27, 2012

* Reimplementation of LU factorization in viennacl/linalg/lu.hpp. Better performance, but still a lot of unused potential.
* Replaced slow generic CUDA matrix-matrix multiplication kernel by several semi-automatically generated kernels. Performance still only half of OpenCL, although code is virtually identical.
* Fixed a bug with C = prod(A, B) if C is a matrix_range or matrix_slice. An unnecessary temporary was introduced.
* CUDA-benchmarks now build correctly

e8a6e5b3

Nov 21, 2012

* Reduced overhead for copying to/from ublas::compressed_matrix<> · 68ec5e72

Karl Rupp authored Nov 21, 2012

* Generalized sparse_matrix_adapter. Now all types std::vector< std::map<T, U> > are supported (T was fixed to 'unsigned int' previously)

68ec5e72
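Generalizing from a fixed 'unsigned int' to any index type T amounts to making the conversion a template in T. A minimal standalone sketch of converting std::vector< std::map<T, U> > into CSR arrays follows; the `CSR` struct and `to_csr()` are hypothetical and not ViennaCL's sparse_matrix_adapter.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Plain CSR storage (hypothetical helper type for this sketch).
template <typename NumericT>
struct CSR {
  std::vector<std::size_t> row_ptr;  // rows + 1 entries
  std::vector<std::size_t> col_idx;  // one entry per nonzero
  std::vector<NumericT>    values;   // one entry per nonzero
};

// Converts row-wise std::vector<std::map<IndexT, NumericT>> into CSR.
// IndexT is deduced, so unsigned int, long, std::size_t, ... all work.
template <typename IndexT, typename NumericT>
CSR<NumericT> to_csr(const std::vector<std::map<IndexT, NumericT>>& rows) {
  CSR<NumericT> m;
  m.row_ptr.reserve(rows.size() + 1);
  m.row_ptr.push_back(0);
  for (const auto& row : rows) {
    // std::map iterates in ascending key order, so column indices come out sorted.
    for (const auto& entry : row) {
      m.col_idx.push_back(static_cast<std::size_t>(entry.first));
      m.values.push_back(entry.second);
    }
    m.row_ptr.push_back(m.col_idx.size());
  }
  return m;
}
```

Empty rows simply repeat the previous row pointer, so a std::map with no entries maps to an empty CSR row.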

* Added CUDA examples/tutorials/tests to build system · ede1ed5c

Karl Rupp authored Nov 21, 2012

* Added level scheduling to ILUT, renamed routines from multifrontal_XYZ() to level_scheduling_XYZ()
* Fixed a couple of issues in block-ILU and improved performance. Now works well with CPU/OpenCL/CUDA, with CUDA favoring block sizes larger than the default of 8.

ede1ed5c
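Level scheduling, as added here to ILUT (and to ILU0 in 730b17be), groups the rows of a sparse triangular factor so that each row depends only on rows in earlier levels; all rows within one level can then be processed in parallel. A sequential sketch computing the levels of a lower-triangular CSR matrix follows; the function name is hypothetical and this is not ViennaCL's level_scheduling_XYZ() code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Level sets of a lower-triangular CSR matrix:
// level[i] = 1 + max(level[j]) over off-diagonal entries (i, j), j < i;
// rows without off-diagonal entries end up in level 0.
std::vector<std::vector<std::size_t>>
level_sets(const std::vector<std::size_t>& row_ptr,
           const std::vector<std::size_t>& col_idx) {
  const std::size_t n = row_ptr.size() - 1;
  std::vector<std::size_t> level(n, 0);
  std::size_t num_levels = 0;
  for (std::size_t i = 0; i < n; ++i) {       // level[j] for j < i is already final
    for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
      std::size_t j = col_idx[k];
      assert(j <= i);                         // lower-triangular input expected
      if (j < i) level[i] = std::max(level[i], level[j] + 1);
    }
    num_levels = std::max(num_levels, level[i] + 1);
  }
  std::vector<std::vector<std::size_t>> sets(num_levels);
  for (std::size_t i = 0; i < n; ++i) sets[level[i]].push_back(i);
  return sets;
}
```

In a forward substitution each level can be handed to parallel threads or work items, with one synchronization point per level; the setup cost is one pass over the factor.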

Nov 18, 2012

* Added level scheduling to ILU0. Solver cycle times look good, but setup is still quite expensive. · 730b17be

Karl Rupp authored Nov 17, 2012

* CPU-fallback for ViennaCL-based block-ILU now working correctly.
* Removed old bicgstab-kernels (unused anyway)
* Eliminated 'potentially uninitialized variable' warnings in BiCGStab

730b17be

Nov 16, 2012

* Added support for row-/Jacobi-preconditioner with coordinate_matrix · 013e159c

Karl Rupp authored Nov 16, 2012

* Improved OpenCL matrix-vector performance of coordinate_matrix (by a factor of 2 on a GTX 285)
* Added restart to BiCGStab if search direction vanishes or a certain number of iterations is reached.
* Added two missing operator-overloads for vector in order to handle b - prod(A,x)

013e159c

* Added missing kernel initialization call to row_info() for OpenCL · 88598ec9

Karl Rupp authored Nov 15, 2012

* Fixed wrong estimated residual in BiCGStab as introduced with the previous commit
* Improved performance of block-ILU.

88598ec9

Nov 15, 2012

* Added element-wise operations for vectors · 119785b2

Karl Rupp authored Nov 15, 2012

* Row- and Jacobi-preconditioner now work on CPU, OpenCL and CUDA
* Final summation in norm_1, norm_2, norm_inf is now carried out on GPU or CPU, depending on target (same as for inner_prod())
* Tweaked CG and BiCGStab to use norm_2 instead of inner_prod(v, v)

119785b2
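The "final summation" for the norms presumably refers to the common two-stage reduction: every work group produces a partial result, and the partial results are then summed either on the device or on the host. A sequential emulation follows, with contiguous chunks of a chosen size standing in for work groups; the function names and chunk size are assumptions, not ViennaCL's kernels.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stage 1: each "work group" (here: a contiguous chunk) computes a partial sum of squares.
std::vector<double> partial_sums_of_squares(const std::vector<double>& v, std::size_t chunk) {
  assert(chunk > 0);
  std::vector<double> partial;
  for (std::size_t start = 0; start < v.size(); start += chunk) {
    std::size_t end = std::min(v.size(), start + chunk);
    double s = 0.0;
    for (std::size_t i = start; i < end; ++i) s += v[i] * v[i];
    partial.push_back(s);
  }
  return partial;
}

// Stage 2: final summation over the partial results, then the square root.
double two_stage_norm_2(const std::vector<double>& v, std::size_t chunk = 128) {
  double total = 0.0;
  for (double p : partial_sums_of_squares(v, chunk)) total += p;
  return std::sqrt(total);
}
```

Only the short array of partial results has to be summed in stage 2, which is why it can be done cheaply on whichever side needs the result.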

Simplified implementation of inner_prod(). Might yield better performance on AMD GPUs. · 6cca4eb9
Karl Rupp authored Nov 15, 2012

6cca4eb9

* Typesafe multi-backend transfer now working, making implementations based... · 8213cb0c

Karl Rupp authored Nov 14, 2012

* Typesafe multi-backend transfer now working, making implementations based upon them nice and compact :-)
* added operator= to compressed_matrix<>
* moved viennacl::backend::memory_types to viennacl::memory_types

8213cb0c

Nov 14, 2012
- Implemented support for typesafe cross-domain transfer of memory buffers. More testing required. · 1c0f1224
  Karl Rupp authored Nov 14, 2012
  
  1c0f1224
- * Pimped incomplete Cholesky factorization. Speed now comparable to ILU. · 7510910c
  Karl Rupp authored Nov 13, 2012
  
  * Fixed some of the problems in the block preconditioners.
  * Valgrind complains about uninitialized memory when using Cholesky with OpenCL. More investigations required. Maybe related to the AMD APP SDK bug on Trinity?
  7510910c
Nov 13, 2012
- Added first implementation of incomplete Cholesky preconditioner. Requires improvements. · 5118cac9
  Karl Rupp authored Nov 13, 2012
  
  5118cac9
- Added missing diagonal_assign_cpu-kernels for matrices. Initializer types now all working. · a4d0d439
  Karl Rupp authored Nov 13, 2012
  
  a4d0d439
- * Added matrix initializers (work for CPU and OpenCL, CUDA-testing required) · 32af3402
  Karl Rupp authored Nov 12, 2012
  
  * Added workaround for AMD APP SDK 2.7 bug on Trinity APUs (Catalyst 12.8) to tests
  32af3402
Nov 12, 2012
- Transfer RAM<->OpenCL<->CUDA<->RAM now implemented. · 24b5e039
  Karl Rupp authored Nov 12, 2012
  
  24b5e039
- Pimped ILU(0,T)-preconditioners, up to one order of magnitude faster with new... · 1b975b43
  Karl Rupp authored Nov 11, 2012
  
  Pimped ILU(0,T)-preconditioners, up to one order of magnitude faster with new low-level implementations.
  1b975b43
Nov 11, 2012
- Reimplementation of ILU0 for compressed_matrix. Using the new multi-backend,... · 3a53cf71
  Karl Rupp authored Nov 11, 2012
  
  Reimplementation of ILU0 for compressed_matrix. Using the new multi-backend, first tests indicate speedups by a factor of 10. That rocks. :-)
  3a53cf71
- Sparse triangular solvers for compressed_matrix now working (CPU, OpenCL, CUDA). · 113e0c7f
  Karl Rupp authored Nov 11, 2012
  
  113e0c7f
- Sparse triangular solver for compressed_matrix now working on CPU. Moved in... · 49010db8
  Karl Rupp authored Nov 10, 2012
  
  Sparse triangular solver for compressed_matrix now working on CPU. Moved in place for OpenCL and CUDA, testing required.
  49010db8
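For reference, the ILU0 reimplementation for compressed_matrix (3a53cf71) targets the classical ILU(0) factorization, which keeps exactly the sparsity pattern of A. A sequential, row-wise sketch follows, assuming sorted column indices and a stored nonzero diagonal in every row; the function name is hypothetical and this is not the multi-backend implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place ILU(0) on a CSR matrix with sorted column indices per row.
// Afterwards the strictly lower part holds L (unit diagonal implied) and the
// upper part including the diagonal holds U, both restricted to A's pattern.
void ilu0(const std::vector<std::size_t>& row_ptr,
          const std::vector<std::size_t>& col_idx,
          std::vector<double>& val) {
  const std::size_t n = row_ptr.size() - 1;
  std::vector<std::size_t> diag(n);
  std::vector<std::ptrdiff_t> pos(n, -1);     // pos[j]: index of entry (i, j) in val, or -1
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
      pos[col_idx[k]] = static_cast<std::ptrdiff_t>(k);
    assert(pos[i] >= 0);                      // diagonal entry must be stored
    diag[i] = static_cast<std::size_t>(pos[i]);
    for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1] && col_idx[k] < i; ++k) {
      std::size_t p = col_idx[k];             // eliminate with already factored row p
      val[k] /= val[diag[p]];                 // l_ip = a_ip / u_pp
      for (std::size_t m = diag[p] + 1; m < row_ptr[p + 1]; ++m) {
        std::ptrdiff_t q = pos[col_idx[m]];   // update only entries in row i's pattern
        if (q >= 0) val[static_cast<std::size_t>(q)] -= val[k] * val[m];
      }
    }
    for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k) pos[col_idx[k]] = -1;
  }
}
```

For a tridiagonal matrix no fill-in is possible, so this ILU(0) coincides with the exact LU factorization, which makes a convenient sanity check.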
Nov 09, 2012
- Working on sparse triangular solvers. Performance still not great, but soon sufficient for ILU. · 98af43d5
  Karl Rupp authored Nov 09, 2012
  
  98af43d5
Nov 08, 2012

Some progress with triangular solver for compressed_matrix: unit-lower-solve... · 81969744

Karl Rupp authored Nov 08, 2012

Some progress with triangular solver for compressed_matrix: unit-lower-solve and upper-solve working, but performance is rather poor. This is, however, expected, because the data structure is not well suited for that.

81969744
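The unit-lower and upper solves mentioned here are forward and backward substitution over the rows of a CSR matrix. In the sequential sketch below (hypothetical function names), each row needs results from previous rows, which is one reason this data structure offers little parallelism for triangular solves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solves L x = b in place for the lower part of a CSR matrix with implied unit diagonal.
// Only entries with column < row are used; everything else is ignored.
void unit_lower_solve(const std::vector<std::size_t>& row_ptr,
                      const std::vector<std::size_t>& col_idx,
                      const std::vector<double>& val,
                      std::vector<double>& x) {
  const std::size_t n = row_ptr.size() - 1;
  for (std::size_t i = 0; i < n; ++i) {          // row i needs all x[j] with j < i
    double s = x[i];
    for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
      if (col_idx[k] < i) s -= val[k] * x[col_idx[k]];
    x[i] = s;
  }
}

// Solves U x = b in place for the upper part (diagonal included) of a CSR matrix.
void upper_solve(const std::vector<std::size_t>& row_ptr,
                 const std::vector<std::size_t>& col_idx,
                 const std::vector<double>& val,
                 std::vector<double>& x) {
  const std::size_t n = row_ptr.size() - 1;
  for (std::size_t i = n; i-- > 0;) {            // rows processed bottom-up
    double s = x[i];
    double diag = 0.0;
    for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
      if (col_idx[k] == i) diag = val[k];
      else if (col_idx[k] > i) s -= val[k] * x[col_idx[k]];
    }
    assert(diag != 0.0);                         // nonzero diagonal required
    x[i] = s / diag;
  }
}
```

Because both routines skip the other triangle, they can be applied one after the other to a combined L\U factor stored in a single CSR matrix, as produced by an ILU-type factorization.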

* Corrected deprecated VIENNACL_HAVE_XYZ in examples · 43434a19

Karl Rupp authored Nov 07, 2012

* Added initializer types for vectors: unit_vector, zero_vector, scalar_vector

43434a19
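Initializer types such as unit_vector, zero_vector and scalar_vector describe a vector's contents without storing them, so an assignment can be turned into a single fill of the target. A host-only sketch of the idea follows; everything in the namespace `sketch` is a hypothetical stand-in and does not reproduce ViennaCL's classes or constructor signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Lightweight descriptions of vector contents: only size and parameters are stored.
template <typename T> struct zero_vector   { std::size_t size; };
template <typename T> struct scalar_vector { std::size_t size; T value; };
template <typename T> struct unit_vector   { std::size_t size; std::size_t index; };

// Toy vector type that materializes an initializer on assignment.
template <typename T>
struct vector {
  std::vector<T> data;

  vector& operator=(const zero_vector<T>& z)   { data.assign(z.size, T(0)); return *this; }
  vector& operator=(const scalar_vector<T>& s) { data.assign(s.size, s.value); return *this; }
  vector& operator=(const unit_vector<T>& u) {
    assert(u.index < u.size);
    data.assign(u.size, T(0));
    data[u.index] = T(1);                         // single nonzero entry at position index
    return *this;
  }
};

}  // namespace sketch
```

On a device backend, the same pattern would let each assignment map to one fill kernel instead of a host-to-device copy of explicit data.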

Nov 07, 2012
- Added VIENNACL_WITH_OPENMP guard to existing OpenMP stuff · 3b0b1dbe
  Karl Rupp authored Nov 07, 2012
  
  3b0b1dbe
- * Unified preprocessor defines for external toolkit: VIENNACL_WITH_XYZ. Old... · b35f793b
  Karl Rupp authored Nov 07, 2012
  
  * Unified preprocessor defines for external toolkit: VIENNACL_WITH_XYZ. Old uses of VIENNACL_HAVE_{UBLAS|EIGEN|MTL4} still work, but their use is deprecated.
  b35f793b
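Keeping the deprecated VIENNACL_HAVE_{UBLAS|EIGEN|MTL4} spellings working while library code checks only VIENNACL_WITH_XYZ can be done with a small mapping block in a common header. The following is one possible shape of such a shim, not necessarily how ViennaCL does it; the `with_eigen`/`with_ublas` constants exist only for this illustration.

```cpp
#include <cassert>

// Legacy user code, simulated here: still uses the deprecated spelling.
#define VIENNACL_HAVE_EIGEN

// Compatibility shim: map each deprecated VIENNACL_HAVE_XYZ onto VIENNACL_WITH_XYZ.
#if defined(VIENNACL_HAVE_UBLAS) && !defined(VIENNACL_WITH_UBLAS)
  #define VIENNACL_WITH_UBLAS
#endif
#if defined(VIENNACL_HAVE_EIGEN) && !defined(VIENNACL_WITH_EIGEN)
  #define VIENNACL_WITH_EIGEN
#endif
#if defined(VIENNACL_HAVE_MTL4) && !defined(VIENNACL_WITH_MTL4)
  #define VIENNACL_WITH_MTL4
#endif

// From here on, only the new VIENNACL_WITH_XYZ names are checked.
#ifdef VIENNACL_WITH_EIGEN
constexpr bool with_eigen = true;
#else
constexpr bool with_eigen = false;
#endif
#ifdef VIENNACL_WITH_UBLAS
constexpr bool with_ublas = true;
#else
constexpr bool with_ublas = false;
#endif
```

Placing the mapping in one header means the deprecated names can later be dropped by deleting a single block.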
Nov 06, 2012
- Completed CUDA backend by adding direct triangular solvers and LU factorization. All tests pass. · 5476f4ba
  Karl Rupp authored Nov 06, 2012
  
  5476f4ba
Nov 05, 2012
- * Reduced generic vector kernel (av, avbv, avbv_v) startup by 10-20 percent by packing arguments · 575300ac
  Karl Rupp authored Nov 05, 2012
  
  * Matrix-matrix operations for CUDA now functional. Performance is lower than with OpenCL, though...
  575300ac
- Fixed minor bugs in vector CUDA kernels (accidental uses of 'float' instead of... · a844270f
  Karl Rupp authored Nov 05, 2012
  
  Fixed minor bugs in vector CUDA kernels (accidental uses of 'float' instead of template parameter 'T')
  a844270f