Commits · e233b999964f1c6ddcd91140c15756485faeb1cd · Kaushik Kulkarni / viennacl-dev

Nov 29, 2012

* Dense matrix-vector product now accepts matrix-ranges/slices and vector-ranges/slices as well · e233b999

Karl Rupp authored Nov 29, 2012

* Improved matrix-vector-test. Now checks all combinations of matrix/matrix-range/matrix-slice and vector/vector-range/vector-slice on rank-1-updates, matrix-vector products and triangular solves
* Removed redundant prod_impl(A, b)

e233b999

* Renamed viennacl::linalg::single_threaded to viennacl::linalg::host_based · 9cf4f1f2

Karl Rupp authored Nov 29, 2012

* Reduced execution time of sparse-test by speeding up the reference uBLAS calculations (double-transpose-trick)

9cf4f1f2

Nov 28, 2012

* Added least-squares example · 11b7e84f

Karl Rupp authored Nov 28, 2012

* Added inplace_qr_apply_trans_Q() to compute the right-hand side Q^T b of the least-squares system R x = Q^T b without setting up Q
* Fixed overloads for inplace_solve(A, b). More tests required, though.
* Unified use of viennacl::traits::clear(result) in CG and BiCGStab

11b7e84f
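The idea of obtaining Q^T b without ever forming Q can be sketched in plain C++. This is an illustration under stated assumptions, not ViennaCL's implementation: the `Dense` type and the functions `householder_qr`, `apply_trans_q` and `solve_least_squares` are hypothetical names introduced here, and the storage scheme (Householder vectors kept below the diagonal, scaled so that their leading entry is 1) is one common textbook choice.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dense row-major matrix with rows >= cols (hypothetical helper type).
struct Dense {
  std::size_t rows, cols;
  std::vector<double> data;
  double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// In-place Householder QR. On exit the upper triangle of A holds R and the
// Householder vectors v_k (scaled so that v_k[k] == 1) sit below the diagonal.
// Returns the scalars beta_k of the reflectors H_k = I - beta_k v_k v_k^T.
std::vector<double> householder_qr(Dense& A) {
  assert(A.rows >= A.cols);
  std::vector<double> betas(A.cols, 0.0);
  for (std::size_t k = 0; k < A.cols; ++k) {
    double sigma = 0.0;
    for (std::size_t i = k + 1; i < A.rows; ++i) sigma += A.at(i, k) * A.at(i, k);
    if (sigma == 0.0) continue;  // nothing to eliminate, H_k is the identity
    const double alpha = A.at(k, k);
    const double norm = std::sqrt(alpha * alpha + sigma);
    const double v0 = (alpha <= 0.0) ? alpha - norm : -sigma / (alpha + norm);
    betas[k] = 2.0 * v0 * v0 / (sigma + v0 * v0);
    for (std::size_t i = k + 1; i < A.rows; ++i) A.at(i, k) /= v0;  // store v_k
    A.at(k, k) = norm;  // H_k maps column k onto norm * e_k
    for (std::size_t j = k + 1; j < A.cols; ++j) {  // update trailing columns
      double s = A.at(k, j);
      for (std::size_t i = k + 1; i < A.rows; ++i) s += A.at(i, k) * A.at(i, j);
      s *= betas[k];
      A.at(k, j) -= s;
      for (std::size_t i = k + 1; i < A.rows; ++i) A.at(i, j) -= s * A.at(i, k);
    }
  }
  return betas;
}

// Overwrites b with Q^T b by applying the reflectors H_0, H_1, ... in order.
// Q itself is never assembled.
void apply_trans_q(const Dense& A, const std::vector<double>& betas,
                   std::vector<double>& b) {
  for (std::size_t k = 0; k < A.cols; ++k) {
    if (betas[k] == 0.0) continue;
    double s = b[k];
    for (std::size_t i = k + 1; i < A.rows; ++i) s += A.at(i, k) * b[i];
    s *= betas[k];
    b[k] -= s;
    for (std::size_t i = k + 1; i < A.rows; ++i) b[i] -= s * A.at(i, k);
  }
}

// Least-squares solution of A x ~ b: factorize, form Q^T b, back-substitute with R.
std::vector<double> solve_least_squares(Dense A, std::vector<double> b) {
  const std::vector<double> betas = householder_qr(A);
  apply_trans_q(A, betas, b);
  std::vector<double> x(A.cols, 0.0);
  for (std::size_t k = A.cols; k-- > 0;) {
    double s = b[k];
    for (std::size_t j = k + 1; j < A.cols; ++j) s -= A.at(k, j) * x[j];
    x[k] = s / A.at(k, k);
  }
  return x;
}
```

Only the first `cols` entries of the transformed right-hand side enter the back substitution; the remaining entries carry the residual of the fit.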

* Fixed a minor bug in lu_factorize that showed up with the CUDA backend only · 8f971c47

Karl Rupp authored Nov 28, 2012

* Added missing include directives for lu.hpp in one example and one test

8f971c47

* Reimplementation of LU factorization in viennacl/linalg/lu.hpp. Better performance, but still a lot of unused potential. · e8a6e5b3

Karl Rupp authored Nov 27, 2012

* Replaced slow generic CUDA matrix-matrix multiplication kernel by several semi-automatically generated kernels. Performance still only half of OpenCL, although code is virtually identical.
* Fixed a bug with C = prod(A, B) if C is a matrix_range or matrix_slice. An unnecessary temporary was introduced.
* CUDA-benchmarks now build correctly

e8a6e5b3

Nov 21, 2012

* Reduced overhead for copying to/from ublas::compressed_matrix<> · 68ec5e72

Karl Rupp authored Nov 21, 2012

* Generalized sparse_matrix_adapter. Now all types std::vector< std::map<T, U> > are supported (T was fixed to 'unsigned int' previously)

68ec5e72
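The row-wise `std::vector< std::map<T, U> >` layout mentioned above stores one map from column index to value per matrix row. A minimal sketch of working with it in plain C++ follows. It does not use the ViennaCL adapter, and the function name `spmv` is a hypothetical choice for this illustration. The index type is a template parameter, so keys of type `unsigned int` and `std::size_t` are handled alike.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// y = A * x for a sparse matrix stored as one std::map (column -> value) per row.
template <typename IndexT, typename ValueT>
std::vector<ValueT> spmv(const std::vector<std::map<IndexT, ValueT>>& A,
                         const std::vector<ValueT>& x) {
  std::vector<ValueT> y(A.size(), ValueT(0));
  for (std::size_t row = 0; row < A.size(); ++row) {
    for (const auto& entry : A[row]) {
      // entry.first is the column index, entry.second the stored value
      y[row] += entry.second * x[static_cast<std::size_t>(entry.first)];
    }
  }
  return y;
}
```

Because `std::map` keeps its keys sorted, each row is traversed in ascending column order, which is convenient when converting to a compressed row format.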

* Added CUDA examples/tutorials/tests to build system · ede1ed5c

Karl Rupp authored Nov 21, 2012

* Added level scheduling to ILUT, renamed routines from multifrontal_XYZ() to level_scheduling_XYZ()
* Fixed a couple of issues in block-ILU and improved performance. Now works well with CPU/OpenCL/CUDA, with the latter striving for higher block sizes than the default 8.

ede1ed5c
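Level scheduling groups the rows of a sparse triangular factor so that all rows within one level can be processed independently. The sketch below shows the general technique for a lower triangular matrix in compressed row storage. It is an assumption-laden illustration rather than the ViennaCL routine: the function name `compute_levels` and the array names `row_ptr` and `col_idx` are chosen here for clarity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Computes the level of each row of a lower triangular CSR matrix.
// Row i depends on row j whenever there is a nonzero L(i, j) with j < i, so
// level[i] = 1 + max(level[j]) over those j, and 0 if row i has no such entry.
std::vector<std::size_t> compute_levels(const std::vector<std::size_t>& row_ptr,
                                        const std::vector<std::size_t>& col_idx) {
  assert(!row_ptr.empty());
  const std::size_t n = row_ptr.size() - 1;
  std::vector<std::size_t> level(n, 0);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
      const std::size_t j = col_idx[p];
      if (j < i) level[i] = std::max(level[i], level[j] + 1);
    }
  }
  return level;
}
```

Rows sharing a level can be solved in parallel, and the levels themselves are processed one after another. Computing them requires a full pass over the sparsity pattern as part of the setup.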

Nov 18, 2012

* Added level scheduling to ILU0. Solver cycle times look good, but setup is still quite expensive. · 730b17be

Karl Rupp authored Nov 17, 2012

* CPU-fallback for ViennaCL-based block-ILU now working correctly.
* Removed old bicgstab-kernels (unused anyway)
* Eliminated 'potentially uninitialized variable' warnings in BiCGStab

730b17be

Nov 16, 2012

* Added support for row-/Jacobi-preconditioner with coordinate_matrix · 013e159c

Karl Rupp authored Nov 16, 2012

* Improved OpenCL matrix-vector performance of coordinate_matrix (factor 2 on GTX 285)
* Added restart to BiCGStab if search direction vanishes or a certain number of iterations is reached.
* Added two missing operator-overloads for vector in order to handle b - prod(A,x)

013e159c

* Added missing kernel initialization call to row_info() for OpenCL · 88598ec9

Karl Rupp authored Nov 15, 2012

* Fixed wrong estimated residual in BiCGStab as introduced with the previous commit
* Improved performance of block-ILU.

88598ec9

Nov 15, 2012

* Added element-wise operations for vectors · 119785b2

Karl Rupp authored Nov 15, 2012

* Row- and Jacobi-preconditioner now work on CPU, OpenCL and CUDA
* Final summation in norm_1, norm_2, norm_inf is now carried out on GPU or CPU, depending on target (same as for inner_prod())
* Tweaked CG and BiCGStab to use norm_2 instead of inner_prod(v, v)

119785b2

Simplified implementation of inner_prod(). Might yield better performance on AMD GPUs. · 6cca4eb9

Karl Rupp authored Nov 15, 2012

6cca4eb9

* Typesafe multi-backend transfer now working, making implementations based upon them nice and compact :-) · 8213cb0c

Karl Rupp authored Nov 14, 2012

* Added operator= to compressed_matrix<>
* Moved viennacl::backend::memory_types to viennacl::memory_types

8213cb0c

Nov 14, 2012
- Implemented support for typesafe cross-domain transfer of memory buffers. More testing required. · 1c0f1224
  Karl Rupp authored Nov 14, 2012
  
  1c0f1224
- Pimped incomplete Cholesky factorization. Speed now comparable to ILU. · 7510910c
  Karl Rupp authored Nov 13, 2012
  
  * Fixed some of the problems in the block preconditioners.
  * Valgrind complains about uninitialized memory when using Cholesky with OpenCL. More investigations required. Maybe related to the AMD APP SDK bug on Trinity?
  
  7510910c
Nov 13, 2012
- Added first implementation of incomplete Cholesky preconditioner. Requires improvements. · 5118cac9
  Karl Rupp authored Nov 13, 2012
  
  5118cac9
- Added missing diagonal_assign_cpu-kernels for matrices. Initializer types now all working. · a4d0d439
  Karl Rupp authored Nov 13, 2012
  
  a4d0d439
- Added matrix initializers (work for CPU and OpenCL, CUDA-testing required) · 32af3402
  Karl Rupp authored Nov 12, 2012
  
  * Added workaround for AMD APP SDK 2.7 bug on Trinity APUs (Catalyst 12.8) to tests
  
  32af3402
Nov 12, 2012
- Transfer RAM<->OpenCL<->CUDA<->RAM now implemented. · 24b5e039
  Karl Rupp authored Nov 12, 2012
  
  24b5e039
- Pimped ILU(0,T)-preconditioners, up to one order of magnitude faster with new low-level implementations. · 1b975b43
  Karl Rupp authored Nov 11, 2012
  
  1b975b43
Nov 11, 2012
- Reimplementation of ILU0 for compressed_matrix. Using the new multi-backend, first tests indicate speedups of 10. That rocks. :-) · 3a53cf71
  Karl Rupp authored Nov 11, 2012
  
  3a53cf71
- Sparse triangular solvers for compressed_matrix now working (CPU, OpenCL, CUDA). · 113e0c7f
  Karl Rupp authored Nov 11, 2012
  
  113e0c7f
- Sparse triangular solver for compressed_matrix now working on CPU. Moved in place for OpenCL and CUDA, testing required. · 49010db8
  Karl Rupp authored Nov 10, 2012
  
  49010db8
Nov 09, 2012
- Working on sparse triangular solvers. Performance still not great, but soon sufficient for ILU. · 98af43d5
  Karl Rupp authored Nov 09, 2012
  
  98af43d5
Nov 08, 2012

Some progress with triangular solver for compressed_matrix: unit-lower-solve and upper-solve working, but performance is rather poor. This is, however, expected, because the data structure is not well suited for that. · 81969744

Karl Rupp authored Nov 08, 2012

81969744
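For illustration, a unit-lower triangular solve on a matrix in compressed row storage can be sketched as follows. This is a generic forward substitution in plain C++ and not the ViennaCL kernel. The function name `unit_lower_inplace_solve` and the array names are hypothetical, and the diagonal is assumed to be an implicit 1, so any stored diagonal entry is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solves L x = b in place, where L is unit lower triangular in CSR format.
// On entry x holds b, on exit it holds the solution.
// Entries with column index >= row index are skipped (implicit unit diagonal).
void unit_lower_inplace_solve(const std::vector<std::size_t>& row_ptr,
                              const std::vector<std::size_t>& col_idx,
                              const std::vector<double>& values,
                              std::vector<double>& x) {
  assert(row_ptr.size() == x.size() + 1);
  for (std::size_t i = 0; i < x.size(); ++i) {
    double s = x[i];
    for (std::size_t p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
      const std::size_t j = col_idx[p];
      if (j < i) s -= values[p] * x[j];  // uses entries computed in earlier rows
    }
    x[i] = s;
  }
}
```

Each row needs results from earlier rows, so the outer loop is inherently sequential in this form, which limits the parallelism available on GPU backends.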

* Corrected deprecated VIENNACL_HAVE_XYZ in examples · 43434a19

Karl Rupp authored Nov 07, 2012

* Added initializer types for vectors: unit_vector, zero_vector, scalar_vector

43434a19

Nov 07, 2012
- Added VIENNACL_WITH_OPENMP guard to existing OpenMP stuff · 3b0b1dbe
  Karl Rupp authored Nov 07, 2012
  
  3b0b1dbe
- Unified preprocessor defines for external toolkit: VIENNACL_WITH_XYZ. Old uses of VIENNACL_HAVE_{UBLAS|EIGEN|MTL4} still work, but their use is deprecated. · b35f793b
  Karl Rupp authored Nov 07, 2012
  
  b35f793b
Nov 06, 2012
- Completed CUDA backend by adding direct triangular solvers and LU factorization. All tests pass. · 5476f4ba
  Karl Rupp authored Nov 06, 2012
  
  5476f4ba
Nov 05, 2012
- Reduced generic vector kernel (av, avbv, avbv_v) startup by 10-20 percent by packing arguments · 575300ac
  Karl Rupp authored Nov 05, 2012
  
  * Matrix-matrix operations for CUDA now functional. Performance is lower than with OpenCL, though...
  
  575300ac
- Fixed minor bugs in vector CUDA kernels (accidental uses of 'float' instead of template parameter 'T') · a844270f
  Karl Rupp authored Nov 05, 2012
  
  a844270f
Nov 04, 2012
- Sparse matrices now working with CUDA. · b759bf80
  Karl Rupp authored Nov 04, 2012
  
  b759bf80
Nov 03, 2012
- Vector operations and matrix operations now work for CUDA. No matrix-matrix products and no triangular solvers yet. · 8c815ad1
  Karl Rupp authored Nov 03, 2012
  
  8c815ad1
- Added CUDA skeleton for vector operations. Needs testing. · 4b9eb57d
  Karl Rupp authored Nov 02, 2012
  
  4b9eb57d
Nov 02, 2012
- Added memory handling and scalar operations for CUDA. · 6bc0e300
  Karl Rupp authored Nov 02, 2012
  
  6bc0e300
Nov 01, 2012
- Compilation with nvcc now works. Fixed all warnings and errors reported by nvcc. · d036b567
  Karl Rupp authored Nov 01, 2012
  
  d036b567
- Refurbished build system, now deals correctly with OpenCL-enabled and -disabled environments. · 9611f739
  Karl Rupp authored Nov 01, 2012
  
  9611f739
- Added VIENNACL_HAVE_OPENCL flag for including OpenCL specific stuff. · 8c84096f
  Karl Rupp authored Oct 31, 2012
  
  * Most parts of the ViennaCL core now compile without any OpenCL installation on the system.
  * Improved build-system towards this new scenario, yet further love required.
  
  8c84096f
Oct 30, 2012
- Cleaned up norm-kernels. Now using a single kernel for norm_1, norm_2 and norm_inf. Also, final summations are generalized to a single kernel and now use shared memory. · 4baa4d13
  Karl Rupp authored Oct 30, 2012
  
  4baa4d13
- Purged old/unused kernels in auxiliary folder · 0ed4f0f7
  Karl Rupp authored Oct 30, 2012
  
  0ed4f0f7