Commits · a335d1946765c299513ed9569a1d11a9d56b7257 · Kaushik Kulkarni / viennacl-dev

Aug 10, 2014
- Some more context fixes (matrix_operations, svd) · a335d194
  Toby Smithe authored Aug 10, 2014
- Change cache_path to use std::string throughout, to fix weird string termination bugs in Python · 8c2a5bdf
  Toby Smithe authored Aug 10, 2014
Aug 09, 2014
- ocl::context: set _cache_path on construction (no need to wait for init), and add accessor methods · ce232f91
  Toby Smithe authored Aug 09, 2014
Aug 08, 2014
- FFT: Create temporary objects with correct contexts · 7f7cd0b4
  Toby Smithe authored Aug 08, 2014
- BiCGStab: Added missing #include <numeric> · 0965e292
  Karl Rupp authored Aug 08, 2014
- NMF: eliminate a floating-point / conversion warning · ac92b722
  Toby Smithe authored Aug 08, 2014
- BiCGStab: Added missing #include statements. · 6cb1124a
  Karl Rupp authored Aug 08, 2014
```
Thanks to Toby St Clere Smithe for reporting on viennacl-devel.
```
- More satisfactory patch for NMF 0-matrix bug: compute norms of initial W and H, and if 0, set matrix/-ices to 1.0-valued-matrices · 36352caa
  Toby Smithe authored Aug 08, 2014
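The guard described in the NMF 0-matrix patch can be sketched as follows. This is a minimal illustration on plain `std::vector<double>` storage, not ViennaCL's implementation; `frobenius_norm` and `reset_if_zero` are hypothetical names, and the choice of the Frobenius norm is an assumption (the commit only says "norms").

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Frobenius norm of a dense matrix stored as a flat array.
double frobenius_norm(const std::vector<double>& m)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < m.size(); ++i)
    sum += m[i] * m[i];
  return std::sqrt(sum);
}

// Hypothetical helper: if the initial factor is identically zero,
// overwrite it with an all-ones matrix. Returns true if it was reset.
bool reset_if_zero(std::vector<double>& m)
{
  assert(!m.empty());
  if (frobenius_norm(m) > 0.0)
    return false;                      // usable starting guess, leave untouched
  for (std::size_t i = 0; i < m.size(); ++i)
    m[i] = 1.0;
  return true;
}
```

The helper would be applied once to each initial factor (W and H) before the iterations start.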
- Merge branch 'karlrupp/feature-pipelined-bicgstab' · cc044483
  Karl Rupp authored Aug 08, 2014
```
karlrupp/feature-pipelined-bicgstab:
Implements a pipelined version of BiCGStab with substantially improved
performance especially for smaller matrices. The algorithm is a slightly
rearranged version of BiCGStab as given in the book by Y. Saad
and only requires a single host<->device transfer in each step.
A fairly similar algorithm has been proposed in the paper
Jacques et al, "Electromagnetic scattering with the boundary integral
method on MIMD systems", LNCS 1593, Springer, 1999.
```
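For orientation, the textbook BiCGStab that the merge message refers to can be sketched as below. This is the plain, non-pipelined form, written from the standard formulation and not taken from ViennaCL. The dense row-major `matvec` is a stand-in for a sparse product, and all names (`Vec`, `matvec`, `dot`, `bicgstab`) are assumptions made for the example. Each call to `dot` is a separate reduction here; reducing that synchronization is what the rearranged variant described above is after.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;

// y = A * x for a dense row-major n-by-n matrix (stand-in for a sparse product).
Vec matvec(const Vec& A, const Vec& x)
{
  const std::size_t n = x.size();
  assert(A.size() == n * n);
  Vec y(n, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      y[i] += A[i * n + j] * x[j];
  return y;
}

double dot(const Vec& a, const Vec& b)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += a[i] * b[i];
  return sum;
}

// Textbook BiCGStab starting from x = 0 with shadow residual r0 = b.
Vec bicgstab(const Vec& A, const Vec& b, double tol, std::size_t max_iter)
{
  const std::size_t n = b.size();
  Vec x(n, 0.0), r = b, r0 = b, p = b, s(n);
  const double stop = tol * tol * dot(b, b);   // squared residual threshold
  double rho = dot(r, r0);
  for (std::size_t it = 0; it < max_iter && dot(r, r) > stop; ++it)
  {
    const Vec Ap = matvec(A, p);
    const double alpha = rho / dot(Ap, r0);
    for (std::size_t i = 0; i < n; ++i)
      s[i] = r[i] - alpha * Ap[i];
    if (dot(s, s) <= stop)                     // converged after the half step
    {
      for (std::size_t i = 0; i < n; ++i)
        x[i] += alpha * p[i];
      break;
    }
    const Vec As = matvec(A, s);
    const double omega = dot(As, s) / dot(As, As);
    for (std::size_t i = 0; i < n; ++i)
    {
      x[i] += alpha * p[i] + omega * s[i];
      r[i] = s[i] - omega * As[i];
    }
    const double rho_new = dot(r, r0);
    const double beta = (rho_new / rho) * (alpha / omega);
    for (std::size_t i = 0; i < n; ++i)
      p[i] = r[i] + beta * (p[i] - omega * Ap[i]);
    rho = rho_new;
  }
  return x;
}
```

The sketch has no further breakdown handling (for example `rho` or `omega` becoming zero), which a production solver would need.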
- BiCGStab: Applied const-correctness in OpenCL and CUDA kernels. · b8c91799
  Karl Rupp authored Aug 08, 2014
```
Some missing const qualifiers in the OpenCL kernels resulted in an almost
4x-performance drop. With equal const qualifiers, performance is now within
a roughly ten percent margin.
```
- BiCGStab: Added pipelined CUDA kernels. CUDA implementation now complete. · 39532998
  Karl Rupp authored Aug 08, 2014
- BiCGStab: Added OpenCL implementation of pipelined kernels. · 314e816b
  Karl Rupp authored Aug 08, 2014
Aug 07, 2014
- NMF: Create objects using correct fixes, and don't stop after just one set of iterations · 8f3d97b9
  Toby Smithe authored Aug 07, 2014
- BiCGStab: Completed implementation of host-based backend. · 52a9f1af
  Karl Rupp authored Aug 07, 2014
- BiCGStab: Added pipelined implementation for compressed_matrix with host-based backend. · 005e23b7
  Karl Rupp authored Aug 07, 2014
```
Implementation based on the BiCGStab listed in the book by Saad and
then optimized for minimum memory traffic and synchronization points
similar to the paper by Jacques et al.
```
Aug 06, 2014

- Merge branch 'karlrupp/feature-pipelined-cg' · 09e9eaa0
  Karl Rupp authored Aug 06, 2014

  karlrupp/feature-pipelined-cg:
  Improvements of the conjugate gradient algorithm by using custom kernels
  for full use of pipelining and avoiding kernel launch overheads to the extent
  possible.
  Follows algorithm 2.2 in the paper by Chronopoulos and Gear:
  "s-step Iterative Methods for Symmetric Linear Systems",
  J. Comp. and Appl. Math 25 (1989).
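A rough sketch of the kind of rearrangement the merge message describes: a CG iteration in which both inner products are formed right after the matrix-vector product, so that they can share a single reduction step. It is written from the commonly cited Chronopoulos-Gear recurrence; whether it matches Algorithm 2.2 of the paper line by line is an assumption, and none of it is ViennaCL code. The dense `matvec` and the names `Vec`, `dot`, and `cg_fused_reductions` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;

// y = A * x for a dense row-major n-by-n matrix (stand-in for a sparse product).
Vec matvec(const Vec& A, const Vec& x)
{
  const std::size_t n = x.size();
  assert(A.size() == n * n);
  Vec y(n, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      y[i] += A[i * n + j] * x[j];
  return y;
}

double dot(const Vec& a, const Vec& b)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += a[i] * b[i];
  return sum;
}

// CG for symmetric positive definite A, starting from x = 0.
// gamma = (r, r) and delta = (A r, r) are computed back to back.
Vec cg_fused_reductions(const Vec& A, const Vec& b, double tol, std::size_t max_iter)
{
  const std::size_t n = b.size();
  Vec x(n, 0.0), r = b, w = matvec(A, r), p(n, 0.0), s(n, 0.0);
  double gamma = dot(r, r);
  double delta = dot(w, r);
  if (gamma == 0.0)
    return x;                                  // b = 0, so x = 0 is the solution
  const double stop = tol * tol * gamma;       // squared residual threshold
  double alpha = gamma / delta;
  double beta = 0.0;
  for (std::size_t it = 0; it < max_iter; ++it)
  {
    for (std::size_t i = 0; i < n; ++i)
    {
      p[i] = r[i] + beta * p[i];               // search direction
      s[i] = w[i] + beta * s[i];               // s = A p without a second product
      x[i] += alpha * p[i];
      r[i] -= alpha * s[i];
    }
    w = matvec(A, r);
    const double gamma_new = dot(r, r);        // these two reductions
    delta = dot(w, r);                         // can be fused into one
    if (gamma_new <= stop)
      break;
    beta = gamma_new / gamma;
    alpha = gamma_new / (delta - beta * gamma_new / alpha);
    gamma = gamma_new;
  }
  return x;
}
```

In a device implementation the vector updates and the two reductions would typically live in custom kernels, which is where the launch-overhead savings mentioned above would come from.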

- CG: Added CUDA kernels. Pipelined CG now complete. · 3aac6265
  Karl Rupp authored Aug 06, 2014
- CG: Completed implementation of improved (pipelined) OpenCL kernels. · 4cd830a4
  Karl Rupp authored Aug 06, 2014
Aug 05, 2014
- CG: Added improved kernels for OpenCL when using compressed_matrix<> · 41c0c5a6
  Karl Rupp authored Aug 05, 2014
```
Support for other sparse matrices to be added later.
```
Aug 04, 2014

- Tools: Added sparse matrix generation routine (FDM 2D) · 7237e71f
  Karl Rupp authored Aug 04, 2014

  Provides a routine which generates a sparse matrix obtained from
  the discretization of the Laplace equation using finite differences
  and lexicographical ordering of the unknowns. One of the simplest,
  symmetric and positive definite matrices.
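A matrix of this kind can be generated in a few lines. The sketch below assumes the standard 5-point stencil for the negative Laplacian on an `nx` by `ny` grid of interior points with unit spacing and homogeneous Dirichlet boundaries, stored in CSR form; the scaling and sign convention of the actual ViennaCL routine are not stated in the commit. `CsrMatrix`, `add_entry`, and `laplace_2d` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR container for the example.
struct CsrMatrix
{
  std::size_t rows;
  std::vector<std::size_t> row_ptr;   // rows + 1 offsets into col_idx/values
  std::vector<std::size_t> col_idx;
  std::vector<double> values;
};

static void add_entry(CsrMatrix& A, std::size_t col, double value)
{
  A.col_idx.push_back(col);
  A.values.push_back(value);
}

// 5-point finite-difference matrix; unknown (i, j) gets index i + j * nx
// (lexicographical ordering). Diagonal 4, neighbors -1.
CsrMatrix laplace_2d(std::size_t nx, std::size_t ny)
{
  assert(nx > 0 && ny > 0);
  CsrMatrix A;
  A.rows = nx * ny;
  A.row_ptr.push_back(0);
  for (std::size_t j = 0; j < ny; ++j)
  {
    for (std::size_t i = 0; i < nx; ++i)
    {
      const std::size_t row = i + j * nx;
      // Neighbors are visited in ascending column order.
      if (j > 0)      add_entry(A, row - nx, -1.0);
      if (i > 0)      add_entry(A, row - 1,  -1.0);
      add_entry(A, row, 4.0);
      if (i + 1 < nx) add_entry(A, row + 1,  -1.0);
      if (j + 1 < ny) add_entry(A, row + nx, -1.0);
      A.row_ptr.push_back(A.col_idx.size());
    }
  }
  return A;
}
```

Such a matrix is a convenient test input for CG-type solvers because it is symmetric positive definite and its size is easy to scale.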

- Pipelined CG: Added implementation for host-based execution. · 812e3918
  Karl Rupp authored Aug 04, 2014

  Implementation for OpenCL and CUDA to come.
  Reference for the implementation: Algorithm 2.2 in
  Chronopoulos and Gear:
  "s-step Iterative Methods for Symmetric Linear Systems",
  Journal of Comp. and Appl. Math, 1989.

- Merge branch 'wip-sliced-ell-matrix'. · 246370b0
  Karl Rupp authored Aug 04, 2014

  wip-sliced-ell-matrix: Support for Sliced ELL-matrix format.
  Currently only matrix-vector products, but no
  sparse-matrix-times-dense-matrix-products.

- CUDA: Added option for specifying the CUDA arch through CMake. · d82ddd78
  Karl Rupp authored Aug 04, 2014
```
So far "-arch=sm_13" was hardcoded. Now the user can overwrite this
through the CUDA_ARCH_FLAG in CMake.
```

- sliced_ell_matrix: Support for Sliced ELL-matrix format (sliced_ell_matrix). · 1f9db9c9
  Karl Rupp authored Aug 04, 2014

  This format is an improved version of the ELL format,
  aiming at maximized memory bandwidth for reading matrix values.
  Currently only matrix-vector products are supported, no
  sparse-matrix-by-dense-matrix products. To be added at some later time.
  See paper
  "A unified sparse matrix data format for efficient general sparse matrix-vector multiply on modern processors with wide SIMD units"
  by Kreutzer et al. for details. We don't use any sorting of rows here.
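The idea behind the format can be illustrated with a small host-side sketch: rows are grouped into slices of a fixed size, each slice is padded only to the length of its own longest row, and entries are stored column by column within a slice. As in the commit, rows are not sorted. This is an illustration of the general sliced ELL layout, not ViennaCL's `sliced_ell_matrix` internals; `SlicedEll`, `from_csr`, `multiply`, and the `slice_size` parameter are assumptions for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct SlicedEll
{
  std::size_t rows;
  std::size_t slice_size;
  std::vector<std::size_t> slice_start;  // offset of each slice, plus end marker
  std::vector<std::size_t> col_idx;      // padded entries use column 0
  std::vector<double> values;            // padded entries hold 0.0
};

// Build the sliced layout from CSR arrays.
SlicedEll from_csr(std::size_t rows,
                   const std::vector<std::size_t>& row_ptr,
                   const std::vector<std::size_t>& col_idx,
                   const std::vector<double>& values,
                   std::size_t slice_size)
{
  assert(slice_size > 0 && row_ptr.size() == rows + 1);
  SlicedEll M;
  M.rows = rows;
  M.slice_size = slice_size;
  M.slice_start.push_back(0);
  for (std::size_t first = 0; first < rows; first += slice_size)
  {
    std::size_t width = 0;               // longest row within this slice
    for (std::size_t r = first; r < first + slice_size && r < rows; ++r)
      width = std::max(width, row_ptr[r + 1] - row_ptr[r]);
    const std::size_t base = M.col_idx.size();
    M.col_idx.resize(base + width * slice_size, 0);
    M.values.resize(base + width * slice_size, 0.0);
    for (std::size_t r = first; r < first + slice_size && r < rows; ++r)
      for (std::size_t k = row_ptr[r]; k < row_ptr[r + 1]; ++k)
      {
        // Column-major within the slice: entry j of all rows is contiguous.
        const std::size_t pos = base + (k - row_ptr[r]) * slice_size + (r - first);
        M.col_idx[pos] = col_idx[k];
        M.values[pos] = values[k];
      }
    M.slice_start.push_back(M.col_idx.size());
  }
  return M;
}

// y = M * x
std::vector<double> multiply(const SlicedEll& M, const std::vector<double>& x)
{
  std::vector<double> y(M.rows, 0.0);
  for (std::size_t s = 0; s + 1 < M.slice_start.size(); ++s)
  {
    const std::size_t first = s * M.slice_size;
    const std::size_t width = (M.slice_start[s + 1] - M.slice_start[s]) / M.slice_size;
    for (std::size_t j = 0; j < width; ++j)
      for (std::size_t i = 0; i < M.slice_size && first + i < M.rows; ++i)
      {
        const std::size_t pos = M.slice_start[s] + j * M.slice_size + i;
        y[first + i] += M.values[pos] * x[M.col_idx[pos]];
      }
  }
  return y;
}
```

Compared with plain ELL, a single long row only inflates the padding of its own slice, which is the main reason for slicing.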

Aug 03, 2014

- Matrix: Added support for the operations A = trans(B), A += trans(B), A -= trans(B) · 0f8d1b6e
  Karl Rupp authored Aug 03, 2014

  Also works in the case that A and B are the same matrix.
  Currently this operation is fairly slow, because the transposition is performed
  in main memory. On the other hand, it's certainly better to have a slow
  implementation rather than no implementation at all...
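One simple way to make `A = trans(B)` safe when A and B are the same object is to build the transposed data in a temporary first, as in the host-side sketch below. Whether ViennaCL does exactly this is an assumption; the commit only states that the transposition happens in main memory. `Matrix` and `assign_transposed` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major matrix for the example.
struct Matrix
{
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;
};

// A = trans(B). The result is assembled in a temporary, so the call is
// also valid when A and B refer to the same matrix.
void assign_transposed(Matrix& A, const Matrix& B)
{
  assert(B.data.size() == B.rows * B.cols);
  Matrix T;
  T.rows = B.cols;
  T.cols = B.rows;
  T.data.resize(B.data.size());
  for (std::size_t i = 0; i < B.rows; ++i)
    for (std::size_t j = 0; j < B.cols; ++j)
      T.data[j * T.cols + i] = B.data[i * B.cols + j];
  A = T;   // B is no longer read once the copy starts
}
```

The extra copy is the price for alias safety, which fits the commit's remark that the operation is currently slow.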

Aug 02, 2014

- VS 2012: Fixed compilation problem in compressed_compressed_matrix<> · ec267fcb
  Karl Rupp authored Aug 02, 2014

  A subnamespace was ambiguous, at least according to MS' interpretation of
  the standard.

  Reported-by: Matthew Musto <matthew.musto@gmail.com> via viennacl-devel

Aug 01, 2014
- Scheduler: Added missing implementations for element_pow() · 25076ce0
  Karl Rupp authored Aug 01, 2014
- Tests: Reduced round-off errors in element_pow() test. · 2b8e01d7
  Karl Rupp authored Aug 01, 2014
```
Some nightly tests failed for element_pow() because numerical round-off errors
could become too severe. This commit reduces the exponent, which should
ultimately lead to no more spurious test failures.
```
- VS2012: Fixed weird compilation problem with std::min() · 5c0ee23f
  Karl Rupp authored Aug 01, 2014
```
For some reason the template argument is explicitly required here.
Doesn't hurt us, so let's use the more verbose variant here.
```
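The commit does not say why the compiler needed the explicit template argument. One common situation where `std::min` requires it is when the two arguments have different types, so that the template parameter cannot be deduced. The sketch below shows only that case; `clamp_count` is a hypothetical function, not code from ViennaCL.

```cpp
#include <algorithm>
#include <cassert>

// Limit a requested count to what is available.
long clamp_count(int requested, long available)
{
  assert(requested >= 0 && available >= 0);
  // std::min(requested, available) does not compile: T would have to be
  // both int and long. Naming T explicitly converts both arguments to long.
  return std::min<long>(requested, available);
}
```

The explicit form is more verbose but behaves identically wherever the deduced form compiles, which matches the commit's "doesn't hurt us" remark.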
- vector_operations: create temporary scalars with correct context · c3b474a1
  Toby Smithe authored Aug 01, 2014
Jul 27, 2014
- Device-Specific / GEMM : Now letting the compiler unroll the loop. · 5fae4e32
  Philippe Tillet authored Jul 27, 2014
```
Probably requires more investigation, it may decrease the performance on some platforms, increase it on some others...
```
- Device-Specific / Template-Base : Added a default warp size for CPUs · d2e5d704
  Philippe Tillet authored Jul 27, 2014
Jul 26, 2014
- Device-specific / GEMM : Fixed bug when simd-width > 1 for FETCH_GLOBAL_CONTIGUOUS · 98d4d325
  Philippe Tillet authored Jul 26, 2014
- Device-specific / GEMM : Implemented the new FETCH_GLOBAL_CONTIGUOUS policy. Changed use_{A,B}_local from bool to an enum... · c1c04eca
  Philippe Tillet authored Jul 26, 2014
Jul 24, 2014
- Device-specific : Added accessor to template::parameters(); rename template::parameters to template::parameters_type · 527352be
  Philippe Tillet authored Jul 24, 2014
- CMake: Fixed non-matching conditions in if() and endif() block. · ca0c4d33
  Karl Rupp authored Jul 24, 2014
Jul 22, 2014
- CMake, OpenMP, MinGW: Fixed missing OpenMP linkage under MinGW for libviennacl. · 801c95e8
  Karl Rupp authored Jul 22, 2014
```
Fixes #72. Under MinGW it is required to explicitly link with 'gomp'.
```
- Device-specific: Fixed error codes definition · 14391c58
  Philippe Tillet authored Jul 22, 2014
```
Could cause compilation errors on some platforms
```
- Device-Specific : Further stabilization of the row-wise reduction template · a76dc2d0
  Philippe Tillet authored Jul 22, 2014
- Device-specific : Fixed silly bug in row-wise reduction template · 266ff46b
  Philippe Tillet authored Jul 22, 2014