Commits · 1a30cfa5f79c1bdb6875ee991666f3cca0bbd6a3 · Kaushik Kulkarni / viennacl-dev

Jul 31, 2015
- Doc: Last updates for 1.7.0 release. · 1a30cfa5
  Karl Rupp authored Jul 31, 2015
  
  1a30cfa5
- Revert "GEMM: Increased block size to 128 on CPU." · c3be1eaf
  Karl Rupp authored Jul 31, 2015
```
This reverts commit 955ec252.
Performance dropped on other desktop systems.
```
  c3be1eaf
- AMG: Switched OpenMP loop indices to signed types. · cad2d1a4
  Karl Rupp authored Jul 31, 2015
```
Otherwise we get complaints on Visual Studio, since OpenMP 2.0 only
supports signed integers.
```
  cad2d1a4
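
A minimal sketch of the pattern this commit describes, assuming a simple vector scaling loop. The function name `scale_in_place` is hypothetical and not taken from ViennaCL. The unsigned container size is converted to a signed type once, and the signed variable serves as the OpenMP loop index.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of ViennaCL: scales each entry of v by alpha.
void scale_in_place(std::vector<double> & v, double alpha)
{
  // OpenMP 2.0 (the version Visual Studio supports) requires a signed
  // loop index, so convert the unsigned size once up front.
  long n = static_cast<long>(v.size());

#ifdef _OPENMP
  #pragma omp parallel for
#endif
  for (long i = 0; i < n; ++i)
    v[static_cast<std::size_t>(i)] *= alpha;
}
```

Without OpenMP enabled, the loop compiles and runs as a plain serial loop.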
Jul 30, 2015
- CMake: Updated deployment script to include missing files. · 2b24ffcd
  Karl Rupp authored Jul 31, 2015
```
AddCCompilerFlagIfSupported.cmake and AddCLinkerFlagIfSupported.cmake
were not included in the dist and dist-src targets.
```
  2b24ffcd
- Visual Studio: Fixed compilation issues caused by conversion CTORs. · f6a5f184
  Karl Rupp authored Jul 30, 2015
```
Visual Studio 2012 ran into ambiguities with respect to conversions.
Adding tie-breaker overloads of operator= fixed the problems.
```
  f6a5f184
- Doc: Improved structure and content of manual. · 3bfa02ac
  Karl Rupp authored Jul 30, 2015
```
Most notably:
 - Added fine-grained ILU
 - Described custom compilation via command line
 - Improved iterative solver description (including mixed-precision CG)
 - Updated the GPU support table
 - Added sparse matrix-matrix products
```
  3bfa02ac
- GEMM: Increased block size to 128 on CPU. · 955ec252
  Karl Rupp authored Jul 30, 2015
```
Improves performance on my laptop by a factor of 3.
```
  955ec252
- AMG: Added missing include for std::greater. · 1fcbc6d7
  Karl Rupp authored Jul 30, 2015
```
Resulted in build failures on Visual Studio 2012.
```
  1fcbc6d7
- sliced_ell_matrix: Reduced block size to 32. · 7b037ad4
  Karl Rupp authored Jul 30, 2015
```
Improves performance on NVIDIA GPUs by about 10 percent on average.
Also reduces memory footprint a little.
```
  7b037ad4
- Merge pull request #148 from Rombur/fix_typo · 973616ee
  Karl Rupp authored Jul 30, 2015
```
AMG: Fix a typo: coarseing -> coarsening.
```
  973616ee
- SpGEMM: Fixed conversion constructor in compressed_matrix for OpenCL. · 44bbf1a7
  Karl Rupp authored Jul 30, 2015
```
OpenCL context handle was accidentally not set.
```
  44bbf1a7
Jul 29, 2015

Fix a typo: coarseing -> coarsening. · eabc685c
Bruno Turcksin authored Jul 29, 2015

eabc685c

SpGEMM: Fixed problem in CUDA kernel (stage 3) due to CUDA bug(?) · 2a6e9153

Karl Rupp authored Jul 29, 2015

The same problem showed up with OpenCL earlier in 216a6ac4.
I assume that we are hitting a bug in the CUDA stack here,
since the problem only shows up on some CUDA devices (e.g. K20m)
and only with certain build configurations. A debug build, for example,
does not show any issues.

See also the follow-up discussion in #147.

2a6e9153

Doxygen: Fixed all warnings obtained when building docs. · f6b2dc92
Karl Rupp authored Jul 29, 2015

f6b2dc92
CMake: Removed BUILD_MANUAL option. · e7c57f1a
Karl Rupp authored Jul 29, 2015
```
Since no standalone PDF manual is available anymore, this option became obsolete.
```
e7c57f1a

ViennaProfiler: Removed CMake bindings. · 7396ec10

Karl Rupp authored Jul 29, 2015

The respective tuning code for ViennaProfiler is no longer in ViennaCL,
so this optional dependency is obsolete.

7396ec10

SpGEMM: Added conversion constructor. · da416a33

Karl Rupp authored Jul 29, 2015

This is to also support lines such as
 compressed_matrix<T> A = prod(B, C);
So far only operator= was supported.

da416a33
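
To illustrate why `compressed_matrix<T> A = prod(B, C);` needs a constructor in addition to `operator=`, here is a minimal sketch with toy stand-ins. The names `toy_matrix`, `product_expression`, and this `prod` are hypothetical and use dense storage for brevity; they are not ViennaCL's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct toy_matrix;

// Proxy returned by prod(); it only records the two operands.
struct product_expression
{
  product_expression(const toy_matrix & l, const toy_matrix & r) : lhs(l), rhs(r) {}
  const toy_matrix & lhs;
  const toy_matrix & rhs;
};

struct toy_matrix
{
  std::size_t rows, cols;
  std::vector<double> data;   // row-major dense storage

  toy_matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

  // Conversion constructor: enables 'toy_matrix A = prod(B, C);'
  toy_matrix(const product_expression & e)
    : rows(e.lhs.rows), cols(e.rhs.cols), data(e.lhs.rows * e.rhs.cols, 0.0)
  {
    assert(e.lhs.cols == e.rhs.rows);
    for (std::size_t i = 0; i < rows; ++i)
      for (std::size_t j = 0; j < cols; ++j)
        for (std::size_t k = 0; k < e.lhs.cols; ++k)
          data[i * cols + j] += e.lhs.data[i * e.lhs.cols + k] * e.rhs.data[k * cols + j];
  }

  // Assignment: enables 'A = prod(B, C);' for an already constructed A.
  // The product is built in a temporary first, so 'A = prod(A, B)' is safe.
  toy_matrix & operator=(const product_expression & e)
  {
    toy_matrix tmp(e);
    rows = tmp.rows;
    cols = tmp.cols;
    data.swap(tmp.data);
    return *this;
  }
};

inline product_expression prod(const toy_matrix & a, const toy_matrix & b)
{
  return product_expression(a, b);
}
```

With only the `operator=` overload, `toy_matrix A = prod(B, C);` would not compile, because initialization syntax selects a constructor rather than an assignment operator.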

Iterative: Fixed convenience overloads for STL types. · 1da7090f
Karl Rupp authored Jul 29, 2015

1da7090f

Iterative: Added API for passing monitors and initial guesses. · 7b598ba9

Karl Rupp authored Jul 29, 2015

Old API still supported. New API uses solver objects, where the initial
guess as well as the monitor callbacks are registered.
New tutorial for usage: iterative-custom

Resolves #97.

7b598ba9
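
The solver-object idea can be sketched as follows. This is a self-contained toy, not ViennaCL's API: the class `toy_jacobi_solver`, the monitor signature, and the helper `count_calls` are assumptions made for illustration, and a plain Jacobi iteration stands in for the actual Krylov solvers. The initial guess and the monitor callback are registered on the object before it is applied to a system.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> vec;
typedef std::vector<vec>    dense;   // dense[i][j]: row i, column j

// Hypothetical solver object: Jacobi iteration with optional guess and monitor.
class toy_jacobi_solver
{
public:
  // Monitor returns true to request early termination.
  typedef bool (*monitor_fn)(const vec & x, double residual, void * user_data);

  toy_jacobi_solver(double tol, std::size_t max_iters)
    : tol_(tol), max_iters_(max_iters), monitor_(0), user_data_(0) {}

  void set_initial_guess(const vec & x0) { guess_ = x0; }
  void set_monitor(monitor_fn f, void * user_data) { monitor_ = f; user_data_ = user_data; }

  vec operator()(const dense & A, const vec & b) const
  {
    const std::size_t n = b.size();
    vec x = guess_.empty() ? vec(n, 0.0) : guess_;
    for (std::size_t it = 0; it < max_iters_; ++it)
    {
      vec x_new(n);
      double res2 = 0.0;
      for (std::size_t i = 0; i < n; ++i)
      {
        double r = b[i];                       // r = (b - A x)_i
        for (std::size_t j = 0; j < n; ++j)
          r -= A[i][j] * x[j];
        res2 += r * r;
        x_new[i] = x[i] + r / A[i][i];         // Jacobi update
      }
      const double res = std::sqrt(res2);      // residual norm of the current x
      if (monitor_ && monitor_(x, res, user_data_)) return x;
      if (res < tol_) return x;
      x = x_new;
    }
    return x;
  }

private:
  double      tol_;
  std::size_t max_iters_;
  vec         guess_;
  monitor_fn  monitor_;
  void *      user_data_;
};

// Example monitor: counts invocations via user_data and never stops early.
inline bool count_calls(const vec &, double, void * user_data)
{
  ++*static_cast<std::size_t *>(user_data);
  return false;
}
```

Keeping the guess and the monitor on the solver object leaves the call itself as short as `x = solver(A, b)`, which is why an older call style without these extras can continue to exist alongside it.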

Jul 28, 2015
- AMG: Fixed incorrect MIS-2 check for coarsening in CUDA backend. · 50f9d227
  Karl Rupp authored Jul 28, 2015
```
Resolves #147.
```
  50f9d227
- GCC 4.6, Clang 3.0: Fixed picky compilation warnings. · c74b7f50
  Karl Rupp authored Jul 28, 2015
```
Flags used:
-Wall -pedantic -Wextra -Weverything
-Wno-exit-time-destructors -Wno-padded -Wno-global-constructors
-Wno-weak-vtables -Wno-unreachable-code
```
  c74b7f50
Jul 27, 2015

sum: Added support for sum(), row_sum(), column_sum() · 4063c941

Karl Rupp authored Jul 27, 2015

These routines compute the sum of a vector as well as the row- and
column-sums of a dense matrix, respectively. The implementation
reuses inner products for vectors and matrix-vector products for
matrices. Thus, there is some overhead compared to dedicated,
highly optimized routines, but this should be acceptable in almost all cases.

Replaces former attempts at a reduce<>() function.

Resolves #127.

4063c941
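
The reuse described in this commit message can be sketched with plain standard-library containers. All function names below are hypothetical and the dense storage is a simplification; the point is only that a sum is an inner product with a vector of ones, and row and column sums are products of the matrix (or its transpose) with a vector of ones.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> vec;
typedef std::vector<vec>    dense;   // dense[i][j]: row i, column j

double inner_prod(const vec & x, const vec & y)
{
  assert(x.size() == y.size());
  double s = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
  return s;
}

// y = A * x
vec mat_vec(const dense & A, const vec & x)
{
  vec y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i) y[i] = inner_prod(A[i], x);
  return y;
}

// y = A^T * x
vec trans_mat_vec(const dense & A, const vec & x)
{
  assert(A.size() == x.size());
  vec y(A.empty() ? 0 : A[0].size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < A[i].size(); ++j)
      y[j] += A[i][j] * x[i];
  return y;
}

// sum(x) = <x, 1>
double sum_via_inner_prod(const vec & x) { return inner_prod(x, vec(x.size(), 1.0)); }

// row_sum(A) = A * 1
vec row_sum_via_mat_vec(const dense & A)
{
  return mat_vec(A, vec(A.empty() ? 0 : A[0].size(), 1.0));
}

// column_sum(A) = A^T * 1
vec column_sum_via_mat_vec(const dense & A)
{
  return trans_mat_vec(A, vec(A.size(), 1.0));
}
```

The overhead mentioned in the commit message is visible here: every entry is multiplied by 1.0 and the vector of ones has to be allocated, neither of which a dedicated reduction would need.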

Eigen: Added support for row-major matrices. · 7d29adf3
Karl Rupp authored Jul 27, 2015
```
Based on a request by Sumit Kumar on viennacl-devel.
```
7d29adf3
CUDA: Fixed external linkage problems of two kernels. · 97121db0
Karl Rupp authored Jul 27, 2015

97121db0

Jul 25, 2015

CMake: CUDA_ARCH_FLAG now defaults to sm_20. · 75c616a4

Karl Rupp authored Jul 25, 2015

The previous default of sm_13 is no longer supported with CUDA 7.
Leaving the default unset would give the user no hint about which value to set.

75c616a4

CMake: commented out default architecture options in ViennaCLCommon.cmake. Now... · 327f924a

Philippe Tillet authored Jul 24, 2015

CMake: Commented out the default architecture options in ViennaCLCommon.cmake. ViennaCL now relies on nvcc's default architecture.

Couldn't find a portable way to detect the architecture at build time with CMake.

327f924a

Jul 23, 2015
- Mixed-precision CG: Now available for all three backends. · 2b0a9e37
  Karl Rupp authored Jul 23, 2015
```
Leverages the new conversion routines, no custom kernels required.
```
  2b0a9e37
- Parallel ILU: Fixed post-merge compilation issues with CUDA. · 95048eda
  Karl Rupp authored Jul 23, 2015
```
CUDA argument retrieval had to be updated.
```
  95048eda
Jul 22, 2015

Merge branch 'karlrupp/feature-refurbished-amg' · 429cbe71

Karl Rupp authored Jul 22, 2015

karlrupp/feature-refurbished-amg:
 Full rewrite of AMG functionality.
 Fine-grained AMG as described by Bell et al. now available.
 Includes GPU-accelerated setup.
 New implementation available for all three backends.
 Setup on host and application on device also possible.

Resolves #16.

Conflicts:
	viennacl/linalg/detail/amg/amg_coarse.hpp
	viennacl/linalg/detail/amg/amg_debug.hpp
	viennacl/linalg/detail/amg/amg_interpol.hpp

429cbe71

AMG: Added another explicit note in manual that AMG is not a silver bullet. · b1cabbee
Karl Rupp authored Jul 22, 2015
```
When randomly feeding matrices from Matrix Market one rarely gets good convergence
on the first attempt.
```
b1cabbee

AMG: Added exception if coarsening gets stuck and other diagnostics. · d9decdbc

Karl Rupp authored Jul 22, 2015

Rather than running into an assertion, the user now encounters an
exception explaining the reason why the AMG setup failed.
For additional diagnostics it is also possible to query the problem
sizes at each level.

d9decdbc

Merge branch 'karlrupp/feature-chow-patel-ilu' · d69a28ca

Karl Rupp authored Jul 22, 2015

karlrupp/feature-chow-patel-ilu:
 Implements a parallel incomplete LU preconditioner.
 Implements a parallel incomplete Cholesky factorization preconditioner.
 Both preconditioners use the same sparsity pattern as A.
 Proposed by Chow and Patel (Algorithms 2 and 3) in SIAM J. Sci. Comp.

d69a28ca

ICHOL: Added parallel incomplete Cholesky factorization by Chow and Patel. · 8f344966

Karl Rupp authored Jul 22, 2015

Proposed by Chow and Patel in SIAM J. Sci. Comp. Vol. 37, No. 2, pp. C169–C193, 2015
in Algorithm 3. Rather than a column-major computation of U,
we compute the row-major L. This saves at least one (costly) transposition.

8f344966
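
A rough sketch of the row-major update of L, under simplifying assumptions: dense storage instead of compressed_matrix, serial in-place updates instead of fine-grained parallel ones, and hypothetical function names (`lower_triangle`, `ichol_sweep`). Each entry of L inside the lower-triangular nonzero pattern of A is recomputed from the current values of the other entries.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > dense;   // dense[i][j]: row i, column j

// Initial guess: the lower triangle of A (zeros above the diagonal).
dense lower_triangle(const dense & A)
{
  dense L(A.size(), std::vector<double>(A.size(), 0.0));
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j <= i; ++j)
      L[i][j] = A[i][j];
  return L;
}

// One in-place sweep over the lower-triangular nonzero pattern of A.
void ichol_sweep(const dense & A, dense & L)
{
  const std::size_t n = A.size();
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j <= i; ++j)
    {
      if (A[i][j] == 0.0) continue;          // keep the sparsity pattern of A
      double s = A[i][j];
      for (std::size_t k = 0; k < j; ++k)    // subtract contributions of earlier columns
        s -= L[i][k] * L[j][k];
      L[i][j] = (i == j) ? std::sqrt(s)      // diagonal entry
                         : s / L[j][j];      // off-diagonal entry
    }
}
```

In this serial, in-order form a single sweep already gives the usual incomplete Cholesky factor, because every entry only depends on entries visited before it. When the entries are instead updated concurrently from values of the previous sweep, several sweeps are needed, and the sketch does not guard against a negative argument to `std::sqrt`.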

Chow-Patel-ILU: Added compile-time warning if wrong matrix type is passed. · 9e55af86
Karl Rupp authored Jul 22, 2015
```
Only works with compressed_matrix.
```
9e55af86

Jul 21, 2015
- Preparations: Updated version number to 1.7.0 and copyright notice. · e0bc5f0e
  Karl Rupp authored Jul 21, 2015
```
Release 1.7.0 is getting closer.
Still a few more things to complete, though.
```
  e0bc5f0e
- Changelog: Prepared for 1.7.0 release. · c05884d8
  Karl Rupp authored Jul 21, 2015
```
Supposedly pretty complete, might receive minor final updates.
```
  c05884d8
- AMG: Updated manual to reflect recent changes. · a4a0cbfc
  Karl Rupp authored Jul 21, 2015
  
  a4a0cbfc
Jul 20, 2015

AMG: Removed unused source files. · f7d9ff2f
Karl Rupp authored Jul 20, 2015
```
These are left-overs from earlier coding.
```
f7d9ff2f

AMG: Refined interface of AMG tag, documented improved interface. · 20bfe44d

Karl Rupp authored Jul 20, 2015

Arguments now passed to getter/setter members rather than all
squeezed into the constructor. Interpolation weight is now the same
as the Jacobi weight, because the interpolation is just constructed
such that it complements the action of the smoother.

20bfe44d

CG, BiCGStab, GMRES: Improved CSR SpMV for pipelined implementations. · 56e45397

Karl Rupp authored Jul 20, 2015

Improves performance on some NVIDIA GPUs by up to a factor of two.
Only kicks in if the matrix carries more than 12 nonzeros per row on average.

56e45397
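
The selection criterion from this commit message can be sketched as a small host-side check on the CSR row-pointer array. The function name `use_improved_spmv` and its signature are hypothetical; only the threshold of 12 nonzeros per row on average comes from the message above.

```cpp
#include <cstddef>
#include <vector>

// row_start is the CSR row-pointer array with (rows + 1) entries.
// Returns true if the matrix carries more than 'threshold' nonzeros
// per row on average.
bool use_improved_spmv(const std::vector<unsigned int> & row_start,
                       double threshold = 12.0)
{
  if (row_start.size() < 2) return false;               // no rows at all
  const std::size_t rows = row_start.size() - 1;
  const double nnz = static_cast<double>(row_start.back() - row_start.front());
  return nnz / static_cast<double>(rows) > threshold;
}
```

A matrix with exactly 12 nonzeros per row on average stays on the existing code path, since the message says "more than 12".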