Commits · 4063c941235d46804cd448db7ddecf0c3238548f · Kaushik Kulkarni / viennacl-dev

Jul 27, 2015

sum: Added support for sum(), row_sum(), column_sum() · 4063c941

Karl Rupp authored 9 years ago

These routines compute the sum of a vector as well as the row- and
column-sums of a dense matrix, respectively. The implementation
reuses inner products for vectors and matrix-vector products for
matrices. This adds some overhead compared to dedicated, highly optimized
routines, but it should be acceptable in almost all cases.

Replaces former attempts of a reduce<>() function.

Resolves #127.

4063c941
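
A minimal sketch of the idea described in 4063c941, assuming plain `std::vector` containers in place of ViennaCL types: the sum of a vector is its inner product with a vector of ones, and row and column sums are products of the matrix (or its transpose) with a vector of ones. The names `sum_via_inner_prod`, `row_sum_via_matvec`, and `column_sum_via_matvec` are hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Plain host containers stand in for device vectors and matrices (assumption).
using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense, row-major: A[i][j]

double inner_prod(const Vec& x, const Vec& y) {
  assert(x.size() == y.size());
  return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}

// sum(x) = <x, ones>
double sum_via_inner_prod(const Vec& x) {
  return inner_prod(x, Vec(x.size(), 1.0));
}

// row_sum(A) = A * ones: each entry is the inner product of a row with ones.
Vec row_sum_via_matvec(const Mat& A) {
  Vec result;
  result.reserve(A.size());
  for (const Vec& row : A)
    result.push_back(inner_prod(row, Vec(row.size(), 1.0)));
  return result;
}

// column_sum(A) = A^T * ones: accumulate every row, scaled by 1, into the result.
Vec column_sum_via_matvec(const Mat& A) {
  const std::size_t cols = A.empty() ? 0 : A.front().size();
  Vec result(cols, 0.0);
  for (const Vec& row : A) {
    assert(row.size() == cols);
    for (std::size_t j = 0; j < cols; ++j)
      result[j] += 1.0 * row[j];
  }
  return result;
}
```

The multiplications by one are the overhead the commit message refers to: a dedicated reduction would skip them, but reusing existing kernels avoids writing and tuning new ones.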

Eigen: Added support for row-major matrices. · 7d29adf3
Karl Rupp authored 9 years ago
```
Based on request by Sumit Kumar on viennacl-devel.
```
7d29adf3
CUDA: Fixed external linkage problems of two kernels. · 97121db0
Karl Rupp authored 9 years ago

97121db0

Jul 25, 2015

CMake: CUDA_ARCH_FLAG now defaults to sm_20. · 75c616a4

Karl Rupp authored 9 years ago

The previous default of sm_13 is no longer supported with CUDA 7.
Without a default, the user gets no hint about which value to set.

75c616a4

CMake: commented out default architecture options in ViennaCLCommon.cmake. Now... · 327f924a

Philippe Tillet authored 9 years ago

CMake: commented out default architecture options in ViennaCLCommon.cmake. Now relies on the default behavior of nvcc to set the default behavior of ViennaCL.

Couldn't find a portable way to detect the architecture at build time with CMake.

327f924a

Jul 23, 2015
- Mixed-precision CG: Now available for all three backends. · 2b0a9e37
  Karl Rupp authored 9 years ago
  
  Leverages the new conversion routines, no custom-kernels required.
  2b0a9e37
- Parallel ILU: Fixed post-merge compilation issues with CUDA. · 95048eda
  Karl Rupp authored 9 years ago
  
  CUDA argument retrieval had to be updated.
  95048eda
Jul 22, 2015

Merge branch 'karlrupp/feature-refurbished-amg' · 429cbe71

Karl Rupp authored 9 years ago

karlrupp/feature-refurbished-amg:
 Full rewrite of AMG functionality.
 Fine-grained AMG as described by Bell et al. now available.
 Includes GPU-accelerated setup.
 New implementation available for all three backends.
 Setup on host and application on device also possible.

Resolves #16.

Conflicts:
	viennacl/linalg/detail/amg/amg_coarse.hpp
	viennacl/linalg/detail/amg/amg_debug.hpp
	viennacl/linalg/detail/amg/amg_interpol.hpp

429cbe71

AMG: Added another explicit note in manual that AMG is not a silver bullet. · b1cabbee
Karl Rupp authored 9 years ago
```
When randomly feeding matrices from Matrix Market, one rarely gets good convergence
on the first attempt.
```
b1cabbee

AMG: Added exception if coarsening gets stuck and other diagnostics. · d9decdbc

Karl Rupp authored 9 years ago

Rather than running into an assertion, the user now encounters an
exception explaining the reason why the AMG setup failed.
For additional diagnostics it is also possible to query the problem
sizes at each level.

d9decdbc

Merge branch 'karlrupp/feature-chow-patel-ilu' · d69a28ca

Karl Rupp authored 9 years ago

karlrupp/feature-chow-patel-ilu:
 Implements a parallel incomplete LU preconditioner.
 Implements a parallel incomplete Cholesky factorization preconditioner.
 Both preconditioners use the same sparsity pattern as A.
 Proposed by Chow and Patel (Algorithms 2 and 3) in SIAM J. Sci. Comp.

d69a28ca
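
The fixed-point idea behind a fine-grained parallel ILU can be sketched as follows. This is a dense, sequential illustration under simplifying assumptions (every entry belongs to the sparsity pattern, updates happen in place, no diagonal scaling). The names `ilu_initial_guess` and `ilu_sweep` are hypothetical, and this is not the code merged here. In a parallel version the entry updates of one sweep would run concurrently, and several sweeps would be applied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// Initial guess taken from A: strictly lower part goes to L, the rest to U.
// L carries an implicit unit diagonal.
void ilu_initial_guess(const Mat& A, Mat& L, Mat& U) {
  const std::size_t n = A.size();
  L.assign(n, std::vector<double>(n, 0.0));
  U.assign(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      (i > j ? L : U)[i][j] = A[i][j];
}

// One sweep: every entry (i, j) is recomputed from the current L and U.
//   i >  j:  l_ij = (a_ij - sum_{k<j} l_ik u_kj) / u_jj
//   i <= j:  u_ij =  a_ij - sum_{k<i} l_ik u_kj
void ilu_sweep(const Mat& A, Mat& L, Mat& U) {
  const std::size_t n = A.size();
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      const std::size_t kmax = (i < j) ? i : j;  // sum runs over k < min(i, j)
      double s = A[i][j];
      for (std::size_t k = 0; k < kmax; ++k)
        s -= L[i][k] * U[k][j];
      if (i > j) {
        assert(U[j][j] != 0.0);
        L[i][j] = s / U[j][j];
      } else {
        U[i][j] = s;
      }
    }
  }
}
```

With in-place updates in row-major order, as written here, a single sweep already reproduces the conventional factorization for a dense matrix. The parallel setting gives up this ordering and relies on repeated sweeps instead.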

ICHOL: Added parallel incomplete Cholesky factorization by Chow and Patel. · 8f344966

Karl Rupp authored 9 years ago

Proposed by Chow and Patel in SIAM J. Sci. Comp. Vol. 37, No. 2, pp. C169–C193, 2015
in Algorithm 3. Rather than a column-major computation of U,
we compute the row-major L. This saves at least one (costly) transposition.

8f344966

Chow-Patel-ILU: Added compile-time warning if wrong matrix type is passed. · 9e55af86
Karl Rupp authored 9 years ago
```
Only works with compressed_matrix.
```
9e55af86

Jul 21, 2015
- Preparations: Updated version number to 1.7.0 and copyright notice. · e0bc5f0e
  Karl Rupp authored 9 years ago
  
  Release 1.7.0 is getting closer. Still a few more things to complete, though.
  e0bc5f0e
- Changelog: Prepared for 1.7.0 release. · c05884d8
  Karl Rupp authored 9 years ago
  
  Supposedly pretty complete, might receive minor final updates.
  c05884d8
- AMG: Updated manual to reflect recent changes. · a4a0cbfc
  Karl Rupp authored 9 years ago
  
  a4a0cbfc
Jul 20, 2015

AMG: Removed unused source files. · f7d9ff2f
Karl Rupp authored 9 years ago
```
These are left-overs from earlier coding.
```
f7d9ff2f

AMG: Refined interface of AMG tag, documented improved interface. · 20bfe44d

Karl Rupp authored 9 years ago

Arguments now passed to getter/setter members rather than all
squeezed into the constructor. Interpolation weight is now the same
as the Jacobi weight, because the interpolation is just constructed
such that it complements the action of the smoother.

20bfe44d

CG, BiCGStab, GMRES: Improved CSR SpMV for pipelined implementations. · 56e45397

Karl Rupp authored 9 years ago

Improves performance on some NVIDIA GPUs by up to a factor of two.
Only kicks in if the matrix carries more than 12 nonzeros per row on average.

56e45397
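
How such a switch could be decided from the CSR row-pointer array is sketched below. The threshold of 12 comes from the commit message. The function names, the strict comparison, and the use of host-side `std::vector` are assumptions for illustration, not the library's dispatch code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average number of stored nonzeros per row of a CSR matrix.
// row_ptr has (rows + 1) entries; the difference between the last and the
// first entry is the total number of stored nonzeros.
double average_nnz_per_row(const std::vector<std::size_t>& row_ptr) {
  assert(row_ptr.size() >= 2);
  const std::size_t rows = row_ptr.size() - 1;
  const std::size_t nnz = row_ptr.back() - row_ptr.front();
  return static_cast<double>(nnz) / static_cast<double>(rows);
}

// Hypothetical predicate: pick the alternative SpMV kernel only for
// matrices with more than `threshold` nonzeros per row on average.
bool use_alternative_spmv(const std::vector<std::size_t>& row_ptr,
                          double threshold = 12.0) {
  return average_nnz_per_row(row_ptr) > threshold;
}
```

A matrix with row pointers `{0, 2, 5, 6}` averages two nonzeros per row and would stay on the default kernel under this rule.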

Jul 19, 2015
- Exceptions: Replaced all string throws with proper exceptions. · e8dc2595
  Karl Rupp authored 9 years ago
  
  Strings are painfully hard to catch, whereas now it is a lot easier to just catch std::exception (or a more specific inherited class as needed).
  e8dc2595
- CUDA: Using proper exceptions rather than throwing strings. · 6c39b19d
  Karl Rupp authored 9 years ago
  
  Makes it much easier to handle errors/exceptions. Some more strings are still thrown at other locations; these will be fixed in a separate commit.
  6c39b19d
- Code quality: Removed warnings for -Wall -pedantic -Wextra -Wconversion · c7fb6d77
  Karl Rupp authored 9 years ago
  
  Checked with Clang 3.0.
  c7fb6d77
- Mixed precision: Fixed flaws in OpenCL kernels. · 8ac81d63
  Karl Rupp authored 9 years ago
  
  8ac81d63
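
Why string throws are hard to catch (e8dc2595, 6c39b19d) can be seen in a small standalone sketch. The exception class `memory_error` and the two throwing functions are hypothetical and only illustrate the pattern; they are not taken from the library.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception type derived from the standard hierarchy.
class memory_error : public std::runtime_error {
public:
  explicit memory_error(const std::string& what_arg)
      : std::runtime_error(what_arg) {}
};

// Old style: throws a string literal (type const char*).
void old_style() { throw "allocation failed"; }

// New style: throws an object derived from std::exception.
void new_style() { throw memory_error("allocation failed"); }

// Runs fn and reports which handler caught the failure, if any.
template <typename Fn>
std::string run_and_report(Fn fn) {
  try {
    fn();
  } catch (const std::exception& e) {
    return std::string("std::exception: ") + e.what();
  } catch (...) {
    return "unknown non-standard throw";
  }
  return "ok";
}
```

Only the new style reaches the `std::exception` handler with its message intact. The string throw falls through to `catch (...)`, where the message is no longer accessible unless a separate `catch (const char*)` handler is added.
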
Jul 18, 2015

Mixed precision: Added conversion routines for vectors and matrices. · 9666d3a3

Karl Rupp authored 9 years ago

Allows one to convert between {(u)int, (u)long, float, double} as needed.
Adds support for vectors and dense matrices (including proxies).
Support for viennacl::scalar<> already available via casts on host.
No support for sparse matrices for now, as no use case in sight.

Resolves #80.
Partially addresses #124: It is now easier to convert to the same types.

9666d3a3
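
On the host, the element-wise conversion described above amounts to a cast per entry. The sketch below assumes `std::vector` as the container, and the helper name `convert` is hypothetical; the library's own routines operate on its vector and matrix types and are not reproduced here.

```cpp
#include <algorithm>
#include <vector>

// Element-wise conversion between numeric types, e.g. double -> float.
// Narrowing follows the usual static_cast rules (floating-point values
// converted to integers are truncated toward zero).
template <typename To, typename From>
std::vector<To> convert(const std::vector<From>& src) {
  std::vector<To> dst(src.size());
  std::transform(src.begin(), src.end(), dst.begin(),
                 [](From value) { return static_cast<To>(value); });
  return dst;
}
```

A call such as `convert<float>(values)` with a `std::vector<double>` argument yields a single-precision copy, which is the kind of step a mixed-precision solver needs before running its inner iterations in lower precision.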

inner_prod: Fixed regression in release mode for multiple products. · e3bbb42c
Karl Rupp authored 9 years ago
```
Affected OpenCL backend in generator:
"Unsupported reduction operator : no neutral element known"
```
e3bbb42c
qr_method: Added guard in tests for checking double precision support. · 5afd16f8
Karl Rupp authored 9 years ago
```
Otherwise test fails on OpenCL devices without double precision support.
```
5afd16f8

Jul 17, 2015
- qr_method: Extended interface to also accept viennacl::vector. · 44ecee28
  Karl Rupp authored 9 years ago
  
  Based on pull request by cdeterman on GitHub. See discussion at #146. A similar approach could also be applied for the non-symmetric case, but that is not considered stable enough (complex?).
  44ecee28
- inner_prod: Fixed compilation problem with multiple inner products. · 0c57d3e1
  Karl Rupp authored 9 years ago
  
  Problem was introduced with parent commit.
  0c57d3e1
- Check includes: All headers are again self-sufficient. · efa8ff30
  Karl Rupp authored 9 years ago
  
  Includes updates to the checker-script in auxiliary-folder.
  efa8ff30
- qr_method: Fixed problems when including matrix.hpp · 3a2a9c7f
  Karl Rupp authored 9 years ago
  
  Addresses remaining issues in #145.
  3a2a9c7f
- SSE: Removed unused implementations. · 3add3d12
  Karl Rupp authored 9 years ago
  
  This code has not been used or tested in years. Time to clean up.
  3add3d12
- qr_method: Fixed compilation problems for double. · 2ad0dfbd
  Karl Rupp authored 9 years ago
  
  Also extended test suite such that this problem cannot show up again. The cause was one declaration where 'float' was accidentally hard-coded. Resolves #145.
  2ad0dfbd
Jul 16, 2015

AMG: Extended earlier OpenCL-only AMG implementation to CUDA and OpenMP. · 016dfb91

Karl Rupp authored 9 years ago

Now provides the following:
 - coarsening: classical RS, aggregation
 - interpolation: direct, aggregation, smoothed-agg
All available for all three backends, no longer requiring uBLAS.
Former RS0 and RS3 dropped due to a lack of fine-grained parallelism.

Implementations mostly based on paper
"Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods"
by Bell et al. Initial implementation in branch
 karlrupp/refurbish-amg
which became too messy over time, hence this cleanup.

New operations available via viennacl/linalg/amg_operations.hpp.
This includes assign_to_dense() and amg_transpose(),
which should at some point become generally available with a nicer API.

Still to be added:
 - diagnostic information from preconditioner object
 - documentation in manual
 - more convenience for amg_tag

016dfb91

MIC: Enhanced kernel parameters for BLAS levels 1, 2, 3. · d2ef9b25

Karl Rupp authored 9 years ago

Parameters not obtained from a full-fledged optimizer run,
but from careful manual tweaking. Obtained memory bandwidths
for BLAS levels 1 and 2 are about 70 GB/sec, which is okay given
that vector operations on the Xeon Phi are slow with OpenCL.
BLAS level 3 improves very mildly, peaks at about 40 GFLOP/sec.

Given that OpenCL for Xeon Phi (KNC) has limited use and that
everybody is eager for KNL, further tuning efforts are suspended.

Resolves #26.

d2ef9b25

Jul 15, 2015
- Timer: Removed deprecated examples/benchmarks/benchmark-utils.hpp · 45901648
  Karl Rupp authored 9 years ago
  
  Use viennacl/tools/timer.hpp instead.
  45901648
- Random: Removed Random.hpp in tutorials/ and tests/ folders. · 7e060bae
  Karl Rupp authored 9 years ago
  
  New location: viennacl/tools/random.hpp
  7e060bae
- Merge branch 'karlrupp/feature-improve-lanczos' · 0818dae5
  Karl Rupp authored 9 years ago
  
  karlrupp/feature-improve-lanczos: Extends the interface so that eigenvectors are also computed and returned. Removes all uBLAS dependencies (which caused problems with some CUDA/Boost combinations). Improves performance for partial reorthogonalization.
  0818dae5
- Lanczos: Improved implementation of partial reorthogonalization, eigenvectors. · 8eabb354
  Karl Rupp authored 9 years ago
  
  Removed a couple of unnecessary host-device copies, removed unused counters. Partial reorthogonalization now also computes eigenvectors if specified.
  8eabb354
- Merge pull request #144 from w2z43t5/master · ddb3ecdf
  Karl Rupp authored 9 years ago
  
  BLAS2 benchmark: Corrected calculation of bandwidth
  ddb3ecdf
- BLAS2 benchmark: Corrected calculation of bandwidth · 1ecfbc8a
  w2z43t5 authored 9 years ago
  
  Changed "BLAS3_M" and "BLAS3_N" to "BLAS2_M" and "BLAS2_N", respectively.
  1ecfbc8a