- Jul 07, 2015
Andi authored
Replaced VIENNACL_LINAL_BISECT_GPU by VIENNACL_LINALG_BISECT_GPU
Andi authored
Andi authored
Replaced the following preprocessor macros in the OpenCL backend of the bisection algorithm:
- MAX_THREADS_BLOCK -> VIENNACL_BISECT_MAX_THREADS_BLOCK
- MAX_SMALL_MATRIX -> VIENNACL_BISECT_MAX_SMALL_MATRIX
- MAX_THREADS_BLOCK_SMALL_MATRIX -> VIENNACL_BISECT_MAX_THREADS_BLOCK_SMALL_MATRIX
- MIN_ABS_INTERVAL -> VIENNACL_BISECT_MIN_ABS_INTERVAL
Andi authored
* Fixed an endless loop.
* Fixed a race condition.
- Jul 06, 2015
Andi authored
- Jul 05, 2015
- Jul 02, 2015
Karl Rupp authored
Added size1() and size2() traits methods for Eigen::Map.
Charles Determan authored
Karl Rupp authored
viennacl::copy() now also works for Eigen::Map<VectorXf> and Eigen::Map<VectorXd>. As a positive side effect, this also improves the performance of the copy. Refer to #137 for discussion.
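As a rough usage sketch of what this enables (illustrative only; the buffer contents are made up, and VIENNACL_WITH_EIGEN must be defined before including the ViennaCL headers):

```cpp
// Illustrative sketch: copying between an Eigen::Map view of an existing
// host buffer and a viennacl::vector.
#define VIENNACL_WITH_EIGEN
#include <vector>
#include <Eigen/Dense>
#include "viennacl/vector.hpp"

int main()
{
  std::vector<float> raw(100, 1.0f);                             // existing host storage
  Eigen::Map<Eigen::VectorXf> host_view(&raw[0], raw.size());    // no copy, just a view

  viennacl::vector<float> device_vec(raw.size());
  viennacl::copy(host_view, device_vec);                         // host -> device
  viennacl::copy(device_vec, host_view);                         // device -> host
  return 0;
}
```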
- Jun 29, 2015
Karl Rupp authored
Based on previous tweaks of the CUDA kernels. Performance gain of up to 30 percent. Tests on Maxwell pending.
- Jun 24, 2015
Karl Rupp authored
Works for dynamically sized matrices. Statically sized (small) matrices are not supported, because they would provide extremely poor performance due to PCIe latency.
- Jun 23, 2015
Karl Rupp authored
The if (buffer_size == get_local_size(0)) { ... } block caused problems with NVIDIA drivers 34x.yz. Reproducing the error with simpler kernels was not possible. Moving the operations on index_in_C and buffer_size out of the block resolves the issues. Also introduces a thread-private variable 'local_id' to replace uses of get_local_id(0) in the same kernel, which might improve performance slightly.
- Jun 11, 2015
Karl Rupp authored
Used whenever the average number of nonzeros per row is larger than 6.5 (Maxwell) or 12.0 (Kepler and earlier). Overall performance is about 10-20 percent better than CUSPARSE.
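A minimal host-side sketch of that selection heuristic; the function and kernel labels are hypothetical, only the thresholds come from the message above:

```cpp
#include <cstddef>

// Hypothetical kernel labels for illustration only.
enum SpMVKernel { SPMV_NEW_KERNEL, SPMV_DEFAULT_KERNEL };

// Choose a CSR SpMV kernel from the average number of nonzeros per row;
// thresholds as stated above: 6.5 on Maxwell, 12.0 on Kepler and earlier.
SpMVKernel select_spmv_kernel(std::size_t num_rows, std::size_t num_nonzeros, bool is_maxwell)
{
  double avg_nnz   = (num_rows > 0) ? static_cast<double>(num_nonzeros) / num_rows : 0.0;
  double threshold = is_maxwell ? 6.5 : 12.0;
  return (avg_nnz > threshold) ? SPMV_NEW_KERNEL : SPMV_DEFAULT_KERNEL;
}
```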
- Jun 08, 2015
Karl Rupp authored
Removes the need for warp shuffles by using shared memory instead. Exclusive scans now run on the device (no Thrust or host-based operations).
- May 31, 2015
Karl Rupp authored
Caused problems with Visual Studio 2008, since it is not part of C++03.
Karl Rupp authored
std::map<Key, Value>::at() and std::vector<T>::data() were used a couple of times; since these features are not part of C++03, compilation on VS 2008 failed. stdint.h is likewise not available in C++03 and needed to be replaced by custom typedefs.
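A minimal sketch of the kind of C++03-compatible replacements this refers to (illustrative only, not the actual ViennaCL code):

```cpp
#include <cassert>
#include <map>
#include <vector>

int main()
{
  std::vector<double> v(10, 1.0);
  double *ptr = v.empty() ? 0 : &v[0];                       // instead of v.data() (C++11)

  std::map<int, double> m;
  m[42] = 3.14;
  std::map<int, double>::const_iterator it = m.find(42);     // instead of m.at(42) (C++11)
  assert(it != m.end());
  double value = it->second;

  typedef unsigned int my_uint32;   // hypothetical custom typedef standing in for <stdint.h> types

  (void)ptr; (void)value;
  return 0;
}
```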
- May 28, 2015
- May 27, 2015
Karl Rupp authored
Also extended test suite accordingly.
Karl Rupp authored
Extended test suite accordingly to cover both implementations.
Karl Rupp authored
Significantly simplifies debugging and diagnostics :-)
Karl Rupp authored
This routine is required in cases where the user populates the memory buffers manually. Otherwise, failures in sparse matrix products are to be expected.
Karl Rupp authored
Adds the in-place versions inclusive_scan(x) and exclusive_scan(x). The extended test suite uncovered a bug in the in-place version of the OpenMP implementation of exclusive_scan(x).
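A short usage sketch of the scan interface mentioned above; the header path and the exact namespace are assumptions based on common ViennaCL conventions:

```cpp
#include "viennacl/vector.hpp"
#include "viennacl/linalg/scan.hpp"   // assumed location of the scan routines

int main()
{
  viennacl::vector<int> x = viennacl::scalar_vector<int>(8, 1);   // x = (1, 1, ..., 1)
  viennacl::vector<int> y(8);

  viennacl::linalg::inclusive_scan(x, y);   // y = (1, 2, 3, ..., 8)
  viennacl::linalg::exclusive_scan(x, y);   // y = (0, 1, 2, ..., 7)

  viennacl::linalg::inclusive_scan(x);      // in-place variants
  viennacl::linalg::exclusive_scan(x);
  return 0;
}
```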
Karl Rupp authored
Warp-shuffles required an explicit cast to int.
- May 23, 2015
Karl Rupp authored
- May 22, 2015
Karl Rupp authored
Now uses only three kernels and one temporary buffer rather than the previous approach with four kernels and two temporary vectors(!); a schematic sketch of this structure follows below. Also prepared an explicit API for in-place scans. Possible further optimizations:
- Non-in-place scans can run without a temporary buffer.
- Small vectors can run with only one kernel invocation and no temporary buffer.
- The test suite for scans needs more love.
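For orientation, a schematic host-side sketch of such a three-kernel scan (each loop standing in for one device kernel, 'carries' for the single temporary buffer); this illustrates the general approach, not the actual ViennaCL kernels:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Inclusive scan split into three phases; assumes block_size > 0.
void inclusive_scan_sketch(std::vector<int> &x, std::size_t block_size)
{
  std::size_t num_blocks = (x.size() + block_size - 1) / block_size;
  std::vector<int> carries(num_blocks);   // the single temporary buffer

  // Kernel 1: scan each block independently and record its total.
  for (std::size_t b = 0; b < num_blocks; ++b)
  {
    int sum = 0;
    std::size_t end = std::min(x.size(), (b + 1) * block_size);
    for (std::size_t i = b * block_size; i < end; ++i)
    {
      sum += x[i];
      x[i] = sum;
    }
    carries[b] = sum;
  }

  // Kernel 2: exclusive scan over the block totals.
  int offset = 0;
  for (std::size_t b = 0; b < num_blocks; ++b)
  {
    int tmp = carries[b];
    carries[b] = offset;
    offset += tmp;
  }

  // Kernel 3: add each block's offset to all of its elements.
  for (std::size_t b = 0; b < num_blocks; ++b)
  {
    std::size_t end = std::min(x.size(), (b + 1) * block_size);
    for (std::size_t i = b * block_size; i < end; ++i)
      x[i] += carries[b];
  }
}
```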
- May 21, 2015
Karl Rupp authored
The current kernels only worked for true lock-step execution. On the CPU, where each work group is executed by a few threads, an additional barrier is required for correct execution. Should also fix problems on some NVIDIA GPUs.
- May 20, 2015
- May 10, 2015
Karl Rupp authored
karlrupp/sparse-matrix-matrix-product: Fast implementations of sparse matrix-matrix products. About 1.5x faster than MKL on Haswell if AVX2 is enabled. About 1.5x faster than CUSP and CUBLAS on NVIDIA GPUs. About the same performance on MIC. Faster on a FirePro W9100 with OpenCL than on a Tesla K20m with CUDA. A few more tweaks are possible, but will be applied in a separate feature branch.
Karl Rupp authored
Lists and hashes did not perform well, so they were removed. Work estimation only showed very mild gains over dynamic scheduling with a suitable block size, so for the time being we stick to the much simpler version.
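For illustration, dynamic scheduling with a fixed chunk size, as referred to above, might look like the following sketch (the function name and chunk size are assumptions):

```cpp
#include <cstddef>

// Illustrative only: distribute rows dynamically in fixed-size chunks instead
// of estimating the work per row up front.
void process_rows(std::size_t num_rows)
{
  const int chunk_size = 256;   // "suitable block size"; the value is an assumption

  #pragma omp parallel for schedule(dynamic, chunk_size)
  for (long row = 0; row < static_cast<long>(num_rows); ++row)
  {
    // ... compute one row of the sparse matrix-matrix product here ...
  }
}
```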
- May 07, 2015
Karl Rupp authored
The amount of work per group was computed incorrectly, so not all rows were visited.
Karl Rupp authored
When copying back from the device, there is no need to implicitly assume a square matrix, because the dimensions are known on the device.
Karl Rupp authored
Karl Rupp authored
Fully replaces the old OpenCL implementation. Uses shared memory rather than warp shuffles, and a fixed workgroup size of 32 for the merge kernels in order to get rid of the cost of barriers on AMD devices. Likely to perform better on AMD devices than on NVIDIA devices, but performance tests still need to be run.
- Apr 27, 2015
Karl Rupp authored
Template resolution picked up the incorrect type (char*) due to the way pointers are stored internally. See issue #133. Reported-by: Arijit Hazra <mailtohazra@gmail.com> via viennacl-support
- Apr 18, 2015