Commits · d254aaa4f3b64ea80f70aaa485bc2aa7f430af2c · Kaushik Kulkarni / viennacl-dev

Jan 20, 2016

Mac OS: Reenabled release of memory buffers. · d254aaa4

Karl Rupp authored Jan 20, 2016

Early versions of Mac OS had problems with static objects in the background.
Thus, reference decrements had been disabled. However, this is problematic
for memory handles, as it implies that memory is never freed.
This is certainly not a feature, hence reenabling reference decrements for
memory buffers.

d254aaa4
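
A minimal sketch of reference-counted release for a memory handle, using `std::shared_ptr` with a custom deleter. `fake_buffer` and `release_buffer` are hypothetical stand-ins for a device buffer and a release call such as `clReleaseMemObject()`. This is not ViennaCL's handle class. The deleter runs once the last copy of the handle is destroyed, and that is exactly the step that never happens while reference decrements are disabled.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a device memory buffer (not ViennaCL's type).
struct fake_buffer {
  int id;
};

// Counts how often the stand-in release function has been called.
static int release_count = 0;

// Stand-in for an API release call such as clReleaseMemObject().
void release_buffer(fake_buffer* b) {
  assert(b != nullptr);
  ++release_count;
  delete b;
}

// Copies share ownership. The custom deleter runs once the last copy is gone.
using buffer_handle = std::shared_ptr<fake_buffer>;

buffer_handle make_buffer(int id) {
  return buffer_handle(new fake_buffer{id}, release_buffer);
}
```

If two handles share one buffer, the release function is called exactly once, after both handles have gone out of scope.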

Version: Increased to 1.7.1. · 6cb7c249
Karl Rupp authored Jan 20, 2016
```
Starting release tests.
```
6cb7c249
Copyright: Updated year span from 2010-2015 to 2010-2016. · cd4c5f40
Karl Rupp authored Jan 20, 2016

cd4c5f40

Jan 19, 2016
- changelog: Collected changes for 1.7.1 release. · 90d25adc
  Karl Rupp authored Jan 19, 2016
  
  90d25adc
- OpenCL: Removed left-over use of abs() for unsigned integers. · 08a9c71f
  Karl Rupp authored Jan 19, 2016
```
Caused problems on Mac OS X 10.6.8.
This fixes the forgotten bits in commit
fa046011
```
  08a9c71f
- OpenCL: Fixed work size restriction in sum_inner_prod kernel. · d156a78c
  Karl Rupp authored Jan 19, 2016
```
Now possible to use work group size 1. Should have been part of
commit 745e3400 already.
```
  d156a78c
Jan 18, 2016

Matrix: Added arbitrary expressions for trans() · e73e96d1

Karl Rupp authored Jan 18, 2016

Resolves #149.
Works for dense matrices. Sparse matrices are intentionally not part
of this commit, because sparse matrix transposition is best avoided
whenever possible.

e73e96d1
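
To illustrate what transposing a dense expression involves, the sketch below computes `trans(A + B)` in one pass. It reads both operands at swapped indices, so the sum is never stored separately. `dense_matrix` and `trans_of_sum` are hypothetical names, and ViennaCL does not necessarily evaluate such expressions this way internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal row-major dense matrix, used only for this illustration.
struct dense_matrix {
  std::size_t rows, cols;
  std::vector<double> data;
  dense_matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Computes C = trans(A + B): entry (i, j) of the result is A(j, i) + B(j, i),
// so the intermediate sum A + B is never materialized.
dense_matrix trans_of_sum(const dense_matrix& A, const dense_matrix& B) {
  assert(A.rows == B.rows && A.cols == B.cols);
  dense_matrix C(A.cols, A.rows);
  for (std::size_t i = 0; i < C.rows; ++i)
    for (std::size_t j = 0; j < C.cols; ++j)
      C(i, j) = A(j, i) + B(j, i);
  return C;
}
```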

Sparse matrices: No more temporaries for y += Ax; and y -= Ax; · dc1c0413

Karl Rupp authored Jan 18, 2016

Now using two sets of SpMV kernels:
 - One for y = Ax for performance reasons
 - One for y = alpha * Ax + beta * y; for in-place operations.
The previous use of temporaries for in-place operations was too slow and
resulted in unnecessary memory overhead.

dc1c0413
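
A serial sketch of the in-place variant on a CSR matrix. Each row's dot product is scaled by alpha and combined with beta times the old entry of y. This means `y += Ax` (alpha = 1, beta = 1) and `y -= Ax` (alpha = -1, beta = 1) need no temporary vector. `csr_matrix` and `spmv_axpby` are hypothetical names. ViennaCL's kernels run on the compute device and differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR matrix for illustration (not ViennaCL's compressed_matrix).
struct csr_matrix {
  std::size_t rows;
  std::vector<std::size_t> row_ptr;  // size rows + 1
  std::vector<std::size_t> col_idx;  // column index of each nonzero
  std::vector<double> values;        // value of each nonzero
};

// y = alpha * A * x + beta * y, updating y in place without a temporary vector.
// x must not alias y, since y is overwritten row by row while x is read.
void spmv_axpby(const csr_matrix& A, const std::vector<double>& x,
                double alpha, double beta, std::vector<double>& y) {
  assert(A.row_ptr.size() == A.rows + 1 && y.size() == A.rows);
  for (std::size_t i = 0; i < A.rows; ++i) {
    double sum = 0.0;
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      sum += A.values[k] * x[A.col_idx[k]];
    // beta == 0 treats y as write-only, so stale values in y are never read.
    y[i] = (beta == 0.0) ? alpha * sum : alpha * sum + beta * y[i];
  }
}
```

Treating beta == 0 as "y is write-only" follows the usual BLAS convention. As a result, the plain `y = Ax` case never picks up garbage such as NaNs left over in y.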

OpenCL: max and min on vectors now also work with work group size 1. · 26e81488
Karl Rupp authored Jan 18, 2016
```
Required for some OpenCL SDKs on CPUs.
```
26e81488
OpenCL: Removed use of abs() on unsigned integers. · fa046011
Karl Rupp authored Jan 18, 2016
```
This resulted in errors on Mac OS X 10.6.8.
```
fa046011
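
If the kernel source is assembled as a string for each numeric type, one way to avoid the call is to decide at generation time whether abs() is needed at all. The helper below is a hypothetical sketch of that idea, not code from ViennaCL. For the unsigned OpenCL types, the argument is emitted unchanged.

```cpp
#include <cassert>
#include <string>

// Returns OpenCL C source for |arg|, given the OpenCL scalar type name of arg.
// Unsigned types (uchar, ushort, uint, ulong) are already non-negative, so the
// argument is emitted as-is instead of being wrapped in abs().
std::string abs_expression(const std::string& numeric_type, const std::string& arg) {
  assert(!numeric_type.empty() && !arg.empty());
  if (numeric_type[0] == 'u')
    return arg;
  if (numeric_type == "float" || numeric_type == "double")
    return "fabs(" + arg + ")";
  return "abs(" + arg + ")";
}
```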
OpenCL: Compatibility of summation kernel for work group size 1. · 745e3400
Karl Rupp authored Jan 18, 2016
```
Work group size 1 is encountered on some older implementations for CPUs
as well as on current mobile hardware.
```
745e3400
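
The following CPU simulation sketches why work group size 1 needs no special handling in a tree-style reduction. Each work item first accumulates a strided partial sum. The group then halves the number of active items per step, and with a local size of 1 that second loop simply never runs. `simulated_sum` is a hypothetical name. The code assumes a power-of-two local size (1 included), and it is not ViennaCL's actual kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU simulation of a two-stage OpenCL-style sum reduction with num_groups
// work groups of local_size work items each.
double simulated_sum(const std::vector<double>& x,
                     std::size_t num_groups, std::size_t local_size) {
  assert(num_groups >= 1 && local_size >= 1);
  assert((local_size & (local_size - 1)) == 0);  // power of two; 1 qualifies
  const std::size_t global_size = num_groups * local_size;
  std::vector<double> group_results(num_groups, 0.0);

  for (std::size_t g = 0; g < num_groups; ++g) {
    // Stage 1: each work item accumulates a strided partial sum in "local memory".
    std::vector<double> local(local_size, 0.0);
    for (std::size_t l = 0; l < local_size; ++l)
      for (std::size_t i = g * local_size + l; i < x.size(); i += global_size)
        local[l] += x[i];
    // Tree reduction within the group. With local_size == 1 the loop body
    // never runs and local[0] already holds the group's result.
    for (std::size_t stride = local_size / 2; stride > 0; stride /= 2)
      for (std::size_t l = 0; l < stride; ++l)
        local[l] += local[l + stride];
    group_results[g] = local[0];
  }

  // Stage 2: combine the per-group results (a second kernel launch on a device).
  double total = 0.0;
  for (double r : group_results) total += r;
  return total;
}
```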

Jan 17, 2016

OpenCL: Improved compatibility of triangular solves for work size 1. · b6cf6eb2

Karl Rupp authored Jan 17, 2016

On CPUs the maximum work group size might be 1, which was not considered
in the previous implementation. This commit removes this restriction.

b6cf6eb2

OpenCL: Clarified error message for 'unknown_error'. · f6754ff3

Karl Rupp authored Jan 17, 2016

This error is usually encountered if the OpenCL SDK or driver is not
installed correctly. However, this was not properly explained in
the error message. Now fixed.

f6754ff3

Travis-CI: Removed no longer needed call to viennacl-info · 37c5f507
Karl Rupp authored Jan 17, 2016
```
Since OpenCL is now disabled, viennacl-info is no longer built.
```
37c5f507

Jan 16, 2016

Travis-CI: Removed OpenCL from tests. · 5b8b6011

Karl Rupp authored Jan 16, 2016

New Ubuntu Trusty machines no longer provide the OpenCL driver from fglrx.
Although this is not perfect, running tests without OpenCL is better
than not running any tests at all...

5b8b6011

Travis-CI: Attempt to fix now missing fglrx package. · c015d284

Karl Rupp authored Jan 16, 2016

Apparently the test machines were upgraded, so the old
fglrx version was no longer available.

c015d284

CMake: Now setting MACOSX_RPATH to avoid dev-warning. · 7e57a5ea
Karl Rupp authored Jan 16, 2016
```
See http://www.kitware.com/blog/home/post/510 for background info.
```
7e57a5ea

Device Database: Improved device detection algorithm. · 107c1da8

Karl Rupp authored Jan 16, 2016

- Added new AMD GPU codenames.
- NVIDIA Fermi GPUs are mapped to names in the database.
- Improved detection of Tesla GPUs.

Resolves #150.

107c1da8

Jan 15, 2016

Merge branch 'karlrupp/feature-reduce-generator-usage' · b9e075c6

Karl Rupp authored Jan 15, 2016

karlrupp/feature-reduce-generator-usage:
  Removes BLAS level 1 and 2 operations from generator, only GEMM remains.
  Reasons: no support for char and short, some unnecessary kernel launches, poor CPU support.
  Fixes #167.

Conflicts:
	viennacl/device_specific/builtin_database/devices/gpu/nvidia/kepler/tesla_k20m.hpp

b9e075c6

Generator: Major cleanup, only matrix-products remain. · 993889c4

Karl Rupp authored Jan 15, 2016

With BLAS level 1 and 2 operations moved out of the generator again,
a lot of code was no longer needed. This commit removes that code.

993889c4

Generator: Removed unused code. · eb2d0d57

Karl Rupp authored Jan 15, 2016

Operations are now pulled out of the generator, so the resulting dead code is removed.

eb2d0d57

OpenCL: Fixed problems observed with Intel OpenCL Runtime 14.2 · d9bed704

Karl Rupp authored Jan 15, 2016

First issue: Internal assertion in the Intel OpenCL vectorizer.
Worked around by refactoring binary element-wise operations.
Second issue: Multiple inner products produced wrong results on CPUs.
Fixed by using the correct number of groups.

d9bed704

Jan 11, 2016
- Typos: Fixed two typos in docs and one in tests. · cf6ebc5d
  Patrick Sanan authored Jan 11, 2016
  
  cf6ebc5d
Oct 20, 2015
- GCC: Fixed unused variable and conversion warnings. · afe92df2
  Karl Rupp authored Oct 20, 2015
  
  afe92df2
Oct 19, 2015
- Merge pull request #164 from Franz-S/request · a72ee2b4
  Karl Rupp authored Oct 19, 2015
```
Optimization of matrix and vector operations in the OpenMP backend
```
  a72ee2b4
Sep 29, 2015
- CG: Fixed incorrect check for absolute tolerance. · 3de1eb05
  Karl Rupp authored Sep 29, 2015
```
The squared norm is used in the computation, so the tolerance needs to be squared as well.
```
  3de1eb05
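
The following sketch shows the corrected check. It assumes that the solver tracks the squared residual norm <r, r>, as CG usually does. Both sides are non-negative, so ||r|| < tol is equivalent to ||r||^2 < tol^2. Comparing ||r||^2 directly against tol, by contrast, accepts residuals that are too large whenever tol < 1. `squared_norm` and `converged_abs` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Squared Euclidean norm <r, r>, as typically tracked inside CG.
double squared_norm(const std::vector<double>& r) {
  double s = 0.0;
  for (double v : r) s += v * v;
  return s;
}

// Absolute-tolerance check on the squared residual norm: ||r|| < abs_tol
// is equivalent to ||r||^2 < abs_tol^2 because both sides are non-negative.
bool converged_abs(double residual_norm_sq, double abs_tol) {
  assert(abs_tol >= 0.0);
  return residual_norm_sq < abs_tol * abs_tol;
}
```

For a residual with ||r|| = 5e-3 and abs_tol = 1e-3, the squared comparison correctly reports no convergence. The unsquared comparison 2.5e-5 < 1e-3 would stop the solver too early.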
Sep 22, 2015

OpenCL: Removed float condition for ternary operator in kernels. · d46bcd95

Karl Rupp authored Sep 22, 2015

Resolves #166. In short, the OpenCL standard prohibits the use
of floating point arguments for expr1 in the ternary operator
 expr1 ? expr2 : expr3

d46bcd95
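
When kernel source is generated as text, one way to respect this rule is to turn a floating-point condition into an explicit comparison, which yields an integer result in OpenCL C. The generator below is a hypothetical sketch of that approach. The actual commit may restructure the kernels differently.

```cpp
#include <cassert>
#include <string>

// Emits OpenCL C source for "cond ? a : b". A float/double condition is not
// allowed as the first operand, so it is rewritten as an explicit comparison
// against zero, which produces an integer result.
std::string ternary_source(const std::string& cond_type, const std::string& cond,
                           const std::string& a, const std::string& b) {
  assert(!cond.empty());
  const bool is_floating = (cond_type == "float" || cond_type == "double");
  const std::string c = is_floating ? "(" + cond + " != 0)" : cond;
  return "(" + c + " ? " + a + " : " + b + ")";
}
```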

Sep 15, 2015
- trans(): changed order of instructions for better clarity · 62519178
  Franz-S authored Sep 15, 2015
  
  62519178
- trans(): Fixed incorrect calculation of the loop boundary · e4fc1599
  Franz-S authored Sep 15, 2015
  
  e4fc1599
- trans(): changed the matrix names to common data_A and data_B · fabc16a7
  Franz-S authored Sep 14, 2015
  
  fabc16a7
- copy_vec(): Fixed a bug and reformatted the code · 6a145798
  Franz-S authored Sep 14, 2015
```
The boundary for the iteration over columns was changed from A_size1 to A_size2
```
  6a145798
- OpenMP: Introduced a minimum matrix size for matrix operations. · 54c4f5fd
  Franz-S authored Sep 14, 2015
  
  54c4f5fd
- OpenMP: Parallelized scaled_rank_1_update() · 93135ff3
  Franz-S authored Sep 08, 2015
```
Increases performance on my laptop due to better memory bandwidth.
Corrects a bug when data_alpha is an integer and reciprocal_alpha=TRUE.
```
  93135ff3
- OpenMP: Parallelized prod_impl() · fa445fbd
  Franz-S authored Sep 10, 2015
```
Increases performance by 2x due to better memory access.
```
  fa445fbd
- OpenMP: Improved performance of the trans() function · 324d3090
  Franz-S authored Sep 09, 2015
```
The matrix is divided into sub-matrices for better memory access.
This increases the performance by 3x on my laptop.
```
  324d3090
- sum_impl(): Fixed data type of the for-loop index · 49fc6755
  Franz-S authored Sep 08, 2015
```
OpenMP does not support vcl_size_t as a for-loop index, so migrating to
long was required.
```
  49fc6755
- OpenMP: Parallelized index_norm_inf() · 886b2267
  Franz-S authored Sep 02, 2015
```
Increases performance by 2x on my laptop due to better memory bandwidth.
```
  886b2267
- OpenMP: Parallelized sum_impl() · e5776334
  Franz-S authored Aug 30, 2015
```
Increases performance by 2x on my laptop due to better memory bandwidth.
Corrects a mistake in the description.
```
  e5776334
- OpenMP: Parallelized max_impl() · d08567ed
  Franz-S authored Aug 21, 2015
```
Increases performance by 2x on my laptop due to better memory bandwidth.
```
  d08567ed
- OpenMP: Parallelized min_impl() · b3df531e
  Franz-S authored Aug 21, 2015
```
Increases performance by 2x on my laptop due to better memory bandwidth.
Corrects a mistake in the description.
```
  b3df531e
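
The sub-matrix approach from 324d3090 can be sketched as a tiled transpose. Working on block x block tiles keeps both the reads from A and the writes to B within a small, cache-friendly region. This is a hypothetical stand-alone function on row-major `std::vector<double>` storage, not ViennaCL's trans(). The `#pragma omp` line is ignored when compiling without OpenMP support.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Transposes a row-major rows x cols matrix A into B (cols x rows), tile by tile.
// Different threads handle different row tiles, so they write disjoint parts of B.
void blocked_transpose(const std::vector<double>& A, std::vector<double>& B,
                       long rows, long cols, long block = 64) {
  assert(rows >= 0 && cols >= 0 && block > 0);
  assert(A.size() == static_cast<std::size_t>(rows * cols));
  B.resize(static_cast<std::size_t>(rows * cols));
#pragma omp parallel for
  for (long ib = 0; ib < rows; ib += block)
    for (long jb = 0; jb < cols; jb += block)
      for (long i = ib; i < std::min(ib + block, rows); ++i)
        for (long j = jb; j < std::min(jb + block, cols); ++j)
          B[j * rows + i] = A[i * cols + j];
}
```

The inner column loop is bounded by the column count `cols`. This is the same kind of row/column boundary that 6a145798 corrects in copy_vec().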
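
Background for the signed loop index in 49fc6755: OpenMP versions before 3.0, such as the OpenMP 2.0 support in MSVC, only accept signed integer loop variables in a `parallel for`. This rules out the unsigned `vcl_size_t`. Below is a minimal sketch of a parallel sum in that style. `parallel_sum` is a hypothetical name, not ViennaCL's sum_impl().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Parallel sum using a signed long loop index, as required by OpenMP < 3.0.
// The reduction clause gives each thread a private partial sum that is
// added up at the end; without OpenMP the pragma is ignored.
double parallel_sum(const std::vector<double>& x) {
  const long n = static_cast<long>(x.size());
  assert(n >= 0);
  double result = 0.0;
#pragma omp parallel for reduction(+ : result)
  for (long i = 0; i < n; ++i)
    result += x[static_cast<std::size_t>(i)];
  return result;
}
```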
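
On the integer issue mentioned in 93135ff3: with `reciprocal_alpha` set, precomputing `1 / alpha` in an integer type truncates to 0 whenever |alpha| > 1. The sketch below divides each product by alpha instead. This is one possible fix, not necessarily the one in the commit. `scaled_rank_1_update_sketch` is a hypothetical name with a simplified argument list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A += alpha * x * y^T for a row-major rows x cols matrix, where alpha is
// replaced by 1/alpha if reciprocal_alpha is set. Dividing each product by
// alpha avoids computing T(1) / alpha, which is 0 for integer T and |alpha| > 1.
template <typename T>
void scaled_rank_1_update_sketch(std::vector<T>& A, long rows, long cols, T alpha,
                                 bool reciprocal_alpha,
                                 const std::vector<T>& x, const std::vector<T>& y) {
  assert(A.size() == static_cast<std::size_t>(rows * cols));
  assert(x.size() == static_cast<std::size_t>(rows) &&
         y.size() == static_cast<std::size_t>(cols));
  assert(!reciprocal_alpha || alpha != T(0));
#pragma omp parallel for
  for (long i = 0; i < rows; ++i)
    for (long j = 0; j < cols; ++j) {
      const T prod = x[i] * y[j];
      A[i * cols + j] += reciprocal_alpha ? prod / alpha : prod * alpha;
    }
}
```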
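
index_norm_inf() (886b2267) returns the index of the entry with the largest absolute value, in the spirit of BLAS `i_amax`. A common OpenMP pattern for such an arg-max reduction keeps a private best candidate per thread and merges the candidates in a critical section. The sketch below follows that pattern and resolves ties toward the smallest index. `parallel_index_norm_inf` is a hypothetical name, not ViennaCL's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Index of the entry with the largest absolute value (first such index on ties).
// Each thread scans its share of the loop and keeps a private candidate; the
// candidates are merged in a critical section. Without OpenMP this runs serially.
std::size_t parallel_index_norm_inf(const std::vector<double>& x) {
  assert(!x.empty());
  const long n = static_cast<long>(x.size());
  long best_index = 0;
  double best_value = std::fabs(x[0]);
#pragma omp parallel
  {
    long local_index = 0;
    double local_value = std::fabs(x[0]);
#pragma omp for nowait
    for (long i = 0; i < n; ++i) {
      const double v = std::fabs(x[static_cast<std::size_t>(i)]);
      if (v > local_value) { local_value = v; local_index = i; }
    }
#pragma omp critical
    {
      if (local_value > best_value ||
          (local_value == best_value && local_index < best_index)) {
        best_value = local_value;
        best_index = local_index;
      }
    }
  }
  return static_cast<std::size_t>(best_index);
}
```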