- Jan 20, 2016
-
-
Karl Rupp authored
Early versions of Mac OS had problems with static objects in the background. Thus, reference decrements had been disabled. However, this is problematic for memory handles, because it implies that memory is never freed. This is certainly not a feature, hence reference decrements are reenabled for memory buffers.
-
Karl Rupp authored
Starting release tests.
-
Karl Rupp authored
-
- Jan 19, 2016
-
-
Karl Rupp authored
-
- Jan 18, 2016
-
-
Karl Rupp authored
Resolves #149. Works for dense matrices. Sparse matrices are intentionally not part of this commit, because sparse matrix transposition is something one better avoids whenever possible.
-
Karl Rupp authored
Now using two sets of SpMV kernels:
  - One for y = Ax for performance reasons
  - One for y = alpha * Ax + beta * y for in-place operations.
Previous use of temporaries for in-place operations was too slow and resulted in unnecessary memory overhead.
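The two kernel variants named in this commit message can be sketched on the host in plain C++. This is only an illustration of the idea, not ViennaCL's kernel code: the `CsrMatrix` struct and the function names `spmv` and `spmv_scaled` are hypothetical, and a CSR storage layout is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR matrix, not ViennaCL's compressed_matrix.
struct CsrMatrix {
  std::size_t rows;
  std::vector<std::size_t> row_ptr;  // rows + 1 entries
  std::vector<std::size_t> col_idx;  // column index of each nonzero
  std::vector<double> values;        // value of each nonzero
};

// Variant 1: y = A * x. No scaling and no read of the old y.
void spmv(const CsrMatrix& A, const std::vector<double>& x,
          std::vector<double>& y) {
  assert(A.row_ptr.size() == A.rows + 1 && y.size() == A.rows);
  for (std::size_t i = 0; i < A.rows; ++i) {
    double sum = 0.0;
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      sum += A.values[k] * x[A.col_idx[k]];
    y[i] = sum;
  }
}

// Variant 2: y = alpha * A * x + beta * y, updating y directly
// instead of writing A * x to a temporary vector first.
void spmv_scaled(const CsrMatrix& A, const std::vector<double>& x,
                 std::vector<double>& y, double alpha, double beta) {
  assert(A.row_ptr.size() == A.rows + 1 && y.size() == A.rows);
  for (std::size_t i = 0; i < A.rows; ++i) {
    double sum = 0.0;
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      sum += A.values[k] * x[A.col_idx[k]];
    y[i] = alpha * sum + beta * y[i];
  }
}
```

Each row reads its old `y[i]` exactly once, so the second variant needs no temporary, which is the overhead the commit message refers to.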
-
Karl Rupp authored
Required for some OpenCL SDKs on CPUs.
-
Karl Rupp authored
This resulted in errors on Mac OS X 10.6.8.
-
Karl Rupp authored
Work group size 1 is encountered on some older implementations for CPUs as well as on current mobile hardware.
-
- Jan 17, 2016
-
-
Karl Rupp authored
On CPUs the maximum work group size might be 1, which was not considered in the previous implementation. This commit removes this restriction.
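One way to avoid assuming a minimum work group size is to clamp the preferred size to whatever the device reports (in OpenCL this limit is available through `clGetDeviceInfo` with `CL_DEVICE_MAX_WORK_GROUP_SIZE`). The sketch below shows only the arithmetic. The helper names `pick_local_size` and `pick_global_size` are hypothetical and do not come from the ViennaCL sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Clamp the preferred work group size to the device limit.
// device_max may legitimately be 1 on some CPU and mobile implementations.
std::size_t pick_local_size(std::size_t preferred, std::size_t device_max) {
  assert(preferred > 0 && device_max > 0);
  return std::min(preferred, device_max);
}

// Round the problem size up to a multiple of the local size, which is
// what OpenCL 1.x expects for the global work size.
std::size_t pick_global_size(std::size_t n, std::size_t local_size) {
  assert(local_size > 0);
  return ((n + local_size - 1) / local_size) * local_size;
}
```

With a device limit of 1, every work item forms its own group, so kernels that reduce within a work group must also produce correct results for that case.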
-
Karl Rupp authored
This error is usually encountered if the OpenCL SDK or driver is not installed correctly. However, this was not properly explained in the error message. Now fixed.
-
Karl Rupp authored
Since OpenCL is now disabled, viennacl-info is no longer built.
-
- Jan 16, 2016
-
-
Karl Rupp authored
New Ubuntu Trusty machines no longer provide the OpenCL driver from fglrx. Although this is not perfect, running tests without OpenCL is better than not running any tests at all...
-
Karl Rupp authored
Apparently the test machines experienced an upgrade, so the old fglrx version was no longer available.
-
Karl Rupp authored
See http://www.kitware.com/blog/home/post/510 for background info.
-
Karl Rupp authored
  - Added new AMD GPU codenames.
  - NVIDIA Fermi GPUs are mapped to names in the database.
Improved detection of Tesla GPUs. Resolves #150.
-
- Jan 15, 2016
-
-
Karl Rupp authored
karlrupp/feature-reduce-generator-usage: Removes BLAS level 1 and 2 operations from generator, only GEMM remains.
Reasons: no support for char and short, some unnecessary kernel launches, poor CPU support.
Fixes #167.
Conflicts: viennacl/device_specific/builtin_database/devices/gpu/nvidia/kepler/tesla_k20m.hpp
-
Karl Rupp authored
With BLAS level 1 and 2 operations moved out of the generator again, a lot of code was no longer needed. This commit removes that dead code.
-
Karl Rupp authored
Operations are now pulled out of the generator, so dead code is no longer needed.
-
Karl Rupp authored
First issue: Internal assertion in Intel OpenCL vectorizer. Workaround by refactoring binary element-wise operations. Second issue: Multiple inner products produced wrong results with CPUs. Fixed by using correct number of groups.
-
- Jan 11, 2016
-
-
Patrick Sanan authored
-
- Oct 20, 2015
-
-
Karl Rupp authored
-
- Oct 19, 2015
-
-
Karl Rupp authored
Optimization of matrix and vector operations in the OpenMP backend
-
- Sep 29, 2015
-
-
Karl Rupp authored
Squared norm was used in computation, so we need to square the tolerance.
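The reasoning behind this fix can be shown with a small convergence check. This is a minimal sketch, assuming a residual vector and a tolerance on its 2-norm. The function name `converged` is hypothetical and not taken from the ViennaCL solver code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if ||r||_2 <= tol, using the squared norm to avoid a sqrt.
// Because the left-hand side is squared, the tolerance is squared as well.
bool converged(const std::vector<double>& r, double tol) {
  assert(tol >= 0.0);
  double norm_sq = 0.0;
  for (std::size_t i = 0; i < r.size(); ++i)
    norm_sq += r[i] * r[i];
  return norm_sq <= tol * tol;
}
```

For a residual norm of 0.5 and a tolerance of 0.4, comparing the squared norm 0.25 against the unsquared tolerance would wrongly report convergence, whereas the comparison against 0.16 does not.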
-
- Sep 22, 2015
-
-
Karl Rupp authored
Resolves #166. In short, the OpenCL standard prohibits the use of floating point arguments for expr1 in the ternary operator expr1 ? expr2 : expr3.
-
- Sep 15, 2015
-
-
Franz-S authored
-
Franz-S authored
-
Franz-S authored
-
Franz-S authored
The boundary for the iteration over columns was changed from A_size1 to A_size2.
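The kind of bug described here only shows up for non-square matrices. The sketch below assumes a dense row-major layout in which `A_size1` is the number of rows and `A_size2` the number of columns. The function `column_sums` is a hypothetical example and not the routine changed by this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums each column of a dense row-major matrix with A_size1 rows
// and A_size2 columns.
std::vector<double> column_sums(const std::vector<double>& A,
                                std::size_t A_size1, std::size_t A_size2) {
  assert(A.size() == A_size1 * A_size2);
  std::vector<double> sums(A_size2, 0.0);
  for (std::size_t row = 0; row < A_size1; ++row)
    // The column loop is bounded by A_size2; bounding it by A_size1
    // would give correct results only for square matrices.
    for (std::size_t col = 0; col < A_size2; ++col)
      sums[col] += A[row * A_size2 + col];
  return sums;
}
```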
-
Franz-S authored
-
Franz-S authored
Increases performance on my laptop due to better memory bandwidth. Corrects a bug that occurs if data_alpha is an integer and reciprocal_alpha=TRUE.
-
Franz-S authored
Increases performance by 2x due to better memory access.
-
Franz-S authored
The matrix will be divided into sub-matrices for better storage access. This increases the performance by 3x on my laptop.
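The commit message does not name the operation that was blocked. As an illustration of dividing a matrix into sub-matrices for better memory access, the following sketch applies the idea to a dense row-major transpose. The function name `transpose_blocked` and the default block size of 32 are assumptions chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Transposes a dense row-major rows x cols matrix tile by tile, so that
// reads and writes within one tile stay close together in memory.
std::vector<double> transpose_blocked(const std::vector<double>& A,
                                      std::size_t rows, std::size_t cols,
                                      std::size_t block = 32) {
  assert(A.size() == rows * cols && block > 0);
  std::vector<double> T(rows * cols);
  for (std::size_t ib = 0; ib < rows; ib += block) {
    for (std::size_t jb = 0; jb < cols; jb += block) {
      // Edge tiles may be smaller than the block size.
      const std::size_t i_end = std::min(ib + block, rows);
      const std::size_t j_end = std::min(jb + block, cols);
      for (std::size_t i = ib; i < i_end; ++i)
        for (std::size_t j = jb; j < j_end; ++j)
          T[j * rows + i] = A[i * cols + j];
    }
  }
  return T;
}
```

The best block size depends on the cache hierarchy of the machine, so any speedup would have to be measured rather than assumed.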
-
Franz-S authored
OpenMP does not support vcl_size_t as a loop index in for-loops, so a migration to long was required.
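Older OpenMP versions (before 3.0) require a signed integer loop variable in a parallel for-loop, which rules out unsigned types such as `std::size_t`. A minimal sketch of the resulting pattern follows. The function `scale` is a hypothetical example, and the `#ifdef _OPENMP` guard is only there so the snippet also builds without OpenMP enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scales a vector in place, using a signed loop index so that the loop is
// accepted by OpenMP implementations that only allow signed loop variables.
void scale(std::vector<double>& v, double alpha) {
  const long n = static_cast<long>(v.size());
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (long i = 0; i < n; ++i)
    v[static_cast<std::size_t>(i)] *= alpha;
}
```

The cast back to `std::size_t` for indexing is safe because `i` is never negative inside the loop.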
-
Franz-S authored
Increases performance by 2x on my laptop due to better memory bandwidth.
-
Franz-S authored
Increases performance by 2x on my laptop due to better memory bandwidth. Corrects a mistake in the description.
-
Franz-S authored
Increases performance by 2x on my laptop due to better memory bandwidth.
-
Franz-S authored
Increases performance by 2x on my laptop due to better memory bandwidth. Corrects a mistake in the description.
-