- Nov 07, 2014
- Nov 06, 2014
-
-
Karl Rupp authored
Derived from a Radeon HD 6850 with a tuning run.
-
Karl Rupp authored
Obtained on a Radeon HD 5450. Very low-end GPU, profile aims at compatibility rather than performance.
-
Karl Rupp authored
-
Karl Rupp authored
Flags used: -Wall -Wextra -Weverything -pedantic -Werror -Wno-exit-time-destructors -Wno-global-constructors -Wno-padded -Wno-weak-vtables
-
Karl Rupp authored
-
Karl Rupp authored
Only requires four kernels per iteration, which is much better than the Householder version. Implementation follows Algorithm 2.1 in Walker, Zhou: "A Simpler GMRES" (1994)
-
Karl Rupp authored
-
Karl Rupp authored
-
Karl Rupp authored
-
- Nov 05, 2014
-
-
Toby Smithe authored
-
Karl Rupp authored
Also includes a new example showing the use case. Resolves #69. Reported-by: Pushkar Ratnalikar via viennacl-devel
-
Karl Rupp authored
Might have gotten lost during refactoring?
-
Karl Rupp authored
-
Karl Rupp authored
Vector types lead to compilation issues on NVIDIA GPUs with abs(), since x = abs(y) does not compile due to incompatible vector types.
-
Karl Rupp authored
-
Karl Rupp authored
-
Karl Rupp authored
-
Karl Rupp authored
Discussion here: https://github.com/viennacl/viennacl-dev/issues/106
-
- Nov 04, 2014
-
-
Karl Rupp authored
No more collisions with GMRES. Resolves #61
-
Karl Rupp authored
Resolves #49
-
Karl Rupp authored
-
Karl Rupp authored
Either vector_base/matrix_base always create shallow copies in their copy-CTOR, or they create deep copies in their copy-CTOR. The side-effects of shallow copies can be corrupted data, leading to wrong results. By contrast, deep copies are likely to have only poor performance as a side-effect, which is not as bad. The only use for shallow copies is within proxy objects, which now have suitable overloads for their copy-CTOR. Resolves #60
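To illustrate the trade-off described above, here is a minimal sketch of shallow versus deep copy semantics. `ShallowBuffer` and `DeepBuffer` are hypothetical stand-ins invented for this illustration; they are not ViennaCL's vector_base/matrix_base, and the real classes manage device memory handles rather than a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical type: the implicit copy constructor copies only the pointer,
// so the copy and the original share one underlying buffer (shallow copy).
struct ShallowBuffer {
  std::shared_ptr<std::vector<double>> data;
  explicit ShallowBuffer(std::size_t n)
      : data(std::make_shared<std::vector<double>>(n, 0.0)) {}
};

// Hypothetical type: the implicit copy constructor duplicates the
// std::vector, so the copy owns its own storage (deep copy).
struct DeepBuffer {
  std::vector<double> data;
  explicit DeepBuffer(std::size_t n) : data(n, 0.0) {}
};

// Writes through a shallow copy and returns the ORIGINAL's first entry.
// The write is visible in the original: silent data corruption risk.
inline double write_through_shallow_copy() {
  ShallowBuffer original(4);
  ShallowBuffer copy = original;   // shares the buffer
  (*copy.data)[0] = 42.0;
  return (*original.data)[0];
}

// Writes through a deep copy and returns the ORIGINAL's first entry.
// The original is untouched; the cost is the extra allocation and copy.
inline double write_through_deep_copy() {
  DeepBuffer original(4);
  DeepBuffer copy = original;      // duplicates the buffer
  copy.data[0] = 42.0;
  return original.data[0];
}
```

With the shallow type the original observes the write made through the copy, whereas with the deep type it does not. This matches the reasoning in the commit message: a deep copy costs performance, a shallow copy can cost correctness.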
-
Karl Rupp authored
Now avoiding unnecessary temporary buffers.
-
Karl Rupp authored
FFT: Rewrote to fix VS2013 compilation errors
-
Karl Rupp authored
-
- Nov 03, 2014
-
-
Matthew Musto authored
Assuming the complex numbers are in Cartesian form, this should be functionally identical to the prior function.
-
Karl Rupp authored
Use of 'uint' is rejected by some compilers, e.g. Visual Studio.
-
Karl Rupp authored
-
Karl Rupp authored
Never include this without a separate switch for Apple systems.
-
Karl Rupp authored
-
Karl Rupp authored
Resolves #68. Resolves #76.
-
Karl Rupp authored
-
Karl Rupp authored
v1 -= pow(v1, v2); might be poorly conditioned if v1 is close to 1.0. This fix lifts the values of v1 to be at least 1.1, hopefully fixing the repeated issues seen in the nightly tests in the past.
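One way to see why values near 1.0 are problematic: for f(x) = x - x^y, the relative condition number with respect to x is |x (1 - y x^(y-1)) / (x - x^y)|, and the denominator goes to zero as x approaches 1. The sketch below evaluates that expression. The function name `cond_x_minus_pow` is made up for this illustration and does not appear in the test suite; the actual tolerance checks in the nightly tests are not reproduced here.

```cpp
#include <cassert>
#include <cmath>

// Relative condition number of f(x) = x - pow(x, y) with respect to x:
//   kappa(x) = | x * f'(x) / f(x) |,  where f'(x) = 1 - y * pow(x, y - 1).
// Large values mean small relative errors in x are strongly amplified.
// Assumes x > 0 and f(x) != 0 (e.g. y != 1 and x != 1).
inline double cond_x_minus_pow(double x, double y) {
  const double f  = x - std::pow(x, y);
  const double df = 1.0 - y * std::pow(x, y - 1.0);
  return std::fabs(x * df / f);
}
```

For y = 2, this gives a condition number of about 12 at x = 1.1 but about 10^4 at x = 1.0001, which is consistent with lifting v1 away from 1.0 to make the test less sensitive to rounding differences between backends.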
-
Philippe Tillet authored
-
- Nov 02, 2014
-
-
Karl Rupp authored
Execution times on CPU are otherwise excessive. Provide runtime flag for switching sizes later.
-
Karl Rupp authored
About a factor of 20 faster than previous implementation. I estimate that more microtuning can get another factor of 2. Higher performance gains will most likely require intrinsics.
-
Karl Rupp authored
-