Commit e8a6e5b3 authored by Karl Rupp

* Reimplementation of LU factorization in viennacl/linalg/lu.hpp. Better performance, but still a lot of unused potential.
* Replaced the slow generic CUDA matrix-matrix multiplication kernel with several semi-automatically generated kernels. Performance is still only half that of OpenCL, although the code is virtually identical.
* Fixed a bug in C = prod(A, B) when C is a matrix_range or matrix_slice: an unnecessary temporary was introduced.
* CUDA-benchmarks now build correctly
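
For orientation on the first item, here is a minimal, self-contained C++ sketch of an in-place LU factorization without pivoting. It is not the code from viennacl/linalg/lu.hpp: the function name lu_factorize_inplace and the row-major std::vector storage are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ViennaCL): in-place LU factorization
// without pivoting of a dense, row-major n x n matrix.
// On return, the strict lower triangle holds L (unit diagonal implied)
// and the upper triangle, including the diagonal, holds U.
// Assumes every pivot A[k][k] is nonzero.
void lu_factorize_inplace(std::vector<double>& A, std::size_t n)
{
  assert(A.size() == n * n);
  for (std::size_t k = 0; k < n; ++k)
  {
    for (std::size_t i = k + 1; i < n; ++i)
    {
      // Multiplier l_ik, stored where the eliminated entry used to be.
      A[i * n + k] /= A[k * n + k];
      // Update the remaining entries of row i (Schur complement).
      for (std::size_t j = k + 1; j < n; ++j)
        A[i * n + j] -= A[i * n + k] * A[k * n + j];
    }
  }
}
```

The triple loop is the plain textbook form; it leaves the blocking and parallelization that a tuned implementation would need entirely aside.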
parent 68ec5e72