- Aug 03, 2023
- Jul 30, 2023
- Jul 28, 2023
- Jul 25, 2023
This avoids long-lived references to CL kernels held by loopy caches
- Jul 19, 2023
Andreas Klöckner authored
- Jun 02, 2023
- May 29, 2023
gives a 40% performance boost in CUDA
- May 25, 2023
- May 23, 2023
Isuru Fernando authored
- May 21, 2023
Isuru Fernando authored
- May 17, 2023
Isuru Fernando authored
* Use pytential branch
* Refactor E2P
* try new loopy branch
* fix formatting
* disable domains check
* register only if not found
* Move kernel_scaling to the outer kernel
* Refactor P2E
* Use loopy main
* re-enable implemented domains check
* Rename some loopy kernel handling functions

---------

Co-authored-by: Andreas Kloeckner <inform@tiker.net>
- May 02, 2023
- Apr 30, 2023
- Apr 29, 2023
- Apr 25, 2023
- Apr 22, 2023
Isuru Fernando authored
* Move derivative taker to a separate file
* Add fold markers
* fix docs
- Apr 06, 2023
- Mar 30, 2023
- Feb 17, 2023
Isuru Fernando authored
- Feb 14, 2023
- Jan 20, 2023
- Jan 10, 2023
- Jan 06, 2023
Isuru Fernando authored
* Merge local_isrc and local_isrc_strength and tag as vec for coalesced access
* Use new name
* Add an explanation about the optimization
* Back to loopy main with renamed transform

Co-authored-by: Andreas Kloeckner <inform@tiker.net>
- Jan 05, 2023
Isuru Fernando authored
* Optimize M2L for GPU
* Move icoeff_tgt to top level iname
* Fix substitution
* use loopy branch
* remove unused imports
* go back to loopy main
* Reduce diff
* move all optimizations to m2l_translation
* Remove extraneous FIXME

Co-authored-by: Andreas Klöckner <inform@tiker.net>
- Nov 23, 2022
Andreas Klöckner authored
-
Andreas Klöckner authored
-
Andreas Klöckner authored
- Nov 06, 2022