Reduce cache-to-execution latency by using lazy unpickling of instructions. (!242) · Merge requests · Andreas Klöckner / loopy

In order to this, invoker generation needs to be cached.

Performance impact on a warm-cache run of test_fmm.py from sumpy

Before

real 43.77
user 44.58
sys 2.96

After

real 37.08
user 37.73
sys 3.02

Edited Mar 16, 2018 by Matt Wala

Reduce cache-to-execution latency by using lazy unpickling of instructions.