Speed up kernel comparison / loading from cache
Things working:
Things left to do:
- Instructions should not have to be analyzed at all along the fast path from "loading from cache" to "execute".
  - Cache generation of invokers (invoker generation is itself a non-trivial latency penalty). Invoker generation looks at the instructions and so prevents us from being fully lazy.
  - Kernel.copy() iterates through all instructions; add an option to disable this.
- Full use of lazy data structures
  - Add code to generate eq keys and persistent hash keys for instructions (among other things, this has to handle pymbolic expressions and normalize the order of sets).
  - Use LazilyUnpicklingList for the list of instructions.
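A minimal sketch of the cache-to-execute fast path, assuming a kernel that carries a precomputed cache key and keeps its instructions pickled until something actually reads them. `KernelSketch`, `generate_invoker`, and `get_invoker` are hypothetical names, not loopy API, and the in-memory dict stands in for whatever (possibly persistent) invoker cache would really be used. The counter only exists to show that a cache hit never unpickles the instructions.

```python
import pickle


class KernelSketch:
    """Hypothetical kernel: instructions stay pickled until first accessed."""

    def __init__(self, name, instructions, cache_key):
        self.name = name
        # Assumed to be computed once (e.g. from a persistent hash), so
        # cache lookups never need the instructions themselves.
        self.cache_key = cache_key
        self._pickled_instructions = pickle.dumps(instructions)
        self._instructions = None
        self.unpickle_count = 0

    @property
    def instructions(self):
        if self._instructions is None:
            self._instructions = pickle.loads(self._pickled_instructions)
            self.unpickle_count += 1
        return self._instructions


def generate_invoker(kernel):
    """Stand-in for invoker generation, which has to read the instructions."""
    name = kernel.name
    n_insns = len(kernel.instructions)  # forces the instructions to load

    def invoker(*args):
        return (name, n_insns, args)

    return invoker


_invoker_cache = {}


def get_invoker(kernel):
    """Return a cached invoker; a hit only looks at kernel.cache_key."""
    invoker = _invoker_cache.get(kernel.cache_key)
    if invoker is None:
        invoker = generate_invoker(kernel)
        _invoker_cache[kernel.cache_key] = invoker
    return invoker
```

A second kernel with the same key, for example one just loaded from the cache, gets the stored invoker without its instructions ever being unpickled. This relies on equal keys really meaning equivalent kernels, which is why the key generation item matters.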
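For the Kernel.copy() item, one possible shape for the opt-out is a keyword that skips the per-instruction pass. The class `KernelWithCopy`, the flag name `check_instructions`, and the type check it gates are all assumptions made for illustration; the issue does not say what the loop in copy() currently does or what the option should be called.

```python
class KernelWithCopy:
    """Hypothetical kernel whose copy() can skip its per-instruction pass."""

    _copyable = ("name", "instructions")

    def __init__(self, name, instructions):
        self.name = name
        self.instructions = instructions
        self.instructions_visited = 0

    def _check_instructions(self):
        # Stand-in for whatever per-instruction work copy() does. With
        # lazily loaded instructions, this loop alone would force every
        # instruction to be unpickled.
        for insn in self.instructions:
            self.instructions_visited += 1
            if not isinstance(insn, str):
                raise TypeError(f"unexpected instruction: {insn!r}")

    def copy(self, check_instructions=True, **kwargs):
        for attr in kwargs:
            if attr not in self._copyable:
                raise TypeError(f"unknown attribute: {attr}")
        new = KernelWithCopy(
            kwargs.get("name", self.name),
            kwargs.get("instructions", self.instructions))
        if check_instructions:
            new._check_instructions()
        return new
```

Defaulting the flag to True keeps existing callers unchanged; only the fast path would pass `check_instructions=False`.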
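A minimal sketch of an eq key and a persistent hash key for an instruction, assuming a simplified instruction whose expression is already a string; real pymbolic expressions would need a structural walk rather than a string. `Instruction`, `eq_key`, and `persistent_hash_key` are hypothetical names. The part it illustrates is set normalization: the iteration order of a set of strings can change between interpreter runs because string hashing is randomized per process, so sets are sorted before they go into the key.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instruction:
    """Hypothetical stand-in for a kernel instruction (not loopy's class)."""
    id: str
    expression: str  # assumption: expression already rendered as a string
    depends_on: frozenset = field(default_factory=frozenset)
    within_inames: frozenset = field(default_factory=frozenset)

    def eq_key(self):
        # Sets become sorted tuples so the key does not depend on set
        # iteration order, which can differ between processes.
        return (
            type(self).__name__,
            self.id,
            self.expression,
            tuple(sorted(self.depends_on)),
            tuple(sorted(self.within_inames)),
        )

    def persistent_hash_key(self):
        # repr() of nested tuples of str is deterministic, so this digest
        # is stable across interpreter runs.
        return hashlib.sha256(repr(self.eq_key()).encode("utf-8")).hexdigest()
```

Hashing the `repr()` of the eq key is just one convenient choice for this sketch; any canonical serialization of the normalized key would do.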
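To illustrate the idea behind a lazily unpickling instruction list, here is a self-contained sketch. It is not the LazilyUnpicklingList class the issue refers to; `LazyUnpicklingListSketch` and `num_loaded` are hypothetical names. Each entry is pickled on its own, and after unpickling the list, an entry is only unpickled when it is first indexed.

```python
import pickle
from collections.abc import MutableSequence


class _PickledItem:
    """Holds the pickled bytes of one entry that has not been loaded yet."""

    __slots__ = ("data",)

    def __init__(self, data):
        self.data = data


class LazyUnpicklingListSketch(MutableSequence):
    """Hypothetical list that unpickles each entry only on first access."""

    def __init__(self, items=()):
        self._items = list(items)

    def _load(self, index):
        item = self._items[index]
        if isinstance(item, _PickledItem):
            item = pickle.loads(item.data)
            self._items[index] = item  # keep the loaded object
        return item

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self._load(i) for i in range(*index.indices(len(self._items)))]
        return self._load(index)

    def __setitem__(self, index, value):
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, value)

    def num_loaded(self):
        return sum(not isinstance(item, _PickledItem) for item in self._items)

    def __getstate__(self):
        # Entries never loaded are passed through as raw bytes. A dict is
        # returned because a falsy state (e.g. an empty list) would make
        # pickle skip __setstate__ on load.
        return {"pickled_items": [
            item.data if isinstance(item, _PickledItem) else pickle.dumps(item)
            for item in self._items
        ]}

    def __setstate__(self, state):
        self._items = [_PickledItem(data) for data in state["pickled_items"]]
```

`len()` does not trigger any unpickling, and re-pickling a partially loaded list does not force the untouched entries to load.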
cc: @inducer
See also: pytential#38