get_typed_and_scheduled_kernel() runs type inference even when the result is already cached on disk
Although KernelExecutorBase.get_typed_and_scheduled_kernel() is memoized, it still runs type inference for kernels that are already cached, on this line: https://gitlab.tiker.net/inducer/loopy/blob/ee8783cbc0b1b962d32b8a387b274ea6cbad615b/loopy/execution.py#L164
@inducer, I think this is why we are seeing calls to loopy's type inference in the warm-cache FMM runs.
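
To illustrate the kind of ordering problem I suspect, here is a minimal Python sketch. It is not loopy's code: `disk_cache`, `stats`, `infer_types`, `get_typed_kernel_eager`, and `get_typed_kernel_lazy` are all hypothetical stand-ins, and it assumes the cost comes from running inference before the cache lookup rather than from something else on that line.

```python
# Hypothetical stand-ins only; none of these names are loopy's actual API.
disk_cache = {}                  # stands in for a persistent on-disk cache
stats = {"inference_calls": 0}   # counts how often the expensive step runs


def infer_types(kernel, arg_to_dtype):
    """Placeholder for an expensive type inference pass."""
    stats["inference_calls"] += 1
    return (kernel, "typed", tuple(sorted(arg_to_dtype.items())))


def make_key(kernel, arg_to_dtype):
    """Build a cache key from inputs that are cheap to obtain."""
    return (kernel, tuple(sorted(arg_to_dtype.items())))


def get_typed_kernel_eager(kernel, arg_to_dtype):
    """Assumed problematic ordering: inference runs before the lookup."""
    typed = infer_types(kernel, arg_to_dtype)  # paid even on a cache hit
    key = make_key(kernel, arg_to_dtype)
    if key in disk_cache:
        return disk_cache[key]
    disk_cache[key] = typed
    return typed


def get_typed_kernel_lazy(kernel, arg_to_dtype):
    """Alternative ordering: look up first, infer only on a miss."""
    key = make_key(kernel, arg_to_dtype)
    if key in disk_cache:
        return disk_cache[key]
    typed = infer_types(kernel, arg_to_dtype)
    disk_cache[key] = typed
    return typed
```

With a warm cache, the eager variant still increments the inference counter on every call, while the lazy variant does not. Whether the lazy ordering is possible in loopy depends on whether the cache key can be built without the inferred types, which I have not checked.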