Making a kernel is slow
Hi,
It seems that making a kernel is quite slow in loopy when there are many instructions. Here I have the so-called Holzapfel kernel generated by firedrake/tsfc, which has 8k instructions. lp.make_kernel() takes around 55 seconds on my machine. Profiling seems to indicate that most of the time is spent in instruction.uniquify_instruction_ids and creation.resolve_dependencies (which calls fnmatch 64 million times in the double loop over the instructions).
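To illustrate where the 64 million calls come from (8k patterns tested against 8k instruction ids), here is a minimal sketch of one possible mitigation. This is not loopy's actual code: `resolve_naive`, `resolve_fast`, and the `patterns` mapping are hypothetical names made up for this example, and it assumes dependencies are given as glob patterns over instruction ids. The idea is to only fall back to `fnmatchcase` when a pattern really contains a wildcard, and to use a set lookup otherwise.

```python
from fnmatch import fnmatchcase

# Characters that make fnmatch treat a pattern as a glob rather than a literal.
WILDCARDS = frozenset("*?[")


def resolve_naive(ids, patterns):
    """Hypothetical baseline: test every pattern against every id (quadratic)."""
    return {
        owner: {i for pat in pats for i in ids if fnmatchcase(i, pat)}
        for owner, pats in patterns.items()
    }


def resolve_fast(ids, patterns):
    """Hypothetical variant: exact ids use a set lookup, globs still scan."""
    id_set = set(ids)
    resolved = {}
    for owner, pats in patterns.items():
        deps = set()
        for pat in pats:
            if WILDCARDS.isdisjoint(pat):
                # No wildcard characters: the pattern can only match itself.
                if pat in id_set:
                    deps.add(pat)
            else:
                # Genuine glob: scan all ids, as in the baseline.
                deps.update(i for i in ids if fnmatchcase(i, pat))
        resolved[owner] = deps
    return resolved
```

With seq_dependencies=True I would expect most dependencies to be plain ids rather than globs, in which case almost all of the fnmatch calls would be skipped, but that is an assumption I have not checked against the loopy source.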
Are there any suggestions or recommended practices for mitigating this? (I'm more than happy to help if there's implementation work to do, of course.)
Please see the script below and the pickled objects uploaded.
Many thanks!
-TJ
P.S. code generation only takes 6 seconds on this.
import pickle
import loopy as lp

# Load the pickled kernel inputs, closing each file after reading.
with open("data.file", "rb") as f:
    data = pickle.load(f)
with open("domains.file", "rb") as f:
    domains = pickle.load(f)
with open("instructions.file", "rb") as f:
    instructions = pickle.load(f)

knl = lp.make_kernel(domains, instructions, data, name="test", target=lp.CTarget(), seq_dependencies=True)
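For anyone who wants to reproduce the profile, this is roughly how the timing breakdown can be collected with the standard library. It is a sketch only: `profile_call` is a helper name made up for this example, and sorting by cumulative time is just one reasonable choice.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Cumulative time shows which callees dominate the call being profiled.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


# Usage with the script above (requires loopy and the pickled files):
# knl, report = profile_call(lp.make_kernel, domains, instructions, data,
#                            name="test", target=lp.CTarget(),
#                            seq_dependencies=True)
# print(report)
```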