So, I have returned to the problem where this has happened. Unfortunately it is hard to provide an MWE, as it is triggered by definition for HUGE kernels. One thing that I noticed while investigating is that the kernel has more inames (246) than it has instructions (230). What is the expected complexity of the scheduler in the number of inames? Looking at the stack trace, we see the scheduler going into recursion on these three lines:
    File "/home/dominic/dune/dune-perftool/python/loopy/loopy/schedule/__init__.py", line 1246, in generate_loop_schedules_internal
      debug=debug):
    File "/home/dominic/dune/dune-perftool/python/loopy/loopy/schedule/__init__.py", line 923, in generate_loop_schedules_internal
      allow_boost=rec_allow_boost, debug=debug):
    File "/home/dominic/dune/dune-perftool/python/loopy/loopy/schedule/__init__.py", line 1030, in generate_loop_schedules_internal
      allow_boost=rec_allow_boost, debug=debug):
> One thing that I noticed while investigating is that the kernel has more inames (246) than it has instructions (230).
Oh wow. Most kernels I work with have just a handful of inames but possibly a few thousand instructions--so I've mostly been thinking about scaling in instruction count.
> What is the expected complexity of the scheduler in the number of inames?
I'd expect it to be linear, but I can't say much about the constant.
Have you tried messing with the recursion limit here?
I put max(2*len(kernel.instructions), 4*len(kernel.all_inames())) and that works for me. TBH, the 4 does not originate from code inspection, but rather from counting up until it works...
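For anyone hitting the same thing, here is a minimal sketch of that workaround in plain Python. The helper names `scheduling_recursion_limit` and `min_recursion_limit` are made up for illustration and are not part of loopy, and the factors 2 and 4 are just the empirical values from above, not derived from the scheduler code.

```python
import sys
from contextlib import contextmanager


def scheduling_recursion_limit(n_instructions, n_inames,
                               insn_factor=2, iname_factor=4):
    """Hypothetical helper: recursion limit scaled by kernel size.

    The default factors are the empirical ones from this thread
    (the 4 was found by counting up until scheduling succeeded).
    """
    return max(insn_factor * n_instructions, iname_factor * n_inames)


@contextmanager
def min_recursion_limit(min_limit):
    """Temporarily raise the interpreter recursion limit to at least min_limit."""
    old_limit = sys.getrecursionlimit()
    # Never lower the limit, only raise it if needed.
    sys.setrecursionlimit(max(old_limit, min_limit))
    try:
        yield
    finally:
        # Restore the previous limit once the deep recursion is done.
        sys.setrecursionlimit(old_limit)


# Numbers from the kernel discussed here: 230 instructions, 246 inames.
limit = scheduling_recursion_limit(n_instructions=230, n_inames=246)

with min_recursion_limit(limit):
    # The scheduler call would go here.
    pass
```

With 230 instructions and 246 inames, the iname term dominates (4 * 246 = 984 versus 2 * 230 = 460). That sits just under CPython's usual default limit of 1000, so on a stock interpreter the wrapper above only takes effect for somewhat larger kernels.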
I know that having so many inames is weird: It is a sum factorization implementation for a system with many components. The kernels are 90% matrix-matrix multiplications (which have 3 inames and 1 instruction).