Instruction scheduling is slow

At each call, the instruction scheduler builds a priority queue of instructions to check, as can be seen here.

However, this queue is inefficient for two reasons:

Sorting does not take into account instruction dependencies, so the scheduler may try A before B even when A depends on B.
The sorting is redone at every call, even though its result could be re-used across recursive calls to the scheduler.
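
To make the two points concrete, here is a minimal sketch of one possible fix: compute a dependency-aware order once, then filter it in each recursive call instead of re-sorting. This is not the scheduler's actual code. The names `dependency_aware_order`, `candidates`, `priorities`, and `depends_on` are hypothetical and chosen only for illustration, and the sketch assumes that a higher priority value means "try earlier".

```python
import heapq


def dependency_aware_order(priorities, depends_on):
    """Return instruction ids so that each id comes after everything it
    depends on; among ready instructions, higher priority comes first.

    Hypothetical helper: `priorities` maps id -> number, `depends_on`
    maps id -> set of ids it depends on.
    """
    # Count unmet dependencies and build the reverse map (dependee -> dependers).
    unmet = {insn: len(depends_on.get(insn, ())) for insn in priorities}
    dependers = {insn: [] for insn in priorities}
    for insn, deps in depends_on.items():
        for dep in deps:
            dependers[dep].append(insn)

    # Min-heap keyed on negated priority; the id breaks ties deterministically.
    ready = [(-priorities[i], i) for i, n in unmet.items() if n == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, insn = heapq.heappop(ready)
        order.append(insn)
        for nxt in dependers[insn]:
            unmet[nxt] -= 1
            if unmet[nxt] == 0:
                heapq.heappush(ready, (-priorities[nxt], nxt))

    if len(order) != len(priorities):
        raise ValueError("dependency cycle detected")
    return order


def candidates(order, scheduled):
    """Filter the precomputed order instead of re-sorting at every call."""
    return [insn for insn in order if insn not in scheduled]


# A has the highest priority but depends on B, so B must be tried first.
priorities = {"A": 10, "B": 1, "C": 5}
depends_on = {"A": {"B"}}
order = dependency_aware_order(priorities, depends_on)
```

In this sketch `order` is computed once at the top level and passed down, so each recursive call only pays for the filter in `candidates` rather than a full sort.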

This has a noticeable performance impact for large (order 25-ish) pytential kernels.