Some instructions are not nested within inames that are tagged as, e.g., local hardware axes. To resolve the issues this causes, this MR provides a transformation that nests such instructions within that loop and adds a predicate so that they are evaluated only once.
Consider the following variant of the convolution kernel in test_convolution in test_apps.py:
knl = lp.split_iname(knl, "im_x", 2, outer_tag="g.0", inner_tag="l.0")
knl = lp.split_iname(knl, "im_y", 2, outer_tag="g.1", inner_tag="l.1")
knl = lp.tag_inames(knl, dict(ifeat="l.2"))
knl = lp.add_prefetch(knl, "img", "im_x_inner, im_y_inner, f_x, f_y",
                      default_tag="l.auto")
knl = nest_and_predicate_instructions(knl, "ifeat", "id:img_fetch_rule")
The kernel will schedule if silenced_warnings=['write_race_local(img_fetch_rule)','iname-order'] is set, but the code produced is incorrect (even though the fetch instruction appears to be nested and predicated correctly). Without nest_and_predicate_instructions, the kernel schedules and is correct, but the same prefetch is performed redundantly for every ifeat/l.2.
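
To make the intended effect concrete, here is a minimal plain-Python model of the two behaviours described above. It is not loopy code and does not call nest_and_predicate_instructions. The function name run_workgroup, the counter, and the choice of `ifeat == 0` as the predicate are assumptions made for illustration only; the predicate this MR actually generates may differ.

```python
def run_workgroup(n_ifeat, predicated):
    """Model the work-items along the ifeat (l.2) axis of one work-group.

    Returns how many times the fetch ran and the resulting shared tile.
    Hypothetical helper for illustration; not part of loopy.
    """
    fetch_count = 0
    tile = None  # stands in for the local-memory prefetch buffer
    for ifeat in range(n_ifeat):  # each iteration models one work-item on l.2
        # Assumed predicate: only the first work-item performs the fetch.
        if not predicated or ifeat == 0:
            tile = "img tile"  # stands in for img_fetch_rule
            fetch_count += 1
    # On real hardware, a barrier would be needed here before the other
    # work-items along l.2 read the tile.
    return fetch_count, tile


# Without predication, every ifeat repeats the same fetch.
redundant = run_workgroup(4, predicated=False)
# With predication, the fetch runs once and the tile is shared.
once = run_workgroup(4, predicated=True)
```

In this model both variants leave the same tile contents, and only the number of fetches differs. That is the behaviour the transformation aims for, in contrast to the incorrect code currently generated.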