ilp.seq tag for iname bounded by temporaries
Hi,
We recently ran into a problem: the following check fails when we try to split an iname bounded by temporaries:
https://gitlab.tiker.net/inducer/loopy/blob/master/loopy/check.py#L232
The code snippet is
import loopy as lp

knl = lp.make_kernel(
    ["{[i] : 0 <= i < n}", "{[j] : s <= j < e}"],
    """
    for i
        <> s = a[i]
        <> e = b[i]
        for j
            c[j] = d[j] + c[j]
        end
    end
    """,
    [lp.GlobalArg("a", int), lp.GlobalArg("b", int),
     lp.GlobalArg("d", float), lp.GlobalArg("c", float), ...],
    target=lp.CTarget())
unroll = lp.split_iname(knl, "j", 4, inner_tag="ilp.seq")
What we are trying to achieve is to split the j loop, which is bounded by the temporaries s and e, and that seems to fail the check. Note that we do a similar split for the element loop, which works fine because its bounds are passed in as integer arguments. I'm wondering what the rationale behind that check is, and whether there is anything I can do to mitigate it (via assumptions, etc.)?
The code we expect to be generated (after I manually switch off the check above) is something like this:
void loopy_kernel(long const *__restrict__ a, long const *__restrict__ b, double const *__restrict__ d, double *__restrict__ c, int const n)
{
  int e;
  int s;
  for (int i = 0; i <= -1 + n; ++i)
  {
    e = b[i];
    s = a[i];
    for (int j_outer = (s / 4); j_outer <= ((-4 + e) / 4); ++j_outer)
      for (int j_inner = 0; j_inner <= 3; ++j_inner)
        c[4 * j_outer + j_inner] = d[4 * j_outer + j_inner] + c[4 * j_outer + j_inner];
  }
}
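One thing worth noting about the loop nest above: it visits exactly the range s <= j < e only when both s and e are multiples of 4. The following is a minimal plain-Python sketch of that arithmetic, not loopy's implementation; the helper names unguarded_split and guarded_split are hypothetical, and it assumes s >= 0 and e >= factor so that Python's floor division agrees with C's truncating division.

```python
def unguarded_split(s, e, factor=4):
    """Indices visited by the split loop nest from the post, with no bounds guard.

    Assumes s >= 0 and e >= factor, so // matches C integer division.
    """
    visited = []
    # Same bounds as the generated C: j_outer from s/4 to (e - 4)/4 inclusive.
    for j_outer in range(s // factor, (e - factor) // factor + 1):
        for j_inner in range(factor):
            visited.append(factor * j_outer + j_inner)
    return visited


def guarded_split(s, e, factor=4):
    """Same split, but each inner iteration is guarded by s <= j < e."""
    visited = []
    # Cover every block of `factor` indices that overlaps [s, e).
    for j_outer in range(s // factor, (e - 1) // factor + 1):
        for j_inner in range(factor):
            j = factor * j_outer + j_inner
            if s <= j < e:  # conditional that handles unaligned bounds
                visited.append(j)
    return visited


if __name__ == "__main__":
    # Aligned bounds: the unguarded nest is exact.
    print(unguarded_split(0, 8) == list(range(0, 8)))
    # Unaligned bounds: the unguarded nest visits the wrong indices.
    print(unguarded_split(2, 7), guarded_split(2, 7))
```

With aligned bounds the two variants agree; with s = 2 and e = 7 the unguarded nest touches indices outside the requested range, while the guarded one does not. So if the layer bounds are not known to be multiples of the split factor, the generated code would need a guard of this kind (or a remainder loop).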
The context is this: for extruded meshes in Firedrake, we want to batch the cells "vertically" rather than "horizontally". Since the dofs for the cells in one column are arranged adjacent to one another, this amortizes the cost of indirect accesses (which translates to about a 10% improvement in my test cases of Helmholtz on hexagonals). In the special case of extruded meshes with a variable number of layers, we get code similar to the example above, i.e. the number of layers is passed in as an array and the loop over layers is bounded by temporaries.
Thanks!
-TJ