push iname to innermost
This is somewhat related to #129 and !245, as I'm trying to avoid tagging with ilp.seq.
Consider this nested loop:
    import numpy as np
    import loopy as lp

    # Two independent loops (i and j) share one outer loop over elem.
    knl = lp.make_kernel([
            "{ [i]: 0 <= i < 10 }",
            "{ [j]: 0 <= j < 20 }",
            "{ [elem]: 0 <= elem < 4 }"
        ],
        """
        for elem
            for i
                a[i*4 + elem] = 1 {id = i}
            end
            for j
                b[j*4 + elem] = 1 {id = j}
            end
        end
        """,
        target=lp.CTarget()
    )
    knl = lp.add_and_infer_dtypes(knl, {"a": np.dtype(np.float64)})
I want to make the elem loop innermost. I can do this simply with knl = lp.tag_inames(knl, {"elem": "ilp.seq"}), which generates:
    void loopy_kernel(double *__restrict__ a, int *__restrict__ b)
    {
      for (int j = 0; j <= 19; ++j)
        for (int elem = 0; elem <= 3; ++elem)
          b[4 * j + elem] = 1;
      for (int i = 0; i <= 9; ++i)
        for (int elem = 0; elem <= 3; ++elem)
          a[4 * i + elem] = 1.0;
    }
But I'm not sure how to achieve this with prioritize_loops. For example, knl = lp.prioritize_loops(knl, "i, j, elem") still leaves elem as the outermost loop:
    void loopy_kernel(double *__restrict__ a, int *__restrict__ b)
    {
      for (int elem = 0; elem <= 3; ++elem)
      {
        for (int i = 0; i <= 9; ++i)
          a[4 * i + elem] = 1.0;
        for (int j = 0; j <= 19; ++j)
          b[4 * j + elem] = 1;
      }
    }
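To make the difference between the two generated nests concrete, here is a plain-Python model of both loop orders. It does not use loopy at all; the function names and the recorded write order are made up purely for illustration. It is only meant to show that both nests perform the same writes, and that only the elem-innermost version touches a and b with stride 1.

```python
# Plain-Python model of the two generated C loop nests (no loopy involved).
# Function names and the `order` log are hypothetical, for illustration only.

def elem_outermost(n_i=10, n_j=20, n_elem=4):
    """Model of the prioritize_loops output: one shared, outermost elem loop."""
    a = [0.0] * (n_i * n_elem)
    b = [0] * (n_j * n_elem)
    order = []  # sequence of (array, index) writes, in execution order
    for elem in range(n_elem):
        for i in range(n_i):
            a[n_elem * i + elem] = 1.0
            order.append(("a", n_elem * i + elem))
        for j in range(n_j):
            b[n_elem * j + elem] = 1
            order.append(("b", n_elem * j + elem))
    return a, b, order


def elem_innermost(n_i=10, n_j=20, n_elem=4):
    """Model of the ilp.seq output: a separate, innermost elem loop per nest."""
    a = [0.0] * (n_i * n_elem)
    b = [0] * (n_j * n_elem)
    order = []
    for j in range(n_j):
        for elem in range(n_elem):
            b[n_elem * j + elem] = 1
            order.append(("b", n_elem * j + elem))
    for i in range(n_i):
        for elem in range(n_elem):
            a[n_elem * i + elem] = 1.0
            order.append(("a", n_elem * i + elem))
    return a, b, order


a_out, b_out, order_out = elem_outermost()
a_in, b_in, order_in = elem_innermost()

# Indices written to `a`, in the order each nest visits them.
a_indices_out = [idx for name, idx in order_out if name == "a"]
a_indices_in = [idx for name, idx in order_in if name == "a"]
```

Comparing a_indices_out with a_indices_in shows the strided versus contiguous access pattern, which is the reason I want elem innermost.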
Do I need to write the kernel differently, for instance with separate inames elem1 and elem2 for the i and j loops, or maybe use duplicate_inames()?
Many thanks,
-TJ