Sequential loop bounds generation does not obey projection semantics
import loopy as lp

knl = lp.make_kernel(
    "{[i, loc1, loc2]: 0 <= loc1 <= 1 and 0 <= loc2 <= 2"
    " and 0 <= i <= loc1 and 0 <= i <= loc2}",
    """
    <>tmp[loc2] = 0
    for i
        tmp[i] = 1 {inames=i:loc2}
    end
    out[loc2] = tmp[loc2]
    """,
    "...",
    seq_dependencies=True)
# Both loc1 and loc2 are tagged with the same local axis (l.0).
knl = lp.tag_inames(knl, dict(loc1="l.0", loc2="l.0"))
knl = lp.set_temporary_scope(knl, "tmp", "local")
This kernel returns [1, 1, 0] but I would expect it to return [1, 1, 1]. The root cause appears to be that the loop bounds for i are too tight: they rely on both loc1 and loc2. I think they should be using only loc2, to be consistent with the projection semantics of loopy.
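To make the difference in bounds concrete, here is a brute-force sketch in plain Python. It is not loopy's code generator, and the helper names (domain_points, i_range_projected, i_range_aliased) are made up for illustration. It assumes that tagging both inames with l.0 pins loc1 and loc2 to the same local id, and it compares the values of i that are reachable when loc1 is projected out against the values reachable when loc1 is forced to equal loc2.

```python
from itertools import product


def domain_points():
    """Enumerate the integer points (i, loc1, loc2) of the kernel's domain."""
    return [
        (i, loc1, loc2)
        for loc1, loc2 in product(range(2), range(3))  # 0 <= loc1 <= 1, 0 <= loc2 <= 2
        for i in range(min(loc1, loc2) + 1)            # 0 <= i <= loc1 and 0 <= i <= loc2
    ]


def i_range_projected(loc2):
    """Values of i for a given loc2 when loc1 is projected out (any loc1 will do)."""
    return sorted({i for i, _, l2 in domain_points() if l2 == loc2})


def i_range_aliased(lid):
    """Values of i when loc1 and loc2 are both pinned to the same local id
    (assumption: this is what bounds depending on both inames amount to)."""
    return sorted({i for i, l1, l2 in domain_points() if l1 == lid and l2 == lid})


for lid in range(3):
    print(lid, i_range_projected(lid), i_range_aliased(lid))
# 0 [0] [0]
# 1 [0, 1] [0, 1]
# 2 [0, 1] []
```

The two disagree only for local id 2: with loc1 projected out the loop still has iterations there, while bounds that also involve loc1 leave it empty.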
(Part of) the fix is that get_usable_inames_for_conditional() should use only the common set of parallel inames in the block. Right now it returns both loc1 and loc2.
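A rough sketch of what "common set of parallel inames in the block" could mean, in plain Python. The helper common_parallel_inames is hypothetical: it does not reproduce the signature or internals of get_usable_inames_for_conditional(), and the per-instruction iname sets are simply read off the kernel above.

```python
def common_parallel_inames(insn_inames, parallel_inames):
    """Return the parallel inames shared by every instruction in a block.

    insn_inames: iterable of iname sets, one per instruction (hypothetical input)
    parallel_inames: set of inames that carry a parallel tag
    """
    parallel = frozenset(parallel_inames)
    per_insn = [frozenset(inames) & parallel for inames in insn_inames]
    if not per_insn:
        return frozenset()
    return frozenset.intersection(*per_insn)


# Inames of the three instructions in the kernel above.
block = [{"loc2"}, {"i", "loc2"}, {"loc2"}]
parallel = {"loc1", "loc2"}

print(sorted(common_parallel_inames(block, parallel)))  # ['loc2']
```

Under this reading only loc2 would be usable in the conditional, since no instruction in the block is nested inside loc1.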
Edited by Matt Wala