Barrier insertion: Dependencies before first barrier in loop body don't always get handled
import loopy as lp

knl = lp.make_kernel(
    "{[i,j]: 0 <= i,j < 10 }",
    """
    for i
        <>a[i] = i
        for j
            <>t = a[(i + 1) % 10]
            <>b[i,j] = a[i] + t
            b[i,j] = b[i,j] + 1
        end
    end
    """,
    seq_dependencies=True)
knl = lp.tag_inames(knl, dict(i="l.0"))
knl = lp.set_temporary_scope(knl, "a", "local")
knl = lp.set_temporary_scope(knl, "b", "local")
print(lp.get_one_scheduled_kernel(lp.preprocess_kernel(knl)))
results in
---------------------------------------------------------------------------
KERNEL: loopy_kernel
---------------------------------------------------------------------------
ARGUMENTS:
---------------------------------------------------------------------------
DOMAINS:
{ [i, j] : 0 <= i <= 9 and 0 <= j <= 9 }
---------------------------------------------------------------------------
INAME IMPLEMENTATION TAGS:
i: l.0
j: None
---------------------------------------------------------------------------
TEMPORARIES:
a: type: np:dtype('int32'), shape: (10), dim_tags: (N0:stride:1) scope:local
b: type: np:dtype('int32'), shape: (10, 10), dim_tags: (N1:stride:10, N0:stride:1) scope:local
t: type: np:dtype('int32'), shape: () scope:private
---------------------------------------------------------------------------
INSTRUCTIONS:
↱ [i] a[i] <- i # insn
└↱[i,j] t <- a[((i + 1) % 10)] # insn_0
↱└[i,j] b[i, j] <- a[i] + t # insn_1
└ [i,j] b[i, j] <- b[i, j] + 1 # insn_2
---------------------------------------------------------------------------
SCHEDULE:
0: CALL KERNEL loopy_kernel(extra_args=[], extra_inames=[])
1:     [insn] a[i] <- i
2:     FOR j
3:         [insn_0] t <- a[((i + 1) % 10)]
4:         ---BARRIER:local---
5:         [insn_1] b[i, j] <- a[i] + t
6:         ---BARRIER:local---
7:         [insn_2] b[i, j] <- b[i, j] + 1
8:     END j
9: RETURN FROM KERNEL loopy_kernel
---------------------------------------------------------------------------
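I think there needs to be a barrier between the definitions of a and t. Since i is tagged l.0 and a is local, insn_0 reads a[(i + 1) % 10], an element written by a different work-item in insn, yet the schedule has no barrier between items 1 and 3.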
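To illustrate why the missing barrier matters, here is a minimal sketch in plain Python (not loopy, and not generated code). It assumes the 10 work-items along l.0 may run one after another when nothing synchronizes them, which is one legal interleaving; the helper name run and the UNSET marker are made up for this example.

```python
N = 10        # number of work-items along l.0 (iname i)
UNSET = None  # marks a local-memory slot that has not been written yet


def run(barrier_after_a):
    """Model schedule items 1 and 3 (insn and insn_0) for a single j iteration.

    Work-items are executed sequentially in increasing i. Without a barrier
    each work-item runs insn and then insn_0 before the next one starts.
    """
    a = [UNSET] * N  # local temporary a
    t = [UNSET] * N  # private temporary t, one slot per work-item

    if barrier_after_a:
        # With a local barrier, every work-item finishes insn
        # before any work-item starts insn_0.
        for i in range(N):
            a[i] = i                  # insn
        for i in range(N):
            t[i] = a[(i + 1) % N]     # insn_0
    else:
        for i in range(N):
            a[i] = i                  # insn
            t[i] = a[(i + 1) % N]     # insn_0 reads a neighbour's slot

    return t


racy = run(barrier_after_a=False)
safe = run(barrier_after_a=True)
print("without barrier:", racy)
print("with barrier:   ", safe)
```

In this interleaving, every work-item except the last reads its neighbour's slot of a before that neighbour has written it, whereas the barrier version always sees the intended values. Real hardware may pick a different interleaving, so the sketch only shows that the current schedule permits the race.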
The problem seems to be in this line, because seen_barrier() appears to clear the list of barrier insertion candidates. I'm not 100% confident about this, though.