Barrier insertion: Dependencies before first barrier in loop body don't always get handled

import loopy as lp
knl = lp.make_kernel(
    "{[i,j]: 0 <= i,j < 10 }",
    """
    for i
     <>a[i] = i
     for j
      <>t = a[(i + 1) % 10]
      <>b[i,j] = a[i] + t
      b[i,j] = b[i,j] + 1
     end
    end
    """,
    seq_dependencies=True)
knl = lp.tag_inames(knl, dict(i="l.0"))
knl = lp.set_temporary_scope(knl, "a", "local")
knl = lp.set_temporary_scope(knl, "b", "local")
print(lp.get_one_scheduled_kernel(lp.preprocess_kernel(knl)))

results in the following scheduled kernel:

---------------------------------------------------------------------------
KERNEL: loopy_kernel
---------------------------------------------------------------------------
ARGUMENTS:
---------------------------------------------------------------------------
DOMAINS:
{ [i, j] : 0 <= i <= 9 and 0 <= j <= 9 }
---------------------------------------------------------------------------
INAME IMPLEMENTATION TAGS:
i: l.0
j: None
---------------------------------------------------------------------------
TEMPORARIES:
a: type: np:dtype('int32'), shape: (10), dim_tags: (N0:stride:1) scope:local
b: type: np:dtype('int32'), shape: (10, 10), dim_tags: (N1:stride:10, N0:stride:1) scope:local
t: type: np:dtype('int32'), shape: () scope:private
---------------------------------------------------------------------------
INSTRUCTIONS:
↱ [i]                                  a[i] <- i   # insn
└↱[i,j]                                t <- a[((i + 1) % 10)]   # insn_0
↱└[i,j]                                b[i, j] <- a[i] + t   # insn_1
└ [i,j]                                b[i, j] <- b[i, j] + 1   # insn_2
---------------------------------------------------------------------------
SCHEDULE:
   0: CALL KERNEL loopy_kernel(extra_args=[], extra_inames=[])
   1:     [insn] a[i] <- i
   2:     FOR j
   3:         [insn_0] t <- a[((i + 1) % 10)]
   4:         ---BARRIER:local---
   5:         [insn_1] b[i, j] <- a[i] + t
   6:         ---BARRIER:local---
   7:         [insn_2] b[i, j] <- b[i, j] + 1
   8:     END j
   9: RETURN FROM KERNEL loopy_kernel
---------------------------------------------------------------------------

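I think there needs to be a barrier between the definitions of a and t: with i tagged l.0, insn_0 reads a[(i + 1) % 10], which a different work-item wrote in insn, yet the schedule places no barrier between insn and insn_0. The problem seems to be in this line because seen_barrier() appears to clear the list of barrier insertion candidates. I'm not 100% confident about this, though.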
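To make the missing dependency concrete, here is a small pure-Python sketch (it does not use loopy) that models which work-item writes each element of a in insn and which work-item reads it in insn_0. The names `writer_of` and `cross_item_reads` are made up for illustration, and the model assumes one work-item per value of i, as implied by the l.0 tag.

```python
# Toy model of the kernel above, assuming one work-item per value of i.
# Work-item i writes a[i] (insn), then reads a[(i + 1) % N] (insn_0).
N = 10

# insn: a[i] <- i, so element k of a is written by work-item k.
writer_of = {i: i for i in range(N)}

# insn_0: collect every read whose writer is a different work-item.
# Each entry is (reading work-item, index into a, writing work-item).
cross_item_reads = [
    (i, (i + 1) % N, writer_of[(i + 1) % N])
    for i in range(N)
    if writer_of[(i + 1) % N] != i
]

for reader, index, writer in cross_item_reads:
    print(f"work-item {reader} reads a[{index}], written by work-item {writer}")

# Any cross-work-item read of local memory needs a local barrier
# between the write (insn) and the read (insn_0).
needs_barrier = len(cross_item_reads) > 0
```

In this model every read in insn_0 crosses work-items, which is why a local barrier would be expected before the first statement of the j loop body and not only between insn_0 and insn_1.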