Kernel printing: hard to tell where loops start and end
For example, printing the kernel from test_rob_stroud_bernstein_full gives the output below. From it, it's hard to tell at a glance that every instruction is nested inside el_inner:
---------------------------------------------------------------------------
KERNEL: loopy_kernel
---------------------------------------------------------------------------
ARGUMENTS:
coeffs: GlobalArg, type: <runtime>, shape: unknown
nels: ValueArg, type: <runtime>
qpts: GlobalArg, type: <runtime>, shape: (2, 7), dim_tags: (N1:stride:7, N0:stride:1)
result: GlobalArg, type: <runtime>, shape: (nels, 7, 7), dim_tags: (N2:stride:49, N1:stride:7, N0:stride:1)
---------------------------------------------------------------------------
DOMAINS:
[nels] -> { [i2, alpha1, alpha2, i1_2, alpha1_2, i2_2, el_inner, el_outer_outer, el_outer_inner] : 0 <= i2 <= 6 and 0 <= alpha1 <= 4 and 0 <= alpha2 <= 4 - alpha1 and 0 <= i1_2 <= 6 and 0 <= alpha1_2 <= 4 and 0 <= i2_2 <= 6 and 0 <= el_inner <= 15 and 0 <= el_outer_inner <= 1 and -el_inner - 32el_outer_outer <= 16el_outer_inner < nels - el_inner - 32el_outer_outer }
---------------------------------------------------------------------------
INAME IMPLEMENTATION TAGS:
alpha1: unr
alpha1_2: None
alpha2: unr
el_inner: l.0
el_outer_inner: ilp.unr
el_outer_outer: g.0
i1_2: None
i2: l.1
i2_2: None
---------------------------------------------------------------------------
TEMPORARIES:
aind: type: <auto>, shape: () scope:auto
r: type: <auto>, shape: () scope:auto
r2: type: <auto>, shape: () scope:auto
s: type: <auto>, shape: () scope:auto
s2: type: <auto>, shape: () scope:auto
tmp: type: <auto>, shape: (5, 7), dim_tags: (N1:stride:7, N0:stride:1) scope:auto
w: type: <auto>, shape: () scope:auto
w2: type: <auto>, shape: () scope:auto
xi: type: <auto>, shape: () scope:auto
xi2: type: <auto>, shape: () scope:auto
---------------------------------------------------------------------------
INSTRUCTIONS:
↱↱ [el_inner,el_outer_inner,el_outer_outer,i2]
││ xi <- qpts[1, i2] # insn
└│↱↱ [el_inner,el_outer_inner,el_outer_outer,i2]
│││ s <- 1 + (-1)*xi # insn_0
↱└└│ [el_inner,el_outer_inner,el_outer_outer,i2]
│ │ r <- xi / s # insn_1
│↱ │ [el_inner,el_outer_inner,el_outer_outer,i2]
││ │ aind <- 0 # aind_init
││↱└ [alpha1,el_inner,el_outer_inner,el_outer_outer,i2]
│││ w <- s**(4 + (-1)*alpha1) # init_w
│││↱↱↱ [alpha1,el_inner,el_outer_inner,el_outer_outer,i2]
││││││ tmp[alpha1, i2] <- tmp[alpha1, i2] + w*coeffs[aind] # write_tmp
└│└└││↱[alpha1,alpha2,el_inner,el_outer_inner,el_outer_outer,i2]
│ │││ w <- w*r*(4 + (-1)*alpha1 + (-1)*alpha2) / (1 + alpha2) # update_w
↱└ └│└[alpha1,alpha2,el_inner,el_outer_inner,el_outer_outer,i2]
│ │ aind <- aind + 1 # aind_incr
└↱↱ │ [el_inner,el_outer_inner,el_outer_outer,i1_2]
││ │ xi2 <- qpts[0, i1_2] # insn_2
↱└│↱ │ [el_inner,el_outer_inner,el_outer_outer,i1_2]
│ ││ │ s2 <- 1 + (-1)*xi2 # insn_3
└↱└│ │ [el_inner,el_outer_inner,el_outer_outer,i1_2]
│ │ │ r2 <- xi2 / s2 # insn_4
│ └ │ [el_inner,el_outer_inner,el_outer_outer,i1_2]
│ │ w2 <- s2**4 # insn_5
│ └ [alpha1_2,el_inner,el_outer_inner,el_outer_outer,i1_2,i2_2]
│ result[el_inner + (el_outer_inner + el_outer_outer*2)*16, i1_2, i2_2] <- result[el_inner + (el_outer_inner + el_outer_outer*2)*16, i1_2, i2_2] + w2*tmp[alpha1_2, i2_2] # insn_6
└ [alpha1_2,el_inner,el_outer_inner,el_outer_outer,i1_2]
w2 <- w2*r2*(4 + (-1)*alpha1_2) / (1 + alpha1_2) # insn_7
---------------------------------------------------------------------------
There are two ways of going about this (that I can think of):
- infer as much of a nesting order as we can at this stage, and always print out inames in the nesting order
- use a different color for each iname in the list
I am inclined to go with the first option, although it could still cause confusion: if multiple nesting orders are valid, the scheduler may pick a different one than the one printed. Perhaps a separate visual marker could indicate that the nesting order is ambiguous, such as surrounding equivalently nested inames in {}?
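To make the first option concrete, here is a rough sketch of one possible heuristic. It is not loopy code and does not use loopy's API. It assumes plain Python sets stand in for each instruction's inames, the name format_inames is made up, and the example collapses el_outer_inner and el_outer_outer into a single hypothetical el_outer for brevity. The assumed rule: an iname used by more instructions is printed further out, and inames used by exactly the same instructions are wrapped in {} because nothing here can order them.

```python
from collections import defaultdict


def format_inames(insn_inames):
    """Return one display string per instruction, outermost inames first.

    insn_inames is a list with one set of iname names per instruction.
    This is an illustrative heuristic, not what the scheduler will do.
    """
    # Map each iname to the indices of the instructions that use it.
    users = defaultdict(set)
    for idx, inames in enumerate(insn_inames):
        for iname in inames:
            users[iname].add(idx)

    lines = []
    for inames in insn_inames:
        # Inames used by exactly the same instructions cannot be ordered
        # relative to each other, so they end up in the same group.
        groups = defaultdict(list)
        for iname in inames:
            groups[frozenset(users[iname])].append(iname)

        # Assumed rule: more using instructions means further out.
        # Ties are broken alphabetically to keep the output stable.
        ordered = sorted(
            groups.items(), key=lambda kv: (-len(kv[0]), sorted(kv[1]))
        )

        parts = []
        for _, names in ordered:
            names = sorted(names)
            if len(names) > 1:
                # Ambiguous nesting: mark the group with braces.
                parts.append("{" + ",".join(names) + "}")
            else:
                parts.append(names[0])
        lines.append("[" + ",".join(parts) + "]")
    return lines


# Simplified stand-in for a few instructions of the kernel above.
example = [
    {"el_inner", "el_outer", "i2"},
    {"alpha1", "el_inner", "el_outer", "i2"},
    {"alpha1", "alpha2", "el_inner", "el_outer", "i2"},
    {"el_inner", "el_outer", "i1_2"},
]

for line in format_inames(example):
    print(line)
# [{el_inner,el_outer},i2]
# [{el_inner,el_outer},i2,alpha1]
# [{el_inner,el_outer},i2,alpha1,alpha2]
# [{el_inner,el_outer},i1_2]
```

With this ordering the shared outer inames always come first, so it is visible at a glance that every instruction sits inside el_inner. The braces flag the spots where the scheduler is still free to choose.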