l.0-parallel reduction codegen mishandles simul_reduce
As reported by @kaushikcfd on Riot:
import loopy as lp
knl = lp.make_kernel(
    "{[i]: 0<=i<4}",
    """
    a = simul_reduce(sum, i, 7*i)
    b = simul_reduce(sum, i, 10*i)
    """)
knl = lp.tag_inames(knl, "i:l.0")
knl = lp.realize_reduction(knl)
print(lp.generate_code_v2(knl).device_code())
This generates two sets of inames.
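For reference when checking the generated code, the values the two reductions should produce over 0<=i<4 can be worked out in plain Python. This is only a sketch of the arithmetic, not loopy code, and `expected_sums` is a hypothetical helper name that does not exist in loopy.

```python
def expected_sums(n=4):
    """Return the values a and b should hold for 0 <= i < n.

    Hypothetical helper: plain Python mirroring the two reductions
    in the reproducer, without using loopy.
    """
    a = sum(7 * i for i in range(n))   # mirrors simul_reduce(sum, i, 7*i)
    b = sum(10 * i for i in range(n))  # mirrors simul_reduce(sum, i, 10*i)
    return a, b


a, b = expected_sums()
print(a, b)  # prints "42 60"
```

Whatever inames the device code ends up using, a correct kernel should store these values in a and b.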
Edited by Kaushik Kulkarni