Short-vector shuffles
cc @arghdos
This issue makes use of this gist posted by @arghdos. (edited to fix gist address)
There are multiple things going on here:
- In the 'error' case, the following message is generated: "When trying to index the array 'b' along axis 2 (tagged 'vec'), the index was not a compile-time constant (but it has to be in order for code to be generated). You likely want to unroll the iname(s) 'i_inner'". This is misleading because at this point we're already in the 'scalar-only' failure path of this code, i.e. we're just checking if the index in the vec-tagged array axis happens to be a compile-time constant after all. The error message should state that. That error should end up with a type of `Unvectorizable`.
- Ideally, as long as the vector index only depends on the `vec`-tagged iname, we should generate a vector shuffle. In OpenCL, those look something like `vec.yxzw`.
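As a rough illustration of the shuffle idea, here is a Python sketch that turns a lane permutation into an OpenCL component-selection suffix. The helper name `opencl_swizzle` is made up for this example and is not part of Loopy. It only covers vectors of up to four components, which OpenCL lets you address as `x`, `y`, `z`, `w`.

```python
def opencl_swizzle(lanes):
    """Return an OpenCL swizzle suffix such as 'yxzw' for a lane permutation.

    Hypothetical helper, not part of Loopy. 'lanes' lists, for each result
    component, which source component (0..3) it is read from.
    """
    names = "xyzw"
    if not lanes or len(lanes) > 4:
        raise ValueError("expected between 1 and 4 lanes")
    for lane in lanes:
        if not 0 <= lane < 4:
            raise ValueError(f"lane {lane} is outside 0..3")
    return "".join(names[lane] for lane in lanes)


# Swapping the first two components of a 4-vector:
print("vec." + opencl_swizzle([1, 0, 2, 3]))  # prints vec.yxzw
```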
The fact that the "works" case works is due to the following, which is printed during code gen:
vectorizing iname 'i_inner' occurs in unvectorized subscript axis 2 (1-based) of expression 'b[j, i_outer + (2 + i_inner) // 4, 2 + i_inner + (-4)*((2 + i_inner) // 4)]'
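To see what that subscript does to the vector lanes, one can evaluate the last two index expressions from the message for each value of `i_inner`. This sketch assumes a vector width of 4 (so `i_inner` runs over 0..3) and fixes `i_outer` at 0. The function name is invented for illustration.

```python
def outer_and_lane(i_outer, i_inner):
    """Evaluate the last two index expressions of the subscript in the message."""
    outer = i_outer + (2 + i_inner) // 4
    lane = 2 + i_inner + (-4) * ((2 + i_inner) // 4)
    return outer, lane


pairs = [outer_and_lane(0, i_inner) for i_inner in range(4)]
print(pairs)  # [(0, 2), (0, 3), (1, 0), (1, 1)]
```

Under these assumptions the four lanes come from two neighbouring vectors (components 2 and 3 of one, then 0 and 1 of the next), so this particular access spans more than a single-vector swizzle.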
Carried over from #125:
For array subscripts with affine indices, we can infer alignment robustly without user intervention using the polyhedral infrastructure. For non-affine indices, we probably want a way to annotate.
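For intuition only, here is a much simpler stand-in for the polyhedral analysis: a coefficient check on an affine element offset. It is not how the polyhedral infrastructure works, and it gives a sufficient condition rather than an exact answer. The representation (a constant plus a dict of per-iname coefficients) and the function name are assumptions made for this sketch.

```python
def is_provably_aligned(constant, coeffs, vec_iname, width):
    """Sufficient (not necessary) alignment test for an affine offset.

    The offset is constant + sum(coeffs[name] * name). Returns True only if
    the vector lanes are contiguous and the offset at vec_iname == 0 is a
    multiple of 'width' for every value of the other inames. A False result
    means "not proven", not "definitely unaligned".
    """
    if coeffs.get(vec_iname, 0) != 1:
        return False  # lanes would not be contiguous in memory
    if constant % width != 0:
        return False
    return all(
        coeff % width == 0
        for name, coeff in coeffs.items()
        if name != vec_iname
    )


# offset = 4*i_outer + i_inner: every vector starts on a multiple of 4
print(is_provably_aligned(0, {"i_outer": 4, "i_inner": 1}, "i_inner", 4))  # True
# offset = 4*i_outer + i_inner + 2: shifted by two elements
print(is_provably_aligned(2, {"i_outer": 4, "i_inner": 1}, "i_inner", 4))  # False
```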
Language-wise, those annotations could look like:
a@aligned[i,j]
aligned(a[i,j])
a.aligned[i,j]
I propose that we subclass pymbolic's Subscript node to add a tri-state (true/false/unknown) "aligned" flag, and do the same for Loopy's LinearSubscript. Any vanilla Subscript nodes we run into would be rewritten to ours.
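A minimal sketch of what such a flagged node could look like. To keep it self-contained, the `Subscript` class below is a plain dataclass standing in for pymbolic's node, not the real class, and `Alignment`, `AlignedSubscript` and `upgrade` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Alignment(Enum):
    """Tri-state alignment flag."""
    ALIGNED = "aligned"
    UNALIGNED = "unaligned"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Subscript:
    """Stand-in for pymbolic's Subscript node (aggregate[index])."""
    aggregate: str
    index: tuple


@dataclass(frozen=True)
class AlignedSubscript(Subscript):
    """Hypothetical subclass that carries the alignment flag."""
    aligned: Alignment = Alignment.UNKNOWN


def upgrade(node):
    """Rewrite a vanilla Subscript to the flagged subclass.

    Nodes that already carry a flag are returned unchanged.
    """
    if isinstance(node, AlignedSubscript):
        return node
    return AlignedSubscript(node.aggregate, node.index)


plain = Subscript("a", ("i", "j"))
print(upgrade(plain).aligned)  # Alignment.UNKNOWN
```

Defaulting to "unknown" keeps unannotated subscripts on the conservative path until either the user or an analysis pass says otherwise.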
To avoid inconsistency, I think the syntax should be the same on the load and store.
- This syntax needs to be mentioned in the documentation somewhere.
On the code generation end, I propose we create an ASTBuilder method to do codegen for a vector load and a vector store, with an aligned flag. The value of that flag should be determined centrally somewhere, i.e. not per target or per AST builder.
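A sketch of how such a pair of methods might emit OpenCL, written as free functions for brevity. The function names, the `__global` address space and the string-based output are all assumptions for illustration and not Loopy's actual ASTBuilder interface. The flag is passed in rather than computed here, in line with deciding it centrally. When it is anything other than `True`, the sketch falls back to OpenCL's `vloadn`/`vstoren` built-ins, which only require element alignment.

```python
def emit_vector_load(ptr, offset, width, aligned, scalar_type="float"):
    """Return an OpenCL expression loading 'width' elements.

    'offset' counts whole vectors. 'aligned' is True, False, or None
    (unknown); only True selects the pointer-cast form.
    """
    if aligned is True:
        return f"((__global {scalar_type}{width} *) {ptr})[{offset}]"
    return f"vload{width}({offset}, {ptr})"


def emit_vector_store(value, ptr, offset, width, aligned, scalar_type="float"):
    """Return an OpenCL statement storing the vector 'value'."""
    if aligned is True:
        return f"((__global {scalar_type}{width} *) {ptr})[{offset}] = {value};"
    return f"vstore{width}({value}, {offset}, {ptr});"


print(emit_vector_load("b", "i_outer", 4, aligned=True))
# ((__global float4 *) b)[i_outer]
print(emit_vector_store("v", "b", "i_outer", 4, aligned=None))
# vstore4(v, i_outer, b);
```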