Documentation Notes
^^^^^^^^^^^^^^^^^^^
- Need to clarify fundamental difference between constants baked into code
and things that remain variable. (ISL parameters, symbolic shapes)
Things to consider
^^^^^^^^^^^^^^^^^^
- Dependencies are pointwise for shared loop dimensions
and global over non-shared ones (between dependent and ancestor)
- multiple insns could fight over which iname gets local axis 0
-> complicated optimization problem
- Every loop in loopy is opened at most once.
Too restrictive?
- Loop bounds currently may not depend on parallel dimensions
Does it make sense to relax this?
- Why do CSEs necessarily have to duplicate the inames?
-> because that would be necessary for a sequential prefetch
- Cannot do slab decomposition on inames that share a tag with
other inames
-> Is that reasonable?
- Not using all hw loop dimensions causes an error, as
is the case for variant 3 in the rank_one test.
- Loopy as a data model for implementing custom rewritings
- We won't generate WAW barrier-needing dependencies
from one instruction to itself.
- Automatically generate testing code vs. sequential.
- If isl can prove that all operands are positive, may use '/' instead of
'floor_div'.
- Fix all tests
- Deal with equality constraints.
(These arise, e.g., when partitioning a loop of length 16 into 16s.)
- Float4 joining on fetch/store?
- How can one automatically generate something like microblocks?
- Better for-loop bound generation
-> Try a triangular loop
- Eliminate the first (pre-)barrier in a loop.
- Generate automatic test against sequential code.
- Automatically verify that all array access is within bounds.
- Reason about generated code, give user feedback on potential
improvements.
- Convolutions, Stencils
- DMA engine threads?
- Divisibility, modulo, strides?
- Try, fix indirect addressing
- Use gists (why do disjoint sets arise?)
- Nested slab decomposition (in conjunction with conditional hoisting) could
generate nested conditional code.
- For forced workgroup sizes: check that at least one iname
maps to them.
- variable shuffle detection
-> will need unification
- user interface for dim length prescription
- Restrict-to-sequential and tagging have nothing to do with each other.
-> Removed SequentialTag and turned it into a separate computed kernel
property.
- Just touching a variable written to by a non-idempotent
instruction makes that instruction also not idempotent
-> Idempotent renamed to boostable.
-> Done.
- Give the user control over which reduction inames are
duplicated.
- assert dependencies <= parent_inames in loopy/__init__.py
-> Yes, this must be the case.
-> If you include reduction inames.
- Give a good error message if a parameter assignment in get_problems()
is missing.
- Slab decomposition for ILP
-> I don't think that's possible.
- It is hard to understand error messages that refer to instructions
generated during preprocessing.
-> Expose preprocessing to the user so she can inspect the preprocessed
kernel.
- Which variables need to be duplicated for ILP?
-> Only reduction
- implemented_domain may end up being smaller than requested in cse
evaluations--check that!
- Allow prioritization of loops in scheduling.
- Make axpy better.
- Screwy lower bounds in slab decomposition
- Flag, exploit idempotence
- Some things involving CSEs might be impossible to schedule
a[i,j] = cse(b[i]) * cse(c[j])
- Be smarter about automatic local axis choice
-> What if we run out of axes?
- Implement condition hoisting
(needed, e.g., by slab decomposition)
- Check for non-use of hardware axes
- Slab decomposition for parallel dimensions
- implement at the outermost nesting level regardless
- bound *all* tagged inames
- can't slab inames that share tags with other inames (for now)
- Make syntax for iname dependencies
- make syntax for insn dependencies
- CSE iname duplication might be unnecessary?
(don't think so: It might be desired to do a full fetch before a mxm k loop
even if that requires going iterative.)
- Reduction needs to know a neutral element
- Types of reduction variables?
- Generalize reduction to be over multiple variables
- duplicate_dimensions can be implemented without having to muck around
with individual constraints:
- add_dims
- move_dims
- intersect
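Sketch for the '/' vs. 'floor_div' item above: a minimal, self-contained check (plain Python, no isl) of why positivity matters. `trunc_div` is a hypothetical helper that mimics C-style integer '/', which rounds toward zero; Python's `//` stands in for 'floor_div'. The two agree when both operands are positive and differ once a sign flips.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division rounding toward zero, like C's '/' on ints.

    Hypothetical helper for illustration only; not part of loopy or isl.
    """
    q = abs(a) // abs(b)
    # The quotient is non-negative when a is zero or the signs match.
    return q if (a >= 0) == (b > 0) else -q


def floor_div(a: int, b: int) -> int:
    """Division rounding toward negative infinity (Python's '//')."""
    return a // b


# Positive operands: both roundings give the same quotient.
agree_on_positive = all(
    trunc_div(a, b) == floor_div(a, b)
    for a in range(1, 50)
    for b in range(1, 10)
)

# One negative operand: the results differ, so '/' is not a safe substitute.
differs_on_negative = trunc_div(-7, 2) != floor_div(-7, 2)
```

So the cheaper '/' is only a valid replacement where positivity of the operands has actually been established.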
Should a dependency on an iname be forced in a CSE?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Local var:
l  | n
g  | y
dl | Err
d  | Err
Private var:
l  | y
g  | y
dl | Err
d  | Err
dg: Invalid -> error
d: is duplicate
l: is tagged as local idx
g: is tagged as group idx
Raise error if dl is targeting a private variable, regardless of whether it's
a dependency or not.
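The two tables above can be written down as a lookup, which makes the rule easy to test. This is only a sketch: the names `OUTCOMES` and `force_iname_dependency` are hypothetical, and reading "y"/"n" as the answer to "force the dependency?" is an assumption based on the section title.

```python
# Rows copied from the tables above, keyed by (variable kind, iname kind).
# "y" = force the dependency, "n" = do not, "Err" = raise an error.
OUTCOMES = {
    ("local", "l"): "n",
    ("local", "g"): "y",
    ("local", "dl"): "Err",
    ("local", "d"): "Err",
    ("private", "l"): "y",
    ("private", "g"): "y",
    ("private", "dl"): "Err",
    ("private", "d"): "Err",
}


def force_iname_dependency(var_kind: str, iname_kind: str) -> bool:
    """Return True if the CSE should be forced to depend on the iname.

    Hypothetical helper for illustration; not loopy's actual interface.
    """
    if iname_kind == "dg":
        # "dg: Invalid -> error", for either variable kind.
        raise ValueError("duplicated group-tagged iname is invalid")
    outcome = OUTCOMES[(var_kind, iname_kind)]
    if outcome == "Err":
        raise ValueError(
            f"iname kind {iname_kind!r} may not target a {var_kind} variable"
        )
    return outcome == "y"
```

For example, `force_iname_dependency("local", "g")` returns True, while `force_iname_dependency("private", "dl")` raises, matching the last note above.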