Documentation Notes
^^^^^^^^^^^^^^^^^^^
- Need to clarify fundamental difference between constants baked into code
and things that remain variable. (ISL parameters, symbolic shapes)
Things to consider
^^^^^^^^^^^^^^^^^^
- Dependencies are pointwise for shared loop dimensions
and global over non-shared ones (between dependent and ancestor)
- multiple insns could fight over which iname gets local axis 0
-> complicated optimization problem
- Every loop in loopy is opened at most once.
Too restrictive?
- Loop bounds currently may not depend on parallel dimensions
Does it make sense to relax this?
- Why do CSEs necessarily have to duplicate the inames?
-> because that would be necessary for a sequential prefetch
- Cannot do slab decomposition on inames that share a tag with
other inames
-> Is that reasonable?
- Parallel dimension splitting/merging via tags
-> unnecessary?
- Not using all hw loop dimensions causes an error, as
is the case for variant 3 in the rank_one test.
- Loopy as a data model for implementing custom rewritings
- We won't generate WAW barrier-needing dependencies
from one instruction to itself.
-> will need unification
- Fix all tests
- Automatically generate testing code vs. sequential.
- Deal with equality constraints.
(These arise, e.g., when partitioning a loop of length 16 into 16s.)
- duplicate_dimensions can be implemented without having to muck around
with individual constraints:
- add_dims
- move_dims
- intersect
- Float4 joining on fetch/store?
- How can one automatically generate something like microblocks?
- Better for loop bound generation
-> Try a triangular loop
- Eliminate the first (pre-)barrier in a loop.
- Generate automatic test against sequential code.
- Automatically verify that all array access is within bounds.
- Reason about generated code, give user feedback on potential
improvements.
- Convolutions, Stencils
- DMA engine threads?
- Divisibility, modulo, strides?
- Try, fix indirect addressing
- Use gists (why do disjoint sets arise?)
- Nested slab decomposition (in conjunction with conditional hoisting) could
generate nested conditional code.
- user interface for dim length prescription
- Restrict-to-sequential and tagging have nothing to do with each other.
-> Removed SequentialTag and turned it into a separate computed kernel
property.
- Just touching a variable written to by a non-idempotent
instruction makes that instruction also not idempotent
-> Idempotent renamed to boostable.
-> Done.
- Give the user control over which reduction inames are
duplicated.
- assert dependencies <= parent_inames in loopy/__init__.py
-> Yes, this must be the case.
-> If you include reduction inames.
- Give a good error message if a parameter assignment in get_problems()
is missing.
- Slab decomposition for ILP
-> I don't think that's possible.
- It is hard to understand error messages that refer to instructions that
are generated during preprocessing.
-> Expose preprocessing to the user so she can inspect the preprocessed
kernel.
- Which variables need to be duplicated for ILP?
-> Only reduction
- implemented_domain may end up being smaller than requested in cse
evaluations--check that!
- Allow prioritization of loops in scheduling.
- Make axpy better.
- Screwy lower bounds in slab decomposition
- Flag, exploit idempotence
- Some things involving CSEs might be impossible to schedule
a[i,j] = cse(b[i]) * cse(c[j])
- Be smarter about automatic local axis choice
-> What if we run out of axes?
- Implement condition hoisting
(needed, e.g., by slab decomposition)
- Check for non-use of hardware axes
- Slab decomposition for parallel dimensions
- implement at the outermost nesting level regardless
- bound *all* tagged inames
- can't slab inames that share tags with other inames (for now)
- Make syntax for iname dependencies
- make syntax for insn dependencies
- CSE iname duplication might be unnecessary?
(don't think so: It might be desired to do a full fetch before a mxm k loop
even if that requires going iterative.)
- Reduction needs to know a neutral element
- Types of reduction variables?
- Generalize reduction to be over multiple variables
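
Illustration for the "Deal with equality constraints" item above: a minimal
sketch, in plain Python rather than ISL, of why splitting a loop of length 16
into chunks of 16 yields an equality constraint. The helper name
`outer_values_after_split` is hypothetical and not part of loopy.

```python
def outer_values_after_split(length, inner_length):
    """Return the values the outer index takes when i in [0, length)
    is rewritten as i = outer * inner_length + inner."""
    return sorted({i // inner_length for i in range(length)})


# Splitting 16 by 4 leaves a genuine outer loop with four iterations.
four_chunks = outer_values_after_split(16, 4)

# Splitting 16 by 16 leaves a single outer iteration: the bounds
# 0 <= outer <= 0 collapse to the equality constraint outer == 0.
one_chunk = outer_values_after_split(16, 16)
```

A domain description that only expects inequalities has to handle this
degenerate case explicitly.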
Should a dependency on an iname be forced in a CSE?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Local var:
l | n
g | y
dl | Err
d | Err
Private var:
l | y
g | y
dl | Err
d | Err
dg: Invalid-> error
d: is duplicate
l: is tagged as local idx
g: is tagged as group idx
Raise error if dl is targeting a private variable, regardless of whether it's
a dependency or not.
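
The table above can be read as a lookup. The following is a minimal sketch of
that reading, not loopy code: the function name, the string keys and the
choice of `ValueError` are all assumptions made for illustration.

```python
# (variable kind, iname tag) -> should the dependency be forced?
# Tags follow the legend above: l = local idx, g = group idx.
FORCE_DEPENDENCY = {
    ("local", "l"): False,
    ("local", "g"): True,
    ("private", "l"): True,
    ("private", "g"): True,
}

# Tags marked "Err" in the table (d, dl) or invalid in the legend (dg).
ERROR_TAGS = {"d", "dl", "dg"}


def force_cse_dependency(var_kind, iname_tag):
    """Return True if a dependency on the iname should be forced."""
    if iname_tag in ERROR_TAGS:
        raise ValueError(
            "iname tag %r is an error for a %s variable"
            % (iname_tag, var_kind))
    return FORCE_DEPENDENCY[(var_kind, iname_tag)]
```

Under this reading, the only case in which a dependency is not forced is a
local variable with a local-index iname.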