Documentation Notes
^^^^^^^^^^^^^^^^^^^
- Need to clarify fundamental difference between constants baked into code
and things that remain variable. (ISL parameters, symbolic shapes)
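As a rough illustration of this distinction (a plain-Python sketch that does not use loopy or ISL; the function names `gen_axpy_baked` and `gen_axpy_parametric` are hypothetical), the first generator below bakes the loop length into the emitted C source as a literal, while the second leaves it as a run-time argument:

```python
def gen_axpy_baked(n):
    """Return C source in which the length n is a literal constant."""
    return (
        "void axpy(float a, const float *x, float *y)\n"
        "{\n"
        f"  for (int i = 0; i < {n}; ++i)\n"
        "    y[i] += a * x[i];\n"
        "}\n"
    )


def gen_axpy_parametric():
    """Return C source in which the length n stays a run-time parameter."""
    return (
        "void axpy(int n, float a, const float *x, float *y)\n"
        "{\n"
        "  for (int i = 0; i < n; ++i)\n"
        "    y[i] += a * x[i];\n"
        "}\n"
    )


if __name__ == "__main__":
    print(gen_axpy_baked(16))
    print(gen_axpy_parametric())
```

Changing a baked constant requires regenerating the code, whereas the parametric version is generated once and reused for any `n`.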
Things to consider
^^^^^^^^^^^^^^^^^^
- Dependencies are pointwise for shared loop dimensions
and global over non-shared ones (between dependent and ancestor)
- multiple insns could fight over which iname gets local axis 0
-> complicated optimization problem
- Every loop in loopy is opened at most once.
Too restrictive?
- Loop bounds currently may not depend on parallel dimensions
Does it make sense to relax this?
- Why do CSEs necessarily have to duplicate the inames?
-> because that would be necessary for a sequential prefetch
- Cannot do slab decomposition on inames that share a tag with
other inames
-> Is that reasonable?
- Parallel dimension splitting/merging via tags
-> unnecessary?
- Not using all hw loop dimensions causes an error, as
is the case for variant 3 in the rank_one test.
- Measure efficiency of corner cases
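Several items above refer to slab decomposition. The sketch below assumes this means generating separate code for the "bulk" and the "boundary" iterations of a split loop, so that only the boundary slab pays for bounds checks. It is plain Python, not loopy output, and `axpy_with_slabs` is a hypothetical name:

```python
def axpy_with_slabs(a, x, y, block=4):
    """Compute y += a*x with the loop split into blocks of size `block`.

    Returns the number of bounds checks executed, to show that only
    the tail slab needs them.
    """
    n = len(x)
    n_full = n // block        # outer iterations whose block is complete
    n_outer = -(-n // block)   # total outer iterations (ceiling division)
    checks = 0

    # Bulk slab: every index is known to be in bounds, so no conditional.
    for outer in range(n_full):
        for inner in range(block):
            i = outer * block + inner
            y[i] += a * x[i]

    # Tail slab: at most one outer iteration, guarded by a bounds check.
    for outer in range(n_full, n_outer):
        for inner in range(block):
            i = outer * block + inner
            checks += 1
            if i < n:
                y[i] += a * x[i]

    return checks
```

For `n = 10` and `block = 4`, two blocks run unguarded and only the third performs checks. When `n` is a multiple of `block`, the tail slab is empty.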
To-do
^^^^^
- Just touching a variable written to by a non-idempotent
instruction makes that instruction also not idempotent
- assert dependencies <= parent_inames in loopy/__init__.py
???
- user interface for dim length prescription
- Deal with equality constraints.
(These arise, e.g., when partitioning a loop of length 16 into 16s.)
- Better for-loop bound generation
-> Try a triangular loop
- Eliminate the first (pre-)barrier in a loop.
- Generate automatic test against sequential code.
- Automatically verify that all array access is within bounds.
- Reason about generated code, give user feedback on potential
improvements.
- Convolutions, Stencils
- DMA engine threads?
- Divisibility, modulo, strides?
- Try, fix indirect addressing
- variable shuffle detection
- Use gists (why do disjoint sets arise?)
- Nested slab decomposition (in conjunction with conditional hoisting) could
generate nested conditional code.
- Give a good error message if a parameter assignment in get_problems()
is missing.
- Slab decomposition for ILP
-> I don't think that's possible.
- It is hard to understand error messages that refer to instructions that
are generated during preprocessing.
-> Expose preprocessing to the user so she can inspect the preprocessed
kernel.
- Which variables need to be duplicated for ILP?
-> Only reduction
- implemented_domain may end up being smaller than requested in cse
evaluations--check that!
- Allow prioritization of loops in scheduling.
- Make axpy better.
- Screwy lower bounds in slab decomposition
- Flag, exploit idempotence
- Some things involving CSEs might be impossible to schedule
a[i,j] = cse(b[i]) * cse(c[j])
- Be smarter about automatic local axis choice
-> What if we run out of axes?
- Implement condition hoisting
(needed, e.g., by slab decomposition)
- Check for non-use of hardware axes
- Slab decomposition for parallel dimensions
  - implement at the outermost nesting level regardless
  - bound *all* tagged inames
  - can't slab inames that share tags with other inames (for now)
- Make syntax for iname dependencies
- make syntax for insn dependencies
- CSE iname duplication might be unnecessary?
(don't think so: It might be desired to do a full fetch before a mxm k loop
even if that requires going iterative.)
- Reduction needs to know a neutral element
- Types of reduction variables?
- Generalize reduction to be over multiple variables
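To illustrate the equality-constraint item in the list above: when a loop of length 16 is split into blocks of 16, the outer loop variable is pinned to a single value, so its bounds collapse to an equality. The following is a minimal plain-Python sketch; `split_bounds` and `is_equality` are hypothetical helpers, not loopy or ISL API:

```python
def split_bounds(length, inner_length):
    """Return inclusive (lower, upper) bounds for the outer and inner
    loop variables after splitting 0 <= i < length as
    i = outer * inner_length + inner.

    For simplicity this sketch assumes inner_length divides length.
    """
    if length % inner_length != 0:
        raise ValueError("sketch only handles evenly divisible lengths")
    return {
        "outer": (0, length // inner_length - 1),
        "inner": (0, inner_length - 1),
    }


def is_equality(bounds):
    """True if lower == upper, i.e. the variable can take only one value."""
    lower, upper = bounds
    return lower == upper


if __name__ == "__main__":
    b = split_bounds(16, 16)
    # outer satisfies 0 <= outer <= 0, which is the equality outer == 0
    print(b["outer"], is_equality(b["outer"]))
```

A code generator that only expects inequality bounds has to recognize this degenerate case, for example by substituting the fixed value instead of emitting a loop.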
Should a dependency on an iname be forced in a CSE?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Local var:

l  | n
g  | y
dl | Err
d  | Err

Private var:

l  | y
g  | y
dl | Err
d  | Err

dg: Invalid -> error

d: is duplicate
l: is tagged as local idx
g: is tagged as group idx

Raise error if dl is targeting a private variable, regardless of whether it's
a dependency or not.
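
The decision tables above could be transcribed into a small lookup, sketched below. This assumes "y"/"n" answer the question in the heading and that every "Err" entry should raise. `force_cse_dependency` and the string keys are hypothetical and do not correspond to existing loopy functions:

```python
# True = force the dependency ("y"), False = do not force it ("n").
FORCE_DEPENDENCY = {
    "local": {"l": False, "g": True},
    "private": {"l": True, "g": True},
}

# Iname kinds the tables mark as "Err" (dl, d) or as invalid (dg).
ERROR_KINDS = frozenset(["dl", "d", "dg"])


def force_cse_dependency(var_scope, iname_kind):
    """Decide whether a CSE must depend on an iname.

    var_scope:  "local" or "private" (where the CSE result is stored)
    iname_kind: "l" (local idx), "g" (group idx), or a duplicate
                kind "d", "dl", "dg"
    """
    if iname_kind in ERROR_KINDS:
        raise ValueError(
            "iname kind %r is not allowed for a %s variable"
            % (iname_kind, var_scope))
    return FORCE_DEPENDENCY[var_scope][iname_kind]
```

Rejecting the error kinds before the table lookup also covers the final rule: a `dl` iname targeting a private variable raises regardless of whether a dependency exists.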