Forked from Andreas Klöckner / loopy

Add set_loop_priority, remove check_kernels, auto-schedule in auto_test · 3c173a0a
Andreas Klöckner authored 11 years ago

reference.rst 6.18 KiB

Reference Guide

This guide defines all functionality exposed by loopy. For a gentler introduction, see the example-based guide :ref:`guide` instead.

Inames

Loops are (by default) entered exactly once. This is necessary to preserve dependency semantics: otherwise, for example, a fetch could happen inside one loop nest while the instruction using that fetch ends up inside a wholly different loop nest.

Integer Domain

Expressions

if
reductions
- duplication of reduction inames
complex-valued arithmetic
tagging of array access and substitution rule use ("$")

Assignments and Substitution Rules

Syntax of an instruction:

label: [i,j|k,l] <float32> lhs[i,j,k] = EXPRESSION : dep_label, dep_label_2

The above example illustrates all the parts that are allowed in loo.py's instruction syntax. All of these except for lhs and EXPRESSION are optional.

label is a unique identifier for this instruction. It lets you refer back to the instruction during further transformation and specify ordering dependencies.
dep_label, dep_label_2 are dependencies of the current instruction. Loo.py will enforce that the instructions marked with these labels are scheduled before this instruction.
<float32> declares lhs as a temporary variable, with shape given by the ranges of the lhs subscripts. (Note that in this case, the lhs subscripts must be pure inames, not expressions, for now.) Instead of a concrete type, an empty set of angle brackets <> may be given to indicate that type inference should figure out the type of the temporary.
[i,j|k,l] specifies the inames within which this instruction is run. Independent copies of the inames k and l will be made for this instruction.
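To make the roles of the individual parts concrete, the following is a minimal sketch that splits an instruction string into the components listed above. The regular expression, the `Instruction` tuple and `parse_instruction` are hypothetical names invented for this illustration and are not part of loopy; the sketch also assumes that the expression itself contains no trailing `: name` text that could be mistaken for a dependency list.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical pattern for illustration only; loopy's own parser may differ.
INSN_RE = re.compile(
    r"^\s*"
    r"(?:(?P<label>\w+)\s*:\s*)?"                              # optional "label:"
    r"(?:\[(?P<inames>[^\]|]*)(?:\|(?P<dup>[^\]]*))?\]\s*)?"   # optional "[i,j|k,l]"
    r"(?:<(?P<dtype>[^>]*)>\s*)?"                              # optional "<float32>" or "<>"
    r"(?P<lhs>[^=]+?)\s*=\s*"                                  # required left-hand side
    r"(?P<rhs>.+?)"                                            # required expression
    r"(?:\s*:\s*(?P<deps>[\w\s,]+))?\s*$"                      # optional ": dep, dep"
)


class Instruction(NamedTuple):
    label: Optional[str]
    inames: Tuple[str, ...]
    duplicated_inames: Tuple[str, ...]
    temp_dtype: Optional[str]  # None: no temporary declared; "": infer the type
    lhs: str
    rhs: str
    deps: Tuple[str, ...]


def _split_names(text):
    """Split a comma-separated list of names, tolerating None and blanks."""
    if not text:
        return ()
    return tuple(part.strip() for part in text.split(",") if part.strip())


def parse_instruction(text):
    match = INSN_RE.match(text)
    if match is None:
        raise ValueError("not a valid instruction: %r" % text)
    return Instruction(
        label=match.group("label"),
        inames=_split_names(match.group("inames")),
        duplicated_inames=_split_names(match.group("dup")),
        temp_dtype=match.group("dtype"),
        lhs=match.group("lhs").strip(),
        rhs=match.group("rhs").strip(),
        deps=_split_names(match.group("deps")),
    )


full = parse_instruction(
    "label: [i,j|k,l] <float32> lhs[i,j,k] = EXPRESSION : dep_label, dep_label_2")
bare = parse_instruction("lhs[i] = EXPRESSION")
```

In this sketch, `bare` has no label, no inames, no dependencies and a `temp_dtype` of `None`, reflecting that only lhs and EXPRESSION are mandatory.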

Syntax of a substitution rule:

rule_name(arg1, arg2) := EXPRESSION
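As a rough illustration of this syntax, the sketch below separates a rule into its name, argument names and expression. `RULE_RE` and `parse_substitution_rule` are hypothetical helpers written for this example, not loopy functions, and the pattern assumes that argument names contain no parentheses.

```python
import re

# Hypothetical pattern for illustration only; loopy's own parser may differ.
RULE_RE = re.compile(
    r"^\s*(?P<name>\w+)\s*\((?P<args>[^)]*)\)\s*:=\s*(?P<expr>.+?)\s*$")


def parse_substitution_rule(text):
    """Return (rule name, tuple of argument names, expression string)."""
    match = RULE_RE.match(text)
    if match is None:
        raise ValueError("not a valid substitution rule: %r" % text)
    args = tuple(
        arg.strip() for arg in match.group("args").split(",") if arg.strip())
    return match.group("name"), args, match.group("expr")


rule = parse_substitution_rule("rule_name(arg1, arg2) := EXPRESSION")
```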

Tags

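====================== ==================================================
Tag                    Meaning
====================== ==================================================
None | "for"           Sequential loop
"l.N"                  Local (intra-group) axis N
"l.auto"               Automatically chosen local (intra-group) axis
"g.N"                  Group-number axis N
"unr"                  Plain unrolling
"ilp" | "ilp.unr"      Unroll using instruction-level parallelism
"ilp.seq"              Realize parallel iname as innermost loop
====================== ==================================================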

(Throughout this table, N must be replaced by an actual number.)
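The table can be read as a simple mapping from tag strings to meanings, with N standing for a non-negative integer. The sketch below spells that out; `describe_tag` is a hypothetical helper written for this illustration and is not provided by loopy.

```python
import re


def describe_tag(tag):
    """Map a tag from the table above to a short description (illustrative only)."""
    if tag is None or tag == "for":
        return "sequential loop"
    if tag == "l.auto":
        return "automatically chosen local (intra-group) axis"
    # "l.N" and "g.N": N must be an actual number, e.g. "l.0" or "g.1".
    match = re.fullmatch(r"([lg])\.(\d+)", tag)
    if match:
        kind = "local (intra-group)" if match.group(1) == "l" else "group-number"
        return "%s axis %d" % (kind, int(match.group(2)))
    if tag == "unr":
        return "plain unrolling"
    if tag in ("ilp", "ilp.unr"):
        return "unroll using instruction-level parallelism"
    if tag == "ilp.seq":
        return "realize parallel iname as innermost loop"
    raise ValueError("unknown tag: %r" % (tag,))
```

For instance, `describe_tag("l.0")` describes local axis 0, while the literal string `"l.N"` is rejected because N has not been replaced by a number.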

"ILP" does three things:

Restricts loops to be innermost
Duplicates reduction storage for any reductions nested around ILP usage
Causes a loop (unrolled or not) to be opened/generated for each involved instruction

Automatic Axis Assignment

Automatic local axes are chosen as follows:

For each instruction containing "l.auto" inames:

1. Find the lowest-numbered unused axis. If none exists, use sequential unrolling instead.
2. Find the iname that has the smallest stride in any global array access occurring in the instruction.
3. Assign the low-stride iname to the available axis, splitting the iname if it is too long for the available axis size.
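The three steps above can be sketched for a single instruction as follows. This is a simplified model under stated assumptions, not loopy's implementation: `pick_auto_axis` and all of its parameters are hypothetical, strides and iname lengths are assumed to be known up front as plain integers, and ties between equal strides are broken by list order.

```python
def pick_auto_axis(auto_inames, used_axes, axis_sizes, min_stride, length):
    """One round of the selection for one instruction (illustrative only).

    auto_inames: names of the instruction's inames tagged "l.auto"
    used_axes:   set of local axis numbers that are already taken
    axis_sizes:  maximum size of each local axis, indexed by axis number
    min_stride:  iname -> smallest stride in any global array access
    length:      iname -> number of iterations of that iname
    """
    # Step 1: lowest-numbered unused axis, or fall back to unrolling.
    free_axes = [ax for ax in range(len(axis_sizes)) if ax not in used_axes]
    if not free_axes:
        return {"action": "unroll", "inames": sorted(auto_inames)}
    axis = free_axes[0]

    # Step 2: the iname with the smallest stride in a global array access.
    iname = min(auto_inames, key=lambda name: min_stride[name])

    # Step 3: assign it, noting whether it must be split to fit the axis.
    needs_split = length[iname] > axis_sizes[axis]
    return {"action": "assign", "iname": iname, "axis": axis, "split": needs_split}


# Axis 0 is taken; "j" has unit stride but is longer than the axis allows.
decision = pick_auto_axis(
    auto_inames=["i", "j"],
    used_axes={0},
    axis_sizes=[16, 16],
    min_stride={"i": 64, "j": 1},
    length={"i": 8, "j": 128},
)
```

Here `decision` assigns `j` to axis 1 and flags it for splitting, since 128 iterations do not fit into an axis of size 16.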

If you need different behavior, use :func:`tag_inames` and :func:`split_iname` to change the assignment of "l.auto" axes manually.

Creating Kernels

Arguments

String Syntax

Substitution rules
Instructions

Kernels

Do not create :class:`LoopKernel` objects directly. Instead, use the following function, which takes the same arguments, but does some extra post-processing.

Transforming Kernels

Matching contexts

Wrangling inames

Dealing with Substitution Rules

Caching, Precomputation and Prefetching

Influencing data access

Padding

Manipulating Instructions

Argument types

Finishing up

Automatic Testing

Troubleshooting

Printing :class:`LoopKernel` objects

If you're confused about things loopy is referring to in an error message or about the current state of the :class:`LoopKernel` you are transforming, the following always works:

print(kernel)

(It yields a human-readable, albeit terse, representation of kernel.)