Andreas Klöckner authored
Reference Guide
This guide defines all functionality exposed by loopy. If you would like a gentler introduction, consider reading the example-based guide :ref:`guide` instead.
Inames
Loops are (by default) entered exactly once. This is necessary to preserve dependency semantics: otherwise, for example, a fetch could happen inside one loop nest, and the instruction using that fetch could end up inside a wholly different loop nest.
Integer Domain
Expressions
- if
- reductions

  - duplication of reduction inames

- complex-valued arithmetic
- tagging of array access and substitution rule use ("$")
Specifying Types
:mod:`loopy` uses the same type system as :mod:`numpy`. (See :class:`numpy.dtype`.) It also uses :mod:`pyopencl` for a registry of user-defined types and their C equivalents. See :func:`pyopencl.tools.get_or_register_dtype` and related functions.
For a string representation of types, all numpy types (e.g. ``float32`` etc.) are accepted, in addition to what is registered in :mod:`pyopencl`.
Tags
Tag | Meaning |
---|---|
None or "for" | Sequential loop |
"l.N" | Local (intra-group) axis N |
"l.auto" | Automatically chosen local (intra-group) axis |
"g.N" | Group-number axis N |
"unr" | Plain unrolling |
"ilp" or "ilp.unr" | Unroll using instruction-level parallelism |
"ilp.seq" | Realize parallel iname as innermost loop |
(Throughout this table, N must be replaced by an actual number.)
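As a rough illustration of how the tag strings in the table are structured, the following sketch classifies a tag and checks that ``N`` has been replaced by an actual number. The helper ``describe_tag`` is hypothetical and is not part of loopy; it only mirrors the table above.

```python
import re

# Hypothetical helper (not loopy API): maps a tag from the table above
# to a short description.
_FIXED_TAGS = {
    None: "sequential loop",
    "for": "sequential loop",
    "l.auto": "automatically chosen local axis",
    "unr": "plain unrolling",
    "ilp": "ILP, unrolled",
    "ilp.unr": "ILP, unrolled",
    "ilp.seq": "ILP, realized as innermost loop",
}

# "l.N" and "g.N", where N must be an actual non-negative integer.
_AXIS_TAG = re.compile(r"^(l|g)\.(\d+)$")


def describe_tag(tag):
    """Return a description of an iname tag, or raise ValueError."""
    if tag in _FIXED_TAGS:
        return _FIXED_TAGS[tag]
    match = _AXIS_TAG.match(tag)
    if match is None:
        raise ValueError(f"unrecognized iname tag: {tag!r}")
    kind = "local" if match.group(1) == "l" else "group"
    return f"{kind} axis {int(match.group(2))}"
```

For instance, ``describe_tag("g.1")`` reports a group axis, while the literal string ``"l.N"`` is rejected because ``N`` was never replaced by a number.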
"ILP" does three things:
- Restricts loops to be innermost
- Duplicates reduction storage for any reductions nested around ILP usage
- Causes a loop (unrolled or not) to be opened/generated for each involved instruction
Automatic Axis Assignment
Automatic local axes are chosen as follows:
- For each instruction containing "l.auto" inames:

  - Find the lowest-numbered unused axis. If none exists, use sequential unrolling instead.
  - Find the iname that has the smallest stride in any global array access occurring in the instruction.
  - Assign the low-stride iname to the available axis, splitting the iname if it is too long for the available axis size.

If you need different behavior, use :func:`tag_inames` and :func:`split_iname` to change the assignment of "l.auto" axes manually.
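The selection procedure described above can be sketched in plain Python. This is a simplified model under stated assumptions, not loopy's implementation: the dictionary keys ``auto_inames``, ``strides`` and ``used_axes`` are hypothetical, only one iname is assigned per instruction per pass, and when no axis is free all pending inames of that instruction are marked for unrolling.

```python
def assign_auto_axes(instructions, axis_sizes):
    """Sketch of "l.auto" axis assignment (hypothetical data model).

    instructions: list of dicts with the keys
        "auto_inames": {iname: length} for inames tagged "l.auto"
        "strides":     {iname: smallest stride in any global array access}
        "used_axes":   set of local axis numbers already occupied
    axis_sizes: sizes of the local axes, indexed by axis number
    Returns {iname: ("l", axis, needs_split)} or {iname: ("unr",)}.
    """
    decisions = {}
    for insn in instructions:
        pending = [n for n in insn["auto_inames"] if n not in decisions]
        if not pending:
            continue

        # Axes occupied in this instruction, including earlier decisions.
        used = set(insn["used_axes"])
        for name in insn["auto_inames"]:
            if name in decisions and decisions[name][0] == "l":
                used.add(decisions[name][1])

        # Step 1: lowest-numbered unused axis, else fall back to unrolling.
        free = [ax for ax in range(len(axis_sizes)) if ax not in used]
        if not free:
            for name in pending:
                decisions[name] = ("unr",)
            continue
        axis = free[0]

        # Step 2: the iname with the smallest stride in global accesses.
        best = min(pending, key=lambda n: insn["strides"][n])

        # Step 3: assign it, noting whether it must be split to fit.
        needs_split = insn["auto_inames"][best] > axis_sizes[axis]
        decisions[best] = ("l", axis, needs_split)
    return decisions
```

With one instruction whose inames are ``i`` (length 64, stride 1) and ``j`` (length 8, stride 64) and two free axes of size 16, the sketch places ``i`` on axis 0 and flags it for splitting.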
Creating Kernels
Arguments
Temporary Variables
Temporary variables model OpenCL's ``private`` and ``local`` address spaces. Both have the lifetime of a kernel invocation.
Instructions
Assignments
The general syntax of an instruction is a simple assignment:
LHS[i,j,k] = EXPRESSION
Several extensions of this syntax are defined, as discussed below. They may be combined freely.
You can also use an instruction to declare a new temporary variable. (See
:ref:`temporaries`.) See :ref:`types` for what types are acceptable. If the
``LHS`` has a subscript, bounds on the indices are inferred (these must be
constants at the time of kernel creation) and the declared temporary is
created as an array. Instructions declaring temporaries have the following
form:
<temp_var_type> LHS[i,j,k] = EXPRESSION
You can also create a temporary and ask loopy to determine its type automatically. This uses the following syntax:
<> LHS[i,j,k] = EXPRESSION
Lastly, each instruction may optionally have a number of attributes specified, using the following format:
LHS[i,j,k] = EXPRESSION {attr1,attr2=value1:value2}
These are usually key-value pairs. The following attributes are recognized:
- ``id=value`` sets the instruction's identifier to ``value``. ``value`` must be unique within the kernel. This identifier is used to refer to the instruction after it has been created, such as from ``dep`` attributes (see below) or from :mod:`context matches <loopy.context_matching>`.

- ``id_prefix=value`` also sets the instruction's identifier. Uniqueness, however, is ensured by loopy itself, by appending further components (often numbers) to the given ``id_prefix``.

- ``inames=i:j:k`` forces the instruction to reside within the loops over :ref:`inames` ``i``, ``j`` and ``k`` (and only those).

  .. note::

      The default for the inames that the instruction depends on is the inames used in the instruction itself plus the common subset of inames shared by writers of all variables read by the instruction.

      You can add a plus sign ("``+``") to the front of this option value to indicate that you would like the inames you specify here to be in addition to the ones found by the heuristic described above.

- ``dep=id1:id2`` creates a dependency of this instruction on the instructions with identifiers ``id1`` and ``id2``. This requires that the code generated for this instruction appears textually after both of these instructions' generated code.

  Identifiers here are allowed to be wildcards as defined by the Python function :func:`fnmatch.fnmatchcase`.

  .. note::

      If this is not specified, :mod:`loopy` will automatically add dependencies of reading instructions on writing instructions if and only if there is exactly one writing instruction for the written variable (temporary or argument).

- ``priority=integer`` sets the instruction's priority to the value ``integer``. Instructions with higher priority will be scheduled sooner, if possible. Note that the scheduler may still schedule a lower-priority instruction ahead of a higher-priority one if loop orders or dependencies require it.

- ``if=variable1:variable2`` executes this instruction only if all condition variables (which must be scalar variables) evaluate to ``true`` (as defined by C).
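To make the instruction syntax concrete, here is a minimal sketch that splits an instruction string into its temporary-type prefix, left-hand side, expression and attributes, and then expands ``dep`` wildcards with :func:`fnmatch.fnmatchcase`. The regular expression and the helpers ``parse_instruction`` and ``resolve_deps`` are hypothetical illustrations; they are not loopy's parser and handle only the simple forms shown in this section.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical pattern, not loopy's actual grammar.
_INSN = re.compile(
    r"^\s*(?:<(?P<temp_type>[^>]*)>\s*)?"            # optional <type> or <>
    r"(?P<lhs>[^=\s\[]+(?:\[[^\]]*\])?)\s*=\s*"      # LHS, optional subscript
    r"(?P<rhs>[^{]*?)\s*"                            # expression
    r"(?:\{(?P<attrs>[^}]*)\})?\s*$"                 # optional {attributes}
)


def parse_instruction(text):
    """Split an instruction string into its parts (sketch only)."""
    match = _INSN.match(text)
    if match is None:
        raise ValueError(f"cannot parse instruction: {text!r}")
    attrs = {}
    for item in filter(None, (match.group("attrs") or "").split(",")):
        key, _, value = item.strip().partition("=")
        # Colon-separated values become lists; bare flags map to [].
        attrs[key] = value.split(":") if value else []
    return {
        # None: no declaration, "": infer the type, otherwise a type name.
        "temp_type": match.group("temp_type"),
        "lhs": match.group("lhs"),
        "rhs": match.group("rhs"),
        "attrs": attrs,
    }


def resolve_deps(patterns, known_ids):
    """Expand dep= wildcard patterns against known instruction ids."""
    return sorted(
        insn_id
        for insn_id in known_ids
        if any(fnmatchcase(insn_id, pattern) for pattern in patterns)
    )
```

For example, parsing ``<> tmp[i] = 2*a[i] {id=scale,dep=load_*:init}`` yields an empty ``temp_type`` (meaning the type is to be inferred) and a ``dep`` list of ``["load_*", "init"]``, which ``resolve_deps`` can then match case-sensitively against the identifiers present in a kernel.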
C Block Instructions
Substitution Rules
Syntax of a substitution rule:
rule_name(arg1, arg2) := EXPRESSION
Kernels
Do not create :class:`LoopKernel` objects directly. Instead, use the following function, which is responsible for creating kernels:
Transforming Kernels
Matching contexts
Wrangling inames
Dealing with Parameters
Dealing with Substitution Rules
Caching, Precomputation and Prefetching
Influencing data access
Padding
Manipulating Instructions
Library interface
Argument types
Finishing up
Running
Automatic Testing
Troubleshooting
Printing :class:`LoopKernel` objects
If you're confused about things loopy is referring to in an error message or about the current state of the :class:`LoopKernel` you are transforming, the following always works:
print(kernel)
(And it yields a human-readable--albeit terse--representation of kernel.)