Loopy kernel generation
Although there are still a lot of TODOs and FIXMEs, I think this is in a state where it should be looked over to make sure we have basic agreement on the design.
Overall design:
I imagine code generation will eventually be a multi-stage process. Right now, it is a single pass. Code is generated recursively in `CodeGenMapper`, using a (partially) mutable `CodeGenState`. This mapper acts on nodes in the computation graph. For each node, it updates the kernel, adding the necessary implementation for the node, and returns an `ImplementedResult`. An `ImplementedResult` represents a generated value (array expression). This can be an array, a loopy expression, or a substitution rule. You can convert the generated value to a (scalar) loopy expression via the `to_loopy_expression` method.
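To make the flow concrete, here is a minimal sketch of the single-pass recursion. It is not the code in this PR: `SketchState`, `SketchResult`, `SketchCodeGenMapper`, the two node classes, and the use of plain strings as "loopy expressions" are all stand-ins assumed for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# Hypothetical graph nodes for illustration; not pytato's real classes.
@dataclass(frozen=True)
class Placeholder:
    name: str


@dataclass(frozen=True)
class Sum:
    left: object
    right: object


@dataclass(frozen=True)
class SketchResult:
    """A generated value (stands in for ImplementedResult)."""
    expr: str

    def to_loopy_expression(self, indices: Tuple[str, ...]) -> str:
        # In this sketch a "loopy expression" is just a string.
        return self.expr.format(idx=", ".join(indices))


@dataclass
class SketchState:
    """Mutable state threaded through the pass (stands in for CodeGenState)."""
    kernel_args: List[str] = field(default_factory=list)
    results: Dict[object, SketchResult] = field(default_factory=dict)


class SketchCodeGenMapper:
    def __call__(self, node, state: SketchState) -> SketchResult:
        # Each node is implemented once; later visits reuse the result.
        if node not in state.results:
            method = getattr(self, "map_" + type(node).__name__.lower())
            state.results[node] = method(node, state)
        return state.results[node]

    def map_placeholder(self, node, state):
        state.kernel_args.append(node.name)  # "update the kernel"
        return SketchResult(node.name + "[{idx}]")

    def map_sum(self, node, state):
        left = self(node.left, state)
        right = self(node.right, state)
        return SketchResult("(" + left.expr + " + " + right.expr + ")")
```

The point of the sketch is the division of labour: the state object accumulates kernel-level side effects, while the returned result is what a parent node uses to refer to the child's value.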
One complication is that a "loopy expression" is not a pure expression but involves context (e.g., reduction bounds and dependencies). To handle that, I introduced a `LoopyExpressionContext` class. The idea is that the caller (who wants to use the expression) calls `to_loopy_expression` with a context that is populated by the callee, and the caller uses it to figure out how to generate the right code for the loopy expression. I am not sure about this part of the design. I haven't implemented anything that actually makes use of the `LoopyExpressionContext` yet. I would appreciate suggestions.
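As a starting point for discussion, the caller/callee protocol could look roughly like the sketch below. Everything here is assumed for illustration: `SketchExprContext`, its two fields, and both functions are hypothetical, and expressions are plain strings rather than real loopy objects.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple


@dataclass
class SketchExprContext:
    """Filled in by the callee while it builds an expression.

    Hypothetical stand-in for LoopyExpressionContext; the field names
    are assumptions, not the ones used in this PR.
    """
    depends_on: FrozenSet[str] = frozenset()
    reduction_bounds: Dict[str, Tuple[int, int]] = field(default_factory=dict)


def sum_over_axis(array_name, length, producer_insn, context):
    """Callee: return an expression and record what it relies on."""
    context.depends_on = context.depends_on | {producer_insn}
    context.reduction_bounds["j"] = (0, length)
    return f"sum(j, {array_name}[i, j])"


def emit_store(output_name, array_name, length, producer_insn):
    """Caller: create the context, then read it back to emit code."""
    context = SketchExprContext()
    expr = sum_over_axis(array_name, length, producer_insn, context)
    domains = [
        f"{lo} <= {iname} < {hi}"
        for iname, (lo, hi) in context.reduction_bounds.items()
    ]
    return {
        "assignment": f"{output_name}[i] = {expr}",
        "domains": domains,
        "depends_on": sorted(context.depends_on),
    }
```

In this shape the expression itself stays a plain value, and everything the enclosing instruction needs to know (extra loop domains, instruction dependencies) travels through the context object.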
There's a second mapper for generating expressions for `IndexLambda` and the like. This is `InlinedExpressionGenMapper`. It is mutually recursive with `CodeGenMapper`. It also takes a `LoopyExpressionContext`.
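The mutual recursion can be pictured with the following sketch. The class names `ArrayMapperSketch` and `ExprMapperSketch`, the node classes, and the list used as a "context" are hypothetical simplifications, not the classes in this PR.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Placeholder:
    name: str


@dataclass(frozen=True)
class Ref:
    """Scalar-expression leaf that refers to a bound array by name."""
    binding: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class IndexLambda:
    expr: object
    bindings: Tuple[Tuple[str, object], ...]  # (name, array node) pairs


class ArrayMapperSketch:
    """Stands in for CodeGenMapper: maps array nodes."""

    def __init__(self):
        self.expr_mapper = ExprMapperSketch(self)

    def __call__(self, node, context: List[str]) -> str:
        if isinstance(node, Placeholder):
            context.append(node.name)  # record an array read
            return node.name + "[i]"
        if isinstance(node, IndexLambda):
            # Hand the scalar expression to the second mapper.
            return self.expr_mapper(node.expr, dict(node.bindings), context)
        raise TypeError(f"unsupported node: {node!r}")


class ExprMapperSketch:
    """Stands in for InlinedExpressionGenMapper: maps scalar expressions."""

    def __init__(self, array_mapper):
        self.array_mapper = array_mapper

    def __call__(self, expr, bindings, context: List[str]) -> str:
        if isinstance(expr, Ref):
            # Recurse back into the array mapper for the bound array.
            return self.array_mapper(bindings[expr.binding], context)
        if isinstance(expr, Add):
            left = self(expr.left, bindings, context)
            right = self(expr.right, bindings, context)
            return "(" + left + " + " + right + ")"
        raise TypeError(f"unsupported expression: {expr!r}")
```

Both mappers receive the same context object, so information gathered deep inside a nested `IndexLambda` is still visible to whoever started the traversal.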
What's currently supported:
- generation of Placeholders
- generation of IndexLambdas (as expressions, not arrays yet)
- generation of instructions to copy expressions to outputs
What's not supported yet:
- any sort of preprocessing of the graph
- any sort of respect for tags
- any sort of handling of symbolic shapes
Potentially controversial things:
- How to represent expression context (see `LoopyExpressionContext`). Also, what sort of context is needed for generating loopy expressions.
- I added a node type for named output arguments (`Output`).
- Graph transformations (see also #4). The module `pytato.transform` adds a copy transformation which I needed to support `Output`. I imagine this transformation will serve as a template for others, so we should decide on how to express these.
Other notable changes:
- Binary operators in `IndexLambda`. This requires a policy on shape equality (see #3).
- Made `Namespace` inherit from `Mapping`.
- Changed imports to respect PEP8 order (I think). I.e., system imports, third party imports, then local imports.
- Type stubs for `pytools`. The stub for `memoize_method` is necessary, otherwise Mypy complains. The other stubs are nice to have. (Type stubs are now in pytools.)
- Implemented hashing and equality for `Array`. This code is somewhat repetitive; it would be nice if it were not. `shape` and `dtype` are now attributes stored in `Array`, to avoid repetitive code.
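For reviewers, here is a rough sketch of the two structural changes in the list above: a base class that owns `shape` and `dtype` and derives hashing and equality from one key, and a namespace built on `collections.abc.Mapping`. The class names, the `extra_fields` hook, and the `assign` method are assumptions made for this sketch, not the code in this PR.

```python
from collections.abc import Mapping
from typing import Dict, Tuple


class ArraySketch:
    """Hypothetical base class: shape and dtype are stored here, once."""

    # Subclasses name any extra attributes that take part in comparison.
    extra_fields: Tuple[str, ...] = ()

    def __init__(self, shape, dtype):
        self.shape = tuple(shape)
        self.dtype = dtype

    def _key(self):
        extras = tuple(getattr(self, name) for name in self.extra_fields)
        return (type(self), self.shape, self.dtype) + extras

    def __eq__(self, other):
        if not isinstance(other, ArraySketch):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


class PlaceholderSketch(ArraySketch):
    extra_fields = ("name",)

    def __init__(self, name, shape, dtype):
        super().__init__(shape, dtype)
        self.name = name


class NamespaceSketch(Mapping):
    """Read-only mapping interface over a private dict of symbols."""

    def __init__(self):
        self._symbols: Dict[str, ArraySketch] = {}

    def assign(self, name, value):
        if name in self._symbols:
            raise ValueError(f"'{name}' is already assigned")
        self._symbols[name] = value

    # Mapping supplies get(), items(), keys(), values(), and `in`
    # once these three methods are defined.
    def __getitem__(self, name):
        return self._symbols[name]

    def __iter__(self):
        return iter(self._symbols)

    def __len__(self):
        return len(self._symbols)
```

Inheriting from `Mapping` rather than `dict` keeps the public interface read-only, so new names can only enter through the explicit `assign` method.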
Closes #7