Loopy kernel generation
Although there are still many TODOs and FIXMEs, I think this is far enough along that it should be looked over, to make sure we have basic agreement on the design.
Overall design:
I imagine code generation will eventually be a multi-stage process. Right now, it is a single pass. Code is generated recursively in CodeGenMapper, using a (partially) mutable CodeGenState. This mapper acts on nodes in the computation graph. For each node, it updates the kernel, adding the necessary implementation for the node, and returns an ImplementedResult. An ImplementedResult represents a generated value (an array expression). This can be an array, a loopy expression, or a substitution rule. You can convert the generated value to a (scalar) loopy expression via the to_loopy_expression method.
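To make the single-pass structure concrete, here is a minimal, self-contained sketch of the pattern. It does not use the real classes: ToyState, ToyResult, StoredResult, InlinedResult, Placeholder, Scale and ToyCodeGenMapper are all hypothetical stand-ins, and "loopy expressions" are plain strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyState:
    """Hypothetical stand-in for CodeGenState: mutable kernel-in-progress."""
    arguments: List[str] = field(default_factory=list)
    results: Dict[int, "ToyResult"] = field(default_factory=dict)


class ToyResult:
    """Hypothetical stand-in for ImplementedResult."""

    def to_loopy_expression(self, indices: Tuple[str, ...]) -> str:
        raise NotImplementedError


@dataclass
class StoredResult(ToyResult):
    """The value lives in a named array, so the expression is a subscript."""
    name: str

    def to_loopy_expression(self, indices):
        return f"{self.name}[{', '.join(indices)}]"


@dataclass
class InlinedResult(ToyResult):
    """The value is a scalar expression built on demand from the indices."""
    build: Callable[[Tuple[str, ...]], str]

    def to_loopy_expression(self, indices):
        return self.build(indices)


@dataclass
class Placeholder:
    name: str


@dataclass
class Scale:
    """Toy IndexLambda-like node: multiplies its child by a constant."""
    factor: int
    child: object


class ToyCodeGenMapper:
    """Visits each node once, updates the state, returns a ToyResult."""

    def __call__(self, node, state: ToyState) -> ToyResult:
        key = id(node)
        if key not in state.results:
            method = getattr(self, "map_" + type(node).__name__.lower())
            state.results[key] = method(node, state)
        return state.results[key]

    def map_placeholder(self, node, state):
        state.arguments.append(node.name)  # side effect on the kernel
        return StoredResult(node.name)

    def map_scale(self, node, state):
        child = self(node.child, state)  # recurse into the graph
        return InlinedResult(
            lambda idx: f"{node.factor}*({child.to_loopy_expression(idx)})")


state = ToyState()
result = ToyCodeGenMapper()(Scale(2, Placeholder("x")), state)
print(result.to_loopy_expression(("i",)))
print(state.arguments)
```

The point of the sketch is only the division of labor: the mapper mutates the state as a side effect, and the caller decides later how to index into the returned result.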
One complication is that a "loopy expression" is not a pure expression but involves context (e.g., reduction bounds and dependencies). To handle that, I introduced a LoopyExpressionContext class. The idea is that the caller (who wants to use the expression) calls to_loopy_expression with a context that is populated by the callee, and the caller uses it to figure out how to generate the right code for the loopy expression. I am not sure about this part of the design. I haven't implemented anything that actually makes use of the LoopyExpressionContext yet. I would appreciate suggestions.
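As a starting point for discussion, here is one way the caller/callee protocol could look. This is a sketch under my own assumptions, not what the branch does: ToyExpressionContext, ToyStoredArray, ToySum and emit_store are hypothetical names, and the fields depends_on and reduction_bounds are guesses at what the context would need to carry.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class ToyExpressionContext:
    """Hypothetical stand-in for LoopyExpressionContext."""
    depends_on: Set[str] = field(default_factory=set)
    reduction_bounds: Dict[str, Tuple[int, int]] = field(default_factory=dict)


class ToyStoredArray:
    """A value already written to an array by the instruction insn_id."""

    def __init__(self, name: str, insn_id: str):
        self.name = name
        self.insn_id = insn_id

    def to_loopy_expression(self, indices, context):
        # Callee records that the reader must run after the writer.
        context.depends_on.add(self.insn_id)
        return f"{self.name}[{', '.join(indices)}]"


class ToySum:
    """Sum of the child over an extra trailing axis of the given extent."""

    def __init__(self, child, extent: int):
        self.child = child
        self.extent = extent

    def to_loopy_expression(self, indices, context):
        # Callee allocates a reduction index and records its bounds.
        iname = f"r{len(context.reduction_bounds)}"
        context.reduction_bounds[iname] = (0, self.extent)
        inner = self.child.to_loopy_expression(indices + (iname,), context)
        return f"sum({iname}, {inner})"


def emit_store(target: str, value, indices: Tuple[str, ...]) -> dict:
    """Caller: creates the context, then reads what the callee put in it."""
    context = ToyExpressionContext()
    expression = value.to_loopy_expression(indices, context)
    return {
        "assignee": f"{target}[{', '.join(indices)}]",
        "expression": expression,
        "depends_on": frozenset(context.depends_on),
        "reduction_bounds": dict(context.reduction_bounds),
    }


insn = emit_store("out", ToySum(ToyStoredArray("a", "write_a"), 10), ("i",))
print(insn)
```

The caller never inspects the node types; everything it needs to emit a correct instruction (dependencies, extra loop bounds) arrives through the context.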
There's a second mapper for generating expressions for IndexLambda and the like. This is InlinedExpressionGenMapper. It is mutually recursive with CodeGenMapper. It also takes a LoopyExpressionContext.
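The mutual recursion can be illustrated with a toy pair of mappers. All names here (ToyPlaceholder, ToyIndexLambda, NodeMapper, ExprMapper) are hypothetical, scalar expressions are nested tuples rather than real expression objects, and the context argument is left out for brevity.

```python
class ToyPlaceholder:
    def __init__(self, name):
        self.name = name


class ToyIndexLambda:
    def __init__(self, expr, bindings):
        # expr: a number, ("ref", name), ("add", a, b) or ("mul", a, b)
        self.expr = expr
        # bindings: name -> array node referenced from expr
        self.bindings = bindings


class NodeMapper:
    """Graph-level mapper: turns a node into an 'index -> expression' function."""

    def __init__(self):
        self.expr_mapper = ExprMapper(self)

    def __call__(self, node):
        if isinstance(node, ToyPlaceholder):
            return lambda idx: f"{node.name}[{idx}]"
        if isinstance(node, ToyIndexLambda):
            # Hand the scalar expression to the expression-level mapper.
            return lambda idx: self.expr_mapper(node.expr, node, idx)
        raise TypeError(f"unsupported node: {type(node).__name__}")


class ExprMapper:
    """Expression-level mapper: walks the scalar expression of one node."""

    def __init__(self, node_mapper):
        self.node_mapper = node_mapper

    def __call__(self, expr, owner, idx):
        if isinstance(expr, (int, float)):
            return str(expr)
        tag = expr[0]
        if tag == "ref":
            # Recurse back into the graph-level mapper for the bound array.
            implemented = self.node_mapper(owner.bindings[expr[1]])
            return implemented(idx)
        if tag in ("add", "mul"):
            op = {"add": "+", "mul": "*"}[tag]
            lhs = self(expr[1], owner, idx)
            rhs = self(expr[2], owner, idx)
            return f"({lhs} {op} {rhs})"
        raise ValueError(f"unknown expression tag: {tag}")


inner = ToyIndexLambda(("mul", 2, ("ref", "x")), {"x": ToyPlaceholder("x")})
outer = ToyIndexLambda(("add", ("ref", "y"), 1), {"y": inner})
print(NodeMapper()(outer)("i"))
```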
What's currently supported:
- generation of Placeholders
- generation of IndexLambdas (as expressions, not arrays yet)
- generation of instructions to copy expressions to outputs
What's not supported yet:
- any sort of preprocessing of the graph
- any sort of respect for tags
- any sort of handling of symbolic shapes
Potentially controversial things:
- How to represent expression context (see LoopyExpressionContext). Also, what sort of context is needed for generating loopy expressions.
- I added a node type for named output arguments (Output).
- Graph transformations (see also #4). The module pytato.transform adds a copy transformation which I needed to support Output. I imagine this transformation will serve as a template for others, so we should decide on how to express these.
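To frame the discussion on how to express transformations, here is a toy version of the "copy as a template" idea. It is not the code in pytato.transform: Leaf, Add, ToyCopyMapper and PrefixNames are hypothetical names, and the node types are deliberately minimal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


class ToyCopyMapper:
    """Rebuilds the graph node by node; subclasses override single methods."""

    def __init__(self):
        self.cache = {}

    def __call__(self, node):
        # Memoize so that shared subgraphs stay shared in the copy.
        if node not in self.cache:
            method = getattr(self, "map_" + type(node).__name__.lower())
            self.cache[node] = method(node)
        return self.cache[node]

    def map_leaf(self, node):
        return Leaf(node.name)

    def map_add(self, node):
        return Add(self(node.left), self(node.right))


class PrefixNames(ToyCopyMapper):
    """Example transformation: only the leaf handling differs from a copy."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix

    def map_leaf(self, node):
        return Leaf(self.prefix + node.name)


shared = Add(Leaf("a"), Leaf("b"))
root = Add(shared, shared)

copied = ToyCopyMapper()(root)
renamed = PrefixNames("in_")(root)
print(copied)
print(renamed)
```

With this shape, a new transformation is a subclass that overrides the methods for the node types it cares about and inherits structural copying for the rest.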
Other notable changes:
- Binary operators in IndexLambda. This requires a policy on shape equality (see #3).
- Made Namespace inherit from Mapping.
- Changed imports to respect PEP8 order (I think). I.e., system imports, third party imports, then local imports.
- Type stubs for pytools. The stub for memoize_method is necessary, otherwise Mypy complains. The other stubs are nice to have. (Type stubs are now in pytools.)
- Implemented hashing and equality for Array. This code is somewhat repetitive, it would be nice if it were not.
- shape and dtype are now attributes stored in Array, to avoid repetitive code.
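On the repetitive hashing and equality code: one possible way to cut the repetition, sketched here with hypothetical classes (ToyArray, ToyPlaceholder) rather than the real Array hierarchy, is to have each class declare the names of the attributes that define its identity and derive both methods from that tuple.

```python
class ToyArray:
    # Names of the attributes that take part in hashing and equality.
    fields = ("shape", "dtype")

    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = dtype

    def _key(self):
        # Include the type so different node kinds never compare equal.
        return (type(self),) + tuple(getattr(self, f) for f in self.fields)

    def __hash__(self):
        return hash(self._key())

    def __eq__(self, other):
        if self is other:
            return True
        return isinstance(other, ToyArray) and self._key() == other._key()


class ToyPlaceholder(ToyArray):
    # Subclasses only extend the field list; no new __hash__/__eq__ needed.
    fields = ToyArray.fields + ("name",)

    def __init__(self, name, shape, dtype):
        super().__init__(shape, dtype)
        self.name = name


a = ToyPlaceholder("x", (10, 4), "float64")
b = ToyPlaceholder("x", (10, 4), "float64")
c = ToyPlaceholder("y", (10, 4), "float64")
print(a == b, a == c, len({a, b, c}))
```

This assumes all identity-defining attributes are themselves hashable (tuples for shapes, for example).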
Closes #7 (closed)