Loopy kernel generation

Matt Wala requested to merge codegen into master

Although there are a lot of TODOs and FIXMEs, I think this is in a state where it should be looked over to make sure we agree on the basic design.

Overall design:

I imagine code generation will eventually be a multi-stage process. Right now, it is a single pass. Code is generated recursively in CodeGenMapper, using a (partially) mutable CodeGenState. This mapper acts on nodes in the computation graph. For each node, it updates the kernel, adding the necessary implementation for the node, and returns an ImplementedResult. An ImplementedResult represents a generated value (array expression). This can be an array, a loopy expression, or a substitution rule. You can convert the generated value to a (scalar) loopy expression via the to_loopy_expression method.
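
To make the shape of this single pass concrete, here is a minimal, self-contained sketch. It does not use pytato or loopy, and every name in it (ToyCodeGenState, ToyResult, ToyCodeGenMapper, the string "instructions") is a hypothetical stand-in for illustration, not the API in this merge request. It assumes results are cached per node so that shared subexpressions are implemented only once.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


# Toy graph nodes (hypothetical stand-ins for computation graph nodes).
@dataclass(frozen=True)
class Placeholder:
    name: str


@dataclass(frozen=True)
class Sum:
    left: "Node"
    right: "Node"


Node = Union[Placeholder, Sum]


@dataclass(frozen=True)
class ToyResult:
    """A generated value; in this sketch, always the name of a stored array."""
    name: str

    def to_expression(self) -> str:
        # Convert the generated value to a scalar expression at index i.
        return f"{self.name}[i]"


@dataclass
class ToyCodeGenState:
    """Mutable state threaded through the traversal."""
    instructions: List[str] = field(default_factory=list)
    results: Dict[Node, ToyResult] = field(default_factory=dict)
    counter: int = 0

    def fresh_name(self) -> str:
        self.counter += 1
        return f"tmp{self.counter}"


class ToyCodeGenMapper:
    def rec(self, node: Node, state: ToyCodeGenState) -> ToyResult:
        # Reuse the result if this node was already implemented.
        if node in state.results:
            return state.results[node]
        method = getattr(self, "map_" + type(node).__name__.lower())
        result = method(node, state)
        state.results[node] = result
        return result

    def map_placeholder(self, node: Placeholder, state: ToyCodeGenState) -> ToyResult:
        # Placeholders are kernel arguments; nothing needs to be emitted.
        return ToyResult(node.name)

    def map_sum(self, node: Sum, state: ToyCodeGenState) -> ToyResult:
        lhs = self.rec(node.left, state).to_expression()
        rhs = self.rec(node.right, state).to_expression()
        name = state.fresh_name()
        # "Update the kernel": append the instruction implementing this node.
        state.instructions.append(f"{name}[i] = {lhs} + {rhs}")
        return ToyResult(name)


x, y = Placeholder("x"), Placeholder("y")
expr = Sum(Sum(x, y), Sum(x, y))
state = ToyCodeGenState()
result = ToyCodeGenMapper().rec(expr, state)
print("\n".join(state.instructions))
```

Because the two Sum(x, y) operands compare equal, the second one hits the cache and only two instructions are emitted.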

One complication is that a "loopy expression" is not a pure expression but involves context (e.g., reduction bounds and dependencies). To handle that, I introduced a LoopyExpressionContext class. The idea is that the caller (who wants to use the expression) calls to_loopy_expression with a context that is populated by the callee, and the caller uses it to figure out how to generate the right code for the loopy expression. I am not sure about this part of the design. I haven't implemented anything that actually makes use of the LoopyExpressionContext yet. I would appreciate suggestions.
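
One possible reading of the caller/callee protocol, sketched in plain Python. All names here (ToyExpressionContext, ToyStoredArray, ToyRowSum, emit_store, the fixed reduction variable "j") are assumptions made for illustration; the real LoopyExpressionContext may carry different information. The caller creates an empty context, the callee fills in dependencies and reduction bounds while building the expression, and the caller then reads the context to decide how to emit the instruction.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class ToyExpressionContext:
    """Side information collected while an expression is built."""
    depends_on: Set[str] = field(default_factory=set)
    reduction_bounds: Dict[str, Tuple[int, int]] = field(default_factory=dict)


class ToyStoredArray:
    """A value that lives in an array written by some earlier instruction."""

    def __init__(self, name: str, writer_id: str):
        self.name = name
        self.writer_id = writer_id

    def to_expression(self, indices: Tuple[str, ...], context: ToyExpressionContext) -> str:
        # Callee records that the consumer must run after the writer.
        context.depends_on.add(self.writer_id)
        return f"{self.name}[{', '.join(indices)}]"


class ToyRowSum:
    """A reduction over the last axis of its operand."""

    def __init__(self, operand: ToyStoredArray, ncols: int):
        self.operand = operand
        self.ncols = ncols

    def to_expression(self, indices: Tuple[str, ...], context: ToyExpressionContext) -> str:
        # Callee records the bounds of the reduction variable it introduces.
        context.reduction_bounds["j"] = (0, self.ncols)
        inner = self.operand.to_expression(indices + ("j",), context)
        return f"sum(j, {inner})"


def emit_store(target: str, value, indices: Tuple[str, ...]):
    """Caller side: build the expression, then consult the populated context."""
    context = ToyExpressionContext()
    rhs = value.to_expression(indices, context)
    deps = ",".join(sorted(context.depends_on))
    insn = f"{target}[{', '.join(indices)}] = {rhs} {{dep={deps}}}"
    return insn, context


a = ToyStoredArray("a", writer_id="write_a")
insn, ctx = emit_store("out", ToyRowSum(a, ncols=4), ("i",))
print(insn)
print(ctx.reduction_bounds)
```

A real implementation would also need unique reduction variable names; the fixed "j" is only good enough for this one-level example.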

There's a second mapper for generating expressions for IndexLambda and the like. This is InlinedExpressionGenMapper. It is mutually recursive with CodeGenMapper. It also takes a LoopyExpressionContext.

What's currently supported:

  • generation of Placeholders
  • generation of IndexLambdas (as expressions, not arrays yet)
  • generation of instructions to copy expressions to outputs

What's not supported yet:

  • any sort of preprocessing of the graph
  • any sort of respect for tags
  • any sort of handling of symbolic shapes

Potentially controversial things 🔪

  • How to represent expression context (see LoopyExpressionContext). Also, what sort of context is needed for generating loopy expressions.
  • I added a node type for named output arguments (Output).
  • Graph transformations (see also #4). The module pytato.transform adds a copy transformation which I needed to support Output. I imagine this transformation will serve as a template for others, so we should decide on how to express these.
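
As a starting point for the discussion on how to express transformations, here is a rough sketch of a copy transformation used as a template. It is written against toy node classes and does not reproduce the code in pytato.transform; ToyCopyMapper and RenamePlaceholders are hypothetical names. The idea illustrated is that the base mapper rebuilds every node unchanged, and a concrete transformation overrides only the node types it cares about.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Placeholder:
    name: str


@dataclass(frozen=True)
class Output:
    name: str
    array: "Node"


Node = Union[Placeholder, Output]


class ToyCopyMapper:
    """Rebuilds the graph node by node, producing an equal but distinct copy."""

    def __init__(self) -> None:
        self.cache: Dict[Node, Node] = {}

    def rec(self, node: Node) -> Node:
        # Copy each distinct node once so that sharing is preserved.
        if node not in self.cache:
            method = getattr(self, "map_" + type(node).__name__.lower())
            self.cache[node] = method(node)
        return self.cache[node]

    def map_placeholder(self, node: Placeholder) -> Node:
        return Placeholder(node.name)

    def map_output(self, node: Output) -> Node:
        return Output(node.name, self.rec(node.array))


class RenamePlaceholders(ToyCopyMapper):
    """Example transformation: override one method, inherit the rest."""

    def __init__(self, renames: Dict[str, str]) -> None:
        super().__init__()
        self.renames = renames

    def map_placeholder(self, node: Placeholder) -> Node:
        return Placeholder(self.renames.get(node.name, node.name))


graph = Output("out", Placeholder("x"))
copied = ToyCopyMapper().rec(graph)
renamed = RenamePlaceholders({"x": "y"}).rec(graph)
print(copied)
print(renamed)
```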

Other notable changes:

  • Binary operators in IndexLambda. This requires a policy on shape equality (see #3).
  • Made Namespace inherit from Mapping.
  • Changed imports to follow the PEP 8 order (I think), i.e., standard library imports, third-party imports, then local imports.
  • Type stubs for pytools. The stub for memoize_method is necessary, otherwise Mypy complains. The other stubs are nice to have. (Type stubs are now in pytools.)
  • Implemented hashing and equality for Array. This code is somewhat repetitive; it would be nice if it were not.
  • shape and dtype are now attributes stored in Array, to avoid repetitive code.
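
On the repetitive hashing and equality code: one way to reduce the repetition, sketched with toy classes, is to have each class declare the names of the attributes that define its identity and implement __eq__ and __hash__ once in the base class. The fields tuple and the ToyArray and ToyPlaceholder classes below are assumptions for illustration, not what this merge request implements.

```python
class ToyArray:
    # Names of the attributes that participate in equality and hashing.
    fields = ("shape", "dtype")

    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = dtype

    def _key(self):
        return tuple(getattr(self, name) for name in self.fields)

    def __eq__(self, other):
        if self is other:
            return True
        # Require the exact same class so different node types never compare equal.
        return type(self) is type(other) and self._key() == other._key()

    def __hash__(self):
        return hash((type(self).__name__, self._key()))


class ToyPlaceholder(ToyArray):
    # Subclasses only extend the field list; __eq__ and __hash__ are inherited.
    fields = ToyArray.fields + ("name",)

    def __init__(self, name, shape, dtype):
        super().__init__(shape, dtype)
        self.name = name


a = ToyPlaceholder("x", (3,), "float64")
b = ToyPlaceholder("x", (3,), "float64")
c = ToyPlaceholder("y", (3,), "float64")
print(a == b, a == c, len({a, b, c}))
```

This relies on every listed attribute being hashable, which holds here because shapes are tuples and dtypes are strings.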

Closes #7 (closed)

Edited by Matt Wala