Skip to content
Snippets Groups Projects
Unverified Commit ec98289c authored by Matt Smith's avatar Matt Smith Committed by GitHub
Browse files

Avoid double caching in mappers that derive from `CachedMapper` (#585)

* refactor deduplicate_data_wrappers to avoid dependence on erroneous super().rec usage in CachedMapAndCopyMapper

Here is a sketch of what happens with super().rec vs Mapper.rec for the previous implementation of
deduplicate_data_wrappers. Suppose we have 2 data wrappers a and b with the same data pointer.

With super().rec:

1) map_fn maps a to itself, then mapper copies a to a'; mapper caches a -> a' (twice, once in
   super().rec and then again in rec),
2) map_fn maps b to a, then mapper maps (via cache in super().rec call) a to a'; mapper caches
   b -> a'.

=> Only a' in output DAG.

With Mapper.rec:

1) map_fn maps a to itself, then mapper copies a to a'; caches a -> a',
2) map_fn maps b to a, then mapper copies a to a''; caches b -> a''.

=> Both a' and a'' in output DAG.

* call Mapper.rec instead of super().rec to avoid double caching

* call Mapper.rec from CachedMapper too just to avoid copy/paste errors

* add assertion to check for double caching

* add comment explaining use of Mapper.rec
parent dec22c10
No related branches found
No related tags found
Loading
Checking pipeline status