Precompute
Add a function with the following specification:
program = precompute(program, expressionPattern, computeAtIndex, storeAtIndex, storageType, storagePermutation)
This function finds all instances of expressionPattern below the loop level specified by computeAtIndex. The expressionPattern should roughly be a pure function of the expressions it depends on. We then hoist the captured expressionPatterns and any intermediates used to compute them. If we go through wrapper arrays, we need to analyze them to deduce the hoisted loop structure. However, we don't plan to do much bounds analysis beyond that. Instead, we will lower the hoisted computation in a manner similar to how we would lower the original loop, and we hope that the pre-existing infrastructure can simplify the result enough to avoid too much over-computing. We might need special storage types to avoid over-computing or over-allocating due to needing to store something for every position in the case of a dense array.
The storage for the hoisted computation is allocated right above storeAtIndex, which must be at or above computeAtIndex. The storageType must have a dimension for every index below storeAtIndex. The storagePermutation, by default the identity, determines how hoisted indices are mapped to the storage. The access pattern should be concordant with the storage; we don't want to introduce format conversions via this mechanism.
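As a rough illustration of the hoisting described above, here is a minimal Python sketch on plain lists. It is not the proposed precompute API: the names expensive, naive, and precomputed are hypothetical, and the mapping of computeAtIndex and storeAtIndex onto this example is only one possible interpretation. The pure expression expensive(a[j]) depends only on j, so it can be computed once into a temporary allocated above the i loop instead of being recomputed for every i.

```python
def expensive(x, calls):
    # Stand-in for a pure expression pattern; `calls` only counts evaluations.
    calls[0] += 1
    return x * x + 1


def naive(a, B):
    # Recomputes expensive(a[j]) for every (i, j) pair.
    calls = [0]
    y = [0] * len(B)
    for i in range(len(B)):
        for j in range(len(a)):
            y[i] += B[i][j] * expensive(a[j], calls)
    return y, calls[0]


def precomputed(a, B):
    # Hypothetical result of hoisting: the storage `tmp` is allocated above
    # the i loop and has one dimension for j, the index the pattern depends on.
    calls = [0]
    tmp = [expensive(a[j], calls) for j in range(len(a))]
    y = [0] * len(B)
    for i in range(len(B)):
        for j in range(len(a)):
            y[i] += B[i][j] * tmp[j]  # read the stored value instead
    return y, calls[0]
```

Both versions return the same y, but the hoisted version evaluates the pattern len(a) times rather than len(B) * len(a) times. With a dense tmp, every j gets a stored value even when B[i][j] is zero everywhere in that column, which is the kind of over-computing and over-allocating a special storage type would be meant to avoid.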
Fuse
(previously #155)
Introduce a transformation that fuses two loops where one is directly nested below the other. There are a few different strategies for how we could do this:
By enabling wrapper arrays for mod and div - this could be independently useful.
By adding reshape, stretch, and repeat wrapper arrays - reshape is independently useful. See #151.
By using a symbolic tensor, F[i0, i1, f] which is True only when f would be the fused index of i0 and i1 - this is difficult due to the current lowering order.
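The div/mod strategy can be sketched in plain Python. This is only an illustration on a rectangular, non-empty list of lists, not the wrapper-array mechanism itself; nested and fused are hypothetical names. The fused index f runs over n0 * n1 values, and the original indices are recovered as i0 = f div n1 and i1 = f mod n1.

```python
def nested(A):
    # Original loop nest: i1 is directly nested below i0.
    out = []
    for i0 in range(len(A)):
        for i1 in range(len(A[0])):
            out.append((i0, i1, A[i0][i1]))
    return out


def fused(A):
    # Single fused loop; assumes A is rectangular and non-empty.
    n0, n1 = len(A), len(A[0])
    out = []
    for f in range(n0 * n1):
        i0, i1 = divmod(f, n1)  # i0 = f // n1, i1 = f % n1
        out.append((i0, i1, A[i0][i1]))
    return out
```

Both functions visit the same (i0, i1) pairs in the same order. The symbolic-tensor strategy would express the same relation as a predicate, with F[i0, i1, f] true exactly when f == i0 * n1 + i1.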
Reorder
Loop reordering is quite feasible, but it's unclear whether we also want to transpose the corresponding inputs to match the new loop order.
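To make the open question concrete, here is a small Python sketch of a matrix-vector product on lists of lists. All function names are hypothetical. Swapping the two loops leaves the result unchanged, but the matrix is then walked column by column; transposing the input restores an access order that follows the storage.

```python
def matvec_ij(A, x):
    # Original order: i outer, j inner. A is read row by row.
    y = [0] * len(A)
    for i in range(len(A)):
        for j in range(len(x)):
            y[i] += A[i][j] * x[j]
    return y


def matvec_ji(A, x):
    # Reordered: j outer, i inner. Same result, but A is read column by column.
    y = [0] * len(A)
    for j in range(len(x)):
        for i in range(len(A)):
            y[i] += A[i][j] * x[j]
    return y


def transpose(A):
    return [list(col) for col in zip(*A)]


def matvec_ji_transposed(At, x):
    # Reordered loops over a transposed input: At is read row by row again.
    y = [0] * len(At[0])
    for j in range(len(x)):
        for i in range(len(At[0])):
            y[i] += At[j][i] * x[j]
    return y
```

Whether the reorder transformation should insert something like the transpose step automatically is exactly the design question raised above; the sketch only shows that both choices compute the same values.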
Split
Split a loop using a chunkmask
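One way to picture a mask-based split, sketched in Python under the assumption that a chunk mask is a boolean M[i, c] that is true exactly when index i falls in chunk c of size b (the helper names here are hypothetical, and 0-based indexing is used for convenience):

```python
def chunk_mask(i, c, b):
    # Assumed definition: True when index i lies in chunk c of size b.
    return c * b <= i < (c + 1) * b


def split_masked(xs, b):
    # Outer loop over chunks, inner loop over all i, filtered by the mask.
    # Assumes b is a positive integer.
    n = len(xs)
    n_chunks = -(-n // b)  # ceiling division
    visited = []
    for c in range(n_chunks):
        for i in range(n):
            if chunk_mask(i, c, b):
                visited.append((c, i, xs[i]))
    return visited


def split_simplified(xs, b):
    # What one would hope lowering simplifies the masked loop into:
    # the inner loop only covers the bounds where the mask is true.
    n = len(xs)
    n_chunks = -(-n // b)
    visited = []
    for c in range(n_chunks):
        for i in range(c * b, min((c + 1) * b, n)):
            visited.append((c, i, xs[i]))
    return visited
```

The two functions visit the same (chunk, index) pairs in the same order; the second avoids iterating over positions where the mask is false, including in the final partial chunk.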