This CL is the 6th and last on the path to simplifying AffineMap composition.
This removes `AffineValueMap::forwardSubstitutions` and replaces it by simple
calls to `fullyComposeAffineMapAndOperands`.
PiperOrigin-RevId: 228962580
clients. Let's re-add it in the future if there is ever a reason to. NFC.
Unrelatedly, add a use of a variable to unbreak the non-assert build.
PiperOrigin-RevId: 228284026
- refactor toAffineFromEq and the code surrounding it; refactor code into
FlatAffineConstraints::getSliceBounds
- add FlatAffineConstraints methods to detect identifiers as mod's and div's of other
identifiers
- add FlatAffineConstraints::getConstantLower/UpperBound
- Address b/122118218 (don't assert on invalid fusion depths cmdline flags -
instead, don't do anything; change cmdline flags
src-loop-depth -> fusion-src-loop-depth
- AffineExpr/Map print method update: don't fail on null instances (since we have
a wrapper around a pointer, it's avoidable); rationale: dump/print methods should
never fail if possible.
- Update memref-dataflow-opt to add an optimization to avoid a unnecessary call to
IsRangeOneToOne when it's trivially going to be true.
- Add additional test cases to exercise the new support
- update a few existing test cases since the maps are now generated uniformly with
all destination loop operands appearing for the backward slice
- Fix projectOut - fix wrong range for getBestElimCandidate.
- Fix for getConstantBoundOnDimSize() - didn't show up in any test cases since
we didn't have any non-hyperrectangular ones.
PiperOrigin-RevId: 228265152
- when SSAValue/MLValue existed, code at several places was forced to create additional
aggregate temporaries of SmallVector<SSAValue/MLValue> to handle the conversion; get
rid of such redundant code
- use filling ctors instead of explicit loops
- for smallvectors, change insert(list.end(), ...) -> append(...
- improve comments at various places
- turn getMemRefAccess into MemRefAccess ctor and drop duplicated
getMemRefAccess. In the next CL, provide getAccess() accessors for load,
store, DMA op's to return a MemRefAccess.
PiperOrigin-RevId: 228243638
- this is CL 1/2 that does a clean up and gets rid of one limitation in an
underlying method - as a result, fusion works for more cases.
- fix bugs/incomplete impl. in toAffineMapFromEq
- fusing across rank changing reshapes for example now just works
For eg. given a rank 1 memref to rank 2 memref reshape (64 -> 8 x 8) like this,
-loop-fusion -memref-dataflow-opt now completely fuses and inlines/store-forward
to get rid of the temporary:
INPUT
// Rank 1 -> Rank 2 reshape
for %i0 = 0 to 64 {
%v = load %A[%i0]
store %v, %B[%i0 floordiv 8, i0 mod 8]
}
for %i1 = 0 to 8
for %i2 = 0 to 8
%w = load %B[%i1, i2]
"foo"(%w) : (f32) -> ()
OUTPUT
$ mlir-opt -loop-fusion -memref-dataflow-opt fuse_reshape.mlir
#map0 = (d0, d1) -> (d0 * 8 + d1)
mlfunc @fuse_reshape(%arg0: memref<64xf32>) {
for %i0 = 0 to 8 {
for %i1 = 0 to 8 {
%0 = affine_apply #map0(%i0, %i1)
%1 = load %arg0[%0] : memref<64xf32>
"foo"(%1) : (f32) -> ()
}
}
}
AFAIK, there is no polyhedral tool / compiler that can perform such fusion -
because it's not really standard loop fusion, but possible through a
generalized slicing-based approach such as ours.
PiperOrigin-RevId: 227918338
- introduce PostDominanceInfo in the right/complete way and use that for post
dominance check in store-load forwarding
- replace all uses of Analysis/Utils::dominates/properlyDominates with
DominanceInfo::dominates/properlyDominates
- drop all redundant copies of dominance methods in Analysis/Utils/
- in pipeline-data-transfer, replace dominates call with a much less expensive
check; similarly, substitute dominates() in checkMemRefAccessDependence with
a simpler check suitable for that context
- fix a bug in properlyDominates
- improve doc for 'for' instruction 'body'
PiperOrigin-RevId: 227320507
- the load/store forwarding relies on memref dependence routines as well as
SSA/dominance to identify the memref store instance uniquely supplying a value
to a memref load, and replaces the result of that load with the value being
stored. The memref is also deleted when possible if only stores remain.
- add methods for post dominance for MLFunction blocks.
- remove duplicated getLoopDepth/getNestingDepth - move getNestingDepth,
getMemRefAccess, getNumCommonSurroundingLoops into Analysis/Utils (were
earlier static)
- add a helper method in FlatAffineConstraints - isRangeOneToOne.
PiperOrigin-RevId: 227252907
requires enhancing DominanceInfo to handle the structure of an ML function,
which is required anyway. Along the way, this also fixes a const correctness
problem with Instruction::getBlock().
This is step 24/n towards merging instructions and statements.
PiperOrigin-RevId: 227228900
consistent and moving the using declarations over. Hopefully this is the last
truly massive patch in this refactoring.
This is step 21/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227178245
The last major renaming is Statement -> Instruction, which is why Statement and
Stmt still appears in various places.
This is step 19/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227163082
StmtResult -> InstResult, StmtOperand -> InstOperand, and remove the old names.
This is step 17/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227121537
FuncBuilder class. Also rename SSAValue.cpp to Value.cpp
This is step 12/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227067644
is the new base of the SSA value hierarchy. This CL also standardizes all the
nomenclature and comments to use 'Value' where appropriate. This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing.
This is step 11/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227064624
making it more similar to the CFG side of things. It is true that in a deeply
nested case that this is not a guaranteed O(1) time operation, and that 'get'
could lead compiler hackers to think this is cheap, but we need to merge these
and we can look into solutions for this in the future if it becomes a problem
in practice.
This is step 9/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226983931
StmtBlock. This is more consistent with IfStmt and also conceptually makes
more sense - a forstmt "isn't" its body, it contains its body.
This is step 1/N towards merging BasicBlock and StmtBlock. This is required
because in the new regime StmtBlock will have a use list (just like BasicBlock
does) of operands, and ForStmt already has a use list for its induction
variable.
This is a mechanical patch, NFC.
PiperOrigin-RevId: 226684158
reuse existing ones.
- drop IterationDomainContext, redundant since FlatAffineConstraints has
MLValue information associated with its dimensions.
- refactor to use existing support
- leads to a reduction in LOC
- as a result of these changes, non-constant loop bounds get naturally
supported for dep analysis.
- update test cases to include a couple with non-constant loop bounds
- rename addBoundsFromForStmt -> addForStmtDomain
- complete TODO for getLoopIVs (handle 'if' statements)
PiperOrigin-RevId: 226082008
- when adding constraints from a 'for' stmt into FlatAffineConstraints,
correctly add bound operands of the 'for' stmt as a dimensional identifier or
a symbolic identifier depending on whether the bound operand is a valid
MLFunction symbol
- update test case to exercise this.
PiperOrigin-RevId: 225988511
As MLIR moves towards dialect-specific types, a generic Type::getBitWidth does
not make sense for all of them. Even with the current type system, the bit
width is not defined (and causes the method in question to abort) for all
TensorFlow types.
This commit restricts the bit width definition to primitive standard types that
have a number of bits appearing verbatim in their type, i.e., integers and
floats. As a side effect, it delegates the decision on the bit width of the
`index` to the backends. Existing backends currently hardcode it to 64 bits.
The Type::getBitWidth method is replaced by Type::getIntOrFloatBitWidth that
only applies to integers and floats. The call sites are updated to use the new
method, where applicable, or rewritten so as not rely on it. Incidentally,
this fixes a utility method that did not account for memrefs being allowed to
have vectors as element types in the size computation.
As an observation, several places in the code use Type in places where a more
specific type could be used instead. Some of those are fixed by this commit.
PiperOrigin-RevId: 225844792
*) Adds simple greedy fusion algorithm to drive experimentation. This algorithm greedily fuses loop nests with single-writer/single-reader memref dependences to improve locality.
*) Adds support for fusing slices of a loop nest computation: fusing one loop nest into another by adjusting the source loop nest's iteration bounds (after it is fused into the destination loop nest). This is accomplished by solving for the source loop nest's IVs in terms of the destination loop nests IVs and symbols using the dependece polyhedron, then creating AffineMaps of these functions for the loop bounds of the fused source loop.
*) Adds utility function 'insertMemRefComputationSlice' which computes and inserts computation slice from loop nest surrounding a source memref access into the loop nest surrounding the destingation memref access.
*) Adds FlatAffineConstraints::toAffineMap function which returns and AffineMap which represents an equality contraint where one dimension identifier is represented as a function of all others in the equality constraint.
*) Adds multiple fusion unit tests.
PiperOrigin-RevId: 225842944
- use addBoundsForForStmt
- getLoopIVs can return a vector of ForStmt * instead of const ForStmt *; the
returned things aren't owned / part of the stmt on which it's being called.
- other minor API cleanup
PiperOrigin-RevId: 225774301
- extend memref-bound-check to store op's
- make the bound check an analysis util and move to lib/Analysis/Utils.cpp (so that
one doesn't need to always create a pass to use it)
PiperOrigin-RevId: 225564830
- add method normalizeConstraintsByGCD
- call normalizeConstraintsByGCD() and GCDTightenInequalities() at the end of
projectOut.
- remove call to GCDTightenInequalities() from getMemRefRegion
- change isEmpty() to check isEmptyByGCDTest() / hasInvalidConstraint() each
time an identifier is eliminated (to detect emptiness early).
- make FourierMotzkinEliminate, gaussianEliminateId(s),
GCDTightenInequalities() private
- improve / update stale comments
PiperOrigin-RevId: 224866741
- generate DMAs correctly now using strided DMAs where needed
- add support for multi-level/nested strides; op still supports one level of
stride for now.
Other things
- add test case for symbolic lower/upper bound; cases where the DMA buffer
size can't be bounded by a known constant
- add test case for dynamic shapes where the DMA buffers are however bounded by
constants
- refactor some of the '-dma-generate' code
PiperOrigin-RevId: 224584529
update/improve/clean up API.
- update FlatAffineConstraints::getConstBoundDifference; return constant
differences between symbolic affine expressions, look at equalities as well.
- fix buffer size computation when generating DMAs symbolic in outer loops,
correctly handle symbols at various places (affine access maps, loop bounds,
loop IVs outer to the depth at which DMA generation is being done)
- bug fixes / complete some TODOs for getMemRefRegion
- refactor common code b/w memref dependence check and getMemRefRegion
- FlatAffineConstraints API update; added methods employ trivial checks /
detection - sufficient to handle hyper-rectangular cases in a precise way
while being fast / low complexity. Hyper-rectangular cases fall out as
trivial cases for these methods while other cases still do not cause failure
(either return conservative or return failure that is handled by the caller).
PiperOrigin-RevId: 224229879
FlatAffineConstraints::composeMap: should return false instead of asserting on
a semi-affine map. Make getMemRefRegion just propagate false when encountering
semi-affine maps (instead of crashing!)
PiperOrigin-RevId: 223828743
and getMemRefRegion() to work with specified loop depths; add support for
outgoing DMAs, store op's.
- add support for getMemRefRegion symbolic in outer loops - hence support for
DMAs symbolic in outer surrounding loops.
- add DMA generation support for outgoing DMAs (store op's to lower memory
space); extend getMemoryRegion to store op's. -memref-bound-check now works
with store op's as well.
- fix dma-generate (references to the old memref in the dma_start op were also
being replaced with the new buffer); we need replace all memref uses to work
only on a subset of the uses - add a new optional argument for
replaceAllMemRefUsesWith. update replaceAllMemRefUsesWith to take an optional
'operation' argument to serve as a filter - if provided, only those uses that
are dominated by the filter are replaced.
- Add missing print for attributes for dma_start, dma_wait op's.
- update the FlatAffineConstraints API
PiperOrigin-RevId: 221889223
- constant bounded memory regions, static shapes, no handling of
overlapping/duplicate regions (through union) for now; also only, load memory
op's.
- add build methods for DmaStartOp, DmaWaitOp.
- move getMemoryRegion() into Analysis/Utils and expose it.
- fix addIndexSet, getMemoryRegion() post switch to exclusive upper bounds;
update test cases for memref-bound-check and memref-dependence-check for
exclusive bounds (missed in a previous CL)
PiperOrigin-RevId: 220729810
multiple TODOs.
- replace the fake test pass (that worked on just the first loop in the
MLFunction) to perform DMA pipelining on all suitable loops.
- nested DMAs work now (DMAs in an outer loop, more DMAs in nested inner loops)
- fix bugs / assumptions: correctly copy memory space and elemental type of source
memref for double buffering.
- correctly identify matching start/finish statements, handle multiple DMAs per
loop.
- introduce dominates/properlyDominates utitilies for MLFunction statements.
- move checkDominancePreservationOnShifts to LoopAnalysis.h; rename it
getShiftValidity
- refactor getContainingStmtPos -> findAncestorStmtInBlock - move into
Analysis/Utils.h; has two users.
- other improvements / cleanup for related API/utilities
- add size argument to dma_wait - for nested DMAs or in general, it makes it
easy to obtain the size to use when lowering the dma_wait since we wouldn't
want to identify the matching dma_start, and more importantly, in general/in the
future, there may not always be a dma_start dominating the dma_wait.
- add debug information in the pass
PiperOrigin-RevId: 217734892