Simplify affine.min ops, enabling various other canonicalizations inside the peeled loop body.
affine.min ops such as:
```
map = affine_map<(d0)[s0, s1] -> (s0, -d0 + s1)>
%r = affine.min #affine.min #map(%iv)[%step, %ub]
```
are rewritten them into (in the case the peeled loop):
```
%r = %step
```
To determine how an affine.min op should be rewritten and to prove its correctness, FlatAffineConstraints is utilized.
Differential Revision: https://reviews.llvm.org/D107222
* Rename ids to values in FlatAffineValueConstraints.
* Overall cleanup of comments in FlatAffineConstraints and FlatAffineValueConstraints.
Differential Revision: https://reviews.llvm.org/D107947
* Extract "value" functionality of `FlatAffineConstraints` into a new derived `FlatAffineValueConstraints` class. Current users of `FlatAffineConstraints` can use `FlatAffineValueConstraints` without additional code changes, thus NFC.
* `FlatAffineConstraints` no longer associates dimensions with SSA Values. All functionality that requires this, is moved to `FlatAffineValueConstraints`.
* `FlatAffineConstraints` no longer makes assumptions about where Values associated with dimensions are coming from.
Differential Revision: https://reviews.llvm.org/D107725
Reimplement this function in terms of `composeMatchingMap`.
Also fix a bug in `composeMatchingMap` where local dims of `this` could be missing in `localCst`.
Differential Revision: https://reviews.llvm.org/D107813
This function overload is similar to the existing `FlatAffineConstraints::addLowerOrUpperBound`. It constrains a dimension based on an affine map. However, in contrast to the other overloading, it does not attempt to align dimensions/symbols of the affine map with the dimensions/symbols of the constraint set. Instead, dimensions/symbols are expected to already be aligned.
Differential Revision: https://reviews.llvm.org/D107727
This function aligns an affine map (and operands) with given dims and syms SSA values.
This is useful in conjunction with `FlatAffineConstraints::addLowerOrUpperBound`, which requires the `boundMap` to be aligned with the constraint set's dims and syms.
Differential Revision: https://reviews.llvm.org/D107728
There are cases in which it is not desirable to fully compose the bound map with the operands when adding lower/upper bounds to a `FlatAffineConstraints`.
E.g., this is the case when bounds should be expressed in terms of the operands only (and not the operands' dependencies). This also makes `addLowerOrUpperBound` useable together with operands that are defined through semi-affine expressions.
Differential Revision: https://reviews.llvm.org/D107221
Bounds such as `dim_{pos} <= c_1 * dim_x + ...` where `x == pos` are invalid. `addLowerOrUpperBound` previously added an incorrect inequality to the set. Such cases are now explicitly rejected.
Differential Revision: https://reviews.llvm.org/D107220
This patch fixes a bug in the existing implementation of detectAsFloorDiv,
where floordivs with numerator with non-zero constant term and floordivs with
numerator only consisting of a constant term were not being detected.
Reviewed By: vinayaka-polymage
Differential Revision: https://reviews.llvm.org/D107214
This CL adds a new RegionBranchTerminatorOpInterface to query information about operands that can be
passed to successor regions. Similar to the BranchOpInterface, it allows to freely define the
involved operands. However, in contrast to the BranchOpInterface, it expects an additional region
number to distinguish between various use cases which might require different operands passed to
different regions.
Moreover, we added new utility functions (namely getMutableRegionBranchSuccessorOperands and
getRegionBranchSuccessorOperands) to query (mutable) operand ranges for operations equiped with the
ReturnLike trait and/or implementing the newly added interface. This simplifies reasoning about
terminators in the scope of the nested regions.
We also adjusted the SCF.ConditionOp to benefit from the newly added capabilities.
Differential Revision: https://reviews.llvm.org/D105018
- Rename isLastUse to isDeadAfter to reflect what the function does.
- Avoid a second walk over all operations in BlockInfoBuilder constructor.
- use std::move() to save the new in set.
Differential Revision: https://reviews.llvm.org/D106702
Changes include the following:
1. Single iteration reduction loops being sibling fused at innermost insertion level
are skipped from being considered as sequential loops.
Otherwise, the slice bounds of these loops is reset.
2. Promote loops that are skipped in previous step into outer loops.
3. Two utility function - buildSliceTripCountMap, getSliceIterationCount - are moved from
mlir/lib/Transforms/Utils/LoopFusionUtils.cpp to mlir/lib/Analysis/Utils.cpp
Reviewed By: bondhugula, vinayaka-polymage
Differential Revision: https://reviews.llvm.org/D104249
Fix FlatAffineConstraints::getConstantBoundOnDimSize to ensure that
returned bounds on dim size are always non-negative regardless of the
constraints on that dimension. Add an assertion at the user.
Differential Revision: https://reviews.llvm.org/D105171
Affine scalar replacement (and other affine passes, though not fixed here) don't properly handle operations with nested regions. This patch fixes the pass and two affine utilities to function properly given a non-affine internal region
This patch prevents the pass from throwing an internal compiler error when running on the added test case.
Differential Revision: https://reviews.llvm.org/D105058
This results in significant deduplication of code. This patch is not expected to change any functionality, it's just some simplification in preparation for future work. Also slightly simplified some code that was being touched anyway and added some unit tests for some functions that were touched.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D105152
During slice computation of affine loop fusion, detect one id as the mod
of another id w.r.t a constant in a more generic way. Restrictions on
co-efficients of the ids is removed. Also, information from the
previously calculated ids is used for simplification of affine
expressions, e.g.,
If `id1` = `id2`,
`id_n - divisor * id_q - id_r + id1 - id2 = 0`, is simplified to:
`id_n - divisor * id_q - id_r = 0`.
If `c` is a non-zero integer,
`c*id_n - c*divisor * id_q - c*id_r = 0`, is simplified to:
`id_n - divisor * id_q - id_r = 0`.
Reviewed By: bondhugula, ayzhuang
Differential Revision: https://reviews.llvm.org/D104614
Now that memref supports arbitrary element types, add support for memref of
memref and make sure it is properly converted to the LLVM dialect. The type
support itself avoids adding the interface to the memref type itself similarly
to other built-in types. This allows the shape, and therefore byte size, of the
memref descriptor to remain a lowering aspect that is easier to customize and
evolve as opposed to sanctifying it in the data layout specification for the
memref type itself.
Factor out the code previously in a testing pass to live in a dedicated data
layout analysis and use that analysis in the conversion to compute the
allocation size for memref of memref. Other conversions will be ported
separately.
Depends On D103827
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D103828
This allows for checking if a given operation may modify/reference/or both a given value. Right now this API is limited to Value based memory locations, but we should expand this to include attribute based values at some point. This is left for future work because the rest of the AliasAnalysis API also has this restriction.
Differential Revision: https://reviews.llvm.org/D101673
This it to make more clear the difference between this and
an AliasAnalysis.
For example, given a sequence of subviews that create values
A -> B -> C -> d:
BufferViewFlowAnalysis::resolve(B) => {B, C, D}
AliasAnalysis::resolve(B) => {A, B, C, D}
Differential Revision: https://reviews.llvm.org/D100838
This patch adds support for vectorizing loops with 'iter_args'
implementing known reductions along the vector dimension. Comparing to
the non-vector-dimension case, two additional things are done during
vectorization of such loops:
- The resulting vector returned from the loop is reduced to a scalar
using `vector.reduce`.
- In some cases a mask is applied to the vector yielded at the end of
the loop to prevent garbage values from being written to the
accumulator.
Vectorization of reduction loops is disabled by default. To enable it, a
map from loops to array of reduction descriptors should be explicitly passed to
`vectorizeAffineLoops`, or `vectorize-reductions=true` should be passed
to the SuperVectorize pass.
Current limitations:
- Loops with a non-unit step size are not supported.
- n-D vectorization with n > 1 is not supported.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D100694
We weren't properly visiting region successors when the terminator wasn't return like, which could create incorrect results in the analysis. This revision ensures that we properly visit region successors, to avoid optimistically assuming a value is constant when it isn't.
Differential Revision: https://reviews.llvm.org/D101783
Introduce a basic support for parallelizing affine loops with reductions
expressed using iteration arguments. Affine parallelism detector now has a flag
to assume such reductions are parallel. The transformation handles a subset of
parallel reductions that are can be expressed using affine.parallel:
integer/float addition and multiplication. This requires to detect the
reduction operation since affine.parallel only supports a fixed set of
reduction operators.
Reviewed By: chelini, kumasento, bondhugula
Differential Revision: https://reviews.llvm.org/D101171
Explicitly check for uninitialized to prevent crashes in edge cases where the derived analysis creates a lattice element for a value that hasn't been visited yet.
This revision takes the forward value propagation engine in SCCP and refactors it into a more generalized forward dataflow analysis framework. This framework allows for propagating information about values across the various control flow constructs in MLIR, and removes the need for users to reinvent the traversal (often not as completely). There are a few aspects of the traversal, that were conservative for SCCP, that should be relaxed to support the needs of different value analyses. To keep this revision simple, these conservative behaviors will be left in (Note that this won't produce an incorrect result, but may produce more conservative results than necessary in certain edge cases. e.g. region entry arguments for non-region branch interface operations). The framework also only focuses on computing lattices for values, given the SCCP origins, but this is something to relax as needed in the future.
Given that this logic is already in SCCP, a majority of this commit is NFC. The more interesting parts are the interface glue that clients interact with.
Differential Revision: https://reviews.llvm.org/D100915
Symbols are now supported in the integer emptiness check. Remove some outdated assertions checking that there are no symbols.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D100327
Fixes a bug in affine fusion pipeline where an incorrect slice is computed.
After the slice computation is done, original domain of the the source is
compared with the new domain that will result if the fusion succeeds. If the
new domain must be a subset of the original domain for the slice to be
valid. If the slice computed is incorrect, fusion based on such a slice is
avoided.
Relevant test cases are added/edited.
Fixes https://bugs.llvm.org/show_bug.cgi?id=49203
Differential Revision: https://reviews.llvm.org/D98239
NestedPattern uses a BumpPtrAllocator to store child (nested) pattern
objects to decrease the overhead of dynamic allocation. This assumes all
allocations happen inside the allocator that will be freed as a whole.
However, NestedPattern contains `std::function` as a member, which
allocates internally using `new`, unaware of the BumpPtrAllocator. Since
NestedPattern only holds pointers to the nested patterns allocated in
the BumpPtrAllocator, it never calls their destructors, so the
destructor of the `std::function`s they contain are never called either,
leaking the allocated memory.
Make NestedPattern explicitly call destructors of nested patterns. This
additionally requires to actually copy the nested patterns in
copy-construction and copy-assignment instead of just sharing the
pointer to the arena-allocated list of children to avoid double-free. An
alternative solution would be to add reference counting to the list of
arena-allocated list of children.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98485
This patch extends the Region, Block and Operation visitors to also support pre-order walks.
We introduce a new template argument that dictates the walk order (only pre-order and
post-order are supported for now). The default order for Regions, Blocks and Operations is
post-order. Mixed orders (e.g., Region/Block pre-order + Operation post-order) could easily
be implemented, as shown in NumberOfExecutions.cpp.
Reviewed By: rriddle, frgossen, bondhugula
Differential Revision: https://reviews.llvm.org/D97217
Fix 'isLoopParallel' utility so that 'iter_args' is taken into account
and loops with loop-carried dependences are not classified as parallel.
Reviewed By: tungld, vinayaka-polymage
Differential Revision: https://reviews.llvm.org/D97347
SliceAnalysis originally was developed in the context of affine.for within mlfunc.
It predates the notion of region.
This revision updates it to not hardcode specific ops like scf::ForOp.
When rooted at an op, the behavior of the slice computation changes as it recurses into the regions of the op. This does not support gathering all values transitively depending on a loop induction variable anymore.
Additional variants rooted at a Value are added to also support the existing behavior.
Differential revision: https://reviews.llvm.org/D96702
This revision adds a new `AliasAnalysis` class that represents the main alias analysis interface in MLIR. The purpose of this class is not to hold the aliasing logic itself, but to provide an interface into various different alias analysis implementations. As it evolves this should allow for users to plug in specialized alias analysis implementations for their own needs, and have them immediately usable by other analyses and transformations.
This revision also adds an initial simple generic alias, LocalAliasAnalysis, that provides support for performing stateless local alias queries between values. This class is similar in scope to LLVM's BasicAA.
Differential Revision: https://reviews.llvm.org/D92343
This makes ignoring a result explicit by the user, and helps to prevent accidental errors with dropped results. Marking LogicalResult as no discard was always the intention from the beginning, but got lost along the way.
Differential Revision: https://reviews.llvm.org/D95841
This is the last revision to migrate using SimplePadOp to PadTensorOp, and the
SimplePadOp is removed in the patch. Update a bit in SliceAnalysis because the
PadTensorOp takes a region different from SimplePadOp. This is not covered by
LinalgOp because it is not a structured op.
Also, remove a duplicated comment from cpp file, which is already described in a
header file. And update the pseudo-mlir in the comment.
This is as same as D95615 but fixing one dep in CMakeLists.txt
Different from D95671, the fix was applied to run target.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D95785
This reverts commit d9b953d84b.
This commit resulted in build bot failures and the author is away from a
computer, so I am reverting on their behalf until they have a chance to
look into this.
This is the last revision to migrate using SimplePadOp to PadTensorOp, and the
SimplePadOp is removed in the patch. Update a bit in SliceAnalysis because the
PadTensorOp takes a region different from SimplePadOp. This is not covered by
LinalgOp because it is not a structured op.
Also, remove a duplicated comment from cpp file, which is already described in a
header file. And update the pseudo-mlir in the comment.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D95671