This changes the behavior of constructing MLIRContext to no longer load globally registered dialects on construction. Instead Dialects are only loaded explicitly on demand:
- the Parser is lazily loading Dialects in the context as it encounters them during parsing. This is the only purpose for registering dialects and not load them in the context.
- Passes are expected to declare the dialects they will create entity from (Operations, Attributes, or Types), and the PassManager is loading Dialects into the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only need to load the dialect for the IR it will emit, and the optimizer is self-contained and load the required Dialects. For example in the Toy tutorial, the compiler only needs to load the Toy dialect in the Context, all the others (linalg, affine, std, LLVM, ...) are automatically loaded depending on the optimization pipeline enabled.
Differential Revision: https://reviews.llvm.org/D85622
This changes the behavior of constructing MLIRContext to no longer load globally registered dialects on construction. Instead Dialects are only loaded explicitly on demand:
- the Parser is lazily loading Dialects in the context as it encounters them during parsing. This is the only purpose for registering dialects and not load them in the context.
- Passes are expected to declare the dialects they will create entity from (Operations, Attributes, or Types), and the PassManager is loading Dialects into the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only need to load the dialect for the IR it will emit, and the optimizer is self-contained and load the required Dialects. For example in the Toy tutorial, the compiler only needs to load the Toy dialect in the Context, all the others (linalg, affine, std, LLVM, ...) are automatically loaded depending on the optimization pipeline enabled.
Linalg to processors.
This changes adds infrastructure to distribute the loops generated in
Linalg to processors at the time of generation. This addresses use
case where the instantiation of loop is done just to distribute
them. The option to distribute is added to TilingOptions for now and
will allow specifying the distribution as a transformation option,
just like tiling and promotion are specified as options.
Differential Revision: https://reviews.llvm.org/D85147
This revision adds a folding pattern to replace affine.min ops by the actual min value, when it can be determined statically from the strides and bounds of enclosing scf loop .
This matches the type of expressions that Linalg produces during tiling and simplifies boundary checks. For now Linalg depends both on Affine and SCF but they do not depend on each other, so the pattern is added there.
In the future this will move to a more appropriate place when it is determined.
The canonicalization of AffineMinOp operations in the context of enclosing scf.for and scf.parallel proceeds by:
1. building an affine map where uses of the induction variable of a loop
are replaced by `%lb + %step * floordiv(%iv - %lb, %step)` expressions.
2. checking if any of the results of this affine map divides all the other
results (in which case it is also guaranteed to be the min).
3. replacing the AffineMinOp by the result of (2).
The algorithm is functional in simple parametric tiling cases by using semi-affine maps. However simplifications of such semi-affine maps are not yet available and the canonicalization does not succeed yet.
Differential Revision: https://reviews.llvm.org/D82009
Replaced definition of named ND ConvOps with tensor comprehension
syntax which reduces boilerplate code significantly. Furthermore,
new ops to support TF convolutions added (without strides and dilations).
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D84628
This commit is part of a greater project which aims to add
full end-to-end support for convolutions inside mlir. The
reason behind having conv ops for each rank rather than
having one generic ConvOp is to enable better optimizations
for every N-D case which reflects memory layout of input/kernel
buffers better and simplifies code as well. We expect plain linalg.conv
to be progressively retired.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D83879
- Add getArgumentTypes() to Region (missed from before)
- Adopt Region argument API in `hasMultiplyAddBody`
- Fix 2 typos in comments
Differential Revision: https://reviews.llvm.org/D84807
- replace DotOp, now that DRR rules have been dropped.
- Capture arguments mismatch in the parser. The number of parsed arguments must
equal the number of expected arguments.
Reviewed By: ftynse, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D82952
linalg.indexed_generic (consumer) with tensor arguments.
The implementation of fusing std.constant producer with a
linalg.indexed_generic consumer was already in place. It is exposed
with this change. Also cleaning up some of the patterns that implement
the fusion to not be templated, thereby avoiding lot of conditional
checks for calling the right instantiation.
Differential Revision: https://reviews.llvm.org/D84566
The `makeTiledViews` did not use the sizes of the tiled views based on
the result of the loop bound inference computation. This manifested as
an error in computing tile sizes with convolution where not all the
result expression of concatenated affine maps are simple
AffineDimExpr.
Differential Revision: https://reviews.llvm.org/D84366
Right now there is a branching for 2 functions based on whether target map has
symbols or not. In this commit these functions are merged into one.
Furthermore, emitting does not require inverse and map applying as it computes
the correct Range in a single step and thus reduces unnecessary overhead.
Differential Revision: https://reviews.llvm.org/D83756
Loop bound inference is right now very limited as it supports only permutation maps and thus
it is impossible to implement convolution with linalg.generic as it requires more advanced
loop bound inference. This commits solves it for the convolution case.
Depends On D83158
Differential Revision: https://reviews.llvm.org/D83191
The underlying infrastructure supports this already, just add the
pattern matching for linalg.generic.
Differential Revision: https://reviews.llvm.org/D84335
This commit adds functionality needed for implementation of convolutions with
linalg.generic op. Since linalg.generic right now expects indexing maps to be
just permutations, offset indexing needed in convolutions is not possible.
Therefore in this commit we address the issue by adding support for symbols inside
indexing maps which enables more advanced indexing. The upcoming commit will
solve the problem of computing loop bounds from such maps.
Differential Revision: https://reviews.llvm.org/D83158
Summary:
linalg.copy + linalg.fill can be used to create a padded local buffer.
The `masked` attribute is only valid on this padded buffer.
When forwarding to vector.transfer ops, the attribute must be reset
conservatively.
Differential Revision: https://reviews.llvm.org/D83782
Improve the logic deciding if it is safe to hoist vector transfer read/write
out of the loop. Change the logic to prevent hoisting operations if there are
any unknown access to the memref in the loop no matter where the operation is.
For other transfer read/write in the loop check if we can prove that they
access disjoint memory and ignore them in this case.
Differential Revision: https://reviews.llvm.org/D83538
This revision adds support for vectorizing named and generic contraction ops to vector.contract. Cases in which the memref is 0-D are special cased to emit std.load/std.store instead of vector.transfer. Relevant tests are added.
Differential revision: https://reviews.llvm.org/D83307
While lowering min/max pooling ops to loops, generate the right kind of
load/stores (std or affine) instead of always generating std
load/stores.
Differential Revision: https://reviews.llvm.org/D83080
Current Affine comparison builders, which use operator overload, default to signed comparison. This creates the possibility of misuse of these builders and potential correctness issues when dealing with unsigned integers. This change makes the distinction between signed and unsigned comparison builders and forces the caller to make a choice between the two.
Differential Revision: https://reviews.llvm.org/D82323
This revision removes the TypeConverter parameter passed to the apply* methods, and instead moves the responsibility of region type conversion to patterns. The types of a region can be converted using the 'convertRegionTypes' method, which acts similarly to the existing 'applySignatureConversion'. This method ensures that all blocks within, and including those moved into, a region will have the block argument types converted using the provided converter.
This has the benefit of making more of the legalization logic controlled by patterns, instead of being handled explicitly by the driver. It also opens up the possibility to support multiple type conversions at some point in the future.
This revision also adds a new utility class `FailureOr<T>` that provides a LogicalResult friendly facility for returning a failure or a valid result value.
Differential Revision: https://reviews.llvm.org/D81681
This class enables for abstracting more of the details for the rewrite process, and will allow for clients to apply specific cost models to the pattern list. This allows for DialectConversion and the GreedyPatternRewriter to share the same underlying matcher implementation. This also simplifies the plumbing necessary to support dynamic patterns.
Differential Revision: https://reviews.llvm.org/D81985
Recent work has introduced support for constructing loops via `::build` with
callbacks that construct loop bodies using only the core OpBuilder. This is now
supported on all loop types that Linalg lowers to. Refactor LoopNestBuilder in
Linalg to rely on this functionality instead of using a custom EDSC-based
approach to creating loop nests.
The specialization targeting parallel loops is also simplified by factoring out
the recursive call into a separate static function and considering only two
alternatives: top-level loop is parallel or sequential.
This removes the last remaining in-tree use of edsc::LoopBuilder, which is now
deprecated and will be removed soon.
Differential Revision: https://reviews.llvm.org/D81873
Summary:
This revision replaces MatmulOp, now that DRR rules have been dropped.
This revision also fixes minor parsing bugs and a plugs a few holes to get e2e paths working (e.g. library call emission).
During the replacement the i32 version had to be dropped because only the EDSC operators +, *, etc support type inference.
Deciding on a type-polymorphic behavior, and implementing it, is left for future work.
Reviewers: aartbik
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, frgossen, Kayjukh, jurahul, msifontes
Tags: #mlir
Differential Revision: https://reviews.llvm.org/D81935
It is quite common for the same type to be converted many types throughout the conversion process, and there isn't any good reason why we aren't caching that result. Especially given that we currently use identity conversion to signify legality. This revision also adds a few additional helpers to TypeConverter.
Differential Revision: https://reviews.llvm.org/D81679
This revision replaces MatmulOp, now that DRR rules have been dropped.
This revision also fixes minor parsing bugs and a plugs a few holes to get e2e paths working (e.g. library call emission).
During the replacement the i32 version had to be dropped because only the EDSC operators +, *, etc support type inference.
Deciding on a type-polymorphic behavior, and implementing it, is left for future work.
Differential Revision: https://reviews.llvm.org/D79762
Summary:
* extra ';' in the following files:
mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
mlir/lib/Dialect/Shape/IR/Shape.cpp
* base class ‘mlir::ConvertVectorToSCFBase<ConvertVectorToSCFPass>’
should be explicitly initialized in the copy constructor [-Wextra] in
mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
* warning: ‘bool Expression::operator==(const Expression&) const’
defined but not used [-Wunused-function] in
mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp
Differential Revision: https://reviews.llvm.org/D81673
This parameter gives the developers the freedom to choose their desired function
signature conversion for preparing their functions for buffer placement. It is
introduced for BufferAssignmentFuncOpConverter, and also for
BufferAssignmentReturnOpConverter, and BufferAssignmentCallOpConverter to adapt
the return and call operations with the selected function signature conversion.
If the parameter is set, buffer placement won't also deallocate the returned
buffers.
Differential Revision: https://reviews.llvm.org/D81137
This revision adds a helper function to hoist vector.transfer_read /
vector.transfer_write pairs out of immediately enclosing scf::ForOp
iteratively, if the following conditions are true:
1. The 2 ops access the same memref with the same indices.
2. All operands are invariant under the enclosing scf::ForOp.
3. No uses of the memref either dominate the transfer_read or are
dominated by the transfer_write (i.e. no aliasing between the write and
the read across the loop)
To improve hoisting opportunities, call the `moveLoopInvariantCode` helper
function on the candidate loop above which to hoist. Hoisting the transfers
results in scf::ForOp yielding the value that originally transited through
memory.
This revision additionally exposes `moveLoopInvariantCode` as a helper in
LoopUtils.h and updates SliceAnalysis to support return scf::For values and
allow hoisting across multiple scf::ForOps.
Differential Revision: https://reviews.llvm.org/D81199
Update linalg to affine lowering for convop to use affine load for input
whenever there is no padding. It had always been using std.loads because
max in index functions (needed for non-zero padding if not materializing
zeros) couldn't be represented in the non-zero padding cases.
In the future, the non-zero padding case could also be made to use
affine - either by materializing or using affine.execute_region. The
latter approach will not impact the scf/std output obtained after
lowering out affine.
Differential Revision: https://reviews.llvm.org/D81191
This revision adds a helper function to hoist alloc/dealloc pairs and
alloca op out of immediately enclosing scf::ForOp if both conditions are true:
1. all operands are defined outside the loop.
2. all uses are ViewLikeOp or DeallocOp.
This is now considered Linalg-specific and will be generalized on a per-need basis.
Differential Revision: https://reviews.llvm.org/D81152
Summary:
The fusion for tensor_reshape is embedding the information to indexing maps,
thus the exising pattenr also works for indexed_generic ops.
Depends On D80347
Differential Revision: https://reviews.llvm.org/D80348
Summary:
Different from the fusion between generic ops, indices are involved. In this
context, we need to re-map the indices for producer since the fused op is built
on consumer's perspective. This patch supports all combination of the fusion
between indexed_generic ops and generic ops, which includes tests case:
1) generic op as producer and indexed_generic op as consumer.
2) indexed_generic op as producer and generic op as consumer.
3) indexed_generic op as producer and indexed_generic op as consumer.
Differential Revision: https://reviews.llvm.org/D80347
This revision replaces the load + vector.type_cast by appropriate vector transfer
operations. These play more nicely with other vector abstractions and canonicalization
patterns and lower to load/store with or without masks when appropriate.
Differential Revision: https://reviews.llvm.org/D80809
This revision adds custom rewrites for patterns that arise during linalg structured
ops vectorization. These patterns allow the composition of linalg promotion,
vectorization and removal of redundant copies.
The patterns are voluntarily limited and restrictive atm.
More robust behavior will be implemented once more powerful side effect modeling and analyses are available on view/subview.
On the transfer_read side, the following pattern is rewritten:
```
%alloc = ...
[optional] %view = std.view %alloc ...
%subView = subview %allocOrView ...
[optional] linalg.fill(%allocOrView, %cst) ...
...
linalg.copy(%in, %subView) ...
vector.transfer_read %allocOrView[...], %cst ...
```
into
```
[unchanged] %alloc = ...
[unchanged] [optional] %view = std.view %alloc ...
[unchanged] [unchanged] %subView = subview %allocOrView ...
...
vector.transfer_read %in[...], %cst ...
```
On the transfer_write side, the following pattern is rewriten:
```
%alloc = ...
[optional] %view = std.view %alloc ...
%subView = subview %allocOrView...
...
vector.transfer_write %..., %allocOrView[...]
linalg.copy(%subView, %out)
```
Differential Revision: https://reviews.llvm.org/D80728
Buffer placement can now operates on functions that return buffers. These
buffers escape from the deallocation phase of buffer placement.
Differential Revision: https://reviews.llvm.org/D80696
operands of Generic ops.
Unit-extent dimensions are typically used for achieving broadcasting
behavior. The pattern added (along with canonicalization patterns
added previously) removes the use of unit-extent dimensions, and
instead uses a more canonical representation of the computation. This
new pattern is not added as a canonicalization for now since it
entails adding additional reshape operations. A pass is added to
exercise these patterns, along with an API entry to populate a
patterns list with these patterns.
Differential Revision: https://reviews.llvm.org/D79766
alloc/dealloc/copies.
Add options to LinalgPromotion to use callbacks for implementating the
allocation, deallocation of buffers used for the promoted subviews,
and to copy data into and from the original subviews to the allocated
buffers.
Also some misc. cleanup of the code.
Differential Revision: https://reviews.llvm.org/D80365
Modifying the loop nest builder for generating scf.parallel loops to
not generate scf.parallel loops for non-parallel iterator types in
Linalg operations. The existing implementation incorrectly generated
scf.parallel for all tiled loops. It is rectified by refactoring logic
used while lowering to loops that accounted for this.
Differential Revision: https://reviews.llvm.org/D80188
Summary:
This revision refactors the Linalg tiling pass to be written as pattern applications and retires the use of the folder in Linalg tiling.
In the early days, tiling was written as a pass that would create (partially) folded and canonicalized operations on the fly for better composability.
As this evolves towards composition of patterns, the pass-specific folder is counter-productive and is retired.
The tiling options struct evolves to take a tile size creation function which allows materializing tile sizes on the fly (in particular constant tile sizes). This plays better with folding and DCE.
With the folder going away in Tiling, the check on whether subviews are the same in linalg fusion needs to be more robust. This revision also implements such a check.
In the current form, there are still some canonicalizations missing due to AffineMin/Max ops fed by scf::ForOp. These will be improved at a later time.
Differential Revision: https://reviews.llvm.org/D80267
Making these two converters more generic. FunctionAndBlockSignatureConverter now
moves only memref results (after type conversion) to the function argument and
keeps other legal function results unchanged. NonVoidToVoidReturnOpConverter is
renamed to NoBufferOperandsReturnOpConverter. It removes only the buffer
operands from the operands of the converted ReturnOp and inserts CopyOps to copy
each buffer to the target function argument.
Differential Revision: https://reviews.llvm.org/D79329
For now the promoted buffer is indexed using the `full view`. The full view might be
slightly bigger than the partial view (which is accounting for boundaries).
Unfortunately this does not compose easily with other transformations when multiple buffers
with shapes related to each other are involved.
Take `linalg.matmul A B C` (with A of size MxK, B of size KxN and C of size MxN) and suppose we are:
- Tiling over M by 100
- Promoting A only
This is producing a `linalg.matmul promoted_A B subview_C` where `promoted_A` is a promoted buffer
of `A` of size (100xK) and `subview_C` is a subview of size mxK where m could be smaller than 100 due
to boundaries thus leading to a possible incorrect behavior.
We propose to:
- Add a new parameter to the tiling promotion allowing to enable the use of the full tile buffer.
- By default all promoted buffers will be indexed by the partial view.
Note that this could be considered as a breaking change in comparison to the way the tiling promotion
was working.
Differential Revision: https://reviews.llvm.org/D79927
All ops of the SCF dialect now use the `scf.` prefix instead of `loop.`. This
is a part of dialect renaming.
Differential Revision: https://reviews.llvm.org/D79844
The existing implementation of SubViewOp::getRanges relies on all
offsets/sizes/strides to be dynamic values and does not work in
combination with canonicalization. This revision adds a
SubViewOp::getOrCreateRanges to create the missing constants in the
canonicalized case.
This allows reactivating the fused pass with staged pattern
applications.
However another issue surfaces that the SubViewOp verifier is now too
strict to allow folding. The existing folding pattern is turned into a
canonicalization pattern which rewrites memref_cast + subview into
subview + memref_cast.
The transform-patterns-matmul-to-vector can then be reactivated.
Differential Revision: https://reviews.llvm.org/D79759
Summary:
This revision introduces a helper function to allow applying rewrite patterns, interleaved with more global transformations, in a staged fashion:
1. the first stage consists of an OwningRewritePatternList. The RewritePattern in this list are applied once and in order.
2. the second stage consists of a single OwningRewritePattern that is applied greedily until convergence.
3. the third stage consists of applying a lambda, generally used for non-local transformation effects.
This allows creating custom fused transformations where patterns can be ordered and applied at a finer granularity than a sequence of traditional compiler passes.
A test that exercises these behaviors is added.
Differential Revision: https://reviews.llvm.org/D79518
Summary:
This makes a common pattern of
`dyn_cast_or_null<OpTy>(v.getDefiningOp())` more concise.
Differential Revision: https://reviews.llvm.org/D79681
This [discussion](https://llvm.discourse.group/t/viewop-isnt-expressive-enough/991/2) raised some concerns with ViewOp.
In particular, the handling of offsets is incorrect and does not match the op description.
Note that with an elemental type change, offsets cannot be part of the type in general because sizeof(srcType) != sizeof(dstType).
Howerver, offset is a poorly chosen term for this purpose and is renamed to byte_shift.
Additionally, for all intended purposes, trying to support non-identity layouts for this op does not bring expressive power but rather increases code complexity.
This revision simplifies the existing semantics and implementation.
This simplification effort is voluntarily restrictive and acts as a stepping stone towards supporting richer semantics: treat the non-common cases as YAGNI for now and reevaluate based on concrete use cases once a round of simplification occurred.
Differential revision: https://reviews.llvm.org/D79541
Summary: This revision introduces LinalgPromotionOptions to more easily control the application of promotion patterns. It also simplifies the different entry points into Promotion in preparation for some behavior change in subsequent revisions.
Differential Revision: https://reviews.llvm.org/D79489
This dialect contains various structured control flow operaitons, not only
loops, reflect this in the name. Drop the Ops suffix for consistency with other
dialects.
Note that this only moves the files and changes the C++ namespace from 'loop'
to 'scf'. The visible IR prefix remains the same and will be updated
separately. The conversions will also be updated separately.
Differential Revision: https://reviews.llvm.org/D79578
Portions of MLIR which depend on LLVMIR generally need to depend on
intrinsics_gen, to ensure that tablegen'd header files from LLVM are built
first. Without this, we get errors, typically about llvm/IR/Attributes.inc
not being found.
Note that previously the Linalg Dialect depended on intrinsics_gen, but it
doesn't need to, since it doesn't use LLVMIR.
Differential Revision: https://reviews.llvm.org/D79389
- Exports MLIR targets to be used out-of-tree.
- mimicks `add_clang_library` and `add_flang_library`.
- Fixes libMLIR.so
After https://reviews.llvm.org/D77515 libMLIR.so was no longer containing
any object files. We originally had a cludge there that made it work with
the static initalizers and when switchting away from that to the way the
clang shlib does it, I noticed that MLIR doesn't create a `obj.{name}` target,
and doesn't export it's targets to `lib/cmake/mlir`.
This is due to MLIR using `add_llvm_library` under the hood, which adds
the target to `llvmexports`.
Differential Revision: https://reviews.llvm.org/D78773
[MLIR] Fix libMLIR.so and LLVM_LINK_LLVM_DYLIB
Primarily, this patch moves all mlir references to LLVM libraries into
either LLVM_LINK_COMPONENTS or LINK_COMPONENTS. This enables magic in
the llvm cmake files to automatically replace reference to LLVM components
with references to libLLVM.so when necessary. Among other things, this
completes fixing libMLIR.so, which has been broken for some configurations
since D77515.
Unlike previously, the pattern is now that mlir libraries should almost
always use add_mlir_library. Previously, some libraries still used
add_llvm_library. However, this confuses the export of targets for use
out of tree because libraries specified with add_llvm_library are exported
by LLVM. Instead users which don't need/can't be linked into libMLIR.so
can specify EXCLUDE_FROM_LIBMLIR
A common error mode is linking with LLVM libraries outside of LINK_COMPONENTS.
This almost always results in symbol confusion or multiply defined options
in LLVM when the same object file is included as a static library and
as part of libLLVM.so. To catch these errors more directly, there's now
mlir_check_all_link_libraries.
To simplify usage of add_mlir_library, we assume that all mlir
libraries depend on LLVMSupport, so it's not necessary to separately specify
it.
tested with:
BUILD_SHARED_LIBS=on,
BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB,
BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB + LLVM_LINK_LLVM_DYLIB.
By: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com>
Differential Revision: https://reviews.llvm.org/D79067
[MLIR] Move from using target_link_libraries to LINK_LIBS
This allows us to correctly generate dependencies for derived targets,
such as targets which are created for object libraries.
By: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com>
Differential Revision: https://reviews.llvm.org/D79243
Three commits have been squashed to avoid intermediate build breakage.
Linalg transformations are currently exposed as DRRs.
Unfortunately RewriterGen does not play well with the line of work on named linalg ops which require variadic operands and results.
Additionally, DRR is arguably not the right abstraction to expose compositions of such patterns that don't rely on SSA use-def semantics.
This revision abandons DRRs and exposes manually written C++ patterns.
Refactorings and cleanups are performed to uniformize APIs.
This refactoring will allow replacing the currently manually specified Linalg named ops.
A collateral victim of this refactoring is the `tileAndFuse` DRR, and the one associated test, which will be revived at a later time.
Lastly, the following 2 tests do not add value and are altered:
- a dot_perm tile + interchange test does not test anything new and is removed
- a dot tile + lower to loops does not need 2-D tiling and is trimmed.
These libraries are distinct from other things in Analysis in that they
operate only on core IR concepts. This also simplifies dependencies
so that Dialect -> Analysis -> Parser -> IR. Previously, the parser depended
on portions of the the Analysis directory as well, which sometimes
caused issues with the way the cmake makefile generator discovers
dependencies on generated files during compilation.
Differential Revision: https://reviews.llvm.org/D79240
Summary:
This revision cleans up a layer of complexity in ScopedContext and uses InsertGuard instead of previously manual bookkeeping.
The method `getBuilder` is renamed to `getBuilderRef` and spurious copies of OpBuilder are tracked.
This results in some canonicalizations not happening anymore in the Linalg matmul to vector test. This test is retired because relying on DRRs for this has been shaky at best. The solution will be better support to write fused passes in C++ with more idiomatic pattern composition and application.
Differential Revision: https://reviews.llvm.org/D79208
This revision adds support to allow named ops to lower to loops.
Linalg.batch_matmul successfully lowers to loops and to LLVM.
In the process, this test also activates linalg to affine loops.
However padded convolutions to not lower to affine.load atm so this revision overrides the type of underlying load / store operation.
Differential Revision: https://reviews.llvm.org/D79135
OperationHandle mostly existed to mirror the behavior of ValueHandle.
This has become unnecessary and can be retired.
Differential Revision: https://reviews.llvm.org/D78692
Instead of using llvm_unreachable to guard against fusing linalg.conv,
reject fusing linalg.conv in isFusableInto.
tileLinalgOpImpl is a templated function now and it can operate on
loop.parellel. So we should avoid calling into getForInductionVarOwner
which always assumes loop.for.
Differential Revision: https://reviews.llvm.org/D78936
it to fusing different kinds of linalg operations on tensors.
The implementation of fusion on tensor was initially planned for just
GenericOps (and maybe IndexedGenericOps). With addition of
linalg.tensor_reshape, and potentially other such non-structured ops,
refactor the existing implementation to allow easier specification of
fusion between different linalg operations on tensors.
Differential Revision: https://reviews.llvm.org/D78463
The buffer allocated by a promotion can be subject to other transformations afterward. For example it could be vectorized, in which case it is needed to ensure that this buffer is memory-aligned.
Differential Revision: https://reviews.llvm.org/D78556
This revision is the first in a set of improvements that aim at allowing
more generalized named Linalg op generation from a mathematical
specification.
This revision allows creating a new op and checks that the parser,
printer and verifier are hooked up properly.
This opened up a few design points that will be addressed in the future:
1. A named linalg op has a static region builder instead of an
explicitly parsed region. This is not currently compatible with
assemblyFormat so a custom parser / printer are needed.
2. The convention for structured ops and tensor return values needs to
evolve to allow tensor-land and buffer land specifications to agree
3. ReferenceIndexingMaps and referenceIterators will need to become
static to allow building attributes at parse time.
4. Error messages will be improved once we have 3. and we pretty print
in custom form.
Differential Revision: https://reviews.llvm.org/D78327
The promotion transformation is promoting all input and output buffers of the transformed op. The user might want to only promote some of these buffers.
Differential Revision: https://reviews.llvm.org/D78498
There were some unused CMakeFiles for Affine/IR and Affine/EDSC.
This change builds separate MLIRAffineOps and MLIRAffineEDSC libraries
using those CMakeFiles. This combination replaces the old MLIRAffine
library.
Differential Revision: https://reviews.llvm.org/D78317
The function attribute in generic ops is not paying for itself.
A region is the more standardized way of specifying a custom computation.
If needed this region can call a function directly.
This is deemed more natural than managing a dedicated function attribute.
This also simplifies named ops generation by trimming unnecessary complexity.
Differential Revision: https://reviews.llvm.org/D78266
Summary:
Modified AffineMap::get to remove support for the overload which allowed
an ArrayRef of AffineExpr but no context (and gathered the context from a
presumed first entry, resulting in bugs when there were 0 results).
Instead, we support only a ArrayRef and a context, and a version which
takes a single AffineExpr.
Additionally, removed some now needless case logic which previously
special cased which call to AffineMap::get to use.
Reviewers: flaub, bondhugula, rriddle!, nicolasvasilache, ftynse, ulysseB, mravishankar, antiagainst, aartbik
Subscribers: mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, bader, grosul1, frgossen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78226
The inversePermutation method returns a null map on failure. Update
uses of this method within Linalg to handle this. In LinalgToLoops the
null return value was used to emit scalar code. Modify that to return
failure, and emit scalar implementation when affine map is "empty",
i.e. 1 dims, 0 symbols and no result exprs.
Differential Revision: https://reviews.llvm.org/D77964
Summary: Functional.h contains many different methods that have a direct, and more efficient, equivalent in LLVM. This revision replaces all usages with the LLVM equivalent, and removes the header. This is part of larger cleanup, pr45513, merging MLIR support facilities into LLVM.
Differential Revision: https://reviews.llvm.org/D78053
This change is NFC since the facility to tile and generate
loop.parallel loops already exists in Linalg.
Differential Revision: https://reviews.llvm.org/D77965
The invertPermutation method does not return a nullptr anymore, but
rather returns an empty map for the scalar case. Update the check in
LinalgToLoops to reflect this.
Also add test case for generating scalar code.
The outer parallel loops of a linalg operation is lowered to
loop.parallel, with the other loops lowered to loop.for. This gets the
lowering to loop.parallel on par with the loop.for lowering. In future
the reduction loop could also be lowered to loop.parallel.
Also add a utility function that returns the loops that are
created.
Differential Revision: https://reviews.llvm.org/D77678
Rename mlir::applyPatternsGreedily -> applyPatternsAndFoldGreedily. The
new name is a more accurate description of the method - it performs
both, application of the specified patterns and folding of all ops in
the op's region irrespective of whether any patterns have been supplied.
Differential Revision: https://reviews.llvm.org/D77478
Summary: Pass options are a better choice for various reasons and avoid the need for static constructors.
Differential Revision: https://reviews.llvm.org/D77707
This revision removes the reliance of Promotion on `linalg.slice` which is meant
for the rank-reducing case.
Differential Revision: https://reviews.llvm.org/D77676
Summary:
This is much cleaner, and fits the same structure as many other tablegen backends. This was not done originally as the CRTP in the pass classes made it overly verbose/complex.
Differential Revision: https://reviews.llvm.org/D77367
This revision removes all of the CRTP from the pass hierarchy in preparation for using the tablegen backend instead. This creates a much cleaner interface in the C++ code, and naturally fits with the rest of the infrastructure. A new utility class, PassWrapper, is added to replicate the existing behavior for passes not suitable for using the tablegen backend.
Differential Revision: https://reviews.llvm.org/D77350
This revision adds support for generating utilities for passes such as options/statistics/etc. that can be inferred from the tablegen definition. This removes additional boilerplate from the pass, and also makes it easier to remove the reliance on the pass registry to provide certain things(e.g. the pass argument).
Differential Revision: https://reviews.llvm.org/D76659
This generates a Passes.td for all of the dialects that have transformation passes. This removes the need for global registration for all of the dialect passes.
Differential Revision: https://reviews.llvm.org/D76657
Summary:
The RAW fusion happens only if the produecer block dominates the consumer block.
The WAW pattern also works with the precondition. I.e., if a producer can
dominate the consumer, they can fairly fuse together.
Since they are all tilable, we can think the pattern like this way:
Input:
```
linalg_op1 view
tile_loop
subview_2
linalg_op2 subview_2
```
Tile the first Linalg op as same as the second Linalg.
```
tile_loop
subview_1
linalg_op1 subview_1
tile_loop
subview_2
liangl_op2 subview_2
```
Since the first Linalg op is tilable in the same way and the computation are
independently, it's fair to fuse it with the second Linalg op.
```
tile_loop
subview_1
linalg_op1 subview_1
linalg_op2 subview_2
```
In short, this patch includes:
- Handling both RAW and WAW pattern.
- Adding a interface method to get input and output buffers.
- Exposing a method to get a StringRef of a dependency type.
- Fixing existing WAW tests and add one more use case: initialize the buffer
before conv op.
Differential Revision: https://reviews.llvm.org/D76897
Summary:
Performs an N-D pooling operation similarly to the description in the TF
documentation:
https://www.tensorflow.org/api_docs/python/tf/nn/pool
Different from the description, this operation doesn't perform on batch and
channel. It only takes tensors of rank `N`.
```
output[x[0], ..., x[N-1]] =
REDUCE_{z[0], ..., z[N-1]}
input[
x[0] * strides[0] - pad_before[0] + dilation_rate[0]*z[0],
...
x[N-1]*strides[N-1] - pad_before[N-1] + dilation_rate[N-1]*z[N-1]
],
```
The required optional arguments are:
- strides: an i64 array specifying the stride (i.e. step) for window
loops.
- dilations: an i64 array specifying the filter upsampling/input
downsampling rate
- padding: an i64 array of pairs (low, high) specifying the number of
elements to pad along a dimension.
If strides or dilations attributes are missing then the default value is
one for each of the input dimensions. Similarly, padding values are zero
for both low and high in each of the dimensions, if not specified.
Differential Revision: https://reviews.llvm.org/D76414
Existing tiling implementation of Linalg would still work for tiling
the batch dimensions of the convolution op.
Differential Revision: https://reviews.llvm.org/D76637
Summary:
Change AffineOps Dialect structure to better group both IR and Tranforms. This included extracting transforms directly related to AffineOps. Also move AffineOps to Affine.
Differential Revision: https://reviews.llvm.org/D76161
Summary:
After D75831 has been landed, both the generic op and indexed_generic op can
handle 0-D edge case. In the previous patch, only generic op has been updated.
This patch updates the lowering to loops for indexed_generic op. Since they are
almost the sanme, the patch also refactors the common part.
Differential Revision: https://reviews.llvm.org/D76413
Summary:
Although bool and int1 are the same sometimes, using bool constant matches the
semantic better. In any cases, we don't have to care the type of conditions if
we remove the intial value. The type is determined automatically by the returned
type of logical operations.
Differential Revision: https://reviews.llvm.org/D76338
Summary:
To enable this, two changes are needed:
1) Add an optional attribute `padding` to linalg.conv.
2) Compute if the indices accessing is out of bound in the loops. If so, use the
padding value `0`. Otherwise, use the value derived from load.
In the patch, the padding only works for lowering without other transformations,
e.g., tiling, fusion, etc.
Differential Revision: https://reviews.llvm.org/D75722
Summary:
This revision adds lowering of vector.contract to llvm.intr.matrix_multiply.
Note that there is currently a mismatch between the MLIR vector dialect which
expects row-major layout and the LLVM matrix intrinsics which expect column
major layout.
As a consequence, we currently only match a vector.contract with indexing maps
that express column-major matrix multiplication.
Other cases would require additional transposes and it is better to wait for
LLVM intrinsics to provide a per-operation attribute that would specify which
layout is expected.
A separate integration test, not submitted to MLIR core, has independently
verified that correct execution occurs on a 2x2x2 matrix multiplication.
Differential Revision: https://reviews.llvm.org/D76014
HasNoSideEffect can now be implemented using the MemoryEffectInterface, removing the need to check multiple things for the same information. This also removes an easy foot-gun for users as 'Operation::hasNoSideEffect' would ignore operations that dynamically, or recursively, have no side effects. This also leads to an immediate improvement in some of the existing users, such as DCE, now that they have access to more information.
Differential Revision: https://reviews.llvm.org/D76036
This revision takes advantage of the empty AffineMap to specify the
0-D edge case. This allows removing a bunch of annoying corner cases
that ended up impacting users of Linalg.
Differential Revision: https://reviews.llvm.org/D75831
add_llvm_library and add_llvm_executable may need to create new targets with
appropriate dependencies. As a result, it is not sufficient in some
configurations (namely LLVM_BUILD_LLVM_DYLIB=on) to only call
add_dependencies(). Instead, the explicit TableGen dependencies must
be passed to add_llvm_library() or add_llvm_executable() using the DEPENDS
keyword.
Differential Revision: https://reviews.llvm.org/D74930
In cmake, it is redundant to have a target list under target_link_libraries()
and add_dependency(). This patch removes the redundant dependency from
add_dependency().
Differential Revision: https://reviews.llvm.org/D74929
CMake allows calling target_link_libraries() without a keyword,
but this usage is not preferred when also called with a keyword,
and has surprising behavior. This patch explicitly specifies a
keyword when using target_link_libraries().
Differential Revision: https://reviews.llvm.org/D75725
output has zero rank.
While lowering to loops, no indices should be used in the load/store
operation if the buffer is zero-rank.
Differential Revision: https://reviews.llvm.org/D75391
add_llvm_library and add_llvm_executable may need to create new targets with
appropriate dependencies. As a result, it is not sufficient in some
configurations (namely LLVM_BUILD_LLVM_DYLIB=on) to only call
add_dependencies(). Instead, the explicit TableGen dependencies must
be passed to add_llvm_library() or add_llvm_executable() using the DEPENDS
keyword.
Differential Revision: https://reviews.llvm.org/D74930
In cmake, it is redundant to have a target list under target_link_libraries()
and add_dependency(). This patch removes the redundant dependency from
add_dependency().
Differential Revision: https://reviews.llvm.org/D74929
When compiling libLLVM.so, add_llvm_library() manipulates the link libraries
being used. This means that when using add_llvm_library(), we need to pass
the list of libraries to be linked (using the LINK_LIBS keyword) instead of
using the standard target_link_libraries call. This is preparation for
properly dealing with creating libMLIR.so as well.
Differential Revision: https://reviews.llvm.org/D74864
add_llvm_library and add_llvm_executable may need to create new targets with
appropriate dependencies. As a result, it is not sufficient in some
configurations (namely LLVM_BUILD_LLVM_DYLIB=on) to only call
add_dependencies(). Instead, the explicit TableGen dependencies must
be passed to add_llvm_library() or add_llvm_executable() using the DEPENDS
keyword.
Differential Revision: https://reviews.llvm.org/D74930
In cmake, it is redundant to have a target list under target_link_libraries()
and add_dependency(). This patch removes the redundant dependency from
add_dependency().
Differential Revision: https://reviews.llvm.org/D74929
When compiling libLLVM.so, add_llvm_library() manipulates the link libraries
being used. This means that when using add_llvm_library(), we need to pass
the list of libraries to be linked (using the LINK_LIBS keyword) instead of
using the standard target_link_libraries call. This is preparation for
properly dealing with creating libMLIR.so as well.
Differential Revision: https://reviews.llvm.org/D74864
Instead of creating extra libraries we don't really need, collect a
list of all dialects and use that instead.
Differential Revision: https://reviews.llvm.org/D75221
This revision performs some basic refactoring towards more easily defining Linalg "named" ops. Such named ops form the backbone of operations that are ubiquitous in the ML application domain.
Thus far IntegerType has been signless: a value of IntegerType does
not have a sign intrinsically and it's up to the specific operation
to decide how to interpret those bits. For example, std.addi does
two's complement arithmetic, and std.divis/std.diviu treats the first
bit as a sign.
This design choice was made some time ago when we did't have lots
of dialects and dialects were more rigid. Today we have much more
extensible infrastructure and different dialect may want different
modelling over integer signedness. So while we can say we want
signless integers in the standard dialect, we cannot dictate for
others. Requiring each dialect to model the signedness semantics
with another set of custom types is duplicating the functionality
everywhere, considering the fundamental role integer types play.
This CL extends the IntegerType with a signedness semantics bit.
This gives each dialect an option to opt in signedness semantics
if that's what they want and helps code sharing. The parser is
modified to recognize `si[1-9][0-9]*` and `ui[1-9][0-9]*` as
signed and unsigned integer types, respectively, leaving the
original `i[1-9][0-9]*` to continue to mean no indication over
signedness semantics. All existing dialects are not affected (yet)
as this is a feature to opt in.
More discussions can be found at:
https://groups.google.com/a/tensorflow.org/d/msg/mlir/XmkV8HOPWpo/7O4X0Nb_AQAJ
Differential Revision: https://reviews.llvm.org/D72533
Fixing a bug where using a zero-rank shaped type operand to
linalg.generic ops hit an unrelated assert. This also meant that
lowering the operation to loops was not supported. Adding roundtrip
tests and lowering to loops test for zero-rank shaped type operand
with fixes to make the test pass.
Differential Revision: https://reviews.llvm.org/D74638
Summary:
Linalg's promotion pass was only supporting f32 buffers due to how the
zero value was build for the `fill` operation.
Moreover, `promoteSubViewOperands` was returning a vector with one entry
per float subview while omitting integer subviews. For a program
with only integer subviews the return vector would be of size 0.
However, `promoteSubViewsOperands` would try to access a non zero
number of entries of this vector, resulting in a sefgault.
Reviewers: nicolasvasilache, ftynse
Reviewed By: ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74532
This CL refactors EDSCs to layer them better and break unnecessary
dependencies. After this refactoring, the top-level EDSC target only
depends on IR but not on Dialects anymore and each dialect has its
own EDSC directory.
This simplifies the layering and breaks cyclic dependencies.
In particular, the declarative builder + folder are made explicit and
are now confined to Linalg.
As the refactoring occurred, certain classes and abstractions that were not
paying for themselves have been removed.
Differential Revision: https://reviews.llvm.org/D74302
The initial implementation of the fusion operation exposes a method to
fuse a consumer with its producer, when
- both the producer and consumer operate on tensors
- the producer has only a single result value
- the producer has only "parallel" iterator types
A new interface method hasTensorSemantics is added to verify that an
operation has all operands and results of type RankedTensorType.
Differential Revision: https://reviews.llvm.org/D74172
Summary:
It is often needed to map entire ranges rather than single values. To avoid
writing the same for loop every time, I have added an overload to the map
method.
Differential Revision: https://reviews.llvm.org/D73894
LinalgDependenceGraph was not updated after successful producer-consumer
fusion for linalg ops. In this patch it is fixed by reconstructing
LinalgDependenceGraph on every iteration. This is very ineffective and
should be improved by updating LDGraph only when it is necessary.
Summary:
After the `subview` operation was migrated from Linalg to Standard, it changed
semantics and does not guarantee the absence of out-of-bounds accesses through
the created view anymore. Compute the size of the subview to make sure it
always fits within the view (subviews in last iterations of the loops may be
smaller than those in other iterations).
Differential Revision: https://reviews.llvm.org/D73614
Summary:
This diff adds a transformation patter to rewrite linalg.fill as broadcasting a scaler into a vector.
It uses the same preconditioning as matmul (memory is contiguous).
Reviewers: nicolasvasilache
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73391
Summary:
This is a simple extension to allow vectorization to work not only on GenericLinalgOp
but more generally across named ops too.
For now, this still only vectorizes matmul-like ops but is a step towards more
generic vectorization of Linalg ops.
Reviewers: ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72942
Summary:
This diff moves the conversion pass declaration closer to its definition
and makes the namespacing of passes consistent with the rest of the
infrastructure (i.e. `mlir::linalg::createXXXPass` -> `mlir::createXXXPass`).
Reviewers: ftynse, jpienaar, mehdi_amini
Subscribers: rriddle, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72766
Summary:
This diff fixes issues with the semantics of linalg.generic on tensors that appeared when converting directly from HLO to linalg.generic.
The changes are self-contained within MLIR and can be captured and tested independently of XLA.
The linalg.generic and indexed_generic are updated to:
To allow progressive lowering from the value world (a.k.a tensor values) to
the buffer world (a.k.a memref values), a linalg.generic op accepts
mixing input and output ranked tensor values with input and output memrefs.
```
%1 = linalg.generic #trait_attribute %A, %B {other-attributes} :
tensor<?x?xf32>,
memref<?x?xf32, stride_specification>
-> (tensor<?x?xf32>)
```
In this case, the number of outputs (args_out) must match the sum of (1) the
number of output buffer operands and (2) the number of tensor return values.
The semantics is that the linalg.indexed_generic op produces (i.e.
allocates and fills) its return values.
Tensor values must be legalized by a buffer allocation pass before most
transformations can be applied. Such legalization moves tensor return values
into output buffer operands and updates the region argument accordingly.
Transformations that create control-flow around linalg.indexed_generic
operations are not expected to mix with tensors because SSA values do not
escape naturally. Still, transformations and rewrites that take advantage of
tensor SSA values are expected to be useful and will be added in the near
future.
Subscribers: bmahjour, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72555
Summary:
This diff adds support to allow `linalg.generic` and
`linalg.indexed_generic` to take tensor input and output
arguments.
The subset of output tensor operand types must appear
verbatim in the result types after an arrow. The parser,
printer and verifier are extended to accomodate this
behavior.
The Linalg operations now support variadic ranked tensor
return values. This extension exhibited issues with the
current handling of NativeCall in RewriterGen.cpp. As a
consequence, an explicit cast to `SmallVector<Value, 4>`
is added in the proper place to support the new behavior
(better suggestions are welcome).
Relevant cleanups and name uniformization are applied.
Relevant invalid and roundtrip test are added.
Reviewers: mehdi_amini, rriddle, jpienaar, antiagainst, ftynse
Subscribers: burmako, shauheen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72022
Summary: This is part of an ongoing cleanup and uniformization work.
Reviewers: ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72084
Summary:
This is part of an ongoing cleanup and uniformization work.
This diff performs 3 types of cleanups:
1. Uniformize transformation names.
2. Replace all pattern operands that need not be captured by `$_`
3. Replace all usage of pattern captured op by the normalized `op` name (instead of positional parameters such as `$0`)
Reviewers: ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72081
This is an initial step to refactoring the representation of OpResult as proposed in: https://groups.google.com/a/tensorflow.org/g/mlir/c/XXzzKhqqF_0/m/v6bKb08WCgAJ
This change will make it much simpler to incrementally transition all of the existing code to use value-typed semantics.
PiperOrigin-RevId: 286844725
This CL allows specifying an additional name for specifying the .td file that is used to generate the doc for a dialect. This is necessary for a dialect like Linalg which has different "types" of ops that are used in different contexts.
This CL also restructures the Linalg documentation and renames LinalgLibraryOps -> LinalgStructuredOps but is otherwise NFC.
PiperOrigin-RevId: 286450414
This PR targest issue tensorflow/mlir#295. It exposes the already existing
subiew promotion pass as a declarative pattern
Change-Id: If901ebef9fb53fcd0b12ecc536f6b174ce320b92
Closestensorflow/mlir#315
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/315 from tetuante:issue295 8e5f268b6d85f31015c33505329dbd7a4db97ac5
PiperOrigin-RevId: 285801463
This will be evolved into a simple programming model for custom ops and custom layers in followup CLs.
This CL also deletes the obsolete tablegen's reference-impl.td that was using EDSCs.
PiperOrigin-RevId: 285459545
This CL adds more common information to StructuredOpsUtils.h
The n_view attribute is retired in favor of args_in + args_out but the CL is otherwise NFC.
PiperOrigin-RevId: 285000621
This patch closes issue tensorflow/mlir#272
We add a standalone iterator permutation transformation to Linalg.
This transformation composes a permutation map with the maps in the
"indexing_maps" attribute. It also permutes "iterator_types"
accordingly.
Change-Id: I7c1e693b8203aeecc595a7c012e738ca1100c857
Closestensorflow/mlir#307
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/307 from tetuante:issue272 f7908d58792f4111119721885e247045104f1131
PiperOrigin-RevId: 284824102
This CL uses the newly expanded matcher support to easily detect when a linalg.generic has a multiply-accumulate body. A linalg.generic with such a body is rewritten as a vector contraction.
This CL additionally limits the rewrite to the case of matrix multiplication on contiguous and statically shaped memrefs for now.
Before expanding further, we should harden the infrastructure for expressing custom ops with the structured ops abstraction.
PiperOrigin-RevId: 284566659
This patch closes issue tensorflow/mlir#271.
It adds an optional permutation map to declarative tiling transformations.
The map is expressed as a list of integers.
Closestensorflow/mlir#288
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/288 from tetuante:issue271 2df2938d6a1f01b3bc404ded08dea2dd1e10b588
PiperOrigin-RevId: 284064151
This CL rewrites the linalg ops to loops transformations as patterns that can be targeted directly from Tablegen. Reliance on OpFolder is removed and to cope with it we introduce local folding patterns that are applied greedily.
PiperOrigin-RevId: 282765550
This will make it easier to scale out test patterns and build specific passes that do not interfere with independent testing.
PiperOrigin-RevId: 281736335
This modification will allow to easily plug lowering of linalg ops to different types of loops (affine, loop.for and other future constructs).
This is purely NFC for now.
PiperOrigin-RevId: 280652186
This is step 1/n in refactoring infrastructure along the Vector dialect to make it ready for retargetability and composable progressive lowering.
PiperOrigin-RevId: 280529784
Following up on the consolidation of MemRef descriptor conversion, update
Linalg-to-LLVM conversion to use the helper class that abstracts away the
implementation details of the MemRef descriptor. This required MemRefDescriptor
to become publicly visible. Since this conversion is heavily EDSC-based,
introduce locally an additional wrapper that uses builder and location pointed
to by the EDSC context while emitting descriptor manipulation operations.
PiperOrigin-RevId: 280429228
This CL uses the now standard std.subview in linalg.
Two shortcuts are currently taken to allow this port:
1. the type resulting from a view is currently degraded to fully dynamic to pass the SubViewOp verifier.
2. indexing into SubViewOp may access out of bounds since lowering to LLVM does not currently enforce it by construction.
These will be fixed in subsequent commits after discussions.
PiperOrigin-RevId: 280250129
This CL adds an extra pointer to the memref descriptor to allow specifying alignment.
In a previous implementation, we used 2 types: `linalg.buffer` and `view` where the buffer type was the unit of allocation/deallocation/alignment and `view` was the unit of indexing.
After multiple discussions it was decided to use a single type, which conflates both, so the memref descriptor now needs to carry both pointers.
This is consistent with the [RFC-Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!searchin/mlir/std.view%7Csort:date/mlir/-wKHANzDNTg/4K6nUAp8AAAJ).
PiperOrigin-RevId: 279959463
This change allows for adding additional nested references to a SymbolRefAttr to allow for further resolving a symbol if that symbol also defines a SymbolTable. If a referenced symbol also defines a symbol table, a nested reference can be used to refer to a symbol within that table. Nested references are printed after the main reference in the following form:
symbol-ref-attribute ::= symbol-ref-id (`::` symbol-ref-id)*
Example:
module @reference {
func @nested_reference()
}
my_reference_op @reference::@nested_reference
Given that SymbolRefAttr is now more general, the existing functionality centered around a single reference is moved to a derived class FlatSymbolRefAttr. Followup commits will add support to lookups, rauw, etc. for scoped references.
PiperOrigin-RevId: 279860501
This operation is a companion operation to the std.view operation added as proposed in "Updates to the MLIR MemRefType" RFC.
PiperOrigin-RevId: 279766410
Now that a view op has graduated to the std dialect, we can update Linalg to use it and remove ops that have become obsolete. As a byproduct, the linalg buffer and associated ops can also disappear.
PiperOrigin-RevId: 279073591
This CL adds a simple pattern for specifying producer-consumer fusion on Linalg operations.
Implementing such an extension reveals some interesting properties.
Since Linalg operates on a buffer abstraction, the output buffers are specified as in/out parameters to the ops. As a consequence, there are no SSA use-def chains and one cannot specify complex dag input patterns with the current infrastructure.
Instead this CL uses constraints based on the existing linalg dependence analysis to focus the pattern and refine patterns based on the type of op that last wrote in a buffer.
This is a very local property and is less powerful than the generic dag specification based on SSA use-def chains.
This will be generalized in the future.
PiperOrigin-RevId: 277931503
MLIR const-correctness policy is to avoid having `const` on IR objects.
LinalgDependenceGraph is not an IR object but an auxiliary data structure.
Furthermore, it is not updated once constructed unlike IR objects. Add const
qualifiers to get* and find* methods of LinalgDependenceGraph since they are
not modifying the graph. This allows transformation functions that require the
dependence graph to take it by const-reference, clearly indicating that they
are not modifying it (and that the graph may have to be recomputed after the
transformation).
PiperOrigin-RevId: 277731608
Linalg ops provide a good anchor for pattern matching/rewriting transformations.
This CL adds a simple example of how multi-level tiling may be specified by attaching a simple StringAttr to ops as they are transformed so we can easily specify partial lowering to control transformation application.
This is a first stab at taking advantage of higher-level information contained in Linalg ops and will evolve in the future.
PiperOrigin-RevId: 277497958
This will be used to specify declarative Linalg transformations in a followup CL. In particular, the PatternRewrite mechanism does not allow folding and has its own way of tracking erasure.
PiperOrigin-RevId: 277149158
This allows mixing linalg operations with vector transfer operations (with additional modifications to affine ops) and is a step towards solving tensorflow/mlir#189.
PiperOrigin-RevId: 275543361
This CL creates a new Linalg promotion pass that operates on SubViewOp and decouples it from Linalg tiling. This is mostly moving code around.
PiperOrigin-RevId: 275329213
When the implementation of the strided memref [RFC](https://groups.google.com/a/tensorflow.org/forum/#!msg/mlir/MaL8m2nXuio/1scRqZa6AQAJ) landed, linalg started using this type instead of the now retired !linalg.view.
As static and partially static cases appear, the stride information needs to be maintained properly. In particular, the result type of the subview op was generally incorrect.
This CL fixes the issue by computing a return type that:
1. always has dynamic sizes, which is generally the only correct way to construct a subview in the absence of data padding and/or code versioning.
2. has the same strides as the base strided memref.
Point 1. above can be further refined but will needs further analysis and canonicalization to optimize the particular case where:
1. The base memref has static size along a given dimension.
2. The subview size can be statically derived (e.g. after canonicalization).
3. *And* the subview size is an even divisor of the base memref.
This 3rd constraint is well-known in the case of tiled layouts that don't assume implicit padding: the boundary tile may be only partial and has size given by `problem_size % tile_size`.
Tests are updated as appropriate.
PiperOrigin-RevId: 274578624
This fixes an omission that prevents Linalg to lower generic ops regions operating on ops in the VectorOps dialect.
To achieve this we simply need to `populateVectorToLLVMConversionPatterns` in the conversion.
Relevant tests are added.
PiperOrigin-RevId: 274577325
When an operation with regions gets replaced, we currently require that all of the remaining nested operations are still converted even though they are going to be replaced when the rewrite is finished. This cl adds a tracking for a minimal set of operations that are known to be "dead". This allows for ignoring the legalization of operations that are won't survive after conversion.
PiperOrigin-RevId: 274009003
This function-like operation allows one to define functions that have wrapped
LLVM IR function type, in particular variadic functions. The operation was
added in parallel to the existing lowering flow, this commit only switches the
flow to use it.
Using a custom function type makes the LLVM IR dialect type system more
consistent and avoids complex conversion rules for functions that previously
had to use the built-in function type instead of a wrapped LLVM IR dialect type
and perform conversions during the analysis.
PiperOrigin-RevId: 273910855
During the conversion, both the original and the converted function may coexist
in the module and have the same symbol name. There is no guarantee which of the
two will be found by the symbol lookup. Avoid returning the result of the
library function lookup when lowering Linalg to Standard or LLVM. Use the
symbol reference instead. After the conversion completes, only one symbol will
remain and the Ops using SymbolRefAttrs will be referring to the correct one.
PiperOrigin-RevId: 273510079
Certain lowering patterns were reported as [missing](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/dkdmHa77sSQ).
This CL adds them and allows Linalg/roundtrip.mlir and Linalg/loops.mlir to lower to LLVM directly. Those 2 tests are updated to additionally check that the direct lowering to LLVM does not crash.
The following points, left as TODOs still need to be addressed for correct end-to-end execution:
1. the lowering for ConvOp needs to pass attributes such as strides and dilations; the external library call needs to support it.
2. the lowering for GenericOp needs to support lowering to loops as a DialectConversion pattern. This is blocked on the DialectConversion infrastructure accepting an OperationFolder.
PiperOrigin-RevId: 272878131
This makes the name of the conversion pass more consistent with the naming
scheme, since it actually converts from the Loop dialect to the Standard
dialect rather than working with arbitrary control flow operations.
PiperOrigin-RevId: 272612112
This CL finishes the implementation of the Linalg + Affine type unification of the [strided memref RFC](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
As a consequence, the !linalg.view type, linalg::DimOp, linalg::LoadOp and linalg::StoreOp can now disappear and Linalg can use standard types everywhere.
PiperOrigin-RevId: 272187165
This CL modifies the linalg-fusion pass such that it does not tile anymore as part of the pass. Tiling is a separate concern that enables linalg fusion but should happen before.
This makes fusion more composable with other decisions.
In particular the fusion pass now becomes greedy and only applies the transformation on a best-effort basis.
This should also let fusion work in a multi-hop fashion with chains of producer/consumers.
Since the fusion pass does not perform tiling anymore, tests are rewritten to be in pretiled form and make the intent of the test clearer (albeit more verbose).
PiperOrigin-RevId: 271357741
The helper functions makePositionAttr() and positionAttr() were originally
introduced in the lowering-to-LLVM-dialect pass to construct integer array
attributes that are used for static positions in extract/insertelement.
Constructing an integer array attribute being fairly common, a utility function
Builder::getI64ArrayAttr was later introduced into the Builder API. Drop
makePositionAttr and similar homegrown functions and use that API instead.
PiperOrigin-RevId: 269295836
View descriptors are converted to *pointer to* LLVM struct to avoid ABI issues related to C struct packing. This creates unnecessary complexity and hampers unification with memrefs.
Instead, this CL makes view descriptors convert to LLVM struct (as it was originally) and promotes all structs to pointers right before calling an external function.
PiperOrigin-RevId: 267602693
This CL adds support for proper cloning of Linalg ops that have regions (i.e. the generic linalg op). This is used to properly implement tiling and fusion for such ops. Adequate tests are added.
PiperOrigin-RevId: 267027176
Remove unused variables and attributes from BaseViewConversionHelper
on mlir/lib/Dialect/Linalg/Transforms/LowerToLLVMDialect.cpp
Closestensorflow/mlir#116
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/116 from alexst07:fix-warnings 5f638e4677492cf71a9cc040eeb6b57427d32e06
PiperOrigin-RevId: 266972082
This interface will allow for providing hooks to interrop with operation folding. The first hook, 'shouldMaterializeInto', will allow for controlling which region to insert materialized constants into. The folder will generally materialize constants into the top-level isolated region, this allows for materializing into a lower level ancestor region if it is more profitable/correct.
PiperOrigin-RevId: 266702972
This change refactors and cleans up the implementation of the operation walk methods. After this refactoring is that the explicit template parameter for the operation type is no longer needed for the explicit op walks. For example:
op->walk<AffineForOp>([](AffineForOp op) { ... });
is now accomplished via:
op->walk([](AffineForOp op) { ... });
PiperOrigin-RevId: 266209552
Add an extra RewritePattern that does not convert types to rewrite a CopyOp that has non-identity permutations into a sequence of TransposeOp followed by a CopyOp without such permutations.
This RewitePattern is made to fail in the non-permutation case so that the conversion pattern can kick in to lower to LLVM.
This is an instance of A->A->B lowering where A->A is done by a RewritePattern in case_1 and A->B is done by a ConversionPatternRewriter when not(case_1).
PiperOrigin-RevId: 265171380
Add a conversion pattern that transforms a linalg.transpose op into:
1. A function entry `alloca` operation to allocate a ViewDescriptor.
2. A load of the ViewDescriptor from the pointer allocated in 1.
3. Updates to the ViewDescriptor to introduce the data ptr, offset, size
and stride. Size and stride are permutations of the original values.
4. A store of the resulting ViewDescriptor to the alloca'ed pointer.
The linalg.transpose op is replaced by the alloca'ed pointer.
PiperOrigin-RevId: 265169112
This CL extends support for lowering of linalg to external C++ libraries with CopyOp. Currently this can only work when the permutation maps in the copies are identity. Future support for permutations will be added later.
PiperOrigin-RevId: 265093025
linalg.subview used to lower to a slice with a bounded range resulting in correct bounded accesses. However linalg.slice could still index out of bounds. This CL moves the bounding to linalg.slice.
LLVM select and cmp ops gain a more idiomatic builder.
PiperOrigin-RevId: 264897125