Reductions in innermost loops become harder for the backend to disambiguate
after bufferization into memrefs, resulting in less efficient load-update-store
cycles. By scalarizing innermost reductions, the backend is more likely to assign
a register to perform the reduction (also prepares vectorization). Even though
we could scalarize reductions for more outer loops and while-loops as well,
currently scalarization is only done for chains of innermost for-loops, where
it matters most, to avoid complicating codegen unnecessary (viz. adding lots
of yield instructions).
This CL also refactors condition simplification into the merger class,
where it belongs, so that conditions are simplified only once per loop
nest and not repeatedly as was currently done. This CL also fixes a few
minor bugs, some layout issues, and comments.
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D93143
This better matches the rest of the infrastructure, is much simpler, and makes it easier to move these types to being declaratively specified.
Differential Revision: https://reviews.llvm.org/D93432
This is useful for scalar code that uses for/while loops.
This has also been confirmed to work for representing std.pow as an
scf.for loop on gpus.
Differential Revision: https://reviews.llvm.org/D93308
Fix a bug causing to pick the wrong vector size to broadcast to when the source
vectors have different ranks.
Differential Revision: https://reviews.llvm.org/D93118
After bufferization, the backend has much more trouble hoisting loop invariant
loads from the loops generated by the sparse compiler. Therefore, this is done
during sparse code generation. Note that we don't bother hoisting derived
invariant expressions on SSA values, since the backend does that very well.
Still TBD: scalarize reductions to avoid load-add-store cycles
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D92534
In the past, the reshape op can be folded only if the indexing map is
permutation in consumer's usage. We can relax to condition to be projected
permutation.
This patch still limits the fusion for scalar cases. Scalar case is a corner
case, because we need to decide where to put extra dims.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D92466
Add support for vectorization for linalg.generic representing element-wise ops.
Those are converted to transfer_read + vector ops + transfer_write.
Also re-organize the vectorization tests to be together.
Implementation derived from the work of @burmako, @agrue and
@fedelebron.
Differential Revision: https://reviews.llvm.org/D92540
Given that OpState already implicit converts to Operator*, this seems reasonable.
The alternative would be to add more functions to OpState which forward to Operation.
Reviewed By: rriddle, ftynse
Differential Revision: https://reviews.llvm.org/D92266
This change gives sparse compiler clients more control over selecting
individual types for the pointers and indices in the sparse storage schemes.
Narrower width obviously results in smaller memory footprints, but the
range should always suffice for the maximum number of entries or index value.
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D92126
This CL adds the ability to request different parallelization strategies
for the generate code. Every "parallel" loop is a candidate, and converted
to a parallel op if it is an actual for-loop (not a while) and the strategy
allows dense/sparse outer/inner parallelization.
This will connect directly with the work of @ezhulenev on parallel loops.
Still TBD: vectorization strategy
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D91978
Generalizes invariant handling to anything defined outside the Linalg op
(parameters and SSA computations). Fixes bug that was using parameter number
as tensor number.
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D91985
Print part of an op of the form:
```
<optional-offset-prefix>`[` offset-list `]`
<optional-size-prefix>`[` size-list `]`
<optional-stride-prefix>[` stride-list `]`
```
Also address some leftover nits.
Differential revision: https://reviews.llvm.org/D92031
Exposing some utility functions from Linalg to allow for promotion of
fused views outside of the core tile+fuse logic.
This is an alternative to patch D91322 which adds the promotion logic
to the tileAndFuse method. Downside with that approach is that it is
not easily customizable based on needs.
Differential Revision: https://reviews.llvm.org/D91503
Enhance the tile+fuse logic to allow fusing a sequence of operations.
Make sure the value used to obtain tile shape is a
SubViewOp/SubTensorOp. Current logic used to get the bounds of loop
depends on the use of `getOrCreateRange` method on `SubViewOp` and
`SubTensorOp`. Make sure that the value/dim used to compute the range
is from such ops. This fix is a reasonable WAR, but a btter fix would
be to make `getOrCreateRange` method be a method of `ViewInterface`.
Differential Revision: https://reviews.llvm.org/D90991
This revision refactors code used in various Linalg transformations and makes it a first class citizen to the LinalgStructureOpInterface. This is in preparation to allowing more advanced Linalg behavior but is otherwise NFC.
Differential revision: https://reviews.llvm.org/D91863
Adds tests for full sum reduction (tensors summed up into scalars)
and the well-known sampled-dense-dense-matrix-product. Refines
the optimizations rules slightly to handle the summation better.
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D91818
Add transformation to be able to forward transfer_write into transfer_read
operation and to be able to remove dead transfer_write when a transfer_write is
overwritten before being read.
Differential Revision: https://reviews.llvm.org/D91321
This reverts commit f8284d21a8.
Revert "[mlir][Linalg] NFC: Expose some utility functions used for promotion."
This reverts commit 0c59f51592.
Revert "Remove unused isZero function"
This reverts commit 0f9f0a4046.
Change f8284d21 led to multiple failures in IREE compilation.
Exposing some utility functions from Linalg to allow for promotion of
fused views outside of the core tile+fuse logic.
This is an alternative to patch D91322 which adds the promotion logic
to the tileAndFuse method. Downside with that approach is that it is
not easily customizable based on needs.
Differential Revision: https://reviews.llvm.org/D91503
This commit starts a new pass and patterns for converting Linalg
named ops to generic ops. This enables us to leverage the flexbility
from generic ops during transformations. Right now only linalg.conv
is supported; others will be added when useful.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D91357
Rationale:
Make sure preconditions are tested already during verfication.
Currently, the only way a sparse rewriting rule can fail is if
(1) the linalg op does not have sparse annotations, or
(2) a yet to be handled operation is encounted inside the op
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D91748
As discussed in https://llvm.discourse.group/t/mlir-support-for-sparse-tensors/2020
this CL is the start of sparse tensor compiler support in MLIR. Starting with a
"dense" kernel expressed in the Linalg dialect together with per-dimension
sparsity annotations on the tensors, the compiler automatically lowers the
kernel to sparse code using the methods described in Fredrik Kjolstad's thesis.
Many details are still TBD. For example, the sparse "bufferization" is purely
done locally since we don't have a global solution for propagating sparsity
yet. Furthermore, code to input and output the sparse tensors is missing.
Nevertheless, with some hand modifications, the generated MLIR can be
easily converted into runnable code already.
Reviewed By: nicolasvasilache, ftynse
Differential Revision: https://reviews.llvm.org/D90994
These includes have been deprecated in favor of BuiltinDialect.h, which contains the definitions of ModuleOp and FuncOp.
Differential Revision: https://reviews.llvm.org/D91572
motivated by a refactoring in the new sparse code (yet to be merged), this avoids some lengthy code dup
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D91465
This removes the need to have an explicit `cast<>` given that we always know it `isa` instance of the interface.
Differential Revision: https://reviews.llvm.org/D91304
We lower them to a std.global_memref (uniqued by constant value) + a
std.get_global_memref to produce the corresponding memref value.
This allows removing Linalg's somewhat hacky lowering of tensor
constants, now that std properly supports this.
Differential Revision: https://reviews.llvm.org/D91306
It was incorrect in the presence of a tensor argument with multiple
uses.
The bufferization of subtensor_insert was writing into a converted
memref operand, but there is no guarantee that the converted memref for
that operand is safe to write into. In this case, the same converted
memref is written to in-place by the subtensor_insert bufferization,
violating the tensor-level semantics.
I left some comments in a TODO about ways forward on this. I will be
working actively on this problem in the coming days.
Differential Revision: https://reviews.llvm.org/D91371
This change does two main things
1) An operation might have multiple dependences to the same
producer. Not tracking them correctly can result in incorrect code
generation with fusion. To rectify this the dependence tracking
needs to also have the operand number in the consumer.
2) Improve the logic used to find the fused loops making it easier to
follow. The only constraint for fusion is that linalg ops (on
buffers) have update semantics for the result. Fusion should be
such that only one iteration of the fused loop (which is also a
tiled loop) must touch only one (disjoint) tile of the output. This
could be relaxed by allowing for recomputation that is the default
when oeprands are tensors, or can be made legal with promotion of
the fused view (in future).
Differential Revision: https://reviews.llvm.org/D90579
This CL integrates the new sparse annotations (hereto merely added as fully
transparent attributes) more tightly to the generic linalg op in order to add
verification of the annotations' consistency as well as to make make other
passes more aware of their presence (in the long run, rewriting rules must
preserve the integrity of the annotations).
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D91224
This patch converts elementwise ops on tensors to linalg.generic ops
with the same elementwise op in the payload (except rewritten to
operate on scalars, obviously). This is a great form for later fusion to
clean up.
E.g.
```
// Compute: %arg0 + %arg1 - %arg2
func @f(%arg0: tensor<?xf32>, %arg1: tensor<?xf32>, %arg2: tensor<?xf32>) -> tensor<?xf32> {
%0 = addf %arg0, %arg1 : tensor<?xf32>
%1 = subf %0, %arg2 : tensor<?xf32>
return %1 : tensor<?xf32>
}
```
Running this through
`mlir-opt -convert-std-to-linalg -linalg-fusion-for-tensor-ops` we get:
```
func @f(%arg0: tensor<?xf32>, %arg1: tensor<?xf32>, %arg2: tensor<?xf32>) -> tensor<?xf32> {
%0 = linalg.generic {indexing_maps = [#map0, #map0, #map0, #map0], iterator_types = ["parallel"]} ins(%arg0, %arg1, %arg2 : tensor<?xf32>, tensor<?xf32>, tensor<?xf32>) {
^bb0(%arg3: f32, %arg4: f32, %arg5: f32): // no predecessors
%1 = addf %arg3, %arg4 : f32
%2 = subf %1, %arg5 : f32
linalg.yield %2 : f32
} -> tensor<?xf32>
return %0 : tensor<?xf32>
}
```
So the elementwise ops on tensors have nicely collapsed into a single
linalg.generic, which is the form we want for further transformations.
Differential Revision: https://reviews.llvm.org/D90354
Previously, linalg-bufferize was a "finalizing" bufferization pass (it
did a "full" conversion). This wasn't great because it couldn't be used
composably with other bufferization passes like std-bufferize and
scf-bufferize.
This patch makes linalg-bufferize a composable bufferization pass.
Notice that the integration tests are switched over to using a pipeline
of std-bufferize, linalg-bufferize, and (to finalize the conversion)
func-bufferize. It all "just works" together.
While doing this transition, I ran into a nasty bug in the 1-use special
case logic for forwarding init tensors. That logic, while
well-intentioned, was fundamentally flawed, because it assumed that if
the original tensor value had one use, then the converted memref could
be mutated in place. That assumption is wrong in many cases. For
example:
```
%0 = some_tensor : tensor<4xf32>
br ^bb0(%0, %0: tensor<4xf32>, tensor<4xf32>)
^bb0(%bbarg0: tensor<4xf32>, %bbarg1: tensor<4xf32>)
// %bbarg0 is an alias of %bbarg1. We cannot safely write
// to it without analyzing uses of %bbarg1.
linalg.generic ... init(%bbarg0) {...}
```
A similar example can happen in many scenarios with function arguments.
Even more sinister, if the converted memref is produced by a
`std.get_global_memref` of a constant global memref, then we might
attempt to write into read-only statically allocated storage! Not all
memrefs are writable!
Clearly, this 1-use check is not a local transformation that we can do
on the fly in this pattern, so I removed it.
The test is now drastically shorter and I basically rewrote the CHECK
lines from scratch because:
- the new composable linalg-bufferize just doesn't do as much, so there
is less to test
- a lot of the tests were related to the 1-use check, which is now gone,
so there is less to test
- the `-buffer-hoisting -buffer-deallocation` is no longer mixed in, so
the checks related to that had to be rewritten
Differential Revision: https://reviews.llvm.org/D90657
Previously, they were only defined for `FuncOp`.
To support this, `FunctionLike` needs a way to get an updated type
from the concrete operation. This adds a new hook for that purpose,
called `getTypeWithoutArgsAndResults`.
For now, `FunctionLike` continues to assume the type is
`FunctionType`, and concrete operations that use another type can hide
the `getType`, `setType`, and `getTypeWithoutArgsAndResults` methods.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D90363
The bufferization patterns are moved to the .cpp file, which is
preferred in the codebase when it makes sense.
The LinalgToStandard patterns are kept a header because they are
expected to be used individually. However, they are moved to
LinalgToStandard.h which is the file corresponding to where they are
defined.
This also removes TensorCastOpConverter, which is handled by
populateStdBufferizePatterns now. Eventually, the constant op lowering
will be handled as well, but it there are currently holdups on moving
it (see https://reviews.llvm.org/D89916).
Differential Revision: https://reviews.llvm.org/D90254
Linalg "tile-and-fuse" is currently exposed as a Linalg pass "-linalg-fusion" but only the mechanics of the transformation are currently relevant.
Instead turn it into a "-test-linalg-greedy-fusion" pass which performs canonicalizations to enable more fusions to compose.
This allows dropping the OperationFolder which is not meant to be used with the pattern rewrite infrastructure.
Differential Revision: https://reviews.llvm.org/D90394
This patch adds support for fusing linalg.indexed_generic op with
linalg.tensor_reshape op by expansion, i.e.
- linalg.indexed_generic op -> linalg.tensor_reshape op when the
latter is expanding.
- linalg.tensor_reshape op -> linalg.indexed_generic op when the
former is folding.
Differential Revision: https://reviews.llvm.org/D90082
This class represents a rewrite pattern list that has been frozen, and thus immutable. This replaces the uses of OwningRewritePatternList in pattern driver related API, such as dialect conversion. When PDL becomes more prevalent, this API will allow for optimizing a set of patterns once without the need to do this per run of a pass.
Differential Revision: https://reviews.llvm.org/D89104
There are several pieces of pattern rewriting infra in IR/ that really shouldn't be there. This revision moves those pieces to a better location such that they are easier to evolve in the future(e.g. with PDL). More concretely this revision does the following:
* Create a Transforms/GreedyPatternRewriteDriver.h and move the apply*andFold methods there.
The definitions for these methods are already in Transforms/ so it doesn't make sense for the declarations to be in IR.
* Create a new lib/Rewrite library and move PatternApplicator there.
This new library will be focused on applying rewrites, and will also include compiling rewrites with PDL.
Differential Revision: https://reviews.llvm.org/D89103
Adds support for
- Dropping unit dimension loops for indexed_generic ops.
- Folding consecutive folding (or expanding) reshapes when the result
(or src) is a scalar.
- Fixes to indexed_generic -> generic fusion when zero-dim tensors are
involved.
Differential Revision: https://reviews.llvm.org/D90118
This revision allows the fusion of the producer of input tensors in the consumer under a tiling transformation (which produces subtensors).
Many pieces are still missing (e.g. support init_tensors, better refactor LinalgStructuredOp interface support, try to merge implementations and reuse code) but this still allows getting started.
The greedy pass itself is just for testing purposes and will be extracted in a separate test pass.
Differential revision: https://reviews.llvm.org/D89491
The current fusion on tensors fuses reshape ops with generic ops by
linearizing the indexing maps of the fused tensor in the generic
op. This has some limitations
- It only works for static shapes
- The resulting indexing map has a linearization that would be
potentially prevent fusion later on (for ex. tile + fuse).
Instead, try to fuse the reshape consumer (producer) with generic op
producer (consumer) by expanding the dimensionality of the generic op
when the reshape is expanding (folding). This approach conflicts with
the linearization approach. The expansion method is used instead of
the linearization method.
Further refactoring that changes the fusion on tensors to be a
collection of patterns.
Differential Revision: https://reviews.llvm.org/D89002
This revision adds a programmable codegen strategy from linalg based on staged rewrite patterns. Testing is exercised on a simple linalg.matmul op.
Differential Revision: https://reviews.llvm.org/D89374
Update linalg-to-loops lowering for pooling operations to perform
padding of the input when specified by the corresponding attribute.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D88911
TensorConstantOp bufferization currently uses the vector dialect to store constant data into memory.
Due to natural vector size and alignment properties, this is problematic with n>1-D vectors whose most minor dimension is not naturally aligned.
Instead, this revision linearizes the constant and introduces a linalg.reshape to go back to the desired shape.
Still this is still to be considered a workaround and a better longer term solution will probably involve `llvm.global`.
Differential Revision: https://reviews.llvm.org/D89311
This revision introduces support for buffer allocation for any named linalg op.
To avoid template instantiating many ops, a new ConversionPattern is created to capture the LinalgOp interface.
Some APIs are updated to remain consistent with MLIR style:
`OwningRewritePatternList * -> OwningRewritePatternList &`
`BufferAssignmentTypeConverter * -> BufferAssignmentTypeConverter &`
Differential revision: https://reviews.llvm.org/D89226
The buffer placement preparation tests in
test/Transforms/buffer-placement-preparation* are using Linalg as a test
dialect which leads to confusion and "copy-pasta", i.e. Linalg is being
extended now and when TensorsToBuffers.cpp is changed, TestBufferPlacement is
sometimes kept in-sync, which should not be the case.
This has led to the unnoticed bug, because the tests were in a different directory and the patterns were slightly off.
Differential Revision: https://reviews.llvm.org/D89209
This revision belongs to a series of patches that reduce reliance of Linalg transformations on templated rewrite and conversion patterns.
Instead, this uses a MatchAnyTag pattern for the vast majority of cases and dispatches internally.
Differential revision: https://reviews.llvm.org/D89133
This revision also inserts an end-to-end test that lowers tensors to buffers all the way to executable code on CPU.
Differential revision: https://reviews.llvm.org/D88998
The simplest case is when the indexing maps are DimIds in every component. This covers cwise ops.
Also:
* Expose populateConvertLinalgOnTensorsToBuffersPatterns in Transforms.h
* Expose emitLoopRanges in Transforms.h
Differential Revision: https://reviews.llvm.org/D88781
This revision adds init_tensors support to buffer allocation for Linalg on tensors.
Currently makes the assumption that the init_tensors fold onto the first output tensors.
This assumption is not currently enforced or cast in stone and requires experimenting with tiling linalg on tensors for ops **without reductions**.
Still this allows progress towards the end-to-end goal.
This revision introduces a `subtensor` op, which is the counterpart of `subview` for a tensor operand. This also refactors the relevant pieces to allow reusing the `subview` implementation where appropriate.
This operation will be used to implement tiling for Linalg on tensors.
The pattern is structured similar to other patterns like
LinalgTilingPattern. The fusion patterns takes options that allows you
to fuse with producers of multiple operands at once.
- The pattern fuses only at the level that is known to be legal, i.e
if a reduction loop in the consumer is tiled, then fusion should
happen "before" this loop. Some refactoring of the fusion code is
needed to fuse only where it is legal.
- Since the fusion on buffers uses the LinalgDependenceGraph that is
not mutable in place the fusion pattern keeps the original
operations in the IR, but are tagged with a marker that can be later
used to find the original operations.
This change also fixes an issue with tiling and
distribution/interchange where if the tile size of a loop were 0 it
wasnt account for in these.
Differential Revision: https://reviews.llvm.org/D88435
while folding tensor_reshape op.
While folding reshapes that introduce unit extent dims, the logic to
compute the reassociation maps can be generalized to handle some
corner cases, for example, when the folded shape still has unit-extent
dims but corresponds to folded unit extent dims of the expanded shape.
Differential Revision: https://reviews.llvm.org/D88521
Current setup for conv op vectorization does not enable user to specify tile
sizes as well as dimensions for vectorization. In this commit we change that by
adding tile sizes as pass arguments. Every dimension with corresponding tile
size > 1 is automatically vectorized.
Differential Revision: https://reviews.llvm.org/D88533
Manually-defined named ops do not currently support `init_tensors` or return values and may never support them. Add extra interface to the StructuredOpInterface so that we can still write op-agnostic transformations based on StructuredOpInterface.
This is an NFC extension in preparation for tiling on tensors.
Differential Revision: https://reviews.llvm.org/D88481
This revision changes the signatures of helper function that Linalg uses to create loops so that they can also take iterArgs.
iterArgs are asserted empty to ensure no functional change.
This is a mechanical change in preparation of tiling on linalg on tensors to avoid polluting the implementation with an NFC change.
Differential Revision: https://reviews.llvm.org/D88480
A sequence of two reshapes such that one of them is just adding unit
extent dims can be folded to a single reshape.
Differential Revision: https://reviews.llvm.org/D88057
ConvOp vectorization supports now only convolutions of static shapes with dimensions
of size either 3(vectorized) or 1(not) as underlying vectors have to be of static
shape as well. In this commit we add support for convolutions of any size as well as
dynamic shapes by leveraging existing matmul infrastructure for tiling of both input
and kernel to sizes accepted by the previous version of ConvOp vectorization.
In the future this pass can be extended to take "tiling mask" as a user input which
will enable vectorization of user specified dimensions.
Differential Revision: https://reviews.llvm.org/D87676
The LinalgTilingPattern class dervied from the base deletes the
original operation. This allows for the use case where the more
transformations are necessary on the original operation after
tiling. In such cases the pattern can derive from
LinalgBaseTilingPattern instead of LinalgTilingPattern.
Differential Revision: https://reviews.llvm.org/D87308
This patch adds a new named structured op to accompany linalg.matmul and
linalg.matvec. We needed it for our codegen, so I figured it would be useful
to add it to Linalg.
Reviewed By: nicolasvasilache, mravishankar
Differential Revision: https://reviews.llvm.org/D87292
In this commit a new way of convolution ops lowering is introduced.
The conv op vectorization pass lowers linalg convolution ops
into vector contractions. This lowering is possible when conv op
is first tiled by 1 along specific dimensions which transforms
it into dot product between input and kernel subview memory buffers.
This pass converts such conv op into vector contraction and does
all necessary vector transfers that make it work.
Differential Revision: https://reviews.llvm.org/D86619
With `dynamic_tensor_from_elements` tensor values of dynamic size can be
created. The body of the operation essentially maps the index space to tensor
elements.
Declare SCF operations in the `scf` namespace to avoid name clash with the new
`std.yield` operation. Resolve ambiguities between `linalg/shape/std/scf.yield`
operations.
Differential Revision: https://reviews.llvm.org/D86276
Sizes of tiles (subviews) are bigger by 1 than they should. Let's consider
1D convolution without batches or channels. Furthermore let m iterate over
the output and n over the kernel then input is accessed with m + n. In tiling
subview sizes for convolutions are computed by applying requested tile size
together with kernel size to the above mentioned expression thus let's say
for tile size of 2 the subview size is 2 + size(n), which is bigger by one
than it should since we move kernel only once. The problem behind it is that
range is not turned into closed interval before the composition. This commit
fixes the problem by turning ranges first into closed intervals by substracting
1 and after the composition back to half open by adding 1.
Differential Revision: https://reviews.llvm.org/D86638
In this PR, the users of BufferPlacement can configure
BufferAssginmentTypeConverter. These new configurations would give the user more
freedom in the process of converting function signature, and return and call
operation conversions.
These are the new features:
- Accepting callback functions for decomposing types (i.e. 1 to N type
conversion such as unpacking tuple types).
- Defining ResultConversionKind for specifying whether a function result
with a certain type should be appended to the function arguments list or
should be kept as function result. (Usage:
converter.setResultConversionKind<MemRefType>(AppendToArgumentList))
- Accepting callback functions for composing or decomposing values (i.e. N
to 1 and 1 to N value conversion).
Differential Revision: https://reviews.llvm.org/D85133
This reverts commit 94f5d24877 because
of failing the following tests:
MLIR :: Dialect/Linalg/tensors-to-buffers.mlir
MLIR :: Transforms/buffer-placement-preparation-allowed-memref-results.mlir
MLIR :: Transforms/buffer-placement-preparation.mlir
In this PR, the users of BufferPlacement can configure
BufferAssginmentTypeConverter. These new configurations would give the user more
freedom in the process of converting function signature, and return and call
operation conversions.
These are the new features:
- Accepting callback functions for decomposing types (i.e. 1 to N type
conversion such as unpacking tuple types).
- Defining ResultConversionKind for specifying whether a function result
with a certain type should be appended to the function arguments list or
should be kept as function result. (Usage:
converter.setResultConversionKind<MemRefType>(AppendToArgumentList))
- Accepting callback functions for composing or decomposing values (i.e. N
to 1 and 1 to N value conversion).
Differential Revision: https://reviews.llvm.org/D85133
The tensor_reshape op was only fusible only if it is a collapsing case. Now we
propagate the op to all the operands so there is a further chance to fuse it
with generic op. The pre-conditions are:
1) The producer is not an indexed_generic op.
2) All the shapes of the operands are the same.
3) All the indexing maps are identity.
4) All the loops are parallel loops.
5) The producer has a single user.
It is possible to fuse the ops if the producer is an indexed_generic op. We
still can compute the original indices. E.g., if the reshape op collapses the d0
and d1, we can use DimOp to get the width of d1, and calculate the index
`d0 * width + d1`. Then replace all the uses with it. However, this pattern is
not implemented in the patch.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86314
The PDL Interpreter dialect provides a lower level abstraction compared to the PDL dialect, and is targeted towards low level optimization and interpreter code generation. The dialect operations encapsulates low-level pattern match and rewrite "primitives", such as navigating the IR (Operation::getOperand), creating new operations (OpBuilder::create), etc. Many of the operations within this dialect also fuse branching control flow with some form of a predicate comparison operation. This type of fusion reduces the amount of work that an interpreter must do when executing.
An example of this representation is shown below:
```mlir
// The following high level PDL pattern:
pdl.pattern : benefit(1) {
%resultType = pdl.type
%inputOperand = pdl.input
%root, %results = pdl.operation "foo.op"(%inputOperand) -> %resultType
pdl.rewrite %root {
pdl.replace %root with (%inputOperand)
}
}
// May be represented in the interpreter dialect as follows:
module {
func @matcher(%arg0: !pdl.operation) {
pdl_interp.check_operation_name of %arg0 is "foo.op" -> ^bb2, ^bb1
^bb1:
pdl_interp.return
^bb2:
pdl_interp.check_operand_count of %arg0 is 1 -> ^bb3, ^bb1
^bb3:
pdl_interp.check_result_count of %arg0 is 1 -> ^bb4, ^bb1
^bb4:
%0 = pdl_interp.get_operand 0 of %arg0
pdl_interp.is_not_null %0 : !pdl.value -> ^bb5, ^bb1
^bb5:
%1 = pdl_interp.get_result 0 of %arg0
pdl_interp.is_not_null %1 : !pdl.value -> ^bb6, ^bb1
^bb6:
pdl_interp.record_match @rewriters::@rewriter(%0, %arg0 : !pdl.value, !pdl.operation) : benefit(1), loc([%arg0]), root("foo.op") -> ^bb1
}
module @rewriters {
func @rewriter(%arg0: !pdl.value, %arg1: !pdl.operation) {
pdl_interp.replace %arg1 with(%arg0)
pdl_interp.return
}
}
}
```
Differential Revision: https://reviews.llvm.org/D84579
This changes the behavior of constructing MLIRContext to no longer load globally
registered dialects on construction. Instead Dialects are only loaded explicitly
on demand:
- the Parser is lazily loading Dialects in the context as it encounters them
during parsing. This is the only purpose for registering dialects and not load
them in the context.
- Passes are expected to declare the dialects they will create entity from
(Operations, Attributes, or Types), and the PassManager is loading Dialects into
the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only
need to load the dialect for the IR it will emit, and the optimizer is
self-contained and load the required Dialects. For example in the Toy tutorial,
the compiler only needs to load the Toy dialect in the Context, all the others
(linalg, affine, std, LLVM, ...) are automatically loaded depending on the
optimization pipeline enabled.
To adjust to this change, stop using the existing dialect registration: the
global registry will be removed soon.
1) For passes, you need to override the method:
virtual void getDependentDialects(DialectRegistry ®istry) const {}
and registery on the provided registry any dialect that this pass can produce.
Passes defined in TableGen can provide this list in the dependentDialects list
field.
2) For dialects, on construction you can register dependent dialects using the
provided MLIRContext: `context.getOrLoadDialect<DialectName>()`
This is useful if a dialect may canonicalize or have interfaces involving
another dialect.
3) For loading IR, dialect that can be in the input file must be explicitly
registered with the context. `MlirOptMain()` is taking an explicit registry for
this purpose. See how the standalone-opt.cpp example is setup:
mlir::DialectRegistry registry;
registry.insert<mlir::standalone::StandaloneDialect>();
registry.insert<mlir::StandardOpsDialect>();
Only operations from these two dialects can be in the input file. To include all
of the dialects in MLIR Core, you can populate the registry this way:
mlir::registerAllDialects(registry);
4) For `mlir-translate` callback, as well as frontend, Dialects can be loaded in
the context before emitting the IR: context.getOrLoadDialect<ToyDialect>()
Differential Revision: https://reviews.llvm.org/D85622
This changes the behavior of constructing MLIRContext to no longer load globally
registered dialects on construction. Instead Dialects are only loaded explicitly
on demand:
- the Parser is lazily loading Dialects in the context as it encounters them
during parsing. This is the only purpose for registering dialects and not load
them in the context.
- Passes are expected to declare the dialects they will create entity from
(Operations, Attributes, or Types), and the PassManager is loading Dialects into
the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only
need to load the dialect for the IR it will emit, and the optimizer is
self-contained and load the required Dialects. For example in the Toy tutorial,
the compiler only needs to load the Toy dialect in the Context, all the others
(linalg, affine, std, LLVM, ...) are automatically loaded depending on the
optimization pipeline enabled.
To adjust to this change, stop using the existing dialect registration: the
global registry will be removed soon.
1) For passes, you need to override the method:
virtual void getDependentDialects(DialectRegistry ®istry) const {}
and registery on the provided registry any dialect that this pass can produce.
Passes defined in TableGen can provide this list in the dependentDialects list
field.
2) For dialects, on construction you can register dependent dialects using the
provided MLIRContext: `context.getOrLoadDialect<DialectName>()`
This is useful if a dialect may canonicalize or have interfaces involving
another dialect.
3) For loading IR, dialect that can be in the input file must be explicitly
registered with the context. `MlirOptMain()` is taking an explicit registry for
this purpose. See how the standalone-opt.cpp example is setup:
mlir::DialectRegistry registry;
registry.insert<mlir::standalone::StandaloneDialect>();
registry.insert<mlir::StandardOpsDialect>();
Only operations from these two dialects can be in the input file. To include all
of the dialects in MLIR Core, you can populate the registry this way:
mlir::registerAllDialects(registry);
4) For `mlir-translate` callback, as well as frontend, Dialects can be loaded in
the context before emitting the IR: context.getOrLoadDialect<ToyDialect>()
Differential Revision: https://reviews.llvm.org/D85622
This changes the behavior of constructing MLIRContext to no longer load globally
registered dialects on construction. Instead Dialects are only loaded explicitly
on demand:
- the Parser is lazily loading Dialects in the context as it encounters them
during parsing. This is the only purpose for registering dialects and not load
them in the context.
- Passes are expected to declare the dialects they will create entity from
(Operations, Attributes, or Types), and the PassManager is loading Dialects into
the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only
need to load the dialect for the IR it will emit, and the optimizer is
self-contained and load the required Dialects. For example in the Toy tutorial,
the compiler only needs to load the Toy dialect in the Context, all the others
(linalg, affine, std, LLVM, ...) are automatically loaded depending on the
optimization pipeline enabled.
To adjust to this change, stop using the existing dialect registration: the
global registry will be removed soon.
1) For passes, you need to override the method:
virtual void getDependentDialects(DialectRegistry ®istry) const {}
and registery on the provided registry any dialect that this pass can produce.
Passes defined in TableGen can provide this list in the dependentDialects list
field.
2) For dialects, on construction you can register dependent dialects using the
provided MLIRContext: `context.getOrLoadDialect<DialectName>()`
This is useful if a dialect may canonicalize or have interfaces involving
another dialect.
3) For loading IR, dialect that can be in the input file must be explicitly
registered with the context. `MlirOptMain()` is taking an explicit registry for
this purpose. See how the standalone-opt.cpp example is setup:
mlir::DialectRegistry registry;
mlir::registerDialect<mlir::standalone::StandaloneDialect>();
mlir::registerDialect<mlir::StandardOpsDialect>();
Only operations from these two dialects can be in the input file. To include all
of the dialects in MLIR Core, you can populate the registry this way:
mlir::registerAllDialects(registry);
4) For `mlir-translate` callback, as well as frontend, Dialects can be loaded in
the context before emitting the IR: context.getOrLoadDialect<ToyDialect>()
This changes the behavior of constructing MLIRContext to no longer load globally registered dialects on construction. Instead Dialects are only loaded explicitly on demand:
- the Parser is lazily loading Dialects in the context as it encounters them during parsing. This is the only purpose for registering dialects and not load them in the context.
- Passes are expected to declare the dialects they will create entity from (Operations, Attributes, or Types), and the PassManager is loading Dialects into the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only need to load the dialect for the IR it will emit, and the optimizer is self-contained and load the required Dialects. For example in the Toy tutorial, the compiler only needs to load the Toy dialect in the Context, all the others (linalg, affine, std, LLVM, ...) are automatically loaded depending on the optimization pipeline enabled.
Differential Revision: https://reviews.llvm.org/D85622
This changes the behavior of constructing MLIRContext to no longer load globally registered dialects on construction. Instead Dialects are only loaded explicitly on demand:
- the Parser is lazily loading Dialects in the context as it encounters them during parsing. This is the only purpose for registering dialects and not load them in the context.
- Passes are expected to declare the dialects they will create entity from (Operations, Attributes, or Types), and the PassManager is loading Dialects into the Context when starting a pipeline.
This changes simplifies the configuration of the registration: a compiler only need to load the dialect for the IR it will emit, and the optimizer is self-contained and load the required Dialects. For example in the Toy tutorial, the compiler only needs to load the Toy dialect in the Context, all the others (linalg, affine, std, LLVM, ...) are automatically loaded depending on the optimization pipeline enabled.
Linalg to processors.
This changes adds infrastructure to distribute the loops generated in
Linalg to processors at the time of generation. This addresses use
case where the instantiation of loop is done just to distribute
them. The option to distribute is added to TilingOptions for now and
will allow specifying the distribution as a transformation option,
just like tiling and promotion are specified as options.
Differential Revision: https://reviews.llvm.org/D85147
This revision adds a folding pattern to replace affine.min ops by the actual min value, when it can be determined statically from the strides and bounds of enclosing scf loop .
This matches the type of expressions that Linalg produces during tiling and simplifies boundary checks. For now Linalg depends both on Affine and SCF but they do not depend on each other, so the pattern is added there.
In the future this will move to a more appropriate place when it is determined.
The canonicalization of AffineMinOp operations in the context of enclosing scf.for and scf.parallel proceeds by:
1. building an affine map where uses of the induction variable of a loop
are replaced by `%lb + %step * floordiv(%iv - %lb, %step)` expressions.
2. checking if any of the results of this affine map divides all the other
results (in which case it is also guaranteed to be the min).
3. replacing the AffineMinOp by the result of (2).
The algorithm is functional in simple parametric tiling cases by using semi-affine maps. However simplifications of such semi-affine maps are not yet available and the canonicalization does not succeed yet.
Differential Revision: https://reviews.llvm.org/D82009
Replaced definition of named ND ConvOps with tensor comprehension
syntax which reduces boilerplate code significantly. Furthermore,
new ops to support TF convolutions added (without strides and dilations).
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D84628
This commit is part of a greater project which aims to add
full end-to-end support for convolutions inside mlir. The
reason behind having conv ops for each rank rather than
having one generic ConvOp is to enable better optimizations
for every N-D case which reflects memory layout of input/kernel
buffers better and simplifies code as well. We expect plain linalg.conv
to be progressively retired.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D83879
- Add getArgumentTypes() to Region (missed from before)
- Adopt Region argument API in `hasMultiplyAddBody`
- Fix 2 typos in comments
Differential Revision: https://reviews.llvm.org/D84807
- replace DotOp, now that DRR rules have been dropped.
- Capture arguments mismatch in the parser. The number of parsed arguments must
equal the number of expected arguments.
Reviewed By: ftynse, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D82952
linalg.indexed_generic (consumer) with tensor arguments.
The implementation of fusing std.constant producer with a
linalg.indexed_generic consumer was already in place. It is exposed
with this change. Also cleaning up some of the patterns that implement
the fusion to not be templated, thereby avoiding lot of conditional
checks for calling the right instantiation.
Differential Revision: https://reviews.llvm.org/D84566
The `makeTiledViews` did not use the sizes of the tiled views based on
the result of the loop bound inference computation. This manifested as
an error in computing tile sizes with convolution where not all the
result expression of concatenated affine maps are simple
AffineDimExpr.
Differential Revision: https://reviews.llvm.org/D84366
Right now there is a branching for 2 functions based on whether target map has
symbols or not. In this commit these functions are merged into one.
Furthermore, emitting does not require inverse and map applying as it computes
the correct Range in a single step and thus reduces unnecessary overhead.
Differential Revision: https://reviews.llvm.org/D83756
Loop bound inference is right now very limited as it supports only permutation maps and thus
it is impossible to implement convolution with linalg.generic as it requires more advanced
loop bound inference. This commits solves it for the convolution case.
Depends On D83158
Differential Revision: https://reviews.llvm.org/D83191
The underlying infrastructure supports this already, just add the
pattern matching for linalg.generic.
Differential Revision: https://reviews.llvm.org/D84335
This commit adds functionality needed for implementation of convolutions with
linalg.generic op. Since linalg.generic right now expects indexing maps to be
just permutations, offset indexing needed in convolutions is not possible.
Therefore in this commit we address the issue by adding support for symbols inside
indexing maps which enables more advanced indexing. The upcoming commit will
solve the problem of computing loop bounds from such maps.
Differential Revision: https://reviews.llvm.org/D83158
Summary:
linalg.copy + linalg.fill can be used to create a padded local buffer.
The `masked` attribute is only valid on this padded buffer.
When forwarding to vector.transfer ops, the attribute must be reset
conservatively.
Differential Revision: https://reviews.llvm.org/D83782
Improve the logic deciding if it is safe to hoist vector transfer read/write
out of the loop. Change the logic to prevent hoisting operations if there are
any unknown access to the memref in the loop no matter where the operation is.
For other transfer read/write in the loop check if we can prove that they
access disjoint memory and ignore them in this case.
Differential Revision: https://reviews.llvm.org/D83538
This revision adds support for vectorizing named and generic contraction ops to vector.contract. Cases in which the memref is 0-D are special cased to emit std.load/std.store instead of vector.transfer. Relevant tests are added.
Differential revision: https://reviews.llvm.org/D83307
While lowering min/max pooling ops to loops, generate the right kind of
load/stores (std or affine) instead of always generating std
load/stores.
Differential Revision: https://reviews.llvm.org/D83080
Current Affine comparison builders, which use operator overload, default to signed comparison. This creates the possibility of misuse of these builders and potential correctness issues when dealing with unsigned integers. This change makes the distinction between signed and unsigned comparison builders and forces the caller to make a choice between the two.
Differential Revision: https://reviews.llvm.org/D82323
This revision removes the TypeConverter parameter passed to the apply* methods, and instead moves the responsibility of region type conversion to patterns. The types of a region can be converted using the 'convertRegionTypes' method, which acts similarly to the existing 'applySignatureConversion'. This method ensures that all blocks within, and including those moved into, a region will have the block argument types converted using the provided converter.
This has the benefit of making more of the legalization logic controlled by patterns, instead of being handled explicitly by the driver. It also opens up the possibility to support multiple type conversions at some point in the future.
This revision also adds a new utility class `FailureOr<T>` that provides a LogicalResult friendly facility for returning a failure or a valid result value.
Differential Revision: https://reviews.llvm.org/D81681
This class enables for abstracting more of the details for the rewrite process, and will allow for clients to apply specific cost models to the pattern list. This allows for DialectConversion and the GreedyPatternRewriter to share the same underlying matcher implementation. This also simplifies the plumbing necessary to support dynamic patterns.
Differential Revision: https://reviews.llvm.org/D81985
Recent work has introduced support for constructing loops via `::build` with
callbacks that construct loop bodies using only the core OpBuilder. This is now
supported on all loop types that Linalg lowers to. Refactor LoopNestBuilder in
Linalg to rely on this functionality instead of using a custom EDSC-based
approach to creating loop nests.
The specialization targeting parallel loops is also simplified by factoring out
the recursive call into a separate static function and considering only two
alternatives: top-level loop is parallel or sequential.
This removes the last remaining in-tree use of edsc::LoopBuilder, which is now
deprecated and will be removed soon.
Differential Revision: https://reviews.llvm.org/D81873
Summary:
This revision replaces MatmulOp, now that DRR rules have been dropped.
This revision also fixes minor parsing bugs and a plugs a few holes to get e2e paths working (e.g. library call emission).
During the replacement the i32 version had to be dropped because only the EDSC operators +, *, etc support type inference.
Deciding on a type-polymorphic behavior, and implementing it, is left for future work.
Reviewers: aartbik
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, frgossen, Kayjukh, jurahul, msifontes
Tags: #mlir
Differential Revision: https://reviews.llvm.org/D81935
It is quite common for the same type to be converted many types throughout the conversion process, and there isn't any good reason why we aren't caching that result. Especially given that we currently use identity conversion to signify legality. This revision also adds a few additional helpers to TypeConverter.
Differential Revision: https://reviews.llvm.org/D81679
This revision replaces MatmulOp, now that DRR rules have been dropped.
This revision also fixes minor parsing bugs and a plugs a few holes to get e2e paths working (e.g. library call emission).
During the replacement the i32 version had to be dropped because only the EDSC operators +, *, etc support type inference.
Deciding on a type-polymorphic behavior, and implementing it, is left for future work.
Differential Revision: https://reviews.llvm.org/D79762
Summary:
* extra ';' in the following files:
mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
mlir/lib/Dialect/Shape/IR/Shape.cpp
* base class ‘mlir::ConvertVectorToSCFBase<ConvertVectorToSCFPass>’
should be explicitly initialized in the copy constructor [-Wextra] in
mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
* warning: ‘bool Expression::operator==(const Expression&) const’
defined but not used [-Wunused-function] in
mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-gen.cpp
Differential Revision: https://reviews.llvm.org/D81673
This parameter gives the developers the freedom to choose their desired function
signature conversion for preparing their functions for buffer placement. It is
introduced for BufferAssignmentFuncOpConverter, and also for
BufferAssignmentReturnOpConverter, and BufferAssignmentCallOpConverter to adapt
the return and call operations with the selected function signature conversion.
If the parameter is set, buffer placement won't also deallocate the returned
buffers.
Differential Revision: https://reviews.llvm.org/D81137
This revision adds a helper function to hoist vector.transfer_read /
vector.transfer_write pairs out of immediately enclosing scf::ForOp
iteratively, if the following conditions are true:
1. The 2 ops access the same memref with the same indices.
2. All operands are invariant under the enclosing scf::ForOp.
3. No uses of the memref either dominate the transfer_read or are
dominated by the transfer_write (i.e. no aliasing between the write and
the read across the loop)
To improve hoisting opportunities, call the `moveLoopInvariantCode` helper
function on the candidate loop above which to hoist. Hoisting the transfers
results in scf::ForOp yielding the value that originally transited through
memory.
This revision additionally exposes `moveLoopInvariantCode` as a helper in
LoopUtils.h and updates SliceAnalysis to support return scf::For values and
allow hoisting across multiple scf::ForOps.
Differential Revision: https://reviews.llvm.org/D81199
Update linalg to affine lowering for convop to use affine load for input
whenever there is no padding. It had always been using std.loads because
max in index functions (needed for non-zero padding if not materializing
zeros) couldn't be represented in the non-zero padding cases.
In the future, the non-zero padding case could also be made to use
affine - either by materializing or using affine.execute_region. The
latter approach will not impact the scf/std output obtained after
lowering out affine.
Differential Revision: https://reviews.llvm.org/D81191
This revision adds a helper function to hoist alloc/dealloc pairs and
alloca op out of immediately enclosing scf::ForOp if both conditions are true:
1. all operands are defined outside the loop.
2. all uses are ViewLikeOp or DeallocOp.
This is now considered Linalg-specific and will be generalized on a per-need basis.
Differential Revision: https://reviews.llvm.org/D81152
Summary:
The fusion for tensor_reshape is embedding the information to indexing maps,
thus the exising pattenr also works for indexed_generic ops.
Depends On D80347
Differential Revision: https://reviews.llvm.org/D80348
Summary:
Different from the fusion between generic ops, indices are involved. In this
context, we need to re-map the indices for producer since the fused op is built
on consumer's perspective. This patch supports all combination of the fusion
between indexed_generic ops and generic ops, which includes tests case:
1) generic op as producer and indexed_generic op as consumer.
2) indexed_generic op as producer and generic op as consumer.
3) indexed_generic op as producer and indexed_generic op as consumer.
Differential Revision: https://reviews.llvm.org/D80347
This revision replaces the load + vector.type_cast by appropriate vector transfer
operations. These play more nicely with other vector abstractions and canonicalization
patterns and lower to load/store with or without masks when appropriate.
Differential Revision: https://reviews.llvm.org/D80809
This revision adds custom rewrites for patterns that arise during linalg structured
ops vectorization. These patterns allow the composition of linalg promotion,
vectorization and removal of redundant copies.
The patterns are voluntarily limited and restrictive atm.
More robust behavior will be implemented once more powerful side effect modeling and analyses are available on view/subview.
On the transfer_read side, the following pattern is rewritten:
```
%alloc = ...
[optional] %view = std.view %alloc ...
%subView = subview %allocOrView ...
[optional] linalg.fill(%allocOrView, %cst) ...
...
linalg.copy(%in, %subView) ...
vector.transfer_read %allocOrView[...], %cst ...
```
into
```
[unchanged] %alloc = ...
[unchanged] [optional] %view = std.view %alloc ...
[unchanged] [unchanged] %subView = subview %allocOrView ...
...
vector.transfer_read %in[...], %cst ...
```
On the transfer_write side, the following pattern is rewriten:
```
%alloc = ...
[optional] %view = std.view %alloc ...
%subView = subview %allocOrView...
...
vector.transfer_write %..., %allocOrView[...]
linalg.copy(%subView, %out)
```
Differential Revision: https://reviews.llvm.org/D80728
Buffer placement can now operates on functions that return buffers. These
buffers escape from the deallocation phase of buffer placement.
Differential Revision: https://reviews.llvm.org/D80696
operands of Generic ops.
Unit-extent dimensions are typically used for achieving broadcasting
behavior. The pattern added (along with canonicalization patterns
added previously) removes the use of unit-extent dimensions, and
instead uses a more canonical representation of the computation. This
new pattern is not added as a canonicalization for now since it
entails adding additional reshape operations. A pass is added to
exercise these patterns, along with an API entry to populate a
patterns list with these patterns.
Differential Revision: https://reviews.llvm.org/D79766
alloc/dealloc/copies.
Add options to LinalgPromotion to use callbacks for implementating the
allocation, deallocation of buffers used for the promoted subviews,
and to copy data into and from the original subviews to the allocated
buffers.
Also some misc. cleanup of the code.
Differential Revision: https://reviews.llvm.org/D80365
Modifying the loop nest builder for generating scf.parallel loops to
not generate scf.parallel loops for non-parallel iterator types in
Linalg operations. The existing implementation incorrectly generated
scf.parallel for all tiled loops. It is rectified by refactoring logic
used while lowering to loops that accounted for this.
Differential Revision: https://reviews.llvm.org/D80188
Summary:
This revision refactors the Linalg tiling pass to be written as pattern applications and retires the use of the folder in Linalg tiling.
In the early days, tiling was written as a pass that would create (partially) folded and canonicalized operations on the fly for better composability.
As this evolves towards composition of patterns, the pass-specific folder is counter-productive and is retired.
The tiling options struct evolves to take a tile size creation function which allows materializing tile sizes on the fly (in particular constant tile sizes). This plays better with folding and DCE.
With the folder going away in Tiling, the check on whether subviews are the same in linalg fusion needs to be more robust. This revision also implements such a check.
In the current form, there are still some canonicalizations missing due to AffineMin/Max ops fed by scf::ForOp. These will be improved at a later time.
Differential Revision: https://reviews.llvm.org/D80267
Making these two converters more generic. FunctionAndBlockSignatureConverter now
moves only memref results (after type conversion) to the function argument and
keeps other legal function results unchanged. NonVoidToVoidReturnOpConverter is
renamed to NoBufferOperandsReturnOpConverter. It removes only the buffer
operands from the operands of the converted ReturnOp and inserts CopyOps to copy
each buffer to the target function argument.
Differential Revision: https://reviews.llvm.org/D79329
For now the promoted buffer is indexed using the `full view`. The full view might be
slightly bigger than the partial view (which is accounting for boundaries).
Unfortunately this does not compose easily with other transformations when multiple buffers
with shapes related to each other are involved.
Take `linalg.matmul A B C` (with A of size MxK, B of size KxN and C of size MxN) and suppose we are:
- Tiling over M by 100
- Promoting A only
This is producing a `linalg.matmul promoted_A B subview_C` where `promoted_A` is a promoted buffer
of `A` of size (100xK) and `subview_C` is a subview of size mxK where m could be smaller than 100 due
to boundaries thus leading to a possible incorrect behavior.
We propose to:
- Add a new parameter to the tiling promotion allowing to enable the use of the full tile buffer.
- By default all promoted buffers will be indexed by the partial view.
Note that this could be considered as a breaking change in comparison to the way the tiling promotion
was working.
Differential Revision: https://reviews.llvm.org/D79927
All ops of the SCF dialect now use the `scf.` prefix instead of `loop.`. This
is a part of dialect renaming.
Differential Revision: https://reviews.llvm.org/D79844
The existing implementation of SubViewOp::getRanges relies on all
offsets/sizes/strides to be dynamic values and does not work in
combination with canonicalization. This revision adds a
SubViewOp::getOrCreateRanges to create the missing constants in the
canonicalized case.
This allows reactivating the fused pass with staged pattern
applications.
However another issue surfaces that the SubViewOp verifier is now too
strict to allow folding. The existing folding pattern is turned into a
canonicalization pattern which rewrites memref_cast + subview into
subview + memref_cast.
The transform-patterns-matmul-to-vector can then be reactivated.
Differential Revision: https://reviews.llvm.org/D79759
Summary:
This revision introduces a helper function to allow applying rewrite patterns, interleaved with more global transformations, in a staged fashion:
1. the first stage consists of an OwningRewritePatternList. The RewritePattern in this list are applied once and in order.
2. the second stage consists of a single OwningRewritePattern that is applied greedily until convergence.
3. the third stage consists of applying a lambda, generally used for non-local transformation effects.
This allows creating custom fused transformations where patterns can be ordered and applied at a finer granularity than a sequence of traditional compiler passes.
A test that exercises these behaviors is added.
Differential Revision: https://reviews.llvm.org/D79518
Summary:
This makes a common pattern of
`dyn_cast_or_null<OpTy>(v.getDefiningOp())` more concise.
Differential Revision: https://reviews.llvm.org/D79681
This [discussion](https://llvm.discourse.group/t/viewop-isnt-expressive-enough/991/2) raised some concerns with ViewOp.
In particular, the handling of offsets is incorrect and does not match the op description.
Note that with an elemental type change, offsets cannot be part of the type in general because sizeof(srcType) != sizeof(dstType).
Howerver, offset is a poorly chosen term for this purpose and is renamed to byte_shift.
Additionally, for all intended purposes, trying to support non-identity layouts for this op does not bring expressive power but rather increases code complexity.
This revision simplifies the existing semantics and implementation.
This simplification effort is voluntarily restrictive and acts as a stepping stone towards supporting richer semantics: treat the non-common cases as YAGNI for now and reevaluate based on concrete use cases once a round of simplification occurred.
Differential revision: https://reviews.llvm.org/D79541
Summary: This revision introduces LinalgPromotionOptions to more easily control the application of promotion patterns. It also simplifies the different entry points into Promotion in preparation for some behavior change in subsequent revisions.
Differential Revision: https://reviews.llvm.org/D79489
This dialect contains various structured control flow operaitons, not only
loops, reflect this in the name. Drop the Ops suffix for consistency with other
dialects.
Note that this only moves the files and changes the C++ namespace from 'loop'
to 'scf'. The visible IR prefix remains the same and will be updated
separately. The conversions will also be updated separately.
Differential Revision: https://reviews.llvm.org/D79578
Portions of MLIR which depend on LLVMIR generally need to depend on
intrinsics_gen, to ensure that tablegen'd header files from LLVM are built
first. Without this, we get errors, typically about llvm/IR/Attributes.inc
not being found.
Note that previously the Linalg Dialect depended on intrinsics_gen, but it
doesn't need to, since it doesn't use LLVMIR.
Differential Revision: https://reviews.llvm.org/D79389
- Exports MLIR targets to be used out-of-tree.
- mimicks `add_clang_library` and `add_flang_library`.
- Fixes libMLIR.so
After https://reviews.llvm.org/D77515 libMLIR.so was no longer containing
any object files. We originally had a cludge there that made it work with
the static initalizers and when switchting away from that to the way the
clang shlib does it, I noticed that MLIR doesn't create a `obj.{name}` target,
and doesn't export it's targets to `lib/cmake/mlir`.
This is due to MLIR using `add_llvm_library` under the hood, which adds
the target to `llvmexports`.
Differential Revision: https://reviews.llvm.org/D78773
[MLIR] Fix libMLIR.so and LLVM_LINK_LLVM_DYLIB
Primarily, this patch moves all mlir references to LLVM libraries into
either LLVM_LINK_COMPONENTS or LINK_COMPONENTS. This enables magic in
the llvm cmake files to automatically replace reference to LLVM components
with references to libLLVM.so when necessary. Among other things, this
completes fixing libMLIR.so, which has been broken for some configurations
since D77515.
Unlike previously, the pattern is now that mlir libraries should almost
always use add_mlir_library. Previously, some libraries still used
add_llvm_library. However, this confuses the export of targets for use
out of tree because libraries specified with add_llvm_library are exported
by LLVM. Instead users which don't need/can't be linked into libMLIR.so
can specify EXCLUDE_FROM_LIBMLIR
A common error mode is linking with LLVM libraries outside of LINK_COMPONENTS.
This almost always results in symbol confusion or multiply defined options
in LLVM when the same object file is included as a static library and
as part of libLLVM.so. To catch these errors more directly, there's now
mlir_check_all_link_libraries.
To simplify usage of add_mlir_library, we assume that all mlir
libraries depend on LLVMSupport, so it's not necessary to separately specify
it.
tested with:
BUILD_SHARED_LIBS=on,
BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB,
BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB + LLVM_LINK_LLVM_DYLIB.
By: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com>
Differential Revision: https://reviews.llvm.org/D79067
[MLIR] Move from using target_link_libraries to LINK_LIBS
This allows us to correctly generate dependencies for derived targets,
such as targets which are created for object libraries.
By: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com>
Differential Revision: https://reviews.llvm.org/D79243
Three commits have been squashed to avoid intermediate build breakage.
Linalg transformations are currently exposed as DRRs.
Unfortunately RewriterGen does not play well with the line of work on named linalg ops which require variadic operands and results.
Additionally, DRR is arguably not the right abstraction to expose compositions of such patterns that don't rely on SSA use-def semantics.
This revision abandons DRRs and exposes manually written C++ patterns.
Refactorings and cleanups are performed to uniformize APIs.
This refactoring will allow replacing the currently manually specified Linalg named ops.
A collateral victim of this refactoring is the `tileAndFuse` DRR, and the one associated test, which will be revived at a later time.
Lastly, the following 2 tests do not add value and are altered:
- a dot_perm tile + interchange test does not test anything new and is removed
- a dot tile + lower to loops does not need 2-D tiling and is trimmed.
These libraries are distinct from other things in Analysis in that they
operate only on core IR concepts. This also simplifies dependencies
so that Dialect -> Analysis -> Parser -> IR. Previously, the parser depended
on portions of the the Analysis directory as well, which sometimes
caused issues with the way the cmake makefile generator discovers
dependencies on generated files during compilation.
Differential Revision: https://reviews.llvm.org/D79240
Summary:
This revision cleans up a layer of complexity in ScopedContext and uses InsertGuard instead of previously manual bookkeeping.
The method `getBuilder` is renamed to `getBuilderRef` and spurious copies of OpBuilder are tracked.
This results in some canonicalizations not happening anymore in the Linalg matmul to vector test. This test is retired because relying on DRRs for this has been shaky at best. The solution will be better support to write fused passes in C++ with more idiomatic pattern composition and application.
Differential Revision: https://reviews.llvm.org/D79208
This revision adds support to allow named ops to lower to loops.
Linalg.batch_matmul successfully lowers to loops and to LLVM.
In the process, this test also activates linalg to affine loops.
However padded convolutions to not lower to affine.load atm so this revision overrides the type of underlying load / store operation.
Differential Revision: https://reviews.llvm.org/D79135
OperationHandle mostly existed to mirror the behavior of ValueHandle.
This has become unnecessary and can be retired.
Differential Revision: https://reviews.llvm.org/D78692
Instead of using llvm_unreachable to guard against fusing linalg.conv,
reject fusing linalg.conv in isFusableInto.
tileLinalgOpImpl is a templated function now and it can operate on
loop.parellel. So we should avoid calling into getForInductionVarOwner
which always assumes loop.for.
Differential Revision: https://reviews.llvm.org/D78936
it to fusing different kinds of linalg operations on tensors.
The implementation of fusion on tensor was initially planned for just
GenericOps (and maybe IndexedGenericOps). With addition of
linalg.tensor_reshape, and potentially other such non-structured ops,
refactor the existing implementation to allow easier specification of
fusion between different linalg operations on tensors.
Differential Revision: https://reviews.llvm.org/D78463
The buffer allocated by a promotion can be subject to other transformations afterward. For example it could be vectorized, in which case it is needed to ensure that this buffer is memory-aligned.
Differential Revision: https://reviews.llvm.org/D78556
This revision is the first in a set of improvements that aim at allowing
more generalized named Linalg op generation from a mathematical
specification.
This revision allows creating a new op and checks that the parser,
printer and verifier are hooked up properly.
This opened up a few design points that will be addressed in the future:
1. A named linalg op has a static region builder instead of an
explicitly parsed region. This is not currently compatible with
assemblyFormat so a custom parser / printer are needed.
2. The convention for structured ops and tensor return values needs to
evolve to allow tensor-land and buffer land specifications to agree
3. ReferenceIndexingMaps and referenceIterators will need to become
static to allow building attributes at parse time.
4. Error messages will be improved once we have 3. and we pretty print
in custom form.
Differential Revision: https://reviews.llvm.org/D78327
The promotion transformation is promoting all input and output buffers of the transformed op. The user might want to only promote some of these buffers.
Differential Revision: https://reviews.llvm.org/D78498
There were some unused CMakeFiles for Affine/IR and Affine/EDSC.
This change builds separate MLIRAffineOps and MLIRAffineEDSC libraries
using those CMakeFiles. This combination replaces the old MLIRAffine
library.
Differential Revision: https://reviews.llvm.org/D78317
The function attribute in generic ops is not paying for itself.
A region is the more standardized way of specifying a custom computation.
If needed this region can call a function directly.
This is deemed more natural than managing a dedicated function attribute.
This also simplifies named ops generation by trimming unnecessary complexity.
Differential Revision: https://reviews.llvm.org/D78266
Summary:
Modified AffineMap::get to remove support for the overload which allowed
an ArrayRef of AffineExpr but no context (and gathered the context from a
presumed first entry, resulting in bugs when there were 0 results).
Instead, we support only a ArrayRef and a context, and a version which
takes a single AffineExpr.
Additionally, removed some now needless case logic which previously
special cased which call to AffineMap::get to use.
Reviewers: flaub, bondhugula, rriddle!, nicolasvasilache, ftynse, ulysseB, mravishankar, antiagainst, aartbik
Subscribers: mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, bader, grosul1, frgossen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78226
The inversePermutation method returns a null map on failure. Update
uses of this method within Linalg to handle this. In LinalgToLoops the
null return value was used to emit scalar code. Modify that to return
failure, and emit scalar implementation when affine map is "empty",
i.e. 1 dims, 0 symbols and no result exprs.
Differential Revision: https://reviews.llvm.org/D77964
Summary: Functional.h contains many different methods that have a direct, and more efficient, equivalent in LLVM. This revision replaces all usages with the LLVM equivalent, and removes the header. This is part of larger cleanup, pr45513, merging MLIR support facilities into LLVM.
Differential Revision: https://reviews.llvm.org/D78053
This change is NFC since the facility to tile and generate
loop.parallel loops already exists in Linalg.
Differential Revision: https://reviews.llvm.org/D77965
The invertPermutation method does not return a nullptr anymore, but
rather returns an empty map for the scalar case. Update the check in
LinalgToLoops to reflect this.
Also add test case for generating scalar code.
The outer parallel loops of a linalg operation is lowered to
loop.parallel, with the other loops lowered to loop.for. This gets the
lowering to loop.parallel on par with the loop.for lowering. In future
the reduction loop could also be lowered to loop.parallel.
Also add a utility function that returns the loops that are
created.
Differential Revision: https://reviews.llvm.org/D77678
Rename mlir::applyPatternsGreedily -> applyPatternsAndFoldGreedily. The
new name is a more accurate description of the method - it performs
both, application of the specified patterns and folding of all ops in
the op's region irrespective of whether any patterns have been supplied.
Differential Revision: https://reviews.llvm.org/D77478
Summary: Pass options are a better choice for various reasons and avoid the need for static constructors.
Differential Revision: https://reviews.llvm.org/D77707
This revision removes the reliance of Promotion on `linalg.slice` which is meant
for the rank-reducing case.
Differential Revision: https://reviews.llvm.org/D77676
Summary:
This is much cleaner, and fits the same structure as many other tablegen backends. This was not done originally as the CRTP in the pass classes made it overly verbose/complex.
Differential Revision: https://reviews.llvm.org/D77367
This revision removes all of the CRTP from the pass hierarchy in preparation for using the tablegen backend instead. This creates a much cleaner interface in the C++ code, and naturally fits with the rest of the infrastructure. A new utility class, PassWrapper, is added to replicate the existing behavior for passes not suitable for using the tablegen backend.
Differential Revision: https://reviews.llvm.org/D77350
This revision adds support for generating utilities for passes such as options/statistics/etc. that can be inferred from the tablegen definition. This removes additional boilerplate from the pass, and also makes it easier to remove the reliance on the pass registry to provide certain things(e.g. the pass argument).
Differential Revision: https://reviews.llvm.org/D76659
This generates a Passes.td for all of the dialects that have transformation passes. This removes the need for global registration for all of the dialect passes.
Differential Revision: https://reviews.llvm.org/D76657
Summary:
The RAW fusion happens only if the produecer block dominates the consumer block.
The WAW pattern also works with the precondition. I.e., if a producer can
dominate the consumer, they can fairly fuse together.
Since they are all tilable, we can think the pattern like this way:
Input:
```
linalg_op1 view
tile_loop
subview_2
linalg_op2 subview_2
```
Tile the first Linalg op as same as the second Linalg.
```
tile_loop
subview_1
linalg_op1 subview_1
tile_loop
subview_2
liangl_op2 subview_2
```
Since the first Linalg op is tilable in the same way and the computation are
independently, it's fair to fuse it with the second Linalg op.
```
tile_loop
subview_1
linalg_op1 subview_1
linalg_op2 subview_2
```
In short, this patch includes:
- Handling both RAW and WAW pattern.
- Adding a interface method to get input and output buffers.
- Exposing a method to get a StringRef of a dependency type.
- Fixing existing WAW tests and add one more use case: initialize the buffer
before conv op.
Differential Revision: https://reviews.llvm.org/D76897
Summary:
Performs an N-D pooling operation similarly to the description in the TF
documentation:
https://www.tensorflow.org/api_docs/python/tf/nn/pool
Different from the description, this operation doesn't perform on batch and
channel. It only takes tensors of rank `N`.
```
output[x[0], ..., x[N-1]] =
REDUCE_{z[0], ..., z[N-1]}
input[
x[0] * strides[0] - pad_before[0] + dilation_rate[0]*z[0],
...
x[N-1]*strides[N-1] - pad_before[N-1] + dilation_rate[N-1]*z[N-1]
],
```
The required optional arguments are:
- strides: an i64 array specifying the stride (i.e. step) for window
loops.
- dilations: an i64 array specifying the filter upsampling/input
downsampling rate
- padding: an i64 array of pairs (low, high) specifying the number of
elements to pad along a dimension.
If strides or dilations attributes are missing then the default value is
one for each of the input dimensions. Similarly, padding values are zero
for both low and high in each of the dimensions, if not specified.
Differential Revision: https://reviews.llvm.org/D76414
Existing tiling implementation of Linalg would still work for tiling
the batch dimensions of the convolution op.
Differential Revision: https://reviews.llvm.org/D76637
Summary:
Change AffineOps Dialect structure to better group both IR and Tranforms. This included extracting transforms directly related to AffineOps. Also move AffineOps to Affine.
Differential Revision: https://reviews.llvm.org/D76161
Summary:
After D75831 has been landed, both the generic op and indexed_generic op can
handle 0-D edge case. In the previous patch, only generic op has been updated.
This patch updates the lowering to loops for indexed_generic op. Since they are
almost the sanme, the patch also refactors the common part.
Differential Revision: https://reviews.llvm.org/D76413
Summary:
Although bool and int1 are the same sometimes, using bool constant matches the
semantic better. In any cases, we don't have to care the type of conditions if
we remove the intial value. The type is determined automatically by the returned
type of logical operations.
Differential Revision: https://reviews.llvm.org/D76338
Summary:
To enable this, two changes are needed:
1) Add an optional attribute `padding` to linalg.conv.
2) Compute if the indices accessing is out of bound in the loops. If so, use the
padding value `0`. Otherwise, use the value derived from load.
In the patch, the padding only works for lowering without other transformations,
e.g., tiling, fusion, etc.
Differential Revision: https://reviews.llvm.org/D75722
Summary:
This revision adds lowering of vector.contract to llvm.intr.matrix_multiply.
Note that there is currently a mismatch between the MLIR vector dialect which
expects row-major layout and the LLVM matrix intrinsics which expect column
major layout.
As a consequence, we currently only match a vector.contract with indexing maps
that express column-major matrix multiplication.
Other cases would require additional transposes and it is better to wait for
LLVM intrinsics to provide a per-operation attribute that would specify which
layout is expected.
A separate integration test, not submitted to MLIR core, has independently
verified that correct execution occurs on a 2x2x2 matrix multiplication.
Differential Revision: https://reviews.llvm.org/D76014
HasNoSideEffect can now be implemented using the MemoryEffectInterface, removing the need to check multiple things for the same information. This also removes an easy foot-gun for users as 'Operation::hasNoSideEffect' would ignore operations that dynamically, or recursively, have no side effects. This also leads to an immediate improvement in some of the existing users, such as DCE, now that they have access to more information.
Differential Revision: https://reviews.llvm.org/D76036
This revision takes advantage of the empty AffineMap to specify the
0-D edge case. This allows removing a bunch of annoying corner cases
that ended up impacting users of Linalg.
Differential Revision: https://reviews.llvm.org/D75831
add_llvm_library and add_llvm_executable may need to create new targets with
appropriate dependencies. As a result, it is not sufficient in some
configurations (namely LLVM_BUILD_LLVM_DYLIB=on) to only call
add_dependencies(). Instead, the explicit TableGen dependencies must
be passed to add_llvm_library() or add_llvm_executable() using the DEPENDS
keyword.
Differential Revision: https://reviews.llvm.org/D74930
In cmake, it is redundant to have a target list under target_link_libraries()
and add_dependency(). This patch removes the redundant dependency from
add_dependency().
Differential Revision: https://reviews.llvm.org/D74929
CMake allows calling target_link_libraries() without a keyword,
but this usage is not preferred when also called with a keyword,
and has surprising behavior. This patch explicitly specifies a
keyword when using target_link_libraries().
Differential Revision: https://reviews.llvm.org/D75725
output has zero rank.
While lowering to loops, no indices should be used in the load/store
operation if the buffer is zero-rank.
Differential Revision: https://reviews.llvm.org/D75391
add_llvm_library and add_llvm_executable may need to create new targets with
appropriate dependencies. As a result, it is not sufficient in some
configurations (namely LLVM_BUILD_LLVM_DYLIB=on) to only call
add_dependencies(). Instead, the explicit TableGen dependencies must
be passed to add_llvm_library() or add_llvm_executable() using the DEPENDS
keyword.
Differential Revision: https://reviews.llvm.org/D74930
In cmake, it is redundant to have a target list under target_link_libraries()
and add_dependency(). This patch removes the redundant dependency from
add_dependency().
Differential Revision: https://reviews.llvm.org/D74929
When compiling libLLVM.so, add_llvm_library() manipulates the link libraries
being used. This means that when using add_llvm_library(), we need to pass
the list of libraries to be linked (using the LINK_LIBS keyword) instead of
using the standard target_link_libraries call. This is preparation for
properly dealing with creating libMLIR.so as well.
Differential Revision: https://reviews.llvm.org/D74864
add_llvm_library and add_llvm_executable may need to create new targets with
appropriate dependencies. As a result, it is not sufficient in some
configurations (namely LLVM_BUILD_LLVM_DYLIB=on) to only call
add_dependencies(). Instead, the explicit TableGen dependencies must
be passed to add_llvm_library() or add_llvm_executable() using the DEPENDS
keyword.
Differential Revision: https://reviews.llvm.org/D74930
In cmake, it is redundant to have a target list under target_link_libraries()
and add_dependency(). This patch removes the redundant dependency from
add_dependency().
Differential Revision: https://reviews.llvm.org/D74929
When compiling libLLVM.so, add_llvm_library() manipulates the link libraries
being used. This means that when using add_llvm_library(), we need to pass
the list of libraries to be linked (using the LINK_LIBS keyword) instead of
using the standard target_link_libraries call. This is preparation for
properly dealing with creating libMLIR.so as well.
Differential Revision: https://reviews.llvm.org/D74864
Instead of creating extra libraries we don't really need, collect a
list of all dialects and use that instead.
Differential Revision: https://reviews.llvm.org/D75221
This revision performs some basic refactoring towards more easily defining Linalg "named" ops. Such named ops form the backbone of operations that are ubiquitous in the ML application domain.
Thus far IntegerType has been signless: a value of IntegerType does
not have a sign intrinsically and it's up to the specific operation
to decide how to interpret those bits. For example, std.addi does
two's complement arithmetic, and std.divis/std.diviu treats the first
bit as a sign.
This design choice was made some time ago when we did't have lots
of dialects and dialects were more rigid. Today we have much more
extensible infrastructure and different dialect may want different
modelling over integer signedness. So while we can say we want
signless integers in the standard dialect, we cannot dictate for
others. Requiring each dialect to model the signedness semantics
with another set of custom types is duplicating the functionality
everywhere, considering the fundamental role integer types play.
This CL extends the IntegerType with a signedness semantics bit.
This gives each dialect an option to opt in signedness semantics
if that's what they want and helps code sharing. The parser is
modified to recognize `si[1-9][0-9]*` and `ui[1-9][0-9]*` as
signed and unsigned integer types, respectively, leaving the
original `i[1-9][0-9]*` to continue to mean no indication over
signedness semantics. All existing dialects are not affected (yet)
as this is a feature to opt in.
More discussions can be found at:
https://groups.google.com/a/tensorflow.org/d/msg/mlir/XmkV8HOPWpo/7O4X0Nb_AQAJ
Differential Revision: https://reviews.llvm.org/D72533
Fixing a bug where using a zero-rank shaped type operand to
linalg.generic ops hit an unrelated assert. This also meant that
lowering the operation to loops was not supported. Adding roundtrip
tests and lowering to loops test for zero-rank shaped type operand
with fixes to make the test pass.
Differential Revision: https://reviews.llvm.org/D74638
Summary:
Linalg's promotion pass was only supporting f32 buffers due to how the
zero value was build for the `fill` operation.
Moreover, `promoteSubViewOperands` was returning a vector with one entry
per float subview while omitting integer subviews. For a program
with only integer subviews the return vector would be of size 0.
However, `promoteSubViewsOperands` would try to access a non zero
number of entries of this vector, resulting in a sefgault.
Reviewers: nicolasvasilache, ftynse
Reviewed By: ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74532
This CL refactors EDSCs to layer them better and break unnecessary
dependencies. After this refactoring, the top-level EDSC target only
depends on IR but not on Dialects anymore and each dialect has its
own EDSC directory.
This simplifies the layering and breaks cyclic dependencies.
In particular, the declarative builder + folder are made explicit and
are now confined to Linalg.
As the refactoring occurred, certain classes and abstractions that were not
paying for themselves have been removed.
Differential Revision: https://reviews.llvm.org/D74302
The initial implementation of the fusion operation exposes a method to
fuse a consumer with its producer, when
- both the producer and consumer operate on tensors
- the producer has only a single result value
- the producer has only "parallel" iterator types
A new interface method hasTensorSemantics is added to verify that an
operation has all operands and results of type RankedTensorType.
Differential Revision: https://reviews.llvm.org/D74172
Summary:
It is often needed to map entire ranges rather than single values. To avoid
writing the same for loop every time, I have added an overload to the map
method.
Differential Revision: https://reviews.llvm.org/D73894
LinalgDependenceGraph was not updated after successful producer-consumer
fusion for linalg ops. In this patch it is fixed by reconstructing
LinalgDependenceGraph on every iteration. This is very ineffective and
should be improved by updating LDGraph only when it is necessary.
Summary:
After the `subview` operation was migrated from Linalg to Standard, it changed
semantics and does not guarantee the absence of out-of-bounds accesses through
the created view anymore. Compute the size of the subview to make sure it
always fits within the view (subviews in last iterations of the loops may be
smaller than those in other iterations).
Differential Revision: https://reviews.llvm.org/D73614
Summary:
This diff adds a transformation patter to rewrite linalg.fill as broadcasting a scaler into a vector.
It uses the same preconditioning as matmul (memory is contiguous).
Reviewers: nicolasvasilache
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73391
Summary:
This is a simple extension to allow vectorization to work not only on GenericLinalgOp
but more generally across named ops too.
For now, this still only vectorizes matmul-like ops but is a step towards more
generic vectorization of Linalg ops.
Reviewers: ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72942
Summary:
This diff moves the conversion pass declaration closer to its definition
and makes the namespacing of passes consistent with the rest of the
infrastructure (i.e. `mlir::linalg::createXXXPass` -> `mlir::createXXXPass`).
Reviewers: ftynse, jpienaar, mehdi_amini
Subscribers: rriddle, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72766
Summary:
This diff fixes issues with the semantics of linalg.generic on tensors that appeared when converting directly from HLO to linalg.generic.
The changes are self-contained within MLIR and can be captured and tested independently of XLA.
The linalg.generic and indexed_generic are updated to:
To allow progressive lowering from the value world (a.k.a tensor values) to
the buffer world (a.k.a memref values), a linalg.generic op accepts
mixing input and output ranked tensor values with input and output memrefs.
```
%1 = linalg.generic #trait_attribute %A, %B {other-attributes} :
tensor<?x?xf32>,
memref<?x?xf32, stride_specification>
-> (tensor<?x?xf32>)
```
In this case, the number of outputs (args_out) must match the sum of (1) the
number of output buffer operands and (2) the number of tensor return values.
The semantics is that the linalg.indexed_generic op produces (i.e.
allocates and fills) its return values.
Tensor values must be legalized by a buffer allocation pass before most
transformations can be applied. Such legalization moves tensor return values
into output buffer operands and updates the region argument accordingly.
Transformations that create control-flow around linalg.indexed_generic
operations are not expected to mix with tensors because SSA values do not
escape naturally. Still, transformations and rewrites that take advantage of
tensor SSA values are expected to be useful and will be added in the near
future.
Subscribers: bmahjour, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72555
Summary:
This diff adds support to allow `linalg.generic` and
`linalg.indexed_generic` to take tensor input and output
arguments.
The subset of output tensor operand types must appear
verbatim in the result types after an arrow. The parser,
printer and verifier are extended to accomodate this
behavior.
The Linalg operations now support variadic ranked tensor
return values. This extension exhibited issues with the
current handling of NativeCall in RewriterGen.cpp. As a
consequence, an explicit cast to `SmallVector<Value, 4>`
is added in the proper place to support the new behavior
(better suggestions are welcome).
Relevant cleanups and name uniformization are applied.
Relevant invalid and roundtrip test are added.
Reviewers: mehdi_amini, rriddle, jpienaar, antiagainst, ftynse
Subscribers: burmako, shauheen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72022
Summary: This is part of an ongoing cleanup and uniformization work.
Reviewers: ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72084
Summary:
This is part of an ongoing cleanup and uniformization work.
This diff performs 3 types of cleanups:
1. Uniformize transformation names.
2. Replace all pattern operands that need not be captured by `$_`
3. Replace all usage of pattern captured op by the normalized `op` name (instead of positional parameters such as `$0`)
Reviewers: ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72081
This is an initial step to refactoring the representation of OpResult as proposed in: https://groups.google.com/a/tensorflow.org/g/mlir/c/XXzzKhqqF_0/m/v6bKb08WCgAJ
This change will make it much simpler to incrementally transition all of the existing code to use value-typed semantics.
PiperOrigin-RevId: 286844725
This CL allows specifying an additional name for specifying the .td file that is used to generate the doc for a dialect. This is necessary for a dialect like Linalg which has different "types" of ops that are used in different contexts.
This CL also restructures the Linalg documentation and renames LinalgLibraryOps -> LinalgStructuredOps but is otherwise NFC.
PiperOrigin-RevId: 286450414
This PR targest issue tensorflow/mlir#295. It exposes the already existing
subiew promotion pass as a declarative pattern
Change-Id: If901ebef9fb53fcd0b12ecc536f6b174ce320b92
Closestensorflow/mlir#315
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/315 from tetuante:issue295 8e5f268b6d85f31015c33505329dbd7a4db97ac5
PiperOrigin-RevId: 285801463
This will be evolved into a simple programming model for custom ops and custom layers in followup CLs.
This CL also deletes the obsolete tablegen's reference-impl.td that was using EDSCs.
PiperOrigin-RevId: 285459545
This CL adds more common information to StructuredOpsUtils.h
The n_view attribute is retired in favor of args_in + args_out but the CL is otherwise NFC.
PiperOrigin-RevId: 285000621
This patch closes issue tensorflow/mlir#272
We add a standalone iterator permutation transformation to Linalg.
This transformation composes a permutation map with the maps in the
"indexing_maps" attribute. It also permutes "iterator_types"
accordingly.
Change-Id: I7c1e693b8203aeecc595a7c012e738ca1100c857
Closestensorflow/mlir#307
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/307 from tetuante:issue272 f7908d58792f4111119721885e247045104f1131
PiperOrigin-RevId: 284824102
This CL uses the newly expanded matcher support to easily detect when a linalg.generic has a multiply-accumulate body. A linalg.generic with such a body is rewritten as a vector contraction.
This CL additionally limits the rewrite to the case of matrix multiplication on contiguous and statically shaped memrefs for now.
Before expanding further, we should harden the infrastructure for expressing custom ops with the structured ops abstraction.
PiperOrigin-RevId: 284566659
This patch closes issue tensorflow/mlir#271.
It adds an optional permutation map to declarative tiling transformations.
The map is expressed as a list of integers.
Closestensorflow/mlir#288
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/288 from tetuante:issue271 2df2938d6a1f01b3bc404ded08dea2dd1e10b588
PiperOrigin-RevId: 284064151
This CL rewrites the linalg ops to loops transformations as patterns that can be targeted directly from Tablegen. Reliance on OpFolder is removed and to cope with it we introduce local folding patterns that are applied greedily.
PiperOrigin-RevId: 282765550
This will make it easier to scale out test patterns and build specific passes that do not interfere with independent testing.
PiperOrigin-RevId: 281736335
This modification will allow to easily plug lowering of linalg ops to different types of loops (affine, loop.for and other future constructs).
This is purely NFC for now.
PiperOrigin-RevId: 280652186
This is step 1/n in refactoring infrastructure along the Vector dialect to make it ready for retargetability and composable progressive lowering.
PiperOrigin-RevId: 280529784
Following up on the consolidation of MemRef descriptor conversion, update
Linalg-to-LLVM conversion to use the helper class that abstracts away the
implementation details of the MemRef descriptor. This required MemRefDescriptor
to become publicly visible. Since this conversion is heavily EDSC-based,
introduce locally an additional wrapper that uses builder and location pointed
to by the EDSC context while emitting descriptor manipulation operations.
PiperOrigin-RevId: 280429228
This CL uses the now standard std.subview in linalg.
Two shortcuts are currently taken to allow this port:
1. the type resulting from a view is currently degraded to fully dynamic to pass the SubViewOp verifier.
2. indexing into SubViewOp may access out of bounds since lowering to LLVM does not currently enforce it by construction.
These will be fixed in subsequent commits after discussions.
PiperOrigin-RevId: 280250129
This CL adds an extra pointer to the memref descriptor to allow specifying alignment.
In a previous implementation, we used 2 types: `linalg.buffer` and `view` where the buffer type was the unit of allocation/deallocation/alignment and `view` was the unit of indexing.
After multiple discussions it was decided to use a single type, which conflates both, so the memref descriptor now needs to carry both pointers.
This is consistent with the [RFC-Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!searchin/mlir/std.view%7Csort:date/mlir/-wKHANzDNTg/4K6nUAp8AAAJ).
PiperOrigin-RevId: 279959463
This change allows for adding additional nested references to a SymbolRefAttr to allow for further resolving a symbol if that symbol also defines a SymbolTable. If a referenced symbol also defines a symbol table, a nested reference can be used to refer to a symbol within that table. Nested references are printed after the main reference in the following form:
symbol-ref-attribute ::= symbol-ref-id (`::` symbol-ref-id)*
Example:
module @reference {
func @nested_reference()
}
my_reference_op @reference::@nested_reference
Given that SymbolRefAttr is now more general, the existing functionality centered around a single reference is moved to a derived class FlatSymbolRefAttr. Followup commits will add support to lookups, rauw, etc. for scoped references.
PiperOrigin-RevId: 279860501
This operation is a companion operation to the std.view operation added as proposed in "Updates to the MLIR MemRefType" RFC.
PiperOrigin-RevId: 279766410
Now that a view op has graduated to the std dialect, we can update Linalg to use it and remove ops that have become obsolete. As a byproduct, the linalg buffer and associated ops can also disappear.
PiperOrigin-RevId: 279073591
This CL adds a simple pattern for specifying producer-consumer fusion on Linalg operations.
Implementing such an extension reveals some interesting properties.
Since Linalg operates on a buffer abstraction, the output buffers are specified as in/out parameters to the ops. As a consequence, there are no SSA use-def chains and one cannot specify complex dag input patterns with the current infrastructure.
Instead this CL uses constraints based on the existing linalg dependence analysis to focus the pattern and refine patterns based on the type of op that last wrote in a buffer.
This is a very local property and is less powerful than the generic dag specification based on SSA use-def chains.
This will be generalized in the future.
PiperOrigin-RevId: 277931503
MLIR const-correctness policy is to avoid having `const` on IR objects.
LinalgDependenceGraph is not an IR object but an auxiliary data structure.
Furthermore, it is not updated once constructed unlike IR objects. Add const
qualifiers to get* and find* methods of LinalgDependenceGraph since they are
not modifying the graph. This allows transformation functions that require the
dependence graph to take it by const-reference, clearly indicating that they
are not modifying it (and that the graph may have to be recomputed after the
transformation).
PiperOrigin-RevId: 277731608
Linalg ops provide a good anchor for pattern matching/rewriting transformations.
This CL adds a simple example of how multi-level tiling may be specified by attaching a simple StringAttr to ops as they are transformed so we can easily specify partial lowering to control transformation application.
This is a first stab at taking advantage of higher-level information contained in Linalg ops and will evolve in the future.
PiperOrigin-RevId: 277497958
This will be used to specify declarative Linalg transformations in a followup CL. In particular, the PatternRewrite mechanism does not allow folding and has its own way of tracking erasure.
PiperOrigin-RevId: 277149158
This allows mixing linalg operations with vector transfer operations (with additional modifications to affine ops) and is a step towards solving tensorflow/mlir#189.
PiperOrigin-RevId: 275543361
This CL creates a new Linalg promotion pass that operates on SubViewOp and decouples it from Linalg tiling. This is mostly moving code around.
PiperOrigin-RevId: 275329213
When the implementation of the strided memref [RFC](https://groups.google.com/a/tensorflow.org/forum/#!msg/mlir/MaL8m2nXuio/1scRqZa6AQAJ) landed, linalg started using this type instead of the now retired !linalg.view.
As static and partially static cases appear, the stride information needs to be maintained properly. In particular, the result type of the subview op was generally incorrect.
This CL fixes the issue by computing a return type that:
1. always has dynamic sizes, which is generally the only correct way to construct a subview in the absence of data padding and/or code versioning.
2. has the same strides as the base strided memref.
Point 1. above can be further refined but will needs further analysis and canonicalization to optimize the particular case where:
1. The base memref has static size along a given dimension.
2. The subview size can be statically derived (e.g. after canonicalization).
3. *And* the subview size is an even divisor of the base memref.
This 3rd constraint is well-known in the case of tiled layouts that don't assume implicit padding: the boundary tile may be only partial and has size given by `problem_size % tile_size`.
Tests are updated as appropriate.
PiperOrigin-RevId: 274578624
This fixes an omission that prevents Linalg to lower generic ops regions operating on ops in the VectorOps dialect.
To achieve this we simply need to `populateVectorToLLVMConversionPatterns` in the conversion.
Relevant tests are added.
PiperOrigin-RevId: 274577325
When an operation with regions gets replaced, we currently require that all of the remaining nested operations are still converted even though they are going to be replaced when the rewrite is finished. This cl adds a tracking for a minimal set of operations that are known to be "dead". This allows for ignoring the legalization of operations that are won't survive after conversion.
PiperOrigin-RevId: 274009003
This function-like operation allows one to define functions that have wrapped
LLVM IR function type, in particular variadic functions. The operation was
added in parallel to the existing lowering flow, this commit only switches the
flow to use it.
Using a custom function type makes the LLVM IR dialect type system more
consistent and avoids complex conversion rules for functions that previously
had to use the built-in function type instead of a wrapped LLVM IR dialect type
and perform conversions during the analysis.
PiperOrigin-RevId: 273910855
During the conversion, both the original and the converted function may coexist
in the module and have the same symbol name. There is no guarantee which of the
two will be found by the symbol lookup. Avoid returning the result of the
library function lookup when lowering Linalg to Standard or LLVM. Use the
symbol reference instead. After the conversion completes, only one symbol will
remain and the Ops using SymbolRefAttrs will be referring to the correct one.
PiperOrigin-RevId: 273510079
Certain lowering patterns were reported as [missing](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/dkdmHa77sSQ).
This CL adds them and allows Linalg/roundtrip.mlir and Linalg/loops.mlir to lower to LLVM directly. Those 2 tests are updated to additionally check that the direct lowering to LLVM does not crash.
The following points, left as TODOs still need to be addressed for correct end-to-end execution:
1. the lowering for ConvOp needs to pass attributes such as strides and dilations; the external library call needs to support it.
2. the lowering for GenericOp needs to support lowering to loops as a DialectConversion pattern. This is blocked on the DialectConversion infrastructure accepting an OperationFolder.
PiperOrigin-RevId: 272878131
This makes the name of the conversion pass more consistent with the naming
scheme, since it actually converts from the Loop dialect to the Standard
dialect rather than working with arbitrary control flow operations.
PiperOrigin-RevId: 272612112
This CL finishes the implementation of the Linalg + Affine type unification of the [strided memref RFC](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
As a consequence, the !linalg.view type, linalg::DimOp, linalg::LoadOp and linalg::StoreOp can now disappear and Linalg can use standard types everywhere.
PiperOrigin-RevId: 272187165
This CL modifies the linalg-fusion pass such that it does not tile anymore as part of the pass. Tiling is a separate concern that enables linalg fusion but should happen before.
This makes fusion more composable with other decisions.
In particular the fusion pass now becomes greedy and only applies the transformation on a best-effort basis.
This should also let fusion work in a multi-hop fashion with chains of producer/consumers.
Since the fusion pass does not perform tiling anymore, tests are rewritten to be in pretiled form and make the intent of the test clearer (albeit more verbose).
PiperOrigin-RevId: 271357741
The helper functions makePositionAttr() and positionAttr() were originally
introduced in the lowering-to-LLVM-dialect pass to construct integer array
attributes that are used for static positions in extract/insertelement.
Constructing an integer array attribute being fairly common, a utility function
Builder::getI64ArrayAttr was later introduced into the Builder API. Drop
makePositionAttr and similar homegrown functions and use that API instead.
PiperOrigin-RevId: 269295836
View descriptors are converted to *pointer to* LLVM struct to avoid ABI issues related to C struct packing. This creates unnecessary complexity and hampers unification with memrefs.
Instead, this CL makes view descriptors convert to LLVM struct (as it was originally) and promotes all structs to pointers right before calling an external function.
PiperOrigin-RevId: 267602693
This CL adds support for proper cloning of Linalg ops that have regions (i.e. the generic linalg op). This is used to properly implement tiling and fusion for such ops. Adequate tests are added.
PiperOrigin-RevId: 267027176
Remove unused variables and attributes from BaseViewConversionHelper
on mlir/lib/Dialect/Linalg/Transforms/LowerToLLVMDialect.cpp
Closestensorflow/mlir#116
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/116 from alexst07:fix-warnings 5f638e4677492cf71a9cc040eeb6b57427d32e06
PiperOrigin-RevId: 266972082
This interface will allow for providing hooks to interrop with operation folding. The first hook, 'shouldMaterializeInto', will allow for controlling which region to insert materialized constants into. The folder will generally materialize constants into the top-level isolated region, this allows for materializing into a lower level ancestor region if it is more profitable/correct.
PiperOrigin-RevId: 266702972
This change refactors and cleans up the implementation of the operation walk methods. After this refactoring is that the explicit template parameter for the operation type is no longer needed for the explicit op walks. For example:
op->walk<AffineForOp>([](AffineForOp op) { ... });
is now accomplished via:
op->walk([](AffineForOp op) { ... });
PiperOrigin-RevId: 266209552
Add an extra RewritePattern that does not convert types to rewrite a CopyOp that has non-identity permutations into a sequence of TransposeOp followed by a CopyOp without such permutations.
This RewitePattern is made to fail in the non-permutation case so that the conversion pattern can kick in to lower to LLVM.
This is an instance of A->A->B lowering where A->A is done by a RewritePattern in case_1 and A->B is done by a ConversionPatternRewriter when not(case_1).
PiperOrigin-RevId: 265171380
Add a conversion pattern that transforms a linalg.transpose op into:
1. A function entry `alloca` operation to allocate a ViewDescriptor.
2. A load of the ViewDescriptor from the pointer allocated in 1.
3. Updates to the ViewDescriptor to introduce the data ptr, offset, size
and stride. Size and stride are permutations of the original values.
4. A store of the resulting ViewDescriptor to the alloca'ed pointer.
The linalg.transpose op is replaced by the alloca'ed pointer.
PiperOrigin-RevId: 265169112
This CL extends support for lowering of linalg to external C++ libraries with CopyOp. Currently this can only work when the permutation maps in the copies are identity. Future support for permutations will be added later.
PiperOrigin-RevId: 265093025
linalg.subview used to lower to a slice with a bounded range resulting in correct bounded accesses. However linalg.slice could still index out of bounds. This CL moves the bounding to linalg.slice.
LLVM select and cmp ops gain a more idiomatic builder.
PiperOrigin-RevId: 264897125