This revision refactors and simplifies the pattern detection logic: thanks to SSA value properties, we can actually look at all the uses of a given value and avoid having to pattern-match specific chains of operations.
A bufferization pattern for subtensor is added and specific inplaceability analysis is implemented for the simple case of subtensor. More advanced use cases will follow.
Differential revision: https://reviews.llvm.org/D102512
This pattern inlines operands to a linalg.generic operation that use a constant
index and hence are loop-invariant scalars. This reduces the number of
linalg.generic operands and unlocks some canonicalizations that rely on seeing
an explicit tensor.extract.
Differential Revision: https://reviews.llvm.org/D102682
LinalgOps that are all parallel do not use the value of `outs`
tensor. The semantics is that the `outs` tensor is fully
overwritten. Using anything other than `init_tensor` can add false
dependencies between operations, when the use is just for the shape of
the tensor. Add a canonicalization to always use `init_tensor` in
such cases, which breaks this dependence.
Differential Revision: https://reviews.llvm.org/D102561
Replace the templated linalgLowerOpToLoops method by three specialized methods: linalgOpToLoops, linalgOpToParallelLoops, and linalgOpToAffineLoops.
Differential Revision: https://reviews.llvm.org/D102324
This covers the extremely common case of replacing all uses of a Value
with a new op that is itself a user of the original Value.
This should also be a little bit more efficient than the
`SmallPtrSet<Operation *, 1>{op}` idiom that was being used before.
Differential Revision: https://reviews.llvm.org/D102373
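The "replace all uses except one user" case described above can be modeled with a toy use list. This is a minimal sketch in Python, not the MLIR API: the `Value` and `Op` classes and the `replace_all_uses_except` function are hypothetical names introduced only for illustration.

```python
class Value:
    """Toy SSA value that tracks its uses as (op, operand_index) pairs."""

    def __init__(self, name):
        self.name = name
        self.uses = []


class Op:
    """Toy operation; registering operands records a use on each value."""

    def __init__(self, name, operands):
        self.name = name
        self.operands = list(operands)
        for index, value in enumerate(self.operands):
            value.uses.append((self, index))


def replace_all_uses_except(old, new, excepted_op):
    """Point every use of `old` at `new`, except uses inside `excepted_op`."""
    kept = []
    for op, index in old.uses:
        if op is excepted_op:
            kept.append((op, index))  # the wrapping op keeps the old value
        else:
            op.operands[index] = new
            new.uses.append((op, index))
    old.uses = kept
```

Passing the single excepted operation directly avoids building a one-element set just to hold it, which is the inefficiency the commit message mentions.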
All glue and clutter in the linalg ops has been replaced by proper
sparse tensor type encoding. This code is no longer needed. Thanks
to ntv@ for giving us a temporary home in linalg.
So long, and thanks for all the fish.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102098
The current design uses a unique entry for each argument/result attribute, with the name of the entry being something like "arg0". This provides for a somewhat sparse design, but ends up being much more expensive (from a runtime perspective) in-practice. The design requires building a string every time we lookup the dictionary for a specific arg/result, and also requires N attribute lookups when collecting all of the arg/result attribute dictionaries.
This revision restructures the design to instead have an ArrayAttr that contains all of the attribute dictionaries for arguments and another for results. This design reduces the number of attribute name lookups to 1, and allows for O(1) lookup for individual element dictionaries. The major downside is that we can end up with larger memory usage, as the ArrayAttr contains an entry for each element even if that element has no attributes. If the memory usage becomes too problematic, we can experiment with a more sparse structure that still provides a lot of the wins in this revision.
This dropped the compilation time of a somewhat large TensorFlow model from ~650 seconds to ~400 seconds.
Differential Revision: https://reviews.llvm.org/D102035
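The difference between the two storage designs above can be sketched with plain dictionaries and lists. This is an illustrative Python model only; the key name `arg_attrs` and all function names are assumptions, not taken from the commit.

```python
def lookup_per_entry(attrs, index):
    # Old design: build the "arg<N>" key on every lookup.
    return attrs.get("arg" + str(index), {})


def lookup_array(attrs, index):
    # New design: one name lookup, then O(1) indexing into the array.
    return attrs["arg_attrs"][index]


def collect_all_per_entry(attrs, num_args):
    # Old design: N string builds and N dictionary lookups.
    return [lookup_per_entry(attrs, i) for i in range(num_args)]


def collect_all_array(attrs):
    # New design: a single lookup returns every dictionary at once.
    return list(attrs["arg_attrs"])


# The same three-argument function in both encodings. The array form
# stores an empty dictionary for arguments without attributes, which is
# the memory trade-off described above.
old_style = {"arg1": {"noalias": True}}
new_style = {"arg_attrs": [{}, {"noalias": True}, {}]}
```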
The pattern to convert subtensor ops to their rank-reduced versions
(by dropping unit-dims in the result) can also convert to a zero-rank
tensor. Handle that case.
This also fixes an OOB access bug in the existing pattern for such
cases.
Differential Revision: https://reviews.llvm.org/D101949
This exposes a lambda, instead of just a boolean, to control unit
dimension folding.
This gives users more control to pick a good heuristic.
Folding reshapes helps fusion opportunities but may generate sub-optimal
generic ops.
Differential Revision: https://reviews.llvm.org/D101917
Fix a minor bug which led to the element type of the output being
modified when folding reshapes with a generic op.
Differential Revision: https://reviews.llvm.org/D101942
The old index op handling let the new index operations point back to the
producer block. As a result, after fusion some index operations in the
fused block had back references to the old producer block resulting in
illegal IR. The patch now relies on a block and value mapping to avoid
such back references.
Differential Revision: https://reviews.llvm.org/D101887
This revision migrates more code from Linalg into the new permanent home of
SparseTensor. It replaces the test passes with proper compiler passes.
NOTE: the actual removal of the last glue and clutter in Linalg will follow.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D101811
Given the source and destination shapes, if they are static, or if the
expanded/collapsed dimensions are unit-extent, it is possible to
compute the reassociation maps that can be used to reshape one type
into another. Add a utility method to return the reassociation maps
when possible.
This utility function can be used to fuse a sequence of reshape ops,
given the type of the source of the producer and the final result
type. This pattern supersedes a more constrained folding pattern added
to the DropUnitDims pass.
Differential Revision: https://reviews.llvm.org/D101343
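One way to picture the reassociation computation for the collapsing direction is a greedy walk over static shapes. The sketch below is a simplified Python model, not the utility added by the commit: the function name is hypothetical, it assumes fully static positive extents, and it assigns unit dimensions to groups in one arbitrary (greedy) way.

```python
def collapse_reassociation(source_shape, target_shape):
    """Group source dims so each group's product equals one target dim.

    Returns a list of index groups, or None when no grouping is found.
    """
    groups = []
    src = 0
    for target_size in target_shape:
        group = []
        product = 1
        # Take source dims until their product reaches the target extent.
        while src < len(source_shape) and product < target_size:
            product *= source_shape[src]
            group.append(src)
            src += 1
        if not group:
            # target_size is 1: it must come from a unit source dimension.
            if src < len(source_shape) and source_shape[src] == 1:
                group.append(src)
                src += 1
            else:
                return None
        if product != target_size:
            return None
        groups.append(group)
    # Fold trailing unit dimensions into the last group.
    while src < len(source_shape) and source_shape[src] == 1:
        if groups:
            groups[-1].append(src)
        src += 1
    return groups if src == len(source_shape) else None
```

Returning `None` mirrors the "when possible" wording above: a caller would only fold a sequence of reshapes when a grouping exists.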
Convert subtensor and subtensor_insert operations to use their
rank-reduced versions to drop unit dimensions.
Differential Revision: https://reviews.llvm.org/D101495
The current implementation had a bug as it was relying on the target vector
dimension sizes to calculate where to insert broadcast. If several dimensions
have the same size we may insert the broadcast on the wrong dimension. The
correct broadcast cannot be inferred from the type of the source and
destination vector.
Instead when we want to extend transfer ops we calculate an "inverse" map to the
projected permutation and insert broadcast in place of the projected dimensions.
Differential Revision: https://reviews.llvm.org/D101738
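The idea of deriving the broadcast from the map rather than from dimension sizes can be sketched as follows. This is a hedged Python model, not the code in the commit: the function name is hypothetical, a projected permutation is represented as the list of iteration dimensions that feed each vector dimension, and it assumes that broadcasting adds the missing dimensions as leading dimensions and that a transpose reads result dimension `i` from source dimension `permutation[i]`.

```python
def broadcast_and_transpose_plan(num_dims, results):
    """Plan how to extend a vector to the full iteration domain.

    `results[i]` is the iteration dimension that vector dimension `i`
    corresponds to. Dimensions absent from `results` were projected away
    and must be reintroduced by a broadcast.
    """
    missing = [d for d in range(num_dims) if d not in results]
    # After the broadcast, the vector dims are ordered: missing dims first,
    # followed by the original vector dims.
    order_after_broadcast = missing + list(results)
    # Invert that ordering so dimension d of the result is iteration dim d.
    permutation = [order_after_broadcast.index(d) for d in range(num_dims)]
    return missing, permutation
```

The plan depends only on the map, never on the extents, so two dimensions of equal size cannot be confused.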
This is the very first step toward removing the glue and clutter from linalg and
replacing it with proper sparse tensor types. This revision migrates the LinalgSparseOps
into SparseTensorOps of a sparse tensor dialect. This also provides a new home for
sparse tensor related transformations.
NOTE: the actual replacement with sparse tensor types (and removal of linalg glue/clutter)
will follow but I am trying to keep the amount of changes per revision manageable.
Differential Revision: https://reviews.llvm.org/D101573
This is the very first step toward removing the glue and clutter from linalg and
replacing it with proper sparse tensor types. This revision migrates the LinalgSparseOps
into SparseTensorOps of a sparse tensor dialect. This also provides a new home for
sparse tensor related transformations.
NOTE: the actual replacement with sparse tensor types (and removal of linalg glue/clutter)
will follow but I am trying to keep the amount of changes per revision manageable.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D101488
FillOp allows complex ops, and filling a properly sized buffer with
a default zero complex number is implemented.
Differential Revision: https://reviews.llvm.org/D99939
This revision adds support for vectorizing more general linalg operations with projected permutation maps.
This is achieved by eagerly broadcasting the intermediate vector to the common size
of the iteration domain of the linalg op. This allows a much more natural expression of
generalized vectorization but may introduce additional computations until all the
proper canonicalizations are implemented.
This generalization modifies the vector.transfer_read/write permutation logic and
exposes the fact that the logic employed in vector.contract was too ad-hoc.
As a consequence, changes occur in the permutation / transposition logic for contraction. In turn this prompts supporting more cases in the lowering of contract
to matrix intrinsics, which is required to make the corresponding tests pass.
Differential revision: https://reviews.llvm.org/D101165
Splat constant folding was limited to `std.constant` operations. Instead, use
the constant matcher and apply splat constant folding to any constant-like
operation that holds a splat attribute.
Differential Revision: https://reviews.llvm.org/D101301
The interchange option attached to the linalg to loop lowering affects only the loops and does not update the memory accesses generated in the body of the operation. Instead of performing the interchange during the loop lowering, use the interchange pattern.
Differential Revision: https://reviews.llvm.org/D100758
Example:
```
%0 = linalg.init_tensor : tensor<...>
%1 = linalg.generic ... outs(%0: tensor<...>)
%2 = linalg.generic ... outs(%0: tensor<...>)
```
The memref allocated as a result of `init_tensor` bufferization can be incorrectly overwritten by the second linalg.generic operation.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D100921
This prevents fusion that spans all dims and generates a
(d0, d1, ...) -> () reshape, which isn't legal.
Differential Revision: https://reviews.llvm.org/D100805
Break up the dependency between SCF ops and the substituteMin helper and make a
more generic version of AffineMinSCFCanonicalization. This reduces dependencies
between linalg and SCF and will allow the logic to be used with other kinds of
ops (such as ID ops).
Differential Revision: https://reviews.llvm.org/D100321
Instead of always running the region builder, check if the generalized op has a region attached. If yes, inline the existing region instead of calling the region builder. This change circumvents a problem with named operations that have a region builder taking captures and the generalization pass not knowing about these captures.
Differential Revision: https://reviews.llvm.org/D100880
The patch extends the vectorization pass to lower linalg index operations to vector code. It allocates constant 1d vectors that enumerate the indexes along the iteration dimensions and broadcasts/transposes these 1d vectors to the iteration space.
Differential Revision: https://reviews.llvm.org/D100373
This patch extends the control-flow cost-model for detensoring by
implementing a forward-looking pass on block arguments that should be
detensored. This makes sure that if a (to-be-detensored) block argument
"escapes" its block through the terminator, then the successor arguments
are also detensored.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D100457
The patch replaces the index operations in the body of fused producers and linearizes the indices after expansion.
Differential Revision: https://reviews.llvm.org/D100479
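Linearizing indices after expansion can be illustrated with a small Python sketch. It assumes a row-major convention in which one original dimension is expanded into several dimensions with known extents; the function name is hypothetical and not taken from the patch.

```python
def linearize(indices, sizes):
    """Recover the original index from the indices of the expanded dims."""
    linear = 0
    for index, size in zip(indices, sizes):
        linear = linear * size + index
    return linear


# A dimension of extent 12 expanded into extents (3, 4): visiting the
# expanded iteration space in order reproduces the original indices.
recovered = [linearize((i, j), (3, 4)) for i in range(3) for j in range(4)]
```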
Update the dimensions of the index operations to account for dropped dimensions and replace the index operations of dropped dimensions by zero.
Differential Revision: https://reviews.llvm.org/D100395
Instead of interchanging loops during the loop lowering this pass performs the interchange by permuting the indexing maps. It also updates the iterator types and the index accesses in the body of the operation.
Differential Revision: https://reviews.llvm.org/D100627
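Interchanging by permuting the indexing maps, rather than the generated loops, can be modeled as below. This is a simplified Python sketch: indexing maps are represented as lists of loop dimension indices, `interchange[i]` is assumed to name the old loop that becomes new loop `i`, and all names are hypothetical.

```python
def interchange_op(indexing_maps, iterator_types, index_dims, interchange):
    """Apply a loop interchange to the op description itself.

    indexing_maps: one list of loop dims per operand, e.g. [0, 2] for A[i, k].
    iterator_types: iterator kind per loop dim.
    index_dims: loop dims referenced by index accesses in the body.
    """
    inverse = [0] * len(interchange)
    for new_dim, old_dim in enumerate(interchange):
        inverse[old_dim] = new_dim
    new_maps = [[inverse[d] for d in dims] for dims in indexing_maps]
    new_iterators = [iterator_types[old_dim] for old_dim in interchange]
    new_index_dims = [inverse[d] for d in index_dims]
    return new_maps, new_iterators, new_index_dims


# Matmul C[i, j] += A[i, k] * B[k, j] with loops (i, j, k), reordered so
# that the new loop order is (k, i, j).
matmul_maps = [[0, 2], [2, 1], [0, 1]]
matmul_iterators = ["parallel", "parallel", "reduction"]
result = interchange_op(matmul_maps, matmul_iterators, [0], [2, 0, 1])
```

Because the maps, iterator types, and index accesses are all rewritten together, the memory accesses stay consistent with the new loop order.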
Rationale:
Now that vector<?xindex> is allowed, the restriction on vectorization
of index types in the sparse compiler can be removed. This also requires
generalizing the scatter/gather index types.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D100522
The patch updates the tiling pass to add the tile offsets to the indices returned by the linalg operations.
Differential Revision: https://reviews.llvm.org/D100379
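Why the tile offsets have to be added can be seen in a one-dimensional Python sketch. The function name is hypothetical; it assumes a tiled loop whose inner index restarts at zero for every tile.

```python
def tiled_indices(size, tile_size):
    """Indices seen by the op body after tiling a loop of `size` iterations."""
    indices = []
    for offset in range(0, size, tile_size):
        extent = min(tile_size, size - offset)  # last tile may be partial
        for local in range(extent):
            # Inside a tile the index restarts at zero, so the tile offset
            # is added to recover the index of the untiled operation.
            indices.append(offset + local)
    return indices
```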
The patch extends the linalg to loop lowering pass to replace all linalg index operations by the induction variables of the generated loop nests.
Differential Revision: https://reviews.llvm.org/D100364
This patch introduces the necessary infrastructure changes to implement
cost-modelling for detensoring. In particular, it introduces the
following changes:
- An extension to the dialect conversion framework to selectively
convert a subset of non-entry BB arguments.
- An extension to the branch conversion pattern to selectively convert
a subset of a branch's operands.
- An interface for detensoring cost-modelling.
- 2 simple implementations of 2 different cost models.
This sets the stage to explore cost-modelling for detensoring in an
easier way. We still need to come up with better cost models.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D99945
Fusing a constant with a linalg.generic operation can result in the
fused operation being illegal since the loop bound computation
fails. Avoid such fusions.
Differential Revision: https://reviews.llvm.org/D100272
The `linalg.index` operation provides access to the iteration indexes of immediately enclosing linalg operations. It takes a dimension `dim` attribute and returns the iteration index in the given dimension. Having `linalg.index` allows us to unify `linalg.generic` and `linalg.indexed_generic` and also enables index access in named operations.
Differential Revision: https://reviews.llvm.org/D100292
A recent change enabled dropping unit-trip loops of "reduction" iterator
type as well. This is fine as long as there is one other "reduction"
iterator in the operation. Without this, the initialized value (value
of `out`) is not read, which leads to a correctness issue.
Also fix a bug in the `fill` -> `tensor_reshape` folding. The `out`
operand of the `fill` needs to be reshaped to get the `out` operand of
the generated `fill` operation.
Differential Revision: https://reviews.llvm.org/D100145
Some sparse matrices operate on integral values (in contrast with the common
f32 and f64 values). This CL expands the compiler and runtime support to deal
with several common type combinations.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D99999
Linalg fusion on tensors makes different assumptions on the operand side than on the region bbArg side.
Relax the behavior on the operand/indexing map side so that we better support output operands that may also be read from.
Differential revision: https://reviews.llvm.org/D99499
Right now Elementwise operations fusion in Linalg fuses everything it
can. This can run up against resource limits of the target hardware
without some checks. This patch adds a callback function that clients
can use to implement a cost function. When two elementwise operations
are deemed structurally fusable, the callback can be used to control
if the fusion applies.
Differential Revision: https://reviews.llvm.org/D99820
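The role of the control callback can be sketched in Python. This is only a model of the idea: the function names, the dictionary representation of operations, and the operand-count cost function are all assumptions made for illustration.

```python
def fuse_elementwise(pairs, control_fn=lambda producer, consumer: True):
    """Split structurally fusable (producer, consumer) pairs by the callback."""
    fused, skipped = [], []
    for producer, consumer in pairs:
        if control_fn(producer, consumer):
            fused.append((producer, consumer))
        else:
            skipped.append((producer, consumer))
    return fused, skipped


def operand_budget(limit):
    """Hypothetical cost function: cap the inputs of the fused operation."""

    def control(producer, consumer):
        # The consumer input fed by the producer disappears after fusion.
        fused_inputs = producer["num_inputs"] + consumer["num_inputs"] - 1
        return fused_inputs <= limit

    return control


small = ({"num_inputs": 2}, {"num_inputs": 2})
large = ({"num_inputs": 6}, {"num_inputs": 4})
fused, skipped = fuse_elementwise([small, large], operand_budget(4))
```

With the default callback every structurally fusable pair is fused, which matches the behavior described before this patch.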
The moved `populate` methods are only relevant to Linalg
operations, so they are better off in the `linalg` namespace. Also rename
`populateLinalgTensorOpsFusionPatterns` to
`populateElementwiseOpsFusionPatterns`. This makes the scope of these
patterns explicit and disambiguates it with fusion on tensors using
tile + fuse.
Differential Revision: https://reviews.llvm.org/D99819
Rationale:
Small indices and values, when allowed by the required range of the
input tensors, can reduce the memory footprint of sparse tensors
even more. Note, however, that we must take care to zero extend
the values (since sparse tensors never use negatives for indexing),
as LLVM treats the index type as signed in most memory operations
(like the scatter and gather). This CL dots all the i's in this regard.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D99777
Subtensor operations that are taking a slice out of a tensor that is
unit-extent along a dimension can be rewritten to drop that dimension.
Differential Revision: https://reviews.llvm.org/D99226
Drop usage of `emitRemark` and use `notifyMatchFailure` instead to
avoid unnecessary spew during compilation.
Differential Revision: https://reviews.llvm.org/D99485
Init tensor operands also have indexing maps and generally follow
the same constraints we expect for non-init-tensor operands.
Differential Revision: https://reviews.llvm.org/D99115
This commit exposes an option to the pattern
FoldWithProducerReshapeOpByExpansion to allow
folding unit dim reshapes. This gives callers
more fine-grained controls.
Differential Revision: https://reviews.llvm.org/D99114
Until now, Linalg fusion only allowed fusing producers whose operands
all have permutation indexing maps. This makes it easier to deduce the
subtensor/subview, but it is an unnecessary constraint, as in tiling
we have more advanced logic to deduce the subranges even when the
operand does not have a permutation indexing map, e.g., the input operand
for convolution ops.
This patch uses the logic on the tiling side to deduce subranges for
fusion. This enables fusing convolution with its consumer ops
when possible.
Along the way, we are now generating proper affine.min ops to guard
against size boundaries, if we cannot be certain they won't be
out of bounds.
Differential Revision: https://reviews.llvm.org/D99014
This is a preparation step to reuse makeTiledShapes in tensor
fusion. Along the way, did some lightweight cleanups.
Differential Revision: https://reviews.llvm.org/D99013
All linalg operations having a region builder shall call it during op creation. Calling it during vectorization is obsolete.
Differential Revision: https://reviews.llvm.org/D99168
Fix the BlockAndValueMapping update that was missing entries for scf.for op's blockIterArgs.
Skip cloning subtensors of the padded tensor as the logic for these is separate.
Add a filter to drop side-effecting ops.
Tests are beefed up to verify the IR is sound in all hoisting configurations for 2-level 3-D tiled matmul.
Differential Revision: https://reviews.llvm.org/D99255
To match an interface or trait, users currently have to use the `MatchAny` tag. This tag can be quite problematic for compile time for things like the canonicalizer, as the `MatchAny` patterns may get applied to *every* operation. This revision adds better support by bucketing interface/trait patterns based on which registered operations have them attached. This means that moving forward we will only attempt to match these patterns to operations that have this interface registered. To simplify defining patterns that match traits and interfaces, two new utility classes have been added: OpTraitRewritePattern and OpInterfaceRewritePattern.
Differential Revision: https://reviews.llvm.org/D98986
This revision introduces proper backward slice computation during the hoisting of
PadTensorOp. This allows hoisting padding even across multiple levels of tiling.
Such hoisting requires the proper handling of loop bounds that may depend on enclosing
loop variables.
Differential revision: https://reviews.llvm.org/D98965
This nicely aligns the naming with RewritePatternSet. This type isn't
as widely used, but we keep a using declaration in place to help with
downstream consumption of this change.
Differential Revision: https://reviews.llvm.org/D99131
This doesn't change APIs, this just cleans up the many in-tree uses of these
names to use the new preferred names. We'll keep the old names around for a
couple weeks to help transitions.
Differential Revision: https://reviews.llvm.org/D99127
- Drop unnecessary occurrences of rewriter.eraseOp: dead linalg ops on tensors should be cleaned up by DCE.
- Reimplement the part of Linalg fusion that constructs the body and block arguments: the previous implementation had too much magic. Instead this spells out all cases explicitly and asserts / introduces TODOs for incorrect cases.
As a consequence, we can use the default traversal order for this pattern.
Differential Revision: https://reviews.llvm.org/D99070
GreedyPatternRewriteDriver was changed from bottom-up traversal to top-down traversal. Not all passes work yet with that change of traversal order. To give some time for fixing them, add an option to switch back to bottom-up traversal. Use this option in FusionOfTensorOpsPass, which fails otherwise.
Differential Revision: https://reviews.llvm.org/D99059
This updates the codebase to pass the context when creating an instance of
OwningRewritePatternList, and starts removing extraneous MLIRContext
parameters. There are many many more to be removed.
Differential Revision: https://reviews.llvm.org/D99028
This reverts commit 32a744ab20.
CI is broken:
test/Dialect/Linalg/bufferize.mlir:274:12: error: CHECK: expected string not found in input
// CHECK: %[[MEMREF:.*]] = tensor_to_memref %[[IN]] : memref<?xf32>
^
`BufferizeAnyLinalgOp` fails because `FillOp` is not a `LinalgGenericOp` and it fails while reading operand sizes attribute.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98671
This is a temporary work-around to get our all-annotations-all-flags
stress testing effort to run clean. In the long run, we want to provide
efficient implementations of strided loads and stores though.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D98563
This change makes the methods in LinalgInterfaces.cpp usable for additional static shape verification that matches the shaped operands against the loops of linalg ops. Using the existing methods directly would cause a circular dependency at link time. They can now be used as methods of LinalgOp.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D98163
Return the vectorization results using a vector passed by reference instead of returning them embedded in a structure.
Differential Revision: https://reviews.llvm.org/D98182
Reduction updates should be masked, just like the loads and stores.
Note that alternatively, we could use the fact that masked values are
zero for += updates and mask invariants to get this working, but that
would not work for *= updates. Masking the update itself is cleanest.
This change also replaces the constant mask with a broadcast of "true"
since this constant folds much better for various folding patterns.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98000
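The difference between relying on zeroed masked values and masking the update itself can be checked with a scalar model. This Python sketch walks the lanes one at a time, which is an assumption made for readability; the function names are hypothetical.

```python
import operator


def reduce_with_masked_loads(values, mask, init, update):
    """Masked-off lanes read as zero, but the update is applied everywhere."""
    acc = init
    for value, active in zip(values, mask):
        loaded = value if active else 0.0
        acc = update(acc, loaded)
    return acc


def reduce_with_masked_update(values, mask, init, update):
    """The update itself is masked: inactive lanes keep the accumulator."""
    acc = init
    for value, active in zip(values, mask):
        updated = update(acc, value)
        acc = updated if active else acc
    return acc


values = [2.0, 3.0, 5.0, 7.0]
mask = [True, True, False, False]

sum_loads = reduce_with_masked_loads(values, mask, 0.0, operator.add)
sum_update = reduce_with_masked_update(values, mask, 0.0, operator.add)
prod_loads = reduce_with_masked_loads(values, mask, 1.0, operator.mul)
prod_update = reduce_with_masked_update(values, mask, 1.0, operator.mul)
```

Both strategies agree for `+=`, since zero is the additive identity. For `*=` the zeroed lanes wipe out the product, while the masked update keeps the result of the active lanes.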
Exhaustive testing found that a while loop can
appear in between chainable for loops. As long as we don't
scalarize reductions in while loops, this means we need to
terminate the chain at the while. This also refactors the
reduction code into more readable helper methods.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D97886
Some elementwise operations are not scalarizable, vectorizable, or tensorizable.
Split `ElementwiseMappable` trait into the following, more precise traits.
- `Elementwise`
- `Scalarizable`
- `Vectorizable`
- `Tensorizable`
This allows for reuse of `Elementwise` in dialects like HLO.
Differential Revision: https://reviews.llvm.org/D97674
This patch continues the detensoring implementation by detensoring
internal control flow in functions.
In order to detensorize functions, all the non-entry block's arguments
are detensored and branches between such blocks are properly updated to
reflect the detensored types as well. Function entry block (signature)
is left intact.
This continues work towards handling github/google/iree#1159.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D97148
The universal index was maintained if dense indices were still
in place, and lattice points followed. However, it should only
be kept if any of those following lattice points actually
consumes the universal index. This change also fixes an
inaccuracy with a missing broadcast around vector invariant.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D97594
Similar to mask-load/store and compress/expand, the gather and
scatter operations now allow for higher dimension uses. Note that
to support the mixed-type index, the new syntax is:
vector.gather %base [%i,%j] [%kvector] ....
The first client of this generalization is the sparse compiler,
which needs to define scatter and gathers on dense operands
of higher dimensions too.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D97422
When computing dense address, a vectorized index must be accounted
for properly. This bug was formerly undetected because we get 0 * prev + i
in most cases, which folds away the scalar part. Now it works for all cases.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D97317
This transformation was only used for quick experimentation and is not general enough.
Retire it.
Differential Revision: https://reviews.llvm.org/D97266
This commit is the first baby step towards detensoring in
linalg-on-tensors.
Detensoring is the process through which a tensor value is converted to one
or potentially more primitive value(s). During this process, operations with
such detensored operands are also converted to an equivalent form that works
on primitives.
The detensoring process is driven by linalg-on-tensor ops. In particular, a
linalg-on-tensor op is checked to see whether *all* its operands can be
detensored. If so, those operands are converted to their primitive
counterparts and the linalg op is replaced by an equivalent op that takes
those new primitive values as operands.
This works towards handling github/google/iree#1159.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96271
Simplifies the way lattices are optimized with fewer, but more
powerful rules. This also fixes an inaccuracy where too many
lattices resulted (expecting a non-existing universal index).
Also puts no-side-effects on all proper getters and unifies
bufferization flags order in integration tests (for future,
more complex use cases).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D97134
This commit introduced a cyclic dependency:
Memref dialect depends on Standard because it used ConstantIndexOp.
Std depends on the MemRef dialect in its EDSC/Intrinsics.h
Working on a fix.
This reverts commit 8aa6c3765b.
Create the memref dialect and move several dialect-specific ops without
dependencies to other ops from std dialect to this dialect.
Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
DeallocOp -> MemRef_DeallocOp
MemRefCastOp -> MemRef_CastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
TransposeOp -> MemRef_TransposeOp
ViewOp -> MemRef_ViewOp
The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667
Differential Revision: https://reviews.llvm.org/D96425
Rationale:
Narrower types for overhead storage yield a smaller memory footprint for
sparse tensors and thus need to be supported. Also, more value types
need to be supported to deal with all kinds of kernels. Since the
"one-size-fits-all" sparse storage scheme implementation is used
instead of actual codegen, the library needs to be able to support
all combinations of desired types. With some crafty templating and
overloading, the actual code for this is kept reasonably sized though.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D96819
This revision adds support for hoisting "subtensor + vector.transfer_read" / "subtensor_insert + vector.transfer_write" pairs across scf.for.
The unit of hoisting becomes a HoistableRead / HoistableWrite struct which contains a pair of "vector.transfer_read + optional subtensor" / "vector.transfer_write + optional subtensor_insert".
scf::ForOp canonicalization patterns are applied greedily on the successful application of the transformation to cleanup the IR more eagerly and potentially expose more transformation opportunities.
Differential revision: https://reviews.llvm.org/D96731
SliceAnalysis originally was developed in the context of affine.for within mlfunc.
It predates the notion of region.
This revision updates it to not hardcode specific ops like scf::ForOp.
When rooted at an op, the behavior of the slice computation changes as it recurses into the regions of the op. This does not support gathering all values transitively depending on a loop induction variable anymore.
Additional variants rooted at a Value are added to also support the existing behavior.
Differential revision: https://reviews.llvm.org/D96702
This revision takes advantage of the newly extended `ref` directive in assembly format
to allow better region handling for LinalgOps. Specifically, FillOp and CopyOp now build their regions explicitly which allows retiring older behavior that relied on specific op knowledge in both lowering to loops and vectorization.
This reverts commit 3f22547fd1 and relands 973e133b76 with a workaround for
a gcc bug that does not accept lambda default parameters:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59949
Differential Revision: https://reviews.llvm.org/D96598
This reverts commit 973e133b76.
It triggers an issue in gcc5 that requires investigation; the build is
broken with:
/tmp/ccdpj3B9.s: Assembler messages:
/tmp/ccdpj3B9.s:5821: Error: symbol `_ZNSt17_Function_handlerIFvjjEUljjE2_E9_M_invokeERKSt9_Any_dataOjS6_' is already defined
/tmp/ccdpj3B9.s:5860: Error: symbol `_ZNSt14_Function_base13_Base_managerIUljjE2_E10_M_managerERSt9_Any_dataRKS3_St18_Manager_operation' is already defined
This revision takes advantage of the newly extended `ref` directive in assembly format
to allow better region handling for LinalgOps. Specifically, FillOp and CopyOp now build their regions explicitly which allows retiring older behavior that relied on specific op knowledge in both lowering to loops and vectorization.
Differential Revision: https://reviews.llvm.org/D96598
The AffineMap in the MemRef inferred by SubViewOp may have uncompressed symbols which result in type mismatch on otherwise unused symbols. Make the computation of the AffineMap compress those unused symbols which results in better canonical types.
Additionally, improve the error message to report which inferred type was expected.
Differential Revision: https://reviews.llvm.org/D96551
The dimension order of a filter in tensorflow is
[filter_height, filter_width, in_channels, out_channels], which is different
from current definition. The current definition follows TOSA spec. Add TF
version conv ops to .tc, so we do not have to insert a transpose op around a
conv op.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D96038
This revision connects the generated sparse code with an actual
sparse storage scheme, which can be initialized from a test file.
Lacking a first-class citizen SparseTensor type (with buffer),
the storage is hidden behind an opaque pointer with some "glue"
to bring the pointer back to tensor land. Rather than generating
sparse setup code for each different annotated tensor (viz. the
"pack" methods in TACO), a single "one-size-fits-all" implementation
has been added to the runtime support library. Many details and
abstractions need to be refined in the future, but this revision
allows full end-to-end integration testing and performance
benchmarking (with, on one end, an annotated Linalg
op and, on the other end, a JIT/AOT executable).
Reviewed By: nicolasvasilache, bixia
Differential Revision: https://reviews.llvm.org/D95847
This revision fixes the indexing logic into the packed tensor that results from hoisting padding. Previously, the index was incorrectly set to the loop induction variable when in fact we need to compute the iteration count (i.e. `(iv - lb).ceilDiv(step)`).
Differential Revision: https://reviews.llvm.org/D96417
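The iteration count formula `(iv - lb).ceilDiv(step)` mentioned above can be checked with a short Python sketch. The function names are hypothetical, and the sketch assumes a positive step and an induction variable that is never below the lower bound.

```python
def ceil_div(numerator, denominator):
    """Ceiling division for a positive denominator."""
    return -(-numerator // denominator)


def packed_index(iv, lb, step):
    """Iteration count of a loop at induction variable `iv`."""
    return ceil_div(iv - lb, step)


# Loop from lb=3 to ub=20 with step=4: the induction variable takes the
# values 3, 7, 11, 15, 19, while the packed tensor is indexed 0 through 4.
induction_values = list(range(3, 20, 4))
packed_indices = [packed_index(iv, 3, 4) for iv in induction_values]
```

Using the induction variable directly would index the packed tensor at 3, 7, 11, and so on, which is the bug described above.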
This revision fixes the fact that the padding transformation did not have enough information to set the proper type for the padding value.
Additionally, the verifier for Yield in the presence of PadTensorOp is fixed to properly report incorrect number of results or operands. Previously, the error would be silently ignored which made the core issue difficult to debug.
Differential Revision: https://reviews.llvm.org/D96264
This reverts commit 511dd4f438 along with
a couple of fixes.
Original message:
Now the context is the first, rather than the last input.
This better matches the rest of the infrastructure and makes
it easier to move these types to being declaratively specified.
Phabricator: https://reviews.llvm.org/D96111