Add a feature to `EnumAttr` definition to generate
specialized Attribute class for the particular enumeration.
This class will inherit `StringAttr` or `IntegerAttr` and
will override `classof` and `getValue` methods.
With this class the enumeration predicate can be checked with simple
RTTI calls (`isa`, `dyn_cast`) and it will return the typed enumeration
directly instead of raw string/integer.
Based on the following discussion:
https://llvm.discourse.group/t/rfc-add-enum-attribute-decorator-class/2252
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D97836
If MLIR_CUDA_RUNNER_ENABLED, register a 'gpu-to-cubin' conversion pass to mlir-opt.
The next step is to switch CUDA integration tests from mlir-cuda-runner to mlir-opt + mlir-cpu-runner and remove mlir-cuda-runner.
Depends On D98279
Reviewed By: herhut, rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D98203
Instead of configuring kernel-to-cubin/rocdl lowering through callbacks, introduce a base class that target-specific passes can derive from.
Put the base class in GPU/Transforms, according to the discussion in D98203.
The mlir-cuda-runner will go away shortly, and the mlir-rocdl-runner as well at some point. I therefore kept the existing code path working and will remove it in a separate step.
Depends On D98168
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D98279
Just a pure method renaming.
It is a preparation step for replacing "memory space as raw integer"
with more generic "memory space as attribute", which will be done in
separate commit.
The `MemRefType::getMemorySpace` method will return `Attribute` and
become the main API, while `getMemorySpaceAsInt` will be declared as
deprecated and will be replaced in all in-tree dialects (also in separate
commits).
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D97476
Lower !gpu.async.tokens returned from async.execute regions to events instead of streams.
Make !gpu.async.token returned from !async.execute single-use.
This allows creating one event per use and destroying them without leaking or ref-counting.
Technically we only need this for stream/event-based lowering. I kept the code separate
from the rest of the gpu-async-region pass so that we can make this optional or move
to a separate pass as needed.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D96965
This commit introduced a cyclic dependency:
Memref dialect depends on Standard because it used ConstantIndexOp.
Std depends on the MemRef dialect in its EDSC/Intrinsics.h
Working on a fix.
This reverts commit 8aa6c3765b.
Create the memref dialect and move several dialect-specific ops without
dependencies to other ops from std dialect to this dialect.
Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
DeallocOp -> MemRef_DeallocOp
MemRefCastOp -> MemRef_CastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
TransposeOp -> MemRef_TransposeOp
ViewOp -> MemRef_ViewOp
The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667
Differential Revision: https://reviews.llvm.org/D96425
These properties were useful for a few things before traits had a better integration story, but don't really carry their weight well these days. Most of these properties are already checked via traits in most of the code. It is better to align the system around traits, and improve the performance/cost of traits in general.
Differential Revision: https://reviews.llvm.org/D96088
This reverts commit 511dd4f438 along with
a couple fixes.
Original message:
Now the context is the first, rather than the last input.
This better matches the rest of the infrastructure and makes
it easier to move these types to being declaratively specified.
Phabricator: https://reviews.llvm.org/D96111
Now the context is the first, rather than the last input.
This better matches the rest of the infrastructure and makes
it easier to move these types to being declaratively specified.
Differential Revision: https://reviews.llvm.org/D96111
This makes ignoring a result explicit by the user, and helps to prevent accidental errors with dropped results. Marking LogicalResult as no discard was always the intention from the beginning, but got lost along the way.
Differential Revision: https://reviews.llvm.org/D95841
Do not cache gpu.async.token type so that the pass can be created before the GPU dialect is registered.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D94397
This class used to serve a few useful purposes:
* Allowed containing a null DictionaryAttr
* Provided some simple mutable API around a DictionaryAttr
The first of which is no longer an issue now that there is much better caching support for attributes in general, and a cache in the context for empty dictionaries. The second results in more trouble than it's worth because it mutates the internal dictionary on every action, leading to a potentially large number of dictionary copies. NamedAttrList is a much better alternative for the second use case, and should be modified as needed to better fit it's usage as a DictionaryAttrBuilder.
Differential Revision: https://reviews.llvm.org/D93442
This better matches the rest of the infrastructure, is much simpler, and makes it easier to move these types to being declaratively specified.
Differential Revision: https://reviews.llvm.org/D93432
- the !gpu.async.token is the second result of 'gpu.alloc async', not the first.
- async.execute construction takes operand types not yet wrapped in !async.value.
- fix typo
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D93156
This is part of a larger refactoring the better congregates the builtin structures under the BuiltinDialect. This also removes the problematic "standard" naming that clashes with the "standard" dialect, which is not defined within IR/. A temporary forward is placed in StandardTypes.h to allow time for downstream users to replaced references.
Differential Revision: https://reviews.llvm.org/D92435
Given that OpState already implicit converts to Operator*, this seems reasonable.
The alternative would be to add more functions to OpState which forward to Operation.
Reviewed By: rriddle, ftynse
Differential Revision: https://reviews.llvm.org/D92266
These includes have been deprecated in favor of BuiltinDialect.h, which contains the definitions of ModuleOp and FuncOp.
Differential Revision: https://reviews.llvm.org/D91572
Looks like we have a blind spot in the testing matrix.
AsyncRegionRewriter.cpp: In member function ‘virtual void {anonymous}::GpuAsyncRegionPass::runOnFunction()’:
AsyncRegionRewriter.cpp:113:16: internal compiler error: in replace_placeholders_r, at cp/tree.c:2804
if (getFunction()
~~~~~~~~~~~~~
.getRegion()
~~~~~~~~~~~~
.walk(Callback{OpBuilder{&getContext()}})
~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is a roll-forward of rGec7780ebdab4, now that the remaining
gpu.launch_func have been converted to custom form in rGb22f111023ba.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90420
AllReduceLowering is currently the only GPU rewrite pattern, but more are coming. This is a preparation change.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D89370
This combines two separate ops (D88972: `gpu.create_token`, D89043: `gpu.host_wait`) into one.
I do after all like the idea of combining the two ops, because it matches exactly the pattern we are
going to have in the other gpu ops that will implement the AsyncOpInterface (launch_func, copies, alloc):
If the op is async, we return a !gpu.async.token. Otherwise, we synchronize with the host and don't return a token.
The use cases for `gpu.wait async` and `gpu.wait` are further apart than those of e.g. `gpu.h2d async` and `gpu.h2d`,
but I like the consistent meaning of the `async` keyword in GPU ops.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D89160
The updated version of kernel outlining did not handle cases correctly
where an operand of a candidate for sinking itself was defined by an operation
that is a sinking candidate. In such cases, it could happen that sunk
operations were inserted in the wrong order, breaking ssa properties.
Differential Revision: https://reviews.llvm.org/D89112
The previous implementation did not support sinking simple expressions. In particular,
it is often beneficial to sink dim operations.
Differential Revision: https://reviews.llvm.org/D88439
- Use TypeRange instead of ArrayRef<Type> where possible.
- Change some of the custom builders to also use TypeRange
Differential Revision: https://reviews.llvm.org/D87944
Now backends spell out which namespace they want to be in, instead of relying on
clients #including them inside already-opened namespaces. This also means that
cppNamespaces should be fully qualified, and there's no implicit "::mlir::"
prepended to them anymore.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D86811
This patch moves the registration to a method in the MLIRContext: getOrCreateDialect<ConcreteDialect>()
This method requires dialect to provide a static getDialectNamespace()
and store a TypeID on the Dialect itself, which allows to lazyily
create a dialect when not yet loaded in the context.
As a side effect, it means that duplicated registration of the same
dialect is not an issue anymore.
To limit the boilerplate, TableGen dialect generation is modified to
emit the constructor entirely and invoke separately a "init()" method
that the user implements.
Differential Revision: https://reviews.llvm.org/D85495
- Arguments of the first block of a region are considered region arguments.
- Add API on Region class to deal with these arguments directly instead of
using the front() block.
- Changed several instances of existing code that can use this API
- Fixes https://bugs.llvm.org/show_bug.cgi?id=46535
Differential Revision: https://reviews.llvm.org/D83599