Depends On D105037
Avoid creating too many tasks when the number of workers is large.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D105126
Depends On D104850
Add a test that verifies that canonicalization removes all async overheads if it is statically known that the scf.parallel operation will be computed using a single block.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D104891
Depends On D104780
Recursive work splitting instead of sequential async tasks submission gives ~20%-30% speedup in microbenchmarks.
Algorithm outline:
1. Collapse scf.parallel dimensions into a single dimension
2. Compute the block size for the parallel operations from the 1d problem size
3. Launch parallel tasks
4. Each parallel task reconstructs its own bounds in the original multi-dimensional iteration space
5. Each parallel task computes the original parallel operation body using scf.for loop nest
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D104850
Specify the `!async.group` size (the number of tokens that will be added to it) at construction time. `async.await_all` operation can potentially race with `async.execute` operations that keep updating the group, for this reason it is required to know upfront how many tokens will be added to the group.
Reviewed By: ftynse, herhut
Differential Revision: https://reviews.llvm.org/D104780
This doesn't change APIs, this just cleans up the many in-tree uses of these
names to use the new preferred names. We'll keep the old names around for a
couple weeks to help transitions.
Differential Revision: https://reviews.llvm.org/D99127
This updates the codebase to pass the context when creating an instance of
OwningRewritePatternList, and starts removing extraneous MLIRContext
parameters. There are many many more to be removed.
Differential Revision: https://reviews.llvm.org/D99028
Add an option to pass the number of worker threads to select the number of async regions for parallel for transformation.
```
std::unique_ptr<OperationPass<FuncOp>> createAsyncParallelForPass(int numWorkerThreads);
```
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D92835
Depends On D89958
1. Adds `async.group`/`async.awaitall` to group together multiple async tokens/values
2. Rewrite scf.parallel operation into multiple concurrent async.execute operations over non overlapping subranges of the original loop.
Example:
```
scf.for (%i, %j) = (%lbi, %lbj) to (%ubi, %ubj) step (%si, %sj) {
"do_some_compute"(%i, %j): () -> ()
}
```
Converted to:
```
%c0 = constant 0 : index
%c1 = constant 1 : index
// Compute blocks sizes for each induction variable.
%num_blocks_i = ... : index
%num_blocks_j = ... : index
%block_size_i = ... : index
%block_size_j = ... : index
// Create an async group to track async execute ops.
%group = async.create_group
scf.for %bi = %c0 to %num_blocks_i step %c1 {
%block_start_i = ... : index
%block_end_i = ... : index
scf.for %bj = %c0 t0 %num_blocks_j step %c1 {
%block_start_j = ... : index
%block_end_j = ... : index
// Execute the body of original parallel operation for the current
// block.
%token = async.execute {
scf.for %i = %block_start_i to %block_end_i step %si {
scf.for %j = %block_start_j to %block_end_j step %sj {
"do_some_compute"(%i, %j): () -> ()
}
}
}
// Add produced async token to the group.
async.add_to_group %token, %group
}
}
// Await completion of all async.execute operations.
async.await_all %group
```
In this example outer loop launches inner block level loops as separate async
execute operations which will be executed concurrently.
At the end it waits for the completiom of all async execute operations.
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D89963