Commit Graph

17 Commits

Author SHA1 Message Date
Nicolas Vasilache b2729fda60 [mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)
This revision follows up on the conversation titled:

```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths```

The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between and intrinsics and an inline_asm implementation.

This results in roughly 20% fewer cycles as reported by llvm-mca:

After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted):
```
Iterations:        100
Instructions:      5900
Total Cycles:      2415
Total uOps:        7300

Dispatch Width:    6
uOps Per Cycle:    3.02
IPC:               2.44
Block RThroughput: 24.0

Cycles with backend pressure increase [ 89.90% ]
Throughput Bottlenecks:
  Resource Pressure       [ 89.65% ]
  - SKXPort1  [ 0.04% ]
  - SKXPort2  [ 12.42% ]
  - SKXPort3  [ 12.42% ]
  - SKXPort5  [ 89.52% ]
  Data Dependencies:      [ 37.06% ]
  - Register Dependencies [ 37.06% ]
  - Memory Dependencies   [ 0.00% ]
```

After this revision (inline_asm version, vblendps instructions are indeed emitted):
```
Iterations:        100
Instructions:      6300
Total Cycles:      2015
Total uOps:        7700

Dispatch Width:    6
uOps Per Cycle:    3.82
IPC:               3.13
Block RThroughput: 20.0

Cycles with backend pressure increase [ 83.47% ]
Throughput Bottlenecks:
  Resource Pressure       [ 83.18% ]
  - SKXPort0  [ 14.49% ]
  - SKXPort1  [ 14.54% ]
  - SKXPort2  [ 19.70% ]
  - SKXPort3  [ 19.70% ]
  - SKXPort5  [ 83.03% ]
  - SKXPort6  [ 14.49% ]
  Data Dependencies:      [ 39.75% ]
  - Register Dependencies [ 39.75% ]
  - Memory Dependencies   [ 0.00% ]
```

An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0).

Differential Revision: https://reviews.llvm.org/D114393
2021-11-23 07:31:22 +00:00
Mehdi Amini e0b7bee7cf Revert "[mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)"
This reverts commit a9e236bed8.
This broke the Windows build:

mlir\include\mlir/Dialect/X86Vector/Transforms.h(28): error C2061: syntax error: identifier 'uint'
2021-11-22 19:23:18 +00:00
Nicolas Vasilache a9e236bed8 [mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)
This revision follows up on the conversation titled:

```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths```

The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between and intrinsics and an inline_asm implementation.

This results in roughly 20% fewer cycles as reported by llvm-mca:

After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted):
```
Iterations:        100
Instructions:      5900
Total Cycles:      2415
Total uOps:        7300

Dispatch Width:    6
uOps Per Cycle:    3.02
IPC:               2.44
Block RThroughput: 24.0

Cycles with backend pressure increase [ 89.90% ]
Throughput Bottlenecks:
  Resource Pressure       [ 89.65% ]
  - SKXPort1  [ 0.04% ]
  - SKXPort2  [ 12.42% ]
  - SKXPort3  [ 12.42% ]
  - SKXPort5  [ 89.52% ]
  Data Dependencies:      [ 37.06% ]
  - Register Dependencies [ 37.06% ]
  - Memory Dependencies   [ 0.00% ]
```

After this revision (inline_asm version, vblendps instructions are indeed emitted):
```
Iterations:        100
Instructions:      6300
Total Cycles:      2015
Total uOps:        7700

Dispatch Width:    6
uOps Per Cycle:    3.82
IPC:               3.13
Block RThroughput: 20.0

Cycles with backend pressure increase [ 83.47% ]
Throughput Bottlenecks:
  Resource Pressure       [ 83.18% ]
  - SKXPort0  [ 14.49% ]
  - SKXPort1  [ 14.54% ]
  - SKXPort2  [ 19.70% ]
  - SKXPort3  [ 19.70% ]
  - SKXPort5  [ 83.03% ]
  - SKXPort6  [ 14.49% ]
  Data Dependencies:      [ 39.75% ]
  - Register Dependencies [ 39.75% ]
  - Memory Dependencies   [ 0.00% ]
```

An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0).

Reviewed By: ftynse, dcaballe

Differential Revision: https://reviews.llvm.org/D114335
2021-11-22 10:32:34 +00:00
Nicolas Vasilache 34ff857350 [mlir][X86Vector] Add specialized vector.transpose lowering patterns for AVX2
This revision adds an implementation of 2-D vector.transpose for 4x8 and 8x8 for
AVX2 and surfaces it to the Linalg level of control.

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D113347
2021-11-11 07:33:31 +00:00
Nicolas Vasilache 885072820c [mlir][Vector] Add a pattern to lower 2-D vector.transpose to shape_cast+shuffle.
The 2-D case can be rewritten to generate quite fewer instructions and a single vector.shuffle which seems to provide a nice performance boost.
Add this arrow to our quiver by exposing it with a new vector transform option.

Differential Revision: https://reviews.llvm.org/D113062
2021-11-02 22:12:46 +00:00
Nicolas Vasilache d054b80bd3 [mlir][Vector] NFC - Add option to hook vector.transpose lowering to strategies.
This revision also moves some code around to improve overall structure.

Differential Revision: https://reviews.llvm.org/D112437
2021-10-25 12:26:33 +00:00
Nicolas Vasilache 176a0ea535 [mlr][Linalg] NFC - Add option to hook vector.multi_reduction lowering to strategies.
Differential Revision: https://reviews.llvm.org/D112414
2021-10-25 11:31:39 +00:00
thomasraoux 1d8cc45b0e [mlir][vector] Add patterns to convert multidimreduce to vector.contract
add several patterns that will simplify contraction vectorization in the
future. With those canonicalizationns we will be able to remove the special
case for contration during vectorization and rely on those transformations to
avoid materizalizing broadcast ops.

Differential Revision: https://reviews.llvm.org/D112121
2021-10-21 14:03:32 -07:00
Ahmed S. Taei a3dd4e7770 Drop transfer_read inner most unit dimensions
Add a pattern to take a rank-reducing subview and drop inner most
contiguous unit dim.
This is useful when lowering vector to backends with 1d vector types.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D111561
2021-10-20 19:27:04 +00:00
Mogball a54f4eae0e [MLIR] Replace std ops with arith dialect ops
Precursor: https://reviews.llvm.org/D110200

Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.

Renamed all instances of operations in the codebase and in tests.

Reviewed By: rriddle, jpienaar

Differential Revision: https://reviews.llvm.org/D110797
2021-10-13 03:07:03 +00:00
Lei Zhang 3964c1db91 [mlir][vector] Split populateVectorContractLoweringPatterns
It was bundling quite a lot of patterns that convert high-D
vector ops into low-D elementary ops. It might not be good
for all of the patterns to happen for a particular downstream
user. For example, `ShapeCastOpRewritePattern` rewrites
`vector.shape_cast` into data movement extract/insert ops.

Instead, split the entry point into multiple ones so users
can pull in patterns on demand.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D111225
2021-10-07 09:39:26 -04:00
harsh-nod e33f301ec2 [mlir] Add support for moving reductions to outer most dimensions in vector.multi_reduction
The approach for handling reductions in the outer most
dimension follows that for inner most dimensions, outlined
below

First, transpose to move reduction dims, if needed
Convert reduction from n-d to 2-d canonical form
Then, for outer reductions, we emit the appropriate op
(add/mul/min/max/or/and/xor) and combine the results.

Differential Revision: https://reviews.llvm.org/D107675
2021-08-13 12:59:50 -07:00
Matthias Springer d1a9e9a7cb [mlir][vector] Remove vector.transfer_read/write to LLVM lowering
This simplifies the vector to LLVM lowering. Previously, both vector.load/store and vector.transfer_read/write lowered directly to LLVM. With this commit, there is a single path to LLVM vector load/store instructions and vector.transfer_read/write ops must first be lowered to vector.load/store ops.

* Remove vector.transfer_read/write to LLVM lowering.
* Allow non-unit memref strides on all but the most minor dimension for vector.load/store ops.
* Add maxTransferRank option to populateVectorTransferLoweringPatterns.
* vector.transfer_reads with changing element type can no longer be lowered to LLVM. (This functionality is needed only for SPIRV.)

Differential Revision: https://reviews.llvm.org/D106118
2021-07-17 14:07:27 +09:00
thomasraoux 291025389c [mlir][vector] Refactor Vector Unrolling and remove Tuple ops
Simplify vector unrolling pattern to be more aligned with rest of the
patterns and be closer to vector distribution.
The new implementation uses ExtractStridedSlice/InsertStridedSlice
instead of the Tuple ops. After this change the ops based on Tuple don't
have any more used so they can be removed.

This allows removing signifcant amount of dead code and will allow
extending the unrolling code going forward.

Differential Revision: https://reviews.llvm.org/D105381
2021-07-07 11:11:26 -07:00
thomasraoux 627733b5f0 [mlir][vector] Extend vector distribution to all elementwise and contract
Uses elementwise interface to generalize canonicalization pattern and add a new
pattern for vector.contract case.

Differential Revision: https://reviews.llvm.org/D104343
2021-06-30 16:22:31 -07:00
Mehdi Amini b5e22e6d42 Migrate MLIR test passes to the new registration API
Make sure they all define getArgument()/getDescription().

Depends On D104421

Differential Revision: https://reviews.llvm.org/D104426
2021-06-16 23:42:17 +00:00
River Riddle 3fef2d26a3 [mlir][NFC] Move passes in test/lib/Transforms/ to a directory that mirrors what they test
test/lib/Transforms/ has bitrot and become somewhat of a dumping grounds for testing pretty much any part of the project. This revision cleans this up, and moves the files within to a directory that reflects what is actually being tested.

Differential Revision: https://reviews.llvm.org/D102456
2021-05-14 10:28:11 -07:00