Commit Graph

1215 Commits

Author SHA1 Message Date
Philip Reames eb912411e9 [RISCV] Add cost model coverage for mask reductions requiring legalization 2022-06-10 12:04:48 -07:00
Philip Reames d459530804 [RISCV] Fix accidental deletion of test lines in 2247e4d
Apparently, update_analyze_test_checks.sh does *not* warn on conflicting CHECKs, it just silently drops those lines from the generated test.  That is.. less than helpful.
2022-06-10 09:02:41 -07:00
Craig Topper e91051184c [RISCV] Mark FSIN and other math functions as Expand for scalable vectors.
This prevents them from being assumed legal by the cost model.

This matches what is done for AArch64 SVE.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D123799
2022-06-10 08:40:07 -07:00
Philip Reames 2247e4de83 [RISCV] Use common prefixes to reduce duplication in cost model tests 2022-06-10 08:36:14 -07:00
Philip Reames 2b83467d9e [RISCV] Broaden cost model coverage for fixed vectors w/i1 element type 2022-06-10 08:12:42 -07:00
Philip Reames 0e29a80fdc [RISCV] Add cost model for reverse shuffle
The majority of the cost appears to be forming the indices vector.

Differential Revision: https://reviews.llvm.org/D127141
2022-06-09 07:21:40 -07:00
David Green 53be6ab25c [ARM] Fix MVE getShuffleCost legalized type check
The MVE shuffle costing for VREV instructions was making incorrect
assumptions as to legalized vector types remaining as vectors. Add a
quick check to ensure they are indeed vectors before attempting to get
the number of elements.
2022-06-07 14:36:04 +01:00
Philip Reames beb06f3c53 [RISCV] Add cost model test coverage of scalable reductions 2022-06-06 14:32:30 -07:00
David Green 75631438e3 [AArch64] Costmodel tests for llvm.vscale intrinsics. NFC
These shows that the cost of a @llvm.vscale is indeed 1, not 10.
2022-05-26 10:16:21 +01:00
Simon Pilgrim 6c80267d0f [CostModel][X86] getScalarizationOverhead - improve extraction costs for > 128-bit vectors
We were using the default getScalarizationOverhead expansion for extraction costs, which adds up all the individual element extraction costs.

This is fine for 128-bit vectors, but for 256/512-bit vectors each element extraction also has to account for extracting the upper 128-bit subvector extraction before it can handle the element. For scalarization costs we only need to extract each demanded subvector once.

Differential Revision: https://reviews.llvm.org/D125527
2022-05-24 15:18:08 +01:00
David Green b4dd9fc370 [ARM] Cost modelling for MVE vector fptoi_sat
Building on top of D125665, this adds MVE costs for fptosi.sat and
fptoui.sat, providing MVE is available and the types are legal.

Differential Revision: https://reviews.llvm.org/D125666
2022-05-20 11:00:34 +01:00
David Green 80aab0312a [ARM] Cost modelling for scalar fptoi_sat
Similar to D124357, this adds some cost modelling for fptoi_sat for Arm
targets. Where VFP2 is available (and FP64/FP16 for the relevant types),
the operations are legal as the Arm instructions naturally saturate.
Otherwise they will need an extra smin/smax clamp, similar to AArch64.

Differential Revision: https://reviews.llvm.org/D125665
2022-05-19 19:53:21 +01:00
David Green 4c6a070a2c [AArch64] Teach perfect shuffles tables about D-lane movs
Similar to D123386, this adds D-Movs to the AArch64 perfect shuffle
tables, slightly lowering the costs a little more. This is a rough
improvement in general, especially if you ignore mov v0.16b, v2.16b type
moves that are often artefacts of the calling convention.

The D register movs are encoded as (0x4 | LaneIdx), and to generate a D
register move we are required to bitcast into a higher type, but it is
otherwise very similar to the S-lane mov's already supported.

Differential Revision: https://reviews.llvm.org/D125477
2022-05-17 18:16:45 +01:00
Simon Pilgrim 47be07074a [CostModel][X86] Auto generate partial interleaved load LV costs using UTC_ARGS --filter control 2022-05-12 17:46:41 +01:00
Simon Pilgrim 14e83ada16 [CostModel][X86] Auto generate masked load/store LV costs using UTC_ARGS --filter control
Also fix a sse42 -> sse4.2 typo so that we actually test costs for sse4.2
2022-05-12 17:40:40 +01:00
Simon Pilgrim a5c45c4dc1 [CostModel][X86] Auto generate gather/scatter LV costs using UTC_ARGS --filter control
Also fix a sse42 -> sse4.2 typo so that we actually test costs for sse4.2
2022-05-12 17:39:06 +01:00
Craig Topper 39e63bd2d8 [IR][CostModel] A scalable vector shuffle can't be an identity or reverse shuffle.
Even if the minimum number of elements is 1 and the length doesn't change,
we don't know what vscale is so we can't classify it as identity mask. Instead it
is a zero element splat.

For reverse, we shouldn't classify it as a reverse unless there are at least 2 elements
in the mask. This applies to both fixed and scalable vectors. For fixed vectors, a single
element would be an identity shuffle. For scalable vector it's a zero elt splat.

Reviewed By: sdesmalen, liaolucy

Differential Revision: https://reviews.llvm.org/D124655
2022-05-09 21:37:25 -07:00
David Green dccc69a38d [AArch64] Add extra reverse costs.
This adds some extra costs for reverse shuffles under AArch64, filling
in the i16/f16/i8 gaps in the cost model.

Differential Revision: https://reviews.llvm.org/D124786
2022-05-06 18:23:36 +01:00
Simon Pilgrim 3d107ce2b2 [CostModel][X86] Relax fcmp costs on SSE41 targets or later
Only pre-SSE41 targets double-pump the fp comparison ops
2022-05-06 13:29:40 +01:00
Simon Pilgrim cbfa857346 [CostModel][X86] Adjust 128-bit select costs to account for slow BLENDV op
Based off the script from D103695 - Jaguar, Bulldozer, Silvermont (et al) and Haswell all have slow BLENDV ops, so adjust the worse case cost values
2022-05-06 13:07:34 +01:00
Simon Pilgrim d21bf51494 [CostModel][X86] Adjust pre-SSE41 fp scalar select costs to account for vector ops
Based off the script from D103695, we now mainly use BLENDV or OR(AND,ANDN) to select scalar float/double ops
2022-05-06 11:41:55 +01:00
Simon Pilgrim f0e8c1d6d9 [CostModel][X86] Adjust 256-bit select costs to account for slow BLENDV op
Based off the script from D103695, on AVX1, Jaguar/Bulldozer both have low throughput for ymm select patterns (BLENDV + OR(AND,ANDN))), and even on AVX2 Haswell still struggles with BLENDV ops
2022-05-06 11:27:37 +01:00
Simon Pilgrim 4236a10717 [CostModel][X86] Add more complete float/double select cost test coverage
We were only testing basic vector types
2022-05-06 10:45:36 +01:00
Peter Waller 75f9e83ace [AArch64] Add -aarch64-insert-extract-base-cost
The new flag -aarch64-insert-extract-base-cost can be used to
set the value of AArch64Subtarget::getVectorInsertExtractBaseCost(),
for the purposes of experimentation.

Differential Revision: https://reviews.llvm.org/D124835
2022-05-05 10:35:45 +00:00
David Green 2dcb2d8562 [AArch64] Cost modelling for fptoi_sat
This builds on top of the target-independent cost model added in D124269
to add aarch64 specific costs for fptoui_sat and fptosi_sat intrinsics.
For many common types they will be legal instructions as the AArch64
instructions will saturate naturally. For unsupported pairs of integer
and floating point types, an additional min/max clamp is needed.

Differential Revision: https://reviews.llvm.org/D124357
2022-05-02 11:36:05 +01:00
Simon Pilgrim 86bb7df6e6 [CostModel][X86] getScalarizationOverhead - handle vXi1 extracts with MOVMSK (pre-AVX512)
We can quickly extract multiple elements of a bool vector using MOVMSK ops - since we don't know what generated the vXi1, I've been optimistic and assumed we can use PMOVMSKB to extract the maximum number of bools with a single op.

The MOVMSK pattern isn't great for extract+insert round trips as vXi1 type legalization can interfere with this a lot - so this relies on us remaining good at using getScalarizationOverhead properly (and tagging both Insert and Extract modes) for those round trip cases.

The AVX512 KMOV codegen for bool extraction is a bit of a mess so for now I've not included that - the per-element cost is a lot more accurate for current codegen.
2022-05-02 09:58:39 +01:00
David Green 986de8f50b [AArch64] Add more comprehensive reverse shuffle costmodel tests. NFC 2022-05-02 09:16:57 +01:00
Simon Pilgrim d5198cf92f [CostModel][X86] Check for 'null op' truncations
If the legalized src/dst types are the same, assume the "truncation" is free.

This fixes some edge cases such as mul lo/hi ops and bool vectors which will get legalized back to legal vector widths
2022-05-01 12:03:40 +01:00
Simon Pilgrim c2964746e3 [CostModel][X86] Reduce cost of vector selects on SSE2/AVX1 targets
Based off the script from D103695, we were exaggerating the cost of the OR(AND(X,M),AND(Y,~M)) expansion using instruction count instead of effective throughput
2022-05-01 09:32:14 +01:00
Alexey Bataev 371412e065 [COST]Fix crash for non-power-2 vector shuffle mask.
Need to normalizize the mask to avoid possible crashes during attempts
to estimate cost of the very long shuffles with non-power-2 number of
elements in masks.
2022-04-29 07:28:07 -07:00
LiaoChunyu 03a3654203 [RISCV] Add cost model for SK_Broadcast
Add cost model for broadcast shuffle in RISCVTTIImpl::getShuffleCost
with scalable vector. The cost model might not the best.

For scalable vector, BasicTTIImpl::getShuffleCost return invalid cost,
so this patch relies on the existing cost model in BasicTTIImpl.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124101
2022-04-29 13:28:02 +08:00
Alexey Bataev 75e1cf4a6a [COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.

Differential Revision: https://reviews.llvm.org/D100486
2022-04-28 10:04:41 -07:00
Alexey Bataev ac23cf738a [COST][NFC]Add a test for non-power-2 shuffles, NFC. 2022-04-28 09:08:28 -07:00
Alexey Bataev 9861ca0c23 Revert "[COST]Improve cost model for shuffles in SLP."
This reverts commit 29a470e380 to fix
a crash reported in https://reviews.llvm.org/D100486#3479989.
2022-04-28 08:11:56 -07:00
David Green 05b0a49832 [AArch64] Add a fp128 shuffle test. NFC
These legalize to scalar types, so it's useful to have a test case that
covers them.
2022-04-28 14:28:45 +01:00
Alexey Bataev 29a470e380 [COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.

Differential Revision: https://reviews.llvm.org/D100486
2022-04-27 10:56:26 -07:00
Vasileios Porpodas c7bb5ac5ca [NFC] Renamed /test/Analysis/CostModel/X86/splat-load.ll test and added more checks.
Renamed test/Analysis/CostModel/X86/splat-load.ll to shuffle-load.ll
to align it with AArch64's similar test.

Also added a complete list of checks for all vector combinations up to 512-bits.

Differential Revision: https://reviews.llvm.org/D124528
2022-04-27 09:47:43 -07:00
David Green 8e2a0e61f5 [AArch64] Break up larger shuffle-masks into legal sizes in getShuffleCost
Given a larger-than-legal shuffle mask, the final codegen will split
into multiple sub-vectors. This attempts to model that in
AArch64TTIImpl::getShuffleCost, splitting masks up according to the size
of the legalized vectors. If the sub-masks have at most 2 input sources
we can call getShuffleCost on them and sum the costs, to get a more
accurate final cost for the entire shuffle. The call to
improveShuffleKindFromMask helps to improve the shuffle kind for the
sub-mask cost call.

Differential Revision: https://reviews.llvm.org/D123414
2022-04-27 13:51:50 +01:00
David Green d42f222f9d [AArch64] Add some larger shuffle cost tests. NFC 2022-04-27 13:30:50 +01:00
David Green d6327050e0 [AArch64] Use PerfectShuffle costs in AArch64TTIImpl::getShuffleCost
Given a shuffle with 4 elements size 16 or 32, we can use the costs
directly from the PerfectShuffle tables to get a slightly more accurate
cost for the resulting shuffle.

Differential Revision: https://reviews.llvm.org/D123409
2022-04-27 12:09:01 +01:00
David Green 4a8c13a6f4 [CostModel] Add basic fptoi_sat costs
This adds some basic fptosi_sat and fptoui_sat target independent cost
modelling. The fptosi_sat is modelled as a fmin/fmax to saturate the
value, followed by a fp convert. The signed values then have an
additional fcmp+select for handling Nan correctly.

The AArch64/Arm costs may be more incorrect, as the instruction exist
natively. This can be fixed with target specific cost updates.

Differential Revision: https://reviews.llvm.org/D124269
2022-04-27 09:30:00 +01:00
Vasileios Porpodas fa8a9fea47 Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 6a9bbd9f20.

Code review: https://reviews.llvm.org/D124202
2022-04-26 14:02:40 -07:00
Vasileios Porpodas 6a9bbd9f20 Revert "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 55ce296d6f.
2022-04-26 11:25:26 -07:00
Vasileios Porpodas 55ce296d6f [SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`
Before this patch `Args` was used to pass a broadcat's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.

Differential Revision: https://reviews.llvm.org/D124202
2022-04-26 11:11:29 -07:00
Vasileios Porpodas 957ada4164 [AArch64][NFC] Deleted llvm/test/Analysis/CostModel/AArch64/splat-load.ll test
This test is no longer necessary as it is a subset of:
llvm/test/Analysis/CostModel/AArch64/shuffle-load.ll

Differential Revision: https://reviews.llvm.org/D124456
2022-04-26 10:22:11 -07:00
David Green 1159984802 [CostModel] Add fptoi_sat costmodel tests. NFC 2022-04-25 18:44:35 +01:00
David Green 091c2f953d [AArch64] Add some splat of load cost model tests. NFC
They do not work yet, but we can hopefully adjust the cost for them to
get them to be recognized
2022-04-22 09:38:06 +01:00
Vasileios Porpodas e83ad23daf [TTI] Pre-commit cost model tests splat-loads. 2022-04-21 14:45:51 -07:00
Roman Lebedev be5c15c7ae
[NFC][Costmodel][LV][X86] Refresh one or two interleaved load/store tests 2022-04-15 17:43:18 +03:00
LiaoChunyu 505fce5a9e [RISCV] Add basic code modeling for llvm.experimental.stepvector intrinsic
Scalable vectors llvm.experimental.stepvector intrinsic
will crash due to an invalid cost when run the code through the loopunroll.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D122782
2022-04-11 10:19:23 +08:00
David Green fa784f6382 [AArch64] Insert subvector costs
An insert subvector under aarch64 can often be done as a single lane mov
operation. For example a v4i8 inserted into a v16i8 is a s-reg mov, so
long as the index is a multiple of 4. This teaches the cost model that,
using code copied over from the X86 backend.

Some of the costs (v16i16_4_0) are still high because they get matched
as a SK_Select, not an SK_InsertSubvector. D120879 has some codegen
tests for inserting subvectors, which I were added as
llvm/test/CodeGen/AArch64/insert-subvector.ll.

Differential Revision: https://reviews.llvm.org/D120880
2022-04-07 19:27:41 +01:00
David Green 750bf3582a [AArch64] Increase cost of v2i64 multiplies
The cost of a v2i64 multiply was special cased in D92208 as scalarized
into 4*extract + 2*insert + 2*mul. Scalarizing to/from gpr registers are
expensive though, and the cost wasn't high enough to prevent vectorizing
in places where it can be detrimental for performance. This increases it
so that the costs of copying to/from GPRs is increased to 2 each, with
the total cost increasing to 14. So long as umull/smull are handled
correctly (as in D123006) this seems to lead to better vectorization
factors and better performance.

Differential Revision: https://reviews.llvm.org/D123007
2022-04-04 17:42:20 +01:00
David Green 2abaa027d9 [AArch64] Teach the costmodel about widening muls
A vector mul(sext, sext) or mul(zext, zext) will be code generated as a
single smull or umull instruction. This most notably effects v2i64
multiplies, which are otherwise not legal and need to be expanded.

The oneuse check has also been slightly changed, as it is already
checked from the use of isWideningInstruction in getCastInstrCost.

Differential Revision: https://reviews.llvm.org/D123006
2022-04-04 12:45:04 +01:00
David Green 2e2f38a1ac [AArch64] Add widening arithmetic cost tests. NFC 2022-04-04 12:19:45 +01:00
Simon Pilgrim d663166acb [CostModel][X86] Reduce cost of v2i64 icmp base cost on SSE2 targets
Based off the script from D103695, we were exaggerating the cost of the v2i64 comparison expansion using instruction count instead of effective throughput
2022-03-30 09:11:55 +01:00
Arthur Eubanks d051c566cd [test] Remove the last couple uses of -analyze in llvm/test 2022-03-23 11:31:12 -07:00
David Green c56dd20f69 [AArch64] Add extra insert subvector cost model tests. NFC 2022-03-22 12:20:19 +00:00
Yeting Kuo ecd7a0132a [RISCV] Add basic cost model for vector casting
To perform the cost model of vector casting, the patch consider most vector
casts as their scalar form and consider those vector form of free scalr castings
as 1.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121771
2022-03-22 14:17:08 +08:00
Simon Pilgrim 5dde9c1286 [CostModel][X86] Reduce cost of extracting bool vector elements
For constant indices, these are now just a MOVMSK+TEST/BT
2022-03-18 19:02:47 +00:00
Simon Pilgrim 4455c5cdea [CostModel][X86] Update RUN -passes=* to double quotes to appease update scripts on windows 2022-03-18 11:44:18 +00:00
Craig Topper bbd2ecf9f0 [RISCV] Add +experimental-zvfh extension to cover half types in vectors.
Currently we allow half types in vectors if the scalar Zfh extension
is enabled. This behavior is not inline with the vector spec. For f32
and f64 types, the Zve32f, Zve64f, Zve64d, and V explicitly control
the availablity of floating point types in vectors.

In order to make our compiler compliant, we either need to remove all support
for half in vectors or we need an extension to control it.

Draft spec here https://github.com/riscv/riscv-v-spec/pull/780

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D121345
2022-03-17 10:04:02 -07:00
David Sherwood e7b89c2fc3 Add BasicTTIImpl cost model for llvm.get.active.lane.mask intrinsic
The vectoriser sometimes generates predicated vector loops using
the llvm.get.active.lane.mask intrinsic so it's important that we
are able to calculate a valid cost for the call instruction. When
SVE is enabled we are able to use a single whilelo instruction
for some vector types - in such cases I've marked the cost as 1.
For all other cases I've set the cost according to how the intrinsic
will be expanded.

Tests added here:

  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Analysis/CostModel/ARM/active_lane_mask.ll
  Analysis/CostModel/RISCV/active_lane_mask.ll

Differential Revision: https://reviews.llvm.org/D121109
2022-03-14 09:35:05 +00:00
Yeting Kuo ae7c6647f3 [RISCV] Add basic code modeling for fixed length vector reduction.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121447
2022-03-14 11:04:31 +08:00
Florian Hahn aa590e5823
[AArch64] Improve costs for some conversions to fp16.
Currently the cost model under-estimates the cost of certain
FP16 conversions.

This patch updates getCastInstrCost to return more accurate costs for
the cases improved in c2ed9fd054.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D113700
2022-03-11 10:27:39 +00:00
Florian Hahn 697f55e368
[AArch64] Move fp16 cast tests.
Move FP16 tests to fp16cast function, as suggested in D113700.
2022-03-10 12:22:06 +00:00
Roman Lebedev 2f80ea7f4f
[NFC][LV] Use different braces in debug output
The analysis passes output function name encapsulated in `'` braces,
but LV uses `"`. Harmonizing this may help in creating an update script
for the LV costmodel test checks.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D121105
2022-03-07 19:32:37 +03:00
David Green 43b638241a [AArch64] Use NPM for cost model tests. NFC
As per the other tests, this switches the run lines back to using the
NPM via
-passes='print<cost-model>' -cost-kind=throughput 2>&1 -disable-output
2022-03-07 08:57:50 +00:00
Alex Tsao 89f15fc687 [RISCV] Add cost modelling for masked memory op
The patch adds very basic cost model for masked memory op on scalable vector.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D117884
2022-03-03 20:47:58 +08:00
David Green 47f4cd9c3d [AArch64] Update costs for some fp16 converts
This updates the costs for FP16 converts, as some of them were pretty
high.

Differential Revision: https://reviews.llvm.org/D120771
2022-03-03 11:17:24 +00:00
David Green 65c0e45a37 [AArch64] Vector shifts cost 1
The costs of vector shifts was 2 as opposed to 1, as the nodes are
marked custom. Fix this like the others and mark the nodes as cheap.

Differential Revision: https://reviews.llvm.org/D120773
2022-03-03 10:42:57 +00:00
David Green 97e0366d67 [AArch64] Add some fp16 conversion cost tests. NFC 2022-03-02 18:07:14 +00:00
Nikita Popov 98cfcae4e9 Revert "[RISCV] Add cost modelling for masked memory op"
This reverts commit 76f243b53b.

The newly added test fails.
2022-03-02 17:32:10 +01:00
Alex Tsao 76f243b53b [RISCV] Add cost modelling for masked memory op
The patch adds very basic cost model for masked memory op on scalable vector.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D117884
2022-03-02 22:48:41 +08:00
David Green 02de975259 [AArch64] Add some tests for the cost of extending an extract. NFC 2022-03-02 14:47:32 +00:00
David Green 62c2b070d5 [AArch64] Add simple arithmetic cost model test. NFC 2022-03-01 23:31:02 +00:00
David Green 2e7c35ea12 [AArch64] Cleanup and extend cast costs. NFC 2022-02-26 17:59:02 +00:00
David Green 5fe8307b70 [AArch64] Add scalar min/max costs. NFC
The vector costs were already added, this adds scalar variants to
complete the test coverage.
2022-02-25 17:11:24 +00:00
Pavel Kosov 37fa99eda0 [SchedModels][CortexA55] Add ASIMD integer instructions
Depends on D114642

Original review https://reviews.llvm.org/D112201

OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D117003
2022-02-17 13:41:57 +03:00
Arthur Eubanks 15ba588d6d [test] Migrate '-analyze -cost-model' to '-passes=print<cost-model>' 2022-02-09 15:42:16 -08:00
David Green b55d4c2ad8 Revert "[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`"
This reverts commit 77a0da926c as we've
received multiple reports of this significantly impacting performance,
in ways that don't seem to just be target specific cost models going
wrong. I would offer some reproducers, but the test changes here seem to
be full of them!

Reverting for now and hopefully we can remove the "hack" more carefully
as we go.
2022-02-09 20:02:54 +00:00
Craig Topper 09629215c2 [RISCV] Add a really basic cost model for SK_Splice.
While testing scalable vectors I found that if we generate a
vector splice intrinsic and run the code through the loop unroller,
we'll crash due to an invalid cost.

This adds a basic cost based on the 2 slide instructions used by the
lowering in D119303.

We probably need to factor LMUL into this, but that's true for
arithmetic instructions too. So I've ignored for the moment.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D119316
2022-02-09 11:43:31 -08:00
Roman Lebedev 77a0da926c
[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`
D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model.
What it essentially does is prevents scalarized vectorization of masked memory operations:
```
  // TODO: Cost model for emulated masked load/store is completely
  // broken. This hack guides the cost model to use an artificially
  // high enough value to practically disable vectorization with such
  // operations, except where previously deployed legality hack allowed
  // using very low cost values. This is to avoid regressions coming simply
  // from moving "masked load/store" check from legality to cost model.
  // Masked Load/Gather emulation was previously never allowed.
  // Limited number of Masked Store/Scatter emulation was allowed.
```

While i don't really understand about what specifically `is completely broken`
was talking about, i believe that at least on X86 with AVX2-or-later,
this is no longer true. (or at least, i would like to know what is still broken).
So i would like to follow suit after D111460, and like wise disable that hack for AVX2+.

But since this was added for X86 specifically, let's just instead completely remove this hack.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114779
2022-02-07 16:08:31 +03:00
Florian Hahn 17ebd68ae6
[AArch64] Fix costs of float vector compare/selects pairs.
The current cost-model overestimates the cost of vector compares &
selects for ordered floating point compares. This patch fixes that by
extending the existing logic for integer predicates.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118256
2022-01-31 10:18:29 +00:00
Florian Hahn cb3df1a299
[AArch64] Add vector compare/select tests with UNE predicate.
Precommit some additional tests for D118256.
2022-01-27 14:20:40 +00:00
Florian Hahn e6ebd2c72d
[AArch64] Add float vector compare/select cost-model tests. 2022-01-26 16:27:29 +00:00
Alban Bridonneau 2feddb37b4 Implement correct cost for SVE bitcasts
We have some bitcasts which we know will be simplified,
so their cost is zero.

Reviewed By: david-arm, sdesmalen

Differential Revision: https://reviews.llvm.org/D118019
2022-01-26 14:25:44 +00:00
Stanislav Mekhanoshin bb1fe36977 [AMDGPU] Make v8i16/v8f16 legal
Differential Revision: https://reviews.llvm.org/D117721
2022-01-24 11:51:08 -08:00
eopXD 3cf15af2da [RISCV] Remove experimental prefix from rvv-related extensions.
Extensions affected: +v, +zve*, +zvl*

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117860
2022-01-22 20:18:40 -08:00
Shao-Ce SUN a0a76fee0c [RISCV] update zfh and zfhmin extention to v1.0
`zfh` and `zfhmin` have been ratified, with version 1.0.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117098
2022-01-15 09:21:24 +08:00
David Sherwood ef1ca4d3e9 [AArch64] Fix incorrect use of MVT::getVectorNumElements in AArch64TTIImpl::getVectorInstrCost
If we are inserting into or extracting from a scalable vector we do
not know the number of elements at runtime, so we can only let the
index wrap for fixed-length vectors.

Tests added here:

  Analysis/CostModel/AArch64/sve-insert-extract.ll

Differential Revision: https://reviews.llvm.org/D117099
2022-01-13 09:27:14 +00:00
Andrew Litteken 4ff4e7ea30 [CostModel] Use cost of target trunc type when only it is the only use of a non-register sized load
The code size cost model for most targets uses the legalization cost for the type of the pointer of a load. If this load is followed directly by a trunc instruction, and is the only use of the result of the load, only one instruction is generated in the target assembly language. This adds a check for this case, and uses the target type of the trunc instruction if so.

This did not show any changes in CTMark code size benchmarks.

Reviewers: paquette, samparker, dmgreen

Differential Revision: https://reviews.llvm.org/D109388
2022-01-12 18:03:50 -06:00
Simon Pilgrim 5eb47961c4 [CostModel][X86] Update ROTL/ROTR vXi8/vXi16 costs on AVX512BW targets
Refresh based off recent improvements to codegen and the helper script from D103695
2022-01-10 13:18:25 +00:00
David Green bc615e436c [AArch64] Update addo and subo costs
Similar to D116732, this adds basic scalar sadd_with_overflow,
uadd_with_overflow, ssub_with_overflow and usub_with_overflow costs for
aarch64, which are usually quite efficiently lowered.

Differential Revision: https://reviews.llvm.org/D116734
2022-01-07 16:20:23 +00:00
David Green c65270cf96 [AArch64] Add basic umulo and smulo costs
This adds some AArch64 specific smul_with_overflow and umul_with_overflow
costs, overriding the default costs. The code generation for these mul
with overflow intrinsics is usually better than the default expansion on
AArch64. The costs come from https://godbolt.org/z/zEzYhMWqo with various
types, or llvm/test/CodeGen/AArch64/arm64-xaluo.ll.

Differential Revision: https://reviews.llvm.org/D116732
2022-01-06 17:22:47 +00:00
Nico Weber 085f078307 Revert "Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`.""
This reverts commit 859ebca744.
The change contained many unrelated changes and e.g. restored
unit test failes for the old lld port.
2022-01-05 13:10:25 -05:00
David Salinas 859ebca744 Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`."
This reverts commit 640beb38e7.

That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort).
Reverting until we have a better solution to s_cselect_b64 codegen cleanup

Change-Id: Ibf8e397df94001f248fba609f072088a46abae08

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D115960

Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
2022-01-05 17:57:32 +00:00
Jun Ma 80e56ad9ae [TTI] Return invalid cost for scalable vector in getShuffleCost
Differential Revision: https://reviews.llvm.org/D116362
2022-01-05 18:59:11 +08:00
Daniil Fukalov a2120f6b44 [NFC][AMDGPU][CostModel] Add tests for AMDGPU cost model, part 2. 2021-12-22 22:33:57 +03:00
Daniil Fukalov deaedab14a [NFC][AMDGPU][CostModel] Add tests for AMDGPU cost model. 2021-12-22 22:32:09 +03:00
Matthew Devereau e00f22c1b1 [AArch64][SVE] Teach cost model that masked loads/stores are cheap
Reduce the cost of VLS masked loads/stores to make the vectorizor emit them more frequently.
2021-12-17 15:04:45 +00:00