Commit Graph

625 Commits

Author SHA1 Message Date
Jonas Paulsson 370a8c8025 [SystemZ] Make sure not to call getZExtValue on a >64 bit constant.
Better use isZero() and isIntN() in SystemZTargetTransformInfo rather than
calling getZExtValue() since the immediate operand may be wider than 64 bits,
which is not allowed with getZExtValue().

Fixes https://bugs.llvm.org/show_bug.cgi?id=47600

Review: Simon Pilgrim
2020-09-23 15:36:32 +02:00
Bing1 Yu ec24e50553 [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8)
add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D87884
2020-09-23 10:29:10 +08:00
Simon Pilgrim 18a3ebcd30 [CostModel][X86] Add some select shuffle costs tests for D87884 2020-09-21 16:09:05 +01:00
Simon Pilgrim de25ebaac6 [CostModel][X86] Add vXi32 division by uniform constant costs (PR47476)
Other types can be handled in future patches but their uniform / non-uniform costs are more similar and don't appear to cause many vectorization issues.
2020-09-10 12:17:54 +01:00
Sam Parker 0af4147804 [ARM][CostModel] CodeSize costs for i1 arith ops
When optimising for size, make the cost of i1 logical operations
relatively expensive so that optimisations don't try to combine
predicates.

Differential Revision: https://reviews.llvm.org/D86525
2020-09-07 09:27:18 +01:00
Anna Welker 064981f0ce [ARM][MVE] Enable MVE gathers and scatters by default
Enable MVE gather/scatters by default, which requires some
minor adaptations in some tests.

Differential revision: https://reviews.llvm.org/D86776
2020-08-28 19:05:29 +01:00
David Green 677c1590c0 [ARM] Increase MVE gather/scatter cost by MVECostFactor.
MVE Gather scatter codegeneration is looking a lot better than it used
to, but still has some issues. The instructions we currently model as 1
cycle per element, which is a bit low for some cases. Increasing the
cost by the MVECostFactor brings them in-line with our other instruction
costs. This will have the effect of only generating then when the extra
benefit is more likely to overcome some of the issues. Notably in
running out of registers and vectorizing loops that could otherwise be
SLP vectorized.

In the short-term whilst we look at other ways of dealing with those
more directly, we can increase the costs of gathers to make them more
likely to be beneficial when created.

Differential Revision: https://reviews.llvm.org/D86444
2020-08-26 13:03:46 +01:00
Sam Parker da4ada116e [NFC][ARM] arith code size cost tests
Add a run to measure the code size cost of arithmetic instructions
and add a function for i1 types.
2020-08-25 11:16:01 +01:00
David Sherwood 7b64765cd1 [SVE] Fix TypeSize related warnings with IR truncates of scalable vectors
In getCastInstrCost when the instruction is a truncate we were relying
upon the implicit TypeSize -> uint64_t cast when asking if a given type
has the same size as a legal integer. I've changed the code to only
ask the question if the type is fixed length.

I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail
out for now if the type is a scalable vector.

I've added the following new tests:

  Analysis/CostModel/AArch64/sve-trunc.ll
  Transforms/InstCombine/AArch64/sve-trunc.ll

for both of these fixes.

Differential revision: https://reviews.llvm.org/D86432
2020-08-25 09:17:56 +01:00
Christopher Tetreault 5eff21c8ff [NFC][documentation] clarify comment in test
test referenced a relative path to a file, but the path was not correct
relative to the project the test is in

Differential Revision: https://reviews.llvm.org/D86368
2020-08-21 14:30:47 -07:00
Sam Parker acf0bb41e4 [ARM][CostModel] Select instruction costs.
Modify the ARM getCmpSelInstrCost implementation for the code size
costs of selects. Now consider the legalization cost and increase
the cost of i1 because those values wouldn't live in a general purpose
register. We also make selects +1 more expensive to account for the IT
instruction.

Differential Revision: https://reviews.llvm.org/D82091
2020-08-21 08:49:56 +01:00
dfukalov 4ccc38813e [AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation.
Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model.
Also added operations with contract attribute.

Fixed line endings in test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D84995
2020-08-06 21:43:27 +03:00
Sam Parker f2675ab45f [ARM][CostModel] Implement getCFInstrCost
As with other targets, set the throughput cost of control-flow
instructions to free so that we don't miss out of vectorization
opportunities.

Differential Revision: https://reviews.llvm.org/D85283
2020-08-05 12:44:51 +01:00
Simon Pilgrim d1abca187d [CostModel][X86] Add SSE costs for SMAX/SMIN/UMAX/UMIN intrinsics 2020-07-29 15:55:43 +01:00
Simon Pilgrim 0a0f28254a [CostModel][X86] Add SSE costs for ABS intrinsics 2020-07-29 14:33:59 +01:00
David Green 9ddb28964c [ARM] Tune getCastInstrCost for extending masked loads and truncating masked stores
This patch uses the feature added in D79162 to fix the cost of a
sext/zext of a masked load, or a trunc for a masked store.
Previously, those were considered cheap or even free, but it's
not the case as we cannot split the load in the same way we would for
normal loads.

This updates the costs to better reflect reality, and adds a test for it
in test/Analysis/CostModel/ARM/cast.ll.

It also adds a vectorizer test that showcases the improvement: in some
cases, the vectorizer will now choose a smaller VF when
tail-predication is enabled, which results in better codegen. (Because
if it were to use a higher VF in those cases, the code we see above
would be generated, and the vmovs would block tail-predication later in
the process, resulting in very poor codegen overall)

Original Patch by Pierre van Houtryve

Differential Revision: https://reviews.llvm.org/D79163
2020-07-29 13:41:34 +01:00
Simon Pilgrim c5ef1f1edd [TTI] Add default cost expansion for abs/smax/smin/umax/umin intrinsics 2020-07-29 12:13:06 +01:00
Simon Pilgrim 3f7249046a [CostModel][X86] Add smax/smin/umin/umax intrinsics cost model tests
Costs currently fall back to scalar generic intrinsic calls
2020-07-28 19:56:11 +01:00
Simon Pilgrim c6920081a8 [CostModel][X86] Add abs intrinsics cost model tests
abs costs currently falls back in scalar generic intrinsic calls
2020-07-28 19:56:10 +01:00
Jinsong Ji d28f86723f Re-land "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support"
This reverts commit bf544fa1c3.

Fixed the typo in PPCInstrInfo.cpp.
2020-07-28 14:00:11 +00:00
Jinsong Ji bf544fa1c3 Revert "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support"
This reverts commit adffce7153.

This is breaking test-suite, revert while investigation.
2020-07-27 21:07:00 +00:00
Jinsong Ji adffce7153 [PowerPC] Remove QPX/A2Q BGQ/BGP CNK support
Per RFC http://lists.llvm.org/pipermail/llvm-dev/2020-April/141295.html
no one is making use of QPX/A2Q/BGQ/BGP CNK anymore.

This patch remove the support of QPX/A2Q in llvm, BGQ/BGP in clang,
CNK support in openmp/polly.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D83915
2020-07-27 19:24:39 +00:00
dfukalov 76a0c0ee6f [AMDGPU][CostModel] Improve cost estimation for fused {fadd|fsub}(a,fmul(b,c))
Summary:
If result of fmul(b,c) has one use, in almost all cases (except denormals are
IEEE) the pair of operations will be fused in one fma/mad/mac/etc.

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits, kerbowa

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83919
2020-07-16 03:06:38 +03:00
David Sherwood c06b7e2ab5 [SVE] Fix implicit TypeSize->uint64_t conversion getCastInstrCost
In getCastInstrCost() when comparing different sizes for src and
dst types we should be using the TypeSize comparison operators
instead of relying upon TypeSize being converted a uin64_t.
Previously this meant we were dropping the scalable property and
treating fixed and scalable vector types the same.

Differential Revision: https://reviews.llvm.org/D83461
2020-07-14 08:16:31 +01:00
Stanislav Mekhanoshin f7a7efbf88 [AMDGPU] Tweak getTypeLegalizationCost()
Even though wide vectors are legal they still cost more as we
will have to eventually split them. Not all operations can
be uniformly done on vector types.

Conservatively add the cost of splitting at least to 8 dwords,
which is our widest possible load.

We are more or less lying to cost mode with this change but
this can prevent vectorizer from creation of wide vectors which
results in RA problems for us.

Differential Revision: https://reviews.llvm.org/D83078
2020-07-06 14:07:48 -07:00
David Green 146dad0077 [ARM] MVE FP16 cost adjustments
This adjusts the MVE fp16 cost model, similar to how we already do for
integer casts. It uses the base cost of 1 per cvt for most fp extend /
truncates, but adjusts it for loads and stores where we know that a
extending load has been used to get the load into the correct lane, and
only an MVE VCVTB is then needed.

Differential Revision: https://reviews.llvm.org/D81813
2020-07-06 15:57:51 +01:00
David Green afdb2ef2ed [ARM] Adjust default fp extend and trunc costs
This adds some default costs for fp extends and truncates, generally
costing them as 1 per lane. If the type is not legal then the cost will
include a call to an __aeabi_ function.

Some NEON code is also adjusted to make sure it applies to the expected
types, now that fp16 is a more common thing.

Differential Revision: https://reviews.llvm.org/D82458
2020-07-06 14:23:17 +01:00
David Green 60b8b2beea [ARM] Add extra extend and trunc costs for cast instructions
This expands the existing extend costs with a few extras for larger
types than legal, which will usually be split under MVE. It also adds
trunk support for the same thing. These should not have a large effect
on many things, but makes the costs explicit and keeps a certain balance
between the trunks and extends.

Differential Revision: https://reviews.llvm.org/D82457
2020-07-06 11:33:05 +01:00
David Green 55227f85d0 [ARM] Use BaseT::getMemoryOpCost for getMemoryOpCost
This alters getMemoryOpCost to use the Base TargetTransformInfo version
that includes some additional checks for whether extending loads are
legal. This will generally have the effect of making <2 x ..> and some
<4 x ..> loads/stores more expensive, which in turn should help favour
larger vector factors.

Notably it alters the cost of a <4 x half>, which with the current
codegen will be expensive if it is not extended.

Differential Revision: https://reviews.llvm.org/D82456
2020-07-06 10:58:40 +01:00
Florian Hahn 1ccc49924a [AArch64] Add getCFInstrCost, treat branches as free for throughput.
D79164/2596da31740f changed getCFInstrCost to return 1 per default.
AArch64 did not have its own implementation, hence the throughput cost
of CFI instructions is overestimated. On most cores, most branches should
be predicated and essentially free throughput wise.

This restores a 9% performance regression on a SPEC2006 benchmark on
AArch64 with -O3 LTO & PGO.

This patch effectively restores pre 2596da3174 behavior for AArch64
and undoes the AArch64 test changes of the patch.

Reviewers: samparker, dmgreen, anemet

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D82755
2020-06-30 20:34:04 +01:00
David Green f14457f5d8 [ARM] Split cast cost tests, and add masked load/store tests. NFC
This file has grown quite large and could do with being split up. This
splits away the load/store + cast tests into a separate file. Some
masked load/store + cast tests have been added too, along with some
extra load/store + fpcast tests.
2020-06-25 13:24:17 +01:00
dfukalov 129388ddc4 [AMDGPU][CostModel] Add fneg cost estimation
Summary: The estimation uses AMDGPUTargetLowering::isFNegFree()

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82065
2020-06-19 17:31:35 +03:00
Paul Walker 4612f39120 [SVE] Add flag to specify SVE register size, using this to calculate legal vector types.
Adds aarch64-sve-vector-bits-{min,max} to allow the size of SVE
data registers (in bits) to be specified. This allows the code
generator to make assumptions it normally couldn't. As a starting
point this information is used to mark fixed length vector types
that can fit within the specified size as legal.

Reviewers: rengolin, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80384
2020-06-18 12:11:16 +00:00
Sam Parker 2596da3174 [CostModel] getCFInstrCost in getUserCost.
Have BasicTTI call the base implementation so that both agree on the
default behaviour, which the default being a cost of '1'. This has
required an X86 specific implementation as it seems to be very
reliant on those instructions being free. Changes are also made to
AMDGPU so that their implementations distinguish between cost kinds,
so that the unrolling isn't affected. PowerPC also has its own
implementation to prevent changes to the reg-usage vectorizer test.

The cost model test changes now reflect that ret instructions are not
generally free.

Differential Revision: https://reviews.llvm.org/D79164
2020-06-15 09:28:46 +01:00
David Green 7507186b94 [ARM] Additional cast cost tests.
This adds additional cast cpst tests useful for MVE, notably around half
types.
2020-06-14 14:30:07 +01:00
Simon Pilgrim 28947bc23c [CostModel][X86] Add broadcast costs for vXi1 bool vectors
Doesn't mean much on non-AVX512 targets but better to keep with the other shuffles
2020-06-10 15:27:15 +01:00
Sam Parker e70cf280f8 [NFC][ARM][AArch64] Test runs
Add code size tests runs for memory ops for both architectures.
2020-06-02 09:05:30 +01:00
Sam Parker 792575ff32 [NFC][ARM][AArch64] More code size tests
Add analysis runs for icmp, fcmp and select instructions.
2020-05-26 14:47:02 +01:00
Sam Parker c5bbc8dd6d [NFC][ARM] Fix for previous commit
Actually analyse code-size for the size runs...
2020-05-26 10:45:35 +01:00
Sam Parker 48cdbd081c [NFC][ARM] Add code size analysis tests
Add code size runs for the cast costs.
2020-05-26 10:30:43 +01:00
Sam Parker 64cfb8a864 [NFC][ARM] Add intrinsic code size runs
Add code size analysis of arithmetic intrinsics.
2020-05-26 09:41:54 +01:00
Sam Parker 1f72d5880e [CostModel] Check for free intrinsics in BasicTTI
Recommitting part of "[CostModel] Unify Intrinsic Costs."
de71def3f5

Now that the 'free' intrinsic information has been sunk to the lowest
level, query the base implementation in BasicTTI before doing
anything else. I suspect this is the change that was causing the main
changes, particularly the large effects on debug builds.

Differential Revision: https://reviews.llvm.org/D80012
2020-05-26 08:37:13 +01:00
Sam Parker fb3ba38021 [CostModel] Remove getExtCost
This has not been implemented by any backends which appear to cover
the functionality through getCastInstrCost. Sink what there is in the
default implementation into BasicTTI.

Differential Revision: https://reviews.llvm.org/D78922
2020-05-21 07:18:06 +01:00
Eli Friedman 11aa3707e3 StoreInst should store Align, not MaybeAlign
This is D77454, except for stores.  All the infrastructure work was done
for loads, so the remaining changes necessary are relatively small.

Differential Revision: https://reviews.llvm.org/D79968
2020-05-15 12:26:58 -07:00
Sam Parker 0ef62fc25d [NFC][ARM] Intrinsic CostModel Tests
Add throughput tests for saturating, overflowing and reduction
operations.
2020-05-15 13:38:42 +01:00
Stanislav Mekhanoshin 184b383457 Add v16f64 value type
We need to use it to handle <16 x double> indirect indexes
in the AMDGPU BE.

The only visible change from adding it is in ARM cost model.
To me it looks reasonable. With doubling a vector size it
quadruples the cost up to the size 8 and then it did only
double it. Now it also quadruples, which seems a logical
progression to me.

Actual AMDGPU code is to follow, this is a common part, plus
load/store legalization in the AMDGPU BE not to break what
works now.

Differential Revision: https://reviews.llvm.org/D79952
2020-05-14 14:28:00 -07:00
Sam Parker 6bbad7285c [CostModel] Modify BasicTTI getCastInstrCost
Fix the assumption that all bitcasts of the same type sizes are free.
We now only assume that bitcasts between ints and ptrs of the same
size are free. This allows TTImpl to just call the concrete
implementation of getCastInstrCost.

Differential Revision: https://reviews.llvm.org/D78918
2020-05-13 07:26:08 +01:00
Sam Parker f1f8cffce4 [NFC][AArch64] More casts tests...
Don't use truncs are users because sometimes they're free too.
2020-05-12 13:06:17 +01:00
Sam Parker e114bdf072 [NFC][AArch64] More cast cost tests
Add truncating stores and casts with users.
2020-05-12 11:32:52 +01:00
Sam Parker b4a8091a11 [ARM][CostModel] Improve getCastInstrCost
- Specifically check for sext/zext users which have 'long' form NEON
  instructions.
- Add more entries to the table for sext/zexts so that we can report
  more accurately the number of vmovls required for NEON.
- Pass the instruction to the pass implementation.

Differential Revision: https://reviews.llvm.org/D79561
2020-05-12 10:32:20 +01:00