Commit Graph

430 Commits

Author SHA1 Message Date
Nikita Popov 8e1a464e6a [CodeGen][X86] Expand UADDSAT to NOT+UMIN+ADD
Followup to D56636, this time handling the UADDSAT case by expanding
uadd.sat(a, b) to umin(a, ~b) + b.

Differential Revision: https://reviews.llvm.org/D56869

llvm-svn: 352409
2019-01-28 19:19:09 +00:00
Simon Pilgrim adca820927 [TTI] Add generic SADDSAT/SSUBSAT costs
Add generic costs calculation for SADDSAT/SSUBSAT intrinsics, this uses generic costs for sadd_with_overflow/ssub_with_overflow, an extra sign comparison + a selects based on the sign/overflow.

This completes PR40316

Differential Revision: https://reviews.llvm.org/D57239

llvm-svn: 352315
2019-01-27 13:51:59 +00:00
Nemanja Ivanovic 7d007ddedf [PowerPC] Update Vector Costs for P9
For the power9 CPU, vector operations consume a pair of execution units rather
than one execution unit like a scalar operation. Update the target transform
cost functions to reflect the higher cost of vector operations when targeting
Power9.

Patch by RolandF.

Differential revision: https://reviews.llvm.org/D55461

llvm-svn: 352261
2019-01-26 01:18:48 +00:00
Simon Pilgrim 30b206b5da [CostModel][X86] Add SMUL fixed point cost tests
llvm-svn: 352046
2019-01-24 13:48:20 +00:00
Simon Pilgrim 47ca8606ba [TTI] Add generic SADDO/SSUBO costs
Added x86 scalar sadd_with_overflow/ssub_with_overflow costs.

llvm-svn: 352045
2019-01-24 13:36:45 +00:00
Simon Pilgrim a131e4e296 [TTI] Add generic UADDSAT/USUBSAT costs
Add generic costs calculation for UADDSAT/USUBSAT intrinsics, this fallbacks to using generic costs for uadd_with_overflow/usub_with_overflow + a select.

Differential Revision: https://reviews.llvm.org/D56907

llvm-svn: 352044
2019-01-24 12:27:10 +00:00
Simon Pilgrim 2d1964b90f [TTI] Add generic UADDO/USUBO costs
Added x86 scalar uadd_with_overflow/usub_with_overflow costs.

Differential Revision: https://reviews.llvm.org/D56907

llvm-svn: 352043
2019-01-24 12:10:20 +00:00
Simon Pilgrim f87226eb70 [IR] Match intrinsic parameter by scalar/vectorwidth
This patch replaces the existing LLVMVectorSameWidth matcher with LLVMScalarOrSameVectorWidth.

The matching args must be either scalars or vectors with the same number of elements, but in either case the scalar/element type can differ, specified by LLVMScalarOrSameVectorWidth.

I've updated the _overflow intrinsics to demonstrate this - allowing it to return a i1 or <N x i1> overflow result, matching the scalar/vectorwidth of the other (add/sub/mul) result type.

The masked load/store/gather/scatter intrinsics have also been updated to use this, although as we specify the reference type to be llvm_anyvector_ty we guarantee the mask will be <N x i1> so no change in behaviour

Differential Revision: https://reviews.llvm.org/D57090

llvm-svn: 351957
2019-01-23 16:00:22 +00:00
Simon Pilgrim ee900efb30 [CostModel][X86] Add ICMP Predicate specific costs
First step towards PR40376, this patch adds support for getCmpSelInstrCost to use the (optional) Instruction CmpInst predicate to indicate the type of integer comparison we're performing and alter the costs accordingly.

Differential Revision: https://reviews.llvm.org/D57013

llvm-svn: 351810
2019-01-22 12:29:38 +00:00
Simon Pilgrim 44feb4a87b [CostModel][X86] Add XOP icmp cost tests (PR40376)
llvm-svn: 351741
2019-01-21 11:33:52 +00:00
Simon Pilgrim c934d3a01b [CostModel][X86] Add explicit vector select costs
Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)|(Y & ~C)) bit select.

Exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason).

The increase pre-SSE41 selection costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests as well to the existing SSE2 tests.

llvm-svn: 351685
2019-01-20 13:55:01 +00:00
Simon Pilgrim 1231904c48 [CostModel][X86] Add explicit fcmp costs for pre-SSE42 targets
Typical throughputs: cmpss/cmpps = 1cy and cmpsd/cmppd = 2cy before the Core2 era

llvm-svn: 351684
2019-01-20 13:21:43 +00:00
Simon Pilgrim 60e5a3accb [CostModel][X86] Split icmp/fcmp costs tests and test all comparison codes
llvm-svn: 351682
2019-01-20 12:10:42 +00:00
Simon Pilgrim 5d7182ecb6 [CostModel][X86] Add masked load/store/gather/scatter tests for SSE2/SSE42/AVX1 targets
llvm-svn: 351681
2019-01-20 11:23:01 +00:00
Simon Pilgrim a8b009fd14 [CostModel][X86] Add non-constant vselect cost tests
Also add AVX512 costs at the same time

llvm-svn: 351680
2019-01-20 11:19:35 +00:00
Nikita Popov d3b86b79fa Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.

Reapplying with updated SLPVectorizer tests.

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351219
2019-01-15 18:43:41 +00:00
Nikita Popov 5885eec35a Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
This reverts commit r351125.

I missed test changes in an SLPVectorizer test, due to the cost model
changes. Reverting for now.

llvm-svn: 351129
2019-01-14 22:18:39 +00:00
Nikita Popov 8e9a8432a8 [CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351125
2019-01-14 21:43:30 +00:00
Simon Pilgrim c2054144ee [CostModel][X86] Fix SSE1 FADD/FSUB costs
Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4

Add the other costs so that we're not relying on the default "is legal/custom" cost logic.

llvm-svn: 350403
2019-01-04 16:55:57 +00:00
Simon Pilgrim 71d61567c0 [CostModel][X86] Add SSE1 fp cost tests
llvm-svn: 350401
2019-01-04 16:37:01 +00:00
Simon Pilgrim f0c533b7db [CostModel][X86] Add truncate cost tests to cover all legal destination types
We were only testing costs for legal source vector element counts

llvm-svn: 350323
2019-01-03 14:49:39 +00:00
Simon Pilgrim d824f99a6c [X86] Add ADD/SUB SSAT/USAT vector costs (PR40123)
Costs for real SSE2 instructions

llvm-svn: 350295
2019-01-03 11:38:42 +00:00
Simon Pilgrim 55ea89305d [X86] Add ADD/SUB SSAT/USAT cost tests (PR40123)
llvm-svn: 350293
2019-01-03 11:29:24 +00:00
Craig Topper c6bfb05762 [CostModel][X86] Don't count 2 shuffles on the last level of a pairwise arithmetic or min/max reduction
This is split from D55452 with the correct patch this time.

Pairwise reductions require two shuffles on every level but the last. On the last level the two shuffles are <1, u, u, u...> and <0, u, u, u...>, but <0, u, u, u...> will be dropped by InstCombine/DAGCombine as being an identity shuffle.

Differential Revision: https://reviews.llvm.org/D55615

llvm-svn: 349072
2018-12-13 19:08:10 +00:00
Craig Topper 84d62ce6c1 [CostModel][X86][AArch64] Adjust cost of the scalarization part of min/max reduction.
Summary: The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above. Aren't we just extracting the single element after taking the min/max in the vector register?

Reviewers: RKSimon, spatel, ABataev

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D55480

llvm-svn: 348739
2018-12-10 06:58:58 +00:00
Craig Topper d1498ed8df [CostModel][X86] Fix overcounting arithmetic cost in illegal types in getArithmeticReductionCost/getMinMaxReductionCost
We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is half the width.

So for example on 8i32 on an sse target. Were were calculating the cost of an 8i32 op which is likely 2 for basic integer. Then after the loop we count 2 more v4i32 ops. For a total arith cost of 4. But if you look at the assembly there would only be 3 arithmetic ops.

There are still more bugs in this code that I'm going to work on next. The non pairwise code shouldn't count extract subvectors in the loop. There are no extracts, the types are split in registers. For pairwise we need to use 2 two src permute shuffles.

Differential Revision: https://reviews.llvm.org/D55397

llvm-svn: 348621
2018-12-07 18:20:56 +00:00
Craig Topper 381b4fb0ab [X86] Remove -costmodel-reduxcost=true from the experimental vector reduction intrinsic tests as it appears to be unnecessary. NFC
I think this has something to do with matching reductions from extractelement, binops, and shuffles. But we're not matching here.

llvm-svn: 348340
2018-12-05 07:56:50 +00:00
Craig Topper b4719e5842 [X86] Add more cost model tests for vector reductions with narrow vector types. NFC
llvm-svn: 348339
2018-12-05 07:26:57 +00:00
Jonas Paulsson 8ae0f88b13 [SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test.
A loaded value with multiple users compared with 0 will become a load and
test single instruction. The load is not folded in this case (multiple
users), but the compare instruction is eliminated.

This patch returns 0 cost for the icmp in these cases.

Review: Ulrich Weigand
https://reviews.llvm.org/D55111

llvm-svn: 348141
2018-12-03 14:30:18 +00:00
Simon Pilgrim 102854f4d4 [TTI] Reduction costs only need to include a single extract element cost (REAPPLIED)
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.

For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.

Fixes PR37731

Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997

Differential Revision: https://reviews.llvm.org/D54585

llvm-svn: 348076
2018-12-01 14:18:31 +00:00
Craig Topper 8e10e9423d [X86] Replace '-mcpu=skx' with -mattr=avx512f or -mattr=avx512bw in interleave/strided load/store cost model tests.
llvm-svn: 348056
2018-12-01 00:21:49 +00:00
Jonas Paulsson b1d014883c [SystemZ::TTI] i8/i16 operands extension costs revisited
Three minor changes to these extra costs:

* For ICmp instructions, instead of adding 2 all the time for extending each
  operand, this is only done if that operand is neither a load or an
  immediate.

* The operands extension costs for divides removed, because we now use a high
  cost already for the divide (20).

* The costs for lhsr/ashr extra costs removed as this did not seem useful.

Review: Ulrich Weigand
https://reviews.llvm.org/D55053

llvm-svn: 347961
2018-11-30 07:09:34 +00:00
Craig Topper 81f1b4a361 [X86] Make X86TTIImpl::getCastInstrCost properly handle the case where AVX512 is enabled, but 512-bit vectors aren't legal.
Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal.

This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization. Meaning 512 bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if prefered, I can switch to just using is512BitVector and the subtarget feature.

Differential Revision: https://reviews.llvm.org/D54984

llvm-svn: 347786
2018-11-28 18:11:42 +00:00
Craig Topper d3bb036bc9 [X86] Add some cost model entries for sext/zext for avx512bw
This fixes some of scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported. Just the worst.

I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types.

Differential Revision: https://reviews.llvm.org/D54979

llvm-svn: 347785
2018-11-28 18:11:39 +00:00
Craig Topper f3b6f583e2 [X86] Add a combine for back to back VSRAI instructions
Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI

Differential Revision: https://reviews.llvm.org/D54959

llvm-svn: 347784
2018-11-28 18:03:38 +00:00
Jonas Paulsson 06acb3a236 [SystemZ::TTI] Improve cost for compare of i64 with extended i32 load
CGF/CLGF compares an i64 register with a sign/zero extended loaded i32 value
in memory.

This patch makes such a load considered foldable and so gets a 0 cost.

Review: Ulrich Weigand
https://reviews.llvm.org/D54944

llvm-svn: 347735
2018-11-28 08:58:27 +00:00
Jonas Paulsson d6b7aca911 [SystemZ::TTI] Improve costs for i16 add, sub and mul against memory.
AH, SH and MH costs are already covered in the cases where LHS is 32 bits and
RHS is 16 bits of memory sign-extended to i32.

As these instructions are also used when LHS is i16, this patch recognizes
that the loads will get folded then as well.

Review: Ulrich Weigand
https://reviews.llvm.org/D54940

llvm-svn: 347734
2018-11-28 08:31:50 +00:00
Jonas Paulsson 011a503f25 [SystemZ::TTI] Improved cost values for comparison against memory.
Single instructions exist for i8 and i16 comparisons of memory against a
small immediate.

This patch makes sure that if the load in these cases has a single user (the
ICmp), it gets a 0 cost (folded), and also that the ICmp gets a cost of 1.

Review: Ulrich Weigand
https://reviews.llvm.org/D54897

llvm-svn: 347733
2018-11-28 08:08:05 +00:00
Jonas Paulsson 5da8e432b9 [SystemZ::TTI] Return zero cost for scalar load/store connected with a bswap.
Since byte-swapping loads and stores are supported, a 'load -> bswap' or
'bswap -> store' sequence should have the cost of one.

Review: Ulrich Weigand
https://reviews.llvm.org/D54870

llvm-svn: 347732
2018-11-28 07:52:34 +00:00
Craig Topper 078e58da15 [X86] Add test cases to show that we don't properly take -mprefer-vector-width=256 and -min-legal-vector-width=256 into account when costing sext/zext.
The check lines marked AVX256 in the zext256/sext256 functions should be closer to the AVX values which would take into account a splitting cost.

llvm-svn: 347722
2018-11-28 00:33:34 +00:00
Craig Topper 5e7dcc65bd [X86] Add exhaustive cost model testing for sext/zext for all vector types we reasonably support. Add cost model tests for truncating to vXi1.
Our sext/zext cost modeling was somewhat incomplete. And had no coverage for the fact that avx512bw v32i16/v64i8 types return a scalarization cost.

Truncates are a whole different mess because isTruncateFree is returning true for vectors when it shouldn't and that's the fall back for anything not in the tables.

llvm-svn: 347719
2018-11-27 22:46:05 +00:00
Craig Topper e535babe4c [X86] Add cost model tests for experimental.vector.reduce.* with -x86-experimental-vector-widening-legalization
llvm-svn: 347697
2018-11-27 19:44:40 +00:00
Craig Topper 2f9e5c40a6 [X86] Add cost model test for masked load an store with -x86-experimental-vector-widening-legalization
llvm-svn: 347696
2018-11-27 19:44:36 +00:00
Craig Topper 69cf86e327 [X86] Add cost model tests for fp_to_int/int_to_fp with -x86-experimental-vector-widening-legalization
llvm-svn: 347695
2018-11-27 19:44:34 +00:00
Craig Topper 1a2f9f765e [X86] Add cost model tests for shifts with -x86-experimental-vector-widening-legalization.
llvm-svn: 347694
2018-11-27 19:44:30 +00:00
Fedor Sergeev 8cd9d1b5ce Revert "[TTI] Reduction costs only need to include a single extract element cost"
This reverts commit r346970.
It was causing PR39774, a crash in slp-vectorizer on a rather simple loop
with just a bunch of 'and's in the body.

llvm-svn: 347541
2018-11-26 10:17:27 +00:00
Jonas Paulsson 96782c2c0b [SystemZTTIImpl] Give correct cost values for vector bswap intrinsics.
Implement getIntrinsicInstrCost() and return costs reflecting that bswap can
be done with a vperm per vector register.

Review: Ulrich Weigand
https://reviews.llvm.org/D54789

llvm-svn: 347445
2018-11-22 07:17:29 +00:00
Simon Pilgrim 924f193419 [TTI] Reduction costs only need to include a single extract element cost
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.

For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.

Fixes PR37731

Differential Revision: https://reviews.llvm.org/D54585

llvm-svn: 346970
2018-11-15 17:42:53 +00:00
Simon Pilgrim 2b166c5044 [TTI] getOperandInfo - a broadcast shuffle means the result is OK_UniformValue
llvm-svn: 346868
2018-11-14 15:04:08 +00:00
Simon Pilgrim cdb170794b [CostModel] Add generic expansion funnel shift cost support
Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost.

This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs.

llvm-svn: 346854
2018-11-14 12:24:50 +00:00