Commit Graph

3545 Commits

Author SHA1 Message Date
Simon Pilgrim c444af1c20 [CostModel][X86] Add CostKinds handling for mul ops
This was achieved using the 'cost-tables vs llvm-mca' script D103695

Also fix a missing pmullw v16i16 half-rate throughput as znver1 double-pumps - matches numbers from AMD SoG + Agner
2022-09-04 11:59:05 +01:00
Simon Pilgrim c4a174be91 [CostModel][X86] Add vector shift test coverage for codesize/latency/size-latency cost kinds 2022-09-04 11:02:30 +01:00
Simon Pilgrim 444685de06 [CostModel][X86] Adjust mul v4i32/v8i32 throughput cost
Based off the numbers from AMD SoG + Agner - vXi32 are both half-rate, and znver1 double-pumps the v8i32 op

We should have caught this earlier as many Intel models have half-rate pmulld already :-(
2022-09-03 18:45:08 +01:00
Simon Pilgrim 114b7762a9 [CostModel][X86] Add CostKinds handling for add/sub ops
This was achieved using the 'cost-tables vs llvm-mca' script D103695
2022-09-03 18:45:08 +01:00
Simon Pilgrim 5aee2726d8 [CostModel][X86] Add CostKinds handling for fdiv ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695

As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'

As the uop count (used for TCK_SizeAndLatency) for divss/divps is typically so low, we need to override isExpensiveToSpeculativelyExecute to ensure we keep fdiv calls behind branches - although for some very recent cpu targets it might not be necessary any more and could be relaxed.
2022-09-03 15:48:39 +01:00
Simon Pilgrim 1c12e12111 [CostModel][X86] Add fdiv(double) throughput x87 costs for 2022-09-03 14:08:25 +01:00
Simon Pilgrim 0735200e3f [CostModel][X86] Add CostKinds handling for fmul ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695

As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'
2022-09-03 10:42:20 +01:00
Eli Friedman b219a9c0a2 [CostModel][AArch64] Fix ctpop intrinsic cost when NEON is disabled.
If we don't have NEON, we use the generic fallback, which takes 12
instructions. Make sure the costs reflect that.

(On a related note, we could optimize the generic fallback a bit. It
currently uses sequences like lsr+and+add; if we use and+lsr+add
instead, we can fold the lsr into the add.)

Differential Revision: https://reviews.llvm.org/D133154
2022-09-02 15:17:55 -07:00
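The reordering suggested in b219a9c0a2 can be checked on one step of a bit-parallel population count. The sketch below assumes the generic fallback follows the classic mask-and-add scheme (an assumption, not taken from the commit); all function names are hypothetical and Python integers stand in for 64-bit registers.

```python
MASK64 = (1 << 64) - 1
M2 = 0x3333333333333333  # selects the low 2 bits of every 4-bit field


def step_lsr_and_add(x: int) -> int:
    """Shift first, then mask: lsr + and + add."""
    return ((x & M2) + ((x >> 2) & M2)) & MASK64


def step_and_lsr_add(x: int) -> int:
    """Mask first, then shift: and + lsr + add.

    The shift is now the last operation before the add, so on AArch64 it
    could be expressed as a shifted-register operand of the add itself.
    """
    high_fields = x & ((M2 << 2) & MASK64)
    return ((x & M2) + (high_fields >> 2)) & MASK64


def popcount64(x: int, pair_step=step_lsr_and_add) -> int:
    """Classic bit-parallel popcount with a pluggable 2-bit -> 4-bit step."""
    x &= MASK64
    x = x - ((x >> 1) & 0x5555555555555555)   # 2-bit partial sums
    x = pair_step(x)                           # 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F    # 8-bit partial sums
    return ((x * 0x0101010101010101) & MASK64) >> 56
```

Both orderings produce the same partial sums because masking the high fields and then shifting selects exactly the bits that shifting and then masking would.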
Mingming Liu 242203d254 [AArch64][TTI] Add cost table entry for trunc over vector of integers.
1) Tablegen patterns exist to use 'xtn' and 'uzp1' for trunc [1]. Cost table entries are updated based on the actual number of {xtn, uzp1} instructions generated.
2) Without this, an IR instruction like trunc <8 x i16> %v to <8 x i8> is considered free and might be sunk to other basic blocks. As a result, the sunk 'trunc' ends up in a different basic block from its (usually not-free) vector operand and misses the chance to be combined during instruction selection. (examples in [2])
3) It's a lot of effort to teach CodeGenPrepare.cpp to sink the operand of trunc without introducing regressions, since the instruction that computes the operand of trunc could be faster (e.g., in throughput) than the instruction corresponding to "trunc (bin-vector-op)". For instance in [3], sinking %1 (as trunc operand) into bb.1 and bb.2 means replacing 2 xtn with 2 shrn (shrn has a throughput of 1 and only utilizes the V1 pipeline), which is not necessarily good, especially since the ushr result needs to be preserved for the store operation in bb.0. Meanwhile, it's too optimistic (for the CodeGenPrepare pass) to assume machine-cse will always be able to de-dup shrn from various basic blocks into one shrn.

[1] For {v8i16->v8i8, v4i32->v4i16, v2i64->v2i32}, 813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L4472).
    For concat (trunc, trunc) -> uzp1, 813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L5428-L5437)
[2] examples
    - trunc(umin(X, 255)) -> UQXTRN v8i8 (and other {u,s}x{min,max} pattern for v8i16 operands) from 813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L4515-L4528)
    - trunc (AArch64vlshr v8i16, imm) -> SHRNv8i8 (same missed for SHRNv2i32) from 813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L6743-L6748)
[3]
    ---
    ; instruction latency / throughput / pipeline on `neoverse-n1`
    bb.0:
      %1 = lshr <8 x i16> %10, <i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4>   ; ushr, latency 2, throughput 1, pipeline V1
      %2 = trunc <8 x i16> %1 to <8 x i8>  ; xtn, latency 2, throughput 2, pipeline V
      store <8 x i16> %1, ptr %addr
      br i1 %cond, label %bb.1, label %bb.2

    bb.1:
      %4 = trunc <8 x i16> %1 to <8 x i8> ; xtn

    bb.2:
      %5 = trunc <8 x i16> %1 to <8 x i8> ; xtn
    ---

Differential Revision: https://reviews.llvm.org/D132784
2022-09-02 10:06:55 -07:00
Simon Pilgrim 116d8f8cf0 Revert rG11765b77be84d793ebedc5b5436c463490746131 "[CostModel][X86] Add CostKinds handling for fmul ops"
I need to address some x87 codegen changes before re-committing this.
2022-09-02 17:21:25 +01:00
Simon Pilgrim 11765b77be [CostModel][X86] Add CostKinds handling for fmul ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695

As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'
2022-09-02 16:57:23 +01:00
Simon Pilgrim ad16f3e413 [CostModel][X86] Add CostKinds handling for fadd/fsub/fneg ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 which I'll update shortly

As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'
2022-09-02 11:50:01 +01:00
liqinweng 62a238a1f8 [RISCV][NFC] Add cost model tests of llvm.fmuladd
Reviewed By: benshi001

Differential Revision: https://reviews.llvm.org/D132922
2022-09-01 15:33:02 +08:00
Mingming Liu 75f1b328f8 [AArch64][CostModel][NFC] Specify target datalayout explicitly for cost analysis test.
- Use linux little endian data layout string.

Differential Revision: https://reviews.llvm.org/D132889
2022-08-31 09:52:24 -07:00
Simon Pilgrim ad8e4dd2ad [CostModel][X86] Add and/or/xor general cost kinds support
Account for double-pumping on early AVX1/AVX2 targets
2022-08-31 17:26:05 +01:00
jacquesguan 45c1ce321d [RISCV] Add cost model for select and integer compare instructions.
This patch adds a cost model for vector select and integer compare instructions.
2022-08-31 11:32:58 +08:00
jacquesguan 4fd53fd8eb [RISCV][test] Add cost model coverage for compare instructions.
Differential Revision: https://reviews.llvm.org/D132827
2022-08-31 10:59:57 +08:00
Mingming Liu 3785234b03 [NFC][AArch64] Specify datalayout explicitly for cast.ll and
arith-overflow.ll and update tests accordingly.

- These two tests stand out when the data layout is explicitly added in a
  sweep study (D132889)

Differential Revision: https://reviews.llvm.org/D132856
2022-08-30 09:43:29 -07:00
Simon Pilgrim 7830445086 [CostModel][X86] Account for add/sub 512-bit vector splitting costs on non-AVX512BW targets 2022-08-30 16:54:06 +01:00
jacquesguan ae4422982a [RISCV][NFC] Add cost model coverage for fp arithmetic instructions.
This patch adds cost model coverage for fp arithmetic instructions. Some of the costs are not exact; I am working on a revision to address that.

Differential Revision: https://reviews.llvm.org/D132537
2022-08-29 11:06:42 +08:00
Arthur Eubanks 7a94d189ad [LazyCallGraph] Update libcall list when replacing a libcall node's function
Otherwise when we visit all libcalls in
updateCGAndAnalysisManagerForPass(), the old libcall is dead and doesn't
have a node.

We treat libcalls conservatively in LazyCallGraph because any function
may introduce calls to them out of thin air.

It is weird to change the signature of a libcall since introducing calls
to the libcall with a different signature may break, but other passes
like deadargelim already do it, so let's preserve this behavior for now.

Fixes an issue found in D128830.

Reviewed By: psamolysov

Differential Revision: https://reviews.llvm.org/D132764
2022-08-27 10:57:53 -07:00
Philip Reames b45a262679 [RISCV] Enable fixed length vectors and loop vectorization with same
This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize). Specifically, this enables fixed length vectors which are known to be inbounds of the underlying variable hardware size.

For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit mcpu we can vectorize to the exact target hardware.

The LV impact is mostly related to vectorizer robustness. In cases we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization.

SLP has been disabled for now, even when fixed vectors are enabled. See a310637 and associated review. There are a few additional code quality issues which need to be worked through before turning SLP on would be reasonable.

Differential Revision: https://reviews.llvm.org/D131508
2022-08-26 14:45:23 -07:00
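One way to read "inbounds of the underlying variable hardware size" in b45a262679 is that a fixed length vector is usable when it fits in a register group sized by the guaranteed minimum VLEN. The sketch below is only an illustration under that assumption: it ignores fractional LMUL, the helper name is hypothetical, and it is not the legality check LLVM actually performs.

```python
MAX_LMUL = 8  # RVV groups at most 8 vector registers together


def register_group_size(num_elts, elt_bits, min_vlen=128):
    """Smallest power-of-two LMUL (>= 1) whose registers hold the vector.

    Returns None when the vector does not fit even at LMUL = 8, in which
    case it cannot be mapped onto a single register group.
    """
    total_bits = num_elts * elt_bits
    lmul = 1
    while lmul * min_vlen < total_bits:
        lmul *= 2
    return lmul if lmul <= MAX_LMUL else None
```

With the +V minimum of 128 bits, `<4 x i32>` fits in one register and `<8 x i32>` needs a group of two. An embedded variant with a lower minimum (64 is used here purely as an example value) would need larger groups for the same types.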
Simon Pilgrim 0d0dc4e6ab [CostModel][X86] Add CodeSize handling for and/or/xor ops
Eventually this will be part of the cost table lookup
2022-08-26 18:42:52 +01:00
Simon Pilgrim f9445ae75c [CostModel][X86] Add CodeSize handling for fneg ops
Eventually this will be part of the cost table lookup
2022-08-26 17:34:52 +01:00
Paul Walker 3bb228729f [CostModel][SVE] Correct cost model of SK_Splice shuffles for <vscale x 1 x Ty> vector types.
AArch64TTIImpl::getSpliceCost() is now used more aggressively and
LNT (MultiSource/Benchmarks/mafft) exposed a failure case for
<vscale x 1 x i1>. I've tested other element types and whilst they
can be costed they cannot be code generated, so this patch returns
InstructionCost::getInvalid() for all cases.
2022-08-26 16:06:01 +01:00
Florian Hahn 555e09c2b0 [LAA] Rename printing pass to print<access-info>.
This updates the naming for the LAA printing pass to be in line with
most other analysis printing passes.

The old name has come up as confusing multiple times already, e.g. in
D131924.
2022-08-26 11:00:09 +01:00
Philip Reames 53f738ce7e [RISCV] Add empirical costs for integer min/max and saturating add/sub
All of these are lowered to a single instruction for all legal vector types.
2022-08-25 09:27:17 -07:00
Philip Reames ca5f8b0909 [RISCV][CostModel] Correct typo in saturating intrinsic names
The fact that we silently accept unrecognized intrinsic names is sometimes a bit annoying.
2022-08-25 09:10:22 -07:00
Philip Reames 4006928669 [RISCV][CostModel] Add test coverage for all the vectorizable binary intrinsics 2022-08-25 08:56:02 -07:00
Simon Pilgrim 3edec9ba60 [CostModel][X86] Support cost kind specific look up tables (REAPPLIED)
Most of our cost model tables have been created assuming cost kind == recip-throughput. But we're starting to see passes wanting to get accurate costs for the other kinds as well. Some of these can be determined procedurally (e.g. codesize by default could just be the split count after type legalization), but others are going to need to be handled in cost tables - this is especially true for x86 which has so many ISA combinations.

I've created a 'CostKindCosts' struct which can hold cost values for the 4 cost kinds, defaulting to -1U for unknown cost; this can be used with the existing CostTblEntryT/CostTableLookup template code. I've also added a [TargetCostKind] accessor to make it much easier to look up individual <Optional> costs.

This just changes the ISD::SELECT costs to check the effect (and also to check that the ISD::SETCC are correctly handled for default/None cost kinds) - the plan would be to slowly extend this and move the CostKindTblEntry type somewhere generic to allow other targets to use it once it's matured.

I'm also going to resurrect D103695 so that it can help with latency/codesize/sizelatency coverage testing.

For sizelatency - IIRC the definition was vague to let it be target specific - I've tried to use typical uop counts so they're comparable to MicroOpBufferSize etc.

REAPPLIED: Added early out to prevent getCmpSelInstrCost being used for anything but generic integer/float scalar/vector types - getTypeLegalizationCost can't handle the "exotic" TypeID enums that some passes attempt to get costs for (aggregates etc.).

Differential Revision: https://reviews.llvm.org/D132216
2022-08-25 16:49:17 +01:00
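The per-cost-kind table entry described in 3edec9ba60 can be modelled roughly as follows. This is a Python sketch of the idea, not the C++ struct from the patch: the field names, the enum ordering, the `lookup` helper and the table contents are all assumptions, and the numbers are placeholders rather than real X86 costs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

UNKNOWN = 0xFFFFFFFF  # stands in for the -1U "unknown cost" default


class TargetCostKind(Enum):
    RECIP_THROUGHPUT = 0
    LATENCY = 1
    CODE_SIZE = 2
    SIZE_AND_LATENCY = 3


@dataclass(frozen=True)
class CostKindCosts:
    recip_throughput: int = UNKNOWN
    latency: int = UNKNOWN
    code_size: int = UNKNOWN
    size_and_latency: int = UNKNOWN

    def __getitem__(self, kind: TargetCostKind) -> Optional[int]:
        """Return the cost for one kind, or None if the table has no value."""
        values = (self.recip_throughput, self.latency,
                  self.code_size, self.size_and_latency)
        value = values[kind.value]
        return None if value == UNKNOWN else value


def lookup(table, opcode, vt, kind) -> Optional[int]:
    """Find the entry for (opcode, type) and extract the requested kind."""
    entry = table.get((opcode, vt))
    return None if entry is None else entry[kind]


# Placeholder entries for illustration only.
EXAMPLE_TABLE = {
    ("SELECT", "v4i32"): CostKindCosts(1, 2, 1, 2),
    ("SETCC", "v4i32"): CostKindCosts(recip_throughput=1),
}
```

A caller that gets `None` back would fall through to whatever generic costing it used before, which is what lets a table be filled in one cost kind at a time.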
Benjamin Kramer ab85996e47 Revert "[CostModel][X86] Support cost kind specific look up tables"
This reverts commit 45846854a2.

This triggers an assertion failure during Clang selfhost

Unknown type!
UNREACHABLE executed at llvm/lib/CodeGen/ValueTypes.cpp:548!
*** SIGABRT received by PID 6107 (TID 6107) on cpu 218 from PID 6107; stack trace: ***
    @     0x556c8827c2d1         64  llvm::llvm_unreachable_internal()
    @     0x556c82a5542a         32  llvm::MVT::getVT()
    @     0x556c82a54a28         80  llvm::EVT::getEVT()
    @     0x556c7dda1526         80  llvm::TargetLoweringBase::getValueType()
    @     0x556c8174dd38        112  llvm::BasicTTIImplBase<>::getTypeLegalizationCost()
    @     0x556c81755e72        144  llvm::X86TTIImpl::getCmpSelInstrCost()
    @     0x556c8174cadf        512  llvm::TargetTransformInfoImplCRTPBase<>::getInstructionCost()
    @     0x556c84ab4dd2         32  llvm::TargetTransformInfo::getInstructionCost()
    @     0x556c82ead283       1968  llvm::sinkRegion()
2022-08-25 15:42:44 +02:00
Simon Pilgrim 2e5f16516a [CostModel][X86] Add CodeSize handling for fdiv ops
Eventually this will be part of the cost table lookup
2022-08-25 14:08:03 +01:00
Simon Pilgrim 45846854a2 [CostModel][X86] Support cost kind specific look up tables
Most of our cost model tables have been created assuming cost kind == recip-throughput. But we're starting to see passes wanting to get accurate costs for the other kinds as well. Some of these can be determined procedurally (e.g. codesize by default could just be the split count after type legalization), but others are going to need to be handled in cost tables - this is especially true for x86 which has so many ISA combinations.

I've created a 'CostKindCosts' struct which can hold cost values for the 4 cost kinds, defaulting to -1U for unknown cost; this can be used with the existing CostTblEntryT/CostTableLookup template code. I've also added a [TargetCostKind] accessor to make it much easier to look up individual <Optional> costs.

This just changes the ISD::SELECT costs to check the effect (and also to check that the ISD::SETCC are correctly handled for default/None cost kinds) - the plan would be to slowly extend this and move the CostKindTblEntry type somewhere generic to allow other targets to use it once it's matured.

I'm also going to resurrect D103695 so that it can help with latency/codesize/sizelatency coverage testing.

For sizelatency - IIRC the definition was vague to let it be target specific - I've tried to use typical uop counts so they're comparable to MicroOpBufferSize etc.

Differential Revision: https://reviews.llvm.org/D132216
2022-08-25 12:23:36 +01:00
Philip Reames 0473ac8876 [RISCV] Remove cttz/ctlz cost model coverage for the moment
I'd used the wrong signatures for these as I had not remembered we'd added the second boolean argument.  We appear to still parse the old form just fine, but we should use the two argument form in test since that's what the LangRef actually describes.
2022-08-24 15:55:46 -07:00
Philip Reames 03798f268b [RISCV] Back out cttz/ctlz instruction costs
Craig points out correctly in post-commit review that these depend on the availability of floating point extensions.
2022-08-24 15:40:48 -07:00
Philip Reames d4d6e71ea2 [RISCV] Add empirical costs for bswap/bitreverse/ctpop/ctlz/cttz
If anyone is looking for a source of ideas on vector codegen improvements, the lowerings for several of these seem to include pretty obvious fixits.
2022-08-24 15:09:21 -07:00
Philip Reames cb3f32a20d [RISCV] Add cost model coverage for integer bitmanip intrinsics 2022-08-24 15:09:21 -07:00
Philip Reames 42af1a776a [RISCV] Add empirically measured vector sqrt intrinsic costs 2022-08-24 14:27:57 -07:00
Philip Reames 0ad88e74eb [RISCV] Add cost model coverage for floating point sqrt intrinsic 2022-08-24 14:27:27 -07:00
Philip Reames 4d3134866f [RISCV] Add vector fabs intrinsic costs
We have a fabs vector instruction, and are using it for current lowering.
2022-08-24 14:09:51 -07:00
Simon Pilgrim f9de13232f [X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.

For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.

Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.

Differential Revision: https://reviews.llvm.org/D132520
2022-08-24 17:28:18 +01:00
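The promotion trick in f9de13232f (masking the operand above the msb so the 32-bit BSF never sees zero) can be modelled with plain integers. This is a sketch of the arithmetic only; the function names are hypothetical and `bsf32` models the defined, non-zero case of the instruction.

```python
def bsf32(x: int) -> int:
    """Index of the lowest set bit of a non-zero 32-bit value."""
    assert 0 < x < (1 << 32), "BSF result is undefined for a zero source"
    return (x & -x).bit_length() - 1


def cttz_promoted(x: int, bits: int) -> int:
    """cttz for an i8/i16 value via a 32-bit bit scan.

    Setting the bit just above the narrow type's msb guarantees a non-zero
    source, so no zero check (branch or CMOV) is needed, and a zero input
    naturally yields `bits`.
    """
    assert bits in (8, 16) and 0 <= x < (1 << bits)
    return bsf32(x | (1 << bits))
```

For example, `cttz_promoted(0, 8)` scans `0x100` and returns 8, which matches the defined-at-zero result of cttz for i8.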
Simon Pilgrim 9317e6311f [TTI] Add SK_Splice shuffle mask detection and X86 costs
Enables fixed sized vectors to detect SK_Splice shuffle patterns and provides basic X86 cost support

Differential Revision: https://reviews.llvm.org/D132374
2022-08-23 20:07:30 +01:00
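As a rough illustration of what detecting an SK_Splice pattern in 9317e6311f involves: a splice mask picks consecutive lanes out of the concatenation of the two sources, starting somewhere inside the first one. The sketch below is a simplified model with a hypothetical function name, not the matcher used in LLVM; `-1` denotes an undefined lane.

```python
def match_splice_mask(mask, num_src_elts):
    """Return the splice start index, or None if the mask is not a splice.

    A mask matches when every defined lane i selects element start + i of
    concat(src0, src1), with 0 < start < num_src_elts so that the window
    straddles both sources.
    """
    if len(mask) != num_src_elts:
        return None
    start = None
    for lane, idx in enumerate(mask):
        if idx == -1:          # undefined lane matches anything
            continue
        candidate = idx - lane
        if start is None:
            start = candidate
        elif candidate != start:
            return None        # lanes are not consecutive
    if start is None or not (0 < start < num_src_elts):
        return None            # all-undef, identity, or second source only
    return start
```

For 4-element sources, `[1, 2, 3, 4]` is a splice starting at 1, while `[0, 1, 2, 3]` is just the first source and is rejected.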
Philip Reames 7c7dc10fcd [RISCV] Add cost model coverage for trig, log, and exp unary routines 2022-08-23 12:06:54 -07:00
Florian Hahn 494b6c46d6 [LAA] Add test cases where BTC can be used to rule out dependences.
Test cases for using the backedge-taken-count to rule out dependencies between
an invariant and strided accesses.
2022-08-22 13:11:26 +01:00
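The idea behind the tests in 494b6c46d6 can be sketched with plain integer addresses: once the backedge-taken count (BTC) is known, the strided access covers a bounded byte range, and an invariant access outside that range cannot depend on it. This is a conservative illustration with hypothetical names, not the analysis LAA implements.

```python
def strided_range(start, stride, access_size, btc):
    """Half-open byte range touched by start + i * stride for i in 0..btc."""
    assert stride != 0 and btc >= 0 and access_size > 0
    first, last = start, start + btc * stride
    low, high = min(first, last), max(first, last)
    return low, high + access_size


def may_depend(inv_addr, inv_size, start, stride, access_size, btc):
    """True unless the invariant access lies entirely outside the range.

    Gaps between strided elements are treated as touched, so a True result
    only means the dependence could not be ruled out.
    """
    low, high = strided_range(start, stride, access_size, btc)
    return inv_addr < high and low < inv_addr + inv_size
```

With a 4-byte access starting at 0, stride 4 and BTC 9, the loop touches bytes 0 to 39, so an invariant 4-byte access at address 40 is independent while one at 36 is not.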
David Green 0cf9e47f27 [AArch64] Add SK_Splice fixed-width costs
A fixed length SK_Splice shuffle vector is lowered to an EXT under
AArch64, which should have a cost of 1.

Differential Revision: https://reviews.llvm.org/D132299
2022-08-22 12:44:57 +01:00
Simon Pilgrim d0397dbdae [CostModel][X86] Fix an off-by-one typo in the v64i8 splice shuffle test 2022-08-22 12:40:15 +01:00
Simon Pilgrim 541e24ef76 [CostModel][X86] Add some basic SK_Splice shuffle test coverage
These are currently recognised as SK_PermuteTwoSrc
2022-08-22 11:39:31 +01:00
Simon Pilgrim 7ff2a9f250 [CostModel][X86] Add CodeSize handling for fadd/fsub/fmul/fsqrt ops
Eventually this will be part of the cost table lookup
2022-08-21 17:42:11 +01:00
Simon Pilgrim 3c4391b4bb Revert rG15de7aaae52ef4be9f9ff3b130804e5b5ccd29f4 "[CostModel][X86] Add CodeSize/SizeLatency handling for fadd/fsub/fmul/fsqrt ops"
This is unintentionally affecting some backend tests
2022-08-21 16:51:45 +01:00
Simon Pilgrim 15de7aaae5 [CostModel][X86] Add CodeSize/SizeLatency handling for fadd/fsub/fmul/fsqrt ops
Eventually this will be part of the cost table lookup
2022-08-21 16:39:57 +01:00