llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	c444af1c20	[CostModel][X86] Add CostKinds handling for mul ops This was achieved using the 'cost-tables vs llvm-mca' script D103695 Also fix a missing pmullw v16i16 half-rate throughput as znver1 double-pumps - matches numbers from AMD SoG + Agner	2022-09-04 11:59:05 +01:00
Simon Pilgrim	c4a174be91	[CostModel][X86] Add vector shift test coverage for codesize/latency/size-latency cost kinds	2022-09-04 11:02:30 +01:00
Simon Pilgrim	444685de06	[CostModel][X86] Adjust mul v4i32/v8i32 throughput cost Based off the numbers from AMD SoG + Agner - vXi32 are both half-rate, and znver1 double-pumps the v8i32 op We should have caught this earlier as many Intel models have half-rate pmulld already :-(	2022-09-03 18:45:08 +01:00
Simon Pilgrim	114b7762a9	[CostModel][X86] Add CostKinds handling for add/sub ops This was achieved using the 'cost-tables vs llvm-mca' script D103695	2022-09-03 18:45:08 +01:00
Simon Pilgrim	5aee2726d8	[CostModel][X86] Add CostKinds handling for fdiv ops This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping' As the uop count (used for TCK_SizeAndLatency) for divss/divps is typically so low, we need to override isExpensiveToSpeculativelyExecute to ensure we keep fdiv calls behind branches - although for some very recent cpu targets it might not be necessary any more and could be relaxed.	2022-09-03 15:48:39 +01:00
Simon Pilgrim	1c12e12111	[CostModel][X86] Add fdiv(double) throughput x87 costs for	2022-09-03 14:08:25 +01:00
Simon Pilgrim	0735200e3f	[CostModel][X86] Add CostKinds handling for fmul ops This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'	2022-09-03 10:42:20 +01:00
Eli Friedman	b219a9c0a2	[CostModel][AArch64] Fix ctpop intrinsic cost when NEON is disabled. If we don't have NEON, we use the generic fallback, which takes 12 instructions. Make sure the costs reflect that. (On a related note, we could optimize the generic fallback a bit. It currently uses sequences like lsr+and+add; if we use and+lsr+add instead, we can fold the lsr into the add.) Differential Revision: https://reviews.llvm.org/D133154	2022-09-02 15:17:55 -07:00
Mingming Liu	242203d254	[AArch64][TTI] Add cost table entry for trunc over vector of integers. 1) Tablegen patterns exist to use 'xtn' and 'uzp1' for trunc [1]. Cost table entries are updated based on the actual number of {xtn, uzp1} instructions generated. 2) Without this, an IR instruction like trunc <8 x i16> %v to <8 x i8> is considered free and might be sinked to other basic blocks. As a result, the sinked 'trunc' is in a different basic block with its (usually not-free) vector operand and misses the chance to be combined during instruction selection. (examples in [2]) 3) It's a lot of effort to teach CodeGenPrepare.cpp to sink the operand of trunc without introducing regressions, since the instruction to compute the operand of trunc could be faster (e.g., throughput) than the instruction corresponding to "trunc (bin-vector-op". For instance in [3], sinking %1 (as trunc operand) into bb.1 and bb.2 means to replace 2 xtn with 2 shrn (shrn has a throughput of 1 and only utilize v1 pipeline), which is not necessarily good, especially since ushr result needs to be preserved for store operation in bb.0. Meanwhile, it's too optimistic (for CodeGenPrepare pass) to assume machine-cse will always be able to de-dup shrn from various basic blocks into one shrn. [1] For {v8i16->v8i8, v4i32->v4i16, v2i64->v2i32}, `813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L4472)`. For concat (trunc, trunc) -> uzip1, `813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L5428-L5437)` [2] examples - trunc(umin(X, 255)) -> UQXTRN v8i8 (and other {u,s}x{min,max} pattern for v8i16 operands) from `813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L4515-L4528)` - trunc (AArch64vlshr v8i16, imm) -> SHRNv8i8 (same missed for SHRNv2i32) from `813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L6743-L6748)` [3] --- ; instruction latency / throughput / pipeline on `neoverse-n1` bb.0: %1 = lshr <8 x i16> %10, <i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4> ; ushr, latency 2, throughput 1, pipeline V1 %2 = trunc <8 x i16> %1 to <8 x i8> ; xtn, latency 2, throughput 2, pipeline V %3 = store <8 x i8> %1, ptr %addr br cond i1 cond, label bb.1, label bb.2 bb.1: %4 = trunc <8 x i16> %1 to <8 x i8> ; xtn bb.2: %5 = trunc <8 x i16> %1 to <8 x i8> ; xtn --- Differential Revision: https://reviews.llvm.org/D132784	2022-09-02 10:06:55 -07:00
Simon Pilgrim	116d8f8cf0	Revert rG11765b77be84d793ebedc5b5436c463490746131 "[CostModel][X86] Add CostKinds handling for fmul ops" I need to address some x87 codegen changes before re-committing this.	2022-09-02 17:21:25 +01:00
Simon Pilgrim	11765b77be	[CostModel][X86] Add CostKinds handling for fmul ops This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'	2022-09-02 16:57:23 +01:00
Simon Pilgrim	ad16f3e413	[CostModel][X86] Add CostKinds handling for fadd/fsub/fneg ops This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 which I'll update shortly As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'	2022-09-02 11:50:01 +01:00
liqinweng	62a238a1f8	[RISCV][NFC] Add cost model tests of llvm.fmuladd Reviewed By: benshi001 Differential Revision: https://reviews.llvm.org/D132922	2022-09-01 15:33:02 +08:00
Mingming Liu	75f1b328f8	[AArch64][CostModel][NFC] Specify target datalayout explicitly for cost analysis test. - Use linux little endian data layout string. Differential Revision: https://reviews.llvm.org/D132889	2022-08-31 09:52:24 -07:00
Simon Pilgrim	ad8e4dd2ad	[CostModel][X86] Add and/or/xor general cost kinds support Account for double-pumping on early AVX1/AVX2 targets	2022-08-31 17:26:05 +01:00
jacquesguan	45c1ce321d	[RISCV] Add cost model for select and integer compare instructions. This patch adds cost model for vector select and integer compare instructions.	2022-08-31 11:32:58 +08:00
jacquesguan	4fd53fd8eb	[RISCV][test] Add cost model coverage for compare instructions. Differential Revision: https://reviews.llvm.org/D132827	2022-08-31 10:59:57 +08:00
Mingming Liu	3785234b03	[NFC][AArch64] Specify datalayout explicitly for cast.ll and arith-overflow.ll and update tests accordingly. - These two tests stands out when data layout is explicitly added in a sweep study (D132889) Differential Revision: https://reviews.llvm.org/D132856	2022-08-30 09:43:29 -07:00
Simon Pilgrim	7830445086	[CostModel][X86] Account for add/sub 512-bit vector splitting costs on non-AVX512BW targets	2022-08-30 16:54:06 +01:00
jacquesguan	ae4422982a	[RISCV][NFC] Add cost model coverage for fp arithmetic instructions. This patch adds cost model coverage for fp arithmetic instructions. Some is not exact, I am working on a revision to implement that. Differential Revision: https://reviews.llvm.org/D132537	2022-08-29 11:06:42 +08:00
Arthur Eubanks	7a94d189ad	[LazyCallGraph] Update libcall list when replacing a libcall node's function Otherwise when we visit all libcalls in updateCGAndAnalysisManagerForPass(), the old libcall is dead and doesn't have a node. We treat libcalls conservatively in LazyCallGraph because any function may introduce calls to them out of thin air. It is weird to change the signature of a libcall since introducing calls to the libcall with a different signature may break, but other passes like deadargelim already do it, so let's preserve this behavior for now. Fixes an issue found in D128830. Reviewed By: psamolysov Differential Revision: https://reviews.llvm.org/D132764	2022-08-27 10:57:53 -07:00
Philip Reames	b45a262679	[RISCV] Enable fixed length vectors and loop vectorization with same This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize). Specifically, this enables fixed length vectors which are known to be inbounds of the underlying variable hardware size. For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit mcpu we can vectorize to the exact target hardware. The LV impact is mostly related to vectorizer robustness. In cases we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization. SLP has been disabled for now, even when fixed vectors are enabled. See `a310637` and associated review. There are a few addiitional code quality issues which need worked through before turning SLP on would be reasonable. Differential Revision: https://reviews.llvm.org/D131508	2022-08-26 14:45:23 -07:00
Simon Pilgrim	0d0dc4e6ab	[CostModel][X86] Add CodeSize handling for and/or/xor ops Eventually this will be part of the cost table lookup	2022-08-26 18:42:52 +01:00
Simon Pilgrim	f9445ae75c	[CostModel][X86] Add CodeSize handling for fneg ops Eventually this will be part of the cost table lookup	2022-08-26 17:34:52 +01:00
Paul Walker	3bb228729f	[CostModel][SVE] Correct cost model of SK_Splice shuffles for <vscale x 1 x Ty> vector types. AArch64TTIImpl::getSpliceCost() is now used more aggressively and LNT (MultiSource/Benchmarks/mafft) exposed a failure case for <vscale x 1 x i1>. I've tested other element types and whilst they can be costed they cannot be code generated, so this patch returns InstructionCost::getInvalid() for all cases.	2022-08-26 16:06:01 +01:00
Florian Hahn	555e09c2b0	[LAA] Rename printing pass to print<access-info>. This updates the naming for the LAA printing pass to be in line with most other analysis printing passes. The old name has come up as confusing multiple times already, e.g. in D131924.	2022-08-26 11:00:09 +01:00
Philip Reames	53f738ce7e	[RISCV] Add empirical costs for integer min/max and saturing add/sub All of these are lowered to a single instruction for all legal vector types.	2022-08-25 09:27:17 -07:00
Philip Reames	ca5f8b0909	[RISCV][CostModel] Correct typo in saturating intrinsic names The fact that we silently accept unrecognized intrinsic names is sometimes a bit annoying.	2022-08-25 09:10:22 -07:00
Philip Reames	4006928669	[RISCV][CostModel] Add test coverage for all the vectorizable binary intrinsics	2022-08-25 08:56:02 -07:00
Simon Pilgrim	3edec9ba60	[CostModel][X86] Support cost kind specific look up tables (REAPPLIED) Most of our cost model tables have been created assuming cost kind == recip-throughput. But we're starting to see passes wanting to get accurate costs for the other kinds as well. Some of these can be determined procedurally (e.g. codesize by default could just be the split count after type legalization), but others are going to need to be handled in cost tables - this is especially true for x86 which has so many ISA combinations. I've created a 'CostKindCosts' struct which can hold cost values for the 4 cost kinds, defaulting to -1U for unknown cost, this can be used with the existing CostTblEntryT/CostTableLookup template code. I've also added a [TargetCostKind] accessor to make it much easier to look up individual <Optional> costs. This just changes the ISD::SELECT costs to check the effect (and also to check that the ISD::SETCC are correctly handled for default/None cost kinds) - the plan would be to slowly extend this and move the CostKindTblEntry type somewhere generic to allow other targets to use it once its matured. I'm also going to resurrect D103695 so that it can help with latency/codesize/sizelatency coverage testing. For sizelatency - IIRC the definition was vague to let it be target specific - I've tried to use typical uop counts so they're comparable to MicroOpBufferSize etc. REAPPLIED: Added early out to prevent getCmpSelInstrCost being used for anything but generic integer/float scalar/vector types - getTypeLegalizationCost can't handle the "exotic" TypeID enums that some passes attempt to get a costs for (aggregates etc.). Differential Revision: https://reviews.llvm.org/D132216	2022-08-25 16:49:17 +01:00
Benjamin Kramer	ab85996e47	Revert "[CostModel][X86] Support cost kind specific look up tables" This reverts commit `45846854a2`. This triggers an assertion failure during Clang selfhost Unknown type! UNREACHABLE executed at llvm/lib/CodeGen/ValueTypes.cpp:548! * SIGABRT received by PID 6107 (TID 6107) on cpu 218 from PID 6107; stack trace: * @ 0x556c8827c2d1 64 llvm::llvm_unreachable_internal() @ 0x556c82a5542a 32 llvm::MVT::getVT() @ 0x556c82a54a28 80 llvm::EVT::getEVT() @ 0x556c7dda1526 80 llvm::TargetLoweringBase::getValueType() @ 0x556c8174dd38 112 llvm::BasicTTIImplBase<>::getTypeLegalizationCost() @ 0x556c81755e72 144 llvm::X86TTIImpl::getCmpSelInstrCost() @ 0x556c8174cadf 512 llvm::TargetTransformInfoImplCRTPBase<>::getInstructionCost() @ 0x556c84ab4dd2 32 llvm::TargetTransformInfo::getInstructionCost() @ 0x556c82ead283 1968 llvm::sinkRegion()	2022-08-25 15:42:44 +02:00
Simon Pilgrim	2e5f16516a	[CostModel][X86] Add CodeSize handling for fdiv ops Eventually this will be part of the cost table lookup	2022-08-25 14:08:03 +01:00
Simon Pilgrim	45846854a2	[CostModel][X86] Support cost kind specific look up tables Most of our cost model tables have been created assuming cost kind == recip-throughput. But we're starting to see passes wanting to get accurate costs for the other kinds as well. Some of these can be determined procedurally (e.g. codesize by default could just be the split count after type legalization), but others are going to need to be handled in cost tables - this is especially true for x86 which has so many ISA combinations. I've created a 'CostKindCosts' struct which can hold cost values for the 4 cost kinds, defaulting to -1U for unknown cost, this can be used with the existing CostTblEntryT/CostTableLookup template code. I've also added a [TargetCostKind] accessor to make it much easier to look up individual <Optional> costs. This just changes the ISD::SELECT costs to check the effect (and also to check that the ISD::SETCC are correctly handled for default/None cost kinds) - the plan would be to slowly extend this and move the CostKindTblEntry type somewhere generic to allow other targets to use it once its matured. I'm also going to resurrect D103695 so that it can help with latency/codesize/sizelatency coverage testing. For sizelatency - IIRC the definition was vague to let it be target specific - I've tried to use typical uop counts so they're comparable to MicroOpBufferSize etc. Differential Revision: https://reviews.llvm.org/D132216	2022-08-25 12:23:36 +01:00
Philip Reames	0473ac8876	[RISCV] Remove cttz/ctlz cost model coverage for the moment I'd used the wrong signatures for these as I had not remembered we'd added the second boolean argument. We appear to still parse the old form just fine, but we should use the two argument form in test since that's what the LangRef actually describes.	2022-08-24 15:55:46 -07:00
Philip Reames	03798f268b	{RISCV] Backout cttz/ctlz instruction costs Craig points out correctly in post-commit review that these depend on the availability of floating point extensions.	2022-08-24 15:40:48 -07:00
Philip Reames	d4d6e71ea2	[RISCV] Add empirical costs for bswap/bitreverse/ctpop/ctlz/cttz If anyone is looking for a source of ideas on vector codegen improvements, the lowerings for several of these seem to include pretty obvious fixits.	2022-08-24 15:09:21 -07:00
Philip Reames	cb3f32a20d	[RISCV] Add cost model coverage for integer bitmanip intrinsics	2022-08-24 15:09:21 -07:00
Philip Reames	42af1a776a	[RISCV] Add empirically measured vector sqrt intrinsic costs	2022-08-24 14:27:57 -07:00
Philip Reames	0ad88e74eb	[RISCV] Add cost model coverage for floating point sqrt intrinsic	2022-08-24 14:27:27 -07:00
Philip Reames	4d3134866f	[RISCV] Add vector fabs intrinsic costs We have a fabs vector instruction, and are using it for current lowering.	2022-08-24 14:09:51 -07:00
Simon Pilgrim	f9de13232f	[X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis. For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling. Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU. Differential Revision: https://reviews.llvm.org/D132520	2022-08-24 17:28:18 +01:00
Simon Pilgrim	9317e6311f	[TTI] Add SK_Splice shuffle mask detection and X86 costs Enables fixed sized vectors to detect SK_Splice shuffle patterns and provides basic X86 cost support Differential Revision: https://reviews.llvm.org/D132374	2022-08-23 20:07:30 +01:00
Philip Reames	7c7dc10fcd	[RISCV] Add cost model coverage for trig, log, and exp unary routines	2022-08-23 12:06:54 -07:00
Florian Hahn	494b6c46d6	[LAA] Add test cases where BTC can be used to rule out dependences. Test cases for using the backedge-taken-count to rule out dependencies between an invariant and strided accesses.	2022-08-22 13:11:26 +01:00
David Green	0cf9e47f27	[AArch64] Add SK_Splice fixed-width costs A fixed length SK_Splice shuffle vector is lowered to a Ext under AArch64, which should have a cost of 1. Differential Revision: https://reviews.llvm.org/D132299	2022-08-22 12:44:57 +01:00
Simon Pilgrim	d0397dbdae	[CostModel][X86] Fix a off-by-one typo in the v64i8 splice shuffle test	2022-08-22 12:40:15 +01:00
Simon Pilgrim	541e24ef76	[CostModel][X86] Add some basic SK_Splice shuffle test coverage These are currently recognised as SK_PermuteTwoSrc	2022-08-22 11:39:31 +01:00
Simon Pilgrim	7ff2a9f250	[CostModel][X86] Add CodeSize handling for fadd/fsub/fmul/fsqrt ops Eventually this will be part of the cost table lookup	2022-08-21 17:42:11 +01:00
Simon Pilgrim	3c4391b4bb	Revert rG15de7aaae52ef4be9f9ff3b130804e5b5ccd29f4 "[CostModel][X86] Add CodeSize/SizeLatency handling for fadd/fsub/fmul/fsqrt ops" This is unintentionally affecting some backend tests	2022-08-21 16:51:45 +01:00
Simon Pilgrim	15de7aaae5	[CostModel][X86] Add CodeSize/SizeLatency handling for fadd/fsub/fmul/fsqrt ops Eventually this will be part of the cost table lookup	2022-08-21 16:39:57 +01:00

1 2 3 4 5 ...

3545 Commits