llvm-project

Commit Graph

Author	SHA1	Message	Date
Matthias Braun	189900eb14	X86: Stop assigning register costs for longer encodings. This stops reporting CostPerUse 1 for `R8`-`R15` and `XMM8`-`XMM31`. This was previously done because instruction encoding require a REX prefix when using them resulting in longer instruction encodings. I found that this regresses the quality of the register allocation as the costs impose an ordering on eviction candidates. I also feel that there is a bit of an impedance mismatch as the actual costs occure when encoding instructions using those registers, but the order of VReg assignments is not primarily ordered by number of Defs+Uses. I did extensive measurements with the llvm-test-suite wiht SPEC2006 + SPEC2017 included, internal services showed similar patterns. Generally there are a log of improvements but also a lot of regression. But on average the allocation quality seems to improve at a small code size regression. Results for measuring static and dynamic instruction counts: Dynamic Counts (scaled by execution frequency) / Optimization Remarks: Spills+FoldedSpills -5.6% Reloads+FoldedReloads -4.2% Copies -0.1% Static / LLVM Statistics: regalloc.NumSpills mean -1.6%, geomean -2.8% regalloc.NumReloads mean -1.7%, geomean -3.1% size..text mean +0.4%, geomean +0.4% Static / LLVM Statistics: mean -2.2%, geomean -3.1%) regalloc.NumSpills mean -2.6%, geomean -3.9%) regalloc.NumReloads mean +0.6%, geomean +0.6%) size..text Static / LLVM Statistics: regalloc.NumSpills mean -3.0% regalloc.NumReloads mean -3.3% size..text mean +0.3%, geomean +0.3% Differential Revision: https://reviews.llvm.org/D133902	2022-09-30 16:01:33 -07:00
Bjorn Pettersson	0513b0305a	[X86] Avoid miscompile in combineOr (X86ISelLowering.cpp) In combineOr (X86ISelLowering.cpp) there is a DAG combine that rewrite a "(0 - SetCC) \| C" pattern into something simpler given that a LEA can be used. Another requirement is that C has some specific value, for example 1 or 7. When checking those requirements the code used a 32-bit unsigned variable to store the value of C. So for a 64-bit OR this could miscompile in case any of the 32 most significant bits in C were non zero. This patch adds fixes the bug by using a large enough type for the C value. The faulty code seem to have been introduced by commit `9bceb8981d` (D131358). Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D134892	2022-09-29 21:24:31 +02:00
Stefan Gränitz	ed8409dfa0	[ObjC][ARC] Fix target register for call expanded from CALL_RVMARKER on Windows Fix regression https://github.com/llvm/llvm-project/issues/56952 for Clang CodeGen on Windows. In the Windows ABI the instruction sequence that is expanded from CALL_RVMARKER should use RCX as target register and not RDI. Reviewed By: rnk, fhahn Differential Revision: https://reviews.llvm.org/D134441	2022-09-27 18:49:40 +02:00
Simon Pilgrim	196f27bb56	[CostModel][X86] Add missing cost kinds for v2i64 icmp on SLM	2022-09-25 15:12:21 +01:00
Simon Pilgrim	faff990e9b	[X86] Fix Icelake VPMULLQ zmm pipes and adjust AVX512DQ v8i64 mul costs to match worse case Icelake PMULLQ throughput regressed cf SkylakeServer as its Pipe0 only Confirmed with Intel SOM, Agner and instlatx64	2022-09-25 14:18:08 +01:00
Josh Stone	cb46ffdbf4	[X86] Use BuildStackAdjustment in stack probes This has the advantage of dealing with live EFLAGS, using LEA instead of SUB if needed to avoid clobbering. That also respects feature "lea-sp". We could allow unrolled stack probing from blocks with live-EFLAGS, if canUseAsEpilogue learns when emitStackProbeInlineGeneric will be used. Differential Revision: https://reviews.llvm.org/D134495	2022-09-23 09:30:32 -07:00
Josh Stone	26c37b461a	[X86] Don't allow prologue stack probing with live EFLAGS Fixes https://github.com/llvm/llvm-project/issues/49509 Differential Revision: https://reviews.llvm.org/D134494	2022-09-23 09:30:32 -07:00
Josh Stone	4dcfb09e40	[NFC][CodeGen] Use const MF in TargetLowering stack probe functions This makes them callable from places like canUseAsPrologue. Differential Revision: https://reviews.llvm.org/D134492	2022-09-23 09:30:32 -07:00
Simon Pilgrim	a6e9141505	[TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436) The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both. This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling. Fixes #50778 Differential Revision: https://reviews.llvm.org/D111968	2022-09-23 14:03:18 +01:00
Simon Pilgrim	98907f8685	[CostModel][X86] Tidyup sdiv/srem/udiv/urem by constant cost tables Preparation for adding cost kinds handling This is necessary to eventually unblock D111968	2022-09-22 20:46:33 +01:00
Simon Pilgrim	dc93202b44	[CostModel][X86] Remove duplicate ashr v4i64 cost table entry. NFCI.	2022-09-22 17:27:26 +01:00
Simon Pilgrim	e030be64d8	[CostModel][X86] Add partial CostKinds handling for funnelshifts/rotates This mainly just adds costs for the targets where we have actual funnelshift/rotate instructions (VBMI2/XOP etc.) - the cases where we expand still need addressing, although for many the default shift+or expansion, especially for uniform cases, isn't that bad. This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-22 11:24:11 +01:00
Simon Pilgrim	b2cd8118d0	[CostModel][X86] Add CostKinds handling for smax/smin/umax/umin instructions This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-22 10:19:23 +01:00
Simon Pilgrim	839ba13c3e	[CostModel][X86] Add vbmi2 costs for funnelshift/rotate intrinsics Add costs for the funnel shift instructions - fixes some discrepancies I was hitting with costs numbers from the 'cost-tables vs llvm-mca' script D103695	2022-09-21 13:48:22 +01:00
Serge Pavlov	181279ffcd	[X86][GlobalISel] Add support for sret demotion The change add support for the cases when return value is passed in memory rathen than in registers. Differential Revision: https://reviews.llvm.org/D134181	2022-09-20 11:47:53 +07:00
Simon Pilgrim	6b4d409f69	[CostModel][X86] Add CostKinds handling for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF instructions This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-19 17:37:58 +01:00
Simon Pilgrim	135c9b2c4b	[CostModel][X86] Add CostKinds handling for vector ctlz instructions This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-19 16:44:09 +01:00
Simon Pilgrim	2538adde5c	[CostModel][X86] Add CostKinds handling for cttz This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-19 15:57:03 +01:00
Simon Pilgrim	d90a42d64c	[CostModel][X86] Add CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF cost handling Without LZCNT/BMI, the *_ZERO_UNDEF costs are cheaper as they can avoid the zero handling.	2022-09-19 14:06:33 +01:00
Kazu Hirata	cf07277fb4	[X86] Fix the LEA optimization pass The LEA optimization pass visits each basic block of a given machine function. In each basic block, for each pair of LEAs that differ only in their displacement fields, we replace all uses of the second LEA with the first LEA while adjusting the displacement. Now, without this patch, after all the replacements are made, the following assert triggers: assert(MRI->use_empty(LastVReg) && "The LEA's def register must have no uses"); The replacement loop uses: for (MachineOperand &MO : llvm::make_early_inc_range(MRI->use_operands(LastVReg))) { which is equivalent to: for (auto UI = MRI->use_begin(LastVReg), UE = MRI->use_end(); UI != UE;) { MachineOperand &MO = UI++; // <-- Look! That is, immediately after the post increment, make_early_inc_range already has the iterator for the next iteration in its mind. The problem is that in one iteration of the loop, we could replace two uses in a debug instruction like: DBG_VALUE_LIST !"r", !DIExpression(DW_OP_LLVM_arg, 0), %0:gr64, %0:gr64, ... So, the iterator for the next iteration becomes invalid. We end up traversing a garbage use list from that point on. In turn, we don't get to visit remaining uses. The patch fixes the problem by switching to a "draining" while loop: while (!MRI->use_empty(LastVReg)) { MachineOperand &MO = MRI->use_begin(LastVReg); MachineInstr &MI = *MO.getParent(); The credit goes to Simon Pilgrim for reducing the test case. Fixes https://github.com/llvm/llvm-project/issues/57673 Differential Revision: https://reviews.llvm.org/D133631	2022-09-18 17:50:17 -07:00
David Majnemer	8a868d8859	Revert "Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView"" This reverts commit `cd20a18286` and adds a "let Heading" to NoStackProtectorDocs.	2022-09-16 19:39:48 +00:00
Simon Pilgrim	23cb1c42cd	[CostModel][X86] Update throughput costs for CTLZ ops This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (and recent fixes to the bdver2 + alderlake models) Adding full CostKinds costs are affecting some other tests as they make assumptions about SizeLatency costs, so they need addressing first	2022-09-16 16:56:49 +01:00
Simon Pilgrim	89e4cb603d	[X86] Add missing (unsupported) zmm vector move classes Although unsupported on HSW, we reuse this model for KNL which does require them Noticed when running the cost model fuzz script from D103695 with -mcpu=knl	2022-09-16 15:31:26 +01:00
Simon Pilgrim	f8fa04295f	[CostModel][X86] Add CostKinds handling for vector integer comparisons These were based off a mixture of vector integer add/sub costs and the numbers from the 'cost-tables vs llvm-mca' script from D103695 - the extra costs for different predicates are still proving tricky to implement, but I've gotten most costs to within +/1 now - the AVX512 are tricky as we still don't handle predicate results properly, so most of these were done by hand.	2022-09-16 13:03:41 +01:00
Sergei Barannikov	c6acb4eb0f	[SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s All in-tree targets pass pointer-sized ConstantSDNodes to the method. This overload reduced amount of boilerplate code a bit. This also makes getCALLSEQ_END consistent with getCALLSEQ_START, which already takes uint64_ts.	2022-09-15 14:02:12 -04:00
Simon Pilgrim	94620e4fc3	[CostModel][X86] Add CostKinds handling for vector shift by generic/non-uniform shift amounts These are the worst case generic vector shift costs, where nothing is known about the shift amounts - in particular this should stop us using the default sizelatency cost of 1 for so many pre-AVX2 vector shifts that can often actually expand during lowering to +20 uops, just for 128-bit vectors, resulting in some horrible inline/unroll decisions. This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)	2022-09-15 16:51:58 +01:00
Simon Pilgrim	0ec028fe10	[CostModel][X86] Add CostKinds handling for vector shift by uniform/constuniform ops Vector shift by const uniform is the cheapest shift instruction we have, non-const uniform have a marginally higher cost - some targets 'splat' the amount internally to use the shift-per-element instruction, others see a higher cost for the explicit zeroing of the upper bits for the (64-bit) shift amount. This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)	2022-09-15 14:05:30 +01:00
Simon Pilgrim	854a4595b6	[CostModel][X86] getArithmeticInstrCost - move GLM/SLM custom costs AFTER constant shift -> multiply canonicalization Corrects the shift by constant costs to better account for them being converted to multiples for lowering - which demonstrates that we should probably be trying harder NOT to convert these to multiplies for some CPUs (v4i32 in particular).	2022-09-14 11:46:26 +01:00
Simon Pilgrim	40ab7875f8	[CostModel][X86] Fix throughput costs for AVX512BW v32i16 shifts Fixes regression from `a931dbfbd3`	2022-09-14 11:18:23 +01:00
Sylvestre Ledru	cd20a18286	Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView" Causing: https://github.com/llvm/llvm-project/issues/57709 This reverts commit `ab56719acd`.	2022-09-13 10:53:59 +02:00
David Majnemer	ab56719acd	[clang, llvm] Add __declspec(safebuffers), support it in CodeView __declspec(safebuffers) is equivalent to __attribute__((no_stack_protector)). This information is recorded in CodeView. While we are here, add support for strict_gs_check.	2022-09-12 21:15:34 +00:00
Kazu Hirata	9606608474	[llvm] Use x.empty() instead of llvm::empty(x) (NFC) I'm planning to deprecate and eventually remove llvm::empty. I thought about replacing llvm::empty(x) with std::empty(x), but it turns out that all uses can be converted to x.empty(). That is, no use requires the ability of std::empty to accept C arrays and std::initializer_list. Differential Revision: https://reviews.llvm.org/D133677	2022-09-12 13:34:35 -07:00
Craig Topper	38ffa2bb96	[LegalizeTypes] Improve splitting for urem/udiv by constant for some constants. For remainder: If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer type, this urem will be expand by DAGCombiner using multiply by magic constant. We do have to take into account that adding high and low together can produce a carry, making it a (BitWidth / 2)+1 bit number. So we need to also add back in the carry from the first addition. For division: We can use the above trick to compute the remainder, subtract that remainder from the dividend, then multiply by the multiplicative inverse of the Divisor modulo (1 << BitWidth). This is based on the section "Remainder by Summing Digits" in Hacker's delight. The remainder trick is similar to a trick you may have learned for determining if a decimal number is divisible by 3. You can add all the digits together and see if the sum is divisible by 3. If you're not sure if the sum is divisible by 3, you can add its digits together. This can be repeated until you have a single decimal digit. If that digit is 3, 6, or 9, then the original number is divisible by 3. This works because 10 % 3 == 1. gcc already does this same trick. There are additional tricks gcc does urem as well as srem, udiv, and sdiv that I plan to add in future patches. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130862	2022-09-12 10:34:52 -07:00
Matthias Gehre	c1502425ba	Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth Also remove new-pass-manager version of ExpandLargeDivRem because there is no way yet to access TargetLowering in the new pass manager. Differential Revision: https://reviews.llvm.org/D133691	2022-09-12 17:06:16 +01:00
Simon Pilgrim	20ad05f9b4	[CostModel][X86] Add CostKinds handling for abs ops This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695	2022-09-12 16:34:37 +01:00
Simon Pilgrim	bd0109f392	[CostModel][X86] Move AVX512/AVX2 uniform shift costs into the generic uniform cost tables They shouldn't be happening after XOP shift costs - AVX2 shift supports takes preference over XOP for everything but vXi8 shifts - the improvement is pretty limited as it only affects bdver4 targets but it does help clean up a fraction of the messy shift cost logic....	2022-09-12 12:08:42 +01:00
Simon Pilgrim	a931dbfbd3	[CostModel][X86] Merge AVX512BW vXi8/vXi16 shifts into default AVX512BW cost table We only need to handle the uniform cases early	2022-09-10 18:18:42 +01:00
Simon Pilgrim	10edf88458	[CostModel][X86] Update CTPOP costs With the bdver2 model updates, many of the AVX1 costs were far too high - it also helped expose some costs mismatches for Atom/Silvermont	2022-09-10 17:57:20 +01:00
Simon Pilgrim	4994f87ca1	[X86] Fix bdver2 128-bit shuffles throughputs Noticed while trying to get vector ctpop/ctlz/cttz costs fixed using the script from D103695 - all of these are full-rate but the throughput costs were weirdly high for bdver2 Matches AMD 15h SoG, Agner and instlatx64	2022-09-10 17:34:40 +01:00
Simon Pilgrim	7785bd34e7	[X86] Fix bdver2 128-bit ALU/logic/shift throughputs Noticed while trying to get vector shifts costs fixed using the script from D103695 - all of these are full-rate but the throughput costs were weirdly high for bdver2 Matches AMD 15h SoG, Agner and instlatx64	2022-09-10 16:23:29 +01:00
Simon Pilgrim	05f56f10ed	[X86] Fix VPPERM load folding latency Noticed while investigating BITREVERSE cost numbers with the D103695 script - VPPERM folded loads was using the WriteVarShuffleX defaults and was missing an override like the VPPERM reg-reg variants	2022-09-09 13:57:39 +01:00
Simon Pilgrim	55b78e28d8	[CostModel][X86] Add missing i8 throughput cost	2022-09-09 10:58:51 +01:00
Joe Loser	5e96cea1db	[llvm] Use std::size instead of llvm::array_lengthof LLVM contains a helpful function for getting the size of a C-style array: `llvm::array_lengthof`. This is useful prior to C++17, but not as helpful for C++17 or later: `std::size` already has support for C-style arrays. Change call sites to use `std::size` instead. Differential Revision: https://reviews.llvm.org/D133429	2022-09-08 09:01:53 -06:00
Simon Pilgrim	e74102a963	[CostModel][X86] Merge getTypeBasedIntrinsicInstrCost into getIntrinsicInstrCost For the few non type based intrinsic cases we can just check for !isTypeBasedOnly() to access the args directly. I don't think we have a need to keep getTypeBasedIntrinsicInstrCost in BasicTTIImpl.h any more and can do a similar merge there as well - but it's a messier refactor and will take a while.	2022-09-07 12:04:09 +01:00
Marco Elver	0ba8886af5	[FastISel] Propagate PCSections metadata to MachineInstr Propagate PC sections metadata to MachineInstr when FastISel is doing instruction selection. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130884	2022-09-07 11:36:01 +02:00
Xiang1 Zhang	c836ddaf72	[X86][NFC] Refine load/store reg to StackSlot for extensibility Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D133078	2022-09-07 14:35:42 +08:00
Simon Pilgrim	648e182d92	[CostModel][X86] getIntrinsicInstrCost - convert to CostKindTblEntry Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers This should allow us to merge getTypeBasedIntrinsicInstrCost into getIntrinsicInstrCost and remove all remaining references	2022-09-06 22:05:32 +01:00
Markus Böck	f049b2c3fc	[MC] Emit Stackmaps before debug info This patch is essentially an alternative to https://reviews.llvm.org/D75836 and was mentioned by @lhames in a comment. The gist of the issue is that Mach-O has restrictions on which kind of sections are allowed after debug info has been emitted, which is also properly asserted within LLVM. Problem is that stack maps are currently emitted as one of the last sections in each target-specific AsmPrinter so far, which would cause the assertion to trigger. The current approach of special casing for the `__LLVM_STACKMAPS` section is not viable either, as downstream users can overwrite the stackmap format using plugins, which may want to use different sections. This patch fixes the issue by emitting the stack map earlier, right before debug info is emitted. The way this is implemented is by taking the choice when to emit the StackMap away from the target AsmPrinter and doing so in the base class. The only disadvantage of this approach is that the `StackMaps` member is now part of the base class, even for targets that do not support them. This is functionaly not a problem however, as emitting an empty `StackMaps` is a no-op. Differential Revision: https://reviews.llvm.org/D132708	2022-09-06 20:20:56 +02:00
Simon Pilgrim	10e0f3e948	[CostModel][X86] Add CostKinds handling for ctpop ops This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (although it still struggles with avx512 predicate numbers which had to be done manually) Some of the pre-AVX values still aren't great - atom/slm worst case numbers for ctpop expansion really affect these (especially throughput/latency), so we need to clean them up in a more consistent way - its a pity we don't have models for more older cpus (merom/nehalem etc.) as other examples.	2022-09-06 17:27:24 +01:00
Matthias Gehre	2090e85fee	[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64 This adds the ExpandLargeDivRem to the default pass pipeline. The limit at which it expands div/rem instructions is configured via a new TargetTransformInfo hook (default: no expansion) X86, Arm and AArch64 backends implement this hook to expand div/rem instructions with more than 128 bits. Differential Revision: https://reviews.llvm.org/D130076	2022-09-06 15:32:04 +01:00

1 2 3 4 5 ...

22902 Commits