llvm-project

Commit Graph

Author	SHA1	Message	Date
Paweł Bylica	e399c58778	[DAGCombine] Add tests for D57317 Add two tests for D57317: Deduplicate addcarry node using commutativity. https://reviews.llvm.org/D57317	2022-10-01 16:59:44 +02:00
Matthias Braun	189900eb14	X86: Stop assigning register costs for longer encodings. This stops reporting CostPerUse 1 for `R8`-`R15` and `XMM8`-`XMM31`. This was previously done because instruction encoding require a REX prefix when using them resulting in longer instruction encodings. I found that this regresses the quality of the register allocation as the costs impose an ordering on eviction candidates. I also feel that there is a bit of an impedance mismatch as the actual costs occure when encoding instructions using those registers, but the order of VReg assignments is not primarily ordered by number of Defs+Uses. I did extensive measurements with the llvm-test-suite wiht SPEC2006 + SPEC2017 included, internal services showed similar patterns. Generally there are a log of improvements but also a lot of regression. But on average the allocation quality seems to improve at a small code size regression. Results for measuring static and dynamic instruction counts: Dynamic Counts (scaled by execution frequency) / Optimization Remarks: Spills+FoldedSpills -5.6% Reloads+FoldedReloads -4.2% Copies -0.1% Static / LLVM Statistics: regalloc.NumSpills mean -1.6%, geomean -2.8% regalloc.NumReloads mean -1.7%, geomean -3.1% size..text mean +0.4%, geomean +0.4% Static / LLVM Statistics: mean -2.2%, geomean -3.1%) regalloc.NumSpills mean -2.6%, geomean -3.9%) regalloc.NumReloads mean +0.6%, geomean +0.6%) size..text Static / LLVM Statistics: regalloc.NumSpills mean -3.0% regalloc.NumReloads mean -3.3% size..text mean +0.3%, geomean +0.3% Differential Revision: https://reviews.llvm.org/D133902	2022-09-30 16:01:33 -07:00
Simon Pilgrim	ba8e2cb90d	[X86] Tweak avx512-gfni-intrinsics.ll tests to avoid xor(select(c,x,0)) 'passthrough' patterns These can be manipulated by foldSelectWithIdentityConstant and lose the predicate/predicate-zero instruction test coverage - use an insertvalue chain into an aggregate instead to retain all the results. Noticed while trying to convert foldSelectWithIdentityConstant to use llvm::isNeutralConstant	2022-09-30 15:38:17 +01:00
Serge Pavlov	b3913a9cdf	[GlobalISel] Do not crash on widening vector result Function buildCopyToRegs did not handle properly the case when it should make wider vector result. It happened, for example, in a function that returns value of type <2 x f32>, which should be widen to <4 x f32> to fit XMM register. The function eventually calls MachineIRBuilder.buildUnmerge, which does not expect that only one destination register is specified. Now this case is treated specifically in buildCopyToRegs. Differential Revision: https://reviews.llvm.org/D128546	2022-09-30 21:30:55 +07:00
Amaury Séchet	d7600c7ccb	[DAG] select Cond, C, -1 --> or (sext (not Cond)), C when C is MVT::i1 In the spirit of D130765 . Get rid of cbranches and/or cmov. Usually shorter, but sometime not, becaus eit's hard to prededict when dependency breaking xor will be introduced. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D134736	2022-09-30 00:36:58 +00:00
Bjorn Pettersson	0513b0305a	[X86] Avoid miscompile in combineOr (X86ISelLowering.cpp) In combineOr (X86ISelLowering.cpp) there is a DAG combine that rewrite a "(0 - SetCC) \| C" pattern into something simpler given that a LEA can be used. Another requirement is that C has some specific value, for example 1 or 7. When checking those requirements the code used a 32-bit unsigned variable to store the value of C. So for a 64-bit OR this could miscompile in case any of the 32 most significant bits in C were non zero. This patch adds fixes the bug by using a large enough type for the C value. The faulty code seem to have been introduced by commit `9bceb8981d` (D131358). Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D134892	2022-09-29 21:24:31 +02:00
Bjorn Pettersson	e4fcbf3950	[X86] Pre-commit test case showing bug in combineOr (X86ISelLowering.cpp) In combineOr (X86ISelLowering.cpp) there is a DAG combine that rewrite a "(0 - SetCC) \| C" pattern into something simpler given that a LEA can be used. Another requirement is that C has some specific value, for example 1 or 7. When doing that check it is using a 32-bit unsigned variable to store the value of C. So for a 64-bit OR this could miscompile in case any of the 32 most significant bits in C are set. This patch adds a test case to show this miscompile bug. Differential Revision: https://reviews.llvm.org/D134890	2022-09-29 21:24:31 +02:00
Stefan Gränitz	4a617c426d	[WinEH] Prepare test win64-funclet-preisel-intrinsics.ll for extension to nested try-catch case (NFC)	2022-09-29 11:30:27 +02:00
Amaury Séchet	c78e947d26	Change constant in cmov-promotion to avoid optimizations	2022-09-27 21:14:13 +00:00
Amaury Séchet	4bab490de2	[X86] Add test case for D134736. NFC	2022-09-27 21:07:10 +00:00
Stefan Gränitz	ed8409dfa0	[ObjC][ARC] Fix target register for call expanded from CALL_RVMARKER on Windows Fix regression https://github.com/llvm/llvm-project/issues/56952 for Clang CodeGen on Windows. In the Windows ABI the instruction sequence that is expanded from CALL_RVMARKER should use RCX as target register and not RDI. Reviewed By: rnk, fhahn Differential Revision: https://reviews.llvm.org/D134441	2022-09-27 18:49:40 +02:00
Amaury Séchet	d1baed7c9c	[DAG] select Cond, -1, C --> or (sext Cond), C if Cond is MVT::i1 This seems to be beneficial overall, except for midpoint-int.ll . The X86 backend seems to generate zeroing that are not necesary. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D131260	2022-09-27 12:54:52 +00:00
Simon Pilgrim	8427c836f7	[X86] Clean up prefixes for avoid-sfb.ll Simplifies the diff for D134697	2022-09-27 11:01:17 +01:00
Amaury Séchet	79b69bf8c9	Autogenerate stack-folding-fp X86 tests. NFC	2022-09-26 14:26:08 +00:00
Han Zhu	67a04edd4e	[X86] Pre-commit unit test for D134477	2022-09-24 21:51:35 -07:00
Leonard Chan	79565766be	Reland "[llvm] Support forward-referenced globals with dso_local_equivalent" This reverts commit `eef5db2c74`. See https://github.com/llvm/llvm-project/issues/57815. dso_local_equivalent would fail with an assertion on forward-referenced globals. This is an issue that only comes up in textual IR, which is why we've never seen this assertion with clang. Differential Revision: https://reviews.llvm.org/D134234	2022-09-23 18:32:07 +00:00
Josh Stone	cb46ffdbf4	[X86] Use BuildStackAdjustment in stack probes This has the advantage of dealing with live EFLAGS, using LEA instead of SUB if needed to avoid clobbering. That also respects feature "lea-sp". We could allow unrolled stack probing from blocks with live-EFLAGS, if canUseAsEpilogue learns when emitStackProbeInlineGeneric will be used. Differential Revision: https://reviews.llvm.org/D134495	2022-09-23 09:30:32 -07:00
Josh Stone	26c37b461a	[X86] Don't allow prologue stack probing with live EFLAGS Fixes https://github.com/llvm/llvm-project/issues/49509 Differential Revision: https://reviews.llvm.org/D134494	2022-09-23 09:30:32 -07:00
Leonard Chan	eef5db2c74	Revert "[llvm] Support forward-referenced globals with dso_local_equivalent" This reverts commit `411020ad1c`. One of the tests here fails on some upstream builders: https://lab.llvm.org/buildbot#builders/16/builds/35314	2022-09-21 20:14:30 +00:00
Leonard Chan	411020ad1c	[llvm] Support forward-referenced globals with dso_local_equivalent See https://github.com/llvm/llvm-project/issues/57815. dso_local_equivalent would fail with an assertion on forward-referenced globals. This is an issue that only comes up in textual IR, which is why we've never seen this assertion with clang. Differential Revision: https://reviews.llvm.org/D134234	2022-09-21 19:31:35 +00:00
Serge Pavlov	181279ffcd	[X86][GlobalISel] Add support for sret demotion The change add support for the cases when return value is passed in memory rathen than in registers. Differential Revision: https://reviews.llvm.org/D134181	2022-09-20 11:47:53 +07:00
Simon Pilgrim	8206044183	[DAG] SimplifyDemandedVectorElts - add MULHS/MULHU handling to existing MUL/AND handling Allows to determine known zero elements, which particularly helps simplification of DIV/REM by constant patterns	2022-09-19 12:44:43 +01:00
Kazu Hirata	cf07277fb4	[X86] Fix the LEA optimization pass The LEA optimization pass visits each basic block of a given machine function. In each basic block, for each pair of LEAs that differ only in their displacement fields, we replace all uses of the second LEA with the first LEA while adjusting the displacement. Now, without this patch, after all the replacements are made, the following assert triggers: assert(MRI->use_empty(LastVReg) && "The LEA's def register must have no uses"); The replacement loop uses: for (MachineOperand &MO : llvm::make_early_inc_range(MRI->use_operands(LastVReg))) { which is equivalent to: for (auto UI = MRI->use_begin(LastVReg), UE = MRI->use_end(); UI != UE;) { MachineOperand &MO = UI++; // <-- Look! That is, immediately after the post increment, make_early_inc_range already has the iterator for the next iteration in its mind. The problem is that in one iteration of the loop, we could replace two uses in a debug instruction like: DBG_VALUE_LIST !"r", !DIExpression(DW_OP_LLVM_arg, 0), %0:gr64, %0:gr64, ... So, the iterator for the next iteration becomes invalid. We end up traversing a garbage use list from that point on. In turn, we don't get to visit remaining uses. The patch fixes the problem by switching to a "draining" while loop: while (!MRI->use_empty(LastVReg)) { MachineOperand &MO = MRI->use_begin(LastVReg); MachineInstr &MI = *MO.getParent(); The credit goes to Simon Pilgrim for reducing the test case. Fixes https://github.com/llvm/llvm-project/issues/57673 Differential Revision: https://reviews.llvm.org/D133631	2022-09-18 17:50:17 -07:00
Sotiris Apostolakis	b827e7c600	[SelectOpti] Restrict load sinking This is a follow-up to D133777, which resolved a use-after-free case but did not cover all possible memory bugs due to misplacement of loads. In short, the overall problem was that sinked loads could be moved after state-modifying instructions leading to memory bugs. The solution is to restrict load sinking unless it is found to be sound. i) Within a basic block (to-be-sinked load and select-user are in the same BB), loads can be sinked only if there is no intervening state-modifying instruction. This is a conservative approach to avoid resorting to alias analysis to detect potential memory overlap. ii) Across basic blocks, sinking of loads is avoided. This is because going over multiple basic blocks looking for memory conflicts could be computationally expensive and also unlikely to allow loads to sink. Further, experiments showed that not sinking these loads has a slight positive performance effect. Maybe for some of these loads, having some separation allows enough time for the load to be executed in time for its user. This is not the case for floating point operations that benefit more from sinking. The solution in D133777 was essentially undone in this patch, since the latter is a complete solution to the observed problem. Overall, the performance impact of this patch is minimal. Tested on two internal Google workloads with instrPGO. Search application showed <0.05% perf difference, while the database one showed a slight improvement, but not statistically significant. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D133999	2022-09-16 20:50:46 +00:00
Liqiang Tao	2e37557fde	StackProtector: ensure stack checks are inserted before the tail call The IR stack protector pass should insert stack checks before the tail calls not only the musttail calls. So that the attributes `ssqreq` and `tail call`, which are emited by llvm-opt, could be both enabled by llvm-llc. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D133860	2022-09-16 22:24:46 +08:00
Nikita Popov	b4309800e9	[CodeGen] Don't zero callee-save registers with zero-call-used-regs (PR57692) Callee save registers must be preserved, so -fzero-call-used-regs should not be zeroing them. The previous implementation only did not zero callee save registers that were saved&restored inside the function, but we need preserve all of them. Fixes https://github.com/llvm/llvm-project/issues/57692. Differential Revision: https://reviews.llvm.org/D133946	2022-09-16 11:52:29 +02:00
Craig Topper	ace05124f5	[IntegerDivision][AMDGPU] Use CreateLogicalOr to block poison propagation. There are two ctlz intrinsics here with the zero_is_poison flag set. There are also two comparisons that check if either of the inputs the ctlzs are zero. We need to use a logical or to block the poison from the ctlz if either of the inputs is zero. Reviewed By: arsenm, aqjune Differential Revision: https://reviews.llvm.org/D130680	2022-09-15 09:38:02 -07:00
Sotiris Apostolakis	eda61fb656	[SelectOpti] Fix lifetime intrinsic bug When a select is converted to a branch and load instructions are sinked to the true/false blocks, lifetime intrinsics (if present) could be made unsound if not moved. This conservatively moves all lifetime intrinsics in a transformed BB to the end block to ensure preserved lifetime semantics. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D133777	2022-09-13 19:00:18 +00:00
Simon Pilgrim	8bf04e9f2a	[X86] Add GFNI test coverage for bitreverse codegen We should be able to efficiently use the vector version for scalar bitreverse, like we do for XOP.	2022-09-13 11:23:03 +01:00
Matthias Gehre	6bf1b4e8e0	Move ExpandLargeDivRem to llvm/test/CodeGen/X86 because they need a triple	2022-09-13 08:29:54 +01:00
Matthias Braun	d871bce265	Use update_mir_test_checks for some more tests.	2022-09-12 11:35:52 -07:00
Craig Topper	38ffa2bb96	[LegalizeTypes] Improve splitting for urem/udiv by constant for some constants. For remainder: If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer type, this urem will be expand by DAGCombiner using multiply by magic constant. We do have to take into account that adding high and low together can produce a carry, making it a (BitWidth / 2)+1 bit number. So we need to also add back in the carry from the first addition. For division: We can use the above trick to compute the remainder, subtract that remainder from the dividend, then multiply by the multiplicative inverse of the Divisor modulo (1 << BitWidth). This is based on the section "Remainder by Summing Digits" in Hacker's delight. The remainder trick is similar to a trick you may have learned for determining if a decimal number is divisible by 3. You can add all the digits together and see if the sum is divisible by 3. If you're not sure if the sum is divisible by 3, you can add its digits together. This can be repeated until you have a single decimal digit. If that digit is 3, 6, or 9, then the original number is divisible by 3. This works because 10 % 3 == 1. gcc already does this same trick. There are additional tricks gcc does urem as well as srem, udiv, and sdiv that I plan to add in future patches. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130862	2022-09-12 10:34:52 -07:00
Craig Topper	545affbf79	[DAGCombiner] Use HandleSDNode to keep node alive across call to getNegatedExpression. getNegatedExpression can delete nodes. If the first call to getNegatedExpression produced a node that the second call also manages to create, it might get deleted. Use a HandleSDNode to ensure it has a use to prevent it from being deleted. Fixes PR57658. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133602	2022-09-09 22:02:41 -07:00
Craig Topper	aa83bdd198	[DAGCombiner][X86] Fold (sub (subcarry X, 0, Carry), Y) -> (subcarry X, Y, Carry) Fixes PR57576. Differential Revision: https://reviews.llvm.org/D133471	2022-09-08 22:56:46 -07:00
Eric Wang	d8a2d3f7d4	[NFC][Regalloc] Introduce the RegAllocPriorityAdvisorAnalysis This patch introduces the priority analysis and the priority advisor, the default implementation, and the scaffolding for introducing the other implementations of the advisor. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D132835	2022-09-08 07:50:03 -07:00
Craig Topper	7c99bf800f	[X86] Pre-commit test for PR57576. NFC	2022-09-07 21:04:51 -07:00
Marco Elver	343700358f	[AsmPrinter] Emit PCs into requested PCSections Interpret MD_pcsections in AsmPrinter emitting the requested metadata to the associated sections. Functions and normal instructions are handled. Differential Revision: https://reviews.llvm.org/D130879	2022-09-07 11:36:02 +02:00
Xiang1 Zhang	c836ddaf72	[X86][NFC] Refine load/store reg to StackSlot for extensibility Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D133078	2022-09-07 14:35:42 +08:00
Xiang1 Zhang	16743c9534	[CodeGen] Limit building time in CodeGenPrepare for huge function Details: Currently CodeGenPrepare is very time consuming in handling big functions. Old Algorithm : It iterate each BB in function, and go on handle very instructions in BB. Due to some instruction optimizations may affect the BBs' dominate tree. The old logic will re-iterate and try optimize for each BB. Suppose we have a big function with 20000 BBs, If we handled the last BB with fine tuning the dominate tree. We need totally re-iterate and try optimize the 20000 BBs from the beginning. The Complex is near N! And we really encounter somes big tests (> 20000 BBs) that cost more than 30 mins in this pass. (Debug version compiler will cost 2 hours here) What this patch do for huge function ? It mainly changes the iteration way for optimization. 1 We do optimizeBlock for each BB (that is same with old way). And, in the meaning time, If BB is changed/updated in the optimization, it will be put into FreshBBs (try do optimizeBlock again). The new created BB at previous iteration will also put into FreshBBs. 2 For the BBs which not updated at previous iteration, we directly skip it. Strictly speaking, here may miss some opportunity, but the probability is very small. 3 For Instructions in single BB, we do optimizeInst for each instruction. If optimizeInst change the instruction dominator in this BB, rather than break and go back to optimize the first BB (the old way), we directly iterate instructions (to do optimizeInst) in this updated BB again (the new way). What this patch do for small/normal (not huge) function ? It is same with the Old Algorithm. (NFC) Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D129352	2022-09-07 10:05:40 +08:00
Markus Böck	f049b2c3fc	[MC] Emit Stackmaps before debug info This patch is essentially an alternative to https://reviews.llvm.org/D75836 and was mentioned by @lhames in a comment. The gist of the issue is that Mach-O has restrictions on which kind of sections are allowed after debug info has been emitted, which is also properly asserted within LLVM. Problem is that stack maps are currently emitted as one of the last sections in each target-specific AsmPrinter so far, which would cause the assertion to trigger. The current approach of special casing for the `__LLVM_STACKMAPS` section is not viable either, as downstream users can overwrite the stackmap format using plugins, which may want to use different sections. This patch fixes the issue by emitting the stack map earlier, right before debug info is emitted. The way this is implemented is by taking the choice when to emit the StackMap away from the target AsmPrinter and doing so in the base class. The only disadvantage of this approach is that the `StackMaps` member is now part of the base class, even for targets that do not support them. This is functionaly not a problem however, as emitting an empty `StackMaps` is a no-op. Differential Revision: https://reviews.llvm.org/D132708	2022-09-06 20:20:56 +02:00
Matthias Gehre	2090e85fee	[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64 This adds the ExpandLargeDivRem to the default pass pipeline. The limit at which it expands div/rem instructions is configured via a new TargetTransformInfo hook (default: no expansion) X86, Arm and AArch64 backends implement this hook to expand div/rem instructions with more than 128 bits. Differential Revision: https://reviews.llvm.org/D130076	2022-09-06 15:32:04 +01:00
Benjamin Kramer	c349d7f4ff	[SelectionDAG] Rewrite bfloat16 softening to use the "half promotion" path The main difference is that this preserves intermediate rounding steps, which the other route doesn't. This aligns bfloat16 more with half floats, which use this path on most targets. I didn't understand what the difference was between these softening approaches when I first added bfloat lowerings, would be nice if we only had one of them. Based on @pengfei 's D131502 Differential Revision: https://reviews.llvm.org/D133207	2022-09-06 11:54:34 +02:00
Freddy Ye	d5fa8b1c2c	[X86] Support SAE for VCVTPS2PH from intrinsic. For now, clang and gcc both failed to generate sae version from _mm512_cvt_roundps_ph: https://godbolt.org/z/oh7eTGY5z. Intrinsic guide description is also wrong, which will be update soon. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D132641	2022-09-06 11:28:12 +08:00
Craig Topper	7927c4c5ce	[X86] Add test cases for PR57549. NFC	2022-09-05 13:12:18 -07:00
Simon Pilgrim	4e6783f866	[DAG] getFreeze()/getNode() - account for operand depth when calling isGuaranteedNotToBeUndefOrPoison (PR57554) Similar to #57402 - we were calling isGuaranteedNotToBeUndefOrPoison on the freeze operand (with Depth = 0), but wasn't accounting for the fact that a later isGuaranteedNotToBeUndefOrPoison assertion will call from the new node (with Depth = 0 as well) - which will then recursively call isGuaranteedNotToBeUndefOrPoison for its operands with Depth = 1 Fixes #57554	2022-09-05 11:46:46 +01:00
Craig Topper	0d1d36cfa6	[X86] Pre-commit tests for D130862. NFC	2022-09-04 21:19:01 -07:00
Simon Pilgrim	62cdfdab4d	[DAG] canCreateUndefOrPoison - add freeze(insert_subvector(x,y,c)) -> insert_subvector(freeze(x),freeze(y),c) support We already have plenty of assertions in place to ensure that the insertion index is constant and inrange	2022-09-03 13:41:33 +01:00
Simon Pilgrim	3968844bff	[X86] Add test showing failure to fold freeze(insert_subvector(x,y,c)) -> insert_subvector(freeze(x),freeze(y),c) If at least one of x and y are known never poison.	2022-09-03 13:27:08 +01:00
Nikita Popov	5134bd432f	[DwarfEhPrepare] Assign dummy debug location for inserted _Unwind_Resume calls (PR57469) DwarfEhPrepare inserts calls to _Unwind_Resume into landing pads. If _Unwind_Resume happens to be defined in the same module and debug info is used, then this leads to a verifier error: inlinable function call in a function with debug info must have a !dbg location call void @_Unwind_Resume(ptr %exn.obj) #0 Fix this by assigning a dummy location to the call. (As this happens in the backend, inlining is not actually relevant here.) Fixes https://github.com/llvm/llvm-project/issues/57469. Differential Revision: https://reviews.llvm.org/D133095	2022-09-01 16:35:49 +02:00
Nick Desaulniers	d7474bef77	[llvm][TailDuplicator] don't taildup isInlineAsmBrIndirectTargets This fixes a crash observed after https://reviews.llvm.org/D129997. Similar to D88823. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D130127	2022-08-31 13:07:10 -07:00

1 2 3 4 5 ...

18345 Commits