llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	40abb28f61	RegAllocGreedy: Fix subranges when rematerializing dead subreg defs This would create a new interval missing the subrange and hit this verifier error: * Bad machine code: Live interval for subreg operand has no subranges * - function: test_remat_subreg_def - basic block: %bb.0 (0xa568758) [0B;128B) - instruction: 32B dead undef %4.sub0:vreg_64 = V_MOV_B32_e32 2, implicit $exec	2022-07-24 11:51:59 -04:00
Simon Pilgrim	562ee7cc5f	[DAG] visitSMUL_LOHI/visitUMUL_LOHI - ensure we canonicalize constants to the RHS	2022-07-24 16:09:56 +01:00
Simon Pilgrim	428c0f2adc	[DAG] getNode - assert that SMUL_LOHI/UMUL_LOHI nodes have the correct ops + types	2022-07-24 15:30:57 +01:00
Simon Pilgrim	0708771cce	[DAG] MaskedVectorIsZero - don't bother with (-1).isSubsetOf mask check. NFC. Just use KnownBits::isZero() to ensure all the bits are known zero.	2022-07-24 13:12:21 +01:00
Simon Pilgrim	e82d49bfed	[DAG] SimplifyMultipleUseDemandedBits - early-out for any scalable vector types Noticed while working to remove SelectionDAG::GetDemandedBits - we were relying on the callers to have already bailed for scalable vectors	2022-07-24 12:59:43 +01:00
Simon Pilgrim	a3e38b4a20	[DAG] SimplifyDemandedVectorElts - if every and/mul element-pair has a zero/undef then just constant fold to zero	2022-07-24 12:00:31 +01:00
Kazu Hirata	7bfa06f6c0	[CodeGen] Use range-based for loops (NFC)	2022-07-23 16:10:46 -07:00
Simon Pilgrim	ac8be21365	[DAG] isSplatValue - don't attempt to merge any BITCAST sub elements if they contain UNDEFs We still haven't found a solution that correctly handles 'don't care' sub elements properly - given how close it is to the next release branch, I'm making this fail safe change and we can revisit this later if we can't find alternatives. NOTE: This isn't a reversion of D128570 - it's the removal of undef handling across bitcasts entirely Fixes #56520	2022-07-23 18:38:48 +01:00
Dmitri Gribenko	aba43035bd	Use llvm::sort instead of std::sort where possible llvm::sort is beneficial even when we use the iterator-based overload, since it can optionally shuffle the elements (to detect non-determinism). However llvm::sort is not usable everywhere, for example, in compiler-rt. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D130406	2022-07-23 15:19:05 +02:00
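The shuffling idea in the entry above can be illustrated outside LLVM. This is a minimal Python sketch, not `llvm::sort` itself; `checked_sort` and its `shuffle` and `seed` parameters are hypothetical names introduced here. Shuffling the input before sorting leaves the result unchanged when the ordering is total, but it can perturb the relative order of elements that compare equal, which is how hidden order dependence shows up.

```python
import random


def checked_sort(items, key, shuffle=False, seed=0):
    """Sort a copy of `items` by `key`, optionally shuffling first.

    With shuffle=True, elements whose keys compare equal may come out in
    a different relative order, so callers that silently rely on that
    order are more likely to be noticed.
    """
    items = list(items)
    if shuffle:
        # Seeded so that a failing run can be reproduced.
        random.Random(seed).shuffle(items)
    items.sort(key=key)
    return items


pairs = [(1, "a"), (1, "b"), (0, "c")]
plain = checked_sort(pairs, key=lambda p: p[0])
shuffled = checked_sort(pairs, key=lambda p: p[0], shuffle=True)
# Both results are ordered by key; only the order of the two key-1
# entries is allowed to differ between them.
print(plain, shuffled)
```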
Simon Pilgrim	5f89d2bae9	[DAG] Move OR(AND(X,C1),AND(OR(X,Y),C2)) -> OR(AND(X,OR(C1,C2)),AND(Y,C2)) fold to SimplifyDemandedBits This will fix the SystemZ v3i31 memcpy regression in D77804 (with the help of D129765 as well....). It should also allow us to /bend/ the oneuse limitation for cases where we can use demanded bits to safely peek through multiple uses of the AND ops.	2022-07-23 13:17:24 +01:00
Simon Pilgrim	6aff1b7b3c	[DAG] SimplifyDemandedBits - pull out repeated getValueType() calls. NFC.	2022-07-23 12:01:54 +01:00
Simon Pilgrim	2421a5af72	[DAG] ExpandIntRes_ADDSUB - create UADDO/USUBO instead of ADDCARRY/SUBCARRY if overflow is known to be zero As noticed on D127115, when splitting ADD/SUB nodes we often end up with cases where overflow from the lower bits is impossible - in such cases we're better off breaking the carry chain dependency as soon as possible. This path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen diff without a topological worklist.	2022-07-23 11:13:44 +01:00
Simon Pilgrim	8937252465	[DAG] computeKnownBits - add basic shift-by-parts handling Concat KnownBits from ISD::SHL_PARTS / ISD::SRA_PARTS / ISD::SRL_PARTS lo/hi operands and perform the KnownBits calculation by the shift amount on the extended type, before splitting the KnownBits based on the requested lo/hi result.	2022-07-23 09:46:30 +01:00
ARCHIT SAXENA	3bb1ce2319	Add a nop instruction if a section starts with landing pad for function splitter This change adds a nop instruction if section starts with landing pad. This change is like [D73739](https://reviews.llvm.org/D73739) which avoids zero offset landing pad in basic block sections. Detailed description: The current machine functions splitter can create sections which start with a landing pad themselves. This places landing pad at offset zero from LPStart. ``` .section .text.split.foo10,"ax",@progbits foo10.cold: # %lpad .cfi_startproc .cfi_personality 3, __gxx_personality_v0 .cfi_lsda 3, .Lexception5 .cfi_def_cfa %rsp, 16 .Ltmp11: <--- This is a Landing pad and also LP Start as it is start of this section movq %rax, %rdi <--- first instruction is at offset 0 from LPStart callq _Unwind_Resume@PLT ``` This will cause landing pad entries to become zero (.Ltmp11-foo10.cold) ``` .Lcst_begin4: .uleb128 .Ltmp9-.Lfunc_begin2 # >> Call Site 1 << .uleb128 .Ltmp10-.Ltmp9 # Call between .Ltmp9 and .Ltmp10 .uleb128 .Ltmp11-foo10.cold <---This is zero # jumps to .Ltmp11 .byte 3 # On action: 2 .uleb128 .Ltmp10-.Lfunc_begin2 # >> Call Site 2 << .uleb128 .Lfunc_end9-.Ltmp10 # Call between .Ltmp10 and .Lfunc_end9 .byte 0 # has no landing pad .byte 0 # On action: cleanup .p2align 2 ``` The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. This change adds a nop instruction at start of such sections so that such a case could be avoided. Output: ``` .section .text.split.foo10,"ax",@progbits foo10.cold: # %lpad .cfi_startproc .cfi_personality 3, __gxx_personality_v0 .cfi_lsda 3, .Lexception5 .cfi_def_cfa %rsp, 16 nop <--- new instruction that is added .Ltmp11: movq %rax, %rdi callq _Unwind_Resume@PLT ``` Reviewed By: modimo, snehasish, rahmanl Differential Revision: https://reviews.llvm.org/D130133	2022-07-22 15:20:10 -07:00
Craig Topper	be208b40c1	[DAGCombiner] Simplify code around call to reduceLoadWidth in visitAND. NFC We were looking for loads or any_extend+load. reduceLoadWidth hasn't known how to look through such an any_extend to find the load since D40667 almost 5 years ago. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130333	2022-07-22 08:36:56 -07:00
Nikita Popov	c2be703c6c	[AsmPrinter] Move lowerConstant() error code out of switch (NFC) Move this out of the switch, so that different branches can indicate an error by breaking out of the switch. This becomes important if there are more than the two current error cases.	2022-07-22 16:08:28 +02:00
Cullen Rhodes	bf268a05cd	[AArch64] Emit vector FP cmp when LE is used with fast-math Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D130093	2022-07-22 07:53:55 +00:00
jacquesguan	e60eb7053d	recommit "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat." With fix for AArch64 and Hexagon test cases.	2022-07-21 17:34:34 +08:00
David Green	23d6186be0	[SelectionDAG] Fix fptoi.sat scalable vector lowering Vector fptosi_sat and fptoui_sat were being expanded by unrolling the vector operation. This doesn't work for scalable vector, so this patch adds a call to TLI.expandFP_TO_INT_SAT if the vector is scalable. Scalable tests are added for AArch64 and RISCV. Some of the AArch64 fptoi_sat operations should be legal, but that will be handled in another patch. Differential Revision: https://reviews.llvm.org/D130028	2022-07-21 08:00:22 +01:00
esmeyi	339392ecf2	[AIX] follow-up of D124654. Emitting the remaining aliases instead of reporting an error to avoid SPEC2017 PEAK failures. And mark this as a TODO.	2022-07-21 01:10:09 -04:00
Simon Pilgrim	029e83b401	[DAG] getNode - don't bother creating ADDO(X,0) or SUBO(X,0) nodes. Similar to what we already do in getNode for basic ADD/SUB nodes, return the X operand directly, but here we know that there will be no/zero overflow as well. As noted on D127115 - this path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen without a topological worklist.	2022-07-20 12:04:33 +01:00
Simon Pilgrim	766cd95481	[DAG] getNode - assert that ADDO/SUBO nodes have the correct ops + types	2022-07-20 11:23:58 +01:00
Simon Pilgrim	9fc347aa4e	[DAG] PromoteIntRes_BUILD_VECTOR - extend constant boolean vectors according to target BooleanContents PromoteIntRes_BUILD_VECTOR currently always ANY_EXTENDs build vector operands, but if this is a constant boolean vector we're losing the useful ability to keep the vector matching the BooleanContents mode used by the target. This patch extends constant boolean vectors according to target BooleanContents, allowing a number of additional all-bits folds (notable XOR -> NOT conversions) to occur. Differential Revision: https://reviews.llvm.org/D129641	2022-07-20 10:49:31 +01:00
Lorenzo Albano	07d69d9fc9	[VP] Legalize the stride operand for EXPERIMENTAL_VP_STRIDED SDNodes Add promotion and expansion of integer operands for experimental_vp_strided SelectionDAG nodes; the expansion is actually just a truncation of the stride operand. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D123112	2022-07-20 10:22:43 +02:00
Kazu Hirata	76e18cc4f6	[llvm] Use llvm::any_of and llvm::none_of (NFC)	2022-07-20 00:36:19 -07:00
Kazu Hirata	0387da6f4f	Use value instead of getValue (NFC)	2022-07-19 21:18:26 -07:00
Kazu Hirata	41ae78ea3a	Use has_value instead of hasValue (NFC)	2022-07-19 20:15:44 -07:00
Kazu Hirata	bbbb4393ee	[CodeGen] Use value_or instead of getValueOr (NFC)	2022-07-19 19:50:43 -07:00
David Truby	4c82f56d8f	[llvm][SVE] Remove redundant and when comparing against extending load When determining if an `and` should be merged into an extending load the constant argument to the `and` is currently not checked if the argument requires truncation. This prevents the combine happening when the vector width is half the normal available vector width for SVE VLA vectors. Reviewed By: c-rhodes Differential Revision: https://reviews.llvm.org/D129281	2022-07-19 17:08:32 +01:00
Simon Pilgrim	71c502cbca	[DAG] Call SimplifyDemandedBits from ISD::MUL nodes Noticed while triaging D129765.	2022-07-19 14:11:04 +01:00
Benjamin Kramer	8aff88fd3a	[LegalizeDAG] Propagate alignment in ExpandExtractFromVectorThroughStack Unlike the name suggests this can reuse any store as a base for a memory-based vector extract. If that store is underaligned the loads created to extract will have an invalid alignment. Since most CPUs are forgiving wrt alignment this is almost never an issue, on x86 this is only reproducible by extracting a 128 bit vector out of a wider vector. I tried making a test case in the context of https://reviews.llvm.org/D127982 but it's really really fragile, as the output pretty much looks like a missed optimization.	2022-07-19 13:13:55 +02:00
Simon Pilgrim	0f6b0461b0	[DAG] SimplifyDemandedBits - relax "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" to match only demanded bits The "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold is currently limited to the XOR mask being a shifted all-bits mask, but we can relax this to only need to match under the demanded bits. This helps expose more bit extraction/clearing patterns and fixes the PowerPC testCompares*.ll regressions from D127115 Alive2: https://alive2.llvm.org/ce/z/fl7T7K Differential Revision: https://reviews.llvm.org/D129933	2022-07-19 10:59:07 +01:00
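The relaxed precondition described in the entry above can be checked by brute force on a small bit width. This is an illustrative Python sketch rather than the SelectionDAG code: it covers only the logical right shift case, uses a 4-bit width chosen for speed, and all function names are hypothetical.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def original(x, shift, xor_c):
    """xor (X >> ShiftC), XorC, using a logical right shift."""
    return ((x >> shift) ^ xor_c) & MASK


def folded(x, shift):
    """(not X) >> ShiftC, using a logical right shift."""
    return ((~x & MASK) >> shift) & MASK


def fold_holds_on_demanded_bits():
    for shift in range(WIDTH):
        shifted_ones = MASK >> shift
        for xor_c in range(MASK + 1):
            for demanded in range(MASK + 1):
                # Relaxed precondition: XorC only has to look like the
                # shifted all-ones mask on the bits that are demanded.
                if (xor_c ^ shifted_ones) & demanded:
                    continue
                for x in range(MASK + 1):
                    diff = original(x, shift, xor_c) ^ folded(x, shift)
                    if diff & demanded:
                        return False
    return True


print(fold_holds_on_demanded_bits())
```

When every bit is demanded, the precondition collapses to the stricter one where `XorC` must equal the shifted all-ones mask exactly.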
Max Kazantsev	69b284aaf6	Revert "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat." This reverts commit `58dfaaaace`. Massive AARCH test failures in buildbot.	2022-07-19 13:41:52 +07:00
jacquesguan	58dfaaaace	[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat. This revision adds support for scalarizing a binary operation of two scalable splat vectors. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122791	2022-07-19 11:20:51 +08:00
Matt Arsenault	8d0383eb69	CodeGen: Remove AliasAnalysis from regalloc This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable. Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy. Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.	2022-07-18 17:23:41 -04:00
Jay Foad	dbed4326dd	[LiveIntervals] Find better anchoring end points when repairing ranges r175673 changed repairIntervalsInRange to find anchoring end points for ranges automatically, but the calculation of Begin included the first instruction found that already had an index. This patch changes it to exclude that instruction: 1. For symmetry, so that the half open range [Begin,End) only includes instructions that do not already have indexes. 2. As a possible performance improvement, since repairOldRegInRange will scan fewer instructions. 3. Because repairOldRegInRange hits assertion failures in some cases when it sees a def that already has a live interval. (3) fixes about ten tests in the CodeGen lit test suite when -early-live-intervals is forced on. Differential Revision: https://reviews.llvm.org/D110182	2022-07-18 19:34:43 +01:00
Itay Bookstein	2570f226d1	[SDAG] Remove single-result restriction on commutative CSE The DAG Combiner unnecessarily restricts commutative CSE to nodes with a single result value. This commit removes that restriction. Signed-off-by: Itay Bookstein <ibookstein@gmail.com> Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D129666	2022-07-18 19:19:13 +03:00
Lorenzo Albano	c00a44fa68	[VP] IR expansion pass for VP gather and scatter Add vp_gather and vp_scatter expansion to unpredicated intrinsics. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D120664	2022-07-18 17:00:38 +02:00
Nikita Popov	56b4b6e81b	[SDAG] Fix release build This variable was only declared in debug builds, but is needed in release builds as well.	2022-07-18 14:10:31 +02:00
Max Kazantsev	d693fd29f1	[Verifier] Make Verifier recognize undef tokens as correct IR Undef tokens may appear in unreached code as result of RAUW of some optimization, and it should not be considered as bad IR. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D128904 Reviewed By: mkazantsev	2022-07-18 16:26:06 +07:00
Lorenzo Albano	f390781cec	[VP] Implementing expansion pass for VP load and store. Added function to the ExpandVectorPredication pass to handle VP loads and stores. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D109584	2022-07-18 08:47:54 +02:00
Craig Topper	7fa1c32634	[CodeGen] Remove unnecessary APInt copy. NFC	2022-07-17 23:41:53 -07:00
Craig Topper	a55ff6aadd	[Support][CodeGen] Fix spelling Divison->Division. NFC	2022-07-17 23:16:29 -07:00
Craig Topper	795602af0c	[CodeGen] Don't compare bool with integer 0. NFC The IsAdd field is a bool.	2022-07-17 23:16:14 -07:00
Kazu Hirata	3112987d5c	Remove unused forward declarations (NFC)	2022-07-17 15:37:48 -07:00
Simon Pilgrim	53b90dd372	[DAG] Fold (or (and X, C1), (and (or X, Y), C2)) -> (or (and X, C1|C2), (and Y, C2)) Pulled out of D77804 Alive2: https://alive2.llvm.org/ce/z/g61VRe	2022-07-17 18:51:41 +01:00
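The fold in the entry above is a plain Boolean identity, since `(X | Y) & C2` distributes into `(X & C2) | (Y & C2)`. A minimal Python sketch that checks it exhaustively over 4-bit values follows; the function names and the 4-bit width are assumptions made for illustration, and the Alive2 link in the commit remains the authoritative proof.

```python
from itertools import product


def before(x, y, c1, c2):
    """(or (and X, C1), (and (or X, Y), C2))"""
    return (x & c1) | ((x | y) & c2)


def after(x, y, c1, c2):
    """(or (and X, C1|C2), (and Y, C2))"""
    return (x & (c1 | c2)) | (y & c2)


def or_and_fold_is_equivalent(width=4):
    values = range(1 << width)
    # Try every combination of X, Y, C1 and C2 at this width.
    return all(before(*v) == after(*v) for v in product(values, repeat=4))


print(or_and_fold_is_equivalent())
```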
Simon Pilgrim	26ce33706f	[DAG] computeKnownBits - move UDIV handling to same place as UREM/SREM. NFC.	2022-07-17 11:59:42 +01:00
Simon Pilgrim	5ec47c6dc5	[DAG] Add MERGE_VALUE computeKnownBits/ComputeNumSignBits handling. Just forward the value tracking to the operand specified by the ResNo	2022-07-17 11:58:08 +01:00
Kazu Hirata	9e6d1f4b5d	[CodeGen] Qualify auto variables in for loops (NFC)	2022-07-17 01:33:28 -07:00
Kazu Hirata	c0fe37de04	[CodeGen] Remove redundant declaration createGreedyRegisterAllocator (NFC) The function is declared in llvm/include/llvm/CodeGen/Passes.h. Identified with readability-redundant-declaration.	2022-07-16 15:43:34 -07:00
Kazu Hirata	4d9d07c5fb	[CodeGen] Use RegClassFilterFunc where appropriate (NFC)	2022-07-16 15:43:33 -07:00
Sanjay Patel	7ca3e23f25	[SDAG] narrow truncated sign_extend_inreg trunc (sign_ext_inreg X, iM) to iN --> sign_ext_inreg (trunc X to iN), iM There are improvements on existing tests from this, and there are a pair of large regressions in D127115 for Thumb2 caused by not folding this pattern. Differential Revision: https://reviews.llvm.org/D129890	2022-07-16 16:29:15 -04:00
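The narrowing in the entry above can be sanity-checked with plain integers. The sketch below is Python written for illustration, not the DAGCombiner code; it assumes M <= N (the in-register width fits in the truncated type), and `sign_ext_inreg`, `trunc` and `narrowing_is_equivalent` are hypothetical helper names.

```python
def sign_ext_inreg(x, m, width):
    """Sign-extend the low m bits of x to a width-bit value."""
    low = x & ((1 << m) - 1)
    if low >> (m - 1):
        # The sign bit (bit m-1) is set: fill bits m..width-1 with ones.
        low |= ((1 << width) - 1) & ~((1 << m) - 1)
    return low


def trunc(x, n):
    """Keep only the low n bits of x."""
    return x & ((1 << n) - 1)


def narrowing_is_equivalent(width=8):
    for n in range(1, width + 1):
        for m in range(1, n + 1):  # assumption: M <= N
            for x in range(1 << width):
                wide_first = trunc(sign_ext_inreg(x, m, width), n)
                narrow_first = sign_ext_inreg(trunc(x, n), m, n)
                if wide_first != narrow_first:
                    return False
    return True


print(narrowing_is_equivalent())
```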
Simon Pilgrim	a44bdf9bc1	[DAG] visitINSERT_VECTOR_ELT - refactor BUILD_VECTOR creation from INSERT_VECTOR_ELT chain. D127595 added the ability to recurse up a (one-use) INSERT_VECTOR_ELT chain to create a BUILD_VECTOR before other combines manage to break the chain, something that is particularly bad in D127115. The patch generalises this so it doesn't have to build the chain starting from the last element insertion, instead it can now start from any insertion and will recurse up the chain until it finds all elements or finds a UNDEF/BUILD_VECTOR/SCALAR_TO_VECTOR which represents the start of the chain. Fixes several regressions in D127115	2022-07-16 16:37:31 +01:00
Simon Pilgrim	52b6168c16	[DAG] visitINSERT_VECTOR_ELT - remove duplicate VT.getVectorNumElements() call. NFC.	2022-07-16 16:20:49 +01:00
Tim Besard	a323dfc015	Don't sink ptrtoint/inttoptr sequences into non-noop addrspacecasts. In https://reviews.llvm.org/D30114, support for mismatching address spaces was introduced to CodeGenPrepare's optimizeMemoryInst, using addrspacecast as it was argued that only no-op addrspacecasts would be considered when constructing the address mode. However, by doing inttoptr/ptrtoint, it's possible to get CGP to emit an addrspace that's not actually no-op, introducing a miscompilation: define void @kernel(i8* %julia_ptr) { %intptr = ptrtoint i8* %julia_ptr to i64 %ptr = inttoptr i64 %intptr to i32 addrspace(3)* br label %end end: store atomic i32 1, i32 addrspace(3)* %ptr unordered, align 4 ret void } Gets compiled to: define void @kernel(i8* %julia_ptr) { end: %0 = addrspacecast i8* %julia_ptr to i32 addrspace(3)* store atomic i32 1, i32 addrspace(3)* %0 unordered, align 4 ret void } In the case of NVPTX, this introduces a cvta.to.shared, whereas leaving out the %end block and branch doesn't trigger this optimization. This results in illegal memory accesses as seen in https://github.com/JuliaGPU/CUDA.jl/issues/558 In this change, I introduced a check before doing the pointer cast that verifies address spaces are the same. If not, it emits a ptrtoint/inttoptr combination to get a no-op cast between address spaces. I decided against disallowing ptrtoint/inttoptr with non-default AS in matchOperationAddr, because now it's still possible to look through multiple sequences of them that ultimately do not result in an address space mismatch (i.e. the second lit test).	2022-07-16 10:56:42 -04:00
Simon Pilgrim	2bb6b03d71	Fix signed/unsigned mismatch	2022-07-16 11:48:41 +01:00
Simon Pilgrim	a5d0122f75	[DAG] Canonicalize non-inlane shuffle -> AND if all non-inlane referenced elements are known zero As mentioned on D127115, this patch attempts to recognise shuffle masks that could be simplified to an AND mask - we already have a similar transform that will fold AND -> 'clear mask' shuffle, but this patch handles cases where the referenced elements are not from the same lane indices but are known to be zero. Differential Revision: https://reviews.llvm.org/D129150	2022-07-16 11:38:24 +01:00
Simon Pilgrim	1cb7416ee3	[DAG] combineShiftAnd1ToBitTest - match "and (srl (not X), C), 1 --> (and X, 1<<C) == 0" patterns combineShiftAnd1ToBitTest already matches "and (not (srl X, C)), 1 --> (and X, 1<<C) == 0" patterns, but we can end up with situations where the not is before the shift. Part of some yak shaving for D127115 to generalise the "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold.	2022-07-16 11:00:07 +01:00
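Both orderings of the `not` and the shift named in the entry above reduce to the same single-bit test. The following Python sketch checks this exhaustively for 8-bit values; it illustrates the arithmetic only, and the function names and the 8-bit width are assumptions rather than anything taken from the patch.

```python
def not_then_shift(x, c, width=8):
    """and (srl (not X), C), 1"""
    mask = (1 << width) - 1
    return ((~x & mask) >> c) & 1


def shift_then_not(x, c, width=8):
    """and (not (srl X, C)), 1"""
    mask = (1 << width) - 1
    return (~(x >> c) & mask) & 1


def bit_test(x, c):
    """(and X, 1<<C) == 0, as a 0/1 integer."""
    return int((x & (1 << c)) == 0)


def patterns_match_bit_test(width=8):
    return all(
        not_then_shift(x, c, width) == shift_then_not(x, c, width) == bit_test(x, c)
        for x in range(1 << width)
        for c in range(width)
    )


print(patterns_match_bit_test())
```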
Kazu Hirata	1a5d007659	Use has_value/value instead of hasValue/getValue (NFC)	2022-07-15 21:48:17 -07:00
Simon Pilgrim	3c8bf29696	[DAG] Move "xor (X logical_shift ShiftC), XorC --> (not X) logical_shift ShiftC" fold into SimplifyDemandedBits SimplifyDemandedBits is called slightly later which allows the not(sext(x)) -> sext(not(x)) fold to occur via foldLogicOfShifts As mentioned on D127115, we should be able to further generalise this based off the demanded bits.	2022-07-15 13:10:15 +01:00
Edd Barrett	2e62a26fd7	[stackmaps] Legalise patchpoint arguments. This is similar to D125680, but for llvm.experimental.patchpoint (instead of llvm.experimental.stackmap). Differential review: https://reviews.llvm.org/D129268	2022-07-15 12:01:59 +01:00
Nikita Popov	2a721374ae	[IR] Don't use blockaddresses as callbr arguments Following some recent discussions, this changes the representation of callbrs in IR. The current blockaddress arguments are replaced with `!` label constraints that refer directly to callbr indirect destinations: ; Before: %res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo)) to label %asm.fallthrough [label %foo] ; After: %res = callbr i8* asm "", "=r,r,!i"(i8* %x) to label %asm.fallthrough [label %foo] The benefit of this is that we can easily update the successors of a callbr, without having to worry about also updating blockaddress references. This should allow us to remove some limitations: * Allow unrolling/peeling/rotation of callbr, or any other clone-based optimizations (https://github.com/llvm/llvm-project/issues/41834) * Allow duplicate successors (https://github.com/llvm/llvm-project/issues/45248) This is just the IR representation change though, I will follow up with patches to remove limitations in various transformation passes that are no longer needed. Differential Revision: https://reviews.llvm.org/D129288	2022-07-15 10:18:17 +02:00
Craig Topper	dcfc1fd26f	[SelectionDAG][RISCV][AMDGPU][ARM] Improve SimplifyDemandedBits for SHL with variable shift amount. If we have a variable shift amount and the demanded mask has leading zeros, we can propagate those leading zeros to not demand those bits from operand 0. This can allow zero_extend/sign_extend to become any_extend. This pattern can occur due to C integer promotion rules. This transform is already done by InstCombineSimplifyDemanded.cpp where sign_extend can be turned into zero_extend for example. Reviewed By: spatel, foad Differential Revision: https://reviews.llvm.org/D121833	2022-07-14 16:10:14 -07:00
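The reasoning in the entry above rests on the fact that a left shift only moves bits upward, so operand-0 bits above the highest demanded result bit can never land in a demanded position. That is why the high bits produced by a zero_extend or sign_extend may stop mattering. A small Python sketch of that argument follows; the names and the 4-bit width are assumptions, and this is not the TargetLowering implementation.

```python
def shl(x, amount, width):
    """Left shift truncated to `width` bits."""
    return (x << amount) & ((1 << width) - 1)


def operand0_demanded(demanded):
    """Bits of operand 0 that can still reach a demanded result bit.

    The leading zeros of the demanded mask carry over unchanged: only
    bits at or below the highest demanded bit are kept.
    """
    return (1 << demanded.bit_length()) - 1


def high_bits_are_not_demanded(width=4):
    for demanded in range(1 << width):
        keep = operand0_demanded(demanded)
        for x in range(1 << width):
            for amount in range(width):  # any in-range variable shift
                full = shl(x, amount, width)
                reduced = shl(x & keep, amount, width)
                if (full ^ reduced) & demanded:
                    return False
    return True


print(high_bits_are_not_demanded())
```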
Amara Emerson	d4f84df0a0	[GlobalISel] Change widenScalar of G_FCONSTANT to mutate into G_CONSTANT. Widening a G_FCONSTANT by extending and then generating G_FPTRUNC doesn't produce the same result all the time. Instead, we can just transform it to a G_CONSTANT of the same bit pattern and truncate using a plain G_TRUNC instead. Fixes https://github.com/llvm/llvm-project/issues/56454 Differential Revision: https://reviews.llvm.org/D129743	2022-07-14 11:05:10 -07:00
Guozhi Wei	2f11b3a6d7	[MachineCombiner] Don't compute the latency of transient instructions If an MI will not generate a target instruction, we should not compute its latency. Then we can compute a more precise instruction sequence cost, and get a better result. Differential Revision: https://reviews.llvm.org/D129615	2022-07-14 17:08:14 +00:00
Nikita Popov	dcf4b733ef	[SCEVExpander] Make CanonicalMode handling in isSafeToExpand() more robust (PR50506) isSafeToExpand() for addrecs depends on whether the SCEVExpander will be used in CanonicalMode. At least one caller currently gets this wrong, resulting in PR50506. Fix this by a) making the CanonicalMode argument on the freestanding functions required and b) adding member functions on SCEVExpander that automatically take the SCEVExpander mode into account. We can use the latter variant nearly everywhere, and thus make sure that there is no chance of CanonicalMode mismatch. Fixes https://github.com/llvm/llvm-project/issues/50506. Differential Revision: https://reviews.llvm.org/D129630	2022-07-14 14:41:51 +02:00
Jannik Silvanus	e5c4cde451	[AMDGPU] SIMachineScheduler: Add support for several MachineScheduler features The SI machine scheduler inherits from ScheduleDAGMI. This patch adds support for a few features that are implemented in ScheduleDAGMI (or its base classes) that were missing so far because their support is implemented in overridden functions. * Support cl::opt -view-misched-dags This option allows opening a graphical window of the scheduling DAG. * Support cl::opt -misched-print-dags This option allows printing the scheduling DAG in text form. * After constructing the scheduling DAG, call postprocessDAG() to apply any registered DAG mutations. Note that currently there are no mutations defined in AMDGPUTargetMachine.cpp in case SIScheduler is used. Still add this to avoid surprises in the future in case mutations are added. Differential Revision: https://reviews.llvm.org/D128808	2022-07-14 09:45:31 +02:00
Kazu Hirata	611ffcf4e4	[llvm] Use value instead of getValue (NFC)	2022-07-13 23:11:56 -07:00
Amara Emerson	2824bdd92f	[GlobalISel] Fix and(load)->zextload combine crash. We shouldn't use getOpcodeDef() if we need to guarantee the def has only one user since under the hood it may look through copies and optimization hints, which themselves may have multiple users.	2022-07-13 14:58:45 -07:00
Philip Reames	dde2a7fb6d	[RISCV] Exploit fact that vscale is always power of two to replace urem sequence When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale. vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.) We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, there must be a power-of-two number of blocks. (For everything other than VLEN<=32, but that's already broken.) It is worth noting that AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic. Differential Revision: https://reviews.llvm.org/D129609	2022-07-13 10:54:47 -07:00
Simon Pilgrim	d172842b51	[DAG] SimplifyDemandedVectorElts - adjust demanded elements for selection mask for known zero results If an element is known zero from both selections then it shouldn't matter what the selection mask element is.	2022-07-13 17:36:05 +01:00
Philip Reames	fd67992f9c	[DAGCombine] fold (urem x, (lshr pow2, y)) -> (and x, (add (lshr pow2, y), -1)) We have the same fold in InstCombine - though implemented via OrZero flag on isKnownToBePowerOfTwo. The reasoning here is that either a) the result of the lshr is a power-of-two, or b) we have a div-by-zero triggering UB which we can ignore. Differential Revision: https://reviews.llvm.org/D129606	2022-07-13 08:34:38 -07:00
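The fold in the entry above relies on `lshr pow2, y` being either a power of two or zero, with the zero case being a division by zero that may be ignored. The Python sketch below checks the non-zero case exhaustively for 8-bit values; `urem_fold_holds` and the chosen width are assumptions made for illustration.

```python
def urem_fold_holds(width=8):
    mask = (1 << width) - 1
    for bit in range(width):
        pow2 = 1 << bit
        for y in range(width):
            divisor = pow2 >> y  # lshr pow2, y
            if divisor == 0:
                # urem by zero is undefined behaviour, so the folded
                # form is free to produce any value here.
                continue
            # add (lshr pow2, y), -1 in `width`-bit arithmetic
            and_mask = (divisor + mask) & mask
            for x in range(1 << width):
                if x % divisor != x & and_mask:
                    return False
    return True


print(urem_fold_holds())
```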
esmeyi	100319cdb4	[AIX] follow-up of D124654. Report an error when alias symbols are not all emitted.	2022-07-13 03:39:08 -04:00
Kai Nacke	4ae254e488	Revert "[GISel] Unify use of getStackGuard" This reverts commit `e60b4fb2b7`.	2022-07-12 17:00:43 -04:00
Kai Nacke	e60b4fb2b7	[GISel] Unify use of getStackGuard Some rework of getStackGuard() based on comments in https://reviews.llvm.org/D129505. - getStackGuard() now creates and returns the destination register, simplifying calls - the pointer type is passed to getStackGuard() to avoid recomputation - removed PtrMemTy in emitSPDescriptorParent(), because this type is only used here when loading the value but not when storing the value Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129576	2022-07-12 16:46:37 -04:00
Craig Topper	8eaf00e04d	[TargetLowering][RISCV] Make expandCTLZ work for non-power of 2 types. To convert CTLZ to popcount we do x = x | (x >> 1); x = x | (x >> 2); ... x = x | (x >>16); x = x | (x >>32); // for 64-bit input return popcount(~x); This smears the most significant set bit across all of the bits below it then inverts the remaining 0s and does a population count. To support non-power of 2 types, the last shift amount must be more than half of the size of the type. For i15, the last shift was previously a shift by 4, with this patch we add another shift of 8. Fixes PR56457. Differential Revision: https://reviews.llvm.org/D129431	2022-07-12 11:36:37 -07:00
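The smear-then-popcount sequence in the entry above can be written for an arbitrary bit width. The following is a minimal Python sketch of the technique, not the `expandCTLZ` code; `ctlz_via_popcount` is a hypothetical name, and the loop bound is one way to guarantee the final shift reaches at least half the width (for a width of 15 the shifts are 1, 2, 4 and 8).

```python
def ctlz_via_popcount(x, width):
    """Count leading zeros of a width-bit value via smear + popcount."""
    mask = (1 << width) - 1
    x &= mask
    shift = 1
    # Keep doubling the shift while it is still smaller than the width,
    # so non-power-of-2 widths such as 15 get the extra final shift.
    while shift < width:
        x |= x >> shift
        shift *= 2
    # Every bit at or below the most significant set bit is now 1, so
    # the zeros that remain are exactly the leading zeros.
    return bin(~x & mask).count("1")


for value in (0, 1, 0b100, 1 << 14):
    print(value, ctlz_via_popcount(value, 15))
```

Stopping the loop after a shift of 4 at width 15 would smear only eight bits below the top set bit, which corresponds to the problem the commit describes.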
Kai Nacke	42f7364fcb	[GISel] Check useLoadStackGuardNode() before generating LOAD_STACK_GUARD When lowering llvm::stackprotect intrinsic, the SDAG implementation checks useLoadStackGuardNode() to either create a LOAD_STACK_GUARD or use the first argument of the intrinsic. This check is not present in the IRTranslator, which results in always generating a LOAD_STACK_GUARD even if the target does not support it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129505	2022-07-12 11:44:42 -04:00
Simon Pilgrim	ded62411f7	[DAG] SimplifyDemandedBits - AND/OR/XOR - attempt basic knownbits simplifications before calling SimplifyMultipleUseDemandedBits Noticed while investigating the SystemZ regressions in D77804, prefer handling the knownbits analysis/simplification in the bitop nodes directly before falling back to SimplifyMultipleUseDemandedBits	2022-07-12 14:09:00 +01:00
Jay Foad	0d1b5268e8	[MachineVerifier] Try harder to verify LiveStacks Verify the LiveStacks analysis after a pass that claims to preserve it, even if there are no further passes (apart from the verifier itself) that would use the analysis. Differential Revision: https://reviews.llvm.org/D129200	2022-07-12 09:54:54 +01:00
Nikita Popov	c64aba5d93	[SDAG] Don't duplicate ParseConstraints() implementation SDAGBuilder (NFCI) visitInlineAsm() in SDAGBuilder was duplicating a lot of the code in ParseConstraints(), in particular all the logic to determine the operand value and constraint VT. Rely on the data computed by ParseConstraints() instead, and update its ConstraintVT implementation to match getCallOperandValEVT() more precisely.	2022-07-12 10:42:02 +02:00
Craig Topper	b05160dbdf	[SelectionDAG] Simplify how we drop poison flags in SimplifyDemandedBits. As far as I can tell what was happening in the original code is that the getNode call receives the same operands as the original node with different SDNodeFlags. The logic inside getNode detects that the node already exists and intersects the flags into the existing node and returns it. This results in Op and NewOp for the TLO.CombineTo call always being the same node. We may have already called CombineTo as part of the recursive handling. A second call to CombineTo as we unwind the recursion overwrites the previous CombineTo. I think this means any time we updated the poison flags that was the only change that ends up getting made and we relied on DAGCombiner to revisit and call SimplifyDemandedBits again. The second time the poison flags wouldn't need to be dropped and we would keep the CombineTo call from further down the recursion. We can instead call setFlags to drop the poison flags and remove the call to TLO.CombineTo. This way we keep the CombineTo from deeper in the recursion which should be more efficient. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D129511	2022-07-11 13:42:33 -07:00
Sanjay Patel	d0eec5f7e7	[SDAG] enhance sub->xor fold to ignore signbit As suggested in the post-commit feedback for D128123, we can ease the mask constraint to ignore the MSB (and make the code easier to read by adjusting the check). https://alive2.llvm.org/ce/z/bbvqWv	2022-07-11 12:37:50 -04:00
Mircea Trofin	24c6c35270	[mlgo] Don't provide default model URLs Pointed out in Issue #56432: the current reference models may not be quite friendly to open source projects. Their purpose is only illustrative - the expectation is that projects would train their own. To avoid unintentionally pulling such a model, made the URL cmake setting require explicit user setting. Differential Revision: https://reviews.llvm.org/D129342	2022-07-11 07:37:14 -07:00
Stephen Tozer	f9ac161af9	[DebugInfo][InstrRef] Fix error in copy handling in InstrRefLDV Currently, an error exists when InstrRefBasedLDV observes transfers of variables across copies, which causes it to lose track of variables under certain circumstances, resulting in shorter lifetimes for those variables as LDV gives up searching for live locations for them. This patch fixes this issue by storing the currently tracked values in the destination first, then updating them manually later without clobbering or assigning them the wrong value. Differential Revision: https://reviews.llvm.org/D128101	2022-07-11 13:38:23 +01:00
Kazu Hirata	5b55b7f6d2	[CodeGen] Remove unused member variable NextCascade (NFC)	2022-07-10 18:57:40 -07:00
Kazu Hirata	1fd6611fc8	[SelectionDAG] Restore calls to has_value (NFC) This patch restores calls to has_value to make it clear that we are checking the presence of an optional value, not the underlying value. This patch partially reverts `d08f34b592`. Differential Revision: https://reviews.llvm.org/D129454	2022-07-10 14:37:23 -07:00
David Green	28b41237e6	[InterleaveAccessPass] Handle multi-use binop shuffles D89489 added some logic to the interleaved access pass to attempt to undo the folding of shuffles into binops that instcombine performs. If early-cse is run too, the binops may be commoned into a single operation with multiple shuffle uses. It is still profitable to reverse the transform though, so long as all the uses are shuffles. Differential Revision: https://reviews.llvm.org/D129419	2022-07-10 17:24:37 +01:00
Nicolai Hähnle	ede600377c	ManagedStatic: remove many straightforward uses in llvm (Reapply after revert in `e9ce1a5880` due to Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other than error categories, to be checked in more detail and reapplied separately.) Bulk remove many of the more trivial uses of ManagedStatic in the llvm directory, either by defining a new getter function or, in many cases, moving the static variable directly into the only function that uses it. Differential Revision: https://reviews.llvm.org/D129120	2022-07-10 10:29:15 +02:00
Nicolai Hähnle	e9ce1a5880	Revert "ManagedStatic: remove many straightforward uses in llvm" This reverts commit `e6f1f06245`. Reverting due to a failure on the fuchsia-x86_64-linux buildbot.	2022-07-10 09:54:30 +02:00
Nicolai Hähnle	e6f1f06245	ManagedStatic: remove many straightforward uses in llvm Bulk remove many of the more trivial uses of ManagedStatic in the llvm directory, either by defining a new getter function or, in many cases, moving the static variable directly into the only function that uses it. Differential Revision: https://reviews.llvm.org/D129120	2022-07-10 09:15:08 +02:00
Craig Topper	40866b74bd	[DAGCombiner][X86] Fold sra (sub AddC, (shl X, N1C)), N1C --> sext (sub AddC1',(trunc X to (width - N1C))) We already handled this case for add with a constant RHS. A similar pattern can occur for sub with a constant left hand side. Test cases use add and a mul representing (neg (shl X, C)) because that's what I saw in the wild. The mul will be decomposed and then the new transform can kick in. Tests have not been committed, but this patch shows the changes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D128769	2022-07-09 11:53:44 -07:00
Alexander Yermolovich	a84e1e6c0d	[DWARF] Add linkagename to hash Originally encountered with RUST, but also there are cases with distributed LTO where debug info dwo units contain structurally the same debug information, with difference in DW_AT_linkage_name. This causes collision on DWO ID. Differential Revision: https://reviews.llvm.org/D129317	2022-07-08 10:15:25 -07:00
Matt Arsenault	13ac4c3de9	GlobalISel: Add buildBoolExtInReg helper	2022-07-08 11:55:08 -04:00
Matt Arsenault	e9a45d45d0	GlobalISel: Allow forming atomic/volatile G_SEXTLOAD Mirror the change to G_ZEXTLOAD.	2022-07-08 11:55:08 -04:00
Matt Arsenault	1ee6ce9bad	GlobalISel: Allow forming atomic/volatile G_ZEXTLOAD SelectionDAG has a target hook, getExtendForAtomicOps, which it uses in the computeKnownBits implementation for ATOMIC_LOAD. This is pretty ugly (as is having a separate load opcode for atomics), so instead allow making use of atomic zextload. Enable this for AArch64 since the DAG path defaults in to the zext behavior. The tablegen changes are pretty ugly, but partially helps migrate SelectionDAG from using ISD::ATOMIC_LOAD to regular ISD::LOAD with atomic memory operands. For now the DAG emitter will emit matchers for patterns which the DAG will not produce. I'm still a bit confused by the intent of the isLoad/isStore/isAtomic bits. The DAG implementation rejects trying to use any of these in combination. For now I've opted to make the isLoad checks also check isAtomic, although I think having isLoad and isAtomic set on these makes most sense.	2022-07-08 11:55:08 -04:00
Simon Pilgrim	b53046122f	[DAG] SimplifyDemandedBits - fold AND(INSERT_SUBVECTOR(C,X,I),M) -> INSERT_SUBVECTOR(AND(C,M),X,I) If all the demanded bits of the AND mask covering the inserted subvector 'X' are known to be one, then the mask isn't affecting the subvector at all. In which case, if the base vector 'C' is undef/constant, then move the AND mask up to just (constant) fold it directly. Addresses some of the regressions from D129150, particularly the cases where we're attempting to zero the upper elements of a widened vector. Differential Revision: https://reviews.llvm.org/D129290	2022-07-08 16:08:31 +01:00
Daniil Fukalov	6858a17f66	[LiveIntervals] Fix incorrect range (re)construction from subranges. After D82916 `updateAllRanges()` started to fix holes in main range with subranges but it fails on instructions with two subregs def which are parts of one reg. The main range constructed with //all// subranges of subregs just after processing the first operand. So the main range gets intervals from subranges those are not updated yet. The patch takes into account lane mask to update the main range. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D128553	2022-07-08 16:07:19 +03:00
Sanjay Patel	8b75671314	[SDAG] try to replace subtract-from-constant with xor This is almost the same as the abandoned D48529, but it allows splat vector constants too. This replaces the x86-specific code that was added with the alternate patch D48557 with the original generic combine. This transform is a less restricted form of an existing InstCombine and the proposed SDAG equivalent for that in D128080: https://alive2.llvm.org/ce/z/OUm6N_ Differential Revision: https://reviews.llvm.org/D128123	2022-07-08 08:14:24 -04:00
OCHyams	6b62ca9043	[NFC][SelectionDAG] Fix debug prints in salvageUnresolvedDbgValue The prints are printing pointer values - fix by dereferencing the pointers.	2022-07-08 12:09:30 +01:00
Petar Avramovic	2483f43d47	[AArch64][GlobalISel] Fix call lowering for <3 x i32> vector arguments Differential Revision: https://reviews.llvm.org/D129194	2022-07-08 10:25:45 +02:00
Sergei Barannikov	2247fdc84d	[SelectionDAG] computeKnownBits / ComputeNumSignBits for the remaining overflow-aware nodes Some overflow-aware nodes were missing from the switches in computeKnownBits and ComputeNumSignBits.	2022-07-08 09:19:19 +01:00
Joseph Huber	41fba3c107	[Metadata] Add 'exclude' metadata to add the exclude flags on globals This patch adds a new metadata kind `exclude` which implies that the global variable should be given the necessary flags during code generation to not be included in the final executable. This is done using the ``SHF_EXCLUDE`` flag on ELF for example. This should make it easier to specify this flag on a variable without needing to explicitly check the section name in the target backend. Depends on D129053 D129052 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D129151	2022-07-07 12:20:40 -04:00
Joseph Huber	1d2ce4da84	[Object] Add ELF section type for offloading objects Currently we use the `.llvm.offloading` section to store device-side objects inside the host, creating a fat binary. The contents of these sections is currently determined by the name of the section while it should ideally be determined by its type. This patch adds the new `SHT_LLVM_OFFLOADING` section type to the ELF section types. Which should make it easier to identify this specific data format. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D129052	2022-07-07 12:20:30 -04:00
Bradley Smith	60d6be5dd3	[LegalizeTypes] Replace vecreduce_xor/or/and with vecreduce_add/umax/umin if not legal This is done during type legalization since the target representation of these nodes may not be valid until after type legalization, and after type legalization the fact that these are dealing with i1 types may be lost. Differential Revision: https://reviews.llvm.org/D128996	2022-07-07 09:33:54 +00:00
Sander de Smalen	15c3ba8a44	[AArc64] Legalisation of compares and truncates of nxv1i1 types. Truncates and compares require some changes to generic legalisation functions to use ElementCount instead of getNumElements. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D129082	2022-07-07 07:39:27 +00:00
Eli Friedman	696f53665d	[AsmPrinter] Fix bit pattern for i1 vectors. Vectors are defined to be tightly packed, regardless of the element type. The AsmPrinter didn't realize this, and was allocating extra padding. Fixes https://github.com/llvm/llvm-project/issues/49286 Fixes https://github.com/llvm/llvm-project/issues/53246 Fixes https://github.com/llvm/llvm-project/issues/55522 Differential Revision: https://reviews.llvm.org/D129164	2022-07-06 12:56:47 -07:00
Edd Barrett	ed8ef65f3d	[stackmaps] Start legalizing live variable operands Prior to this change, live variable operands passed to `llvm.experimental.stackmap` would be emitted directly to target nodes, meaning that they don't get legalised. The upshot of this is that LLVM may crash when encountering illegally typed target nodes. e.g. https://github.com/llvm/llvm-project/issues/21657 This change introduces a platform independent stackmap DAG node whose operands are legalised as per usual, thus avoiding aforementioned crashes. Note that some kinds of argument are still not handled properly, namely vectors, structs, and large integers, like i128s. These will need to be addressed in follow-up changes. Note also that this does not change the behaviour of `llvm.experimental.patchpoint`. A follow up change will do the same for this intrinsic. Differential review: https://reviews.llvm.org/D125680	2022-07-06 14:01:54 +01:00
Shilei Tian	1023ddaf77	[LLVM] Add the support for fmax and fmin in atomicrmw instruction This patch adds the support for `fmax` and `fmin` operations in `atomicrmw` instruction. For now (at least in this patch), the instruction will be expanded to CAS loop. There are already a couple of targets supporting the feature. I'll create another patch(es) to enable them accordingly. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127041	2022-07-06 10:57:53 -04:00
Nikita Popov	f96cb66d19	[ValueTracking] Accept Instruction in isSafeToSpeculativelyExecute() (NFC) As constant expressions can no longer trap, it only makes sense to call isSafeToSpeculativelyExecute on Instructions, so limit the API to accept only them, rather than general Operators or Values.	2022-07-06 11:12:49 +02:00
Nikita Popov	bb84e5eeff	[SelectionDAGISel] Drop unused variable (NFC)	2022-07-06 10:46:13 +02:00
Nikita Popov	8ee913d83b	[IR] Remove Constant::canTrap() (NFC) As integer div/rem constant expressions are no longer supported, constants can no longer trap and are always safe to speculate. Remove the Constant::canTrap() method and its usages.	2022-07-06 10:36:47 +02:00
Simon Pilgrim	7068c843d2	[DAG] visitREM - use isAllOnesOrAllOnesSplat instead of isConstOrConstSplat We were only using the N1C scalar/splat value once, so for clarity use isAllOnesOrAllOnesSplat instead if we actually need it.	2022-07-05 16:44:31 +01:00
Simon Pilgrim	e7a0fa4df0	[DAG] foldAddSubOfSignBit - don't bother creating the new shift node unless constant folding succeeds Noticed by inspection - the new shift is only ever used if the constant fold occurs	2022-07-05 16:44:31 +01:00
Thomas Symalla	04c5fed5e0	[NFC] Fix wrong comment.	2022-07-05 13:37:44 +02:00
Simon Pilgrim	cce64e7a9c	[DAG] visitTRUNCATE - move GetDemandedBits AFTER SimplifyDemandedBits. Another cleanup step before removing GetDemandedBits entirely.	2022-07-04 11:25:40 +01:00
Nikita Popov	7283f48a05	[IR] Remove support for insertvalue constant expression This removes the insertvalue constant expression, as part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179. This is very similar to the extractvalue removal from D125795. insertvalue is also not supported in bitcode, so no auto-upgrade is necessary. ConstantExpr::getInsertValue() can be replaced with IRBuilder::CreateInsertValue() or ConstantFoldInsertValueInstruction(), depending on whether a constant result is required (with the latter being fallible). The ConstantExpr::hasIndices() and ConstantExpr::getIndices() methods also go away here, because there are no longer any constant expressions with indices. Differential Revision: https://reviews.llvm.org/D128719	2022-07-04 09:27:22 +02:00
esmeyi	d2a35e4d39	[AIX] Handling the label alignment of a global variable with its multiple aliases. This patch handles the case where a variable has multiple aliases. AIX's assembly directive .set is not usable for the aliasing purpose, and using different labels allows AIX to emulate symbol aliases. If a value is emitted between any two labels, meaning they are not aligned, XCOFF will automatically calculate the offset for them. This patch implements: 1) Emits the label of the alias just before emitting the value of the sub-element that the alias referred to. 2) A set of aliases that refers to the same offset should be aligned. 3) We didn't emit aliasing labels for common and zero-initialized local symbols in PPCAIXAsmPrinter::emitGlobalVariableHelper, but emitted linkage for them in AsmPrinter::emitGlobalAlias, which caused a FAILURE. This patch fixes the bug by blocking emitting linkage for the alias without a label. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D124654	2022-07-03 23:16:16 -04:00
Quentin Colombet	f4145ddf5b	[GISel] Don't fold convergent instruction across CFG Before merging two instructions together, GISel does some sanity checks that the folding is legal. However that check was missing that the source of the pattern may be convergent. When the destination location is in a different basic block, the folding is invalid. Differential Revision: https://reviews.llvm.org/D128539	2022-07-01 10:24:24 -07:00
Sander de Smalen	690db16422	[AArch64] Make nxv1i1 types a legal type for SVE. One motivation to add support for these types are the LD1Q/ST1Q instructions in SME, for which we have defined a number of load/store intrinsics which at the moment still take a `<vscale x 16 x i1>` predicate regardless of their element type. This patch adds basic support for the nxv1i1 type such that it can be passed/returned from functions, as well as some basic support to support some existing tests that result in a nxv1i1 type. It also adds support for splats. Other operations (e.g. insert/extract subvector, logical ops, etc) will be supported in follow-up patches. Reviewed By: paulwalker-arm, efriedma Differential Revision: https://reviews.llvm.org/D128665	2022-07-01 15:11:13 +00:00
Xiang1 Zhang	72a23cef7e	[ISel] Match all bits when merge undefs for DAG combine Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D128570	2022-07-01 09:09:43 +08:00
Xiang1 Zhang	64f44a90ef	Revert "[ISel] Match all bits when merge undef(s) for DAG combine" This reverts commit `5fe5aa284e`.	2022-07-01 08:59:04 +08:00
Xiang1 Zhang	5fe5aa284e	[ISel] Match all bits when merge undef(s) for DAG combine	2022-07-01 08:58:00 +08:00
Nuno Lopes	373571dbb4	[NFC] Switch a few uses of undef to poison as placeholders for unreachable code	2022-06-30 23:01:43 +01:00
jeff	09424f802c	[AMDGPU] Check for CopyToReg PhysReg clobbers in pre-RA-sched Differential Revision: https://reviews.llvm.org/D128681	2022-06-30 09:18:04 -07:00
Luo, Yuanke	fa8656d28d	[greedyalloc] Return early when there is no register to allocate. In X86 we split greedy register allocation into 2 passes. The 1st pass allocates tile registers, and the 2nd pass allocates the rest of the virtual registers. In most cases there is no tile register, so the 1st pass is unnecessary. To improve compile time, we check whether any register needs to be allocated by invoking the callback `ShouldAllocateClass`. If there is no register to be allocated, just return false in the pass. This would improve the 1st greedy RA pass for normal cases. Differential Revision: https://reviews.llvm.org/D128804	2022-06-30 11:12:05 +08:00
Stefan Pintilie	e50a8c8435	[GlobalMerge] Ensure that the MustKeepGlobalVariables has all globals from each landingpad clause. The filter clause in the landingpad may not have a GlobalVariable operand. It may instead have a ConstantArray of operands and each operand within this ConstantArray should also be checked to see if it is a GlobalVariable. This patch adds the check for the ConstantArray as well as a debug message that outputs the contents of MustKeepGlobalVariables. Reviewed By: lei, amyk, scui Differential Revision: https://reviews.llvm.org/D128287	2022-06-29 15:55:47 -05:00
Nikita Popov	16033ffdd9	[ConstExpr] Remove more leftovers of extractvalue expression (NFC) Remove some leftover bits of extractvalue handling after the removal in D125795.	2022-06-29 10:45:19 +02:00
Nikita Popov	348ea34bcd	[AsmPrinter] Further restrict expressions supported in global initializers lowerConstant() currently accepts a number of constant expressions which have corresponding MC expressions, but which cannot be evaluated as a relocatable expression (unless the operands are constant, in which case we'll just fold the expression to a constant). The motivation here is to clarify which constant expressions are really needed for https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179, and in particular clarify that we do not need to support any division expressions, which are particularly problematic. Differential Revision: https://reviews.llvm.org/D127972	2022-06-29 10:02:07 +02:00
Luo, Yuanke	5cb0979870	[X86][AMX] Split greedy RA for tile register When we fill the shape into tile configure memory, the shape is taken from the AMX pseudo instruction. However the register for the shape may be split or spilled by greedy RA. That causes us to fill the shape into config memory after ldtilecfg is executed, so that the shape configuration would be wrong. This patch splits the tile register allocation from greedy register allocation, so that after tile registers are allocated the shape registers are still virtual registers. The shape register may only be redefined or multi-defined by the phi elimination pass and the two address pass. That doesn't affect tile register configuration. Differential Revision: https://reviews.llvm.org/D128584	2022-06-29 10:35:43 +08:00
Guozhi Wei	ddc9e8861c	[MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency Add a new pattern A - (B + C) ==> (A - B) - C to give machine combiner a chance to evaluate which instruction sequence has lower latency. Differential Revision: https://reviews.llvm.org/D124564	2022-06-28 21:42:51 +00:00
Rahman Lavaee	0aa6df6575	[Propeller] Encode address offsets of basic blocks relative to the end of the previous basic blocks. This is a resurrection of D106421 with the change that it keeps backward-compatibility. This means decoding the previous version of `LLVM_BB_ADDR_MAP` will work. This is required as the profile mapping tool is not released with LLVM (AutoFDO). As suggested by @jhenderson we rename the original section type value to `SHT_LLVM_BB_ADDR_MAP_V0` and assign a new value to the `SHT_LLVM_BB_ADDR_MAP` section type. The new encoding adds a version byte to each function entry to specify the encoding version for that function. This patch also adds a feature byte to be used with more flexibility in the future. A use-case example for the feature field is encoding multi-section functions more concisely using a different format. Conceptually, the new encoding emits basic block offsets and sizes as label differences between each two consecutive basic block begin and end labels. When decoding, offsets must be aggregated along with basic block sizes to calculate the final offsets of basic blocks relative to the function address. This encoding uses smaller values compared to the existing one (offsets relative to function symbol). Smaller values tend to occupy fewer bytes in ULEB128 encoding. As a result, we get about 17% total reduction in the size of the bb-address-map section (from about 11MB to 9MB for the clang PGO binary). The extra two bytes (version and feature fields) incur a small 3% size overhead to the `LLVM_BB_ADDR_MAP` section size. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D121346	2022-06-28 07:42:54 -07:00
Tim Northover	4aafebce52	SelectionDAG: allow FP extensions when folding extract/insert. Before, we were trying to sign extend half -> float, and asserted in getNode.	2022-06-28 12:08:35 +01:00
Guillaume Chatelet	3c126d5fe4	[Alignment] Replace commonAlignment with std::min `commonAlignment` is a shortcut to pick the smallest of two `Align` objects. As-is it doesn't bring much value compared to `std::min`. Differential Revision: https://reviews.llvm.org/D128345	2022-06-28 07:15:02 +00:00
Fangrui Song	f1e27716cf	[LiveInterval] Simplify with partition_point. NFC	2022-06-27 19:25:26 -07:00
Yuanfang Chen	6678f8e505	[ubsan] Using metadata instead of prologue data for function sanitizer Information in the function `Prologue Data` is intentionally opaque. When a function with `Prologue Data` is duplicated. The self (global value) references inside `Prologue Data` is still pointing to the original function. This may cause errors like `fatal error: error in backend: Cannot represent a difference across sections`. This patch detaches the information from function `Prologue Data` and attaches it to a function metadata node. This and D116130 fix https://github.com/llvm/llvm-project/issues/49689. Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D115844	2022-06-27 12:09:13 -07:00
Patrick Walton	becbbb7e3c	Round up zero-sized symbols to 1 byte in `.debug_aranges` (without breaking other logic). This commit modifies the AsmPrinter to avoid emitting any zero-sized symbols to the .debug_aranges table, by rounding their size up to 1. Entries with zero length violate the DWARF 5 spec, which states: > Each descriptor is a triple consisting of a segment selector, the beginning > address within that segment of a range of text or data covered by some entry > owned by the corresponding compilation unit, followed by the non-zero length > of that range. In practice, these zero-sized entries produce annoying warnings in lld and cause GNU binutils to truncate the table when parsing it. Other parts of LLVM, such as DWARFDebugARanges in the DebugInfo module (specifically the appendRange method), already avoid emitting zero-sized symbols to .debug_aranges, but not comprehensively in the AsmPrinter. In fact, the AsmPrinter does try to avoid emitting such zero-sized symbols when labels aren't involved, but doesn't when the symbol to be emitted is a difference of two labels; this patch extends that logic to handle the case in which the symbol is defined via labels. Furthermore, this patch fixes a bug in which `available_externally` symbols would cause unpredictable values to be emitted into the `.debug_aranges` table under certain circumstances. In practice I don't believe that this caused issues up until now, but the root cause of this bug--an invalid DenseMap lookup--triggered failures in Chromium when combined with an earlier version of this patch. Therefore, this patch fixes that bug too. This is a revised version of diff D126257, which was reverted due to breaking tests. The now-reverted version of this patch didn't distinguish between symbols that didn't have their size reported to the DwarfDebug handler and those that had their size reported to be zero. This new version of the patch instead restricts the special handling only to the symbols whose size is definitively known to be zero. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D126835	2022-06-27 10:01:03 -07:00
Matt Arsenault	97ed2fbc5f	MIR: Fix parse error on empty CustomRegMask	2022-06-27 08:50:35 -04:00
Bradley Smith	a83aa33d1b	[IR] Move vector.insert/vector.extract out of experimental namespace These intrinsics are now fundamental for SVE code generation and have been present for a year and a half, hence move them out of the experimental namespace. Differential Revision: https://reviews.llvm.org/D127976	2022-06-27 10:48:45 +00:00
Kazu Hirata	94460f5136	Don't use Optional::hasValue (NFC) This patch replaces x.hasValue() with x where x is contextually convertible to bool.	2022-06-26 19:54:41 -07:00
Kazu Hirata	d08f34b592	[llvm] Don't use Optional::hasValue (NFC) This patch replaces Optional::hasValue with the implicit cast to bool in conditionals only.	2022-06-26 18:31:51 -07:00
Kazu Hirata	a81b64a1fb	[llvm] Use Optional::has_value instead of Optional::hasValue (NFC) This patch replaces x.hasValue() with x.has_value() where x is not contextually convertible to bool.	2022-06-26 16:10:42 -07:00
Craig Topper	44b456e5f0	[CodeGenPrepare] Avoid double map lookup. NFCI	2022-06-26 10:47:14 -07:00
Nikita Popov	ec19223131	Revert "[LiveInterval] Simplify. NFC" This reverts commit `e733b80f3c`. This caused a major compile-time regression: https://llvm-compile-time-tracker.com/compare.php?from=3b7c3a654c9175f41ac871a937cbcae73dfb3c5d&to=e733b80f3cba26bf2df9bd691120f37fc1af21ce&stat=instructions About 1% on average, 10% on individual files.	2022-06-26 11:51:07 +02:00
Kazu Hirata	a7938c74f1	[llvm] Don't use Optional::hasValue (NFC) This patch replaces Optional::hasValue with the implicit cast to bool in conditionals only.	2022-06-25 21:42:52 -07:00
Fangrui Song	e733b80f3c	[LiveInterval] Simplify. NFC	2022-06-25 11:59:33 -07:00
Kazu Hirata	3b7c3a654c	Revert "Don't use Optional::hasValue (NFC)" This reverts commit `aa8feeefd3`.	2022-06-25 11:56:50 -07:00
Kazu Hirata	aa8feeefd3	Don't use Optional::hasValue (NFC)	2022-06-25 11:55:57 -07:00
Matt Arsenault	e7bc73739a	GlobalISel: Make LoadStoreOpt preserve all Avoids dropping CSE info analysis	2022-06-25 09:24:54 -04:00
Matt Arsenault	a397846cb0	CodeGen: Use else if between Value and PseudoSourceValue cases These are mutually exclusive.	2022-06-25 09:24:25 -04:00
chenglin.bi	8c74205642	[SelectionDAG][DAGCombiner] Reuse exist node by reassociate When already have (op N0, N2), reassociate (op (op N0, N1), N2) to (op (op N0, N2), N1) to reuse the exist (op N0, N2) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122539	2022-06-24 23:15:06 +08:00
Nabeel Omer	0d41794335	[SLP] Add cost model for `llvm.powi.*` intrinsics (REAPPLIED) Patch was reverted in `4c5f10a` due to buildbot failures, now being reapplied with updated AArch64 and RISCV tests. This patch adds handling for the llvm.powi.* intrinsics in BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization. Closes #53887. Differential Revision: https://reviews.llvm.org/D128172	2022-06-24 10:23:19 +00:00
Carl Ritson	874fbe2cbb	[MachineSink] Clear kill flags on operands outside loop If an instruction is sunk into a loop then any kill flags on operands declared outside the loop must be cleared as these will be live for all loop iterations. Fixes #46827 Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D126754	2022-06-24 14:02:48 +09:00
Lian Wang	1ce30457c1	[LegalizeTypes][NFC] Add an assert to WidenVecRes_EXTRACT_SUBVECTOR and adjust some code Reviewed By: craig.topper, david-arm Differential Revision: https://reviews.llvm.org/D128038	2022-06-24 03:06:16 +00:00
Lian Wang	770fe864fe	[SelectionDAG] Enable WidenVecOp_VECREDUCE for scalable vector Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128239	2022-06-24 02:32:53 +00:00
Craig Topper	8b10ffabae	[RISCV] Disable <vscale x 1 x *> types with Zve32x or Zve32f. According to the vector spec, mf8 is not supported for i8 if ELEN is 32. Similarly mf4 is not supported for i16/f16 or mf2 for i32/f32. Since RVVBitsPerBlock is 64 and LMUL is calculated as ((MinNumElements * ElementSize) / RVVBitsPerBlock) this means we need to disable any type with MinNumElements==1. For generic IR, these types will now be widened in type legalization. For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan to work on disabling the intrinsics in the riscv_vector.h header. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D128286	2022-06-23 08:49:18 -07:00
Baptiste Saleil	79e77a9f39	[AMDGPU] Flush the vmcnt counter in loop preheaders when necessary waitcnt vmcnt instructions are currently generated in loop bodies before using values loaded outside of the loop. In some cases, it is better to flush the vmcnt counter in a loop preheader before entering the loop body. This patch detects these cases and generates waitcnt instructions to flush the counter. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D115747	2022-06-23 10:53:21 -04:00
Nico Weber	851a5efe45	Revert "[fastalloc] Support allocating specific register class in fastalloc" This reverts commit `719658d078`. Breaks a few things, see comments on https://reviews.llvm.org/D128437 There's disagreement about the best fix. So let's keep HEAD green while discussions are happening.	2022-06-23 10:44:24 -04:00
Luo, Yuanke	719658d078	[fastalloc] Support allocating specific register class in fastalloc The base RA provides infrastructure that allows only a specific register class to be allocated in an RA pass. Since greedy RA and basic RA derive from base RA, they both allow allocating a specific register class. Fast RA doesn't support allocating registers for a specific register class. This patch enables ShouldAllocateClass in fast RA, so that it can support allocating registers for a specific register class. Differential Revision: https://reviews.llvm.org/D126771	2022-06-23 14:42:04 +08:00
chenglin.bi	9c2bf534f5	Revert "[SelectionDAG][DAGCombiner] Reuse exist node by reassociate" This reverts commit `6c951c5ee6`.	2022-06-23 13:21:51 +08:00
Matt Arsenault	370aa2f88f	InlineSpiller: Don't fold spills into undef reads This was producing a load into a dead register which was a verifier error.	2022-06-22 20:47:55 -04:00
Guillaume Chatelet	57ffff6db0	Revert "[NFC] Remove dead code" This reverts commit `8ba2cbff70`.	2022-06-22 14:55:47 +00:00
Guillaume Chatelet	8ba2cbff70	[NFC] Remove dead code	2022-06-22 13:33:58 +00:00
Simon Pilgrim	2c3a4a9334	[DAG] SelectionDAG::GetDemandedBits - don't recurse back into GetDemandedBits Another minor cleanup as we work toward removing GetDemandedBits entirely - call SimplifyMultipleUseDemandedBits directly.	2022-06-22 13:48:57 +01:00
Simon Pilgrim	1c2b756cd6	[DAG] visitTRUNCATE - move TRUNCATE(ADDE/ADDCARRY) folds to switch statement handling the other binops. NFC.	2022-06-21 22:07:41 +01:00
Simon Pilgrim	8cecb6be56	[DAG] Remove SelectionDAG::GetDemandedBits DemandedElts variant. NFC. We're slowly removing SelectionDAG::GetDemandedBits and replacing it with SimplifyMultipleUseDemandedBits, we no longer have any uses for the vector demanded elt variant.	2022-06-21 21:23:10 +01:00
Nabeel Omer	4c5f10aeeb	Revert rGe6ccb57bb3f6b761f2310e97fd6ca99eff42f73e "[SLP] Add cost model for `llvm.powi.*` intrinsics" This reverts commit `e6ccb57bb3`.	2022-06-21 15:05:55 +00:00
Nabeel Omer	e6ccb57bb3	[SLP] Add cost model for `llvm.powi.*` intrinsics This patch adds handling for the llvm.powi.* intrinsics in BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization. Closes #53887. Differential Revision: https://reviews.llvm.org/D128172	2022-06-21 14:40:34 +00:00
Markus Lavin	3815ae29b5	[machinesink] fix debug invariance issue Do not include debug instructions when comparing block sizes with thresholds. Differential Revision: https://reviews.llvm.org/D127208	2022-06-21 08:13:09 +02:00
Kazu Hirata	7a47ee51a1	[llvm] Don't use Optional::getValue (NFC)	2022-06-20 22:45:45 -07:00
Kazu Hirata	d66cbc565a	Don't use Optional::hasValue (NFC)	2022-06-20 20:26:05 -07:00
Kazu Hirata	0916d96d12	Don't use Optional::hasValue (NFC)	2022-06-20 20:17:57 -07:00
Kazu Hirata	064a08cd95	Don't use Optional::hasValue (NFC)	2022-06-20 20:05:16 -07:00
chenglin.bi	6c951c5ee6	[SelectionDAG][DAGCombiner] Reuse exist node by reassociate When already have (op N0, N2), reassociate (op (op N0, N1), N2) to (op (op N0, N2), N1) to reuse the exist (op N0, N2) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122539	2022-06-21 09:45:19 +08:00
Luo, Yuanke	44e8a205f4	[fastregalloc] Enhance the heuristics for liveout in self loop. For below case, virtual register is defined twice in the self loop. We don't need to spill %0 after the third instruction `%0 = def (tied %0)`, because it is defined in the second instruction `%0 = def`. 1 bb.1 2 %0 = def 3 %0 = def (tied %0) 4 ... 5 jmp bb.1 Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D125079	2022-06-21 09:18:49 +08:00
Kazu Hirata	5413bf1bac	Don't use Optional::hasValue (NFC)	2022-06-20 11:33:56 -07:00
David Green	c0ecbfa4fd	[AArch64] Known bits for AArch64ISD::DUP An AArch64ISD::DUP is just a splat, where the known bits for each lane are the same as the input. This teaches that to computeKnownBitsForTargetNode. Problems arise for constants though, as a constant BUILD_VECTOR can be lowered to an AArch64ISD::DUP, which SimplifyDemandedBits would then turn back into a constant BUILD_VECTOR leading to an infinite cycle. This has been prevented by adding a isTargetCanonicalConstantNode node to prevent the conversion back into a BUILD_VECTOR. Differential Revision: https://reviews.llvm.org/D128144	2022-06-20 19:11:57 +01:00
Kazu Hirata	e0e687a615	[llvm] Don't use Optional::hasValue (NFC)	2022-06-20 10:38:12 -07:00
Guillaume Chatelet	03036061c7	[Alignment] Use 'previous()' method instead of scalar division This is in preparation of integration with D128052. Differential Revision: https://reviews.llvm.org/D128169	2022-06-20 11:01:43 +00:00
Simon Pilgrim	e4a124dda5	[DAG] Fold (srl (shl x, c1), c2) -> and(shl/srl(x, c3), m) Similar to the existing (shl (srl x, c1), c2) fold Part of the work to fix the regressions in D77804 Differential Revision: https://reviews.llvm.org/D125836	2022-06-20 08:37:38 +01:00
Lian Wang	ab25e263a9	[SelectionDAG] Enable WidenVecOp_VECREDUCE_SEQ for scalable vector Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D127710	2022-06-20 06:30:26 +00:00
Craig Topper	314dbde12c	[DAGCombiner][ARM][RISCV] Teach ShrinkLoadReplaceStoreWithStore to use truncstore. The VT we want to shrink to may not be legal especially after type legalization. Fixes PR56110. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D128135	2022-06-19 15:50:15 -07:00
Haojian Wu	44582afe48	Fix an unused-variable warning in release build, NFC.	2022-06-19 20:52:00 +02:00
David Green	e995e34469	[MachinePipeliner] Handle failing constrainRegClass The included test hits a verifier problems as one of the instructions: ``` %113:tgpreven, %114:tgprodd = MVE_VMLSLDAVas16 %12:tgpreven(tied-def 0), %11:tgprodd(tied-def 1), %7:mqpr, %8:mqpr, 0, $noreg, $noreg ``` Has two inputs that come from different PHIs with the same base reg, but conflicting regclasses: ``` %11:tgprodd = PHI %103:gpr, %bb.1, %16:gpr, %bb.2 %12:tgpreven = PHI %103:gpr, %bb.1, %17:gpr, %bb.2 ``` The MachinePipeliner would attempt to use %103 for both the %11 and %12 operands in the prolog, constraining the register class to the common subset of both. Unfortunately there are no registers that are both odd and even, so the second constrainRegClass fails. Fix this situation by inserting a COPY for the second if the call to constrainRegClass fails. The register allocation can then fold that extra copy away. The register allocation of Q regs changed with this test, but the R regs were the same and no new instructions are needed in the final assembly. Differential Revision: https://reviews.llvm.org/D127971	2022-06-19 18:55:19 +01:00
Simon Pilgrim	ba3f2667b6	[DAG] Add MaskedVectorIsZero helper Equivalent to MaskedValueIsZero, except its checking if all of the demanded vectors elements are known to be zero	2022-06-19 17:56:30 +01:00
Simon Pilgrim	1ebe5cac46	[DAG] SimplifyDemandedBits - add DemandedElts handling to ISD::SIGN_EXTEND_INREG simplification	2022-06-19 15:35:29 +01:00
Simon Pilgrim	db1be696c4	[DAG] SimplifyDemandedBits - add ISD::VSELECT handling	2022-06-19 15:18:25 +01:00
Kazu Hirata	129b531c9c	[llvm] Use value_or instead of getValueOr (NFC)	2022-06-18 23:07:11 -07:00
Kazu Hirata	3cbe0bc4a1	[CodeGen] Use default member initialization (NFC) Identified with modernize-use-default-member-init.	2022-06-18 12:01:34 -07:00
Kazu Hirata	4271a1ff33	[llvm] Call *set::insert without checking membership first (NFC)	2022-06-18 10:17:22 -07:00
Kazu Hirata	b254d67160	[llvm] Call *set::insert without checking membership first (NFC)	2022-06-18 08:32:54 -07:00
Han-Kuan Chen	e29133629b	[MachineCopyPropagation][RISCV] Fix D125335 accidentally change control flow. D125335 makes regsOverlap skip the following control flow, which is not intended in the original code. Differential Revision: https://reviews.llvm.org/D128039	2022-06-17 21:40:08 -07:00
Vitaly Buka	f0ca0a324f	[CodeGen] Init EmptyExpr before the first use	2022-06-17 17:40:06 -07:00
Guillaume Chatelet	90f96ec7a5	[NFC][Alignment] Remove assumeAligned from MachineFrameInfo ctor	2022-06-17 15:21:17 +00:00
Paul Walker	0e21f1d56a	[SelectionDAG] Extend WidenVecOp_INSERT_SUBVECTOR to cover more cases. WidenVecOp_INSERT_SUBVECTOR only supported cases where widening effectively converts the insert into a copy. However, when the widened subvector is no bigger than the vector being inserted into and we can be sure there's no loss of data, we can simply emit another INSERT_SUBVECTOR. Fixes: #54982 Differential Revision: https://reviews.llvm.org/D127508	2022-06-17 12:39:42 +00:00
Mingming Liu	1e67385d28	[MachineBlockPlacementStats] Added check for "-filter-print-funcs" option to the machine-block-placement-stats. Differential Revision: https://reviews.llvm.org/D128019	2022-06-16 21:59:54 -07:00
Mingming Liu	b7d09557f6	Revert "[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats." This reverts commit `46d45df451`. Going to add differential revision link to commit message and re-commit.	2022-06-16 21:56:08 -07:00
Mingming Liu	46d45df451	[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats.	2022-06-16 21:48:08 -07:00
Lian Wang	f2bcf33058	[LegalizeTypes][NFC] Merge promote SPLAT_VECTOR and promote SCALAR_TO_VECTOR to one function Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D127825	2022-06-17 02:43:52 +00:00
Lian Wang	16215eb979	[LegalizeTypes][RISCV][NFC] Modify assert in PromoteIntRes_STEP_VECTOR and add some tests for RISCV Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D127939	2022-06-17 02:26:09 +00:00
Craig Topper	e6c7a3a54f	[SelectionDAG] Don't apply MinRCSize constraint in InstrEmitter::AddRegisterOperand for IMPLICIT_DEF sources. MinRCSize is 4 and prevents constrainRegClass from changing the register class if the new class has size less than 4. IMPLICIT_DEF gets a unique vreg for each use and will be removed by the ProcessImplicitDef pass before register allocation. I don't think there is any reason to prevent constraining the virtual register to whatever register class the use needs. The attached test case was previously creating a copy of IMPLICIT_DEF because vrm8nov0 has 3 registers in it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D128005	2022-06-16 14:55:14 -07:00
Adrian Tong	55311801f0	Allow bitwidth difference when checking for isOneOrOneSplat. This helps handling a case where the BUILD_VECTOR has i16 element type and i32 constant operands t2: v8i16 = setcc t8, t17, setult:ch t3: v8i16 = BUILD_VECTOR Constant:i32<1>, ... t4: v8i16 = and t2, t3 t5: v8i16 = add t8, t4 This can be turned into t5: v8i16 = sub t8, t2, and allows us to remove t3 and t4 from the DAG. Differential Revision: https://reviews.llvm.org/D127354	2022-06-16 16:04:20 +00:00
Kito Cheng	e9f7263b38	Reland "[SplitKit] Handle early clobber + tied to def correctly" This reverts commit `7207373e1e`. We found another RISC-V bug when landing D126048, and it has been fixed by D127642 now. Differential Revision: https://reviews.llvm.org/D126048	2022-06-16 17:13:09 +08:00
Craig Topper	3aa6ec619f	[ValueTypes] Add types for nxv16bf16 and nxv32bf16. This is needed by our downstream and makes bf16 and f16 have the same set of scalable vector types. Reviewed By: rui.zhang Differential Revision: https://reviews.llvm.org/D127877	2022-06-15 23:00:53 -07:00
Benjamin Kramer	8c4a07c61f	[DAGCombiner] Fold fold (fp_to_bf16 (bf16_to_fp op)) -> op	2022-06-15 19:54:39 +02:00
Benjamin Kramer	ca50cb120b	[SelectionDAG] Constant fold FP_TO_BF16 and BF16_TO_FP.	2022-06-15 18:51:32 +02:00
Paul Robinson	ac2ad3b7bb	[PS5] Support sin+cos->sincos optimization	2022-06-15 09:36:05 -07:00
Paul Robinson	654a835c3f	[PS5] Trap after noreturn calls, with special case for stack-check-fail	2022-06-15 09:02:17 -07:00
Luo, Yuanke	16547f9fbb	[CodeGen] Fix the bug of machine sink The use operand may be undefined. In that case we can just continue to check the next operand since it won't increase register pressure. Differential Revision: https://reviews.llvm.org/D127848	2022-06-15 23:35:52 +08:00
Benjamin Kramer	8bc0bb9564	Add a conversion from double to bf16 This introduces a new compiler-rt function `__truncdfbf2`.	2022-06-15 12:56:31 +02:00
Benjamin Kramer	fb34d531af	Promote bf16 to f32 when the target doesn't support it This is modeled after the half-precision fp support. Two new nodes are introduced for casting from and to bf16. Since casting from bf16 is a simple operation I opted to always directly lower it to integer arithmetic. The other way round is more complicated if you want to preserve IEEE semantics, so it's handled by a new __truncsfbf2 compiler-rt builtin. This is of course very bare bones, but sufficient to get a semi-softened fadd on x86. Possible future improvements: - Targets with bf16 conversion instructions can now make fp_to_bf16 legal - The software conversion to bf16 can be replaced by a trivial implementation under fast math. Differential Revision: https://reviews.llvm.org/D126953	2022-06-15 12:56:31 +02:00
Keith Walker	94fac097ad	[DebugInfo][ARM] Not readonly check for RWPI globals When compiling for the RWPI relocation model [1], the debug information is wrong for readonly global variables. Writable global variables are accessed by the static base register (R9 on ARM) in the RWPI relocation model. This is being correctly generated. Readonly global variables are not accessed by the static base register in the RWPI relocation model. This case is incorrectly generating the same debugging information as for writable global variables. References: [1] ARM Read-Write Position Independence: https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#read-write-position-independence-rwpi Differential Revision: https://reviews.llvm.org/D126361	2022-06-15 11:52:12 +01:00
Simon Pilgrim	f096d5926d	[DAG] Fix SDLoc mismatch in (shl (srl x, c1), c2) -> and(shift(x,c3)) fold Noticed by @craig.topper on D125836 which uses a tweaked copy of the same code. Differential Revision: https://reviews.llvm.org/D127772	2022-06-15 11:07:59 +01:00
Ping Deng	c06f77ec0d	[SelectionDAG] fold 'Op0 - (X * MulC)' to 'Op0 + (X << log2(-MulC))' Reviewed By: craig.topper, spatel Differential Revision: https://reviews.llvm.org/D127474	2022-06-15 05:50:18 +00:00
Paul Robinson	0ce33c2941	[PS5] Default to 'sce' debugger tuning	2022-06-14 15:28:28 -07:00
Serguei Katkov	d713f0eab8	Revert "[MachineSSAUpdater] compile time improvement in GetValueInMiddleOfBlock" It looks like it causes buildbot failures. As an example: https://lab.llvm.org/buildbot/#/builders/121/builds/20364 Revert to investigate... This reverts commit `6bf2791814`.	2022-06-14 20:27:21 +07:00
Florian Hahn	e5c4308ba1	[InterleavedLoadComb] Rename uses when inserting new uses. This fixes a crash due to uses needing to be renamed.	2022-06-14 13:15:23 +01:00
Serguei Katkov	6bf2791814	[MachineSSAUpdater] compile time improvement in GetValueInMiddleOfBlock GetValueInMiddleOfBlock uses result of GetValueAtEndOfBlockInternal if there is no value defined for current basic block. If there is already a value it tries (in this order): to find single register coming from all predecessors find existing phi node which matches our incoming registers build new phi. The compile time improvement is to use current available value if it is defined out of current BB or it is a PHI register. This is due to it can be used in the middle basic block. Reviewed By: sameerds Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D126523	2022-06-14 18:00:34 +07:00
Guillaume Chatelet	c0e85f1c3b	[NFC][Alignment] Use Align in SafeStack	2022-06-14 10:56:36 +00:00
Guillaume Chatelet	6725d80640	[NFC][Alignment] Use Align in shouldAlignPointerArgs	2022-06-14 10:56:36 +00:00
Denis Antrushin	c0e965e222	[Statepoints] FixupStatepoint: Clear isKill flag if COPY is not deleted. When spilling CSRs, FixupStatepoint pass does simple copy propagation, trying to find COPY instruction which defines register being spilled and spill COPY source instead. I.e., if we have CSR $x and found $x = COPY $y we will spill $y instead. But we may be unable to delete COPY instruction for some reason. Then, spill will be inserted after it, adding another use of $y. If COPY instruction was last use of $y (killed it), after insertion of the spill it is not, so `isKill` flag must be cleared. We failed to do so and this patch fixes this issue. Reviewed By: skatkov Differential Revision: https://reviews.llvm.org/D127308	2022-06-14 10:52:32 +03:00
Kazu Hirata	a2232da2a5	[CodeGen] Remove addSEHCatchHandler and addSEHCleanupHandler (NFC) The last uses of these functions are removed on Oct 9, 2015 in commit `14e773500e`.	2022-06-13 23:08:49 -07:00
Kazu Hirata	34ff78c5cf	[CodeGen] Remove restrictRef (NFC) The last use was removed on Apr 14, 2017 in commit `4fe9d6c640`.	2022-06-13 23:08:48 -07:00
Serguei Katkov	095bf6be28	[Greedy RegAlloc] Fix the handling of split register in last chance re-coloring. This is a fix for https://github.com/llvm/llvm-project/issues/55827. When register we are trying to re-color is split the original register (we tried to recover) has no uses after the split. However in rollback actions we assign back physical register to it. Later it causes different assertions. One of them is in attached test. This CL fixes this by avoiding assigning physical register back to register which has no usage or its live interval now is empty. Reviewed By: arsenm, qcolombet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D127281	2022-06-14 12:04:17 +07:00
Kazu Hirata	145cc9db2b	[CodeGen] Remove futureWeight (NFC) The last use was removed on Jun 5, 2022 in commit `5c06f7168f`, which itself was a patch to remove unused code.	2022-06-13 17:10:23 -07:00
Guillaume Chatelet	5a293d21fc	[NFC][Alignment] Use getAlign in SelectionDAGBuilder	2022-06-13 15:13:05 +00:00
Kazu Hirata	23d9ca10ae	[CodeGen] Remove EvictionTrack (NFC) The last of getEvictor use was removed on Jun 5, 2022 in commit `5c06f7168f`, which was itself a patch to remove unused code. Once we remove getEvictor, EvictionTrack becomes a write-only data structure. The data in it won't affect compilation, so the entire class is essentially dead.	2022-06-13 07:21:29 -07:00
Kazu Hirata	246e83e973	[GlobalISel] Remove buildSequence (NFC) The last use was removed on Jun 27, 2019 in commit `8138996128`.	2022-06-13 06:58:36 -07:00
Nikita Popov	b9a7dea917	[SelectionDAG] Handle trapping aggregate (PR49839) Call canTrap() on Constant to account for trapping ConstantAggregate.	2022-06-13 15:06:53 +02:00
Simon Pilgrim	7d8fd4f5db	[DAG] visitINSERT_VECTOR_ELT - attempt to reconstruct BUILD_VECTOR before other fold interfere Another issue unearthed by D127115 We take a long time to canonicalize an insert_vector_elt chain before being able to convert it into a build_vector - even if they are already in ascending insertion order, we fold the nodes one at a time into the build_vector 'seed', leaving plenty of time for other folds to alter it (in particular recognising when they come from extract_vector_elt resulting in a shuffle_vector that is much harder to fold with). D127115 makes this particularly difficult as we're almost guaranteed to have lost the sequence before all possible insertions have been folded. This patch proposes to begin at the last insertion and attempt to collect all the (oneuse) insertions right away and create the build_vector before it's too late. Differential Revision: https://reviews.llvm.org/D127595	2022-06-13 11:48:18 +01:00
Jez Ng	d4bcb45db7	[MC][re-land] Omit DWARF unwind info if compact unwind is present where eligible This reverts commit `d941d59783`. Differential Revision: https://reviews.llvm.org/D122258	2022-06-12 17:24:19 -04:00
Simon Pilgrim	1cf9b24da3	[DAG] Enable ISD::FSHL/R SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. This helps with several of the regressions from D125836	2022-06-12 19:25:20 +01:00
Jez Ng	d941d59783	Revert "[MC] Omit DWARF unwind info if compact unwind is present where eligible" This reverts commit `ef501bf85d`.	2022-06-12 10:47:08 -04:00
Jez Ng	ef501bf85d	[MC] Omit DWARF unwind info if compact unwind is present where eligible Previously, omitting unnecessary DWARF unwinds was only done in two cases: * For Darwin + aarch64, if no DWARF unwind info is needed for all the functions in a TU, then the `__eh_frame` section would be omitted entirely. If any one function needed DWARF unwind, then MC would emit DWARF unwind entries for all the functions in the TU. * For watchOS, MC would omit DWARF unwind on a per-function basis, as long as compact unwind was available for that function. This diff makes it so that we omit DWARF unwind on a per-function basis for Darwin + aarch64 as well. In addition, we introduce the flag `--emit-dwarf-unwind=` which can toggle between `always`, `no-compact-unwind` (only emit DWARF when CU cannot be emitted for a given function), and the target platform `default`. `no-compact-unwind` is particularly useful for newer x86_64 platforms: we don't want to omit DWARF unwind for x86_64 in general due to possible backwards compat issues, but we should make it possible for people to opt into this behavior if they are only targeting newer platforms. Motivation: I'm working on adding support for `__eh_frame` to LLD, but I'm concerned that we would suffer a perf hit. Processing compact unwind is already expensive, and that's a simpler format than EH frames. Given that MC currently produces one EH frame entry for every compact unwind entry, I don't think processing them will be cheap. I tried to do something clever on LLD's end to drop the unnecessary EH frames at parse time, but this made the code significantly more complex. So I'm looking at fixing this at the MC level instead. Addendum: It turns out that there was a latent bug in the X86 backend when `OmitDwarfIfHaveCompactUnwind` is naively enabled, which is not too surprising given that this combination has not been heretofore used. For functions that have unwind info that cannot be encoded with CU, MC would end up dropping both the compact unwind entry (OK; existing behavior) as well as the DWARF entries (not OK). This diff fixes things so that we emit the DWARF entry, as well as a CU entry with encoding `UNWIND_X86_MODE_DWARF` -- this basically tells the unwinder to look for the DWARF entry. I'm not 100% sure the `UNWIND_X86_MODE_DWARF` CU entry is necessary, this was the simplest fix. ld64 seems to be able to handle both the absence and presence of this CU entry. Ultimately ld64 (and LLD) will synthesize `UNWIND_X86_MODE_DWARF` if it is absent, so there is no impact to the final binary size. Reviewed By: davide, lhames Differential Revision: https://reviews.llvm.org/D122258	2022-06-12 10:03:56 -04:00
Simon Pilgrim	54ae4ca755	[DAG] visitSRL - pull out ShiftVT. NFC.	2022-06-12 14:02:23 +01:00
Simon Pilgrim	cf5c63d187	[DAG] visitVECTOR_SHUFFLE - fold splat(insert_vector_elt()) and splat(scalar_to_vector()) to build_vector splats Addresses a number of regressions identified in D127115	2022-06-11 21:06:42 +01:00
Simon Pilgrim	44a0cd25df	[DAG] visitINSERT_VECTOR_ELT - add <1 x ???> insert_vector_elt(v0,extract_vector_elt(v1,0),0) special case handling Check if we're just replacing one v1x?? vector with another	2022-06-11 19:30:00 +01:00
Simon Pilgrim	a71ad6a3c8	[DAG] visitINSERT_VECTOR_ELT - fold insert_vector_elt(scalar_to_vector(x),v,i) -> build_vector() Allow scalar_to_vector nodes to be used for the start of a build_vector creation	2022-06-11 15:29:22 +01:00
Simon Pilgrim	693f4db1ec	[DAG] visitINSERT_VECTOR_ELT - refactor BUILD_VECTOR insertion to remove early-out. NFCI. Remove the early-out cases so we can more easily add additional folds in the future.	2022-06-11 12:01:13 +01:00
Paul Walker	10d55c4634	[SelectionDAG] Remove invalid TypeSize conversion from WidenVecOp_BITCAST. Differential Revision: https://reviews.llvm.org/D127322	2022-06-11 10:41:13 +01:00
Kazu Hirata	a98965d92f	[CodeGen] Use llvm::erase_value (NFC)	2022-06-10 22:59:48 -07:00
Fangrui Song	adf4142f76	[MC] De-capitalize SwitchSection. NFC Add SwitchSection to return switchSection. The API will be removed soon.	2022-06-10 22:50:55 -07:00
Eli Friedman	0ff51d5dde	Fix interaction of CFI instructions with MachineOutliner. 1. When checking if a candidate contains a CFI instruction, actually iterate over all of the instructions, instead of stopping halfway through. 2. Make sure copied CFI directives refer to the correct instruction. Fixes https://github.com/llvm/llvm-project/issues/55842 Differential Revision: https://reviews.llvm.org/D126930	2022-06-10 13:37:49 -07:00
Guillaume Chatelet	95083fa3b8	[NFC] Remove deadcode	2022-06-10 15:13:42 +00:00
Simon Pilgrim	91adbc3208	[DAG] SimplifyDemandedVectorElts - adding SimplifyMultipleUseDemandedVectorElts handling to ISD::CONCAT_VECTORS Attempt to look through multiple use operands of ISD::CONCAT_VECTORS nodes Another minor improvement for D127115	2022-06-10 16:06:43 +01:00
Guillaume Chatelet	38637ee477	[clang] Add support for __builtin_memset_inline In the same spirit as D73543 and in reply to https://reviews.llvm.org/D126768#3549920 this patch is adding support for `__builtin_memset_inline`. The idea is to get support from the compiler to easily write efficient memory function implementations. This patch could be split in two: - one for the LLVM part adding the `llvm.memset.inline.*` intrinsics. - and another one for the Clang part providing the intrinsic as a builtin. Differential Revision: https://reviews.llvm.org/D126903	2022-06-10 13:13:59 +00:00
Nikita Popov	c10921fa1a	[CGP] Also freeze ctlz/cttz operand when despeculating D125887 changed the ctlz/cttz despeculation transform to insert a freeze for the introduced branch on zero. While this does fix the "branch on poison" issue, we may still get in trouble if we pick a different value for the branch and for the ctz argument (i.e. non-zero for the branch, but zero for the ctz). To avoid this, we should use the same frozen value in both positions. This does cause a regression in RISCV codegen by introducing an additional sext. The DAG looks like this: t0: ch = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %3 t4: i64 = AssertSext t2, ValueType:ch:i32 t23: i64 = freeze t4 t9: ch = CopyToReg t0, Register:i64 %0, t23 t16: ch = CopyToReg t0, Register:i64 %4, Constant:i64<32> t18: ch = TokenFactor t9, t16 t25: i64 = sign_extend_inreg t23, ValueType:ch:i32 t24: i64 = setcc t25, Constant:i64<0>, seteq:ch t28: i64 = and t24, Constant:i64<1> t19: ch = brcond t18, t28, BasicBlock:ch<cond.end 0x8311f68> t21: ch = br t19, BasicBlock:ch<cond.false 0x8311e80> I don't see a really obvious way to improve this, as we can't push the freeze past the AssertSext (which may produce poison). Differential Revision: https://reviews.llvm.org/D126638	2022-06-10 09:46:10 +02:00
Simon Moll	b8c2781ff6	[NFC] format InstructionSimplify & lowerCaseFunctionNames Clang-format InstructionSimplify and convert all "FunctionName"s to "functionName". This patch does touch a lot of files but gets done with the cleanup of InstructionSimplify in one commit. This is the alternative to the less invasive clang-format only patch: D126783 Reviewed By: spatel, rengolin Differential Revision: https://reviews.llvm.org/D126889	2022-06-09 16:10:08 +02:00
Simon Pilgrim	7dbfcfa735	[DAG] combineInsertEltToShuffle - if EXTRACT_VECTOR_ELT fails to match an existing shuffle op, try to replace an undef op if there is one. This should fix a number of shuffle regressions in D127115 where the re-ordered combines mean we fail to fold a EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT sequence into a BUILD_VECTOR if we extract from more than one vector source.	2022-06-09 14:56:14 +01:00
Guillaume Chatelet	dc3367970e	[SelectionDAG] Handle bzero/memset libcalls globally instead of per target Differential Revision: https://reviews.llvm.org/D127279	2022-06-09 08:34:55 +00:00
Craig Topper	4bcfc41846	[SelectionDAG] Teach computeKnownBits that a nsw self multiply produce a positive value. This matches what we do in IR. For the RISC-V test case, this allows us to use -8 for the AND mask instead of materializing a constant in a register. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D127335	2022-06-08 14:55:58 -07:00
Kai Nacke	d897a14c2e	[SystemZ] Fix check for zero size when lowering memcmp. During lowering of memcmp/bcmp, the check for a size of 0 is done in 2 different ways. In rare cases this can lead to a crash in SystemZSelectionDAGInfo::EmitTargetCodeForMemcmp(). The root cause is that SelectionDAGBuilder::visitMemCmpBCmpCall() checks for a constant int value which is not yet evaluated. When the value is turned into a SDValue, then the evaluation is done and results in a ConstantSDNode. But EmitTargetCodeForMemcmp() expects the special case of 0 length to be handled, which results in an assertion. The fix is to turn the value into a SDValue, so that both functions use the same check. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D126900	2022-06-08 14:52:13 -04:00
Simon Pilgrim	b84c10d4bc	[DAG] visitVSELECT - don't wait for truncation of sub before attempting to match with getTruncatedUSUBSAT Fixes some X86 PSUBUS regressions encountered in D127115 where the truncate was being replaced with a PACKSS/PACKUS before the fold got called again	2022-06-08 16:16:35 +01:00
Joseph Huber	9e0dbd2a2a	[Target] Remove `startswith` for adding `SHF_EXCLUDE` to offload section Summary: We use the special section name `.llvm.offloading` to store device images in the host object file. We want these to be stripped by the linker as they are not used after linking so we use the `SHF_EXCLUDE` flag to instruct the linker to drop them. We used to do this for all sections that started with `.llvm.offloading` when we encoded metadata in the section name itself. Now that we embed a special binary containing the metadata, we should only add the flag on this name specifically.	2022-06-08 09:56:51 -04:00
Paul Walker	d88354213c	[SelectionDAG] Remove invalid TypeSize conversion from PromoteIntRes_BITCAST. Extend the TypeWidenVector case of PromoteIntRes_BITCAST to work with TypeSize directly rather than silently casting to unsigned. To accomplish this I've extended TypeSize with an interface that essentially allows TypeSize division when both operands have the same number of dimensions. There still exists combinations of scalable vector bitcasts that cause compiler crashes. I call these out by adding "is missing" entries to sve-bitcast. Depends on D126957. Fixes: #55114 Differential Revision: https://reviews.llvm.org/D127126	2022-06-08 10:30:07 +01:00
Paul Walker	a1121c31d8	[SVE] Fix incorrect code generation for bitcasts of unpacked vector types. Bitcasting between unpacked scalable vector types of different element counts is not a NOP because the live elements are laid out differently. 01234567 e.g. nxv2i32 = XX??XX?? nxv4f16 = X?X?X?X? Differential Revision: https://reviews.llvm.org/D126957	2022-06-08 10:30:07 +01:00
Chuanqi Xu	0e10f12844	[NFC] Remove commented cerr debugging loggings There are some unused cerr debugging loggings in the codes. It is weird to remain such commented debug helpers in the product.	2022-06-08 15:58:06 +08:00
Kito Cheng	7207373e1e	Revert "[SplitKit] Handle early clobber + tied to def correctly" Revert due to failed on LLVM_ENABLE_EXPENSIVE_CHECKS. This reverts commit `e14d04909d`.	2022-06-08 13:05:35 +08:00
Kito Cheng	e14d04909d	[SplitKit] Handle early clobber + tied to def correctly The splitter tries to extend a live range into the `r` slot for a use operand. That works in most situations, but not when the operand is tied to a def and the def operand is early-clobber. An example to show what goes wrong: 0 %0 = ... 16 early-clobber %0 = Op %0 (tied-def 0), ... 32 ... = Op %0 Before extending: %0 = [0r, 0d) [16e, 32d) In this case we want to extend 0d to 16e, not 16r; if we use 16r here we extend nothing, because it is already contained in [16e, 32d). This patch adds a check to detect such cases and adjusts the extension point. Detailed explanation for testcase: https://reviews.llvm.org/D126047 Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D126048	2022-06-08 11:33:05 +08:00
David Penry	907aedbb3d	[NFC] Fix spelling/newlines in comments/debug messages Just a few spelling mistakes and missing newlines Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D127162	2022-06-07 09:38:53 -07:00
Simon Pilgrim	a083f3caa1	[DAG] combineShuffleOfSplatVal - fold shuffle(splat,undef) -> splat, iff the splat contains no UNDEF elements As noticed on D127115 - we were missing this fold, instead just having the shuffle(shuffle(x,undef,splatmask),undef) fold. We should be able to merge these into one using SelectionDAG::isSplatValue, but we'll need to match the shuffle's undef handling first. This also exposed an issue in SelectionDAG::isSplatValue which was incorrectly propagating the undef mask across a bitcast (it was trying to just bail with a APInt::isSubsetOf if it found any undefs but that was actually the wrong way around so didn't fire for partial undef cases).	2022-06-07 16:42:24 +01:00
Matt Arsenault	56303223ac	llvm-reduce: Don't assert on functions which don't track liveness Use the query that doesn't assert if TracksLiveness isn't set, which needs to always be available. We also need to start printing liveins regardless of TracksLiveness.	2022-06-07 10:00:25 -04:00
Guillaume Chatelet	0788186182	[Alignment][NFC] Remove usage of MemSDNode::getAlignment I can't remove the function just yet as it is used in the generated .inc files. I would also like to provide a way to compare alignment with TypeSize since it came up a few times. Differential Revision: https://reviews.llvm.org/D126910	2022-06-07 13:52:20 +00:00
Nikita Popov	5a64bc207e	[DAGCombiner] Remove overzealous assertion when folding assert+trunc+assert (PR55846) These assert that there are no "useless" assertzext/assertsext nodes (that assert a wider width than a following trunc), but I don't think there is anything preventing such nodes from reaching this code. I don't think the assertion is relevant for correctness of this transform either -- if such an assert is present, then the other one will always be to a smaller width, and we'll pick that one. The assertion dates back to D37017. Fixes https://github.com/llvm/llvm-project/issues/55846. Differential Revision: https://reviews.llvm.org/D126952	2022-06-07 09:50:26 +02:00
Fangrui Song	15d82c62dc	[MC] De-capitalize MCStreamer functions Follow-up to `c031378ce0` . The class is mostly consistent now.	2022-06-07 00:31:02 -07:00
Hendrik Greving	a43d25734a	[ModuloSchedule] Fix terminator update when peeling. Fixes a bug of us not correctly updating the terminator of the loop's preheader, if multiple terminating branch instructions are present. This is tested through existing tests. The bug itself is hard or not possible to get exposed with the upstream Hexagon backend, because the machine pipeliner checks for an existing preheader, which is defined as a block with only 1 edge into the header. The condition of this bug is a block into the loop with more than 1 edge, and not every downstream target checks for an existing preheader. Differential Revision: https://reviews.llvm.org/D126386	2022-06-06 19:52:28 +00:00
Michael Kitzan	b7fcf6632f	[GISel] Add new combines for G_ADD Patch adds new GICombineRules for G_ADD: G_ADD(x, G_SUB(y, x)) -> y G_ADD(G_SUB(y, x), x) -> y Patch additionally adds new combine tests for AArch64 target for these new rules. Reviewed by: paquette Differential Revision: https://reviews.llvm.org/D87936	2022-06-06 11:19:45 -07:00
Craig Topper	be398100ea	[SelectionDAG] Further improve computeKnownBits for (smax X, C) where C is non-negative. Move the code that was added for D126896 after the normal recursive calls to computeKnownBits. This allows us to calculate trailing zeros. Previously we would break out of the switch before the recursive calls.	2022-06-06 09:59:23 -07:00
Kazu Hirata	5c06f7168f	[CodeGen] Remove splitCanCauseEvictionChain and its helpers (NFC) The last use was removed on Mar 7, 2022 in commit `294eca35a0`.	2022-06-05 20:22:47 -07:00
Kazu Hirata	43d4585e64	[GlobalISel] Remove widenWithUnmerge (NFC) The last use was removed on Dec 23, 2021 in commit `29f88b93fd`.	2022-06-05 19:58:18 -07:00
Kazu Hirata	61abcb0b37	[GlobalISel] Remove valueIsSplit (NFC) The last use was removed on Jun 27, 2019 in commit `8138996128`.	2022-06-05 19:51:03 -07:00
Lian Wang	20cf77f776	[LegalizeTypes][VP] Add widen and split support for vp.fptrunc and vp.fpext Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D126439	2022-06-06 02:28:01 +00:00
Kazu Hirata	3b9707dbc0	[llvm] Convert for_each to range-based for loops (NFC)	2022-06-05 12:07:14 -07:00
Alexey Lapshin	501d5b24db	[Debuginfo][DWARF][NFC] Refactor DwarfStringPoolEntryRef - remove isIndexed(). This patch is extraction from the https://reviews.llvm.org/D126883. It removes DwarfStringPoolEntryRef::isIndexed() and isIndexed bit since they are not used. Differential Revision: https://reviews.llvm.org/D126958	2022-06-05 21:18:31 +03:00
Fangrui Song	95a134254a	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 01:07:51 -07:00
Fangrui Song	d86a206f06	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 00:31:44 -07:00
Kazu Hirata	bcf4fa458a	[CodeGen] Use a range-based for loop (NFC)	2022-06-04 22:26:55 -07:00
Kazu Hirata	4969a6924d	Use llvm::less_first (NFC)	2022-06-04 21:23:18 -07:00
Kazu Hirata	32ce076d78	[CodeGen] Use StringRef::contains (NFC)	2022-06-04 20:58:58 -07:00
Fangrui Song	36c7d79dc4	Remove unneeded cl::ZeroOrMore for cl::opt options Similar to `557efc9a8b`. This commit handles options where cl::ZeroOrMore is more than one line below cl::opt.	2022-06-04 00:10:42 -07:00
Fangrui Song	557efc9a8b	[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded. Also remove cl::init(false) while touching the lines.	2022-06-03 21:59:05 -07:00
Benjamin Kramer	e8e4b741dd	[DAGCombiner] Add bf16 to the matrix of types that we don't promote to integer stores Remove a few stray semicolons while there.	2022-06-03 13:28:34 +02:00
Nikita Popov	ad742cf85d	[DAGCombine] Handle promotion of shift with both operands the same When promoting a shift, make sure we only fetch the second operand after promoting the first. Load promotion may replace users of the old load, and we don't want to be left with a dangling reference to the old load instruction. The crashing test case is from https://reviews.llvm.org/D126689#3553212. Differential Revision: https://reviews.llvm.org/D126886	2022-06-03 10:00:44 +02:00
Craig Topper	fa20bf1636	[DAGCombiner][RISCV] Improve computeKnownBits for (smax X, C) where C is non-negative. If C is non-negative, the result of the smax must also be non-negative, so all sign bits of the result are 0. This allows DAGCombiner to remove a zext_inreg in the modified test. This zext_inreg started as a sext that became zext before type legalization then was promoted to a zext_inreg. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126896	2022-06-02 12:34:24 -07:00
jacquesguan	5482ae6328	[LegalizeTypes][VP] Add widen and split support for VP FP integer casting op. This patch adds widen and split support for VP_FPTOSI, VP_FPTOUI, VP_SITOFP and VP_UITOFP. Differential Revision: https://reviews.llvm.org/D126847	2022-06-02 09:05:27 +00:00
jacquesguan	058791d8f2	[LegalizeTypes][VP] Add widen and split support for VP_SIGN_EXTEND and VP_ZERO_EXTEND. Differential Revision: https://reviews.llvm.org/D126442	2022-06-02 02:21:22 +00:00
Matt Arsenault	4cb722acbc	BranchFolder: Require NoPHIs The pass doesn't handle SSA and breaks any phis.	2022-06-01 21:14:49 -04:00
Hendrik Greving	a92ed167f2	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-02 00:49:11 +00:00
Quentin Colombet	1a155ee7de	[RegisterClassInfo] Invalidate cached information if ignoreCSRForAllocationOrder changes Even if CSR list is same between functions, we could have had a different allocation order if ignoreCSRForAllocationOrder is evaluated differently. Hence invalidate cached register class information if ignoreCSRForAllocationOrder changes. Patch by Srividya Karumuri <srividya_karumuri@apple.com> Differential Revision: https://reviews.llvm.org/D126565	2022-06-01 17:15:51 -07:00
Hendrik Greving	e9d05cc7d8	Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4." This reverts commit `430ac5c302`. Due to failures in Clang tests. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 13:27:49 -07:00
Hendrik Greving	430ac5c302	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 12:48:01 -07:00
Denis Antrushin	7047d79fde	[TwoAddressInstructionPass] Relax assert in statepoint processing. D124631 added special processing for STATEPOINT instructions. It appears that assertion added there is too strong. We can get two tied operands with the same register tied to different defs. If we hit such case, do not process it in statepoint-specific code and delegate it to common case.	2022-06-01 21:34:52 +07:00
Matt Arsenault	0e1c71e4a4	CodeGen: Move getAddressSpaceForPseudoSourceKind into TargetMachine Avoid the dependency on TargetInstrInfo, which depends on the subtarget and therefore the individual function. Currently AMDGPU is constructing PseudoSourceValue instances in MachineFunctionInfo. In order to facilitate copying MachineFunctionInfo, we need to stop allocating these there. Alternatively we could allow targets to subclass PseudoSourceValueManager, and allocate them similarly to MachineFunctionInfo.	2022-06-01 09:45:40 -04:00
Martin Storsjö	6b75a3523f	[ARM] [MC] Add support for writing ARM WinEH unwind info This includes .seh_* directives for generating it from assembly. It is designed fairly similarly to the ARM64 handling. For .seh_handler directives, such as ".seh_handler __C_specific_handler, @except" (which is supported on x86_64 and aarch64 so far), the "@except" bit doesn't work in ARM assembly, as '@' is used as a comment character (on all current platforms). Allow using '%' instead of '@' for this purpose. This convention is used by GAS in similar contexts already, e.g. [1]: Note on targets where the @ character is the start of a comment (eg ARM) then another character is used instead. For example the ARM port uses the % character. In practice, this unfortunately means that all such .seh_handler directives will need ifdefs for ARM. Contrary to ARM64, on ARM, it's quite common that we can't evaluate e.g. the function length at this point, due to instructions whose length is finalized later. (Also, inline jump tables end with a ".p2align 1".) If unable to evaluate the function length immediately, emit it as an MCExpr instead. If we'd implement splitting the unwind info for a function (which isn't implemented for ARM64 yet either), we wouldn't know whether we need to split it though. Avoid calling getFrameIndexOffset() on an unset FuncInfo.UnwindHelpFrameIdx, to avoid triggering asserts in the preexisting testcase CodeGen/ARM/Windows/wineh-basic.ll. (Once MSVC exception handling is fully implemented, those changes can be reverted.) [1] https://sourceware.org/binutils/docs/as/Section.html#Section Differential Revision: https://reviews.llvm.org/D125645	2022-06-01 11:25:48 +03:00
Ping Deng	ae8ae45e2a	[DAGCombine][NFC] Add braces to 'else' to match braced 'if' Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D126624	2022-06-01 07:54:05 +00:00
Bjorn Pettersson	86caa03718	Revert "Round up zero-sized symbols to 1 byte in `.debug_aranges`." This reverts commit `256a52d9aa` (and also the follow-up commit `38eb4fe74b` that moved a test case to a different directory). As discussed in https://reviews.llvm.org/D126257 there is a suspicion that something was wrong with this commit as text section range was shortened to 1 byte rather than rounded up as shown in the llvm/test/DebugInfo/X86/dwarf-aranges.ll test case.	2022-05-31 11:03:44 +02:00
Denis Antrushin	85322e82be	[TwoAddressInstructionPass] Special processing of STATEPOINT instruction. STATEPOINT is a special pseudo instruction which represents moving GC semantics to LLVM. Every tied def/use VReg pair in STATEPOINT represents the same physical register, which can 'magically' change during the call wrapped by the statepoint. (By construction, the tied use operand is not live across STATEPOINT.) This means that when converting into two-address form, there is no need to insert a COPY instruction before the statepoint, which is what the TwoAddressInstruction pass does for 'regular' instructions. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D124631	2022-05-30 19:07:30 +03:00
Simon Moll	18c1ee04de	Re-land "[VP] vp intrinsics are not speculatable" with test fix Update the llvmir-intrinsics.mlir test to account for the modified attribute sets. This reverts commit `2e2a8a2d90`.	2022-05-30 14:41:15 +02:00
Mehdi Amini	2e2a8a2d90	Revert "[VP] vp intrinsics are not speculatable" This reverts commit `78a18d2b54`. Break MLIR bot: https://lab.llvm.org/buildbot/#/builders/61/builds/27127	2022-05-30 12:26:16 +00:00
Simon Moll	78a18d2b54	[VP] vp intrinsics are not speculatable VP intrinsics show UB if the %evl parameter is out of bounds - they must not carry the speculatable attribute. The out-of-bounds UB disappears when the %evl parameter is expanded into the mask or expansion replaces the entire VP intrinsic with non-VP code. This patch - Removes the speculatable attribute on all VP intrinsics. - Generalizes the isSafeToSpeculativelyExecute function to let VP expansion know whether the VP intrinsic replacement will be speculatable. VP expansion may only discard %evl where this is the case. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125296	2022-05-30 12:20:05 +02:00
Ping Deng	88af539c0e	[RISCV] Support VP_REDUCE_MUL mask operation Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126520	2022-05-30 03:05:39 +00:00
Ping Deng	083798e270	[LegalizeTypes][VP] Add integer promotion support for vp.fptosi/vp.fptoui Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125760	2022-05-30 03:05:39 +00:00
Serge Pavlov	bdd0093f4d	[GlobalISel] Add G_IS_FPCLASS Add a generic opcode to represent `llvm.is_fpclass` intrinsic. Differential Revision: https://reviews.llvm.org/D121454	2022-05-27 13:49:47 +07:00
Ping Deng	121689a62e	[SelectionDAG][NFC] Simplify integer promotion in setcc/vp.setcc Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126516	2022-05-27 05:50:19 +00:00
Rahman Lavaee	08cc058518	Reland "[Propeller] Promote functions with propeller profiles to .text.hot." This relands commit `4d8d2580c5`. The major change here is using `addUsedIfAvailable<BasicBlockSectionsProfileReader>()` to make sure we don't change the pipeline tests. Differential Revision: https://reviews.llvm.org/D126518	2022-05-26 19:53:14 -07:00
Rahman Lavaee	3aa249329f	Revert "[Propeller] Promote functions with propeller profiles to .text.hot." This reverts commit `4d8d2580c5`.	2022-05-26 18:45:40 -07:00
Rahman Lavaee	4d8d2580c5	[Propeller] Promote functions with propeller profiles to .text.hot. Today, text section prefixes (none, .unlikely, .hot, and .unknown) are determined based on PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` is set, Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely). This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`. The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the function's text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`. Differential Revision: https://reviews.llvm.org/D122930	2022-05-26 16:23:21 -07:00
Craig Topper	460781feef	[LegalizeTypes] Fix bug in expensive checks verification With a fix for an expensive checks build failure exposed by new RISC-V tests. Something about expanding two rotates in type legalization caused a change in the remapping tables that the expensive checks verifying wasn't expecting. See comment in the code for how it was fixed. Tests came from this commit that exposed the bug [RISCV] Add test cases showing failure to remove mask on rotate amounts. If the masking AND has multiple users we fail to remove it. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D126036	2022-05-26 13:13:32 -07:00
Adrian Tong	7c13ae6490	Give option to use isCopyInstr to determine which MI is treated as Copy instruction in MCP. This is then used in AArch64 to remove copy instructions after taildup ran in machine block placement Differential Revision: https://reviews.llvm.org/D125335	2022-05-26 18:43:16 +00:00
Simon Pilgrim	f366acdbf6	[DAG] Generalize (sra (trunc (sra x, c1)), c2) -> (trunc (sra x, c1 + c2)) constant folding Remove local (uniform) constant folding and rely on getNode() to perform it Minor cleanup step toward adding non-uniform shift amount support	2022-05-26 14:05:09 +01:00
Simon Pilgrim	7b617eef80	[DAG] Cleanup "and/or of cmp with single bit diff" fold to use ISD::matchBinaryPredicate Prep work as I'm investigating some cases where TLI::convertSetCCLogicToBitwiseLogic should accept vectors.	2022-05-26 12:34:09 +01:00
Chen Zheng	d79275238f	[MachineSink] replace MachineLoop with MachineCycle reapply `62a9b36fcf` and fix module build failure: 1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def MachineCycleInfoWrapperPass is an analysis pass, should not be there. 2: move the definition for MachineCycleInfoPrinterPass to cpp file. Otherwise, there are module conflicts for MachineCycleInfoWrapperPass in MachinePassRegistry.def and MachineCycleAnalysis.h after `62a9b36fcf`. MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo(). This patch tries to use MachineCycle so that we can handle irreducible loop better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-26 06:45:23 -04:00
Fangrui Song	9ee15bba47	[MC] Lower case the first letter of EmitCOFF* EmitWin* EmitCV*. NFC	2022-05-26 00:14:08 -07:00
serge-sans-paille	fb67d683db	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `7030654296` detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D126417	2022-05-26 08:12:34 +02:00
Lian Wang	8aa6b05deb	[LegalizeTypes][VP] Add widen and split support for VP_TRUNCATE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125950	2022-05-26 02:03:27 +00:00
Patrick Walton	256a52d9aa	Round up zero-sized symbols to 1 byte in `.debug_aranges`. This commit modifies the AsmPrinter to avoid emitting any zero-sized symbols to the .debug_aranges table, by rounding their size up to 1. Entries with zero length violate the DWARF 5 spec, which states: > Each descriptor is a triple consisting of a segment selector, the beginning > address within that segment of a range of text or data covered by some entry > owned by the corresponding compilation unit, followed by the non-zero length > of that range. In practice, these zero-sized entries produce annoying warnings in lld and cause GNU binutils to truncate the table when parsing it. Other parts of LLVM, such as DWARFDebugARanges in the DebugInfo module (specifically the appendRange method), already avoid emitting zero-sized symbols to .debug_aranges, but not comprehensively in the AsmPrinter. In fact, the AsmPrinter does try to avoid emitting such zero-sized symbols when labels aren't involved, but doesn't when the symbol to be emitted is a difference of two labels; this patch extends that logic to handle the case in which the symbol is defined via labels. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D126257	2022-05-25 13:31:36 -07:00
Takafumi Arakaki	18e6b8234a	Allow pointer types for atomicrmw xchg This adds support for pointer types for `atomic xchg` and let us write instructions such as `atomicrmw xchg i64** %0, i64* %1 seq_cst`. This is similar to the patch for allowing atomicrmw xchg on floating point types: https://reviews.llvm.org/D52416. Differential Revision: https://reviews.llvm.org/D124728	2022-05-25 16:20:26 +00:00
Simon Moll	6e12711081	[VP][fix] Don't discard masks in reductions When expanding VP reductions to non VP-code, the reduction pass was ignoring the mask before. Fix this by keeping the mask and selecting neutral elements where the mask is zero. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D126362	2022-05-25 15:54:45 +02:00
Chen Zheng	80c4910f3d	Revert "[MachineSink] replace MachineLoop with MachineCycle" This reverts commit `62a9b36fcf`. Cause build failure on lldb incremental buildbot: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes	2022-05-24 22:43:37 -04:00
Paul Walker	6f215ca680	[SelectionDAG] Add support to widen ISD::STEP_VECTOR operations. Fixes: #55165 Differential Revision: https://reviews.llvm.org/D126168	2022-05-24 22:42:37 +01:00
Sotiris Apostolakis	67be40df6e	Recommit "[SelectOpti][5/5] Optimize select-to-branch transformation" Use container::size_type directly to avoid type mismatch causing build failures in Windows. Original commit message: This patch optimizes the transformation of selects to a branch when the heuristics deemed it profitable. It aggressively sinks eligible instructions to the newly created true/false blocks to prevent their execution on the common path and interleaves dependence slices to maximize ILP. Depends on D120232 Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D120233	2022-05-24 14:08:09 -04:00
Serge Pavlov	6fc0bc5b0f	Fix behavior of is_fp_class on empty class set The second argument to is_fp_class specifies the set of floating-point class to test against. It can be zero, in this case the intrinsic is expected to return zero value. Differential Revision: https://reviews.llvm.org/D112025	2022-05-24 21:50:18 +07:00
Simon Pilgrim	11455e4758	[DAG] Unroll vectorized FPOW instructions before widening that will scalarize to libcalls anyway Followup to D125988 - FPOW is similar to FREM and will most likely scalarize to libcalls, so unroll before widening to prevent us making additional libcalls with UNDEF args.	2022-05-24 15:44:53 +01:00
Sam Parker	e0fe9785d3	[TypePromotion] Avoid unnecessary trunc zext pairs Any zext 'sink' should already have an operand that is in the legal value, so avoid using a trunc and just use the trunc operand instead. Differential Revision: https://reviews.llvm.org/D118905	2022-05-24 15:34:36 +01:00
Nabeel Omer	8b5d9cbbfe	[x86][DAG] Unroll vectorized FREMs that will become libcalls Currently, two element vectors produced as the result of a binary op are widened to four element vectors on x86 by DAGTypeLegalizer::WidenVecRes_BinaryCanTrap. If the op still isn't legal after widening it is unrolled into 4 scalar ops in SelectionDAG before being converted into a libcall. This way we end up with 4 libcalls (two of them on known undef elements) instead of the original two libcalls. This patch modifies DAGTypeLegalizer::WidenVectorResult to ensure that if it is known that a binary op will be turned into a libcall, it is unrolled instead of being widened. This prevents the creation of the extra scalar instructions on known undef elements and (eventually) libcalls with known undef parameters which would otherwise be created when the op gets expanded post widening. Differential Revision: https://reviews.llvm.org/D125988	2022-05-24 13:34:51 +01:00
Fraser Cormack	7f7ef0ed61	[LegalizeTypes][NFC] Fix node name in assertion message This was probably copy/pasted from the MSCATTER widening.	2022-05-24 09:16:18 +01:00
Lian Wang	be84f91f87	[LegalizeTypes][VP] Fix OpNo in WidenVecOp_VP_SCATTER Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126276	2022-05-24 07:14:46 +00:00
Chen Zheng	62a9b36fcf	[MachineSink] replace MachineLoop with MachineCycle MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo(). This patch tries to use MachineCycle so that we can handle irreducible loop better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-24 01:16:19 -04:00
Sotiris Apostolakis	1786e70bd8	Revert "[SelectOpti][5/5] Optimize select-to-branch transformation" This reverts commit `a111fb9601`.	2022-05-24 00:02:00 -04:00
Sotiris Apostolakis	a111fb9601	[SelectOpti][5/5] Optimize select-to-branch transformation This patch optimizes the transformation of selects to a branch when the heuristics deemed it profitable. It aggressively sinks eligible instructions to the newly created true/false blocks to prevent their execution on the common path and interleaves dependence slices to maximize ILP. Depends on D120232 Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D120233	2022-05-23 23:31:27 -04:00
Sotiris Apostolakis	d7ebb74611	[SelectOpti][4/5] Loop Heuristics This patch adds the loop-level heuristics for determining whether branches are more profitable than conditional moves. These heuristics apply to only inner-most loops. Depends on D120231 Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D120232	2022-05-23 22:05:41 -04:00
Sotiris Apostolakis	8b42bc5662	[SelectOpti][3/5] Base Heuristics This patch adds the base heuristics for determining whether branches are more profitable than conditional moves. Base heuristics apply to all code apart from inner-most loops. Depends on D122259 Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D120231	2022-05-23 22:01:12 -04:00
Sotiris Apostolakis	97c3ef5c8a	[SelectOpti][2/5] Select-to-branch base transformation This patch implements the actual transformation of selects to branches. It includes only the base transformation without any sinking. Depends on D120230 Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D122259	2022-05-23 16:11:40 -04:00
Qunyan Mangus	12bae5f3e2	Remove duplicate fields in RAGreedy RAGreedy has two fields of RegisterClassInfo, one called RCI and another RegClassInfo from its base class. RCI is initialized without freezeReservedRegs first, while RegClassInfo does. Therefore, if reserved registers information is changed between last time freezeReservedRegs is called and RAGreedy, it's not picked up by RCI. Instead of having both fields in RAGreedy, remove RCI and use RegClassInfo instead. Also removed is the TRI field which is present in its base class. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D125926	2022-05-23 13:08:25 -07:00
Craig Topper	569d8945f3	[DAGCombiner][AArch64] Don't fold (smulo x, 2) -> (saddo x, x) if VT is i2. If the VT is i2, then 2 is really -2. Test has not been committed yet, but diff shows the change. Fixes PR55644. Differential Revision: https://reviews.llvm.org/D126213	2022-05-23 11:13:57 -07:00
Nikita Popov	5126c38012	[CGP] Freeze condition when despeculating ctlz/cttz Freeze the condition of the newly introduced conditional branch, to avoid immediate undefined behavior if the input to ctlz/cttz was originally poison. Differential Revision: https://reviews.llvm.org/D125887	2022-05-23 11:01:18 +02:00
Craig Topper	c11051a400	[SelectionDAG] Add a freeze to ISD::ABS expansion. I had initially assumed this was the problem with https://github.com/llvm/llvm-project/issues/55271#issuecomment-1133426243 But it turns out that was a simpler issue. This patch is still more correct than what we were doing before so figured I'd submit it anyway. No test case because I'm not sure how to get an undef around until expansion. Looking at the test deltas I wonder if it would be valid to combine (sext_inreg (freeze (aextload X))) -> (freeze (sextload X)). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D126175	2022-05-22 14:29:58 -07:00
Craig Topper	768a1ca5ec	[SelectionDAG] Fold abs(undef) to 0 instead of undef. abs should only produce a positive value or the signed minimum value. This means we can't fold abs(undef) to undef as that would allow more values. Fold to 0 instead to match InstSimplify. Fixes test mentioned in comment on pr55271. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126174	2022-05-22 12:47:32 -07:00
Paul Walker	258dac43d6	[SVE] Enable use of 32bit gather/scatter indices for fixed length vectors Differential Revision: https://reviews.llvm.org/D125193	2022-05-22 12:32:30 +01:00
Ping Deng	0e8ac3a797	[LegalizeTypes][VP] Add integer promotion support for vp.sitofp/vp.uitofp Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125960	2022-05-22 02:13:45 +00:00
Craig Topper	4638766794	[TypePromotion] Refine fix sext/zext for promoted constant from D125294. Reviewing the code again, I believe the sext is needed on the LHS or RHS for ICmp and only on the RHS for Add. Add an opcode check before checking the operand number. Fixes PR55627. Differential Revision: https://reviews.llvm.org/D125654	2022-05-21 14:08:15 -07:00
Craig Topper	003b95acf2	[LegalizeTypes] Remove double map lookup in DAGTypeLegalizer::PerformExpensiveChecks. NFC Remove repeated checks for ResId being 0.	2022-05-21 00:06:59 -07:00
Craig Topper	66875dbcc0	[LegalizeTypes] Use SmallDenseMap::count instead of SmallDenseMap::find. NFC It's more readable and more efficient.	2022-05-21 00:06:55 -07:00
Shilei Tian	ff60a0a364	[LLVM] Add a check if should cast atomic operations to integer type Currently for atomic load, store, and rmw instructions, as long as the operand is a floating-point value, it is cast to integer. Nowadays many targets can actually support part of atomic operations with floating-point operands. For example, NVPTX supports atomic load and store of floating-point values. This patch adds a series of interface functions `shouldCastAtomicXXXInIR`, and the default implementations are the same as what we currently do. Later, targets can provide their own specializations. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D125652	2022-05-20 17:23:53 -04:00
Zequan Wu	9886046289	[CodeView] Combine variable def ranges that are continuous. It saves about 1.13% size for chrome.dll.pdb on chrome official build. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D125721	2022-05-20 12:12:14 -07:00
Craig Topper	8d3894f67e	[TypePromotion] Fix another case for sext vs zext in promoted constant. If the SafeWrap operation is a subtract, we negated the constant to treat the subtract as an addition. The sext was based on the operation being addition. So we really need to do (neg (sext (neg C))) when promoting the constant. This is equivalent to (sext C) for every value of C except the min signed value. For min signed value we need to do (zext C) instead. Fixes PR55490. Differential Revision: https://reviews.llvm.org/D125653	2022-05-20 09:30:07 -07:00
Ivan Kosarev	86803008ea	[MIR] Provide location of extra instruction operand when diagnosing it. Also resolves misspelled FileCheck directives caught with D125604. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D125965	2022-05-20 05:56:25 +01:00
Sotiris Apostolakis	ca7c307d18	[SelectOpti][1/5] Setup new select-optimize pass This is the first commit for the cmov-vs-branch optimization pass. The goal is to develop a new profile-guided and target-independent cost/benefit analysis for selecting conditional moves over branches when optimizing for performance. Initially, this new pass is expected to be enabled only for instrumentation-based PGO. RFC: https://discourse.llvm.org/t/rfc-cmov-vs-branch-optimization/6040 Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D120230	2022-05-19 16:31:10 +00:00
Jay Foad	6bec3e9303	[APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf Most clients only used these methods because they wanted to be able to extend or truncate to the same bit width (which is a no-op). Now that the standard zext, sext and trunc allow this, there is no reason to use the OrSelf versions. The OrSelf versions additionally have the strange behaviour of allowing extending to a smaller width, or truncating to a larger width, which are also treated as no-ops. A small amount of client code relied on this (ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and needed rewriting. Differential Revision: https://reviews.llvm.org/D125557	2022-05-19 11:23:13 +01:00
Lian Wang	530bab1f93	[RISCV][SelectionDAG] Support VECREDUCE_ADD mask operation Re-landed D125206 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125206	2022-05-19 09:53:33 +00:00
Lian Wang	f035068bb3	[LegalizeVectorTypes][VP] Add widen and split support for VP_SETCC Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D125446	2022-05-19 07:42:39 +00:00
Lian Wang	bbc6834e26	[LegalizeTypes][VP] Add integer promotions support for VP_TRUNCATE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125739	2022-05-19 07:36:10 +00:00
Lian Wang	993070d11f	[LegalizeTypes][VP][NFC] Use an if and two returns instead of ?: operator Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125858	2022-05-19 07:18:24 +00:00
Jon Roelofs	d699e54ca2	Fix an or+and miscompile w/ GlobalISel Fixes #55284	2022-05-18 19:09:47 -07:00
Matthias Braun	8d03c49f49	Extend switch condition in optimizeSwitchPhiConst when free In a case like: switch((i32)x) { case 42: phi((i64)42, ...); } replace `(i64)42` with `zext(x)` when we can do so for free. This fixes a part of https://github.com/llvm/llvm-project/issues/55153 Differential Revision: https://reviews.llvm.org/D124897	2022-05-18 16:23:53 -07:00
Mitch Phillips	7aa1fa0a0a	Reland "[dwarf] Emit a DIGlobalVariable for constant strings." An upcoming patch will extend llvm-symbolizer to provide the source line information for global variables. The goal is to move AddressSanitizer off of internal debug info for symbolization onto the DWARF standard (and doing a clean-up in the process). Currently, ASan reports the line information for constant strings if a memory safety bug happens around them. We want to keep this behaviour, so we need to emit debuginfo for these variables as well. Reviewed By: dblaikie, rnk, aprantl Differential Revision: https://reviews.llvm.org/D123534	2022-05-18 13:56:45 -07:00
Michael Kitzan	29bebb0237	[GISel] Add new combines for G_FMINNUM/MAXNUM and G_FMINIMUM/MAXIMUM I noticed https://reviews.llvm.org/D87415 added SDAG combines to fold FMIN/MAX instrs with NaNs. The patch implements the same NaN combines for GISel GMIR FMIN/MAX opcodes: G_FMINNUM(X, NaN) -> X G_FMAXNUM(X, NaN) -> X G_FMINIMUM(X, NaN) -> NaN G_FMAXIMUM(X, NaN) -> NaN The patch adds AArch64 tests for these combines as well. Reviewed by: arsenm Differential revision: https://reviews.llvm.org/D125819	2022-05-18 12:08:53 -07:00
Yusra Syeda	5ac411aea8	[SystemZ][z/OS] Add the PPA1 to SystemZAsmPrinter Differential Revision: https://reviews.llvm.org/D125725	2022-05-18 14:13:17 -04:00
Craig Topper	46eef76876	[DAGCombiner] Fix bug in MatchBSwapHWordLow. This function tries to match (a >> 8) | (a << 8) as (bswap a) >> 16. If the SRL isn't masked and the high bits aren't demanded, we still need to ensure that bits 23:16 are zero. After the right shift they will be in bits 15:8 which is where the important bits from the SHL end up. It's only a bswap if the OR on bits 15:8 only takes the bits from the SHL. Fixes PR55484. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D125641	2022-05-18 09:23:18 -07:00
Yeting Kuo	00999fb6e1	[SelectionDAGBuilder] Pass fast math flags to most of VP SDNodes. The patch does not pass math flags to float VPCmpIntrinsics because LLParser could not identify float VPCmpIntrinsics as FPMathOperators. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125600	2022-05-18 16:15:47 +08:00
Simon Pilgrim	d40b7f0d5a	[DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses If we're using shift pairs to mask, then relax the one use limit if the shift amounts are equal - we'll only be generating a single AND node. AArch64 has a couple of regressions due to this, so I've enforced the existing one use limit inside an AArch64TargetLowering::shouldFoldConstantShiftPairToMask callback. Part of the work to fix the regressions in D77804 Differential Revision: https://reviews.llvm.org/D125607	2022-05-17 13:40:11 +01:00
Jay Foad	77480556c4	[RegAllocGreedy] New hook regClassPriorityTrumpsGlobalness Add a new TargetRegisterInfo hook to allow targets to tweak the priority of live ranges, so that AllocationPriority of the register class will be treated as more important than whether the range is local to a basic block or global. This is determined per-MachineFunction. Differential Revision: https://reviews.llvm.org/D125102	2022-05-17 12:35:21 +01:00
jacquesguan	26593e7314	[SelectionDAG] Support more VP reduction mask operation. This patch uses VP_REDUCE_AND and VP_REDUCE_OR to replace VP_REDUCE_SMAX,VP_REDUCE_SMIN,VP_REDUCE_UMAX and VP_REDUCE_UMIN for mask vector type. Differential Revision: https://reviews.llvm.org/D125002	2022-05-17 09:14:21 +00:00
Fraser Cormack	599ff247de	[StackColoring] Don't merge slots with differing StackIDs The documentation for this specifically mentions that this should not happen. We could think about adding target hooks to permit it (and how to merge IDs) in the future if that is desirable. This specific test case was merging a scalable-vector slot into a non-scalable one and dropping the notion of scalability, meaning we failed to allocate enough stack space for the object. Reviewed By: arsenm, MaskRay, sdesmalen Differential Revision: https://reviews.llvm.org/D125699	2022-05-17 08:28:49 +01:00
Mitch Phillips	ed2c3218f5	Revert "[dwarf] Emit a DIGlobalVariable for constant strings." This reverts commit `4680982b36`. Broke a fuchsia windows bot. More details in the review: https://reviews.llvm.org/D123534	2022-05-16 19:07:38 -07:00
Mitch Phillips	4680982b36	[dwarf] Emit a DIGlobalVariable for constant strings. An upcoming patch will extend llvm-symbolizer to provide the source line information for global variables. The goal is to move AddressSanitizer off of internal debug info for symbolization onto the DWARF standard (and doing a clean-up in the process). Currently, ASan reports the line information for constant strings if a memory safety bug happens around them. We want to keep this behaviour, so we need to emit debuginfo for these variables as well. Reviewed By: dblaikie, rnk, aprantl Differential Revision: https://reviews.llvm.org/D123534	2022-05-16 16:52:16 -07:00
Philip Reames	7dbf2e7b57	Teach PeepholeOpt to eliminate redundant copy from constant physreg (e.g VLENB on RISCV) The existing redundant copy elimination required a virtual register source, but the same logic works for any physreg where we don't have to worry about clobbers. On RISCV, this helps eliminate redundant CSR reads from VLENB. Differential Revision: https://reviews.llvm.org/D125564	2022-05-16 16:38:30 -07:00
Paul Walker	7dd05ba9ed	[SelectionDAG] Remove duplicate "is scaled" information from gather/scatter SDNodes. During early gather/scatter enablement two different approaches were taken to represent scaled indices: * A Scale operand whereby byte_offsets = Index * Scale * An IndexType whereby byte_offsets = Index * sizeof(MemVT.ElementType) Having multiple representations is bad as shown by this patch which fixes instances where the two are out of sync. The dedicated scale operand is more flexible and pervasive so this patch removes the UNSCALED values from IndexType. This means all indices are scaled but the scale can be one, hence unscaled. SDNodes now use the scale operand to answer the "isScaledIndex" question. I toyed with the idea of keeping the UNSCALED enums and helper functions but because they will have no uses and force SDNodes to validate the set of supported values I figured it's best to remove them. We can re-add them if there's a real need. For similar reasons I've kept the IndexType enum when a bool could be used as I think being explicit looks better. Depends On D123347 Differential Revision: https://reviews.llvm.org/D123381	2022-05-16 20:47:52 +01:00
Craig Topper	1c4880a2d3	[TargetLowering] Expand the last stage of i16 popcnt using shift+add+and instead of mul+shift. If we use multiply it would be with 0x0101 which is 1 more than a power of 2. On some targets we would expand this to shl+add. By avoiding the multiply earlier, we can generate better code. Note, PowerPC doesn't do the shl+add expansion of multiply so one of the tests increased in instruction count. Limiting to scalars because it almost always increased the number of instructions in vector tests. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D125638	2022-05-16 09:27:44 -07:00
Craig Topper	e6fc8454be	[DAGCombiner] Fix incorrect indentation. NFC	2022-05-16 09:27:15 -07:00
Philip Reames	55e2df7285	[LiveIntervals] Add range accessors for value numbers [nfc]	2022-05-16 08:23:12 -07:00
Bradley Smith	7ff5148d64	[DAGCombine] Support splat_vector nodes in (and (extload)) dagcombine Differential Revision: https://reviews.llvm.org/D125367	2022-05-16 11:25:20 +00:00
Abinav Puthan Purayil	485dd0b752	[GlobalISel] Handle constant splat in funnel shift combine This change adds the constant splat versions of m_ICst() (by using getBuildVectorConstantSplat()) and uses it in matchOrShiftToFunnelShift(). The getBuildVectorConstantSplat() name is shortened to getIConstantSplatVal() so that the *SExtVal() version would have a more compact name. Differential Revision: https://reviews.llvm.org/D125516	2022-05-16 16:03:30 +05:30
Yeting Kuo	26a61ab678	[SelectionDAG] Make getNode which uses single element SDVTList pass SDNodeFlags. With this patch, users do not need to know that getNode with an SDNodeFlags argument may not pass its flags. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125659	2022-05-16 18:19:46 +08:00
Denis Antrushin	8903dbef8f	[StatepointLowering] Properly handle local and non-local relocates of the same value. FunctionLoweringInfo::StatepointRelocationMaps map is used to pass GC pointer lowering information from statepoint to gc.relocate which may appear in a different block. D124444 introduced different lowering for local and non-local relocates. Local relocates use SDValue and non-local relocates use value exported to VReg. But I overlooked the fact that StatepointRelocationMap is indexed not by GCRelocate instruction, but by derived pointer. This works incorrectly when we have two relocates (one local and another non-local) of the same value, because they need different relocation records. This patch fixes the problem by recording relocation information per relocate instruction, not per derived pointer. This way, each gc.relocate can be lowered differently. Reviewed By: skatkov Differential Revision: https://reviews.llvm.org/D125538	2022-05-16 17:02:34 +07:00
Nikita Popov	05c3fe075d	[FastISel] Fix load folding for registers with fixups FastISel tries to fold loads into the single using instruction. However, if the register has fixups, then there may be additional uses through an alias of the register. In particular, this fixes the problem reported at https://reviews.llvm.org/D119432#3507087. The load register is (at the time of load folding) only used in a single call instruction. However, selection of the bitcast has added a fixup between the load register and the cross-BB register of the bitcast result. After fixups are applied, there would now be two uses of the load register, so load folding is not legal. Differential Revision: https://reviews.llvm.org/D125459	2022-05-16 10:25:25 +02:00
Craig Topper	b4ad450953	[TargetLowering] expandCTPOP don't create an unused constant mask for i8 ctpop. NFC Use early out for the i8 case. I'm looking at avoiding MUL on targets that use libcalls for MUL. So doing a little pre-refactoring.	2022-05-14 20:35:38 -07:00
Simon Pilgrim	f4eac6e5f6	[DAG] visitOR - merge isa/cast<ShuffleVectorSDNode> into dyn_cast<ShuffleVectorSDNode>. NFC. Also, initialize entire mask to -1 to simplify undefined cases.	2022-05-14 20:49:26 +01:00
Simon Pilgrim	95cdd63b87	[DAG] visitADDLike - use SelectionDAG::FoldConstantArithmetic directly to match constant operands SelectionDAG::FoldConstantArithmetic determines if operands are foldable constants, so we don't need to bother with isConstantOrConstantVector / Opaque tests before calling it directly.	2022-05-14 18:39:41 +01:00
Simon Pilgrim	8db72d9d04	[DAG] visitMUL - pull out repeated SDLoc() calls. NFC.	2022-05-14 14:28:39 +01:00
Simon Pilgrim	8d4d4988e4	[DAG] Use SelectionDAG::FoldConstantArithmetic directly to match constant operands SelectionDAG::FoldConstantArithmetic determines if operands are foldable constants, so we don't need to bother with isConstantOrConstantVector / Opaque tests before calling it directly.	2022-05-14 14:19:12 +01:00
Simon Pilgrim	1ecc3d86ae	[DAG] Enable ISD::SHL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits Pulled out of D77804 as it's going to be easier to address the regressions individually. This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. The lost RISCV gorc2 fold shouldn't be a problem - instcombine would have already destroyed that pattern - see https://github.com/llvm/llvm-project/issues/50553 Differential Revision: https://reviews.llvm.org/D124839	2022-05-14 09:50:01 +01:00
Eli Friedman	96c2a0c9ff	[GlobalIsel] Fix fallback if stack protector isn't supported. When GlobalISel fails, we need to report the error, and we need to set the FailedISel property. We skipped those steps if stack protector insertion failed, which led to a very strange miscompile. Differential Revision: https://reviews.llvm.org/D125584	2022-05-13 14:17:27 -07:00
Simon Pilgrim	3fc33ced10	DAGCombiner.cpp - break if-else chains that always return (style)	2022-05-13 18:31:39 +01:00
Sanjay Patel	e52e1dab2a	[SDAG] freeze operand when expanding urem This is a potential miscompile as discussed in issue #55291. The related IR transform was patched with: `d428f09b2c`	2022-05-13 10:55:14 -04:00
Nikita Popov	ed1cb01baf	[IRBuilder] Add IsInBounds parameter to CreateGEP() We commonly want to create either an inbounds or non-inbounds GEP based on a boolean value, e.g. when preserving inbounds from existing GEPs. Directly accept such a boolean in the API, rather than requiring a ternary between CreateGEP and CreateInBoundsGEP. This change is not entirely NFC, because we now preserve an inbounds flag in a constant expression edge-case in InstCombine.	2022-05-13 14:30:55 +02:00
Sam Parker	6d53d35efd	[TypePromotion] Avoid some unnecessary truncs Recommit. Check for legal zext 'sinks' before inserting a trunc. Differential Revision: https://reviews.llvm.org/D115451	2022-05-13 09:45:20 +01:00
Jay Foad	26e1ebd3ea	[GlobalISel] Change ConstantFoldVectorBinop to return vector of APInt Previously it built MIR for the results and returned a Register. This avoids building constants for earlier elements of the vector if later elements will fail to fold, and allows CSEMIRBuilder::buildInstr to avoid unconditionally building a copy from the result. Use a new helper function MachineIRBuilder::buildBuildVectorConstant to build a G_BUILD_VECTOR of G_CONSTANTs. Differential Revision: https://reviews.llvm.org/D117758	2022-05-13 09:33:07 +01:00
Lian Wang	693758b282	[LegalizeTypes][VP] Add integer promotion support for vp.setcc Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125453	2022-05-13 06:25:13 +00:00
Lian Wang	8050ba6678	[LegalizeTypes][VP] Add integer promotion support for vp.merge Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125452	2022-05-13 03:28:29 +00:00
Craig Topper	cec249c60d	[TypePromotion] Promote undef by converting to 0. If we're promoting an undef I think that means that we expect the upper bits are zero. undef doesn't guarantee that. This patch replaces undef with 0 to ensure this. This matches how a zext or sext of undef would be folded by InstCombine/InstSimplify. I haven't found a failure from this; I was just thinking through the code. Differential Revision: https://reviews.llvm.org/D123174	2022-05-12 09:09:24 -07:00
Fraser Cormack	1106bc208c	[CodeGen][NFC] Move some comments from the end of lines to above them This avoids wrapping the line itself awkwardly when it exceeds 80 chars. It also better matches our style most other places.	2022-05-12 15:45:04 +01:00
Jeremy Morse	a975472fa6	[DebugInfo][InstrRef] Describe value sizes when spilt to stack This is a re-apply of D123599, which was reverted in `4fe2ab5279`, now with a more appropriate assertion. Original commit message follows: InstrRefBasedLDV can track and describe variable values that are spilt to the stack -- however it does not currently describe the size of the value on the stack. This can cause uninitialized bytes to be read from the stack if a small register is spilt for a larger variable, or theoretically on big-endian machines if a large value on the stack is used for a small variable. Fix this by using DW_OP_deref_size to specify the amount of data to load from the stack, if there's any possibility for ambiguity. There are a few scenarios where this can be omitted (such as when using DW_OP_piece and a non-DW_OP_stack_value location), see deref-spills-with-size.mir for an explicit table of input flavours and output expressions. Differential Revision: https://reviews.llvm.org/D123599	2022-05-12 15:52:55 +01:00
Nikita Popov	50f846d634	[FastISel] Add some debug output (NFC) Print a debug message when aborting isel (next to the ORE report) and when folding a load.	2022-05-12 12:25:20 +02:00
Lian Wang	9176096c86	[LegalizeVectorTypes] Enable WidenVecRes_SETCC work for scalable vector. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125359	2022-05-12 02:52:43 +00:00
Craig Topper	edbf390d10	[CodeGenPrepare] Use const reference to avoid unnecessary APInt copy. NFC Spotted while looking at Matthias' patches. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124985	2022-05-11 12:06:45 -07:00
Matthias Braun	de9ad98d2d	Fix endless loop in optimizePhiConst with integer constant switch condition Avoid endless loop in degenerate case with an integer constant as switch condition as reported in https://reviews.llvm.org/D124552	2022-05-11 08:49:01 -07:00
David Green	5feeceddb2	[TypePromotion] Fix sext vs zext in promoted constant As pointed out in #55342, given non-canonical IR with multiple constants, we check the second operand in isSafeWrap, but can promote both with sext. Fix that as suggested by @craig.topper by ensuring we only extend the second constant if multiple are present. Fixes #55342 Differential Revision: https://reviews.llvm.org/D125294	2022-05-11 10:47:44 +01:00
David Green	764a7f4864	[TypePromotion] Format Type Promotion. NFC This clang-formats the TypePromotion code, with the only meaningful change being the removal of a verifyFunction call inside a LLVM_DEBUG, and the printing of the entire function which can be better handled via -print-after-all.	2022-05-11 08:18:58 +01:00
Xiang1 Zhang	2ea8f203cd	[CodeGen] Fix ConvertNodeToLibcall for STRICT_FPOWI Reviewed By: PengfeiWang Differential Revision: https://reviews.llvm.org/D125159	2022-05-11 08:58:06 +08:00
Matthias Braun	f0ea9c9cec	CodeGenPrepare: Replace constant PHI arguments with switch condition value We often see code like the following after running SCCP: switch (x) { case 42: phi(42, ...); } This tends to produce bad code as we currently materialize the constant phi-argument in the switch-block. This increases register pressure and if the pattern repeats for `n` case statements, we end up generating `n` constant values. This changes CodeGenPrepare to catch this pattern and revert it back to: switch (x) { case 42: phi(x, ...); } Differential Revision: https://reviews.llvm.org/D124552	2022-05-10 10:00:10 -07:00
Matthias Braun	cd19af74c0	Avoid 8 and 16bit switch conditions on x86 This adds a `TargetLoweringBase::getSwitchConditionType` callback to give targets a chance to control the type used in `CodeGenPrepare::optimizeSwitchInst`. Implement callback for X86 to avoid i8 and i16 types where possible as they often incur extra zero-extensions. This is NFC for non-X86 targets. Differential Revision: https://reviews.llvm.org/D124894	2022-05-10 10:00:10 -07:00
Lian Wang	f14a1f26ad	Revert "[RISCV][SelectionDAG] Support VECREDUCE_ADD mask operation" This patch makes CodeGen/test/AArch64/vecreduce-add-legalization.ll fail. This reverts commit `17a8a1bb71`.	2022-05-10 09:25:25 +00:00
Lian Wang	17a8a1bb71	[RISCV][SelectionDAG] Support VECREDUCE_ADD mask operation Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125206	2022-05-10 08:52:48 +00:00
Mircea Trofin	c35ad9ee4f	[mlgo] Support exposing more features than those supported by models This allows the compiler to support more features than those supported by a model. The only requirement (development mode only) is that the new features must be appended at the end of the list of features requested from the model. The support is transparent to compiler code: for unsupported features, we provide a valid buffer to copy their values; it's just that this buffer is disconnected from the model, so insofar as the model is concerned (AOT or development mode), these features don't exist. The buffers are allocated at setup - meaning, at steady state, there is no extra allocation (maintaining the current invariant). These buffers have 2 roles: first, keep the compiler code simple; second, allow logging their values in development mode. The latter allows retraining a model supporting the larger feature set starting from traces produced with the old model. For release mode (AOT-ed models), this decouples compiler evolution from model evolution, which we want in scenarios where the toolchain is frequently rebuilt and redeployed: we can first deploy the new features, and continue working with the older model, until a new model is made available, which can then be picked up the next time the compiler is built. Differential Revision: https://reviews.llvm.org/D124565	2022-05-09 18:01:21 -07:00
David Green	2cfb243bcd	[DAG] Use isAnyConstantBuildVector. NFC As suggested from `02f8519502`, this uses the isAnyConstantBuildVector method in lieu of separate isBuildVectorOfConstantSDNodes calls. It should otherwise be an NFC.	2022-05-09 14:13:03 +01:00
David Green	02f8519502	[DAG] Prevent infinite loop combining bitcast shuffle This prevents an infinite loop from D123801, where code trying to reduce the total number of bitcasts, but also handling constants, could create the opposite transform. Prevent the transform in these case to let the bitcast of a constant transform naturally. Fixes #55345	2022-05-09 09:36:22 +01:00
Simon Pilgrim	800d36cf32	[DAG] Only perform the fold (A-B)+(C-D) --> (A+C)-(B+D) when both inner subs have one use Fixes #51381	2022-05-08 13:51:58 +01:00
Craig Topper	b81bf7bb2f	[LegalizeTypes] Make use of SelectionDAG::getShiftAmountConstant. NFC Instead of calling getShiftAmountTy and getConstant separately.	2022-05-07 12:16:53 -07:00
Craig Topper	00bfaba997	[LegalizeTypes] Don't assume fshl/fshr shift amount type matches the other operands. Like other shifts, the type isn't required to match. We shouldn't assume we can call ZExtPromotedInteger. I tested the PromoteIntOp_FunnelShift locally by removing the promotion of the shift amount from PromoteIntRes_FunnelShift. But with the final version of this patch it is never executed on any tests. Differential Revision: https://reviews.llvm.org/D125106	2022-05-07 11:44:07 -07:00
Amaury Séchet	06fad8bc05	[DAGCombine] Add node in the worklist in topological order in CombineTo This is part of an ongoing effort toward making DAGCombine process the nodes in topological order. This is able to discover a couple of new optimizations, but also causes a couple of regressions. I nevertheless chose to submit this patch for review so as to start the discussion with people working on the backend so we can find a good way forward. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124743	2022-05-07 16:24:31 +00:00
Paul Walker	702c4ade22	[ISD::IndexType] Helper functions for common queries. Add helper functions to query the signed and scaled properties of ISD::IndexType along with functions to change them. Remove setIndexType from MaskedGatherSDNode because it only has one usage and typically should only be changed alongside its index operand. Minimise the direct use of the enum values to lay the groundwork for more refactoring. Differential Revision: https://reviews.llvm.org/D123347	2022-05-07 11:23:42 +01:00
David Green	5930691ee1	Revert "[DAGCombine] Make combineShuffleOfBitcast LittleEndian specific" This reverts commit `891c3cf99e` as it turns out that the error was not caused by this commit; the error came from D124526 instead.	2022-05-06 21:03:22 +01:00
David Green	891c3cf99e	[DAGCombine] Make combineShuffleOfBitcast LittleEndian specific Something is going wrong with the BigEndian PowerPC bot. It is hard to tell what is wrong from here, but attempt to fix it by disabling the combineShuffleOfBitcast combine for bigendian.	2022-05-06 18:42:44 +01:00
Craig Topper	76f90a9d71	[SelectionDAG] Clear promoted bits before UREM on shift amount in PromoteIntRes_FunnelShift. Otherwise we have garbage in the upper bits that can affect the results of the UREM. Fixes PR55296. Differential Revision: https://reviews.llvm.org/D125076	2022-05-06 09:26:30 -07:00
Simon Pilgrim	c0bebc12f0	[DAG] visitREM - merge buildOptimizedSREM into if(). NFCI.	2022-05-06 15:39:17 +01:00
David Green	115c188807	[DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask')) If the mask is made up of elements that form a mask in the higher type we can convert shuffle(bitcast into the bitcast type, simplifying the instruction sequence. A v4i32 2,3,0,1 for example can be treated as a 1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load combines, along with helping simplify a number of other tests. The PowerPC combine for v16i8 splat vector loads needed some fixes to keep it working for v16i8 vectors. This improves the handling of v2i64 shuffles to match too, hopefully improving them in general. Differential Revision: https://reviews.llvm.org/D123801	2022-05-06 10:50:31 +01:00
Lian Wang	fb0d636f28	[RISCV][SelectionDAG] Support VP_REDUCE_ADD mask operation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124986	2022-05-06 01:49:21 +00:00
Craig Topper	5140e0d219	[SelectionDAGISel] Add back a comment to MergeInputChains handling. NFC This comment used to exist, but was lost in a refactor over 10 years ago, but still seems relevant and improves readability.	2022-05-05 12:59:21 -07:00
Craig Topper	084f967370	[SelectionDAG] Constant fold (sext_inreg undef, VT) to 0 instead of undef. The result of sign_extend_inreg needs to have as many sign bits as requested by the VT argument. The easiest way to guarantee this is to fold it to 0. SystemZ test was modified to avoid using undef. Fixes https://github.com/llvm/llvm-project/issues/55178 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124696	2022-05-05 09:45:35 -07:00
Craig Topper	4e2d1a6c18	[DAGCombiner] Fold (sext/zext undef) -> 0 and aext(undef) -> undef. Differential Revision: https://reviews.llvm.org/D124988	2022-05-05 09:34:18 -07:00
Craig Topper	fd13192aa5	[DAGCombiner] Fold (max/min X, X) -> X. Differential Revision: https://reviews.llvm.org/D124951	2022-05-05 09:34:17 -07:00
Brian Tracy	87a55137e2	Fix "the the" typo in documentation and user facing strings There are many more instances of this pattern, but I chose to limit this change to .rst files (docs), anything in libcxx/include, and string literals. These have the highest chance of being seen by end users. Reviewed By: #libc, Mordante, martong, ldionne Differential Revision: https://reviews.llvm.org/D124708	2022-05-05 17:52:08 +02:00
Thomas Preud'homme	68dee83923	[MachinePipeliner] Fix unscheduled instruction Prior to ordering instructions to be scheduled, the machine pipeliner updates recurrence node sets in groupRemainingNodes() by adding in a given node set any node on the dependency path from a node set with higher priority to the given node set. The function computePath() that determines what constitutes a path follows artificial dependencies. However, when ordering the nodes in the resulting node sets, computeNodeOrder() calls ignoreDependence when looking at dependencies which ignores artificial dependencies. This can cause a node not to be scheduled which then causes wrong code generation and in the case of a debug build will lead to an assert failure in generatePhis() in ModuloScheduler.cpp. This commit adds calls to ignoreDependence() in computePath() to not add any node in groupRemainingNodes() that would not be ordered by computeNodeOrder(). Reviewed By: sgundapa Differential Revision: https://reviews.llvm.org/D124267	2022-05-05 16:01:41 +01:00
Xing Xue	e5926906eb	[XCOFF][AIX] Use unique section names for LSDA and EH info sections with -ffunction-sections Summary: When -ffunction-sections is on, this patch makes the compiler generate unique LSDA and EH info sections for functions on AIX by appending the function name to the section name as a suffix. This will allow the AIX linker to garbage-collect unused functions. Reviewed by: MaskRay, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D124855	2022-05-05 09:01:36 -04:00
Jay Foad	9ebbe25034	RegAllocGreedy: Common up part of the priority calculation. NFC.	2022-05-05 10:35:33 +01:00
Nikita Popov	9678936f18	[DAGCombine] Fold (X & ~Y) | Y with truncated not This extends the (X & ~Y) | Y to X | Y fold to also work if ~Y is a truncated not (when taking into account the mask X). This is done by exporting the infrastructure added in D124856 and reusing it here. I've retained the old value of AllowUndefs=false, though probably this can be switched to true with extra test coverage. Differential Revision: https://reviews.llvm.org/D124930	2022-05-05 11:10:11 +02:00
Craig Topper	572dfef1db	[SelectionDAG] Use llvm::any_of to simplify a loop. NFC	2022-05-04 19:09:06 -07:00
Nikita Popov	451bc723ae	[SDAG] Handle truncated not in haveNoCommonBitsSet() Demanded bits analysis may replace a full-width not with an any_extend (not (truncate X)) pattern. This patch looks through this kind of pattern in haveNoCommonBitsSet(). Of course, we can only do this if we only need negated bits in the non-extended part, as the other bits may now be arbitrary. For example, if we have haveNoCommonBitsSet(~X & Y, X) then ~X only needs to actually negate bits set in Y. This is only a partial solution to the problem in that it allows add -> or conversion, but the resulting or doesn't get folded yet. (I guess that will involve exposing getBitwiseNotOperand() as a more general helper and using that in the relevant transform.) Differential Revision: https://reviews.llvm.org/D124856	2022-05-04 15:30:44 +02:00
serge-sans-paille	7030654296	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `fa5a4e1b95` detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D124847	2022-05-04 08:32:38 +02:00
Luo, Yuanke	764676b737	[fastregalloc] Fix bug when undef value is tied to def. If the tied use is undef value, fastregalloc should free the def register. There is no reload needed for the undef value. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D124834	2022-05-04 12:12:55 +08:00
Jon Roelofs	e1c808b36e	Fix zero-width bitfield extracts to emit 0 Fixes #55129	2022-05-03 14:46:42 -07:00
Simon Pilgrim	faa35fc873	[DAG] Fix issue with rot(rot(x,c1),c2) -> rot(x,c1+c2) fold with unnormalized rotation amounts Don't assume the rotation amounts have been correctly normalized - do it as part of the constant folding. Also, the normalization should be performed with UREM not SREM.	2022-05-03 17:16:26 +01:00
Nikita Popov	2171a896ed	[SDAG] Handle A and B&~A in haveNoCommonBitsSet() This is the DAG variant of D124763. The code already handles the general pattern, but not this degenerate case. This allows folding A + (B&~A) to A | (B&~A) which further folds to A | B. Handling on the SDAG level is needed because in the motivating case the add is actually a getelementptr, which only gets converted into an add on the SDAG level. However, this patch is not quite sufficient to handle the getelementptr case yet, because of an interfering demanded bits simplification. Differential Revision: https://reviews.llvm.org/D124772	2022-05-03 15:47:02 +02:00
Nikita Popov	e0892614b1	[SDAG] Extract commutative helper from haveNoCommonBitsSet() (NFC) To make it easier to add additional patterns, which will generally want to handle commuted top-level operands.	2022-05-03 12:28:35 +02:00
Jeremy Morse	1d712c3818	[DebugInfo][InstrRef] Don't generate redundant DBG_PHIs In SelectionDAG, DBG_PHI instructions are created to "read" physreg values and give them an instruction number, when they can't be traced back to a defining instruction. The most common scenario is arguments to a function. Unfortunately, if you have 100 inlined methods, each of which has the same "this" pointer, then the 100 dbg.value instructions become 100 DBG_INSTR_REFs plus 100 DBG_PHIs, where only one DBG_PHI would suffice. This patch adds a vreg cache for MachineFunction::salvageCopySSA: if we've already traced a value back to the start of a block and created a DBG_PHI, then it allows us to re-use the DBG_PHI, as well as reducing work. Differential Revision: https://reviews.llvm.org/D124517	2022-05-03 09:56:12 +01:00
David Green	6f81903e89	[LV][SLP] Mark fptosi_sat as vectorizable This adds fptosi_sat and fptoui_sat to the list of trivially vectorizable functions, mainly so that the loop vectorizer can vectorize the instruction. Marking them as trivially vectorizable also allows them to be SLP vectorized, and Scalarized. The signature of a fptosi_sat requires two type overrides (@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only take a single one. This patch alters hasVectorInstrinsicOverloadedScalarOpd to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first operand of the intrinsic as an overloaded (but not scalar) operand. Differential Revision: https://reviews.llvm.org/D124358	2022-05-03 09:32:34 +01:00
Hsiangkai Wang	eaaa31ff2c	[RISCV][TargetLowering] Special case overflow expansion for (uaddo X, C). Follow-up to D122933. Differential Revision: https://reviews.llvm.org/D124374	2022-05-03 03:51:36 +00:00
Craig Topper	5f057eaa0d	[DAGCombiner] reassociationCanBreakAddressingModePattern should check uses of the outer add. When looking for memory uses, reassociationCanBreakAddressingModePattern should check uses of the outer ADD rather than the inner ADD. We want to know if the two ops we're reassociating are used by a load/store. In practice, the existing check usually works because CodeGenPrepare will make one of the load/stores have an offset of 0 relative to split GEP. That will make the inner add have a memory use. To test this, I've manually split the GEPs so there is no 0 offset store. This issue was recently discussed in the original review D60294. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D124644	2022-05-02 16:38:53 -07:00
Sanjay Patel	747c6a0c73	[SDAG] fix miscompile when casting int->FP->int This is the codegen equivalent of D124692. As shown in https://github.com/llvm/llvm-project/issues/55150 - the existing fold may be wrong when converting to a signed value. This is a quick fix to avoid the miscompile. https://alive2.llvm.org/ce/z/KtaDmd Differential Revision: https://reviews.llvm.org/D124771	2022-05-02 14:57:27 -04:00
Simon Pilgrim	ae8b10e543	[DAG] (style) Break apart if-else chain as they all return	2022-05-01 17:56:59 +01:00
Paul Walker	f10a8f6752	[LegalizeDAG] Fix TypeSize conversion error when expanding SIGN_EXTEND_INREG SIGN_EXTEND_INREG expansion can trigger a TypeSize error because "VT.getSizeInBits() == 1" is used to detect a boolean without first verifying VT is a scalar.	2022-04-30 19:21:48 +01:00
Craig Topper	6affe87bda	[DAGCombiner] When matching a disguised rotate by constant don't forget to apply LHSMask/RHSMask. We try to match as a disguised rotate by constant of these forms (shl (X \| Y), C1) \| (srl X, C2) --> (rotl X, C1) \| (shl Y, C1) (shl X, C1) \| (srl (X \| Y), C2) --> (rotl X, C1) \| (srl Y, C2) We may have also looked through an AND to find the shift. If we did, we need to apply a mask to the result. I'll add an AArch64 test and pre-commit it and the RISC-V test tomorrow. Fixes PR55201. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124711	2022-04-30 11:02:30 -07:00
David Penry	dcb77643e3	Reapply [CodeGen][ARM] Enable Swing Module Scheduling for ARM Fixed "private field is not used" warning when compiled with clang. original commit: `28d09bbbc3` reverted in: `fa49021c68` ------ This patch permits Swing Modulo Scheduling for ARM targets and turns it on by default for the Cortex-M7. The t2Bcc instruction is recognized as a loop-ending branch. MachinePipeliner is extended by adding support for "unpipelineable" instructions. These instructions are those which contribute to the loop exit test; in the SMS papers they are removed before creating the dependence graph and then inserted into the final schedule of the kernel and prologues. Support for these instructions was not previously necessary because current targets supporting SMS have only supported it for hardware loop branches, which have no loop-exit-contributing instructions in the loop body. The current structure of the MachinePipeliner makes it difficult to remove/exclude these instructions from the dependence graph. Therefore, this patch leaves them in the graph, but adds a "normalization" method which moves them in the schedule to stage 0, which causes them to appear properly in kernel and prologues. It was also necessary to be more careful about boundary nodes when iterating across successors in the dependence graph because the loop exit branch is now a non-artificial successor to instructions in the graph. In addition, schedules with physical use/def pairs in the same cycle should be treated as creating an invalid schedule because the scheduling logic doesn't respect physical register dependence once scheduled to the same cycle. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D122672	2022-04-29 10:54:39 -07:00
Paul Walker	23c509754d	[DAGCombiner] Stop invalid sign conversion in refineIndexType. When looking through extends of gather/scatter indices it's safe to convert a known positive signed index to unsigned, but unsigned indices must remain unsigned. Depends On D123318 Differential Revision: https://reviews.llvm.org/D123326	2022-04-29 14:20:13 +01:00
Nikita Popov	027c728f29	[SelectionDAGBuilder] Don't create MGATHER/MSCATTER with Scale != ElemSize This is an alternative to D124530. In getUniformBase() only create scales that match the gather/scatter element size. If targets also support other scales, then they can produce those scales in target DAG combines. This is what X86 already does (as long as the resulting scale would be 1, 2, 4 or 8). This essentially restores the pre-opaque-pointer state of things. Fixes https://github.com/llvm/llvm-project/issues/55021. Differential Revision: https://reviews.llvm.org/D124605	2022-04-29 14:57:53 +02:00
Paul Walker	7a0b897e86	[DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling refineUniformBase and selectGatherScatterAddrMode both attempt the transformation: base(0) + index(A+splat(B)) => base(B) + index(A) However, this is only safe when index is not implicitly scaled. Differential Revision: https://reviews.llvm.org/D123222	2022-04-29 12:35:16 +01:00
Serge Pavlov	9fc58f1820	[PowerPC] Support of ppc_fp128 in lowering of llvm.is_fpclass PowerPC supports `ppc_fp128`, which is not an IEEE floating point type. The generic lowering of llvm.is_fpclass could not handle it properly. This change extends the generic lowering code to support `ppc_fp128`. The change was tested on emulator using runtime tests from https://reviews.llvm.org/D112933 and the patch for clang https://reviews.llvm.org/D112932. Differential Revision: https://reviews.llvm.org/D113908	2022-04-29 11:10:47 +07:00
Zequan Wu	4fe2ab5279	Revert "[DebugInfo][InstrRef] Describe value sizes when spilt to stack" This reverts commit `a15b66e76d`. This causes linker to crash at assertion: `Assertion failed: !Expr->isComplex(), file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\CodeGen\LiveDebugValues\InstrRefBasedImpl.cpp, line 907`.	2022-04-28 16:18:16 -07:00
David Penry	fa49021c68	Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM" This reverts commit `28d09bbbc3` while I investigate a buildbot failure.	2022-04-28 13:29:27 -07:00
David Penry	28d09bbbc3	[CodeGen][ARM] Enable Swing Module Scheduling for ARM This patch permits Swing Modulo Scheduling for ARM targets and turns it on by default for the Cortex-M7. The t2Bcc instruction is recognized as a loop-ending branch. MachinePipeliner is extended by adding support for "unpipelineable" instructions. These instructions are those which contribute to the loop exit test; in the SMS papers they are removed before creating the dependence graph and then inserted into the final schedule of the kernel and prologues. Support for these instructions was not previously necessary because current targets supporting SMS have only supported it for hardware loop branches, which have no loop-exit-contributing instructions in the loop body. The current structure of the MachinePipeliner makes it difficult to remove/exclude these instructions from the dependence graph. Therefore, this patch leaves them in the graph, but adds a "normalization" method which moves them in the schedule to stage 0, which causes them to appear properly in kernel and prologues. It was also necessary to be more careful about boundary nodes when iterating across successors in the dependence graph because the loop exit branch is now a non-artificial successor to instructions in the graph. In addition, schedules with physical use/def pairs in the same cycle should be treated as creating an invalid schedule because the scheduling logic doesn't respect physical register dependence once scheduled to the same cycle. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D122672	2022-04-28 13:01:18 -07:00


33072 Commits