llvm-project

Commit Graph

Author	SHA1	Message	Date
Kazu Hirata	cba40c4ede	[llvm] Use MachineBasicBlock::{successors,predecessors} (NFC)	2021-11-09 07:11:14 -08:00
Simon Pilgrim	58c01ef270	[SelectionDAG] Merge FoldConstantVectorArithmetic into FoldConstantArithmetic (PR36544) This patch merges FoldConstantVectorArithmetic back into FoldConstantArithmetic. Like FoldConstantVectorArithmetic we now handle vector ops with any operand count, but we currently still only handle binops for scalar types - this can be improved in future patches - in particular some common unary/trinary ops still have poor constant folding. There's one change in functionality causing test changes - FoldConstantVectorArithmetic bails early if the build/splat vector isn't all constant (with some undefs) elements, but FoldConstantArithmetic doesn't - it instead attempts to fold the scalar nodes and bails if they fail to regenerate a constant/undef result, allowing some additional identity/undef patterns to be handled. Differential Revision: https://reviews.llvm.org/D113300	2021-11-09 11:31:01 +00:00
Jay Foad	5c3c7adf3a	[CodeGen] Fix assertion failure in TwoAddressInstructionPass::rescheduleMIBelowKill This fixes an assertion failure with -early-live-intervals when trying to update the live intervals for a debug instruction, which doesn't even have a slot index. Differential Revision: https://reviews.llvm.org/D113116	2021-11-09 09:24:21 +00:00
Akira Hatanaka	1fe8993ad8	[ObjC][ARC] Replace uses of ObjC intrinsics that are arguments of operand bundle "clang.arc.attachedcall" with ObjC runtime functions The existing code only handles the case where the intrinsic being rewritten is used as the called function pointer of a call/invoke.	2021-11-08 21:19:07 -08:00
Wouter van Oortmerssen	62eeb3e57e	[WebAssembly] fix __stack_pointer being added to .debug_aranges When emitting a reloc for the Wasm global __stack_pointer, it was inadvertently added to the symbols used for generating aranges, which caused some aranges to use it as the end symbol in a symbol diff, which caused a reloc for it to be emitted, which then caused an assert in `wasm64` since we have no 64-bit relocs for Wasm globals. Fixes: https://bugs.llvm.org/show_bug.cgi?id=52376 Differential Revision: https://reviews.llvm.org/D113438	2021-11-08 16:30:31 -08:00
Kazu Hirata	3c06920cd1	[llvm] Use make_early_inc_range (NFC)	2021-11-08 09:09:39 -08:00
Simon Pilgrim	f059b04f7b	[DAG] Add SelectionDAG::ComputeMinSignedBits helper As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper. I've included some usage; it's not exhaustive, just the more obvious cases where the intention is clear. Differential Revision: https://reviews.llvm.org/D113396	2021-11-08 14:12:45 +00:00
Simon Pilgrim	f60d3ec0c7	[DAG] Add BuildVectorSDNode::getConstantRawBits helper We have several places where we need to extract the raw bits data from a BUILD_VECTOR node, so consolidate this to a single helper function that handles Undefs and Integer/FP constants, including implicit truncation. This should make it easier to extend D113202 to handle more constant folding of bitcasted constant data. Differential Revision: https://reviews.llvm.org/D113351	2021-11-08 12:07:38 +00:00
Chen Zheng	50acbbe3cd	[AsmPrinter][ORE] use correct opcode name Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D113173	2021-11-08 01:51:24 +00:00
Benjamin Kramer	9b8b16457c	Put implementation details into anonymous namespaces. NFCI.	2021-11-07 15:18:30 +01:00
Simon Pilgrim	0ff1edeeec	[DAG] SimplifyVBinOp - replace FoldConstantVectorArithmetic with FoldConstantArithmetic Currently FoldConstantArithmetic only handles binops, so replacing other uses of FoldConstantVectorArithmetic (in particular for SETCC nodes) still requires more work.	2021-11-07 12:11:46 +00:00
Kazu Hirata	843d1eda18	[llvm] Use llvm::reverse (NFC)	2021-11-06 19:31:18 -07:00
Sanjay Patel	39c4c7d391	[DAGCombiner] remove vselect fold that was accidentally added This diff snuck into the unrelated: `025a2f73a3` It's a suggested follow-up for D113212, but I need to add test coverage first.	2021-11-06 09:34:30 -04:00
Sanjay Patel	025a2f73a3	[InstCombine] add tests for umax with sub; NFC	2021-11-06 08:32:52 -04:00
Kazu Hirata	87e53a0ad8	[llvm] Use make_early_inc_range (NFC)	2021-11-05 19:39:07 -07:00
Jay Foad	bdaa181007	[TwoAddressInstructionPass] Update existing physreg live intervals In TwoAddressInstructionPass::processTiedPairs with -early-live-intervals, update any preexisting physreg live intervals, as well as virtreg live intervals. By default (without -precompute-phys-liveness) physreg live intervals only exist for registers that are live-in to some basic block. Differential Revision: https://reviews.llvm.org/D113191	2021-11-05 21:20:30 +00:00
Sanjay Patel	7e30404c3b	[DAGCombiner] add fold for vselect based on mask of signbit, part 2 This is the 'or' sibling for the fold added with: D113212 https://alive2.llvm.org/ce/z/tgnp7K Note that neither of these transforms is poison-safe, but it does not seem to matter at this level. We have had the scalar version of D113212 for a long time, so this is just making optimizer behavior consistent. We do not have the scalar version of this fold, however, so that is another follow-up.	2021-11-05 15:02:12 -04:00
Michael Liao	af2ae2cf42	[BranchRelaxation] Fix warning on unused variable. NFC.	2021-11-05 11:18:27 -04:00
Simon Pilgrim	9e6506299a	[DAG] FoldConstantVectorArithmetic - remove SDNodeFlags argument Another minor step towards merging FoldConstantVectorArithmetic into FoldConstantArithmetic. We don't use SDNodeFlags in any constant folding inside DAG, so passing the Flags argument is a waste of time - an alternative would be to wire up FoldConstantArithmetic to take SDNodeFlags just-in-case we someday start using it, but we don't have any way to test it and I'd prefer to avoid dead code. Differential Revision: https://reviews.llvm.org/D113276	2021-11-05 14:36:17 +00:00
Sanjay Patel	4fc1fc4005	[DAGCombiner] add fold for vselect based on mask of signbit (X s< 0) ? Y : 0 --> (X s>> BW-1) & Y We canonicalize to the icmp+select form in IR, and we already have this fold for scalar select in SDAG, so I think it's an oversight that we don't have the fold for vectors. It seems neutral for AArch64 and saves some instructions on x86. Whether we should also have the sibling folds for the inverse condition or all-ones true value may depend on target-specific factors such as whether there's an "and-not" instruction. Differential Revision: https://reviews.llvm.org/D113212	2021-11-05 10:06:16 -04:00
Simon Pilgrim	f2703c3c33	[DAG] FoldConstantArithmetic - rename NumOps -> NumElts. NFC. NumOps represents the number of elements for vector constant folding; rename this to NumElts so that in future we can consistently use NumOps to represent the number of operands of the opcode. Minor cleanup before trying to begin generalizing FoldConstantArithmetic to support opcodes other than binops.	2021-11-05 13:32:34 +00:00
Alfredo Dal'Ava Junior	1cb9f37a17	[FreeBSD] Do not mark __stack_chk_guard as dso_local This symbol is defined in libc.so so it is definitely not DSO-Local. Marking it as such causes problems on some platforms (such as PowerPC). Differential revision: https://reviews.llvm.org/D109090	2021-11-05 07:29:50 -05:00
Simon Pilgrim	c1e7911c3b	[DAG] FoldConstantArithmetic - fold bitlogic(bitcast(x),bitcast(y)) -> bitcast(bitlogic(x,y)) To constant fold bitwise logic ops where we've legalized constant build vectors to a different type (e.g. v2i64 -> v4i32), this patch adds a basic ability to peek through the bitcasts and perform the constant fold on the inner operands. The MVE predicate v2i64 regressions will be addressed by future support for basic v2i64 type support. One of the yak shaving fixes for D113192.... Differential Revision: https://reviews.llvm.org/D113202	2021-11-05 12:00:59 +00:00
Jay Foad	0321bd64e6	Revert "[TwoAddressInstructionPass] Update existing physreg live intervals" This reverts commit `ec0e1e88d2`. It was pushed by mistake.	2021-11-05 09:54:26 +00:00
Jay Foad	ec0e1e88d2	[TwoAddressInstructionPass] Update existing physreg live intervals In TwoAddressInstructionPass::processTiedPairs with -early-live-intervals, update any preexisting physreg live intervals, as well as virtreg live intervals. By default (without -precompute-phys-liveness) physreg live intervals only exist for registers that are live-in to some basic block. Differential Revision: https://reviews.llvm.org/D113191	2021-11-05 09:10:24 +00:00
Mircea Trofin	34f4fe3a90	[NFC][Regalloc] Ensure Query::interferingVRegs is accurate. To correctly use Query, one had to first call collectInterferingVRegs to pre-cache the query result, then call interferingVRegs. Failing the former, interferingVRegs could be stale. This did cause a bug which was addressed in D98232, but the underlying usability issue of the Query API wasn't. This patch addresses the latter by making collectInterferingVRegs an implementation detail, and having interferingVRegs play both roles. One side-effect of this is that interferingVRegs is not const anymore. Differential Revision: https://reviews.llvm.org/D112882	2021-11-02 18:26:54 -07:00
Chih-Ping Chen	2ed29d87ef	[CodeView] Fortran debug info emission in Code View. Differential Revision: https://reviews.llvm.org/D112826	2021-11-02 15:06:21 -04:00
Arthur Eubanks	e2024d72fa	Revert "[NFC] Remove LinkAll*.h" This reverts commit `fe364e5dc7`. Causes breakages, e.g. https://lab.llvm.org/buildbot/#/builders/188/builds/5266	2021-11-02 09:08:09 -07:00
Arthur Eubanks	fe364e5dc7	[NFC] Remove LinkAll*.h These were added to prevent functions from being removed by WPO. But that doesn't make sense, correct WPO will not remove functions we actually use. I noticed these because compiling cc1_main.cpp was pulling in random LLVM pass headers. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D112971	2021-11-02 08:43:17 -07:00
Jay Foad	be1a8f8834	[AMDGPU] Really preserve LiveVariables in SILowerControlFlow https://bugs.llvm.org/show_bug.cgi?id=52204 Differential Revision: https://reviews.llvm.org/D112731	2021-11-02 15:03:37 +00:00
jacquesguan	a39eadcf16	[DAGCombiner] Teach combineShiftToMULH to handle constant and const splat vector. Fold (srl (mul (zext i32:$a to i64), i64:c), 32) -> (mulhu $a, $b), if c can truncate to i32 without loss. Reviewed By: frasercrmck, craig.topper, RKSimon Differential Revision: https://reviews.llvm.org/D108129	2021-11-02 12:04:23 +00:00
Simon Pilgrim	325031786e	[SelectionDAG] Optimize expansion for rotates/funnel shifts If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering. Also use the optimized funnel shift lowering for rotates. Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept (Branched from D108058 as getting this completed should help unlock some other WIP patches). Original Patch: @efriedma (Eli Friedman) Differential Revision: https://reviews.llvm.org/D112443	2021-11-02 11:38:25 +00:00
Simon Pilgrim	37e17f278f	[DAG] MatchRotate - remove (redundant) legal type check. Rely on the hasOperation() instead - as commented on D77804, the mid-term intention is to recognise rotate/funnel-by-constant pre-legalization to help avoid SimplifyDemandedBits regressions.	2021-11-02 11:24:50 +00:00
Kazu Hirata	6bdb61c58a	[CodeGen] Use make_early_inc_range (NFC)	2021-11-01 22:38:49 -07:00
Jay Foad	b8016b626e	[CodeGen] Tweak coding style in LivePhysRegs::stepForward. NFC.	2021-11-01 16:01:24 +00:00
Jun Ma	1f9fa54984	[Taildup] Don't tail-duplicate loop header with multiple successors as its latches When Taildup hits a loop with multiple latches like: // 1 -> 2 <-> 3 | // \ <-> 4 | // \ <-> 5 | // \---> rest | it may transform this loop into multiple loops by duplicating the loop header. However, this change may have little benefit while making the CFG much more complex. In some uncommon cases, it causes a large compile-time regression (offered by @alexfh in D106056). This patch disables tail-duplication in such cases. TestPlan: check-llvm Differential Revision: https://reviews.llvm.org/D110613	2021-11-01 15:32:00 +08:00
Craig Topper	ada5458521	[RISCV] Expand scalable vector bswap. Fix crash for bitreverse. Fix LegalizeVectorOps to not try shuffle or unrolling expansions for scalable vectors. Differential Revision: https://reviews.llvm.org/D112236	2021-10-31 10:01:27 -07:00
Kazu Hirata	1a605f395f	[CodeGen] Use make_early_inc_range (NFC)	2021-10-31 07:57:36 -07:00
Kazu Hirata	72710af233	[CodeGen, Target] Use MachineBasicBlock::terminators (NFC)	2021-10-31 07:57:34 -07:00
Kazu Hirata	4cc7c4724f	[MachineCSE] Use make_early_inc_range (NFC)	2021-10-30 19:00:23 -07:00
Roman Lebedev	25043c8276	[NFCI] Introduce `ICmpInst::compare()` and use it where appropriate As noted in https://reviews.llvm.org/D90924#inline-1076197 apparently this is a pretty common pattern, let's not repeat it yet again, but have it in a common place. There may be some more places where it could be used, but these are the most obvious ones.	2021-10-30 17:50:06 +03:00
Christudasan Devadasan	aa2d3b59ce	GlobalISel/Utils: Use incoming regbank while constraining the superclasses Register operands with superclasses can possibly have multiple regBanks if they have different register types. The regBank ambiguity resolved during regbankselect should be used to constrain the operand regclass instead of obtaining one from the MCInstrDesc. This is a prerequisite patch for D109300 that introduces allocatable AV_* Superclasses for AMDGPU by combining both VGPRs and AGPRs and we want to restrain the regclass to either A or V based on the incoming regbank. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112323	2021-10-30 07:20:45 -04:00
Fraser Cormack	8314a04ede	[SelectionDAG] Allow FindMemType to fail when widening loads & stores This patch removes an internal failure found in FindMemType and "bubbles it up" to the users of that method: GenWidenVectorLoads and GenWidenVectorStores. FindMemType -- renamed findMemType -- now returns an optional value, returning None if no such type is found. Each of the aforementioned users now pre-calculates the list of types it will use to widen the memory access. If the type breakdown is not possible they will signal a failure, at which point the compiler will crash as it does currently. This patch is preparing the ground for alternative legalization strategies for vector loads and stores, such as using vector-predication versions of loads or stores. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112000	2021-10-29 18:27:31 +01:00
Neubauer, Sebastian	c78640ee6a	[TailDuplicator] Fix merging block with terminator The TailDuplicator merged two blocks, even if the first one ended with a terminator, resulting in invalid MIR, where a terminator is in the middle of a block. Abort merging if the first block ends with a terminator. Differential Revision: https://reviews.llvm.org/D112226	2021-10-29 10:52:46 +02:00
Abinav Puthan Purayil	db8d7b6e2d	[DAGCombine][NFC] s/it's/its in the comment of hasNoInfs().	2021-10-29 07:36:38 +05:30
Daniel Kiss	d8075e8781	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit `da1d1a0869` . This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 21:45:09 +02:00
Guozhi Wei	1e46dcb77b	[TwoAddressInstructionPass] Put all new instructions into DistanceMap In function convertInstTo3Addr, after converting a two address instruction into three address instruction, only the last new instruction is inserted into DistanceMap. This is wrong, DistanceMap should track all instructions from the beginning of current MBB to the working instruction. When a two address instruction is converted to three address instruction, multiple instructions may be generated (usually an extra COPY is generated), all of them should be inserted into DistanceMap. Similarly when unfolding memory operand in function tryInstructionTransform DistanceMap is not maintained correctly. Differential Revision: https://reviews.llvm.org/D111857	2021-10-28 11:11:59 -07:00
Nicolai Hähnle	b437aaa672	MachineDominators: Define MachineDomTree type alias This is a (very) small move towards making the machine dominators more aligned with the IR dominators: * DominatorTree / MachineDomTree is the class holding the dominator tree * DominatorTreeWrapperPass / MachineDominatorTree is the corresponding (machine) function pass This alignment will be used by analyses that are designed as templates that work with LLVM IR as well as Machine IR. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D112690	2021-10-28 22:30:35 +05:30
Daniel Kiss	66e03db814	Revert "Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."" This reverts commit `b6420e575f`.	2021-10-28 17:24:53 +02:00
Daniel Kiss	b6420e575f	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit `da1d1a0869` . This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 16:49:19 +02:00
Neubauer, Sebastian	50d8d963e3	[GlobalISel] Simplify RegBankSelect Save the instruction list of a block before selecting banks. This makes it possible to cope with moved instructions, even if they are reordered or split into multiple basic blocks. Differential Revision: https://reviews.llvm.org/D111223	2021-10-28 10:30:55 +02:00
Michael Liao	e6a4ba3aa6	[amdgpu] Handle the case where there is no scavenged register. - When an unconditional branch is expanded into an indirect branch, if there is no scavenged register, an SGPR pair needs spilling to enable the destination PC calculation. In addition, before jumping into the destination, that clobbered SGPR pair needs restoring. - As SGPR cannot be spilled to or restored from memory directly, the spilling/restoring of that SGPR pair reuses the regular SGPR spilling support but without spilling it into memory. As those spilling and restoring points are fully controlled, we only need to spill that SGPR into the temporary VGPR, which needs spilling into its emergency slot. - The target-specific hook is revised to take an additional restore block, where the restoring code is filled. After that, the relaxation will place that restore block directly before the destination block and insert an unconditional branch in any fall-through block into the destination block. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106449	2021-10-27 18:37:27 -04:00
Kerry McLaughlin	f01fafdcd4	[SVE][CodeGen] Fix incorrect legalisation of zero-extended masked loads PromoteIntRes_MLOAD always sets the extension type to `EXTLOAD`, which results in a sign-extended load. If the type returned by getExtensionType() for the load being promoted is something other than `NON_EXTLOAD`, we should instead pass this to getMaskedLoad() as the extension type. Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D112320	2021-10-27 14:15:41 +01:00
Caroline Concatto	1137b7207d	[SelectionDAG] Widening the result of INSERT_SUBVECTOR. Widens the result and first input vector because they have the same size. The subvector to be inserted is widened in the operand widen function. Differential Revision: https://reviews.llvm.org/D112187	2021-10-27 13:52:25 +01:00
Daniel Kiss	894ddba1c9	Revert "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This reverts commit `da1d1a0869`.	2021-10-27 14:29:35 +02:00
Jay Foad	b9e3af124b	[LiveInterval] Add RemoveDeadValNo argument to removeSegment(iterator) Add an optional bool RemoveDeadValNo argument to the removeSegment(iterator) overload, for consistency with the other overloads. This gives clients a way to remove dead valnos while also getting an updated iterator returned (in the manner of vector::erase). Use this to clean up some inefficient code in LiveIntervals::repairOldRegInRange. NFC. Differential Revision: https://reviews.llvm.org/D110560	2021-10-27 09:43:32 +01:00
Daniel Kiss	da1d1a0869	[ARM] __cxa_end_cleanup should be called instead of _UnwindResume. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-27 10:40:00 +02:00
Kazu Hirata	c3e698e2f5	[CodeGen, Hexagon] Use MachineBasicBlock::phis (NFC)	2021-10-26 09:01:29 -07:00
Craig Topper	d51e3a2139	[LegalizeTypes][TargetLowering] Merge getShiftAmountTyForConstant into TargetLowering::getShiftAmountTy. getShiftAmountTyForConstant is a special helper that changes the shift amount to i32 if the type chosen by TargetLowering::getShiftAmountTy can't represent all possible values. This is needed to satisfy an assert in SelectionDAG::getNode. It requires additional consideration to know when this helper should be used. I'm not sure that we are always using it when we should. This patch merges the getShiftAmountTyForConstant handling into TargetLowering::getShiftAmountTy so we don't need to think about it anymore. Technically this may slightly increase compile times since the majority of callers of getShiftAmountTy won't need this. Hopefully, this isn't an issue in practice. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112469	2021-10-25 14:06:53 -07:00
Jeremy Morse	4136897bd4	[DebugInfo][InstrRef][NFC] Switch to using DenseMaps and similar There are a few STL containers hanging around that can become DenseMaps, SmallVectors and similar. This recovers a modest amount of compile time performance. While I'm here, adjust the bit layout of ValueIDNum: this was always supposed to act like a value type, however it seems that clang doesn't compile the comparison functions to act that way. Add a uint64_t to a union that explicitly aliases the bitfields, so that we can compare the whole value as a single integer. Differential Revision: https://reviews.llvm.org/D112333	2021-10-25 18:07:17 +01:00
Jeremy Morse	97ddf49e43	[DebugInfo][InstrRef] Recover stack-slot tracking performance This patch is like D111627 -- instead of calculating IDF for every location on the stack, only do it for the smallest units of interference, and copy the PHIs for those units to any aliases. The test added runs placeMLocPHIs directly, and tests that: * A def of the lower 8 bits of a stack slot causes all aliasing regs to have PHIs placed, * It doesn't cause the equivalent location to x86's $ah, which isn't aliased, to have a PHI placed. Differential Revision: https://reviews.llvm.org/D112324	2021-10-25 17:31:09 +01:00
Danila Malyutin	7b102fcc91	[CodeGen] Fix dependence breaking for tied operands Differential Revision: https://reviews.llvm.org/D107582	2021-10-25 18:52:27 +03:00
Jeremy Morse	ee3eee71e4	[DebugInfo][InstrRef] Track values fused into stack spills During register allocation, some instructions can have stack spills fused into them. It means that when vregs are allocated on the stack we can convert: SETCCr %0 DBG_VALUE %0 to SETCCm %stack.0 DBG_VALUE %stack.0 Unfortunately instruction referencing finds this harder: a store to the stack doesn't have a specific operand number, therefore we don't substitute the old operand for a new operand, and the location is dropped. This patch implements a solution: just recognise the memory operand attached to an instruction with a Special Number (TM), and record a substitution between the old value and the new one. This patch adds substitution code to InlineSpiller to record such fused spills, and tracking in InstrRefBasedLDV to recognise such values, and produce the value numbers for them. Everything to do with the movement of stack-defined values is already handled in InstrRefBasedLDV. Differential Revision: https://reviews.llvm.org/D111317	2021-10-25 15:14:53 +01:00
Jeremy Morse	2eb96e1711	[DebugInfo][NFC] Avoid a use-after-free This patch swaps two lines -- the CurSucc reference can be invalidated by the call to DFS.push_back, therefore that should happen last. The usual hat-tip to asan for catching this. This patch also swaps an earlier call to ToAdd.insert and DFS.push_back, where a stable iterator (from successors()) is being used. This isn't strictly necessary, but is good for consistency and avoiding readers asking themselves why the two code portions have a different order.	2021-10-25 14:16:30 +01:00
Sanjay Patel	6e46b66e2a	[DAGCombiner] make matching bit-hack form of usubsat more flexible (i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128 As suggested in D112085, we can substitute 'xor' with 'add' in this pattern, and it is logically equivalent: https://alive2.llvm.org/ce/z/eJtWWC We canonicalize to 'xor' in IR, but SDAG does not do that (and it probably should not - https://llvm.org/PR52267 ), so it is possible to see either pattern in codegen. Note that 'sub' is another potential pattern, but that is canonicalized to 'add' in DAGCombiner, so we don't need to worry about that variation. Differential Revision: https://reviews.llvm.org/D112377	2021-10-25 09:01:52 -04:00
Tim Northover	f9089accba	CodeGenPrep: remove all copies of GEP from list if there are duplicates. Unfortunately ToT has changed enough from the revision where this actually caused problems that the test no longer triggers an assertion failure.	2021-10-25 14:00:02 +01:00
Kazu Hirata	4bd46501c3	Use llvm::any_of and llvm::none_of (NFC)	2021-10-24 17:35:33 -07:00
Kazu Hirata	1c35973c77	[llvm] Call *(Set|Map)::erase directly (NFC) We can erase an item in a set or map without checking its membership first.	2021-10-24 09:32:59 -07:00
Kazu Hirata	d8e4170b0a	Ensure newlines at the end of files (NFC)	2021-10-23 08:45:29 -07:00
Kazu Hirata	d14d7068b6	[llvm] Use StringRef::contains (NFC)	2021-10-23 08:45:27 -07:00
Jay Foad	2915889d74	[ScheduleDAGInstrs] Call adjustSchedDependency in more cases This removes a condition and the corresponding FIXME comment, because the Hexagon assertion it refers to has apparently been fixed, probably by D76134. NFCI. This just gives targets the opportunity to adjust latencies that were set to 0 by the generic code because they involve "implicit pseudo" operands. Differential Revision: https://reviews.llvm.org/D112306	2021-10-22 20:03:29 +01:00
Jeremy Morse	e7084ceab3	[DebugInfo][Instr] Track subregisters across stack spills/restores Sometimes we generate code that writes to a subregister, then spills / restores a super-register to the stack, for example: $eax = MOV32ri 0 MOV64mr $rsp, 1, $noreg, 16, $noreg, $rax $rcx = MOV64rm $rsp, 1, $noreg, 8, $noreg This patch takes a different approach: it adds another index to MLocTracker that identifies a size/offset within a stack slot. A location on the stack is then a pair of {FrameIndex, SlotNum}. Spilling and restoring now involves pairing up the src/dest register numbers, and the dest/src stack position to be transferred to/from. Location coverage improves as a result; compile-time performance decreases, alas. One limitation is that if a PHI occurs inside a stack slot: DBG_PHI %stack.0, 1 We don't know how large the resulting value is, and so might have difficulty picking which value to use. DBG_PHI might need to be augmented in the future with such a size. Unit tests added ensure that spills and restores correctly transfer to positions in the Location => Value map, and that different register classes written to the stack will correctly clobber all other positions in the stack slot. Differential Revision: https://reviews.llvm.org/D112133	2021-10-22 19:20:55 +01:00
Craig Topper	93139a3c32	[LegalizeTypes] Only expand CTLZ/CTTZ/CTPOP during type promotion if the new type is legal. We might be promoting a large non-power of 2 type and the new type may need to be split. Once we split it we may have a ctlz/cttz/ctpop instruction for the split type. I'm also concerned that we may create large shifts with shift amounts that are too small.	2021-10-22 11:02:35 -07:00
Simon Pilgrim	a5f56342b0	[DAG] narrowExtractedVectorLoad - EXTRACT_SUBVECTOR indices are always constant EXTRACT_SUBVECTOR indices are always constant, we don't need to check for ConstantSDNode, we should just use getConstantOperandVal which will assert for the constant.	2021-10-22 18:32:14 +01:00
Jeremy Morse	d9eebe3cd7	[DebugInfo][InstrRef] Add unit tests for transfer-function building This patch adds some unit tests for the machine-location transfer-function building parts of InstrRefBasedLDV: i.e., test that if we feed some MIR into the transfer-function building code, does it create the correct transfer function. There are a number of minor defects that get corrected in the process: * The unit test was selecting the x86 (i.e. 32 bit) backend rather than x86_64's 64 bit backend, * COPY instructions weren't actually having their subregister values correctly represented in the transfer function. Subregisters were being defined by the COPY, rather than taking the value in the source register. * SP aliases were at risk of being clobbered, if an SP subregister was clobbered. Differential Revision: https://reviews.llvm.org/D112006	2021-10-22 18:29:03 +01:00
Craig Topper	04c184bba7	[TargetLowering] Simplify the interface of expandABS. NFC Instead of returning a bool to indicate success and a separate SDValue, return the SDValue and have the callers check if it is null. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112331	2021-10-22 10:22:23 -07:00
Craig Topper	0766aef3f3	[LegalizeTypes][RISCV][PowerPC] Expand CTLZ/CTTZ/CTPOP instead of promoting if they'll be expanded later. Expanding these requires multiple constants. If we promote during type legalization when they'll end up getting expanded in LegalizeDAG, we'll use larger constants. These constants may be harder to materialize. For example, 64-bit constants on 64-bit RISCV are very expensive. This is similar to what has already been done to BSWAP and BITREVERSE. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112268	2021-10-22 09:10:01 -07:00
Zarko Todorovski	0bd6a9f2d1	[clang/llvm] Inclusive language: replace segregate with separate	2021-10-22 09:59:35 -04:00
Craig Topper	996123e5e8	[TargetLowering] Simplify the interface for expandCTPOP/expandCTLZ/expandCTTZ. There is no need to return a bool and have an SDValue output parameter. Just return the SDValue and let the caller check if it is null. I have another patch to add more callers of these so I thought I'd clean up the interface first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112267	2021-10-21 15:35:28 -07:00
Craig Topper	ff37b1105d	[LegalizeVectorOps][X86] Don't defer BITREVERSE expansion to LegalizeDAG. By expanding early it allows the shifts to be custom lowered in LegalizeVectorOps. Then a DAG combine is able to run on them before LegalizeDAG handles the BUILD_VECTORS for the masks used. v16i8 shift lowering on X86 requires a mask to be applied to a v8i16 shift. The BITREVERSE expansion applied an AND mask before SHL ops and after SRL ops. This was done to share the same mask constant for both shifts. It looks like this patch allows DAG combine to remove the AND mask added after v16i8 SHL by X86 lowering. This maintains the mask sharing that BITREVERSE was trying to achieve. Prior to this patch it looks like we kept the mask after the SHL instead which required an extra constant pool or a PANDN to invert it. This is dependent on D112248 because RISCV will end up scalarizing the BSWAP portion of the BITREVERSE expansion if we don't disable BSWAP scalarization in LegalizeVectorOps first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112254	2021-10-21 15:23:23 -07:00
Craig Topper	458ed5fcc3	[TargetLowering][RISCV] Prevent scalarization of fixed vector bswap. It's better to do the ands, shifts, ors in the vector domain than to scalarize it and do those operations on each element. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112248	2021-10-21 14:34:01 -07:00
Yonghong Song	f6811cec84	[DebugInfo] Support typedef with btf_decl_tag attributes Clang patch ([1]) added support for btf_decl_tag attributes with typedef types. This patch added llvm support including dwarf generation. For example, for typedef typedef unsigned * __u __attribute__((btf_decl_tag("tag1"))); __u u; the following shows llvm-dwarfdump result: 0x00000033: DW_TAG_typedef DW_AT_type (0x00000048 "unsigned int *") DW_AT_name ("__u") DW_AT_decl_file ("/home/yhs/work/tests/llvm/btf_tag/t.c") DW_AT_decl_line (1) 0x0000003e: DW_TAG_LLVM_annotation DW_AT_name ("btf_decl_tag") DW_AT_const_value ("tag1") 0x00000047: NULL [1] https://reviews.llvm.org/D110127 Differential Revision: https://reviews.llvm.org/D110129	2021-10-21 08:42:58 -07:00
Sanjay Patel	d2198771e9	[DAGCombiner] fold bit-hack form of usubsat (i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128 I haven't found a generalization of this identity: https://alive2.llvm.org/ce/z/_sriEQ Note: I was actually looking at the first form of the pattern in that link, but that's part of a long chain of potential missed transforms in codegen and IR....that I hope ends here! The predicates for when this is profitable are a bit tricky. This version of the patch excludes multi-use but includes custom lowering (as opposed to legal only). On x86 for example, we have custom lowering for some vector types, and that uses umax and sub. So to enable that fold, we need to add use checks to avoid regressions. Even with legal-only lowering, we could see code with extra reg move instructions for extra uses, so that constraint would have to be eased very carefully to avoid penalties. Differential Revision: https://reviews.llvm.org/D112085	2021-10-21 09:47:19 -04:00
Kerry McLaughlin	0d153df69e	[SVE] Fix selection failure when splitting extended masked loads When splitting a masked load, `GetDependentSplitDestVTs` is used to get the MemVTs of the high and low parts. If the masked load is extended, this may return VTs with different element types which are used to create the high & low masked load instructions. This patch changes `GetDependentSplitDestVTs` to ensure we return VTs with the same element type. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D111996	2021-10-21 13:04:38 +01:00
Arthur Eubanks	6ea7437ca5	[SelectionDAG] Bail out of mergeTruncStores when not optimizing With unoptimized code, we may see lots of stores and spend too much time in mergeTruncStores. Fixes PR51827. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D111596	2021-10-20 16:58:22 -07:00
Jon Roelofs	b046eb19b8	[AArch64][GlobalISel] combine (and (or x, c1), c2) => (and x, c2) iff c1 & c2 == 0 https://godbolt.org/z/h8ejrG4hb rdar://83597585 Differential Revision: https://reviews.llvm.org/D111856	2021-10-20 12:11:52 -07:00
Stanislav Mekhanoshin	c80d8a8cea	[AMDGPU] MachineLICM cannot hoist VALU MachineLoop::isLoopInvariant() returns false for all VALU because of the exec use. Check TII::isIgnorableUse() to allow hoisting. That unfortunately results in higher register consumption since MachineLICM does not adequately estimate pressure. Therefore I think it shall only be enabled after D107677 even though it does not depend on it. Differential Revision: https://reviews.llvm.org/D107859	2021-10-20 11:47:24 -07:00
Itay Bookstein	08ed216000	[IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol As discussed in: * https://reviews.llvm.org/D94166 * https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html The GlobalIndirectSymbol class lost most of its meaning in https://reviews.llvm.org/D109792, which disambiguated getBaseObject (now getAliaseeObject) between GlobalIFunc and everything else. In addition, as long as GlobalIFunc is not a GlobalObject and getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee is a GlobalIFunc cannot currently be modeled properly. Creating aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition, calling getAliaseeObject on a GlobalIFunc will currently return nullptr, which is undesirable because it should return the object itself for non-aliases. This patch refactors the GlobalIFunc class to inherit directly from GlobalObject, and removes GlobalIndirectSymbol (while inlining the relevant parts into GlobalAlias and GlobalIFunc). This allows for calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc itself, making getAliaseeObject() more consistent and enabling alias-to-ifunc to be properly modeled in the IR. I exercised some judgement in the API clients of GlobalIndirectSymbol: some were 'monomorphized' for GlobalAlias and GlobalIFunc, and some remained shared (with the type adapted to become GlobalValue). Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D108872	2021-10-20 10:29:47 -07:00
Fraser Cormack	eabf11f9ea	[CodeGenPrepare] Avoid a scalable-vector crash in ctlz/cttz This patch fixes a crash when despeculating ctlz/cttz intrinsics with scalable-vector types. It is not safe to speculatively get the size of the vector type in bits in case the vector type is not a fixed-length type. As it happens this isn't required as vector types are skipped anyway. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112141	2021-10-20 16:45:55 +01:00
Craig Topper	fe1f0de003	[RISCV][WebAssembly][TargetLowering] Allow expandCTLZ/expandCTTZ to rely on CTPOP expansion for vectors. Our fallback expansion for CTLZ/CTTZ relies on CTPOP. If CTPOP isn't legal or custom for a vector type we would scalarize the CTLZ/CTTZ. This is different than CTPOP itself which would use a vector expansion. This patch teaches expandCTLZ/CTTZ to rely on the vector CTPOP expansion instead of scalarizing. To do this I had to add additional checks to make sure the operations used by CTPOP expansions are all supported. Some of the operations were already needed for the CTLZ/CTTZ expansion. This is a huge improvement for RISCV, which doesn't have a scalar ctlz or cttz in the base ISA. For WebAssembly, I've added Custom lowering to keep the scalarizing behavior. I've also extended the scalarizing to CTPOP. Differential Revision: https://reviews.llvm.org/D111919	2021-10-20 07:46:41 -07:00
Jeremy Morse	89950ade21	[DebugInfo][InstrRef] Track a single variable at a time Here's another performance patch for InstrRefBasedLDV: rather than processing all variable values in a scope at a time, instead, process one variable at a time. The benefits are twofold: * It's easier to reason about one variable at a time in your mind, * It improves performance, apparently from increased locality. The downside is that the value-propagation code gets indented one level further, plus there's some churn in the unit tests. Differential Revision: https://reviews.llvm.org/D111799	2021-10-20 15:03:52 +01:00
Sander de Smalen	be6c8dc765	[SelectionDAG] Fix getVectorSubVecPointer for scalable subvectors. When inserting a scalable subvector into a scalable vector through the stack, the index to store to needs to be scaled by vscale. Before this patch, that didn't yet happen, so it would generate the wrong offset, thus storing a subvector to the incorrect address and overwriting the wrong lanes. For some insert: nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2) The offset was not scaled by vscale: orr x8, x8, #0x4 st1h { z0.h }, p0, [sp] st1h { z1.d }, p1, [x8] ld1h { z0.h }, p0/z, [sp] And is changed to: mov x8, sp st1h { z0.h }, p0, [sp] st1h { z1.d }, p1, [x8, #1, mul vl] ld1h { z0.h }, p0/z, [sp] Differential Revision: https://reviews.llvm.org/D111633	2021-10-20 13:55:24 +01:00
Simon Pilgrim	71e39e3f18	[ADT] Add APInt::isNegatedPowerOf2() helper Inspired by D111968, provide an isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos..... Differential Revision: https://reviews.llvm.org/D111998	2021-10-19 14:38:21 +01:00
Jeremy Morse	849b17949f	[DebugInfo][InstrRef] Avoid un-necessary densemap copies and comparisons This is purely a performance patch: InstrRefBasedLDV used to use three DenseMaps to store variable values, two for long term storage and one as a working set. This patch eliminates the working set, and updates the long term storage in place, thus avoiding two DenseMap comparisons and two DenseMap assignments, which can be expensive. Differential Revision: https://reviews.llvm.org/D111716	2021-10-19 11:10:14 +01:00
Jeremy Morse	cf033bb2d3	[DebugInfo][NFC] Zero-initialize a class field This field gets assigned when the relevant object starts being used; but it remains uninitialized beforehand. This risks introducing hard-to-detect bugs if something changes, so zero-initialize the field.	2021-10-19 10:24:12 +01:00
Alexandros Lamprineas	04dc68710a	[DebugInfo][ARM] Fix incorrect debug information for RWPI accessed globals When compiling for the RWPI relocation model the debug information is wrong: * the debug location is described as { DW_OP_addr Var } instead of { DW_OP_constNu Var DW_OP_bregX 0 DW_OP_plus } * the relocation type is R_ARM_ABS32 instead of R_ARM_SBREL32 Differential Revision: https://reviews.llvm.org/D111404	2021-10-18 21:29:46 +01:00
Nikita Popov	54d868991a	[ExpandMemCmp] Update CFG before DTU The applyUpdates() API requires that the CFG is already updated, so make sure to insert the new terminator first.	2021-10-18 21:49:47 +02:00
Jon Roelofs	1300677f97	[AArch64][GlobalISel] combine and + [la]sr => ubfx https://godbolt.org/z/h8ejrG4hb rdar://83597585 Differential Revision: https://reviews.llvm.org/D111839	2021-10-18 10:33:01 -07:00
Kazu Hirata	8568ca789e	Use llvm::erase_if (NFC)	2021-10-18 09:33:42 -07:00
Sanjay Patel	2a3cc4d461	[Analysis] add utility function for unary shuffle mask creation This is NFC-intended for the callers. Posting in case there are other potential users that I missed. I would also use this from VectorCombine in a patch for: https://llvm.org/PR52178 ( D111901 ) Differential Revision: https://reviews.llvm.org/D111891	2021-10-18 09:00:39 -04:00
Jeremy Morse	ea970661dc	Fix signed/unsigned comparison after `b5426ced71` gcc11 warns that this counter causes a signed/unsigned comparison when it's later compared with a SmallVector::difference_type. gcc appears to be correct, clang does not warn one way or the other.	2021-10-18 10:28:52 +01:00
Jay Foad	012248b0bc	Remove the verifyAfter mechanism that was replaced by D111397 Differential Revision: https://reviews.llvm.org/D111872	2021-10-18 10:26:46 +01:00
Jay Foad	36deb9a670	Add new MachineFunction property FailsVerification TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets you skip machine verification after a particular pass. Unfortunately this is used in generic code in TargetPassConfig itself to skip verification after a generic pass, only because some previous target-specific pass damaged the MIR on that specific target. This is bad because problems in one target cause lack of verification for all targets. This patch replaces that mechanism with a new MachineFunction property called "FailsVerification" which can be set by (usually target-specific) passes that are known to introduce problems. Later passes can reset it again if they are known to clean up the previous problems. Differential Revision: https://reviews.llvm.org/D111397	2021-10-18 10:26:46 +01:00
Fraser Cormack	3d850d03ae	[SelectionDAG] Fix illegal widening of scalable-vector loads The process of widening simple vector loads attempts to use a load of a wider vector type if the original load is sufficiently aligned to avoid memory faults. However this optimization is only legal when performed on fixed-length vector types. For scalable vector types this is invalid (unless vscale happens to be 1). This patch does increase the likelihood of compiler crashes (from `FindMemType` failing to find a suitable type) but this now better matches how widening non-simple loads, insufficiently-aligned loads, and scalable-vector stores are handled. Patches will be introduced later by which loads and stores can be widened on targets with support for masked or predicated operations. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111885	2021-10-18 10:00:00 +01:00
Bing1 Yu	f383c53311	[MachineSink] Compile time improvement for large testcases which have many kill flags We did an experiment and observed a dramatic decrease in the compilation time spent on clearing kill flags. Before: Number of BasicBlocks:33357 Number of Instructions:162067 Number of Cleared Kill Flags:32869 Time of handling kill flags(ms):1.607509e+05 After: Number of BasicBlocks:33357 Number of Instructions:162067 Number of Cleared Kill Flags:32869 Time of handling kill flags:3.987371e+03 Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D111688	2021-10-18 15:44:07 +08:00
Mingming Liu	cfd155c41b	[SelectionDAG] Fix typo in option help Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D111867	2021-10-15 11:27:40 -07:00
Ellis Hoag	aa80034ab9	[DebugInfo] retainedTypes should not have subprograms After D80369, the retainedTypes in CU's should not have any subprograms so we should not handle that case when emitting debug info. Differential Revision: https://reviews.llvm.org/D111593	2021-10-15 12:42:25 -04:00
Dávid Bolvanský	6678db00e6	[X86] Enable promotion of i16 popcnt (PR52056) Solves https://bugs.llvm.org/show_bug.cgi?id=52056 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111507	2021-10-15 15:41:37 +02:00
Kazu Hirata	81e9c90686	[llvm] Use llvm::is_contained (NFC)	2021-10-14 22:44:09 -07:00
Jeremy Morse	b5426ced71	[DebugInfo][InstrRef] Place variable-values PHI using LLVM utilities This patch is very similar to D110173 / `a3936a6c19`, but for variable values rather than machine values. This is for the second instr-ref problem, calculating the correct variable value on entry to each block. The previous lattice based implementation was broken; we now use LLVM's existing PHI placement utilities to work out where values need to merge, then eliminate un-necessary ones through value propagation. Most of the deletions here happen in vlocJoin: it was trying to pick a location for PHIs to happen in, badly, leading to an infinite loop in the MIR test added, where it would repeatedly switch between register locations. The new approach is simpler: either PHIs can be eliminated, or they can't, and the location of the value is a different problem. Various bits and pieces move to the header so that they can be tested in the unit tests. The DbgValue class grows a "VPHI" kind to represent variable value PHIs that haven't been eliminated yet. Differential Revision: https://reviews.llvm.org/D110630	2021-10-14 14:43:43 +01:00
Simon Pilgrim	88487662f7	[Codegen] TargetLowering::getCanonicalIndexType - early out scaled MVT::i8 indices. NFCI. Avoids unused assignment scan-build warning.	2021-10-14 13:08:40 +01:00
Jeremy Morse	e3e1da20d4	Follow up to `a3936a6c19`, correctly select LiveDebugValues implementation Some functions get opted out of instruction referencing if they're being compiled with no optimisations, however the LiveDebugValues pass picks one implementation and then sticks with it through the rest of compilation. This leads to a segfault if we encounter a function that doesn't use instr-ref (because it's optnone, for example), but we've already decided to use InstrRefBasedLDV which expects to be passed a DomTree. Solution: keep both implementations around in the pass, and pick whichever one is appropriate to the current function.	2021-10-14 11:28:53 +01:00
Jeremy Morse	fbf269c71e	[DebugInfo][InstrRef] Only calculate IDF for reg units In D110173 we start using the existing LLVM IDF calculator to place PHIs as we reconstruct an SSA form of the machine-code program. Sadly that's slower than the old (but broken) way, this patch attempts to recover some of that performance. The key observation: every time we def a register, we also have to def its register units. If we def'd $rax, in the current implementation we independently calculate PHI locations for {al, ah, ax, eax, hax, rax}, and they will all have the same PHI positions. Instead of doing that, we can calculate the PHI positions for {al, ah} and place PHIs for any aliasing registers in the same positions. Any def of a super-register has to def the unit, and vice versa, so this is sound. It cuts down the SSA placement we need to do significantly. This doesn't work for stack slots, or registers we only ever read, so place PHIs normally for those. LiveDebugValues chooses to ignore writes to SP at calls, and now has to ignore writes to SP register units too. Differential Revision: https://reviews.llvm.org/D111627	2021-10-13 16:08:18 +01:00
Jeremy Morse	e845ca2ff1	Follow up `a3936a6c19` to work around an old compiler bug Old versions of gcc want template specialisations to happen within the namespace where the template lives; this is still present in gcc 5.1, which we officially support, so it has to be worked around.	2021-10-13 13:27:25 +01:00
Jeremy Morse	a3936a6c19	[DebugInfo][InstrRef] Use PHI placement utilities for machine locations InstrRefBasedLDV used to try and determine which values are in which registers using a lattice approach; however this is hard to understand, and broken in various ways. This patch replaces that approach with a standard SSA approach using existing LLVM utilities. PHIs are placed at dominance frontiers; value propagation then eliminates un-necessary PHIs. This patch also adds a bunch of unit tests that should cover many of the weirder forms of control flow. Differential Revision: https://reviews.llvm.org/D110173	2021-10-13 12:49:04 +01:00
Kerry McLaughlin	1a2e90199f	[SVE][CodeGen] Add patterns for ADD/SUB + element count This patch adds patterns to match the following with INC/DEC: - @llvm.aarch64.sve.cnt[b|h|w|d] intrinsics + ADD/SUB - vscale + ADD/SUB For some implementations of SVE, INC/DEC VL is not as cheap as ADD/SUB and so this behaviour is guarded by the "use-scalar-inc-vl" feature flag, which for SVE is off by default. There are no known issues with SVE2, so this feature is enabled by default when targeting SVE2. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D111441	2021-10-13 11:36:15 +01:00
Heejin Ahn	9261ee32dc	[WebAssembly] Make EH work with dynamic linking This makes Wasm EH work with dynamic linking. So far we were only able to handle destructors, which do not use any tags or LSDA info. 1. This uses `TargetExternalSymbol` for `GCC_except_tableN` symbols, which points to the address of per-function LSDA info. It is more convenient to use than `MCSymbol` because it can take additional target flags. 2. When lowering `wasm_lsda` intrinsic, if PIC is enabled, make the symbol relative to `__memory_base` and generate the `add` node. If PIC is disabled, continue to use the absolute address. 3. Make tag symbols (`__cpp_exception` and `__c_longjmp`) undefined in the backend, because it is hard to make it work with dynamic linking's loading order. Instead, we make all tag symbols undefined in the LLVM backend and import it from JS. 4. Add support for undefined tags to the linker. Companion patches: - https://github.com/WebAssembly/binaryen/pull/4223 - https://github.com/emscripten-core/emscripten/pull/15266 Reviewed By: sbc100 Differential Revision: https://reviews.llvm.org/D111388	2021-10-12 23:28:27 -07:00
Amara Emerson	5abce56edb	[GlobalISel] Add support for constant vector folding of binops in CSEMIRBuilder. Differential Revision: https://reviews.llvm.org/D111524	2021-10-12 11:31:22 -07:00
Hongtao Yu	098a0d8fbc	[CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3. This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation. Not exactly like DbgIntrinsics, PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are: - Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes - Unblocked CSE by avoiding pseudo probe from clobbering memory SSA - Unblocked induction variable simplification - Allow empty loop deletion by treating probe intrinsic isDroppable - Some refactoring. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D110847	2021-10-12 09:44:12 -07:00
Jeremy Morse	d9fa186a5c	Scatter NDEBUG to fix after `838b4a533e` These "dump" methods call into MachineOperand::dump, which doesn't exist with NDEBUG, thus we croak. Disable LiveDebugValues dump methods when NDEBUG is turned on to avoid this.	2021-10-12 17:13:15 +01:00
Jay Foad	f7ee21aa32	[TwoAddressInstruction] Remove ad hoc machine verification With the -early-live-intervals command line flag, TwoAddressInstructionPass::runOnMachineFunction would call MachineFunction::verify before returning to check the live intervals. But there was not much benefit to doing this since -verify-machineinstrs and LLVM_ENABLE_EXPENSIVE_CHECKS provide a more general way of scheduling machine verification after every pass. Also it caused problems on targets like Lanai which are marked as "not machine verifier clean", since verification would fail for known target-specific problems which are nothing to do with LiveIntervals. Differential Revision: https://reviews.llvm.org/D111618	2021-10-12 16:09:18 +01:00
Jeremy Morse	838b4a533e	[DebugInfo][NFC] Move LiveDebugValues class to header This patch shifts the InstrRefBasedLDV class declaration to a header. Partially because it's already massive, but mostly so that I can start writing some unit tests for it. This patch also adds the boilerplate for said unit tests. Differential Revision: https://reviews.llvm.org/D110165	2021-10-12 16:07:26 +01:00
Yonghong Song	325d000765	[NFC][Attr] rename attribute btf_tag to btf_decl_tag Per discussion in https://reviews.llvm.org/D111199, the existing btf_tag attribute will be renamed to btf_decl_tag. This patch mostly updated the Bitcode and DebugInfo test cases with new attribute name. Differential Revision: https://reviews.llvm.org/D111591	2021-10-11 20:57:31 -07:00
Amara Emerson	53ebfa7c5d	[AArch64][GlobalISel] Fix combiner assertion in matchConstantOp(). We shouldn't call APInt::getSExtValue() on a >64b value.	2021-10-11 15:55:13 -07:00
Guozhi Wei	6599961c17	[TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation This patch contains the following enhancements to SrcRegMap and DstRegMap: 1 In findOnlyInterestingUse, not only check if the Reg is a two address usage, but also check whether it can be a two address usage after commutation. 2 If a physical register is clobbered, remove SrcRegMap entries that are mapped to it. 3 In processTiedPairs, when creating a new COPY instruction, add a SrcRegMap entry only when the COPY instruction is coalescable. (The COPY src is killed) With these enhancements isProfitableToCommute can make better commute decisions, and finally more register copies are removed. Differential Revision: https://reviews.llvm.org/D108731	2021-10-11 15:28:31 -07:00
Roman Lebedev	684cbae89a	[KnownBits] Introduce `countMaxActiveBits()` and use it in a few places	2021-10-11 23:36:06 +03:00
Jay Foad	edfdce2627	[PHIElimination] Fix accounting for undef uses when updating LiveVariables PHI elimination updates LiveVariables info as described here: // We only need to update the LiveVariables kill of SrcReg if this was the // last PHI use of SrcReg to be lowered on this CFG edge and it is not live // out of the predecessor. We can also ignore undef sources. Unfortunately if the last use also happened to be an undef use then it would fail to update the LiveVariables at all. Fix this by not counting undef uses in the VRegPHIUse map. Thanks to Mikael Holmén for the test case! Differential Revision: https://reviews.llvm.org/D111552	2021-10-11 20:22:47 +01:00
Amara Emerson	f95d9c95bb	[GlobalISel] Fix the stores of truncates -> wide store combine for non-evenly dividing type sizes. If the wide store we'd generate is not a multiple of the memory type of the narrow stores (e.g. s48 and s32), we'd assert. Fix that.	2021-10-09 21:18:20 -07:00
Dávid Bolvanský	943b304848	Fixed some errors detected by PVS Studio	2021-10-09 17:27:41 +02:00
Dávid Bolvanský	3649fb14d1	Fixed some errors detected by PVS Studio	2021-10-09 17:20:04 +02:00
Arthur Eubanks	a0a4935182	Make more places that use alignment use uint64_t Followup to D110451.	2021-10-08 16:35:19 -07:00
Reid Kleckner	89b57061f7	Move TargetRegistry.(h|cpp) from Support to MC This moves the registry higher in the LLVM library dependency stack. Every client of the target registry needs to link against MC anyway to actually use the target, so we might as well move this out of Support. This allows us to ensure that Support doesn't have includes from MC/*. Differential Revision: https://reviews.llvm.org/D111454	2021-10-08 14:51:48 -07:00
Amara Emerson	17b89f9daa	[GlobalISel] Improve G_UMULH -> LSHR combine to accept non-uniform constant vectors.	2021-10-08 11:25:26 -07:00
Craig Topper	a9700653ab	[RegisterScavenging] Use a Twine in a call to report_fatal_error instead of going from std::string to c_str. NFC The std::string was built on the line above. Might as well just build it as a Twine in the call.	2021-10-08 11:04:08 -07:00
Bradley Smith	7c68d4b8ff	Revert "[SelectionDAG] Remove PromoteIntOp_EXTRACT_SUBVECTOR." This reverts commit `3e8d2008f7`. The code removed in this commit is actually required for extracting fixed types from illegal scalable types, hence this commit causes assertion failures in such extracts.	2021-10-08 14:53:26 +00:00
Mirko Brkusanin	d20840c937	[GlobalISel] Combine for eliminating redundant operand negations Differential Revision: https://reviews.llvm.org/D111319	2021-10-08 14:29:22 +02:00
Amara Emerson	72ce310bf0	[GlobalISel][IRTranslator] Fix a use-after-free bug when translating trap-func-name traps. This was using MachineFunction::createExternalSymbolName() before, which seems reasonable, but in fact this is freed before the asm emitter which tries to access the function name string. Switching it to use the string returned by the attribute seems to fix the problem.	2021-10-07 23:51:37 -07:00
Amara Emerson	08b3c0d995	[GlobalISel] Combine G_UMULH x, (1 << c) -> x >> (bitwidth - c) In order to not generate an unnecessary G_CTLZ, I extended the constant folder in the CSEMIRBuilder to handle G_CTLZ. I also added some extra handling of vector constants too. It seems we don't have any support for doing constant folding of vector constants, so the tests show some other useless G_SUB instructions too. Differential Revision: https://reviews.llvm.org/D111036	2021-10-07 23:51:37 -07:00
Jay Foad	b84d9d299e	[TargetPassConfig] Remove an obsolete FIXME comment The "coloring with register" functionality it refers to was removed ten years ago in svn r144481 "Remove the -color-ss-with-regs option".	2021-10-08 07:34:25 +01:00
Itay Bookstein	faa0e2ae76	[SelectionDAG] Fix shift libcall ABI mismatch in shift-amount argument The shift libcalls have a shift amount parameter of MVT::i32, but sometimes ExpandIntRes_Shift may be called with a node whose second operand is a type that is larger than that. This leads to an ABI mismatch, and for example causes a spurious zeroing of a register in RV32 for 64-bit shifts. Note that at present regular shift instructions already have their shift amount operand adapted at SelectionDAGBuilder::visitShift time, and funnelled shifts bypass that. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110508	2021-10-08 09:57:57 +08:00
Wang, Pengfei	c236883b6b	[X86] Optimize fdiv with reciprocal instructions for half type Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110557	2021-10-08 09:41:13 +08:00
Adrian Prantl	9f93f2bfbd	Do not emit prologue_end for line 0 locs if there is a non-zero loc present This change fixes a bug where the compiler generates a prologue_end for line 0 locs. Line 0 is not associated with any source location, so there should not be a prologue_end at a location that doesn't correspond to a source location. Some LLVM tests were explicitly checking for line 0 prologue_end's; since I believe that to be incorrect, I had to change those tests as well. Patch by Shubham Rastogi! Differential Revision: https://reviews.llvm.org/D110740	2021-10-07 13:54:28 -07:00
Jay Foad	097339b1ca	[TargetPassConfig] Enable machine verification after miscellaneous passes In a couple of places machine verification was disabled for no apparent reason, probably just because an "addPass(..., false)" line was cut and pasted from elsewhere. After this patch the only remaining place where machine verification is disabled in the generic TargetPassConfig code, is after addPreEmitPass.	2021-10-07 21:24:50 +01:00
Jay Foad	27c57e791a	[TwoAddressInstruction] Enable machine verification after this pass Differential Revision: https://reviews.llvm.org/D111007	2021-10-07 20:04:51 +01:00
Jay Foad	3ff0a5747d	[PHIElimination] Enable machine verification after this pass Differential Revision: https://reviews.llvm.org/D111006	2021-10-07 20:04:51 +01:00
Jay Foad	3c9dfba189	[PHIElimination] Account for INLINEASM_BR when inserting kills When PHIElimination adds kills after lowering PHIs to COPYs it knows that some instructions after the inserted COPY might use the same SrcReg, but it was only looking at the terminator instructions at the end of the block, not at other instructions like INLINEASM_BR that can appear after the COPY insertion point. Since we have already called findPHICopyInsertPoint, which knows about INLINEASM_BR, we might as well reuse the insertion point that it calculated when looking for instructions that might use SrcReg. This fixes a machine verification failure if you force machine verification to run after PHIElimination (currently it is disabled for other reasons) when running test/CodeGen/X86/callbr-asm-phi-placement.ll. Differential Revision: https://reviews.llvm.org/D110834	2021-10-07 20:04:39 +01:00
Amara Emerson	8bfc0e06dc	[GlobalISel] Port the udiv -> mul by constant combine. This is a straight port from the equivalent DAG combine. Differential Revision: https://reviews.llvm.org/D110890	2021-10-07 11:37:17 -07:00
Jay Foad	548b01c7a6	[MIRParser] Add support for IsInlineAsmBrIndirectTarget Print this basic block flag as inlineasm-br-indirect-target and parse it. This allows you to write MIR test cases for INLINEASM_BR. The test case I added is one that I wanted to precommit anyway for D110834. Differential Revision: https://reviews.llvm.org/D111291	2021-10-07 19:08:01 +01:00
Bradley Smith	5be266db7a	[AArch64][SVE] Improve VECTOR_SPLICE codegen for VL > 128-bit Differential Revision: https://reviews.llvm.org/D111135	2021-10-07 15:28:55 +00:00
Jack Andersen	bd4dad87f4	[MachineInstr] Move MIParser's DBG_VALUE RegState::Debug invariant into MachineInstr::addOperand Based on the reasoning of D53903, register operands of DBG_VALUE are invariably treated as RegState::Debug operands. This change enforces this invariant as part of MachineInstr::addOperand so that all passes emit this flag consistently. RegState::Debug is inconsistently set on DBG_VALUE registers throughout LLVM. This runs the risk of a filtering iterator like MachineRegisterInfo::reg_nodbg_iterator to process these operands erroneously when not parsed from MIR sources. This issue was observed in the development of the llvm-mos fork which adds a backend that relies on physical register operands much more than existing targets. Physical RegUnit 0 has the same numeric encoding as $noreg (indicating an undef for DBG_VALUE). Allowing debug operands into the machine scheduler correlates $noreg with RegUnit 0 (i.e. a collision of register numbers with different zero semantics). Eventually, this causes an assert where DBG_VALUE instructions are prohibited from participating in live register ranges. Reviewed By: MatzeB, StephenTozer Differential Revision: https://reviews.llvm.org/D110105	2021-10-07 16:08:52 +01:00
Carl Ritson	b5d6ad20e1	[MachineCopyPropagation] Handle propagation of undef copies When propagating undefined copies the undef flag must also be propagated. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D111219	2021-10-07 20:34:27 +09:00
Jay Foad	df2d4bc4cb	[TwoAddressInstruction] Fix ReplacedAllUntiedUses in processTiedPairs Fix the calculation of ReplacedAllUntiedUses when any of the tied defs are early-clobber. The effect of this is to fix the placement of kill flags on an instruction like this (from @f2 in test/CodeGen/SystemZ/asm-18.ll): INLINEASM &"stepb $1, $2" [attdialect], $0:[regdef-ec:GRH32Bit], def early-clobber %3:grh32bit, $1:[reguse tiedto:$0], killed %4:grh32bit(tied-def 3), $2:[reguse:GRH32Bit], %4:grh32bit After TwoAddressInstruction without this patch: %3:grh32bit = COPY killed %4:grh32bit INLINEASM &"stepb $1, $2" [attdialect], $0:[regdef-ec:GRH32Bit], def early-clobber %3:grh32bit, $1:[reguse tiedto:$0], %3:grh32bit(tied-def 3), $2:[reguse:GRH32Bit], %4:grh32bit Note that the COPY kills %4, even though there is a later use of %4 in the INLINEASM. This fails machine verification if you force it to run after TwoAddressInstruction (currently it is disabled for other reasons). After TwoAddressInstruction with this patch: %3:grh32bit = COPY %4:grh32bit INLINEASM &"stepb $1, $2" [attdialect], $0:[regdef-ec:GRH32Bit], def early-clobber %3:grh32bit, $1:[reguse tiedto:$0], %3:grh32bit(tied-def 3), $2:[reguse:GRH32Bit], %4:grh32bit Differential Revision: https://reviews.llvm.org/D110848	2021-10-07 10:10:11 +01:00
Mikael Holmen	9bf5d91361	[GlobalISel] Silence gcc warning about unused variable	2021-10-07 07:18:04 +02:00
Itay Bookstein	40ec1c0f16	[IR][NFC] Rename getBaseObject to getAliaseeObject To better reflect the meaning of the now-disambiguated {GlobalValue, GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction (D109792), the function is renamed to getAliaseeObject.	2021-10-06 19:33:10 -07:00
David Blaikie	f6a561c4d6	DebugInfo: Use clang's preferred names for integer types This reverts `c7f16ab3e3` / r109694 - which suggested this was done to improve consistency with the gdb test suite. Possible that at the time GCC did not canonicalize integer types, and so matching types was important for cross-compiler validity, or that it was only a case of over-constrained test cases that printed out/tested the exact names of integer types. In any case neither issue seems to exist today based on my limited testing - both gdb and lldb canonicalize integer types (in a way that happens to match Clang's preferred naming, incidentally) and so never print the original text name produced in the DWARF by GCC or Clang. This canonicalization appears to be in `integer_types_same_name_p` for GDB and in `TypeSystemClang::GetBasicTypeEnumeration` for lldb. (I tested this with one translation unit defining 3 variables - `long`, `long (*)()`, and `int (*)()`, and another translation unit that had main, and a function that took `long (*)()` as a parameter - then compiled them with mismatched compilers (either GCC+Clang, or Clang+(Clang with this patch applied)) and no matter the combination, despite the debug info for one CU naming the type "long int" and the other naming it "long", both debuggers printed out the name as "long" and were able to correctly perform overload resolution and pass the `long int (*)()` variable to the `long (*)()` function parameter) Did find one hiccup, identified by the lldb test suite - that CodeView was relying on these names to map them to builtin types in that format. So added some handling for that in LLVM. (these could be split out into separate patches, but seems small enough to not warrant it - will do that if there ends up needing any reverting/revisiting) Differential Revision: https://reviews.llvm.org/D110455	2021-10-06 16:02:34 -07:00
Arthur Eubanks	05392466f0	Reland [IR] Increase max alignment to 4GB Currently the max alignment representable is 1GB, see D108661. Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945. This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits. We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now. The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field. Updating clang's max allowed alignment will come in a future patch. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D110451	2021-10-06 13:29:23 -07:00
Arthur Eubanks	569346f274	Revert "Reland [IR] Increase max alignment to 4GB" This reverts commit `8d64314ffe`.	2021-10-06 11:38:11 -07:00
Arthur Eubanks	8d64314ffe	Reland [IR] Increase max alignment to 4GB Currently the max alignment representable is 1GB, see D108661. Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945. This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits. We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now. The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field. Updating clang's max allowed alignment will come in a future patch. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D110451	2021-10-06 11:03:51 -07:00
Arthur Eubanks	72cf8b6044	Revert "[IR] Increase max alignment to 4GB" This reverts commit `df84c1fe78`. Breaks some bots	2021-10-06 10:21:35 -07:00
Arthur Eubanks	df84c1fe78	[IR] Increase max alignment to 4GB Currently the max alignment representable is 1GB, see D108661. Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945. This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits. We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now. The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field. Updating clang's max allowed alignment will come in a future patch. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D110451	2021-10-06 09:54:14 -07:00
Amara Emerson	79d13bf22c	Revert "Revert "[GlobalISel][IRTranslator] Emit trap intrinsic for "unreachable""" This reverts commit `d95cd81141`. Re-land the original patch now that the bug this exposed in selection has been fixed by `6bc64e24c3`	2021-10-06 04:16:19 -07:00
Simon Pilgrim	21661607ca	[llvm] Replace report_fatal_error(std::string) uses with report_fatal_error(Twine) As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.	2021-10-06 12:04:30 +01:00
David Sherwood	37edb7d3e2	[SVE] Fix incorrect DAG combines when extracting fixed-width from scalable vectors We were previously silently generating incorrect code when extracting a fixed-width vector from a scalable vector. This is worse than crashing, since the user will have no indication that this is currently unsupported behaviour. I have fixed the code to only perform DAG combines when safe to do so, i.e. the input and output vectors are both fixed-width or both scalable. Test added here: CodeGen/AArch64/sve-extract-scalable-vector.ll Differential revision: https://reviews.llvm.org/D110624	2021-10-06 09:27:44 +01:00
Amara Emerson	6bc64e24c3	[GlobalISel] Clear unreachable blocks' contents after selection. If these blocks are unreachable, then we can discard all of the instructions. However, keep the block around because it may have an address taken or the block may have a stale reference from a PHI somewhere. Instead of finding those PHIs and fixing them up, just leave the block empty. Differential Revision: https://reviews.llvm.org/D111201	2021-10-05 23:06:22 -07:00
Simon Pilgrim	2e5daac217	[llvm] Update report_fatal_error calls from raw_string_ostream to use Twine(OS.str()) As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared. We can use the raw_string_ostream::str() method to perform the implicit flush() and return a reference to the std::string container that we can then wrap inside Twine().	2021-10-05 18:42:12 +01:00
Joe Nash	8f55fdf26c	[MacroFusion] Expose useful static methods. NFC. hasLessThanNumFused and fuseInstructionPair are useful for DAG mutations similar to MacroFusion, but which cannot use MacroFusion as a whole (such as fusing non-dependent instruction). Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D111070 Change-Id: I3a5d56aba0471d45ef64cebb9b724030e2eae2f3	2021-10-05 11:51:48 -04:00
Amara Emerson	de5b16d8ca	Revert "Revert "Revert "[GlobalISel][IRTranslator] Emit trap intrinsic for "unreachable"""" This reverts commit `c93bc508ee`. Seems to break a different thing now.	2021-10-05 08:25:13 -07:00
Jay Foad	f65458df32	[PHIElimination] Update LiveVariables after handling an unspillable terminator Update the LiveVariables analysis after the special handling for unspillable terminators which was added in D91358. This is just enough to fix some "Block should not be in AliveBlocks" / "Block missing from AliveBlocks" errors in the codegen test suite when machine verification is forced to run after PHIElimination (currently it is disabled). Differential Revision: https://reviews.llvm.org/D110939	2021-10-05 14:25:53 +01:00
Jeremy Morse	e265644b32	[DebugInfo][InstrRef] Track all of DBG_PHIs operands An important part of the instruction referencing solution is that we identify all the registers that values move between before we then compute an SSA-like function from the machine code, and from the variable intrinsics. DBG_PHIs weren't causing all the subregisters of their operands to be tracked; this patch forces that to happen. The practical implications were that not enough space is allocated for storing values when analysing the function -- asan will crash on the attached test case with an unpatched compiler. Non-asan llc's will produce a DBG_VALUE $noreg, where it should be $dil. Differential Revision: https://reviews.llvm.org/D109064	2021-10-05 14:01:26 +01:00
Mirko Brkusanin	40e00063bc	[GlobalISel] Combine fabs(fneg(x)) to fabs(x) Differential Revision: https://reviews.llvm.org/D110943	2021-10-05 13:43:39 +02:00
Bjorn Pettersson	8ed0e6b2cf	[SelectionDAG] Replace error prone index check in BaseIndexOffset::computeAliasing Deriving NoAlias based on having the same index in two BaseIndexOffset expressions seemed weird (and as shown in the added unittest the correctness of doing so depended on undocumented pre-conditions that the user of BaseIndexOffset::computeAliasing would need to take care of). This patch removes the code that derived NoAlias based on indices being the same. As a compensation, to avoid regressions/diffs in various lit tests, we also add a new check. The new check derives NoAlias in case the two base pointers are based on two different GlobalValue:s (neither of them being a GlobalAlias). Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D110256	2021-10-05 12:15:55 +02:00
Bjorn Pettersson	1896fb2cff	[SelectionDAG] Assume that a GlobalAlias may alias other global values This fixes a bug detected in DAGCombiner when using global alias variables. Here is an example: @foo = global i16 0, align 1 @aliasFoo = alias i16, i16 * @foo define i16 @bar() { ... store i16 7, i16 * @foo, align 1 store i16 8, i16 * @aliasFoo, align 1 ... } BaseIndexOffset::computeAliasing would incorrectly derive NoAlias for the two accesses in the example above, resulting in DAGCombiner miscompiles. This patch fixes the problem by a defensive approach letting BaseIndexOffset::computeAliasing return false, i.e. that the aliasing couldn't be determined, when comparing two global values and at least one is a GlobalAlias. In the future we might improve this with a deeper analysis to look at the aliasee for the GlobalAlias etc. But that is a bit more complicated considering that we could have 'local_unnamed_addr' and situations with several 'alias' variables. Fixes PR51878. Differential Revision: https://reviews.llvm.org/D110064	2021-10-05 12:15:55 +02:00
Jay Foad	0a031f5c88	[GlobalISel] Simplify narrowScalarMul. NFC. Remove some redundancy because the source and result types of any multiply are always the same.	2021-10-05 10:53:12 +01:00
Jay Foad	0bd4365445	[LiveIntervals] Fix verification of early-clobbered segments Enable verification of live intervals immediately after computing them (when -early-live-intervals is used) and fix a problem that that provokes: currently the verifier insists that a segment that ends at an early-clobber slot must be followed by another segment starting at the same slot. But before TwoAddressInstruction runs, the equivalent condition is: a segment that ends at an early-clobber slot must have its last use tied to an early-clobber def. That condition is harder to check here, so for now just disable this check until tied operands have been rewritten. Differential Revision: https://reviews.llvm.org/D111065	2021-10-05 08:17:56 +01:00
Amara Emerson	cfef1803dd	[GlobalISel] Port over the SelectionDAG stack protector codegen feature. This is a port of the feature that allows the StackProtector pass to omit checking code for stack canary checks, and rely on SelectionDAG to do it at a later stage. The reasoning behind this seems to be to prevent the IR checking instructions from hindering tail-call optimizations during codegen. Here we allow GlobalISel to also use that scheme. Doing so requires that we do some analysis using some factored-out code to determine where to generate code for the epilogs. Not every case is handled in this patch since we don't have support for all targets that exercise different stack protector schemes. Differential Revision: https://reviews.llvm.org/D98200	2021-10-04 21:33:44 -07:00
Amara Emerson	c93bc508ee	Revert "Revert "[GlobalISel][IRTranslator] Emit trap intrinsic for "unreachable""" This reverts commit `d95cd81141`. The selector sometimes leaves unreachable blocks unselected because it uses a postorder traversal for the block ordering. With the trap intrinsics now being emitted, these blocks are no longer empty and the unselected G_INTRINSIC instructions survive past selection. To fix this, keep track of which blocks are selected and later delete any blocks that weren't selected.	2021-10-04 18:10:28 -07:00
Amara Emerson	d95cd81141	Revert "[GlobalISel][IRTranslator] Emit trap intrinsic for "unreachable"" This reverts commit `019041bec3`. It broke some bots.	2021-10-04 15:44:52 -07:00
Jeremy Morse	e2b838dd91	[DebugInfo][InstrRef] Accept landingpad block arguments This patch makes instruction-referencing accept an additional scenario where values can be read from physical registers at the start of blocks. As far as I was aware, this only happened: * With arguments in the entry block, * With constant physical registers, To which this patch adds a third case: * With exception-handling landing-pad blocks In the attached test: the operand of the dbg.value traces back to the "landingpad" instruction, which becomes some copies from physregs. Right now, that's deemed unacceptable, and the assertion fires. The fix is to just accept this scenario; this is a case where the value in question is defined by a register and a position, not by an instruction that defines it. Reading it with a DBG_PHI is the correct behaviour, there isn't a non-copy instruction that we can refer to. Differential Revision: https://reviews.llvm.org/D109005	2021-10-04 23:03:02 +01:00
Amara Emerson	8bde5e58c0	Delay outgoing register assignments to last. The delayed stack protector feature which is currently used for SDAG (and thus allows for more commonly generating tail calls) depends on being able to extract the tail call into a separate return block. To do this it also has to extract the vreg->physreg copies that set up the call's arguments, since if it doesn't then the call inst ends up using undefined physregs in its new spliced block. SelectionDAG implementations can do this because they delay emitting register copies until after the stack arguments are set up. GISel however just processes and emits the arguments in IR order, so stack arguments always end up last, and thus this breaks the code that looks for any register arg copies that precede the call instruction. This patch adds a thunk argument to the assignValueToReg() and custom assignment hooks. For outgoing arguments, register assignments use this return param to return a thunk that does the actual generating of the copies. We collect these until all the outgoing stack assignments have been done and then execute them, so that the copies (and perhaps some artifacts like G_SEXTs) are placed after any stores. Differential Revision: https://reviews.llvm.org/D110610	2021-10-04 12:33:20 -07:00
Jay Foad	24688f8fdf	Revert "[GlobalISel] Support vectors in LegalizerHelper::narrowScalarMul" This reverts commit `90da0b9a5a`. It was causing an LLVM_ENABLE_EXPENSIVE_CHECKS buildbot failure.	2021-10-04 20:26:30 +01:00
Amara Emerson	dafcbfdaa0	[GlobalISel] Widen G_EXTRACT_VECTOR_ELT using anyext instead of sext. G_SEXT seems to be unnecessary here, anyext will do. Differential Revision: https://reviews.llvm.org/D110469	2021-10-04 12:19:19 -07:00
Jay Foad	90da0b9a5a	[GlobalISel] Support vectors in LegalizerHelper::narrowScalarMul Also remove some redundancy because the source and result types of any multiply are always the same. Differential Revision: https://reviews.llvm.org/D110926	2021-10-04 19:33:38 +01:00
Amara Emerson	019041bec3	[GlobalISel][IRTranslator] Emit trap intrinsic for "unreachable" We were previously just ignoring unreachable, but targets like Darwin want to keep unreachable instructions as traps. Differential Revision: https://reviews.llvm.org/D110603	2021-10-04 11:02:29 -07:00
Jay Foad	a9bceb2b05	[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC. Stop using APInt constructors and methods that were soft-deprecated in D109483. This fixes all the uses I found in llvm, except for the APInt unit tests which should still test the deprecated methods. Differential Revision: https://reviews.llvm.org/D110807	2021-10-04 08:57:44 +01:00
Kazu Hirata	d34cd75d89	[Analysis, CodeGen] Migrate from arg_operands to args (NFC) Note that arg_operands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-10-03 08:22:20 -07:00
Takafumi Arakaki	e8806d7486	Re-apply the fix on DwarfEHPrepare and add a test This patch re-introduces the fix in the commit https://github.com/llvm/llvm-project/commit/66b0cebf7f736 by @yrnkrn > In DwarfEHPrepare, after all passes are run, RewindFunction may be a dangling > > pointer to a dead function. To make sure it's valid, doFinalization nullptrs > RewindFunction just like the constructor and so it will be found on next run. > > llvm-svn: 217737 It seems that the fix was not migrated to `DwarfEHPrepareLegacyPass`. This patch also updates `llvm/test/CodeGen/X86/dwarf-eh-prepare.ll` to include `-run-twice` to exercise the cleanup. Without this patch `llvm-lit -v llvm/test/CodeGen/X86/dwarf-eh-prepare.ll` fails with ``` -- Testing: 1 tests, 1 workers -- FAIL: LLVM :: CodeGen/X86/dwarf-eh-prepare.ll (1 of 1) ****************** TEST 'LLVM :: CodeGen/X86/dwarf-eh-prepare.ll' FAILED ****************** Script: -- : 'RUN: at line 1'; /home/arakaki/build/llvm-project/main/bin/opt -mtriple=x86_64-linux-gnu -dwarfehprepare -simplifycfg-require-and-preserve-domtree=1 -run-twice < /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll -S \| /home/arakaki/build/llvm-project/main/bin/FileCheck /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll -- Exit Code: 2 Command Output (stderr): -- Referencing function in another module! call void @_Unwind_Resume(i8* %ehptr) #1 ; ModuleID = '<stdin>' void (i8) @_Unwind_Resume ; ModuleID = '<stdin>' in function simple_cleanup_catch LLVM ERROR: Broken function found, compilation aborted! PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. Stack dump: 0. Program arguments: /home/arakaki/build/llvm-project/main/bin/opt -mtriple=x86_64-linux-gnu -dwarfehprepare -simplifycfg-require-and-preserve-domtree=1 -run-twice -S 1. Running pass 'Function Pass Manager' on module '<stdin>'. 2. 
Running pass 'Module Verifier' on function '@simple_cleanup_catch' #0 0x000056121b570a2c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:0 #1 0x000056121b56eb64 llvm::sys::RunSignalHandlers() /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Signals.cpp:97:0 #2 0x000056121b56f28e SignalHandler(int) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Unix/Signals.inc:397:0 #3 0x00007fc7e9b22980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980) #4 0x00007fc7e87d3fb7 raise /build/glibc-S7xCS9/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0 #5 0x00007fc7e87d5921 abort /build/glibc-S7xCS9/glibc-2.27/stdlib/abort.c:81:0 #6 0x000056121b4e1386 llvm::raw_svector_ostream::raw_svector_ostream(llvm::SmallVectorImpl<char>&) /home/arakaki/repos/watch/llvm-project/llvm/include/llvm/Support/raw_ostream.h:674:0 #7 0x000056121b4e1386 llvm::report_fatal_error(llvm::Twine const&, bool) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/ErrorHandling.cpp:114:0 #8 0x000056121b4e1528 (/home/arakaki/build/llvm-project/main/bin/opt+0x29e3528) #9 0x000056121adfd03f llvm::raw_ostream::operator<<(llvm::StringRef) /home/arakaki/repos/watch/llvm-project/llvm/include/llvm/Support/raw_ostream.h:218:0 FileCheck error: '<stdin>' is empty. FileCheck command line: /home/arakaki/build/llvm-project/main/bin/FileCheck /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll -- ****************** ****************** Failed Tests (1): LLVM :: CodeGen/X86/dwarf-eh-prepare.ll Testing Time: 0.22s Failed: 1 ``` Reviewed By: loladiro Differential Revision: https://reviews.llvm.org/D110979	2021-10-02 21:50:35 -04:00
Simon Pilgrim	df672f66b6	[DAG] scalarizeExtractedVectorLoad - replace getABITypeAlign with allowsMemoryAccess (PR45116) One of the cases identified in PR45116 - we don't need to limit extracted loads to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback. I've also cleaned up the alignment calculation code - if we have a constant extraction index then the alignment can be based on an offset from the original vector load alignment, but for non-constant indices we should assume the worst (single element alignment only). Differential Revision: https://reviews.llvm.org/D110486	2021-10-01 21:07:34 +01:00
Jay Foad	dff3454bda	[TwoAddressInstruction] Tweak constraining of tied operands In collectTiedOperands, when handling an undef use that is tied to a def, constrain the dst reg with the actual register class of the src reg, instead of with the register class from the instruction's MCInstrDesc. This makes a difference in some AMDGPU test cases like this, before: %16:sgpr_96 = INSERT_SUBREG undef %15:sgpr_96_with_sub0_sub1(tied-def 0), killed %11:sreg_64_xexec, %subreg.sub0_sub1 After, without this patch: undef %16.sub0_sub1:sgpr_96 = COPY killed %11:sreg_64_xexec This fails machine verification if you force it to run after TwoAddressInstruction (currently it is disabled) with: * Bad machine code: Invalid register class for subregister index * - function: s_load_constant_v3i32_align4 - basic block: %bb.0 (0xa011a88) - instruction: undef %16.sub0_sub1:sgpr_96 = COPY killed %11:sreg_64_xexec - operand 0: undef %16.sub0_sub1:sgpr_96 Register class SGPR_96 does not fully support subreg index 4 After, with this patch: undef %16.sub0_sub1:sgpr_96_with_sub0_sub1 = COPY killed %11:sreg_64_xexec See also svn r159120 which introduced the code to handle tied undef uses. Differential Revision: https://reviews.llvm.org/D110944	2021-10-01 20:57:58 +01:00
Jay Foad	31c92d515d	[MachineLoopInfo] Enable machine verification after this pass Enabling this does not show any problems in check-llvm in an LLVM_ENABLE_EXPENSIVE_CHECKS build. Differential Revision: https://reviews.llvm.org/D110703	2021-10-01 18:15:57 +01:00
Jay Foad	04787239c9	[LiveVariables] Skip verification of kills inside bundles LiveVariables does not examine the contents of bundles, so MachineVerifier should not expect it to know about kill flags on operands of instructions inside a bundle. With this fix we can enable machine verification after running the LiveVariables analysis. Doing this does not show any problems in check-llvm in an LLVM_ENABLE_EXPENSIVE_CHECKS build. Differential Revision: https://reviews.llvm.org/D110700	2021-10-01 18:15:57 +01:00
Jay Foad	08d41f75d9	[UnreachableMachineBlockElim] Enable machine verification after this pass Enabling this does not show any problems in check-llvm in an LLVM_ENABLE_EXPENSIVE_CHECKS build. Differential Revision: https://reviews.llvm.org/D110697	2021-10-01 18:15:57 +01:00
Jay Foad	2bfe777a45	[ProcessImplicitDefs] Enable machine verification after this pass Enabling this does not show any problems in check-llvm in an LLVM_ENABLE_EXPENSIVE_CHECKS build. Differential Revision: https://reviews.llvm.org/D110695	2021-10-01 18:15:56 +01:00
Jay Foad	fd8e99700d	[DetectDeadLanes] Enable machine verification after this pass Machine verification after DetectDeadLanes has been disabled since the pass was first added in D18427, but I guess this was just due to copy-and-paste. Enabling it does not show any problems in check-llvm in an LLVM_ENABLE_EXPENSIVE_CHECKS build. Differential Revision: https://reviews.llvm.org/D110689	2021-10-01 18:15:56 +01:00
Marcelo Juchem	dfb213c2df	Fix ambiguous overload build failure LLVM (llvmorg-14-init) under Debian sid using latest gcc (Debian 10.3.0-9) 10.3.0 fails due to ambiguous overload on operators == and !=: /root/src/llvm/src/llvm/tools/obj2yaml/elf2yaml.cpp:212:22: error: ambiguous overload for 'operator!=' (operand types are 'llvm::ELFYAML::ELF_SHF' and 'int') /root/src/llvm/src/llvm/tools/obj2yaml/elf2yaml.cpp:204:32: error: ambiguous overload for 'operator!=' (operand types are 'const llvm::yaml::Hex64' and 'int') /root/src/llvm/src/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp:629:35: error: ambiguous overload for 'operator==' (operand types are 'const uint64_t' {aka 'const long unsigned int'} and 'llvm::Register') Reviewed by: StephenTozer, jmorse, Higuoxing Differential Revision: https://reviews.llvm.org/D109534	2021-10-01 14:19:57 +01:00
Sander de Smalen	b62e6f19d7	[SelectionDAG] Handle promotion + widening in getCopyToPartsVector Some vectors require both widening and promotion for their legalization. This case is not yet handled in getCopyToPartsVector and falls back on scalarizing by default. Because scalable vectors can't easily be scalarised, we need to implement this in two separate stages: 1. Widen the vector. 2. Promote the vector. As part of this patch, PromoteIntRes_CONCAT_VECTORS also needed to be made scalable aware. Instead of falling back on scalarizing the vector (fixed-width only), each sub-part of the CONCAT vector is promoted, and the operation is performed on the type with the widest element type, finally truncating the result to the promoted result type. Differential Revision: https://reviews.llvm.org/D110646	2021-10-01 08:19:47 +01:00
Christopher Tetreault	3077bc90de	[NFC] Restore magic and magicu to a globally visible location While these functions are only used in one location in upstream, they have been reused in multiple downstreams. Restore this file to a globally visible location (outside of APInt.h) to eliminate downstream breakage and enable potential future reuse. Additionally, this patch renames types and cleans up clang-tidy issues.	2021-09-30 17:43:12 -07:00
Amara Emerson	ca8316b704	[GlobalISel] Extend CombinerHelper::matchConstantOp() to match constant splat vectors. This allows the "x op 0 -> x" fold to optimize vector constant RHSs. Differential Revision: https://reviews.llvm.org/D110802	2021-09-30 14:31:25 -07:00
Amara Emerson	80f4bb5c61	[GlobalISel] Extend G_SELECT of known condition combine to vectors. Adds a new utility function: isConstantOrConstantSplatVector(). Differential Revision: https://reviews.llvm.org/D110786	2021-09-30 12:16:44 -07:00
Kazu Hirata	f631173d80	[llvm] Migrate from arg_operands to args (NFC) Note that arg_operands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-09-30 08:51:21 -07:00
Brock Wyma	bafd8b1add	[CodeView] Recognize Fortran95 as Fortran instead of MASM Map Fortran95 sources to Fortran so the CodeView language is not emitted as MASM. Differential Revision: https://reviews.llvm.org/D110330	2021-09-30 09:27:05 -04:00
Jay Foad	156d7d2df7	[LiveIntervals] Remove unused subreg ranges in repairIntervalsInRange If the old instructions mentioned a subreg that the new instructions do not, remove the subrange for that subreg. For example, in TwoAddressInstructionPass::eliminateRegSequence, if a use operand in the REG_SEQUENCE has the undef flag then we don't generate a copy for it so after the elimination there should be no live interval at all for the corresponding subreg of the def. This is a small step towards switching TwoAddressInstructionPass over from LiveVariables to LiveIntervals. Currently this path is only tested if you explicitly enable -early-live-intervals. Differential Revision: https://reviews.llvm.org/D110542	2021-09-30 09:15:10 +01:00
Sander de Smalen	6709b193ea	[SelectionDAG] Make WidenVecRes_EXTRACT_SUBVECTOR work for scalable vectors. The legalizer handles this by breaking up an EXTRACT_SUBVECTOR into smaller parts, and combines those together, padding the result with UNDEF vectors, e.g. nxv6i64 extract_subvector(nxv12i64, 6) <-> nxv8i64 concat( nxv2i64 extract_subvector(nxv16i64, 6) nxv2i64 extract_subvector(nxv16i64, 8) nxv2i64 extract_subvector(nxv16i64, 10) nxv2i64 undef) Reviewed By: frasercrmck, david-arm Differential Revision: https://reviews.llvm.org/D110253	2021-09-29 11:33:45 +01:00
Jay Foad	27179b39f9	[RemoveRedundantDebugValues] Enable machine verification after this pass Machine verification after RemoveRedundantDebugValues has been disabled since the pass was first added in D105279, but I guess this was just due to copy-and-paste. Enabling it does not show any problems in check-llvm in an LLVM_ENABLE_EXPENSIVE_CHECKS build. Differential Revision: https://reviews.llvm.org/D110688	2021-09-29 10:44:35 +01:00
Itay Bookstein	7255ce30e4	[SelectionDAG] Fix incorrect condition for shift amount truncation Comment says: // If the operand is larger than the shift count type but the shift // count type has enough bits to represent any shift value ... It clearly talks about the shifted operand, not the shift-amount operand, but the comparison is performed against Log2_32_Ceil(Op2.getValueSizeInBits()) where Op2 is the shift amount operand. This comparison also doesn't make sense in the context of the previous one (ShiftsSize > Op2Size) because Op2Size == Op2.getValueSizeInBits(). Fix to use Op1. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110509	2021-09-28 17:52:30 -07:00
Jessica Paquette	15a24e1fdb	[GlobalISel] Combine mulo x, 2 -> addo x, x Similar to what SDAG does when it sees a smulo/umulo against 2 (see: `DAGCombiner::visitMULO`) This pattern is fairly common in Swift code AFAICT. Here's an example extracted from a Swift testcase: https://godbolt.org/z/6cT8Mesx7 Differential Revision: https://reviews.llvm.org/D110662	2021-09-28 16:59:43 -07:00
Arthur Eubanks	aa53785f23	Reland [clang] Rework dontcall attributes To avoid using the AST when emitting diagnostics, split the "dontcall" attribute into "dontcall-warn" and "dontcall-error", and also add the frontend attribute value as the LLVM attribute value. This gives us all the information to report diagnostics we need from within the IR (aside from access to the original source). One downside is we directly use LLVM's demangler rather than using the existing Clang diagnostic pretty printing of symbols. Previous revisions didn't properly declare the new dependencies. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D110364	2021-09-28 15:31:30 -07:00
Shoaib Meenai	f9b3c18e74	[CodeGen] Fix wrapping personality symbol on ARM The ARM backend was explicitly setting global binding on the personality symbol. This was added without any comment in `a7ec2dcefd`, which introduced EHABI support (back in 2011). None of the other backends do anything equivalent, as far as I can tell. This causes problems when attempting to wrap the personality symbol. Wrapped symbols are marked as weak inside LTO to inhibit IPO (see https://reviews.llvm.org/D33621). When we wrap the personality symbol, it initially gets weak binding, and then the ARM backend attempts to change the binding to global, which causes an error in MC because of attempting to change the binding of a symbol from non-global to global (the error was added in https://reviews.llvm.org/D90108). Simply drop the ARM backend's explicit global binding setting to fix this. This matches all the other backends, and a large internal application successfully linked and ran with this change, so it shouldn't cause any problems. Test via LLD, since wrapping is required to exhibit the issue. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D110609	2021-09-28 15:01:05 -07:00
Arthur Eubanks	7833d20f1f	Revert "[clang] Rework dontcall attributes" This reverts commit `2943071e2e`. Breaks bots	2021-09-28 14:49:27 -07:00
Arthur Eubanks	2943071e2e	[clang] Rework dontcall attributes To avoid using the AST when emitting diagnostics, split the "dontcall" attribute into "dontcall-warn" and "dontcall-error", and also add the frontend attribute value as the LLVM attribute value. This gives us all the information to report diagnostics we need from within the IR (aside from access to the original source). One downside is we directly use LLVM's demangler rather than using the existing Clang diagnostic pretty printing of symbols. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D110364	2021-09-28 14:21:10 -07:00
Jay Foad	845b93e692	[LiveIntervals] Fix another asan debug build failure Call RemoveMachineInstrFromMaps before erasing instrs. repairIntervalsInRange will do this for you after erasing the instruction, but it's not safe to rely on it because assertions in SlotIndexes::removeMachineInstrFromMaps refer to fields in the erased instruction. This fixes asan buildbot failures caused by D110335.	2021-09-28 11:09:38 +01:00
“bhkumarn”	62eeacce17	[DebugInfo] Emit DW_TAG_namelist and DW_TAG_namelist_item This patch emits DW_TAG_namelist and DW_TAG_namelist_item for fortran namelist variables. DICompositeType is extended to support this fortran feature. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D108553	2021-09-28 14:40:58 +05:30
Jay Foad	20c0280733	[LiveIntervals] Repair subreg ranges in processTiedPairs In TwoAddressInstructionPass::processTiedPairs, update subranges of the live interval for RegB as well as the main range. This is a small step towards switching TwoAddressInstructionPass over from LiveVariables to LiveIntervals. Currently this path is only tested if you explicitly enable -early-live-intervals. Differential Revision: https://reviews.llvm.org/D110526	2021-09-28 08:10:16 +01:00
Jay Foad	b2b1a8b833	[LiveIntervals] Improve repair after convertToThreeAddress After TwoAddressInstructionPass calls TargetInstrInfo::convertToThreeAddress, improve the LiveIntervals repair to cope with convertToThreeAddress creating more than one new instruction. This mostly seems to benefit X86. For example in test/CodeGen/X86/zext-trunc.ll it converts: %4:gr32 = ADD32rr %3:gr32(tied-def 0), %2:gr32, implicit-def dead $eflags to: undef %6.sub_32bit:gr64 = COPY %3:gr32 undef %7.sub_32bit:gr64_nosp = COPY %2:gr32 %4:gr32 = LEA64_32r killed %6:gr64, 1, killed %7:gr64_nosp, 0, $noreg Differential Revision: https://reviews.llvm.org/D110335	2021-09-28 08:10:08 +01:00
Xiang1 Zhang	ebe9944a34	[ISel] Legalized arithmetic.fence.f128 for 32-bits target Reviewed By: Craig Topper, Wang Pengfei Differential Revision: https://reviews.llvm.org/D110467	2021-09-28 10:27:25 +08:00
Fraser Cormack	e2b46e336b	[DAGCombiner][VP] Fold zero-length or false-masked VP ops This patch adds a generic DAGCombine for vector-predicated (VP) nodes. Those for which we can determine that no vector element is active can be replaced by either undef or, for reductions, the start value. This is tested rather trivially at the IR level, where it's possible that we want to teach instcombine to perform this optimization. However, we can also see the zero-evl case arise during SelectionDAG legalization, when wide VP operations can be split into two and the upper operation emerges as trivially false. It's possible that we could perform this optimization "proactively" (both on legal vectors and before splitting) and reduce the width of an operation and insert it into a larger undef vector: ``` v8i32 vp_add x, y, mask, 4 -> v8i32 insert_subvector (v8i32 undef), (v4i32 vp_add xsub, ysub, mask, 4), i32 0 ``` This is somewhat analogous to similar vector narrow/widening optimizations, but it's unclear at this point whether that's beneficial to do this for VP ops for any/all targets. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D109148	2021-09-27 11:30:09 +01:00
Simon Pilgrim	18c8ed5416	[DAG] ReduceLoadOpStoreWidth - replace getABITypeAlign with allowsMemoryAccess (PR45116) One of the cases identified in PR45116 - we don't need to limit store narrowing to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory access by checking allowsMisalignedMemoryAccesses as a fallback.	2021-09-25 18:35:57 +01:00
Simon Pilgrim	6bd5b1b1ce	[DAG] combineShiftToMULH - move getValueType() inside assert. NFCI. Avoids an unnecessary (void).	2021-09-25 11:56:35 +01:00
David Blaikie	5cb210862b	DebugInfo: Use the signedness of the underlying enum when encoding enum non-type-template-parameters This improves the accuracy of the debug info and improves round tripping through -gsimple-template-names.	2021-09-24 17:02:55 -07:00
Jay Foad	ac51ad24a7	[LiveIntervals] Fix asan debug build failures Call RemoveMachineInstrFromMaps before erasing instrs. repairIntervalsInRange will do this for you after erasing the instruction, but it's not safe to rely on it because assertions in SlotIndexes::removeMachineInstrFromMaps refer to fields in the erased instruction. This fixes asan buildbot failures caused by D110328.	2021-09-24 19:14:57 +01:00
Stanislav Mekhanoshin	08d7eec06e	Revert "Allow rematerialization of virtual reg uses" Reverted due to two distinct performance regression reports. This reverts commit `92c1fd19ab`.	2021-09-24 10:26:11 -07:00
Jay Foad	e4e95f14f1	[LiveIntervals] Repair live intervals that gain subranges In repairIntervalsInRange, if the new instructions refer to subregs but the old instructions did not, make sure any existing live interval for the superreg is updated to have subranges. Also skip repairing any range that we have recalculated from scratch, partly for efficiency but also to avoid some cases that repairOldRegInRange can't handle. The existing test/CodeGen/AMDGPU/twoaddr-regsequence.mir provides some test coverage for this change: when TwoAddressInstructionPass converts REG_SEQUENCE into subreg copies, the live intervals will now get subranges and MachineVerifier will verify that the subranges are correct. Unfortunately MachineVerifier does not complain if the subranges are not present, so the test also passed before this patch. This patch also fixes ~800 of the ~1500 failures in the whole CodeGen lit test suite when -early-live-intervals is forced on. Differential Revision: https://reviews.llvm.org/D110328	2021-09-24 11:58:08 +01:00
Jay Foad	7863cc6c1c	[LiveIntervals] Fix repairOldRegInRange for simple def cases The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange" was over-zealous. It would bail out when the end of the range to be repaired was in the middle of the first segment of the live range of Reg, which was always the case when the range contained a single def of Reg. This patch fixes it as suggested by Matthias Braun in post-commit review on the original patch, and tests it by adding -early-live-intervals to a selection of existing lit tests that now pass. (Note that D23303 was originally applied to fix a crash in SILoadStoreOptimizer, but that is now moot since D23814 updated SILoadStoreOptimizer to run before scheduling so it no longer has to update live intervals.) Differential Revision: https://reviews.llvm.org/D110238 Unrevert with some changes to the tests: - Add -verify-machineinstrs to check for remaining problems in live interval support in TwoAddressInstructionPass. - Drop test/CodeGen/AMDGPU/extract-load-i1.ll since it suffers from some of those remaining problems.	2021-09-24 11:44:49 +01:00
Amara Emerson	9f773b17c2	[GlobalISel][IRTranslator] Fix crash during bit-test switch optimization with odd types. Odd switch case types cause a crash in the conversion to MVT. Instead use a pointer sized scalar type which is what SDAG does in these cases.	2021-09-24 00:19:27 -07:00
Matt Arsenault	2875d3d484	RegAllocGreedy: Remove an unhelpful auto, and don't use a reference	2021-09-23 17:25:25 -04:00
Jay Foad	deb2ca566a	Revert "[LiveIntervals] Fix repairOldRegInRange for simple def cases" This reverts commit `8229cb7412`. It was failing on buildbots with expensive checks enabled.	2021-09-23 17:55:05 +01:00
Jay Foad	8229cb7412	[LiveIntervals] Fix repairOldRegInRange for simple def cases The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange" was over-zealous. It would bail out when the end of the range to be repaired was in the middle of the first segment of the live range of Reg, which was always the case when the range contained a single def of Reg. This patch fixes it as suggested by Matthias Braun in post-commit review on the original patch, and tests it by adding -early-live-intervals to a selection of existing lit tests that now pass. (Note that D23303 was originally applied to fix a crash in SILoadStoreOptimizer, but that is now moot since D23814 updated SILoadStoreOptimizer to run before scheduling so it no longer has to update live intervals.) Differential Revision: https://reviews.llvm.org/D110238	2021-09-23 17:16:14 +01:00
Craig Topper	d5c67bba62	[RegAlloc] Cast uint8_t to unsigned before printing it. raw_ostream interprets uint8_t as wanting to print a character with that ASCII value. In this case the uint8_t is an integer that we want to print.	2021-09-23 08:49:44 -07:00
Simon Pilgrim	2a5936faf0	[CodeGen] ProcessSDDbgValues - use const-ref value in for-range loop. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-23 12:23:46 +01:00
Simon Pilgrim	5cabe4d9d3	[CodeGen] RegisterCoalescer::buildVRegToDbgValueMap - use const-ref value in for-range loop. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-23 12:23:45 +01:00
Fraser Cormack	e7c879a69d	[RISCV][VP] Add support for VP_REDUCE_* operations This patch adds codegen support for lowering the vector-predicated reduction intrinsics to RVV instructions. The process is similar to that of the other reduction intrinsics, save for the fact that every VP reduction has a start value. We reuse the existing custom "VL" nodes, adding extra patterns where required to handle non-true masks. To support these nodes, the `RISCVISD::VECREDUCE_*_VL` nodes have been given an explicit "merge" operand. This is to facilitate the VP reductions, where we must be careful to ensure that even if no operation is performed (when VL=0) we still produce the start value. The RVV reductions don't update the destination register under these conditions, so we tie the splatted start value to the output register. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107657	2021-09-23 11:11:05 +01:00
Jay Foad	6cef28ed2d	[TII] Remove the MFI argument to convertToThreeAddress. NFC. This simplifies the API and addresses a FIXME in TwoAddressInstructionPass::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D110229	2021-09-23 08:58:46 +01:00
Bjorn Pettersson	c3ae8ecb52	[DAGCombiner] Rename isAlias as mayAlias. NFC Differential Revision: https://reviews.llvm.org/D110062	2021-09-23 09:54:42 +02:00
Freddy Ye	13207a21a6	[NFC] Remove redundant setOperationAction. [FROUND,FROUNDEVEN][f32, f64, f128] are set Expand twice. Differential Revision: https://reviews.llvm.org/D110302	2021-09-23 10:28:21 +08:00
David Green	c49611f909	Mark CFG as preserved in TypePromotion and InterleaveAccess passes Neither of these passes modify the CFG, allowing us to preserve DomTree and LoopInfo across them by using setPreservesCFG. Differential Revision: https://reviews.llvm.org/D110161	2021-09-22 18:58:00 +01:00
Daniil Fukalov	1a7b7d7ba2	[NFCI][CodeGen, AArch64] Fix inconsistent TargetCostKind types. The pass uses different cost kinds to estimate "old" and "interleaved" costs: the default cost kind in every target's override of `getInterleavedMemoryOpCost()` is `TCK_SizeAndLatency`. Although at the moment the estimated `TCK_Latency` costs are equal to `TCK_SizeAndLatency` (so the change is NFC), this may change in the future. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110100	2021-09-22 20:15:17 +03:00
Hongtao Yu	d9b511d8e8	[CSSPGO] Set PseudoProbeInserter as a default pass. Currently PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e., through the linker or llc), where the user always has to pass -pseudo-probe-for-profiling explicitly. I'm making the pass a default pass that requires no command line arg to trigger, but will be actually run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D110209	2021-09-22 09:09:48 -07:00
Kazu Hirata	3c557cd7f9	[CodeGen] Remove redundant declaration MIRCanonicalizerID (NFC) Note that MIRCanonicalizerID is declared in llvm/include/llvm/CodeGen/Passes.h, which MIRCanonicalizerPass.cpp includes. Identified with readability-redundant-declaration.	2021-09-22 08:58:27 -07:00
Sander de Smalen	3e8d2008f7	[SelectionDAG] Remove PromoteIntOp_EXTRACT_SUBVECTOR. This code seems untested and is likely obsolete, because this case should already be handled by the code that legalizes the result type of EXTRACT_SUBVECTOR. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110061	2021-09-22 14:23:35 +01:00
Sander de Smalen	d5681f1d68	[SelectionDAG] Add PromoteIntOp_INSERT_SUBVECTOR. This is required to codegen something like: <vscale x 8 x i16> @llvm.experimental.vector.insert(<vscale x 8 x i16> %vec, <vscale x 2 x i16> %subvec, i64 %idx) where the output vector is legal, but the input vector needs promoting. It implements this by performing the whole operation on the promoted type, and then truncating the result. Reviewed By: david-arm, craig.topper Differential Revision: https://reviews.llvm.org/D110059	2021-09-22 13:32:36 +01:00
Sander de Smalen	4ca1fbe361	[SelectionDAG] Make WidenVecRes_Convert work for scalable vectors. Most of the code wasn't yet scalable safe, although most of the code conceptually just works for scalable vectors. This change makes the algorithm work on ElementCount, where appropriate, and leaves the fixed-width only code to use `getFixedNumElements`. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D110058	2021-09-22 10:58:38 +01:00
Arthur Eubanks	e42234383e	Make DiagnosticInfoResourceLimit's limit param required And always print it. This makes some LLVM diagnostics match up better with Clang's diagnostics. Updated some AMDGPU uses of DiagnosticInfoResourceLimit and now we print better diagnostics for those. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D110204	2021-09-21 15:27:58 -07:00
Craig Topper	aeb63d464f	[RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for and/or/xor. This requires a minor change to CodeGenPrepare to ensure that shouldSinkOperands will be called for And. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D110106	2021-09-21 10:07:29 -07:00
Michael Liao	5fb3ae525f	[SelectionDAG] Re-calculate scoped AA metadata when merging stores. Reviewed By: jeroen.dobbelaere Differential Revision: https://reviews.llvm.org/D102821	2021-09-21 11:41:17 -04:00
Aleksandr Bezzubikov	624e4d087e	[GlobalISel] Support ConstantAsMetadata in IRTranslator When using instructions which have a MetadataAsValue argument (e.g. some target-specific intrinsics), MD canonicalization strips internal MDNodes with a single ConstantAsMetadata child. That prevented IRTranslator from properly translating such calls.	2021-09-21 11:24:56 -04:00
Simon Pilgrim	20b58855e0	[CodeGen] SelectionDAGBuilder - Use const-ref iterator in for-range loops. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-21 13:01:08 +01:00
Simon Pilgrim	0f83456cf5	[CodeGen] SDDbgValue::getSDNodes() - use const-ref to avoid unnecessary copies. NFCI. Reported by MSVC static analyzer.	2021-09-21 13:01:08 +01:00
Petar Avramovic	8bc7185668	GlobalISel/Utils: Refactor constant splat match functions Add generic helper function that matches constant splat. It has option to match constant splat with undef (some elements can be undef but not all). Add util function and matcher for G_FCONSTANT splat. Differential Revision: https://reviews.llvm.org/D104410	2021-09-21 12:09:35 +02:00
Amara Emerson	7091a7f781	[GlobalISel][Legalizer] Don't use eraseFromParentAndMarkDBGValuesForRemoval() for some artifacts. For artifacts excluding G_TRUNC/G_SEXT, which have IR counterparts, we don't seem to have debug users of defs. However, in the legalizer we're always calling MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() which is expensive. In some rare cases, this contributes significantly to unreasonably long compile times when we have lots of artifact combiner activity. To verify this, I added asserts to that function when it actually replaced a debug use operand with undef for these artifacts. On CTMark with both -O0 and -Os and debug info enabled, I didn't see a single case where it triggered. In my measurements I saw around a 0.5% geomean compile-time improvement on -g -O0 for AArch64 with this change. Differential Revision: https://reviews.llvm.org/D109750	2021-09-20 23:34:42 -07:00
Amara Emerson	f9d69a0ab0	[GlobalISel] Implement support for the "trap-func-name" attribute. This attribute calls a function instead of emitting a trap instruction. Differential Revision: https://reviews.llvm.org/D110098	2021-09-20 14:32:01 -07:00
Petar Avramovic	e4c46ddd91	[GlobalISel] Improve elimination of dead instructions in legalizer Add eraseInstr(s) utility functions. Before deleting an instruction, they collect its use instructions; after the deletion, they delete any use instructions that became trivially dead. This patch clears all dead instructions in existing legalizer mir tests. Differential Revision: https://reviews.llvm.org/D109154	2021-09-20 13:00:58 +02:00
Kazu Hirata	84b07c9b3a	[llvm] Use pop_back_val (NFC)	2021-09-19 13:44:23 -07:00
Kazu Hirata	48719e3b18	[CodeGen] Use make_early_inc_range (NFC)	2021-09-18 09:29:24 -07:00
Kazu Hirata	e2febc2ed4	[llvm] Use drop_begin (NFC)	2021-09-17 09:16:40 -07:00
Simon Pilgrim	4af7643470	[CodeGen] LiveDebug - Use const-ref iterator in for-range loop. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-17 14:04:54 +01:00
Simon Pilgrim	9e70d4e5f2	[AsmPrinter] DebugLocEntry::dump() - Use const-ref iterator in for-range loop. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-17 12:11:54 +01:00
Petar Avramovic	d477a7c2e7	GlobalISel/Utils: Refactor integer/float constant match functions Rework getConstantVRegValWithLookThrough in order to make it clear if we are matching integer/float constant only or any constant (default). Add helper functions that get DefVReg and APInt/APFloat from constant instr: getIConstantVRegValWithLookThrough: integer constant, only G_CONSTANT; getFConstantVRegValWithLookThrough: float constant, only G_FCONSTANT; getAnyConstantVRegValWithLookThrough: either G_CONSTANT or G_FCONSTANT. Rename getConstantVRegVal and getConstantVRegSExtVal to getIConstantVRegVal and getIConstantVRegSExtVal. These now only match G_CONSTANT as described in comment. Relevant matchers now return both DefVReg and APInt/APFloat. Replace existing uses of getConstantVRegValWithLookThrough and getConstantVRegVal with new helper functions. Any constant match is only required in: ConstantFoldBinOp: for constant argument that was bit-cast of float to int; getAArch64VectorSplat: AArch64::G_DUP operands can be any constant; amdgpu select for G_BUILD_VECTOR_TRUNC: operands can be any constant. In other places use integer only constant match. Differential Revision: https://reviews.llvm.org/D104409	2021-09-17 11:22:13 +02:00
Nikita Popov	0fc624f029	[IR] Return AAMDNodes from Instruction::getMetadata() (NFC) getMetadata() currently uses a weird API where it populates a structure passed to it, and optionally merges into it. Instead, we can return the AAMDNodes and provide a separate merge() API. This makes usages more compact. Differential Revision: https://reviews.llvm.org/D109852	2021-09-16 21:06:57 +02:00
Kazu Hirata	cfc7402419	[llvm] Use drop_begin (NFC)	2021-09-16 08:46:26 -07:00
Doug Gregor	a773db7d76	Add a command-line flag to control the Swift extended async frame info. Introduce a new command-line flag `-swift-async-fp={auto\|always\|never}` that controls how code generation sets the Swift extended async frame info bit. There are three possibilities: * `auto`: which determines how to set the bit based on deployment target, either statically or dynamically via `swift_async_extendedFramePointerFlags`. * `always`: the default, always set the bit statically, regardless of deployment target. * `never`: never set the bit, regardless of deployment target. Patch by Doug Gregor <dgregor@apple.com> Reviewed By: doug.gregor Differential Revision: https://reviews.llvm.org/D109392	2021-09-16 06:57:45 -07:00
Konstantin Schwarz	d2e66d7fa4	[GlobalISel] Add a combine for and(load , mask) -> zextload This only handles simple masks, not shifted masks, for now. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D109357	2021-09-16 10:42:46 +02:00
Sam Parker	c98a8a09b5	[HardwareLoops] Loop guard intrinsic to recognise zext If a loop count was initially represented by a 32b unsigned int in C then the hardware-loop pass can recognise the loop guard and insert the llvm.test.set.loop.iterations intrinsic. If this was instead an unsigned short/char then clang inserts a zext instruction to expand the loop count to an i32. This patch adds the necessary pattern matching to enable the use of llvm.test.set.loop.iterations in those cases. Patch by: sherwin-dc Differential Revision: https://reviews.llvm.org/D109631	2021-09-16 08:33:16 +01:00
Alok Kumar Sharma	a5b72abc9e	[DebugInfo] Enhance DIImportedEntity to accept children entities New field `elements` is added to '!DIImportedEntity', representing list of aliased entities. This is needed to dump optimized debugging information where all names in a module are imported, but a few names are imported with overriding aliases. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D109343	2021-09-16 10:41:55 +05:30
Ahmed Bougacha	94a2f9cdb6	[GlobalISel] Fix CombinerHelper::isPredecessor for same def/use MI. The doc comment for isPredecessor says: Returns true if \p DefMI precedes \p UseMI or they are the same instruction. And dominates relies on that behavior for its own: Returns true if \p DefMI dominates \p UseMI. By definition an instruction dominates itself. Make both statements correct by fixing isPredecessor. Found by inspection.	2021-09-15 16:45:27 -07:00
Matt Arsenault	87c00878d3	SplitKit: Remove decade old live interval hack This was trying to fixup broken live intervals coming out of the coalescer. The verifier is more complete now and no tests seem to fail without this.	2021-09-15 17:35:59 -04:00
Amara Emerson	5ec1845cad	[AArch64][GlobalISel] Add a new reassociation for G_PTR_ADDs. G_PTR_ADD (G_PTR_ADD X, C), Y) -> (G_PTR_ADD (G_PTR_ADD(X, Y), C) Improves CTMark -Os on AArch64: Program before after diff sqlite3 286932 287024 0.0% kc 432512 432508 -0.0% SPASS 412788 412764 -0.0% pairlocalalign 249460 249416 -0.0% bullet 475740 475512 -0.0% 7zip-benchmark 568864 568356 -0.1% consumer-typeset 419088 418648 -0.1% tramp3d-v4 367628 367224 -0.1% clamscan 383184 382732 -0.1% lencod 430028 429284 -0.2% Geomean difference -0.1% Differential Revision: https://reviews.llvm.org/D109528	2021-09-14 23:57:41 -07:00
Matt Arsenault	54d755a034	DAG: Fix incorrect folding of fmul -1 to fneg The fmul is a canonicalizing operation, and fneg is not so this would break denormals that need flushing and also would not quiet signaling nans. Fold to fsub instead, which is also canonicalizing.	2021-09-14 21:25:02 -04:00
Matt Arsenault	4a36e96c3f	RegAllocGreedy: Account for reserved registers in num regs heuristic This simple heuristic uses the estimated live range length combined with the number of registers in the class to switch which heuristic to use. This was taking the raw number of registers in the class, even though not all of them may be available. AMDGPU heavily relies on dynamically reserved numbers of registers based on user attributes to satisfy occupancy constraints, so the raw number is highly misleading. There are still a few problems here. In the original testcase that made me notice this, the live range size is incorrect after the scheduler rearranges instructions, since the instructions don't have the original InstrDist offsets. Additionally, I think it would be more appropriate to use the number of disjointly allocatable registers in the class. For the AMDGPU register tuples, there are a large number of registers in each tuple class, but only a small fraction can actually be allocated at the same time since they all overlap with each other. It seems we do not have a query that corresponds to the number of independently allocatable registers. Relatedly, I'm still debugging some allocation failures where overlapping tuples seem to not be handled correctly. The test changes are mostly noise. There are a handful of x86 tests that look like regressions with an additional spill, and a handful that now avoid a spill. The worst looking regression is likely test/Thumb2/mve-vld4.ll which introduces a few additional spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll shows a massive improvement by completely eliminating a large number of spills inside a loop.	2021-09-14 21:00:29 -04:00
Bjorn Pettersson	cd2bff1ef1	[StackColoring] Fix a debug invariance problem Ignore dbg instructions when collecting stack slot markers. This is to make sure the coloring is invariant regarding presence of dbg instructions (even in cases when the dbg instructions might be badly placed in the input). Differential Revision: https://reviews.llvm.org/D109758	2021-09-14 19:21:56 +02:00
vnalamot	726b5d3416	[RegScavenger][NFC] Refer to the already initialized local variable for spill slot index Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109501	2021-09-13 21:55:33 +05:30
Simon Pilgrim	9db20822f7	[APInt] Add APIntOps::ScaleBitMask helper APInt is used to describe a bit mask in a variety of value tracking and demanded bits/elts functions. When traversing through dst/src operands, we have a number of places where these masks need to be widened/narrowed to translate through bitcasts, reductions etc. to a different type. This patch adds an APIntOps::ScaleBitMask common helper, adds unit test coverage, and updates a number of cases to use the helper instead of their own implementation. This came up on D109065 where we currently have to add yet another implementation of the same code. Differential Revision: https://reviews.llvm.org/D109683	2021-09-13 16:27:12 +01:00
vnalamot	0fc3ebb70a	[SelectionDAG][NFC] Fix typo in VerifyDAGDiverence() function name Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109674	2021-09-13 20:48:04 +05:30
David Truby	915e9e76bf	[llvm][sve] Lowering for VLS masked extending loads This extends the custom lowering for extending loads on fixed length vectors in SVE to support masked extending loads. The existing tests for correct behaviour of masked extending loads exhibit bad code generation due to the legalisation of i1 vectors. They have been left as-is and new tests have been added that do not exhibit this behaviour. Differential Revision: https://reviews.llvm.org/D108200	2021-09-13 11:13:25 +01:00
Nikita Popov	4189e5fe12	[CGP] Support opaque pointers in address mode fold Rather than inspecting the pointer element type, use the access type of the load/store/atomicrmw/cmpxchg. In the process of doing this, simplify the logic by storing the address + type in MemoryUses, rather than an Instruction + Operand pair (which was then used to fetch the address).	2021-09-12 17:43:37 +02:00
Kazu Hirata	c9fca53af1	[CodeGen, Target] Use pred_empty and succ_empty (NFC)	2021-09-10 11:11:31 -07:00
Nikita Popov	14afbe9448	[CallLowering] Support opaque pointers Always use the byval/inalloca/preallocated type (which is required nowadays), don't fall back on the pointer element type. This requires adding Function::getParamPreallocatedType() to mirror the CallBase API, so that the templated code can work with both.	2021-09-10 18:32:12 +02:00
Sander de Smalen	ec7d8d5069	[SelectionDAG] PromoteIntRes_EXTRACT_SUBVECTOR for scalable vectors (widening). This patch implements legalization of EXTRACT_SUBVECTOR for the case where the result needs promoting, and the input type requires widening. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109509	2021-09-10 13:29:26 +01:00
Sander de Smalen	801a745dd2	[SelectionDAG] PromoteIntRes_EXTRACT_SUBVECTOR for scalable vectors. This patch implements legalization of EXTRACT_SUBVECTOR for the case where the result needs promoting, and the input type is either legal or requires splitting. The idea is that the operation is broken down into simpler steps, by first extracting a smaller subvector until the input vector becomes legal or requires promotion. Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D109313	2021-09-10 13:29:26 +01:00
Zequan Wu	12f80c0bbd	[DebugInfo] Emit DW_AT_inline under -g1/-gmlt Differential Revision: https://reviews.llvm.org/D109554	2021-09-09 18:59:50 -07:00
Craig Topper	9af8f1b18e	[SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode. Soft deprecate isNullValue/isAllOnesValue and update in-tree callers. This matches the changes to the APInt interface from D109483. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D109535	2021-09-09 13:28:30 -07:00
Craig Topper	517728fe1e	[SelectionDAG] Use DAG.getNOT to further simplify some code. NFC Followup to D109483	2021-09-09 10:53:39 -07:00
Nick Desaulniers	e69d402088	[NFC] rename member of BitTestBlock and JumpTableHeader Follow up to suggestions in D109103 via hans: I think UnreachableDefault (or UnreachableFallthrough) would be a better name now, since it doesn't just omit the range check, it also omits the last bit test. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D109455	2021-09-09 10:43:00 -07:00
Chris Lattner	d51da74889	[CodeGen] Use DAG.getAllOnesConstant where possible to simplify code. NFC.	2021-09-09 10:22:51 -07:00
Chris Lattner	735f46715d	[APInt] Normalize naming on keep constructors / predicate methods. This renames the primary methods for creating a zero value to `getZero` instead of `getNullValue` and renames predicates like `isAllOnesValue` to simply `isAllOnes`. This achieves two things: 1) This starts standardizing predicates across the LLVM codebase, following (in this case) ConstantInt. The word "Value" doesn't convey anything of merit, and is missing in some of the other things. 2) Calling an integer "null" doesn't make any sense. The original sin here is mine and I've regretted it for years. This moves us to calling it "zero" instead, which is correct! APInt is widely used and I don't think anyone is keen to take massive source breakage on anything so core, at least not all in one go. As such, this doesn't actually delete any entrypoints, it "soft deprecates" them with a comment. Included in this patch are changes to a bunch of the codebase, but there are more. We should normalize SelectionDAG and other APIs as well, which would make the API change more mechanical. Differential Revision: https://reviews.llvm.org/D109483	2021-09-09 09:50:24 -07:00
Chris Lattner	9e46dd965a	[APInt.h] Reduce the APInt header file interface a bit. NFC This moves one mid-size function out of line, inlines the trivial tcAnd/tcOr/tcXor/tcComplement methods into their only caller, and moves the magic/umagic functions into SelectionDAG since they are implementation details of its algorithm. This also removes the unit tests for magic, but these are already tested in the divide lowering logic for various targets. This also upgrades some C style comments to C++. Differential Revision: https://reviews.llvm.org/D109476	2021-09-08 18:17:07 -07:00
Amara Emerson	eae44c8a86	[GlobalISel] Implement merging of stores of truncates. This is a port of a combine which matches a pattern where a wide type scalar value is stored by several narrow stores. It folds it into a single store or a BSWAP and a store if the target supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; => ((i32)p) = val; On CTMark AArch64 -Os this results in a good amount of savings: Program before after diff SPASS 412792 412788 -0.0% kc 432528 432512 -0.0% lencod 430112 430096 -0.0% consumer-typeset 419156 419128 -0.0% bullet 475840 475752 -0.0% tramp3d-v4 367760 367628 -0.0% clamscan 383388 383204 -0.0% pairlocalalign 249764 249476 -0.1% 7zip-benchmark 570100 568860 -0.2% sqlite3 287628 286920 -0.2% Geomean difference -0.1% Differential Revision: https://reviews.llvm.org/D109419	2021-09-08 17:06:33 -07:00
Nick Desaulniers	4331f19d8b	[ISEL][BitTestBlock] omit additional bit test when default destination is unreachable Otherwise we end up with an extra conditional jump, followed by an unconditional jump off the end of a function. ie. bb.0: BT32rr .. JCC_1 %bb.4 ... bb.1: BT32rr .. JCC_1 %bb.2 ... JMP_1 %bb.3 bb.2: ... bb.3.unreachable: bb.4: ... Should be equivalent to: bb.0: BT32rr .. JCC_1 %bb.4 ... JMP_1 %bb.2 bb.1: bb.2: ... bb.3.unreachable: bb.4: ... This can occur since at the higher level IR (Instruction) SwitchInsts are required to have BBs for default destinations, even when it can be deduced that such BBs are unreachable. For most programs, this isn't an issue, just wasted instructions since the unreachable has been statically proven. The x86_64 Linux kernel when built with CONFIG_LTO_CLANG_THIN=y fails to boot though once D106056 is re-applied. D106056 makes it more likely that correlated-value-propagation (CVP) can deduce that the default cases of SwitchInsts are unreachable. The x86_64 kernel uses a binary post processor called objtool, which emits this warning: vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid()+0x169: can't find jump dest instruction at .text.cfg80211_edmg_chandef_valid+0x17b I haven't debugged precisely why this causes a failure at boot time, but fixing this very obvious jump off the end of the function fixes the warning and boot problem. Link: https://bugs.llvm.org/show_bug.cgi?id=50080 Fixes: https://github.com/ClangBuiltLinux/linux/issues/679 Fixes: https://github.com/ClangBuiltLinux/linux/issues/1440 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D109103	2021-09-08 11:03:47 -07:00
David Green	d8d24c64fe	[DAG] Fix GT -> GE condition when creating SetCC `79845ed6df` folded some setcc(ashr) conditions to setcc, but got the condition for NE incorrect, using GT where it should be using GE.	2021-09-08 12:41:51 +01:00
Evgeny Leviant	93b09a2a5d	[LiveDebugValues] Handle spills of indirect debug values correctly When handling a register spill for an indirect debug value, the LiveDebugValues pass doesn't add a DW_OP_deref operator, which may in some cases cause the debugger to return the value's address instead of the value while the machine register holding that address is spilled. Differential revision: https://reviews.llvm.org/D109142	2021-09-08 14:06:08 +03:00
Fraser Cormack	2c5568a6a9	[LegalizeTypes][VP] Add promotion support for binary VP ops This patch extends the preliminary support for vector-predicated (VP) operation legalization to include promotion of illegal integer vector types. Integer promotion of binary VP operations is relatively simple and piggy-backs on the non-VP logic, but passing the two extra mask and VP operands through to the promoted operation. Tests have been added to the RISC-V target to cover the basic scenarios for integer promotion for both fixed- and scalable-vector types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D108288	2021-09-08 10:22:57 +01:00
Peter Smith	5e71839f77	[MC] Add MCSubtargetInfo to MCAlignFragment In preparation for passing the MCSubtargetInfo (STI) through to writeNops so that it can use the STI in operation at the time, we need to record the STI in operation when a MCAlignFragment may write nops as padding. The STI is currently unused, a further patch will pass it through to writeNops. There are many places that can create an MCAlignFragment, in most cases we can find out the STI in operation at the time. In a few places this isn't possible as we are in initialisation or finalisation, or are emitting constant pools. When possible I've tried to find the most appropriate existing fragment to obtain the STI from, when none is available use the per module STI. For constant pools we don't actually need to use EmitCodeAlign as the constant pools are data anyway so falling through into it via an executable NOP is no better than falling through into data padding. This is a prerequisite for D45962 which uses the STI to emit the appropriate NOP for the STI. Which can differ per fragment. Note that involves an interface change to InitSections. It is now called initSections and requires a SubtargetInfo as a parameter. Differential Revision: https://reviews.llvm.org/D45961	2021-09-07 15:46:19 +01:00
Mirko Brkusanin	6c4b634da6	[AMDGPU][GlobalISel] Legalize G_MUL for non-standard types Legalizing G_MUL for non-standard types (like i33) generated an error. Putting minScalar and maxScalar instead of clampScalar. Also using new rule, instead of widening to the next power of 2, widen to the next multiple of the passed argument (32 in this case), so instead of widening i65 to i128, we widen it to i96. Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D109228	2021-09-07 16:33:24 +02:00
Mirko Brkusanin	5263bf583a	[AMDGPU][GlobalISel] Legalization of G_ROTL and G_ROTR Add implementation for the legalization of G_ROTL and G_ROTR machine instructions. They are very similar to funnel shift instructions, the only difference is funnel shifts have 3 operands, whereas rotate instructions have two operands, the first being the register that is being rotated and the second being the number of shifts. The legalization of G_ROTL/G_ROTR is just lowering them into funnel shift instructions if they are legal. Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D105347	2021-09-07 16:33:24 +02:00
Mirko Brkusanin	36527cbe02	[AMDGPU][GlobalISel] Legalize memcpy family of intrinsics Legalize G_MEMCPY, G_MEMMOVE, G_MEMSET and G_MEMCPY_INLINE. Corresponding intrinsics are replaced by a loop that uses loads/stores in AMDGPULowerIntrinsics pass unless their length is a constant lower than MemIntrinsicExpandSizeThresholdOpt (default 1024). Any G_MEM* instruction that reaches the legalizer should have a const length argument and should be expanded into an appropriate number of loads + stores. Differential Revision: https://reviews.llvm.org/D108357	2021-09-07 12:24:07 +02:00
Fraser Cormack	a823bdf3ab	[RISCV][VP] Custom lower VP_STORE and VP_LOAD This patch adds support for the vector-predicated `VP_STORE` and `VP_LOAD` nodes. We do this in the same way we lower `MSTORE` and `MLOAD`: to regular load/store instructions via intrinsics. One necessary change was made to `SelectionDAGLegalize` so that `VP_STORE` nodes' operation actions are taken from the stored "value" operands, in the same vein as `STORE` or `MSTORE`. Reviewed By: craig.topper, rogfer01 Differential Revision: https://reviews.llvm.org/D108999	2021-09-07 10:53:25 +01:00
Fraser Cormack	f4dee8cb82	[RISCV][VP] Custom lower VP_SCATTER and VP_GATHER This patch adds support for the `VP_SCATTER` and `VP_GATHER` nodes by lowering them to RVV's `vsox`/`vlux` instructions, respectively. This process is almost identical to the existing `MSCATTER`/`MGATHER` support. One extra change was made to `SelectionDAGLegalize` so that `VP_SCATTER`'s operation action is derived from its stored "value" operand rather than its return type (which is always the chain). Reviewed By: craig.topper, rogfer01 Differential Revision: https://reviews.llvm.org/D108987	2021-09-07 10:43:07 +01:00
Sanjay Patel	e1e4bf174b	[DAGCombine] Prevent the transform of combine for multi-use operand The test is based on a miscompile example in: https://llvm.org/PR51321 Differential Revision: https://reviews.llvm.org/D107692	2021-09-06 15:30:32 -04:00
Jonas Paulsson	118997d8e9	[SelectionDAGBuilder] Bugfix in visitInlineAsm() In case of a virtual register tied to a phys-def, the register class needs to be computed. Make sure that this works generally also with fast regalloc by using TLI.getRegClassFor() whenever possible, and make only the case of 'Untyped' use getMinimalPhysRegClass(). Fixes https://bugs.llvm.org/show_bug.cgi?id=51699. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D109291	2021-09-06 17:46:31 +02:00
David Green	1b83aaaefa	[DAG] Remove oneuse check in select_cc setgt X, -1, C, ~C fold This appears to produce better code, even if the condition may need to be replicated.	2021-09-05 16:18:31 +01:00
David Green	8523fb96a6	[DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C Given a select_cc producing a constant and an inversion of the constant for a comparison more than zero, we can produce an xor with ashr instead, which produces smaller code. The ashr either sets all bits or clears all bits depending on if the value is negative. This is then xor'd with the constant to optionally negate the value. https://alive2.llvm.org/ce/z/DTFaBZ This includes a OneUseCheck on the Cmp, which seems to make things a little worse and will be removed in a followup. Differential Revision: https://reviews.llvm.org/D109149	2021-09-05 16:04:01 +01:00
David Green	79845ed6df	[DAG] Fold setcc eq with ashr to compare to zero. Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 -> set_cc setlt X, 0 to prevent some regressions later on when folding select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C Differential Revision: https://reviews.llvm.org/D109214	2021-09-05 14:06:47 +01:00
Fangrui Song	e03c8d309a	[AsmPrinter] Remove unneeded MCSubtargetInfo temporary after D14346. NFC The temporary object was used as a workaround when the target parser may change STI. D14346 made the MCSubtargetInfo argument to createMCAsmParser const, so we no longer need the temporary object.	2021-09-04 10:50:10 -07:00
Konstantin Schwarz	90d5298759	[GlobalISel] Add convenience constructors to MemDesc This allows constructing a MemDesc from a MachineMemoryOperand, a pattern that starts to show up more frequently. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D109161	2021-09-03 12:52:18 +02:00
Chen Zheng	34badc409c	Revert "[HardwareLoops] Change order of SCEV expression construction for InitLoopCount." This causes https://bugs.llvm.org/show_bug.cgi?id=51714 and is not a right patch according to comments in D91724 This reverts commit `42eaf4fe0a`.	2021-09-03 02:55:43 +00:00
Jessica Paquette	844d8e0337	[GlobalISel] Combine icmp eq/ne x, 0/1 -> x when x == 0 or 1 This adds the following combines: ``` x = ... 0 or 1 c = icmp eq x, 1 -> c = x ``` and ``` x = ... 0 or 1 c = icmp ne x, 0 -> c = x ``` When the target's true value for the relevant types is 1. This showed up in the following situation: https://godbolt.org/z/M5jKexWTW SDAG currently supports the `ne` case, but not the `eq` case. This can probably be further generalized, but I don't feel like thinking that hard right now. This gives some minor code size improvements across the board on CTMark at -Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.) Differential Revision: https://reviews.llvm.org/D109130	2021-09-02 15:05:31 -07:00
Heejin Ahn	28780e59f6	[WebAssembly] Add Wasm SjLj support This adds support for SjLj using Wasm exception handling instructions: https://github.com/WebAssembly/exception-handling/blob/master/proposals/exception-handling/Exceptions.md This does not yet support the mixed use of EH and SjLj within a function. It will be added in a follow-up CL. This currently passes all SjLj Emscripten tests for wasm0/1/2/3/s, except for the below: - `test_longjmp_standalone`: Uses Node - `test_dlfcn_longjmp`: Uses NodeRAWFS - `test_longjmp_throw`: Mixes EH and SjLj - `test_exceptions_longjmp1`: Mixes EH and SjLj - `test_exceptions_longjmp2`: Mixes EH and SjLj - `test_exceptions_longjmp3`: Mixes EH and SjLj Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D108960	2021-09-02 10:51:02 -07:00
David Green	9cb8f4d1ad	[ARM] Add a tail-predication loop predicate register The semantics of tail predication loops means that the value of LR as an instruction is executed determines the predicate. In other words: mov r3, #3 DLSTP lr, r3 // Start tail predication, lr==3 VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0. mov lr, #1 VADD.s32 q0, q1, q2 // Only first lane is updated. This means that the value of lr cannot be spilled and re-used in tail predication regions without potentially altering the behaviour of the program. More lanes than required could be stored, for example, and in the case of a gather those lanes might not have been setup, leading to alignment exceptions. This patch adds a new lr predicate operand to MVE instructions in order to keep a reference to the lr that they use as a tail predicate. It will usually hold the zeroreg meaning not predicated, being set to the LR phi value in the MVETPAndVPTOptimisationsPass. This will prevent it from being spilled anywhere that it needs to be used. A lot of tests needed updating. Differential Revision: https://reviews.llvm.org/D107638	2021-09-02 13:42:58 +01:00
Roman Lebedev	3f1f08f0ed	Revert @llvm.isnan intrinsic patchset. Please refer to https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html (and that whole thread.) TLDR: the original patch had no prior RFC, yet it had some changes that really need a proper RFC discussion. It won't be productive to discuss such an RFC, once it's actually posted, while said patch is already committed, because that introduces bias towards already-committed stuff, and the tree is potentially in broken state meanwhile. While the end result of discussion may lead back to the current design, it may also not lead to the current design. Therefore i take it upon myself to revert the tree back to last known good state. This reverts commit `4c4093e6e3`. This reverts commit `0a2b1ba33a`. This reverts commit `d9873711cb`. This reverts commit `791006fb8c`. This reverts commit `c22b64ef66`. This reverts commit `72ebcd3198`. This reverts commit `5fa6039a5f`. This reverts commit `9efda541bf`. This reverts commit `94d3ff09cf`.	2021-09-02 13:53:56 +03:00
Fraser Cormack	ef78f2106c	[LegalizeTypes][VP] Add splitting support for binary VP ops This patch extends D107904's introduction of vector-predicated (VP) operation legalization to include vector splitting. When the result of a binary VP operation needs splitting, all of its operands are split in kind. The two operands and the mask are split as usual, and the vector-length parameter EVL is "split" such that the low and high halves each execute the correct number of elements. Tests have been added to the RISC-V target to show splitting several scenarios for fixed- and scalable-vector types. Without support for `umax` (e.g. in the `B` extension) the generated code starts to branch. Ideally a cost model would prevent their insertion in the first place. Through these tests many opportunities for better codegen can be seen: combining known-undef VP operations and for constant-folding operations on `ISD::VSCALE`, to name but a few. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D107957	2021-09-02 10:15:53 +01:00
Abinav Puthan Purayil	0baace5379	[DAGCombine] Add node level checks for fp-contract and fp-ninf in visitFMULForFMADistributiveCombine(). Differential Revision: https://reviews.llvm.org/D107551	2021-09-02 11:33:14 +05:30
Roman Lebedev	f5753125f0	[Codegen][TLI][X86] SimplifyMultipleUseDemandedBits(): 0'th vec subreg widening is free, try to perform it earlier I believe the profitability reasoning here is correct: the "sub"reg is already located within the 0'th subreg of the wider reg, so if we have a subvector insertion at index 0 into undef, then it's always free to do. After this, D109065 finally avoids the regression in D108382. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D109074	2021-09-02 00:54:05 +03:00
Arthur Eubanks	52e6d70c40	[NFC] Use newly introduced *AtIndex methods Introduced in D108788. These are clearer.	2021-09-01 11:18:41 -07:00
Fraser Cormack	85fd44d7fe	[SelectionDAG][NFC] Fix typo in assertion message s/Uexpected/Unexpected.	2021-09-01 08:55:06 +01:00
Yonghong Song	89424a829f	[DWARF] Support new TAG DW_TAG_LLVM_annotation A new LLVM specific TAG DW_TAG_LLVM_annotation is added. The name is suggested by Paul Robinson ([1]). Currently, this tag is used to output __attribute__((btf_tag("string"))) annotations in dwarf. The following is an example for a global variable with two btf_tag attributes: 0x0000002a: DW_TAG_variable DW_AT_name ("g1") DW_AT_type (0x00000052 "int") DW_AT_external (true) DW_AT_decl_file ("/tmp/home/yhs/work/tests/llvm/btf_tag/t.c") DW_AT_decl_line (8) DW_AT_location (DW_OP_addr 0x0) 0x0000003f: DW_TAG_LLVM_annotation DW_AT_name ("btf_tag") DW_AT_const_value ("tag1") 0x00000048: DW_TAG_LLVM_annotation DW_AT_name ("btf_tag") DW_AT_const_value ("tag2") 0x00000051: NULL In the future, DW_TAG_LLVM_annotation may encode other type of non-string const value. [1] https://lists.llvm.org/pipermail/llvm-dev/2021-June/151250.html Differential Revision: https://reviews.llvm.org/D106621	2021-08-31 19:22:17 -07:00
Stanislav Mekhanoshin	d170945bb2	[RegAlloc] Immediately delete dead instructions with live uses When RA eliminates a dead def it can either immediately delete the instruction itself or replace it with KILL to defer the actual removal. If this instruction has a virtual register use killing the register it will shrink the LI of the use. However, if the LI covers the instruction and extends beyond it the shrink will not happen. In fact it is impossible to shrink such a use because of the KILL still using it. If later the LI of the use is split at the KILL and the KILL itself is eliminated after that point, the new live segment ends up at an invalid slot index. This extremely rare condition was hit after D106408 which has enabled rematerialization of such instructions. The replacement with KILL is only done for rematerialized defs which became dead and such rematerialization did not generally happen before. The patch deletes an instruction immediately if it is a result of rematerialization and has such use. An alternative would be to prohibit a split at a KILL instruction, but it looks like it is better to split a live range rather than keeping a killed instruction just in case it can be rematerialized further. Fixes PR51655. Differential Revision: https://reviews.llvm.org/D108951	2021-08-31 13:46:00 -07:00
Jessica Paquette	94d3ff09cf	[GlobalISel] Don't use G_FPTOSI in G_ISNAN legalization As noted in the comments in D108227, using G_FPTOSI produces wrong results for G_ISNAN. Drop the G_FPTOSI and perform the operation on integer types. Elsewhere in LLVM, a bitcast would be the appropriate choice (as it is in SDAG). GlobalISel does not distinguish between integer and FP types, so a bitcast would be meaningless here.	2021-08-31 10:26:42 -07:00
Hussain Kadhem	524ded7d01	[VP] implementation of sdag support for VP memory intrinsics Followup to D99355: SDAG support for vector-predicated load/store/gather/scatter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D105871	2021-08-31 17:01:50 +02:00
Nemanja Ivanovic	84d4ed1761	Revert "[DebugInfo] Emit DW_TAG_namelist and DW_TAG_namelist_item" This reverts commit `0a6fad754e`. It caused failures on a number of PowerPC bots.	2021-08-31 09:24:50 -05:00
Craig Topper	201f6446da	[LegalizeTypes][X86] Improve ExpandIntRes_FP_TO_SINT/ExpandIntRes_FP_TO_UINT when input is SoftPromoteHalf. Instead of splitting off the fp16 to float conversion and generating a libcall, we should split the operation into fp16 to float and float to integer operations. This will allow the float to integer conversion to go through any custom handling the target has. If the target doesn't have custom handling then we should come back to ExpandIntRes_FP_TO_SINT/ ExpandIntRes_FP_TO_UINT automatically to create the libcall. This avoids generating libcalls on 32-bit X86. These library functions may not exist in 32-bit libgcc. At least for LLVM, we never generate them when hardware floating point instructions are available. Differential Revision: https://reviews.llvm.org/D108933	2021-08-30 13:12:59 -07:00
Bjorn Pettersson	789f01283d	[SelectionDAG] Fix miscompile bugs related to smul.fix.sat with scale zero When expanding a SMULFIXSAT ISD node (usually originating from a smul.fix.sat intrinsic) we've applied some optimizations for the special case when the scale is zero. The idea has been that it would be cheaper to use an SMULO instruction (if legal) to perform the multiplication and at the same time detect any overflow. And in case of overflow we could use some SELECT:s to replace the result with the saturated min/max value. The only tricky part is to know if we overflowed on the min or max value, i.e. if the product is positive or negative. Unfortunately the implementation has been incorrect as it has looked at the product returned by the SMULO to determine the sign of the product. In case of overflow that product is truncated and won't give us the correct sign bit. This patch is adding an extra XOR of the multiplication operands, which is used to determine the sign of the non truncated product. This patch fixes PR51677. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D108938	2021-08-30 22:08:26 +02:00
Chih-Ping Chen	070090cfa5	[DebugInfo] Remove the restriction on the size of DIStringType in DebugHandlerBase::isUnsignedDIType. Differential Revision: https://reviews.llvm.org/D108559	2021-08-30 15:36:54 -04:00
Nikita Popov	0529e2e018	[InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI) The backend generally uses 64-bit immediates (e.g. what MachineOperand::getImm() returns), so use that for analyzeCompare() and optimizeCompareInst() as well. This avoids truncation for targets that support immediates larger than 32-bit. In particular, we can avoid the bugprone value normalization hack in the AArch64 target. This is a followup to D108076. Differential Revision: https://reviews.llvm.org/D108875	2021-08-30 19:46:04 +02:00
Hongtao Yu	f39256e3a5	[CSSPGO] Avoid repeatedly computing md5 hash code for pseudo probe inline contexts. Md5 hashing is expensive. Use a hash map to look up already computed GUIDs for dwarf names. Saw a 2% build time improvement on an internal large application. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D108722	2021-08-30 10:11:47 -07:00
Kazu Hirata	c50faffb4e	[llvm] Remove redundant calls to str() and c_str() (NFC) Identified with readability-redundant-string-cstr.	2021-08-30 09:05:05 -07:00
Craig Topper	705d005781	[DAGCombiner][RISCV] Don't use vector types in DAGCombiner::tryStoreMergeOfLoads if we need a rotate. The check for whether a rotate is possible occurs before the memory legality checks for the integer type. So it's possible we decide we can use a rotate, but then fail the legality checks. If that happens we should not fall back to a vector type. This triggers an assertion in the rotate handling when it finds a vector type instead of an integer type. In theory we could use a shufflevector in place of the rotate, but right now I'd just like to fix the crash. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D108839	2021-08-30 08:47:15 -07:00
Djordje Todorovic	86f5288eae	[LiveDebugValues] Cleanup Transfers when removing Entry Value If we encounter a new debug value, describing the same parameter, we should stop tracking the parameter's Entry Value. At that point, in some cases, the Transfer which uses the parameter's Entry Value, is already emitted. Thanks to the RemoveRedundantDebugValues pass, many problems with incorrect instruction order and number of DBG_VALUEs are fixed. However, we still cannot rely on the rule that each new debug value is set by the previous non-debug instruction in Machine Basic Block. When new parameter debug value triggers removal of Backup Entry Value for the same parameter, do the cleanup of Transfers emitted from Backup Entry Values. Get the Transfer Instruction which created the new debug value and search for debug values already emitted from the to-be-deleted Backup Entry Value and attached to the Transfer Instruction. If found, delete the Transfer and remove "primary" Entry Value Var Loc from OpenRanges. This patch fixes PR47628. Patch by Nikola Tesic. Differential revision: https://reviews.llvm.org/D106856	2021-08-30 14:00:41 +02:00
Simon Pilgrim	7c25a32840	Fix MSVC "signed/unsigned mismatch" comparison warning. NFCI.	2021-08-30 12:11:09 +01:00
“bhkumarn”	0a6fad754e	[DebugInfo] Emit DW_TAG_namelist and DW_TAG_namelist_item This patch emits DW_TAG_namelist and DW_TAG_namelist_item for fortran namelist variables. DICompositeType is extended to support this fortran feature. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D108553	2021-08-30 13:40:39 +05:30
Matt Arsenault	1494298b51	GlobalISel: Remove check for empty functions as these are invalid IR	2021-08-27 09:27:06 -04:00
Carl Ritson	5d9de3ea18	[DAGCombine] Allow FMA combine with both FMA and FMAD Without this change only the preferred fusion opcode is tested when attempting to combine FMA operations. If both FMA and FMAD are available then FMA ops formed prior to legalization will not be merged post legalization as FMAD becomes the preferred fusion opcode. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D108619	2021-08-27 19:49:35 +09:00
Matt Arsenault	3fdcd9bb13	GlobalISel: Add CallBase to CallLoweringInfo The DAG version has this, and is necessary for call lowering to take advantage of any attributes at the call site.	2021-08-26 21:09:11 -04:00
Craig Topper	8bb24289f3	[SelectionDAG] Optimize bitreverse expansion to minimize the number of mask constants. We can halve the number of mask constants by masking before shl and after srl. This can reduce the number of mov immediate or constant materializations. Or reduce the number of constant pool loads for X86 vectors. I think we might be able to do something similar for bswap. I'll look at it next. Differential Revision: https://reviews.llvm.org/D108738	2021-08-26 09:33:24 -07:00
Andrew Wei	c9066c5d37	[CGP] Fix the crash for combining address mode when having cyclic dependency In the combination of addressing modes, when replacing the matched phi nodes, sometimes the phi node to be replaced has been modified. For example, there’s matcher set [A, B] and [C, A], which will have cyclic dependency: A is replaced by B and C will be replaced by A. Because we tried to match new phi node to another new phi node, we should ignore new phi nodes when mapping new phi node to old one. Reviewed By: skatkov Differential Revision: https://reviews.llvm.org/D108635	2021-08-26 22:52:42 +08:00
Jay Foad	985eb25546	[MachineScheduler] Fix tracing Consistently print a newline before "RegionInstrs:".	2021-08-26 09:27:01 +01:00
Heejin Ahn	2f88a30ca6	[WebAssembly] Extract longjmp handling in EmSjLj to a function (NFC) Emscripten SjLj and (soon-to-be-added) Wasm SjLj transformation share many steps: 1. Initialize `setjmpTable` and `setjmpTableSize` in the entry BB 2. Handle `setjmp` callsites 3. Handle `longjmp` callsites 4. Cleanup and update SSA 1, 3, and 4 are identical for Emscripten SjLj and Wasm SjLj. Only the step 2 is different. This CL extracts the current Emscripten SjLj's longjmp callsites handling into a function. The reason to make this a separate CL is, without this, the diff tool cannot compare things well in the presence of moved code and added code in the followup Wasm SjLj CL, and it ends up mixing them together, making the diff unreadable. Also fixes some typos and variable names. So far we've been calling the buffer argument to `setjmp` and `longjmp` `jmpbuf`, but the name used in the man page for those functions is `env`, so updated them to be consistent. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D108728	2021-08-25 15:45:38 -07:00
Heejin Ahn	c2c9a3fd9c	[WebAssembly] Rename wasm.catch.exn intrinsic back to wasm.catch The plan was to use `wasm.catch.exn` intrinsic to catch exceptions and add `wasm.catch.longjmp` intrinsic, that returns two values (setjmp buffer and return value), later to catch longjmps. But because we decided not to use multivalue support at the moment, we are going to use one intrinsic that returns a single value for both exceptions and longjmps. And even if it's not for that, I now think the naming of `wasm.catch.exn` is a little weird, because the intrinsic can still take a tag immediate, which means it can be used for anything, not only exceptions, as long as that returns a single value. This partially reverts D107405. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D108683	2021-08-25 14:19:22 -07:00
Sanjay Patel	e728d1a3e8	[DAGCombiner] create binop nodes with all of expected values This is another bug exposed by https://llvm.org/PR51612 (and the one that triggered the initial assertion) in the report. That example was suppressed with: `985b48f183` ...but these would still crash because we created nodes like UADDO without the expected 2 output values.	2021-08-25 16:14:22 -04:00
Sanjay Patel	985b48f183	[DAGCombiner] check uses more strictly on select-of-binop fold There are 2 bugs here: 1. We were not checking uses of operand 2 (the false value of the select). 2. We were not checking for multiple uses of nodes that produce >1 result. Correcting those is enough to avoid the crash in the reduced test based on: https://llvm.org/PR51612 The additional use check on operand 0 (the condition value of the select) should not strictly be necessary because we are only replacing one use with another (whether it makes performance sense to do the transform with that pattern is not clear). But as noted in the TODO, changing that uncovers another bug. Note: there's at least one more bug here - we aren't propagating EVTs correctly, but I plan to fix that in another patch.	2021-08-25 14:14:41 -04:00
Nick Desaulniers	846e562dcc	[Clang] add support for error+warning fn attrs Add support for the GNU C style __attribute__((error(""))) and __attribute__((warning(""))). These attributes are meant to be put on declarations of functions that should not be called. They are frequently used to provide compile time diagnostics similar to _Static_assert, but which may rely on non-ICE conditions (i.e. relying on compiler optimizations). This is also similar to the diagnose_if function attribute, but can diagnose after optimizations have been run. While users may instead simply call undefined functions in such cases to get a linkage failure from the linker, these provide a much more ergonomic and actionable diagnostic to users and do so at compile time rather than at link time. Users instead may be able to use inline asm .err directives. These are used throughout the Linux kernel in its implementation of BUILD_BUG and BUILD_BUG_ON macros. These macros generally cannot be converted to use _Static_assert because many of the parameters are not ICEs. The Linux kernel still needs to be modified to make use of these when building with Clang; I have a patch that does so that I will send once this feature is landed. To do so, we create a new IR level Function attribute, "dontcall" (both error and warning boil down to one IR Fn Attr). Then, similar to calls to inline asm, we attach a !srcloc Metadata node to call sites of such attributed callees. The backend diagnoses these during instruction selection, while we still know that a call is a call (vs say a JMP that's a tail call) in an arch agnostic manner. The frontend then reconstructs the SourceLocation from that Metadata, and determines whether to emit an error or warning based on the callee's attribute. Link: https://bugs.llvm.org/show_bug.cgi?id=16428 Link: https://github.com/ClangBuiltLinux/linux/issues/1173 Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D106030	2021-08-25 10:34:18 -07:00
Jeremy Morse	0116ed0069	[DebugInfo][InstrRef] Don't use instr-ref for unoptimised functions InstrRefBasedLDV is marginally slower than VarlocBasedLDV when analysing optimised code -- however, it's much slower when analysing code compiled -O0. To avoid this: don't use instruction referencing for -O0 functions. In the "pure" case of unoptimised code, this won't really harm the debugging experience because most variables won't have been promoted off the stack, so can't go missing. It becomes more complicated when optimised code is inlined into functions marked optnone; however these are rare, and as -O0 doesn't run many optimisations there should be little damage to the debug experience as a result. I've taken the opportunity to refactor testing for instruction-referencing into a MachineFunction method, which seems the most appropriate place to put it. Differential Revision: https://reviews.llvm.org/D108585	2021-08-25 15:10:36 +01:00
Peilin Guo	4c4dbeeeea	[DAGCombine] Check the legality of the index of EXTRACT_SUBVECTOR For ISD::EXTRACT_SUBVECTOR, its second operand must be a constant multiple of the known-minimum vector length of the result type. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107795	2021-08-25 19:33:39 +08:00
Jeremy Morse	cc1e87bf55	[DebugInfo][InstrRef] Avoid stack-slot-coloring changing codegen due to DI Stack slot colouring adds "weight" to slots if a non-dbg-value instruction refers to it. This, unfortunately, means that DBG_PHI instructions can have an effect on codegen. The fix is very simple, replace isDebugValue with isDebugInstr. The regression test contains a scenario that reproduces this problem; I've represented both normal-debug mode and instr-ref debug mode instructions in comment lines prefixed with AAAAAA and BBBBBB, and un-comment them with sed to test that the two different modes produce the same behaviour. Differential Revision: https://reviews.llvm.org/D108627	2021-08-25 12:04:59 +01:00
Konstantin Schwarz	4b4bc1ea16	[GlobalISel] Do not generate illegal G_SEXTLOADs after legalization The sext_inreg_of_load combine did not have the isLegalOrBeforeLegalizer check, leading to the generation of potentially illegal G_SEXTLOADs when run after legalization. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D108626	2021-08-25 10:13:39 +02:00
Vang Thao	549f6a819a	[MachineCopyPropagation] Check CrossCopyRegClass for cross-class copys On some AMDGPU subtargets, copying to and from AGPR registers using another AGPR register is not possible. An intermediate VGPR register is needed for an AGPR to AGPR copy. This is an issue when machine copy propagation forwards a COPY $agpr, replacing a COPY $vgpr, which results in $agpr = COPY $agpr. It is removing a cross class copy that may have been optimized by previous passes and potentially creating an unoptimized cross class copy later on. To avoid this issue, check CrossCopyRegClass to see whether a different register class will be needed for the copy. If so, avoid forwarding the copy when the destination does not match the desired register class and the original copy already matches the desired register class. Issue seen while attempting to optimize another AGPR to AGPR issue: Live-ins: $agpr0; $vgpr0 = COPY $agpr0; $agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0; $agpr2 = COPY $vgpr0; $agpr3 = COPY $vgpr0; $agpr4 = COPY $vgpr0. After machine-cp: $vgpr0 = COPY $agpr0; $agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0; $agpr2 = COPY $agpr0; $agpr3 = COPY $agpr0; $agpr4 = COPY $agpr0. Machine-cp propagated COPY $agpr0 to replace $vgpr0, creating 3 AGPR to AGPR copies. Later this creates a cross-register copy from AGPR->VGPR->AGPR for each copy when the prior VGPR->AGPR copy was already optimal. Reviewed By: lkail, rampitec Differential Revision: https://reviews.llvm.org/D108011	2021-08-24 21:22:36 -07:00
Stanislav Mekhanoshin	92c1fd19ab	Allow rematerialization of virtual reg uses Currently the isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that it is not a trivial rematerialization and that we do not want to extend live ranges. It appears that the LRE logic does not attempt to extend the live range of a source register for rematerialization, so that is not an issue. That is checked in LiveRangeEdit::allUsesAvailableAt(). The only non-trivial aspect of it is accounting for tied-defs, which normally represent a read-modify-write operation and are not rematerializable. The test for a tied-def situation already exists in /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve. The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists. Differential Revision: https://reviews.llvm.org/D106408	2021-08-24 11:09:02 -07:00
Simon Pilgrim	194b08000c	[DAG] LoadedSlice::canMergeExpensiveCrossRegisterBankCopy - replace getABITypeAlign with allowsMemoryAccess (PR45116) One of the cases identified in PR45116 - we don't need to limit load combines to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.	2021-08-24 15:28:30 +01:00
Simon Pilgrim	6de0b55188	[DAG] TransformFPLoadStorePair - replace getABITypeAlign with allowsMemoryAccess (PR45116) One of the cases identified in PR45116 - we don't need to limit load combines (in this case for fp->int load/store copies) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback. Differential Revision: https://reviews.llvm.org/D108318	2021-08-24 13:11:27 +01:00
Simon Pilgrim	e431b280c9	[DAG] CombineConsecutiveLoads - replace getABITypeAlign with allowsMemoryAccess (PR45116) One of the cases identified in PR45116 - we don't need to limit load combines (in this case for ISD::BUILD_PAIR) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback. This helps in particular for 32-bit X86 cases loading 64-bit size data, reducing codegen diffs vs x86_64. Differential Revision: https://reviews.llvm.org/D108307	2021-08-24 12:31:22 +01:00
Stanislav Mekhanoshin	401a45c61b	Fix late rematerialization operands check D106408 enables rematerialization of instructions with virtual register uses. That has uncovered a bug in the allUsesAvailableAt implementation: https://bugs.llvm.org/show_bug.cgi?id=51516. In the majority of cases canRematerializeAt() is called to check if an instruction can be rematerialized before the given UseIdx. However, SplitEditor::enterIntvAtEnd() calls it to rematerialize an instruction at the end of a block, passing LIS.getMBBEndIdx() into the check. In the testcase from the bug it has attempted to rematerialize ADDXri after STRXui in bb.17. The use operand %55 of the ADD is killed by the STRX, but that is undetected by the check because it adjusts the passed UseIdx to the reg slot, before the kill. The value is dead at the index passed to the check, however. This change uses the later of the passed UseIdx and its reg slot. This should be correct because if we are checking the availability of operands before an instruction, that instruction cannot be the one defining these operands. If we are checking for late rematerialization, we are really interested in whether the operands live past the instruction. The bug is not exploitable without D106408, but the fix is needed to reland the reverted D106408. Differential Revision: https://reviews.llvm.org/D108475	2021-08-23 12:23:58 -07:00
Jessica Paquette	6760e2a7bc	[GlobalISel] Translate @llvm.llround.* -> G_LLROUND Translate it using `IRTranslator::translateSimpleIntrinsic`. Differential Revision: https://reviews.llvm.org/D108563	2021-08-23 09:42:53 -07:00
Ben Shi	f69fb7ac72	[DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2) Reviewed by: lebedev.ri, spatel, craig.topper, luismarques, jrtc27 Differential Revision: https://reviews.llvm.org/D107711	2021-08-22 16:53:32 +08:00
Fangrui Song	1dfb30e54c	[TargetCallingConv] Change OutputArg ctor to match its members This avoids unneeded MVT->EVT conversion.	2021-08-21 16:41:48 -07:00
Jessica Paquette	af8e09d4bb	[GlobalISel] Add G_LLROUND Basically the same as G_LROUND. Handles the llvm.llround family of intrinsics. Also add a helper function to the MachineVerifier for checking if all of the (virtual register) operands of an instruction are scalars. Seems like a useful thing to have. Differential Revision: https://reviews.llvm.org/D108429	2021-08-20 14:07:21 -07:00
Daniel Paoliello	8ecce69594	Fix SEH table addresses for Windows Issue Details: The addresses for SEH tables for Windows are incorrect as 1 was unconditionally being added to all addresses. +1 is required for the SEH end address (as it is exclusive), but the SEH start address is inclusive and so should be used as-is. In the IP2State tables, the addresses are +1 for AMD64 to handle the return address for a call being after the actual call instruction, but are as-is for ARM and ARM64 as the `StateFromIp` function in the VC runtime automatically takes this into account and adjusts the address that it is looking up. Fix Details: * Split the `getLabel` function into two: `getLabel` (used for the SEH start address and ARM+ARM64 IP2State addresses) and `getLabelPlusOne` (for the SEH end address, and AMD64 IP2State addresses). Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D107784	2021-08-20 22:32:12 +03:00
Craig Topper	10020d41ee	[TypePromotion] Remove unused IRBuilder object. NFC	2021-08-20 12:20:09 -07:00
Jeremy Morse	ce8254d096	[DebugInfo][InstrRef] Correctly ignore DBG_VALUE_LIST in InstrRef mode This patch makes InstrRefBasedLDV "safe" to work with DBG_VALUE_LISTs. It doesn't actually interpret them, but it recognises that they specify variable locations and avoids propagating false locations, which is better than the current state. Observe the attached test: * We avoid propagating DBG_VALUE_LISTs into successor blocks, as they're not "currently" supported, * We don't propagate other variable locations across DBG_VALUE_LISTs, because we know that the variable location is terminated by the DBG_VALUE_LIST. Differential Revision: https://reviews.llvm.org/D108143	2021-08-20 14:51:02 +01:00
Jeremy Morse	c76c24e40b	[DebugInfo][InstrRef] Remove a faulty assertion This patch removes an assertion, and adds a regression test showing why the assertion is broken. For context, LocIdx is a key/index number for machine locations, so that we can describe locations as a single integer and ignore whether they're on the stack, in registers or otherwise. Back when InstrRefBasedLDV was added, I happened to bake in a "special" zero number for various reasons, which Vedant identified as undesirable in this review comment: https://reviews.llvm.org/D83047#inline-765495 . I subsequently removed that special zero number, but it looks like I didn't delete this assertion at the time, which assumes that a zero LocIdx is invalid. The attached test shows that this assertion is reachable on valid code -- on x86 $rsp always gets the LocIdx number zero, and if you transfer a variable value into it, InstrRefBasedLDV crashes on that assertion. The code might be a bit wild to be storing variables to $rsp like that, however we shouldn't crash on it. Differential Revision: https://reviews.llvm.org/D108134	2021-08-20 14:23:32 +01:00
Anshil Gandhi	508b06699a	[Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions Produce remarks when atomic instructions are expanded into hardware instructions in SIISelLowering.cpp. Currently, these remarks are only emitted for atomic fadd instructions. Differential Revision: https://reviews.llvm.org/D108150	2021-08-19 20:51:19 -06:00
Jessica Paquette	3207ed196c	[GlobalISel] Add IRTranslator support for @llvm.lround.* -> G_LROUND Translate the `@llvm.lround.*` family to G_LROUND via `IRTranslator::translateSimpleIntrinsic`. Differential Revision: https://reviews.llvm.org/D108418	2021-08-19 17:08:08 -07:00
Jessica Paquette	3118926483	[GlobalISel] Add a G_LROUND instruction Meant to represent the `@llvm.lround.*` family. Add the opcode, docs, and verification. Differential Revision: https://reviews.llvm.org/D108417	2021-08-19 17:06:24 -07:00
Amara Emerson	95ac3d15e9	[AArch64][GlobalISel] Add G_VECREDUCE fewerElements support for full scalarization. For some reductions like G_VECREDUCE_OR on AArch64, we need to scalarize completely if the source is <= 64b. This change adds support for that in the legalizer. If the source has a pow-2 number of elements, then we can do a tree reduction using the scalar operation on the individual elements. Otherwise, we just create a sequential chain of operations. For AArch64, we only need to scalarize if the input is <64b. If it's greater than 64b then we can first do a fewerElements step to 64b, taking advantage of vector instructions until we reach the point of scalarization. I also had to relax the verifier checks for reductions because the intrinsics support <1 x EltTy> types, which we lower to scalars for GlobalISel. Differential Revision: https://reviews.llvm.org/D108276	2021-08-19 16:38:52 -07:00
Adrian Prantl	1e586bcc3e	Move function definition out-of-line to fix the modularized build (NFC)	2021-08-19 12:26:23 -07:00
Craig Topper	84cea602f9	Revert "[SelectionDAGBuilder] Compute and cache PreferredExtendType on demand." This reverts commit `add08c8741`. There was a compile time jump on tramp3d-v4 on https://llvm-compile-time-tracker.com/ Want to see if it goes away with this reverted.	2021-08-19 08:42:05 -07:00
David Green	d10f23a25d	[ISel] Expand saddsat and ssubsat via asr and xor This changes the lowering of saddsat and ssubsat so that instead of using: r,o = saddo x, y; c = setcc r < 0; s = c ? INTMAX : INTMIN; ret o ? s : r we use asr and xor to materialize the INTMAX/INTMIN constants: r,o = saddo x, y; s = ashr r, BW-1; x = xor s, INTMIN; ret o ? x : r (https://alive2.llvm.org/ce/z/TYufgD). This seems to reduce the instruction count in most testcases across most architectures. X86 has some custom lowering added to compensate for cases where it can increase instruction count. Differential Revision: https://reviews.llvm.org/D105853	2021-08-19 16:08:07 +01:00
Craig Topper	add08c8741	[SelectionDAGBuilder] Compute and cache PreferredExtendType on demand. Previously we pre-calculated this and cached it for every instruction in the function. Most of the calculated results will never be used. So instead calculate it only on the first use, and then cache it. The cache was originally added to fix a compile time issue which caused r216066 to be reverted. This change exposed that we weren't pre-computing the Value for Arguments. I've explicitly disabled that for now as it seemed to regress some tests on AArch64 which has sext built into its compare instructions. Spotted while investigating how to improve heuristics to work better with RISCV preferring sign extend for unsigned compares for i32 on RV64. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D107976	2021-08-19 07:18:33 -07:00
Craig Topper	c60a4c1ba5	[TypePromotion] Use Instruction* instead of Value* for a couple functions. NFC This matches how they are called and allows some isa/cast/dyn_cast to be removed. Differential Revision: https://reviews.llvm.org/D108333	2021-08-19 07:09:38 -07:00
Fraser Cormack	e6b1ac8546	[LegalizeTypes][VP] Add widening support for binary VP ops This patch adds the beginnings of more thorough support in the legalizers for vector-predicated (VP) operations. The first step is the ability to widen illegal vectors. The more complicated scenario in which the result/operands need widening but the mask doesn't has not been handled here. That would require a lot of code without an in-tree target on which to test it. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D107904	2021-08-19 13:08:47 +01:00
Rong Xu	5fdaaf7fd8	[SampleFDO] Flow Sensitive Sample FDO (FSAFDO) profile loader This patch implements Flow Sensitive Sample FDO (FSAFDO) profile loader. We have two profile loaders for FS profile, one before RegAlloc and one before BlockPlacement. To enable it, when -fprofile-sample-use=<profile> is specified, add "-enable-fs-discriminator=true \ -disable-ra-fsprofile-loader=false \ -disable-layout-fsprofile-loader=false" to turn on the FS profile loaders. Differential Revision: https://reviews.llvm.org/D107878	2021-08-18 18:37:35 -07:00
Kyungwoo Lee	829616c241	[NFC][DebugInfo] getDwarfCompileUnitID This is a refactoring for the use in https://reviews.llvm.org/D108261 Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D108271	2021-08-18 17:35:03 -07:00
Arthur Eubanks	2fc075948c	[NFC] Remove some unnecessary AttributeList methods These rely on methods I'm trying to cleanup.	2021-08-18 11:15:20 -07:00
Jessica Paquette	791006fb8c	[GlobalISel] Implement lowering for G_ISNAN + use it in AArch64 GlobalISel equivalent to `TargetLowering::expandISNAN`. Use it in AArch64 and add a testcase. Differential Revision: https://reviews.llvm.org/D108227	2021-08-18 10:54:25 -07:00
Jessica Paquette	d9873711cb	[GlobalISel] Add IRTranslator support for G_ISNAN Translate the `@llvm.isnan` intrinsic to G_ISNAN when we see it. This is pretty much the same as the associated SelectionDAGBuilder code. Main difference is that we don't expand it here. It makes more sense to do that during legalization in GlobalISel. GlobalISel will just legalize the generated illegal types. Differential Revision: https://reviews.llvm.org/D108226	2021-08-18 10:48:10 -07:00
Jessica Paquette	0a2b1ba33a	[GlobalISel] Add G_ISNAN Add a generic opcode equivalent to the `llvm.isnan` intrinsic + MachineVerifier support for it. We need an opcode here because we may want target-specific lowering later on. Differential Revision: https://reviews.llvm.org/D108222	2021-08-18 10:42:05 -07:00
Petr Hosek	2d4470ab89	Revert "Allow rematerialization of virtual reg uses" This reverts commit `877572cc19` which introduced PR51516.	2021-08-18 00:12:41 -07:00
Arthur Eubanks	3f4d00bc3b	[NFC] More get/removeAttribute() cleanup	2021-08-17 21:05:41 -07:00
Arthur Eubanks	de0ae9e89e	[NFC] Cleanup more AttributeList::addAttribute()	2021-08-17 21:05:41 -07:00
Qiu Chaofan	5ca250a03d	[RegAlloc] Remove addAllocPriorityToGlobalRanges hook It was introduced in `1a6dc92` and only enabled on PowerPC/AMDGPU. That should be enabled for all targets. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D108010	2021-08-18 10:21:27 +08:00
jacquesguan	a7ebc4d145	[DAGCombiner] Teach isKnownToBeAPowerOfTwo handle SPLAT_VECTOR Make DAGCombine turn mul by power of 2 into shl for scalable vector. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107883	2021-08-18 10:10:40 +08:00
Wang, Pengfei	2379949aad	[X86] AVX512FP16 instructions enabling 3/6 Enable FP16 conversion instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105265	2021-08-18 09:03:41 +08:00
Simon Pilgrim	d7f288502f	SelectionDAGBuilder::visitInlineAsm - don't dereference dyn_cast<> results. dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct. Fixes static analyser warning.	2021-08-17 18:40:59 +01:00
Fraser Cormack	f3e9047249	[VP] Add vector-predicated reduction intrinsics This patch adds vector-predicated ("VP") reduction intrinsics corresponding to each of the existing unpredicated `llvm.vector.reduce.*` versions. Unlike the unpredicated reductions, all VP reductions have a start value. This start value is returned when no vector element is active. Support for expansion on targets without native vector-predication support is included. This patch is based on the ["reduction slice"](https://reviews.llvm.org/D57504#1732277) of the LLVM-VP reference patch (https://reviews.llvm.org/D57504). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104308	2021-08-17 17:56:35 +01:00
Sebastian Neubauer	fbae34635d	[GlobalISel] Add combine for PTR_ADD with regbanks Combine two G_PTR_ADDs, but keep the register bank of the constant. That way, the combine can be used in post-regbank-select combines. Introduce two helper methods in CombinerHelper, getRegBank and setRegBank that get and set an optional register bank to a register. That way, they can be used before and after register bank selection. Differential Revision: https://reviews.llvm.org/D103326	2021-08-17 13:58:16 +02:00
Tiehu Zhang	9cfa9b44a5	[CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block In the current implementation, the instruction to be sunk is inserted before the target instruction without considering the def-use tree, which may cause an "Instruction does not dominate all uses" error. We need to choose a suitable insertion location according to the use chain. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107262	2021-08-17 18:58:15 +08:00
Jeremy Morse	708cbda577	[DebugInfo][InstrRef] Honour too-much-debug-info cutouts This reapplies `54a61c94f9`, its follow up in `547b712500`, which were reverted `95fe61e639`. Original commit message: VarLoc based LiveDebugValues will abandon variable location propagation if there are too many blocks and variable assignments in the function. If it didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end up with 1 million DBG_VALUEs just at the start of blocks. Instruction-referencing LiveDebugValues should honour this limitation too (because the same limitation applies to it). Hoist the relevant command line options into LiveDebugValues.cpp and pass it down into the implementation classes as an argument to ExtendRanges. I've duplicated all the run-lines in live-debug-values-cutoffs.mir to have an instruction-referencing flavour. Differential Revision: https://reviews.llvm.org/D107823	2021-08-17 11:34:49 +01:00
Arthur Eubanks	0d822da2bd	[NFC] Remove/replace some confusing attribute getters on Function	2021-08-16 16:12:37 -07:00
Afanasyev Ivan	913b5d2f7a	[AsmPrinter] fix nullptr dereference for MBBs with hasAddressTaken property without BB Basic block pointer is dereferenced unconditionally for MBBs with hasAddressTaken property. MBBs might have hasAddressTaken property without reference to BB. Backend developers must assign fake BB to MBB to workaround this issue and it should be fixed. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D108092	2021-08-16 15:32:09 -07:00
Anshil Gandhi	f22ba51873	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpand pass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-16 14:56:01 -06:00
Stanislav Mekhanoshin	877572cc19	Allow rematerialization of virtual reg uses Currently the isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that it is not a trivial rematerialization and that we do not want to extend live ranges. It appears that the LRE logic does not attempt to extend the live range of a source register for rematerialization, so that is not an issue. That is checked in LiveRangeEdit::allUsesAvailableAt(). The only non-trivial aspect of it is accounting for tied-defs, which normally represent a read-modify-write operation and are not rematerializable. The test for a tied-def situation already exists in /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve. The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists. Differential Revision: https://reviews.llvm.org/D106408	2021-08-16 12:42:42 -07:00
Stanislav Mekhanoshin	b9e433b02a	Prevent machine licm if remattable with a vreg use Check if a rematerializable instruction does not have any virtual register uses. Even though it is rematerializable, RA might not actually rematerialize it in this scenario. In that case we do not want to hoist such an instruction out of the loop in the belief that RA will sink it back if needed. This already has an impact on the AMDGPU target, which does not check for this condition in its isTriviallyReMaterializable implementation and has instructions with virtual register uses enabled. The other targets are not impacted at this point, although they will be when D106408 lands. Differential Revision: https://reviews.llvm.org/D107677	2021-08-16 12:09:00 -07:00
Craig Topper	92abb1cf90	[TypePromotion] Don't mutate the result type of SwitchInst. SwitchInst should have a void result type. Add a check to the verifier to catch this error. Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D108084	2021-08-16 08:54:34 -07:00
Simon Pilgrim	d6fe8d37c6	[DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b) Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal. This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands. Differential Revision: https://reviews.llvm.org/D107597	2021-08-16 16:06:54 +01:00
Jeremy Morse	95fe61e639	Revert `54a61c94f9` and its follow up in `547b712500` These were part of D107823, however asan has found something excitingly wrong happening: https://lab.llvm.org/buildbot/#/builders/5/builds/10543/steps/13/logs/stdio	2021-08-16 15:48:56 +01:00
Jeremy Morse	547b712500	Suppress signedness-comparison warning This is a follow-up to `54a61c94f9`.	2021-08-16 15:29:43 +01:00
Jeremy Morse	54a61c94f9	[DebugInfo][InstrRef] Honour too-much-debug-info cutouts VarLoc based LiveDebugValues will abandon variable location propagation if there are too many blocks and variable assignments in the function. If it didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end up with 1 million DBG_VALUEs just at the start of blocks. Instruction-referencing LiveDebugValues should honour this limitation too (because the same limitation applies to it). Hoist the relevant command line options into LiveDebugValues.cpp and pass it down into the implementation classes as an argument to ExtendRanges. I've duplicated all the run-lines in live-debug-values-cutoffs.mir to have an instruction-referencing flavour. Differential Revision: https://reviews.llvm.org/D107823	2021-08-16 15:06:40 +01:00
Paul Walker	cd0e196413	[DAGCombiner] Stop visitEXTRACT_SUBVECTOR creating illegal BITCASTs post legalisation. visitEXTRACT_SUBVECTOR can sometimes create illegal BITCASTs when removing "redundant" INSERT_SUBVECTOR operations. This patch adds an extra check to ensure such combines only occur after operation legalisation if any resulting BITCAST is itself legal. Differential Revision: https://reviews.llvm.org/D108086	2021-08-15 18:25:49 +01:00
Qiu Chaofan	a240b29f21	[NFC] Simply update a FIXME comment The X86-overridden LowerOperationWrapper was moved to the common implementation in `a7eae62`.	2021-08-15 22:43:46 +08:00
Dávid Bolvanský	49de6070a2	Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop" This reverts commit `435785214f`. Still same compile time issues for -O0 -g, eg. +1.3% for sqlite3.	2021-08-15 11:44:13 +02:00
Anshil Gandhi	435785214f	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpand pass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-14 23:37:23 -06:00
Anshil Gandhi	29e11a1aa3	Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop" This reverts commit `c4e5425aa5`.	2021-08-13 23:58:04 -06:00
Anshil Gandhi	c4e5425aa5	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpandPass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-13 22:44:08 -06:00
Jessica Paquette	50efbf9cbe	[GlobalISel] Narrow binops feeding into G_AND with a mask This is a fairly common pattern: ``` %mask = G_CONSTANT iN <mask val> %add = G_ADD %lhs, %rhs %and = G_AND %add, %mask ``` We have combines to eliminate G_AND with a mask that does nothing. If we combined the above to this: ``` %mask = G_CONSTANT iN <mask val> %narrow_lhs = G_TRUNC %lhs %narrow_rhs = G_TRUNC %rhs %narrow_add = G_ADD %narrow_lhs, %narrow_rhs %ext = G_ZEXT %narrow_add %and = G_AND %ext, %mask ``` We'd be able to take advantage of those combines using the trunc + zext. For this to work (or be beneficial in the best case) - The operation we want to narrow then widen must only be used by the G_AND - The G_TRUNC + G_ZEXT must be free - Performing the operation at a narrower width must not produce a different value than performing it at the original width after masking. Example comparison between SDAG + GISel: https://godbolt.org/z/63jzb1Yvj At -Os for AArch64, this is a 0.2% code size improvement on CTMark/pairlocalign. Differential Revision: https://reviews.llvm.org/D107929	2021-08-13 18:31:13 -07:00
Matt Arsenault	cc56152f83	GlobalISel: Add helper function for getting EVT from LLT This can only give an imperfect approximation, but is enough to avoid crashing in places where we call into EVT functions starting from LLTs.	2021-08-13 21:10:13 -04:00
Arthur Eubanks	f80ae58068	[NFC] Cleanup calls to AttributeList::getAttribute(FunctionIndex) getAttribute() is confusing, use a clearer method.	2021-08-13 16:27:11 -07:00
Arthur Eubanks	d7593ebaee	[NFC] Clean up users of AttributeList::hasAttribute() AttributeList::hasAttribute() is confusing, use clearer methods like hasParamAttr()/hasRetAttr(). Add hasRetAttr() since it was missing from AttributeList.	2021-08-13 11:59:18 -07:00
Arthur Eubanks	92ce6db9ee	[NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr() This is more consistent with similar methods.	2021-08-13 11:09:18 -07:00
Ruiling Song	e1beebbac5	SplitKit: Don't further split subrange mask in buildCopy We may use several COPY instructions to copy the needed sub-registers during split. But the way we split the lanes during the COPYs may be different from the subranges of the old register. This would fail when we extend the subranges of the new register because the LaneMasks do not match exactly between subranges of new register and old register. Since we are bundling the COPYs, I think there is no need to further refine the subranges of the new register based on the set of LaneMasks of the inserted COPYs. I am not sure if there will be further breaking cases. But as the subranges of new register are created based on the LaneMasks of the subranges of old register, it will be highly possible we will always find an exact LaneMask match. We can think about how to make the extendPHIKillRanges() work for subrange mask mismatch case if we meet more such cases in the future. The test case was from D105065 by @arsenm. Differential Revision: https://reviews.llvm.org/D107829	2021-08-13 07:36:38 +08:00
Rong Xu	4c5909ba83	[SampleFDO] Add two passes of MIRAddFSDiscriminatorsPass This patch adds Pass1 of MIRAddFSDiscriminatorsPass before register allocation, and Pass2 of MIRAddFSDiscriminatorsPass before Block-Placement. This is still under the --enable-fs-discriminator option (default false). This would reduce the turn-around time for the FSAFDO transition. Differential Revision: https://reviews.llvm.org/D104579	2021-08-11 11:11:04 -07:00
Fraser Cormack	885be620f9	[LegalizeTypes][NFC] Remove else-after-return Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107890	2021-08-11 16:48:28 +01:00
Rainer Orth	7bbbf29561	[ELF] Don't emit SHF_GNU_RETAIN on Solaris The introduction of `SHF_GNU_RETAIN` has caused massive problems on Solaris. Initially, as reported in Bug 49437, it caused dozens of testsuite failures on both sparc and x86. The objects were marked as `ELFOSABI_NONE`, but `SHF_GNU_RETAIN` is a GNU extension. In the native Solaris ABI, that flag (in the range for OS-specific values) is `SHF_SUNW_ABSENT` with a completely different semantics, which confuses Solaris `ld` very much. Later, the objects became (correctly) marked `ELFOSABI_GNU`, which Solaris `ld` doesn't support, causing it to SEGV and break the build. The linker is currently being hardened to not accept non-native OS ABIs to avoid this. The need for linker support is already documented in `clang/include/clang/Basic/AttrDocs.td`, but not currently checked. This patch avoids all this by not emitting `SHF_GNU_RETAIN` on Solaris at all. Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and `x86_64-pc-linux-gnu`. Differential Revision: https://reviews.llvm.org/D107747	2021-08-11 09:27:51 +02:00
madhur13490	61526b1262	[DAG] Reword comment for EnforceNodeIdInvariant and InvalidateNodeId. NFC. Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D107845	2021-08-11 12:14:28 +05:30
Craig Topper	a8ae41fb51	[SelectionDAGBuilder] Save iterator to avoid second DenseMap lookup. NFC We were calling find and then using operator[]. Instead keep the iterator from find and use it to get the value. Just happened to notice while investigating how we decide what extends to use between basic blocks.	2021-08-10 22:37:48 -07:00
Christopher Di Bella	c874dd5362	[llvm][clang][NFC] updates inline licence info Some files still contained the old University of Illinois Open Source Licence header. This patch replaces that with the Apache 2 with LLVM Exception licence. Differential Revision: https://reviews.llvm.org/D107528	2021-08-11 02:48:53 +00:00
Amara Emerson	7ec4ce157b	[AArch64][GlobalISel] Relax oneuse restriction for PTR_ADD chain combining to check addressing legality. With contributions by Sebastian Neubauer Differential Revision: https://reviews.llvm.org/D105676	2021-08-10 16:41:18 -07:00
Adrian Prantl	d6b6880172	Streamline the API of salvageDebugInfoImpl (NFC) This patch refactors / simplifies salvageDebugInfoImpl(). The goal here is to simplify the implementation of coro::salvageDebugInfo() in a followup patch. 1. Change the return value to I.getOperand(0). Currently users of salvageDebugInfoImpl() assume that the first operand is I.getOperand(0). This patch makes this information explicit. A nice side-effect of this change is that it allows us to salvage expressions such as add i8 1, %a in the future. 2. Factor out the creation of a DIExpression and return an array of DIExpression operations instead. This change allows users that call salvageDebugInfoImpl() in a loop to avoid the costly creation of temporary DIExpressions and to defer the creation of a DIExpression until the end. This patch does not change any functionality. rdar://80227769 Differential Revision: https://reviews.llvm.org/D107383	2021-08-10 15:21:18 -07:00
Jinsong Ji	2cfd427626	[AIX] Don't crash on unimplemented lowerRelativeReference We may call lowerRelativeReference in MC to determine whether target supports this lowering. We should return nullptr instead of crashing when we haven't implemented the real lowering. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D107830	2021-08-10 17:43:06 +00:00
Matt Arsenault	1b41945da0	RegAllocGreedy: Add spaces between registers in debug message	2021-08-10 13:12:34 -04:00
Konstantin Schwarz	64bef13f08	[GlobalISel] Look through truncs and extends in narrowScalarShift If a G_SHL is fed by a G_CONSTANT, the lower and upper bits of the source can be shifted individually by the constant shift amount. However in case the shift amount came from a G_TRUNC(G_CONSTANT), the generic shift legalization code was used, producing intermediate shifts that are potentially illegal on some targets. This change teaches narrowScalarShift to look through G_TRUNCs and G_*EXTs. Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D89100	2021-08-10 13:49:22 +02:00
Wang, Pengfei	6f7f5b54c8	[X86] AVX512FP16 instructions enabling 1/6 1. Enable FP16 type support and basic declarations used by following patches. 2. Enable new instructions VMOVW and VMOVSH. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105263	2021-08-10 12:46:01 +08:00
Jeremy Morse	d4ce9e463d	[DWARF] Revert sharing subprograms across CUs This patch is a revert of `e08f205f5c`. In that patch, DW_TAG_subprograms were permitted to be referenced across CU boundaries, to improve stack trace construction using call site information. Unfortunately, as documented in PR48790, the way that subprograms are "owned" by dwarf units is sufficiently complicated that subprograms end up in unexpected units, invalidating cross-unit references. There's no obvious way to easily fix this, and several attempts have failed. Revert this to ensure correct DWARF is always emitted. Three tests change in addition to the reversion, but they're all very light alterations. Differential Revision: https://reviews.llvm.org/D107076	2021-08-09 12:43:43 +01:00
Luo, Yuanke	53642d5b80	[NFC] Fix the formula for reciprocal calculation. Differential Revision: https://reviews.llvm.org/D107713	2021-08-09 16:03:56 +08:00
Amara Emerson	4c2e01232c	[GlobalISel] Fix a combine causing DBG_VALUE with dangling vregs. We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() instead of eraseFromParent(). We should probably use that in other places too but fix this issue which affects clang bootstrap builds for now.	2021-08-07 01:41:02 -07:00
Nemanja Ivanovic	62fe3dcf98	Fix PPC buildbot break caused by `4c4093e6e3` This commit adds the isnan intrinsic and provides a default expansion for it in the SDAG. However, it makes the assumption that types it operates on are IEEE-compliant types. This is not always the case. An example of that is PPC "double double" which has a representation that - Does not need to conform to IEEE requirements for isnan as it is not an IEEE-compliant type - Does not have a representation that allows for straightforward reinterpreting as an integer and use of integer operations The result was that this commit broke __builtin_isnan for ppc_fp128, making many valid numeric values report as NaN. This patch simply changes the expansion to always expand to unordered comparison (regardless of whether FP exceptions are tracked). This is in line with previous semantics.	2021-08-06 22:10:20 -05:00
Amara Emerson	2b067e3335	Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG. DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.	2021-08-06 12:57:53 -07:00
Jon Roelofs	eae4a44c1d	[GlobalISel][KnownBits] Implement G_CTPOP Implementation copied almost verbatim from ValueTracking. Differential revision: https://reviews.llvm.org/D107606	2021-08-06 09:48:39 -07:00
Craig Topper	b2ca4dc935	[LegalizeTypes] Add a simple expansion for SMULO when a libcall isn't available. This isn't optimal, but prevents crashing when the libcall isn't available. It just calculates the full product and makes sure the high bits match the sign of the low half. Each of the pieces should go through their own type legalization. This can make D107420 unnecessary. Needs tests, but I wanted to start discussion about D107420. Reviewed By: FreddyYe Differential Revision: https://reviews.llvm.org/D107581	2021-08-06 09:43:01 -07:00
Kazu Hirata	276be84d0a	[CodeGen] Remove computeDefOperandLatency (NFC) The last use was removed on Oct 9, 2016 in commit `5c924d7117`.	2021-08-06 08:26:55 -07:00
Jay Foad	57b9107e3f	[GlobalISel] Improve widening of cttz/cttz_zero_undef Differential Revision: https://reviews.llvm.org/D107631	2021-08-06 14:25:56 +01:00
Jay Foad	cd2594e1c6	[GlobalISel] Improve legalization of narrow CTTZ Differential Revision: https://reviews.llvm.org/D107457	2021-08-06 09:40:48 +01:00
Serge Pavlov	4c4093e6e3	Introduce intrinsic llvm.isnan This is recommit of the patch `16ff91ebcc`, reverted in `0c28a7c990` because it had an error in call of getFastMathFlags (base type should be FPMathOperator but not Instruction). The original commit message is duplicated below: Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. 
It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution by the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where is lowered in target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-06 14:32:27 +07:00
Sean Fertile	23651c5ae0	[PowerPC][AIX] Create multiple constant sections. Fixes issue where late materialized constants can be more strictly aligned than their containing csect. Differential Revision: https://reviews.llvm.org/D103103	2021-08-05 21:19:16 -04:00
Jon Roelofs	5fc7b1a260	Revert "[GlobalISel][KnownBits] Implement G_CTPOP" This reverts commit `ce6eb4f15a`. It's broken on the windows bots: https://reviews.llvm.org/D107606#2930121	2021-08-05 17:47:47 -07:00
Jon Roelofs	ce6eb4f15a	[GlobalISel][KnownBits] Implement G_CTPOP Implementation copied almost verbatim from ValueTracking. Differential revision: https://reviews.llvm.org/D107606	2021-08-05 17:17:29 -07:00
Craig Topper	f7076cfd3a	[DAGCombiner][RISCV][AMDGPU] Call SimplifyDemandedBits at the end of visitMULHU to enable known bits constant folding. We don't have real demanded bits support for MULHU, but we can still use the known bits based constant folding support at the end of SimplifyDemandedBits to simplify a MULHU. This helps with cases where we know the LHS and RHS have enough leading zeros so that the high multiply result is always 0. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D106471	2021-08-05 08:31:26 -07:00
Simon Pilgrim	2cbf9fd402	[DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes. This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector. This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector. Fixes PR50053 Differential Revision: https://reviews.llvm.org/D107068	2021-08-05 15:40:48 +01:00
Paul Robinson	75aa3d520d	Add a DIExpression const-folder to prevent silly expressions. It's entirely possible (because it actually happened) for a bool variable to end up with a 256-bit DW_AT_const_value. This came about when a local bool variable was initialized from a bitfield in a 32-byte struct of bitfields, and after inlining and constant propagation, the variable did have a constant value. The sequence of optimizations had it carrying "i256" values around, but once the constant made it into the llvm.dbg.value, no further IR changes could affect it. Technically the llvm.dbg.value did have a DIExpression to reduce it back down to 8 bits, but the compiler is in no way ready to emit an oversized constant and a DWARF expression to manipulate it. Depending on the circumstances, we had either just the very fat bool value, or an expression with no starting value. The sequence of optimizations that led to this state did seem pretty reasonable, so the solution I came up with was to invent a DWARF constant expression folder. Currently it only does convert ops, but there's no reason it couldn't do other ops if that became useful. This broke three tests that depended on having convert ops survive into the DWARF, so I added an operator that would abort the folder to each of those tests. Differential Revision: https://reviews.llvm.org/D106915	2021-08-05 06:14:40 -07:00
Petar Avramovic	66de26b1f9	GlobalISel: Fix matchEqualDefs for instructions with multiple defs Instructions that produceSameValue produce same values for operands with same index. matchEqualDefs used to return true for any two values from different instructions that produce same values. Fix this by checking if values are defined by operands with the same index. Differential Revision: https://reviews.llvm.org/D107362	2021-08-05 15:05:45 +02:00
Dominik Montada	cc947e29ea	[GlobalISel] Combine shr(shl x, c1), c2 to G_SBFX/G_UBFX Reviewed By: foad Differential Revision: https://reviews.llvm.org/D107330	2021-08-05 13:52:10 +02:00
Fraser Cormack	0b8471e91b	[SelectionDAG] Correctly determine the VECREDUCE_SEQ_FMUL action The LegalizeAction for this node should follow the logic for `VECREDUCE_SEQ_FADD` and be determined using the vector operand's type. There isn't an in-tree target that makes use of this, but I think it's safe to say this is how it should behave, should a target want to customize the action for this node. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107478	2021-08-05 09:42:33 +01:00
Fangrui Song	a194438615	[CodeGen] Add -align-loops to `lib/CodeGen/CommandFlags.cpp`. It can replace -x86-experimental-pref-loop-alignment=. The loop alignment is only used by MachineBlockPlacement. The implementation uses a new `llvm::TargetOptions` for now, as an IR function attribute/module flags metadata may be overkill. This is the llvm part of D106701.	2021-08-04 12:45:18 -07:00
Craig Topper	c23405174a	[DAGCombiner][AMDGPU] Canonicalize constants to the RHS of MULHU/MULHS. This allows special constants like 0 to be recognized. It's also expected by isel patterns if a target had mulh-with-immediate instructions. The commuting done by tablegen won't commute patterns with immediates since it expects DAGCombine to have done it. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D107486	2021-08-04 11:39:23 -07:00
David Green	eeddcba525	[RDA] Attempt to make RDA subreg aware This attempts to make more of RDA aware of potentially overlapping subregisters. Some of this was already in place, with it iterating through MCRegUnitIterators. This also replaces calls to LiveRegs.contains(..) with !LiveRegs.available(..), and updates the isValidRegUseOf and isValidRegDefOf to search subregs. Differential Revision: https://reviews.llvm.org/D107351	2021-08-04 14:21:32 +01:00
Serge Pavlov	0c28a7c990	Revert "Introduce intrinsic llvm.isnan" This reverts commit `16ff91ebcc`. Several errors were reported mainly test-suite execution time. Reverted for investigation.	2021-08-04 17:18:15 +07:00
Serge Pavlov	16ff91ebcc	Introduce intrinsic llvm.isnan Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. 
Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution by the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where is lowered in target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-04 15:27:49 +07:00
Heejin Ahn	9bd02c433b	[WebAssembly] Misc. cosmetic changes in EH (NFC) - Rename `wasm.catch` intrinsic to `wasm.catch.exn`, because we are planning to add a separate `wasm.catch.longjmp` intrinsic which returns two values. - Rename several variables - Remove an unnecessary parameter from `canLongjmp` and `isEmAsmCall` from LowerEmscriptenEHSjLj pass - Add `-verify-machineinstrs` in a test for a safety measure - Add more comments + fix some errors in comments - Replace `std::vector` with `SmallVector` for cases likely with small number of elements - Renamed `EnableEH`/`EnableSjLj` to `EnableEmEH`/`EnableEmSjLj`: We are soon going to add `EnableWasmSjLj`, so this makes the distinction clearer Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D107405	2021-08-03 21:03:46 -07:00
Arthur Eubanks	ad25344620	[MC][CodeGen] Emit constant pools earlier Previously we would emit constant pool entries for ldr inline asm at the very end of AsmPrinter::doFinalization(). However, if we're emitting dwarf aranges, that would end all sections with aranges. Then if we have constant pool entries to be emitted in those same sections, we'd hit an assert that the section has already been ended. We want to emit constant pool entries before emitting dwarf aranges. This patch splits out arm32/64's constant pool entry emission into its own MCTargetStreamer virtual method. Fixes PR51208 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D107314	2021-08-03 20:55:31 -07:00
Simon Pilgrim	11396641e4	[DAG] Cleanup DAGCombiner::CombineConsecutiveLoads early-outs. NFCI. We had some similar hasOneUse/isNON_EXTLoad early-outs spread out over different parts of the method - we should pull them all together. Noticed while triaging PR45116	2021-08-03 13:47:55 +01:00
Eli Friedman	1f62af6346	[AArch64][SelectionDAG] Support passing/returning scalable vectors with unusual types. This adds handling for two cases: 1. A scalable vector where the element type is promoted. 2. A scalable vector where the element count is odd (or more generally, not divisible by the element count of the part type). (Some element types still don't work; for example, <vscale x 2 x i128>, or <vscale x 2 x fp128>.) Differential Revision: https://reviews.llvm.org/D105591	2021-08-02 15:53:16 -07:00
Max Kazantsev	c5b63714b5	[GC][NFC] Make getGCStrategy by name available in IR We might want to use info from GC strategy in middle end analysis. The motivation for this is provided in D99135: we may want to ask a GC if it's going to work with a given pointer (currently this code makes a naive check by the method name). Differential Revision: https://reviews.llvm.org/D100559 Reviewed By: reames	2021-08-02 14:26:04 +07:00
Matt Arsenault	ebc17a0d68	GlobalISel: Scalarize unaligned vector stores This has the same problems and limitations as the load path.	2021-07-31 10:37:15 -04:00
Simon Pilgrim	3a7c82efb8	[DAG] isGuaranteedNotToBeUndefOrPoison - handle ISD::BUILD_VECTOR nodes If all demanded elements of the BUILD_VECTOR pass a isGuaranteedNotToBeUndefOrPoison check, then we can treat this specific demanded use of the BUILD_VECTOR as guaranteed not to be undef or poison either. Differential Revision: https://reviews.llvm.org/D107174	2021-07-31 15:08:25 +01:00
Matt Arsenault	bc2cb91a20	GlobalISel: Have lowerStore handle some unaligned stores This is NFC until some of the AMDGPU legalization rules are ripped out.	2021-07-31 10:01:42 -04:00
Alexandros Lamprineas	7d940432c4	[AArch64] Legalize MVT::i64x8 in DAG isel lowering This patch legalizes the Machine Value Type introduced in D94096 for loads and stores. A new target hook named getAsmOperandValueType() is added which maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization. Differential Revision: https://reviews.llvm.org/D94097	2021-07-31 09:51:28 +01:00
Alexandros Lamprineas	3094e5389b	[AArch64] Add a Machine Value Type for 8 consecutive registers Adds MVT::i64x8, a Machine Value Type needed for lowering inline assembly operands which materialize a sequence of eight general purpose registers. Differential Revision: https://reviews.llvm.org/D94096	2021-07-31 09:51:28 +01:00
Rahman Lavaee	2256b359d7	Explain the symbols of basic block clusters with an example in the header comments. This prevents confusion with the ``labels`` option. Reviewed By: snehasish Differential Revision: https://reviews.llvm.org/D107128	2021-07-30 12:08:04 -07:00
Simon Pilgrim	3c0b596ecc	SelectionDAGDumper.cpp - remove nested if-else return chain. NFCI. Match style and don't use an else after a return.	2021-07-30 19:23:05 +01:00
Simon Pilgrim	986841cca2	SelectionDAGDumper.cpp - printrWithDepthHelper - remove dead code. NFCI. Fixes coverity warning - we have an early-out for unsigned depth == 0, so the depth < 1 early-out later on is dead code.	2021-07-30 19:23:04 +01:00
Matt Arsenault	e46badd4e9	GlobalISel: Have lowerLoad scalarize unaligned vectors This could be smarter by picking an ideal type, or at least splitting the vector in half first. Also handles lower for non-power-of-2, non-extending vector loads. Currently this just avoids failing to legalize some odd vector AMDGPU tests, but is a step towards removing the split logic from the NarrowScalar logic.	2021-07-30 13:23:29 -04:00
Matt Arsenault	f19226dda5	GlobalISel: Have load lowering handle some unaligned accesses The code for splitting an unaligned access into 2 pieces is essentially the same as for splitting a non-power-of-2 load for scalars. It would be better to pick an optimal memory access size and directly use it, but splitting in half is what the DAG does. As-is this fixes handling of some unaligned sextload/zextloads for AMDGPU. In the future this will help drop the ugly abuse of narrowScalar to handle splitting unaligned accesses.	2021-07-30 12:55:58 -04:00
Adrian Prantl	c5d84d2eb3	GlobalISel/AArch64: don't optimize away redundant branches at -O0 This patch prevents GlobalISel from optimizing out redundant branch instructions when compiling without optimizations. The motivating example is code like the following common pattern in Swift, where users expect to be able to set a breakpoint on the early exit: public func f(b: Bool) { guard b else { return // I would like to set a breakpoint here. } ... } The patch modifies two places in GlobalISel: The first one is in IRTranslator.cpp where the removal of redundant branches is made conditional on the optimization level. The second one is in AArch64InstructionSelector.cpp where an -O0 only optimization is being removed. Disabling these optimizations increases code size at -O0 by ~8%. However, doing so improves debuggability, and debug builds are the primary reason why developers compile without optimizations. We thus concluded that this is the right trade-off. rdar://79515454 This tentatively reapplies the patch without modifications; the LLDB test that has blocked this from landing previously has since been modified to hopefully no longer be sensitive to this change. Differential Revision: https://reviews.llvm.org/D105238	2021-07-29 16:04:22 -07:00
Amara Emerson	c54d5c9756	[GlobalISel] Use GMergeLikeOp to simplify a combine. NFC.	2021-07-29 13:53:16 -07:00
Amara Emerson	532c458fa8	[GlobalISel] Add GPtrAdd and use it in some combines.	2021-07-29 12:04:02 -07:00
Jessica Clarke	95ef464ac9	Handle subregs and superregs in callee-saved register mask If a target lists both a subreg and a superreg in a callee-saved register mask, the prolog will spill both aliasing registers. Instead, don't spill the subreg if a superreg is being spilled. This case is hit by the PowerPC SPE code, as well as a modified RISC-V backend for CHERI I maintain out of tree. Reviewed By: jhibbits Differential Revision: https://reviews.llvm.org/D73170	2021-07-29 16:53:29 +01:00
Sanjay Patel	fa6b2c9915	[DAGCombiner] don't try to partially reduce add-with-overflow ops This transform was added with D58874, but there were no tests for overflow ops. We need to change this one way or another because it can crash as shown in: https://llvm.org/PR51238 Note that if there are no uses of an overflow op's bool overflow result, we reduce it to a regular math op, so we continue to fold that case either way. If we have uses of both the math and the overflow bool, then we are likely not saving anything by creating an independent sub instruction as seen in the test diffs here. This patch makes the behavior in SDAG consistent with what we do in instcombine AFAICT. Differential Revision: https://reviews.llvm.org/D106983	2021-07-29 08:51:54 -04:00
Guozhi Wei	50b6273145	[MBP] findBestLoopTopHelper should exit if OldTop is not a chain header Function findBestLoopTopHelper tries to find a new loop top block which can also fall through to OldTop, but it's impossible if OldTop is not a chain header, so it should exit immediately. Differential Revision: https://reviews.llvm.org/D106329	2021-07-28 19:00:45 -07:00
Jeremy Morse	8612417e5a	[DebugInfo][InstrRef] Don't break up ret-sequences on debug-info instrs When we have a terminator sequence (i.e. a tailcall or return), MIIsInTerminatorSequence is used to work out where the preceding ABI-setup instructions end, i.e. the parts that were glued to the terminator instruction. This allows LLVM to split blocks safely without having to worry about ABI stuff. The function only ignores DBG_VALUE instructions, meaning that the two debug instructions I recently added can end terminator sequences early, causing various MachineVerifier errors. This patch promotes the test for debug instructions from "isDebugValue" to "isDebugInstr", thus avoiding any debug-info interfering with this function. Differential Revision: https://reviews.llvm.org/D106660	2021-07-28 15:56:00 +01:00
Juneyoung Lee	4f71f59bf3	[DAGCombiner] Fold SETCC(FREEZE(x),const) to FREEZE(SETCC(x,const)) if SETCC is used by BRCOND This patch adds a peephole optimization `SETCC(FREEZE(x),const)` => `FREEZE(SETCC(x,const))` if the SETCC is only used by BRCOND. Combined with `BRCOND(FREEZE(X)) => BRCOND(X)`, this leads to a nice improvement in the generated assembly when x is a masked loaded value. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D105344	2021-07-28 09:22:15 +09:00
Anirudh Prasad	a8cfa4b9bd	[SystemZ][z/OS] Initial code to generate assembly files on z/OS - This patch consists of the bare basic code needed in order to generate some assembly for the z/OS target. - Only the .text and the .bss sections are added for now. - The relevant MCSectionGOFF/Symbol interfaces have been added. This enables us to print out the GOFF machine code sections. - This patch enables us to add simple lit tests wherever possible, and contribute to the testing coverage for the z/OS target - Further improvements and additions will be made in future patches. Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D106380	2021-07-27 11:29:15 -04:00
Jeremy Morse	ec9da51724	[DebugInfo][InstrRef] Correctly update DBG_PHIs during instr scheduling Avoid several crashes when DBG_INSTR_REF and DBG_PHI instructions are fed to the instruction scheduler. DBG_INSTR_REFs should be treated like DBG_LABELs, and just ignored for the purpose of scheduling [0]. DBG_PHIs however behave much more like DBG_VALUEs: they refer to register operands, and if some register defs get shuffled around during instruction scheduling, there's a risk that the debug instr will refer to the wrong value. There's already a facility for updating DBG_VALUEs to reflect this; add DBG_PHI to the list of instructions that it will update. [0] Suboptimal, but it's what instr scheduling does right now. Differential Revision: https://reviews.llvm.org/D106663	2021-07-27 15:12:46 +01:00
Jeremy Morse	7dc9d73731	[DebugInfo][InstrRef] Handle llvm.frameaddress intrinsics gracefully When working out which instruction defines a value, the instruction-referencing variable location code has a few special cases for physical registers: * Arguments are never defined by instructions, * Constant physical registers always read the same value, are never def'd This patch adds a third case for the llvm.frameaddress intrinsics: you can read the framepointer in any block if you so choose, and use it as a variable location, as shown in the added test. This rather violates one of the assumptions behind instruction referencing, that LLVM-ir shouldn't be able to read from an arbitrary register at some arbitrary point in the program. The solution for now is to just emit a DBG_PHI that reads the register value: this works, but if we wanted to do something clever with DBG_PHIs in the future then this would probably get in the way. As it stands, this patch avoids a crash. Differential Revision: https://reviews.llvm.org/D106659	2021-07-27 13:44:37 +01:00
Jay Foad	dc4ca0dbbc	[GlobalISel] Constant fold G_SITOFP and G_UITOFP in CSEMIRBuilder Differential Revision: https://reviews.llvm.org/D104528	2021-07-27 11:27:58 +01:00
Fraser Cormack	7b33b849bd	[SelectionDAG] Support scalable splats in U(ADD|SUB)SAT combines This patch builds on top of D106575 in which scalable-vector splats were supported in `ISD::matchBinaryPredicate`. It teaches the DAGCombiner how to perform a variety of the pre-existing saturating add/sub combines on scalable-vector types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106652	2021-07-27 10:52:34 +01:00
David Green	e00d67dc48	[NFC] Reflow some debug messages.	2021-07-27 10:11:51 +01:00
Johannes Doerfert	25a3130d89	[Local] Do not introduce a new `llvm.trap` before `unreachable` This is the second attempt to remove the `llvm.trap` insertion after https://reviews.llvm.org/rGe14e7bc4b889dfaffb7180d176a03311df2d4ae6 reverted the first one. It is not clear what the exact issue was back then and it might already be gone by now, it has been >5 years after all. Replaces D106299. Differential Revision: https://reviews.llvm.org/D106308	2021-07-26 23:33:36 -05:00
Mitch Phillips	ae70b211eb	Revert "[GlobalISel] Add scalar widening for G_MERGE_VALUES destination" This reverts commit `0a37163d1d`. Reason: Broke the sanitizer msan bots. More details are available in the original Phabricator review: https://reviews.llvm.org/D106814.	2021-07-26 19:52:12 -07:00
Jon Roelofs	f2e8e46d78	Revert "[AArch64][GlobalISel] Legalize ctpop s128" This reverts commit `97e95fea53`. It broke test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll. Not sure why I didn't see that.	2021-07-26 17:06:43 -07:00
Jessica Paquette	0a37163d1d	[GlobalISel] Add scalar widening for G_MERGE_VALUES destination This adds support for the case where WideSize = DstSize + K * SrcSize In this case, we can pad the G_MERGE_VALUES instruction with K extra undef values with width SrcSize. Then the destination can be handled via widenScalarDst. Differential Revision: https://reviews.llvm.org/D106814	2021-07-26 17:00:00 -07:00
Jon Roelofs	97e95fea53	[AArch64][GlobalISel] Legalize ctpop s128 Differential revision: https://reviews.llvm.org/D106494	2021-07-26 16:33:50 -07:00
Amara Emerson	c658b472f3	[GlobalISel] Add a constant folding combine. Use it in the AArch64 post-legal combiner. These don't always get folded because when the instructions are created the constants are obscured by artifacts. Differential Revision: https://reviews.llvm.org/D106776	2021-07-26 14:53:33 -07:00
Heejin Ahn	a48ee9f255	[WebAssembly] Remove dominator dependency in WasmEHPrepare (NFC) Dominator trees were previously used for an optimization related to `wasm.lsda` but the optimization was removed in D97309. Currently dominators are not doing anything in this pass. Also removes some `include` lines without which it compiles. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D106811	2021-07-26 14:45:13 -07:00
Matheus Izvekov	f84c70a379	[CodeView] Saturate values bigger than supported by APInt. This fixes an assert firing when compiling code which involves 128 bit integrals. This would trigger runtime checks similar to this: ``` Assertion failed: getMinSignedBits() <= 64 && "Too many bits for int64_t", file llvm/include/llvm/ADT/APInt.h, line 1646 ``` To get around this, we just saturate those big values. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D105320	2021-07-26 22:15:26 +02:00
Craig Topper	14e356d121	[TypePromotion] Remove redundant if. NFC The same condition was checked in the previous if. Maybe this was a bad merge resolution?	2021-07-26 11:47:25 -07:00
Amara Emerson	dec34104bf	[GlobalISel] Add combine for merge(unmerge) and use AArch64 postlegal-combiner. Differential Revision: https://reviews.llvm.org/D106761	2021-07-26 10:37:31 -07:00
Stephen Tozer	31e7551217	[DebugInfo] Correctly update debug users of SSA values in tail duplication During tail duplication, SSA values may be updated and have their uses replaced with a virtual register, and any debug instructions that use that value are deleted. This patch fixes the implementation of the debug instruction deletion to work correctly for debug instructions that use the SSA value multiple times, by batching deletions so that we don't attempt to delete the same instruction twice. Differential Revision: https://reviews.llvm.org/D106557	2021-07-26 17:27:57 +01:00
Jeremy Morse	f86694cb80	[InstrRef][AArch64][1/4] Accept constant physreg variable locations Late in SelectionDAG we join up instruction numbers with their defining instructions, if it couldn't be done during the main part of SelectionDAG. One exception is function arguments, where we have to point a DBG_PHI instruction at the incoming live register, as they don't have a defining instruction. This patch adds another exception, for constant physregs, like aarch64 has. It may seem wasteful to use two instructions where we could use a single DBG_VALUE, however the whole point of instruction referencing is to decouple the identification of values from the specification of where variable location ranges start. (Part of my aarch64 work to ease adoption of instruction referencing, as in the meta comment on D104520) Differential Revision: https://reviews.llvm.org/D104520	2021-07-26 15:26:15 +01:00
Fraser Cormack	f924a3d474	[SelectionDAG] Support scalable-vector splats in yet more cases This patch extends support for (scalable-vector) splats in the DAGCombiner via the `ISD::matchBinaryPredicate` function, which enable a variety of simple combines of constants. Users of this function may now have to distinguish between `BUILD_VECTOR` and `SPLAT_VECTOR` vector operands. The way of dealing with this in-tree follows the approach added for `ISD::matchUnaryPredicate` implemented in D94501. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106575	2021-07-26 10:15:08 +01:00
Esme-Yi	0d3e4d9d4d	[Debug-Info][llvm-dwarfdump] Don't use DW_FORM_data4/8 to encode the constants for DW_AT_data_member_location. Summary: In DWARF v3, DW_FORM_data4/8 in DW_AT_data_member_location are interpreted as location list pointers. Interpreting constants as pointers is not expected, so we use DW_FORM_udata to encode the constants. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D105687	2021-07-26 03:47:02 +00:00
Simon Pilgrim	478b22d95a	[CGP] despeculateCountZeros - Don't create is-zero branch if cttz/ctlz source is known non-zero If value tracking can confirm that the cttz/ctlz source is known non-zero then we don't need to create a branch (which DAG will struggle to recover from). Differential Revision: https://reviews.llvm.org/D106685	2021-07-24 13:11:49 +01:00
Simon Pilgrim	c261a06b7a	[DAG] Add initial SelectionDAG::isGuaranteedNotToBeUndefOrPoison framework (PR51129) I've set up the basic framework for the isGuaranteedNotToBeUndefOrPoison call and updated DAGCombiner::visitFREEZE to use it; further opcodes can be handled when we have test coverage. I'm not aware of any vector test freeze coverage so the DemandedElts (and the Depth) args are not being used yet - but they are in place. SelectionDAG::isGuaranteedNotToBePoison wrappers have also been added. Differential Revision: https://reviews.llvm.org/D106668	2021-07-24 11:36:35 +01:00
David Truby	1528a4d400	[llvm][sve] Lowering for VLS truncating stores This adds custom lowering for truncating stores when operating on fixed length vectors in SVE. It also includes a DAG combine to fold extends followed by truncating stores into non-truncating stores in order to prevent this pattern appearing once truncating stores are supported. Currently truncating stores are not used in certain cases where the size of the vector is larger than the target vector width. Differential Revision: https://reviews.llvm.org/D104471	2021-07-23 14:04:55 +01:00
Paulo Matos	46667a1003	[WebAssembly] Implementation of global.get/set for reftypes in LLVM IR Reland of `31859f896`. This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and lowering methods for loads and stores of reference types from IR globals. Once the lowering creates the new nodes, tablegen pattern matches those and converts them to Wasm global.get/set. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D104797	2021-07-22 22:07:24 +02:00
Roman Lebedev	af8fa36bf0	[NFCI][TLI] prepare[US]REMEqFold(): don't add nonsensical 'exact' flag to rotates created As pointed out by Craig Topper.	2021-07-22 23:02:58 +03:00
Simon Tatham	bd41136746	[clang] Use i64 for the !srcloc metadata on asm IR nodes. This is part of a patch series working towards the ability to make SourceLocation into a 64-bit type to handle larger translation units. !srcloc is generated in clang codegen, and pulled back out by llvm functions like AsmPrinter::emitInlineAsm that need to report errors in the inline asm. From there it goes to LLVMContext::emitError, is stored in DiagnosticInfoInlineAsm, and ends up back in clang, at BackendConsumer::InlineAsmDiagHandler(), which reconstitutes a true clang::SourceLocation from the integer cookie. Throughout this code path, it's now 64-bit rather than 32, which means that if SourceLocation is expanded to a 64-bit type, this error report won't lose half of the data. The compiler will tolerate both of i32 and i64 !srcloc metadata in input IR without faulting. Test added in llvm/MC. (The semantic accuracy of the metadata is another matter, but I don't know of any situation where that matters: if you're reading an IR file written by a previous run of clang, you don't have the SourceManager that can relate those source locations back to the original source files.) Original version of the patch by Mikhail Maltsev. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D105491	2021-07-22 10:24:52 +01:00
ShihPo Hung	8d86562e5f	[RegisterCoalescer] Make resolveConflicts aware of earlyclobber Prior to this patch, it skipped the instruction defining VNI when checking if the tainted lanes are used. In the given example, VRGATHER is an illegal instruction because its DstReg overlaps with SrcReg. Therefore we need to check the defining instruction as well when there is an earlyclobber constraint. Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D105684	2021-07-22 12:11:10 +08:00
Stanislav Mekhanoshin	c54c76037b	Prevent dead uses in register coalescer after rematerialization The coalescer does not check if register uses are available at the point of rematerialization. If it attempts to rematerialize an instruction with such uses it can end up with use without a def. LiveRangeEdit does such check during rematerialization, so just call LiveRangeEdit::allUsesAvailableAt() to avoid the problem. Differential Revision: https://reviews.llvm.org/D106396	2021-07-21 15:19:55 -07:00
Eli Friedman	0ca46a1757	[SelectionDAG] Fix the representation of ISD::STEP_VECTOR. The existing rule about the operand type is strange. Instead, just say the operand is a TargetConstant with the right width. (Legalization ignores TargetConstants, so it doesn't matter if that width is legal.) Highlights: 1. I had to substantially rewrite the AArch64 isel patterns to expect a TargetConstant. Nothing too exotic, but maybe a little hairy. Maybe worth considering a target-specific node with some dagcombines instead of this complicated nest of isel patterns. 2. Our behavior on RV32 for vectors of i64 has changed slightly. In particular, we correctly preserve the width of the arithmetic through legalization. This changes the DAG a bit. Maybe room for improvement here. 3. I explicitly defined the behavior around overflow. This is necessary to make the DAGCombine transforms legal, and I don't think it causes any practical issues. Differential Revision: https://reviews.llvm.org/D105673	2021-07-21 10:58:40 -07:00
Jon Roelofs	4de74a7c4d	[MachineVerifier] Make INSERT_SUBREG diagnostic respect operand 2 subregs This came out of post-commit review: https://reviews.llvm.org/D105953#inline-1012919 Thanks uabelho!	2021-07-21 08:47:17 -07:00
Guillaume Chatelet	d6da02d952	[llvm] Add enum iteration to Sequence This patch allows iterating typed enum via the ADT/Sequence utility. It also changes the original design to better separate concerns: - `StrongInt` only deals with safe `intmax_t` operations, - `SafeIntIterator` presents the iterator and reverse iterator interface but only deals with safe `StrongInt` internally. - `iota_range` only deals with `SafeIntIterator` internally. This design ensures that operations are always valid. In particular, "Out of bounds" assertions fire when: - the `value_type` is not representable as an `intmax_t` - iterator operations make internal computation underflow/overflow - the internal representation cannot be converted back to `value_type` Differential Revision: https://reviews.llvm.org/D106279	2021-07-21 12:48:53 +00:00
Tim Northover	291e0daa6e	AArch64: support 8 & 16-bit atomic operations in GlobalISel We have SelectionDAG patterns for 8 & 16-bit atomic operations, but they assume the value types will have been legalized to 32-bits. So this adds the ability to widen them to both AArch64 & generic GISel infrastructure.	2021-07-21 09:35:14 +01:00
Jon Roelofs	be8738324c	[MachineVerifier] Diagnose invalid INSERT_SUBREGs Differential revision: https://reviews.llvm.org/D105953	2021-07-20 17:32:29 -07:00
Jon Roelofs	a14b4e34a4	[GlobalISel] Tail call memcpy/memmove/memset even in the presence of copies Differential revision: https://reviews.llvm.org/D105382	2021-07-20 17:04:33 -07:00
Jon Roelofs	afaf92826e	[GlobalISel] Mark memcpy/memmove/memset as thisreturn https://clang.godbolt.org/z/9az64j8W6 rdar://77466123 Differential revision: https://reviews.llvm.org/D105370	2021-07-20 17:04:33 -07:00
Fangrui Song	3924877932	[IR] Rename `comdat noduplicates` to `comdat nodeduplicate` In the textual format, `noduplicates` means no COMDAT/section group deduplication is performed. Therefore, if both sets of sections are retained, and they happen to define strong external symbols with the same names, there will be a duplicate definition linker error. In PE/COFF, the selection kind lowers to `IMAGE_COMDAT_SELECT_NODUPLICATES`. The name describes the corollary instead of the immediate semantics. The name can cause confusion to other binary formats (ELF, wasm) which have implemented/ want to implement the "no deduplication" selection kind. Rename it to be clearer. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D106319	2021-07-20 12:47:10 -07:00
Stefan Pintilie	1a6dc92be7	[PowerPC] Inefficient register allocation of ACC registers results in many copies. ACC registers are a combination of four consecutive vector registers. If the vector registers are assigned first this often forces a number of copies to appear just before the ACC register is created. If the ACC register is assigned first then fewer copies are generated when the vector registers are assigned. This patch tries to force the register allocator to assign the ACC registers first and then the UACC registers and then the vector pair registers. It does this by changing the priority of the register classes. This patch also adds hints to help the register allocator assign UACC registers from known ACC registers and vector pair registers from known UACC registers. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D105854	2021-07-20 10:53:40 -05:00
Jeremy Morse	241f3e386c	[DebugInfo][InstrRef] Fix a broken substitution method, add test coverage This patch fixes a clearly-broken function that I absent-mindedly bodged many months ago. Over in D85749 I landed the substituteDebugValuesForInst, that creates substitution records for all the def operands from one debug-labelled instruction to the new one. Unfortunately it would crash if the two instructions had different numbers of operands; I tried to fix this in `537f0fbe82` by adding a "max operand" parameter to the method, but then didn't actually change the loop bound to take account of this. It passed all the tests because.... well there wasn't any real test coverage of this method. This patch fixes up the loop to be bounded by the MaxOperand bound; and adds test coverage for the x86-fixup-LEAs calls to this method, so that it's actually tested. Differential Revision: https://reviews.llvm.org/D105820	2021-07-20 11:45:13 +01:00
Matt Arsenault	c9ec807b11	CodeGen: Make MachineOptimizationRemarkEmitterPass a CFG analysis This avoids rerunning it a few times.	2021-07-19 21:08:26 -04:00
Matt Arsenault	904dab55ab	GlobalISel: Remove some mystery code that clears isReturned I don't understand what this is going for, and haven't found an analog in DAG code. No tests fail with this removed.	2021-07-19 20:21:05 -04:00
Amy Huang	fd972bb9fd	Revert "[llvm][sve] Lowering for VLS truncating stores" because it causes a seg fault (see https://reviews.llvm.org/D104471). This reverts commit `c305557acd`.	2021-07-19 11:03:33 -07:00
Amara Emerson	03cdb5221d	[GlobalISel] Fix load-or combine moving loads across potential aliasing stores. Although this combine checks that there's no load folding barriers between the loads that it's trying to merge, it was inserting the load at the MIRBuilder's default insertion point, which is the G_OR use inst. This was causing a miscompile in the test suite's SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-bswap-2 Differential Revision: https://reviews.llvm.org/D106251	2021-07-19 10:23:23 -07:00
Craig Topper	50302feb1d	[SelectionDAG][RISCV] Use isSExtCheaperThanZExt to control whether sext or zext is used for constant folding any_extend. RISCV would prefer a sign extended constant since that works better with our constant materialization. We have an existing TLI hook we use to control sign extension of setcc operands in type legalization. That hook happens to do the right check we need here, but might be straying from its original purpose. With only RISCV defining this hook in tree, I wasn't sure if it was worth adding another hook with identical behavior. This is an alternative to D105785 where I tried to handle this in the RISCV backend by not creating ANY_EXTENDs in some places. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D105918	2021-07-19 09:25:28 -07:00
Matt Arsenault	67d6132463	GlobalISel: Preserve memory types for implicit sret load/stores	2021-07-19 11:52:42 -04:00
Matt Arsenault	9236125ec8	GlobalISel: Preserve LLT when bitcasting loads and stores This also avoids improperly legalizing some truncating vector stores.	2021-07-19 11:30:14 -04:00
Roman Lebedev	5b51bd1878	[TLI] prepareSREMEqFold(): use correct VT for the final VSELECT (PR51133) We were using the wrong VT for this final VSELECT, it should be in the final comparison VT, not the source value's VT. Fixes https://bugs.llvm.org/show_bug.cgi?id=51133	2021-07-19 16:44:00 +03:00
Eli Friedman	6601be4419	[X86] Remove incorrect use of known bits in shuffle simplification. This reverts commit `2a419a0b99`. The result of a shufflevector must not propagate poison from any element other than the one noted in the shuffle mask. The regressions outside of fptoui-may-overflow.ll can probably be recovered some other way; for example, using isGuaranteedNotToBePoison. See discussion on https://reviews.llvm.org/D106053 for more background. Differential Revision: https://reviews.llvm.org/D106222	2021-07-18 18:13:11 -07:00
Simon Pilgrim	fd7a54c709	[DAG] DAGCombiner::foldSelectOfBinops - propagate the common flags to the merged binop As discussed on D106058 - we were failing to keep the common flags. This matches the behaviour in InstCombinerImpl::foldSelectOpOp.	2021-07-18 18:38:59 +01:00
Simon Pilgrim	5643be96bc	[DAG] Enable foldSelectOfBinops on select(setcc(),binop(),binop()) calls	2021-07-18 18:38:59 +01:00
Simon Pilgrim	1a6a8443c2	[DAG] Move select(cc, binop(), binop()) folds into DAGCombiner::foldSelectOfBinops. NFCI. I'm going to extend the functionality started in D106058 so move the folds into their own method to reduce the amount of code in DAGCombiner::visitSELECT	2021-07-18 14:54:41 +01:00
Amara Emerson	4c55cdb00a	[GlobalISel] Fix known bits for G_BSWAP and G_BITREVERSE not doing anything. llvm::KnownBits::byteSwap() and reverse() don't modify in-place, so we weren't actually computing anything. This was causing a miscompile on an arm64 stage2 bootstrap clang build.	2021-07-17 23:07:16 -07:00
Kazu Hirata	1993b73755	[Analysis, CodeGen] Remove getHotSucc (NFC) These functions seem to be unused for at least 5 years.	2021-07-17 07:31:36 -07:00
Amara Emerson	9637848f51	[GlobalISel] Fix non-pow-2 legalization of s56 stores. s56 stores are broken down into s32 + s24 stores. During this step both of those new stores use an anyextended s64 value, resulting in truncating stores. With s56, the s24 requires another lower step to make it legal, and we were crashing because we didn't expect non-pow-2 stores to also be truncating as well. Differential Revision: https://reviews.llvm.org/D106183	2021-07-16 13:29:49 -07:00
Guozhi Wei	5609c8b607	[X86FixupLEAs] Try again to transform the sequence LEA/SUB to SUB/SUB This patch transforms the sequence lea (reg1, reg2), reg3 sub reg3, reg4 to two sub instructions sub reg1, reg4 sub reg2, reg4 A similar optimization can also be applied to the LEA/ADD sequence. The modification to TwoAddressInstructionPass ensures that the operands of the ADD instruction have the expected order (the dest register of LEA should be the src register of ADD). Differential Revision: https://reviews.llvm.org/D104684	2021-07-16 10:16:03 -07:00
Jon Roelofs	6c40abb6fe	Revert "[MachineVerifier] Diagnose invalid INSERT_SUBREGs" This reverts commit `dd57ba1a17`. It broke some tests: http://45.33.8.238/linux/51314/step_12.txt	2021-07-16 09:53:55 -07:00
Simon Pilgrim	95995673d1	[DAG] SelectionDAG::MaskedElementsAreZero - assert we're calling with a vector. NFCI. Add an assertion that we're calling MaskedElementsAreZero with a vector op and that the DemandedElts arg is a matching width. Makes the error a lot easier to grok when something else accidentally gets used.	2021-07-16 17:43:35 +01:00
Jon Roelofs	dd57ba1a17	[MachineVerifier] Diagnose invalid INSERT_SUBREGs Differential revision: https://reviews.llvm.org/D105953	2021-07-16 09:43:12 -07:00
Matt Arsenault	5a0d940f2a	GlobalISel: Preserve memory type for memset expansion	2021-07-16 11:41:32 -04:00
Matt Arsenault	f57f8f7ccc	GlobalISel: Remove dead function	2021-07-16 08:59:25 -04:00
Jeremy Morse	231bf52119	[InstrRef][FastISel] Support emitting DBG_INSTR_REF from fast-isel If you attach __attribute__((optnone)) to a function when using optimisations, that function will use fast-isel instead of the usual SelectionDAG method. This is a problem for instruction referencing, because it means DBG_VALUEs of virtual registers will be created, triggering some safety assertions in LiveDebugVariables. Those assertions exist to detect exactly this scenario, where an unexpected piece of code is generating virtual register references in instruction referencing mode. Fix this by transforming the DBG_VALUEs created by fast-isel into half-formed DBG_INSTR_REFs, after which they get patched up in finalizeDebugInstrRefs. The test modified adds a fast-isel mode to the instruction referencing isel test. Differential Revision: https://reviews.llvm.org/D105694	2021-07-16 13:56:15 +01:00
Matt Arsenault	a2d7ace3e3	GlobalISel: Surface offsets parameter from ComputeValueVTs	2021-07-15 19:11:40 -04:00
Matt Arsenault	e91da668d0	GlobalISel: Track argument pointeriness with arg flags Since we're still building on top of the MVT based infrastructure, we need to track the pointer type/address space on the side so we can end up with the correct pointer LLTs when interpreting CCValAssigns.	2021-07-15 19:11:40 -04:00
Amara Emerson	4e3dc6b8dd	GlobalISel: Introduce GenericMachineInstr classes and derivatives for idiomatic LLVM RTTI. This adds some level of type safety, allows helper functions to be added for specific opcodes for free, and also allows us to succinctly check for class membership with the usual dyn_cast/isa/cast functions. To start off with, add variants for the different load/store operations with some places using it. Differential Revision: https://reviews.llvm.org/D105751	2021-07-15 15:21:57 -07:00
Jessica Paquette	5da0f9ab61	[GlobalISel] Fix infinite loop in reassociationCanBreakAddressingModePattern It didn't update the opcode while walking through G_INTTOPTR/G_PTRTOINT. Differential Revision: https://reviews.llvm.org/D106080	2021-07-15 10:09:07 -07:00
Simon Pilgrim	0aece73aba	[DAG] Fold select(cond,binop(x,y),binop(x,z)) -> binop(x,select(cond,y,z)) Similar to the folds performed in InstCombinerImpl::foldSelectOpOp, this attempts to push a select further up to help merge a pair of binops. I'm primarily interested in select(cond,add(x,y),add(x,z)) folds to help expose pointer math (see https://bugs.llvm.org/show_bug.cgi?id=51069 etc.) but I've tried to use the more generic isBinOp(). Differential Revision: https://reviews.llvm.org/D106058	2021-07-15 16:08:30 +01:00
Tim Northover	5d7632ee72	MachO: don't emit L... private symbols in do_not_dead_strip sections. The linker can sometimes drop the do_not_dead_strip if it can't associate the atom with a symbol (the other place to specify no dead-stripping in MachO files).	2021-07-15 14:40:43 +01:00
Djordje Todorovic	fa2daaeff8	[2/2][RemoveRedundantDebugValues] Add a Pass that removes redundant DBG_VALUEs This patch adds the forward scan for finding redundant DBG_VALUEs. This analysis aims to remove redundant DBG_VALUEs by going forward in the basic block and considering the first DBG_VALUE valid until its first (location) operand is clobbered/modified. For example: (1) DBG_VALUE $edi, !"var1", ... (2) <block of code that does not affect $edi> (3) DBG_VALUE $edi, !"var1", ... ... in this case, we can remove (3). Differential Revision: https://reviews.llvm.org/D105280	2021-07-15 00:08:31 -07:00
Kai Luo	b9c3941cd6	[PowerPC] Generate inlined quadword lock free atomic operations via AtomicExpand This patch uses AtomicExpandPass to implement quadword lock free atomic operations. It adopts the method introduced in https://reviews.llvm.org/D47882, which expands atomic operations post RA to avoid spilling that might prevent LL/SC progress. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D103614	2021-07-15 01:12:09 +00:00
Stanislav Mekhanoshin	76b7d3432e	[AMDGPU] Add TII::isIgnorableUse() to allow VOP rematerialization Any def of EXEC prevents rematerialization of any VOP instruction because of the physreg use. Create a callback to check if the physreg use can be ignored to allow rematerialization. Differential Revision: https://reviews.llvm.org/D105836	2021-07-14 13:03:58 -07:00
Eli Friedman	1e30bf8621	[SelectionDAG] Add an overload of getStepVector that assumes step 1. This is mostly a minor convenience, but the pattern seems frequent enough to be worthwhile (and we'll probably add more uses in the future). Differential Revision: https://reviews.llvm.org/D105850	2021-07-14 11:37:01 -07:00
Matt Arsenault	47269da5d8	GlobalISel: Handle lowering non-power-of-2 extloads	2021-07-14 11:54:11 -04:00
Djordje Todorovic	df686842bc	[RemoveRedundantDebugValues] Add a Pass that removes redundant DBG_VALUEs This new MIR pass removes redundant DBG_VALUEs. After the register allocator is done, more precisely, after the Virtual Register Rewriter, we end up having duplicated DBG_VALUEs, since some virtual registers are being rewritten into the same physical register as some existing DBG_VALUEs. Each DBG_VALUE should indicate (at least before the LiveDebugValues) a variable's assignment, but it is being clobbered for function parameters during the SelectionDAG since it generates new DBG_VALUEs after COPY instructions, even though the parameter has no assignment. For example, if we had a DBG_VALUE $regX as an entry debug value representing the parameter, and a COPY and after the COPY, DBG_VALUE $virt_reg, and after the virtregrewrite the $virt_reg gets rewritten into $regX, we'd end up having a redundant DBG_VALUE. This breaks the definition of the DBG_VALUE since some analysis passes might be built on top of that premise, and this patch tries to fix the MIR with respect to that. This first patch performs a backward scan, by trying to detect a sequence of consecutive DBG_VALUEs, and to remove all DBG_VALUEs describing one variable but the last one: For example: (1) DBG_VALUE $edi, !"var1", ... (2) DBG_VALUE $esi, !"var2", ... (3) DBG_VALUE $edi, !"var1", ... ... in this case, we can remove (1). By combining the forward scan that will be introduced in the next patch (from this stack), by inspecting the statistics, the RemoveRedundantDebugValues removes 15032 instructions by using gdb-7.11 as a testbed. Differential Revision: https://reviews.llvm.org/D105279	2021-07-14 04:29:42 -07:00
Ruiling Song	40e3df2a1b	[RegisterCoalescer] Resolve conflict based on liveness of subregister Currently we are resolving lane/subregister conflict by visiting instructions sequentially in current block to see whether there is any use of the tainted lanes. To save compile time, we are not doing further checks in successor blocks. This sounds reasonable without subregister liveness. But since we have added subregister liveness tracking capability to register coalescer, we can easily determine whether we have subregister liveness conflict by checking subranges. This would help coalescing more COPYs for targets that enable subregister liveness tracking. Reviewed by: arsenm, qcolombet Differential Revision: https://reviews.llvm.org/D104509	2021-07-14 14:43:22 +08:00
Hongtao Yu	74b99b5c2e	[CSSPGO] Do not import pseudo probe desc in thinLTO Previously we relied on pseudo probe descriptors to look up precomputed GUID during probe emission for inlined probes. Since we are moving to always using unique linkage names, GUID for functions can be computed in place from dwarf names. This eliminates the need of importing pseudo probe descs in thinlto, since those descs should be emitted by the original modules. This significantly reduces thinlto memory footprint in some extreme cases where the number of imported modules for a single module is massive. Test Plan: Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D105248	2021-07-13 18:26:36 -07:00
Matt Arsenault	eebe841a47	RegAlloc: Allow targets to split register allocation AMDGPU normally spills SGPRs to VGPRs. Previously, since all register classes are handled at the same time, this was problematic. We don't know ahead of time how many registers will need to be reserved to handle the spilling. If no VGPRs were left for spilling, we would have to try to spill to memory. If the spilled SGPRs were required for exec mask manipulation, it is highly problematic because the lanes active at the point of spill are not necessarily the same as at the restore point. Avoid this problem by fully allocating SGPRs in a separate regalloc run from VGPRs. This way we know the exact number of VGPRs needed, and can reserve them for a second run. This fixes the most serious issues, but it is still possible using inline asm to make all VGPRs unavailable. Start erroring in the case where we ever would require memory for an SGPR spill. This is implemented by giving each regalloc pass a callback which reports if a register class should be handled or not. A few passes need some small changes to deal with leftover virtual registers. In the AMDGPU implementation, a new pass is introduced to take the place of PrologEpilogInserter for SGPR spills emitted during the first run. One disadvantage of this is currently StackSlotColoring is no longer used for SGPR spills. It would need to be run again, which will require more work. Error if the standard -regalloc option is used. Introduce new separate -sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be controlled individually. PBQP is not currently supported, so this also prevents using the unhandled allocator.	2021-07-13 18:49:29 -04:00
Guillaume Chatelet	2c47b8847e	Revert "[llvm] Add enum iteration to Sequence" This reverts commit `a006af5d6e`.	2021-07-13 16:44:42 +00:00
Guillaume Chatelet	a006af5d6e	[llvm] Add enum iteration to Sequence This patch allows iterating typed enum via the ADT/Sequence utility. Differential Revision: https://reviews.llvm.org/D103900	2021-07-13 16:22:19 +00:00
Matt Arsenault	222fde1eec	GlobalISel: Use extension instead of merge with undef in common case This fixes not respecting signext/zeroext in these cases. In the anyext case, this avoids a larger merge with undef and should be a better canonical form. This should also handle this if a merge is needed, but I'm not aware of a case where that can happen. In a future change this will also allow AMDGPU to drop some custom code without introducing regressions.	2021-07-13 11:04:47 -04:00
Matt Arsenault	77a608d9de	GlobalISel: Remove getIntrinsicID utility function This is redundant with a method directly on MachineInstr	2021-07-13 11:04:10 -04:00
Qiu Chaofan	954a15d639	[SelectionDAG] Check use before combining into USUBSAT Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D105789	2021-07-13 14:50:26 +08:00
Jessica Paquette	47d0780f45	[GlobalISel] Handle more types in narrowScalar for eq/ne G_ICMP Generalize the existing eq/ne case using `extractParts`. The original code only handled narrowings for types of width 2n->n. This generalization allows for any type that can be broken down by `extractParts`. General overview is: - Loop over each narrow-sized part and do exactly what the 2-register case did. - Loop over the leftover-sized parts and do the same thing - Widen the leftover-sized XOR results to the desired narrow size - OR that all together and then do the comparison against 0 (just like the old code) This shows up a lot when building clang for AArch64 using GlobalISel, so it's worth fixing. For the sake of simplicity, this doesn't handle the non-eq/ne case yet. Also remove the code in this case that notifies the observer; we're just going to delete MI anyway so talking to the observer shouldn't be necessary. Differential Revision: https://reviews.llvm.org/D105161	2021-07-12 22:18:50 -07:00
Arthur Eubanks	7987c46273	[OpaquePtr][ISel] Use ArgListEntry::IndirectType more	2021-07-12 21:14:35 -07:00
Eli Friedman	ec1cdee6aa	[SelectionDAG][RISCV] Support @llvm.vscale.i64() on 32-bit targets. Not really useful on its own, but D105673 depends on it. Differential Revision: https://reviews.llvm.org/D105840	2021-07-12 14:53:42 -07:00
Jinsong Ji	28fb69e00a	[AIX] Emit version string in .file directive The AIX .file directive supports including a compiler version string. https://www.ibm.com/docs/en/aix/7.2?topic=ops-file-pseudo-op This patch adds the support so that it will be easier to identify the build compiler in objects. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D105743	2021-07-12 17:03:52 +00:00
Bradley Smith	112c09039b	[SelectionDAG] Simplify PromoteIntRes_INSERT_SUBVECTOR to only handle result Let other parts of legalization handle the rest of the node, this allows re-use of existing optimizations elsewhere. Differential Revision: https://reviews.llvm.org/D105624	2021-07-12 15:20:44 +00:00
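
The eq/ne G_ICMP narrowing described in commit 47d0780f45 above (XOR each narrow part, XOR the leftover part, widen, OR everything together, compare against 0) can be illustrated with a small sketch. This is not LLVM code: it models the arithmetic on plain Python integers, and the helper names `split_parts` and `narrowed_icmp_eq` are hypothetical, chosen only for this illustration.

```python
def split_parts(value, total_bits, narrow_bits):
    """Split an unsigned total_bits-wide value into narrow_bits-wide parts
    (low bits first) plus an optional leftover part for the remaining bits.
    Hypothetical helper; only loosely mirrors what the commit calls extractParts."""
    mask = (1 << narrow_bits) - 1
    full = total_bits // narrow_bits
    parts = [(value >> (i * narrow_bits)) & mask for i in range(full)]
    leftover_bits = total_bits - full * narrow_bits
    leftover = None
    if leftover_bits:
        leftover = (value >> (full * narrow_bits)) & ((1 << leftover_bits) - 1)
    return parts, leftover


def narrowed_icmp_eq(lhs, rhs, total_bits, narrow_bits):
    """Compare two wide values for equality using only narrow-sized operations."""
    lhs_parts, lhs_left = split_parts(lhs, total_bits, narrow_bits)
    rhs_parts, rhs_left = split_parts(rhs, total_bits, narrow_bits)

    # XOR each pair of narrow parts: a zero result means the parts are equal.
    xors = [a ^ b for a, b in zip(lhs_parts, rhs_parts)]

    # Do the same for the leftover part. Python integers are unbounded, so the
    # "widen to the narrow size" step (a zero-extension) needs no extra work here.
    if lhs_left is not None:
        xors.append(lhs_left ^ rhs_left)

    # OR everything together and compare against 0.
    combined = 0
    for x in xors:
        combined |= x
    return combined == 0


# A 48-bit compare done with 32-bit parts: one full part plus a 16-bit leftover.
print(narrowed_icmp_eq(0x123456789ABC, 0x123456789ABC, 48, 32))
print(narrowed_icmp_eq(1 << 40, 0, 48, 32))
```

The `ne` case is the same computation with the final comparison inverted (`combined != 0`).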

31928 Commits