llvm-project

Commit Graph

Author	SHA1	Message	Date
Vasileios Porpodas	6f9640d6a3	[RegAlloc] Add a complexity limit in growRegion() to cap compilation time. growRegion() does not scale in code with BBs with a very large number of edges. In such code growRegion() becomes a compile-time bottleneck, consuming 60% of the total compilation time. This patch adds a limit to the complexity of growRegion() by incrementing a counter in each iteration. We bail out once the limit is reached. Differential Revision: https://reviews.llvm.org/D120752	2022-03-03 11:31:07 -08:00
Paul Robinson	7b85f0f32f	[PS4] isPS4 and isPS4CPU are not meaningfully different	2022-03-03 11:36:59 -05:00
Sanjay Patel	e9302bf7ef	[SDAG] try harder to remove a rotate from X == 0 https://alive2.llvm.org/ce/z/mJP7XP This can be viewed as expanding the compare into and/or-of-compares: https://alive2.llvm.org/ce/z/bkZYWE followed by reduction of each compare. This could be extended in several ways: 1. There's a (X & Y) == -1 sibling. 2. We can recurse through more than 1 'or'. 3. The fold could be generalized beyond rotates - any operation that only changes the order of bits (bswap, bitreverse). This is a transform noted in D111530.	2022-03-03 09:25:46 -05:00
Sanjay Patel	c33dbc2a2d	[SDAG] refactor foldSetCCWithRotate; NFC There are more potential optimizations to make here, so rearrange to make it easier to append those.	2022-03-02 16:42:05 -05:00
Craig Topper	ab7a7cc1dd	Revert "[LegalizeTypes][VP] Add splitting and widening support for VP_FNEG." This reverts commit `ac93f95861`. Committed by accident.	2022-03-02 10:00:22 -08:00
Craig Topper	324c0a7206	[SelectionDAG][RISCV] Emit a canonical sign bit test from ExpandIntRes_ABS. Instead of emitting 0 > Hi, emit Hi < 0. If Hi needs to be expanded again this will allow the special case for sign bit tests in ExpandIntOp_SETCC to trigger. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120761	2022-03-02 09:47:26 -08:00
Craig Topper	ac93f95861	[LegalizeTypes][VP] Add splitting and widening support for VP_FNEG. Differential Revision: https://reviews.llvm.org/D120785	2022-03-02 09:47:05 -08:00
Daniel McIntosh	d636b76eca	[CodeGen] Use AdjustStackOffset for Callee Saved Registers in PEI::calculateFrameObjectOffsets Also, changes how the CSR loop is indexed, which should avoid bugs like the one fixed by rG4a57bb5a3b74bdad9b0518009a7d7ac7ca2ac650 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D120668	2022-03-02 11:41:12 -05:00
Nikita Popov	6fde043951	[MachineSink] Disable if there are any irreducible cycles This is an alternative to D120330, which disables MachineSink for functions with irreducible cycles entirely. This avoids both the correctness problem, and ensures we don't perform non-profitable sinks into cycles. At the same time, it may also disable profitable sinks in the same function. This can be made more precise by using MachineCycleInfo in the future. Fixes https://github.com/llvm/llvm-project/issues/53990. Differential Revision: https://reviews.llvm.org/D120800	2022-03-02 16:57:29 +01:00
Simon Pilgrim	5cce97d61e	[DAG] isSplatValue - improve ISD::VECTOR_SHUFFLE splat detection Currently we only check for splat shuffles, this extends it to see if the source operand is a splat across the demanded elts based upon the shuffle mask	2022-03-02 15:32:24 +00:00
spupyrev	bcdc047731	speeding up ext-tsp for huge instances Differential Revision: https://reviews.llvm.org/D120780	2022-03-02 07:17:48 -08:00
Simon Pilgrim	df0a2b4f30	[DAG] SelectionDAG::isSplatValue - add initial BITCAST handling This patch adds support for recognising vector splats by peeking through bitcasts to vectors with smaller element types - if all the offset subelements are splats then the bitcasted vector is a splat as well. We don't have great coverage for isSplatValue so I've made this pretty specific to the use case I'm trying to fix - regressions in some vXi64 vector shift by splat cases that 32-bit x86 doesn't recognise because the shift amount buildvector has been type legalised to v2Xi32. We can add further support (floats, bitcast from larger element types, undef elements) when we have actual test coverage. Differential Revision: https://reviews.llvm.org/D120553	2022-03-02 11:25:51 +00:00
Xiang1 Zhang	65588a0776	Revert "TLS loads optimization (hoist)" Revert for more reviews This reverts commit `30e612ebdf`.	2022-03-02 14:10:11 +08:00
Mircea Trofin	cb2160760e	[nfc][codegen] Move RegisterBank[Info].h under CodeGen This wraps up from D119053. The 2 headers are moved as described, fixed file headers and include guards, updated all files where the old paths were detected (simple grep through the repo), and `clang-format`-ed it all. Differential Revision: https://reviews.llvm.org/D119876	2022-03-01 21:53:25 -08:00
Xiang1 Zhang	30e612ebdf	TLS loads optimization (hoist) Reviewed By: Wang Phoebe, Topper Craig Differential Revision: https://reviews.llvm.org/D120000	2022-03-02 10:37:24 +08:00
Craig Topper	8787726609	[LegalizeTypes] Remove incomplete StrictFP support from SplitVecRes_UnaryOp. NFC There is no handling of Chain operands in this function so it can't work. There's a separate splitting function for all strict fp nodes.	2022-03-01 15:43:57 -08:00
Zequan Wu	5c9e20d7d0	[PDB] Add char8_t type Differential Revision: https://reviews.llvm.org/D120690	2022-03-01 13:39:51 -08:00
serge-sans-paille	a494ae43be	Cleanup includes: TransformsUtils Estimation on the impact on preprocessor output: before: 1065307662 after: 1064800684 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120741	2022-03-01 21:00:07 +01:00
Craig Topper	bf8054644d	[DAGCombiner] Don't expand (neg (abs x)) if the abs has an additional user. If the types aren't legal, the expansions may get type legalized in a different way preventing code sharing. If the type is legal, we will share some instructions between the two expansions, but we will need an extra register. Since we don't appear to fold (neg (sub A, B)) if the sub has an additional user, I think it makes sense not to expand NABS. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120513	2022-03-01 07:32:07 -08:00
Jeremy Morse	ab49dce01f	[DebugInfo][InstrRef][NFC] Use unique_ptr instead of raw pointers InstrRefBasedLDV allocates some big tables of ValueIDNum, to store live-in and live-out block values in, that then get passed around as pointers everywhere. This patch wraps the allocation in a std::unique_ptr, names some types based on unique_ptr, and passes references to those around instead. There's no functional change, but it makes it clearer to the reader that references to these tables are borrowed rather than owned, and we get some extra validity assertions too. Differential Revision: https://reviews.llvm.org/D118774	2022-03-01 12:49:50 +00:00
Sam Parker	20d75059a2	Revert "[TypePromotion] Avoid some unnecessary truncs" This reverts commit `281d29b8fe`. Report of a miscompilation and awaiting a reproducer.	2022-03-01 08:59:52 +00:00
Phoebe Wang	e03d216c28	[X86] Use bit test instructions to optimize some logic atomic operations This is to match GCC's optimizations: https://gcc.godbolt.org/z/3odh9e7WE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120199	2022-03-01 09:57:08 +08:00
Sanjay Patel	69684b84c6	[SDAG] fold (rotate X) eq/ne (0/-1) This is the SDAG equivalent of an instcombine transform added with: `fd807601a7` This is another step towards solving #49541 and part of an alternative set of more general transforms than what is proposed in D111530. https://alive2.llvm.org/ce/z/ToxaE8	2022-02-27 11:31:19 -05:00
Sanjay Patel	acb96ffd14	[SDAG] fold bitwise logic with shifted operands LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z https://alive2.llvm.org/ce/z/QmR9rR This is a reassociation + factoring fold. The common shift operation is moved after a bitwise logic op on 2 input operands. We get simpler cases of these patterns in IR, but I suspect we would miss all of these exact tests in IR too. We also handle the simpler form of this plus several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands(). This is a partial implementation of a transform suggested in D111530 (only handles 'or' bitwise logic as a first step - need to stamp out more tests for other opcodes). Several of the same tests added for D111530 are altered here (but not fully optimized). I'm not sure yet if this would help/hinder that patch, but this should be an improvement for all tests added with `ecf606cb43` since it removes a shift operation in those examples. Differential Revision: https://reviews.llvm.org/D120516	2022-02-27 09:54:12 -05:00
Simon Pilgrim	fadd20f80d	[DAG] Ensure type is legal for bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2)))) fold As reported on D120192	2022-02-27 11:25:22 +00:00
Benjamin Kramer	1de11fe360	Use RegisterInfo::regsOverlaps instead of checking aliases This is both less code and faster since it doesn't have to expand all the sub & superreg sets. NFCI.	2022-02-26 20:32:12 +01:00
Jameson Nash	c4b1a63a1b	mark getTargetTransformInfo and getTargetIRAnalysis as const Seems like this can be const, since Passes shouldn't modify it. Reviewed By: wsmoses Differential Revision: https://reviews.llvm.org/D120518	2022-02-25 14:30:44 -05:00
Rong Xu	ccbbb4f6c7	[Sample-PGO] Emit FS discriminators only when -fdebug-info-for-profiling is set IR level addDiscriminator pass is guarded by DebugInfoForProfiling (set by option -fdebug-info-for-profiling). This patch syncs the logic for the MIR and IR level implementations. Differential Revision: https://reviews.llvm.org/D120536	2022-02-25 09:41:17 -08:00
Nikita Popov	87ebd9a36f	[IR] Use CallBase::getParamElementType() (NFC) As this method now exists on CallBase, use it rather than the one on AttributeList.	2022-02-25 10:01:58 +01:00
Rahman Lavaee	aeec9671fb	Revert "Encode address offsets of basic blocks relative to the end of the previous basic blocks." This reverts commit `029283c1c0`. The code in `ELFFile::decodeBBAddrMap` was not changed in the submitted patch. Differential Revision: https://reviews.llvm.org/D120457	2022-02-24 13:31:15 -08:00
Simon Pilgrim	370ebc9d9a	[DAG] Attempt to fold bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2)))) If the shl is at least half the bitwidth (i.e. the lower half of the bswap source is zero), then we can reduce the shift and perform the bswap at half the bitwidth and just zero extend. Based off PR51391 + PR53867 Differential Revision: https://reviews.llvm.org/D120192	2022-02-24 19:33:51 +00:00
Sanjay Patel	4a3708cd6b	[SDAG] remove shift that is redundant with part of funnel shift This is the SDAG translation of D120253 : https://alive2.llvm.org/ce/z/qHpmNn The SDAG nodes can have different operand types than the result value. We can see an example of that with AArch64 - the funnel shift amount is an i64 rather than i32. We may need to make that match even more flexible to handle post-legalization nodes, but I have not stepped into that yet. Differential Revision: https://reviews.llvm.org/D120264	2022-02-24 11:25:46 -05:00
Jay Foad	719bac55df	[MIRParser] Diagnose too large align values in MachineMemOperands When parsing MachineMemOperands, MIRParser treated the "align" keyword the same as "basealign". Really "basealign" should specify the alignment of the MachinePointerInfo base value, and "align" should specify the alignment of that base value plus the offset. This worked OK when the specified alignment was no larger than the alignment of the offset, but in cases like this it just caused confusion: STW killed %18, 4, %stack.1.ap2.i.i :: (store (s32) into %stack.1.ap2.i.i + 4, align 8) MIRPrinter would never have printed this, with an offset of 4 but an align of 8, so it must have been written by hand. MIRParser would interpret "align 8" as "basealign 8", but I think it is better to give an error and force the user to write "basealign 8" if that is what they really meant. Differential Revision: https://reviews.llvm.org/D120400 Change-Id: I7eeeefc55c2df3554ba8d89f8809a2f45ada32d8	2022-02-24 15:32:08 +00:00
Matthias Braun	6a383369f9	PGOInstrumentation, GCOVProfiling: Split indirectbr critical edges regardless of PHIs The `SplitIndirectBrCriticalEdges` function was originally designed for `CodeGenPrepare` and skipped splitting of edges when the destination block didn't contain any `PHI` instructions. This only makes sense when reducing COPYs like `CodeGenPrepare`. In the case of `PGOInstrumentation` or `GCOVProfiling` it would result in missed counters and wrong result in functions with computed goto. Differential Revision: https://reviews.llvm.org/D120096	2022-02-23 16:27:37 -08:00
Craig Topper	c7d6448d03	[DAGCombiner][TargetLowering] Pass SDValue by value to isMulAddWithConstProfitable. Internally to DAGCombiner the SDValues were passed by non-const reference despite not being modified. They were then passed by const reference to TLI. This patch passes them by value which is consistent with the vast majority of code. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120420	2022-02-23 12:40:45 -08:00
Paweł Bylica	afdaa86b77	[DAGCombine] Extend combineCarryDiamond() In combineCarryDiamond() use getAsCarry() to find more candidates for being a carry flag. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118362	2022-02-23 21:37:49 +01:00
Jessica Paquette	68c718c8f4	Revert "[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"" This reverts commit `d97f997eb7`. This commit was not NFC. (See: https://reviews.llvm.org/rGd97f997eb79d91b2872ac13619f49cb3a7120781)	2022-02-23 10:35:52 -08:00
Sanjay Patel	21d7c3bcc6	[DAG] try to convert multiply to shift via demanded bits This is a fix for a regression discussed in: https://github.com/llvm/llvm-project/issues/53829 We cleared more high multiplier bits with `995d400`, but that can lead to worse codegen because we would fail to recognize the now disguised multiplication by neg-power-of-2 as a shift-left. The problem exists independently of the IR change in the case that the multiply already had cleared high bits. We also convert shl+sub into mul+add in instcombine's negator. This patch fills in the high-bits to see the shift transform opportunity. Alive2 attempt to show correctness: https://alive2.llvm.org/ce/z/GgSKVX The AArch64, RISCV, and MIPS diffs look like clear wins. The x86 code requires an extra move register in the minimal examples, but it's still an improvement to get rid of the multiply on all CPUs that I am aware of (because multiply is never as fast as a shift). There's a potential follow-up noted by the TODO comment. We should already convert that pattern into shl+add in IR, so it's probably not common: https://alive2.llvm.org/ce/z/7QY_Ga Fixes #53829 Differential Revision: https://reviews.llvm.org/D120216	2022-02-23 12:09:32 -05:00
Rainer Orth	365be7ac72	[MC][ELF] Use SHF_SUNW_NODISCARD instead of SHF_GNU_RETAIN on Solaris As requested in D107955 <https://reviews.llvm.org/D107955>, this patch splits off the `MC` and `CodeGen` parts and adds a testcase. Tested on `sparcv9-sun-solaris2.11`, `amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`. Differential Revision: https://reviews.llvm.org/D120318	2022-02-23 15:43:12 +01:00
Bill Wendling	a5bbc6ef99	[NFC] Remove unnecessary "#include"s from header files	2022-02-23 01:20:48 -08:00
Rahman Lavaee	029283c1c0	Encode address offsets of basic blocks relative to the end of the previous basic blocks. Conceptually, the new encoding emits the offsets and sizes as label differences between each two consecutive basic block begin and end label. When decoding, the offsets must be aggregated along with basic block sizes to calculate the final relative-to-function offsets of basic blocks. This encoding uses smaller values compared to the existing one (offsets relative to function symbol). Smaller values tend to occupy fewer bytes in ULEB128 encoding. As a result, we get about 25% reduction in the size of the bb-address-map section (reduction from about 9MB to 7MB). Reviewed By: tmsriram, jhenderson Differential Revision: https://reviews.llvm.org/D106421	2022-02-22 15:46:46 -08:00
Jay Foad	b47e2dc91f	[StableHashing] Hash machine basic blocks and functions This adds very basic support for hashing MachineBasicBlock and MachineFunction, for use in MachineFunctionPass to detect passes that modify the MachineFunction wrongly. Differential Revision: https://reviews.llvm.org/D120122	2022-02-22 17:38:47 +00:00
Joseph Huber	456ffd7a22	[OpenMP] Ensure offloading sections do not have SHF_ALLOC flag We use offloading sections in the new Clang driver scheme to embed device code into the host. We later use these sections to link the device image, after which point they are completely unused and should not be loaded into memory if they are still in the executable. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D120275	2022-02-21 21:35:17 -05:00
Jessica Paquette	d97f997eb7	[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges" We found a case in the Swift benchmarks where the MachineOutliner introduces about a 20% compile time overhead in comparison to building without the MachineOutliner. The origin of this slowdown is that the benchmark has long blocks which incur lots of LRU checks for lots of candidates. Imagine a case like this: ``` bb: i1 i2 i3 ... i123456 ``` Now imagine that all of the outlining candidates appear early in the block, and that something like, say, NZCV is defined at the end of the block. The outliner has to check liveness for certain registers across all candidates, because outlining from areas where those registers are used is unsafe at call boundaries. This is fairly wasteful because in the previously-described case, the outlining candidates will never appear in an area where those registers are live. To avoid this, precalculate areas where we will consider outlining from. Anything outside of these areas is mapped to illegal and not included in the outlining search space. This allows us to reduce the size of the outliner's suffix tree as well, giving us a potential memory win. By precalculating areas, we can also optimize other checks too, like whether or not LR is live across an outlining candidate. Doing all of this is about a 16% compile time improvement on the case. This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now, this only implements the AArch64 path. The original "is the MBB safe" method still works as before.	2022-02-21 15:29:16 -08:00
Paweł Bylica	df0c16ce00	[NFC][DAGCombine] Use isOperandOf() in combineCarryDiamond Pre-commit for https://reviews.llvm.org/D118362.	2022-02-21 21:41:31 +01:00
Matt Arsenault	9c7ca51b2c	MIR: Start diagnosing too many operands on an instruction Previously this would just assert which was annoying and didn't point to the specific instruction/operand.	2022-02-21 10:36:39 -05:00
Simon Pilgrim	46f1e8359e	[DAG] visitBSWAP - pull out repeated SDLoc. NFC Cleanup for D120192	2022-02-21 13:08:01 +00:00
Jay Foad	9a547e7009	[StableHashing] Hash vregs with multiple defs This allows stableHashValue to be used on Machine IR that is not in SSA form. Differential Revision: https://reviews.llvm.org/D120121	2022-02-21 10:26:34 +00:00
Craig Topper	440c4b705a	[SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y). Previously we used sra (X, size(X)-1); xor (add (X, Y), Y). By placing sub at the end, we allow RISCV to combine sign_extend_inreg with it to form subw. Some X86 tests for Z - abs(X) seem to have improved as well. Other targets look to be a wash. I had to modify ARM's abs matching code to match from sub instead of xor. Maybe instead ISD::ABS should be made legal. I'll try that in parallel to this patch. This is an alternative to D119099 which was focused on RISCV only. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D119171	2022-02-20 21:11:23 -08:00
Chen Zheng	efe5b8ad90	[ISEL] remove unnecessary getNode(); NFC Reviewed By: RKSimon, craig.topper Differential Revision: https://reviews.llvm.org/D120049	2022-02-20 21:08:49 -05:00
Luo, Yuanke	67ef63138b	[SDAG] enable binop identity constant folds for sub This patch extracts the sub folding from D119654 and leaves only the add folding in that patch. Differential Revision: https://reviews.llvm.org/D120116	2022-02-21 09:37:36 +08:00
David Blaikie	323c672789	DebugInfo: Add an assert about cross-unit references in dwo units This is helping me debug some issues with simplified template names	2022-02-20 14:53:17 -08:00
Amara Emerson	b09e63bad1	[AArch64][GlobalISel] Implement combines for boolean G_SELECT->bitwise ops. Differential Revision: https://reviews.llvm.org/D117160	2022-02-20 00:53:09 -08:00
Craig Topper	24bfa24355	[SelectionDAGBuilder] Simplify visitShift. NFC This code was detecting whether the value returned by getShiftAmountTy can represent all shift amounts. If not, it would use MVT::i32 as a placeholder. getShiftAmountTy was updated last year to return i32 if the type returned by the target couldn't represent all values. This means the MVT::i32 case here is dead and the logic can be simplified. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120164	2022-02-19 12:40:59 -08:00
Craig Topper	1df8efae56	[SelectionDAG][X86] Support f16 in getReciprocalOpName. If the "reciprocal-estimates" attribute is present and it doesn't contain "all", "none", or "default", we previously crashed on f16 operations. This patch adds an 'h' suffix to prevent the crash. I've added simple tests that just enable the estimate for all vec-sqrt and one test case that explicitly tests the new 'h' suffix to override the default steps. There may be some frontend change needed too, but I haven't checked that yet. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D120158	2022-02-18 21:55:49 -08:00
Craig Topper	8e7247a377	[SelectionDAG] Fix off by one error in range check in DAGTypeLegalizer::ExpandShiftByConstant. The code was considering shifts by an amount larger than the number of bits in the original VT to be out of range. Shifts exactly equal to the original bit width are also out of range. I don't know how to test this. DAGCombiner should usually fold this away. I just noticed while looking for something else in this code. The llvm-cov report shows that we don't have coverage for out of range shifts here. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D120170	2022-02-18 18:42:20 -08:00
Craig Topper	0d59a54cea	Revert "[SelectionDAG][X86] Support f16 in getReciprocalOpName." This reverts commit `86b5e25662`. This wasn't supposed to be committed yet	2022-02-18 15:39:50 -08:00
Craig Topper	04f815c26f	[SelectionDAGBuilder] Remove LegalTypes=false from a call to getShiftAmountConstant. getShiftAmountTy will return MVT::i32 if the shift amount coming from the target's getScalarShiftAmountTy can't represent all possible values. That should eliminate the need to use the pointer type which is what we do when LegalTypes is false. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D120165	2022-02-18 15:36:35 -08:00
Craig Topper	86b5e25662	[SelectionDAG][X86] Support f16 in getReciprocalOpName. If the "reciprocal-estimates" attribute is present and it doesn't contain "all", "none", or "default", we previously crashed on f16 operations. This patch adds an 'h' suffix to prevent the crash. I've added simple tests that just enable the estimate for all vec-sqrt and one test case that explicitly tests the new 'h' suffix to override the default steps. There may be some frontend change needed too, but I haven't checked that yet. Differential Revision: https://reviews.llvm.org/D120158	2022-02-18 15:36:35 -08:00
Sanjay Patel	a2963d871e	[SDAG] fold sub-of-shift to add-of-shift This fold is done in IR: https://alive2.llvm.org/ce/z/jWyFrP There is an x86 test that shows an improvement from the added flexibility of using add (commutative). The other diffs are presumed neutral. Note that this could also be folded to an 'xor', but I'm not sure if that would be universally better (eg, x86 can convert adds more easily into LEA). This helps prevent regressions from a potential fold for issue #53829.	2022-02-18 11:55:50 -05:00
Jay Foad	074d1e2536	[CodeGen] Return better Changed status from PostRAHazardRecognizer Differential Revision: https://reviews.llvm.org/D119954	2022-02-18 09:46:24 +00:00
Jessica Paquette	12389e3758	[MachineOutliner] Add statistics for unsigned vector size Useful for debugging + evaluating improvements to the outliner. Stats are the number of illegal, legal, and invisible instructions in the unsigned vector, and its total length.	2022-02-17 18:25:51 -08:00
Heejin Ahn	4f9b839772	[WebAssembly] Make EH/SjLj vars unconditionally thread local This makes three thread local variables (`__THREW__`, `__threwValue`, and `__wasm_lpad_context`) unconditionally thread local. If the target doesn't support TLS, they will be downgraded to normal variables in `stripThreadLocals`. This makes the object not linkable with other objects using shared memory, which is what we intend here; these variables should be thread local when used with shared memory. This is what we initially tried in D88262. But D88323 changed this: It only created these variables when threads were supported, because `__THREW__` and `__threwValue` were always generated even if Emscripten EH/SjLj was not used, making all objects built without threads not linkable with shared memory, which was too restrictive. But sometimes this is not safe. If we build an object using variables such as `__THREW__` without threads, it can be linked to other objects using shared memory, because the original object's `__THREW__` was not created thread local to begin with. So this CL basically reverts D88323 with some additional improvements: - This checks each of the functions and global variables created within `LowerEmscriptenEHSjLj` pass and removes it if it's not used at the end of the pass. So only modules using those variables will be affected. - Moves `CoalesceFeaturesAndStripAtomics` and `AtomicExpand` passes after all other IR passes that can create thread local variables. It is not sufficient to move them to the end of `addIRPasses`, because `__wasm_lpad_context` is created in `WasmEHPrepare`, which runs inside `addPassesToHandleExceptions`, which runs before `addISelPrepare`. So we override `addISelPrepare` and move atomic/TLS stripping and expanding passes there. This also merges `TLS` and `NO-TLS` FileCheck lines into one `CHECK` line, because at the bitcode level we always create them as thread local. Also, some function declarations are deleted from `CHECK` lines because they are unused. Reviewed By: tlively, sbc100 Differential Revision: https://reviews.llvm.org/D120013	2022-02-17 16:04:18 -08:00
Matt Arsenault	c46aab01c0	RegAllocGreedy: Fix last chance recolor assert in impossible case This example is not compilable without handling eviction of specific subregisters. Last chance recoloring was deciding it could try evicting an overlapping superregister, which doesn't help make any progress. The LiveIntervalUnion would then assert due to an overlapping / identical range when trying the new assignment. Unfortunately this is also producing a verifier error after the allocation fails. I've seen a number of these, and not sure if we should just start deleting the function on error rather than trying to figure out how to put together valid MIR. I'm not super confident this is the right place to fix this. I also have a number of failing testcases I need to fix by handling partial evictions of superregisters.	2022-02-17 18:30:56 -05:00
Paul Walker	6457f42bde	[DAGCombiner] Extend ISD::ABDS/U combine to handle more cases. The current ABD combine doesn't quite work for SVE because only a single scalable vector per scalar integer type is legal (e.g. for i32, <vscale x 4 x i32> is the only legal scalable vector type). This patch extends the combine to also trigger for the cases when operand extension must be retained. Differential Revision: https://reviews.llvm.org/D115739	2022-02-17 13:32:20 +00:00
Bjorn Pettersson	1a8bdf95a3	[DAG] Fix in ReplaceAllUsesOfValuesWith When doing SelectionDAG::ReplaceAllUsesOfValuesWith a worklist is prepared containing all users that should be updated. Then we use the RemoveNodeFromCSEMaps/AddModifiedNodeToCSEMaps helpers to handle recursive CSE updates while doing the replacements. This patch aims at solving a problem that could arise if the recursive CSE updates would result in an SDNode present in the worklist being removed as a side-effect of morphing a prior user in the worklist. To exemplify such a scenario, imagine that we have these nodes in the DAG t12: i64 = add t8, t11 t13: i64 = add t12, t8 t14: i64 = add t11, t11 t15: i64 = add t14, t8 t16: i64 = sub t13, t15 and that the t8 uses should be replaced by t11. An initial worklist (listing the users that should be morphed) could be [t12, t13, t15]. When updating t12 we get t12: i64 = add t11, t11 which results in a CSE update that replaces t14 by t12, so we get t15: i64 = add t12, t8 which results in a CSE update that replaces t13 by t12, so we get t16: i64 = sub t12, t15 and then t13 is removed given that it was the last use of t13. So when being done with the updates triggered by rewriting the use of t8 in t12 the t13 node no longer exists. And we used to end up hitting an assertion when continuing with the worklist aiming at replacing the t8 uses in t13. The solution is based on using a DAGUpdateListener, making sure that we prune a user from the worklist if it is removed during the recursive CSE updates. The bug was found using an OOT target. I think the problem is quite old, even if the particular intree target reproducer added in this patch seems to pass when using LLVM 13.0.0. Differential Revision: https://reviews.llvm.org/D119088	2022-02-17 14:29:59 +01:00
Jay Foad	50ddb5d2d1	[CodeGen] Return better Changed status from LocalStackSlotAllocation Differential Revision: https://reviews.llvm.org/D119942	2022-02-17 09:31:41 +00:00
Jay Foad	f0092f9ded	[CodeGen] Return false from LiveIntervals::runOnMachineFunction This is an analysis pass so it does not modify the MachineFunction. Differential Revision: https://reviews.llvm.org/D119941	2022-02-17 09:31:41 +00:00
Jay Foad	3c9229c663	[CodeGen] Return better Changed status from DetectDeadLanes Differential Revision: https://reviews.llvm.org/D119940	2022-02-17 09:31:41 +00:00
Heejin Ahn	c60d822965	[WebAssembly] Make __wasm_lpad_context thread-local This makes `__wasm_lpad_context`, a struct that is used as a communication channel between compiler-generated code and personality function in libunwind, thread local. The library code will be changed to thread local in the emscripten side. Reviewed By: sbc100, tlively Differential Revision: https://reviews.llvm.org/D119803	2022-02-16 15:56:38 -08:00
Craig Topper	1daa66d3fd	[SelectionDAG] Add SPLAT_VECTOR to SelectionDAG::isConstantFPBuildVectorOrConstantFP. Matches what is done for the int version. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D119793	2022-02-16 09:22:11 -08:00
Simon Pilgrim	30e9cdd1aa	[DAG] computeKnownBits - add ISD::AVGCEILU handling Expand the ISD::AVGCEILU to determine the known bits of the result. First part of PR53622 Differential Revision: https://reviews.llvm.org/D119629	2022-02-16 13:00:15 +00:00
Shengchen Kan	ce02c79dc6	[Debugify] Mark mir-check-debugify change nothing of input Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D119914	2022-02-16 18:37:26 +08:00
Shao-Ce SUN	2aed07e96c	[NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter` Reviewed By: skan Differential Revision: https://reviews.llvm.org/D119846	2022-02-16 13:10:09 +08:00
Shao-Ce SUN	9cc49c1951	Revert "[NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter`" This reverts commit `fe25c06cc5`.	2022-02-16 11:57:49 +08:00
Shao-Ce SUN	fe25c06cc5	[NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter` For ten years, it seems that `MCRegisterInfo` is not used by any target. Reviewed By: skan Differential Revision: https://reviews.llvm.org/D119846	2022-02-16 11:47:17 +08:00
Carl Ritson	ef949ecba5	[MachineSink] Use SkipPHIsAndLabels for sink insertion points For AMDGPU the insertion point for a block may not be the first non-PHI instruction. This happens when a block contains EXEC mask manipulation related to control flow (converging lanes). Use SkipPHIsAndLabels to determine the block insertion point so that the target can skip any block prologue instructions. Reviewed By: rampitec, ruiling Differential Revision: https://reviews.llvm.org/D119399	2022-02-16 12:44:22 +09:00
Mircea Trofin	c62eefb886	[nfc][codegen] Move RegisterBank[Info].cpp under CodeGen Layering-wise, it seems RegisterBank stuff fits under CodeGen, like other target abstraction. In particular, TargetSubtargetInfo has a getRegBankInfo member, but using that object requires making sure GlobalISel is linked, which is not always the case (e.g. llvm-jitlink doesn't). Differential Revision: https://reviews.llvm.org/D119053	2022-02-15 11:27:15 -08:00
David Green	655d0d86f9	[DAGCombine] Move AVG combine to SimplifyDemandBits This moves the matching of AVGFloor and AVGCeil into a place where demanded bits are available, so that it can detect more cases for more folds. It changes the transform to start from a shift, not from a truncate. We match the pattern shr(add(ext(A), ext(B)), 1), transforming to ext(hadd(A, B)). For signed values, because only the bottom bits are demanded llvm will transform the above to use a lshr too, as opposed to ashr. In order to correctly detect the hadd we need to know the demanded bits to turn it back. Depending on whether the shift is signed (ashr) or logical (lshr), and the extensions are signed or unsigned we can create different nodes. If the shift is signed: Needs >= 2 sign bits. https://alive2.llvm.org/ce/z/h4gQAW generating signed rhadd. Needs >= 2 zero bits. https://alive2.llvm.org/ce/z/B64DUA generating unsigned rhadd. If the shift is unsigned: Needs >= 1 zero bits. https://alive2.llvm.org/ce/z/ByD8sj generating unsigned rhadd. Needs 1 demanded bit zero and >= 2 sign bits https://alive2.llvm.org/ce/z/hvPGxX and https://alive2.llvm.org/ce/z/32P5n1 generating signed rhadd. Differential Revision: https://reviews.llvm.org/D119072	2022-02-15 10:17:02 +00:00
Momchil Velikov	6398903ac8	Extend the `uwtable` attribute with unwind table kind We have the `clang -cc1` command-line option `-funwind-tables=1\|2` and the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind tables (1) or asynchronous unwind tables (2)`. However, this is encoded in LLVM IR by the presence or the absence of the `uwtable` attribute, i.e. we lose the information whether we want to generate just some unwind tables or asynchronous unwind tables. Asynchronous unwind tables take more space in the runtime image, I'd estimate something like 80-90% more, as the difference is adding roughly the same number of CFI directives as for prologues, only a bit simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even more, if you consider tail duplication of epilogue blocks. Asynchronous unwind tables could also restrict code generation to having only a finite number of frame pointer adjustments (an example of not having a finite number of `SP` adjustments is on AArch64 when untagging the stack (MTE) in some cases the compiler can modify `SP` in a loop). Having the CFI precise up to an instruction generally also means one cannot bundle together CFI instructions once the prologue is done, they need to be interspersed with ordinary instructions, which means extra `DW_CFA_advance_loc` commands, further increasing the unwind tables size. That is to say, async unwind tables impose a non-negligible overhead, yet for the most common use cases (like C++ exceptions), they are not even needed. This patch extends the `uwtable` attribute with an optional value: - `uwtable` (default to `async`) - `uwtable(sync)`, synchronous unwind tables - `uwtable(async)`, asynchronous (instruction precise) unwind tables Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D114543	2022-02-14 14:35:02 +00:00
David Green	03380c70ed	[DAGCombine] Basic combines for AVG nodes. This adds very basic combines for AVG nodes, mostly for constant folding and handling degenerate (zero) cases. The code performs mostly the same transforms as visitMULHS, adjusted for AVG nodes. Constant folding extends to a higher bitwidth and drops the lowest bit. For undef nodes, `avg undef, x` is transformed to x. There is also a transform for `avgfloor x, 0` transforming to `shr x, 1`. Differential Revision: https://reviews.llvm.org/D119559	2022-02-14 11:18:35 +00:00
Tim Northover	a87d3ba61c	Reapply: StackProtector: ignore debug insts when splitting blocks. When deciding where to split a block to insert stack guard checks, we should move past any debug instructions we see that might (e.g.) be separating a tail call from its frame wrangling. This time, also don't run off the front of a basic block.	2022-02-14 10:58:22 +00:00
Nikita Popov	ff040eca93	[FastISel] Reuse register for bitcast that does not change MVT The current FastISel code reuses the register for a bitcast that doesn't change the IR type, but uses a reg-to-reg copy if it changes the IR type without changing the MVT. However, we can simply reuse the register in that case as well. In particular, this avoids unnecessary reg-to-reg copies for pointer bitcasts. This was found while inspecting O0 codegen differences between typed and opaque pointers. Differential Revision: https://reviews.llvm.org/D119432	2022-02-14 09:13:17 +01:00
Craig Topper	e72fe654b7	[DAGCombiner] Use getShiftAmountConstant in DAGCombiner::foldSelectOfConstants. This enables fshl to be matched earlier on X86 %6 = lshr i32 %3, 1 %7 = select i1 %4, i32 -2147483648, i32 0 %8 = or i32 %6, %7 X86 uses i8 for shift amounts. SelectionDAGBuilder creates the ISD::SRL with an i8 shift type. DAGCombiner turns the select into an ISD::SHL. Prior to this patch it would use i32 for the shift amount. fshl matching failed because the shift amounts have different types. LegalizeDAG fixes the ISD::SHL shift amount to i8. This allowed fshl matching to succeed. With this patch, the ISD::SHL will be created with an i8 shift amount. This allows the fshl to match immediately. No test case because we still end up with a fshl either way.	2022-02-13 19:09:26 -08:00
Benjamin Kramer	bee4531bee	[MachineSink] Inline getRegUnits Reg unit sets are uniqued, so no need to wrap it in a set.	2022-02-12 17:46:12 +01:00
Sanjay Patel	96b7e0b5a0	[SDAG] clean up scalarizing load transform I have not found a way to expose a difference for this patch in a test because it only triggers for a one-use load, but this is the code that was adapted into D118376 and caused miscompiles. The new code pattern is the same as what we do in narrowExtractedVectorLoad() (reduces load width for a subvector extract). This removes seemingly unnecessary manual worklist management and fixes the chain updating via "SelectionDAG::makeEquivalentMemoryOrdering()". Differential Revision: https://reviews.llvm.org/D119549	2022-02-12 11:41:19 -05:00
Sanjay Patel	429f10f5f2	[SDAG] reduce code duplication and fix formatting; NFC	2022-02-12 10:22:13 -05:00
Arthur Eubanks	c0281c7607	[OpaquePtr][SPARC] Remove getPointerElementType() call in SparcISelLowering Requires keeping better track of sret types.	2022-02-11 11:31:19 -08:00
David Green	4072e362c0	[ISel] Port AArch64 HADD and RHADD to ISel This ports the aarch64 combines for HADD and RHADD over to DAG combine, so that they can be used in more architectures (notably MVE in a followup patch). They are renamed to AVGFLOOR and AVGCEIL in the process, to avoid confusion with instructions such as X86 hadd. The code was also rewritten slightly to remove the AArch64 idiosyncrasies. The general pattern for a AVGFLOORS is %xe = sext i8 %x to i32 %ye = sext i8 %y to i32 %a = add i32 %xe, %ye %r = lshr i32 %a, 1 %t = trunc i32 %r to i8 An AVGFLOORU is equivalent with zext. Because of the truncate lshr==ashr, as the top bits are not demanded. An AVGCEIL also includes an extra rounding, so includes an extra add of 1. Differential Revision: https://reviews.llvm.org/D106237	2022-02-11 18:28:56 +00:00
Tim Northover	2ba06bed6b	Revert "StackProtector: ignore debug insts when splitting blocks." This reverts commit `7605ca85f1`. It caused an assertion failure in Fuchsia.	2022-02-11 18:06:28 +00:00
Julien Pages	dcb2da13f1	[AMDGPU] Add a new intrinsic to control fp_trunc rounding mode Add a new llvm.fptrunc.round intrinsic to precisely control the rounding mode when converting from f32 to f16. Differential Revision: https://reviews.llvm.org/D110579	2022-02-11 12:08:23 -05:00
Tim Northover	7605ca85f1	StackProtector: ignore debug insts when splitting blocks. When deciding where to split a block to insert stack guard checks, we should move past any debug instructions we see that might (e.g.) be separating a tail call from its frame wrangling.	2022-02-11 10:13:50 +00:00
serge-sans-paille	06943537d9	Cleanup MCParser headers As usual with that header cleanup series, some implicit dependencies now need to be explicit: llvm/MC/MCParser/MCAsmParser.h no longer includes llvm/MC/MCParser/MCAsmLexer.h Preprocessed lines to build llvm on my setup: after: 1068185081 before: 1068324320 So no compile time benefit to expect, but we still get the looser coupling between files which is great. Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D119359	2022-02-11 10:39:29 +01:00
Yuanfang Chen	f927021410	Reland "[clang-cl] Support the /JMC flag" This relands commit `b380a31de0`. Restrict the tests to Windows only since the flag symbol hash depends on system-dependent path normalization.	2022-02-10 15:16:17 -08:00
Yuanfang Chen	b380a31de0	Revert "[clang-cl] Support the /JMC flag" This reverts commit `bd3a1de683`. Break bots: https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-windows-x64/b8822587673277278177/overview	2022-02-10 14:17:37 -08:00
Reid Kleckner	64037afe01	[CodeView] Avoid integer overflow while parsing long version strings This came up on a funny vendor-provided version string that didn't have a standard dotted quad of numbers.	2022-02-10 13:52:11 -08:00
Yuanfang Chen	bd3a1de683	[clang-cl] Support the /JMC flag The introduction and some examples are on this page: https://devblogs.microsoft.com/cppblog/announcing-jmc-stepping-in-visual-studio/ The `/JMC` flag enables these instrumentations: - Insert at the beginning of every function immediately after the prologue with a call to `void __fastcall __CheckForDebuggerJustMyCode(unsigned char *JMC_flag)`. The argument for `__CheckForDebuggerJustMyCode` is the address of a boolean global variable (the global variable is initialized to 1) with the name convention `__<hash>_<filename>`. All such global variables are placed in the `.msvcjmc` section. - The `<hash>` part of `__<hash>_<filename>` has a one-to-one mapping with a directory path. MSVC uses some unknown hashing function. Here I used DJB. - Add a dummy/empty COMDAT function `__JustMyCode_Default`. - Add `/alternatename:__CheckForDebuggerJustMyCode=__JustMyCode_Default` link option via ".drectve" section. This is to prevent failure in case `__CheckForDebuggerJustMyCode` is not provided during linking. Implementation: All the instrumentations are implemented in an IR codegen pass. The pass is placed immediately before CodeGenPrepare pass. This is to not interfere with mid-end optimizations and make the instrumentation target-independent (I'm still working on an ELF port in a separate patch). Reviewed By: hans Differential Revision: https://reviews.llvm.org/D118428	2022-02-10 10:26:30 -08:00
Nikita Popov	6241f7dee0	[FastISel] Remove redundant reg class check (NFC) SrcVT and DstVT are the same in this branch, as such their register classes will also be the same.	2022-02-10 14:10:00 +01:00
Jeremy Morse	be5734ddaa	[DebugInfo][InstrRef] Don't fire assertions if debug-info is faulty It's inevitable that optimisation passes will fail to update debug-info: when that happens, it's best if the compiler doesn't crash as a result. Therefore, downgrade a few assertions / failure modes that would crash when illegal debug-info was seen, to instead drop variable locations. In practice this means that an instruction reference to a nonexistent or illegal operand should be tolerated. Differential Revision: https://reviews.llvm.org/D118998	2022-02-10 11:25:08 +00:00
Jay Foad	abda8d2229	[GlobalISel] CSE FP constants at -O0 At -O0 we claim to CSE constants only. I think this should apply to G_FCONSTANT as well as G_CONSTANT. Differential Revision: https://reviews.llvm.org/D119344	2022-02-10 09:17:11 +00:00
Reid Kleckner	b5a592a8e2	[DAG] Remove pointless std::function wrapper, NFC	2022-02-09 14:30:43 -08:00
Reid Kleckner	f63c150187	Revert "[DagCombine] Increase depth by number of operands to avoid a pathological compile time." Appears to be causing check-llvm to fail This reverts commit `49ab760090`.	2022-02-09 13:55:40 -08:00
Alina Sbirlea	49ab760090	[DagCombine] Increase depth by number of operands to avoid a pathological compile time. We're hitting a pathological compile-time case, profiled to be in DagCombiner::visitTokenFactor and many inserts into a SmallPtrSet. It looks like one of the paths around findBetterNeighborChains is not capped and leads to this. This patch resolves the issue. Looking for feedback if this solution looks reasonable. Differential Revision: https://reviews.llvm.org/D118877	2022-02-09 13:31:28 -08:00
Alexander Yermolovich	1be6ccfc02	[DWARF][codegen] Fix for Aranges when split inlining is present When we enable -fsplit-dwarf-inlining we end up with two entries in .debug_aranges for each CU. Because it processes Skeleton CU inline information and DWO CU. Furthermore address calculations were incorrect because we were processing sections in Skeleton CU. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D118857	2022-02-09 11:51:43 -08:00
Sander de Smalen	ec46232517	[DAGCombiner] Fold `ty1 extract_vector(ty2 splat(V)) -> ty1 splat(V)` This seems like an obvious fold, which leads to a few improvements. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118920	2022-02-09 14:30:01 +00:00
serge-sans-paille	ef736a1c39	Cleanup LLVMMC headers There's a few relevant forward declarations in there that may require downstream adding explicit includes: llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h Counting preprocessed lines required to rebuild llvm-project on my setup: before: 1052436830 after: 1049293745 Which is significant and backs up the change in addition to the usual benefits of decreasing coupling between headers and compilation units. Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D119244	2022-02-09 11:09:17 +01:00
Bill Wendling	deaf22bc0e	[X86] Implement -fzero-call-used-regs option The "-fzero-call-used-regs" option tells the compiler to zero out certain registers before the function returns. It's also available as a function attribute: zero_call_used_regs. The two upper categories are: - "used": Zero out used registers. - "all": Zero out all registers, whether used or not. The individual options are: - "skip": Don't zero out any registers. This is the default. - "used": Zero out all used registers. - "used-arg": Zero out used registers that are used for arguments. - "used-gpr": Zero out used registers that are GPRs. - "used-gpr-arg": Zero out used GPRs that are used as arguments. - "all": Zero out all registers. - "all-arg": Zero out all registers used for arguments. - "all-gpr": Zero out all GPRs. - "all-gpr-arg": Zero out all GPRs used for arguments. This is used to help mitigate Return-Oriented Programming exploits. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D110869	2022-02-08 17:42:54 -08:00
Mircea Trofin	2868c57caf	[nfc][mlgo][regalloc] Add the url to a reference pre-trained model	2022-02-08 16:57:24 -08:00
Matt Arsenault	5af0f097ba	GlobalISel: Constant fold G_PTR_ADD Some globals lower to literal addresses on AMDGPU. This may be wrong for non-integral address spaces. I'm wondering if we should just allow regular G_ADD to use pointer types, and reserve G_PTR_ADD for non-integral address spaces.	2022-02-08 19:21:06 -05:00
Matt Arsenault	2af4a554fe	GlobalISel: Constant fold FP bin ops in MIRBuilder Might as well handle these if we're going to handle the integer ops here.	2022-02-08 18:51:10 -05:00
Matt Arsenault	930f2498d4	GlobalISel: Constant fold integer min/max opcodes	2022-02-08 18:50:35 -05:00
Matt Arsenault	0877fbcc16	GlobalISel: Add FoldBinOpIntoSelect combine This will do the combine in cases that should fold, but don't now. e.g. we're relying on the CSEMIRBuilder's incomplete constant folding. For instance it doesn't handle FP operations or vectors (and we don't have separate constant folding combines either to catch them).	2022-02-08 18:17:21 -05:00
Mircea Trofin	5a50ab4d5c	[nfc][mlgo][regalloc] Stop warnings about unused function Added a `NoopSavedModelImpl` type which can be used as a mock AOT-ed saved model, and further minimize conditional compilation cases. This also removes unused function warnings on gcc.	2022-02-08 08:35:33 -08:00
Sanjay Patel	905abc5b7d	[SDAG] enable binop identity constant folds for fmul/fdiv The test diffs are identical to D119111. This only affects x86 currently because no other target has an override for the TLI hook that controls this transform.	2022-02-08 10:52:28 -05:00
Roman Lebedev	ae9414d562	[ValueTracking] Only check for non-undef/poison if already known to be a self-multiply https://godbolt.org/z/js9fTTG9h ^ we don't care what `isGuaranteedNotToBeUndefOrPoison()` says unless we already knew that the operands were equal.	2022-02-08 18:35:29 +03:00
Sanjay Patel	a68e098024	[SDAG] move x86 select-with-identity-constant fold behind a target hook; NFC This is no-functional-change-intended because only the x86 target enables the TLI hook currently. We can add fmul/fdiv opcodes to the switch similar to the proposal D119111, but we don't need to make other changes like enabling target-specific combines. We can also add integer opcodes (add, or, shl, etc.) to the switch because this function is called from all of the generic binary opcodes. The goal is to incrementally enable the profitable diffs from D90113 while avoiding regressions. Differential Revision: https://reviews.llvm.org/D119150	2022-02-08 09:55:05 -05:00
Sheng	76c83e747f	[GlobalISel] Add big endian support in CallLowering When splitting values, CallLowering assumes Lo part goes first. But in big endian ISA such as M68k, Hi part goes first. This patch fixes this. Differential Revision: https://reviews.llvm.org/D116877	2022-02-08 14:43:38 +00:00
Nikita Popov	924696d271	[AsmPrinter] Avoid pointer element type access Instead of checking for a bitcast from a function type, check whether the aliasee is a function after stripping bitcasts. This is not strictly equivalent, but serves the same purpose.	2022-02-08 15:06:02 +01:00
Simon Pilgrim	fd2bb51f1e	[ADT] Add APInt/MathExtras isShiftedMask variant returning mask offset/length In many cases, calls to isShiftedMask are immediately followed with checks to determine the size and position of the bitmask. This patch adds variants of APInt::isShiftedMask, isShiftedMask_32 and isShiftedMask_64 that return these values as additional arguments. I've updated a number of cases that were either performing separate size/position calculations or had created their own local wrapper versions of these. Differential Revision: https://reviews.llvm.org/D119019	2022-02-08 12:04:13 +00:00
Carl Ritson	42ac4e1a12	[MachineLICM] Add shouldHoist method to TargetInstrInfo Add a shouldHoist method to TargetInstrInfo which is queried by MachineLICM to override hoisting decisions for a given target. This mirrors functionality provided by shouldSink. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D118773	2022-02-08 15:53:05 +09:00
Sheng	146c7820d9	[GlobalISel][Legalizer] Support reducing load/store width in big endian order	2022-02-07 20:06:17 -05:00
Sanjay Patel	d1ecfaa097	[SDAG] try to fold one-demanded-bit-of-multiply This is a translation of the transform added to InstCombine with: D118539	2022-02-07 17:24:35 -05:00
Sanjay Patel	fc6bee1c11	[SDAG] SimplifyDemandedBits - generalize fold for 2 LSB of X*X This is translated from recent changes to the IR version of this function: D119060 D119139	2022-02-07 15:38:50 -05:00
Vang Thao	570471199b	[AMDGPU] Fix debug values in scheduler not placed correctly when reverting Debug position data is cleared after ScheduleDAGMILive::schedule() due to it also calling placeDebugValues(). Make it so the data is not cleared after initial call to placeDebugValues since we will call it again after reverting a schedule. Secondly, since we skip debug instructions when reverting the schedule on AMDGPU, all debug instructions are now moved to the end of the scheduling region. RegionEnd points to the beginning of this chunk of debug instructions since it was not incremented when a debug instruction was skipped. RegionBegin may also point to the same debug instruction if Unsched.front() is a debug instruction thus shrinking the region to 1. Fix RegionBegin and RegionEnd so that they point to the current beginning and ending before calling placeDebugValues() since both vars will be used as reference points to move debug instructions back. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D119022	2022-02-07 11:01:13 -08:00
Simon Pilgrim	74555fd367	[DAG] visitINSERT_VECTOR_ELT - break if-else chain as they both return (style). NFC.	2022-02-07 09:58:47 +00:00
Simon Pilgrim	5d3a86489f	[GlobalISel] Move getOpcode() calls inside assert() to avoid (void)s. NFC. Tidier solution to the unused variable warnings - we already do this in other places in this file.	2022-02-07 09:50:27 +00:00
Djordje Todorovic	def10a2895	[GlobalIsel] Fix another "unused variable" warning	2022-02-07 09:32:22 +01:00
Djordje Todorovic	eab395fa40	Fix the warning after D118805 A variable was used within assert() only.	2022-02-07 09:25:02 +01:00
Craig Topper	c35ccd2ac8	[DAGCombiner][RISCV] Allow rotates by non-constant to be matched for i32 on riscv64 with Zbb. rv64izbb has RORW/ROLW instructions that operate on the lower 32-bits of a 64-bit value and sign extend bit 31 of the result. DAGCombiner won't match rotate idioms because the i32 type isn't Legal on riscv64. This patch teaches DAGCombiner to allow it if the type is going to be promoted and the target has Custom type legalization for ISD::ROTL or ISD::ROTR. I've restricted this to scalar types. It doesn't appear any in tree targets other than riscv64 have custom type legalization for rotates. If this patch isn't acceptable, I guess I can match SRLW, SLLW, and OR after type legalization, but I'd like to avoid that if possible. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D119062	2022-02-06 10:58:12 -08:00
Kazu Hirata	3a8c51480f	[CodeGen] Use = default (NFC) Identified with modernize-use-equals-default	2022-02-06 10:54:44 -08:00
Bjorn Pettersson	cecf11c315	[DAGCombiner] Fold SSHLSAT/USHLSAT to SHL when no saturation will occur When the shift amount is known and a known sign bit analysis of the shiftee indicates that no saturation will occur, then we can replace SSHLSAT/USHLSAT by SHL. Differential Revision: https://reviews.llvm.org/D118765	2022-02-06 18:59:06 +01:00
Rong Xu	52d981a4c1	[SampleFDO] Enable FSAFDO loading passes if --enable-fs-discriminator is enabled FSAFDO profile loader is currently disabled even when --enable-fs-discriminator is enabled. They need to be turned on by options which makes it cumbersome for experiments. This patch changes the FSAFDO profile loader to be enabled by default. Since they are guarded by EnableFSDiscriminator, they will only be turned on if --enable-fs-discriminator is enabled. Note that --enable-fs-discriminator is still disabled by default. Differential Revision: https://reviews.llvm.org/D119033	2022-02-05 22:37:09 -08:00
Benjamin Kramer	a40dc4eaf8	Simplify mask creation with llvm::seq. NFCI.	2022-02-05 23:35:41 +01:00
Sander de Smalen	6452549f30	[DAGCombiner] Fold vecreduce_or/and if operand is insert_subvector. Fold: vecreduce_or(insert_subvec(zeroinitializer, vec)) -> vecreduce_or(vec) vecreduce_and(insert_subvec(allones, vec)) -> vecreduce_and(vec) vecreduce_and/or(insert_subvec(undef, vec)) -> vecreduce_and/or(vec) This is useful for SVE which uses insert/extract subvector to convert fixed-width to/from scalable vectors. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D118919	2022-02-05 14:35:53 +00:00
Hongtao Yu	dee058c670	[CSSPGO] Turn on ext-tsp by default for CSSPGO. I'm seeing ext-tsp helps CSSPGO for our internal large benchmarks so I'm turning it on for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile counts quality. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119048	2022-02-04 19:46:44 -08:00
Róbert Ágoston	cd4ed08b5a	[GlobalISel] Don't combine instructions which are fed by memory instructions using different size Memory instructions like extending loads from the same address are not equal if their size is not equal. This fixes https://github.com/llvm/llvm-project/issues/53524. Differential Revision: https://reviews.llvm.org/D118805	2022-02-04 15:00:47 -08:00
John Brawn	0d8092dd48	[AArch64] Fix legalization of v1f64 strict_fsetcc and strict_fsetccs These operations are scalarized but the result type v1i1 isn't which needs special handling (the same as is done for the non-strict versions of these operations). Differential Revision: https://reviews.llvm.org/D118258	2022-02-04 12:55:38 +00:00
serge-sans-paille	ffe8720aa0	Reduce dependencies on llvm/BinaryFormat/Dwarf.h This header is very large (3M Lines once expanded) and was included in locations where dwarf-specific information was not needed. More specifically, this commit suppresses the dependencies on llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used, this has a decent impact on number of preprocessed lines generated during compilation of LLVM, as showcased below. This is achieved by moving some definitions back to the .cpp file, no performance impact implied[0]. As a consequence of that patch, downstream users may need to manually include some extra files: llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h In some situations, code may be relying on the fact that llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h, this hidden dependency now needs to be explicit. $ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions \| wc -l after: 10978519 before: 11245451 Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup [0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions Differential Revision: https://reviews.llvm.org/D118781	2022-02-04 11:44:03 +01:00
Bjorn Pettersson	3db39e7479	[DAGCombiner] Fix dependency analysis in checkMergeStoreCandidatesForDependencies In the aftermath of D116895 a problem was found in the analysis of dependencies between store merge candidates in checkMergeStoreCandidatesForDependencies, which is needed to avoid cycles being introduced in the DAG. In the past it has been enough (or assumed to be enough) to start scanning from non-chain operands when analysing the store merge candidates for dependencies, assuming that the analysis of chain dependencies performed when finding the candidates would cover up for potential dependencies that exist involving the chain operands. It was however discovered that one could end up with scenarios such as described in the aarch64-checkMergeStoreCandidatesForDependencies.ll test case, when the dependency between two stores is given by a mix of chain operand dependencies and non-chain operand dependencies. The fix in this patch makes sure that we also account for chain operand dependencies when doing the more elaborate analysis in checkMergeStoreCandidatesForDependencies, no longer relying on the earlier check involving chain operands being enough. Differential Revision: https://reviews.llvm.org/D118943	2022-02-04 08:53:01 +01:00
Mircea Trofin	91a33ad32b	[nfc][mlgo][regalloc] Cache live interval feature components Lazily cache the feature components of a LiveInterval. Differential Revision: https://reviews.llvm.org/D118674	2022-02-03 17:01:42 -08:00
Jessica Paquette	9a61e731ff	[GlobalISel] Combine (G_ADDO x, 0) -> x + no carry out Similar to the G_MULO change. The code for checking if a constant is legal/pre-legalize is shared between these, and is kind of hairy. So, factor it out into a new function: `isConstantLegalOrBeforeLegalizer`. To make the refactoring clean, further refactor `isLegalOrBeforeLegalizer` into a wrapper for two functions: - `isPreLegalize` - `isLegal` This is a bit easier to read in general. https://godbolt.org/z/KW7oszP1o Differential Revision: https://reviews.llvm.org/D118655	2022-02-03 14:25:15 -08:00
Jessica Paquette	c636899dc1	[GlobalISel] Combine: (G_MULO x, 0) -> 0 + no carry out Similar to the following combine in `DAGCombiner::visitMULO`: ``` // fold (mulo x, 0) -> 0 + no carry out if (isNullOrNullSplat(N1)) return CombineTo(N, DAG.getConstant(0, DL, VT), DAG.getConstant(0, DL, CarryVT)); ``` This fixes some generally poor codegen for `mulo`: https://godbolt.org/z/eTxYsvz8f Differential Revision: https://reviews.llvm.org/D118635	2022-02-03 14:23:58 -08:00
Mircea Trofin	592f52de33	[nfc][regalloc] const LiveIntervals within the allocator Once built, LiveIntervals are immutable. This patch captures that. Differential Revision: https://reviews.llvm.org/D118918	2022-02-03 12:35:36 -08:00
Bjorn Pettersson	0352ee1a22	[CodeGenPrepare] Avoid out-of-bounds shift AddressingModeMatcher::matchOperationAddr may attempt to shift a variable by the same amount of steps as found in the IR in a SHL instruction. This was done without considering that there could be undefined behavior in the IR, so the shift performed when compiling could end up having undefined behavior as well. This patch avoids UB in CodeGenPrepare by making sure that we limit the shift amount used, in a similar way as already being done in CodeGenPrepare::optimizeLoadExt. Differential Revision: https://reviews.llvm.org/D118602	2022-02-03 21:03:58 +01:00
Mircea Trofin	79b98f0a07	Revert "[nfc][mlgo] De-const a parameter" This reverts commit `bc3b372161`. The planned change that would have needed non-const MachineFunction refs isn't needed after all.	2022-02-03 09:20:36 -08:00
John Brawn	94843ea7d7	[AArch64] Make machine combiner patterns preserve MIFlags This is mainly done so that we don't lose the nofpexcept flag once we start emitting it. Differential Revision: https://reviews.llvm.org/D118621	2022-02-03 11:58:59 +00:00
Sander de Smalen	01bfe9729a	[ISEL] Canonicalize STEP_VECTOR to LHS if RHS is a splat. This helps recognise patterns where we're trying to match STEP_VECTOR patterns to INDEX instructions that take a GPR for the Start/Step. The reason for canonicalising this operation to the LHS is because it will already be canonicalised to the LHS if the RHS is a constant splat vector. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D118459	2022-02-03 09:31:46 +00:00
Jeremy Morse	4654fa89ea	Follow up to `6e03a68b77`, squelch another leak This patch is a sticking-plaster until D118774 solves the situation with unique_ptrs. I'm certainly wishing I'd focused on that first X_X.	2022-02-02 21:02:11 +00:00
Jeremy Morse	6e03a68b77	[DebugInfo] Re-enable instruction referencing for x86_64 After discussion in D116821 this was turned off in `74db5c8c95`, `14aaaa1236` applied to limit the maximum memory consumption in rare conditions, plus some performance patches.	2022-02-02 19:41:59 +00:00
Matt Arsenault	a96dbb9035	CodeGen: Use asm register names in warning message This was using the ugly tablegenerated register enum names, which are really hideous for register tuples on AMDGPU. Use the prettier names which are recognized by the asm parser.	2022-02-02 14:20:12 -05:00
Jeremy Morse	206cafb680	Follow up to `9fd9d56dc6`, avoid a memory leak Gaps in the basic block number range (from blocks being deleted or folded) get block-value-tables allocated but never ejected, leading to a memory leak, currently tripping up the asan buildbots. Fix this up by manually freeing that memory. As suggested elsewhere, if these things were owned by a unique_ptr then cleanup would happen automagically. D118774 should eliminate the need for this dance.	2022-02-02 16:01:11 +00:00
Masoud Ataei	256d253332	[PowerPC] Scalar IBM MASS library conversion pass This patch introduces the conversions from math function calls to MASS library calls. To resolve calls generated with these conversions, one needs to link the libxlopt.a library. This patch is tested on PowerPC Linux and AIX. Differential: https://reviews.llvm.org/D101759 Reviewer: bmahjour	2022-02-02 07:54:19 -08:00
Mircea Trofin	660ff655c8	Fix buildbreak introduced in `ed2deab595`	2022-02-02 07:34:51 -08:00
Mircea Trofin	ed2deab595	[nfc][regalloc] Make the max interference cutoff configurable Added a flag to make configurable the number of interferences after which we 'bail out' and treat a set of intervals as un-evictable. Also using it on the ML side, as it turns out to be a good control for compile-time. With this configurable, we can do a bit of trial and error and see if bumping it has any effect on heuristic/policy quality. Differential Revision: https://reviews.llvm.org/D118707	2022-02-02 07:29:34 -08:00
Jeremy Morse	43de305704	[DebugInfo][InstrRef] Fix a tombstone-in-DenseMap crash from D117877 This is a follow-up to D117877: variable assignments of DBG_VALUE $noreg, or DBG_INSTR_REFs where no value can be found, are represented by a DbgValue object with Kind "Undef", explicitly meaning "there is no value". In D117877 I added a special-case to make some assignment accounting faster, without considering this scenario. It causes variables to be given the value ValueIDNum::EmptyValue, which then ends up being a DenseMap key. The DenseMap asserts, because EmptyValue is the tombstone key. Fix this by handling the assign-undef scenario in the special case, to match what happens in the general case: the variable has no value if it's only ever assigned $noreg / undef. Differential Revision: https://reviews.llvm.org/D118715	2022-02-02 15:08:49 +00:00
Jeremy Morse	9fd9d56dc6	[DebugInfo][InstrRef][NFC] Use depth-first scope search for variable locs This patch aims to reduce max-rss from instruction referencing, by avoiding keeping variable value information in memory for too long. Instead of computing all the variable values then emitting them to DBG_VALUE instructions, this patch tries to stream the information out through a depth first search: * Make use of the fact LexicalScopes gives a depth-number to each lexical scope, * Produce a map that identifies the last lexical scope to make use of a block, * Enumerate each scope in LexicalScopes' DFS order, solving the variable value problem, * After each scope is processed, look for any blocks that won't be used by any other scope, and emit all the variable information to DBG_VALUE instructions. Differential Revision: https://reviews.llvm.org/D118460	2022-02-02 14:09:54 +00:00
Jeremy Morse	a80181a81e	[DebugInfo][InstrRef][NFC] Free resources at an earlier stage This patch releases some memory from InstrRefBasedLDV earlier than it would otherwise. The underlying problem is: * We store a big table of "live in values for each block", * We translate that into DBG_VALUE instructions in each block, And both exist in memory at the same time, which needlessly doubles that information. Most of what this patch does is: as we progressively translate live-in information into DBG_VALUEs, we free the variable-value / machine-value tracking information as we go, which significantly reduces peak memory. While I'm here, also add a clear method to wipe variable assignments that have been accumulated into VLocTracker objects, and turn a DenseMap into a SmallDenseMap to avoid an initial allocation. Differential Revision: https://reviews.llvm.org/D118453	2022-02-02 12:58:15 +00:00
Jeremy Morse	d556eb7e27	[DebugInfo][InstrRef][NFC] Cache some PHI resolutions Install a cache of DBG_INSTR_REF -> ValueIDNum resolutions, for scenarios where the value has to be reconstructed from several DBG_PHIs. Whenever this happens, it's because branch folding + tail duplication has messed with the SSA form of the program, and we have to solve a mini SSA problem to find the variable value. This is always called twice, so it makes sense to cache the value. This gives a ~0.5% geomean compile-time-performance improvement on CTMark. Differential Revision: https://reviews.llvm.org/D118455	2022-02-02 12:21:28 +00:00
Simon Pilgrim	5aa2acc86b	[DAG] SimplifyDemandedVectorElts - remove KnownZero/KnownUndef from DCI helper wrapper None of the external users actually touch these (they're purely used internally down the recursive call) - it's trivial to add another wrapper if anything ever does want to track known elements.	2022-02-02 12:04:49 +00:00
Jeremy Morse	14aaaa1236	Re-apply `3fab2d138e`, now with a triple added Was reverted in `1c1b670a73` as it broke all non-x86 bots. Original commit message: [DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out In certain circumstances with things like autogenerated code and asan, you can end up with thousands of Values live at the same time, causing a large working set and a lot of information spilled to the stack. Unfortunately InstrRefBasedLDV doesn't cope well with this and consumes a lot of memory when there are many many stack slots. See the reproducer in D116821. It seems very unlikely that a developer would be able to reason about hundreds of live named local variables at the same time, so a huge working set and many stack slots is an indicator that we're likely analysing autogenerated or instrumented code. In those cases: gracefully degrade by setting an upper bound on the amount of stack slots to track. This limits peak memory consumption, at the cost of dropping some variable locations, but in a rare scenario where it's unlikely someone is actually going to use them. In terms of the patch, this adds a cl::opt for max number of stack slots to track, and has the stack-slot-numbering code optionally return None. That then filters through a number of code paths, which can then choose to not track a spill / restore if it touches an untracked spill slot. The added test checks that we drop variable locations that are on the stack, if we set the limit to zero. Differential Revision: https://reviews.llvm.org/D118601	2022-02-02 11:04:00 +00:00
Sam Parker	281d29b8fe	[TypePromotion] Avoid some unnecessary truncs Check for legal zext 'sinks' before inserting a trunc. Differential Revision: https://reviews.llvm.org/D115451	2022-02-02 10:05:15 +00:00
Simon Moll	7d926b7177	[VE] LEGALAVL and staged VVP legalization The new LEGALAVL node annotates that the AVL refers to packs of 64bit. We use a two-stage lowering approach with LEGALAVL: First, standard SDNodes are translated into illegal VVP layer nodes. Regardless of source (VP or standard), all VVP nodes have a mask and AVL parameter. The AVL parameter refers to the element position (just as in VP intrinsics). Second, we legalize the AVL usage in VVP layer nodes. If the element size is < 64bit, the EVL parameter has to be adjusted to refer to packs of 64bits. We wrap the legalized AVL in a LEGALAVL node to track this. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D118321	2022-02-02 09:11:41 +01:00
Kevin Athey	1c1b670a73	Revert "[DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out" This reverts commit `3fab2d138e`. Breaking PPC sanitizer build: https://lab.llvm.org/buildbot/#/builders/105/builds/20857	2022-02-01 18:37:02 -08:00
David Blaikie	f69f23396d	Revert "DebugInfo: Don't put types in type units if they reference internal linkage types" This reverts commit `ab4756338c`. Breaks some cases, including this: namespace { template <typename> struct a {}; } // namespace class c { c(); }; class b { b(); a<c> ax; }; b::b() {} c::c() {} By producing a reference to a type unit for "c" but not producing the type unit.	2022-02-01 16:13:07 -08:00
David Green	c89cfbd4dd	Revert "[DAG] Extend SearchForAndLoads with any_extend handling" This reverts commit `100763a88f` as it was making incorrect assumptions about implicit zero_extends.	2022-02-01 20:18:40 +00:00
Jeremy Morse	8e75536e51	[DebugInfo][InstrRef][NFC] Bypass a frequently-noop loop Bypass this loop if it would do nothing -- if there are no register masks to be examined, there's no point looking at each location to see if the location has been def'd. Awkwardly, this was responsible for almost an entire half a percent of performance improvement on CTMark. Differential Revision: https://reviews.llvm.org/D118613	2022-02-01 19:39:09 +00:00
Jeremy Morse	3fab2d138e	[DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out In certain circumstances with things like autogenerated code and asan, you can end up with thousands of Values live at the same time, causing a large working set and a lot of information spilled to the stack. Unfortunately InstrRefBasedLDV doesn't cope well with this and consumes a lot of memory when there are many many stack slots. See the reproducer in D116821. It seems very unlikely that a developer would be able to reason about hundreds of live named local variables at the same time, so a huge working set and many stack slots is an indicator that we're likely analysing autogenerated or instrumented code. In those cases: gracefully degrade by setting an upper bound on the amount of stack slots to track. This limits peak memory consumption, at the cost of dropping some variable locations, but in a rare scenario where it's unlikely someone is actually going to use them. In terms of the patch, this adds a cl::opt for max number of stack slots to track, and has the stack-slot-numbering code optionally return None. That then filters through a number of code paths, which can then choose to not track a spill / restore if it touches an untracked spill slot. The added test checks that we drop variable locations that are on the stack, if we set the limit to zero. Differential Revision: https://reviews.llvm.org/D118601	2022-02-01 19:25:29 +00:00
Jeremy Morse	91fb66cf91	[DebugInfo][InstrRef][NFC] Don't build a map of un-needed values When finding locations for variable values at the start of a block, we build a large map of every value to every location, and then pick out the locations for values that are desired. This takes up quite a lot of time, because, unsurprisingly, there are usually more values in registers and stack slots than there are variables. This patch instead creates a map of desired values to their locations, which are initially illegal locations. Then, as we examine every available value, we can select locations for values we care about, and ignore those that we don't. This substantially reduces the amount of work done (i.e., building a map up of values to locations that nothing wants or needs). Geomean performance improvement of 1% on CTMark, woo. Differential Revision: https://reviews.llvm.org/D118597	2022-02-01 18:58:06 +00:00
Mircea Trofin	22d3bbdf4e	[nfc][regalloc] Move DefaultEvictionAdvisor::* to RegAllocEvictionAdvisor.cpp This is leftover from the advisor refactoring. Straight-forward copy and paste.	2022-02-01 07:59:25 -08:00
Simon Pilgrim	904395ab8f	[DAG] SimplifyMultipleUseDemandedBits - add default Depth = 0 argument. Simplifies an upcoming change.	2022-02-01 12:34:38 +00:00
Simon Pilgrim	d83a96f59f	[DAG] Make it clear mul(x,x) knownbits bit[1] == 0 check should be for x is undef only As raised on rGffd0e464b4b9, if x is poison, this fold is still ok.	2022-02-01 11:32:14 +00:00
Bjorn Pettersson	3885879046	[DAGCombine] Add simple folds for SSHLSAT/USHLSAT Do "simplifyShift" and "FoldConstantArithmetic" folds for the SSHLSAT and USHLSAT DAG nodes. This includes folds such as: (shlsat undef/poison, x) -> 0 (shlsat x, undef/poison) -> undef (shlsat x, too_large_shamt) -> undef (shlsat 0, x) -> 0 (shlsat x, 0) -> x (shlsat c1, c2) -> c3 Differential Revision: https://reviews.llvm.org/D118603	2022-02-01 10:51:35 +01:00
David Sherwood	daa80339df	[CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors I have updated TargetLowering::isConstTrueVal to also consider SPLAT_VECTOR nodes with constant integer operands. This allows the optimisation to also work for targets that support scalable vectors. Differential Revision: https://reviews.llvm.org/D117210	2022-02-01 09:50:00 +00:00
Mircea Trofin	a3f1491849	[nfc][mlgo][regalloc] 'hasPreferredPhys' out of feature components It isn't cacheable, it can be updated by other events than live interval resizing.	2022-01-31 18:59:47 -08:00
Mircea Trofin	9aa2c914b9	[mlgo][regalloc] Factor live interval feature calculation Factoring it out so we can subsequently cache it. This should be an NFC, however, for the float quantities, we see small errors in the least significant digits. This is because, before, we were summing up one by one. Now, we sum up results of sums. This shouldn't matter for ML, and will require rework when we do quantization (avoiding floats altogether), but meanwhile, it did require an update to the reference file used for testing. The patch also bumps the precision of the variables involved in this, to reduce the error (note they are cast back to float at the end by the SET macro, since we only work with float and not double in TF) Differential Revision: https://reviews.llvm.org/D118659	2022-01-31 15:19:15 -08:00
Mircea Trofin	d46305e22d	[NFC][regalloc] Move evict advisor initialization before VRAI This is because a subsequent patch will propose obtaining the VRAI from the advisor, which will enable feature caching for the ML advisor, for better compile time. Making this change first as it's both innocuous and keeps the future patch to be reviewed small.	2022-01-31 14:04:59 -08:00
Mircea Trofin	bc3b372161	[nfc][mlgo] De-const a parameter We plan to pass the MachineFunction& to APIs that expect it non-const (for legitimate reasons). The advisor still holds the ref as a const ref, though, so we keep most of the maintainability value of that.	2022-01-31 13:44:33 -08:00
Philip Reames	57cf29ac1b	[Statepoint] Remove another use of getActualReturnType [NFC] For the cross block gc.result projection case, we only care about the return type if there is a cross block gc.result, and if there is one, we can take the type from the gc.result. At the moment, this makes little difference, but for opaque pointers we need a means to get result typing without relying on pointee types.	2022-01-31 09:57:46 -08:00
Adrian Prantl	f85c6b79f3	Fix a fragment overflow problem when composing super-registers. Addresses https://github.com/llvm/llvm-project/issues/53342 Differential Revision: https://reviews.llvm.org/D118412	2022-01-31 09:47:29 -08:00
Philip Reames	6e4f7c0823	[Statepoints] Take result type from gc.result [NFC] When lowering a gc.result, we can assume that the result type of the gc.result matches the type of the underlying call. This is explicitly required in LangRef. At the moment, this makes little difference, but for opaque pointers we need a means to get result typing without relying on pointee types.	2022-01-31 09:42:34 -08:00
Philip Reames	093b43f48d	Sink getGCResultLocality to sole use [NFC]	2022-01-31 09:33:57 -08:00
Jeremy Morse	4a2cb01370	[DebugInfo][InstrRef][NFC] Refactor ahead of further optimisations This patch shuffles some functions around so that some blocks of code can be reused. In particular, * Move the determination of "which blocks are in scope" to its own function, as it's non-trivial to solve. Delete the "InScopeBlocks" collection too, which nothing reads from. * Split transfer emission (i.e., installing DBG_VALUEs into blocks) into its own function. * Name some useful types. * Rename "ScopeToBlocks" to "ScopeToAssignBlocks", as that's what the collection contains, blocks where assignments happen. Differential Revision: https://reviews.llvm.org/D118454	2022-01-31 16:45:53 +00:00
Jeremy Morse	e9739f116d	Revert "[DebugInfo][InstrRef][NFC] Add a missing assignment operator" This reverts commit `f18429372f`. Bitten by -Werror,-Wdeprecated-copy on a buildbot, alas!	2022-01-31 16:15:21 +00:00
Jeremy Morse	f18429372f	[DebugInfo][InstrRef][NFC] Add a missing assignment operator ValueIDNum is supposed to be a value type that boils down to a uint64_t, that has some bitfields for convenience. If we use the default operator=, we end up with each bit field being individually assigned, which is un-necessarily slow. Implement the assignment operator by just copying the uint64_t value of the object. This is quicker, and matches how the comparison operators work already. Doing so is 0.1% faster on the compile-time-tracker.	2022-01-31 16:08:38 +00:00
Kerry McLaughlin	002b944dfa	[SVE] Fix TypeSize->uint64_t implicit conversion in visitAlloca() Fixes a crash ('Invalid size request on a scalable vector') in visitAlloca() when we call this function for a scalable alloca instruction, caused by the implicit conversion of TySize to uint64_t. This patch changes TySize to a TypeSize as returned by getTypeAllocSize() and ensures the allocation size is multiplied by vscale for scalable vectors. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D118372	2022-01-31 14:37:23 +00:00
Dávid Bolvanský	ae990a3cbd	[Analysis] Attribute noundef should not prevent tail call optimization Very similar to https://reviews.llvm.org/D101230 Fixes https://github.com/llvm/llvm-project/issues/53501	2022-01-31 15:13:52 +01:00
Simon Pilgrim	7ec8fc2932	[X86] combineAnd() - per-element simplification - call SimplifyDemandedBits using mask demanded bits if SimplifyDemandedVectorElts fails We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero, this just extends this to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements. This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.	2022-01-31 13:58:00 +00:00
Jeremy Morse	c703d77a61	[DebugInfo][InstrRef] Don't fully propagate single assigned variables If we only assign a variable value a single time, we can take a short-cut when computing its location: the variable value is only valid up to the dominance frontier of where the assignment happens. Past that point, there are other predecessors from where the variable has no value, meaning the variable has no location past that point. This patch recognises this scenario, and avoids expensive SSA computation, to improve compile-time performance. Differential Revision: https://reviews.llvm.org/D117877	2022-01-31 12:54:17 +00:00
Simon Pilgrim	2d1390efbe	[DAG] SimplifyDemandedBits - mul(x,x) - if only demand bit[1] then fold to zero	2022-01-31 12:00:51 +00:00
Simon Pilgrim	48f45f6b25	[X86] Limit mul(x,x) knownbits tests with not undef/poison check We can only assume bit[1] == zero if it's the only demanded bit or the source is not undef/poison	2022-01-31 11:55:10 +00:00
Fangrui Song	0e691aed7e	[mlgo][regalloc] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after `a8a7bf922c`	2022-01-30 15:18:30 -08:00
Mircea Trofin	a8a7bf922c	[mlgo][regalloc] Fix register masking If AllocationOrder has less than 32 elements, we were treating the extra positions as if they were valid. This was detected by a subsequent assert. The fix also tightens the asserts.	2022-01-30 14:59:08 -08:00
Markus Böck	e0b11c7659	[Support][NFC] Fix generic `ChildrenGetterTy` of `IDFCalculatorBase` Both IDFCalculatorBase and its accompanying DominatorTreeBase only support pointer nodes. The template argument is the block type itself and any use of GraphTraits is therefore done via a pointer to the node type. However, the ChildrenGetterTy type of IDFCalculatorBase has a use on just the node type instead of a pointer to the node type. Various parts of the monorepo have worked around this issue by providing specializations of GraphTraits for the node type directly, or have not been affected because they use specializations instead of the generic case. These workarounds are unnecessary, however, and the generic code should be fixed instead. An example from within the tree is a use of IDFCalculatorBase in InstrRefBasedImpl.cpp. It basically instantiates an IDFCalculatorBase<MachineBasicBlock, false> but due to the bug above then goes on to specialize GraphTraits<MachineBasicBlock> although GraphTraits<MachineBasicBlock*> exists (and should be used instead). Similar dead code exists in clang which defines redundant GraphTraits to work around this bug. This patch fixes both the original issue and removes the dead code that was used to work around the issue. Differential Revision: https://reviews.llvm.org/D118386	2022-01-30 22:09:07 +01:00
Kazu Hirata	2bea207d26	[CodeGen] Use default member initialization (NFC) Identified with modernize-use-default-member-init.	2022-01-30 12:32:51 -08:00
Mircea Trofin	bc5644ee74	[MLGO] Regalloc: allow multiple occurrences of -regalloc-enable-advisor This allows scenarios where some central config sets it one way and a user wants to override it.	2022-01-29 09:00:52 -08:00
Fangrui Song	33b38339a0	[lld] Add module name to LTO inline asm diagnostic Close #52781: for LTO, the inline asm diagnostic uses `<inline asm>` as the file name (lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp) and it is unclear which module has the issue. With this patch, we will see the module name (say `asm.o`) before `<inline asm>` with ThinLTO. ``` % clang -flto=thin -c asm.c && myld.lld asm.o -e f ld.lld: error: asm.o <inline asm>:1:2: invalid instruction mnemonic 'invalid' invalid ^~~~~~~ ``` For regular LTO, unfortunately the original module name is lost and we only get ld-temp.o. Reviewed By: #lld-macho, ychen, Jez Ng Differential Revision: https://reviews.llvm.org/D118434	2022-01-28 11:32:42 -08:00
Cullen Rhodes	5d089d9a83	[DAGCombiner] Fix invalid size request in combineRepeatedFPDivisors If we have a vector FP division with a splatted divisor, use getVectorMinNumElements when scaling the num of uses by splat factor. For AArch64 the combine kicks in for the <vscale x 4 x float> case since it's above the fdiv threshold (3) when scaling num uses by splat factor, but the codegen is worse (splat + vector fdiv + vector fmul) than the <vscale x 2 x double> case (splat + vector fdiv). If the combine could be converted into a scalar FP division by scalarizeBinOpOfSplats it may be cheaper, but it looks like this is predicated on the isExtractVecEltCheap TLI function which is implemented for x86 but not AArch64. Perhaps for now combineRepeatedFPDivisors should only scale num uses by splat if the division can be converted into scalar op. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D118343	2022-01-28 17:01:08 +00:00
Jeremy Morse	76fd78b4b3	[MVerifier] Don't check liveness of any debug instruction operands Shiny new DBG_PHI instructions usually have physical registers as operands -- however, the machine verifier checks to see whether they're live, and occasionally this fails. There's a filter for DBG_VALUE instructions to not get verified in this way: expand it to exempt all debug instructions from liveness checking, which means DBG_PHIs get treated like DBG_VALUEs. This also future proofs against us adding new debug instructions. Differential Revision: https://reviews.llvm.org/D117891	2022-01-28 15:04:54 +00:00
Martin Storsjö	f7d2afbac9	[CodeGen] Emit COFF symbol type for function aliases On the level of the generated object files, both symbols (both original and alias) are generally indistinguishable - both are regular defined symbols. But previously, only the original function had the COFF ComplexType set to IMAGE_SYM_DTYPE_FUNCTION, while the symbol created via an alias had the type set to IMAGE_SYM_DTYPE_NULL. This matches what GCC does, which emits directives for setting the COFF symbol type for this kind of alias symbol too. This makes a difference when GNU ld.bfd exports symbols without dllexport directives or a def file - it seems to decide between function or data exports based on the COFF symbol type. This means that functions created via aliases, like some C++ constructors, are exported as data symbols (missing the thunk for calling without dllimport). This hasn't been an issue when doing the same with LLD, as LLD decides between function or data export based on the flags of the section that the symbol points at. This should fix the root cause of https://github.com/msys2/MINGW-packages/issues/10547. Differential Revision: https://reviews.llvm.org/D118328	2022-01-28 13:06:16 +02:00
Ellis Hoag	11d3074267	[InstrProf] Add single byte coverage mode Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track function coverage. This mode has significantly less size overhead in both code and data because * We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions * We use a single byte per function rather than 8 bytes per block The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code. When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%). [0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D116180	2022-01-27 17:38:55 -08:00
Simon Pilgrim	fdd3e2c943	[DAG] SelectionDAG::getNode(N1,N2) - detect N2 constant vector splats as well as scalars We already perform some basic folds (add/sub with zero etc.) on scalar types, this patch adds some basic support for constant splats as well in a few cases (we can add more with future test coverage). In the cases I've enabled, we can handle buildvector implicit truncation as we're not creating new constant nodes from the vector types - we're just returning existing nodes. This allows us to get a number of extra cases in the aarch64 tests. I haven't enabled support for undefs in buildvector splats, as we're often checking for zero/allones patterns that return the original constant and we shouldn't be returning undef elements in some of these cases - we can enable this later if we're OK with creating new constants. Differential Revision: https://reviews.llvm.org/D118264	2022-01-27 10:59:08 +00:00
Fraser Cormack	84e85e025e	[SelectionDAG][VP] Provide expansion for VP_MERGE This patch adds support for expanding VP_MERGE through a sequence of vector operations producing a full-length mask setting up the elements past EVL/pivot to be false, combining this with the original mask, and culminating in a full-length vector select. This expansion should work for any data type, though the only use for RVV is for boolean vectors, which themselves rely on an expansion for the VSELECT. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118058	2022-01-27 09:00:41 +00:00
Adrian Prantl	ee72b17386	Fix UB in DwarfExpression::emitLegacyZExt() A shift-left > 63 triggers a UBSAN failure. This patch kicks the can down the road (to the consumer) by emitting a more compact representation of the shift computation in DWARF expressions. Relanding (I accidentally pushed an earlier version of the patch previously). Differential Revision: https://reviews.llvm.org/D118183	2022-01-26 13:08:35 -08:00
Adrian Prantl	f400a6012c	Revert "Fix UB in DwarfExpression::emitLegacyZExt()" This reverts commit `216002c4bb` while investigating bot breakage.	2022-01-26 12:46:07 -08:00
Matt Arsenault	2d670de84c	GlobalISel: Avoid crash on asm with lying result types The physical register in the asm has the wrong type for the declared IR. It seems to work in the DAG by extracting the 4 elements that are defined in the IR from the register, but that isn't handled here. This doesn't seem to be a well tested path since other mismatched cases are crashing the DAG asm handling.	2022-01-26 15:23:59 -05:00
Adrian Prantl	216002c4bb	Fix UB in DwarfExpression::emitLegacyZExt() A shift-left > 63 triggers a UBSAN failure. This patch kicks the can down the road (to the consumer) by emitting a more compact representation of the shift computation in DWARF expressions. Differential Revision: https://reviews.llvm.org/D118183	2022-01-26 10:57:11 -08:00
Chih-Ping Chen	28bfa57a73	[DebugInfo] Add stringLocationExp field to DIStringType DIStringType is used to encode the debug info of a character object in Fortran. A Fortran deferred-length character object is typically implemented as a pair of the following two pieces of info: An address of the raw storage of the characters, and the length of the object. The stringLocationExp field contains the DIExpression to get to the raw storage. This patch also enables the emission of DW_AT_data_location attribute in a DW_TAG_string_type debug info entry based on stringLocationExp in DIStringType. A test is also added to ensure that the bitcode reader is backward compatible with the old DIStringType format. Differential Revision: https://reviews.llvm.org/D117586	2022-01-26 11:56:57 -05:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
Sanjay Patel	63daea8b35	[SDAG] fix bug in ComputeNumSignBits of target constant The loop below the changed line assumes that the element width of the target constant is the same as the element width of the loaded value, but that is not always true. We could try harder to do some kind of min/max calc even if the sizes don't match, but that can be another patch if needed. This fixes #53401 (miscompile) and does not change the motivating cases added when this analysis was introduced: `ad298f86b7`	2022-01-26 10:22:41 -05:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a consequence, move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
alex-t	5157f984ae	[AMDGPU] Enable divergence-driven XNOR selection Currently the not (xor_one_use) pattern is always selected to S_XNOR irrespective of the node divergence. This relies on a further custom selection pass which converts to VALU if necessary and replaces with V_NOT_B32 ( V_XOR_B32) on those targets which have no V_XNOR. The current change enables the patterns which explicitly select the not (xor_one_use) to the appropriate form. We assume that xor (not) is already turned into the not (xor) by the combiner. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116270	2022-01-26 15:33:10 +03:00
Sebastian Neubauer	4723f3cf03	[AMDGPU][GlobalISel] Combine unmerge of undef Fold (unmerge undef) -> undef, undef, ... Differential Revision: https://reviews.llvm.org/D118138	2022-01-26 12:30:36 +01:00
David Green	57356d6bb7	[DAG] Create fptoui.sat from clamped fptoui This is the unsigned variant of D111976, where we convert a clamped fptoui to a fptoui.sat. Because we are unsigned, the condition this time is only UMIN of UINT_MAX. Similarly to D111976 it handles ISD::UMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D114964	2022-01-26 08:37:44 +00:00
wangpc	8597458278	[regalloc] Fix assertion error when LiveInterval is empty Evicting interference causes an assertion error, since LiveIntervals::intervalIsInOneMBB assumes that its input is not empty. This patch fixes the bug mentioned in D118020. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D118124	2022-01-26 14:06:57 +08:00
Adrian Prantl	3efa016d4c	Revert accidentally pushed commit. It was not yet reviewed. "Fix UB in DwarfExpression::emitLegacyZExt()" This reverts commit `e37de5d36e`.	2022-01-25 13:53:14 -08:00
Adrian Prantl	e37de5d36e	Fix UB in DwarfExpression::emitLegacyZExt() A shift-left > 63 triggers a UBSAN failure. This patch kicks the can down the road (to the consumer) by emitting a more compact representation of the shift computation in DWARF expressions. Differential Revision: https://reviews.llvm.org/D118183	2022-01-25 13:49:14 -08:00
Sean Fertile	a2505bd063	[PowerPC][AIX] Override markFunctionEnd() During fast-isel calling 'markFunctionEnd' in the base class will call tidyLandingPads. This can cause an issue where we have determined that we need ehinfo and emitted a traceback table with the bits set to indicate that we will be emitting the ehinfo, but the tidying deletes all landing pads. In this case we end up emitting a reference to __ehinfo.N symbol, but not emitting a definition to said symbol and the resulting file fails to assemble. Differential Revision: https://reviews.llvm.org/D117040	2022-01-25 10:08:53 -05:00
Nikita Popov	a3a2239aaa	[GlobalISel] Avoid pointer element type access during InlineAsm lowering Same change as has been made for the SDAG lowering.	2022-01-25 14:26:47 +01:00
Simon Pilgrim	15e2be291f	[DAG] visitMULHS/MULHU/AND - remove some redundant LHS constant checks Now that we constant fold and canonicalize constants to the RHS, we don't need to check both LHS and RHS for specific constants	2022-01-25 11:54:23 +00:00
Bjorn Pettersson	109cc5adcc	[DAGCombine] Fold SRA of a load into a narrower sign-extending load An sra is basically sign-extending a narrower value. Fold away the shift by doing a sextload of a narrower value, when it is legal to reduce the load width accordingly. Differential Revision: https://reviews.llvm.org/D116930	2022-01-25 12:14:48 +01:00
Fraser Cormack	7cb452bfde	[SelectionDAG][VP] Add widening support for VP_MERGE This patch adds widening support for ISD::VP_MERGE, which widens identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118030	2022-01-25 10:59:40 +00:00
Fraser Cormack	5f5c5603ce	[SelectionDAG][VP] Add splitting support for VP_MERGE This patch adds splitting support for ISD::VP_MERGE, which splits identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118032	2022-01-25 10:33:23 +00:00
Victor Perez	2233befa5d	[LegalizeTypes][VP] Add splitting support for vp.gather and vp.scatter Split these nodes in a similar way as their masked versions. Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D117760	2022-01-25 10:08:07 +00:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
Nikita Popov	9554aaa275	[Dwarf] Optimize getOrCreateSourceID() for repeated calls on same file (NFCI) DwarfCompileUnit::getOrCreateSourceID() is often called many times in sequence with the same DIFile. This is currently very expensive, because it involves creating a string from directory and file name and looking it up in a string map. This patch remembers the last DIFile and its ID and directly returns that. This gives a geomean -1.3% compile-time improvement on CTMark O0-g. Differential Revision: https://reviews.llvm.org/D118041	2022-01-25 09:27:11 +01:00
Ahmed Bougacha	e7298464c5	[ObjCARC] Use "UnsafeClaimRV" to refer to unsafeClaim in enums. NFC. This matches the actual runtime function more closely. I considered also renaming both RetainRV/UnsafeClaimRV to end with "ARV", for AutoreleasedReturnValue, but there's less potential for confusion there.	2022-01-24 19:37:01 -08:00
Paweł Bylica	9d32847b33	[DAGCombine] Remove unused param in combineCarryDiamond(). NFC	2022-01-24 20:57:00 +01:00
Mircea Trofin	b1af01fe6a	[NFC][MLGO] Simplify conditional compilation Most of the code that's shared between 'release' and 'development' modes doesn't depend on anything special.	2022-01-24 11:19:04 -08:00
Jeremy Morse	d27f022614	[NFC][DebugInfo] Strip out an undesired #if 0 block As mentioned in discussion of D116821, it's better to just delete this block than keep it hanging around.	2022-01-24 18:04:47 +00:00
Jeremy Morse	74db5c8c95	Revert rG6a605b97a200 due to excessive memory use Over in the comments for D116821, some use-cases have cropped up where there's a substantial increase in memory usage. A quick inspection shows that a) it's a lot of memory and b) there are several things to be done to reduce it. Reverting (via disabling this feature by default) to avoid bothering people in the meantime.	2022-01-24 17:08:21 +00:00
Sander de Smalen	699e22a083	[ISEL] Move trivial step_vector folds to FoldConstantArithmetic. Given that step_vector is practically a constant, doing this early helps with DAGCombine folds that happen before type legalization. There is currently no way to test this happens earlier, although existing tests for step_vector folds continue to protect the folds happening at all. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D117863	2022-01-24 16:37:21 +00:00
Craig Topper	a43ed49f5b	[DAGCombiner][RISCV] Canonicalize (bswap(bitreverse(x))->bitreverse(bswap(x)). If the bitreverse gets expanded, it will introduce a new bswap. By putting a bswap before the bitreverse, we can ensure it gets cancelled out when this happens. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118012	2022-01-24 08:31:53 -08:00
Craig Topper	b8c7cdcc81	[SelectionDAG][RISCV] Teach getNode to fold bswap(bswap(x))->x. This can show up when bitreverse is expanded to bswap and swap of bits within a byte. If the input is already a bswap, we should cancel them out before we further transform them in a way that makes it harder to see the redundancy. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118007	2022-01-24 08:17:46 -08:00
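The two bswap/bitreverse commits above rely on simple bit-permutation identities: bswap is its own inverse, and bswap commutes with bitreverse. A minimal Python sketch of those identities follows; the helper names are hypothetical and this is not LLVM code, only an illustration on fixed-width unsigned integers.

```python
def bswap(x: int, width: int) -> int:
    # Reverse the byte order of a `width`-bit unsigned value.
    return int.from_bytes(x.to_bytes(width // 8, "little"), "big")


def bitreverse(x: int, width: int) -> int:
    # Reverse the bit order of a `width`-bit unsigned value.
    return int(format(x, f"0{width}b")[::-1], 2)


for value in (0x00000001, 0x12345678, 0xDEADBEEF):
    # bswap(bswap(x)) == x
    assert bswap(bswap(value, 32), 32) == value
    # bswap(bitreverse(x)) == bitreverse(bswap(x))
    assert bswap(bitreverse(value, 32), 32) == bitreverse(bswap(value, 32), 32)
```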
Matt Arsenault	99e8e17313	Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This reverts commit `a97e20a3a8`.	2022-01-24 09:26:52 -05:00
serge-sans-paille	5f290c090a	Move STLFunctionalExtras out of STLExtras Only using that change in StringRef already decreases the number of preprocessed lines from 7837621 to 7776151 for LLVMSupport. Perhaps more interestingly, it shows that many files were relying on the inclusion of StringRef.h to have the declaration from STLExtras.h. This patch tries hard to patch the relevant parts of llvm-project impacted by this hidden dependency removal. Potential impact: - "llvm/ADT/StringRef.h" no longer includes <memory>, "llvm/ADT/Optional.h" nor "llvm/ADT/STLExtras.h" Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831	2022-01-24 14:13:21 +01:00
Bjorn Pettersson	46cacdbb21	[DAGCombiner] Adjust some checks in DAGCombiner::reduceLoadWidth In code review for D117104 two slightly weird checks were found in DAGCombiner::reduceLoadWidth. They were typically checking if BitsA was a multiple of BitsB by looking at (BitsA & (BitsB - 1)), but such a comparison actually only makes sense if BitsB is a power of two. The checks were related to the code that attempted to shrink a load based on the fact that the loaded value would be right shifted. Afaict the legality of the value types is checked later (typically in isLegalNarrowLdSt), so the existing checks were both overly conservative as well as being wrong whenever ExtVTBits wasn't a power of two. (The latter was a situation triggered by a number of lit tests, so we could not just assert on ExtVTBits being a power of two.) When attempting to simply remove the checks I found some problems that seem to have been guarded by the checks (maybe just out of luck). A typical example would be a pattern like this: t1 = load i96* ptr; t2 = srl t1, 64; t3 = truncate t2 to i64. When DAGCombine is visiting the truncate, reduceLoadWidth is called attempting to narrow the load to 64 bits (ExtVT := MVT::i64). Then the SRL is detected and we set ShAmt to 64. In the past we've bailed out due to i96 not being a multiple of 64. If we simply remove that check then we would end up replacing the load with a new load that would read 64 bits but with a base pointer adjusted by 64 bits. So we would read 32 bits that weren't accessed by the original load. This patch will instead utilize the fact that the logical right shift (srl) can be folded away by using a zextload. Thus, the pattern above will now be combined into: t3 = load i32* ptr+offset, zext to i64. Another case is shown in the X86/shift-folding.ll test case: t1 = load i32* ptr; t2 = srl i32 t1, 8; t3 = truncate t2 to i16. In the past we bailed out due to the shift count (8) not being a multiple of 16. Now the narrowing kicks in and we get: t3 = load i16* ptr+offset. Differential Revision: https://reviews.llvm.org/D117406	2022-01-24 12:22:04 +01:00
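The reduceLoadWidth commit above notes that testing (BitsA & (BitsB - 1)) only answers "is BitsA a multiple of BitsB" when BitsB is a power of two. A small Python sketch of why, with hypothetical helper names (this is an illustration, not the DAGCombiner code):

```python
def is_multiple_by_mask(bits_a: int, bits_b: int) -> bool:
    # Mask trick: only meaningful when bits_b is a power of two.
    return (bits_a & (bits_b - 1)) == 0


def is_multiple(bits_a: int, bits_b: int) -> bool:
    # Straightforward check that works for any positive bits_b.
    return bits_a % bits_b == 0


# Power-of-two divisor: both checks agree (i96 is not a multiple of 64).
assert is_multiple_by_mask(96, 64) == is_multiple(96, 64) == False
assert is_multiple_by_mask(96, 32) == is_multiple(96, 32) == True

# Non-power-of-two divisor (24): the mask trick gives wrong answers both ways.
assert is_multiple(48, 24) and not is_multiple_by_mask(48, 24)
assert is_multiple_by_mask(40, 24) and not is_multiple(40, 24)
```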
Nikita Popov	0d1308a7b7	[AArch64][GlobalISel] Support returned argument with multiple registers The call lowering code assumed that a returned argument could only consist of one register. Pass an ArrayRef<Register> instead of Register to make sure that all parts get assigned. Fixes https://github.com/llvm/llvm-project/issues/53315. Differential Revision: https://reviews.llvm.org/D117866	2022-01-24 10:55:28 +01:00
Nikita Popov	e7c9a6cae0	[SDAG] Don't move DBG_VALUE instructions after insertion point during scheduling (PR53243) EmitSchedule() shouldn't be touching instructions after the provided insertion point. The change introduced in D83561 performs a scan to the end of the block, and thus may move unrelated instructions. In particular, this ends up moving instructions that have been produced by FastISel and will later be deleted. Moving them means that more instructions than intended are removed. Fix this by stopping the iteration when the insertion point is reached. Fixes https://github.com/llvm/llvm-project/issues/53243. Differential Revision: https://reviews.llvm.org/D117489	2022-01-24 10:50:49 +01:00
Sander de Smalen	4f8fdf7827	[ISEL] Canonicalise constant splats to RHS. SelectionDAG::getNode() canonicalises constants to the RHS if the operation is commutative, but it doesn't do so for constant splat vectors. Doing this early helps making certain folds on vector types, simplifying the code required for target DAGCombines that are enabled before Type legalization. Somewhat to my surprise, DAGCombine doesn't seem to traverse the DAG in a post-order DFS, so at the time of doing some custom fold where the input is a MUL, DAGCombiner::visitMUL hasn't yet reordered the constant splat to the RHS. This patch leads to a few improvements, but also a few minor regressions, which I traced down to D46492. When I tried reverting this change to see if the changes were still necessary, I ran into some segfaults. Not sure if there is some latent bug there. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117794	2022-01-24 09:38:36 +00:00
Abinav Puthan Purayil	68b70d17d8	[GlobalISel] Fold or of shifts with constant amount to funnel shift. This change folds (or (shl x, C0), (lshr y, C1)) to funnel shift iff C0 and C1 are constants where C0 + C1 is the bit-width of the shift instructions. Differential Revision: https://reviews.llvm.org/D116529	2022-01-24 10:43:32 +05:30
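The fold in the commit above depends on (or (shl x, C0), (lshr y, C1)) being equal to a funnel shift left by C0 whenever C0 + C1 equals the bit width. The following Python sketch checks that equivalence on 32-bit values; the function names are hypothetical and the code only models the arithmetic, not the GlobalISel combiner.

```python
def fshl(x: int, y: int, amount: int, width: int) -> int:
    # Funnel shift left: concatenate x:y, shift left, keep the high `width` bits.
    mask = (1 << width) - 1
    amount %= width
    if amount == 0:
        return x & mask
    return ((x << amount) | ((y & mask) >> (width - amount))) & mask


def or_of_shifts(x: int, y: int, c0: int, c1: int, width: int) -> int:
    # The original pattern: (or (shl x, c0), (lshr y, c1)).
    mask = (1 << width) - 1
    return ((x << c0) & mask) | ((y & mask) >> c1)


WIDTH = 32
x, y = 0x12345678, 0x9ABCDEF0
for c0 in range(1, WIDTH):
    c1 = WIDTH - c0  # precondition of the fold: c0 + c1 == bit width
    assert or_of_shifts(x, y, c0, c1, WIDTH) == fshl(x, y, c0, WIDTH)
```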
David Blaikie	2e58a18910	DebugInfo: Include template parameters for simplified template decls in type units LLVM DebugInfo CodeGen synthesizes type declarations in type units when referencing types that are not in type units. When those synthesized types are templates and simplified template names (or mangled simplified template names) are in use, the template arguments must be attached to those declarations. A deeper fix (with a CU or DICompositeType flag) that would also support other uses of clang's -debug-forward-template-args (such as Sony's platform) could/should be implemented to fix this more broadly.	2022-01-23 16:10:14 -08:00
David Blaikie	ab4756338c	DebugInfo: Don't put types in type units if they reference internal linkage types Doing this causes a declaration of the internal linkage (anonymous namespace) type to be emitted in the type unit, which would then be ambiguous as to which internal linkage definition it refers to (since the name is only valid internally). It's possible these internal linkage types could be resolved relative to the unit the TU is referred to from - but that doesn't seem ideal, and there's no reason to put the type in a type unit since it can only be defined in one CU anyway (since otherwise it'd be an ODR violation) & so avoiding the type unit should be a smaller DWARF encoding anyway. This also addresses an issue with Simplified Template Names where the template parameter could not be rebuilt from the declaration emitted into the TU (specifically for an enum non-type template parameter, where looking up the enumerators is necessary to rebuild the full template name)	2022-01-23 14:07:31 -08:00
Simon Pilgrim	accc07e654	[DAG] Fold (X & Y) != 0 --> zextOrTrunc(X & Y) iff everything but LSB is known zero (PR51312) Fixes parity codegen issue where we know all but the lowest bit is zero, we can replace the ICMPNE with 0 comparison with a ext/trunc Differential Revision: https://reviews.llvm.org/D117983	2022-01-23 16:36:25 +00:00
Simon Pilgrim	6605057992	Revert rG7c66aaddb128dc0f342830c1efaeb7a278bfc48c "[DAG] Fold (X & Y) != 0 --> zextOrTrunc(X & Y) iff everything but LSB is known zero (PR51312)" Noticed a typo in the getBooleanContents call just after I pressed commit :(	2022-01-23 16:28:44 +00:00
Simon Pilgrim	7c66aaddb1	[DAG] Fold (X & Y) != 0 --> zextOrTrunc(X & Y) iff everything but LSB is known zero (PR51312) Fixes parity codegen issue where we know all but the lowest bit is zero, we can replace the ICMPNE with 0 comparison with a ext/trunc Differential Revision: https://reviews.llvm.org/D117983	2022-01-23 16:20:42 +00:00
Simon Pilgrim	20d46fbd4a	[CodeGenPrepare] Use dyn_cast result to check for null pointers Simplifies logic and helps the static analyzer correctly check for nullptr dereferences	2022-01-23 12:47:52 +00:00
David Green	b27e5459d5	[DAG] Convert truncstore(extend(x)) back to store(x) Pulled out of D106237, this folds truncstore(extend(x)) back to store(x) if the original store was legal. This can come up due to the order we fold nodes. A fold from X86 needs to be adjusted to prevent infinite loops, to have it pick the operand of a trunc more directly. Differential Revision: https://reviews.llvm.org/D117901	2022-01-22 13:20:36 +00:00
OCHyams	b6a41fddcf	[DWARF][DebugInfo] Fix off-by-one error in size of DW_TAG_base_type types Fix PR53163 by rounding the byte size of DW_TAG_base_type types up. Without this fix we risk emitting types with a truncated size (including rounding less-than-byte-sized types' sizes down to zero). Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D117124	2022-01-21 11:37:49 +00:00
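The off-by-one described in the commit above comes from converting a bit size to a byte size by truncating division. A minimal Python sketch of the difference between rounding down and rounding up (hypothetical function names, not the actual patch):

```python
def byte_size_truncated(size_in_bits: int) -> int:
    # Rounds down: sub-byte types end up with size 0.
    return size_in_bits // 8


def byte_size_rounded_up(size_in_bits: int) -> int:
    # Rounds up to the next whole byte.
    return (size_in_bits + 7) // 8


assert byte_size_truncated(1) == 0 and byte_size_rounded_up(1) == 1
assert byte_size_truncated(17) == 2 and byte_size_rounded_up(17) == 3
assert byte_size_truncated(32) == byte_size_rounded_up(32) == 4
```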
Craig Topper	9abc593e98	[TargetLowering][InstCombine] Simplify BSwap demanded bits code a little. NFC Use alignDown instead of &= ~7. Replace ResultBit with NLZ. (BitWidth - NLZ - NTZ == 8) so (BitWidth - NTZ - 8 == NLZ). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D117804	2022-01-20 10:45:17 -08:00
Alexandre Ganea	5af2433e17	[clang-cl] Support the /HOTPATCH flag This patch adds support for the MSVC /HOTPATCH flag: https://docs.microsoft.com/sv-se/cpp/build/reference/hotpatch-create-hotpatchable-image?view=msvc-170&viewFallbackFrom=vs-2019 The flag is translated to a new -fms-hotpatch flag, which in turn adds a 'patchable-function' attribute for each function in the TU. This is then picked up by the PatchableFunction pass which would generate a TargetOpcode::PATCHABLE_OP of minsize = 2 (which means the target instruction must resolve to at least two bytes). TargetOpcode::PATCHABLE_OP is only implemented for x86/x64. When targeting ARM/ARM64, /HOTPATCH isn't required (instructions are always 2/4 bytes and suitable for hotpatching). Additionally, when using /Z7, we generate a 'hot patchable' flag in the CodeView debug stream, in the S_COMPILE3 record. This flag is then picked up by LLD (or link.exe) and is used in conjunction with the linker /FUNCTIONPADMIN flag to generate extra space before each function, to accommodate for live patching long jumps. Please see: `d703b92296/lld/COFF/Writer.cpp (L1298)` The outcome is that we can finally use Live++ or Recode along with clang-cl. NOTE: It seems that MSVC cl.exe always enables /HOTPATCH on x64 by default, although if we did the same I thought we might generate sub-optimal code (if this flag was active by default). Additionally, MSVC always generates a .debug$S section and a S_COMPILE3 record, which Clang doesn't do without /Z7. Therefore, the following MSVC command-line "cl /c file.cpp" would have to be written with Clang such as "clang-cl /c file.cpp /HOTPATCH /Z7" in order to obtain the same result. Depends on D43002, D80833 and D81301 for the full feature. Differential Revision: https://reviews.llvm.org/D116511	2022-01-20 12:57:19 -05:00
Lucas Prates	283f5a198a	[GlobalISel] Fix incorrect sign extension when combining G_INTTOPTR and G_PTR_ADD The GlobalISel combiner currently uses sign extension when manipulating the LHS constant when combining the following sequence of machine instructions into a single constant: ``` %0:_(s32) = G_CONSTANT i32 <CONSTANT> %1:_(p0) = G_INTTOPTR %0:_(s32) %2:_(s64) = G_CONSTANT i64 <CONSTANT> %3:_(p0) = G_PTR_ADD %1:_, %2:_(s64) ``` This causes an issue when the bit width of the first constant and the target pointer size are different, as G_INTTOPTR has no sign extension semantics. This patch fixes this by capturing an arbitrary-precision integer when matching the constant, allowing the matching function to correctly zero extend it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D116941	2022-01-20 17:02:52 +00:00
Mircea Trofin	f29256a64a	[MLGO] Improved support for AOT cross-targeting scenarios The tensorflow AOT compiler can cross-target, but it can't run on (for example) arm64. We added earlier support where the AOT-ed header and object would be built on a separate builder and then passed at build time to a build host where the AOT compiler can't run, but clang can be otherwise built. To simplify such scenarios given we now support more than one AOT-able case (regalloc and inliner), we make the AOT scenario centered on whether files are generated, case by case (this includes the "passed from a different builder" scenario). This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of case by case control. A builder can opt out of an AOT case by passing that case's model path as `none`. Note that the overrides still take precedence. This patch controls conditional compilation with case-specific flags, which can be enabled locally, for the component where those are available. We still keep an overall flag for some tests. The 'development/training' mode is unchanged, because there the model is passed from the command line and interpreted. Differential Revision: https://reviews.llvm.org/D117752	2022-01-20 07:05:39 -08:00
Nikita Popov	81d35f27dd	[DebugInstrRef] Memoize variable order during sorting (NFC) Instead of constructing DebugVariables and looking up the order in the comparison function, compute the order upfront and then sort a vector of (order, instr). This improves compile-time by -0.4% geomean on CTMark ReleaseLTO-g. Differential Revision: https://reviews.llvm.org/D117575	2022-01-20 16:04:24 +01:00
Victor Perez	c10c748878	[LegalizeTypes][VP] Add widening support for vp.gather and vp.scatter Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117557	2022-01-20 08:57:57 +00:00
Alexandre Ganea	aba5b91b69	Re-land [CodeView] Add full repro to LF_BUILDINFO record This patch writes the full -cc1 command into the resulting .OBJ, like MSVC does. This allows for external tools (Recode, Live++) to rebuild a source file without any external dependency but the .OBJ itself (other than the compiler) and without knowledge of the build system. The LF_BUILDINFO record stores a full path to the compiler, the PWD (CWD at program startup), a relative or absolute path to the source, and the full CC1 command line. The stored command line is self-standing (does not depend on the environment). In the same way, MSVC doesn't exactly store the provided command-line, but an expanded version (a somehow equivalent of CC1) which is also self-standing. For more information see PR36198 and D43002. Differential Revision: https://reviews.llvm.org/D80833	2022-01-19 19:44:37 -05:00
Johannes Doerfert	dd75a6b2ae	[DWARF][FIX] Try not to crash for nvptx with missing debug information This prevents crashes in the OpenMP offload pipeline as not everything is properly annotated with debug information, e.g., the runtimes we link in. While we might want to have them annotated, it seems to be generally useful to gracefully handle missing debug info rather than crashing. TODO: A test is missing and can hopefully be distilled prior to landing. This fixes #51079. Differential Revision: https://reviews.llvm.org/D116959	2022-01-19 18:40:13 -06:00
Mircea Trofin	073e09683d	Fix build break introduced by D117147	2022-01-19 11:43:51 -08:00
Mircea Trofin	e67430cca4	[MLGO] ML Regalloc Eviction Advisor The bulk of the implementation is common between 'release' mode (==AOT-ed model) and 'development' mode (for training), the main difference is that in development mode, we may also log features (for training logs), inject scoring information (currently after the Virtual Register Rewriter) and then produce the log file. This patch also introduces the score injection pass, 'Register Allocation Pass Scoring', which is trivially just logging the score in development mode. Differential Revision: https://reviews.llvm.org/D117147	2022-01-19 11:00:32 -08:00
Simon Pilgrim	d6fee6c3b0	[DAG] SelectionDAG::computeKnownBits - add mul(x,x) self-multiply handling (PR48683) Pass the SelfMultiply flag to KnownBits::mul() - added at D108992 https://alive2.llvm.org/ce/z/NN_eaR	2022-01-19 17:39:32 +00:00
Daniel Thornburgh	2e2999cd44	[NFC] Test commit to verify commit access.	2022-01-18 18:03:26 -08:00
Matt Arsenault	5599c43124	GlobalISel: Swap order of operand checks in ConstantFoldVectorBinop Since constants are canonicalized to the RHS, this is more likely to exit early.	2022-01-18 17:21:02 -05:00
Matt Arsenault	da72822763	GlobalISel: Fix CSEMIRBuilder mishandling constant folds of vectors This was ignoring the requested result register, resulting in a missing def when this happened in the IRTranslator. Fixes some crashes and verifier errors at -O0. Alternatively we could pass DstOps to the constant fold functions.	2022-01-18 17:21:02 -05:00
David Green	100763a88f	[DAG] Extend SearchForAndLoads with any_extend handling This extends the code in SearchForAndLoads to be able to look through ANY_EXTEND nodes, which can be created from mismatching IR types where the AND node we begin from only demands the low parts of the register. That turns zext and sext into any_extends as only the low bits are demanded. To be able to look through ANY_EXTEND nodes we need to handle mismatching types in a few places, potentially truncating the mask to the size of the final load. Recommitted with a more conservative check for the type of the extend. Differential Revision: https://reviews.llvm.org/D117457	2022-01-18 21:03:08 +00:00
Matt Arsenault	984451eafc	PostRAPseudos: Don't preserve kills on some implicit copy operands This fixes a verifier error I ran into at -O0. A subregister copy had an implicit kill of an overlapping superregister, which was partially redefined by the copy. The preserved implicit operand killed subregisters made live earlier in the sequence. AMDGPU already uses similar logic for whether to preserve the kill of the superregister on the final instruction if there's overlap.	2022-01-18 13:52:04 -05:00
Fraser Cormack	c8e33978fb	[VP] Propagate align parameter attr on VP gather/scatter to ISel This patch fixes a case where the 'align' parameter attribute on the pointer operands to llvm.vp.gather and llvm.vp.scatter was being dropped during the conversion to the SelectionDAG. The default alignment equal to the ABI type alignment of the vector type was kept. It also updates the documentation to reflect the fact that the parameter attribute is now properly supported. The default alignment of these intrinsics was previously documented as being equal to the ABI alignment of the scalar type, when in fact that wasn't the case: the ABI alignment of the vector type was used instead. This has also been fixed in this patch. Reviewed By: simoll, craig.topper Differential Revision: https://reviews.llvm.org/D114423	2022-01-18 17:33:24 +00:00
Sanjay Patel	870591200d	[SDAG] remove duplicate functionality when getting shift type for demanded bits; NFCI This was noted as a potential cleanup in D117508. getShiftAmountTy() has checks for vector, phase, etc. so it should handle anything that the caller was trying to account for.	2022-01-18 12:13:45 -05:00
Nikita Popov	0d51b6ab15	[DebugInstrRef] Add some missing const qualifiers (NFC)	2022-01-18 17:19:23 +01:00
Nikita Popov	cbaae61422	[DebugInstrRef] Use DenseMap for ValueToLoc (NFC) Just replacing std::map with DenseMap here is a major regression -- because this code used an identity hash for ValueIDNum. Because ValueIDNum is composed of multiple components, it is important that we use a reasonably good hash function here, so switch it to hash_value. DenseMapInfo::getHashValue<uint64_t> would not be sufficient. This gives a -0.8% geomean improvement on CTMark ReleaseLTO-g.	2022-01-18 17:02:14 +01:00
Vang Thao	10ed1eca24	[MachineSink] Allow sinking of constant or ignorable physreg uses For AMDGPU, any use of the physical register EXEC prevents sinking even if it is not a real physical register read. Add check to see if a physical register use can be ignored for sinking. Also perform same constant and ignorable physical register check when considering sinking in loops. https://reviews.llvm.org/D116053	2022-01-18 14:17:40 +00:00
Victor Perez	b7bf96a258	[LegalizeTypes][VP] Add widening support for vp.reduce.* When widening these intrinsics, we do not have to insert neutral elements at the end of the vector as when widening vector.reduce.* intrinsics, thanks to vector predication semantics. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117467	2022-01-18 10:21:01 +00:00
Hans Wennborg	f4615feaa1	Revert "[DAG] Extend SearchForAndLoads with any_extend handling" This caused builds to fail with llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:5638: bool (anonymous namespace)::DAGCombiner::BackwardsPropagateMask(llvm::SDNode *): Assertion `NewLoad && "Shouldn't be masking the load if it can't be narrowed"' failed. See the code review for a link to a reproducer. > This extends the code in SearchForAndLoads to be able to look through > ANY_EXTEND nodes, which can be created from mismatching IR types where > the AND node we begin from only demands the low parts of the register. > That turns zext and sext into any_extends as only the low bits are > demanded. To be able to look through ANY_EXTEND nodes we need to handle > mismatching types in a few places, potentially truncating the mask to > the size of the final load. > > Differential Revision: https://reviews.llvm.org/D117457 This reverts commit `578008789f`.	2022-01-18 10:50:55 +01:00
Victor Perez	fd1dce35bd	[LegalizeTypes][VP] Add splitting support for vp.reduction.* Split vp.reduction.* intrinsics by splitting the vector to reduce in two halves, perform the reduction operation in each one of them and accumulate the results of both operations. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117469	2022-01-18 09:29:24 +00:00
Bjorn Pettersson	65fbe38f0a	[DwarfDebug] Restore code that make comments stay aligned in DwarfDebug::emitDebugLocEntry Commit `2bddab25db` removed a piece of code from DwarfDebug::emitDebugLocEntry that according to code comments "Make sure comments stay aligned". This patch restores that piece of code, together with the addition of some extra checks in an existing lit test to work as a regression test. Without this patch we incorrectly get .byte 159 # 0 instead of .byte 159 # DW_OP_stack_value Differential Revision: https://reviews.llvm.org/D117441	2022-01-18 09:46:03 +01:00
David Sherwood	f4515ab858	Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants" This reverts commit `197f3c0deb`. Reverting after miscompilation errors discovered with ffmpeg.	2022-01-18 08:40:20 +00:00
Sanjay Patel	ba6485e25f	[SDAG] add demanded bits transform for bswap A possible codegen regression for PowerPC is noted in D117406 because we don't recognize a pattern that demands only 1 byte from a bswap. This fold has existed in IR since close to the beginning of LLVM: https://github.com/llvm/llvm-project/blame/main/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L794 ...so this patch copies that code as much as possible and adapts it for SDAG. The test for PowerPC that would change in D117406 is over-reduced with undefs, so I recreated it for AArch64 and x86 by passing in pointer args and renamed the values to make the logic clearer. Differential Revision: https://reviews.llvm.org/D117508	2022-01-17 18:25:42 -05:00
David Green	578008789f	[DAG] Extend SearchForAndLoads with any_extend handling This extends the code in SearchForAndLoads to be able to look through ANY_EXTEND nodes, which can be created from mismatching IR types where the AND node we begin from only demands the low parts of the register. That turns zext and sext into any_extends as only the low bits are demanded. To be able to look through ANY_EXTEND nodes we need to handle mismatching types in a few places, potentially truncating the mask to the size of the final load. Differential Revision: https://reviews.llvm.org/D117457	2022-01-17 15:25:11 +00:00
David Sherwood	197f3c0deb	[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants When we know the value we're extending is a negative constant then it makes sense to use SIGN_EXTEND because this may improve code quality in some cases, particularly when doing a constant splat of an unpacked vector type. For example, for SVE when splatting the value -1 into all elements of a vector of type <vscale x 2 x i32> the element type will get promoted from i32 -> i64. In this case we want the splat value to sign-extend from (i32 -1) -> (i64 -1), whereas currently it zero-extends from (i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use a single mov immediate instruction. New tests added here: CodeGen/AArch64/sve-vector-splat.ll I believe we see some code quality improvements in these existing tests too: CodeGen/AArch64/reduce-and.ll CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only occur because the test disables codegen prepare and branch folding. Differential Revision: https://reviews.llvm.org/D114357	2022-01-17 11:08:57 +00:00
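The commit above contrasts sign-extending and zero-extending the constant -1 from i32 to i64. The values it quotes can be reproduced with a short Python sketch; the helpers are hypothetical and model only the integer arithmetic, not the AArch64 lowering.

```python
def zero_extend(value: int, from_bits: int) -> int:
    # Keep the low `from_bits` bits; the upper bits of the wider type are zero.
    return value & ((1 << from_bits) - 1)


def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    # Replicate the sign bit of the `from_bits`-wide value into the upper bits.
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return ((value ^ sign_bit) - sign_bit) & ((1 << to_bits) - 1)


# (i32 -1) zero-extends to (i64 0xFFFFFFFF) ...
assert zero_extend(-1, 32) == 0xFFFFFFFF
# ... but sign-extends to (i64 -1), i.e. all 64 bits set.
assert sign_extend(-1, 32, 64) == 0xFFFFFFFFFFFFFFFF
```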
Nikita Popov	873a7ee7e4	[MachineInstr] Don't include debug uses in bundle header (PR52817) Following the recommendation in https://github.com/llvm/llvm-project/issues/52817#issuecomment-1007635426, this excludes debug instructions when finalizing the bundle. As uses in debug instructions don't have effects, they will no longer be included in the BUNDLE header. Fixes https://github.com/llvm/llvm-project/issues/52817. Differential Revision: https://reviews.llvm.org/D116945	2022-01-17 10:43:21 +01:00
Bjorn Pettersson	9f237c9e7d	[DAGCombine] Refactor DAGCombiner::ReduceLoadWidth. NFCI Update code comments in DAGCombiner::ReduceLoadWidth and refactor the handling of SRL a bit. The refactoring is done with the intent of adding support for folding away SRA by using SEXTLOAD in a follow-up patch. The function is also renamed as DAGCombiner::reduceLoadWidth. Differential Revision: https://reviews.llvm.org/D117104	2022-01-16 20:24:52 +01:00
Fangrui Song	5456249736	[SelectionDAG] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D117235	2022-01-15 17:13:09 -08:00
Nikita Popov	c63a3175c2	[AttrBuilder] Remove ctor accepting AttributeList and Index Use the AttributeSet constructor instead. There's no good reason why AttrBuilder itself should extract the AttributeSet from the AttributeList. Moving this out of the AttrBuilder generally results in cleaner code.	2022-01-15 22:39:31 +01:00
Fraser Cormack	877d1b3d07	[SelectionDAG][VP] Add splitting/widening for VP_LOAD and VP_STORE Original patch by @hussainjk. This patch was split off from D109377 to keep vector legalization (widening/splitting) separate from vector element legalization (promoting). While the original patch added a third overload of SelectionDAG::getVPStore, this patch takes the liberty of collapsing those all down to 1, as three overloads seems excessive for a little-used node. The original patch also used ModifyToType in places, but that method still crashes on scalable vector types. Seeing as the other VP legalization methods only work when all operands need identical widening, this patch follows in that vein. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117235	2022-01-15 11:41:29 +00:00
Craig Topper	e0841f6920	[SelectionDAGBuilder] Remove unneeded vector bitcast from visitTargetIntrinsic. This seems to be a leftover from a long time ago when there was an ISD::VBIT_CONVERT and a MVT::Vector. It looks like in those days the vector type was carried in a VTSDNode. As far as I know, these days ComputeValueTypes would have already assigned "Result" the same type we're getting from TLI.getValueType here. Thus the BITCAST is always a NOP. Verified by adding an assert and running check-llvm. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D117335	2022-01-14 12:52:49 -08:00
James Y Knight	a97e20a3a8	Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This commit sometimes causes a crash when compiling a vtable thunk. E.g.: clang '--target=aarch64-grtev4-linux-gnu' -xc++ - -c -o /dev/null <<EOF struct a { virtual int f(); }; struct c { virtual int &g() const; }; struct d : a, c { int &g() const; }; int &d::g() const {} EOF Some follow-up commits have been reverted as well: Revert "IR: Make getRetAlign check callee function attributes" Revert "Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC." Revert "Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC." This reverts commit `4f414af6a7`. This reverts commit `a5507d2e25`. This reverts commit `3d2d208f6a`. This reverts commit `07ddfa95e3`.	2022-01-14 04:50:07 +00:00
David Sherwood	ba471ba8d2	Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants" This reverts commit `31009f0b5a`. It seems to be causing SVE VLA buildbot failures and has introduced a genuine regression. Reverting for now.	2022-01-13 15:59:43 +00:00
Eugene Zhulenev	764e52f0d4	[DebugInfo][InstrRef] Short-circuit unnecessary preferred location map construction Reviewed By: cota Differential Revision: https://reviews.llvm.org/D117162	2022-01-13 06:24:52 -08:00
Simon Pilgrim	4f414af6a7	Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC.	2022-01-13 11:10:50 +00:00
Sebastian Neubauer	f4139440f1	[Docs] Fix IR and TableGen grammar inconsistencies IR: - globals (and functions, ifuncs, aliases) can have a partition - catchret has a `to` before the label - the sint/int types do not exist - signext comes after the type - a variable was missing its type TableGen: - The second value after a `#` concatenation is optional See e.g. llvm/lib/Target/X86/X86InstrAVX512.td:L3351 - IncludeDirective and PreprocessorDirective were never referenced in the grammar - Add some missing ; - Parent classes of multiclasses can have generic arguments. Reuse the `ParentClassList` that is already used in other places. MIR: - liveins only allows physical registers, which start with a $ Differential Revision: https://reviews.llvm.org/D116674	2022-01-13 11:55:13 +01:00
David Sherwood	31009f0b5a	[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants When we know the value we're extending is a negative constant then it makes sense to use SIGN_EXTEND because this may improve code quality in some cases, particularly when doing a constant splat of an unpacked vector type. For example, for SVE when splatting the value -1 into all elements of a vector of type <vscale x 2 x i32> the element type will get promoted from i32 -> i64. In this case we want the splat value to sign-extend from (i32 -1) -> (i64 -1), whereas currently it zero-extends from (i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use a single mov immediate instruction. New tests added here: CodeGen/AArch64/sve-vector-splat.ll I believe we see some code quality improvements in these existing tests too: CodeGen/AArch64/dag-numsignbits.ll CodeGen/AArch64/reduce-and.ll CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only occur because the test disables codegen prepare and branch folding. Differential Revision: https://reviews.llvm.org/D114357	2022-01-13 09:43:07 +00:00
Matt Arsenault	5a16306c09	GlobalISel: Always enable GISelKnownBits for InstructionSelect This wasn't running at -O0, and causing crashes for AMDGPU. AMDGPU needs this to match the addressing modes of stack access instructions, which is even more important at -O0 than with optimizations. It currently costs nothing to run ahead of time, so just always enable it.	2022-01-12 18:57:24 -05:00
Matt Arsenault	5f39a02ea9	RegScavenger: Remove used regs from scavenge candidates In a future change, AMDGPU will have 2 emergency scavenging indexes in some situations. The secondary scavenging index ends up being used recursively when the scavenger calls eliminateFrameIndex for the emergency spill slot. Without this, it would end up seeing the same register which was just scavenged in the parent call as free, insert a second emergency spill to the same location and return the same register when 2 unique free registers are required. We need to only do this if the register is used. SystemZ uses 2 scavenging slots, but calls the scavenger twice in sequence and not recursively. In this case the previously scavenged register can be re-clobbered, but is still tracked in the scavenger until it sees the deferred restore instruction.	2022-01-12 18:56:52 -05:00
Matt Arsenault	07ddfa95e3	GlobalISel: Add G_ASSERT_ALIGN hint instruction Insert it for call return values only for now, which is the only case the DAG handles also.	2022-01-12 18:20:58 -05:00
Matt Arsenault	8a16201a0b	GlobalISel: Fix insert point in localizer This was inserting the new G_CONSTANT after the use, and the later block scan would run off the end. Fix calling SkipPHIsAndLabels for no apparent reason.	2022-01-12 13:44:05 -05:00
Mircea Trofin	b2d2e93138	[NFC][MLGO] The regalloc reward is float, not int64_t	2022-01-12 09:32:41 -08:00
Mircea Trofin	3150bce078	[NFC][MLGO] Prep a few files before the main ML Regalloc adviser To avoid trivial changes.	2022-01-12 08:54:00 -08:00
Petar Avramovic	c8c5dc766b	GlobalISel: Fix fma combine when one of the operands comes from unmerge Fma combine assumes that MRI.getVRegDef(Reg)->getOperand(0).getReg() = Reg, which is not true when Reg is defined by an instruction with multiple defs, e.g. G_UNMERGE_VALUES. The fix is to keep the register and the instruction that defines it in DefinitionAndSourceRegister and use them when needed. Differential Revision: https://reviews.llvm.org/D117032	2022-01-12 17:47:25 +01:00
Leonard Grey	0f85393004	[MachO] Port call graph profile section and directive This ports the `.cg_profile` assembly directive and call graph profile section generation to MachO from COFF/ELF. Due to MachO section naming rules, the section is called `__LLVM,__cg_profile` rather than `.llvm.call-graph-profile` as in COFF/ELF. Support for llvm-readobj is included to facilitate testing. Corresponding LLD change is D112164 Differential Revision: https://reviews.llvm.org/D112160	2022-01-12 09:22:26 -05:00
Jeremy Morse	6a605b97a2	[DebugInfo] Move flag for instr-ref to LLVM option, from TargetOptions This feature was previously controlled by a TargetOptions flag, and I figured that codegen::InitTargetOptionsFromCodeGenFlags would default it to "on" for all frontends. Enabling by default was discussed here: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153653.html and originally supposed to happen in `3c04507088`, but it didn't actually take effect, as it turns out frontends initialize TargetOptions themselves. This patch moves the flag from a TargetOptions flag to a global flag to CodeGen, where it isn't immediately affected by the frontend being used. Hopefully this will actually cause instr-ref to be on by default on x86_64 now! This patch is easily reverted, and chances of turbulence are moderately high. If you need to revert, please consider instead commenting out the 'return true' part of llvm::debuginfoShouldUseDebugInstrRef to turn the feature off, and dropping me an email. Differential Revision: https://reviews.llvm.org/D116821	2022-01-12 13:28:01 +00:00
Alexey Lapshin	39385d4cd1	[CodeGen][Debuginfo][NFC] Refactor DIE values SizeOf method to not depend on AsmPrinter. The SizeOf() method of DIE values (unsigned SizeOf(const AsmPrinter *AP, dwarf::Form Form) const) depends on AsmPrinter. AsmPrinter is too specific a class here. This patch removes the dependency on AsmPrinter and uses the dwarf::FormParams structure instead. It allows calculating the size of DIE values without using AsmPrinter. That refactoring is useful for D96035 ([dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.) Differential Revision: https://reviews.llvm.org/D116997	2022-01-12 13:15:26 +03:00
Craig Topper	63b17eb9ec	[RISCV] Add strictfp support for compares. This adds support for STRICT_FSETCC(quiet) and STRICT_FSETCCS(signaling). FEQ matches well to STRICT_FSETCC oeq. FLT/FLE matches well to STRICT_FSETCCS olt/ole. Others require commuting operands or multiple instructions. STRICT_FSETCC olt/ole/ogt/oge/ult/ule/ugt/uge uses FLT/FLE, but we need to save/restore FFLAGS around them to avoid spurious exceptions. I've implemented pseudo instructions with a CustomInserter to insert the save/restore CSR instructions. Unfortunately, this doesn't honor exceptions for signaling NANs but I'm not sure if signaling nans are really supported by the constrained intrinsics. STRICT_FSETCC one and ueq expand to a pair of FLT instructions with a save/restore of fflags around each. This could be improved in the future. There may be some opportunities to generate better code for strict comparisons mixed with nonans fast math flags. I've left FIXMEs in the .td files for that. Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com> Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D116694	2022-01-11 20:01:41 -08:00
Matt Arsenault	5a434ceafb	GlobalISel: Use cloneVirtualRegister in localizer	2022-01-11 16:10:12 -05:00
Nick Desaulniers	4edb9983cb	[SelectionDAG] treat X constrained labels as i for asm Completely rework how we handle X constrained labels for inline asm. X should really be treated as i. Then existing tests can be moved to use i D115410 and clang can just emit i D115311. (D115410 and D115311 are callbr, but this can be done for label inputs, too). Coincidentally, this simplification solves an ICE uncovered by D87279 based on assumptions made during D69868. This is the third approach considered. See also discussions v1 (D114895) and v2 (D115409). Reported-by: kernel test robot <lkp@intel.com> Fixes: https://github.com/ClangBuiltLinux/linux/issues/1512 Reviewed By: void, jyknight Differential Revision: https://reviews.llvm.org/D115688	2022-01-11 10:29:40 -08:00
Nick Desaulniers	9c4b49db19	[ShrinkWrap] check for PPC's non-callee-saved LR As pointed out in https://reviews.llvm.org/D115688#inline-1108193, we don't want to sink the save point past an INLINEASM_BR, otherwise prologepilog may incorrectly sink a prolog past the MBB containing an INLINEASM_BR and into the wrong MBB. ShrinkWrap is getting this wrong because LR is not in the list of callee saved registers. Specifically, ShrinkWrap::useOrDefCSROrFI calls RegisterClassInfo::getLastCalleeSavedAlias which reads CalleeSavedAliases which was populated by RegisterClassInfo::runOnMachineFunction by iterating the list of MCPhysReg returned from MachineRegisterInfo::getCalleeSavedRegs. Because PPC's LR is non-allocatable, it's NOT considered callee saved. Add an interface to TargetRegisterInfo for such a case and use it in Shrinkwrap to ensure we don't sink a prolog past an INLINEASM or INLINEASM_BR that clobbers LR. Reviewed By: jyknight, efriedma, nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D116424	2022-01-11 10:01:34 -08:00
David Sherwood	51497dc0b2	[IR] Change vector.splice intrinsic to reject out-of-bounds indices I've changed the definition of the experimental.vector.splice intrinsic to reject indices that are known to be, or are possibly, out-of-bounds. In practice, this means changing the definition so that the index is now only valid in the range [-VL, VL-1] where VL is the known minimum vector length. We use the vscale_range attribute to take the minimum vscale value into account so that we can permit more indices when the attribute is present. The splice intrinsic is currently only ever generated by the vectoriser, which will never attempt to splice vectors with out-of-bounds values. Changing the definition also makes things simpler for codegen since we can always assume that the index is valid. This patch was created in response to review comments on D115863 Differential Revision: https://reviews.llvm.org/D115933	2022-01-11 09:37:39 +00:00
Nick Desaulniers	649b11ef8b	git-clang-format HEAD~	2022-01-10 18:34:30 -08:00
Nick Desaulniers	301e911740	[TargetLowering] precommit refactor from D115688 NFC Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>	2022-01-10 18:32:13 -08:00
Mircea Trofin	b191c1f0f9	[NFC][regalloc] Pull out some AllocationOrder/CostPerUseLimit eviction logic We are reusing that logic in the ML implementation. Differential Revision: https://reviews.llvm.org/D116075	2022-01-10 15:47:31 -08:00
Nadav Rotem	e2cc091a7d	Fix a missed opportunity to merge stores. This commit fixes a missed opportunity in merging consecutive stores. The code that searches for stores skipped the case of stores that directly connect to the root. The comment above the implementation lists this case but the code did not handle it. I found this pattern when looking into the shared_ptr destructor. GCC generates the right sequence. Here is a small repro: int foo(int* buff) { buff[0] = 0; int x = buff[1]; buff[1] = 0; return x; } Differential Revision: https://reviews.llvm.org/D116895	2022-01-10 13:49:02 -08:00
Mircea Trofin	e121269131	[NFC][regalloc] Pass RAGreedy to eviction adviser This patch simplifies the interface between RAGreedy and the eviction adviser by passing the allocator to the adviser, which allows the latter to extract information as needed, rather than requiring it be passed piecemeal at construction time (which would also complicate later evolution). As part of this, the patch also moves ExtraRegInfo back to RAGreedy. We keep the encapsulation of ExtraRegInfo because it has benefits (e.g. improved readability by abstracting access to the cascade info) and also simpler re-initialization at regalloc pass re-entry time (we just flush the Optional). Differential Revision: https://reviews.llvm.org/D116669	2022-01-10 11:55:16 -08:00
Matt Arsenault	0ba4e4b500	GlobalISel: Pass DebugLoc to getFunctionLiveInPhysReg Fixes crash in assertion about dropping debug info.	2022-01-10 13:50:52 -05:00
Serge Guelton	d2cc6c2d0c	Use a sorted array instead of a map to store AttrBuilder string attributes Using a std::map<SmallString, SmallString> for target dependent attributes is inefficient: it makes its constructor slightly heavier, and involves extra allocation for each new string attribute. Storing the attribute key/value as strings implies an extra allocation/copy step. Use a sorted vector instead. Given the low number of attributes generally involved, this is cheaper, as showcased by https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions Differential Revision: https://reviews.llvm.org/D116599	2022-01-10 14:49:53 +01:00
Chen Zheng	2c46ca96e2	[PowerPC] fast isel can lower intrinsics call on AIX. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D114778	2022-01-10 02:30:05 +00:00
Craig Topper	a500f7f48f	[SelectionDAG] Add FP_TO_UINT_SAT/FP_TO_SINT_SAT to computeKnownBits/computeNumSignBits. These nodes should saturate to their saturating VT. We can use this information to know the bits past the VT are all zeros or all sign bits. I think we might only have test coverage for the unsigned case. I'll verify and add tests. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D116870	2022-01-09 17:48:05 -08:00
Alexander Shaposhnikov	22430ede7e	[CodeGen] Rename emitCalleeSavedFrameMoves This diff renames emitCalleeSavedFrameMoves to avoid conflicts with non-virtual methods of derived classes having the same name but different semantics. E.g. the class AArch64FrameLowering used to have (non-virtual) "emitCalleeSavedFrameMoves" but it started to override TargetFrameLowering::emitCalleeSavedFrameMoves after https://github.com/llvm/llvm-project/commit/c3e6555616 though its usage and semantics didn't change. P.S. for x86 there was no conflict because the signature of non-virtual X86FrameLowering::emitCalleeSavedFrameMoves is different Test plan: make check-all Differential revision: https://reviews.llvm.org/D114140	2022-01-10 01:33:04 +00:00
Jay Foad	50fb44eebb	[GlobalISel] Use getPreferredShiftAmountTy in one more G_UBFX combine Change CombinerHelper::matchBitfieldExtractFromShrAnd to use getPreferredShiftAmountTy for the shift-amount-like operands of G_UBFX just like all the other G_[SU]BFX combines do. This better matches the AMDGPU legality rules for these instructions. Differential Revision: https://reviews.llvm.org/D116803	2022-01-08 09:20:44 +00:00
Jay Foad	ff971873b3	[GlobalISel] Fix legality checks for G_UBFX combines 1. Fix CombinerHelper::matchBitfieldExtractFromAnd to check legality with the correct types for the G_UBFX that it builds. 2. Fix AMDGPUTargetLowering::isConstantUnsignedBitfieldExtractLegal to match the legality rules: result and first operand can be s32 or s64 but the "shift amount" operands are always s32. 3. Add AMDGPU tests where the post-legalizer combiner would create illegal MIR without the above fixes. Differential Revision: https://reviews.llvm.org/D116802	2022-01-08 09:20:44 +00:00
Kazu Hirata	4e2ec7e38d	[llvm] Remove unused forward declarations (NFC)	2022-01-07 20:00:34 -08:00
Kazu Hirata	b932bdf59f	[llvm] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-07 17:45:09 -08:00
Jay Foad	3f3fe4a5cf	[GlobalISel] Fix typo Extact to Extract in function name. NFC.	2022-01-07 11:13:35 +00:00
Nikita Popov	0312fe2901	[CodeGen] Support opaque pointers for inline asm This is the last part of D116531. Fetch the type of the indirect inline asm operand from the elementtype attribute, rather than the pointer element type. Fixes https://github.com/llvm/llvm-project/issues/52928.	2022-01-07 10:57:38 +01:00
Nikita Popov	e4d1779990	[IR] Add ConstraintInfo::hasArg() helper (NFC) Checking whether a constraint corresponds to an argument is a recurring pattern.	2022-01-07 10:44:38 +01:00
Victor Perez	38efa68b08	[LegalizeTypes][VP] Add splitting support for vp.select Split vp.select in a similar way as vselect, splitting also the length parameter. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116651	2022-01-07 08:46:01 +00:00
Kazu Hirata	2aed08131d	[llvm] Use true/false instead of 1/0 (NFC) Identified with modernize-use-bool-literals.	2022-01-07 00:39:14 -08:00
Kazu Hirata	410480e32b	Ensure newlines at the end of files (NFC)	2022-01-06 23:44:02 -08:00
Arlo Siemsen	3d10997e42	Add Rust to CodeView SourceLanguage (CV_CFL_LANG) enum Microsoft has added several new entries to the CV_CFL_LANG enum, including Rust: https://docs.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/cv-cfl-lang This change adds Rust to the corresponding LLVM enum and translates `dwarf::DW_LANG_Rust` to `SourceLanguage::Rust` in the CodeView AsmPrinter. This means that Rust will no longer emit as Masm. Differential Revision: https://reviews.llvm.org/D115300	2022-01-06 14:27:08 -08:00
Mircea Trofin	68ac7b1701	[NFC][mlgo] Add feature declarations for the ML regalloc advisor This just adds feature declarations and some boilerplate. Differential Revision: https://reviews.llvm.org/D116076	2022-01-05 11:54:01 -08:00
David Green	fffd663c87	[CodeGen] Initialize MaxBytesForAlignment in TargetLoweringBase::TargetLoweringBase. This appears to be missing from D114590, causing sanitizer errors.	2022-01-05 19:34:27 +00:00
Luís Ferreira	34435fd105	[llvm] Add support for DW_TAG_immutable_type Added documentation about DW_TAG_immutable_type too. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D113633	2022-01-05 19:17:08 +00:00
Craig Topper	88ecdd30f6	[LegalizeTypes] Remove IsVP argument from type legalization methods. NFC We can either check the opcode or number of operands or use ISD::isVPOpcode inside the methods. In some places I've used number of operands figuring that it is cheaper than isVPOpcode. I've included isVPOpcode in an assert to verify. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D116578	2022-01-05 09:00:48 -08:00
Nicholas Guy	73d92faa2f	[CodeGen] Emit alignment "Max Skip" operand The current AsmPrinter has support to emit the "Max Skip" operand (the 3rd of .p2align), however has no support for it to actually be specified. Adding MaxBytesForAlignment to MachineBasicBlock provides this capability on a per-block basis. Leaving the value as default (0) causes no observable differences in behaviour. Differential Revision: https://reviews.llvm.org/D114590	2022-01-05 12:54:30 +00:00
Victor Perez	96e220e688	[LegalizeTypes][VP] Add integer promotion support for vp.select Promote select, vselect and vp.select in a similar way. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116400	2022-01-05 11:01:52 +00:00
Victor Perez	df5226dfb3	[LegalizeTypes][VP] Add widening support for vp.select Widen vp.select the same way as select and vselect. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116407	2022-01-05 09:21:11 +00:00
Craig Topper	a04b532505	[LegalizeIntegerTypes][RISCV] Teach PromoteSetCCOperands to check sign bits of unsigned compares. Unsigned compares work with either zero extended or sign extended inputs just like equality comparisons. I didn't allow this when I refactored the code in D116421 due to lack of tests. But I've since found a simple C test case that demonstrates when this can be useful. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D116617	2022-01-04 12:38:47 -08:00
Jack Andersen	5b1337184b	[DebugInfo] Avoid triggering global location assert for 2-byte pointer sizes. D111404 moved a 4/8 byte check assert into a block taken by 2-byte platforms. Since these platforms do not take the branches where the pointer size is used, sink the assert accordingly. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D116480	2022-01-04 15:16:36 -05:00
Michael Liao	56ec762a76	[regalloc] Fix GCC warning `-Wattributes`. NFC. - Mark it with LLVM_LIBRARY_VISIBILITY to preserve the legacy visibility.	2022-01-04 12:05:57 -05:00
Mircea Trofin	64e56f8356	[NFC] Expose isRematerializable and copyHint from CalcSpillWeights We need to reuse them for the ML regalloc eviction advisor, as we 'explode' the weight calculation into sub-features. Differential Revision: https://reviews.llvm.org/D116074	2022-01-04 08:11:49 -08:00
Mircea Trofin	c41610778b	[NFC][regalloc] Introduce RegAllocGreedy.h This was suggested in D114831. It should simplify the relation between eviction advisor and the allocator, and simplify ingesting more features tied to the internals of the allocator, in the future. This change simply pulls out RAGreedy, places it in the llvm namespace, and cleans up a bit the includes in the new header file. Differential Revision: https://reviews.llvm.org/D116114	2022-01-04 08:04:55 -08:00
Simon Moll	4c2aba999e	[VP][ISel] use LEGALPOS for legalization action Use the VPIntrinsics.def's LEGALPOS that is specified with every VP SDNode to determine which return or operand value type shall be used to infer the legalization action. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D116594	2022-01-04 14:50:49 +01:00
Simon Pilgrim	882c083889	[DAG] TargetLowering::SimplifySetCC - use APInt::getMinSignedBits() helper. NFC.	2022-01-04 13:48:36 +00:00
Nikita Popov	4ef560ec60	[ELF] Handle .init_array prefix consistently Currently, the code in TargetLoweringObjectFile only assigns @init_array section type to plain .init_array sections, but not prioritized sections like .init_array.00001. This is inconsistent with the interpretation in the AsmParser (see `791523bae6/llvm/lib/MC/MCParser/ELFAsmParser.cpp (L621-L632)`) and upcoming expectations in LLD (see https://github.com/rust-lang/rust/issues/92181 for context). This patch assigns @init_array section type to all sections with an .init_array prefix. The same is done for .fini_array and .preinit_array as well. With that, the logic matches the AsmParser. Differential Revision: https://reviews.llvm.org/D116528	2022-01-04 09:42:58 +01:00
Craig Topper	cbcbbd6ac8	[ValueTracking][SelectionDAG] Rename ComputeMinSignedBits->ComputeMaxSignificantBits. NFC This function returns an upper bound on the number of bits needed to represent the signed value. Use "Max" to match similar functions in KnownBits like countMaxActiveBits. Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old name around to keep this patch size down. Will do a bulk rename as follow up. Rename KnownBits::countMaxSignedBits->countMaxSignificantBits. Reviewed By: lebedev.ri, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D116522	2022-01-03 11:33:30 -08:00
Kazu Hirata	e5947760c2	Revert "[llvm] Remove redundant member initialization (NFC)" This reverts commit `fd4808887e`. This patch causes gcc to issue a lot of warnings like: warning: base class ‘class llvm::MCParsedAsmOperand’ should be explicitly initialized in the copy constructor [-Wextra]	2022-01-03 11:28:47 -08:00
Victor Perez	5527139302	[RISCV][VP] Add RVV codegen for [nX]vXi1 vp.select Expand [nX]vXi1 vp.select the same way as [nX]vXi1 vselect. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115546	2022-01-02 23:12:32 -08:00
Kazu Hirata	7e163afd9e	Remove redundant void arguments (NFC) Identified by modernize-redundant-void-arg.	2022-01-02 10:20:19 -08:00
Kazu Hirata	fd4808887e	[llvm] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-01 16:18:18 -08:00
Kazu Hirata	69ccc96162	[llvm] Use the default constructor for SDValue (NFC)	2022-01-01 10:36:59 -08:00
Craig Topper	243b7aaf51	[SelectionDAG] Use KnownBits::countMinSignBits() to simplify the end of ComputeNumSignBits. This matches what is done in ValueTracking.cpp Reviewed By: RKSimon, foad Differential Revision: https://reviews.llvm.org/D116423	2021-12-31 17:29:57 -08:00
Craig Topper	d00e438cfe	[RISCV][LegalizeIntegerTypes] Teach PromoteSetCCOperands not to sext i32 comparisons for RV64 if the promoted values are already zero extended. This is similar to what is done for targets that prefer zero extend where we avoid using a zero extend if the promoted values are sign extended. We'll also check for zero extended operands for ugt, ult, uge, and ule when the target prefers sign extend. This is different than preferring zero extend, where we only check for sign bits on equality comparisons. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D116421	2021-12-31 17:15:20 -08:00
Craig Topper	7d659c6ac7	[LegalizeIntegerTypes] Rename NewLHS/NewRHS arguments to DAGTypeLegalizer::PromoteSetCCOperands. NFC The 'New' only makes sense in the context of these being output arguments, but they are also used as inputs first. Drop the 'New' and just call them LHS/RHS. Factored out of D116421.	2021-12-30 15:31:43 -08:00
Craig Topper	15787ccd45	[RISCV] Add support for STRICT_LRINT/LLRINT/LROUND/LLROUND. Tests for other strict intrinsics. This patch adds isel support for STRICT_LRINT/LLRINT/LROUND/LLROUND. It also adds test cases for f32 and f64 constrained intrinsics that correspond to the intrinsics in float-intrinsics.ll and double-intrinsics.ll. Support for promoting the integer argument of STRICT_FPOWI was added. I've skipped adding tests for f16 intrinsics, since we don't have libcalls for them and we have inconsistent support for promoting them in LegalizeDAG. This will need to be examined more closely. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D116323	2021-12-30 11:54:32 -08:00
modimo	ba51d26ec4	[CodeView] Clamp Frontend version D43002 introduced a test debug-info-objname.cpp that outputted the current compiler version into CodeView. Internally we appended a date to the patch version and overflowed the 16-bits allocated to that space. This change clamps the Frontend version outputted values to 16-bits like rGd1185fc081ead71a8bf239ff1814f5ff73084c15 did for the Backend version. Testing: ninja check-all; the newly added tests correctly clamp and no longer assert when trying to output the field Reviewed By: aganea Differential Revision: https://reviews.llvm.org/D116243	2021-12-28 15:22:18 -08:00
Craig Topper	1c6b740d4b	[TargetLowering] Remove workaround for old behavior of getShiftAmountTy. NFC getShiftAmountTy used to directly return the shift amount type from the target which could be too small for large illegal types. For example, X86 always returns i8. The code here detected this and used i32 instead if it won't fit. This behavior was added to getShiftAmountTy in D112469 so we no longer need this workaround.	2021-12-28 14:08:25 -08:00
Kazu Hirata	5a667c0e74	[llvm] Use nullptr instead of 0 (NFC) Identified with modernize-use-nullptr.	2021-12-28 08:52:25 -08:00
Kazu Hirata	d09a284dfb	[CodeGen] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2021-12-28 00:38:11 -08:00
Petar Avramovic	508e39afe0	GlobalISel: remove redundant line added in D114198. NFC	2021-12-27 12:14:13 +01:00
David Blaikie	2bddab25db	DebugInfo: Don't hash DIE offsets before they're computed Instead of hashing DIE offsets, hash DIE references the same as they would be when used outside of a loclist - that is, deep hash the type on first use, and hash the numbering on subsequent uses. This does produce different hashes for different type references, where it did not before (because we were hashing zero all the time - so it didn't matter what type was referenced, the hash would be identical). This also allows us to enforce that the DIE offset (& size) is not queried before it is used (which came up while investigating another bug recently).	2021-12-25 16:09:12 -08:00
Kazu Hirata	2d303e6781	Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-12-24 23:17:54 -08:00
Fangrui Song	ea2d4c5881	[GlobalISel] Fix -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off builds after D114198	2021-12-24 00:55:54 -08:00
David Blaikie	b05df0287b	Revert "[DWARF] Fix PR51087 Extraneous enum record in DWARF with type units" Causes invalid debug_gnu_pubnames (& I think non-gnu pubnames too) - visible as 0 values for the offset in gnu pubnames. More details on the original review in D115325. This reverts commit `78d15a112c`. This reverts commit `54586582d3`.	2021-12-23 20:50:30 -08:00
Kristina Bessonova	81378f7e56	Revert "[DwarfDebug] Support emitting function-local declaration for a lexical block" & dependent patches Try to revert D113741 once again. This also reverts `0ac75e82ff` (D114705) as it causes LLDB's lldb-api.lang/cpp/nsimport.TestCppNsImport.py test failure w/o D113741. This reverts commit `f9607d45f3`. Differential Revision: https://reviews.llvm.org/D116225	2021-12-24 00:47:04 +02:00
Simon Pilgrim	71fc4bbdd2	[X86][SSE] Add ISD::ROTR support Fix issue in TargetLowering::expandROT where we only attempt to flip a rotation if the other direction has better support - this matches TargetLowering::expandFunnelShift This allows us to enable ISD::ROTR lowering on SSE targets, which particularly simplifies/improves codegen for splat amount and AVX2 per-element shifts.	2021-12-23 15:07:30 +00:00
Petar Avramovic	29f88b93fd	[GlobalISel] Rework more/fewer elements for vectors Artifact combiner is not able to access individual elements after using LCMTy style merge/unmerge, extract and insert to change vector number of elements (pad with undef or split to sub-vector instructions). Use unmerge to individual elements instead and then merge elements into requested types. Change argument lowering for vectors and moreElementsVector to use buildPadVectorWithUndefElements and buildDeleteTrailingVectorElements. FewerElementsVector had a few helpers that had different behavior, introduce new helper for most of the opcodes. FewerElementsVector helper is more flexible since it can create leftover instruction smaller than requested type (useful in case target wants to avoid pad with undef and use fewer registers). If target does not want leftover of different type it should call more elements first. Some helpers were performing more elements first to have split without leftover. Opcodes that used this helper use clampMaxNumElementsStrict (does more elements first) in LegalizerInfo to avoid test changes. Fixes failures caused by failing to combine artifacts created during more/fewer elements vector. Differential Revision: https://reviews.llvm.org/D114198	2021-12-23 14:30:02 +01:00
Muhammad Omair Javaid	f9607d45f3	Revert "Revert "[DwarfDebug] Support emitting function-local declaration for a lexical block" & dependent patches" This broke the following LLDB buildbots: https://lab.llvm.org/buildbot/#/builders/17/builds/14984 https://lab.llvm.org/buildbot/#/builders/96/builds/15928 https://lab.llvm.org/buildbot/#/builders/68/builds/23600 This reverts commit `62a6b9e9ab`.	2021-12-23 14:09:48 +05:00
Shivam Gupta	0489e89119	[DAGCombiner] Avoid combining adjacent stores at -O0 to improve debug experience When the source has a series of assignments, users reasonably want to have the debugger step through each one individually. Turn off the combine for adjacent stores so we get this behavior at -O0. Similar to D7181. Reviewed By: spatel, xgupta Differential Revision: https://reviews.llvm.org/D115808	2021-12-23 10:48:28 +05:30
David Blaikie	62a6b9e9ab	Revert "[DwarfDebug] Support emitting function-local declaration for a lexical block" & dependent patches This patch causes invalid DWARF to be generated in some cases of LTO + Split DWARF - follow-up on the original review thread (D113741) contains further detail and test cases. This reverts commit `75b622a795`. This reverts commit `b6ccca217c`. This reverts commit `514d374419`.	2021-12-22 15:27:09 -08:00
Simon Pilgrim	4639461531	[DAG][X86] Add TargetLowering::isSplatValueForTargetNode override Add callback to enable us to test target nodes if they are splat vectors Added some basic X86ISD::VBROADCAST + X86ISD::VBROADCAST_LOAD handling	2021-12-22 16:57:44 +00:00
Alexandre Ganea	a282ea4898	Reland - [CodeView] Emit S_OBJNAME record Reland integrates build fixes & further review suggestions. Thanks to @zturner for the initial S_OBJNAME patch! Differential Revision: https://reviews.llvm.org/D43002	2021-12-21 19:02:14 -05:00
Alexandre Ganea	5bb5142e80	Revert [CodeView] Emit S_OBJNAME record Also revert all subsequent fixes: - `abd1cbf5e5` [Clang] Disable debug-info-objname.cpp test on Unix until I sort out the issue. - `00ec441253` [Clang] debug-info-objname.cpp test: explicitly encode a x86 target when using %clang_cl to avoid falling back to a native CPU triple. - `cd407f6e52` [Clang] Fix build by restricting debug-info-objname.cpp test to x86.	2021-12-21 19:02:14 -05:00
Alexandre Ganea	f44e3fbadd	[CodeView] Emit S_OBJNAME record Thanks to @zturner for the initial patch! Differential Revision: https://reviews.llvm.org/D43002	2021-12-21 09:26:36 -05:00
Jay Foad	17006033f9	[GlobalISel] Verify operand types for G_SHL, G_LSHR, G_ASHR Differential Revision: https://reviews.llvm.org/D115868	2021-12-21 11:59:33 +00:00
Simon Pilgrim	592e89e636	[DAG] Constify SelectionDAG::isSplatValue() This doesn't generate any nodes so should be usable by methods with const SelectionDAG &.	2021-12-21 11:19:23 +00:00
Kazu Hirata	500c4b68dc	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-12-20 23:43:24 -08:00
Mircea Trofin	07622368a8	[NFC] Fix clang-tidy issues in CalcSpillWeights.cpp	2021-12-20 19:24:44 -08:00
Sami Tolvanen	5dc8aaac39	[llvm][IR] Add no_cfi constant With Control-Flow Integrity (CFI), the LowerTypeTests pass replaces function references with CFI jump table references, which is a problem for low-level code that needs the address of the actual function body. For example, in the Linux kernel, the code that sets up interrupt handlers needs to take the address of the interrupt handler function instead of the CFI jump table, as the jump table may not even be mapped into memory when an interrupt is triggered. This change adds the no_cfi constant type, which wraps function references in a value that LowerTypeTestsModule::replaceCfiUses does not replace. Link: https://github.com/ClangBuiltLinux/linux/issues/1353 Reviewed By: nickdesaulniers, pcc Differential Revision: https://reviews.llvm.org/D108478	2021-12-20 12:55:32 -08:00
Shivam Gupta	eb66f0662a	Revert "[DAGCombiner] Avoid combining adjacent stores at -O0 to improve debug experience" This reverts commit `731bde1ed3`.	2021-12-20 21:43:40 +05:30
Shivam Gupta	731bde1ed3	[DAGCombiner] Avoid combining adjacent stores at -O0 to improve debug experience When the source has a series of assignments, users reasonably want to have the debugger step through each one individually. Turn off the combine for adjacent stores so we get this behavior at -O0. Similar to D7181. Differential Revision: https://reviews.llvm.org/D115808	2021-12-19 20:58:49 +05:30
Simon Pilgrim	efec3a26b4	[DAG] visitADDSAT/visitSUBSAT - merge scalar/vector canonicalization and constant folding. Match order of most of the other integer opcode combines	2021-12-19 13:19:40 +00:00
Simon Pilgrim	c1340b9e78	[DAG] Improve FMINNUM/FMAXNUM/FMINIMUM/FMAXIMUM constant folding Merge the node combines into a common DAGCombiner::visitFMinMax (like we do for IMINMAX). Move the constant folding into SelectionDAG::foldConstantFPMath. This allows us to fold the vecreduce-propagate-sd-flags.ll test as it reduces constants - so I've refactored it to take variables instead. Differential Revision: https://reviews.llvm.org/D115952	2021-12-19 11:45:51 +00:00
Kazu Hirata	fee57711fe	Use DenseMap::lookup (NFC)	2021-12-17 18:19:25 -08:00
Kazu Hirata	26bd534a79	[llvm] Use none_of instead of !any_of (NFC)	2021-12-17 13:48:57 -08:00
Sanjay Patel	79932211f9	[SDAG] remove FP-to-int cast attribute check in fold to FTRUNC We were using a function attribute to indicate a non-standard FP mode, but now we can use intrinsics for that job as shown in the new tests. Presumably the x86 asm could be improved for that IR with intrinsics, but I have not worked out exactly how to do that. Note that the transform to FTRUNC still requires a hacky check for "nsz" (because FMF are not applied to FP casts). This is a cleanup based on the clang change in D115804 / `8c7f2a4f87` . This is effectively a revert of `5a90285bd9` + D46237 . Differential Revision: https://reviews.llvm.org/D115885	2021-12-17 16:01:37 -05:00
Kazu Hirata	90bd4873d6	[CodeGen] Fix an unused variable warning This patch fixes: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:22617:11: error: unused variable 'Ops' [-Werror,-Wunused-variable]	2021-12-17 09:43:42 -08:00
Simon Pilgrim	35c7b1aeae	[DAG] SimplifyVBinOp - remove FoldConstantArithmetic call. Constant folding (scalar/vector) is now consistently handled before the SimplifyVBinOp calls.	2021-12-17 17:22:23 +00:00
Simon Pilgrim	f602723bfa	[DAG] Constant fold + canonicalize fp binops before SimplifyVBinOp call Replace custom constant scalar/splat folding with FoldConstantArithmetic call and canonicalize commutative constant ops to the RHS before the SimplifyVBinOp call	2021-12-17 17:02:54 +00:00
Simon Pilgrim	9d2994311a	[DAG] Move foldConstantFPMath() inside FoldConstantArithmetic Further merging of integer and fp constant folding paths. This allows us to handle undef vector arguments the same as scalar cases.	2021-12-17 16:06:41 +00:00
Simon Pilgrim	512ab9968d	[DAG] foldConstantFPMath - fold vector splats as well as scalar constants	2021-12-17 15:19:26 +00:00
Simon Pilgrim	52611702ea	Revert rG22dbc7a48bf7a3942a7e5ff57977ef828d240bd3 "[DAG] foldConstantFPMath - fold vector splats as well as scalar constants" A followup patch uncovered an issue with allowing undef elements in the splat - I will reapply this with a fixed implementation.	2021-12-17 15:19:25 +00:00
David Truby	5c9684704d	[DAG][sve] Lowering for VLS masked truncating stores This extends the custom lowering for truncating stores on fixed length vectors in SVE to support masked truncating stores. It also adds a DAG combine for truncates followed by masked stores. Reviewed By: peterwaller-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D108115	2021-12-17 15:04:45 +00:00
Simon Pilgrim	22dbc7a48b	[DAG] foldConstantFPMath - fold vector splats as well as scalar constants	2021-12-17 14:24:36 +00:00
Simon Pilgrim	d91b5b0f57	[DAG] foldConstantFPMath - use APFloat& for read-only constant fold arg. NFC. We just need to copy the 1st arg (which we use for the constant fold result) - use a cheaper const reference for the 2nd arg.	2021-12-17 12:34:03 +00:00
Simon Pilgrim	42f00106b7	[DAG] Constant fold + canonicalize integer binops before SimplifyVBinOp call SimplifyVBinOp still has a FoldConstantArithmetic call, which now it isn't vector specific we should be able to remove (once fp binops are tidied up); but we can at least clean up the integer opcodes to perform the basic constant/undef handling in common code first.	2021-12-17 12:02:27 +00:00
OCHyams	78d15a112c	[DWARF] Fix PR51087 Extraneous enum record in DWARF with type units Fixes https://llvm.org/PR51087: Extraneous enum record in DWARF with type units. As explained in PR51087 we sometimes get skeleton DIEs for enums in a Dwarf Compile Unit (CU) that are not referenced from any CU and are already described by a type unit. Types for enums are emitted whether used or not, all together before most types in the CU. Mechanically, the extraneous CU records are generated because the enum types are generated with a call to CU->getOrCreateTypeDIE. This function will recursively get-or-create the parent DIE (in the CU) and the type unit for each. We don't need the CU-side DIEs if the type units are successfully emitted. Fix by only emitting the type units for enums if possible, falling back to a call to getOrCreateTypeDIE if not. Do the same for retained types. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D115325	2021-12-17 10:10:55 +00:00
Mircea Trofin	09103807e7	[NFC][regalloc] Introduce the RegAllocEvictionAdvisorAnalysis This patch introduces the eviction analysis and the eviction advisor, the default implementation, and the scaffolding for introducing the other implementations of the advisor. Differential Revision: https://reviews.llvm.org/D115707	2021-12-16 17:56:46 -08:00
Ellis Hoag	58d9c1aec8	[Try2][InstrProf] Attach debug info to counters Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile. Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 The original diff https://reviews.llvm.org/D114565 was reverted because of the `Instrumentation/InstrProfiling/debug-info-correlate.ll` test, which is fixed in this commit. Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D115693	2021-12-16 14:20:30 -08:00
Nathan Sidwell	dd073e08ae	Avoid by-value copies of referenced objects These were detected by the new -Wauto-by-value-copy (D114989) warning, these by-value constant copies need only be references. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D114990	2021-12-16 07:22:46 -08:00
Jay Foad	cce93b3397	[MachineVerifier] Undef subreg operands do not require subranges D112556 added verification that the live interval for a subreg operand must have subranges. This patch fixes a corner case, where if all subreg operands for a particular register are undef uses then no subranges are required. This matches how LiveIntervalCalc would build the live intervals in the first place, since an undef use is not considered to read the register. Before this patch, CodeGen/AMDGPU/no-remat-indirect-mov.mir would fail with -early-live-intervals: # After Live Interval Analysis ... * Bad machine code: Live interval for subreg operand has no subranges * - function: index_vgpr_waterfall_loop - basic block: %bb.1 (0x6a9a968) [352B;496B) - instruction: 432B %24:vgpr_32 = V_MOV_B32_e32 undef %18.sub0:vreg_512, implicit $exec, implicit %18:vreg_512, implicit $m0 - operand 1: undef %18.sub0:vreg_512 Differential Revision: https://reviews.llvm.org/D115360	2021-12-16 09:49:27 +00:00
Arthur Eubanks	eba7b26815	[SafeStack] Use Align instead of uint64_t It is better typed, and the calls to getAlignment() are deprecated. Differential Revision: https://reviews.llvm.org/D115466	2021-12-15 14:40:56 -08:00
Arnold Schwaighofer	d87e617048	Teach the backend to make references to swift_async_extendedFramePointerFlags weak if it emits it When references to the symbol `swift_async_extendedFramePointerFlags` are emitted they have to be weak. References to the symbol `swift_async_extendedFramePointerFlags` get emitted only by frame lowering code. Therefore, the backend needs to track references to the symbol and mark them weak. Differential Revision: https://reviews.llvm.org/D115672	2021-12-15 10:02:06 -08:00
Simon Pilgrim	b88f4f271b	[DAG] SelectionDAG::isSplatValue - add *_EXTEND_VECTOR_INREG handling Fixes #52719	2021-12-15 12:26:39 +00:00
Esme-Yi	c0529efc95	[DebugInfo][DWARF] emit DW_AT_accessibility attribute for class/struct/union types. Summary: This patch emits the DW_AT_accessibility attribute for class/struct/union types in the LLVM part. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D115606	2021-12-15 07:38:12 +00:00
John Brawn	dc9f65be45	[AArch64][SVE] Fix handling of stack protection with SVE Fix a couple of things that were causing stack protection to not work correctly in functions that have scalable vectors on the stack: * Use TypeSize when determining if accesses to a variable are considered out-of-bounds so that the behaviour is correct for scalable vectors. * When stack protection is enabled move the stack protector location to the top of the SVE locals, so that any overflow in them (or the other locals which are below that) will be detected. Fixes: https://github.com/llvm/llvm-project/issues/51137 Differential Revision: https://reviews.llvm.org/D111631	2021-12-14 11:30:48 +00:00
Chen Zheng	062d9b7d43	[LegalizeVectorOps] code refactor for LegalizeOp; NFC Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115636	2021-12-14 03:45:53 +00:00
Ellis Hoag	c809da7d9c	Revert "[InstrProf] Attach debug info to counters" This reverts commit `800bf8ed29`. The `Instrumentation/InstrProfiling/debug-info-correlate.ll` test was failing because I forgot the `llc` commands are architecture specific. I'll follow up with a fix. Differential Revision: https://reviews.llvm.org/D115689	2021-12-13 18:15:17 -08:00
Ellis Hoag	800bf8ed29	[InstrProf] Attach debug info to counters Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile. Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D114565	2021-12-13 17:51:22 -08:00
Chen Zheng	8c107bee70	[LegalizeVectorOps] fix a typo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115637	2021-12-14 00:22:58 +00:00
Fangrui Song	a6a07a514b	[MachineOutliner] Don't outline functions starting with PATCHABLE_FUNCTION_ENTER/FENTRY_CALL MachineOutliner may outline a "patchable-function-entry" function whose body has a TargetOpcode::PATCHABLE_FUNCTION_ENTER MachineInstr. This is incorrect because the special code sequence must stay unchanged to be used at run-time. Avoid outlining PATCHABLE_FUNCTION_ENTER. While here, avoid outlining FENTRY_CALL too (which doesn't reproduce currently) to allow phase ordering flexibility. Fixes #52635 Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D115614	2021-12-13 13:24:29 -08:00
Stanislav Mekhanoshin	c4aef9c281	Check subrange liveness at rematerialization LiveRangeEdit::allUsesAvailableAt checks that VNI at use is the same as at the original use slot. However, the VNI can be the same while a specific subrange needed for use can be dead at the new index. This patch adds subrange liveness check if there is a subreg use. Fixes: SWDEV-312810 Differential Revision: https://reviews.llvm.org/D115278	2021-12-13 11:11:55 -08:00
Mircea Trofin	657adcb077	[NFC][regalloc] Move ExtraRegInfo and related to LiveRangeStageManager This would allow sharing the LiveRangeStageManager between different RegAllocEvictionAdvisors. One scenario is for ML training, where we want to capture what the default advisor would do, for bootstrapping (speeds up training). Differential Revision: https://reviews.llvm.org/D114831	2021-12-13 10:10:57 -08:00
Roman Lebedev	c1a36ba002	[DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask') In most test changes this allows us to drop some broadcasts/shuffles. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D104156	2021-12-13 20:03:44 +03:00
Peter Waller	921e89c59a	[SVE] Only combine (fneg (fma)) => FNMLA with nsz -(Za + Zm * Zn) != (-Za + Zm * (-Zn)) when the FMA produces a zero output (e.g. all zero inputs can produce -0 output) Add a PatFrag to check presence of nsz on the fneg, add tests which ensure the combine does not fire in the absence of nsz. See https://reviews.llvm.org/D90901 for a similar discussion on X86. Differential Revision: https://reviews.llvm.org/D109525	2021-12-13 11:33:07 +00:00
Fraser Cormack	b0319ab79b	[PR52475] Ensure a correct chain in copies to/from hidden sret parameter This patch fixes an issue during SelectionDAG construction. When the target is unable to lower the function's return value, a hidden sret parameter is created. It is initialized and copied to a stored variable (DemoteRegister) with CopyToReg and is later fetched with CopyFromReg. The bug is that the chains used for each copy are inconsistent, and thus in rare cases the scheduler may issue them out of order. The fix is to ensure that the CopyFromReg uses the DAG root which is set as the chain corresponding to the initial CopyToReg. Fixes https://llvm.org/PR52475 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D114795	2021-12-13 10:46:32 +00:00
Kazu Hirata	bb6447a78c	[llvm] Use llvm::reverse (NFC)	2021-12-12 16:13:49 -08:00
Kazu Hirata	67aeae0138	[llvm] Use range-based for loops (NFC)	2021-12-11 22:34:07 -08:00
Adrian Prantl	c7c84b9087	[DwarfDebug] Refuse to emit DW_OP_LLVM_arg values wider than 64 bits DwarfExpression::addUnsignedConstant(const APInt &Value) only supports wider-than-64-bit values when it is used to emit a top-level DWARF expression representing the location of a variable. Before this change, it was possible to call addUnsignedConstant on >64 bit values within a subexpression when substituting DW_OP_LLVM_arg values. This can trigger an assertion failure (e.g. PR52584, PR52333) when it happens in a fragment (DW_OP_LLVM_fragment) expression, as addUnsignedConstant on >64 bit values splits the constant into separate DW_OP_pieces, which modifies DwarfExpression::OffsetInBits. This change papers over the assertion errors by bailing on overly wide DW_OP_LLVM_arg values. A more comprehensive fix might be to split wide values into pointer-sized fragments. [0] https://github.com/llvm/llvm-project/blob/e71fa03/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp#L799-L805 Patch by Ricky Zhou! Differential Revision: https://reviews.llvm.org/D115343	2021-12-10 09:33:27 -08:00
David Sherwood	652faed353	[CodeGen] Improve SelectionDAGBuilder lowering code for get.active.lane.mask intrinsic Previously we were using UADDO to generate a two-result value with the unsigned addition and the overflow mask. We then combined the overflow mask with the trip count comparison to get a result. However, we don't need to do this - we can simply use a UADDSAT saturating add node to add the vector index splat and the stepvector together. Then we can just compare this to a splat of the trip count. This results in overall better code quality for both Thumb2 and AArch64. Differential Revision: https://reviews.llvm.org/D115354	2021-12-10 13:39:38 +00:00
Brian Cain	1e68c79987	Reapply [xray] add support for hexagon Adds x-ray support for hexagon to llvm codegen, clang driver, compiler-rt libs. Differential Revision: https://reviews.llvm.org/D113638 Reapplying this after `543a9ad7c4`, which fixes the leak introduced there.	2021-12-10 05:32:28 -08:00
Sameer Sahasrabuddhe	1d0244aed7	Reapply CycleInfo: Introduce cycles as a generalization of loops Reverts `02940d6d22`. Fixes breakage in the modules build. LLVM loops cannot represent irreducible structures in the CFG. This change introduces the concept of cycles as a generalization of loops, along with a CycleInfo analysis that discovers a nested hierarchy of such cycles. This is based on Havlak (1997), Nesting of Reducible and Irreducible Loops. The cycle analysis is implemented as a generic template and then instantiated for LLVM IR and Machine IR. The template relies on a new GenericSSAContext template which must be specialized when used for each IR. This review is a restart of an older review request: https://reviews.llvm.org/D83094 Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>, with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> Differential Revision: https://reviews.llvm.org/D112696	2021-12-10 14:36:43 +05:30
Konstantin Schwarz	a344653725	[GlobalISel] Fix IRTranslator for constexpr fcmp The existing code assumed fcmp to always be an Instruction, but it can also be a ConstExpr. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D115450	2021-12-10 08:49:12 +01:00
Kazu Hirata	f829630d2e	[llvm] Use llvm::count (NFC)	2021-12-09 20:50:38 -08:00
Arthur Eubanks	1172712f46	[NFC] Replace some deprecated getAlignment() calls with getAlign() Reviewed By: gchatelet Differential Revision: https://reviews.llvm.org/D115370	2021-12-09 08:43:19 -08:00
Brian Cain	ab28cb1c5c	Revert "[xray] add support for hexagon" This reverts commit `543a9ad7c4`.	2021-12-09 07:30:40 -08:00
Brian Cain	543a9ad7c4	[xray] add support for hexagon Adds x-ray support for hexagon to llvm codegen, clang driver, compiler-rt libs. Differential Revision: https://reviews.llvm.org/D113638	2021-12-09 05:47:53 -08:00
Mircea Trofin	4afae6f7c7	[NFC] Rename MachineFunction::cloneMachineInstrBundle (coding style)	2021-12-08 21:12:54 -08:00
Mircea Trofin	b012742405	[NFC] Rename MachineFunction::deleteMachineInstr (coding style)	2021-12-08 20:36:13 -08:00
Kazu Hirata	c23ebf1714	[llvm] Use range-based for loops (NFC)	2021-12-08 20:35:39 -08:00
Mircea Trofin	91a0da0142	[NFC] Rename MachineFunction::DeleteMachineBasicBlock Renamed to conform to coding style	2021-12-08 18:12:51 -08:00
David Sherwood	3257f63bbd	[NFC][CodeGen] Remove rarely used DL variable from SelectionDAGBuilder There is a pointer to the DataLayout in SelectionDAGBuilder called 'DL' that is hardly ever used. In most cases the code seems to just use `DAG.getDataLayout()` instead. Given that DL is also often used as a shadowed variable for the debug location it seems sensible to just kill off the few remaining uses and be consistent with the rest of the code. Differential Revision: https://reviews.llvm.org/D114451	2021-12-08 17:05:46 +00:00
David Green	5d7efd4758	[SDAG] Refine MMO size when converting masked load/store to normal load/store After D113888 / `32b6c17b29` the MMO size of a masked loads/store is unknown. When we are converting back to a standard load/store because the mask is known all ones, we can refine that to the correct size from the size of the vector being loaded/stored. Differential Revision: https://reviews.llvm.org/D114582	2021-12-08 10:13:25 +00:00
Alex Lorenz	0756aa3978	[macho] add support for emitting macho files with two build version load commands This patch extends LLVM IR to add metadata that can be used to emit macho files with two build version load commands. It utilizes "darwin.target_variant.triple" and "darwin.target_variant.SDK Version" metadata names for that, which will be set by a future patch in clang. MachO uses two build version load commands to represent an object file / binary that is targeting both the macOS target, and the Mac Catalyst target. At runtime, a dynamic library that supports both targets can be loaded from either a native macOS or a Mac Catalyst app on a macOS system. We want to add support to this to upstream to LLVM to be able to build compiler-rt for both targets, to finish the complete support for the Mac Catalyst platform, which is right now targetable by upstream clang, but the compiler-rt bits aren't supported because of the lack of this multiple build version support. Differential Revision: https://reviews.llvm.org/D112189	2021-12-07 18:17:47 -08:00
Jonas Devlieghere	02940d6d22	Revert "CycleInfo: Introduce cycles as a generalization of loops" This reverts commit `0fe61ecc2c` because it breaks the modules build. https://green.lab.llvm.org/green/job/clang-stage2-rthinlto/4858/ https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/39112/	2021-12-07 13:06:34 -08:00
Chih-Ping Chen	b5c42ef3da	[NFC][CodeView] Use one unified access to the module in beginModule. Differential Revision: https://reviews.llvm.org/D115257	2021-12-07 13:45:48 -05:00
Simon Pilgrim	d298c32407	Remove unused variable. NFC.	2021-12-07 18:37:07 +00:00
Simon Pilgrim	52d2f35323	[DAG] Update expandFunnelShift/expandROT to return the expansion directly. NFCI. Don't return a bool to indicate if the expansion was successful, just return the SDValue result directly, like we do for most other basic expansions.	2021-12-07 18:09:43 +00:00
Kazu Hirata	630c847b1b	[llvm] Use range-based for loops (NFC)	2021-12-07 09:17:03 -08:00
Mircea Trofin	fa99cb64ff	[mlgo][regalloc] Add score calculation for training Add the calculation of a score, which will be used during ML training. The score qualifies the quality of a regalloc policy, and is independent of what we train (currently, just eviction), or the regalloc algo itself. We can then use scores to guide training (which happens offline), by formulating a reward based on score variation - the goal being lowering scores (currently, that reward is percentage reduction relative to Greedy's heuristic) Currently, we compute the score by factoring different instruction counts (loads, stores, etc) with the machine basic block frequency, regardless of the instructions' provenance - i.e. they could be due to the regalloc policy or be introduced previously. This is different from RAGreedy::reportStats, which accumulates the effects of the allocator alone. We explored this alternative but found (at least currently) that the more naive alternative introduced here produces better policies. We do intend to consolidate the two, however, as we are actively investigating improvements to our reward function, and will likely want to re-explore scoring just the effects of the allocator. In either case, we want to decouple score calculation from allocation algorithm, as we currently evaluate it after a few more passes after allocation (also, because score calculation should be reusable regardless of allocation algorithm). We intentionally accumulate counts independently because it facilitates per-block reporting, which we found useful for debugging - for instance, we can easily report the counts independently, and then cross-reference with perf counter measurements. Differential Revision: https://reviews.llvm.org/D115195	2021-12-07 09:00:27 -08:00
spupyrev	f573f6866e	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-07 07:31:10 -08:00
Paulo Matos	2fd634a5e3	[WebAssembly] Implement table instruction intrinsics This change implements intrinsics for table.grow, table.fill, table.size, and table.copy. Differential Revision: https://reviews.llvm.org/D113420	2021-12-07 13:25:59 +01:00
Fraser Cormack	40d51de5cb	[SelectionDAG] Use UnknownSize for VP memory ops In the style of D113888, this patch updates the various VP memory operations (load, store, gather, scatter) to use UnknownSize. This is for the same reason as for masked loads and stores: the number of elements accessed is not generally known at compile time. This is somewhat pessimistic in the sense that we may still find un-canonicalized intrinsics featuring both an all-true mask and an EVL equal to the vector size. Arguably those should be canonicalized before the SelectionDAG, so those have been left for future work. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115036	2021-12-07 10:51:02 +00:00
Fraser Cormack	3460cc2585	[VP] Propagate align parameter attr on VP load/store to ISel This patch fixes a case where the 'align' parameter attribute on the pointer operands to llvm.vp.load and llvm.vp.store was being dropped during the conversion to the SelectionDAG. The default alignment equal to the ABI type alignment of the vector type was kept. It also updates the documentation to reflect the fact that the parameter attribute is now properly supported. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D114422	2021-12-07 10:16:16 +00:00
Sameer Sahasrabuddhe	0fe61ecc2c	CycleInfo: Introduce cycles as a generalization of loops LLVM loops cannot represent irreducible structures in the CFG. This change introduces the concept of cycles as a generalization of loops, along with a CycleInfo analysis that discovers a nested hierarchy of such cycles. This is based on Havlak (1997), Nesting of Reducible and Irreducible Loops. The cycle analysis is implemented as a generic template and then instantiated for LLVM IR and Machine IR. The template relies on a new GenericSSAContext template which must be specialized when used for each IR. This review is a restart of an older review request: https://reviews.llvm.org/D83094 Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>, with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> Differential Revision: https://reviews.llvm.org/D112696	2021-12-07 12:02:34 +05:30
Mircea Trofin	2bd7384d3a	[NFC][MachineInstr] Pass-by-value DebugLoc in CreateMachineInstr DebugLoc is cheap to move, passing it by-val rather than const ref to take advantage of the fact that it is consumed that way by the MachineInstr ctor, which creates some optimization opportunities. Differential Revision: https://reviews.llvm.org/D115208	2021-12-06 19:42:19 -08:00
Kai Luo	b206ee6906	[MachineVerifier] Make TiedOpsRewritten computable in MIRParser This patch is to address post-commit comment https://reviews.llvm.org/D80538#anchor-inline-1091625, which makes the constraint stronger based on what https://reviews.llvm.org/D80538 does, i.e., "TiedOpsRewritten is set iff leave-ssa and all tied operands share the same register". Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D114573	2021-12-07 02:25:15 +00:00
Mircea Trofin	615e374252	[NFC][MachineInstr] Rename some vars to conform to coding style	2021-12-06 17:19:11 -08:00
Nico Weber	3678326d28	Revert "ext-tsp basic block layout" This reverts commit `c68f71eb37`. Breaks tests on arm hosts, see comments on https://reviews.llvm.org/D113424	2021-12-06 19:08:20 -05:00
James Nagurne	cc3bb85580	[llvm][Hexagon] Generalize VLIWResourceModel, VLIWMachineScheduler, and ConvergingVLIWScheduler The Pre-RA VLIWMachineScheduler used by Hexagon is a relatively generic implementation that would make sense to use on other VLIW targets. This commit lifts those classes into their own header/source file with the root VLIWMachineScheduler. I chose this path rather than adding the strategy et al. into MachineScheduler to avoid bloating the file with other implementations. Target-specific behaviors have been captured and replicated through function overloads. - Added an overloadable DFAPacketizer creation member function. This is mainly done for our downstream, which has the capability to override the DFAPacketizer with custom implementations. This is an upstreamable TODO on our end. Currently, it always returns the result of TargetInstrInfo::CreateTargetScheduleState - Added an extra helper which returns the number of instructions in the current packet. This is used in our downstream, and may be useful elsewhere. - Placed the priority heuristic values into the ConvergingVLIWScheduler class instead of defining them as local statics in the implementation - Added an overridable helper in ConvergingVLIWScheduler so that targets can create their own VLIWResourceModel Differential Revision: https://reviews.llvm.org/D113150	2021-12-06 16:23:48 -06:00
spupyrev	c68f71eb37	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-06 08:56:39 -08:00
Kazu Hirata	c4a8928b51	[CodeGen] Use range-based for loops (NFC)	2021-12-06 08:49:10 -08:00
Jack Andersen	f108c7f59d	[GlobalISel] Allow DBG_VALUE to use undefined vregs before LiveDebugValues. Expanding on D109750. Since `DBG_VALUE` instructions have final register validity determined in `LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune unused register operands as their defs are erased. Consequently, this renders `MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot; gaining a substantial performance improvement. The only necessary changes involve making relevant passes consider invalid DBG_VALUE vregs uses as valid. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D112852	2021-12-05 15:55:59 -05:00
Michael Liao	b6ccca217c	Fix `-Wunused-variable` warning. NFC.	2021-12-05 13:40:35 -05:00
Kazu Hirata	1457e78352	[llvm] Use range-based for loops (NFC)	2021-12-05 08:33:02 -08:00
Kristina Bessonova	75b622a795	Reland [DwarfDebug] Support emitting function-local declaration for a lexical block This is another attempt to make function-local declarations (like static variables, structs/classes and other) be correctly emitted within a lexical (bracketed) block. Fixes https://bugs.llvm.org/show_bug.cgi?id=19238. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D113741	2021-12-05 13:56:45 +02:00
Kristina Bessonova	0ac75e82ff	Reland [DwarfDebug] Move emission of global vars, types and imports to endModule() This patch proposes to move emission of global variables, types, imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule(). Effectively, this changes nothing but the order of debug entities which will be as follows: * subprograms (including related context, local variables/labels, local imported entities; related types can be created as a part of the emission of local entities of an abstract subprogram); * global variables (including related context and types); * retained types and enums; * non-local-scoped imported entities; * basic types; * other types left (as a part of local variables attributes emission). Note that the order of emitted compile units may also be changed as now we emit units that contain subprograms first and then all other non-empty units. The motivation behind this change is the following: (1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline, from this time IR can be significantly changed by target-specific passes. If it happens for debug metadata of global entities, those changes will not be reflected in the emitted DWARF. (2) imported subprogram names should refer to an abstract subprogram if it exists, but it isn't known in DwarfDebug::beginModule() (it's possible to make some guesses based on location info, but it's not quite reliable); (3) aforementioned entities if they are scoped within a bracketed block (subject of D113741) couldn't be emitted in DwarfDebug::beginModule() (they need parent emitted first). Another problem is if to try to gather some information about local entities and defer their emission (till subprogram's processing or DwarfDebug::endModule()) all the gathered details might be irrelevant / invalid by the time the entities are being emitted (because of (1)). Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114705	2021-12-05 13:56:45 +02:00
David Green	57ff805a6d	[DAG] Create fptoui.sat from clamped fptosi As an extension to D111976, this converts an fptosi clamped between 0 and (2^n)-1 to a fptoui.sat. This can greatly help on targets with conversions that naturally saturate, such as Arm. X86 disables the transform as some of the test cases increase in size. A fptoui.sat necessitates a fp clamp without native support, so there is little use in converting if the instruction is just going to be expanded. Differential Revision: https://reviews.llvm.org/D112428	2021-12-05 09:25:52 +00:00
Kazu Hirata	ca2f53897a	[CodeGen] Use range-based for loops (NFC)	2021-12-04 08:48:05 -08:00
Kristina Bessonova	a961604819	Revert "[DwarfDebug] Support emitting function-local declaration for a lexical block" This reverts commits * `ee691970a9` (D113741), * `79d3132998` (D114705) due to lldb and dexter test failures.	2021-12-04 18:06:57 +02:00
Kristina Bessonova	ee691970a9	[DwarfDebug] Support emitting function-local declaration for a lexical block This is another attempt to make function-local declarations (like static variables, structs/classes and other) be correctly emitted within a lexical (bracketed) block. Fixes https://bugs.llvm.org/show_bug.cgi?id=19238. Differential Revision: https://reviews.llvm.org/D113741	2021-12-04 17:12:47 +02:00
Kristina Bessonova	79d3132998	[DwarfDebug] Move emission of global vars, types and imports to endModule() This patch proposes to move emission of global variables, types, imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule(). Effectively, this changes nothing but the order of debug entities which will be as follows: * subprograms (including related context, local variables/labels, local imported entities; related types can be created as a part of the emission of local entities of an abstract subprogram); * global variables (including related context and types); * retained types and enums; * non-local-scoped imported entities; * basic types; * other types left (as a part of local variables attributes emission). Note that the order of emitted compile units may also be changed as now we emit units that contain subprograms first and then all other non-empty units. The motivation behind this change is the following: (1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline, from this time IR can be significantly changed by target-specific passes. If it happens for debug metadata of global entities, those changes will not be reflected in the emitted DWARF. (2) imported subprogram names should refer to an abstract subprogram if it exists, but it isn't known in DwarfDebug::beginModule() (it's possible to make some guesses based on location info, but it's not quite reliable); (3) aforementioned entities if they are scoped within a bracketed block (subject of D113741) couldn't be emitted in DwarfDebug::beginModule() (they need parent emitted first). Another problem is if to try to gather some information about local entities and defer their emission (till subprogram's processing or DwarfDebug::endModule()) all the gathered details might be irrelevant / invalid by the time the entities are being emitted (because of (1)). Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114705	2021-12-04 14:10:01 +02:00
Kazu Hirata	3aed282257	[CodeGen] Use range-based for loops (NFC)	2021-12-03 20:45:59 -08:00
Simon Pilgrim	ebf5271918	[DAG] PromoteIntRes_FunnelShift - rename shift Amount variable to Amt to prevent line overflow. NFC.	2021-12-03 17:24:45 +00:00
Stephen Tozer	98a021fcbf	[DebugInfo] Attempt to preserve more information during tail duplication Prior to this patch, tail duplication handled debug info poorly - specifically, debug instructions would be dropped instead of being set undef, potentially extending the lifetimes of prior debug values that should be killed. The pass was also very aggressive with dropping debug info, dropping debug info even when the SSA value it referred to was still present. This patch attempts to handle debug info more carefully, checking to see whether each affected debug value can still be live, setting it undef if not. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D106875	2021-12-03 15:30:05 +00:00
David Green	255ad73424	[ARM] Make MVE v2i1 predicates legal MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the two halves. This was never treated as a legal type in llvm in the past as there are not many 64bit instructions and no 64bit compares. There are a few instructions that could use it though, notably a VSELECT (as it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for similar reasons, some gathers/scatters and long multiplies and VCTP64 instructions. This patch goes through and makes v2i1 a legal type, handling all the cases that fall out of that. It also makes VSELECT legal for v2i64 as a side benefit. A lot of the codegen changes as a result - usually in a way that is a little better or a little worse, but still expensive. Costs can change a little too in the process, again in a way that expensive things remain expensive. A lot of the tests that changed are mainly to ensure correctness - the code can hopefully be improved in the future where it comes up in practice. The intrinsics currently remain using the v4i1 they previously did to emulate a v2i1. This will be changed in a followup patch but this one was already large enough. Differential Revision: https://reviews.llvm.org/D114449	2021-12-03 14:05:41 +00:00
Jay Foad	d133a21b71	[SelectionDAG] Add newline to a debug message	2021-12-03 13:33:32 +00:00
Simon Pilgrim	6803d08c38	[DAG][PowerPC] Enable initial ISD::BITCAST SimplifyDemandedBits/SimplifyMultipleUseDemandedBits big-endian handling This patch begins extending handling for peeking through bitcast nodes to big-endian targets as well as the existing little-endian case. Differential Revision: https://reviews.llvm.org/D114676	2021-12-02 11:47:53 +00:00
Omer Aviram	617ad14060	[SelectionDAG] Add pattern to haveNoCommonBitsSet Correctly identify the following pattern, which has no common bits: (X & ~M) op (Y & M). Differential Revision: https://reviews.llvm.org/D113970	2021-12-01 12:04:04 -05:00
Simon Pilgrim	19d34f6e95	[X86] combinePMULH - recognise 'cheap' truncations via PACKS/PACKUS as well as SEXT/ZEXT combinePMULH currently only truncates vXi32/vXi64 multiplies to PMULHW/PMULHUW if the source operands are SEXT/ZEXT instructions for a 'free' truncation. But we can generalize this to any source operand with sufficient leading sign/zero bits that would allow PACKS/PACKUS to be used as a 'cheap' truncation. This helps us avoid the wider multiplies, in exchange for truncation on both source operands instead of the result. Differential Revision: https://reviews.llvm.org/D113371	2021-12-01 16:37:49 +00:00
Ties Stuij	f5f28d5b0c	[ARM] Implement BTI placement pass for PACBTI-M This patch implements a new MachineFunction pass in the ARM backend for placing BTI instructions. It is similar to the existing AArch64 aarch64-branch-targets pass. BTI instructions are inserted into basic blocks that: - Have their address taken - Are the entry block of a function, if the function has external linkage or has its address taken - Are mentioned in jump tables - Are exception/cleanup landing pads Each BTI instruction is placed at the beginning of a BB after the so-called meta instructions (e.g. exception handler labels). Each outlining candidate and the outlined function need to be in agreement about whether BTI placement is enabled or not. If branch target enforcement is disabled for a function, the outliner should not covertly enable it by emitting a call to an outlined function, which begins with BTI. The cost model of the outliner is adjusted to account for the extra BTI instructions in the outlined function. The ARM Constant Islands pass will maintain the count of the jump tables, which reference a block. A `BTI` instruction is removed from a block only if the reference count reaches zero. PAC instructions in entry blocks are replaced with PACBTI instructions (tests for this case will be added in a later patch because the compiler currently does not generate PAC instructions). The ARM Constant Island pass is adjusted to handle BTI instructions correctly. Functions with static linkage that don't have their address taken can still be called indirectly by linker-generated veneers and thus their entry points need to be marked with BTI or PACBTI. The changes are tested using "LLVM IR -> assembly" tests, jump tables also have a MIR test. Unfortunately it is not possible to add MIR tests for exception handling and computed gotos because of MIR parser limitations. This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Mikhail Maltsev - Momchil Velikov - Ties Stuij Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D112426	2021-12-01 12:54:05 +00:00
Bradley Smith	0eb1efb92c	[DAGCombiner] When combining REM ensure optimized div nodes are unique The REM DAG combine uses the visitDivLike functions to try and get an optimized DIV node to provide better codegen, however in some cases this visitDivLike call ends up in the BuildSDIVPow2 target hook, which in turn sometimes will return the same node passed in to indicate not to change it. The REM DAG combine does not anticipate this and creates a cycle in the DAG because of it. Fix this by ensuring any such optimized div node returned is distinct from the node being combined. Differential Revision: https://reviews.llvm.org/D114716	2021-12-01 11:24:26 +00:00
Simon Pilgrim	9981dd142f	[DAG] Apply clang-format to visitMSTORE + visitMLOAD. NFC. Reduce diff in D114582	2021-12-01 11:23:47 +00:00
Qiu Chaofan	15826eb437	[Legalizer] Avoid expansion to BR_CC if illegal Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110616	2021-12-01 12:22:21 +08:00
Mircea Trofin	a503cb00d1	[NFC][regalloc] Factor accesses to ExtraRegInfo We'll move ExtraRegInfo to the RegAllocEvictionAdvisor subsequently. This change prepares for that by factoring all accesses. RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html Differential Revision: https://reviews.llvm.org/D114759	2021-11-30 15:10:49 -08:00
David Green	9e8a71caf0	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 15:29:14 +00:00
Hans Wennborg	a87782c34d	Revert "[DAG] Create fptosi.sat from clamped fptosi" It causes builds to fail with this assert: llvm/include/llvm/ADT/APInt.h:990: bool llvm::APInt::operator==(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed. See comment on the code review. > This adds a fold in DAGCombine to create fptosi_sat from sequences for > smin(smax(fptosi(x))) nodes, where the min/max saturate the output of > the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because > it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, > ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need > to be handled similarly. > > A shouldConvertFpToSat method was added to control when converting may > be profitable. The original fptosi will have a less strict semantics > than the fptosisat, with less values that need to produce defined > behaviour. > > This especially helps on ARM/AArch64 where the vcvt instructions > naturally saturate the result. > > Differential Revision: https://reviews.llvm.org/D111976 This reverts commit `52ff3b0093`.	2021-11-30 15:36:56 +01:00
Jeremy Morse	3c04507088	[DebugInfo] Turn instruction referencing on by default for x86 This patch is designed to be reverted -- it activates a reasonably large block of new-ish code, so some turbulence is likely. Instruction referencing is summarised, and its being on by default is discussed, here: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153653.html Differential Revision: https://reviews.llvm.org/D114631	2021-11-30 13:44:07 +00:00
Jeremy Morse	651122fc4a	[DebugInfo][InstrRef] Pre-land on-by-default-for-x86 changes Over in D114631 and [0] there's a plan for turning instruction referencing on by default for x86. This patch adds / removes all the relevant bits of code, with the aim that the final patch is extremely small, for an easy revert. It should just be a condition in CommandFlags.cpp and removing the XFail on instr-ref-flag.ll. [0] https://lists.llvm.org/pipermail/llvm-dev/2021-November/153653.html	2021-11-30 12:40:59 +00:00
Jeremy Morse	8dda516b83	[DebugInfo][InstrRef] Avoid dropping fragment info during PHI elimination InstrRefBasedLDV used to crash on the added test -- the exit block is not in scope for the variable being propagated, but is still considered because it contains an assignment. The failure-mode was vlocJoin ignoring assign-only blocks and not updating DIExpressions, but pickVPHILoc would still find a variable location for it. That led to DBG_VALUEs created with the wrong fragment information. Fix this by removing a filter inherited from VarLocBasedLDV: vlocJoin will now consider assign-only blocks and will update their expressions. Differential Revision: https://reviews.llvm.org/D114727	2021-11-30 11:32:31 +00:00
David Green	52ff3b0093	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 11:05:32 +00:00
Abinav Puthan Purayil	bc5dbb0bae	[GlobalISel] Add matchers for constant splat. This change exposes isBuildVectorConstantSplat() to the llvm namespace and uses it to implement the constant splat versions of m_SpecificICst(). CombinerHelper::matchOrShiftToFunnelShift() can now work with vector types and CombinerHelper::matchMulOBy2()'s match for a constant splat is simplified. Differential Revision: https://reviews.llvm.org/D114625	2021-11-30 15:18:50 +05:30
Guozhi Wei	f1d8345a2a	[TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB Currently we create register mappings for registers used only once in current MBB. For registers with multiple uses, when all the uses are in the current MBB, we can also create mappings for them similarly according to the last use. For example %reg101 = ... = ... reg101 %reg103 = ADD %reg101, %reg102 We can create mapping between %reg101 and %reg103. Differential Revision: https://reviews.llvm.org/D113193	2021-11-29 19:01:59 -08:00
Mircea Trofin	e8b8304d76	[NFC][Regalloc] Split canEvictInterference into hint and general There are 2 eviction queries. One is made by tryAssign, when it attempts to free an interference occupying the hint of the candidate. The other is during 'regular' interference resolution, where we scan over all physical registers and try to see if we can evict live ranges in favor of the candidate. We currently use the same logic in both cases, just that the former never passes the cost to any subsequent query. Technically, the 2 decisions could be implemented with different policies. This patch splits the 2. RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html Differential Revision: https://reviews.llvm.org/D114019	2021-11-29 16:04:03 -08:00
Jeremy Morse	0eee844539	[DebugInfo][InstrRef] Terminate overlapping variable fragments If we have a variable where its fragments are split into overlapping segments: DBG_VALUE $ax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment_0, 16) ... DBG_VALUE $eax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment_0, 32) we should only propagate the most recently assigned fragment out of a block. LiveDebugValues only deals with live-in variable locations, as overlaps within blocks are DbgEntityHistoryCalculator's domain. InstrRefBasedLDV has kept the accumulateFragmentMap method from VarLocBasedLDV, we just need it to recognise DBG_INSTR_REFs. Once it's produced a mapping of variable / fragments to the overlapped variable / fragments, VLocTracker uses it to identify when a debug instruction needs to terminate the other parts it overlaps with. The test is updated for some standard "InstrRef picks different registers" variation, and the order of some unrelated DBG_VALUEs changes. Differential Revision: https://reviews.llvm.org/D114603	2021-11-29 23:37:20 +00:00
Jeremy Morse	a20987adf4	[DebugInfo][InstrRef] Add indirection from dbg.declare in SelectionDAG Usually dbg.declares get translated into either entries in an MF side-table, or a DBG_VALUE on entry to the function with IsIndirect set (including in instruction referencing mode). Much rarer is a dbg.declare attached to a non-argument value, such as in the test added in this patch where there's a variable-length-array. Such dbg.declares become SDDbgValue nodes with IsIndirect=true. As it happens, we weren't correctly emitting DBG_INSTR_REFs with the additional indirection. This patch adds the extra indirection, encoded as adding an additional DW_OP_deref to the expression. Differential Revision: https://reviews.llvm.org/D114440	2021-11-29 22:24:19 +00:00
Jeremy Morse	9cf31b8d39	[DebugInfo][InstrRef] Preserve properties of restored variables InstrRefBasedLDV observes when variable locations are clobbered, scans what values are available in the machine, and re-issues a DBG_VALUE for the variable if it can find another location. Unfortunately, I hadn't joined up the Indirectness flag, so if it did this to an Indirect Value, the indirectness would be dropped. Fix this, and add a test that if we clobber a variable value (on the stack in this case), then the recovered variable location keeps the Indirect flag. Differential Revision: https://reviews.llvm.org/D114378	2021-11-29 21:57:24 +00:00
Kazu Hirata	f240e528ce	[llvm] Use range-based for loops (NFC)	2021-11-29 09:04:44 -08:00
Mirko Brkusanin	0dd570ff56	[AMDGPU][GlobalISel] Transform (fsub (fpext (fneg (fmul x, y))), z) -> (fneg (fma (fpext x), (fpext y), z)) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D98050	2021-11-29 16:27:22 +01:00
Mirko Brkusanin	37c2a2201d	[AMDGPU][GlobalISel] Transform (fsub (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), (fneg z)) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D98049	2021-11-29 16:27:22 +01:00
Mirko Brkusanin	5fe7fcd28e	[AMDGPU][GlobalISel] Transform (fsub (fneg (fmul x, y)), z) -> (fma (fneg x), y, (fneg z)) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D98048	2021-11-29 16:27:22 +01:00
Mirko Brkusanin	a782169270	[AMDGPU][GlobalISel] Transform (fsub (fmul x, y), z) -> (fma x, y, -z) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D96614	2021-11-29 16:27:22 +01:00
Mirko Brkusanin	e5e49a08f1	[AMDGPU][GlobalISel] Transform (fadd (fma x, y, (fpext (fmul u, v))), z) -> (fma x, y, (fma (fpext u), (fpext v), z)) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D98047	2021-11-29 16:27:21 +01:00
Mirko Brkusanin	f732292536	[AMDGPU][GlobalISel] Transform (fadd (fma x, y, (fmul u, v)), z) -> (fma x, y, (fma u, v, z)) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D97938	2021-11-29 16:27:21 +01:00
Mirko Brkusanin	8951136216	[AMDGPU][GlobalISel] Transform (fadd (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), z) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D97937	2021-11-29 16:27:21 +01:00
Mirko Brkusanin	881840fc26	[AMDGPU][GlobalISel] Transform (fadd (fmul x, y), z) -> (fma x, y, z) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D93305	2021-11-29 16:27:21 +01:00
Bjorn Pettersson	297fb66484	Use a deterministic order when updating the DominatorTree This solves a problem with non-deterministic output from opt due to not performing dominator tree updates in a deterministic order. The problem that was analysed indicated that JumpThreading was using the DomTreeUpdater via llvm::MergeBasicBlockIntoOnlyPred. When preparing the list of updates to send to DomTreeUpdater::applyUpdates we iterated over a SmallPtrSet, which didn't give a well-defined order of updates to perform. The added domtree-updates.ll test case is an example that would result in non-deterministic printouts of the domtree. Semantically those domtree:s are equivalent, but it show the fact that when we use the domtree iterator the order in which nodes are visited depend on the order in which dominator tree updates are performed. Since some passes (at least EarlyCSE) are iterating over nodes in the dominator tree in a similar fashion as the domtree printer, then the order in which transforms are applied by such passes, transitively, also depend on the order in which dominator tree updates are performed. And taking EarlyCSE as an example the end result could be different depending on in which order the transforms are applied. Reviewed By: nikic, kuhar Differential Revision: https://reviews.llvm.org/D110292	2021-11-29 13:14:50 +01:00
Bradley Smith	6180806632	[AArch64][SVE] Mark fixed-type FP extending/truncating loads/stores as custom This allows the generic DAG combine to fold fp_extend/fp_trunc into loads/stores which we can then lower into an integer extending load/truncating store plus an FP_EXTEND/FP_ROUND. The nuance here is that fixed-type FP_EXTEND/FP_ROUND require unpacked types hence lowering them introduces an unpack/zip. By allowing these nodes to be combined with loads/stores we make it much easier to have this unpack/zip combined into the load/store by our custom lowering. Differential Revision: https://reviews.llvm.org/D114580	2021-11-29 11:56:07 +00:00
David Sherwood	a31f4bdfe8	[CodeGen][SVE] Use whilelo instruction when lowering @llvm.get.active.lane.mask In most common cases the @llvm.get.active.lane.mask intrinsic maps directly to the SVE whilelo instruction, which already takes overflow into account. However, currently in SelectionDAGBuilder::visitIntrinsicCall we always lower this immediately to a generic sequence of instructions that explicitly take overflow into account. This makes it very difficult to then later transform back into a single whilelo instruction. Therefore, this patch introduces a new TLI function called shouldExpandGetActiveLaneMask that asks if we should lower/expand this to a sequence of generic ISD nodes, or instead just leave it as an intrinsic for the target to lower. You can see the significant improvement in code quality for some of the tests in this file: CodeGen/AArch64/active_lane_mask.ll Differential Revision: https://reviews.llvm.org/D114542	2021-11-29 08:08:17 +00:00
Kazu Hirata	fd7d40640d	[llvm] Use range-based for loops (NFC)	2021-11-28 18:14:49 -08:00
Kazu Hirata	c73fc74ce0	[llvm] Use range-based for loops (NFC)	2021-11-28 10:04:54 -08:00
Kristina Bessonova	9043289326	[DwarfCompileUnit] Set parent DIE right after creating a local entity No functional changes intended. Before this patch DwarfCompileUnit::createScopeChildrenDIE() and DwarfCompileUnit::createAndAddScopeChildrenDIE() used to emit child subtrees and then when all the children get created, attach them to a parent scope DIE. However, when a DIE doesn't have a parent, all the requests for its unit DIE fail. Currently, this is not a big issue since it isn't usually needed to know unit DIE for a local (function-scoped) entity. But once we introduce lexical blocks as a valid scope for global variables (static locals) and type DIEs, any requests for a unit DIE need to be guarded against local scope due to the potential absence of the DIE's parent. To avoid the aforementioned issue, this patch refactors a few DwarfCompileUnit methods to support the idea of attaching a DIE to its parent as close to the creation of this DIE as possible. Reviewed By: ellis Differential Revision: https://reviews.llvm.org/D114350	2021-11-27 17:59:07 +02:00
Kazu Hirata	387927bbaf	[Target] Use range-based for loops (NFC)	2021-11-26 21:21:17 -08:00
Nikita Popov	bfa91f38a9	[DAG] Restore dropped condition This was dropped in `fcee33bd5a`, presumably accidentally.	2021-11-26 21:18:54 +01:00
Simon Pilgrim	fcee33bd5a	[DAG] Pull out repeated isLittleEndian() calls. NFC.	2021-11-26 18:41:56 +00:00
Abinav Puthan Purayil	4af45f10cc	[GlobalISel] Fold or of shifts to funnel shift. This change folds a basic funnel shift idiom: - (or (shl x, amt), (lshr y, sub(bw, amt))) -> fshl(x, y, amt) - (or (shl x, sub(bw, amt)), (lshr y, amt)) -> fshr(x, y, amt) This also helps in folding to rotate shift if x and y are equal since we already have a funnel shift to rotate combine. Differential Revision: https://reviews.llvm.org/D114499	2021-11-26 17:05:29 +05:30
Simon Pilgrim	2778f9a9f6	[DAG] SimplifyDemandedVectorElts - attempt to handle ADD(x,x) as single use If the ADD node is the only user of the repeated operand, then treat this as single use - allows us to peek through shl(x,1) patterns.	2021-11-26 10:32:10 +00:00
David Sherwood	86137fb722	[CodeGen] Add scalable vector support for lowering of llvm.get.active.lane.mask Currently the generic lowering of llvm.get.active.lane.mask is done in SelectionDAGBuilder::visitIntrinsicCall and currently assumes only fixed-width vectors are used. This patch changes the code to be more generic and support scalable vectors too. I have added tests for SVE here: CodeGen/AArch64/active_lane_mask.ll although the code quality leaves a lot to be desired. The code will be improved significantly in a later patch that makes use of the SVE whilelo instruction. Differential Revision: https://reviews.llvm.org/D114541	2021-11-26 08:17:55 +00:00
Kazu Hirata	259cd6f893	[llvm] Use range-based for loops (NFC)	2021-11-25 22:17:10 -08:00
Jeremy Morse	536b9eb31e	[DebugInfo][InstrRef] Add extra indirection for NRVO tests In some scenarios, usually involving NRVO, we can issue indirect DBG_VALUEs after SelectionDAG, even in instruction referencing mode (if the variable is an argument). If the corresponding argument value is spilt to the stack, then we have: * Indirection from it being on the stack, * Indirection from it being a dbg.declare or a dbg.addr. However InstrRefBasedLDV only emits one level of indirection. This patch adds the second, by adding an extra DW_OP_deref if necessary. The two tests modified fail otherwise -- they feature some NRVO, and require two levels of indirection to be correct. Differential Revision: https://reviews.llvm.org/D114364	2021-11-25 21:43:38 +00:00
Jeremy Morse	3107081e94	[DebugInfo][InstrRef] Avoid some quadratic behaviour in LiveDebugVariables This is a performance patch -- LiveDebugVariables can behave quadratically if a lot of debug instructions are inserted back into the same place, and we have to repeatedly step over the ones we've already inserted. To get around it, whenever we insert a debug instruction at a slot index, check whether there are more debug instructions to insert at this point, and insert them too. That avoids the repeated lookup and stepping through. It relies on the container for unlinked debug instructions being recorded in-order, which is how LiveDebugVariables currently does it. Differential Revision: https://reviews.llvm.org/D114587	2021-11-25 20:31:00 +00:00
Kazu Hirata	bfd5dd1568	[llvm] Use range-based for loops (NFC)	2021-11-25 08:55:16 -08:00
Jeremy Morse	102d2a8a99	[DebugInfo][InstrRef] Track variable assignments in out-of-scope blocks DBG_INSTR_REF's and DBG_VALUE's can end up in blocks that aren't in the lexical scope of their variable. It's arguable as to what we should do about this, however VarLocBasedLDV permits such variable locations to be propagated, so let's allow it in InstrRefBasedLDV. It's necessary for the modified test to work. Differential Revision: https://reviews.llvm.org/D114578	2021-11-25 14:52:11 +00:00
Simon Pilgrim	63b1e58f07	[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED) If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift. For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task. Reapplied with fix for AArch64 rev patterns to match the ARM fix. https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount) https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount) https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount) https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount) Differential Revision: https://reviews.llvm.org/D114354	2021-11-25 11:14:15 +00:00
David Green	3a700cabdc	[SDAG] Allow Unknown sizes when refining MMO alignments. NFC The changes in D113888 / `32b6c17b29` altered the memory size of a masked store, as it will store an unknown number of bytes, not the full vector size. We can have situations where a masked store is legalized and then turned into a normal store, as the mask is known to be all ones. This creates a store with an unknown size MMO that was hitting this assert. The store created can be given a better size in a followup patch. This currently adjusts the assert to handle unknown sizes.	2021-11-25 10:19:29 +00:00
Jameson Nash	0332d105b9	GlobalISel: remove assert that memcpy Src and Dst addrspace must be identical The LangRef does not require these arguments to have the same type. Differential Revision: https://reviews.llvm.org/D93154	2021-11-24 20:23:05 -05:00
Zarko Todorovski	95875d246a	[LLVM][NFC]Inclusive language: remove occurrences of sanity check/test from llvm Part of work to use more inclusive language in clang/llvm. Rewording some comments and changing function and variable names.	2021-11-24 17:29:55 -05:00
Jeremy Morse	bfadc5dcbf	[DebugInfo][InstrRef] Cope with win32 calls changing SP in LiveDebugValues Almost all of the time, call instructions don't actually lead to SP being different after they return. An exception is win32's _chkstk, which implements stack probes. We need to recognise that as modifying SP, so that copies of the value are tracked as distinct vla pointers. This patch adds a target frame-lowering hook to see whether stack probe functions will modify the stack pointer, store that in an internal flag, and if it's true then scan CALL instructions to see whether they're a stack probe. If they are, recognise them as defining a new stack-pointer value. The added test exercises this behaviour: two calls to _chkstk should be considered as producing two different values. Differential Revision: https://reviews.llvm.org/D114443	2021-11-24 19:56:21 +00:00
Jeremy Morse	133e25f946	[DebugInfo][InstrRef] Ignore SP clobbers on call instructions even more Avoid unnecessarily recreating DBG_VALUEs on call instructions. In LiveDebugValues we choose to ignore any clobbers of SP by call instructions, as they're irrelevant to our model of the machine. We currently do so for tracking register values (MTracker); do the same for tracking variable locations (TTracker). Test modified to ensure that a duplicate DBG_VALUE is not created after the call instruction in this test. Differential Revision: https://reviews.llvm.org/D114365	2021-11-24 17:25:48 +00:00
Benjamin Kramer	d32787230d	Revert "[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl" This reverts commit `3cf4a2c620`. It makes llc hang on the following test case. ``` target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" target triple = "aarch64-unknown-linux-gnu" define dso_local void @_PyUnicode_EncodeUTF16() local_unnamed_addr #0 { entry: br label %while.body117.i while.body117.i: ; preds = %cleanup149.i, %entry %out.6269.i = phi i16* [ undef, %cleanup149.i ], [ undef, %entry ] %0 = load i16, i16* undef, align 2 %1 = icmp eq i16 undef, -10240 br i1 %1, label %fail.i, label %cleanup149.i cleanup149.i: ; preds = %while.body117.i %or130.i = call i16 @llvm.bswap.i16(i16 %0) #2 store i16 %or130.i, i16* %out.6269.i, align 2 br label %while.body117.i fail.i: ; preds = %while.body117.i ret void } ; Function Attrs: nofree nosync nounwind readnone speculatable willreturn declare i16 @llvm.bswap.i16(i16) #1 attributes #0 = { "target-features"="+neon,+v8a" } attributes #1 = { nofree nosync nounwind readnone speculatable willreturn } attributes #2 = { mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" } ```	2021-11-24 14:42:54 +01:00
Simon Pilgrim	3cf4a2c620	[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift. For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task. https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount) https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount) https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount) https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount) Differential Revision: https://reviews.llvm.org/D114354	2021-11-24 11:28:35 +00:00
David Sherwood	cf40ca026f	[NFC] Tidy up SelectionDAGBuilder::visitIntrinsicCall to use existing sdl debug loc In quite a few places we were calling getCurSDLoc() to get the debug location, but this is already a local variable `sdl`. Differential Revision: https://reviews.llvm.org/D114447	2021-11-24 10:35:49 +00:00
Jeremy Morse	b8f68ad9cd	[DebugInfo][InstrRef] Avoid crash when values optimised out late in sdag It appears that we can emit all the instructions for a function, including debug instructions, and then optimise some of the values out late. Specifically, in the attached test case, an argument gets optimised out after DBG_VALUE / DBG_INSTR_REFs are created. This confuses MachineFunction::finalizeDebugInstrRefs, which expects to be able to find a defining instruction, and crashes instead. Fix this by identifying when there's no defining instruction, and translating that instead into a DBG_VALUE $noreg. Differential Revision: https://reviews.llvm.org/D114476	2021-11-24 10:34:48 +00:00
Jun Ma	17eb6b61de	Revert "[Taildup] Don't tail-duplicate loop header with multiple successors as its latches" This reverts commit `1f9fa54984`.	2021-11-24 10:26:37 +08:00
Matt Arsenault	273a0c8bc9	PrologEpilogInserter: Use explicit control for scavenge slot placement AMDGPU is unusual in that the stack is indexed in the same direction as stack growth (up). We therefore always need the emergency stack slots placed as low as possible to ensure they are in range of load/store instruction immediate offsets. The existing logic is mostly OK, but failed if we required stack realignment. I don't understand what the existing control isFPCloseToIncomingSP is supposed to mean, but it can only be used to stop placing the scavenge slots earlier. Make this explicit so that targets can opt-in rather than opt-out only.	2021-11-23 18:01:12 -05:00
Rong Xu	bf1138491a	[SampleFDO] Recompute BFI if the sample loader changes BPI The MIR sample loader changes the branch probability but not BFI. Here we force a recompute of BFI if the branch probabilities are changed. Also register the MIR FSAFDO passes properly. Differential Revision: https://reviews.llvm.org/D114400	2021-11-23 13:24:31 -08:00
Quinn Pham	1345bc5e16	[NFC][llvm] Inclusive language: remove instance of master in LiveRangeUtils.h [NFC] As part of using inclusive language within the llvm project, this patch replaces master with primary in `LiveRangeUtils.h`. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D114191	2021-11-23 13:07:42 -06:00
Kazu Hirata	d45cb1d7ea	[llvm] Use range-based for loops (NFC)	2021-11-23 08:54:48 -08:00
Simon Moll	1e65b93f3a	[VP] Canonicalize macros of VPIntrinsics.def Usage and naming of macros in VPIntrinsics.def has been inconsistent. Rename all property macros to VP_PROPERTY_<name>. Use BEGIN/END scope macros to attach properties to vp intrinsics and SDNodes (instead of specifying either directly with the property macro). A follow-up patch has documentation on how the macros are (intended) to be used. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D114144	2021-11-23 16:51:11 +01:00
David Green	32b6c17b29	[SDAG] Use UnknownSize for masked load/store MMO size A masked load or store will load a potentially unknown number of bytes from a memory location - that is not generally known at compile time. They do not necessarily load/store the entire vector width, and treating them as such can lead to incorrect aliasing information (for example, if the underlying object is smaller than the size of the vector). This makes sure that the MMO is given an unknown size to represent this, which is less accurate than "may load/store from up to 16 bytes", but less incorrect than "will load/store from 16 bytes". Differential Revision: https://reviews.llvm.org/D113888	2021-11-23 09:47:56 +00:00
Kazu Hirata	d5b73a70a0	[llvm] Use range-based for loops (NFC)	2021-11-22 20:33:28 -08:00
Nico Weber	2fb3c05b34	[asm] Merge EmitMSInlineAsmStr() and EmitGCCInlineAsmStr() This basically reverts `1778831a3d`, which split them. Since they were split 9 years ago, EmitGCCInlineAsmStr() grew a bunch of features that usually weren't added to EmitMSInlineAsmStr(), and that was usually a mistake. D71677, D113932, D114167 are all examples of where things were backported to EmitMSInlineAsmStr(). The names were also not great. EmitMSInlineAsmStr() used to be called for `asm inteldialect`, which clang produces for Microsoft-style __asm { ... } blocks as well as for GCC-style __asm__ / asm statements with -masm=intel. On the other hand, EmitGCCInlineAsmStr() used to be called for `asm`, which clang produces for GCC-style __asm__ / asm statements with -masm=att (the default). It's also less code (23 insertions, 188 deletions). No behavior change. Differential Revision: https://reviews.llvm.org/D114330	2021-11-22 11:49:57 -05:00
Nico Weber	7c2d51474a	[asm] Allow labels as operands in intel asm syntax This makes a line in llvm/test/CodeGen/X86/asm-block-labels.ll pass with `asm inteldialect` too. I don't know if this is something one can hit in practice with inline asm. The test is from 2007 (`4646aa3e33`) but in 2009 blockaddr was introduced and e.g. `__asm__ __volatile__("brl %0" :: "X"(&&foo) : "memory");` compiles to call void asm sideeffect "brl $0", "X,..."(i8* blockaddress(@func, %1)) nowadays (thanks to jrtc27 for that example!). (`6c4d255bf3` switched clang to blockaddress on an opt-in basis, `e4801f7844` added docs for it, `31b132c0b7` added IR support.) I half-heartedly tried to build clang 2.8 locally, but it didn't just build. And 2.8 didn't have a prebuilt clang binary yet. The motivation is to make EmitGCCInlineAsmStr() and EmitMSInlineAsmStr() more alike, and maybe we should delete this code from EmitGCCInlineAsmStr() instead. But since it's just 3 lines and it's reachable from LLVM IR, let's do the safer thing for now. Differential Revision: https://reviews.llvm.org/D114329	2021-11-22 11:49:29 -05:00
Kazu Hirata	c133fb321f	[CodeGen] Use llvm::is_contained (NFC)	2021-11-21 10:36:20 -08:00
Kazu Hirata	fc981cedea	[llvm] Use range-based for loops (NFC)	2021-11-21 10:36:18 -08:00
Kazu Hirata	f6bce30cf9	[llvm] Use range-based for loops (NFC)	2021-11-20 18:42:10 -08:00
Nico Weber	8b76d33c59	[asm] Allow block address operands in `asm inteldialect` This makes the following program build with -masm=intel: int foo(int count) { asm goto ("dec %0; jb %l[stop]" : "+r" (count) : : : stop); return count; stop: return 0; } It's also another step towards merging EmitGCCInlineAsmStr() and EmitMSInlineAsmStr(). Differential Revision: https://reviews.llvm.org/D114167	2021-11-19 09:27:30 -05:00
Nico Weber	4f9a5c2a14	[asm] Remove explicit branch for modifier 'l' No intended behavior change. EmitGCCInlineAsmStr() used to explicitly check for modifier 'l' after handling block address and machine basic block operands. This prevented passing a MachineOperand with 'l' modifier to PrintAsmMemoryOperand(). Conceptually that seems kind of nice, but in practice the overrides of PrintAsmMemoryOperand() in all (*) AsmPrinter subclasses already reject modifiers they don't know about, and none of them know about 'l'. So removing this doesn't have a behavior difference, is less code, and it makes EmitGCCInlineAsmStr() and EmitMSInlineAsmStr() more similar, to prepare for merging them later. (Why not _add_ the branch to EmitMSInlineAsmStr() instead? Because that always works with X86AsmPrinter I think, and X86AsmPrinter::PrintAsmMemoryOperand() very decisively rejects the 'l' modifier, so it's hard to motivate adding that branch.) (*): The one exception was AVRAsmPrinter, which had an llvm_unreachable instead of returning true. So this commit changes that, so that the AVR target keeps emitting an error instead of crashing when passing a mem operand with a :l modifier to it. All the other targets already don't crash on this. Differential Revision: https://reviews.llvm.org/D114216	2021-11-19 09:19:53 -05:00
Simon Pilgrim	812e64ef0c	[DAG] MatchRotate - support rotate-by-constant of illegal types Patch to fix some of the regressions in D77804. By folding to rotate/funnel-shift by constant amounts for illegal types, we prevent SimplifyDemandedBits from destroying the patterns prematurely, allowing us to use the rotate/funnel-shift legalization that was added in D112443. Differential Revision: https://reviews.llvm.org/D113192	2021-11-19 11:12:04 +00:00
Kazu Hirata	7ca14f6044	[llvm] Use range-based for loops (NFC)	2021-11-18 09:09:52 -08:00
Eric Tang	9fe6b9e802	[TargetLowering][RISCV] Fixed a scalable vector issue when lowering [s|u]mul.overflow intrinsics Fixed the vector type issue where uses of getVectorNumElements() should be replaced by getVectorElementCount() when lowering these intrinsics. This is similar to D94149 Signed-off-by: Eric Tang <tangxingxin1008@gmail.com> Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D109809	2021-11-18 10:16:08 +00:00
Craig Topper	d78fdf111d	[LegalizeTypes] Further limit expansion of CTTZ during type promotion. Don't expand CTTZ if CTPOP or CTLZ is supported on the promoted type. We have special handling for CTTZ expansion to use those ops with a small conversion. The setup for that doesn't generate extra code or large constants so we don't gain anything from expanding early and we make CTTZ_ZERO_UNDEF codegen worse. Follow up from post commit feedback on D112268. We don't seem to have any in tree tests that care about this.	2021-11-17 15:27:29 -08:00
Nico Weber	bf834b2629	[x86/asm] Let EmitMSInlineAsmStr() handle variants too This is preparation for D113707, where I want to make `-masm=intel` emit `asm inteldialect` instructions. `{movq %rbx, %rax|mov rax, rbx}` is supposed to evaluate to the bit between { and | for att and to the bit between | and } for intel. Since intel will become `asm inteldialect`, which calls EmitMSInlineAsmStr(), EmitMSInlineAsmStr() has to support variants as well. (clang translates `{...|...}` to `$(...$|...$)`. I'm not sure why it doesn't just send along only the first `...` or the second `...` to LLVM, but given the notes in PR23933 let's not do a big reorganization in this codepath.) Differential Revision: https://reviews.llvm.org/D113932	2021-11-17 13:31:59 -05:00
Craig Topper	0274be28d7	[RISCV] Lower vector CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF by converting to FP and extracting the exponent. If we have a large enough floating point type that can exactly represent the integer value, we can convert the value to FP and use the exponent to calculate the leading/trailing zeros. The exponent will contain log2 of the value plus the exponent bias. We can then remove the bias and convert from log2 to leading/trailing zeros. This doesn't work for zero since the exponent of zero is zero so we can only do this for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF. If we need a value for zero we can use a vmseq and a vmerge to handle it. We need to be careful to make sure the floating point type is legal. If it isn't we'll continue using the integer expansion. We could split the vector and concatenate the results but that needs some additional work and evaluation. Differential Revision: https://reviews.llvm.org/D111904	2021-11-17 10:29:41 -08:00
Nico Weber	103cc914d6	[x86/asm] Make variants work when converting at&t inline asm input to intel asm output `asm` always has AT&T-style input (`asm inteldialect` has Intel-style asm input), so EmitGCCInlineAsmStr() always has to pick the same variant since it cares about the input asm string, not the output asm string. For PowerPC, that default variant is 1. For other targets, it's 0. Without this, the included test case errors out with error: unknown use of instruction mnemonic without a size suffix mov rax, rbx since it picks the intel branch and then tries to interpret it as AT&T when selecting intel-style output with `-x86-asm-syntax=intel`. Differential Revision: https://reviews.llvm.org/D113894	2021-11-17 13:23:18 -05:00
DianQK	1e9fa0b12a	Fix the side effect of outlined function when the register is implicit use and implicit-def in the same instruction. This is the diff associated with {D95267}, and we need to mark $x0 as live whether or not $x0 is dead. The compiler also needs to mark register $x0 as live in for the following case. ``` $x1 = ADDXri $sp, 16, 0 BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def $x0 ``` This change fixes an issue where the wrong registers were used when -machine-outliner-reruns>0. As an example: ``` lang=c typedef struct { double v1; double v2; } D16; typedef struct { D16 v1; D16 v2; } D32; typedef long long LL8; typedef struct { long long v1; long long v2; } LL16; typedef struct { LL16 v1; LL16 v2; } LL32; typedef struct { LL32 v1; LL32 v2; } LL64; LL8 needx0(LL8 v0, LL8 v1); void bar(LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8); LL8 foo(LL8 v0, LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8) { LL8 result = needx0(v0, 0); bar(v1, v2, v3, v4, v5, v6, v7, v8); return result + 1; } ``` As you can see from the `foo` function, we should not modify the value of `x0` until we call `needx0`. This code is compiled to give the following instruction MIR code. ``` $sp = frame-setup SUBXri $sp, 256, 0 frame-setup STPDi killed $d13, killed $d12, $sp, 16 frame-setup STPDi killed $d11, killed $d10, $sp, 18 frame-setup STPDi killed $d9, killed $d8, $sp, 20 frame-setup STPXi killed $x26, killed $x25, $sp, 22 frame-setup STPXi killed $x24, killed $x23, $sp, 24 frame-setup STPXi killed $x22, killed $x21, $sp, 26 frame-setup STPXi killed $x20, killed $x19, $sp, 28 ... $x1 = MOVZXi 0, 0 BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0 ... 
``` Since there are some other instruction sequences that duplicate `foo`, after the first execution of Machine Outliner you will get: ``` $sp = frame-setup SUBXri $sp, 256, 0 frame-setup STPDi killed $d13, killed $d12, $sp, 16 frame-setup STPDi killed $d11, killed $d10, $sp, 18 frame-setup STPDi killed $d9, killed $d8, $sp, 20 $x7 = ORRXrs $xzr, $lr, 0 BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26 $lr = ORRXrs $xzr, $x7, 0 ... BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp ... ``` For the first time we outlined the following sequence: ``` frame-setup STPXi killed $x26, killed $x25, $sp, 22 frame-setup STPXi killed $x24, killed $x23, $sp, 24 frame-setup STPXi killed $x22, killed $x21, $sp, 26 frame-setup STPXi killed $x20, killed $x19, $sp, 28 ``` and ``` $x1 = MOVZXi 0, 0 BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0 ``` When we execute the outline again, we will get: ``` $x0 = ORRXrs $xzr, $lr, 0 <---- here BL @OUTLINED_FUNCTION_2_0, implicit-def $lr, implicit $sp, implicit-def $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $d8, implicit $d9, implicit $d10, implicit $d11, implicit $d12, implicit $d13, implicit $x0 $lr = ORRXrs $xzr, $x0, 0 $x7 = ORRXrs $xzr, $lr, 0 BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26 $lr = ORRXrs $xzr, $x7, 0 ... 
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp ``` When calling `OUTLINED_FUNCTION_2_0`, we used `x0` to save the `lr` register. The reason for the above error appears to be that: ``` BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp ``` should be: ``` BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp, implicit $x0 ``` When processing the same instruction with both `implicit-def $x0` and `implicit $x0` we should keep `implicit $x0`. A reproducible demo is available at: [https://github.com/DianQK/reproduce_outlined_function_use_live_x0](https://github.com/DianQK/reproduce_outlined_function_use_live_x0). Reviewed By: jinlin Differential Revision: https://reviews.llvm.org/D112911	2021-11-17 09:44:10 -08:00
Mirko Brkusanin	db6bc2ab51	[AMDGPU][GlobalISel] Fold G_FNEG above when users cannot fold mods If possible fold fneg into instruction above if users cannot fold mods and we know it will decrease instruction count. Follows same logic as SDAG combiner in choosing opportunities to combine. Differential Revision: https://reviews.llvm.org/D112827	2021-11-17 14:25:13 +01:00
David Sherwood	8d77555b12	[Analysis] Ensure getTypeLegalizationCost returns a simple VT for TypeScalarizeScalableVector When getTypeConversion returns TypeScalarizeScalableVector we were sometimes returning a non-simple type from getTypeLegalizationCost. However, many callers depend upon this being a simple type and will crash if not. This patch changes getTypeLegalizationCost to ensure that we always return a sensible simple VT. If the vector type contains unusual integer types, e.g. <vscale x 2 x i3>, then we just set the type to MVT::i64 as a reasonable default. A test has been added here that demonstrates the vectoriser can correctly calculate the cost of vectorising a "zext i3 to i64" instruction with a VF=vscale x 1: Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll Differential Revision: https://reviews.llvm.org/D113777	2021-11-17 13:11:58 +00:00
Simon Pilgrim	5fedbd5b18	[DAG] SimplifyDemandedVectorElts - zero_extend_vector_inreg(and(x,c)) -> and(x,c') If we've only demanded the 0'th element, and it comes from a (one-use) AND, try to convert the zero_extend_vector_inreg into a mask and constant fold it with the AND.	2021-11-17 12:41:48 +00:00
Jay Foad	3264e95938	[CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress Delegate updating of LiveIntervals to each target's convertToThreeAddress implementation, instead of repairing LiveIntervals after the fact in TwoAddressInstruction::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D113493	2021-11-17 10:16:47 +00:00
Eric Tang	f7eb061a5f	[SelectionDAG] Make WidenVecRes_SELECT work for scalable vectors This change makes WidenVecRes_SELECT work for scalable vectors. This patch is split from [D110319](https://reviews.llvm.org/D110319) Signed-off-by: Eric Tang <tangxingxin1008@gmail.com> Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D110388	2021-11-17 08:55:11 +00:00
Aaron Puchert	b20da5117f	Don't add irrelevant items to queue in DwarfCompileUnit::createScopeChildrenDIE (NFC) Instead of popping them and then immediately throwing them away, we can just filter out globals and items in different scopes before adding them to WorkList. Shouldn't change anything but keep the queue smaller. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D113864	2021-11-17 00:01:20 +01:00
Aaron Puchert	86b3100cde	[DebugInfo] Use DbgEntityKind in DbgEntity interface (NFC) It was being used occasionally already, and using it on the constructor and getDbgEntityID has obvious type safety benefits. Also use llvm_unreachable in the switch as usual, but since only these two values are used in constructor calls I think it's still NFC. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D113862	2021-11-17 00:01:20 +01:00
Mircea Trofin	c6b9b702a0	[NFC][Regalloc] Factor out eviction decision from eviction attempt This splits tryEvict into a const tryFindEvictionCandidate, which attempts to find a candidate, and the actual eviction (should the former be successful). The newly introduced tryFindEvictionCandidate will subsequently move into the RegAllocEvictionAdvisor. RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html Differential Revision: https://reviews.llvm.org/D113941	2021-11-16 10:50:23 -08:00
Kazu Hirata	ee0133dc6d	[llvm] Use range-for loops (NFC)	2021-11-16 09:01:56 -08:00
Frederik Gossen	3f3d4e8a15	Fix unused variable warning in LoadStoreOpt.cpp with (void)	2021-11-16 12:03:59 +01:00
Frederik Gossen	2bceb7c8da	Revert "Fix unused variable in llvm/lib/CodeGen/GlobalISel/LoadStoreOpt.cpp" This reverts commit `40a609aebe`.	2021-11-16 12:00:17 +01:00
Frederik Gossen	ecfe7a3404	Revert "Fix unused variable warning." This reverts commit `a062e2a8ca`.	2021-11-16 11:59:34 +01:00
Frederik Gossen	9a6817b7ed	Revert "Fix another unused variable error." This reverts commit `5b84ae7c48`.	2021-11-16 11:58:02 +01:00
Adrian Kuegel	5b84ae7c48	Fix another unused variable error.	2021-11-16 11:32:44 +01:00
Adrian Kuegel	a062e2a8ca	Fix unused variable warning.	2021-11-16 11:17:33 +01:00
Frederik Gossen	40a609aebe	Fix unused variable in llvm/lib/CodeGen/GlobalISel/LoadStoreOpt.cpp	2021-11-16 11:05:18 +01:00
Amara Emerson	dcd8728d83	Remove unnecessary <any> include.	2021-11-16 00:50:30 -08:00
Kazu Hirata	7f00806a6a	[llvm] Use make_early_inc_range (NFC)	2021-11-15 21:28:46 -08:00
Amara Emerson	dc84770d55	[GlobalISel] Add a store-merging optimization pass and enable for AArch64. This is a first attempt at a constant value consecutive store merging pass, a counterpart to the DAGCombiner's store merging optimization. The high level goals of this pass: * Have a simple and efficient algorithm. As close to linear time as we can get. Thus, prioritizing scalability of the algorithm over merging every corner case we can find. The DAGCombiner's store merging code has been the source of compile time and complexity issues in the past and I wanted to avoid that. * Don't introduce any new data structures for ordering memory operations. In MIR, we don't have the concept of chains like we do in the DAG, and the instruction order is stricter than enforcing ordering with graph edges. Although I considered adding something similar, I couldn't justify the overhead. The pass is currently split into 3 main parts. The main store merging code focuses on identifying candidate stores and managing the candidate group that's under consideration for merging. Analyzing addressing of stores is a potentially complex part and for now there's just a basic implementation to identify easy cases. Finally, the other main bit of complexity is the alias analysis, which tries to follow the same logic as the DAG's AA. Currently this implementation only supports merging of constant stores. Stores of arbitrary variables are technically possible with a very small change, but the DAG chooses not to do this. Doing so here makes most code worse since there's extra overhead in merging values into wider registers. On AArch64 -Os, this optimization results in very minor savings on CTMark. Differential Revision: https://reviews.llvm.org/D109131	2021-11-15 21:10:39 -08:00
Fabian Wolff	b484fa8289	[X86] Fix crash with inline asm using wrong register name Fixes PR#48678. `X86TargetLowering::getRegForInlineAsmConstraint()` can adjust the register class to match the type, e.g. change `VR128X` to `VR256X` if the type needs 256 bits. However, the function currently returns the unadjusted register and the adjusted register class, e.g. `xmm15` and `VR256X`, which then causes an assertion failure later because the register class does not contain that register. This patch fixes this behavior. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D113834	2021-11-16 10:38:12 +08:00
Craig Topper	233def40f7	[DAGCombiner] Prevent unfoldMaskedMerge from creating an AND with two inverted inputs. It's possible that the mask is already a NOT. At least if InstCombine hasn't canonicalized the input. In that case we will form an ANDN with X instead of with Y. So we don't need to worry about Y being a constant. We might need to check that X isn't a constant instead, but we don't have a test case for that yet. This fixes a size regression found when trying to enable this combine for RISCV in D113937. Differential Revision: https://reviews.llvm.org/D113948	2021-11-15 17:15:51 -08:00
Mircea Trofin	19e6b730ce	[NFC][Regalloc] Factor types that would be used by the eviction advisor This is in preparation for pulling the eviction decision-making into an analysis pass, which would then allow swapping that decision-making process. RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html Differential Revision: https://reviews.llvm.org/D113929	2021-11-15 13:15:14 -08:00
Nico Weber	b4e50e5228	[asm] Make EmitMSInlineAsmStr and EmitGCCInlineAsmStr more alike https://reviews.llvm.org/D71677 copied a bunch of code from EmitGCCInlineAsmStr() to EmitMSInlineAsmStr() but made a few small (likely unintentional) changes. This makes these pieces look the same. No behavior change. (Why are these functions two copies? No great reason as far as I can tell. https://reviews.llvm.org/rG1778831a3d1d24ab6545635f63da4d9c5f8f0ac7 did the split; we might want to undo them at some point. But PR23933 suggests that a bigger change is planned for this file in the future, so keeping this incremental for now.) Differential Revision: https://reviews.llvm.org/D113924	2021-11-15 15:43:01 -05:00
Nico Weber	0be836b7dd	[asm] Convert AsmPrinter::PrintSpecial() to StringRef No behavior change. Differential Revision: https://reviews.llvm.org/D113911	2021-11-15 15:38:27 -05:00
Nico Weber	833393e021	[asm] Correctly handle special names in variants There's really no reason why anyone should use these special names in a variant. I noticed this while reading the code: all other writes to OS are guarded by this conditional, and the behavior with the check seems more correct, so let's add the check. Differential Revision: https://reviews.llvm.org/D113909	2021-11-15 15:37:09 -05:00
Simon Pilgrim	7bac1985f4	[DAG] SimplifyVBinOp - add SDLoc() argument Pass in SDLoc instead of (repeated) local creations in SimplifyVBinOp and scalarizeBinOpOfSplats	2021-11-15 10:43:56 +00:00
Simon Pilgrim	8658d20724	[DAG] SimplifyVBinOp - pull out repeated getValueType() call. NFC.	2021-11-15 10:43:55 +00:00
Jay Foad	4119da2f7c	[MachineVerifier] Live interval for a subreg must have subranges MachineVerifier verified the subranges of a live interval if they existed, but did not complain if they did not exist. This patch changes the verifier to complain if there are no subranges in the live interval for a subreg operand (so long as MachineRegisterInfo says we should be tracking subreg liveness for that register). This matches the conditions for LiveIntervalCalc to create subranges in the first place. Differential Revision: https://reviews.llvm.org/D112556	2021-11-15 10:13:35 +00:00
Kyungwoo Lee	6747d44bda	[DebugInfo] Fix end_sequence of debug_line in LTO Object In an LTO build, the `end_sequence` in the debug_line table for each compile unit (CU) points to the end of the text section, which merges all CUs. The `end_sequence` needs to point to the end of each CU's range. This bug often causes an invalid `debug_line` table in the final `.dSYM` binary for MachO after running `dsymutil`, which tries to compensate for an out-of-range address of `end_sequence`. The fix is to sync the line table termination with the range operations that are already maintained in DwarfDebug. When the CU or section changes, nodebug functions appear, or the module is finished, the prior pending line table is terminated using the last range label. In the MC path where no range is tracked, the old logic is conservatively used to end the line table using the section end symbol. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D108261	2021-11-14 20:19:47 -08:00
Kazu Hirata	feb40a3a47	[llvm] Use range-based for loops with instructions (NFC)	2021-11-14 19:40:48 -08:00
Kazu Hirata	d243cbf8ea	[llvm] Use isa instead of dyn_cast (NFC)	2021-11-14 19:40:46 -08:00
Mircea Trofin	a32c2c3808	[NFC] Use Optional<ProfileCount> to model invalid counts ProfileCount could model invalid values, but a user had no indication that the getCount method could return bogus data. Optional<ProfileCount> addresses that, because the user must dereference the optional. In addition, the patch removes concept duplication. Differential Revision: https://reviews.llvm.org/D113839	2021-11-14 19:03:30 -08:00
Kazu Hirata	7379736774	[llvm] Use range-based for loops with User::operands (NFC)	2021-11-14 09:32:38 -08:00
Sanjay Patel	254c5246e9	[DAGCombiner] match inverted/swapped patterns for vselect of mask of signbit This was noted as a follow-up to D113212 / D113426: `4fc1fc4005` `7e30404c3b` `11522cfcad` https://alive2.llvm.org/ce/z/e4o96b The canonicalization rules for these IR patterns are complicated, and we were not matching the expected forms in 2 out of the 3 cases. We can make codegen more robust by matching the swapped forms (and that will also work if these patterns are created late).	2021-11-14 09:35:26 -05:00
David Green	355ee18c5d	[TypePromotion] Extend TypePromotion::isSafeWrap This modifies the preconditions of TypePromotion's isSafeWrap method, to allow it to work from all constants from the ICmp. Using the code: %a = add %x, C1 %c = icmp ult %a, C2 According to Alive, we can prove that is equivalent to icmp ult (add zext(%x), sext(C1)), zext(C2) given C1 <=s 0 and C1 >s C2. https://alive2.llvm.org/ce/z/CECYZB Which is similar to what is already present. We can also prove icmp ult (add zext(%x), sext(C1)), sext(C2) given C1 <=s 0 and C1 <=s C2. https://alive2.llvm.org/ce/z/KKgyeL The PrepareWrappingAdds method was removed, and the constants are now altered to sext or zext directly as required by the above methods. Differential Revision: https://reviews.llvm.org/D113678	2021-11-14 11:18:31 +00:00
Kristina Bessonova	5b4bfd8c24	[DwarfCompileUnit] getOrCreateCommonBlock(): check for existing entity first. NFCI For global variables and common blocks there is no way to create entities through getOrCreateContextDIE(), so no need to obtain the context first. Differential Revision: https://reviews.llvm.org/D113651	2021-11-14 10:58:24 +02:00
Kristina Bessonova	90c5ab54a9	[DwarfCompileUnit] getOrCreateGlobalVariableDIE(): remove outdated comment. NFC	2021-11-14 10:56:54 +02:00
Craig Topper	82bc6a094e	[X86] Promote f16 STRICT_FROUND to f32 and call libc. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D113817	2021-11-12 21:37:03 -08:00
Kazu Hirata	99d5cbbd7e	[CodeGen] Use SDNode::uses (NFC)	2021-11-12 07:33:29 -08:00
Markus Lavin	4e94e25c90	Fix minor deficiency in machine-sink. Register uses that are MRI->isConstantPhysReg() should not inhibit sinking transformation. Reviewed By: StephenTozer Differential Revision: https://reviews.llvm.org/D111531	2021-11-12 08:01:13 +01:00
Kazu Hirata	2ca45adf24	[CodeGen, Target] Use MachineRegisterInfo::use_operands (NFC)	2021-11-11 22:28:55 -08:00
Simon Pilgrim	010b09b0c5	[DAG] reassociateOpsCommutative - test getNode result directly. NFC Matches the clean code style we use directly above	2021-11-11 18:45:50 +00:00
Sanjay Patel	11522cfcad	[DAGCombiner] add fold for vselect based on mask of signbit, part 3 (Cond0 s> -1) ? N1 : 0 --> ~(Cond0 s>> BW-1) & N1 https://alive2.llvm.org/ce/z/mGCBrd This was suggested as a potential enhancement in D113212 (also `7e30404c3b` ). There's an improvement for AArch that could be generalized ( X > -1 --> X >= 0 ). For x86, we have a counter-acting fold for most cases that turns the shift+not back into a setcc, so that needs a work-around to get more cases to use "pandn": D113603 Note that this pattern (and a previous one) are not currently canonical forms in IR: https://alive2.llvm.org/ce/z/e4o96b Adding swapped variants is left as a TODO item here, but is planned as a near-term follow-up patch. Differential Revision: https://reviews.llvm.org/D113426	2021-11-11 10:27:37 -05:00
Jay Foad	417add4d4e	[CodeGen] Tweak whitespace in LiveInterval printing When printing a LiveInterval, tweak the use of single and double spaces to try to make it clearer that the valnos are associated with the preceding range or subrange, not the following subrange. Compare the output before and then after this patch: %1 [32r,144r:0) 0@32r L000000000000000C [32r,144r:0) 0@32r L00000000000000F3 [32r,32d:0) 0@32r weight:0.000000e+00 %1 [32r,144r:0) 0@32r L000000000000000C [32r,144r:0) 0@32r L00000000000000F3 [32r,32d:0) 0@32r weight:0.000000e+00 Differential Revision: https://reviews.llvm.org/D113671	2021-11-11 15:19:32 +00:00
Kazu Hirata	ce227ce3b3	[CodeGen] Use MachineInstr::operands (NFC)	2021-11-11 07:10:30 -08:00
Jay Foad	491beae71d	[TwoAddressInstruction] Update LiveIntervals after rewriting INSERT_SUBREG to COPY Also add subranges to an existing live interval when introducing a new subreg def. Differential Revision: https://reviews.llvm.org/D113044	2021-11-11 12:24:59 +00:00
Jay Foad	6abbc3a420	[LiveIntervals] Update subranges in processTiedPairs In TwoAddressInstructionPass::processTiedPairs when updating live intervals after moving the last use of RegB back to the newly inserted copy, update any affected subranges as well as the main range. Differential Revision: https://reviews.llvm.org/D110411	2021-11-11 12:24:59 +00:00
Simon Pilgrim	82b74363a9	[DAG] reassociateOpsCommutative - peek through bitcasts to find constants Now that FoldConstantArithmetic can fold bitcasted constants, we should peek through bitcasts of binop operands to try and find foldable constants	2021-11-11 12:00:22 +00:00
Simon Pilgrim	098ea29641	[DAG] FoldConstantArithmetic - fold intop(bitcast(buildvector(c1)),bitcast(buildvector(c2))) -> bitcast(intop(buildvector(c1'),buildvector(c2'))) Enable FoldConstantArithmetic to constant fold bitcasted constant build vectors. These have typically been bitcasted for type legalization purposes. By extracting the raw constant bit data, performing the constant fold, and then casting the constant bit data back to the (legalized) type, we can perform constant folding on integer types after legalization. This in particular helps 32-bit targets which need to handle vXi64 build vectors - during legalization the (unsupported) i64 elements are split to create a bitcasted v2Xi32 build vector. Addresses some regressions in D113192. Differential Revision: https://reviews.llvm.org/D113564	2021-11-11 11:35:18 +00:00
Craig Topper	0963291991	[TypePromotion] Fix a hardcoded use of 32 as the size being promoted to. At least I think that's what the 32 here is. Use RegisterBitWidth instead. While there replace zext with zextOrSelf to simplify the code. Reviewed By: samparker, dmgreen Differential Revision: https://reviews.llvm.org/D113495	2021-11-10 22:12:39 -08:00
Kazu Hirata	642a361b7e	[llvm] Use make_early_inc_range (NFC)	2021-11-10 19:56:35 -08:00
Fraser Cormack	b1d8d70b9d	[SelectionDAG] Replace the Chain in LOAD->VP_LOAD widening The introduction of this legalization, D111248, forgot to replace the old chain with the new. This could manifest itself in the old (illegally-typed) value remaining in the DAG, though the simple test cases didn't catch this. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D113561	2021-11-10 17:49:12 +00:00
Simon Pilgrim	381d14775e	[DAG] reassociateOpsCommutative - pull out repeated getOperand() calls. NFC.	2021-11-10 15:19:13 +00:00
Simon Pilgrim	ed80761b50	[DAG] Split BuildVectorSDNode::getConstantRawBits into BuildVectorSDNode::recastRawBits helper. NFC. NFC refactor of D113351, pulling out the APInt split/merge code from the BuildVectorSDNode bits extraction into a BuildVectorSDNode::recastRawBits helper. This is to allow us to reuse the code when we're packing constant folded APInt data back together.	2021-11-10 13:06:19 +00:00
Fraser Cormack	332318ffb6	[SelectionDAG] Widen scalable-vector loads/stores via VP_LOAD/VP_STORE This patch fixes a compiler crash when widening scalable-vector loads and stores which end up breaking down to element-wise store operations. It does so by providing a way for targets with support for vector-predicated loads and stores to use those instead. By widening the operation but maintaining the original effective operation length via the EVL, only the intended vector elements are loaded or stored. This method should in theory be possible and even preferred for fixed-length vector types, but all fixed-length types can be broken down into their elements, and regardless I have observed regressions in the generated code when doing so. I believe this is simply due to VP_LOAD/VP_STORE not being up to par with LOAD/STORE in terms of optimization. It does improve performance on smaller self-contained examples, however, so the potential is there. While the only target that benefits from this is RISCV, the legalization is generic and so was placed centrally. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111248	2021-11-10 09:55:03 +00:00
Adrian Kuegel	f0d997c472	Revert "[DebugInfo] Only create concrete DIEs of concrete functions" This reverts commit `f19471a249`. This leads to a crash. Still working on a reproducer to share.	2021-11-10 10:52:15 +01:00
Kazu Hirata	ef2d0e0f20	[llvm] Use MachineBasicBlock::{successors,predecessors} (NFC)	2021-11-09 23:05:15 -08:00
Jessica Paquette	3eabcda814	[GlobalISel] Ensure that translateInvoke adds all successors for inlineasm The existing code didn't add all necessary successors, which resulted in disjoint basic blocks. These would end up not being legalized which, in the best case, caused a fallback only in assert builds. Here's an example: https://godbolt.org/z/ndx15Enfj We also end up getting weird codegen here as well. Refactoring the code here allows us to correctly attach all successors. With this patch, the above example gives correct codegen at -O0 with and without asserts. Also autogen the testcase to show that we add all the successors now. Differential Revision: https://reviews.llvm.org/D113437	2021-11-09 16:20:34 -08:00
Arthur Eubanks	05963a3d66	Revert "[DebugInfo] Enforce implicit constraints on `distinct` MDNodes" This reverts commit `ee76525698`. Causes crashes, see comments in D104827.	2021-11-09 14:27:55 -08:00
Ilya Yanok	3c47c5ca13	[RegAllocFast] Fix nondeterminism in debuginfo generation Changes from commit `1db137b185` added iteration over hash map that can result in non-deterministic order. Fix that by using a SmallMapVector to preserve the order. Differential Revision: https://reviews.llvm.org/D113468	2021-11-09 21:42:50 +01:00
Ellis Hoag	f19471a249	[DebugInfo] Only create concrete DIEs of concrete functions At the beginning of the module we can iterate through the functions to see which SPs should have concrete DIEs. Then when we need to reference a DIE for a SP we can decide if it's ok to create a concrete DIE or not. Fixes * https://bugs.llvm.org/show_bug.cgi?id=52159 * https://bugs.llvm.org/show_bug.cgi?id=30637 Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D112337	2021-11-09 10:52:34 -08:00
Scott Linder	ee76525698	[DebugInfo] Enforce implicit constraints on `distinct` MDNodes Add UNIQUED and DISTINCT properties in Metadata.def and use them to implement restrictions on the `distinct` property of MDNodes: * DIExpression can currently be parsed from IR or read from bitcode as `distinct`, but this property is silently dropped when printing to IR. This causes accepted IR to fail to round-trip. As DIExpression appears inline at each use in the canonical form of IR, it cannot actually be `distinct` anyway, as there is no syntax to describe it. * Similarly, DIArgList is conceptually always uniqued. It is currently restricted to only appearing in contexts where there is no syntax for `distinct`, but for consistency it is treated equivalently to DIExpression in this patch. * DICompileUnit is already restricted to always being `distinct`, but along with adding general support for the inverse restriction I went ahead and described this in Metadata.def and updated the parser to be general. Future nodes which have this restriction can share this support. The new UNIQUED property applies to DIExpression and DIArgList, and forbids them to be `distinct`. It also implies they are canonically printed inline at each use, rather than via MDNode ID. The new DISTINCT property applies to DICompileUnit, and requires it to be `distinct`. A potential alternative change is to forbid the non-inline syntax for DIExpression entirely, as is done with DIArgList implicitly by requiring it appear in the context of a function. For example, we would forbid: !named = !{!0} !0 = !DIExpression() Instead we would only accept the equivalent inlined version: !named = !{!DIExpression()} This essentially removes the ability to create a `distinct` DIExpression by construction, as there is no syntax for `distinct` inline. If this patch is accepted as-is, the result would be that the non-canonical version is accepted, but the following would be an error and produce a diagnostic: !named = !{!0} ; error: 'distinct' not allowed for !DIExpression() !0 = distinct !DIExpression() Also update some documentation to consistently use the inline syntax for DIExpression, and to describe the restrictions on `distinct` for nodes where applicable. Reviewed By: StephenTozer, t-tye Differential Revision: https://reviews.llvm.org/D104827	2021-11-09 18:19:11 +00:00
Chih-Ping Chen	cf0e32d197	[CodeView] Properly handle a DISubprogram in getScopeIndex. Differential Revision: https://reviews.llvm.org/D113142	2021-11-09 13:18:07 -05:00
Kazu Hirata	cba40c4ede	[llvm] Use MachineBasicBlock::{successors,predecessors} (NFC)	2021-11-09 07:11:14 -08:00
Simon Pilgrim	58c01ef270	[SelectionDAG] Merge FoldConstantVectorArithmetic into FoldConstantArithmetic (PR36544) This patch merges FoldConstantVectorArithmetic back into FoldConstantArithmetic. Like FoldConstantVectorArithmetic we now handle vector ops with any operand count, but we currently still only handle binops for scalar types - this can be improved in future patches - in particular some common unary/trinary ops still have poor constant folding. There's one change in functionality causing test changes - FoldConstantVectorArithmetic bails early if the build/splat vector isn't all constant (with some undefs) elements, but FoldConstantArithmetic doesn't - it instead attempts to fold the scalar nodes and bails if they fail to regenerate a constant/undef result, allowing some additional identity/undef patterns to be handled. Differential Revision: https://reviews.llvm.org/D113300	2021-11-09 11:31:01 +00:00
Jay Foad	5c3c7adf3a	[CodeGen] Fix assertion failure in TwoAddressInstructionPass::rescheduleMIBelowKill This fixes an assertion failure with -early-live-intervals when trying to update the live intervals for a debug instruction, which don't even have slot indexes. Differential Revision: https://reviews.llvm.org/D113116	2021-11-09 09:24:21 +00:00
Akira Hatanaka	1fe8993ad8	[ObjC][ARC] Replace uses of ObjC intrinsics that are arguments of operand bundle "clang.arc.attachedcall" with ObjC runtime functions The existing code only handles the case where the intrinsic being rewritten is used as the called function pointer of a call/invoke.	2021-11-08 21:19:07 -08:00
Wouter van Oortmerssen	62eeb3e57e	[WebAssembly] fix __stack_pointer being added to .debug_aranges When emitting a reloc for the Wasm global __stack_pointer, it was inadvertently added to the symbols used for generating aranges, which caused some aranges to use it as the end symbol in a symbol diff, which caused a reloc for it to be emitted, which then caused an assert in `wasm64` since we have no 64-bit relocs for Wasm globals. Fixes: https://bugs.llvm.org/show_bug.cgi?id=52376 Differential Revision: https://reviews.llvm.org/D113438	2021-11-08 16:30:31 -08:00
Kazu Hirata	3c06920cd1	[llvm] Use make_early_inc_range (NFC)	2021-11-08 09:09:39 -08:00
Simon Pilgrim	f059b04f7b	[DAG] Add SelectionDAG::ComputeMinSignedBits helper As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper. I've included some usage; it's not exhaustive, just the more obvious cases where the intention is clear. Differential Revision: https://reviews.llvm.org/D113396	2021-11-08 14:12:45 +00:00
Simon Pilgrim	f60d3ec0c7	[DAG] Add BuildVectorSDNode::getConstantRawBits helper We have several places where we need to extract the raw bits data from a BUILD_VECTOR node, so consolidate this to a single helper function that handles Undefs and Integer/FP constants, including implicit truncation. This should make it easier to extend D113202 to handle more constant folding of bitcasted constant data. Differential Revision: https://reviews.llvm.org/D113351	2021-11-08 12:07:38 +00:00
Chen Zheng	50acbbe3cd	[AsmPrinter][ORE] use correct opcode name Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D113173	2021-11-08 01:51:24 +00:00
Benjamin Kramer	9b8b16457c	Put implementation details into anonymous namespaces. NFCI.	2021-11-07 15:18:30 +01:00
Simon Pilgrim	0ff1edeeec	[DAG] SimplifyVBinOp - replace FoldConstantVectorArithmetic with FoldConstantArithmetic Currently FoldConstantArithmetic only handles binops, so replacing other uses of FoldConstantVectorArithmetic (in particular for SETCC nodes), still require more work.	2021-11-07 12:11:46 +00:00
Kazu Hirata	843d1eda18	[llvm] Use llvm::reverse (NFC)	2021-11-06 19:31:18 -07:00
Sanjay Patel	39c4c7d391	[DAGCombiner] remove vselect fold that was accidentally added This diff snuck into the unrelated: `025a2f73a3` It's a suggested follow-up for D113212, but I need to add test coverage first.	2021-11-06 09:34:30 -04:00
Sanjay Patel	025a2f73a3	[InstCombine] add tests for umax with sub; NFC	2021-11-06 08:32:52 -04:00
Kazu Hirata	87e53a0ad8	[llvm] Use make_early_inc_range (NFC)	2021-11-05 19:39:07 -07:00
Jay Foad	bdaa181007	[TwoAddressInstructionPass] Update existing physreg live intervals In TwoAddressInstructionPass::processTiedPairs with -early-live-intervals, update any preexisting physreg live intervals, as well as virtreg live intervals. By default (without -precompute-phys-liveness) physreg live intervals only exist for registers that are live-in to some basic block. Differential Revision: https://reviews.llvm.org/D113191	2021-11-05 21:20:30 +00:00
Sanjay Patel	7e30404c3b	[DAGCombiner] add fold for vselect based on mask of signbit, part 2 This is the 'or' sibling for the fold added with: D113212 https://alive2.llvm.org/ce/z/tgnp7K Note that neither of these transforms is poison-safe, but it does not seem to matter at this level. We have had the scalar version of D113212 for a long time, so this is just making optimizer behavior consistent. We do not have the scalar version of this fold, however, so that is another follow-up.	2021-11-05 15:02:12 -04:00
Michael Liao	af2ae2cf42	[BranchRelaxation] Fix warning on unused variable. NFC.	2021-11-05 11:18:27 -04:00
Simon Pilgrim	9e6506299a	[DAG] FoldConstantVectorArithmetic - remove SDNodeFlags argument Another minor step towards merging FoldConstantVectorArithmetic into FoldConstantArithmetic. We don't use SDNodeFlags in any constant folding inside DAG, so passing the Flags argument is a waste of time - an alternative would be to wire up FoldConstantArithmetic to take SDNodeFlags just-in-case we someday start using it, but we don't have any way to test it and I'd prefer to avoid dead code. Differential Revision: https://reviews.llvm.org/D113276	2021-11-05 14:36:17 +00:00
Sanjay Patel	4fc1fc4005	[DAGCombiner] add fold for vselect based on mask of signbit (X s< 0) ? Y : 0 --> (X s>> BW-1) & Y We canonicalize to the icmp+select form in IR, and we already have this fold for scalar select in SDAG, so I think it's an oversight that we don't have the fold for vectors. It seems neutral for AArch64 and saves some instructions on x86. Whether we should also have the sibling folds for the inverse condition or all-ones true value may depend on target-specific factors such as whether there's an "and-not" instruction. Differential Revision: https://reviews.llvm.org/D113212	2021-11-05 10:06:16 -04:00
Simon Pilgrim	f2703c3c33	[DAG] FoldConstantArithmetic - rename NumOps -> NumElts. NFC. NumOps represents the number of elements for vector constant folding; rename this to NumElts so that in future we can consistently use NumOps to represent the number of operands of the opcode. Minor cleanup before trying to begin generalizing FoldConstantArithmetic to support opcodes other than binops.	2021-11-05 13:32:34 +00:00
Alfredo Dal'Ava Junior	1cb9f37a17	[FreeBSD] Do not mark __stack_chk_guard as dso_local This symbol is defined in libc.so so it is definitely not DSO-Local. Marking it as such causes problems on some platforms (such as PowerPC). Differential revision: https://reviews.llvm.org/D109090	2021-11-05 07:29:50 -05:00
Simon Pilgrim	c1e7911c3b	[DAG] FoldConstantArithmetic - fold bitlogic(bitcast(x),bitcast(y)) -> bitcast(bitlogic(x,y)) To constant fold bitwise logic ops where we've legalized constant build vectors to a different type (e.g. v2i64 -> v4i32), this patch adds a basic ability to peek through the bitcasts and perform the constant fold on the inner operands. The MVE predicate v2i64 regressions will be addressed by future support for basic v2i64 type support. One of the yak shaving fixes for D113192.... Differential Revision: https://reviews.llvm.org/D113202	2021-11-05 12:00:59 +00:00
Jay Foad	0321bd64e6	Revert "[TwoAddressInstructionPass] Update existing physreg live intervals" This reverts commit `ec0e1e88d2`. It was pushed by mistake.	2021-11-05 09:54:26 +00:00
Jay Foad	ec0e1e88d2	[TwoAddressInstructionPass] Update existing physreg live intervals In TwoAddressInstructionPass::processTiedPairs with -early-live-intervals, update any preexisting physreg live intervals, as well as virtreg live intervals. By default (without -precompute-phys-liveness) physreg live intervals only exist for registers that are live-in to some basic block. Differential Revision: https://reviews.llvm.org/D113191	2021-11-05 09:10:24 +00:00
Mircea Trofin	34f4fe3a90	[NFC][Regalloc] Ensure Query::interferingVRegs is accurate. To correctly use Query, one had to first call collectInterferingVRegs to pre-cache the query result, then call interferingVRegs. Failing the former, interferingVRegs could be stale. This did cause a bug which was addressed in D98232, but the underlying usability issue of the Query API wasn't. This patch addresses the latter by making collectInterferingVRegs an implementation detail, and having interferingVRegs play both roles. One side-effect of this is that interferingVRegs is not const anymore. Differential Revision: https://reviews.llvm.org/D112882	2021-11-02 18:26:54 -07:00
Chih-Ping Chen	2ed29d87ef	[CodeView] Fortran debug info emission in Code View. Differential Revision: https://reviews.llvm.org/D112826	2021-11-02 15:06:21 -04:00
Arthur Eubanks	e2024d72fa	Revert "[NFC] Remove LinkAll*.h" This reverts commit `fe364e5dc7`. Causes breakages, e.g. https://lab.llvm.org/buildbot/#/builders/188/builds/5266	2021-11-02 09:08:09 -07:00
Arthur Eubanks	fe364e5dc7	[NFC] Remove LinkAll*.h These were added to prevent functions from being removed by WPO. But that doesn't make sense, correct WPO will not remove functions we actually use. I noticed these because compiling cc1_main.cpp was pulling in random LLVM pass headers. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D112971	2021-11-02 08:43:17 -07:00
Jay Foad	be1a8f8834	[AMDGPU] Really preserve LiveVariables in SILowerControlFlow https://bugs.llvm.org/show_bug.cgi?id=52204 Differential Revision: https://reviews.llvm.org/D112731	2021-11-02 15:03:37 +00:00
jacquesguan	a39eadcf16	[DAGCombiner] Teach combineShiftToMULH to handle constant and const splat vector. Fold (srl (mul (zext i32:$a to i64), i64:c), 32) -> (mulhu $a, $b), if c can truncate to i32 without loss. Reviewed By: frasercrmck, craig.topper, RKSimon Differential Revision: https://reviews.llvm.org/D108129	2021-11-02 12:04:23 +00:00
Simon Pilgrim	325031786e	[SelectionDAG] Optimize expansion for rotates/funnel shifts If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering. Also use the optimized funnel shift lowering for rotates. Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept (Branched from D108058 as getting this completed should help unlock some other WIP patches). Original Patch: @efriedma (Eli Friedman) Differential Revision: https://reviews.llvm.org/D112443	2021-11-02 11:38:25 +00:00
Simon Pilgrim	37e17f278f	[DAG] MatchRotate - remove (redundant) legal type check. Rely on the hasOperation() instead - as commented on D77804, the mid-term intention is to recognise rotate/funnel-by-constant pre-legalization to help avoid SimplifyDemandedBits regressions.	2021-11-02 11:24:50 +00:00
Kazu Hirata	6bdb61c58a	[CodeGen] Use make_early_inc_range (NFC)	2021-11-01 22:38:49 -07:00
Jay Foad	b8016b626e	[CodeGen] Tweak coding style in LivePhysRegs::stepForward. NFC.	2021-11-01 16:01:24 +00:00
Jun Ma	1f9fa54984	[Taildup] Don't tail-duplicate loop header with multiple successors as its latches When Taildup hits a loop with multiple latches, such as 1 -> 2 <-> 3, 2 <-> 4, 2 <-> 5, 2 -> rest, it may transform this loop into multiple loops by duplicating the loop header. However, this change may have little benefit while making the CFG much more complex. In some uncommon cases, it causes a large compile-time regression (offered by @alexfh in D106056). This patch disables tail-duplication in such cases. TestPlan: check-llvm Differential Revision: https://reviews.llvm.org/D110613	2021-11-01 15:32:00 +08:00
Craig Topper	ada5458521	[RISCV] Expand scalable vector bswap. Fix crash for bitreverse. Fix LegalizeVectorOps to not try shuffle or unrolling expansions for scalable vectors. Differential Revision: https://reviews.llvm.org/D112236	2021-10-31 10:01:27 -07:00
Kazu Hirata	1a605f395f	[CodeGen] Use make_early_inc_range (NFC)	2021-10-31 07:57:36 -07:00
Kazu Hirata	72710af233	[CodeGen, Target] Use MachineBasicBlock::terminators (NFC)	2021-10-31 07:57:34 -07:00
Kazu Hirata	4cc7c4724f	[MachineCSE] Use make_early_inc_range (NFC)	2021-10-30 19:00:23 -07:00
Roman Lebedev	25043c8276	[NFCI] Introduce `ICmpInst::compare()` and use it where appropriate As noted in https://reviews.llvm.org/D90924#inline-1076197 apparently this is a pretty common pattern, let's not repeat it yet again, but have it in a common place. There may be some more places where it could be used, but these are the most obvious ones.	2021-10-30 17:50:06 +03:00
Christudasan Devadasan	aa2d3b59ce	GlobalISel/Utils: Use incoming regbank while constraining the superclasses Register operands with superclasses can possibly have multiple regBanks if they have different register types. The regBank ambiguity resolved during regbankselect should be used to constrain the operand regclass instead of obtaining one from the MCInstrDesc. This is a prerequisite patch for D109300 that introduces allocatable AV_* Superclasses for AMDGPU by combining both VGPRs and AGPRs and we want to restrain the regclass to either A or V based on the incoming regbank. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112323	2021-10-30 07:20:45 -04:00
Fraser Cormack	8314a04ede	[SelectionDAG] Allow FindMemType to fail when widening loads & stores This patch removes an internal failure found in FindMemType and "bubbles it up" to the users of that method: GenWidenVectorLoads and GenWidenVectorStores. FindMemType -- renamed findMemType -- now returns an optional value, returning None if no such type is found. Each of the aforementioned users now pre-calculates the list of types it will use to widen the memory access. If the type breakdown is not possible they will signal a failure, at which point the compiler will crash as it does currently. This patch is preparing the ground for alternative legalization strategies for vector loads and stores, such as using vector-predication versions of loads or stores. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112000	2021-10-29 18:27:31 +01:00
Neubauer, Sebastian	c78640ee6a	[TailDuplicator] Fix merging block with terminator The TailDuplicator merged two blocks, even if the first one ended with a terminator, resulting in invalid MIR, where a terminator is in the middle of a block. Abort merging if the first block ends with a terminator. Differential Revision: https://reviews.llvm.org/D112226	2021-10-29 10:52:46 +02:00
Abinav Puthan Purayil	db8d7b6e2d	[DAGCombine][NFC] s/it's/its in the comment of hasNoInfs().	2021-10-29 07:36:38 +05:30
Daniel Kiss	d8075e8781	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit `da1d1a0869` . This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 21:45:09 +02:00
Guozhi Wei	1e46dcb77b	[TwoAddressInstructionPass] Put all new instructions into DistanceMap In function convertInstTo3Addr, after converting a two address instruction into three address instruction, only the last new instruction is inserted into DistanceMap. This is wrong, DistanceMap should track all instructions from the beginning of current MBB to the working instruction. When a two address instruction is converted to three address instruction, multiple instructions may be generated (usually an extra COPY is generated), all of them should be inserted into DistanceMap. Similarly when unfolding memory operand in function tryInstructionTransform DistanceMap is not maintained correctly. Differential Revision: https://reviews.llvm.org/D111857	2021-10-28 11:11:59 -07:00
Nicolai Hähnle	b437aaa672	MachineDominators: Define MachineDomTree type alias This is a (very) small move towards making the machine dominators more aligned with the IR dominators: * DominatorTree / MachineDomTree is the class holding the dominator tree * DominatorTreeWrapperPass / MachineDominatorTree is the corresponding (machine) function pass This alignment will be used by analyses that are designed as templates that work with LLVM IR as well as Machine IR. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D112690	2021-10-28 22:30:35 +05:30
Daniel Kiss	66e03db814	Revert "Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."" This reverts commit `b6420e575f`.	2021-10-28 17:24:53 +02:00
Daniel Kiss	b6420e575f	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit `da1d1a0869` . This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 16:49:19 +02:00
Neubauer, Sebastian	50d8d963e3	[GlobalISel] Simplify RegBankSelect Save the instruction list of a block before selecting banks. This makes it possible to cope with moved instructions, even if they are reordered or split across multiple basic blocks. Differential Revision: https://reviews.llvm.org/D111223	2021-10-28 10:30:55 +02:00
Michael Liao	e6a4ba3aa6	[amdgpu] Handle the case where there is no scavenged register. - When an unconditional branch is expanded into an indirect branch, if there is no scavenged register, an SGPR pair needs spilling to enable the destination PC calculation. In addition, before jumping into the destination, that clobbered SGPR pair need restoring. - As SGPR cannot be spilled to or restored from memory directly, the spilling/restoring of that SGPR pair reuses the regular SGPR spilling support but without spilling it into memory. As that spilling and restoring points are fully controlled, we only need to spill that SGPR into the temporary VGPR, which needs spilling into its emergency slot. - The target-specific hook is revised to take additional restore block, where the restoring code is filled. After that, the relaxation will place that restore block directly before the destination block and insert an unconditional branch in any fall-through block into the destination block. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106449	2021-10-27 18:37:27 -04:00
Kerry McLaughlin	f01fafdcd4	[SVE][CodeGen] Fix incorrect legalisation of zero-extended masked loads PromoteIntRes_MLOAD always sets the extension type to `EXTLOAD`, which results in a sign-extended load. If the type returned by getExtensionType() for the load being promoted is something other than `NON_EXTLOAD`, we should instead pass this to getMaskedLoad() as the extension type. Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D112320	2021-10-27 14:15:41 +01:00
Caroline Concatto	1137b7207d	[SelectionDAG] Widening the result of INSERT_SUBVECTOR. Widens the result and first input vector because they have the same size. The subvector to be inserted is widened in the operand widen function. Differential Revision: https://reviews.llvm.org/D112187	2021-10-27 13:52:25 +01:00
Daniel Kiss	894ddba1c9	Revert "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This reverts commit `da1d1a0869`.	2021-10-27 14:29:35 +02:00
Jay Foad	b9e3af124b	[LiveInterval] Add RemoveDeadValNo argument to removeSegment(iterator) Add an optional bool RemoveDeadValNo argument to the removeSegment(iterator) overload, for consistency with the other overloads. This gives clients a way to remove dead valnos while also getting an updated iterator returned (in the manner of vector::erase). Use this to clean up some inefficient code in LiveIntervals::repairOldRegInRange. NFC. Differential Revision: https://reviews.llvm.org/D110560	2021-10-27 09:43:32 +01:00
Daniel Kiss	da1d1a0869	[ARM] __cxa_end_cleanup should be called instead of _UnwindResume. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-27 10:40:00 +02:00
Kazu Hirata	c3e698e2f5	[CodeGen, Hexagon] Use MachineBasicBlock::phis (NFC)	2021-10-26 09:01:29 -07:00
Craig Topper	d51e3a2139	[LegalizeTypes][TargetLowering] Merge getShiftAmountTyForConstant into TargetLowering::getShiftAmountTy. getShiftAmountTyForConstant is a special helper that changes the shift amount to i32 if the type chosen by TargetLowering::getShiftAmountTy can't represent all possible values. This is needed to satisfy an assert in SelectionDAG::getNode. It requires additional consideration to know when this helper should be used. I'm not sure that we are always using it when we should. This patch merges the getShiftAmountTyForConstant handling into TargetLowering::getShiftAmountTy so we don't need to think about it anymore. Technically this may slightly increase compile times since the majority of callers of getShiftAmountTy won't need this. Hopefully, this isn't an issue in practice. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112469	2021-10-25 14:06:53 -07:00
Jeremy Morse	4136897bd4	[DebugInfo][InstrRef][NFC] Switch to using DenseMaps and similar There are a few STL containers hanging around that can become DenseMaps, SmallVectors and similar. This recovers a modest amount of compile time performance. While I'm here, adjust the bit layout of ValueIDNum: this was always supposed to act like a value type, however it seems that clang doesn't compile the comparison functions to act that way. Add a uint64_t to a union that explicitly aliases the bitfields, so that we can compare the whole value as a single integer. Differential Revision: https://reviews.llvm.org/D112333	2021-10-25 18:07:17 +01:00
Jeremy Morse	97ddf49e43	[DebugInfo][InstrRef] Recover stack-slot tracking performance This patch is like D111627 -- instead of calculating IDF for every location on the stack, only do it for the smallest units of interference, and copy the PHIs for those units to any aliases. The test added runs placeMLocPHIs directly, and tests that: * A def of the lower 8 bits of a stack slot causes all aliasing regs to have PHIs placed, * It doesn't cause the equivalent location to x86's $ah, which isn't aliased, to have a PHI placed. Differential Revision: https://reviews.llvm.org/D112324	2021-10-25 17:31:09 +01:00
Danila Malyutin	7b102fcc91	[CodeGen] Fix dependence breaking for tied operands Differential Revision: https://reviews.llvm.org/D107582	2021-10-25 18:52:27 +03:00
Jeremy Morse	ee3eee71e4	[DebugInfo][InstrRef] Track values fused into stack spills During register allocation, some instructions can have stack spills fused into them. It means that when vregs are allocated on the stack we can convert: SETCCr %0 DBG_VALUE %0 to SETCCm %stack.0 DBG_VALUE %stack.0 Unfortunately instruction referencing finds this harder: a store to the stack doesn't have a specific operand number, therefore we don't substitute the old operand for a new operand, and the location is dropped. This patch implements a solution: just recognise the memory operand attached to an instruction with a Special Number (TM), and record a substitution between the old value and the new one. This patch adds substitution code to InlineSpiller to record such fused spills, and tracking in InstrRefBasedLDV to recognise such values, and produce the value numbers for them. Everything to do with the movement of stack-defined values is already handled in InstrRefBasedLDV. Differential Revision: https://reviews.llvm.org/D111317	2021-10-25 15:14:53 +01:00
Jeremy Morse	2eb96e1711	[DebugInfo][NFC] Avoid a use-after-free This patch swaps two lines -- the CurSucc reference can be invalidated by the call to DFS.push_back, therefore that should happen last. The usual hat-tip to asan for catching this. This patch also swaps an earlier call to ToAdd.insert and DFS.push_back, where a stable iterator (from successors()) is being used. This isn't strictly necessary, but is good for consistency and avoiding readers asking themselves why the two code portions have a different order.	2021-10-25 14:16:30 +01:00
Sanjay Patel	6e46b66e2a	[DAGCombiner] make matching bit-hack form of usubsat more flexible (i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128 As suggested in D112085, we can substitute 'xor' with 'add' in this pattern, and it is logically equivalent: https://alive2.llvm.org/ce/z/eJtWWC We canonicalize to 'xor' in IR, but SDAG does not do that (and it probably should not - https://llvm.org/PR52267 ), so it is possible to see either pattern in codegen. Note that 'sub' is another potential pattern, but that is canonicalized to 'add' in DAGCombiner, so we don't need to worry about that variation. Differential Revision: https://reviews.llvm.org/D112377	2021-10-25 09:01:52 -04:00
Tim Northover	f9089accba	CodeGenPrep: remove all copies of GEP from list if there are duplicates. Unfortunately ToT has changed enough from the revision where this actually caused problems that the test no longer triggers an assertion failure.	2021-10-25 14:00:02 +01:00
Kazu Hirata	4bd46501c3	Use llvm::any_of and llvm::none_of (NFC)	2021-10-24 17:35:33 -07:00
Kazu Hirata	1c35973c77	[llvm] Call *(Set\|Map)::erase directly (NFC) We can erase an item in a set or map without checking its membership first.	2021-10-24 09:32:59 -07:00
Kazu Hirata	d8e4170b0a	Ensure newlines at the end of files (NFC)	2021-10-23 08:45:29 -07:00
Kazu Hirata	d14d7068b6	[llvm] Use StringRef::contains (NFC)	2021-10-23 08:45:27 -07:00
Jay Foad	2915889d74	[ScheduleDAGInstrs] Call adjustSchedDependency in more cases This removes a condition and the corresponding FIXME comment, because the Hexagon assertion it refers to has apparently been fixed, probably by D76134. NFCI. This just gives targets the opportunity to adjust latencies that were set to 0 by the generic code because they involve "implicit pseudo" operands. Differential Revision: https://reviews.llvm.org/D112306	2021-10-22 20:03:29 +01:00
Jeremy Morse	e7084ceab3	[DebugInfo][Instr] Track subregisters across stack spills/restores Sometimes we generate code that writes to a subregister, then spills / restores a super-register to the stack, for example: $eax = MOV32ri 0 MOV64mr $rsp, 1, $noreg, 16, $noreg, $rax $rcx = MOV64rm $rsp, 1, $noreg, 8, $noreg This patch takes a different approach: it adds another index to MLocTracker that identifies a size/offset within a stack slot. A location on the stack is then a pair of {FrameIndex, SlotNum}. Spilling and restoring now involves pairing up the src/dest register numbers, and the dest/src stack position to be transferred to/from. Location coverage improves as a result, compile-time performance decreases, alas. One limitation is that if a PHI occurs inside a stack slot: DBG_PHI %stack.0, 1 We don't know how large the resulting value is, and so might have difficulty picking which value to use. DBG_PHI might need to be augmented in the future with such a size. Unit tests added ensure that spills and restores correctly transfer to positions in the Location => Value map, and that different register classes written to the stack will correctly clobber all other positions in the stack slot. Differential Revision: https://reviews.llvm.org/D112133	2021-10-22 19:20:55 +01:00
Craig Topper	93139a3c32	[LegalizeTypes] Only expand CTLZ/CTTZ/CTPOP during type promotion if the new type is legal. We might be promoting a large non-power of 2 type and the new type may need to be split. Once we split it we may have a ctlz/cttz/ctpop instruction for the split type. I'm also concerned that we may create large shifts with shift amounts that are too small.	2021-10-22 11:02:35 -07:00
Simon Pilgrim	a5f56342b0	[DAG] narrowExtractedVectorLoad - EXTRACT_SUBVECTOR indices are always constant EXTRACT_SUBVECTOR indices are always constant, we don't need to check for ConstantSDNode, we should just use getConstantOperandVal which will assert for the constant.	2021-10-22 18:32:14 +01:00
Jeremy Morse	d9eebe3cd7	[DebugInfo][InstrRef] Add unit tests for transfer-function building This patch adds some unit tests for the machine-location transfer-function building parts of InstrRefBasedLDV: i.e., test that if we feed some MIR into the transfer-function building code, does it create the correct transfer function. There are a number of minor defects that get corrected in the process: * The unit test was selecting the x86 (i.e. 32 bit) backend rather than x86_64's 64 bit backend, * COPY instructions weren't actually having their subregister values correctly represented in the transfer function. Subregisters were being defined by the COPY, rather than taking the value in the source register. * SP aliases were at risk of being clobbered, if an SP subregister was clobbered. Differential Revision: https://reviews.llvm.org/D112006	2021-10-22 18:29:03 +01:00
Craig Topper	04c184bba7	[TargetLowering] Simplify the interface of expandABS. NFC Instead of returning a bool to indicate success and a separate SDValue, return the SDValue and have the callers check if it is null. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112331	2021-10-22 10:22:23 -07:00
Craig Topper	0766aef3f3	[LegalizeTypes][RISCV][PowerPC] Expand CTLZ/CTTZ/CTPOP instead of promoting if they'll be expanded later. Expanding these requires multiple constants. If we promote during type legalization when they'll end up getting expanded in LegalizeDAG, we'll use larger constants. These constants may be harder to materialize. For example, 64-bit constants on 64-bit RISCV are very expensive. This is similar to what has already been done to BSWAP and BITREVERSE. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112268	2021-10-22 09:10:01 -07:00
Zarko Todorovski	0bd6a9f2d1	[clang/llvm] Inclusive language: replace segregate with separate	2021-10-22 09:59:35 -04:00
Craig Topper	996123e5e8	[TargetLowering] Simplify the interface for expandCTPOP/expandCTLZ/expandCTTZ. There is no need to return a bool and have an SDValue output parameter. Just return the SDValue and let the caller check if it is null. I have another patch to add more callers of these so I thought I'd clean up the interface first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112267	2021-10-21 15:35:28 -07:00
Craig Topper	ff37b1105d	[LegalizeVectorOps][X86] Don't defer BITREVERSE expansion to LegalizeDAG. By expanding early it allows the shifts to be custom lowered in LegalizeVectorOps. Then a DAG combine is able to run on them before LegalizeDAG handles the BUILD_VECTORS for the masks used. v16Xi8 shift lowering on X86 requires a mask to be applied to a v8i16 shift. The BITREVERSE expansion applied an AND mask before SHL ops and after SRL ops. This was done to share the same mask constant for both shifts. It looks like this patch allows DAG combine to remove the AND mask added after v16i8 SHL by X86 lowering. This maintains the mask sharing that BITREVERSE was trying to achieve. Prior to this patch it looks like we kept the mask after the SHL instead which required an extra constant pool or a PANDN to invert it. This is dependent on D112248 because RISCV will end up scalarizing the BSWAP portion of the BITREVERSE expansion if we don't disable BSWAP scalarization in LegalizeVectorOps first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112254	2021-10-21 15:23:23 -07:00
Craig Topper	458ed5fcc3	[TargetLowering][RISCV] Prevent scalarization of fixed vector bswap. It's better to do the ands, shifts, ors in the vector domain than to scalarize it and do those operations on each element. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112248	2021-10-21 14:34:01 -07:00
Yonghong Song	f6811cec84	[DebugInfo] Support typedef with btf_decl_tag attributes Clang patch ([1]) added support for btf_decl_tag attributes with typedef types. This patch added llvm support including dwarf generation. For example, for typedef typedef unsigned * __u __attribute__((btf_decl_tag("tag1"))); __u u; the following shows llvm-dwarfdump result: 0x00000033: DW_TAG_typedef DW_AT_type (0x00000048 "unsigned int *") DW_AT_name ("__u") DW_AT_decl_file ("/home/yhs/work/tests/llvm/btf_tag/t.c") DW_AT_decl_line (1) 0x0000003e: DW_TAG_LLVM_annotation DW_AT_name ("btf_decl_tag") DW_AT_const_value ("tag1") 0x00000047: NULL [1] https://reviews.llvm.org/D110127 Differential Revision: https://reviews.llvm.org/D110129	2021-10-21 08:42:58 -07:00
Sanjay Patel	d2198771e9	[DAGCombiner] fold bit-hack form of usubsat (i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128 I haven't found a generalization of this identity: https://alive2.llvm.org/ce/z/_sriEQ Note: I was actually looking at the first form of the pattern in that link, but that's part of a long chain of potential missed transforms in codegen and IR....that I hope ends here! The predicates for when this is profitable are a bit tricky. This version of the patch excludes multi-use but includes custom lowering (as opposed to legal only). On x86 for example, we have custom lowering for some vector types, and that uses umax and sub. So to enable that fold, we need add use checks to avoid regressions. Even with legal-only lowering, we could see code with extra reg move instructions for extra uses, so that constraint would have to be eased very carefully to avoid penalties. Differential Revision: https://reviews.llvm.org/D112085	2021-10-21 09:47:19 -04:00
Kerry McLaughlin	0d153df69e	[SVE] Fix selection failure when splitting extended masked loads When splitting a masked load, `GetDependentSplitDestVTs` is used to get the MemVTs of the high and low parts. If the masked load is extended, this may return VTs with different element types which are used to create the high & low masked load instructions. This patch changes `GetDependentSplitDestVTs` to ensure we return VTs with the same element type. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D111996	2021-10-21 13:04:38 +01:00
Arthur Eubanks	6ea7437ca5	[SelectionDAG] Bail out of mergeTruncStores when not optimizing With unoptimized code, we may see lots of stores and spend too much time in mergeTruncStores. Fixes PR51827. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D111596	2021-10-20 16:58:22 -07:00
Jon Roelofs	b046eb19b8	[AArch64][GlobalISel] combine (and (or x, c1), c2) => (and x, c2) iff c1 & c2 == 0 https://godbolt.org/z/h8ejrG4hb rdar://83597585 Differential Revision: https://reviews.llvm.org/D111856	2021-10-20 12:11:52 -07:00
Stanislav Mekhanoshin	c80d8a8cea	[AMDGPU] MachineLICM cannot hoist VALU MachineLoop::isLoopInvariant() returns false for all VALU because of the exec use. Check TII::isIgnorableUse() to allow hoisting. That unfortunately results in higher register consumption since MachineLICM does not adequately estimate pressure. Therefore I think it shall only be enabled after D107677 even though it does not depend on it. Differential Revision: https://reviews.llvm.org/D107859	2021-10-20 11:47:24 -07:00
Itay Bookstein	08ed216000	[IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol As discussed in: * https://reviews.llvm.org/D94166 * https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html The GlobalIndirectSymbol class lost most of its meaning in https://reviews.llvm.org/D109792, which disambiguated getBaseObject (now getAliaseeObject) between GlobalIFunc and everything else. In addition, as long as GlobalIFunc is not a GlobalObject and getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee is a GlobalIFunc cannot currently be modeled properly. Creating aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition, calling getAliaseeObject on a GlobalIFunc will currently return nullptr, which is undesirable because it should return the object itself for non-aliases. This patch refactors the GlobalIFunc class to inherit directly from GlobalObject, and removes GlobalIndirectSymbol (while inlining the relevant parts into GlobalAlias and GlobalIFunc). This allows for calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc itself, making getAliaseeObject() more consistent and enabling alias-to-ifunc to be properly modeled in the IR. I exercised some judgement in the API clients of GlobalIndirectSymbol: some were 'monomorphized' for GlobalAlias and GlobalIFunc, and some remained shared (with the type adapted to become GlobalValue). Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D108872	2021-10-20 10:29:47 -07:00
Fraser Cormack	eabf11f9ea	[CodeGenPrepare] Avoid a scalable-vector crash in ctlz/cttz This patch fixes a crash when despeculating ctlz/cttz intrinsics with scalable-vector types. It is not safe to speculatively get the size of the vector type in bits in case the vector type is not a fixed-length type. As it happens this isn't required as vector types are skipped anyway. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112141	2021-10-20 16:45:55 +01:00
Craig Topper	fe1f0de003	[RISCV][WebAssembly][TargetLowering] Allow expandCTLZ/expandCTTZ to rely on CTPOP expansion for vectors. Our fallback expansion for CTLZ/CTTZ relies on CTPOP. If CTPOP isn't legal or custom for a vector type we would scalarize the CTLZ/CTTZ. This is different than CTPOP itself which would use a vector expansion. This patch teaches expandCTLZ/CTTZ to rely on the vector CTPOP expansion instead of scalarizing. To do this I had to add additional checks to make sure the operations used by CTPOP expansions are all supported. Some of the operations were already needed for the CTLZ/CTTZ expansion. This is a huge improvement to the RISCV which doesn't have a scalar ctlz or cttz in the base ISA. For WebAssembly, I've added Custom lowering to keep the scalarizing behavior. I've also extended the scalarizing to CTPOP. Differential Revision: https://reviews.llvm.org/D111919	2021-10-20 07:46:41 -07:00
Jeremy Morse	89950ade21	[DebugInfo][InstrRef] Track a single variable at a time Here's another performance patch for InstrRefBasedLDV: rather than processing all variable values in a scope at a time, instead, process one variable at a time. The benefits are twofold: * It's easier to reason about one variable at a time in your mind, * It improves performance, apparently from increased locality. The downside is that the value-propagation code gets indented one level further, plus there's some churn in the unit tests. Differential Revision: https://reviews.llvm.org/D111799	2021-10-20 15:03:52 +01:00
Sander de Smalen	be6c8dc765	[SelectionDAG] Fix getVectorSubVecPointer for scalable subvectors. When inserting a scalable subvector into a scalable vector through the stack, the index to store to needs to be scaled by vscale. Before this patch, that didn't yet happen, so it would generate the wrong offset, thus storing a subvector to the incorrect address and overwriting the wrong lanes. For some insert: nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2) The offset was not scaled by vscale: orr x8, x8, #0x4 st1h { z0.h }, p0, [sp] st1h { z1.d }, p1, [x8] ld1h { z0.h }, p0/z, [sp] And is changed to: mov x8, sp st1h { z0.h }, p0, [sp] st1h { z1.d }, p1, [x8, #1, mul vl] ld1h { z0.h }, p0/z, [sp] Differential Revision: https://reviews.llvm.org/D111633	2021-10-20 13:55:24 +01:00
Simon Pilgrim	71e39e3f18	[ADT] Add APInt::isNegatedPowerOf2() helper Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos..... Differential Revision: https://reviews.llvm.org/D111998	2021-10-19 14:38:21 +01:00
Jeremy Morse	849b17949f	[DebugInfo][InstrRef] Avoid un-necessary densemap copies and comparisons This is purely a performance patch: InstrRefBasedLDV used to use three DenseMaps to store variable values, two for long term storage and one as a working set. This patch eliminates the working set, and updates the long term storage in place, thus avoiding two DenseMap comparisons and two DenseMap assignments, which can be expensive. Differential Revision: https://reviews.llvm.org/D111716	2021-10-19 11:10:14 +01:00
Jeremy Morse	cf033bb2d3	[DebugInfo][NFC] Zero-initialize a class field This field gets assigned when the relevant object starts being used; but it remains uninitialized beforehand. This risks introducing hard-to-detect bugs if something changes, so zero-initialize the field.	2021-10-19 10:24:12 +01:00
Alexandros Lamprineas	04dc68710a	[DebugInfo][ARM] Fix incorrect debug information for RWPI accessed globals When compiling for the RWPI relocation model the debug information is wrong: * the debug location is described as { DW_OP_addr Var } instead of { DW_OP_constNu Var DW_OP_bregX 0 DW_OP_plus } * the relocation type is R_ARM_ABS32 instead of R_ARM_SBREL32 Differential Revision: https://reviews.llvm.org/D111404	2021-10-18 21:29:46 +01:00
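
Note on commits d2198771e9 and 6e46b66e2a: the i8 identity quoted in those messages, (X ^ 128) & (X s>> 7) --> usubsat X, 128, together with the 'add' variant, can be checked exhaustively outside the compiler. The following is a minimal standalone sketch, not code from the patches. All function names here (usubsat8, signSplat8, bitHackXor, bitHackAdd, foldHoldsForAllI8) are hypothetical, and the arithmetic shift is modelled by an explicit sign-bit test so the result does not depend on how a given compiler shifts negative values.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics: unsigned saturating subtract on 8-bit values.
inline uint8_t usubsat8(uint8_t X, uint8_t Y) {
  return X > Y ? static_cast<uint8_t>(X - Y) : 0;
}

// Models `X s>> 7` on an i8: all ones if the sign bit is set, else zero.
// (Assumption: written as a test rather than a shift for portability.)
inline uint8_t signSplat8(uint8_t X) {
  return (X & 0x80) ? 0xFF : 0x00;
}

// The 'xor' form of the pattern: (X ^ 128) & (X s>> 7).
inline uint8_t bitHackXor(uint8_t X) {
  return static_cast<uint8_t>((X ^ 0x80) & signSplat8(X));
}

// The 'add' form of the pattern: (X + 128) & (X s>> 7), wrapping at 8 bits.
inline uint8_t bitHackAdd(uint8_t X) {
  return static_cast<uint8_t>((X + 0x80) & signSplat8(X));
}

// Checks both forms against usubsat8(X, 128) for every 8-bit input.
inline bool foldHoldsForAllI8() {
  for (unsigned V = 0; V < 256; ++V) {
    uint8_t X = static_cast<uint8_t>(V);
    uint8_t Expected = usubsat8(X, 128);
    if (bitHackXor(X) != Expected || bitHackAdd(X) != Expected)
      return false;
  }
  return true;
}
```

Calling foldHoldsForAllI8() covers all 256 inputs. This only confirms the arithmetic identity for i8 and says nothing about the profitability checks (use counts, legal versus custom lowering) discussed in the commit messages.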


32682 Commits