llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	00816714f9	[DAGCombiner][RISCV] Make foldBinOpIntoSelect work correctly with opaque constants. The CanFoldNonConst doesn't work correctly with opaque constants because getNode won't constant fold constants if one is opaque. Even if the operation is AND/OR. This can lead to infinite loops. This patch does the folding manually in the DAGCombine. Alternatively, we could improve getNode but that seemed likely to have bigger impact and possibly increase compile time for the additional checks. We wouldn't want to directly constant fold because we need to preserve the opaque flag. Fixes PR58511. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D136472	2022-10-22 19:10:33 -07:00
Simon Pilgrim	b24a9f0cef	[DAG] visitFREEZE - pull out Operands array. NFCI. Initial tidyup and it will make it easier to adjust additional Operands in a future patch.	2022-10-22 20:14:56 +01:00
Simon Pilgrim	7511303c4f	[DAG] canCreateUndefOrPoison - add freeze(fsh(x,y,z)) -> fsh(freeze(x),freeze(y),freeze(z)) support The funnel-shift amount is always modulo, so won't introduce poison/undef	2022-10-22 18:39:52 +01:00
Simon Pilgrim	89111707ec	[DAG] canCreateUndefOrPoison - add freeze(rot(x,y)) -> rot(freeze(x),freeze(y)) support The rotation amount is always modulo, so won't introduce poison/undef	2022-10-22 17:24:53 +01:00
DianQK	d20e4a1d68	[DebugInfo] Fix potential CU mismatch for attachRangesOrLowHighPC When a CU attaches some ranges for a subprogram or an inlined code, the CU should be that of the subprogram/inlined code that was emitted. If not, then these emitted ranges will use the incorrect base of the CU in `emitRangeList`. A reproducible example is: When linking these two LLVM IRs, dsymutil will report no mapping for range or inconsistent range data warnings. `foo.swift` ```swift import AppKit.NSLayoutConstraint public class Foo { public var c: Int { get { Int(NSLayoutConstraint().constant) } set { } } } ``` `main.swift` ```swift // no mapping for range let f: Foo! = nil // inconsistent range data //let l: Foo = Foo() ``` Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D136039	2022-10-22 21:22:01 +08:00
Jay Foad	f010b1eae5	Revert "[MachineVerifier] Try harder to verify LiveVariables" This reverts commit `d4650d0938`. Reverted because it causes several X86 CodeGen test failures in a build with LLVM_ENABLE_EXPENSIVE_CHECKS=ON.	2022-10-22 12:21:50 +01:00
Arthur Eubanks	4153f989ba	[ObjCARC] Remove legacy PM versions of optimization passes This doesn't touch objc-arc-contract because that's in the codegen pipeline. However, this does move its corresponding initialize function into initializeCodegen(). Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D135041	2022-10-21 13:40:54 -07:00
Jay Foad	d4650d0938	[MachineVerifier] Try harder to verify LiveVariables Verify the LiveVariables analysis after a pass that claims to preserve it, even if there are no further passes (apart from the verifier itself) that would use the analysis. Differential Revision: https://reviews.llvm.org/D129213	2022-10-21 16:31:09 +01:00
Jay Foad	33f78d0903	[TwoAddressInstruction] Fix stale LiveVariables info in processStatepoint D129213 improves verification of LiveVariables, and caused CodeGen/X86/statepoint-cmp-sunk-past-statepoint.ll to fail with: * Bad machine code: LiveVariables: Block should not be in AliveBlocks * after Two-Address instruction pass. Fix it by clearing AliveBlocks for a register which is no longer used. Differential Revision: https://reviews.llvm.org/D136445	2022-10-21 14:57:03 +01:00
Craig Topper	2c82080f09	[MachineFrameInfo][RISCV] Call ensureStackAlignment for objects created with scalable vector stack id. This is an alternative to fix PR57939 for RISC-V. It definitely can be argued that the stack temporaries for RISC-V are being created with an unnecessarily large alignment. But ignoring the alignment in MachineFrameInfo also seems bad. Looking at the test update that go with the current ID==0 check, it was intending to exclude things like the NoAlloc stackid. So I'm not sure if scalable vectors are intentionally being excluded. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D135913	2022-10-20 14:05:46 -07:00
Paul Walker	ab8257ca0e	[NFC] Fix a few whitespace inconsistencies.	2022-10-20 14:52:25 +00:00
Yuanfang Chen	24c6ea917c	[JMCInstrument] rename ELF section name from ".just.my.code" to ".data.just.my.code" This gives linker scripts a hint about where to place the section.	2022-10-19 10:49:54 -07:00
Simon Pilgrim	9708d88017	Revert rG42230efccf8fe1185be5fa6c23dce0a8183d6ec9 "[DAG] Fold (sra (or (shl x, c1), (shl y, c2)), c1) -> (sext_inreg (or x, (shl y,c2-c1)) iff c2 >= c1" @foad was right - this isn't actually going to help with D136042 as much as hoped, we need a better AMDGPU-specific solution as other targets are likely to make use of it	2022-10-19 12:07:41 +01:00
Simon Pilgrim	42230efccf	[DAG] Fold (sra (or (shl x, c1), (shl y, c2)), c1) -> (sext_inreg (or x, (shl y,c2-c1)) iff c2 >= c1 Helps with some of the AMDGPU regressions identified in D136042 where we were losing signed BFE patterns after sinking shifts behind logic ops. Differential Revision: https://reviews.llvm.org/D136081	2022-10-19 11:18:49 +01:00
Koakuma	d3fcbee10d	[SPARC] Make calls to function with big return values work Implement CanLowerReturn and associated CallingConv changes for SPARC/SPARC64. In particular, for SPARC64 there's new `RetCC_Sparc64_` functions that handles the return case of the calling convention. It uses the same analysis as `CC_Sparc64_` family of funtions, but fails if the return value doesn't fit into the return registers. This makes calls to functions with big return values converted to an sret function as expected, instead of crashing LLVM. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D132465	2022-10-18 00:01:55 +00:00
Craig Topper	30305d7948	[TargetLowering][RISCV][Sparc] Don't emit zero check in CTTZTableLookup for CTTZ_ZERO_UNDEF. The code incorrectly checked for CTLZ_ZERO_UNDEF instead of CTTZ_ZERO_UNDEF. While I was there I flipped the condition into an early out. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D136010	2022-10-17 10:15:39 -07:00
Kazu Hirata	ef9956f434	[IR] Rename FuncletPadInst::getNumArgOperands to arg_size (NFC) This patch renames FuncletPadInst::getNumArgOperands to arg_size for consistency with CallBase, where getNumArgOperands was removed in favor of arg_size in commit `3e1c787b31` Differential Revision: https://reviews.llvm.org/D136048	2022-10-17 10:15:10 -07:00
Simon Pilgrim	8e77458578	[DAG] visitShiftByConstant - replace constant detection with FoldConstantArithmetic Instead of checking that an operand is constant/opaque before calling getNode() and then checking that the result is a constant, just use FoldConstantArithmetic which will just early-out if the operands are not constant foldable.	2022-10-17 16:19:10 +01:00
Simon Pilgrim	af5942cc09	Remove trailing whitespace. NFC.	2022-10-17 15:20:26 +01:00
Peter Rong	c2e7c9cb33	[CodeGen] Using ZExt for extractelement indices. In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`. This is because IRTranslator uses SExt for indices. In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt. This change includes both documentation, SelectionDAG and IRTranslator. We also included a test for AMDGPU, updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly and X86 This patch fixes issue #57452. Differential Revision: https://reviews.llvm.org/D132978	2022-10-15 15:45:35 -07:00
Filipp Zhinkin	ef774bec63	[AArch64] Support SETCCCARRY lowering Support SETCCCARRY lowering to SBCS instruction. Related issue: https://github.com/llvm/llvm-project/issues/44629 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D135302	2022-10-14 22:29:31 +03:00
chenglin.bi	c1909d7337	[DAGCombiner] Fix crash for the merge stores with different value type The crash case comes from #58350. It have two stores, one store is type f32 and the other is v1f32. When we try to merge these two stores on v1f32, the memVT is vector type so the old code will use ISD::EXTRACT_SUBVECTOR for type f32 also then compiler crash. So this patch insert a build_vector for f32 store to generate v1f32 also when memVT is v1f32. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D135954	2022-10-15 01:16:35 +08:00
Nicola Lancellotti	ce1a2ccf94	[NFC] Fix typo in DAGCombiner	2022-10-14 17:47:25 +01:00
Sander de Smalen	02df03c5b7	[AArch64][SME] Add support for arm_locally_streaming functions. Functions with `aarch64_sme_pstatesm_body` will emit a SMSTART at the start of the function, and a SMSTOP at the end of the function, such that all operations use the right value for vscale. Because the placement of these nodes is critically important (i.e. no vscale-dependent operations should be done before SMSTART has been issued), we require glueing the CopyFromReg to the Entry node such that we can insert the SMSTART as part of that glued chain. More details about the SME attributes and design can be found in D131562. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131582	2022-10-14 13:47:53 +00:00
Matt Arsenault	d0750ec475	AtomicExpand: Avoid some operations if the atomic is overaligned Let some of the pointer bithacking fold away if we know the LSB are 0.	2022-10-13 23:31:00 -07:00
Anshil Gandhi	d383adec4d	[BranchRelaxation] Fall through only if block has no unconditional branches Prior to inserting an unconditional branch from X to its fall through basic block, check if X has any terminators to avoid inserting additional branches. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134557	2022-10-13 22:48:41 -06:00
Matt Arsenault	c427ee9798	AsmPrinter: Remove pointless code in inline asm emission This was scanning through def operands looking for the symbol operand. This is pointless because the symbol is always the first operand as enforced by the verifier, and all operands are implicit.	2022-10-13 21:12:11 -07:00
Xiang1 Zhang	aad013de41	[InlineAsm][bugfix] Correct function addressing in inline asm In Linux PIC model, there are 4 cases about value/label addressing: Case 1: Function call or Label jmp inside the module. Case 2: Data access (such as global variable, static variable) inside the module. Case 3: Function call or Label jmp outside the module. Case 4: Data access (such as global variable) outside the module. Due to current llvm inline asm architecture designed to not "recognize" the asm code, there are quite troubles for us to treat mem addressing differently for same value/adress used in different instuctions. For example, in pic model, call a func may in plt way or direclty pc-related, but lea/mov a function adress may use got. This patch fix/refine the case 1 and case 2 in inline asm. Due to currently inline asm didn't support jmp the outsider lable, this patch mainly focus on fix the function call addressing bugs in inline asm. Reviewed By: Pengfei, RKSimon Differential Revision: https://reviews.llvm.org/D133914	2022-10-14 09:47:26 +08:00
David Green	16e4e4ab87	[CodeGenPrep] Handle constants in ConvertPhiType This is a simple addition to the convertPhiTypes in CodeGenPrepare to consider and convert constants as it converts the phi type. Someone fixed the bug in the motivating example, so the undef is now a constant 0. This does mean converting between integer and floating point constants, which may have different materialization. Differential Revision: https://reviews.llvm.org/D135561	2022-10-13 16:41:44 +01:00
Anton Sidorenko	4431e705cc	[NFC] Use forward decl of MachineCombinerPattern enum to reduce dependencies Differential Revision: https://reviews.llvm.org/D135776	2022-10-13 14:56:14 +01:00
Simon Tatham	526ce9c929	Propagate tied operands when copying a MachineInstr. MachineInstr's copy constructor works by calling the addOperand method to add each operand of the old MachineInstr to the new one, one by one. But addOperand deliberately avoids trying to replicate ties between operands, on the grounds that the tie refers to operands by index, and the indices aren't necessarily finalized yet. This led to a code generation fault when the machine pipeliner cloned an Arm conditional instruction, and lost the tie between the output register and the input value to be used when the condition failed to execute. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D135434	2022-10-13 09:40:35 +01:00
Mirko Brkusanin	8b8463ef6c	[SelectionDAG] Use consistent type sizes for opcode	2022-10-12 17:33:04 +02:00
Craig Topper	ac9209751a	Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))" This reverts commit `0148df8157`. Getting a lit test failures on AMDGPU but I can't reproduce it so far. Reverting to investigate.	2022-10-11 16:30:40 -07:00
Craig Topper	0148df8157	[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y)) (sra X, BW-1) is either 0 or -1. So the multiply is a conditional negate of Y. This pattern shows up when type legalizing wide multiplies involving a sign extended value. Fixes PR57549. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133399	2022-10-11 16:20:55 -07:00
Jessica Paquette	0f1a51e173	[GlobalISel] Allow vectors in redundant or + add combines We support KnownBits for vectors, so we can enable these. https://godbolt.org/z/r9a9W4Gj1 Differential Revision: https://reviews.llvm.org/D135719	2022-10-11 15:31:09 -07:00
Jessica Paquette	036a13065b	[GlobalISel] Combine (X op Y) == X --> Y == 0 This matches patterns of the form ``` (X op Y) == X ``` And transforms them to ``` Y == 0 ``` where appropriate. Example: https://godbolt.org/z/hfW811c7W Differential Revision: https://reviews.llvm.org/D135380	2022-10-11 09:52:48 -07:00
Philip Reames	487695e7c9	[SDAG] Treat DemandedElts argument to isSplatVector as splat for scalable vectors [nfc] The previous code used a APInt(1, 0) to represent the demanded elts of a scalable vector, and then ignored that argument if type was scalable. This was inconsistent with the UndefElts parameter which is set to either APInt(1, 0) or APInt(1,1) - that is, implicitly broadcast across all lanes. Particularly since the undef code relied on the DemandedElts parameter having bitwidth 1 to achieve that result! This change switches the demanded parameter to APInt(1,1), documents the broadcast semantics, and takes advantage of it to remove one special case for scalable vectors which is no longer required.	2022-10-11 09:49:28 -07:00
Philip Reames	ac4f3fff8c	[SDAG] Clarify behavior of scalable demanded/undef elts in isSplatValue [nfc] Update comment, and add an assertion to check property expected by sole (non-test) caller. Remove tests which appear to have been copied from fixed vector tests, and whose demanded bits don't correspond to the way this interface is otherwise used.	2022-10-11 07:28:34 -07:00
Craig Topper	0121b1a4ac	Revert "[TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant." This reverts commit `d4facda414`. This has been reported to cause failures. Reverting while I investigate.	2022-10-10 14:53:29 -07:00
Craig Topper	d4facda414	[TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant. If the divisor is even, we can first shift the dividend and divisor right by the number of trailing zeros. Now the divisor is odd and we can do the original algorithm to calculate a remainder. Then we shift that remainder left by the number of trailing zeros and add the bits that were shifted out of the dividend. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D135541	2022-10-10 11:02:22 -07:00
wanglei	730ee6568c	[LoongArch] Set correct encodings for DWARF exception handling This patch sets correct encodings for DWARF exception handling for LoongArch. Differential Revision: https://reviews.llvm.org/D134710	2022-10-08 11:53:48 +08:00
Craig Topper	9f67047cf0	[VP][RISCV] Add vp.smax/smin/umax/umin intrinsics Differential Revision: https://reviews.llvm.org/D135418	2022-10-07 17:14:31 -07:00
Amaury Séchet	62ea6c5be7	[DAGCombine] Deduplicate addcarry node using commutativity. The first two parameters of addcarry are commutative. We may face a situation where both variant are present in the DAG, in which case we benefit from using just one. Depends on D57302 and D33587 Reviewed By: RKSimon, chfast Differential Revision: https://reviews.llvm.org/D57317	2022-10-08 00:55:14 +02:00
eopXD	dbc681c98e	[VP][RISCV] Add vp.roundtozero and its RISC-V support The scalar instruction of this is `llvm.trunc`. However the naming of ISD::VP_TRUNC is already taken by `trunc` of the LLVM IR. Naming this as `vp.ftrunc` would likely cause confusion with `vp.fptrunc`. So adding `vp.roundtozero` that will look similar to `vp.roundeven`. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D135233	2022-10-07 02:15:23 -07:00
Pierre van Houtryve	36c3833783	[GISel] Add Trunc/Lshr/BuildVector Folding Similar to the current "Trunc/BuildVector" folding - which folds low element extracts of BuildVectors, folds hi element extracts done using bitshifts. For D134354 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135148	2022-10-07 08:44:03 +00:00
Pierre van Houtryve	a34977c4d0	[GISel] Handle G_TRUNC in `matchExtractVecEltBuildVec` Spotted some cases in D134354 where this was an issue. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135147	2022-10-07 08:37:18 +00:00
Mike Hommey	d3b0e745e8	[CodeView] Avoid NULL deref of Scope Regression from D131400: cross-language LTO causes a crash in the compiler on the NULL deref of Scope in `isa` call when Rust IR is involved. Presumably, this might affect other languages too, and even Rust itself without cross-language LTO when the Rust compiler switched to LLVM 16. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D134616	2022-10-07 08:34:57 +02:00
Philip Reames	04bb32e58a	[DAG] Extract helper for (neg x) [nfc] This is a frequently reoccurring pattern, let's factor it out. Differential Revision: https://reviews.llvm.org/D135301	2022-10-06 13:23:52 -07:00
Pierre van Houtryve	3ec0085c3f	[DAG] Update `isKnownNeverNaN` for `FMA/FMAD` We can still get a NaN even if none of the operands are NaN, e.g. from +inf/-inf. D50804 didn't catch that. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134854	2022-10-06 06:52:36 +00:00
Ellis Hoag	549773f9e9	[Dwarf] Reference the correct CU when inlining Sometimes when a function is inlined into a different CU, `llvm-dwarfdump --verify` would find an inlined subroutine with an invalid abstract origin. This is because `DwarfUnit::addDIEEntry()` will incorrectly assume the inlined subroutine and the abstract origin are from the same CU if it can't find the CU for the inlined subroutine. In the added test, the inlined subroutine for `bar()` is created before the CU for `B.swift` is created, so it tries to point to `goo()` in the wrong CU. Interestingly, if we swap the order of the two functions then we don't see a crash since the module for `goo()` is created first. The fix is to give a parent DIE to `ScopeDIE` before calling `addDIEEntry()` so that its CU can be found. Luckily, `constructInlinedScopeDIE()` is only called once so we can pass it the DIE of the scope's parent and give it a child just after it's created. `constructInlinedScopeDIE()` should always return a DIE, so assert that it is not null. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D135114	2022-10-05 09:19:12 -07:00
Fraser Cormack	08497a785b	[VP] Fix unused variable in release configurations	2022-10-05 10:33:07 +01:00
Fraser Cormack	a3a9b0743e	[VP][NFC] Remove \brief commands from doxygen comments Following a precedent set in D46861.	2022-10-05 08:08:30 +01:00
Fraser Cormack	3362e2d57f	[VP] Add IR expansion for vp.icmp and vp.fcmp These intrinsics are simply expanded to regular icmp/fcmp instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121594	2022-10-05 08:07:39 +01:00
Serguei Katkov	d330731f94	[RegAllocFast] Clean-up. Remove redundant operations. NFC. Reviewed By: MatzeB, arsenm Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D109213	2022-10-05 11:38:54 +07:00
Amara Emerson	c5cebf78bd	[GlobalISel] Add computeNumSignBits() support for compares. Doing so allows G_SEXT_INREG to be combined away for many vector cases. Differential Revision: https://reviews.llvm.org/D135168	2022-10-05 00:28:08 +01:00
Amara Emerson	8055aa8e8a	[AArch64][GlobalISel] Make vector G_SEXT_INREG legal and allow combining. As a result of making these legal, and tweaking the combine to allow vectors, we generate vector G_SEXT_INREG during legalization. The reason we want to make these legal in the first place is to allow for more combine opportunities. Once those have been done, we can just lower them back to shifts in the post-legalizer lowering. This needs to be one commit otherwise we start causing tests to fail due to incomplete support for selection etc.	2022-10-05 00:28:08 +01:00
jeff	cebec42089	[DAGCombiner] [AMDGPU] Allow vector loads in MatchLoadCombine Since SROA chooses promotion based on reaching load / stores of allocas, we may run into scenarios in which we alloca a vector, but promote it to an integer. The result of which is the familiar LoadCombine pattern (i.e. ZEXT, SHL, OR). However, instead of coming directly from distinct loads, the elements to be combined are coming from ExtractVectorElements which stem from a shared load. This patch identifies such a pattern and combines it into a load. Change-Id: I0bc06588f11e88a0a975cde1fd71e9143e6c42dd	2022-10-04 12:16:00 -07:00
Sanjay Patel	17dcbd8165	[SDAG] don't hoist div/rem through a select with neutral constant This bug was introduced with D134966.	2022-10-04 13:15:01 -04:00
Jay Foad	af947d9fcb	[ISel] Fix crash in new FMA DAG combine Fix a crash in the FMA combine added by D132837 and amended by D134810. In cases where the newly created node could be folded, the combiner would fail this assertion: llc: DAGCombiner.cpp:268: void (anonymous namespace)::DAGCombiner::AddToWorklist(llvm::SDNode *): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed. Differential Revision: https://reviews.llvm.org/D135150	2022-10-04 15:13:18 +01:00
Amara Emerson	07ccf651b9	x[AArch64][GlobalISel] Enable vector support for G_SELECT->G_FMAXIMUM/MINIMUM. Vector support seems to work immediately, as long as we run the combine before legalization (so the vector SELECTs don't get lowered) and the legalizer rules are there to enable generation. Differential Revision: https://reviews.llvm.org/D135047	2022-10-03 21:39:52 +01:00
Philip Reames	a200b0fc25	[DAG] Introduce getSplat utility for common dispatch pattern [nfc] We have a very common pattern of dispatching between BUILD_VECTOR and SPLAT_VECTOR creation repeated in many cases in code. Common the pattern into a utility function.	2022-10-03 12:49:39 -07:00
Jessica Paquette	970cb99e0a	[GlobalISel] Combine `(x + y) - y -> x` and friends This adds a combine that handles ``` (x + y) - y -> x (x + y) - x -> y x - (y + x) -> 0 - y x - (x + z) -> 0 - z ``` On AArch64, we get added benefit for `0 - y` because it can be selected to a `neg` instruction. Differential Revision: https://reviews.llvm.org/D135010	2022-10-03 10:06:48 -07:00
Philip Reames	21f97fdc97	[DAG] Use getSplatBuildVector in a couple more places [nfc]	2022-10-03 09:48:49 -07:00
Markus Böck	36af4c8418	[SelectionDAG] Fix use-after-free introduced in D130881 The code introduced in https://reviews.llvm.org/D130881 has a bug as it may cause a use-after-free error that can be caught by ASAN. The bug essentially boils down to iterator invalidation of `DenseMap`. The expression `SDEI[To] = I->second;` may cause `SDEI` to grow if `To` is inserted for the very first time. When that happens, all existing iterators to the map are invalidated as their backing storage has been freed. Accessing `I->second` is then invalid and attempts to access freed memory (as `I` is an iterator of `SDEI`). This patch fixes that quite simply by first making a copy of `I->second`, and then moving into the possibly newly inserted KV of the ` DenseMap`. No test attached as I am not sure it is practible to test. Differential revision: https://reviews.llvm.org/D135019	2022-10-03 15:09:14 +02:00
Petar Avramovic	1fa2019828	[SelectionDAG] Add check for BUILD_VECTOR in isKnownNeverNaN Includes handling of constants with vector type in isKnownNeverNaN. For AMDGPU results in not making fcanonicalize during legalization for vector inputs to fmaxnum_ieee and fminnum_ieee. Does not affect end result since there is a combine that eliminates fcanonicalize. Differential Revision: https://reviews.llvm.org/D88573	2022-10-03 12:47:07 +02:00
Amara Emerson	3daf7ddaef	[GlobalISel] Allow prelegalizer combiners to have access to LegalizerInfo. Before, the isPreLegalize() query in CombinerHelper only checked for the presence of a LegalizerInfo object. This is problematic when we want to have a combine actually check for legality in a pre-legalizer combine pass, since if we pass a LegalizerInfo object to the constructor it causes the combines to think that we're running post legalizer, which isn't true. This change fixes it to instead check an explicit bool that passes to signal whether the pass will be run before or after legalization. Doing so exposed a bug in the extending loads combine, which tried to check for legality of candidate extending loads if LegalizerInfo was present. Since we only ran it pre-legalizer and therefore with a null LegalizerInfo, it never actually ran. Also fixes the legality checks to keep the tests passing. Differential Revision: https://reviews.llvm.org/D135044	2022-10-03 07:36:18 +01:00
David Green	3651635eca	[ARM][DAG] BF16 constant handling. Much like f16 and f32, we shouldn't try to shrink bf16 to smaller fp constant. The code may not be optimal, but this allows us to legalize bf16 constants under Arm without errors.	2022-10-02 11:51:08 +01:00
Yeting Kuo	cefb7aab61	[VP][RISCV] Add vp.copysign and RISC-V support. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134935	2022-10-01 10:19:10 +08:00
Eric Wang	5b26f4f042	Reland "[MLGO] ML Regalloc Priority Advisor" This relands commit `8f4f26ba5b`, which was reverted in `91c96a806c` because of Buildbot failures. The previous model test is not compatible with tflite. e.g. https://lab.llvm.org/buildbot/#/builders/6/builds/14041 Differential Revision: https://reviews.llvm.org/D133616	2022-09-30 16:27:26 -05:00
Guozhi Wei	feea3b990e	[LiveRangeEdit] Add a statistic variable for rematerialization Add a statistic variable for rematerialization. Differential Revision: https://reviews.llvm.org/D134907	2022-09-30 19:39:51 +00:00
Simon Pilgrim	61dc5014ac	[DAG] Update foldSelectWithIdentityConstant to use llvm::isNeutralConstant D133866 added the llvm::isNeutralConstant helper to track neutral/passthrough constants This patch updates foldSelectWithIdentityConstant to use the helper instead of maintaining its own opcode handling Differential Revision: https://reviews.llvm.org/D134966	2022-09-30 17:46:52 +01:00
Serge Pavlov	b3913a9cdf	[GlobalISel] Do not crash on widening vector result Function buildCopyToRegs did not handle properly the case when it should make wider vector result. It happened, for example, in a function that returns value of type <2 x f32>, which should be widen to <4 x f32> to fit XMM register. The function eventually calls MachineIRBuilder.buildUnmerge, which does not expect that only one destination register is specified. Now this case is treated specifically in buildCopyToRegs. Differential Revision: https://reviews.llvm.org/D128546	2022-09-30 21:30:55 +07:00
Pierre van Houtryve	7388520d1c	[GISel] Add more cases to isKnownNeverNaN Make it even with the DAG implementation as of D134854 Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D134857	2022-09-30 14:10:56 +00:00
Pierre van Houtryve	653beae5a1	[AMDGPU][GISel] Add Identity BUILD_VECTOR Combines Folds-away BUILD_VECTOR-related noops in the post-legalizer combiner. Depends on D134433 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134953	2022-09-30 14:07:13 +00:00
Amaury Séchet	031a7ad575	[NFC] Fix erroneous indentation.	2022-09-30 12:30:27 +00:00
Yeting Kuo	1cc02b05b7	[SelectionDAG] Add helper function to check whether a SDValue is neutral element. NFC. Using this helper makes work about neutral elements more easier. Although I only find one case now, I think it will have more chance to be used since so many combine works are related to neutral elements. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133866	2022-09-30 11:29:11 +08:00
Mircea Trofin	91c96a806c	Revert "[MLGO] ML Regalloc Priority Advisor" This reverts commit `8f4f26ba5b`. Buildbot failures, e.g. https://lab.llvm.org/buildbot/#/builders/6/builds/14041	2022-09-29 18:26:40 -07:00
Amaury Séchet	923909afbe	[DAG] Simplify the select of constant combine code. NFC	2022-09-30 01:03:14 +00:00
Amaury Séchet	d7600c7ccb	[DAG] select Cond, C, -1 --> or (sext (not Cond)), C when C is MVT::i1 In the spirit of D130765 . Get rid of cbranches and/or cmov. Usually shorter, but sometime not, becaus eit's hard to prededict when dependency breaking xor will be introduced. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D134736	2022-09-30 00:36:58 +00:00
Eric Wang	8f4f26ba5b	[MLGO] ML Regalloc Priority Advisor The bulk of the implementation is common between 'release' mode (==AOT-ed model) and 'development' mode (for training), the main difference is that in development mode, we may also log features (for training logs), inject scoring information and then produce the log file. Differential Revision: https://reviews.llvm.org/D133616	2022-09-29 16:55:15 -05:00
Thomas Symalla	a41dde2c62	[AMDGPU] Add use check in v_fma combine. In D132837, an existing v_fma combine was extended to regard nested fma instructions. Originally, the inner FMA was checked for being used only once. In its current state, this check is missing, which causes some regressions. In this patch, this check was added. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D134856	2022-09-29 12:25:03 +02:00
Jessica Paquette	1eb49bbab6	[GlobalISel][CallLowering] Use hasRetAttr for return flags on CallBases Given something like this: ``` declare signext i16 @signext_callee() define i32 @caller() { %res = call i16 @signext_callee() ... } ``` CallLowering would miss that signext_callee's return value is sign extended, because it isn't on the call. Use hasRetAttr on the CallBase to allow us to catch this. (This now inserts G_ASSERT_SEXT/G_ASSERT_ZEXT like in the original review.) Differential Revision: https://reviews.llvm.org/D86228	2022-09-28 19:38:24 -07:00
Jessica Paquette	704b2e162c	[GlobalISel] Add isConstFalseVal helper to Utils Add a utility function which returns true if the given value is a constant false value. This is necessary to port one of the compare simplifications in TargetLowering::SimplifySetCC. Differential Revision: https://reviews.llvm.org/D91754	2022-09-28 15:44:26 -07:00
serge-sans-paille	16544cbe64	[iwyu] Move <cmath> out of llvm/Support/MathExtras.h Interestingly, MathExtras.h doesn't use <cmath> declaration, so move it out of that header and include it when needed. No functional change intended, but there's no longer a transitive include fromMathExtras.h to cmath.	2022-09-28 20:49:01 +02:00
Aiden Grossman	8d77f8fde7	[MLGO] Add per-instruction MBB frequencies to regalloc dev features This commit adds in two new features to the ML regalloc eviction analysis that can be used in ML models, a vector of MBB frequencies and a vector of indicies mapping instructions to their corresponding basic blocks. This will allow for further experimentation with per-instruction features and give a lot more flexibility for future experimentation over how we're extracting MBB frequency data currently. Reviewed By: mtrofin, jacobhegna Differential Revision: https://reviews.llvm.org/D134166	2022-09-28 18:45:04 +00:00
Jay Foad	2c12a04bba	[ISel] Fix DAG divergence after new FMA combine D132837 introduced a new DAG combine that used MorphNodeTo to morph an FMUL into an FMA. It turns out that MorphNodeTo does not properly update the divergence bit for users of the morphed node, causing an assertion failure on the new test case: llc: SelectionDAG.cpp:10486: void llvm::SelectionDAG::VerifyDAGDivergence(): Assertion `calculateDivergence(N) == N->isDivergent() && "Divergence bit inconsistency detected"' failed. Fixing MorphNodeTo to propagate the divergence bit is tricky because of the way it is used to select machine instructions, so use getNode and ReplaceAllUsesOfValueWith instead. Differential Revision: https://reviews.llvm.org/D134810	2022-09-28 19:41:51 +01:00
Craig Topper	12357e88af	[RISCV][SelectionDAGBuilder] Fix crash when copying a v1f32 vector between basic blocks. On a rv64 without f32 or vector support, this will be passed across the basic block as an i64. We need use i32 as an intermediate type with bitcast and anyext/trunc. Fixes PR58025 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D134758	2022-09-28 10:13:35 -07:00
Matt Arsenault	a61c3455c0	AtomicExpand: Use llvm.ptrmask instead of ptrtoint This removes the ptrtoint from the load's pointer operand, although we can't entirely eliminate these to get the LSB shift. In a future patch, this will avoid ptrtoint in the case where the atomic is overaligned to the word size.	2022-09-28 12:51:30 -04:00
Matt Devereau	0a4771a7e8	[AArch64][SVE] Expand gather index to 32 bits instead of 64 bits For gathers which load in 8 and 16 bit data then use that data as an index, the index can be extended to 32 bits instead of 64 bits Differential Revision: https://reviews.llvm.org/D130692	2022-09-28 14:42:12 +00:00
jacquesguan	465ac0b96e	[LegalizeTypes] Use getVectorElementCount to avoid crash of scalable vector. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D134718	2022-09-28 17:31:29 +08:00
eopXD	9677d70eb2	[VP][RISCV] Add vp.floor, vp.round, vp.roundeven and their RISC-V support Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134759	2022-09-27 19:45:58 -07:00
eopXD	163cb33854	[VP][RISCV] Add vp.ceil and RISC-V support Previous commit `8b00b24f85` missed to add `int_ceil` anchor for the llvm.ceil.* section under LangRef.rst Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134586	2022-09-27 12:04:09 -07:00
eopXD	384b8b3da7	Revert "[VP][RISCV] Add vp.ceil and RISC-V support" This reverts commit `8b00b24f85`.	2022-09-27 11:12:57 -07:00
eopXD	8b00b24f85	[VP][RISCV] Add vp.ceil and RISC-V support Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134586	2022-09-27 11:08:27 -07:00
Craig Topper	a6383bb51c	[VP][RISCV] Add vp.fmuladd. Expanded in SelectionDAGBuilder similar to llvm.fmuladd. Reviewed By: frasercrmck, simoll Differential Revision: https://reviews.llvm.org/D134474	2022-09-27 10:02:37 -07:00
Amaury Séchet	d1baed7c9c	[DAG] select Cond, -1, C --> or (sext Cond), C if Cond is MVT::i1 This seems to be beneficial overall, except for midpoint-int.ll . The X86 backend seems to generate zeroing that are not necesary. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D131260	2022-09-27 12:54:52 +00:00
Yeting Kuo	04e1301f3d	[VP][RISCV] Add vp.maxnum and vp.minnum intrinsics and RISC-V support. Add vp.maxnum and vp.minnum which are vector predicted intrinsics of llvm.maxnum and llvm.minnum. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134639	2022-09-27 13:36:45 +08:00
Paul Scoropan	ce004fb4f2	[PowerPC] XCOFF exception section support on the direct assembler path This feature implements support for making entries in the exception section on XCOFF on the direct assembly path using the ".except" pseudo-op. It also provides functionality to lower entries (comprised of language and reason codes) into the exception section through the use of annotation metadata attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler support will be provided in another review. https://reviews.llvm.org/D133030 needs to merge first for LIT tests Reviewed By: shchenz, RKSimon Differential Revision: https://reviews.llvm.org/D132146	2022-09-26 22:24:20 -04:00
Mircea Trofin	8a24e0cb5a	[nfc][mlgo] Lazily compute the regalloc reward Differential Revision: https://reviews.llvm.org/D134664	2022-09-26 15:34:29 -07:00
Craig Topper	afdd600a49	[LegalizeTypes][RISCV] Support f16 in ExpandIntRes_LLROUND_LLRINT. Promote f16 to f32 and use the f32 libcall. I deleted rv64zfh-half-intrinsics-strict.ll because it only existed due to this issue breaking rv32. Differential Revision: https://reviews.llvm.org/D134579	2022-09-26 11:09:33 -07:00
Amaury Séchet	b30bbd181b	Small formating nit in DAGCombiner. NFC	2022-09-26 13:36:11 +00:00
Yeting Kuo	43c5fbdd3a	[VP][RISCV] Add vp.sqrt intrinsic and RISC-V support. The patch modeled vp.fabs patch D132793. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D133690	2022-09-26 10:47:40 +08:00
Yi Kong	32994b7357	Make MLIR model URLs cache variables This allows us to directly use the models published on Github. Differential Revision: https://reviews.llvm.org/D134566	2022-09-23 15:21:53 -07:00
Afanasyev Ivan	d2e434c378	[RegisterCoalescer] fix dst subreg replacement during remat copy trick Instructions might use definition register as its "undef" operand. It happens on architectures with predicated executon: ``` %0:subreg = instruction op_1, ..., op_N, undef %0:subreg, op_N+2, ... ``` RegisterCoalescer should take into account all remat instruction operands during destination subregister fixup. ``` ; remat result before fix: %1 = instruction op_1, ..., op_N, undef %1:subreg, op_N+2, ... ; remat result after fix (correct): %1 = instruction op_1, ..., op_N, undef %1, op_N+2, ... ``` Differential Revision: https://reviews.llvm.org/D125657	2022-09-23 18:52:29 +00:00
Philip Reames	b9c4733079	[DAG] Move one-use add of splat to base of scatter/gather This extends the uniform base transform used with scatter/gather to support one-use vector adds-of-splats with a non-zero base. This has the effect of essentially reassociating an add from vector to scalar domain. The motivation is to improve the lowering of scatter/gather operations fed by complex geps. Differential Revision: https://reviews.llvm.org/D134472	2022-09-22 18:45:12 -07:00
Eric Wang	83c53d346f	[NFC][MLGO] Introduce logRewardIfNeeded method This patch introduces a logRewardIfNeeded method to reuse regallocscoring. Differential Revision: https://reviews.llvm.org/D134232	2022-09-22 19:22:32 -05:00
Philip Reames	60c91fd364	[RISCV] Disallow scale for scatter/gather RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely. This has two effects: * First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to. * Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.) The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics. Differential Revision: https://reviews.llvm.org/D134382	2022-09-22 15:31:26 -07:00
Craig Topper	9d236d4dab	[SelectionDAGBuilder] Simplify how VTs is created for constrained intrinsics. NFC All constrained intrinsics return a single value. We can directly convert it to an EVT instead of going through ComputeValueTypes.	2022-09-22 14:21:22 -07:00
Philip Reames	46525fee81	[DAGCombine] Check both forms of a commutative transform The transform to fold an add into the base of a scatter/gather was only checking to see if the LHS was a splat. Included test change indicates that splats are not canonicalized to LHS, and that we need to check both sides.	2022-09-22 12:21:47 -07:00
Jonathan Camilleri	08288052ae	[DebugInfo] Emit access specifiers for typedefs The accessibility level of a typedef or using declaration in a struct or class was being lost when producing debug information. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D134339	2022-09-22 17:08:41 +00:00
Amara Emerson	885a87033c	[GlobalISel] Enforce G_ASSERT_ALIGN to have a valid alignment > 0.	2022-09-22 16:05:07 +01:00
Matt Arsenault	94ebd7d9ff	MachineVerifier: Verify REG_SEQUENCE Somehow there was no verification of this, other than an ad-hoc assertion in TwoAddressInstructions.	2022-09-22 09:51:15 -04:00
Christudasan Devadasan	32a8260ccc	-dot-machine-cfg for printing MachineFunction to a dot file This pass allows a user to dump a MIR function to a dot file and view it as a graph. It is targeted to provide a similar functionality as -dot-cfg pass on LLVM-IR. As of now the pass also support below flags: -dot-mcfg-only [optional][won't print instructions in the graph just block name] -mcfg-dot-filename-prefix [optional][prefix to add to output dot file] -mcfg-func-name [optional] [specify function name or it's substring, handy if mir file contains multiple functions and you need to see graph of just one] More flags and details can be introduced as per the requirements in future. This pass is inspired from -dot-cfg IR pass and APIs are written in almost identical format. Patch by Yashwant Singh <Yashwant.Singh@amd.com> (yassingh) Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D133709	2022-09-22 12:48:33 +05:30
Fangrui Song	2d975f1efe	[GlobalISel] Fix std::max after D134380	2022-09-21 14:09:04 -07:00
Amara Emerson	85cd376f70	[GlobalISel] Fix known bits for G_ASSERT_ALIGN. I don't know what was going on originally with these tests. It seems reasonable to have the immediate be the same byte alignment unit as the IR, in which case we need to take the log2 in order to set the right number of low bits. This fixes a miscompile in chromium. Differential Revision: https://reviews.llvm.org/D134380	2022-09-21 21:34:05 +01:00
Guozhi Wei	c39311eb40	[RegisterCoalescer] Use LiveRangeEdit to handle rematerialization This patch uses the API provided by LiveRangeEdit to handle rematerialization. It will make future maintenance and improvement more easier. No functional change. Differential Revision: https://reviews.llvm.org/D133610	2022-09-21 17:51:07 +00:00
Philip Reames	143f3bf8f4	[SDAG] Split handling of VPLoad/VPGather and VPStore/VPScatter [nfc] The merged routines are not-idiomatic, and the code sharing that results is prettty minimal. The confusion factor is not justified.	2022-09-21 09:06:02 -07:00
Thomas Symalla	c98a46fee6	[ISel] Enable generating more fma instructions. This patch changes a FADD / FMUL => FMA ISel pattern implemented in D80801 so that it peeks through more than one FMA. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D132837	2022-09-21 12:03:11 +02:00
Matt Arsenault	b9a371f6d1	AtomicExpand: Use correct pointer size for integer This was using the default address space.	2022-09-20 16:51:05 -04:00
Amara Emerson	78833a43e8	[GlobalISel][Legalizer] Fix lowerSelect() not sign-extending the mask value. I'm not sure why the SEXT_INREG was gated on a bitwidth check of the mask vs element size. This fixes a miscompile in chromium's skia library. Differential Revision: https://reviews.llvm.org/D134236	2022-09-20 16:40:34 +01:00
Matt Arsenault	34fb7803f8	GlobalISel: Pass through AssumptionCache	2022-09-19 19:10:51 -04:00
Matt Arsenault	bcb931c484	SelectionDAG: Add AssumptionCache analysis dependency Fixes compile time regression after `bb70b5d406`	2022-09-19 19:10:51 -04:00
Matt Arsenault	0d8ffcc532	Analysis: Add AssumptionCache argument to isDereferenceableAndAlignedPointer This does not try to pass it through from the end users.	2022-09-19 18:57:33 -04:00
Simon Pilgrim	8206044183	[DAG] SimplifyDemandedVectorElts - add MULHS/MULHU handling to existing MUL/AND handling Allows to determine known zero elements, which particularly helps simplification of DIV/REM by constant patterns	2022-09-19 12:44:43 +01:00
Simon Pilgrim	47cfe71027	[DAG] MatchRotate - reuse existing LHSShiftArg/RHSShiftArg variables. NFC.	2022-09-18 14:35:10 +01:00
Kai Nacke	ae35188f97	[GISel] Fix match tree emitter. The following changes are necessasy to get the generated tree matcher to compile: - In CodeExpansions::declare(), the assert() prevents connecting two instructions. E.g. the match code (match (MUL $t, $s1, $s2), (SUB $d, $t, $s3)), results in two declarations of $t, one for the def and one for the use. Removing the assertion allows this construct. If $t is later used, it is one of the operands, which should be perfectly fine. - The code emitted in GIMatchTreeVRegDefPartitioner::generatePartitionSelectorCode() is not compilable: - The value of NewInstrID should be emitted, not the name - Both calls involving getOperand() end with one parenthesis too many - Swaps generated condition for the partition code in the latter function It also changes the rules i2p_to_p2i, fabs_fabs_fold, and fneg_fneg_fold to use the tree matcher for a linear match. These rules are tested by: CodeGen/AArch64/GlobalISel/combine-fabs.mir CodeGen/AArch64/GlobalISel/combine-fneg.mir CodeGen/AArch64/GlobalISel/combine-ptrtoint.mir CodeGen/AMDGPU/GlobalISel/combine-add-nullptr.mir Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D133257	2022-09-18 00:00:15 +00:00
Aiden Grossman	e5e3dccd07	[mlgo] Add in-development instruction based features for regalloc advisor This patch adds in instruction based features to the regalloc advisor gated behind a flag so a user can decide at runtime whether or not they want to enable the feature. The features are only enabled when LLVM is compiled in MLGO develpment mode (LLVM_HAVE_TF_API) is set to true. To extract the instruction features, I'm taking a list of segments from each LiveInterval and noting the start and end SlotIndices. This list is then sorted based on the start SlotIndex and I iterate through each SlotIndex to grab instructions, making sure to check for overlaps. This results in a vector of opcodes and binary mapping matrix that maps live ranges to the opcodes of the instructions within that LR. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D131930	2022-09-17 19:54:45 +00:00
Kazu Hirata	20d764aff0	[llvm] Don't including SetVector.h (NFC) llvm/lib/ProfileData/RawMemProfReader.cpp uses SetVector without including SetVector.h, so this patch adds an appropriate #include there.	2022-09-17 12:36:43 -07:00
Vladislav Dzhidzhoev	6cf11f4462	[GlobalISel][DebugInfo] Salvage trivially dead instructions Use salvageDebugInfo for instructions erased as trivially dead in GlobalISel. It would be helpful to implement support of G_PTR_ADD and G_FRAME_INDEX in salvageDebugInfo in future in order to preserve more variable location. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D133986	2022-09-17 03:54:55 +03:00
Sotiris Apostolakis	b827e7c600	[SelectOpti] Restrict load sinking This is a follow-up to D133777, which resolved a use-after-free case but did not cover all possible memory bugs due to misplacement of loads. In short, the overall problem was that sinked loads could be moved after state-modifying instructions leading to memory bugs. The solution is to restrict load sinking unless it is found to be sound. i) Within a basic block (to-be-sinked load and select-user are in the same BB), loads can be sinked only if there is no intervening state-modifying instruction. This is a conservative approach to avoid resorting to alias analysis to detect potential memory overlap. ii) Across basic blocks, sinking of loads is avoided. This is because going over multiple basic blocks looking for memory conflicts could be computationally expensive and also unlikely to allow loads to sink. Further, experiments showed that not sinking these loads has a slight positive performance effect. Maybe for some of these loads, having some separation allows enough time for the load to be executed in time for its user. This is not the case for floating point operations that benefit more from sinking. The solution in D133777 was essentially undone in this patch, since the latter is a complete solution to the observed problem. Overall, the performance impact of this patch is minimal. Tested on two internal Google workloads with instrPGO. Search application showed <0.05% perf difference, while the database one showed a slight improvement, but not statistically significant. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D133999	2022-09-16 20:50:46 +00:00
Jessica Paquette	1076b31da8	[GlobalISel] Combine select + fcmp to fminnum/fmaxnum/fminimum/fmaximum This is a partial port of the code used by the SelectionDAGBuilder to translate selects. In particular, see matchSelectPattern in ValueTracking.cpp. This is a GISel-equivalent of the portion which handles fminnum/fmaxnum/fminimum/fmaximum. I tried to set it up so it'd be easy to add the non-FP cases. Those are simpler. On the AArch64-end, it seems like the FP cases are more important for perf right now, so I bit the bullet and went at the more complicated problem. :) I elected to do this as a post-legalize combine rather than in the IRTranslator because Deciding which fmax/fmin to use can depend on legalization rules Philosophically-speaking (TM), putting it in a combine just feels cleaner Being able to enable/disable the combine is handy Another option would be to use the ValueTracking code in the IRTranslator and match what SelectionDAGBuilder::visitSelect does. I think that may be somewhat annoying since we'd need to write lowerings back into the selects in the legalizer. I'm not strongly opposed to the approach. We'd also want to be careful with vector selects once that's implemented, which explicitly check if a vector select is legal on the target. That'd probably need a hook. From what I can tell, doing this as a combine is probably a cleaner option long-term. Differential Revision: https://reviews.llvm.org/D116702	2022-09-16 13:35:46 -07:00
Pengxuan Zheng	59365f33e2	[MachineCSE] Add a threshold to avoid spending too much time in isProfitableToCSE Currently, it can become extremely costly to compute MayIncreasePressure if the size of CSUses turns out to be very large. In that case, it's no longer cost effective to keep computing MayIncreasePressure. Therefore, to limit the amount of time spent in isProfitableToCSE, we simply conservatively assume MayIncreasePressure if the size of CSUses is too large. This can reduce overall compile time by 30% for some benchmarks. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134003	2022-09-16 13:35:17 -07:00
Craig Topper	1121eca685	[VP][VE] Default VP_SREM/UREM to Expand and add generic expansion using VP_SDIV/UDIV+VP_MUL+VP_SUB. I want to default all VP operations to Expand. These 2 were blocking because VE doesn't support them and the tests were expecting them to fail a specific way. Using Expand caused them to fail differently. Seemed better to emulate them using operations that are supported. @simoll mentioned on Discord that VE has some expansion downstream. Not sure if its done like this or in the VE target. Reviewed By: frasercrmck, efocht Differential Revision: https://reviews.llvm.org/D133514	2022-09-16 13:19:02 -07:00
David Majnemer	8a868d8859	Revert "Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView"" This reverts commit `cd20a18286` and adds a "let Heading" to NoStackProtectorDocs.	2022-09-16 19:39:48 +00:00
Stanislav Mekhanoshin	a0c8f5fefa	[SDAG] Print divergence in SDNode::dump If target does not support divergence the field is set to false and not printed. Differential Revision: https://reviews.llvm.org/D133984	2022-09-16 11:43:34 -07:00
Juan Manuel MARTINEZ CAAMAÑO	e438f2d821	[DAGCombine] Do not fold SRA/SRL of MUL into MULH when MUL's LSB are used, and MUL_LOHI is available Folding into a sra(mul) / srl(mul) into a mulh introduces an extra multiplication to compute the high half of the multiplication, while it is more profitable to compute the high and lower halfs with a single mul_lohi. Differential Revision: https://reviews.llvm.org/D133768	2022-09-16 15:48:36 +00:00
Liqiang Tao	2e37557fde	StackProtector: ensure stack checks are inserted before the tail call The IR stack protector pass should insert stack checks before the tail calls not only the musttail calls. So that the attributes `ssqreq` and `tail call`, which are emited by llvm-opt, could be both enabled by llvm-llc. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D133860	2022-09-16 22:24:46 +08:00
Florian Hahn	6b86b481e3	[AArch64] Use tbl for truncating vector FPtoUI conversions. On AArch64, doing the vector truncate separately after the fptoui conversion can be lowered more efficiently using tbl.4, building on D133495. https://alive2.llvm.org/ce/z/T538CC Depends on D133495 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133496	2022-09-16 14:57:43 +01:00
Florian Hahn	8491d01cc3	[AArch64] Lower vector trunc using tbl. Similar to using tbl to lower vector ZExts, tbl4 can be used to lower vector truncates. The initial version support i32->i8 conversions. Depends on D120571 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133495	2022-09-16 12:42:49 +01:00
Nikita Popov	b4309800e9	[CodeGen] Don't zero callee-save registers with zero-call-used-regs (PR57692) Callee save registers must be preserved, so -fzero-call-used-regs should not be zeroing them. The previous implementation only did not zero callee save registers that were saved&restored inside the function, but we need preserve all of them. Fixes https://github.com/llvm/llvm-project/issues/57692. Differential Revision: https://reviews.llvm.org/D133946	2022-09-16 11:52:29 +02:00
Florian Hahn	5871f18827	[AArch64] Lower extending uitofp using tbl. On AArch64, doing the zero-extend separately first can be lowered more efficiently using tbl, building on D120571. https://alive2.llvm.org/ce/z/8Je595 Depends on D120571 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133494	2022-09-16 10:20:25 +01:00
Yuta Mukai	116838b151	[MachinePipeliner] Fix the interpretation of the scheduling model The method of counting resource consumption is modified to be based on "Cycles" value when DFA is not used. The calculation of ResMII is modified to total "Cycles" and divide it by the number of units for each resource. Previously, ResMII was excessive because it was assumed that resources were consumed for the cycles of "Latency" value. The method of resource reservation is modified similarly. When a value of "Cycles" is larger than 1, the resource is considered to be consumed by 1 for cycles of its length from the scheduled cycle. To realize this, ResourceManager maintains a resource table for all slots. Previously, resource consumption was always 1 for 1 cycle regardless of the value of "Cycles" or "Latency". In addition, the number of micro operations per cycle is modified to be constrained by "IssueWidth". To disable the constraint, --pipeliner-force-issue-width=100 can be used. For the case of using DFA, the scheduling results are unchanged. Reviewed By: dpenry Differential Revision: https://reviews.llvm.org/D133572	2022-09-16 09:51:48 +09:00
Alexander Timofeev	fbdea5a2e9	[AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler. It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D133593	2022-09-15 22:03:56 +02:00
Florian Hahn	81a11da762	[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl. This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops using a wide shuffle creating a v64i8 vector, selecting groups of 3 zero elements and an element from the input. This is profitable on AArch64 where such shuffles can be lowered to tbl instructions, but only in loops, because it requires materializing 4 masks, which can be done in the loop preheader. This is the only reason the transform is part of CGP. If there's a better alternative I missed, please let me know. The same goes for the shouldReplaceZExtWithShuffle hook which guards this. I am not sure if this transform will be beneficial on other targets, but it seems like there is no way other convenient way. This improves the generated code for loops like the one below in combination with D96522. int foo(uint8_t p, int N) { unsigned long long sum = 0; for (int i = 0; i < N ; i++, p++) { unsigned int v = p; sum += (v < 127) ? v : 256 - v; } return sum; } https://clang.godbolt.org/z/Wco866MjY Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D120571	2022-09-15 19:18:13 +01:00
Sergei Barannikov	c6acb4eb0f	[SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s All in-tree targets pass pointer-sized ConstantSDNodes to the method. This overload reduced amount of boilerplate code a bit. This also makes getCALLSEQ_END consistent with getCALLSEQ_START, which already takes uint64_ts.	2022-09-15 14:02:12 -04:00
Matt Arsenault	63d1d37d35	RegAllocGreedy: Avoid overflowing priority bitfields The class priority is expected to be at most 5 bits before it starts clobbering bits used for other fields. Also clamp the instruction distance in case we have millions of instructions. AMDGPU was accidentally overflowing into the global priority bit in some cases. I think in principal we would have wanted this, but in the cases I've looked at, it had the counter intuitive effect and de-prioritized the large register tuple. Avoid using weird bit hack PPC uses for global priority. The AllocationPriority field is really 5 bits, and PPC was relying on overflowing this to 6-bits to forcibly set the global priority bit. Split this out as a separate flag to avoid having magic behavior for values above 31.	2022-09-15 10:38:40 -04:00
Stanislav Mekhanoshin	ef4b9c33f5	Fix crash while printing MMO target flags MachineMemOperand::print can dereference a NULL pointer if TII is not passed from the printMemOperand. This does not happen while dumping the DAG/MIR from llc but crashes the debugger if a dump() method is called from gdb. Differential Revision: https://reviews.llvm.org/D133903	2022-09-14 17:29:48 -07:00
Roland Froese	207228c1d6	[DAGCombiner] More load-store forwarding for big-endian Get some load-store forwarding cases for big-endian where a larger store covers a smaller load, and the offset would be 0 and handled on little-endian but on big-endian the offset is adjusted to be non-zero. The idea is just to shift the data to make it look like the offset 0 case. Differential Revision: https://reviews.llvm.org/D130115	2022-09-14 15:36:35 -04:00
Marco Elver	4627a30acf	[MIR] Support printing and parsing pcsections Adds support for printing and parsing PC sections metadata in MIR. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D133785	2022-09-14 10:30:25 +02:00
YongKang Zhu	5fa6b24354	Address feedback in https://reviews.llvm.org/D133637 https://reviews.llvm.org/D133637 fixes the problem where we should hash raw content of register mask instead of the pointer to it. Fix the same issue in `llvm::hash_value()`. Remove the added API `MachineOperand::getRegMaskSize()` to avoid potential confusion. Add an assert to emphasize that we probably should hash a machine operand iff it has associated machine function, but keep the fallback logic in the original change. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D133747	2022-09-13 16:12:41 -07:00
Matt Arsenault	d00b0aab31	RegisterCoalescer: Fix verifier error when merging copy of undef There's no real read of the register, so the copy introduced a new live value. Make sure we introduce a replacement implicit_def instead of just erasing the copy. Found from llvm-reduce since it tries to set undef on everything.	2022-09-13 18:40:28 -04:00
Sotiris Apostolakis	eda61fb656	[SelectOpti] Fix lifetime intrinsic bug When a select is converted to a branch and load instructions are sinked to the true/false blocks, lifetime intrinsics (if present) could be made unsound if not moved. This conservatively moves all lifetime intrinsics in a transformed BB to the end block to ensure preserved lifetime semantics. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D133777	2022-09-13 19:00:18 +00:00
Craig Topper	efd5acf120	[LegalizeTypes][NVPTX] Remove extra compare from fallback code for ISD::ADD in ExpandIntRes_ADDSUB. This is the ultimate fallback code if UADDO isn't supported. If the target uses 0/1 we used one compare, but if the target doesn't use 0/1 we emitted two compares. Regardless of boolean constants we should only need to check that the Result is less than one of the original operands. So we only need one compare. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D133708	2022-09-13 09:07:56 -07:00
Hendrik Greving	393a17b5d1	[ValueTypes] Define MVTs for v256i2/v128i4. Adds MVT::v256i2, MVT::v128i4. Differential Revision: https://reviews.llvm.org/D133603	2022-09-13 09:02:23 -07:00
Matt Arsenault	740f920a1f	LiveRegUnits: Break register loop when a clobber is encountered	2022-09-13 10:15:08 -04:00
Matt Arsenault	b7dae832e6	DeadMachineInstructionElim: Don't repeat per-function init This was happening for every iteration but only needs to be done once.	2022-09-13 08:19:54 -04:00
Matt Arsenault	243632c63e	LiveRegUnits: Cleanup isReg checks This is the common case and should be checked first. Provides a very marginal compile time improvement on the example I'm looking at.	2022-09-13 08:19:53 -04:00
Pavel Samolysov	02aaf8e3d6	[NFC][ScheduleDAGInstrs] Use structure bindings and emplace_back Some uses of std::make_pair and the std::pair's first/second members in the ScheduleDAGInstrs.[cpp\|h] files were replaced with using of the vector's emplace_back along with structure bindings from C++17.	2022-09-13 12:49:04 +03:00
Sylvestre Ledru	cd20a18286	Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView" Causing: https://github.com/llvm/llvm-project/issues/57709 This reverts commit `ab56719acd`.	2022-09-13 10:53:59 +02:00
David Green	f124e59b2e	[TypePromotionPass] Don't treat phi's as ToPromote This attempts to stop the type promotion pass transforming where it is not profitable, by not marking PhiNodes as ToPromote and being more aggressive about pulling extends out of loops. Differential Revision: https://reviews.llvm.org/D133203	2022-09-13 08:57:15 +01:00
Matt Arsenault	920b2e65fc	DeadMachineInstructionElim: Fix typo	2022-09-12 20:10:33 -04:00
Aiden Grossman	eec183c171	[nfc] Refactor SlotIndex::getInstrDistance to better reflect actual functionality This patch refactors SlotIndex::getInstrDistance to SlotIndex::getApproxInstrDistance to better describe the actual functionality of this function. This patch also adds in some additional comments better documenting the assumptions that this function makes to increase clarity. Based on discussion on the LLVM Discourse: https://discourse.llvm.org/t/odd-behavior-in-slotindex-getinstrdistance/64934/5 Reviewed By: mtrofin, foad Differential Revision: https://reviews.llvm.org/D133386	2022-09-12 23:33:35 +00:00
Amara Emerson	f24f469223	[GlobalISel] Fix crash when lowering G_SELECT of pointer vectors. The bit masking lowering only works for vectors of scalars, so for pointer element types we need to add some casting. Differential Revision: https://reviews.llvm.org/D133672	2022-09-13 00:01:37 +01:00
Matt Arsenault	d90f7cb559	LiveRegUnits: Do not use phys_regs_and_masks Somehow DeadMachineInstructionElim is about 3x slower when using it. Hopefully this reverses the compile time regression reported for `b5041527c7`.	2022-09-12 17:21:24 -04:00
David Majnemer	ab56719acd	[clang, llvm] Add __declspec(safebuffers), support it in CodeView __declspec(safebuffers) is equivalent to __attribute__((no_stack_protector)). This information is recorded in CodeView. While we are here, add support for strict_gs_check.	2022-09-12 21:15:34 +00:00
Kazu Hirata	9606608474	[llvm] Use x.empty() instead of llvm::empty(x) (NFC) I'm planning to deprecate and eventually remove llvm::empty. I thought about replacing llvm::empty(x) with std::empty(x), but it turns out that all uses can be converted to x.empty(). That is, no use requires the ability of std::empty to accept C arrays and std::initializer_list. Differential Revision: https://reviews.llvm.org/D133677	2022-09-12 13:34:35 -07:00
YongKang Zhu	481a32f587	Bug fix on stable hash calculation for machine operands RegisterMask and RegisterLiveOut MachineOperand::getRegMask() returns a pointer to register mask. We should hash the raw content of register mask instead of its pointer. Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D133637	2022-09-12 13:25:04 -07:00
Craig Topper	38ffa2bb96	[LegalizeTypes] Improve splitting for urem/udiv by constant for some constants. For remainder: If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer type, this urem will be expand by DAGCombiner using multiply by magic constant. We do have to take into account that adding high and low together can produce a carry, making it a (BitWidth / 2)+1 bit number. So we need to also add back in the carry from the first addition. For division: We can use the above trick to compute the remainder, subtract that remainder from the dividend, then multiply by the multiplicative inverse of the Divisor modulo (1 << BitWidth). This is based on the section "Remainder by Summing Digits" in Hacker's delight. The remainder trick is similar to a trick you may have learned for determining if a decimal number is divisible by 3. You can add all the digits together and see if the sum is divisible by 3. If you're not sure if the sum is divisible by 3, you can add its digits together. This can be repeated until you have a single decimal digit. If that digit is 3, 6, or 9, then the original number is divisible by 3. This works because 10 % 3 == 1. gcc already does this same trick. There are additional tricks gcc does urem as well as srem, udiv, and sdiv that I plan to add in future patches. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130862	2022-09-12 10:34:52 -07:00
Matthias Gehre	c1502425ba	Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth Also remove new-pass-manager version of ExpandLargeDivRem because there is no way yet to access TargetLowering in the new pass manager. Differential Revision: https://reviews.llvm.org/D133691	2022-09-12 17:06:16 +01:00
Jay Foad	210e6a993d	[GlobalISel] Simplify extended add/sub to add/sub with carry Simplify extended add/sub (with carry-in and carry-out) to add/sub with carry (with carry-out only) if carry-in is known to be zero. Differential Revision: https://reviews.llvm.org/D133702	2022-09-12 17:05:44 +01:00
Matt Arsenault	e30271169f	RegAllocGreedy: Try local instruction splitting with subranges This was only trying this to relax register class constraints, but this can also help if there are subranges involved. This solves a compilation failure for AMDGPU when there is high pressure created by large register tuples. If one virtual register is using most of the available budget, we need to be able to evict subranges. This solves the immediate failure, but this solution leaves a lot to be desired. In the relevant testcases, we have 32-element tuples but most of the uses are operations on 1 element subranges of it. What we're now getting is a spill and restore of the full 1024 bits and an extract of the used 32-bits. It would be far better if we introduced a copy to a new virtual register with a smaller register class and used narrower spills. Furthermore, we could probably do a better job if the allocator were to introduce new subranges where none previously existed in the highest pressure scenarios. The block and region splits should also try to split specific subranges out. The mve-vst3.ll test changes looks like noise to me, but instruction count increased by one. mve-vst4.ll looks like a solid improvement with several 16-byte spills eliminated. splitkit-copy-live-lanes.mir also shows a solid reduction in total spill count. This could use more tests but it's pretty tiring to come up with cases that fail on this.	2022-09-12 09:03:55 -04:00
Pavel Samolysov	f045d0c392	[NFC][ScheduleDAG] Use a reference to iterate over NodeSuccs/ChainSuccs	2022-09-12 15:54:48 +03:00
Pavel Samolysov	05946c144d	[NFC][ScheduleDAG] Use structure bindings and emplace_back Some uses of std::make_pair and the std::pair's first/second members in the ScheduleDAGRRList.cpp file were replaced with using of the vector's emplace_back along with structure bindings from C++17.	2022-09-12 15:54:48 +03:00
Matt Arsenault	c34679b2e8	DAG: Sink some getter code closer to uses	2022-09-12 08:38:35 -04:00
Matt Arsenault	bb70b5d406	CodeGen: Set MODereferenceable from isDereferenceableAndAlignedPointer Previously this was assuming piontsToConstantMemory implies dereferenceable.	2022-09-12 08:38:35 -04:00
Pavel Samolysov	354a3d9c02	[NFC][ScheduleDAG] Use Register and MCPhysReg instead of unsigned	2022-09-12 15:18:11 +03:00
Matt Arsenault	b5041527c7	DeadMachineInstructionElim: Switch to using LiveRegUnits Theoretically improves compile time for targets with many overlapping registers	2022-09-12 07:55:14 -04:00
Matt Arsenault	6c44a7179f	RegAlloc: Use SmallSet instead of std::set There shouldn't be more than a small handful of hints at most.	2022-09-12 07:55:10 -04:00
David Spickett	739b69e655	[LLVM][AArch64] Explain that X19 is used as the frame base pointer register Fixes #50098 LLVM uses X19 as the frame base pointer, if it needs to. Meaning you can get warnings if you clobber that with inline asm. However, it doesn't explain why. The frame base register is not part of the ABI so it's pretty confusing why you get that warning out of the blue. This adds a method to explain a reserved register with X19 as the first one. The logic is the same as getReservedRegs. I could have added a return parameter to isASMClobberable and friends but found that there's a lot of things that call isReservedReg in various ways. So while one more method on the pile isn't great design, it is simpler right now to do it this way and only pay the cost if you are actually using a reserved register. Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D133213	2022-09-12 09:18:09 +00:00
Kazu Hirata	af91e2b9db	[GlobalISel] Use std::initializer_list::size (NFC)	2022-09-11 12:19:37 -07:00
Manuel Brito	b51c6130ef	Use PoisonValue instead of UndefValue when RAUWing unreachable code [NFC] Replacing the following instances of UndefValue with PoisonValue, where the UndefValue is used as an arbitrary value: - llvm/lib/CodeGen/WinEHPrepare.cpp `demotePHIsOnFunclets`: RAUW arbitrary value for lingering uses of removed PHI nodes - llvm/lib/Transforms/Utils/BasicBlockUtils.cpp `FoldSingleEntryPHINodes`: Removes a self-referential single entry phi node. - llvm/lib/Transforms/Utils/CallGraphUpdater.cpp `finalize`: Remove all references to removed functions. - llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp `cleanup`: the result is not used then the inserted instructions are removed. - llvm/tools/bugpoint/CrashDebugger.cpp `TestInts`: the program is cloned and instructions are removed to narrow down source of crash. Differential Revision: https://reviews.llvm.org/D133640	2022-09-10 14:28:01 +01:00
Craig Topper	545affbf79	[DAGCombiner] Use HandleSDNode to keep node alive across call to getNegatedExpression. getNegatedExpression can delete nodes. If the first call to getNegatedExpression produced a node that the second call also manages to create, it might get deleted. Use a HandleSDNode to ensure it has a use to prevent it from being deleted. Fixes PR57658. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133602	2022-09-09 22:02:41 -07:00
Sebastian Neubauer	c7750c522e	Add helper func to get first non-alloca position The LLVM performance tips suggest that allocas should be placed at the beginning of the entry block. So far, llvm doesn’t provide any helper to find that position. Add BasicBlock::getFirstNonPHIOrDbgOrAlloca and IRBuilder::SetInsertPointPastAllocas(Function*) that get an insert position after the (static) allocas at the start of a function and use it in ShadowStackGCLowering. Differential Revision: https://reviews.llvm.org/D132554	2022-09-09 15:39:53 +02:00
Craig Topper	aa83bdd198	[DAGCombiner][X86] Fold (sub (subcarry X, 0, Carry), Y) -> (subcarry X, Y, Carry) Fixes PR57576. Differential Revision: https://reviews.llvm.org/D133471	2022-09-08 22:56:46 -07:00
Joe Loser	5e96cea1db	[llvm] Use std::size instead of llvm::array_lengthof LLVM contains a helpful function for getting the size of a C-style array: `llvm::array_lengthof`. This is useful prior to C++17, but not as helpful for C++17 or later: `std::size` already has support for C-style arrays. Change call sites to use `std::size` instead. Differential Revision: https://reviews.llvm.org/D133429	2022-09-08 09:01:53 -06:00
Eric Wang	d8a2d3f7d4	[NFC][Regalloc] Introduce the RegAllocPriorityAdvisorAnalysis This patch introduces the priority analysis and the priority advisor, the default implementation, and the scaffolding for introducing the other implementations of the advisor. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D132835	2022-09-08 07:50:03 -07:00
Arthur Eubanks	7e0a52e8e9	[NFC][MachineFunctionPass] Only lookup pass name if we request printing Should report the small compile time regression reported in D133055.	2022-09-07 21:38:00 -07:00
Marco Elver	343700358f	[AsmPrinter] Emit PCs into requested PCSections Interpret MD_pcsections in AsmPrinter emitting the requested metadata to the associated sections. Functions and normal instructions are handled. Differential Revision: https://reviews.llvm.org/D130879	2022-09-07 11:36:02 +02:00
Marco Elver	31a548021b	[GlobalISel] Propagate PCSections metadata to MachineInstr Propagate (most) PC sections metadata to MachineInstr when GlobalISel is doing instruction selection. This change results in support for architectures using GlobalISel (such as -O0 with AArch64). Not all instructions may be supported yet, and requires further target-specific handling (such as done for AArch64 pseudo-atomics). Expanding supported instructions is planned on a case-by-case basis and new use cases for PC sections metadata. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130886	2022-09-07 11:36:02 +02:00
Marco Elver	f0d6709e4a	[AtomicExpandPass] Always copy pcsections Metadata to expanded atomics When expanding IR atomics to target-specific atomics, copy all !pcsections Metadata to expanded atomics automatically. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130885	2022-09-07 11:36:01 +02:00
Marco Elver	0ba8886af5	[FastISel] Propagate PCSections metadata to MachineInstr Propagate PC sections metadata to MachineInstr when FastISel is doing instruction selection. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130884	2022-09-07 11:36:01 +02:00
Marco Elver	4c58b00801	[SelectionDAG] Propagate PCSections through SDNodes Add a new entry to SDNodeExtraInfo to propagate PCSections through SelectionDAG. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130882	2022-09-07 11:22:50 +02:00
Xiang1 Zhang	16743c9534	[CodeGen] Limit building time in CodeGenPrepare for huge function Details: Currently CodeGenPrepare is very time consuming in handling big functions. Old Algorithm : It iterate each BB in function, and go on handle very instructions in BB. Due to some instruction optimizations may affect the BBs' dominate tree. The old logic will re-iterate and try optimize for each BB. Suppose we have a big function with 20000 BBs, If we handled the last BB with fine tuning the dominate tree. We need totally re-iterate and try optimize the 20000 BBs from the beginning. The Complex is near N! And we really encounter somes big tests (> 20000 BBs) that cost more than 30 mins in this pass. (Debug version compiler will cost 2 hours here) What this patch do for huge function ? It mainly changes the iteration way for optimization. 1 We do optimizeBlock for each BB (that is same with old way). And, in the meaning time, If BB is changed/updated in the optimization, it will be put into FreshBBs (try do optimizeBlock again). The new created BB at previous iteration will also put into FreshBBs. 2 For the BBs which not updated at previous iteration, we directly skip it. Strictly speaking, here may miss some opportunity, but the probability is very small. 3 For Instructions in single BB, we do optimizeInst for each instruction. If optimizeInst change the instruction dominator in this BB, rather than break and go back to optimize the first BB (the old way), we directly iterate instructions (to do optimizeInst) in this updated BB again (the new way). What this patch do for small/normal (not huge) function ? It is same with the Old Algorithm. (NFC) Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D129352	2022-09-07 10:05:40 +08:00
Markus Böck	f049b2c3fc	[MC] Emit Stackmaps before debug info This patch is essentially an alternative to https://reviews.llvm.org/D75836 and was mentioned by @lhames in a comment. The gist of the issue is that Mach-O has restrictions on which kind of sections are allowed after debug info has been emitted, which is also properly asserted within LLVM. Problem is that stack maps are currently emitted as one of the last sections in each target-specific AsmPrinter so far, which would cause the assertion to trigger. The current approach of special casing for the `__LLVM_STACKMAPS` section is not viable either, as downstream users can overwrite the stackmap format using plugins, which may want to use different sections. This patch fixes the issue by emitting the stack map earlier, right before debug info is emitted. The way this is implemented is by taking the choice when to emit the StackMap away from the target AsmPrinter and doing so in the base class. The only disadvantage of this approach is that the `StackMaps` member is now part of the base class, even for targets that do not support them. This is functionaly not a problem however, as emitting an empty `StackMaps` is a no-op. Differential Revision: https://reviews.llvm.org/D132708	2022-09-06 20:20:56 +02:00
Amara Emerson	fe7c3b87ce	Add parantheses to silence warning.	2022-09-06 15:36:19 +01:00
Marco Elver	7d63983c65	[SelectionDAG] Properly copy ExtraInfo on RAUW During SelectionDAG legalization SDNodes with associated extra info may be replaced with a new SDNode. Preserve associated extra info on ReplaceAllUsesWith and remove entries in DeallocateNode. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130881	2022-09-06 16:32:50 +02:00
Marco Elver	cc3faf4226	[SelectionDAG] Rename CallSiteDbgInfo to NodeExtraInfo For information infrequently attached to SDNodes, it is useful to provide a way to add this information out-of-line. This is already done for call-site specific information. Rename CallSiteDbgInfo to NodeExtraInfo in preparation of adding additional information not necessarily related to call sites only. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130880	2022-09-06 16:32:50 +02:00
Matthias Gehre	2090e85fee	[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64 This adds the ExpandLargeDivRem to the default pass pipeline. The limit at which it expands div/rem instructions is configured via a new TargetTransformInfo hook (default: no expansion) X86, Arm and AArch64 backends implement this hook to expand div/rem instructions with more than 128 bits. Differential Revision: https://reviews.llvm.org/D130076	2022-09-06 15:32:04 +01:00
Marco Elver	42836e283f	[MachineInstr] Allow setting PCSections in ExtraInfo Provide MachineInstr::setPCSection(), to propagate relevant metadata through the backend. Use ExtraInfo to store the metadata. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130876	2022-09-06 15:52:44 +02:00
Amara Emerson	3dd861818a	[GlobalISel] Combine G_INSERT/EXTRACT_VECTOR_ELT with out of bounds indices to undef. Differential Revision: https://reviews.llvm.org/D133309	2022-09-06 13:45:04 +01:00
Benjamin Kramer	c349d7f4ff	[SelectionDAG] Rewrite bfloat16 softening to use the "half promotion" path The main difference is that this preserves intermediate rounding steps, which the other route doesn't. This aligns bfloat16 more with half floats, which use this path on most targets. I didn't understand what the difference was between these softening approaches when I first added bfloat lowerings, would be nice if we only had one of them. Based on @pengfei 's D131502 Differential Revision: https://reviews.llvm.org/D133207	2022-09-06 11:54:34 +02:00
Daniil Fukalov	51d33afcbe	[RegisterCoalescer] Fix crash on early clobbered subreg operands. The issue was with processing two subregs of the same reg are used in the same instruction (e.g. inline asm): "def early-clobber" and other just "def". Register coalescer ran in bad recursion if the early clobbered subreg is second in the following sequence of COPYs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127136	2022-09-06 08:42:37 +03:00
Daniil Fukalov	99d364d1f4	[MachineVerifier] Fix crash on early clobbered subreg operands. MachineVerifier tried to checkLivenessAtDef() ignoring it is actually a subreg. The issue was with processing two subregs of the same reg are used in the same instruction (e.g. inline asm): "def early-clobber" and other just "def". Reviewed By: foad Differential Revision: https://reviews.llvm.org/D126661	2022-09-05 17:08:21 +03:00
David Sherwood	ffa6267300	[CodeGen] Support extracting fixed-length vectors from illegal scalable vectors For some indices we can simply extract the fixed-length subvector from the low half of the scalable vector, for example when the index is less than the minimum number of elements in the low half. For all other cases we can expand the operation through the stack by storing out the vector and reloading the fixed-length part we need. Fixes https://github.com/llvm/llvm-project/issues/55412 Tests added here: CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll Differential Revision: https://reviews.llvm.org/D117499	2022-09-05 15:05:14 +01:00
Amara Emerson	fb60e50c78	[GlobalISel] Fix a combine crash due to a negative G_INSERT_VECTOR_ELT idx. These should really be folded away to undef but we shouldn't crash in any case.	2022-09-05 12:10:17 +01:00
Simon Pilgrim	4e6783f866	[DAG] getFreeze()/getNode() - account for operand depth when calling isGuaranteedNotToBeUndefOrPoison (PR57554) Similar to #57402 - we were calling isGuaranteedNotToBeUndefOrPoison on the freeze operand (with Depth = 0), but wasn't accounting for the fact that a later isGuaranteedNotToBeUndefOrPoison assertion will call from the new node (with Depth = 0 as well) - which will then recursively call isGuaranteedNotToBeUndefOrPoison for its operands with Depth = 1 Fixes #57554	2022-09-05 11:46:46 +01:00
Craig Topper	e529c0a2a0	[TargetLowering] Use ComputeMaxSignificantBits instead of ComputeNumSignBits in expandMUL_LOHI. NFC The way ComputeNumSignBits was being used was only correct if OuterBitSize is exactly 2x InnerBitSize. Which is always true, but not obviously so. Comparing ComputeMaxSignificantBits to InnerBitSize feels more correct.	2022-09-04 22:35:16 -07:00
Craig Topper	45e2809f71	[TargetLowering] Use getShiftAmountConstant. NFC	2022-09-04 20:22:32 -07:00
Kazu Hirata	32aa35b504	Drop empty string literals from static_assert (NFC) Identified with modernize-unary-static-assert.	2022-09-03 11:17:47 -07:00
Kazu Hirata	fedc59734a	[llvm] Use range-based for loops (NFC)	2022-09-03 11:17:40 -07:00
Kazu Hirata	89f1433225	Use llvm::lower_bound (NFC)	2022-09-03 11:17:37 -07:00
Kazu Hirata	bc96b36a41	[CodeGen] Use std::lcm (NFC)	2022-09-03 11:17:33 -07:00
Simon Pilgrim	62cdfdab4d	[DAG] canCreateUndefOrPoison - add freeze(insert_subvector(x,y,c)) -> insert_subvector(freeze(x),freeze(y),c) support We already have plenty of assertions in place to ensure that the insertion index is constant and inrange	2022-09-03 13:41:33 +01:00
Simon Pilgrim	e2d140e9c3	[TTI] Add isExpensiveToSpeculativelyExecute wrapper CGP uses a raw `getInstructionCost(I, TargetTransformInfo::TCK_SizeAndLatency) >= TCC_Expensive` check to see if its better to move an expensive instruction used in a select behind a branch instead. This is causing issues with upcoming improvements to TCK_SizeAndLatency costs on X86 as we need to use TCK_SizeAndLatency as an uop count (so its compatible with various target-specific buffer sizes - see D132288), but we can have instructions that have a low TCK_SizeAndLatency value but should still be treated as 'expensive' (FDIV for example) - by adding a isExpensiveToSpeculativelyExecute wrapper we can keep the current behaviour but still add an x86 override in a future patch when the cost tables are updated to compensate.	2022-09-03 13:12:22 +01:00
Daniil Fukalov	b4e1b0e00d	[LiveIntervals] Split live intervals on any dead def Each dead def of the same virtual register is required to be split into multiple virtual registers with separate live intervals to avoid MachineVerifier error. Partially fixes https://github.com/llvm/llvm-project/issues/56050 and https://github.com/llvm/llvm-project/issues/56051 Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D130477	2022-09-02 20:00:22 +03:00
David Green	5073499b69	[TypePromotionPass] Rename variable to avoid name conflict. NFC	2022-09-02 12:35:15 +01:00
Fangrui Song	8d95fd7e56	[MachineFunctionPass] Support -filter-passes for -print-changed [MachineFunctionPass] Support -filter-passes for -print-changed -filter-passes specifies a `PassID` (a lower-case dashed-separated pass name, also used by -print-after, -stop-after, etc) instead of a CamelCasePass. `-filter-passes=CamelCaseNewPMPass` seems like a workaround for new PM passes before we can use lower-case dashed-separated pass names (as used by `-passes=`). Example: ``` # getPassName() is "IRTranslator". PassID is "irtranslator" llc -mtriple=aarch64 -print-changed -filter-passes=irtranslator < print-changed-machine.ll ``` Close https://github.com/llvm/llvm-project/issues/57453 Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D133055	2022-09-01 11:06:06 -07:00
Nikita Popov	5134bd432f	[DwarfEhPrepare] Assign dummy debug location for inserted _Unwind_Resume calls (PR57469) DwarfEhPrepare inserts calls to _Unwind_Resume into landing pads. If _Unwind_Resume happens to be defined in the same module and debug info is used, then this leads to a verifier error: inlinable function call in a function with debug info must have a !dbg location call void @_Unwind_Resume(ptr %exn.obj) #0 Fix this by assigning a dummy location to the call. (As this happens in the backend, inlining is not actually relevant here.) Fixes https://github.com/llvm/llvm-project/issues/57469. Differential Revision: https://reviews.llvm.org/D133095	2022-09-01 16:35:49 +02:00
Nikita Popov	c635ea5c50	[CombinerHelper] Avoid deprecated method (NFC)	2022-09-01 16:09:05 +02:00
Stephen Tozer	211efaa1ce	Reapply "[DebugInfo] Extend the InstrRef LDV to support DbgValues with many Ops" Re-landing with an erroneous assert removed. This reverts commit `58d104b352`.	2022-09-01 14:20:24 +01:00
Amara Emerson	4cf3db41da	[GlobalISel] Add sdiv exact (X, constant) -> mul combine. This port of the SDAG optimization is only for exact sdiv case. Differential Revision: https://reviews.llvm.org/D130517	2022-09-01 13:34:00 +01:00
Craig Topper	77dbc5200b	[MachineCSE] Use TargetInstrInfo::isAsCheapAsAMove in isPRECandidate. Some targets like RISC-V require operands to be inspected to determine if an instruction is similar to a move. Spotted while investigating code differences between using an ADDI vs an ADDIW. RISC-V has the isAsCheapAsAMove flag for ADDI, but the TII hook checks the immediate is 0 or the register is X0. ADDIW is never generated with X0 or with an immediate of 0 so it doesn't have the isAsCheapAsAMove flag. I don't know enough about the PRE code to write a test for this yet. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132981	2022-08-31 15:39:41 -07:00
Sam Clegg	c5c4ba37b1	[WebAssembly][MC] Avoid the need for .size directives for functions Warn if `.size` is specified for a function symbol. The size of a function symbol is determined solely by its content. I noticed this simplification was possible while debugging #57427, but this change doesn't fix that specific issue. Differential Revision: https://reviews.llvm.org/D132929	2022-08-31 14:28:56 -07:00
Nick Desaulniers	d7474bef77	[llvm][TailDuplicator] don't taildup isInlineAsmBrIndirectTargets This fixes a crash observed after https://reviews.llvm.org/D129997. Similar to D88823. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D130127	2022-08-31 13:07:10 -07:00
Simon Pilgrim	eaede4b5b7	[DAG] extractShiftForRotate - replace assertion for shift opcode with an early-out We feed the result from the first extractShiftForRotate call into the second, and that result might no longer be a shift op (usually due to constant folding). NOTE: We REALLY need to stop creating nodes on the fly inside extractShiftForRotate! Fixes Issue #57474	2022-08-31 15:50:48 +01:00
Simon Pilgrim	9d22800275	[DAG] visitFreeze - account for operand depth when calling isGuaranteedNotToBeUndefOrPoison (PR57402) We were calling isGuaranteedNotToBeUndefOrPoison on operands (with Depth = 0), but wasn't accounting for the fact that a later isGuaranteedNotToBeUndefOrPoison assertion will call from the new node (with Depth = 0 as well) - which will then recursively call isGuaranteedNotToBeUndefOrPoison for its operands with Depth = 1 Fixes #57402	2022-08-31 12:20:30 +01:00
Kai Luo	ad2f7fd286	[AtomicExpand] Make floating point conversion happens before fence insertion IIUC, the conversion part is not part of atomic operations and fences should be put around converted atomic operations. This also fixes atomic load of floating point values which requires fence on PowerPC. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D127609	2022-08-31 09:54:58 +08:00
Markus Böck	2fdf963daf	[GlobalISel] Explicitly fail trying to translate `gc.statepoint` and related intrinsics The provided testcase would previously fail with an assertion due to later down below trying to allocate registers for `token` return types and arguments. This is especially problematic as the process would then exit instead of falling back to using FastIsel. This patch fixes that by simply explicitly failing translation if either of these intrinsics are encountered. Fixes https://github.com/llvm/llvm-project/issues/57349 Differential Revision: https://reviews.llvm.org/D132974	2022-08-31 00:47:17 +02:00
David Penry	9aca7b0217	[ModuloScheduler] Fix missing LLVM_DEBUG Guard a debug message with LLVM_DEBUG Differential Revision: https://reviews.llvm.org/D132895	2022-08-30 09:20:37 -07:00
Tomas Matheson	9a390d6692	[AArch64][GISel] fix G_ADD/G_SUB legalization widenScalarDst updates the insert point to after MI, so widenScalarSrc must be called before widenScalarDst. Otherwise The updated Src values will appear after MI and break SSA. e.g.: %14:_(s64), %15:_(s1) = G_UADDE %9:_, %11:_, %13:_ becomes %14:_(s64), %16:_(s32) = G_UADDE %9:_, %11:_, %17:_ %15:_(s1) = G_TRUNC %16:_(s32) %17:_(s32) = G_ZEXT %13:_(s1) Differential Revision: https://reviews.llvm.org/D132547 Change-Id: Ie3458747a6879433f4d5ab9939d2bd102dd0f2db	2022-08-30 10:59:32 +01:00
Xiang1 Zhang	a808ac2e42	[NFC] Clang-format for CodeGenPrepare.cpp	2022-08-30 13:42:36 +08:00
wanglian	e2bb9774b1	[LegalizeTypes] Support widen result for VECTOR_REVERSE. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D132359	2022-08-30 10:01:26 +08:00
Craig Topper	2f811a6c7f	[VP][RISCV] Add vp.fabs intrinsic and RISC-V support. Mostly just modeled after vp.fneg except there is a "functional instruction" for fneg while fabs is always an intrinsic. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D132793	2022-08-29 09:32:06 -07:00
Kazu Hirata	267f21a21b	Use std::gcd (NFC) This patch replaces calls to greatestCommonDivisor with std::gcd where two arguments are of the same type. This means that std::common_type_t of the argument type is the same as the argument type. We could drop calls to std::abs in some cases, but that's left for another patch.	2022-08-28 10:41:51 -07:00
Kazu Hirata	d1688e9ddf	[llvm] Use std::gcd (NFC) This patch replaces calls to greatestCommonDivisor with std::gcd where both arguments are known to be of unsigned. This means that std::common_type_t of the two argument types should just be the wider one of the two.	2022-08-27 23:54:29 -07:00
Kazu Hirata	9d6ab7230b	[GlobalISel] Use std::lcm (NFC) This patch replaces getLCMSize with std::lcm, a C++17 feature. Note that all the arguments are of unsigned with no implicit type conversion as they are passed to getLCMSize.	2022-08-27 09:53:16 -07:00
Kazu Hirata	21de2888a4	Use llvm::is_contained (NFC)	2022-08-27 09:53:11 -07:00
Matthias Gehre	3e39b27101	[llvm/CodeGen] Add ExpandLargeDivRem pass Adds a pass ExpandLargeDivRem to expand div/rem instructions with more than 128 bits into a loop computing that value. As discussed on https://reviews.llvm.org/D120327, this approach has the advantage that it is independent of the runtime library. This also helps the clang driver, which otherwise would need to understand enough about the runtime library to know whether to allow _BitInts with more than 128 bits. Targets are still free to disable this pass and instead provide a faster implementation in a runtime library. Fixes https://github.com/llvm/llvm-project/issues/44994 Differential Revision: https://reviews.llvm.org/D126644	2022-08-26 11:55:15 +01:00
Simon Pilgrim	88c7b16bed	[DAG] Strip poison generating flags in freeze(op()) -> op(freeze()) fold This patch follows the InstCombine approach of stripping poison generating flags (nsw/nuw from add/sub etc.) to allow us to push a freeze() through the op. Unlike InstCombine it doesn't retain any flags, but we have plenty of DAG folds that do the same thing already. We assert that the newly generated op isGuaranteedNotToBeUndefOrPoison. Similar to the ValueTracking approach, isGuaranteedNotToBeUndefOrPoison has been updated to confirm that if an op can't create undef/poison and its operands are guaranteed not to be undef/poison - then its not undef/poison. This is just for the generic opcodes - target specific opcodes will need to do this manually just in case they have some special cases. Differential Revision: https://reviews.llvm.org/D132333	2022-08-26 11:47:51 +01:00
Matthias Gehre	6d13b80fcb	Revert "[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers" This reverts https://reviews.llvm.org/D120329. I abandoned the PR [0] to add __divei4 functions to compiler-rt in favor of adding a pass to transform div/rem [1]. This removes the backend code that was supposed to emit calls to the __divei4 functions. [0] https://reviews.llvm.org/D120327 [1] https://reviews.llvm.org/D130076 Differential Revision: https://reviews.llvm.org/D130079	2022-08-26 10:52:56 +01:00
Alex Richardson	0483b00875	Mark the $local function begin symbol as a function While this does not matter for most targets, when building for Arm Morello, we have to mark the symbol as a function and add size information, so that LLD can correctly evaluate relocations against the local symbol. Since Morello is an out-of-tree target, I tried to reproduce this with in-tree backends and with the previous reviews applied this results in a noticeable difference when targeting Thumb. Background: Morello uses a method similar Thumb where the encoding mode is specified in the LSB of the symbol. If we don't mark the target as a function, the relocation will not have the LSB set and calls will end up using the wrong encoding mode (which will almost certainly crash). Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D131429	2022-08-26 09:34:04 +00:00
wanglian	2887d7786f	[DAGCombiner] Use FoldConstantArithmetic instead of dyn_cast in visitFP_ROUND. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D132439	2022-08-25 11:29:05 +08:00
Matthias Braun	5364f49407	Fix CSR update check D132080 introduced a bug leading to `RegisterClassInfo` caches not getting invalidated when there was exactly one more CSR register added. Differential Revision: https://reviews.llvm.org/D132606	2022-08-24 18:09:49 -07:00
Sami Tolvanen	cff5bef948	KCFI sanitizer The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls. Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory. KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality. A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code: ``` .c: int f(void); int (*p)(void) = f; p(); .s: .4byte __kcfi_typeid_f .global f f: ... ``` Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. See the comments in `X86AsmPrinter::emitKCFITypeId` for details. As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute. Relands `67504c9549` with a fix for 32-bit builds. Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay Differential Revision: https://reviews.llvm.org/D119296	2022-08-24 22:41:38 +00:00
Sami Tolvanen	a79060e275	Revert "KCFI sanitizer" This reverts commit `67504c9549` as using PointerEmbeddedInt to store 32 bits breaks 32-bit arm builds.	2022-08-24 19:30:13 +00:00
Sami Tolvanen	67504c9549	KCFI sanitizer The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls. Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory. KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality. A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code: ``` .c: int f(void); int (*p)(void) = f; p(); .s: .4byte __kcfi_typeid_f .global f f: ... ``` Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. See the comments in `X86AsmPrinter::emitKCFITypeId` for details. As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute. Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay Differential Revision: https://reviews.llvm.org/D119296	2022-08-24 18:52:42 +00:00
spupyrev	8d5b694da1	extending code layout alg The diff modifies ext-tsp code layout algorithm in the following ways: (i) fixes merging of cold block chains (this is a port of D129397); (ii) adjusts the cost model utilized for optimization; (iii) adjusts some APIs so that the implementation can be used in BOLT; this is a prerequisite for D129895. The only non-trivial change is (ii). Here we introduce different weights for conditional and unconditional branches in the cost model. Based on the new model it is slightly more important to increase the number of "fall-through unconditional" jumps, which makes sense, as placing two blocks with an unconditional jump next to each other reduces the number of jump instructions in the generated code. Experimentally, this makes a mild impact on the performance; I've seen up to 0.2%-0.3% perf win on some benchmarks. Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D129893	2022-08-24 09:40:25 -07:00
Simon Pilgrim	f9de13232f	[X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis. For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling. Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU. Differential Revision: https://reviews.llvm.org/D132520	2022-08-24 17:28:18 +01:00
Stephen Tozer	58d104b352	Revert "[DebugInfo] Extend the InstrRef LDV to support DbgValues with many Ops" Reverting due to reported errors when running Linux kernel builds with KMSAN -gdwarf-4. This reverts commit `2cb9e1ac42`.	2022-08-24 15:24:32 +01:00
Simon Pilgrim	5377abcde2	[DAG] matchRotateHalf - constify SelectionDAG arg. NFC. Based off Issue #57283 - we need to try harder to ensure we're not creating nodes on-the-fly - so make sure we're just using SelectionDAG for analysis where possible	2022-08-24 10:57:38 +01:00

... 3 4 5 6 7 ...

33283 Commits