llvm-project

Commit Graph

Author	SHA1	Message	Date
Mikael Holmen	e7b169a8ae	[AMDGPU] Fix gcc warnings about unused variables [NFC]	2021-09-23 08:08:00 +02:00
Johannes Doerfert	c6457dcae8	[OpenMP][FIX] Be more deliberate about invalidating the AAKernelInfo state This patch fixes a problem when the AAKernelInfo state was invalidated, e.g., due to `optnone` for a kernel, but not all parts indicated the invalidation properly. We further eliminate most full state invalidations as they should never be necessary. Differential Revision: https://reviews.llvm.org/D109468	2021-09-23 00:04:30 -05:00
Johannes Doerfert	0a16c56010	[OpenMP][NFC] Improve debug output	2021-09-23 00:04:29 -05:00
Usman Nadeem	3b12282b0e	[AArch64][SVE][InstCombine] Eliminate redundant chains of tuple get/set Differential Revision: https://reviews.llvm.org/D109667 Change-Id: I06a3c28e3658ecda109a3a1b73265828274ab2ea	2021-09-22 20:59:46 -07:00
Wang, Pengfei	ebec077e07	[X86][FP16] Change the order of the operands in complex FMA intrinsics to allow swap between the mul operands. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D109658	2021-09-23 11:02:48 +08:00
Freddy Ye	13207a21a6	[NFC] Remove redundant setOperationAction. [FROUND,FROUNDEVEN][f32, f64, f128] are set Expand twice. Differential Revision: https://reviews.llvm.org/D110302	2021-09-23 10:28:21 +08:00
hyeongyu kim	10a5632550	[NFC][InstCombine] Fix inconsistent comments	2021-09-23 09:31:39 +09:00
Zhi An Ng	1552179ac0	[WebAssembly] Add relaxed-simd feature This currently only defines a constant, but it the future will be used to gate builtins for experimenting and prototyping relaxed-simd proposal (https://github.com/WebAssembly/relaxed-simd/). Differential Revision: https://reviews.llvm.org/D110111	2021-09-22 14:52:50 -07:00
Craig Topper	f0a422f935	[RISCV] Add fcvt.s.w(u)/fcvt.d.w(u)/fcvt.h.w(u) to hasAllNBitUsers These instructions only read the lower 32 bits of their input.	2021-09-22 14:24:26 -07:00
Shilei Tian	423d34f74a	[OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit` This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110279	2021-09-22 17:16:41 -04:00
Sanjay Patel	1cd6b44f26	[InstCombine] add one-use check to shift-shift transform We don't want to create extra instructions, and this could infinite loop with the proposed transform in D110170.	2021-09-22 16:31:12 -04:00
Sanjay Patel	a85d7a56c7	[ValueTracking] fix isOnlyUsedInZeroEqualityComparison with no users This is another problem exposed by: https://bugs.llvm.org/PR50836	2021-09-22 15:01:53 -04:00
Sanjay Patel	b05804ab4c	[Analysis] reduce code for isOnlyUsedInZeroEqualityComparison; NFC There's a bug here noted by the FIXME and visible in variations of PR50836.	2021-09-22 14:57:53 -04:00
David Green	c49611f909	Mark CFG as preserved in TypePromotion and InterleaveAccess passes Neither of these passes modify the CFG, allowing us to preserve DomTree and LoopInfo across them by using setPreservesCFG. Differential Revision: https://reviews.llvm.org/D110161	2021-09-22 18:58:00 +01:00
Sanjay Patel	c240169ff2	[Analysis] improve function matching for strlen libcall The return type of strlen is size_t, not just any integer. This is a partial fix for an example based on: https://llvm.org/PR50836 There's another bug here because we can still crash processing a real strlen or something that looks like it.	2021-09-22 13:50:12 -04:00
Daniil Fukalov	1a7b7d7ba2	[NFCI][CodeGen, AArch64] Fix inconsistent TargetCostKind types. The pass uses different cost kinds to estimate "old" and "interleaved" costs: default cost kind for all targets override `getInterleavedMemoryOpCost()` is `TCK_SizeAndLatency`. Although at the moment estimated `TCK_Latency` costs are equal to `TCK_SizeAndLatency`, (so the change is NFC) it may change in future. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110100	2021-09-22 20:15:17 +03:00
Arthur Eubanks	e7249e4acf	[SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest When determining whether to fold branches to a common destination by merging two blocks, SimplifyCFG will count the number of instructions to be moved into the first basic block. However, there's no reason to count free instructions like bitcasts and other similar instructions. This resolves missed branch foldings with -fstrict-vtable-pointers in llvm-test-suite's lambda benchmark. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D108837	2021-09-22 09:52:37 -07:00
Craig Topper	b33a1cc05b	[RISCV] Optimize vp.store with an all ones mask to avoid a vmset. We can use riscv_vse intrinsic instead of riscv_vse_mask. The code here is based on similar code for handling masked.scatter and vp.scatter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D110206	2021-09-22 09:12:47 -07:00
Shilei Tian	b205b3300b	[NFC] clang-format -i llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp	2021-09-22 12:10:20 -04:00
Hongtao Yu	d9b511d8e8	[CSSPGO] Set PseudoProbeInserter as a default pass. Currenlty PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e, through the linker or llc), where user has always to pass -pseudo-probe-for-profiling explictly. I'm making the pass a default pass that requires no command line arg to trigger, but will be actually run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D110209	2021-09-22 09:09:48 -07:00
Kazu Hirata	3c557cd7f9	[CodeGen] Remove redundant declaration MIRCanonicalizerID (NFC) Note that MIRCanonicalizerID is declared in llvm/include/llvm/CodeGen/Passes.h, which MIRCanonicalizerPass.cpp includes. Identified with readability-redundant-declaration.	2021-09-22 08:58:27 -07:00
Simon Pilgrim	8a44281f47	[SLP] getReductionCost - use explicit TTI::TCK_RecipThroughput CostKind. NFCI. Avoid relying on the default cost kinds in TTI calls (we already do this in other places in SLP) - noticed while trying to see how much work it'd be to extend D110242 and remove all remaining uses of default CostKind arguments.	2021-09-22 16:52:22 +01:00
hyeongyu kim	98e96663f6	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (3/3) This patch is for fixing potential shufflevector-related bugs like D93818. As D93818, this patch change shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineVectorOps. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110230	2021-09-23 00:48:24 +09:00
Shilei Tian	ca999f7191	[OpenMP][Offloading] Use bitset to indicate execution mode instead of value The execution mode of a kernel is stored in a global variable, whose value means: - 0 - SPMD mode - 1 - indicates generic mode - 2 - SPMD mode execution with generic mode semantics We are going to add support for SIMD execution mode. It will be come with another execution mode, such as SIMD-generic mode. As a result, this value-based indicator is not flexible. This patch changes to bitset based solution to encode execution mode. Each position is: [0] - generic mode [1] - SPMD mode [2] - SIMD mode (will be added later) In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode execution with generic mode semantics. In the future after we add the support for SIMD mode, `0b1xx` will be in SIMD mode. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110029	2021-09-22 11:40:52 -04:00
hyeongyu kim	ec8311444a	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (2/3) This patch is for fixing potential shufflevector-related bugs like D93818. As D93818, this patch change shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineCompares and InstructionCombining. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110227	2021-09-23 00:14:50 +09:00
Simon Pilgrim	b1f38a27f0	[Target][CodeGen] Remove default CostKind arguments on inner/impl TTI overrides Based off a discussion on D110100, we should be avoiding default CostKinds whenever possible. This initial patch removes them from the 'inner' target implementation callbacks - these should only be used by the main TTI calls, so this should guarantee that we don't cause changes in CostKind by missing it in an inner call. This exposed a few missing arguments in getGEPCost and reduction cost calls that I've cleaned up. Differential Revision: https://reviews.llvm.org/D110242	2021-09-22 15:28:08 +01:00
hyeongyu kim	e5aaf03326	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (1/3) This patch is for fixing potential shufflevector-related bugs like D93818. As D93818, this patch change shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineCasts. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110226	2021-09-22 23:18:51 +09:00
Joseph Huber	1cf86df883	[OpenMP] Make sure the Thread ID function is not removed Summary: The thread ID function was reintroduced in D110195, but could potentially be removed by the optimizer. Make the function noinline to preserve the call sites and add it to the externalization RAII so its definition is not removed by the attributor.	2021-09-22 10:13:18 -04:00
Sander de Smalen	6375ca4059	[AArch64][SVE] Add extract_subvector patterns for unpacked fp16 and bfloat types. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D110163	2021-09-22 14:25:17 +01:00
Sander de Smalen	3e8d2008f7	[SelectionDAG] Remove PromoteIntOp_EXTRACT_SUBVECTOR. This code seems untested and is likely obsolete, because this case should already be handled by the code that legalizes the result type of EXTRACT_SUBVECTOR. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110061	2021-09-22 14:23:35 +01:00
Tim Northover	3a00e58c2f	AArch64: use indivisible cmpxchg for 128-bit atomic loads at O0 Like normal atomicrmw operations, at -O0 the simple register-allocator can insert spills into the LL/SC loop if it's expanded and visible when regalloc runs. This can cause the operation to never succeed by repeatedly clearing the monitor. Instead expand to a cmpxchg, which has a pseudo-instruction for -O0.	2021-09-22 14:20:43 +01:00
Sander de Smalen	d5681f1d68	[SelectionDAG] Add PromoteIntOp_INSERT_SUBVECTOR. This is required to codegen something like: <vscale x 8 x i16> @llvm.experimental.vector.insert(<vscale x 8 x i16> %vec, <vscale x 2 x i16> %subvec, i64 %idx) where the output vector is legal, but the input vector needs promoting. It implements this by performing the whole operation on the promoted type, and then truncating the result. Reviewed By: david-arm, craig.topper Differential Revision: https://reviews.llvm.org/D110059	2021-09-22 13:32:36 +01:00
Florian Hahn	a7c6471a85	[Passes] Run vector-combine early with -fenable-matrix. IR with matrix intrinsics is likely to also contain large vector operations, which can benefit from early simplifications. This is the last step in a series of changes to improve code-gen for code using matrix subscript operators with the C/C++ matrix extension in CLang, like using matrix_t = double __attribute__((matrix_type(15, 15))); void foo(unsigned i, matrix_t &A, matrix_t &B) { for (unsigned j = 0; j < 4; ++j) for (unsigned k = 0; k < i; k++) B[k][j] -= A[k][j] * B[i][j]; } https://clang.godbolt.org/z/6dKxK1Ed7 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D102496	2021-09-22 12:48:32 +01:00
Sanjay Patel	c6013f71a4	Revert "[InstCombine] fold cast of right-shift if high bits are not demanded" This reverts commit `2f6b07316f`. This caused several bots to hit an infinite loop at stage 2, so it needs to be reverted while figuring out how to fix that.	2021-09-22 07:45:21 -04:00
David Green	02cd8a6b91	[ARM] Allow smaller VMOVL in tail predicated loops This allows VMOVL in tail predicated loops so long as the the vector size the VMOVL is extending into is less than or equal to the size of the VCTP in the tail predicated loop. These cases represent a sign-extend-inreg (or zero-extend-inreg), which needn't block tail predication as in https://godbolt.org/z/hdTsEbx8Y. For this a vecsize has been added to the TSFlag bits of MVE instructions, which stores the size of the elements that the MVE instruction operates on. In the case of multiple size (such as a MVE_VMOVLs8bh that extends from i8 to i16, the largest size was be chosen). The sizes are encoded as 00 = i8, 01 = i16, 10 = i32 and 11 = i64, which often (but not always) comes from the instruction encoding directly. A unit test was added, and although only a subset of the vecsizes are currently used, the rest should be useful for other cases. Differential Revision: https://reviews.llvm.org/D109706	2021-09-22 12:07:52 +01:00
Yi Kong	d0746f2e9b	Don't fold (select C, (gep Ptr, Idx), Ptr) if C is vector but Idx is scalar The folding rule (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) creates a malformed SELECT IR if C is a vector while Idx is scalar. SELECT VecC, ScalarIdx, 0 We could splat Idx to a vector but it defeats the purpose of optimisation. Don't apply the folding rule in this case. This fixes a regression from commit `d561b6fbdb`.	2021-09-22 18:11:33 +08:00
Florian Mayer	36daf074d9	[hwasan] also omit safe mem[cpy|mov|set]. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D109816	2021-09-22 11:08:27 +01:00
Sander de Smalen	4ca1fbe361	[SelectionDAG] Make WidenVecRes_Convert work for scalable vectors. Most of the code wasn't yet scalable safe, although most of the code conceptually just works for scalable vectors. This change makes the algorithm work on ElementCount, where appropriate, and leaves the fixed-width only code to use `getFixedNumElements`. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D110058	2021-09-22 10:58:38 +01:00
Florian Hahn	300870a95c	[VectorCombine] Switch to using a worklist. This patch updates VectorCombine to use a worklist to allow iterative simplifications where a combine enables other combines. Suggested in D100302. The main use case at the moment is foldSingleElementStore and scalarizeLoadExtract working together to improve scalarization. Note that we now also do not run SimplifyInstructionsInBlock on the whole function if there have been changes. This means we fail to remove/simplify instructions not related to any of the vector combines. IMO this is fine, as simplifying the whole function seems more like a workaround for not tracking the changed instructions. Compile-time impact looks neutral: NewPM-O3: +0.02% NewPM-ReleaseThinLTO: -0.00% NewPM-ReleaseLTO-g: -0.02% http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions Reviewed By: spatel, lebedev.ri Differential Revision: https://reviews.llvm.org/D110171	2021-09-22 09:54:58 +01:00
Sander de Smalen	ab3607c0ed	[AArch64][SVE] Add missing load/store patterns for unpacked bfloat vectors. Reviewed By: c-rhodes Differential Revision: https://reviews.llvm.org/D110063	2021-09-22 09:45:33 +01:00
Jay Foad	0205806d0f	[AMDGPU] Convert mac/fmac to mad/fma when folding output modifiers Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac instruction, so we might as well convert it to the more flexible VOP3- only mad/fma form. With this change, the only way we should emit VOP3-encoded mac/fmac is if regalloc chooses registers that require the VOP3 encoding, e.g. sgprs for both src0 and src1. In all other cases the mac/fmac should either be converted to mad/fma or shrunk to VOP2 encoding. Differential Revision: https://reviews.llvm.org/D110156	2021-09-22 09:36:34 +01:00
Jay Foad	3828ea6181	[AMDGPU] Divergence-driven instruction selection for mul i32 Differential Revision: https://reviews.llvm.org/D109881	2021-09-22 09:36:34 +01:00
Florian Hahn	e08a5dc86f	[InstCombine] Move InstCombineWorklist to Utils to allow reuse (NFC). InstCombine's worklist can be re-used by other passes like VectorCombine. Move it to llvm/Transform/Utils and rename it to InstructionWorklist. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D110181	2021-09-22 08:47:21 +01:00
Matt Arsenault	ec55dcedce	AMDGPU: Refactor getWavesPerEU to separate flat workgroup size query Add an overload to pass the flat workgroup range in separately. This will allow the attributor to use the assumed value for amdgpu-flat-workgroup-sizes when inferring amdgpu-waves-per-eu.	2021-09-21 22:57:17 -04:00
Chen Zheng	ffa9fa9ed2	[PowerPC] prepare for udpate form with non-const increment. This is a follow-up of D105872. Now we are able to prepare for update form with non-const increment. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D106032	2021-09-22 02:54:28 +00:00
Wenlei He	5f187f0afa	[SamplePGO] Add switch to honor zero count on block level as accurate Add a new LLVM switch `-profile-sample-block-accurate` to trust zero block counts for branches. Currently we leave out such zero counts when annotating branch weight metadata, which would lead to weights being considered as unknown. Differential Revision: https://reviews.llvm.org/D110117	2021-09-21 17:06:37 -07:00
Usman Nadeem	645b8f5365	[AArch64][SVE] Add patterns to generate ADR instruction Differential Revision: https://reviews.llvm.org/D109665 Change-Id: I9d2928688b80b804a16f52928e2057749ec2c0b2	2021-09-21 15:50:49 -07:00
Arthur Eubanks	e42234383e	Make DiagnosticInfoResourceLimit's limit param required And always print it. This makes some LLVM diagnostics match up better with Clang's diagnostics. Updated some AMDGPU uses of DiagnosticInfoResourceLimit and now we print better diagnostics for those. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D110204	2021-09-21 15:27:58 -07:00
Kirill Stoimenov	2649999579	[asan] Fixed a bug causing a crash when redzone optimization kicked in on X86 with -asan-optimize-callbacks flag on. This change adds the ASan intrinsic to the list whihc are setting hasCopyImplyingStackAdjustment. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D110012	2021-09-21 22:26:03 +00:00
Craig Topper	b81e26c7f4	Recommit "[X86] Clear kill flags when rewriting SETCC uses in flag copy lowering." This time with the right bug number. When we rewrite the setcc we replace set old setcc output register with the new CondReg. But since CondReg can be shared by other replacements, we don't know if the kill flags for the old register are valid for CondReg. So be conservative and remove them. The test case has a SETCCr and a SETCCm on the same condition so they end up sharing the same CondReg. The SETCCr had one use with a kill flag. This kill flag isn't valid after the replacement because CondReg needs a live range extending to the later SETCCm replacment. Fixes PR51903.	2021-09-21 14:59:25 -07:00
Xu Mingjie	32ab405717	[LTO] Emit DebugLoc for dead function in optimization remarks Currently, the dead functions information getting from optimizations remarks does not contain debug location, but knowing where these dead functions locate could be useful for debugging or for detecting dead code. Cause in `LTO::addRegularLTO()` we use `BitcodeModule::getLazyModule()` to read the bitcode module, when we pass Function F to `ore::NV()`, F is not materialized, so `F->getSubprogram()` returns nullptr, and there is no debug location information of dead functions in optimizations remarks. This patch call `F->materialize()` before we pass Function F to `ore::NV()`, then debug location information will be emitted for dead functions in optimization remarks. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D109737	2021-09-21 14:50:21 -07:00
Craig Topper	51a82e051e	Revert "[X86] Clear kill flags when rewriting SETCC uses in flag copy lowering." This reverts commit `7550f146ff`. I botched the bug number.	2021-09-21 14:33:44 -07:00
Craig Topper	7550f146ff	[X86] Clear kill flags when rewriting SETCC uses in flag copy lowering. When we rewrite the setcc we replace set old setcc output register with the new CondReg. But since CondReg can be shared by other replacements, we don't know if the kill flags for the old register are valid for CondReg. So be conservative and remove them. The test case has a SETCCr and a SETCCm on the same condition so they end up sharing the same CondReg. The SETCCr had one use with a kill flag. This kill flag isn't valid after the replacement because CondReg needs a live range extending to the later SETCCm replacment. Fixes PR51908. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110046	2021-09-21 14:29:46 -07:00
George Burgess IV	cd5f582c3d	MemoryBuiltins: update comment; NFC This comment references behavior that was removed in `ccae43a247`, which is a commit from 5 years ago. It seems safe to assume that that behavior won't be coming back soon. If it does, we can readd this part of the comment :)	2021-09-21 13:47:26 -07:00
Sanjay Patel	2f6b07316f	[InstCombine] fold cast of right-shift if high bits are not demanded (masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C Narrowing the shift should be better for analysis and can lead to follow-on transforms as shown. Attempt at a general proof in Alive2: https://alive2.llvm.org/ce/z/tRnnSF Here are a couple of the specific tests: https://alive2.llvm.org/ce/z/bCnTp- https://alive2.llvm.org/ce/z/TfaHnb Differential Revision: https://reviews.llvm.org/D110170	2021-09-21 16:09:08 -04:00
Antonio Frighetto	43d6991c2a	[IR] Look through bitcast in hasFnAttribute() A logic incompleteness may lead MemorySSA to be too conservative in its results. Specifically, when dealing with a call of kind `call i32 bitcast (i1 (i1)* @test to i32 (i32)*)(i32 %1)`, where the function `test` is declared with readonly attribute, the bitcast is not looked through, obscuring function attributes. Hence, some methods of CallBase (e.g., doesNotReadMemory) could provide suboptimal results. Differential Revision: https://reviews.llvm.org/D109888	2021-09-21 21:57:02 +02:00
Nikita Popov	e4a1af3724	[MergeICmps] Remove unused NumMerged variable	2021-09-21 21:43:25 +02:00
Nikita Popov	f2fa6ad047	[MergeICmps] Don't reorder unmerged comparisons MergeICmps will currently sort (by offset) all comparisons in a chain, including those that do not get merged. This is problematic in two ways: * We may end up moving the original first block into the middle of the chain, in which case the "extra work" instructions will also be in the middle of the chain, resulting in invalid IR (reported in https://reviews.llvm.org/D108782#3005583). * Reordering branches is generally not legal, because it may introduce branch on poison, which is UB (PR51845). The merging done by MergeICmps is legal as long as we assume that memcmp() works on frozen memory, but the reordering of unmerged comparisons is definitely incorrect (without inserting freeze instructions), so we should avoid it. There are easier ways to fix the first issue, but I figured it was worthwhile to do this properly to also fix the second one. What we now do is to restore the original relative order of (potentially merged) comparisons. I took the liberty of dropping the MERGEICMPS_DOT_ON functionality, because it would be more awkward to implement now (as the before and after representation is different) and it doesn't seem terribly useful nowadays. Differential Revision: https://reviews.llvm.org/D110024	2021-09-21 21:22:12 +02:00
David Blaikie	49c519a848	DebugInfo: Rebuild decltype(nullptr) as 'std::nullptr_t' Now that Clang's been changed to render nullptr types/template parameters as 'std::nullptr_t' do the same thing down here. (Clang commit: `131e878664` )	2021-09-21 11:37:30 -07:00
Michael Liao	2d1ffad010	[IR] Re-group AAMDNodes relevant interfaces. NFC.	2021-09-21 14:29:33 -04:00
alex-t	1a33294652	[AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC Normally, given that the DA results are kept consistent over the selection DAG, uniform comparisons get selected to S_CMP_* but divergent to V_CMP_*. Sometimes, for the sake of efficiency, SSA subgraphs may be converted to VALU to avoid repeatedly copying data back and forth. Hence we have to be able to sustain the correctness passing the i1 from VALU to SALU context and vice versa. VALU operations only process the active lanes of the VGPR and ignore inactive ones. Active lanes correspond to 1 bit in the EXEC mask register. SALU represents i1 as just one bit but VALU as 64bits: 0/1 and 0/(0xffffffffffffffff & EXEC) respectively. SALU uses one-bit conditional flag SCC but VALU - VCC that is a pair of 32-bit SGPRs To expose SCC to the VALU context we need to convert the one-bit boolean value to the appropriate 64bit. To return back to the SALU context we need to do the opposite. To correctly convert 64bit VALU boolean to either 0 or 1 we need to filter out the bits corresponding to the inactive lanes. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D109900	2021-09-21 21:19:31 +03:00
Owen Anderson	b5fbbdd202	Teach InstCombine to eliminate malloc-realloc-free triplets. Reviewed By: majnemer Differential Revision: https://reviews.llvm.org/D109988	2021-09-21 18:07:49 +00:00
Brendon Cahoon	cbdf624bb8	[AMDGPU] Correctly merge alias.scope and noalias metadata for memops When adding alias.scope and noalias metadata to a memcpy function, the alias.scope and noalias metadata from the operands are merged. The rule for merging alias.scope is to take the intersection of the domains and the union of the scopes within those domains. The rule for merging noalias is to take the intersection. The bug is that AMDGPULowerModuleLDS was using concatenation for both alias.scope and noalias. For example, when f1 and f2 are added to the LDS structure and there is a memcpy(f2, f1, sizeof(f1)). Then, concatenation creates noalias metadata for the memcpy that includes both {f1, f2}. That means that the memcpy is assumed not to alias a prior load of f2, which enables the optimizer to remove a load of f2 that occurs after mempcy. The function MDNode::getmostGenericAliasScope defines the semantics for alias.scope. There is a function, combineMetadata in Local.cpp, that uses intersect for noalias. Differential Revision: https://reviews.llvm.org/D110049	2021-09-21 13:02:01 -05:00
Craig Topper	7c975665b4	[RISCV] Make some arrays of constants 'static const'. NFC This helps the compiler generate better code.	2021-09-21 10:52:47 -07:00
Danila Malyutin	78b51c7a2c	[LSR] Make sure that Factor fits into Base type Fixes pr42770 Differential Revision: https://reviews.llvm.org/D108772	2021-09-21 20:50:50 +03:00
Amy Kwan	2af57b6099	[PowerPC] Add prefix load pattern for fpext to v2f64 This patch adds a prefixed load pattern involving v2f32 fpext v2f64, where we are dealing with a value with an offset that fits into a 34-bit signed immediate. A reduced test case is also added to patch that tests the pattern, in which the pattern is tested in the big endian CHECKs of the newly added test. Differential Revision: https://reviews.llvm.org/D109887	2021-09-21 12:45:24 -05:00
Ayal Zaks	ab6a69dfea	[LV] Fix crash for reverse interleaved loads with gap under fold-tail. This patch fixes the crash found by PR51614: whenever doing tail folding, interleave groups must be considered under mask. Another fix D108900 follows for targets that support masked loads and stores: when deciding to vectorize with masked interleave groups, check if the access is reverse - which is currently not supported; rather than (only) asserting when computing cost and generating code. Differential Revision: https://reviews.llvm.org/D108891	2021-09-21 20:13:32 +03:00
Craig Topper	aeb63d464f	[RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for and/or/xor. This requires a minor change to CodeGenPrepare to ensure that shouldSinkOperands will be called for And. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D110106	2021-09-21 10:07:29 -07:00
Dávid Bolvanský	c0fdfc9af2	[InstCombine] powi(x, y) * powi(x, z) -> powi(x, y + z) We already have pow(x, y) * pow(x, z) -> pow(x, y + z) transformation, but we are missing same transformation for powi (power is integer). Requires reassoc. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D109954	2021-09-21 18:20:46 +02:00
Florian Hahn	5131037ea9	[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange. isValidAssumeForContext can provide better results with access to the dominator tree in some cases. This patch adjusts computeConstantRange to allow passing through a dominator tree. The use VectorCombine is updated to pass through the DT to enable additional scalarization. Note that similar APIs like computeKnownBits already accept optional dominator tree arguments. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D110175	2021-09-21 16:54:47 +01:00
Michael Liao	5fb3ae525f	[SelectionDAG] Re-calculate scoped AA metadata when merging stores. Reviewed By: jeroen.dobbelaere Differential Revision: https://reviews.llvm.org/D102821	2021-09-21 11:41:17 -04:00
Aleksandr Bezzubikov	624e4d087e	[GlobalISel] Support ConstantAsMetadata in IRTranslator When using instructions which have a MetadataAsValue argument (e.g. some target-specific intrinsics) MD canonicalization strips internal MDNodes with a single ConstantAsMetadata child. That prevented IRTranslator from the proper translation of such a calls.	2021-09-21 11:24:56 -04:00
Dmitry Preobrazhensky	3500e7d2b0	[AMDGPU][MC][GFX7][GFX10] Corrected image_atomic_fcmpswap Differential Revision: https://reviews.llvm.org/D109616	2021-09-21 18:06:02 +03:00
Ben Shi	b3052013b4	[RISCV] Optimize (add (mul x, c0), c1) Optimize (add (mul x, c0), c1) -> (ADDI (MUL (ADDI, c1/c0), c0), c1%c0), if c1/c0 and c1%c0 are simm12, while c1 is not. Optimize (add (mul x, c0), c1) -> (MUL (ADDI, c1/c0), c0), if c1%c0 is zero, and c1/c0 is simm12 while c1 is not. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D108607	2021-09-21 14:13:14 +00:00
Anna Thomas	69921f6f45	[InstCombine] Improve TryToSinkInstruction with multiple uses This patch allows sinking an instruction which can have multiple uses in a single user. We were previously over-restrictive by looking for exactly one use, rather than one user. Also added an API for retrieving a unique undroppable user. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109700	2021-09-21 10:04:04 -04:00
Dmitry Preobrazhensky	b8e7f53208	[AMDGPU][MC][GFX10] Enabled dlc for FLAT and GLOBAL atomics Differential Revision: https://reviews.llvm.org/D109614	2021-09-21 16:23:20 +03:00
hyeongyu kim	043733d677	[IR] Add the constructor of ShuffleVector for one-input-vector. One of the two inputs of the Shufflevector is often a placeholder. Previously, there were cases where the placeholder was undef, and there were cases where it was poison. I added these constructors to create a placeholder consistently. Changing to use the newly added constructor will be written in a separate patch. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110146	2021-09-21 22:06:07 +09:00
Jonas Paulsson	a48b43f981	[SystemZ] Emit EXRL target instructions before text section is ended. SystemZ adds the EXRL target instructions in the end of each file. This must be done before debug info emission since that may end the text section, and therefore this is now done in emitConstantPools() (instead of in emitEndOfAsmFile). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D109513	2021-09-21 14:32:28 +02:00
Nicholas Guy	9e4d72675f	[AArch64] Improve schedule modelling on the Cortex-A55 Enables the FuseAddress feature in the Cortex-A55 scheduling model Differential Revision: https://reviews.llvm.org/D109323	2021-09-21 13:03:34 +01:00
Simon Pilgrim	fc8f1e4419	[InstCombine] foldConstantInsEltIntoShuffle - bail if we fail to find constant element (PR51824) If getAggregateElement() returns null for any element, early out as otherwise we will assert when creating a new constant vector Fixes PR51824 and OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38057	2021-09-21 13:01:09 +01:00
Simon Pilgrim	20b58855e0	[CodeGen] SelectionDAGBuilder - Use const-ref iterator in for-range loops. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-21 13:01:08 +01:00
Simon Pilgrim	f5d23d36de	RewriteStatepointsForGC - Use const-ref iterator in for-range loops. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-21 13:01:08 +01:00
Simon Pilgrim	0f83456cf5	[CodeGen] SDDbgValue::getSDNodes() - use const-ref to avoid unnecessary copies. NFCI. Reported by MSVC static analyzer.	2021-09-21 13:01:08 +01:00
Jay Foad	598bebeaa6	[AMDGPU] Prefer fmac over fma when selecting FMA_W_CHAIN FMA_W_CHAIN is used when lowering fdiv f32. Prefer to select it to fmac if there are no source modifiers, just like we do for other mad/mac and fma/fmac cases. Differential Revision: https://reviews.llvm.org/D110074	2021-09-21 11:57:45 +01:00
Jay Foad	86dcb59206	[AMDGPU] Prefer v_fmac over v_fma only when no source modifiers are used v_fmac with source modifiers forces VOP3 encoding, but it is strictly better to use the VOP3-only v_fma instead, because $dst and $src2 are not tied so it gives the register allocator more freedom and avoids a copy in some cases. This is the same strategy we already use for v_mad vs v_mac and v_fma_legacy vs v_fmac_legacy. Differential Revision: https://reviews.llvm.org/D110070	2021-09-21 11:57:45 +01:00
Max Kazantsev	cd166fb2ef	[SCEV] Use isAvailableAtLoopEntry in the asserts This is what is supposed to be there.	2021-09-21 17:11:15 +07:00
Petar Avramovic	8bc7185668	GlobalISel/Utils: Refactor constant splat match functions Add generic helper function that matches constant splat. It has an option to match a constant splat with undef (some elements can be undef but not all). Add util function and matcher for G_FCONSTANT splat. Differential Revision: https://reviews.llvm.org/D104410	2021-09-21 12:09:35 +02:00
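A small Python model of the undef-tolerant splat match described above, with None standing in for an undef element. The function name and signature are hypothetical and do not reflect the GlobalISel utility's API; the sketch only captures the rule "some elements can be undef but not all".

```python
def match_constant_splat(elems, allow_undef=False):
    """Return the splatted constant, or None if elems is not a splat.

    None entries model undef lanes. With allow_undef, some lanes may be
    undef, but an all-undef vector is never treated as a splat.
    """
    defined = [e for e in elems if e is not None]
    if not defined:
        return None  # all lanes undef: not a splat
    if len(defined) != len(elems) and not allow_undef:
        return None  # undef lanes present but not permitted
    first = defined[0]
    return first if all(e == first for e in defined) else None
```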
Max Kazantsev	4d5d725428	[SCEV] Add some asserts on availability of arguments of isLoopEntryGuardedByCond The logic in howManyLessThans is fishy. It first checks invariance of RHS, and then uses OrigRHS as argument for isLoopEntryGuardedByCond, which is, strictly speaking, a different thing. We are seeing a very rare intermittent failure of availability checks, and it looks like this precondition is sometimes broken. Before we can figure out what's going on, we are adding asserts that all involved values that may possibly go to isLoopEntryGuardedByCond are available at loop entry. If either of these asserts fails (OrigRHS is the most likely suspect), it means that the logic here is flawed.	2021-09-21 17:08:52 +07:00
David Stenberg	7b4cc09b14	[LowerConstantIntrinsics] Fix heap-use-after-free bug in worklist This fixes PR51730, a heap-use-after-free bug in replaceConditionalBranchesOnConstant(). With the attached reproducer we were left with a function looking something like this after replaceAndRecursivelySimplify(): [...] cont2.i: br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i handler.type_mismatch3.i: %3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ] unreachable cont4.i: unreachable [...] with both the branch instruction and PHI node being in the worklist. As a result of replacing the branch instruction with an unconditional branch, the PHI node in %handler.type_mismatch3.i would be removed. This then resulted in a heap-use-after-free bug due to accessing that removed PHI node in the next worklist iteration. This is solved by using a value handle worklist. I am unsure if this is the most idiomatic solution. Another solution could have been to produce a worklist just containing the interesting branch instructions, but I thought that it perhaps was a bit cleaner to keep all worklist filtering in the loop that does the rewrites. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D109221	2021-09-21 11:33:07 +02:00
Cullen Rhodes	b23d22f7d5	[PowerPC] NFC: Remove unused tblgen template args Identified in D109359. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D109715	2021-09-21 08:24:16 +00:00
Evgeniy Brevnov	129cf33604	[DSE][NFC] Rename Later->Killing, Earlier->Dead First (and biggest) change is to use "Killing/Dead" in place of "Later/Earlier" base for names in DSE. For example, [Maybe]DeadLoc - is a location killed by KillingI instruction. I believe such names are more descriptive and easy to understand than current ones. Second, there are inconsistencies in naming where different names are used for the same thing. Fixed that too. Third, reordered parameters of isPartialOverwrite, tryToMergePartialOverlappingStores, isOverwrite to make them consistent between each other. This greatly reduces potential mistakes. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D106947	2021-09-21 13:44:12 +07:00
Amara Emerson	7091a7f781	[GlobalISel][Legalizer] Don't use eraseFromParentAndMarkDBGValuesForRemoval() for some artifacts. For artifacts excluding G_TRUNC/G_SEXT, which have IR counterparts, we don't seem to have debug users of defs. However, in the legalizer we're always calling MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() which is expensive. In some rare cases, this contributes significantly to unreasonably long compile times when we have lots of artifact combiner activity. To verify this, I added asserts to that function when it actually replaced a debug use operand with undef for these artifacts. On CTMark with both -O0 and -Os and debug info enabled, I didn't see a single case where it triggered. In my measurements I saw around a 0.5% geomean compile-time improvement on -g -O0 for AArch64 with this change. Differential Revision: https://reviews.llvm.org/D109750	2021-09-20 23:34:42 -07:00
Max Kazantsev	2c7d5fbc9e	[SCEV] Generalize implication when signedness of FoundPred doesn't matter The implication logic for two values that are both negative or non-negative says that it doesn't matter whether their predicate is signed or unsigned, but only flips unsigned into signed for further inference. This patch adds support for flipping a signed predicate into unsigned as well. Differential Revision: https://reviews.llvm.org/D109959 Reviewed By: nikic	2021-09-21 11:17:56 +07:00
Yonghong Song	ea72b0319d	BPF: make 32bit register spill with 64bit alignment In llvm, for non-alu32 mode, the stack alignment is 64bit so only one 64bit spill per 64bit slot. For alu32 mode, the stack alignment is 32bit, so it is possible to have two 32bit spills per 64bit slot. Currently, bpf kernel verifier does not preserve register states for 32bit spills. That is, one 32bit register may hold a constant value or a bounded range before spill. After reload from the stack, the information is lost and sometimes this may cause verifier failure. For 64bit register spill, the verifier indeed tries to preserve the register state for reloading. The current verifier can be modestly changed to handle one 32bit spill per 64bit stack slot with state-preserving reload. Handling two 32bit spills per 64bit stack slot will require substantial changes. This patch changes stack alignment for alu32 to be 64bit. This way, for any 64bit slot in alu32 mode, only one 32bit or 64bit register value can be saved. Together with the previously mentioned verifier enhancement, 32bit spills can be handled with state preserved. Note that llvm stack slot coalescing seems to only do adjacent packing, which may leave some holes in the stack. For example: stack slot 8 <== 8 bytes; stack slot 4 <== 8 bytes with 4 byte hole; stack slot 8 <== 8 bytes; stack slot 4 <== 4 bytes. Differential Revision: https://reviews.llvm.org/D109073	2021-09-20 21:00:25 -07:00
Max Kazantsev	073b254cff	[SimplifyCFG] Redirect switch cases that lead to UB into an unreachable block When following a case of a switch instruction is guaranteed to lead to UB, we can safely break these edges and redirect those cases into a newly created unreachable block. As a result, the CFG will become simpler and we can remove some of the Phi inputs to make further analyses easier. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D109428 Reviewed By: lebedev.ri	2021-09-21 10:45:19 +07:00
Max Kazantsev	a06db78fd9	[NFC] Rename Context->CtxI in SCEV for uniformity reasons	2021-09-21 10:12:20 +07:00
Kazu Hirata	85b4b21c8b	[llvm] Use make_early_inc_range (NFC)	2021-09-20 19:30:02 -07:00
Usman Nadeem	f417d9d821	[InstCombine] Eliminate vector reverse if all inputs/outputs to an instruction are reverses Differential Revision: https://reviews.llvm.org/D109808 Change-Id: I1a10d2bc33acbe0ea353c6cb3d077851391fe73e	2021-09-20 18:32:24 -07:00
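For a lane-wise operation, reversing every input gives the same result as reversing the output, so op(reverse(a), reverse(b)) can be rewritten as reverse(op(a, b)), and an outer reverse then cancels out. A Python sketch of that identity using an elementwise add (this models the algebra only, not the InstCombine implementation):

```python
def reverse(v):
    """Model of a vector reverse: lane i moves to lane n-1-i."""
    return v[::-1]

def lanewise_add(a, b):
    """Any elementwise binary op works the same way; add is used here."""
    return [x + y for x, y in zip(a, b)]

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
# op(rev(a), rev(b)) == rev(op(a, b))
assert lanewise_add(reverse(a), reverse(b)) == reverse(lanewise_add(a, b))
# so rev(op(rev(a), rev(b))) needs no reverses at all
assert reverse(lanewise_add(reverse(a), reverse(b))) == lanewise_add(a, b)
```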
Amara Emerson	4ceea77409	[X86] Rename the X86WinAllocaExpander pass and related symbols to "DynAlloca". NFC. For x86 Darwin, we have a stack checking feature which re-uses some of this machinery around stack probing on Windows. Renaming this to be more appropriate for a generic feature. Differential Revision: https://reviews.llvm.org/D109993	2021-09-20 16:19:28 -07:00
Jacob Lambert	dc6e8dfdfe	[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.	2021-09-20 14:48:50 -07:00
Amara Emerson	f9d69a0ab0	[GlobalISel] Implement support for the "trap-func-name" attribute. This attribute calls a function instead of emitting a trap instruction. Differential Revision: https://reviews.llvm.org/D110098	2021-09-20 14:32:01 -07:00
Florian Mayer	16b5f4502c	[NFC] [hwasan] Separate outline and inline instrumentation. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D110067	2021-09-20 21:49:09 +01:00
Craig Topper	a95ba81073	[RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for FMA. If either of the multiplicands is a splat, we can sink it to use vfmacc.vf or similar.	2021-09-20 11:49:50 -07:00
Nikita Popov	dd0226561e	[IR] Add helper to convert offset to GEP indices We implement logic to convert a byte offset into a sequence of GEP indices for that offset in a number of places. This patch adds a DataLayout::getGEPIndicesForOffset() method, which implements the core logic. I've updated SROA, ConstantFolding and InstCombine to use it, and there are a few more places where it looks relevant. Differential Revision: https://reviews.llvm.org/D110043	2021-09-20 20:18:16 +02:00
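The core idea of converting a byte offset into GEP indices can be sketched in Python: the first index steps over whole elements of the source type, then the walk descends through arrays (divide by the element size) and structs (pick the field containing the offset) until it reaches a scalar, leaving any remaining byte offset. The type encoding below is hypothetical, and structs are assumed packed (no padding), unlike a real DataLayout, so this is an illustration rather than the DataLayout::getGEPIndicesForOffset() implementation.

```python
SIZES = {"i8": 1, "i16": 2, "i32": 4, "i64": 8}  # scalar sizes in bytes

def type_size(ty):
    """Size of a type: scalar name, ("array", elem, n) or ("struct", [fields])."""
    if isinstance(ty, str):
        return SIZES[ty]
    if ty[0] == "array":
        return type_size(ty[1]) * ty[2]
    # packed struct: fields laid out back to back (simplifying assumption)
    return sum(type_size(f) for f in ty[1])

def gep_indices_for_offset(elem_ty, offset):
    """Return (indices, remaining_offset) for a byte offset from a pointer."""
    size = type_size(elem_ty)
    indices = [offset // size]  # step over whole elements first
    offset %= size
    ty = elem_ty
    while not isinstance(ty, str):
        if ty[0] == "array":
            esize = type_size(ty[1])
            indices.append(offset // esize)
            offset %= esize
            ty = ty[1]
        else:
            start = 0
            for i, field in enumerate(ty[1]):
                fsize = type_size(field)
                if offset < start + fsize:  # offset lands in this field
                    indices.append(i)
                    offset -= start
                    ty = field
                    break
                start += fsize
            else:
                raise ValueError("offset not inside any struct field")
    return indices, offset
```

For example, with T = {i32, [4 x i16], i64} (20 bytes packed), byte offset 26 is element 1, field 1, array index 1, with nothing left over.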
Geoffrey Martin-Noble	01b097afd0	Fix bad merge that removed switch case When https://reviews.llvm.org/D109520 was landed, it reverted the addition of this switch case added in https://reviews.llvm.org/D109293. This caused `-Wswitch` failures (and presumably broke the functionality added in the latter patch).	2021-09-20 10:58:58 -07:00
Craig Topper	04ab6c85ef	[RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for FAdd/FSub/FMul/FDiv.	2021-09-20 10:25:46 -07:00
Nikita Popov	ecd52a5be9	[Verifier] Try to fix MSVC build Some buildbots fail with: > C:\a\llvm-clang-x86_64-expensive-checks-win\llvm-project\llvm\lib\IR\Verifier.cpp(4352): error C2678: binary '==': no operator found which takes a left-hand operand of type 'const llvm::MDOperand' (or there is no acceptable conversion) Possibly the explicit MDOperand to Metadata* conversion will help?	2021-09-20 18:47:25 +02:00
Kazu Hirata	f3cfec9c9e	[MCA] Fix a warning This patch fixes the warning InstructionTables.cpp:27:56: error: loop variable 'Resource' of type 'const std::pair<const uint64_t, ResourceUsage> &' (aka 'const pair<const unsigned long, llvm::mca::ResourceUsage> &') binds to a temporary constructed from type 'const std::pair<unsigned long, llvm::mca::ResourceUsage> &' [-Werror,-Wrange-loop-construct] Note that Resource is declared as: SmallVector<std::pair<uint64_t, ResourceUsage>, 4> Resources; without "const" for uint64_t.	2021-09-20 09:46:38 -07:00
Craig Topper	d85e347a28	[RISCV] Add a pass to recognize VLS strided loads/store from gather/scatter. For strided accesses the loop vectorizer seems to prefer creating a vector induction variable with a start value of the form <i32 0, i32 1, i32 2, ...>. This value will be incremented each loop iteration by a splat constant equal to the length of the vector. Within the loop, arithmetic using splat values will be done on this vector induction variable to produce indices for a vector GEP. This pass attempts to dig through the arithmetic back to the phi to create a new scalar induction variable and a stride. We push all of the arithmetic out of the loop by folding it into the start, step, and stride values. Then we create a scalar GEP to use as the base pointer for a strided load or store using the computed stride. Loop strength reduce will run after this pass and can do some cleanups to the scalar GEP and induction variable. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107790	2021-09-20 09:39:44 -07:00
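The arithmetic folding described above can be sketched with plain integers. Assume (hypothetically) that the vectorized loop computes gather indices as vec_iv * scale + offset, where vec_iv starts at <start, start+1, ...> and is incremented by a splat of step each iteration. Folding the splat multiply and add into a scalar induction variable yields a start, a step and a stride that reproduce exactly the same indices, which is what lets a strided access replace the gather. The function names are illustrative, not the pass's API.

```python
def vector_gep_indices(start, step, scale, offset, vl, iteration):
    """Indices the vectorized loop computes: vec_iv * scale + offset."""
    vec_iv = [start + lane + step * iteration for lane in range(vl)]
    return [i * scale + offset for i in vec_iv]

def fold_to_scalar(start, step, scale, offset):
    """Fold the splat arithmetic into scalar IV start/step plus a stride."""
    return start * scale + offset, step * scale, scale

def strided_indices(s_start, s_step, stride, vl, iteration):
    """Indices of a strided access from a scalar base: base + lane * stride."""
    base = s_start + s_step * iteration
    return [base + lane * stride for lane in range(vl)]
```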
Nikita Popov	8700f2bd36	[Verifier] Verify scoped noalias metadata Verify that !noalias, !alias.scope and llvm.experimental.noalias.scope arguments have the format specified in https://llvm.org/docs/LangRef.html#noalias-and-alias-scope-metadata. I've fixed up a lot of broken metadata used by tests in advance. Especially using a scope instead of the expected scope list is a commonly made mistake. Differential Revision: https://reviews.llvm.org/D110026	2021-09-20 18:27:28 +02:00
Alexey Bataev	bc69dd62c0	[SLP]Improve graph reordering. Reworked reordering algorithm. Originally, the compiler just tried to detect the most common order in the reorderable nodes (loads, stores, extractelements, extractvalues) and then fully rebuilt the graph in the best order. This was not efficient, since it required extra memory and time for building/rebuilding the tree and doubled the use of the scheduling budget, which could lead to missed vectorization due to exhausted scheduling resources. The patch provides a 2-way approach for the graph reordering problem. First, all reordering is done in-place: it does not require deleting/rebuilding the tree, it just rotates the scalars/orders/reuses masks in the graph node. The first step (top-to-bottom) rotates the whole graph, similarly to the previous implementation. The compiler counts the most used orders of the graph nodes with the same vectorization factor and then rotates the subgraph with the given vectorization factor to the most used order, if it is not empty. Then it repeats the same procedure for the subgraphs with the smaller vectorization factor. We can do this because we still need to reshuffle the smaller subgraph when building operands for the graph nodes with a larger vectorization factor, so we can rotate just the subgraph, not the whole graph. The second step (bottom-to-top) scans through the leaves and tries to detect the users of the leaves which can be reordered. If the leaves can be reordered in the best fashion, they are reordered, and so are their users. This allows removing double shuffles to the same ordering of the operands in many cases and just reordering the user operations instead. Plus, it moves the final shuffles closer to the top of the graph and in many cases allows removing an extra shuffle, because the same procedure is repeated and we can again merge some reordering masks and reorder user nodes instead of the operands. 
Also, patch improves cost model for gathering of loads, which improves x264 benchmark in some cases. Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264, +3% for 508.namd, improves most of other benchmarks. The compile and link time are almost the same, though in some cases it should be better (we're not doing an extra instruction scheduling anymore) + we may vectorize more code for the large basic blocks again because of saving scheduling budget. Differential Revision: https://reviews.llvm.org/D105020	2021-09-20 08:42:19 -07:00
David Sherwood	f988f68064	[Analysis] Add support for vscale in computeKnownBitsFromOperator In ValueTracking.cpp we use a function called computeKnownBitsFromOperator to determine the known bits of a value. For the vscale intrinsic if the function contains the vscale_range attribute we can use the maximum and minimum values of vscale to determine some known zero and one bits. This should help to improve code quality by allowing certain optimisations to take place. Tests added here: Transforms/InstCombine/icmp-vscale.ll Differential Revision: https://reviews.llvm.org/D109883	2021-09-20 15:01:59 +01:00
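A sketch of the known-bits reasoning, assuming vscale_range(min, max) bounds vscale to [min, max]: if min == max, vscale is a constant and every bit is known; otherwise any value no larger than max fits in max.bit_length() bits, so all higher bits are known zero. This is a Python model of the idea, not the ValueTracking code, and the function name is hypothetical.

```python
def vscale_known_bits(vmin, vmax, bit_width=64):
    """Return (known_zero, known_one) masks for vscale in [vmin, vmax]."""
    all_ones = (1 << bit_width) - 1
    if vmin == vmax:  # vscale is a known constant
        return all_ones & ~vmin, vmin
    # every value <= vmax fits in vmax.bit_length() bits; higher bits are zero
    first_zero_high_bit = vmax.bit_length()
    known_zero = all_ones & ~((1 << first_zero_high_bit) - 1)
    return known_zero, 0
```

With vscale_range(1, 16), for instance, bits 5 and above are known zero, which is the kind of fact InstCombine can use to fold comparisons against vscale.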
Stefan Gränitz	e8d81d80f6	[JITLink] Adopt forEachRelocation() helper in ELF RISCV backend (NFC) Following D109516, this patch re-uses the new helper function for ELF relocation traversal in the RISCV backend. Reviewed By: StephenFan Differential Revision: https://reviews.llvm.org/D109522	2021-09-20 15:46:41 +02:00
Stefan Gränitz	68914dc990	[JITLink] Adopt forEachRelocation() helper in ELF x86-64 backend (NFC) Following D109516, this patch re-uses the new helper function for ELF relocation traversal in the x86-64 backend. Reviewed By: StephenFan Differential Revision: https://reviews.llvm.org/D109520	2021-09-20 15:46:41 +02:00
David Green	3f90df22f1	[ARM] MVE reverse shuffles. The vectorizer can sometimes make reverse shuffles from indices that count down. In MVE, we don't have a 128bit rev instruction, but we can select this to a VREV64 with some lane movs to swap the two halves. Ideally this would use VMOVD's, but only gets as far as VMOVS's at the moment. Differential Revision: https://reviews.llvm.org/D69510	2021-09-20 13:48:01 +01:00
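VREV64 reverses the elements within each 64-bit half of a 128-bit vector, so following it with a swap of the two 64-bit halves yields a full reverse. A Python model using lists of lanes (the helper names are illustrative, not MVE intrinsics or backend functions):

```python
def vrev64(vec, elem_bits):
    """Reverse the lanes inside each 64-bit chunk of the vector."""
    per_chunk = 64 // elem_bits
    out = []
    for i in range(0, len(vec), per_chunk):
        out.extend(reversed(vec[i:i + per_chunk]))
    return out

def swap_halves(vec):
    """Swap the two 64-bit halves of a 128-bit vector (the lane moves)."""
    half = len(vec) // 2
    return vec[half:] + vec[:half]

def reverse_128(vec, elem_bits):
    """Full 128-bit reverse built from VREV64 plus a half swap."""
    assert len(vec) * elem_bits == 128
    return swap_halves(vrev64(vec, elem_bits))
```

For v4i32, [0, 1, 2, 3] becomes [1, 0, 3, 2] after VREV64 and [3, 2, 1, 0] after the swap.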
Simon Pilgrim	7fc12b822c	MachOObjectFile - checkOverlappingElement - use const-ref to avoid unnecessary copies. NFCI. Reported by MSVC static analyzer.	2021-09-20 12:53:18 +01:00
Simon Pilgrim	4ab7c0d3fa	[X86] X86TargetTransformInfo - remove unnecessary if-else after early exit. NFCI. (style) Break the if-else chain as they all return.	2021-09-20 12:53:17 +01:00
Simon Pilgrim	ea17b15f2d	[MCA] InstructionTables::execute() - use const-ref iterator in for-range loop. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-20 12:53:17 +01:00
Petar Avramovic	e4c46ddd91	[GlobalISel] Improve elimination of dead instructions in legalizer Add eraseInstr(s) utility functions. Before deleting an instruction, they collect its use instructions. After deletion, they delete use instructions that became trivially dead. This patch clears all dead instructions in existing legalizer mir tests. Differential Revision: https://reviews.llvm.org/D109154	2021-09-20 13:00:58 +02:00
Bjorn Pettersson	c8cb7f611f	[NewPM] Make InlinerPass (aka 'inline') a parameterized pass In default pipelines the ModuleInlinerWrapperPass is adding the InlinerPass to the pipeline twice, once due to MandatoryFirst (passing true in the ctor) and then a second time with false as argument. To make it possible to bisect and reduce opt test cases for this part of the pipeline we need to be able to choose between the two different variants of the InlinerPass when running opt. This patch is changing 'inline' to a CGSCC_PASS_WITH_PARAMS in the PassRegistry, making it possible to run opt with both -passes=cgscc(inline) and -passes=cgscc(inline<only-mandatory>). Reviewed By: aeubanks, mtrofin Differential Revision: https://reviews.llvm.org/D109877	2021-09-20 12:52:52 +02:00
Tim Northover	13aa102e07	AArch64: use ldp/stp for 128-bit atomic load/store in v8.4 onwards v8.4 says that normal loads/stores of 128 bits are single-copy atomic if they're properly aligned (which all LLVM atomics are) so we no longer need to do a full RMW operation to guarantee we got a clean read.	2021-09-20 09:50:11 +01:00
David Spickett	92c9b28347	Revert "[AArch64][SVE] Teach cost model that masked loads/stores are cheap" This reverts commit `734708e04f`. Due to build failures on the 2 stage SVE VLS bot. https://lab.llvm.org/buildbot/#/builders/176/builds/908/steps/11/logs/stdio	2021-09-20 08:45:18 +00:00
Florian Hahn	7f6a4826ac	[CaptureTracking] Allow passing LI to PointerMayBeCapturedBefore (NFC). isPotentiallyReachable can use LoopInfo to return earlier. This patch allows passing an optional LI to PointerMayBeCapturedBefore. Used in D109844. Reviewed By: nikic, asbirlea Differential Revision: https://reviews.llvm.org/D109978	2021-09-20 09:07:34 +01:00
Max Kazantsev	e9d34c5429	[NFC] Add assert and test showing that revert of D109596 wasn't justified All transforms of IndVars have prerequisite requirement of LCSSA and LoopSimplify form and rely on it. Added test that shows that this actually stands.	2021-09-20 12:01:12 +07:00
Max Kazantsev	471217cff8	Revert "Revert "[IndVars] Replace PHIs if loop exits on 1st iteration"" This reverts commit `6fec6552f5`. The patch was reverted based on an incorrect claim that this patch may break LCSSA form when the loop is not in a simplify form. All IndVars transforms ensure that the loop is in simplify and LCSSA form, so if it wasn't broken before this transform, it will also not be broken after it.	2021-09-20 12:01:10 +07:00
Max Kazantsev	def15c5fb6	[SCEV] Support negative values in signed/unsigned predicate reasoning There is a piece of logic that uses the fact that signed and unsigned versions of the same predicate are equivalent when both values are non-negative. It's also true when both of them are negative. Differential Revision: https://reviews.llvm.org/D109957 Reviewed By: nikic	2021-09-20 11:26:33 +07:00
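The equivalence can be checked exhaustively for 8-bit values: when both operands are non-negative, or both are negative, the signed and unsigned comparisons agree, because each signed value is either the unsigned value itself or the unsigned value minus 2^8, and the same shift applies to both sides. A Python brute-force check (a model of the reasoning, not SCEV code):

```python
BITS = 8

def to_signed(u, bits=BITS):
    """Interpret an unsigned bits-wide value as two's complement."""
    return u - (1 << bits) if u >= 1 << (bits - 1) else u

def predicates_agree(bits=BITS):
    """True if slt and ult agree whenever both operands share a sign."""
    for a in range(1 << bits):
        for b in range(1 << bits):
            sa, sb = to_signed(a, bits), to_signed(b, bits)
            if (sa < 0) == (sb < 0) and (sa < sb) != (a < b):
                return False
    return True
```

With mixed signs the predicates can differ: 0xFF is -1 as a signed value, so it is signed-less-than 1 but not unsigned-less-than 1.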
David Blaikie	cb42bb3550	llvm-dwarfdump: pretty type printing: print fully qualified names in function type parameter types	2021-09-19 18:49:15 -07:00
David Blaikie	606ea0dd2a	llvm-dwarfdump: support for type printing "decltype(nullptr)" as "nullptr_t" This should probably be rendered as "std::nullptr_t" but for now clang uses the unqualified name (which is ambiguous with possible user defined name in the global namespace), so match that here.	2021-09-19 17:33:56 -07:00
David Blaikie	11e0b79b05	llvm-dwarfdump: Don't print even an empty string when a type is unprintable	2021-09-19 17:03:10 -07:00
David Blaikie	5bfe5207ef	llvm-dwarfdump: Pretty print names qualified/with scopes	2021-09-19 16:36:01 -07:00
Simon Pilgrim	0e89ff8195	[X86] SimplifyDemandedBits - only narrow a broadcast source if we only have one use. Helps with the regression noted on D109065 - don't truncate a broadcast source if the source has multiple uses.	2021-09-19 22:53:30 +01:00
Kazu Hirata	84b07c9b3a	[llvm] Use pop_back_val (NFC)	2021-09-19 13:44:23 -07:00
Chris Jackson	5ba8020326	[DebugInfo][LSR] Emit shorter expressions from scev-based salvaging The scev-based salvaging for LSR can sometimes produce unnecessarily verbose expressions. This patch adds logic to detect when the value to be recovered and the induction variable differ by only a constant offset. Then, the expression to derive the current iteration count can be omitted from the dbg.value in favour of the offset. Reviewed by: aprantl Differential Revision: https://reviews.llvm.org/D109044	2021-09-19 21:41:44 +01:00
David Blaikie	372e2c24b6	llvm-dwarfdump: Pretty printing types including a space between const and parenthesized references/pointers to arrays	2021-09-19 13:32:53 -07:00
David Blaikie	a51fb58c55	DWARFDie.cpp: Minor follow-up clang-format	2021-09-19 13:06:18 -07:00
David Blaikie	f09ca5c646	DWARFDie: Improve type printing for function and array types - with qualifiers (cv/reference) and pointers to them	2021-09-19 12:59:31 -07:00
Simon Pilgrim	f855ef2601	[X86][Atom] Fix FP uops + port usage Both ports are required in most cases. Update the uops counts + port usage based off the most recent llvm-exegesis captures (PR36895) and what Intel AoM / Agner / InstLatX64 reports as well. Noticed while trying to improve fp costs for vectorization via the D103695 helper script.	2021-09-19 20:39:20 +01:00
Simon Pilgrim	b7342e3137	[X86] Fold SHUFPS(shuffle(x),shuffle(y),mask) -> SHUFPS(x,y,mask') We can combine unary shuffles into either of SHUFPS's inputs and adjust the shuffle mask accordingly. Unlike general shuffle combining, we can be more aggressive and handle multiuse cases as we're not going to accidentally create additional shuffles.	2021-09-19 20:39:19 +01:00
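SHUFPS takes result lanes 0-1 from its first operand and lanes 2-3 from its second, each selected by a 2-bit field of the immediate. If each operand is itself a unary 4-lane shuffle, the inner mask can be composed into the immediate, which is why no extra shuffle is created even for multi-use inputs. A Python model, assuming the inner shuffles only select lanes 0-3 of their own input (no undef or zero lanes); the helper names are illustrative.

```python
def lane_selectors(imm):
    """Split an 8-bit SHUFPS immediate into four 2-bit lane selectors."""
    return [(imm >> (2 * i)) & 3 for i in range(4)]

def shufps(a, b, imm):
    """Result lanes 0-1 come from a, lanes 2-3 from b."""
    s = lane_selectors(imm)
    return [a[s[0]], a[s[1]], b[s[2]], b[s[3]]]

def unary_shuffle(v, mask):
    """Apply a 4-lane unary shuffle: result[i] = v[mask[i]]."""
    return [v[m] for m in mask]

def fold_inner_shuffles(imm, mask_x, mask_y):
    """Compose the unary masks of x and y into a new SHUFPS immediate."""
    s = lane_selectors(imm)
    new = [mask_x[s[0]], mask_x[s[1]], mask_y[s[2]], mask_y[s[3]]]
    return new[0] | (new[1] << 2) | (new[2] << 4) | (new[3] << 6)
```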
Simon Pilgrim	cf8fac7d07	[X86][Atom] Specific uops for all IMUL/IDIV instructions Based off a mixture of llvm-exegesis captures (PR36895) and Intel AoM / Agner / InstLatX64 reports.	2021-09-19 16:58:52 +01:00
Roman Lebedev	5f2fe48d06	[X86][TLI] SimplifyDemandedVectorEltsForTargetNode(): don't break apart broadcasts from which not just the 0'th elt is demanded Apparently this had no test coverage before D108382, but D108382 itself shows a few regressions that this fixes. It doesn't seem worthwhile breaking apart broadcasts, assuming we want the broadcasted value to be present in several elements, not just the 0'th one. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D108411	2021-09-19 17:38:32 +03:00
Roman Lebedev	07f1d8f0ca	[X86] lowerShuffleAsDecomposedShuffleMerge(): if both inputs are broadcastable/identities, canonicalize broadcasts as such Split off from D108253. Broadcast is simpler than any other shuffle we might produce to do what we want to do here, so prefer it. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D108382	2021-09-19 17:35:37 +03:00
Roman Lebedev	0852313e47	[NFC] combineX86ShufflesRecursively(): actually address nits for previous patch	2021-09-19 17:29:18 +03:00
Roman Lebedev	1e72ca94e5	[X86] combineX86ShufflesRecursively(): call SimplifyMultipleUseDemandedVectorElts() after finishing recursing This was suggested in https://reviews.llvm.org/D108382#inline-1039018, and it avoids regressions in that patch. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D109065	2021-09-19 17:24:58 +03:00
David Green	1da52ef294	[ARM] Add VGETLANEu patterns for v4f16 and v8f16 These were apparently missing, having no pattern that could convert a VGETLANEu of a v4f16 to an i32. Added bf16 whilst here, following the same code.	2021-09-19 14:25:21 +01:00
Simon Pilgrim	e381d8b243	[X86][Atom] Fix (U)COMISS/SD uops, latency and throughput Both ports are required, for reg and mem variants - we can also use the WriteFComX class directly and remove the unnecessary InstRW overrides. Matches what Intel AoM / Agner / InstLatX64 report as well.	2021-09-19 12:44:44 +01:00
Ben Shi	dee5a8ca32	[RISCV] Optimize (add (shl x, c0), (shl y, c1)) with SHADD Optimize (add (shl x, c0), (shl y, c1)) -> (SLLI (SHADD x, y), c1), if c0-c1 == 1/2/3. Reviewed By: craig.topper, luismarques Differential Revision: https://reviews.llvm.org/D108916	2021-09-19 16:35:12 +08:00
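The identity behind this rewrite: (x << c0) + (y << c1) == ((x << (c0 - c1)) + y) << c1 in wrapping 64-bit arithmetic, and the inner shift-and-add is what Zba's sh1add/sh2add/sh3add compute (rs2 + (rs1 << N)) for N = 1, 2, 3. A Python check on 64-bit wrapping values; the helper names are illustrative, not backend functions.

```python
MASK64 = (1 << 64) - 1

def slli(v, shamt):
    """64-bit logical left shift, wrapping like the hardware."""
    return (v << shamt) & MASK64

def shadd(n, rs1, rs2):
    """Model of Zba shNadd: rs2 + (rs1 << n), wrapping at 64 bits."""
    return (rs2 + (rs1 << n)) & MASK64

def original(x, y, c0, c1):
    """(add (shl x, c0), (shl y, c1))"""
    return (slli(x, c0) + slli(y, c1)) & MASK64

def optimized(x, y, c0, c1):
    """(SLLI (SHADD x, y), c1), valid only when c0 - c1 is 1, 2 or 3."""
    d = c0 - c1
    if d not in (1, 2, 3):
        raise ValueError("rewrite only applies when c0 - c1 is 1, 2 or 3")
    return slli(shadd(d, x, y), c1)
```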
David Blaikie	ae0873483d	DWARFDie:DWARFTypePrinter: Add common utility function for checking where parentheses are required	2021-09-18 22:54:57 -07:00
David Blaikie	d2373c04a7	DWARFDie.cpp: Reduce indentation with early continue	2021-09-18 22:22:25 -07:00
Roman Lebedev	6a2c2263fb	[X86] Improve i8 all-ones element insertion in pre-SSE4.1 Should avoid some regressions in D109065 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D109989	2021-09-18 22:24:06 +03:00
Alexandre Ganea	7b25fa8c7a	[Support] Attempt to fix deadlock in ThreadGroup This is an attempt to fix the situation described by https://reviews.llvm.org/D104207#2826290 and PR41508. See sequence of operations leading to the bug in https://reviews.llvm.org/D104207#3004689 We ensure that the Latch is completely "free" before decrementing the number of TaskGroupInstances. Differential revision: https://reviews.llvm.org/D109914	2021-09-18 13:49:10 -04:00
David Green	cb5e3f7959	[ARM] Prevent large integer VQDMULH pattern crashes Put a limit on the size of constant integers we test when looking for VQDMULH, to prevent it from crashing from values more than 64bits.	2021-09-18 18:47:02 +01:00
Kazu Hirata	48719e3b18	[CodeGen] Use make_early_inc_range (NFC)	2021-09-18 09:29:24 -07:00
Joseph Huber	c30d7730eb	[OpenMP] Change debugging symbol to weak_odr linkage The new device runtime uses an internal variable to set debugging. This variable was originally privately linked because every module will have a copy of it. This caused problems with merging the device bitcode library because it would get renamed and there was not a way to refer to an external, private symbol. This changes the symbol to weak_odr so it can be defined multiply, but will not be renamed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109997	2021-09-17 21:25:24 -04:00
Joseph Huber	27905eeb89	[Attributor] Change AAExecutionDomain to check intrinsic edges The AAExecutionDomain instance checks if a BB is executed by the main thread only. Currently, this only checks the `__kmpc_kernel_init` call for generic regions to indicate the path taken by the main thread. In the new runtime, we want to be able to detect basic blocks even in SPMD mode. For this we enable it to check thread-ID intrinsics being compared to zero as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109849	2021-09-17 19:51:38 -04:00
Usman Nadeem	757384abff	[AArch64][SVE][InstCombine] Fold redundant zip1/2(uzp1/2) operations zip1(uzp1(A, B), uzp2(A, B)) --> A zip2(uzp1(A, B), uzp2(A, B)) --> B Differential Revision: https://reviews.llvm.org/D109666 Change-Id: I4a6578db2fcef9ff71ad0e77b9fe08354e6dbfcd	2021-09-17 15:24:46 -07:00
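A list model of the AArch64 permutes: uzp1/uzp2 take the even/odd lanes of the concatenation of their operands, and zip1/zip2 interleave the low/high halves of theirs. Composing them recovers the original operands, which is the fold described above. This sketch models lane semantics only, not the SVE intrinsics or their types.

```python
def uzp1(a, b):
    """Even-numbered lanes of a ++ b."""
    return (a + b)[0::2]

def uzp2(a, b):
    """Odd-numbered lanes of a ++ b."""
    return (a + b)[1::2]

def zip1(a, b):
    """Interleave the low halves of a and b."""
    half = len(a) // 2
    return [x for pair in zip(a[:half], b[:half]) for x in pair]

def zip2(a, b):
    """Interleave the high halves of a and b."""
    half = len(a) // 2
    return [x for pair in zip(a[half:], b[half:]) for x in pair]
```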
Arthur Eubanks	0db9481208	[NFC] Remove FIXMEs about calling LLVMContext::yield() Nobody has complained about this, and the documentation for LLVMContext::yield() states that LLVM is allowed to never call it. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D110008	2021-09-17 14:59:34 -07:00
Andrew Browne	c533b88a6d	[DFSan] Add force_zero_label abilist option to DFSan. This can be used as a work-around for overtainting. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D109847	2021-09-17 12:57:40 -07:00
Hongtao Yu	c5fafc1e73	[CSSPGO] Tweaks to lower pseudo probe runtime overhead A couple of tweaks to: 1. allow more thinlto importing by excluding probe intrinsics from IR size in module summary; 2. allow general default attributes (nofree nosync nounwind) for pseudo probe intrinsic. Without those attributes, pseudo probes will be basically treated as unknown calls which will in turn block their containing functions from being annotated with those attributes. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D109976	2021-09-17 12:28:09 -07:00
Kazu Hirata	e2febc2ed4	[llvm] Use drop_begin (NFC)	2021-09-17 09:16:40 -07:00
Roman Lebedev	358df06f4e	[X86] Improve `matchBinaryShuffle()`'s `BLEND` lowering with per-element all-zero/all-ones knowledge We can use `OR` instead of `BLEND` if either the element we are not picking is zero (or masked away); or the element we are picking overwhelms (e.g. it's all-ones) whatever the element we are not picking: https://alive2.llvm.org/ce/z/RKejao Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D109726	2021-09-17 19:13:33 +03:00
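A lane-level model of the condition: a blend picks each lane from one of two inputs, and a bitwise OR produces the same lane whenever the lane not picked is zero, or the lane picked is all-ones. This Python sketch uses concrete lane values as a stand-in for the known-bits facts the real combine relies on (and ignores the masked-away case); the helper names are hypothetical.

```python
def blend(a, b, take_b):
    """Per-lane select: b[i] where take_b[i] is true, else a[i]."""
    return [y if t else x for x, y, t in zip(a, b, take_b)]

def lane_or(a, b):
    return [x | y for x, y in zip(a, b)]

def or_is_equivalent(a, b, take_b, bits=8):
    """True if OR matches the blend in every lane."""
    all_ones = (1 << bits) - 1
    for x, y, t in zip(a, b, take_b):
        picked, other = (y, x) if t else (x, y)
        if other != 0 and picked != all_ones:
            return False
    return True
```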
Sanjay Patel	41ff7612b3	[InstCombine] allow splat vectors for narrowing masked fold Mostly cosmetic diffs, but the use of m_APInt matches splat constants.	2021-09-17 11:24:16 -04:00
Simon Pilgrim	72e5786281	[DebugInfo] DWARF - Use const-ref iterator in for-range loop. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-17 14:04:54 +01:00
Simon Pilgrim	4af7643470	[CodeGen] LiveDebug - Use const-ref iterator in for-range loop. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-17 14:04:54 +01:00
Simon Pilgrim	db23f27786	[X86] X86PreTileConfig - Use const-ref iterator in for-range loop. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-17 14:04:53 +01:00
Simon Pilgrim	77f6c0bcaa	Fix Wdocumentation warnings. NFCI. Fix parameter name typos and drop `\returns` statements from void functions	2021-09-17 12:45:56 +01:00
Simon Pilgrim	5ebe95e256	[X86][Atom] Fix integer shuffle uops, latency and throughput The MMX pack/unpck shuffles don't need an override - they have the same behaviour as other shuffles (Port0 only). The SSE pslldq/psrldq shuffles don't need an override - they have the same behaviour as other shuffles (Port0 only). The SSE pshufb shuffles use 4 uops (+1 load). Noticed the pslldq/psrldq issue while trying to improve reduction costs via the D103695 helper script, and fixed the others while reviewing. Confirmed with Intel AoM / Agner / InstLatX64.	2021-09-17 12:11:54 +01:00
Simon Pilgrim	9e70d4e5f2	[AsmPrinter] DebugLocEntry::dump() - Use const-ref iterator in for-range loop. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-17 12:11:54 +01:00
Simon Pilgrim	e4b2f66d7f	[TableGen] Record::checkRecordAssertions() - Use const-ref iterator in for-range loop. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-17 12:11:53 +01:00
Jonas Paulsson	1a5ab3e97c	[SystemZ] Recognize .machine directive in parser. The .machine directive can be used in assembly files to specify the ISA for the instructions following it. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D109660	2021-09-17 12:03:54 +02:00
Petar Avramovic	d477a7c2e7	GlobalISel/Utils: Refactor integer/float constant match functions Rework getConstantVRegValWithLookThrough in order to make it clear whether we are matching only an integer/float constant or any constant (default). Add helper functions that get DefVReg and APInt/APFloat from a constant instr: getIConstantVRegValWithLookThrough: integer constant, only G_CONSTANT; getFConstantVRegValWithLookThrough: float constant, only G_FCONSTANT; getAnyConstantVRegValWithLookThrough: either G_CONSTANT or G_FCONSTANT. Rename getConstantVRegVal and getConstantVRegSExtVal to getIConstantVRegVal and getIConstantVRegSExtVal. These now only match G_CONSTANT, as described in the comment. Relevant matchers now return both DefVReg and APInt/APFloat. Replace existing uses of getConstantVRegValWithLookThrough and getConstantVRegVal with the new helper functions. Any-constant matching is only required in: ConstantFoldBinOp: for a constant argument that was a bit-cast of float to int; getAArch64VectorSplat: AArch64::G_DUP operands can be any constant; amdgpu select for G_BUILD_VECTOR_TRUNC: operands can be any constant. In other places, use integer-only constant matching. Differential Revision: https://reviews.llvm.org/D104409	2021-09-17 11:22:13 +02:00
Chen Zheng	80584f0056	Revert "[PowerPC][ELF] make sure local variable space does not overlap with parameter save area" This causes mix-compile issues on PowerPC Linux. This reverts commit `324bd467a2`.	2021-09-17 08:07:18 +00:00
Lang Hames	7e8babeb9d	Revert "[examples] Fix SectionMemoryManager deconstruction error with MSVC." This reverts commit `63838d8814`, which broke tests on some bots. See e.g. https://lab.llvm.org/buildbot#builders/109/builds/22561	2021-09-17 17:42:25 +10:00
Sjoerd Meijer	97cc678cc4	[FuncSpec] Specialising on addresses of const global values. This introduces an option to allow specialising on the address of global values. This option is off by default because it is likely not that profitable to do so and needs more investigation. Before, we were specialising on addresses and thus this changes the default behaviour. Differential Revision: https://reviews.llvm.org/D109775	2021-09-17 08:07:05 +01:00
Lang Hames	63838d8814	[examples] Fix SectionMemoryManager deconstruction error with MSVC. This commit fixes an order-of-initialization issue: If the default mmapper object is destroyed while some global SectionMemoryManager is still using it, then calls to the mapper from ~SectionMemoryManager will fail. This issue was causing failures when running the LLVM Kaleidoscope examples on Windows. Switching to a ManagedStatic solves the initialization order issue. Patch by Justice Adams. Thanks Justice! Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D107087	2021-09-17 16:58:23 +10:00
Peter Collingbourne	fc08cfb888	CodeView: static_cast result of getOffset() to size_t. Silences a narrowing conversion warning on 32-bit platforms after D109923.	2021-09-16 23:39:04 -07:00
Christudasan Devadasan	167ff5280d	[GlobalOpt] Do not shrink global to bool for an unfavorable AS Do not call `TryToShrinkGlobalToBoolean` for address spaces that don't allow initializers. It inserts an initializer value while shrinking to bool. Used the target hook introduced with D109337 to skip this call for the restricted address spaces. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D109823	2021-09-16 23:13:30 -04:00
Nuri Amari	cc8229603b	Extract LC_CODE_SIGNATURE related implementation out of LLD Move the functionality in lld that handles writing of the LC_CODE_SIGNATURE load command and associated data section to a central reusable location. This change is in preparation for another change that modifies llvm-objcopy to reproduce the LC_CODE_SIGNATURE load command and corresponding data section to maintain the validity of signed macho object files passed through llvm-objcopy. Reviewed By: #lld-macho, int3, oontvoo Differential Revision: https://reviews.llvm.org/D109803	2021-09-16 17:43:39 -07:00
Lang Hames	78b083dbb7	[ORC] Add finalization & deallocation actions, SimpleExecutorMemoryManager class Finalization and deallocation actions are a key part of the upcoming JITLinkMemoryManager redesign: They generalize the existing finalization and deallocation concepts (basically "copy-and-mprotect" and "munmap") to include support for arbitrary registration and deregistration of parts of JIT-linked code. This allows us to register and deregister eh-frames, TLV sections, language metadata, etc. using regular memory management calls with no additional IPC/RPC overhead, which should both improve JIT performance and simplify interactions between ORC and the ORC runtime. The SimpleExecutorMemoryManager class provides executor-side support for memory management operations, including finalization and deallocation actions. This support is being added in advance of the rest of the memory manager redesign as it will simplify the introduction of an EPC-based RuntimeDyld::MemoryManager (since eh-frame registration/deregistration will be expressible as actions). The new RuntimeDyld::MemoryManager will in turn allow us to remove older remote allocators that are blocking the rest of the memory manager changes.	2021-09-17 09:55:45 +10:00
Nico Weber	646299d183	[Support] Convert BinaryStream class zoo to 64-bit offsets Most PDB fields on disk are 32-bit but describe the file in terms of MSF blocks, which are 4 kiB by default. So PDB files can be a bit larger than 4 GiB, and much larger if you create them with a block size > 4 kiB. This is a first (necessary, but by far not sufficient) step towards supporting such PDB files. Now we don't truncate in-memory file offsets (which are in terms of bytes, not in terms of blocks). No effective behavior change. lld-link will still error out if it would produce PDBs > 4 GiB. Differential Revision: https://reviews.llvm.org/D109923	2021-09-16 19:14:52 -04:00
Daniil Suchkov	0e36288318	[LoopPredication] Report changes correctly when attempting loop exit predication To make the IR easier to analyze, this pass makes some minor transformations. After that, even if it doesn't decide to optimize anything, it can't report that it changed nothing and preserved all the analyses. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D109855	2021-09-16 22:49:55 +00:00
Jon Roelofs	4b19e7dfae	[LoopIdiomRecognize][Remarks] Track loop-strided store to/from blocks Differential revision: https://reviews.llvm.org/D109929	2021-09-16 15:46:26 -07:00
Jacob Lambert	4c1023b4b7	[AMDGPU] NFC: Fixing small spelling errors in AMDGPU header files Nonfunctional commit fixing several minor spelling errors in llvm/lib/Target/AMDGPU header files. Testing workflow as a new contributor. Differential Revision: https://reviews.llvm.org/D109733	2021-09-16 13:03:09 -07:00
Teresa Johnson	88cb3e2cb6	[MemProf] Don't instrument stack accesses unless requested Skip stack accesses unless requested, as the memory profiler runtime does not currently look at or report accesses for these addresses. Differential Revision: https://reviews.llvm.org/D109868	2021-09-16 12:21:51 -07:00
Nikita Popov	0fc624f029	[IR] Return AAMDNodes from Instruction::getMetadata() (NFC) getMetadata() currently uses a weird API where it populates a structure passed to it, and optionally merges into it. Instead, we can return the AAMDNodes and provide a separate merge() API. This makes usages more compact. Differential Revision: https://reviews.llvm.org/D109852	2021-09-16 21:06:57 +02:00
Craig Topper	73e5b9ea90	[RISCV] Select (srl (sext_inreg X, i32), uimm5) to SRAIW if only lower 32 bits are used. SimplifyDemandedBits can turn srl into sra if the bits being shifted in aren't demanded. This patch can recover the original sra in some cases. I've renamed the tablegen class for detecting W users since the "overflowing operator" term I originally borrowed from Operator.h does not include srl. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D109162	2021-09-16 11:03:35 -07:00
Vang Thao	106959acc1	[AMDGPU] Inline non-kernel functions using extern lds In https://reviews.llvm.org/D100481, forced inlining of all non-kernel functions using LDS was disabled, since the AMDGPULowerModuleLDS pass now handles static LDS. However, that pass does not handle extern LDS, so non-kernel functions using extern LDS must still be inlined. Reviewed By: hsmhsm, arsenm Differential Revision: https://reviews.llvm.org/D109773	2021-09-16 10:58:51 -07:00
Arthur Eubanks	d49cb5b303	[SimplifyCFG] Add bonus when seeing vector ops to branch fold to common dest This makes some tests in vector-reductions-logical.ll more stable when applying D108837. The cost of branching is higher when vector ops are involved due to potential SLP transformations. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D108935	2021-09-16 10:50:36 -07:00
Dávid Bolvanský	a4a426c9e0	[InstCombine] Added llvm.powi optimizations If power is even: powi(-x, p) -> powi(x, p) powi(fabs(x), p) -> powi(x, p) powi(copysign(x, y), p) -> powi(x, p)	2021-09-16 19:42:21 +02:00
Kazu Hirata	cfc7402419	[llvm] Use drop_begin (NFC)	2021-09-16 08:46:26 -07:00
Michael Liao	ffa5c3a555	Fix warning on `llvm-else-after-return`. NFC.	2021-09-16 11:25:43 -04:00
Doug Gregor	a773db7d76	Add a command-line flag to control the Swift extended async frame info. Introduce a new command-line flag `-swift-async-fp={auto|always|never}` that controls how code generation sets the Swift extended async frame info bit. There are three possibilities: * `auto`: determine how to set the bit based on the deployment target, either statically or dynamically via `swift_async_extendedFramePointerFlags`. * `always`: the default, always set the bit statically, regardless of deployment target. * `never`: never set the bit, regardless of deployment target. Patch by Doug Gregor <dgregor@apple.com> Reviewed By: doug.gregor Differential Revision: https://reviews.llvm.org/D109392	2021-09-16 06:57:45 -07:00
Bjorn Pettersson	d9fc3d879e	[NewPM] Replace 'kasan-module' by 'asan-module<kernel>' Change the asan-module pass into a MODULE_PASS_WITH_PARAMS in the pass registry, and add a single parameter called 'kernel' that can be set instead of having a special pass name 'kasan-module' to trigger that special pass config. Main reason is to make sure that we have a unique mapping from ClassName to PassName in the new passmanager framework, making it possible to correctly identify the passes when dealing with options such as -print-after and -print-pipeline-passes. This is a follow-up to D105006 and D105007.	2021-09-16 14:58:42 +02:00
Bjorn Pettersson	8f8616655c	[NewPM] Use a separate struct for ModuleThreadSanitizerPass Split ThreadSanitizerPass into ThreadSanitizerPass (as a function pass) and ModuleThreadSanitizerPass (as a module pass). Main reason is to make sure that we have a unique mapping from ClassName to PassName in the new passmanager framework, making it possible to correctly identify the passes when dealing with options such as -print-after and -print-pipeline-passes. This is a follow-up to D105006 and D105007.	2021-09-16 14:58:42 +02:00
Bjorn Pettersson	ab41eef9ac	[NewPM] Use a separate struct for ModuleMemorySanitizerPass Split MemorySanitizerPass into MemorySanitizerPass (as a function pass) and ModuleMemorySanitizerPass (as a module pass). Main reason is to make sure that we have a unique mapping from ClassName to PassName in the new passmanager framework, making it possible to correctly identify the passes when dealing with options such as -print-after and -print-pipeline-passes. This is a follow-up to D105006 and D105007.	2021-09-16 14:58:42 +02:00
Alexandros Lamprineas	1bd5ea968e	[ARM] Mitigate the CVE-2021-35465 security vulnerability. A vulnerability was recently found in the implementation of the VLLDM instruction in the Arm Cortex-M33, Cortex-M35P and Cortex-M55. If the VLLDM instruction is abandoned due to an exception when it is partially completed, it is possible for a subsequent non-secure handler to access and modify the partially restored register values. This vulnerability is identified as CVE-2021-35465. The mitigation sequence varies between v8-m and v8.1-m as follows: v8-m.main: `mrs r5, control`; `tst r5, #8 /* CONTROL_S.SFPA */`; `it ne`; `.inst.w 0xeeb00a40 /* vmovne s0, s0 */`; `1: vlldm sp /* Lazy restore of d0-d16 and FPSCR. */` v8.1-m.main: `vscclrm {vpr} /* Clear VPR. */`; `vlldm sp /* Lazy restore of d0-d16 and FPSCR. */` More details on developer.arm.com/support/arm-security-updates/vlldm-instruction-security-vulnerability Differential Revision: https://reviews.llvm.org/D109157	2021-09-16 12:56:43 +01:00
Alexandros Lamprineas	61f25daa8d	[ARM][CMSE] Clear the secure fp-registers when using softfp abi. When expanding the non-secure call instruction, we emit code to clear the secure floating-point registers only if the targeted architecture has floating-point support. The potential problem is when the source code containing non-secure calls is built with -mfloat-abi=soft but some other part of the system has been built with -mfloat-abi=softfp (soft and softfp are compatible as they use the same procedure calling standard). In this case, floating-point registers could leak to non-secure state, as the code performing the non-secure call won't have cleared them, assuming no floating point has been used. Differential Revision: https://reviews.llvm.org/D109153	2021-09-16 12:56:43 +01:00
Cullen Rhodes	17f1ccc759	[AArch64][SVE] NFC: Remove unnecessary if	2021-09-16 11:26:46 +00:00
Simon Pilgrim	1ef62cb200	[X86] SimplifyDemandedVectorEltsForTargetNode - add PSADBW handling Peek through PSADBW operands to handle non-demanded elements.	2021-09-16 11:28:31 +01:00
Konstantin Schwarz	d2e66d7fa4	[GlobalISel] Add a combine for and(load, mask) -> zextload This only handles simple masks, not shifted masks, for now. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D109357	2021-09-16 10:42:46 +02:00
Anton Afanasyev	6a5f49a1ac	[AggressiveInstCombine] Add `{insert/extract}element` to `TruncInstCombine` DAG Alive2 for `{insert/extract}element`: https://alive2.llvm.org/ce/z/hwy_E- Actually, no file in the test suite is affected by this change, which suggests this is a rare pattern that the frontend does not generate. Still, it is worth having in place. Differential Revision: https://reviews.llvm.org/D109236	2021-09-16 11:24:31 +03:00

151087 Commits