llvm-project

Commit Graph

Author	SHA1	Message	Date
Chen Zheng	f6db18fd4a	[PowerPC][NFC] make option ppc-formprep-max-vars can be set more than one time.	2021-11-04 13:44:58 +00:00
Simon Pilgrim	87d5bb66eb	[X86][SSE] Improve PMADDWD SimplifyDemandedVectorElts handling Check both operands for zero elements to remove unnecessary demanded elts. Try to help reduce some minor regressions noticed in D110995	2021-11-04 12:56:31 +00:00
Florian Hahn	b4992dbb21	[LV] Clarify uniform worklist contains instrs demanding lane 0.	2021-11-04 13:11:50 +01:00
Tim Northover	3d39612b3d	Coroutines: don't infer function attrs before lowering Coroutines have weird semantics that don't quite match normal LLVM functions, so trying to infer even simple attributes based on their contents can go wrong.	2021-11-04 10:24:28 +00:00
David Green	1e5f814302	[InstCombine] Fix infinite recursion in ashr/xor vector fold. The added test has poison lanes due to the vector shuffle. This can cause an infinite loop of combines in instcombine where it folds xor(ashr, -1) -> select (icmp slt 0), -1, 0 -> sext (icmp slt 0) -> xor(ashr, -1). We usually prevent this by checking that the xor constant is not -1, but with vectors some of the lanes may be -1, some may be poison. So this changes the way we detect that from "!C1->isAllOnesValue()" to "!match(C1, m_AllOnes())", which is more able to detect that some of the lanes are poison. Fixes PR52397	2021-11-04 09:24:27 +00:00
Qiu Chaofan	a84118756c	[PowerPC] Enforce side effects to FPSCR read/set intrinsics Currently, FPSCR is not modeled, so in some early passes (such as early-cse), the read/set intrinsics to FPSCR may get incorrect simplification. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D112380	2021-11-04 11:45:32 +08:00
RamNalamothu	539f500e78	[AMDGPU] Do not add debug locations to the code inside prologue There is no real source location for code inside prologue as it is generated by compiler but source locations are being added to code inside prologue as a side effect of https://reviews.llvm.org/D99269 because buildSpillLoadStore() is using source location of the real instruction in the basic block if any. Fixes: SWDEV-307590 Reviewed By: scott.linder, sebastian-ne Differential Revision: https://reviews.llvm.org/D113100	2021-11-04 08:02:41 +05:30
Philip Reames	d4708fa480	Backout must-exit based parts of `3fc9882e`, and 412eb0 Not sure these are correct. I think I missed a case when porting this from the original SCEV change to the IndVar changes. I may end up reapplying this later with a comment about how this is correct, but in case the current bad feeling turns out to be true, I'm removing from tree while investigating further.	2021-11-03 15:19:49 -07:00
Arthur Eubanks	88052fc362	[ArgPromo] Preserve FunctionAnalysisManagerCGSCCProxy We already make sure to properly clear analyses for deleted functions. This makes investigating some future potential compile time improvements easier. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D113032	2021-11-03 14:56:58 -07:00
Craig Topper	5022ac0771	[RISCV] Use HasVInstructions and HasVInstructionsAnyF in more place in TableGen. NFC Change RISCVSubtarget.hasVInstructionAnyF() to call hasVInstructionsF32 so that any changes to hasVInstructionsF32 are reflected. The files were missed in D112496.	2021-11-03 14:32:45 -07:00
Matthias Braun	847a680733	X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr This is a re-commit of `e2c7ee0743` which was reverted in `a2a58d91e8`. This includes a fix to consistently check for EFLAGS being live-out. See phabricator review. Original Summary: This extends `optimizeCompareInstr` to re-use previous comparison results if the previous comparison was with an immediate that was 1 bigger or smaller. Example: CMP x, 13 ... CMP x, 12 ; can be removed if we change the SETg SETg ... ; x > 12 changed to `SETge` (x >= 13) removing CMP Motivation: This often happens because SelectionDAG canonicalization tends to add/subtract 1 often when optimizing for fallthrough blocks. Example for `x > C` the fallthrough optimization switches true/false blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into `x < C + 1`. Differential Revision: https://reviews.llvm.org/D110867	2021-11-03 14:12:23 -07:00
Philip Reames	64990f1408	Revert "[indvars] Move a check slightlly earlier [NFC]" This reverts commit `7ff943a9ed`. This wasn't NFC. isSigned != !isUnsigned as there are also relational operators.	2021-11-03 13:38:52 -07:00
Kirill Stoimenov	a55c4ec1ce	[ASan] Process functions in Asan module pass This came up as recommendation while reviewing D112098. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D112732	2021-11-03 20:27:53 +00:00
alex-t	0a3d755ee9	[AMDGPU] Enable divergence-driven BFE selection Detailed description: This change enables the bit field extract patterns selection to s_bfe_u32 or v_bfe_u32 dependent on the pattern root node divergence. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D110950	2021-11-03 23:26:59 +03:00
Martin Storsjö	a39eba7207	[Support] [Windows] Use RemoveFileOnSignal if unable to use the delete-on-close flag This takes care of cleaning up the temp files on crashes. It doesn't handle cleanup when explicitly killed though. Differential Revision: https://reviews.llvm.org/D112710	2021-11-03 21:29:37 +02:00
Philip Reames	7ff943a9ed	[indvars] Move a check slightlly earlier [NFC]	2021-11-03 12:24:10 -07:00
Philip Reames	3fc9882e88	[indvars] Rotate zext though icmp to reduce loop varying computation This change looks for cases where we can prove that an exit test of a loop can be performed in a narrower bitwidth, and that by doing so we can replace a loop-varying extend with a loop-invariant truncate. The motivation here is that doing this unblocks the trip count analysis for narrow IVs involved in extended compare exit tests. It also has the nice side effect of simply making the code faster, even if we gain no other benefit from the improved analysis ability. I've noted a few places this could be extended, but I think this stands reasonable on its own as well. Differential Revision: https://reviews.llvm.org/D112262	2021-11-03 12:09:20 -07:00
Vitaly Buka	32eb697c0a	[PassBuilder] Remove unused function after D113072	2021-11-03 12:03:17 -07:00
Vitaly Buka	3131714f8d	[NFC][asan] Use AddressSanitizerOptions in ModuleAddressSanitizerPass Reviewed By: kstoimenov Differential Revision: https://reviews.llvm.org/D113072	2021-11-03 11:32:14 -07:00
Kirill Stoimenov	b3145323b5	Revert "[ASan] Process functions in Asan module pass" This reverts commit `76ea87b94e`. Reviewed By: kstoimenov Differential Revision: https://reviews.llvm.org/D113129	2021-11-03 18:01:01 +00:00
Kirill Stoimenov	76ea87b94e	[ASan] Process functions in Asan module pass This came up as recommendation while reviewing D112098. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D112732	2021-11-03 17:51:01 +00:00
Harald van Dijk	889c2b97bd	[X86] Fix X32 indirect call generation The check for whether a zero extension was needed was subtly wrong and saw a value that was already 64 bits, so did not extend. Fixes PR52357. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112860	2021-11-03 16:43:44 +00:00
Sanjay Patel	c85df3c7d5	[InstCombine] refactor fold for icmp with trunc op; NFC There are at least 3 related folds we can add here - see D112634.	2021-11-03 12:43:15 -04:00
Roman Lebedev	9c2469c1dd	[PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass run before vectorization passes Test thanks to Michael Kuklinski from `#llvm`: https://godbolt.org/z/bdrah5Goo originally inspired by Daniel Lemire's https://lemire.me/blog/2021/10/26/in-c-is-empty-faster-than-comparing-the-size-with-zero/ We manage to deduce that the answer does not require looping, but we do that after the last `LoopDeletion` pass run, so we end up being stuck with a dead loop. Now, as with all things SCEV, this has a very expected ~`+0.12%` compile time performance regression: https://llvm-compile-time-tracker.com/compare.php?from=0ae7bf124a9bca76dd9a91b2f7379168ff13f562&to=c2ae57c9b961aeb4a28c747266949340613a6d84&stat=instructions (for comparison, doing that in function simplification pipeline would have been ~`+0.5` compile time performance regression, D112840) Looking at the transformation stats over vanilla test-suite, i think it's rather expected: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \|%\| \| \|--------------------------------------------------\|----------:\|----------:\|------:\|-------:\|-------:\| \| scalar-evolution.NumBruteForceTripCountsComputed \| 789 \| 888 \| 99 \| 12.55% \| 12.55% \| \| scalar-evolution.NumTripCountsNotComputed \| 105592 \| 117900 \| 12308 \| 11.66% \| 11.66% \| \| loop-delete.NumBackedgesBroken \| 542 \| 559 \| 17 \| 3.14% \| 3.14% \| \| regalloc.numExtends \| 81 \| 79 \| -2 \| -2.47% \| 2.47% \| \| indvars.NumFoldedUser \| 408 \| 400 \| -8 \| -1.96% \| 1.96% \| \| indvars.NumElimCmp \| 3831 \| 3758 \| -73 \| -1.91% \| 1.91% \| \| scalar-evolution.NumTripCountsComputed \| 299759 \| 304278 \| 4519 \| 1.51% \| 1.51% \| \| loop-delete.NumDeleted \| 8055 \| 8128 \| 73 \| 0.91% \| 0.91% \| \| machine-cse.NumCommutes \| 111 \| 110 \| -1 \| -0.90% \| 0.90% \| \| globaldce.NumFunctions \| 1187 \| 1192 \| 5 \| 0.42% \| 0.42% \| \| codegenprepare.NumSelectsExpanded \| 277 \| 278 \| 1 \| 0.36% \| 
0.36% \| \| loop-unroll.NumRuntimeUnrolled \| 13841 \| 13791 \| -50 \| -0.36% \| 0.36% \| \| machinelicm.NumPostRAHoisted \| 1168 \| 1172 \| 4 \| 0.34% \| 0.34% \| \| phi-node-elimination.NumCriticalEdgesSplit \| 83054 \| 82879 \| -175 \| -0.21% \| 0.21% \| \| machine-cse.NumPREs \| 3085 \| 3079 \| -6 \| -0.19% \| 0.19% \| \| branch-folder.NumBranchOpts \| 108122 \| 107942 \| -180 \| -0.17% \| 0.17% \| \| loop-unroll.NumUnrolled \| 40136 \| 40067 \| -69 \| -0.17% \| 0.17% \| \| branch-folder.NumDeadBlocks \| 130818 \| 130607 \| -211 \| -0.16% \| 0.16% \| \| codegenprepare.NumBlocksElim \| 92856 \| 92714 \| -142 \| -0.15% \| 0.15% \| \| instsimplify.NumSimplified \| 103263 \| 103129 \| -134 \| -0.13% \| 0.13% \| \| instcombine.NumConstProp \| 26070 \| 26102 \| 32 \| 0.12% \| 0.12% \| \| instsimplify.NumExpand \| 1716 \| 1718 \| 2 \| 0.12% \| 0.12% \| \| loop-unroll.NumCompletelyUnrolled \| 9236 \| 9225 \| -11 \| -0.12% \| 0.12% \| \| branch-folder.NumHoist \| 2773 \| 2770 \| -3 \| -0.11% \| 0.11% \| \| regalloc.NumReloadsRemoved \| 10822 \| 10834 \| 12 \| 0.11% \| 0.11% \| \| regalloc.NumSnippets \| 11394 \| 11406 \| 12 \| 0.11% \| 0.11% \| \| machine-cse.NumCrossBBCSEs \| 1052 \| 1053 \| 1 \| 0.10% \| 0.10% \| \| machinelicm.NumCSEed \| 99887 \| 99784 \| -103 \| -0.10% \| 0.10% \| \| branch-folder.NumTailMerge \| 72501 \| 72435 \| -66 \| -0.09% \| 0.09% \| \| codegenprepare.NumExtUses \| 22007 \| 21987 \| -20 \| -0.09% \| 0.09% \| \| local.NumRemoved \| 68232 \| 68294 \| 62 \| 0.09% \| 0.09% \| \| loop-vectorize.LoopsAnalyzed \| 75483 \| 75413 \| -70 \| -0.09% \| 0.09% \| ``` Note that i'm only changing current PM, and not touching obsolete PM. This is an alternative to the function simplification pipeline variant of the same change, D112840. It has both less compile time impact (since the additional number of SCEV trip count calculations is way less than with D112840), and it is much more powerful/impactful (almost 2x more loops deleted). 
I have checked, and doing this after loop rotation is favorable (more loops deleted). Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D112851	2021-11-03 19:24:49 +03:00
Kazu Hirata	4bef0304e1	[AArch64, AMDGPU] Use make_early_inc_range (NFC)	2021-11-03 09:22:51 -07:00
Hans Wennborg	a2a58d91e8	Revert "X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr" This caused miscompiles of switches, see comments on the code review. > This extends `optimizeCompareInstr` to re-use previous comparison > results if the previous comparison was with an immediate that was 1 > bigger or smaller. Example: > > CMP x, 13 > ... > CMP x, 12 ; can be removed if we change the SETg > SETg ... ; x > 12 changed to `SETge` (x >= 13) removing CMP > > Motivation: This often happens because SelectionDAG canonicalization > tends to add/subtract 1 often when optimizing for fallthrough blocks. > Example for `x > C` the fallthrough optimization switches true/false > blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into > `x < C + 1`. > > Differential Revision: https://reviews.llvm.org/D110867 This reverts commit `e2c7ee0743`.	2021-11-03 17:01:36 +01:00
Roman Lebedev	df93c8a919	[X86] `X86TTIImpl::getInterleavedMemoryOpCostAVX512()`: fallback to scalarization cost computation for mask I don't really buy that masked interleaved memory loads/stores are supported on X86. There is zero costmodel test coverage, no actual cost modelling for the generation of the mask repetition, and basically only two LV tests. Additionally, i'm not very interested in AVX512. I don't know if this really helps "soft" block over at https://reviews.llvm.org/D111460#inline-1075467, but i think it can't make things worse at least. When we are being told that there is a masking, instead of completely giving up and falling back to fully scalarizing `BasicTTIImplBase::getInterleavedMemoryOpCost()`, let's correctly query the cost of masked memory ops, keep all the pretty shuffle cost modelling, but scalarize the cost computation for the mask replication. I think, not scalarizing the shuffles themselves may adjust the computed costs a bit, and maybe hopefully just enough to hide the "regressions" at https://reviews.llvm.org/D111460#inline-1075467 I do mean hide, because the test coverage is non-existent. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112873	2021-11-03 18:14:35 +03:00
Erich Keane	09233412ed	Revert part of D112349 to allow ifunc resolvers be declarations. The patch in D112349 added a previously nonexistent restriction on ifunc resolvers that they MUST be definitions. However, the function multiversioning depends on being able to resolve these resolvers at link-time, so this additional restriction was breaking.	2021-11-03 07:15:16 -07:00
David Sherwood	c0f2774973	[NFC][LoopVectorize] Simple tidy-up in InnerLoopVectorizer::createVectorIntOrFpInductionPHI Use getSignedIntOrFpConstant instead of creating int or FP constants manually.	2021-11-03 14:05:21 +00:00
Peter Waller	7a34145f40	Reland "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store" This reverts commit `753eba6421`. Contiguous gather => masked load: (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1)) => (masked.load (gep BasePtr IndexBase) Align Mask undef) Contiguous scatter => masked store: (sve.ld1.scatter.index Value Mask BasePtr (sve.index IndexBase 1)) => (masked.store Value (gep BasePtr IndexBase) Align Mask) Tests with <vscale x 2 x double>: [Gather, Scatter] for each [Positive test (index=1), Negative test (index=2), Alignment propagation]. Differential Revision: https://reviews.llvm.org/D112076	2021-11-03 13:42:14 +00:00
Peter Waller	753eba6421	Revert "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store" This reverts commit `1febf42f03`, which has a use-of-uninitialized-memory bug. See: https://reviews.llvm.org/D112076	2021-11-03 13:39:38 +00:00
Florian Hahn	64bc31ee93	[LV] Drop unneeded use of getVPSingleValue (NFC). VPReductionPHIRecipe inherits from VPValue, so there's no need to call getVPSingleValue.	2021-11-03 14:26:15 +01:00
Florian Hahn	8e44bdd12a	[VPlan] Make VPWidenCanonicalIVRecipe a VPValue (NFC). The recipe produces exactly one VPValue and can inherit directly from it. This is in line with other recipes and avoids having to use getVPSingleValue.	2021-11-03 14:11:01 +01:00
Andrew Savonichev	123ad720f1	[NVPTX] Mark special registers as reserved A reserved register: - is not allocatable - is considered always live - is ignored by liveness tracking NVPTX special registers match the criteria, and marking them as reserved helps to avoid machine verifier error: * Bad machine code: Using an undefined physical register * - function: foo - basic block: %bb.0 (0x557bb178b708) - instruction: %0:int32regs = MOV_SPECIAL $envreg0 - operand 1: $envreg0 Differential Revision: https://reviews.llvm.org/D113008	2021-11-03 15:48:04 +03:00
Cullen Rhodes	d968b173d3	[TableGen] Emit a warning for unused template args Add a warning to TableGen for unused template arguments in classes and multiclasses, for example: multiclass Foo<int x> { def bar; } $ llvm-tblgen foo.td foo.td:1:20: warning: unused template argument: Foo::x multiclass Foo<int x> { ^ A flag '--no-warn-on-unused-template-args' is added to disable the warning. The warning is disabled for LLVM and sub-projects if 'LLVM_ENABLE_WARNINGS=OFF'. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D109359	2021-11-03 11:55:07 +00:00
Andrew Savonichev	0e70785538	[NVPTX] Add MoveParam instruction for TargetExternalSymbol operand TargetExternalSymbol is considered to be an immediate and not a register, so machine verifier emits an error: * Bad machine code: Expected a register operand. * - function: static_offset - basic block: %bb.0 bb (0x560e9b306028) - instruction: %3:int64regs = MoveParamI64 &static_offset_param_1 - operand 1: &static_offset_param_1 The patch adds variants of this instruction with an immediate operand for byval arguments on 64-bit and 32-bit targets. Differential Revision: https://reviews.llvm.org/D113006	2021-11-03 14:43:41 +03:00
David Green	3bc586b9aa	[ARM] Treat MVE gather add-like-or's like adds LLVM has the habit of turning adds with no common bits set into ors, which means we need to detect them and treat them like adds again in the MVE gather/scatter lowering pass. Differential Revision: https://reviews.llvm.org/D112922	2021-11-03 11:41:06 +00:00
Peter Waller	1febf42f03	[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store Contiguous gather => masked load: (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1)) => (masked.load (gep BasePtr IndexBase) Align Mask undef) Contiguous scatter => masked store: (sve.ld1.scatter.index Value Mask BasePtr (sve.index IndexBase 1)) => (masked.store Value (gep BasePtr IndexBase) Align Mask) Tests with <vscale x 2 x double>: [Gather, Scatter] for each [Positive test (index=1), Negative test (index=2), Alignment propagation]. Differential Revision: https://reviews.llvm.org/D112076	2021-11-03 11:02:44 +00:00
David Green	d36dd1f842	[ARM] Push gather/scatter shl index updates out of loops This teaches the MVE gather scatter lowering pass that SHL is essentially the same as Mul, where we are able to optimize the induction of a gather/scatter address by pushing them out of loops. https://alive2.llvm.org/ce/z/wG4VyT Differential Revision: https://reviews.llvm.org/D112920	2021-11-03 11:00:05 +00:00
Qiu Chaofan	741aeda97d	[PowerPC] Implement longdouble pack/unpack builtins Implement two builtins to pack/unpack IBM extended long double float, according to GCC 'Basic PowerPC Builtin Functions Available ISA 2.05'. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D112055	2021-11-03 17:57:25 +08:00
Andrew Savonichev	30a3a17df8	[NVPTX] Copy machine operand flags in TII::insertBranch Before this patch, flags such as undef were dropped by TII::insertBranch (used by BranchFolding pass), resulting in the following error from machine verifier: * Bad machine code: Reading virtual register without a def * - function: hoge - basic block: %bb.0 bb (0x562e9c240e68) - instruction: CBranch %2:int1regs, %bb.3 - operand 0: %2:int1regs Differential Revision: https://reviews.llvm.org/D113001	2021-11-03 12:38:27 +03:00
Yi Kong	803d4f8a35	[ARM][AsmParser] Don't emit "deprecated instruction in IT block" warning if requested Also fixed formatting in AsmMatcherEmitter because it was confusing. Differential Revision: https://reviews.llvm.org/D112993	2021-11-03 17:18:04 +08:00
Piotr Sobczak	03961709ed	[InstCombine] Extend pattern to replace shuffle's insertelement operand In D71220 a pattern was added to replace shuffle's insertelement operand if inserted scalar is not demanded. The pattern was added only for the case where the shuffle's mask size is equal to element's vector size. However, that condition is not required because the pattern does not change the shuffle vector size. This patch extends the pattern to also include cases where shuffle's mask size is not equal to element's vector size. Differential Revision: https://reviews.llvm.org/D112318	2021-11-03 09:43:04 +01:00
Ben Shi	59c3b48d99	Revert "[AArch64] Optimize add/sub with immediate" This reverts commit `3de3ca3137`.	2021-11-03 14:15:21 +08:00
Chen Zheng	5a8b196340	[PowerPC] handle more splat loads without stack operation This mostly improves splat loads code generation on Power7 Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D106555	2021-11-03 05:17:41 +00:00
Johannes Doerfert	d61aac76bf	[OpenMP][FIX] Do not signal SPMD-mode but then keep generic-mode If we assume SPMD-mode during the fixpoint iteration we have to execute the kernel in SPMD-mode. If we change our mind during manifest there is the chance of a mismatch between the simplification, e.g., of `__kmpc_is_spmd_exec_mode` calls, and the execution mode. This problem was introduced in D109438. This patch is a compromise to resolve the problem purely in OpenMP-opt while trying to keep the benefits of D109438 around. This might not always work, see `get_hardware_num_threads_in_block_fold` but it often does. At the same time we do keep value specialization and execution mode in sync. Proper solutions to this problem should be considered. I believe a new execution mode is the easiest way forward (Singleton-SPMD). Alternatively, SPMD-mode execution can be used with a way to provide a new thread_limit (here 1) to the runtime. This is more general and could be useful if we see `num_threads` clauses or workshared loops with small trip counts in the kernel. In either proposal we need to disable the guarding for the kernel (which was the motivation for D109438). Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D112894	2021-11-02 23:22:04 -05:00
Johannes Doerfert	73720c8059	[OpenMP][FIX] Introduce and use a simple generic-mode barrier Before we had aligned barriers the `__kmpc_barrier_simple_spmd` was OK to be used in the custom state machine. Now that SPMD barriers are assumed to be aligned we need to use a "generic" barrier in places that are not aligned. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112893	2021-11-02 23:22:01 -05:00
Johannes Doerfert	e6e440ae5f	[OpenMP][FIX] Ensure guarding uses proper global name Global symbols cannot have any name so we need to sanitize the string first. Also remove an assertion that is not actually necessary nor true in general. Reviewed By: ggeorgakoudis Differential Revision: https://reviews.llvm.org/D112892	2021-11-02 23:21:53 -05:00
Abinav Puthan Purayil	fbe61fb0aa	[AMDGPU] Fix SGPR checks in S_MOV_B64_IMM_PSEUDO generation. The function to generate S_MOV_B64_IMM_PSEUDO was recently modified to optimize AGPR to AGPR copy but it missed checking for the SGPR clobbering for the S_MOV_B64_IMM_PSEUDO generation. Differential Revision: https://reviews.llvm.org/D113005	2021-11-03 09:09:24 +05:30
Ben Shi	3de3ca3137	[AArch64] Optimize add/sub with immediate Optimize ([add\|sub] r, imm) -> ([ADD\|SUB] ([ADD\|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add\|sub] r, imm) -> ([SUB\|ADD] ([SUB\|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Reviewed By: jaykang10, dmgreen Differential Revision: https://reviews.llvm.org/D111034	2021-11-03 03:06:43 +00:00
Liren Peng	57e093162e	[ScalarEvolution] Infer loop max trip count from array accesses Data references in a loop should not access elements over the statically allocated size. So we can infer a loop max trip count from this undefined behavior. Reviewed By: reames, mkazantsev, nikic Differential Revision: https://reviews.llvm.org/D109821	2021-11-03 10:40:18 +08:00
Phoebe Wang	8f101971b6	[X86][VARARG] Assign MMO earlier to avoid prolog insert point been sunk across VASTART_SAVE_XMM_REGS The changes in D80163 deferred the assignment of MachineMemOperand (MMO) until the X86ExpandPseudo pass. This will result in a crash due to the prolog insert point being sunk across the pseudo instruction VASTART_SAVE_XMM_REGS. Moving the assignment to the creation of the node can avoid the problem. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D112859	2021-11-03 10:13:32 +08:00
Mircea Trofin	34f4fe3a90	[NFC][Regalloc] Ensure Query::interferingVRegs is accurate. To correctly use Query, one had to first call collectInterferingVRegs to pre-cache the query result, then call interferingVRegs. Failing the former, interferingVRegs could be stale. This did cause a bug which was addressed in D98232, but the underlying usability issue of the Query API wasn't. This patch addresses the latter by making collectInterferingVRegs an implementation detail, and having interferingVRegs play both roles. One side-effect of this is that interferingVRegs is not const anymore. Differential Revision: https://reviews.llvm.org/D112882	2021-11-02 18:26:54 -07:00
Kazu Hirata	1b108ab975	[Transforms] Use make_early_inc_range (NFC)	2021-11-02 18:13:23 -07:00
Hongtao Yu	d0eb472f33	[llvm-profdata] Print out section flags for FunctionMetadata section As titled. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D113064	2021-11-02 17:59:22 -07:00
Eli Friedman	c964afb2c8	[AArch64] Diagnose large adrp offset on Windows. On Windows, this relocation can only encode a 21-bit offset. Make sure we emit an error, instead of silently truncating the offset. Found investigating https://bugs.llvm.org/show_bug.cgi?id=52378 Differential Revision: https://reviews.llvm.org/D113051	2021-11-02 15:11:22 -07:00
Nikita Popov	c00e9c6345	[BasicAA] Check known access sizes earlier (NFC) All heuristics for variable accesses require both access sizes to be known, so check this once at the start, rather than for each particular heuristic.	2021-11-02 21:26:26 +01:00
Nikita Popov	0b6ed92c8a	[BasicAA] Use early returns (NFC) Reduce nesting in aliasGEP() a bit by returning early.	2021-11-02 21:17:36 +01:00
Simon Pilgrim	53900a19fd	[X86][AVX] combineConcatVectorOps - use getBROADCAST_LOAD helper for splat of normal vector loads. NFCI. Reapplied from rG1cfecf4fc427 with fix for PR51226 - ensure the load is a normal (non-ext) load.	2021-11-02 20:03:25 +00:00
Nikita Popov	51e9f33603	[BasicAA] Use saturating multiply on range if nsw If we know that the var * scale multiplication is nsw, we can use a saturating multiplication on the range (as a good approximation of an nsw multiply). This recovers some cases where the fix from D112611 is unnecessarily strict. (This can be further strengthened by using a saturating add, but we currently don't track all the necessary information for that.) This exposes an issue in our NSW tracking for multiplies. The code was assuming that (X +nsw Y) *nsw Z results in (X *nsw Z) +nsw (Y *nsw Z) -- however, it is possible that the distributed multiplications overflow, even if the non-distributed one does not. We should discard the nsw flag if the offset is non-zero. If we just have (X *nsw Y) *nsw Z then concluding X *nsw (Y *nsw Z) is fine. Differential Revision: https://reviews.llvm.org/D112848	2021-11-02 20:27:39 +01:00
Chih-Ping Chen	2ed29d87ef	[CodeView] Fortran debug info emission in Code View. Differential Revision: https://reviews.llvm.org/D112826	2021-11-02 15:06:21 -04:00
Christopher Tetreault	5718b9f128	[NFC] Reformat VerifyPreservedCFG for non-CPP-aware syntax highlighters * Move `);` outside the #ENDIF. Syntax highlighters that highlight missed closing parens, but are not aware of the C Preprocessor saw the original code as having missed parens.	2021-11-02 11:35:38 -07:00
Simon Pilgrim	82e0eb22af	[X86][AVX] combineConcatVectorOps - use getBROADCAST_LOAD helper. NFCI. This is part of rG1cfecf4fc427 that was reverted to fix PR51226 - concatenating the broadcasts is OK, it's the splatted loads that crash (we're not detecting extloads). I'm still creating a reduced test case so haven't added the load handling again yet.	2021-11-02 18:04:35 +00:00
Fraser Cormack	d065b03801	[RISCV] Optimize vp.load with an all-ones mask Similar to D110206, this patch optimizes unmasked vp.load intrinsics to avoid the need of a vmset instruction to set the mask. It does so by selecting a riscv_vle intrinsic rather than a riscv_vle_mask intrinsic. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D113022	2021-11-02 17:23:39 +00:00
Dmitry Makogon	e09958d5eb	[LoopPeel] Peel loops with exits followed by an unreachable or deopt block Added support for peeling loops with exits that are followed either by an unreachable-terminated block or block that has a terminating deoptimize call. All blocks in the sequence must have a unique successor, maybe except for the last one. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D110922	2021-11-02 23:12:04 +07:00
Arthur Eubanks	e2024d72fa	Revert "[NFC] Remove LinkAll*.h" This reverts commit `fe364e5dc7`. Causes breakages, e.g. https://lab.llvm.org/buildbot/#/builders/188/builds/5266	2021-11-02 09:08:09 -07:00
Jamie Schmeiser	816761f044	Add new choices dot-cfg and dot-cfg-quiet to print-changed. Summary: Add new options -print-changed=[dot-cfg \| dot-cfg-quiet] which create a website of DOT files showing colourized changes as the IR is changed by passes in the new pass manager pipeline. A new change reporter is introduced that creates a website of changes made by passes in the opt pipeline that change the IR. The hidden option -dot-cfg-dir=<dir> specifies a directory (defaulting to "./") into which the website will be created. A file passes.html is created that contains a list of all the passes that act on the IR. Those that do not change the IR are listed as omitted because of no change, ignored or filtered out (using -filter-print-func and -filter-passes) or not listed in quiet mode. Those that do change the IR are listed as a link to a DOT file which contains a CFG depiction of the IR (ala -dot-cfg) except that the instructions, basic blocks and links that are only in the IR before the pass (ie, removed) and those that are only in the IR after the pass (ie, added) are shown in red and green, respectively, while the aspects of the CFG that do not change are shown in black. Additional hidden options -dot-cfg-before-color=<dot named color>, -dot-cfg-after-color=<dot named color> and -dot-cfg-common-color=<dot named color> are defined that allow the customization of the colors used in colorizing the CFG. -change-printer-dot-path=<path to dot exe> is also added. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: aeubanks (Arthur Eubanks) Differential Revision: https://reviews.llvm.org/D87202	2021-11-02 12:06:25 -04:00
Arthur Eubanks	fe364e5dc7	[NFC] Remove LinkAll*.h These were added to prevent functions from being removed by WPO. But that doesn't make sense, correct WPO will not remove functions we actually use. I noticed these because compiling cc1_main.cpp was pulling in random LLVM pass headers. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D112971	2021-11-02 08:43:17 -07:00
Jay Foad	be1a8f8834	[AMDGPU] Really preserve LiveVariables in SILowerControlFlow https://bugs.llvm.org/show_bug.cgi?id=52204 Differential Revision: https://reviews.llvm.org/D112731	2021-11-02 15:03:37 +00:00
Matt	895145aacb	Revert "[AArch64][SVE] Combine predicated FMUL/FADD into FMA" This reverts commit `fc28a2f8ce`.	2021-11-02 14:56:01 +00:00
Youngsuk Kim	76b53da3ce	[SimpleLoopUnswitch] Remove duplicate include. Header "llvm/Transforms/Scalar/SimpleLoopUnswitch.h" is currently included twice. This commit removes the duplicate 'include' line. Previous commit `693eedb138` seems to have mistakenly added the duplicate 'include'. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D112979	2021-11-02 15:22:41 +01:00
Sanjay Patel	829146164f	[InstCombine] change 'not' match for bitwise select The tests diffs are logically equivalent, and so this is generally NFC, but this makes the code match the code comment. It should also be more efficient. If we choose the 'not' operand (rather than the 'not' instruction) as the select condition, then we don't have to invert the select condition/operands as a subsequent transform.	2021-11-02 10:16:01 -04:00
Simon Pilgrim	e173631dd1	[X86][AVX] SimplifyDemandedVectorEltsForTargetNode - use getBROADCAST_LOAD helper. NFCI. Reduce width of X86ISD::SUBV_BROADCAST_LOAD node.	2021-11-02 14:07:22 +00:00
Simon Pilgrim	8ca666a280	[X86][AVX] lowerV2X128Shuffle - use getBROADCAST_LOAD helper. NFCI.	2021-11-02 14:07:21 +00:00
Daniele Vettorel	67887b0f81	[Scalarizer] Do not insert instructions between PHI nodes and debug intrinsics. The scalarizer pass seems to be inserting instructions in-between PHI nodes or debug intrinsics that end up staying at the end of the pass, resulting in malformed IR and violating assumptions. This patch adds a check to make sure the `extractelement` instructions that it adds are correctly placed after all PHI nodes and debug intrinsics. Patch by vettoreldaniele. Reviewed By: bjope Differential Revision: https://reviews.llvm.org/D112472	2021-11-02 09:53:59 -04:00
Martin Liska	c5029023fb	Fix building with GCC 12: Fixes: https://bugs.llvm.org/show_bug.cgi?id=52380 Differential Revision: https://reviews.llvm.org/D112990	2021-11-02 14:28:00 +01:00
jacquesguan	a39eadcf16	[DAGCombiner] Teach combineShiftToMULH to handle constant and const splat vector. Fold (srl (mul (zext i32:$a to i64), i64:c), 32) -> (mulhu $a, $b), if c can truncate to i32 without loss. Reviewed By: frasercrmck, craig.topper, RKSimon Differential Revision: https://reviews.llvm.org/D108129	2021-11-02 12:04:23 +00:00
David Callahan	4ec1b8eeac	[RISCV] Fix invalid kill on callee save A callee save may be live (specifically X1) on entry and so a spill should not mark it killed. Differential Revision: https://reviews.llvm.org/D111285	2021-11-02 11:56:54 +00:00
Simon Pilgrim	325031786e	[SelectionDAG] Optimize expansion for rotates/funnel shifts If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering. Also use the optimized funnel shift lowering for rotates. Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept (Branched from D108058 as getting this completed should help unlock some other WIP patches). Original Patch: @efriedma (Eli Friedman) Differential Revision: https://reviews.llvm.org/D112443	2021-11-02 11:38:25 +00:00
Simon Pilgrim	37e17f278f	[DAG] MatchRotate - remove (redundant) legal type check. Rely on the hasOperation() instead - as commented on D77804, the mid-term intention is to recognise rotate/funnel-by-constant pre-legalization to help avoid SimplifyDemandedBits regressions.	2021-11-02 11:24:50 +00:00
Frederic Cambus	650311737e	[llvm-readobj] Add support for reading OpenBSD ELF core notes. Notes generated in OpenBSD core files provide additional information about the kernel state and CPU registers. These notes are described in core.5, which can be viewed here: https://man.openbsd.org/core.5 Differential Revision: https://reviews.llvm.org/D111966	2021-11-02 10:18:54 +01:00
Rosie Sumpter	dcb8222d87	[LoopVectorize] Propagate fast-math flags for inloop reductions This patch updates VPReductionRecipe::execute so that the fast-math flags associated with the underlying instruction of the VPRecipe are propagated through to the reductions which are created. Differential Revision: https://reviews.llvm.org/D112548	2021-11-02 08:59:53 +00:00
Kazu Hirata	6bdb61c58a	[CodeGen] Use make_early_inc_range (NFC)	2021-11-01 22:38:49 -07:00
hsmahesha	e9ea992496	[IR] Replace all uses of a constant expression by corresponding instruction When a constant expression CE is being converted into a corresponding instruction I, CE is supposed to be replaced by I. However, it is possible that CE is being used multiple times within a parent instruction PI. Make sure that all the uses of CE within PI are replaced by I. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D112717	2021-11-02 10:01:46 +05:30
Hongtao Yu	d137854412	[SamplePGO] Fix callsite sample lookup to use dwarf names when dwarf linkage name isn't available. When linkage name isn't available in dwarf (usually the case of C code), looking up callee samples should be based on the dwarf name instead of using an empty string. Also fixing a test issue where using empty string to look up callee samples accidentally returns the correct samples because it is treated as indirect call. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D112948	2021-11-01 21:24:33 -07:00
Lang Hames	e9014d9743	[ORC] Run incoming jit-dispatch calls via the TaskDispatcher in SimpleRemoteEPC. Handlers for jit-dispatch calls are allowed to make their own EPC calls, so we don't want to run these on the handler thread.	2021-11-01 15:49:14 -07:00
Wouter van Oortmerssen	ac65366485	[WebAssembly] support "return" and unreachable code in asm type checker To support return (it not being supported well was the root cause for https://github.com/WebAssembly/wasi-sdk/issues/200) we also have to have at least a basic notion of unreachable, which in this case just means to stop type checking until there is an end_block (an incoming control flow edge). This is conservative (may miss some type checking opportunities) but is simple and an improvement over what we had before. Differential Revision: https://reviews.llvm.org/D112953	2021-11-01 15:42:58 -07:00
Yonghong Song	f63405f6e3	BPF: Workaround an InstCombine ICmp transformation with llvm.bpf.compare builtin Commit `acabad9ff6` ("[InstCombine] try to canonicalize icmp with trunc op into mask and cmp") added a transformation to convert "(conv)a < power_2_const" to "a & <const>" in certain cases. The bpf kernel verifier has to handle the resulting code conservatively, and this may reject otherwise legitimate programs. This commit tries to prevent such a transformation. A bpf backend builtin llvm.bpf.compare is added. The ICMP insn, which is subject to the above InstCombine transformation, is converted to the builtin function. The builtin function is later lowered to the original ICMP insn, certainly after the InstCombine pass. With this change, all affected bpf strobemeta* selftests now pass. Differential Revision: https://reviews.llvm.org/D112938	2021-11-01 14:46:20 -07:00
Arthur Eubanks	029f1a5344	[LazyCallGraph] Skip blockaddresses blockaddresses do not participate in the call graph since the only instructions that use them must all return to someplace within the current function. And passes cannot retrieve a function address from a blockaddress. This was suggested by efriedma in D58260. Fixes PR50881. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112178	2021-11-01 13:10:24 -07:00
Nikita Popov	4972d12185	[SCEV] Only add direct loop users (NFC) It is now sufficient to track only direct addrec users of a loop, and let the SCEVUsers mechanism track and invalidate transitive users. Differential Revision: https://reviews.llvm.org/D112875	2021-11-01 18:49:43 +01:00
Cameron McInally	702fd3d323	[SVE] Fix VLS FMA matching for CodeGenOpt::Aggressive. For NEON, FMA matching is done in the MachineCombiner, and not the DAGCombiner. That causes problems with VLS lowering, since the vectors are fixed width at the DAGCombiner, but are scalable in the MachineCombiner. This patch corrects it by matching FMAs for VLS vectors in the DAGCombiner. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D112557	2021-11-01 10:43:52 -07:00
Jay Foad	b8016b626e	[CodeGen] Tweak coding style in LivePhysRegs::stepForward. NFC.	2021-11-01 16:01:24 +00:00
Sanjay Patel	42c94bc1ab	[InstCombine] allow vector splat matching for bitwise logic fold Similar to `54e969cffd` (and with cosmetic updates to hopefully make that easier to read), this fold has been around since early in LLVM history. Intermediate folds have been added subsequently, so extra uses are required to exercise this code. The test example actually shows an unintended consequence with extra uses - we end up with an extra instruction compared to what we started with. But this at least makes scalar/vector consistent. General proof: https://alive2.llvm.org/ce/z/tmuBza	2021-11-01 11:39:48 -04:00
Kazu Hirata	d000431fb2	[X86] Remove X86ELFObjectWriter in X86AsmBackend.cpp (NFC) Note that the identically named class is defined in an anonymous namespace in X86ELFObjectWriter.cpp.	2021-11-01 08:31:54 -07:00
Jay Foad	7afef22926	[AMDGPU] Use MachineInstrBuilder::addReg. NFC.	2021-11-01 15:29:51 +00:00
Jay Foad	2b548b18c1	[AMDGPU] Shrink v_mac_legacy_f32 and v_fmac_legacy_f32 Differential Revision: https://reviews.llvm.org/D112917	2021-11-01 13:55:53 +00:00
Matt Morehouse	4d8b0aa5c0	[HWASan] Apply TagMaskByte to every global tag. Previously we only applied it to the first one, which could allow subsequent global tags to exceed the valid number of bits. Reviewed By: hctim Differential Revision: https://reviews.llvm.org/D112853	2021-11-01 06:31:44 -07:00
Sanjay Patel	54e969cffd	[InstCombine] allow vector splat matching for bitwise logic folds This fold was added long ago (part of fixing PR4216), and it matched scalars only. Intermediate folds have been added subsequently, so extra uses are required to exercise this code. General proof: https://alive2.llvm.org/ce/z/G6BBhB One of the specific tests: https://alive2.llvm.org/ce/z/t0JhEB	2021-11-01 08:26:42 -04:00
Sanjay Patel	511ee8759f	[InstCombine] reduce code duplication with commutative matcher; NFC	2021-11-01 08:26:41 -04:00
Mubashar Ahmad	0b83a18a2b	[AArch64] Enablement of Cortex-X2 Enables support for Cortex-X2 cores. Differential Revision: https://reviews.llvm.org/D112459	2021-11-01 11:55:24 +00:00
Simon Pilgrim	6fc50e531d	[CostModel][X86] Remove old FIXME comments for AVX512F vector splitting Similar to AVX1, the cost of splitting/merging 512-bit -> 256-bits vectors for arithmetic operations are typically hidden due to different used ports etc.	2021-11-01 11:11:11 +00:00
Simon Pilgrim	fd485d8cda	[X86][AVX] Prefer VINSERTF128 over VPERM2F128 for 128->256 subvector concatenations The VINSERTF128 instruction is often much quicker, and never slower, than the more general VPERM2F128 instruction, so we should try to use that in more circumstances. This requires a fallback to a commuted VPERM2F128 for the case where we need to fold the 256-bit vector source instead of the 128-bit subvector source. There is one interesting side effect - DAGCombine's narrowExtractedVectorLoad combine gets called in a number of locations, this often creates an extracted subvector load without regard to other uses of the original wider load. I'm expecting AVX cpus to be capable of merging such aliased loads, but I do wonder whether narrowExtractedVectorLoad's call to X86TargetLowering::shouldReduceLoadWidth needs to be altered to check for more partial uses? Noticed while investigating the quality of interleaved load/store codegen. Differential Revision: https://reviews.llvm.org/D111960	2021-11-01 10:45:50 +00:00
David Sherwood	87a294d5eb	[LoopVectorize] Change getRuntimeVFAsFloat to use unsigned int->FP conversion We never expect the runtime VF to be negative so we should use the uitofp instruction instead of sitofp. Differential revision: https://reviews.llvm.org/D112610	2021-11-01 09:58:14 +00:00
Roman Lebedev	b554e41e2d	[CVP] Canonicalize signed relational comparisons of scalar integers to unsigned comparison predicates Now that the reasoning was added to ConstantRange in D90924, this replicates IndVars variant of this transform (D111836) in a pass that uses value range reasoning for the transform. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D112895	2021-11-01 12:16:05 +03:00
Jun Ma	1f9fa54984	[Taildup] Don't tail-duplicate loop header with multiple successors as its latches When Taildup hits a loop with multiple latches like: // 1 -> 2 <-> 3 \| // \ <-> 4 \| // \ <-> 5 \| // \---> rest \| it may transform this loop into multiple loops by duplicating the loop header. However, this change may have little benefit while making the CFG much more complex. In some uncommon cases, it causes a large compile time regression (offered by @alexfh in D106056). This patch disables tail duplication in such cases. TestPlan: check-llvm Differential Revision: https://reviews.llvm.org/D110613	2021-11-01 15:32:00 +08:00
Jun Ma	c93f93b2e3	Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."""" This reverts commit `3a998c06a8`.	2021-11-01 15:31:59 +08:00
Kazu Hirata	476e1ee3da	[AArch64] Remove unused declaration hasSwiftExtendedFrame (NFC)	2021-10-31 22:58:56 -07:00
Max Kazantsev	e512c5b166	[SCEV][NFC] Factor out common API for getting unique operands of a SCEV This function is used in at least 2 places, so it makes sense to make it separate. Differential Revision: https://reviews.llvm.org/D112516 Reviewed By: reames	2021-11-01 11:36:47 +07:00
Chen Zheng	eeed1545b2	[PowerPC] turn off chain commoning by default.	2021-11-01 04:11:10 +00:00
Itay Bookstein	848812a55e	[Verifier] Add verification logic for GlobalIFuncs Verify that the resolver exists, that it is a defined Function, and that its return type matches the ifunc's type. Add corresponding check to BitcodeReader, change clang to emit the correct type, and fix tests to comply. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D112349	2021-10-31 20:00:57 -07:00
Zi Xuan Wu	cf78715cae	[CSKY] First patch to construct codegen infra and generate first add instruction Ooops. It constructs codegen infra and provide only basic code to generate first add instruction successfully. Differential Revision: https://reviews.llvm.org/D112206	2021-11-01 10:06:56 +08:00
Shoaib Meenai	0cf624cad7	[TimeProfiler] Reset variable to nullptr Otherwise we'll hit a spurious assert failure when we reset and then reinitialize TimeProfiler on the same thread, as can happen when e.g. using LLD as a library and running it multiple times in the same process. Makes `lld/test/MachO/time-trace.s` pass with `LLD_IN_TEST=2`, which runs the linker twice in the same process and exposed the issue. Reviewed By: MaskRay, mehdi_amini Differential Revision: https://reviews.llvm.org/D112880	2021-10-31 16:14:30 -07:00
Roman Lebedev	03a4f1f3b8	[ConstantRange] Sign-flipping of signedness-invariant comparisons For certain combination of LHS and RHS constant ranges, the signedness of the relational comparison predicate is irrelevant. This implements complete and precise model for all predicates, as confirmed by the brute-force tests. I'm not sure if there are some more cases that we can handle here. In a follow-up, CVP will make use of this. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D90924	2021-10-31 22:53:17 +03:00
Lang Hames	ff846fcb64	[ORC][ORC-RT] Switch MachO EH/TLV registration from EPC-calls to alloc actions. MachOPlatform used to make an EPC-call (registerObjectSections) to register the eh-frame and thread-data sections for each linked object with the ORC runtime. Now that JITLinkMemoryManager supports allocation actions we can use these instead of an EPC call. This saves us one EPC-call per object linked, and manages registration/deregistration in the executor, rather than the controller process. In the future we may use this to allow JIT'd code in the executor to outlive the controller object while still being able to be cleanly destroyed. Since the code for allocation actions must be available when the actions are run, and since the eh-frame registration code lives in the ORC runtime itself, this change required that MachO eh-frame support be split out of macho_platform.cpp and into its own macho_ehframe_registration.cpp file that has no other dependencies. During bootstrap we start by forcing emission of macho_ehframe_registration.cpp so that eh-frame registration is guaranteed to be available for the rest of the bootstrap process. Then we load the rest of the MachO-platform runtime support, erroring out if there is any attempt to use TLVs. Once the bootstrap process is complete all subsequent code can use all features.	2021-10-31 10:27:40 -07:00
Lang Hames	b77c6db959	[JITLink] Fix alloc action call signature in InProcessMemoryManager. Alloc actions should return a CWrapperFunctionResult. JITLink does not have access to this type yet, due to library layering issues, so add a cut-down version with a fixme.	2021-10-31 10:27:40 -07:00
Craig Topper	ada5458521	[RISCV] Expand scalable vector bswap. Fix crash for bitreverse. Fix LegalizeVectorOps to not try shuffle or unrolling expansions for scalable vectors. Differential Revision: https://reviews.llvm.org/D112236	2021-10-31 10:01:27 -07:00
Kazu Hirata	1a605f395f	[CodeGen] Use make_early_inc_range (NFC)	2021-10-31 07:57:36 -07:00
Kazu Hirata	72710af233	[CodeGen, Target] Use MachineBasicBlock::terminators (NFC)	2021-10-31 07:57:34 -07:00
Kazu Hirata	c714da2ceb	[Transforms] Use {DenseSet,SetVector,SmallPtrSet}::contains (NFC)	2021-10-31 07:57:32 -07:00
Kazu Hirata	4cc7c4724f	[MachineCSE] Use make_early_inc_range (NFC)	2021-10-30 19:00:23 -07:00
Kazu Hirata	c8b1ed5fb2	[clang, llvm] Use Optional::getValueOr (NFC)	2021-10-30 19:00:21 -07:00
Lang Hames	213666f804	[ORC] Move CWrapperFunctionResult out of the detail:: namespace. This type has been moved up into the llvm::orc::shared namespace. This type was originally put in the detail:: namespace on the assumption that few (if any) LLVM source files would need to use it. In practice it has been needed in many places, and will continue to be needed until/unless OrcTargetProcess is fully merged into the ORC runtime.	2021-10-30 16:12:45 -07:00
Kazu Hirata	5970249439	[Hexagon] Remove chksetELFHeaderEFlags (NFC) The function was introduced without any use on Nov 9, 2015 in commit `7cd0892729`.	2021-10-30 08:43:43 -07:00
Kazu Hirata	c3d63a0697	[Hexagon] Remove ValidArch (NFC) This function seems to be unused for at least one year.	2021-10-30 08:43:41 -07:00
Kazu Hirata	c5cd371cc9	[Hexagon] Remove unused struct InstTy (NFC)	2021-10-30 08:43:39 -07:00
Roman Lebedev	25043c8276	[NFCI] Introduce `ICmpInst::compare()` and use it where appropriate As noted in https://reviews.llvm.org/D90924#inline-1076197 apparently this is a pretty common pattern, let's not repeat it yet again, but have it in a common place. There may be some more places where it could be used, but these are the most obvious ones.	2021-10-30 17:50:06 +03:00
David Green	2c4a9e830c	[ValueTracking] Teach computeConstantRange that the maximum value of a half is 65504 The maximal value of a half is 0x7bff, which is 65504 when converted to an integer. This patch teaches that to computeConstantRange to compute a constant range with the correct maximum value. https://alive2.llvm.org/ce/z/BV_Spb https://alive2.llvm.org/ce/z/Nwuqvb The maximum value for a float converted in the same way is 3.4e38, which requires 129 bits of data. I have not added that here, as integer types that large are rare compared to the integer types larger than 17 bits required for half floats. The MVE tests change because instsimplify happens to be run as a part of the backend, where it doesn't tend to for other backends. Differential Revision: https://reviews.llvm.org/D112694	2021-10-30 14:27:38 +01:00
Christudasan Devadasan	aa2d3b59ce	GlobalISel/Utils: Use incoming regbank while constraining the superclasses Register operands with superclasses can possibly have multiple regBanks if they have different register types. The regBank ambiguity resolved during regbankselect should be used to constrain the operand regclass instead of obtaining one from the MCInstrDesc. This is a prerequisite patch for D109300 that introduces allocatable AV_* Superclasses for AMDGPU by combining both VGPRs and AGPRs and we want to restrain the regclass to either A or V based on the incoming regbank. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112323	2021-10-30 07:20:45 -04:00
David Green	66281baea1	[InstCombine] Fix type of constant in canonicalizeClampLike As a followup to D108049, one of the constants could now be generated with an incorrect type, now that the input could be truncated.	2021-10-30 09:06:21 +01:00
Kazu Hirata	972d4133e9	Use {DenseSet,SmallPtrSet}::contains (NFC)	2021-10-29 20:26:07 -07:00
Lang Hames	afeb1e4ac7	[ORC] Move all pass config into MachOPlatformPlugin::modifyPassConfig. NFC, this just makes it easier to see and reason about pass ordering.	2021-10-29 20:07:45 -07:00
Duncan P. N. Exon Smith	0d5b6423ba	Support: Reduce stats in fs::copy_file on Darwin fs::copy_file() on Darwin has a nice optimization to clone the file when possible. Change the implementation to use clonefile() directly, instead of the higher-level copyfile(). The latter does the wrong thing for symlinks, which requires calling `stat` first... With that out of the way, optimistically call clonefile() all the time, and then for any error that's recoverable try again with copyfile() (without the COPYFILE_CLONE flag, as before). Differential Revision: https://reviews.llvm.org/D112250	2021-10-29 16:48:35 -07:00
Stanislav Mekhanoshin	e5340ed30c	[AMDGPU] Fix global isel for kernels using agprs on gfx90a With Global ISel getReservedRegs() is called before function is regbank selected for the first time. Defer caching of usesAGPRs() in this case. Differential Revision: https://reviews.llvm.org/D112644	2021-10-29 14:23:14 -07:00
Florian Hahn	274a9b0f0b	[DSE] Support redundant stores eliminated by memset. This patch adds support to remove stores that write the same value as earlier memsets. It uses isOverwrite to check that a memset completely overwrites a later store. The candidate store must store the same bytewise value as the byte stored by the memset. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D112321	2021-10-29 22:19:53 +01:00
Nikita Popov	cdf45f98ca	[BasicAA] Extract linear expression multiplication (NFC) Extract a common method for multiplying a linear expression by a factor.	2021-10-29 22:41:40 +02:00
Sam Clegg	3b039c68f2	Revert "[WebAssembly] Fix debug locations for ExplicitLocals pass" This reverts commit `a66451ebbe`. This caused a failure when integrated with emscripten: https://ci.chromium.org/ui/p/emscripten-releases/builders/try/linux/b8832019855439718609/overview	2021-10-29 13:34:18 -07:00
Nikita Popov	7cf7378a9d	[BasicAA] Don't treat non-inbounds GEP as nsw The scale multiplication is only guaranteed to be nsw if the GEP is inbounds (or the multiplication is trivial). Previously we were only considering explicit muls in GEP indices.	2021-10-29 22:30:44 +02:00
Nick Desaulniers	39e5dd113f	[SparcISelLowering] avoid emitting libcalls to __muloti4 and __mulodi4 These compiler-rt-only symbols aren't available in libgcc. Similar to D108842, D108844, and D108926. Fixes: pr/52043 Reviewed By: craig.topper, rengolin Differential Revision: https://reviews.llvm.org/D112750	2021-10-29 13:14:09 -07:00
Sanjay Patel	285b8abce4	[x86] limit vector increment fold to allow load folding The tests are based on the example from: https://llvm.org/PR52032 I suspect that it looks worse than it actually is. :) That is, llvm-mca says there's no uop/timing difference with the load folding and pcmpeq vs. broadcast on Haswell (and probably other targets). The load-folding definitely makes the code smaller, so it's good for that at least. So this requires carving a narrow hole in the transform to get just this case without changing others that look good as-is (in other words, the transform still seems good for most examples). Differential Revision: https://reviews.llvm.org/D112464	2021-10-29 15:48:35 -04:00
Sanjay Patel	837518d6a0	[x86] make mayFold* helpers visible to more files; NFC The first function is needed for D112464, but we might as well keep these together in case the others can be used someday.	2021-10-29 15:48:35 -04:00
Sanjay Patel	8f786b4618	[InstCombine] fix comments to match code; NFC	2021-10-29 15:48:35 -04:00
modimo	5caad9b5d3	[InlineAdvisor] Add fallback/format switches and negative remark processing to Replay Inliner Adds the following switches: 1. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback: controls what the replay advisor does for inline sites that are not present in the replay. Options are: 1. Original: defers to original advisor 2. AlwaysInline: inline all sites not in replay 3. NeverInline: inline no sites not in replay 2. --sample-profile-inline-replay-format/--cgscc-inline-replay-format: controls what format should be generated to match against the replay remarks. Options are: 1. Line 2. LineColumn 3. LineDiscriminator 4. LineColumnDiscriminator Adds support for negative inlining decisions. These are denoted by "will not be inlined into" as compared to the positive "inlined into" in the remarks. All of these together with the previous `--sample-profile-inline-replay-scope/--cgscc-inline-replay-scope` allow tweaking in how to apply replay. In my testing, I'm using: 1. --sample-profile-inline-replay-scope/--cgscc-inline-replay-scope = Function to only replay on a function 2. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback = NeverInline since I'm feeding in only positive remarks to the replay system 3. --sample-profile-inline-replay-format/--cgscc-inline-replay-format = Line since I'm generating the remarks from DWARF information from GCC which can conflict quite heavily in column number compared to Clang An alternative configuration could be to do Function, AlwaysInline, Line fallback with negative remarks which closer matches the final call-sites. Note that this can lead to unbounded inlining if a negative remark doesn't match/exist for one reason or another. Updated various tests to cover the new switches and negative remarks Testing: ninja check-all Reviewed By: wenlei, mtrofin Differential Revision: https://reviews.llvm.org/D112040	2021-10-29 12:32:03 -07:00
Duncan P. N. Exon Smith	9902362701	Support: Use sys::path::is_style_{posix,windows}() in a few places Use the new sys::path::is_style_posix() and is_style_windows() in a few places that need to detect the system's native path style. In llvm/lib/Support/Path.cpp, this patch removes most uses of the private `real_style()`, where is_style_posix() and is_style_windows() are just a little tidier. Elsewhere, this removes `_WIN32` macro checks. Added a FIXME to a FileManagerTest that seemed fishy, but maintained the existing behaviour. Differential Revision: https://reviews.llvm.org/D112289	2021-10-29 12:09:41 -07:00
modimo	51ce567b38	[SampleProfile] Add all callsites to AllCandidates if InlineReplay is in effect Replay in sample profiling needs to be asked on candidates that may not have counts or below the threshold. If replay is in effect for a function make sure these are captured and also imported during thinLTO. Testing: ninja check-all Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D112033	2021-10-29 12:04:52 -07:00
Roman Lebedev	0ae7bf124a	[NFC][LoopDeletion] Count the number of broken backedges Those don't contribute to the number of deleted loops.	2021-10-29 21:58:16 +03:00
Amara Emerson	5dd9e019dd	[AArch64][GlobalISel] Fix a crash in RBS due to a new regclass being added. rdar://84674985	2021-10-29 11:47:00 -07:00
Duncan P. N. Exon Smith	4e4883e1f3	Support: Expose sys::path::is_style_{posix,windows,native}() Expose three helpers in namespace llvm::sys::path to detect the path rules followed by sys::path::Style. - is_style_posix() - is_style_windows() - is_style_native() These are constexpr functions that will allow a bunch of path-related code to stop checking `_WIN32`. Originally I looked at adding system_style(), analogous to sys::endian::system_endianness(), but future patches (from others) will add more Windows style variants for slash preferences. These helpers should be resilient to that change, allowing callers to detect basic path rules. Differential Revision: https://reviews.llvm.org/D112288	2021-10-29 11:46:44 -07:00
Sanjay Patel	d0e9879d96	[InstCombine] allow vector splat matching for bitwise logic folds These transforms are also likely missing a one-use check, but that's another patch.	2021-10-29 14:22:50 -04:00
Stanislav Mekhanoshin	a905c54b76	[InstCombine] Fold `(~(a \| b) & c) \| ~(a \| c)` into `~((b & c) \| a)` ``` ---------------------------------------- define i4 @src(i4 %a, i4 %b, i4 %c) { %or1 = or i4 %b, %a %not1 = xor i4 %or1, -1 %or2 = or i4 %a, %c %not2 = xor i4 %or2, -1 %and = and i4 %not2, %b %or3 = or i4 %and, %not1 ret i4 %or3 } define i4 @tgt(i4 %a, i4 %b, i4 %c) { %and = and i4 %c, %b %or = or i4 %and, %a %or3 = xor i4 %or, -1 ret i4 %or3 } Transformation seems to be correct! ``` Differential Revision: https://reviews.llvm.org/D112338	2021-10-29 10:58:09 -07:00
Matt Morehouse	33cc0cfd46	[X86] Don't affect jump tables under +tagged-globals. `classifyLocalReference(nullptr)` is called to get the appropriate relocation type for jump tables. We should not use @GOTPCREL for this case. The new test cases trigger assertions without this patch. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D112832	2021-10-29 10:37:43 -07:00
Fraser Cormack	8314a04ede	[SelectionDAG] Allow FindMemType to fail when widening loads & stores This patch removes an internal failure found in FindMemType and "bubbles it up" to the users of that method: GenWidenVectorLoads and GenWidenVectorStores. FindMemType -- renamed findMemType -- now returns an optional value, returning None if no such type is found. Each of the aforementioned users now pre-calculates the list of types it will use to widen the memory access. If the type breakdown is not possible they will signal a failure, at which point the compiler will crash as it does currently. This patch is preparing the ground for alternative legalization strategies for vector loads and stores, such as using vector-predication versions of loads or stores. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112000	2021-10-29 18:27:31 +01:00
Craig Topper	aefcd59895	[RISCV] Teach RISCVInsertVSETVLI::needVSETVLI to handle mask register instructions better. If the VL operand of a mask register instruction comes from an explicit vsetvli with a different VTYPE, we can still avoid needing a vsetvli as long as the SEW/LMUL ratio is the same and policy bits match. Differential Revision: https://reviews.llvm.org/D112762	2021-10-29 09:49:36 -07:00
Simon Pilgrim	6102e5d56b	[CostModel][X86] Remove old TODO comment BMI (TZCNT) scalar handling was added at rGa2db388dce77c2f23f2009d7363a0b63bb54523c	2021-10-29 17:28:45 +01:00
Mircea Trofin	d6790a0a3c	[NFC] ProfileSummary: const most of the fields. This simplifies readability / maintainability.	2021-10-29 08:36:08 -07:00
Bradley Smith	86972f1114	[AArch64][SVE] Use TargetFrameIndex in more SVE load/store addressing modes Add support for generating TargetFrameIndex in complex patterns for indexed addressing modes in SVE. Additionally, add missing load/stores to getMemOpInfo and getLoadStoreImmIdx. Differential Revision: https://reviews.llvm.org/D112617	2021-10-29 14:44:16 +00:00
Jay Foad	56f03d25b4	[IR] Remove createReplacementInstr. NFC. It is unused since D112791. Differential Revision: https://reviews.llvm.org/D112795	2021-10-29 15:03:19 +01:00
Jay Foad	1b758925ad	[IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction createReplacementInstr was a trivial wrapper around ConstantExpr::getAsInstruction, which also inserted the new instruction into a basic block. Implement this directly in getAsInstruction by adding an InsertBefore parameter and change all callers to use it. NFC. A follow-up patch will remove createReplacementInstr. Differential Revision: https://reviews.llvm.org/D112791	2021-10-29 15:02:58 +01:00
Jay Foad	21a1d4cf71	[AMDGPU] Change numBitsSigned for simplicity and document it. NFC. Change numBitsSigned to return the minimum size of a signed integer that can hold the value. This is different by one from the previous result but is more consistent with numBitsUnsigned. Update all callers. All callers are now more consistent between the signed and unsigned cases, and some callers get simpler, especially the ones that deal with quantities like numBitsSigned(LHS) + numBitsSigned(RHS). Differential Revision: https://reviews.llvm.org/D112813	2021-10-29 14:22:06 +01:00
Chen Zheng	7591d21032	[PowerPC] fix a miscompile for Solaris build	2021-10-29 12:06:25 +00:00
Bradley Smith	bf72a469ba	[AArch64][SVE] Fix build failure introduced in `13faa5f440`	2021-10-29 11:57:02 +00:00
David Green	11630dbbc3	[InstCombine] Fold BW/2+1 top bits are same pattern Match "icmp eq (trunc (lsr A, BW), (ashr (trunc A), BW-1))", which checks the top BW/2 + 1 bits are all the same. Create "A >=s INT_MIN && A <=s INT_MAX", which we generate as "icmp ult (add A, 2^BW-1), 2^BW" to skip a few steps of instcombining. https://alive2.llvm.org/ce/z/NjH6Ty https://alive2.llvm.org/ce/z/_fEQ9P Differential Revision: https://reviews.llvm.org/D109155	2021-10-29 12:30:20 +01:00
Simon Pilgrim	154c036ebb	[X86] combineX86GatherScatter - only fold scale if the index isn't extended As mentioned on D108539, when the gather indices are smaller than the pointer size, they are sign-extended BEFORE scale is applied, making the general fold unsafe. If the index has sufficient sign-bits then folding the scale could be safe - I'll investigate this.	2021-10-29 11:48:05 +01:00
David Green	9020e22a87	[InstCombine] Convert xor (ashr X, BW-1), C -> select(X >=s 0, C, ~C) The sequence of instructions `xor (ashr X, BW-1), C` (or with a truncation `xor (trunc (ashr X, BW-1)), C)` takes a value, produces all zeros or all ones and with it optionally inverts a constant depending on whether the original input was positive or negative. This is the same as checking if the value is positive, and selecting between the constant and ~constant. https://alive2.llvm.org/ce/z/NJ85qY This is a fairly general version of a fold that helps pull saturating arithmetic into a canonical form. Differential Revision: https://reviews.llvm.org/D109151	2021-10-29 11:19:20 +01:00
Bradley Smith	13faa5f440	[AArch64][SVE] Generate SVE >1 element structured load/stores from fixed types This adds support for SVE structured loads/stores to the relevant target hooks, such that we can support these instructions in the InterleavedAccess pass. Depends on D112078 Differential Revision: https://reviews.llvm.org/D112303	2021-10-29 09:35:57 +00:00
Cullen Rhodes	8686626244	[Sparc] NFC: Remove unused tblgen template args Identified in D109359. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D109712	2021-10-29 09:16:15 +00:00
Neubauer, Sebastian	c78640ee6a	[TailDuplicator] Fix merging block with terminator The TailDuplicator merged two blocks, even if the first one ended with a terminator, resulting in invalid MIR, where a terminator is in the middle of a block. Abort merging if the first block ends with a terminator. Differential Revision: https://reviews.llvm.org/D112226	2021-10-29 10:52:46 +02:00
Vang Thao	52b43d1549	[AMDGPU] Fix cvt_f32_ubyte combine with shl Shift node is still needed to check if the shift is shr or shl to increment/decrement offset. Do not override the node. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112733	2021-10-28 21:43:06 -07:00
Chuanqi Xu	bb16e83932	[NFC] [Coroutines] Use llvm::make_scope_exit to replace self-defined RTTIHelper	2021-10-29 12:14:20 +08:00
Kazu Hirata	01b4789b62	[AMDGPU] Remove hasDefinedInitializer (NFC) The last use was removed on Sep 16, 2021 in commit `7a62a5b56d`.	2021-10-28 20:33:34 -07:00
Kazu Hirata	dd5d46b009	[AMDGPU] Remove unused BBSelectRegister in AMDGPUMachineCFGStructurizer (NFC) This field seems to be unused for at least one year.	2021-10-28 20:33:32 -07:00
Kazu Hirata	309357c01a	[AMDGPU] Remove unused declaration eliminateDeadBranchOperands (NFC)	2021-10-28 20:33:30 -07:00
Abinav Puthan Purayil	db8d7b6e2d	[DAGCombine][NFC] s/it's/its in the comment of hasNoInfs().	2021-10-29 07:36:38 +05:30
Lang Hames	12b2cc2294	[ORC] Rename SupportFunctionCall to WrapperFunctionCall. The new name better suits the type. This patch also changes the signature of the run method (it now returns a WrapperFunctionResult), and adds runWithSPSRet methods that deserialize the function result using SPS. Together these changes bring this type into close alignment with its ORC runtime counterpart.	2021-10-28 17:48:54 -07:00
Lang Hames	999c6a235e	Reapply `e32b1eee6a` "[ORC] Change SPSExecutorAddr serialization,..." with fixes. This re-applies `e32b1eee6a`, which was reverted in `20675d8f7d` due to broken unit tests. This patch includes fixes for the tests.	2021-10-28 16:40:25 -07:00
Thomas Lively	fb67f3d969	[WebAssembly] Add prototype relaxed float to int trunc instructions Add i32x4.relaxed_trunc_f32x4_s, i32x4.relaxed_trunc_f32x4_u, i32x4.relaxed_trunc_f64x2_s_zero, i32x4.relaxed_trunc_f64x2_u_zero. These are only exposed as builtins, and require user opt-in. Differential Revision: https://reviews.llvm.org/D112186	2021-10-28 14:01:53 -07:00
Daniel Kiss	d8075e8781	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit `da1d1a0869` . This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 21:45:09 +02:00
Wouter van Oortmerssen	a66451ebbe	[WebAssembly] Fix debug locations for ExplicitLocals pass Differential Revision: https://reviews.llvm.org/D112487	2021-10-28 12:35:46 -07:00
Stanislav Mekhanoshin	f7f430c913	[InstCombine] Fixed non-deterministic order of new instructions Fixes non-deterministic order of XOR instructions created after `5a7a458306`. The order of call argument evaluation is not defined, so create one Value before the call.	2021-10-28 12:14:02 -07:00
Stanislav Mekhanoshin	5a7a458306	[InstCombine] Fold `(c & ~(a | b)) | (b & ~(a | c))` to `~a & (b ^ c)` ``` ---------------------------------------- define i4 @src(i4 %a, i4 %b, i4 %c) { %0: %or1 = or i4 %a, %b %not1 = xor i4 %or1, 15 %and1 = and i4 %not1, %c %or2 = or i4 %a, %c %not2 = xor i4 %or2, 15 %and2 = and i4 %not2, %b %or3 = or i4 %and1, %and2 ret i4 %or3 } => define i4 @tgt(i4 %a, i4 %b, i4 %c) { %0: %xor = xor i4 %b, %c %not = xor i4 %a, 15 %or3 = and i4 %xor, %not ret i4 %or3 } Transformation seems to be correct! ``` Differential Revision: https://reviews.llvm.org/D112276	2021-10-28 11:54:30 -07:00
Ahmed Bougacha	bef777206e	[AArch64] Rename some timm predicates for consistency. NFC. timm isn't the common case, and TImmLeafs should make it clear what they are. We're adding a plain ImmLeaf for 0_65535, so rename i64_imm0_65535 to timm64_0_65535, and imm32_0_7 to timm32_0_7.	2021-10-28 11:41:29 -07:00
Yuanfang Chen	ac02bcad56	[IRSymTab] Mark __stack_chk_guard used `__stack_chk_guard` is a global variable that has no uses before the LLVM code generation phase (how it is defined is platform-dependent). LTO needs to preserve this symbol for that reason. Currently, legacy LTO API preserves it by hardcoding the logic in Internalizer, but this symbol is not preserved by regular LTO API in thinlink phase. This patch marks `__stack_chk_guard` used during IR symbol table writing since this is how builtin functions are preserved by thinlink by using `RuntimeLibcalls.def`. Reviewed By: MaskRay, tejohnson Differential Revision: https://reviews.llvm.org/D112595	2021-10-28 11:22:26 -07:00
Yuanfang Chen	c18ed69873	[Internalize] Preserve __stack_chk_fail in Internalizer correctly Move the section collecting `AlwaysPreserved` up before any `maybeInternalize` is called. Otherwise, functions in `AlwaysPreserved` (in this case, `__stack_chk_fail`) are not preserved. Reviewed By: MaskRay, tejohnson Differential Revision: https://reviews.llvm.org/D112684	2021-10-28 11:22:26 -07:00
Guozhi Wei	1e46dcb77b	[TwoAddressInstructionPass] Put all new instructions into DistanceMap In function convertInstTo3Addr, after converting a two address instruction into three address instruction, only the last new instruction is inserted into DistanceMap. This is wrong, DistanceMap should track all instructions from the beginning of current MBB to the working instruction. When a two address instruction is converted to three address instruction, multiple instructions may be generated (usually an extra COPY is generated), all of them should be inserted into DistanceMap. Similarly when unfolding memory operand in function tryInstructionTransform DistanceMap is not maintained correctly. Differential Revision: https://reviews.llvm.org/D111857	2021-10-28 11:11:59 -07:00
Matthias Braun	e2c7ee0743	X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr This extends `optimizeCompareInstr` to re-use previous comparison results if the previous comparison was with an immediate that was 1 bigger or smaller. Example: CMP x, 13 ... CMP x, 12 ; can be removed if we change the SETg SETg ... ; x > 12 changed to `SETge` (x >= 13) removing CMP Motivation: This often happens because SelectionDAG canonicalization tends to add/subtract 1 often when optimizing for fallthrough blocks. Example for `x > C` the fallthrough optimization switches true/false blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into `x < C + 1`. Differential Revision: https://reviews.llvm.org/D110867	2021-10-28 10:33:56 -07:00
Matthias Braun	97a1570d8c	X86InstrInfo: Optimize more combinations of SUB+CMP `X86InstrInfo::optimizeCompareInstr` would only optimize a `SUB` followed by a `CMP` in `isRedundantFlagInstr`. This extends the code to also look for other combinations like `CMP`+`CMP`, `TEST`+`TEST`, `SUB x,0`+`TEST`. - Change `isRedundantFlagInstr` to run `analyzeCompareInstr` on the candidate instruction and compare the results. This normalizes things and gives consistent results for various comparisons (`CMP x, y`, `SUB x, y`) and immediate cases (`TEST x, x`, `SUB x, 0`, `CMP x, 0`...). - Turn `isRedundantFlagInstr` into a member function so it can call `analyzeCompare`. - We now also check `isRedundantFlagInstr` even if `IsCmpZero` is true, since we now have cases like `TEST`+`TEST`. Differential Revision: https://reviews.llvm.org/D110865	2021-10-28 10:33:56 -07:00
Florian Hahn	c45045bfd0	[VPlan] Keep induction recipes in header. This patch updates recipe creation to ensure all VPWidenIntOrFpInductionRecipes are in the header block. At the moment, new induction recipes can be created in different blocks when trying to optimize casts and induction variables. Having all induction recipes in the header makes it easier to analyze/transform them in VPlan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D111300	2021-10-28 18:22:05 +01:00
Nicolai Hähnle	b437aaa672	MachineDominators: Define MachineDomTree type alias This is a (very) small move towards making the machine dominators more aligned with the IR dominators: * DominatorTree / MachineDomTree is the class holding the dominator tree * DominatorTreeWrapperPass / MachineDominatorTree is the corresponding (machine) function pass This alignment will be used by analyses that are designed as templates that work with LLVM IR as well as Machine IR. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D112690	2021-10-28 22:30:35 +05:30
Leonard Grey	793b481f54	[CGProfile] Don't emit call graph profile edges with zero weight With D112160 and D112164, on a Chrome Mac build this reduces the total size of CGProfile sections by 78% (around 25% eliminated entirely) and total size of object files by 0.14%. Differential Revision: https://reviews.llvm.org/D112655	2021-10-28 11:32:49 -04:00
Daniel Kiss	66e03db814	Revert "Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."" This reverts commit `b6420e575f`.	2021-10-28 17:24:53 +02:00
Daniel Kiss	b6420e575f	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit `da1d1a0869` . This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 16:49:19 +02:00
David Green	9358384fd6	[InstCombine] Extend canonicalizeClampLike to handle truncated inputs This extends the canonicalizeClampLike function to allow cases where the input is truncated, but still matching on the types of the ICmps. For example %t = trunc i32 %X to i8 %a = add i32 %X, 128 %cmp = icmp ult i32 %a, 256 %c = icmp sgt i32 %X, -1 %f = select i1 %c, i8 High, i8 Low %r = select i1 %cmp, i8 %t, i8 %f becomes %c1 = icmp slt i32 %X, -128 %c2 = icmp sge i32 %X, 128 %s1 = select i1 %c1, i32 sext(Low), i32 %X %s2 = select i1 %c2, i32 sext(High), i32 %s1 %t = trunc i32 %s2 to i8 https://alive2.llvm.org/ce/z/vPzfxH We limit the transform to constant High and Low values, where we know the sext are free. Differential Revision: https://reviews.llvm.org/D108049	2021-10-28 15:46:58 +01:00
Dawid Jurczak	f87e0c68d7	[DSE] Eliminates redundant store of an existing value (PR16520) That's https://reviews.llvm.org/D90328 follow-up. This change eliminates writes to variables where the value that is being written is already stored in the variable. This achieves the goal by looping through all memory definitions in the current state and getting defining access from each of them. When there is defining access where the write instruction is identical to the original instruction it will remove this redundant write. For example: void f() { x = 1; if foo() { x = 1; g(); } else { h(); } } void g(); void h(); The second x=1 will be eliminated since it is rewriting 1 to x. This pass will produce this: void f() { x = 1; if foo() { g(); } else { h(); } } void g(); void h(); Differential Revision: https://reviews.llvm.org/D111727	2021-10-28 16:20:09 +02:00
David Green	79011c705b	[InstCombine] Fix rare condition violation in canonicalizeClampLike With a "ult x, 0", the fold in canonicalizeClampLike does not validate with undef inputs. This condition will usually have been simplified away, but we should ensure the code is correct in case. https://alive2.llvm.org/ce/z/S8HQ6H vs https://alive2.llvm.org/ce/z/h2XBJ_ See: https://reviews.llvm.org/D108049	2021-10-28 15:03:07 +01:00
Simon Pilgrim	d29ccbecd0	[X86][AVX] Attempt to fold a scaled index into a gather/scatter scale immediate (PR13310) If the index operand for a gather/scatter intrinsic is being scaled (self-addition or a shl-by-immediate) then we may be able to fold that scaling into the intrinsic scale immediate value instead. Fixes PR13310. Differential Revision: https://reviews.llvm.org/D108539	2021-10-28 14:07:17 +01:00
Alexey Bataev	07ef9f513f	[SLP]Improve/fix reordering of the gathered graph nodes. Gathered loads/extractelements/extractvalue instructions should be checked if they can represent a vector reordering node too and their order should be taken into account for better graph reordering analysis. Also, if the gather node has reused scalars, they must be reordered instead of the scalars themselves. Differential Revision: https://reviews.llvm.org/D112454	2021-10-28 05:45:09 -07:00
Sanjay Patel	e8535fa784	[InstCombine] allow Negator to fold multi-use select with constant arms The motivating test is reduced from: https://llvm.org/PR52261 Note that the more general problem of folding any binop into a multi-use select of constants is still there. We need to ease the restriction in InstCombinerImpl::FoldOpIntoSelect() to catch those. But these examples never reach that code because Negator exclusively handles negation patterns within visitSub(). Differential Revision: https://reviews.llvm.org/D112657	2021-10-28 08:35:58 -04:00
Peter Waller	98f08752f7	[InstCombine][ConstantFolding] Make ConstantFoldLoadThroughBitcast TypeSize-aware The newly added test previously caused the compiler to fail an assertion. It looks like a straightforward TypeSize upgrade. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D112142	2021-10-28 12:15:15 +00:00
Abinav Puthan Purayil	2da6ef3664	[AMDGPU] Add 24-bit mulhi intrinsics in INTRINSIC_WO_CHAIN combine. mul24 intrinsic's operands are simplified by AMDGPUTargetLowering::performIntrinsicWOChainCombine(). This change adds the mul24hi intrinsics in the combine since its operands can be simplified like that of the mul24 intrinsics. Differential Revision: https://reviews.llvm.org/D112702	2021-10-28 16:57:48 +05:30
Sebastian Neubauer	fd1cfc9094	[AMDGPU][GlobalISel] Fix waterfall loops - Move the `s_and exec` to its correct position before the content of the waterfall loop - Use the SI_WATERFALL pseudo instruction, like for sdag, to benefit from optimizations - Add support for indirect function calls To support indirect calls, add a G_SI_CALL instruction without register class restrictions and insert a waterfall loop when applying register banks. Differential Revision: https://reviews.llvm.org/D109052	2021-10-28 10:30:55 +02:00
Neubauer, Sebastian	50d8d963e3	[GlobalISel] Simplify RegBankSelect Save the instruction list of a block before selecting banks. This allows coping with moved instructions, even if they are reordered or split into multiple basic blocks. Differential Revision: https://reviews.llvm.org/D111223	2021-10-28 10:30:55 +02:00
Caroline Concatto	2186b011e9	[Driver][AArch64]Add driver support for neoverse-512tvb target The support for neoverse-512tvb mirrors the same option available in GCC[1]. There is no functional effect for this option yet. This patch ensures the driver accepts "-mcpu=neoverse-512tvb", and enough plumbing is in place to allow the new option to be used in the future. [1]https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html Differential Revision: https://reviews.llvm.org/D112406	2021-10-28 09:08:40 +01:00
Martin Storsjö	177176f75c	[Support] [Windows] Manually clean up temp files if not setting delete disposition Since D81803 / `79657e2339`, temp files created on network shares don't set "Disposition.DeleteFile = true". This flag normally takes care of removing the temp file both if the process exits abnormally (either crashing or killed externally), and when the file is closed cleanly. For network shares, we voluntarily choose to not set the flag, and if the operation to inspect the file handle (as a prerequisite to setting the flag since `79657e2339`) fails we also error out. In both of these cases, we can at least make sure to remove the temp files when they are closed cleanly. Adjust the semantics of "OF_Delete" to not set the delete disposition, but only set the access mode for allowing deletion. Move the call to setDeleteDisposition into TempFile::create, where we can check if it failed, and if it did, set a flag noting that the file should be removed manually at the end. This does leak files on crash, but at least doesn't leak files in regular successful runs. (Technically, the alternative codepath could use the RemoveFileOnSignal function, but that might complicate the TempFile implementation further.) This fixes https://github.com/mstorsjo/llvm-mingw/issues/233 and https://bugs.llvm.org/show_bug.cgi?id=52080. Differential Revision: https://reviews.llvm.org/D111875	2021-10-28 10:33:37 +03:00
Hongtao Yu	259e4c5658	[CSSPGO] Trim cold base profiles for the CS preinliner. Adding support to the CS preinliner to trim cold base profiles. This makes trimming consistent with the inline decision made by the preinliner. Also disable the existing profile merger when preinliner is on unless explicitly specified. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D112489	2021-10-27 22:50:27 -07:00
Hsiangkai Wang	7051f73d69	[RISCV] Sync Zvlsseg register order as the same as vector registers. Sync the order of Zvlsseg registers with vector registers to avoid unnecessary register copies between vector instructions and zvlsseg instructions. Differential Revision: https://reviews.llvm.org/D110250	2021-10-28 13:34:53 +08:00
Kazu Hirata	cee3419d65	[AMDGPU] Remove unused declaration findNumUsedRegistersSI (NFC)	2021-10-27 21:24:02 -07:00
Phoebe Wang	2bc28c6f82	[X86] Add a dependency breaking xor before any gathers with an undef passthru value. In the instruction encoding, the passthru register is always tied to the destination register. The CPU scheduler has to wait for the last writer of this register to finish executing before the gather can start. This is true even if the initial mask is all ones so that the passthru will never be used. By explicitly zeroing the register we can break the false dependency. The zero idiom is executed completely by the register renamer and so is immediately considered ready. Authored by Craig. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D112505	2021-10-28 11:44:52 +08:00
Hsiangkai Wang	0a9b82960c	[RISCV] Use vmv.v.[v|i] if we know COPY is under the same vl and vtype. If we know the source operand of COPY is defined by a vector instruction with tail agnostic and the same LMUL and there is no vsetvli between COPY and the define instruction to change the vl and vtype, we could use vmv.v.v or vmv.v.i to copy vector registers to get better performance than the whole vector register move instructions. If the source of COPY is from vmv.v.i, we could use vmv.v.i for the COPY. This patch only considers all these instructions within one basic block. Case 1: ``` bb.0: ... VSETVLI # The first VSETVLI before COPY and VOP. ... # Use this VSETVLI to check LMUL and tail agnostic. ... vy = VOP va, vb # Define vy. ... # There is no vsetvli between VOP and COPY. vx = COPY vy ``` Case 2: ``` bb.0: ... VSETVLI # The first VSETVLI before VOP. ... # Use this VSETVLI to check LMUL and tail agnostic. ... vy = VOP va, vb # Define vy. ... # There is no vsetvli to change vl between VOP and COPY. ... VSETVLI # The first VSETVLI before COPY. ... # This VSETVLI does not change vl and vtype. ... vx = COPY vy ``` Co-Authored-by: Zakk Chen <zakk.chen@sifive.com> Co-Authored-by: Kito Cheng <kito.cheng@sifive.com> Differential Revision: https://reviews.llvm.org/D103510	2021-10-28 11:39:04 +08:00
Max Kazantsev	513914e1f3	[SCEV] Invalidate user SCEVs along with operand SCEVs to avoid cache corruption Following discussion in D110390, it seems that we are suffering from inability to traverse users of a SCEV being invalidated. The result of that is that ScalarEvolution's inner caches may store obsolete data about SCEVs even if their operands are forgotten. It creates problems when we try to verify the contents of those caches. It's also a frequent situation when messing with cache causes very sneaky and hard-to-analyze bugs related to corruption of memory when dealing with cached data. They are lurking there because ScalarEvolution's verification is not powerful enough and misses many problematic cases. I plan to make SCEV's verification much stricter in follow-ups, and this requires dangling-pointers-free caches. This patch makes sure that, whenever we forget cached information for a SCEV, we also forget it for all SCEVs that (transitively) use it. This may have negative compile time impact. It's a sacrifice we are more than willing to make to enforce correctness. We can also save some time by reworking invokers of forgetMemoizedResults (maybe we can forget multiple SCEVs with single query). Differential Revision: https://reviews.llvm.org/D111533 Reviewed By: reames	2021-10-28 09:39:24 +07:00
Craig Topper	1387483e72	[RISCV] Replace most uses of RISCVSubtarget::hasStdExtV. NFCI Add new hasVInstructions() which is currently equivalent. Replace vector uses of hasStdExtZfh/F/D with new vector specific versions. The vector spec no longer requires that the vectors implement the same types as scalar. It only requires that the scalar type is the maximum size the vectors can support. This is currently implemented using the scalar rule we were using before. Add new hasVInstructionsI64() and begin using it to qualify code that requires i64 vector elements. This is all NFC for now, but we can start using this to better implement D112408 which introduces the Zve extensions. Reviewed By: frasercrmck, eopXD Differential Revision: https://reviews.llvm.org/D112496	2021-10-27 19:33:48 -07:00
Johannes Doerfert	acf3093117	[Attributor][FIX] Do not ignore memory writes in AAMemoryBehavior Even if we look for `nocapture` we need to bail on escaping pointers. The crucial thing is that we might not look at a big enough scope when we derive the memory behavior. Thus, it might be `nocapture` in a larger context while it is "captured" in a smaller context.	2021-10-27 21:04:32 -05:00
Johannes Doerfert	734f91441d	[Attributor][NFC] Improve debug messages	2021-10-27 21:04:31 -05:00
Ard Biesheuvel	d7e089f2d6	[ARM] Use hardware TLS register in Thumb2 mode when -mtp=cp15 is passed In ARM mode, passing -mtp=cp15 forces the use of an inline MRC system register read to move the thread pointer value into a register. Currently, in Thumb2 mode, -mtp=cp15 is ignored, and a call to the __aeabi_read_tp helper is emitted instead. This is inconsistent, and breaks the Linux/ARM build for Thumb2 targets, as the Linux kernel does not provide an implementation of __aeabi_read_tp. Reviewed By: nickdesaulniers, peter.smith Differential Revision: https://reviews.llvm.org/D112600	2021-10-27 16:42:11 -07:00
Lang Hames	20675d8f7d	Revert "[ORC] Change SPSExecutorAddr serialization, SupportFunctionCall struct." This reverts commit `e32b1eee6a`. Reverting while I fix some broken unit tests.	2021-10-27 16:39:56 -07:00
Johannes Doerfert	8a4551b893	[Attributor][FIX] Use right address space to avoid assertion When we strip and accumulate constant offsets we need to pick the right address space such that the offset APInt has the right bit width. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D112544	2021-10-27 18:22:37 -05:00
Lang Hames	e32b1eee6a	[ORC] Change SPSExecutorAddr serialization, SupportFunctionCall struct. SPSExecutorAddr will now be serializable to/from ExecutorAddr, rather than uint64_t. This improves type safety when working with serialized addresses. Also updates the SupportFunctionCall to use an ExecutorAddrRange (rather than a separate ExecutorAddr addr and uint64_t size field), and updates the tpctypes::*Write data structures to use ExecutorAddr rather than JITTargetAddress.	2021-10-27 16:20:46 -07:00
Roman Lebedev	b291597112	Revert rest of `IRBuilderBase`'s short-circuiting folds Upon further investigation and discussion, this is actually the opposite direction from what we should be taking, and this direction wouldn't solve the motivational problem anyway. Additionally, some more (polly) tests have escaped being updated. So, let's just take a step back here. This reverts commit `f3190dedee`. This reverts commit `749581d21f`. This reverts commit `f3df87d57e`. This reverts commit `ab1dbcecd6`.	2021-10-28 02:15:14 +03:00
Michael Liao	e6a4ba3aa6	[amdgpu] Handle the case where there is no scavenged register. - When an unconditional branch is expanded into an indirect branch, if there is no scavenged register, an SGPR pair needs spilling to enable the destination PC calculation. In addition, before jumping into the destination, that clobbered SGPR pair needs restoring. - As SGPR cannot be spilled to or restored from memory directly, the spilling/restoring of that SGPR pair reuses the regular SGPR spilling support but without spilling it into memory. As those spilling and restoring points are fully controlled, we only need to spill that SGPR into the temporary VGPR, which needs spilling into its emergency slot. - The target-specific hook is revised to take an additional restore block, where the restoring code is filled. After that, the relaxation will place that restore block directly before the destination block and insert an unconditional branch in any fall-through block into the destination block. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106449	2021-10-27 18:37:27 -04:00
Ben Langmuir	3d13ee2891	[ORC][ORC-RT] Enable the MachO platform for arm64 Enables the arm64 MachO platform, adds basic tests, and implements the missing TLV relocations and runtime wrapper function. The TLV relocations are just handled as GOT accesses. rdar://84671534 Differential Revision: https://reviews.llvm.org/D112656	2021-10-27 13:36:03 -07:00
Nikita Popov	ea7be26045	[ConstantRange] Optimize smul_sat() (NFC) Base the implementation on the APInt smul_sat() implementation, which is much more efficient than performing calculations in double the bitwidth.	2021-10-27 21:01:09 +02:00
Nikita Popov	665060ea45	[BasicAA] Remove misleading overflow check GEP decomposition currently checks whether the multiplication of the linear expression offset and GEP scale overflows. However, if everything else works correctly, this overflow check is both unnecessary and dangerously misleading. While it will avoid an overflow in Scale * Offset in particular, other parts of the calculation (including those on dynamic values) may still overflow. The code working on the decomposed GEPs is responsible for ensuring that it remains correct in the presence of overflow. D112611 fixes the last issue of that kind that I'm aware of (in fact, the overflow check was originally introduced to work around precisely that issue). Differential Revision: https://reviews.llvm.org/D112618	2021-10-27 20:56:03 +02:00
Nick Desaulniers	3ccd041af9	[LowerTypeTests] Emit cfi_jt aliases regardless of function export A constant complaint we get is that the __typeid__ symbols in the CFI jump tables causes confusing stack traces in applications. Emit the more readable cfi_jt aliases regardless of function export (LTO vs Thin LTO). Reviewed By: pcc, tejohnson Differential Revision: https://reviews.llvm.org/D107934	2021-10-27 11:36:26 -07:00
Philip Reames	425cbbc602	[Operator] Add hasPoisonGeneratingFlags [mostly NFC] This method parallels the dropPoisonGeneratingFlags on Instruction, but is hoisted to operator to handle constant expressions as well. This is mostly code movement, but I did go ahead and add the inrange constexpr gep case. This had been discussed previously, but apparently never followed up on.	2021-10-27 11:25:40 -07:00
Alexey Bataev	f06e332982	Revert "[SLP]Improve/fix reordering of the gathered graph nodes." This reverts commit `64d1617d18` to fix test non-stability.	2021-10-27 11:16:58 -07:00
Roman Lebedev	156f10c840	[IR] `SCEVExpander::generateOverflowCheck()`: short-circuit `umul_with_overflow`-by-one It's a no-op, no overflow happens ever: https://alive2.llvm.org/ce/z/Zw89rZ While generally I don't like such hacks, we have a very good reason to do this: here we are expanding a run-time correctness check for the vectorization, and said `umul_with_overflow` will not be optimized out before we query the cost of the checks we've generated. Which means, the cost of run-time checks would be artificially inflated, and after https://reviews.llvm.org/D109368 that will affect the minimal trip count for which these checks are even evaluated. And if they aren't even evaluated, then the vectorized code certainly won't be run. We could consider doing this in IRBuilder, but then we'd need to also teach `CreateExtractValue()` to look into a chain of `insertvalue`'s, and I'm not sure there's precedent for that. Refs. https://reviews.llvm.org/D109368#3089809	2021-10-27 19:45:55 +03:00
Kazu Hirata	593451bd3c	[X86] Remove getSETOpc (NFC) This function seems to be unused for at least one year.	2021-10-27 09:22:31 -07:00
Kazu Hirata	e6b6190ead	[X86] Remove NeedsRetpoline in X86AsmPrinter (NFC) This field seems to be unused for at least one year.	2021-10-27 09:22:29 -07:00
Kazu Hirata	cc73310a81	[X86] Remove CallOperand in X86Operand (NFC) This field seems to be unused for at least one year.	2021-10-27 09:22:27 -07:00
Alexey Bataev	64d1617d18	[SLP]Improve/fix reordering of the gathered graph nodes. Gathered loads/extractelements/extractvalue instructions should be checked if they can represent a vector reordering node too and their order should be taken into account for better graph reordering analysis. Also, if the gather node has reused scalars, they must be reordered instead of the scalars themselves. Differential Revision: https://reviews.llvm.org/D112454	2021-10-27 08:49:13 -07:00
David Sherwood	5d9318638e	[NFC][LoopVectorize] Change getStepVector to take a Value* for the StartIdx This patch changes the definition of getStepVector from: Value *getStepVector(Value *Val, int StartIdx, Value *Step, ... to Value *getStepVector(Value *Val, Value *StartIdx, Value *Step, ... because: 1. it seems inconsistent to pass some values as Value* and some as integer, and 2. future work will require the StartIdx to be an expression made up of runtime calculations of the VF. In widenIntOrFpInduction I've changed the code to pass in the value returned from getRuntimeVF, but the presence of the assert: assert(!VF.isScalable() && "scalable vectors not yet supported."); means that currently this code path is only exercised for fixed-width VFs and so the patch is still NFC. Differential revision: https://reviews.llvm.org/D111882	2021-10-27 16:12:38 +01:00
Roman Lebedev	ab1dbcecd6	[IR] `IRBuilderBase::CreateSelect()`: if cond is a constant i1, short-circuit While we could emit such a tautological `select`, it will stick around until the next instsimplify invocation, which may happen after we count the cost of this redundant `select`. Which is precisely what happens with loop vectorization legality checks, and that artificially increases the cost of said checks, which is bad. There is prior art for this in `IRBuilderBase::CreateAnd()`/`IRBuilderBase::CreateOr()`. Refs. https://reviews.llvm.org/D109368#3089809	2021-10-27 18:01:05 +03:00
Alexey Bataev	9b12975cbf	Revert "[SLP]Improve/fix reordering of the gathered graph nodes." This reverts commit `f719b794bc` to fix instability in tests.	2021-10-27 07:31:36 -07:00
Kerry McLaughlin	f01fafdcd4	[SVE][CodeGen] Fix incorrect legalisation of zero-extended masked loads PromoteIntRes_MLOAD always sets the extension type to `EXTLOAD`, which results in a sign-extended load. If the type returned by getExtensionType() for the load being promoted is something other than `NON_EXTLOAD`, we should instead pass this to getMaskedLoad() as the extension type. Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D112320	2021-10-27 14:15:41 +01:00
Alexey Bataev	f719b794bc	[SLP]Improve/fix reordering of the gathered graph nodes. Gathered loads/extractelements/extractvalue instructions should be checked if they can represent a vector reordering node too and their order should be taken into account for better graph reordering analysis. Also, if the gather node has reused scalars, they must be reordered instead of the scalars themselves. Differential Revision: https://reviews.llvm.org/D112454	2021-10-27 06:08:40 -07:00
Caroline Concatto	1137b7207d	[SelectionDAG] Widening the result of INSERT_SUBVECTOR. Widens the result and first input vector because they have the same size. The subvector to be inserted is widened in the operand widen function. Differential Revision: https://reviews.llvm.org/D112187	2021-10-27 13:52:25 +01:00
Nikita Popov	fbc0c308d5	[BasicAA] Handle known bits as ranges BasicAA currently tries to determine that the offset is positive by checking whether all variable indices are positive based on known bits, multiplied by a positive scale. However, this is incorrect if the scale multiplication might overflow. In the modified test case the original value is positive, but may be negative after a left shift. Fix this by converting known bits into a constant range and reusing the range-based logic, which handles overflow correctly. Differential Revision: https://reviews.llvm.org/D112611	2021-10-27 14:41:31 +02:00
Daniel Kiss	894ddba1c9	Revert "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This reverts commit `da1d1a0869`.	2021-10-27 14:29:35 +02:00
Sanjay Patel	6c0a2c2804	[x86] enhance mayFoldLoad to check alignment As noted in D112464, a pre-AVX target may not be able to fold an under-aligned vector load into another op, so we shouldn't report that as a load folding candidate. I only found one caller where this would make a difference -- combineCommutableSHUFP() -- so that's where I added a test to show the (minor) regression. Differential Revision: https://reviews.llvm.org/D112545	2021-10-27 07:54:25 -04:00
Matt	fc28a2f8ce	[AArch64][SVE] Combine predicated FMUL/FADD into FMA Combine FADD and FMUL intrinsics into FMA when the result of the FMUL is an FADD operand with only one use and both use the same predicate. Differential Revision: https://reviews.llvm.org/D111638	2021-10-27 11:41:23 +00:00
Alexandros Lamprineas	8689f5e6e7	[AArch64] Add support for the 'R' architecture profile. This change introduces subtarget features to predicate certain instructions and system registers that are available only on 'A' profile targets. Those features are not present when targeting a generic CPU, which is the default processor. In other words the generic CPU now means the intersection of 'A' and 'R' profiles. To maintain backwards compatibility we enable the features that correspond to -march=armv8-a when the architecture is not explicitly specified on the command line. References: https://developer.arm.com/documentation/ddi0600/latest Differential Revision: https://reviews.llvm.org/D110065	2021-10-27 12:32:30 +01:00
Alexey Bataev	cb4feae7bd	[SLP]Fix logical and/or reductions. Need to emit select(cmp) instructions for poison-safe forms of select ops. Currently alive reports that `Target is more poisonous than source` for the operations we are generating for such instructions. https://alive2.llvm.org/ce/z/FiNiAA Differential Revision: https://reviews.llvm.org/D112562	2021-10-27 04:25:20 -07:00
Nikita Popov	9bc7e543b4	[BasicAA] Make range check more precise Make the range check more precise by calculating the range of potentially accessed bytes for both accesses and checking whether their intersection is empty. In that case there can be no overlap between the accesses and the result is NoAlias. This is more powerful than the previous approach, because it can deal with sign-wrapped ranges. In the test case the original range is [-1, INT_MAX] but becomes [0, INT_MIN] after applying the offset. This is a wrapping range, so getSignedMin/getSignedMax will treat it as a full range. However, the range excludes the elements [INT_MIN+1, -1], which is enough to prove NoAlias with an access at offset -1. Differential Revision: https://reviews.llvm.org/D112486	2021-10-27 12:40:58 +02:00
Jay Foad	b9e3af124b	[LiveInterval] Add RemoveDeadValNo argument to removeSegment(iterator) Add an optional bool RemoveDeadValNo argument to the removeSegment(iterator) overload, for consistency with the other overloads. This gives clients a way to remove dead valnos while also getting an updated iterator returned (in the manner of vector::erase). Use this to clean up some inefficient code in LiveIntervals::repairOldRegInRange. NFC. Differential Revision: https://reviews.llvm.org/D110560	2021-10-27 09:43:32 +01:00
Daniel Kiss	da1d1a0869	[ARM] __cxa_end_cleanup should be called instead of _UnwindResume. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-27 10:40:00 +02:00
David Sherwood	3d706c20f8	[NFC][LoopVectorize] Remove setBestPlan in favour of getBestPlanFor I have removed LoopVectorizationPlanner::setBestPlan, since this function is quite aggressive because it deletes all other plans except the one containing the <VF,UF> pair required. The code is currently written to assume that all <VF,UF> pairs will live in the same vplan. This is overly restrictive, since scalable VFs live in different plans to fixed-width VFs. When we add support for vectorising epilogue loops when the main loop uses scalable vectors, the vplan for the main loop will be different to that of the epilogue. Instead I have added a new function called LoopVectorizationPlanner::getBestPlanFor that returns the best vplan for the <VF,UF> pair requested and leaves all the vplans untouched. We then pass this best vplan to LoopVectorizationPlanner::executePlan which now takes an additional VPlanPtr argument. Differential revision: https://reviews.llvm.org/D111125	2021-10-27 09:38:27 +01:00
Arthur Eubanks	ae27c57b18	[InferAddressSpaces] Make pass work with opaque pointers Avoid getPointerElementType().	2021-10-26 23:53:20 -07:00
Kazu Hirata	6af3e87d2d	[Hexagon] Remove set-but-unused variables (NFC)	2021-10-26 23:38:15 -07:00
Phoebe Wang	eb55c1f153	[X86][NFC] Add the missed `break;` for `79f9dfef0d`	2021-10-27 13:58:31 +08:00
Craig Topper	2783a5cfaf	[RISCV] Add ICmp and FCmp to shouldSinkOperands.	2021-10-26 22:23:54 -07:00
Lang Hames	91434d4469	[JITLink] Fix element-present check in MachOLinkGraphParser. Not all symbols are added to the index-to-symbol map, so we shouldn't use the size of the map as a proxy for the highest valid index.	2021-10-26 20:48:40 -07:00
Lang Hames	bfb40e83ee	[ORC] Don't try to perform empty deallocations.	2021-10-26 20:48:40 -07:00
Max Kazantsev	5961f0308f	[SCEV][NFC] Verify integrity of SCEVUsers Make sure that, for every living SCEV, we have all its direct operand tracking it as their user. Differential Revision: https://reviews.llvm.org/D112402 Reviewed By: reames	2021-10-27 09:54:49 +07:00
Ben Shi	97e52e1c35	[RISCV] Optimize immediate materialisation with SLLI.UW in the Zba extension Simplify "LUI+SLLI+ADDI+SLLI" and "LUI+ADDIW+SLLI+ADDI+SLLI" to "LUI+ADDIW+SLLIUW" to reduce total instruction amount. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111933	2021-10-27 02:48:38 +00:00
David Blaikie	3ac709b6ce	llvm-dwarfdump --verify: Exit non-zero on simplified template name rebuilding failures	2021-10-26 15:57:16 -07:00
Austin Kerbow	02e60f2e77	[AMDGPU] Use max waves for scheduler's initial occupancy target The scheduler should set critical/excess register usage thresholds that are guided by the maximum possible occupancy for the function. This change is focused on setting proper lower bounds on register usage which we would typically only see when a specific number of maximum waves is requested with the "waves-per-eu" attribute, or by setting "amdgpu-num-vgpr|sgpr" directly. This was broken previously. I have a follow-on patch that will address issues with the scheduler not targeting correct upper bounds on register usage which is typical with launch bounds and min "waves-per-eu". Changes by this patch: Set the initial critical register usage thresholds to minimum values that are determined by the maximum possible occupancy for the function, or the number of allocatable registers, whichever is lower. Avoid unsigned overflow if register limits are lower than the register tracking "ErrorMargin", i.e. when using stress-regalloc=2. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112373	2021-10-26 15:30:26 -07:00
Yuanfang Chen	7c3fa52785	[DebugInfo] Skip ODRUniquing for mismatched tags Otherwise, ODRUniquing would map some member method/variable MDNodes to have enum type DIScope, resulting in invalid debug info and bad DWARF. - Add a Verifier check that when a 'scope:' operand is an ODR type, it is not an enum. - Makes ODRUniquing apply to only ODR types with the same tag so that the debuginfo/DWARF is well-formed. Reviewed By: probinson, aprantl Differential Revision: https://reviews.llvm.org/D111770	2021-10-26 15:28:25 -07:00
Sanjay Patel	acabad9ff6	[InstCombine] try to canonicalize icmp with trunc op into mask and cmp The motivating test is based on: https://llvm.org/PR52260 We have better analysis for X == 0, so try harder to form that.	2021-10-26 17:43:28 -04:00
Fangrui Song	226465efe3	[ARC] Fix `undefined symbol: llvm::MachineFunction::dump() const`	2021-10-26 11:44:18 -07:00
Usman Nadeem	da1318ccca	[NFC][Instcombine] Cleanup some obsolete matches in visitSelectInstr These are now redundant after https://reviews.llvm.org/D106872 Change-Id: I82edfedf1d45cac4e3368d77ce3a48c78e342c19	2021-10-26 10:07:08 -07:00
Rosie Sumpter	b716d0aa94	[LoopVectorize] Clean up VPReductionRecipe::execute. NFC Use RdxDesc->getOpcode instead of getUnderlyingInstr()->getOpcode. Move the code which finds Kind and IsOrdered to be outside the for loop since neither of these changes with the vector part. Differential Revision: https://reviews.llvm.org/D112547	2021-10-26 17:18:25 +01:00
Kazu Hirata	c3e698e2f5	[CodeGen, Hexagon] Use MachineBasicBlock::phis (NFC)	2021-10-26 09:01:29 -07:00
Jonas Paulsson	bb506938be	[SystemZ] Improvement of emitMemMemWrapper() It was discovered that an extra register COPY remained when expanding a (variable length) memory operation with a loop and there was another use of the involved address register(s) afterwards. A simple fix for this is to COPY the address registers before the loop and use that new vreg instead. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D112065	2021-10-26 17:03:01 +02:00
Alexey Bataev	ce14d1b690	[SLP]Do not reorder reduction nodes. The final reduction nodes should not be reordered, the order does not matter for reductions. Also, it might be profitable to vectorize smaller reduction trees, reduction cost may compensate small tree cost. Part of D111574 Differential Revision: https://reviews.llvm.org/D112467	2021-10-26 07:41:24 -07:00
zhijian	158083f0de	[AIX][XCOFF] parsing xcoff object file auxiliary header Summary: The patch supports parsing the xcoff object file auxiliary header with llvm-readobj with the option "auxiliary-headers". The format of the auxiliary header is described at https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/filesreference/XCOFF.html#XCOFF__fyovh386shar Reviewers: James Henderson, Jason Liu, Hubert Tong, Esme yi, Sean Fertile. Differential Revision: https://reviews.llvm.org/D82549	2021-10-26 10:40:25 -04:00
Neubauer, Sebastian	eb16570ab0	[AMDGPU] Remove unused CSR defs CSR_AMDGPU_VGPRs_24_255 and CSR_AMDGPU_VGPRs_32_255 are not used anywhere, so remove them. Differential Revision: https://reviews.llvm.org/D112535	2021-10-26 16:01:49 +02:00
Abinav Puthan Purayil	61e3b9fefe	[AMDGPU] Add constrained shift pattern matches. The motivation for this is due to clang's conformance to https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#operators-shift which makes clang emit (<shift> a, (and b, <width> - 1)) for `a <shift> b` in OpenCL where a is an int of bit width <width>. Differential revision: https://reviews.llvm.org/D110231	2021-10-26 19:07:19 +05:30
Chen Zheng	631f44f338	[PowerPC] use right extend type for SCEV Fix an issue caused by D108750 Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D112502	2021-10-26 13:32:03 +00:00
Abinav Puthan Purayil	781dd39b7b	[AMDGPU] Enable 48-bit mul in AMDGPUCodeGenPrepare. We were bailing out of creating 24-bit muls for results wider than 32 bits in AMDGPUCodeGenPrepare. With the 24-bit mulhi intrinsic, this change teaches AMDGPUCodeGenPrepare to generate the 48-bit mul correctly. Differential Revision: https://reviews.llvm.org/D112395	2021-10-26 18:53:07 +05:30
Abinav Puthan Purayil	9bd5cfeb1f	[AMDGPU] Implement llvm.amdgcn.mulhi.[i,u]24 intrinsics. These intrinsics map to the 24-bit v_mul_hi instructions. This change also fixes an incorrect assumption on the associativity of 24-bit mulhi in its SDNode record in tblgen. Differential Revision: https://reviews.llvm.org/D112394	2021-10-26 18:53:07 +05:30
Sanjay Patel	2ab0148c14	[x86] use cast instead of dyn_cast for unchecked usage; NFC This was noted as an independent clean-up in D112464.	2021-10-26 08:20:19 -04:00
Neubauer, Sebastian	487f15603e	[AMDGPU] Fix setcc combine for i128 The combine asserted if constants could not be represented as uint64_t. Use APInts to fix this. Differential Revision: https://reviews.llvm.org/D112416	2021-10-26 13:39:50 +02:00
Jay Foad	c8e5aef1a0	[AMDGPU] Use standard MachineBasicBlock::getFallThrough method. NFCI. Differential Revision: https://reviews.llvm.org/D101825	2021-10-26 12:07:54 +01:00
Jonas Paulsson	9f8872779a	[SystemZ] Provide size values for PATCHPOINT, STACKMAP and FENTRY_CALL. All instructions must have a correct size value close to emission when SystemZLongBranch runs, or a necessary branch relaxation may be missed. This patch also adds an assert for instruction sizes in SystemZLongBranch. Review: Ulrich Weigand	2021-10-26 12:07:22 +02:00
Nikita Popov	11a8423dab	[SCEV] Use reverse() (NFC)	2021-10-26 11:08:58 +02:00
Max Kazantsev	9bbfe0f72c	[NFC] Remove obsolete simplifyOnceImpl function The function simplifyOnce only calls simplifyOnceImpl and does nothing else. Having this separate helper makes no sense. Removing it. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D112517 Reviewed By: mkazantsev	2021-10-26 13:51:42 +07:00
Max Kazantsev	d4c74cd4e8	[NFC] [LoopPeel] Update IDoms of non-loop blocks dominated by the loop When peeling a loop, we assume that the latch has a `br` terminator and that all loop exits are either terminated with an `unreachable` or have a terminating deoptimize call. So when we peel off the 1st iteration, we change the IDom of all loop exits to the peeled copy of `NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support loops with exits that are followed by a block with an `unreachable` or a terminating deoptimize call, changing the exit's idom wouldn't be enough and DT would be broken. For example, let `Exit1` and `Exit2` be loop exits, and each of them unconditionally branches to the same `unreachable` terminated block. So neither of the exits dominates this unreachable block. If we change the IDoms of the exits to some peeled loop block, we don't update the dominators of the unreachable block. Currently we just don't get to the peeling logic, saying that we can't peel such loops. Previously we stored exits' IDoms in a map before peeling a loop and then, after peeling off one iteration, we changed their IDoms. Now we use the same logic not only for exits but for all non-loop blocks dominated by the loop. So when we add logic to support peeling loops with exits which branch, for example, to an unreachable-terminated block, we would update the IDoms not only for exits, but for their successors. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D111611 Reviewed By: mkazantsev, nikic	2021-10-26 13:09:07 +07:00
Phoebe Wang	79f9dfef0d	[X86] Move splat addends from the gather/scatter index operand to the base address This can avoid a vector add and a constant pool load. Or an explicit broadcast in case of non-constant. Also reverse the transform any time we encounter a constant index addend that can't be moved to base. In that case pull the constant from base into the index. This reduces code size needed for the displacement since we needed the index add anyway. Limit this to scale of 1 to avoid divisibility and wrap issues. Authored by Craig. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111595	2021-10-26 12:35:57 +08:00
Duncan P. N. Exon Smith	b12a864c29	Bitcode: Use Expected<T>::takeError() and moveInto() more, NFC Avoid naming some Expected<T> values in the Bitcode reader by using takeError() and moveInto() more often. This follows the smaller set of changes included in `2410fb4616`.	2021-10-25 16:03:40 -07:00
Zarko Todorovski	e9163660b1	[PPC][LLVM] Inclusive terms: remove references to sanity check in lib/Target/PowerPC Removed references to `sanity check` in `PPCBranchCoalescing.cpp` code comments. No word substitution made in this case, as the comments and code following illustrated are sufficient IMO. Reviewed By: quinnp Differential Revision: https://reviews.llvm.org/D112452	2021-10-25 18:13:54 -04:00
Craig Topper	d51e3a2139	[LegalizeTypes][TargetLowering] Merge getShiftAmountTyForConstant into TargetLowering::getShiftAmountTy. getShiftAmountTyForConstant is a special helper that changes the shift amount to i32 if the type chosen by TargetLowering::getShiftAmountTy can't represent all possible values. This is needed to satisfy an assert in SelectionDAG::getNode. It requires additional consideration to know when this helper should be used. I'm not sure that we are always using it when we should. This patch merges the getShiftAmountTyForConstant handling into TargetLowering::getShiftAmountTy so we don't need to think about it anymore. Technically this may slightly increase compile times since the majority of callers of getShiftAmountTy won't need this. Hopefully, this isn't an issue in practice. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112469	2021-10-25 14:06:53 -07:00
Nikita Popov	3a995c918e	[SCEV] Move SCEVLostPoisonFlags() check into SCEVExpander Always insert values into ExprValueMap, and instead skip using them in SCEVExpander if poison-generating flags have been lost. This ensures that all values that are in ValueExprMap are also in ExprValueMap, so we can use the latter to invalidate the former. This change is probably not entirely NFC for the case where originally the SCEV had no nowrap flags but they were inferred later, in which case that would now allow reusing the existing value for expansion. Differential Revision: https://reviews.llvm.org/D112389	2021-10-25 22:37:20 +02:00
Arthur Eubanks	4a9db7367d	[AlwaysInliner] Invalidate analyses when we delete functions Fixes PR52292. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D112473	2021-10-25 13:36:32 -07:00
Zarko Todorovski	9769e97c35	[LLVM] Inclusive terms: remove/replace references to sanity in RewriteStatepointsForGC.cpp and test Part of work to have the LLVM backend use more inclusive terms. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D112461	2021-10-25 16:17:41 -04:00
Stefan Gränitz	cdb335ffaf	[JITLink] Fix warning 'shift count exceeds width' in AArch64 backend	2021-10-25 20:44:07 +02:00
Jeremy Morse	4136897bd4	[DebugInfo][InstrRef][NFC] Switch to using DenseMaps and similar There are a few STL containers hanging around that can become DenseMaps, SmallVectors and similar. This recovers a modest amount of compile time performance. While I'm here, adjust the bit layout of ValueIDNum: this was always supposed to act like a value type, however it seems that clang doesn't compile the comparison functions to act that way. Add a uint64_t to a union that explicitly aliases the bitfields, so that we can compare the whole value as a single integer. Differential Revision: https://reviews.llvm.org/D112333	2021-10-25 18:07:17 +01:00
Wouter van Oortmerssen	5694dbccc3	[WebAssembly] support Memory64 in target_features section Differential Revision: https://reviews.llvm.org/D112266	2021-10-25 09:31:45 -07:00
Jeremy Morse	97ddf49e43	[DebugInfo][InstrRef] Recover stack-slot tracking performance This patch is like D111627 -- instead of calculating IDF for every location on the stack, only do it for the smallest units of interference, and copy the PHIs for those units to any aliases. The test added runs placeMLocPHIs directly, and tests that: * A def of the lower 8 bits of a stack slot causes all aliasing regs to have PHIs placed, * It doesn't cause the equivalent location to x86's $ah, which isn't aliased, to have a PHI placed. Differential Revision: https://reviews.llvm.org/D112324	2021-10-25 17:31:09 +01:00
Philip Reames	f82cf6187f	[indvars] Fix pr52276 (missing one use check) The recently added logic to canonicalize exit conditions to unsigned relies on facts which hold about the use (i.e. exit test). Applying this blindly to the icmp is not legal, as there may be another use which never reaches the exit. Restrict ourselves to the case where we have a single use.	2021-10-25 09:26:55 -07:00
Craig Topper	e2b7aabb57	[RISCV] Reduce the number of RISCV vector builtins by an order of magnitude. All but 2 of the vector builtins are only used by clang_builtin_alias. When using clang_builtin_alias, the type string of the builtin is never checked. Only the types in the function definition used for the alias are checked. This patch takes advantage of this to share a single builtin for many different types. We already used type overloads on the IR intrinsic so the codegen for the builtins that are being merged were already the same. This extends the type overloading to the builtins. I had to make a few tweaks to make this work. -Floating point vector-vector vmerge now uses the vmerge intrinsic instead of the vfmerge intrinsic. New isel patterns and tests are added to support this. -The SemaChecking for the immediate of vset_v/vget_v has been removed. Determining the valid range is harder now. I've added masking to ManualCodegen to ensure valid IR for invalid input. This reduces the number of builtins from ~25000 to ~1100. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D112102	2021-10-25 09:03:59 -07:00
Craig Topper	210b586a85	[RISCV] Add vcsr CSR name for V extension. Reviewed By: frasercrmck, kito-cheng Differential Revision: https://reviews.llvm.org/D112342	2021-10-25 08:56:25 -07:00
Danila Malyutin	7b102fcc91	[CodeGen] Fix dependence breaking for tied operands Differential Revision: https://reviews.llvm.org/D107582	2021-10-25 18:52:27 +03:00
Jeremy Morse	ee3eee71e4	[DebugInfo][InstrRef] Track values fused into stack spills During register allocation, some instructions can have stack spills fused into them. It means that when vregs are allocated on the stack we can convert: SETCCr %0 DBG_VALUE %0 to SETCCm %stack.0 DBG_VALUE %stack.0 Unfortunately instruction referencing finds this harder: a store to the stack doesn't have a specific operand number, therefore we don't substitute the old operand for a new operand, and the location is dropped. This patch implements a solution: just recognise the memory operand attached to an instruction with a Special Number (TM), and record a substitution between the old value and the new one. This patch adds substitution code to InlineSpiller to record such fused spills, and tracking in InstrRefBasedLDV to recognise such values, and produce the value numbers for them. Everything to do with the movement of stack-defined values is already handled in InstrRefBasedLDV. Differential Revision: https://reviews.llvm.org/D111317	2021-10-25 15:14:53 +01:00
Danila Malyutin	2d9ee590b6	[AArch64] Handle ST1iN instructions in isAArch64FrameOffsetLegal Before the code would crash with "unhandled opcode in isAArch64FrameOffsetLegal" when there was a spill from extractelement. Fixes pr52249 Differential Revision: https://reviews.llvm.org/D112311	2021-10-25 17:05:12 +03:00
Nikita Popov	0d20ebf686	[BasicAA] Use ranges for more than one index D109746 made BasicAA use range information to determine the minimum/maximum GEP offset. However, it was limited to the case of a single variable index. This patch extends support to multiple indices by adding all the ranges together. Differential Revision: https://reviews.llvm.org/D112378	2021-10-25 15:30:50 +02:00
Alexey Bataev	eb9b75dd4d	[SLP]Change the order of the reduction/binops args pair vectorization attempts. Need to change the order of the reduction/binops args pair vectorization attempts. Need to try to find the reduction at first and postpone vectorization of binops args. This may help to find more reduction patterns and vectorize them. Part of D111574. Differential Revision: https://reviews.llvm.org/D112224	2021-10-25 06:27:14 -07:00
Jeremy Morse	2eb96e1711	[DebugInfo][NFC] Avoid a use-after-free This patch swaps two lines -- the CurSucc reference can be invalidated by the call to DFS.push_back, therefore that should happen last. The usual hat-tip to asan for catching this. This patch also swaps an earlier call to ToAdd.insert and DFS.push_back, where a stable iterator (from successors()) is being used. This isn't strictly necessary, but is good for consistency and avoiding readers asking themselves why the two code portions have a different order.	2021-10-25 14:16:30 +01:00
Sanjay Patel	6e46b66e2a	[DAGCombiner] make matching bit-hack form of usubsat more flexible (i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128 As suggested in D112085, we can substitute 'xor' with 'add' in this pattern, and it is logically equivalent: https://alive2.llvm.org/ce/z/eJtWWC We canonicalize to 'xor' in IR, but SDAG does not do that (and it probably should not - https://llvm.org/PR52267 ), so it is possible to see either pattern in codegen. Note that 'sub' is another potential pattern, but that is canonicalized to 'add' in DAGCombiner, so we don't need to worry about that variation. Differential Revision: https://reviews.llvm.org/D112377	2021-10-25 09:01:52 -04:00
Tim Northover	f9089accba	CodeGenPrep: remove all copies of GEP from list if there are duplicates. Unfortunately ToT has changed enough from the revision where this actually caused problems that the test no longer triggers an assertion failure.	2021-10-25 14:00:02 +01:00
Kerry McLaughlin	1f49b71fe5	[SVE][CodeGen] Enable reciprocal estimates for scalable fdiv/fsqrt This patch enables the use of reciprocal estimates for SVE when both the -Ofast and -mrecip flags are used. Reviewed By: david-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D111657	2021-10-25 11:30:44 +01:00
Max Kazantsev	a9b0776a81	[SimplifyCFG] Sanity assert in iterativelySimplifyCFG We observe a hang within iterativelySimplifyCFG due to infinite loop execution. Currently, there is no limit to this loop, so in case of a bug it just runs forever. This patch adds an assert that will break it after 1000 iterations if it didn't converge.	2021-10-25 17:10:17 +07:00
Nikita Popov	75384ecdf8	[InstSimplify] Refactor invariant.group load folding Currently strip.invariant/launder.invariant are handled by constructing constant expressions with the intrinsics skipped. This takes an alternative approach of accumulating the offset using stripAndAccumulateConstantOffsets(), with a flag to look through invariant.group intrinsics. Differential Revision: https://reviews.llvm.org/D112382	2021-10-25 10:56:25 +02:00
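
The bit-hack equivalence quoted in commit 6e46b66e2a above, `(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128`, along with the 'add' variant that the message says is logically equivalent, can be checked exhaustively for 8-bit values. The following is only an illustrative sketch, not code from LLVM: the helper names (`usubsat8`, `ashr8`, `bithack_xor`, `bithack_add`, `check_all`) are hypothetical and model i8 arithmetic with plain Python integers.

```python
# Illustrative model of the i8 pattern from the commit message.
# All function names here are hypothetical helpers, not LLVM APIs.

def usubsat8(x: int, y: int) -> int:
    """Unsigned saturating subtract on 8-bit values: clamps at 0."""
    return max(x - y, 0)

def ashr8(x: int, amount: int) -> int:
    """Arithmetic shift right of an i8 given as an unsigned value 0..255."""
    signed = x - 256 if x & 0x80 else x   # reinterpret as signed i8
    return (signed >> amount) & 0xFF      # back to the unsigned bit pattern

def bithack_xor(x: int) -> int:
    """(X ^ 128) & (X s>> 7), the form that IR canonicalizes to."""
    return (x ^ 0x80) & ashr8(x, 7)

def bithack_add(x: int) -> int:
    """(X + 128) & (X s>> 7), with the add wrapping modulo 2^8."""
    return ((x + 0x80) & 0xFF) & ashr8(x, 7)

def check_all() -> bool:
    """Compare both bit-hack forms against usubsat for every i8 value."""
    return all(
        bithack_xor(x) == usubsat8(x, 128) == bithack_add(x)
        for x in range(256)
    )

if __name__ == "__main__":
    print("all 256 inputs agree:", check_all())
```

The reasoning behind the check: `X s>> 7` is all ones exactly when the sign bit of X is set (X >= 128 as unsigned), and in that case both flipping the top bit and adding 128 modulo 256 give X - 128; otherwise the mask is zero, which matches the saturated result.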

152407 Commits