llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	5fc0e98d9a	[LoopIdiomRecognize] Minor cleanups to the FFS idiom matching. NFC -Make use of the CreateShl/LShr/AShr methods that take a uint64_t instead of creating a ConstantInt for 1 ourselves. -Use Builder.getInt1 or ConstantInt::getBool instead of a conditional. -Pull out repeated calls to getType.	2021-04-07 10:03:14 -07:00
Roman Lebedev	24f67473dd	[InstCombine] foldAddWithConstant(): don't deal with non-immediate constants All of the code that handles general constant here (other than the more restrictive APInt-dealing code) expects that it is an immediate, because otherwise we won't actually fold the constants, and increase instruction count. And it isn't obvious why we'd be okay with increasing the number of constant expressions, those still will have to be run. But after `2829094a8e` this could also cause endless combine loops. So actually properly restrict this code to immediates.	2021-04-07 19:50:19 +03:00
Sanjay Patel	1894c6c59e	[InstCombine] avoid infinite loop from partial undef vectors This fixes the examples from D99674 and https://llvm.org/PR49878 The matchers succeed on partial undef/poison vector constants, but the transform creates a full 'not' (-1) constant, so it would undo a demanded vector elements change triggered by the extractelement. Differential Revision: https://reviews.llvm.org/D100044	2021-04-07 12:18:12 -04:00
wlei	6d5132b426	[CSSPGO] Fix incorrect probe distribution factor computation in top-down inliner We see a regression related to a low probe factor (0.01) which prevents some callsites from being promoted in ICPPass and later causes the missing inline in the CGSCC inliner. The root cause is a redundant (second) multiplication by the probe factor, and this change tries to fix it. `Sum` is multiplied by the factor right after findCallSamples, but later, when it is used as the parameter to setProbeDistributionFactor, it is multiplied once again. This change could get ~2% perf back on the mcf benchmark. In mcf, the corresponding factor was previously 1; the recent feature introducing factors <1 then triggered this bug. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D99787	2021-04-07 08:48:59 -07:00
Alexey Bataev	a78e86e6be	[SLP]Avoid multiple attempts to vectorize CmpInsts. No need to lookup through and/or try to vectorize operands of the CmpInst instructions during attempts to find/vectorize min/max reductions. Compiler implements postanalysis of the CmpInsts so we can skip extra attempts in tryToVectorizeHorReductionOrInstOperands and save compile time. Differential Revision: https://reviews.llvm.org/D99950	2021-04-07 06:15:42 -07:00
Sanjay Patel	0333ed8e0c	[InstCombine] move abs transform to helper function; NFC The swap of the operands can affect later transforms that are expecting a constant as operand 1. I don't think we can trigger a bug with the current code, but I hit that problem while drafting a new transform for min/max intrinsics.	2021-04-07 08:35:07 -04:00
Roman Lebedev	2829094a8e	Reland [InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858) This reverts commit `a547b4e26b`, relanding commit `31d219d299`, which was reverted because there was a conflicting inverse transform, which was causing an endless combine loop, which has now been adjusted. Original commit message: https://alive2.llvm.org/ce/z/67w-wQ We prefer `add`s over `sub`, and this particular xform allows further folds to happen: Fixes https://bugs.llvm.org/show_bug.cgi?id=49858	2021-04-07 12:06:25 +03:00
Roman Lebedev	93d1d94b74	[InstCombine] Restrict "C-(X+C2) --> (C-C2)-X" fold to immediate constants I.e., if any/all of the constants is an expression, don't do it. Since those constants won't reduce into an immediate, but would be left as a constant expression, they could cause endless combine loops after `31d219d299` added an inverse transformation.	2021-04-07 12:06:24 +03:00
Petr Hosek	a547b4e26b	Revert "[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)" This reverts commit `31d219d299` which causes an infinite loop when compiling the XRay runtime.	2021-04-06 22:30:28 -07:00
Sidharth Baveja	d81d9e8b86	[SplitEdge] Update SplitCriticalEdge to return a nullptr only when the edge is not critical Summary: The function SplitCriticalEdge (called by SplitEdge) can return a nullptr in cases where the edge is critical. SplitEdge uses SplitCriticalEdge assuming it can always split all critical edges, which is an incorrect assumption. The three cases where the function SplitCriticalEdge will return a nullptr are: 1. DestBB is an exception block 2. Options.IgnoreUnreachableDests is set to true and isa(DestBB->getFirstNonPHIOrDbgOrLifetime()) is not equal to a nullptr 3. LoopSimplify form must be preserved (Options.PreserveLoopSimplify is true) and it cannot be maintained for a loop due to indirect branches These situations are handled in the following way: 1. Modified the function ehAwareSplitEdge originally from llvm/lib/Transforms/Coroutines/CoroFrame.cpp to handle the cases when the DestBB is an exception block. This function is called directly in SplitEdge. SplitEdge does not call SplitCriticalEdge in this case 2. Options.IgnoreUnreachableDests is set to false by default, so this situation does not apply. 3. Return a nullptr in this situation since the SplitCriticalEdge also returned nullptr. Nothing we can do in this case. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D94619	2021-04-06 21:24:40 +00:00
Philip Reames	4bf8985f4f	Replace calls to IntrinsicInst::Create with CallInst::Create [nfc] There is no IntrinsicInst::Create. These are binding to the method in the super type. Be explicit about which method is being called.	2021-04-06 13:23:58 -07:00
Philip Reames	908215b346	Use AssumeInst in a few more places [nfc] Follow up to `a6d2a8d6f5`. These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.	2021-04-06 13:18:53 -07:00
Philip Reames	9ef6aa020b	Plumb AssumeInst through operand bundle apis [nfc] Follow up to `a6d2a8d6f5`. This covers all the public interfaces of the bundle related code. I tried to cleanup the internals where the changes were obvious, but there's definitely more room for improvement.	2021-04-06 12:53:53 -07:00
Luís Marques	0c3bc1f3a4	[ASan][RISCV] Fix RISC-V memory mapping Fixes the ASan RISC-V memory mapping (originally introduced by D87580 and D87581). This should be an improvement both in terms of first principles soundness and observed test failures --- test failures would occur non-deterministically depending on the ASLR random offset. On RISC-V Linux (64-bit), `TASK_UNMAPPED_BASE` is currently defined as `PAGE_ALIGN(TASK_SIZE / 3)`. The non-power-of-two divisor makes the result be the not very round number 0x1555556000. That address had to be further rounded to ensure page alignment after the shadow scale shifting is applied. Still, that value explains why the mapping table may look less regular than expected. Further cleanups: - Moved the mapping table comment, to ensure that the two Linux/AArch64 tables stayed together; - Removed mention of Sv48. Neither the original mapping nor this one are compatible with an actual Linux Sv48 address space (mainline Linux still operates Sv48 in Sv39 mode). A future patch can improve this; - Removed the additional comments, for consistency. Differential Revision: https://reviews.llvm.org/D97646	2021-04-06 20:46:17 +01:00
Philip Reames	a6d2a8d6f5	Add a subclass of IntrinsicInst for llvm.assume [nfc] Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class. A follow up change will do the same for the newer assumption query/bundle mechanisms.	2021-04-06 11:16:22 -07:00
Arthur Eubanks	4e83e59eb8	[GVN] Add missing ICF update performScalarPREInsertion() inserts instructions into blocks that we need to tell ImplicitControlFlowTracking about, otherwise the ICF cache may be invalid. Fixes PR49193. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99909	2021-04-06 10:13:42 -07:00
Philip Reames	21d4839948	Move GCRelocateInst and GCResultInst to IntrinsicInst.h [nfc] These two are part of the IntrinsicInst class hierarchy and it helps to cut down on some redundant includes.	2021-04-06 08:33:15 -07:00
Philip Reames	52ecd94cfb	Remove last remnants of PR49607 migration [NFC] The key change (`4f5e92c`) to switch gc.result and gc.relocate to being readnone landed nearly two weeks ago, and we haven't seen any fallout. Time to remove the code added to make reverting easy.	2021-04-06 07:56:55 -07:00
Jan Svoboda	fb6a5237aa	Revert "[IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction" This reverts commit `167ea67d` This causes a bunch of build failures: * http://lab.llvm.org:8011/#/builders/121/builds/6287 * http://green.lab.llvm.org/green/job/clang-stage1-RA/19915	2021-04-06 16:33:28 +02:00
Benjamin Kramer	ce4acb01b3	Avoid unused variable warning in Release builds	2021-04-06 16:25:19 +02:00
Kerry McLaughlin	7344f3d39a	[LoopVectorize] Add strict in-order reduction support for fixed-width vectorization Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to reorder FP operations. However, it may still be beneficial to vectorize the loop by moving the reduction inside the vectorized loop and making sure that the scalar reduction value be an input to the horizontal reduction, e.g: %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ] %load = load <8 x float> %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load) This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions. For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D98435	2021-04-06 14:45:34 +01:00
Roman Lebedev	31d219d299	[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858) https://alive2.llvm.org/ce/z/67w-wQ We prefer `add`s over `sub`, and this particular xform allows further folds to happen: Fixes https://bugs.llvm.org/show_bug.cgi?id=49858	2021-04-06 15:58:14 +03:00
Simon Pilgrim	b8aba76a4e	LoopFlatten - CanWidenIV - Fix uninitialized variable warnings and use for-range loop. NFCI. Fix static analysis uninitialized variable warnings, and use for-range loop iteration across WideIVs array.	2021-04-06 12:24:20 +01:00
Abhina Sreeskantharajan	82b3e28e83	[SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text Problem: On SystemZ we need to open text files in text mode. On Windows, files opened in text mode add a CRLF '\r\n' which may not be desirable. Solution: This patch adds two new flags - OF_CRLF which indicates that CRLF translation is used. - OF_TextWithCRLF = OF_Text \| OF_CRLF indicates that the file is text and uses CRLF translation. Developers should now use either the OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text, if they do want carriage returns on Windows, they should use OF_TextWithCRLF. So this is the behaviour per platform with my patch: z/OS: OF_None: open in binary mode OF_Text : open in text mode OF_TextWithCRLF: open in text mode Windows: OF_None: open file with no carriage return OF_Text: open file with no carriage return OF_TextWithCRLF: open file with carriage return The major change is in llvm/lib/Support/Windows/Path.inc to only set text mode if the OF_CRLF is set. ``` if (Flags & OF_CRLF) CrtOpenFlags \|= _O_TEXT; ``` These following files are the ones that still use OF_Text which I left unchanged. I modified all these except raw_ostream.cpp in recent patches so I know these were previously in Binary mode on Windows. ./llvm/lib/Support/raw_ostream.cpp ./llvm/lib/TableGen/Main.cpp ./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp ./llvm/unittests/Support/Path.cpp ./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp ./clang/lib/Frontend/CompilerInstance.cpp ./clang/lib/Driver/Driver.cpp ./clang/lib/Driver/ToolChains/Clang.cpp Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99426	2021-04-06 07:23:31 -04:00
Kerry McLaughlin	857b8a73da	[LoopVectorize] Change the identity element for FAdd Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd. Reviewed By: dmgreen, spatel Differential Revision: https://reviews.llvm.org/D98963	2021-04-06 12:13:43 +01:00
Florian Hahn	a6b06b785c	[VPlan] Print VPValue operands for VPWidenPHI if possible. For VPWidenPHIRecipes that model all incoming values as VPValue operands, print those operands instead of printing the original PHI. D99294 updates recipes of reduction PHIs to use the VPValue for the incoming value from the loop backedge, making use of this new printing.	2021-04-06 12:11:21 +01:00
madhur13490	167ea67d76	[IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction This patch enhances hasAddressTaken() to ignore bitcasts as a callee in callbase instruction. Such bitcast usage doesn't really take the address in a useful meaningful way. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D98884	2021-04-06 09:23:46 +00:00
Arthur Eubanks	ea0e2ca1ac	[SROA] Allow SROA on pointers with invariant group intrinsic uses When we are able to SROA an alloca, we know all uses of it, meaning we don't have to preserve the invariant group intrinsics and metadata. It's possible that we could lose information regarding redundant loads/stores, but that's unlikely to have any real impact since right now the only user is Clang and vtables. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99760	2021-04-05 19:53:40 -07:00
Ta-Wei Tu	6a82ace5f2	[LoopFusion] Bails out if only the second candidate is guarded (PR48060) If only the second candidate loop is guarded while the first one is not, fusing the two loops might not be valid, but this check is currently missing. Fixes https://bugs.llvm.org/show_bug.cgi?id=48060 Reviewed By: sidbav Differential Revision: https://reviews.llvm.org/D99716	2021-04-06 01:08:56 +08:00
Sanjay Patel	c590a9880d	[InstCombine] fix potential miscompile in select value equivalence As shown in the example based on: https://llvm.org/PR49832 ...and the existing test, we can't substitute a vector value because the equality compare replacement that we are attempting requires that the comparison is true for the entire value. Vector select can be partly true/false.	2021-04-05 12:25:40 -04:00
Alexey Bataev	00a84f9a7f	[SLP]Improve vectorization of the CmpInst instructions. During vectorization it is better to postpone the vectorization of the CmpInst instructions till the end of the basic block. Otherwise we may vectorize them too early and miss some vectorization patterns, like reductions. Reworked part of D57059 Differential Revision: https://reviews.llvm.org/D99796	2021-04-05 06:22:51 -07:00
Roman Lebedev	2760a808b9	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): check that adding shift amounts doesn't overflow (PR49778) This is identical to `781d077afb`, but for the other function. For certain shift amount bit widths, we must first ensure that adding shift amounts is safe, that the sum won't have an unsigned overflow. Fixes https://bugs.llvm.org/show_bug.cgi?id=49778	2021-04-04 23:26:41 +03:00
Roman Lebedev	dceb3e5996	[NFC][InstCombine] Extract canTryToConstantAddTwoShiftAmounts() as helper	2021-04-04 23:26:41 +03:00
Sanjay Patel	c0645f1324	[InstCombine] fold popcount of exactly one bit to shift This is discussed in https://llvm.org/PR48999 , but it does not solve that request. The difference in the vector test shows that some other logic transform is limited to scalar types.	2021-04-04 11:43:49 -04:00
Nikita Popov	9bad7de9a3	[SimplifyCFG] Handle two equal cases in switch to select When converting a switch with two cases and a default into a select, also handle the degenerate case where two cases have the same value. Generate this case directly as %or = or i1 %cmp1, %cmp2 %res = select i1 %or, i32 %val, i32 %default rather than %sel1 = select i1 %cmp1, i32 %val, i32 %default %res = select i1 %cmp2, i32 %val, i32 %sel1 as InstCombine is going to canonicalize to the former anyway.	2021-04-04 17:27:28 +02:00
Juneyoung Lee	5207cde5cb	[InstCombine] Conditionally fold select i1 into and/or This patch fixes llvm.org/pr49688 by conditionally folding select i1 into and/or: ``` select cond, cond2, false -> and cond, cond2 ``` This is not safe if cond2 is poison whereas cond isn’t. Unconditionally disabling this transformation affects later pipelines that depend on and/or i1s. To minimize its impact, this patch conservatively checks whether cond2 is an instruction that creates a poison or its operand creates a poison. This approach is similar to what InstSimplify's SimplifyWithOpReplaced is doing. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99674	2021-04-04 14:11:28 +09:00
Fangrui Song	8e5f3d04f2	[SLPVectorizer] Fix divide-by-zero after D99719 Will add a test case later.	2021-04-02 11:13:51 -07:00
Sanjay Patel	412fc74140	[InstCombine] fold not+or+neg ~((-X) \| Y) --> (X - 1) & (~Y) We generally prefer 'add' over 'sub', this reduces the dependency chain, and this looks better for codegen on x86, ARM, and AArch64 targets. https://llvm.org/PR45755 https://alive2.llvm.org/ce/z/cxZDSp	2021-04-02 13:16:36 -04:00
Dimitry Andric	6abb92f210	[SCCP] Avoid modifying AdditionalUsers while iterating over it When run under valgrind, or with a malloc that poisons freed memory, this can lead to segfaults or other problems. To avoid modifying the AdditionalUsers DenseMap while still iterating, save the instructions to be notified in a separate SmallPtrSet, and use this to later call OperandChangedState on each instruction. Fixes PR49582. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D98602	2021-04-02 19:05:59 +02:00
Florian Hahn	8867fc69f0	[LV] Hoist mapping of IR operands to VPValues (NFC). This patch moves mapping of IR operands to VPValues out of tryToCreateWidenRecipe. This allows using existing VPValue operands when widening recipes directly, which will be introduced in future patches.	2021-04-02 17:57:20 +01:00
Philip Reames	2c4548e18e	[rs4gc] Use loops instead of straightline code for attribute stripping [nfc] Mostly because I'm about to add more attributes and the straightline copies get much uglier. What's currently there isn't too bad.	2021-04-02 09:25:15 -07:00
Philip Reames	a505801e2b	[rs4gc] Strip nofree and nosync attributes when lowering from abstract model The safepoints being inserted exist to free memory, or coordinate with another thread to do so. Thus, we must strip any inferred attributes and reinfer them after the lowering. I'm not aware of any active miscompiles caused by this, but since I'm working on strengthening inference of both and leveraging them in the optimization decisions, I figured a bit of future proofing was warranted.	2021-04-02 09:12:24 -07:00
Alexey Bataev	5fcb07a070	[SLP]Fix a bug in min/max reduction, number of condition uses. The ultimate reduction node may have multiple uses, but if the ultimate reduction is min/max reduction and based on SelectInstruction, the condition of this select instruction must have only single use. Differential Revision: https://reviews.llvm.org/D99753	2021-04-02 07:09:44 -07:00
Jeroen Dobbelaere	b82b305cf9	[InstCombine] Fix out-of-bounds ashr(shl) optimization This fixes a crash found by the oss fuzzer and reported by @fhahn. The suggestion of @RKSimon seems to be the correct fix here. (See D91343). The oss fuzz report can be found here: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32759 Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D99792	2021-04-02 13:45:11 +02:00
Florian Hahn	0f3230390b	[SLP] Better estimate cost of no-op extracts on target vectors. The motivation for this patch is to better estimate the cost of extractelement instructions in cases where they are going to be free, because the source vector can be used directly. A simple example is %v1.lane.0 = extractelement <2 x double> %v.1, i32 0 %v1.lane.1 = extractelement <2 x double> %v.1, i32 1 %a.lane.0 = fmul double %v1.lane.0, %x %a.lane.1 = fmul double %v1.lane.1, %y Currently we only consider the extracts free, if there are no other users. In this particular case, on AArch64 which can fit <2 x double> in a vector register, the extracts should be free, independently of other users, because the source vector of the extracts will be in a vector register directly, so it should be free to use the vector directly. The SLP vectorized version of noop_extracts_9_lanes is 30%-50% faster on certain AArch64 CPUs. It looks like this does not impact any code in SPEC2000/SPEC2006/MultiSource both on X86 and AArch64 with -O3 -flto. This originally regressed after D80773, so if there's a better alternative to explore, I'd be more than happy to do that. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D99719	2021-04-02 10:40:12 +01:00
Evgeniy Brevnov	2388aae401	[NARY-REASSOCIATE] Support reassociation of min/max Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available. Reviewed By: mkazantsev, lebedev.ri Differential Revision: https://reviews.llvm.org/D88287	2021-04-02 15:30:13 +07:00
Roman Lebedev	a26f1bf67e	[PassManager] Run additional LICM before LoopRotate Loop rotation often has to perform code duplication from header into preheader, which introduces PHI nodes. >>! In D99204, @thopre wrote: > > With loop peeling, it is important that unnecessary PHIs be avoided or > it will leads to spurious peeling. One source of such PHIs is loop > rotation which creates PHIs for invariant loads. Those PHIs are > particularly problematic since loop peeling is now run as part of simple > loop unrolling before GVN is run, and are thus a source of spurious > peeling. > > Note that while some of the load can be hoisted and eventually > eliminated by instruction combine, this is not always possible due to > alignment issue. In particular, the motivating example [1] was a load > inside a class instance which cannot be hoisted because the `this' > pointer has an alignment of 1. > > [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp Now, we could enhance LoopRotate to avoid duplicating code when not needed, but instead hoist loop-invariant code, but isn't that a code duplication? (sic) We have LICM, and in fact we already run it right after LoopRotation. 
We could try to move it to before LoopRotation, that is basically free from compile-time perspective: https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular: \| statistic name \| LoopRotate-LICM \| LICM-LoopRotate \| Δ \| % \| abs(%) \| \| asm-printer.EmittedInsts \| 9015930 \| 9015799 \| -131 \| 0.00% \| 0.00% \| \| indvars.NumElimCmp \| 3536 \| 3544 \| 8 \| 0.23% \| 0.23% \| \| indvars.NumElimExt \| 36725 \| 36580 \| -145 \| -0.39% \| 0.39% \| \| indvars.NumElimIV \| 1197 \| 1187 \| -10 \| -0.84% \| 0.84% \| \| indvars.NumElimIdentity \| 143 \| 136 \| -7 \| -4.90% \| 4.90% \| \| indvars.NumElimRem \| 4 \| 5 \| 1 \| 25.00% \| 25.00% \| \| indvars.NumLFTR \| 29842 \| 29890 \| 48 \| 0.16% \| 0.16% \| \| indvars.NumReplaced \| 2293 \| 2227 \| -66 \| -2.88% \| 2.88% \| \| indvars.NumSimplifiedSDiv \| 6 \| 8 \| 2 \| 33.33% \| 33.33% \| \| indvars.NumWidened \| 26438 \| 26329 \| -109 \| -0.41% \| 0.41% \| \| instcount.TotalBlocks \| 1178338 \| 1173840 \| -4498 \| -0.38% \| 0.38% \| \| instcount.TotalFuncs \| 111825 \| 111829 \| 4 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 9905442 \| 9896139 \| -9303 \| -0.09% \| 0.09% \| \| lcssa.NumLCSSA \| 425871 \| 423961 \| -1910 \| -0.45% \| 0.45% \| \| licm.NumHoisted \| 378357 \| 378753 \| 396 \| 0.10% \| 0.10% \| \| licm.NumMovedCalls \| 2193 \| 2208 \| 15 \| 0.68% \| 0.68% \| \| licm.NumMovedLoads \| 35899 \| 31821 \| -4078 \| -11.36% \| 11.36% \| \| licm.NumPromoted \| 11178 \| 11154 \| -24 \| -0.21% \| 0.21% \| \| licm.NumSunk \| 13359 \| 13587 \| 228 \| 1.71% \| 1.71% \| \| loop-delete.NumDeleted \| 8547 \| 8402 \| -145 \| -1.70% \| 1.70% \| \| loop-instsimplify.NumSimplified \| 12876 \| 11890 \| -986 \| -7.66% \| 7.66% \| \| loop-peel.NumPeeled \| 1008 \| 925 \| -83 \| -8.23% \| 8.23% \| \| 
loop-rotate.NumNotRotatedDueToHeaderSize \| 368 \| 365 \| -3 \| -0.82% \| 0.82% \| \| loop-rotate.NumRotated \| 42015 \| 42003 \| -12 \| -0.03% \| 0.03% \| \| loop-simplifycfg.NumLoopBlocksDeleted \| 240 \| 242 \| 2 \| 0.83% \| 0.83% \| \| loop-simplifycfg.NumLoopExitsDeleted \| 497 \| 20 \| -477 \| -95.98% \| 95.98% \| \| loop-simplifycfg.NumTerminatorsFolded \| 618 \| 336 \| -282 \| -45.63% \| 45.63% \| \| loop-unroll.NumCompletelyUnrolled \| 11028 \| 11032 \| 4 \| 0.04% \| 0.04% \| \| loop-unroll.NumUnrolled \| 12608 \| 12529 \| -79 \| -0.63% \| 0.63% \| \| mem2reg.NumDeadAlloca \| 10222 \| 10221 \| -1 \| -0.01% \| 0.01% \| \| mem2reg.NumPHIInsert \| 192110 \| 192106 \| -4 \| 0.00% \| 0.00% \| \| mem2reg.NumSingleStore \| 637650 \| 637643 \| -7 \| 0.00% \| 0.00% \| \| scalar-evolution.NumBruteForceTripCountsComputed \| 814 \| 812 \| -2 \| -0.25% \| 0.25% \| \| scalar-evolution.NumTripCountsComputed \| 283108 \| 282934 \| -174 \| -0.06% \| 0.06% \| \| scalar-evolution.NumTripCountsNotComputed \| 106712 \| 106718 \| 6 \| 0.01% \| 0.01% \| \| simple-loop-unswitch.NumBranches \| 5178 \| 4752 \| -426 \| -8.23% \| 8.23% \| \| simple-loop-unswitch.NumCostMultiplierSkipped \| 914 \| 503 \| -411 \| -44.97% \| 44.97% \| \| simple-loop-unswitch.NumSwitches \| 20 \| 18 \| -2 \| -10.00% \| 10.00% \| \| simple-loop-unswitch.NumTrivial \| 183 \| 95 \| -88 \| -48.09% \| 48.09% \| ... but that actually regresses LICM (-12% `licm.NumMovedLoads`), loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`), simple-loop-unswitch (`NumTrivial`). What if we instead have LICM both before and after LoopRotate? 
\| statistic name \| LoopRotate-LICM \| LICM-LoopRotate-LICM \| Δ \| % \| abs(%) \| \| asm-printer.EmittedInsts \| 9015930 \| 9014474 \| -1456 \| -0.02% \| 0.02% \| \| indvars.NumElimCmp \| 3536 \| 3546 \| 10 \| 0.28% \| 0.28% \| \| indvars.NumElimExt \| 36725 \| 36681 \| -44 \| -0.12% \| 0.12% \| \| indvars.NumElimIV \| 1197 \| 1185 \| -12 \| -1.00% \| 1.00% \| \| indvars.NumElimIdentity \| 143 \| 146 \| 3 \| 2.10% \| 2.10% \| \| indvars.NumElimRem \| 4 \| 5 \| 1 \| 25.00% \| 25.00% \| \| indvars.NumLFTR \| 29842 \| 29899 \| 57 \| 0.19% \| 0.19% \| \| indvars.NumReplaced \| 2293 \| 2299 \| 6 \| 0.26% \| 0.26% \| \| indvars.NumSimplifiedSDiv \| 6 \| 8 \| 2 \| 33.33% \| 33.33% \| \| indvars.NumWidened \| 26438 \| 26404 \| -34 \| -0.13% \| 0.13% \| \| instcount.TotalBlocks \| 1178338 \| 1173652 \| -4686 \| -0.40% \| 0.40% \| \| instcount.TotalFuncs \| 111825 \| 111829 \| 4 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 9905442 \| 9895452 \| -9990 \| -0.10% \| 0.10% \| \| lcssa.NumLCSSA \| 425871 \| 425373 \| -498 \| -0.12% \| 0.12% \| \| licm.NumHoisted \| 378357 \| 383352 \| 4995 \| 1.32% \| 1.32% \| \| licm.NumMovedCalls \| 2193 \| 2204 \| 11 \| 0.50% \| 0.50% \| \| licm.NumMovedLoads \| 35899 \| 35755 \| -144 \| -0.40% \| 0.40% \| \| licm.NumPromoted \| 11178 \| 11163 \| -15 \| -0.13% \| 0.13% \| \| licm.NumSunk \| 13359 \| 14321 \| 962 \| 7.20% \| 7.20% \| \| loop-delete.NumDeleted \| 8547 \| 8538 \| -9 \| -0.11% \| 0.11% \| \| loop-instsimplify.NumSimplified \| 12876 \| 12041 \| -835 \| -6.48% \| 6.48% \| \| loop-peel.NumPeeled \| 1008 \| 924 \| -84 \| -8.33% \| 8.33% \| \| loop-rotate.NumNotRotatedDueToHeaderSize \| 368 \| 365 \| -3 \| -0.82% \| 0.82% \| \| loop-rotate.NumRotated \| 42015 \| 42005 \| -10 \| -0.02% \| 0.02% \| \| loop-simplifycfg.NumLoopBlocksDeleted \| 240 \| 241 \| 1 \| 0.42% \| 0.42% \| \| loop-simplifycfg.NumTerminatorsFolded \| 618 \| 619 \| 1 \| 0.16% \| 0.16% \| \| loop-unroll.NumCompletelyUnrolled \| 11028 \| 11029 \| 1 \| 0.01% \| 
0.01% \| \| loop-unroll.NumUnrolled \| 12608 \| 12525 \| -83 \| -0.66% \| 0.66% \| \| mem2reg.NumPHIInsert \| 192110 \| 192073 \| -37 \| -0.02% \| 0.02% \| \| mem2reg.NumSingleStore \| 637650 \| 637652 \| 2 \| 0.00% \| 0.00% \| \| scalar-evolution.NumTripCountsComputed \| 283108 \| 282998 \| -110 \| -0.04% \| 0.04% \| \| scalar-evolution.NumTripCountsNotComputed \| 106712 \| 106691 \| -21 \| -0.02% \| 0.02% \| \| simple-loop-unswitch.NumBranches \| 5178 \| 5185 \| 7 \| 0.14% \| 0.14% \| \| simple-loop-unswitch.NumCostMultiplierSkipped \| 914 \| 925 \| 11 \| 1.20% \| 1.20% \| \| simple-loop-unswitch.NumTrivial \| 183 \| 179 \| -4 \| -2.19% \| 2.19% \| \| simple-loop-unswitch.NumBranches \| 5178 \| 4752 \| -426 \| -8.23% \| 8.23% \| \| simple-loop-unswitch.NumCostMultiplierSkipped \| 914 \| 503 \| -411 \| -44.97% \| 44.97% \| \| simple-loop-unswitch.NumSwitches \| 20 \| 18 \| -2 \| -10.00% \| 10.00% \| \| simple-loop-unswitch.NumTrivial \| 183 \| 95 \| -88 \| -48.09% \| 48.09% \| I.e. we end up with less instructions, less peeling, more LICM activity, also note how none of those 4 regressions are here. 
Namely: \| statistic name \| LICM-LoopRotate \| LICM-LoopRotate-LICM \| Δ \| % \| abs(%) \| \| asm-printer.EmittedInsts \| 9015799 \| 9014474 \| -1325 \| -0.01% \| 0.01% \| \| indvars.NumElimCmp \| 3544 \| 3546 \| 2 \| 0.06% \| 0.06% \| \| indvars.NumElimExt \| 36580 \| 36681 \| 101 \| 0.28% \| 0.28% \| \| indvars.NumElimIV \| 1187 \| 1185 \| -2 \| -0.17% \| 0.17% \| \| indvars.NumElimIdentity \| 136 \| 146 \| 10 \| 7.35% \| 7.35% \| \| indvars.NumLFTR \| 29890 \| 29899 \| 9 \| 0.03% \| 0.03% \| \| indvars.NumReplaced \| 2227 \| 2299 \| 72 \| 3.23% \| 3.23% \| \| indvars.NumWidened \| 26329 \| 26404 \| 75 \| 0.28% \| 0.28% \| \| instcount.TotalBlocks \| 1173840 \| 1173652 \| -188 \| -0.02% \| 0.02% \| \| instcount.TotalInsts \| 9896139 \| 9895452 \| -687 \| -0.01% \| 0.01% \| \| lcssa.NumLCSSA \| 423961 \| 425373 \| 1412 \| 0.33% \| 0.33% \| \| licm.NumHoisted \| 378753 \| 383352 \| 4599 \| 1.21% \| 1.21% \| \| licm.NumMovedCalls \| 2208 \| 2204 \| -4 \| -0.18% \| 0.18% \| \| licm.NumMovedLoads \| 31821 \| 35755 \| 3934 \| 12.36% \| 12.36% \| \| licm.NumPromoted \| 11154 \| 11163 \| 9 \| 0.08% \| 0.08% \| \| licm.NumSunk \| 13587 \| 14321 \| 734 \| 5.40% \| 5.40% \| \| loop-delete.NumDeleted \| 8402 \| 8538 \| 136 \| 1.62% \| 1.62% \| \| loop-instsimplify.NumSimplified \| 11890 \| 12041 \| 151 \| 1.27% \| 1.27% \| \| loop-peel.NumPeeled \| 925 \| 924 \| -1 \| -0.11% \| 0.11% \| \| loop-rotate.NumRotated \| 42003 \| 42005 \| 2 \| 0.00% \| 0.00% \| \| loop-simplifycfg.NumLoopBlocksDeleted \| 242 \| 241 \| -1 \| -0.41% \| 0.41% \| \| loop-simplifycfg.NumLoopExitsDeleted \| 20 \| 497 \| 477 \| 2385.00% \| 2385.00% \| \| loop-simplifycfg.NumTerminatorsFolded \| 336 \| 619 \| 283 \| 84.23% \| 84.23% \| \| loop-unroll.NumCompletelyUnrolled \| 11032 \| 11029 \| -3 \| -0.03% \| 0.03% \| \| loop-unroll.NumUnrolled \| 12529 \| 12525 \| -4 \| -0.03% \| 0.03% \| \| mem2reg.NumDeadAlloca \| 10221 \| 10222 \| 1 \| 0.01% \| 0.01% \| \| mem2reg.NumPHIInsert \| 192106 \| 192073 \| 
-33 \| -0.02% \| 0.02% \| \| mem2reg.NumSingleStore \| 637643 \| 637652 \| 9 \| 0.00% \| 0.00% \| \| scalar-evolution.NumBruteForceTripCountsComputed \| 812 \| 814 \| 2 \| 0.25% \| 0.25% \| \| scalar-evolution.NumTripCountsComputed \| 282934 \| 282998 \| 64 \| 0.02% \| 0.02% \| \| scalar-evolution.NumTripCountsNotComputed \| 106718 \| 106691 \| -27 \| -0.03% \| 0.03% \| \| simple-loop-unswitch.NumBranches \| 4752 \| 5185 \| 433 \| 9.11% \| 9.11% \| \| simple-loop-unswitch.NumCostMultiplierSkipped \| 503 \| 925 \| 422 \| 83.90% \| 83.90% \| \| simple-loop-unswitch.NumSwitches \| 18 \| 20 \| 2 \| 11.11% \| 11.11% \| \| simple-loop-unswitch.NumTrivial \| 95 \| 179 \| 84 \| 88.42% \| 88.42% \| {F15983613} {F15983615} {F15983616} (this is vanilla llvm testsuite + rawspeed + darktable) As an example of the code where early LICM only is bad, see: https://godbolt.org/z/GzEbacs4K This does have an observable compile-time regression of +~0.5% geomean https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions but i think that's basically nothing, and there's potential that it might be avoidable in the future by fixing clang to produce alignment information on function arguments, thus making the second run unneeded. Differential Revision: https://reviews.llvm.org/D99249	2021-04-02 11:11:42 +03:00
Juneyoung Lee	c664769330	[AssumeBundles] offset should be added to correctly calculate align This is a patch to fix the bug in alignment calculation (see https://reviews.llvm.org/D90529#2619492). Consider this code: ``` call void @llvm.assume(i1 true) ["align"(i32* %a, i32 32, i32 28)] %arrayidx = getelementptr inbounds i32, i32* %a, i64 -1 ; alignment of %arrayidx? ``` The llvm.assume guarantees that `%a - 28` is 32-byte aligned, meaning that `%a` is 32k + 28 for some k. Therefore `%a - 4` cannot be 32-byte aligned, but the existing code computed the pointer as 32-byte aligned. The reason is as follows. `DiffSCEV` stores `%arrayidx - %a`, which is -4. `OffSCEV` stores the offset value of “align”, which is 28. `DiffSCEV` + `OffSCEV` = 24 should be used for `%a - 4`'s offset from 32k, but `DiffSCEV` - `OffSCEV` = -32 (a multiple of 32) was being used instead. Reviewed By: Tyker Differential Revision: https://reviews.llvm.org/D98759	2021-04-02 12:32:05 +09:00
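The arithmetic in the commit above can be checked with a small standalone sketch. This is not the LLVM code: `knownAlignment` and its parameters are hypothetical names, and the function only models the residue calculation the message describes (add the assume offset to the pointer difference, then reduce modulo the alignment).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not LLVM's implementation.
// Assumption: (base - assumeOffset) is a multiple of `align`, a power of two.
// Returns the largest power of two, at most `align`, that is guaranteed to
// divide (base + ptrDiff).
std::uint64_t knownAlignment(std::uint64_t align, std::int64_t assumeOffset,
                             std::int64_t ptrDiff) {
  assert(align != 0 && (align & (align - 1)) == 0 &&
         "align must be a power of two");
  const std::int64_t a = static_cast<std::int64_t>(align);
  // Residue of the derived pointer modulo `align`, normalized to [0, align).
  const std::int64_t residue = ((ptrDiff + assumeOffset) % a + a) % a;
  if (residue == 0)
    return align;
  // Otherwise only the lowest set bit of the residue is guaranteed.
  return static_cast<std::uint64_t>(residue & -residue);
}
```

With the values from the message, `knownAlignment(32, 28, -4)` sees a residue of 24 and reports 8-byte alignment. Negating the offset, as the old subtraction effectively did, gives a residue of 0 and the incorrect answer of 32.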
Philip Reames	91790c6785	[indvars] Fix pr49802 by checking for SCEVCouldNotCompute The code is assuming that having an exact exit count for the loop implies that exit counts for every exit are known. This used to be true, but when we added handling for dead exits we broke this invariant. The new invariant is that an exact loop count implies that any exits not trivially dead have exit counts. We could have fixed this by either a) explicitly checking for a dead exit, or b) just testing for SCEVCouldNotCompute. I chose the second as it was simpler. (Debugging this took longer than it should have since I'd mistyped the original assert and it wasn't checking what it was meant to...) p.s. Sorry for the lack of test case. Getting things into a state to actually hit this is difficult and fragile. The original repro involves loop-deletion leaving SCEV in a slightly imprecise state which lets us bypass other transforms in IndVarSimplify on the way to this one. All of my attempts to separate it into a standalone test failed.	2021-04-01 17:53:44 -07:00
Philip Reames	b23a314146	[funcattrs] Respect nofree attribute on callsites (not just callee)	2021-04-01 14:45:49 -07:00
Philip Reames	1e69a5af92	[Attributor] Cleanup detection of non-relaxed atomics in nosync inference The code was checking for cases which are disallowed by the verifier. Delete dead code and adjust style.	2021-04-01 12:01:29 -07:00
Philip Reames	8e596f7e27	[Attributor] Cleanup intrinsic handling in nosync inference [mostly NFC] Mostly stylistic adjustment, but the old code didn't handle the memcpy.inline intrinsic. By using the matcher class, we now do.	2021-04-01 11:49:59 -07:00
Philip Reames	6ef4505298	[funcattrs] Infer nosync from readnone and non-convergent This implements the most basic possible nosync inference. The choice of inference rule is taken from the comments in attributor and the discussion on the review of the change which introduced the nosync attribute (`0626367202`). This is deliberately minimal. As noted in code comments, I do plan to add a more robust inference which actually scans the function IR directly, but a) I need to do some refactoring of the attributor code to use common interfaces, and b) I wanted to get something in. I also wanted to minimize the "interesting" analysis discussion since that's time intensive. Context: This combines with existing nofree attribute inference to help prove dereferenceability in the ongoing deref-at-point semantics work. Differential Revision: https://reviews.llvm.org/D99749	2021-04-01 11:37:34 -07:00
Philip Reames	ffa15e9463	Extract isVolatile helper on Instruction [NFCI] We have this logic duplicated in several cases, none of which were exhaustive. Consolidate it in one place. I don't believe this actually impacts behavior of the callers. I think they all filter their inputs such that their partial implementations were correct. If not, this might be fixing a cornercase bug.	2021-04-01 11:24:02 -07:00
Philip Reames	6b05d753e0	Mark unordered memset/memmove/memcpy as nosync Mostly a means to remove a bit of code from attributor in advance of implementing a FuncAttr inference for nosync.	2021-04-01 10:38:54 -07:00
Alexey Bataev	c03696da5e	[SLP]Improve and fix getVectorElementSize. 1. Need to cleanup InstrElementSize map for each new tree, otherwise might use sizes from the previous run of the vectorization attempt. 2. No need to include into analysis the instructions from the different basic blocks to save compile time. Differential Revision: https://reviews.llvm.org/D99677	2021-04-01 06:51:26 -07:00
Alexey Bataev	ce98a0556a	[SLP]Remove `else` after `return`, NFC.	2021-04-01 05:33:01 -07:00
Yevgeny Rouban	1ed53d44d8	[LoopFlatten] Do not report CFG analyses as up-to-date Removes CFGAnalyses from the preserved analyses set returned by LoopFlattenPass::run(). Reviewed By: Dave Green, Ta-Wei Tu Differential Revision: https://reviews.llvm.org/D99700	2021-04-01 15:52:36 +07:00
Max Kazantsev	a1d83776bf	[NFC] Undo some erroneous renamings Some vars renamed by mistake during auto-replacements. Undoing them.	2021-04-01 13:10:10 +07:00
Max Kazantsev	630818a850	[NFC] Disambiguate LI in GVN GVN uses the name 'LI' for two different, unrelated things: LoadInst and LoopInfo. This patch renames the variables with the former meaning to 'Load' to disambiguate the code.	2021-04-01 12:40:35 +07:00
KAWASHIMA Takahiro	5fac7c6046	[GVN] Propagate llvm.access.group metadata of loads Before this change, the `llvm.access.group` metadata was dropped when moving a load instruction in GVN. This prevents vectorizing a C/C++ loop with `#pragma clang loop vectorize(assume_safety)`. This change propagates the metadata as well as other metadata if it is safe (the move-destination basic block and source basic block belong to the same loop). Differential Revision: https://reviews.llvm.org/D93503	2021-04-01 10:00:48 +09:00
qixingxue	62b74f7564	[GVN][NFC] Refactor analyzeLoadFromClobberingWrite This commit adjusts the order of two swappable if statements to make code cleaner. Reviewed By: lattner, nikic Differential Revision: https://reviews.llvm.org/D99648	2021-04-01 08:35:35 +08:00
Roman Lebedev	43ded90094	[NFC][LoopRotation] Count the number of instructions hoisted/cloned into preheader	2021-03-31 23:27:36 +03:00
Huihui Zhang	fe5c4a06a4	[LoopVectorize] Use SetVector to track uniform uses to prevent non-determinism. Use SetVector instead of SmallPtrSet to track values with uniform use. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turned on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM test consecutive-ptr-uniforms.ll . Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99549	2021-03-31 11:21:07 -07:00
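To illustrate why an insertion-ordered set removes this kind of non-determinism, here is a minimal standard-library sketch in the spirit of `llvm::SetVector`. It is not LLVM's implementation; `OrderedSet` and its members are hypothetical names. Membership is tracked in a hash set, but iteration always goes through a vector that records insertion order, so results do not depend on hashing or pointer values.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Minimal insertion-ordered set (illustrative only).
template <typename T> class OrderedSet {
public:
  // Returns true if V was newly inserted, false if it was already present.
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;
    Order.push_back(V);
    return true;
  }

  // Elements in the order they were first inserted.
  const std::vector<T> &items() const { return Order; }

  std::size_t size() const {
    assert(Seen.size() == Order.size() && "set and vector out of sync");
    return Order.size();
  }

private:
  std::unordered_set<T> Seen; // membership test only, never iterated
  std::vector<T> Order;       // deterministic iteration order
};
```

Inserting 3, 1, 3, 2 yields the sequence 3, 1, 2 on every run, whereas iterating a plain unordered set makes no such promise.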
Sanjay Patel	1462bdf1b9	[InstCombine] fold abs(srem X, 2) This is a missing optimization based on an example in: https://llvm.org/PR49763 As noted there and the test here, we could add a more general fold if that is shown useful. https://alive2.llvm.org/ce/z/xEHdTv https://alive2.llvm.org/ce/z/97dcY5	2021-03-31 11:29:20 -04:00
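The commit message above does not spell out the folded form. Assuming it rewrites `abs(srem X, 2)` as `and X, 1` (our reading of the title, not something the log states), the identity can be checked exhaustively over a small range. The helper name below is hypothetical, and the sketch relies on C++ `%` truncating toward zero, as `srem` does, and on two's-complement `int`.

```cpp
#include <cassert>
#include <cstdlib>

// Returns true if abs(X % 2) == (X & 1) for every X in [Lo, Hi].
// Keep Hi well below INT_MAX so the loop counter cannot overflow.
bool absSremTwoMatchesLowBit(int Lo, int Hi) {
  assert(Lo <= Hi && "empty range");
  for (int X = Lo; X <= Hi; ++X) {
    const int Rem = X % 2; // -1, 0 or 1, like srem
    if (std::abs(Rem) != (X & 1))
      return false;
  }
  return true;
}
```

Running this over the full `i8` range, -128 to 127, covers both signs and the minimum value.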
Sander de Smalen	7108b2dec1	[SVE] Fix LoopVectorizer test scalable-call.ll This marks FSIN and other operations to EXPAND for scalable vectors, so that they are not assumed to be legal by the cost-model. Depends on D97470 Reviewed By: dmgreen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97471	2021-03-31 14:52:49 +01:00
Chuanqi Xu	eb51dd719f	[Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2 Summary: Try to insert dbg.declare to entry.resume basic block in resume function. In this way, we could print alloca such as __promise in gdb/lldb under O2, which would be beneficial to debug coroutine program. Test Plan: check-llvm Reviewed by: aprantl Differential Revision: https://reviews.llvm.org/D96938	2021-03-31 10:37:06 +08:00
Fangrui Song	3e5ee194c0	[SimpleLoopUnswitch] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after `431a40e1e2`	2021-03-30 19:27:10 -07:00
Juneyoung Lee	431a40e1e2	[LoopUnswitch] Assert that branch condition is either and/or but not both as suggested at https://reviews.llvm.org/rG5bb38e84d3d0#986321	2021-03-31 10:35:22 +09:00
Sanjay Patel	c2ebad8d55	[InstCombine] add fold for demand of low bit of abs() This is one problem shown in https://llvm.org/PR49763 https://alive2.llvm.org/ce/z/cV6-4K https://alive2.llvm.org/ce/z/9_3g-L	2021-03-30 15:14:37 -04:00
Huihui Zhang	d857a81437	[VPlan] Use SetVector for VPExternalDefs to prevent non-determinism. Use SetVector instead of SmallPtrSet for external definitions created for VPlan. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turned on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM-Unit test VPRecipeTest.dump. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99544	2021-03-30 12:10:56 -07:00
spupyrev	22998738e8	[SamplePGO] Keeping prof metadata for IndirectBrInst Currently prof metadata with branch counts is added only for BranchInst and SwitchInst, but not for IndirectBrInst. As a result, BPI/BFI make incorrect inferences for indirect branches, which can be very hot. This diff adds metadata for IndirectBrInst, in addition to BranchInst and SwitchInst. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D99550	2021-03-30 10:44:48 -07:00
Hongtao Yu	3e3fc431df	[CSSPGO] Top-down processing order based on full profile. Use profiled call edges to augment the top-down order. There are cases where the top-down order computed based on the static call graph doesn't reflect real execution order. For example: 1. Incomplete static call graph due to unknown indirect call targets. Adjusting the order by considering indirect call edges from the profile can enable the inlining of indirect call targets by allowing the caller to be processed before them. 2. Mutual call edges in an SCC. The static processing order computed for an SCC may not reflect the call contexts in the context-sensitive profile, thus may cause potential inlining to be overlooked. The function order in one SCC is being adjusted to a top-down order based on the profile to favor more inlining. 3. Transitive indirect call edges due to inlining. When a callee function is inlined into a caller function in LTO prelink, every call edge originating from the callee will be transferred to the caller. If any of the transferred edges is indirect, the original profiled indirect edge, even if considered, would not enforce a top-down order from the caller to the potential indirect call target in LTO postlink since the inlined callee is gone from the static call graph. 4. #3 can happen even for direct call targets, due to functions defined in header files. Header functions, when included into source files, are defined multiple times but only one definition survives due to ODR. Therefore, the LTO prelink inlining done on those dropped definitions can be useless based on a local file scope. More importantly, the inlinee, once fully inlined to a to-be-dropped inliner, will have no profile to consume when its outlined version is compiled. This can lead to a profile-less prelink compilation for the outlined version of the inlinee function which may be called from external modules. 
While this isn't easy to fix, we rely on the postlink AutoFDO pipeline to optimize the inlinee. Since the survived copy of the inliner (defined in headers) can be inlined in its local scope in prelink, it may not exist in the merged IR in postlink, and we'll need the profiled call edges to enforce a top-down order for the rest of the functions. Considering those cases, a profiled call graph completely independent of the static call graph is constructed based on profile data, where function objects are not even needed to handle cases #3 and #4. I'm seeing an average 0.4% perf win out of SPEC2017. For certain benchmarks such as Xalancbmk and GCC, the win is bigger, above 2%. The change is an enhancement to https://reviews.llvm.org/D95988. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D99351	2021-03-30 10:42:22 -07:00
Krasimir Georgiev	c51e91e046	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `5178ffc7cf`. Compiling `llvm-profdata` with a compiler build from this produces a crashing binary.	2021-03-30 14:13:37 +02:00
Juneyoung Lee	6b4b1dc6ec	[LoopUnswitch] Simplify branch condition if it is select with constant operands This fixes the miscompilation reported in https://reviews.llvm.org/rG5bb38e84d3d0#986154 . `select _, true, false` matches both m_LogicalAnd and m_LogicalOr, making later transformations confused. Simplify the branch condition to not have the form.	2021-03-30 20:09:42 +09:00
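The ambiguity described above can be modelled without any LLVM headers. In this sketch the `Select` struct and the function names are hypothetical stand-ins for the real pattern matchers, with one assumption stated in the comments: a logical and has the shape `select C, X, false` and a logical or has the shape `select C, true, X`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of `select i1 Cond, i1 TrueVal, i1 FalseVal`.
// Each arm is either a known constant or not a constant (nullopt).
struct Select {
  std::optional<bool> TrueVal;
  std::optional<bool> FalseVal;
};

// Assumed shape of a logical and: `select C, X, false`.
bool matchesLogicalAnd(const Select &S) {
  return S.FalseVal.has_value() && !*S.FalseVal;
}

// Assumed shape of a logical or: `select C, true, X`.
bool matchesLogicalOr(const Select &S) {
  return S.TrueVal.has_value() && *S.TrueVal;
}

// Evaluates a select whose arms are both constants.
bool evaluate(const Select &S, bool Cond) {
  assert(S.TrueVal && S.FalseVal && "both arms must be constants");
  return Cond ? *S.TrueVal : *S.FalseVal;
}
```

A select with arms `true` and `false` satisfies both matchers, and `evaluate` returns the condition unchanged for either input, which is why replacing such a select with its condition removes the ambiguity.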
Sander de Smalen	f71ed5dfe2	NFC: Migrate PartialInlining to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D97382	2021-03-30 11:59:45 +01:00
David Sherwood	a08c7736a7	[LoopVectorize] Add support for scalable vectorization of induction variables This patch adds support for the vectorization of induction variables when using scalable vectors, which required the following changes: 1. Removed assert from InnerLoopVectorizer::getStepVector. 2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use a runtime determined value for VF and removed an assert. 3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable vectors. I did this by calculating the full vector value for each Part of the unroll factor (UF) and caching this in the VP state. This means that we are always able to extract an arbitrary element from the vector if necessary. In addition to this, I also permitted the caching of the individual lane values themselves for the known minimum number of elements in the same way we do for fixed width vectors. This is a further optimisation that improves the code quality since it avoids unnecessary extractelement operations when extracting the first lane. 4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while testing some code paths I noticed this is currently broken for scalable vectors. Various tests to support different cases have been added here: Transforms/LoopVectorize/AArch64/sve-inductions.ll Differential Revision: https://reviews.llvm.org/D98715	2021-03-30 11:13:31 +01:00
Krasimir Georgiev	8e7df996e3	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `92ddd3c1b6`. Causes multistage clang crashes, e.g.: https://lab.llvm.org/buildbot/#/builders/36/builds/6678	2021-03-30 11:47:12 +02:00
Han Zhu	92ddd3c1b6	[loop-idiom] Hoist loop memcpys to loop preheader For a simple loop like: ``` struct S { int x; int y; char b; }; unsigned foo(S* __restrict__ a, S* b, int n) { for (int i = 0; i < n; i++) a[i] = b[i]; return sizeof(a[0]); } ``` We could eliminate the loop and convert it to a large memcpy of 12n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll` ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %0 = bitcast %struct.S* %arrayidx2 to i8* %1 = bitcast %struct.S* %arrayidx to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false) %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic. 
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass. ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %a1 = bitcast %struct.S* %a to i8* %b2 = bitcast %struct.S* %b to i8* %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry %0 = zext i32 %n to i64 %1 = mul nuw nsw i64 %0, 12 call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false) br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %2 = bitcast %struct.S* %arrayidx2 to i8* %3 = bitcast %struct.S* %arrayidx to i8* %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` Reviewed By: zino Differential Revision: https://reviews.llvm.org/D97667	2021-03-29 23:36:26 -07:00
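At the source level, the transformation amounts to replacing the per-element copy with one bulk copy of `n * sizeof(S)` bytes. The sketch below uses the `S` struct from the commit message; `copyLoop` and `copyBulk` are hypothetical names, and the bulk version is only equivalent when the two ranges do not overlap, which the original example expresses with `__restrict__`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct S {
  int x;
  int y;
  char b;
};

// Per-element copy, as in the loop from the commit message.
void copyLoop(S *a, const S *b, int n) {
  for (int i = 0; i < n; i++)
    a[i] = b[i];
}

// Single bulk copy of n * sizeof(S) bytes.
// Only valid when [a, a + n) and [b, b + n) do not overlap.
void copyBulk(S *a, const S *b, int n) {
  if (n <= 0)
    return; // mirrors the loop, which runs zero times
  assert(a != nullptr && b != nullptr && "memcpy requires valid pointers");
  std::memcpy(a, b, static_cast<std::size_t>(n) * sizeof(S));
}
```

On common ABIs `sizeof(S)` is 12, which matches the "12n bytes" in the message and the `mul nuw nsw i64 %0, 12` in the transformed IR.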
Han Zhu	2bd4049ceb	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `deb5095833`. Bad commit message.	2021-03-29 23:35:35 -07:00
Han Zhu	deb5095833	[loop-idiom] Hoist loop memcpys to loop preheader Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Blame Revision: Differential Revision: https://phabricator.intern.facebook.com/D26380397	2021-03-29 23:14:42 -07:00
Huihui Zhang	ca721042f1	[IPO][SampleContextTracker] Use SmallVector to track context profiles to prevent non-determinism. Use SmallVector instead of SmallSet to track the context profiles mapped. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turned on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM test profile-context-tracker-debug.ll . Reviewed By: MaskRay, wenlei Differential Revision: https://reviews.llvm.org/D99547	2021-03-29 16:37:10 -07:00
Gulfem Savrun Yeniceri	5178ffc7cf	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-29 21:53:32 +00:00
Wenlei He	30b0232336	[CSSPGO][llvm-profgen] Context-sensitive global pre-inliner This change sets up a framework in llvm-profgen to estimate inline decision and adjust context-sensitive profile based on that. We call it a global pre-inliner in llvm-profgen. It will serve two purposes: 1) Since context profile for not inlined context will be merged into base profile, if we estimate a context will not be inlined, we can merge the context profile in the output to save profile size. 2) For thinLTO, when a context involving functions from different modules is not inlined, we can't merge functions profiles across modules, leading to suboptimal post-inline count quality. By estimating some inline decisions, we would be able to adjust/merge context profiles beforehand as a mitigation. Compiler inline heuristic uses inline cost which is not available in llvm-profgen. But since inline cost is closely related to size, we could get an estimate through function size from debug info. Because the size we have in llvm-profgen is the final size, it could also be more accurate than the inline cost estimation in the compiler. This change only has the framework, with a few TODOs left for follow up patches for a complete implementation: 1) We need to retrieve size for function/inlinee from debug info for inlining estimation. Currently we use number of samples in a profile as placeholder for size estimation. 2) Currently the thresholds are using the values used by sample loader inliner. But they need to be tuned since the size here is fully optimized machine code size, instead of inline cost based on not yet fully optimized IR. Differential Revision: https://reviews.llvm.org/D99146	2021-03-29 09:46:14 -07:00
Florian Hahn	c773d0f973	Recommit "[LV] Move runtime pointer size check to LVP::plan()." Re-apply `25fbe803d4`, with a small update to emit the right remark class. Original message: [LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime to consider their cost more accurately. Reviewed By: lebedev.ri	2021-03-29 16:14:27 +01:00
Florian Hahn	485c8ce733	Revert "[LV] Move runtime pointer size check to LVP::plan()." This reverts commit `25fbe803d4`. This breaks a clang test which filters for the wrong remark type.	2021-03-29 14:41:53 +01:00
Sanjay Patel	da381cf7ce	[SLP] allow matching integer min/max intrinsics as reduction ops This is a 2nd try of: `3c8473ba53` which was reverted at: `a26312f9d4` because of crashing. This version includes extra code and tests to avoid the known crashing examples as discussed in PR49730. Original commit message: As noted in D98152, we need to patch SLP to avoid regressions when we start canonicalizing to integer min/max intrinsics. Most of the real work to make this possible was in: `7202f47508` Differential Revision: https://reviews.llvm.org/D98981	2021-03-29 09:38:18 -04:00
Florian Hahn	25fbe803d4	[LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime to consider their cost more accurately. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98634	2021-03-29 14:12:29 +01:00
Matt Arsenault	9a0c9402fa	Reapply "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `07e46367ba`.	2021-03-29 08:55:30 -04:00
Jingu Kang	e4abb64100	[LoopUnswitch] Use reference variables instead of pointer one Differential Revision: https://reviews.llvm.org/D99496	2021-03-29 13:08:46 +01:00
Hans Wennborg	c6e5c4654b	Don't use $ as suffix for symbol names in ThinLTOBitcodeWriter and other places Using $ breaks demangling of the symbols. For example, $ c++filt _Z3foov\$123 _Z3foov$123 This causes problems for developers who would like to see nice stack traces etc., but also for automatic crash tracking systems which try to organize crashes based on the stack traces. Instead, use the period as suffix separator, since Itanium demanglers normally ignore such suffixes: $ c++filt _Z3foov.123 foo() [clone .123] This is already done in some places; try to do it everywhere. Differential revision: https://reviews.llvm.org/D97484	2021-03-29 13:03:52 +02:00
Oliver Stannard	07e46367ba	Revert "Reapply "OpaquePtr: Turn inalloca into a type attribute"" Reverting because test 'Bindings/Go/go.test' is failing on most buildbots. This reverts commit `fc9df30991`.	2021-03-29 11:32:22 +01:00
Jingu Kang	cfe87d4edd	[NFC][LoopUnswitch] Move hasPartialIVCondition to LoopUtils Differential revision: https://reviews.llvm.org/D99490	2021-03-29 10:29:45 +01:00
Matt Arsenault	fc9df30991	Reapply "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `20d5c42e0e`.	2021-03-28 13:35:21 -04:00
Sanjay Patel	01ae6e5ead	[InstCombine] sink min/max intrinsics with common op after select This is another step towards parity with cmp+select min/max idioms. See D98152.	2021-03-28 13:13:04 -04:00
Nico Weber	20d5c42e0e	Revert "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `4fefed6563`. Broke check-clang everywhere.	2021-03-28 13:02:52 -04:00
Matt Arsenault	4fefed6563	OpaquePtr: Turn inalloca into a type attribute I think byval/sret and the others are close to being able to rip out the code to support the missing type case. A lot of this code is shared with inalloca, so catch this up to the others so that can happen.	2021-03-28 11:12:23 -04:00
Florian Hahn	8c6c357897	[LV] Mark a few more cost-model members as const (NFC).	2021-03-28 14:59:48 +01:00
Florian Hahn	d2855eba81	[LV] Fix formatting from `2f9d68c3f1`.	2021-03-27 21:29:56 +00:00
Florian Hahn	2f9d68c3f1	[LV] Mark some methods as const (NFC). Mark a few methods as const, as they do not modify any state.	2021-03-27 21:27:53 +00:00
Juneyoung Lee	05884d3b52	Make FoldBranchToCommonDest poison-safe by default This is a small patch to make FoldBranchToCommonDest poison-safe by default. After `fc3f0c9c`, only two syntactic changes are needed to fix unit tests. This does not cause any assembly difference in testsuite as well (-O3, X86-64 Manjaro). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99452	2021-03-27 19:05:12 +09:00
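For readers unfamiliar with why the select form is the poison-safe one, here is a small model of the two ways to combine branch conditions. It is a sketch under stated assumptions: poison is represented as an empty `std::optional`, and `bitwiseAnd` / `logicalAnd` are hypothetical names standing in for `and i1 A, B` and `select i1 A, i1 B, i1 false`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model: an i1 value is true, false, or poison (nullopt).
using I1 = std::optional<bool>;

// Models `and i1 A, B`: the result is poison if either operand is poison.
I1 bitwiseAnd(I1 A, I1 B) {
  if (!A || !B)
    return std::nullopt;
  return *A && *B;
}

// Models `select i1 A, i1 B, i1 false`: B is only observed when A is true,
// so a poison B does not leak through when A is false.
I1 logicalAnd(I1 A, I1 B) {
  if (!A)
    return std::nullopt;
  return *A ? B : I1(false);
}
```

The difference shows up when the first condition is false and the second is poison: `assert(!bitwiseAnd(false, std::nullopt))` holds because the plain `and` is poison, while `logicalAnd(false, std::nullopt)` is a well-defined `false`, matching the original short-circuit control flow.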
Juneyoung Lee	fc3f0c9cc0	[IRCE] Use m_LogicalAnd This is a minor fix to use m_LogicalAnd. This allows IRCE to recognize select form of and conditions as well.	2021-03-27 15:23:18 +09:00
Hongtao Yu	12ac0403b1	[CSSPGO][NFC] Fix a debug dump issue. During context promotion, intermediate nodes that are on a call path but do not come with a profile can be promoted together with their parent nodes. Do not print sample context string for such nodes since they do not have profile. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D99441	2021-03-26 16:06:56 -07:00
Chris Lattner	62c41cfba1	Add a missing file header comment, NFC.	2021-03-26 15:34:04 -07:00
Nikita Popov	4622648a06	Revert "[ArgPromotion] Copy additional metadata for loads." This reverts commit `166620a4f0`. A miscompile has been reported in https://reviews.llvm.org/D93927#2653480 and following.	2021-03-26 21:34:54 +01:00
Florian Hahn	4858e081d7	[ConstraintElimination] Only strip casts preserving the representation. Things like addrspacecast may not be no-ops, so we should not look through them.	2021-03-26 20:07:41 +00:00
Sanjay Patel	b0797e0c12	[SLP] use dyn_cast instead of isa + cast; NFC	2021-03-26 13:52:31 -04:00
Sanjay Patel	a26312f9d4	Revert "[SLP] allow matching integer min/max intrinsics as reduction ops" This reverts commit `3c8473ba53` and includes test diffs to maintain testing status. There's at least 1 place that was not updated with `7202f47508` , so we can crash mismatching select and intrinsics as shown in PR49730.	2021-03-26 09:59:14 -04:00
David Sherwood	c39460cc4f	Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost" This reverts commit `240aa96cf2`.	2021-03-26 11:36:53 +00:00
David Sherwood	240aa96cf2	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. Differential Revision: https://reviews.llvm.org/D98512	2021-03-26 11:27:12 +00:00
Wenlei He	5f59f407f5	[CSSPGO] Minor tweak for inline candidate priority tie breaker When prioritizing call sites to consider for inlining in sample loader, use number of samples as a first tie breaker before using name/guid comparison. This would favor smaller functions when hotness is the same (from the same block). We could try to retrieve accurate function size if this turns out to be more important. Differential Revision: https://reviews.llvm.org/D99370	2021-03-25 21:15:36 -07:00
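The ordering described above can be sketched as a three-level comparison. Everything here is illustrative: the `Candidate` struct, its field names, and the direction of the final GUID comparison are assumptions, not the sample loader's actual data structures. The only parts taken from the message are the order of the criteria and that fewer samples should win a hotness tie.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical inline candidate record.
struct Candidate {
  std::uint64_t CallsiteCount; // hotness of the call site
  std::uint64_t CalleeSamples; // number of samples, used as a size proxy
  std::uint64_t Guid;          // stable identifier for the final tie break
};

// Returns true if A should be considered for inlining before B.
bool higherPriority(const Candidate &A, const Candidate &B) {
  // 1. Hotter call sites come first.
  if (A.CallsiteCount != B.CallsiteCount)
    return A.CallsiteCount > B.CallsiteCount;
  // 2. On equal hotness, prefer the callee with fewer samples.
  if (A.CalleeSamples != B.CalleeSamples)
    return A.CalleeSamples < B.CalleeSamples;
  // 3. Fall back to the identifier (direction assumed here) so the
  //    order stays deterministic.
  return A.Guid < B.Guid;
}
```

Because every level is a strict comparison and the last one uses a unique identifier, the function is a strict weak ordering and can be passed to `std::sort` or used to drive a priority queue.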
Leonard Chan	36eaeaf728	[llvm][hwasan] Add Fuchsia shadow mapping configuration Ensure that Fuchsia shadow memory starts at zero. Differential Revision: https://reviews.llvm.org/D99380	2021-03-25 15:28:59 -07:00
Guozhi Wei	3240910f00	[DAE] Adjust param/arg attributes when changing parameter to undef In the DeadArgumentElimination pass, if a function's argument is never used, the corresponding caller's parameter can be changed to undef. If the param/arg has attribute noundef or other related attributes, LLVM LangRef(https://llvm.org/docs/LangRef.html#parameter-attributes) says its behavior is undefined. SimplifyCFG(D97244) takes advantage of this behavior and does bad transformation on valid code. To avoid this undefined behavior when changing the caller's parameter to undef, this patch removes the noundef attribute and other attributes that imply noundef on param/arg. Differential Revision: https://reviews.llvm.org/D98899	2021-03-25 14:53:22 -07:00
Roman Lebedev	1c55dcbca7	[NFCI][SimplifyCFG] Don't pay for a Small{Map,Set}Vector when plain SmallSet will suffice This only changes the cases where we really don't care about the iteration order of the underlying container, namely when we will use the values from it to form DTU updates.	2021-03-25 23:25:40 +03:00
Yevgeny Rouban	f7ef26ef0b	[SLP] Fix crash in reduction for integer min/max The SCEV commit `b46c085d2b` [NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions seems to reveal a new crash in SLPVectorizer. SLP crashes expecting a SelectInst as an externally used value but umin() call is found. The patch relaxes the assumption to make the IR flag propagation safe. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D99328	2021-03-25 21:44:21 +07:00
Matt Morehouse	96a4167b4c	[HWASan] Use page aliasing on x86_64. Userspace page aliasing allows us to use middle pointer bits for tags without untagging them before syscalls or accesses. This should enable easier experimentation with HWASan on x86_64 platforms. Currently stack, global, and secondary heap tagging are unsupported. Only primary heap allocations get tagged. Note that aliasing mode will not work properly in the presence of fork(), since heap memory will be shared between the parent and child processes. This mode is non-ideal; we expect Intel LAM to enable full HWASan support on x86_64 in the future. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D98875	2021-03-25 07:04:14 -07:00
Alexey Bataev	568c874117	[SLP]Improve and simplify extendSchedulingRegion. We do not need to scan further if the upper end or lower end of the basic block is reached already and the instruction is not found. It means that the instruction is definitely in the lower part of basic block or in the upper block relatively. This should improve compile time for the very big basic blocks. Differential Revision: https://reviews.llvm.org/D99266	2021-03-25 05:31:58 -07:00
Sameer Sahasrabuddhe	b92c8c22b9	[NewPM] Disable non-trivial loop-unswitch on targets with divergence Unswitching a loop on a non-trivial divergent branch is expensive since it serializes the execution of both versions of the loop. But identifying a divergent branch needs divergence analysis, which is a function level analysis. The legacy pass manager handles this dependency by isolating such a loop transform and rerunning the required function analyses. This functionality is currently missing in the new pass manager, and there is no safe way for the SimpleLoopUnswitch pass to depend on DivergenceAnalysis. So we conservatively assume that all non-trivial branches are divergent if the target has divergence. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D98958	2021-03-25 11:27:10 +00:00
Philip Reames	9a82f42d12	Plumb TLI through isSafeToExecuteUnconditionally [NFC] Split from D95815 to reduce patch size. Isn't (yet) used for anything, only the client side is wired up.	2021-03-24 17:52:04 -07:00
Matt Morehouse	c8ef98e5de	Revert "[HWASan] Use page aliasing on x86_64." This reverts commit `63f73c3eb9` due to breakage on aarch64 without TBI.	2021-03-24 16:18:29 -07:00
Roman Lebedev	2070fe7144	[NFCI][SimplifyCFG] Don't form DTU updates if we aren't going to apply them I think we may want to have a thin wrapper over a vector to deduplicate those `if(DTU)` predicates, and instead do them in the `insert()` itself.	2021-03-25 00:02:37 +03:00
Congzhe Cao	829c1b6443	[LoopInterchange] fix tightlyNested() in LoopInterchange legality This is yet another attempt to fix tightlyNested(). Add checks in tightlyNested() for the inner loop exit block, such that 1) if there is control-flow divergence in between the inner loop exit block and the outer loop latch, or 2) if the inner loop exit block contains unsafe instructions, tightlyNested() returns false. The reasoning behind is that after interchange, the original inner loop exit block, which was part of the outer loop, would be put into the new inner loop, and will be executed different number of times before and after interchange. Thus it should be dealt with appropriately. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D98263	2021-03-24 15:49:25 -04:00
Florian Hahn	9d45579279	[LV] Factor out phi type access to variable (NFC). A slight simplification of the code to reduce future diffs.	2021-03-24 19:25:22 +00:00
Florian Hahn	8d1342f79d	[LV] Remove redundant access to Legal::getReductionVars() (NFC). The reduction descriptor is retrieved earlier and stored in a variable RdxDesc already.	2021-03-24 19:15:14 +00:00
Gulfem Savrun Yeniceri	5fbe1fdf17	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `5fd001a5ff` because it broke clang-with-thin-lto-ubuntu bot.	2021-03-24 18:59:33 +00:00
Matt Morehouse	63f73c3eb9	[HWASan] Use page aliasing on x86_64. Userspace page aliasing allows us to use middle pointer bits for tags without untagging them before syscalls or accesses. This should enable easier experimentation with HWASan on x86_64 platforms. Currently stack, global, and secondary heap tagging are unsupported. Only primary heap allocations get tagged. Note that aliasing mode will not work properly in the presence of fork(), since heap memory will be shared between the parent and child processes. This mode is non-ideal; we expect Intel LAM to enable full HWASan support on x86_64 in the future. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D98875	2021-03-24 11:43:41 -07:00
Gulfem Savrun Yeniceri	5fd001a5ff	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-24 17:31:18 +00:00
Nikita Popov	8a168d2d70	[LICM] Fix NumSunk statistic (NFC) LICM can sink instructions that have uses inside the loop, as long as these uses are considered "free". However, if there were only free uses inside the loop, and no uses outside the loop at all, the instruction would still count towards the NumSunk statistic. This resulted in a wild inflation of the NumSunk metric. After this patch it drops down from 1141787 to 5852 on test-suite O3.	2021-03-24 18:28:19 +01:00
Thomas Preud'homme	3b52c04e82	Make FindAvailableLoadedValue TBAA aware FindAvailableLoadedValue() relies on FindAvailablePtrLoadStore() to run the alias analysis when searching for an equivalent value. However, FindAvailablePtrLoadStore() calls the alias analysis framework with a memory location for the load constructed from an address and a size, which thus lacks TBAA metadata info. This commit modifies FindAvailablePtrLoadStore() to accept an optional memory location as parameter to allow FindAvailableLoadedValue() to create it based on the load instruction, which would then have TBAA metadata info attached. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99206	2021-03-24 17:20:26 +00:00
Roman Lebedev	fe36b834db	[NFCI][SimplifyCFG] Fold branch to common dest: don't check cost if no qualified preds	2021-03-24 19:01:47 +03:00
Sander de Smalen	55d18b3cc2	[TTI] Return a TypeSize from getRegisterBitWidth. This patch changes the interface to take a RegisterKind, to indicate whether the register bitwidth of a scalar register, fixed-width vector register, or scalable vector register must be returned. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98874	2021-03-24 14:45:13 +00:00
Florian Hahn	cd0c00c9fe	[LV] Move exact FP math check out of Requirements. We know if the loop contains FP instructions preventing vectorization after we are done with legality checks. This patch updates the code to check for un-vectorizable FP operations earlier, to avoid unnecessarily running the cost model and picking a vectorization factor. It also makes the code more direct and moves the check to a position where similar checks are done. I might be missing something, but I don't see any reason to handle this check differently to other, similar checks. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98633	2021-03-24 11:01:44 +00:00
Ta-Wei Tu	4d9d736875	[NFC] Improve debug message and test description in `4c1f74a`	2021-03-24 18:21:13 +08:00
Ta-Wei Tu	4c1f74a76c	[LoopFlatten] Fix invalid assertion (PR49571) The `InductionPHI` is not necessarily the increment instruction, as demonstrated in pr49571.ll. This patch removes the assertion and instead bails out from the `LoopFlatten` pass if that happens. This fixes https://bugs.llvm.org/show_bug.cgi?id=49571 Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D99252	2021-03-24 18:08:27 +08:00
Ta-Wei Tu	8fde25b3c3	[NFC] Remove redundant `struct` prefix Reviewed By: SjoerdMeijer, fhahn Differential Revision: https://reviews.llvm.org/D99251	2021-03-24 17:58:33 +08:00
Alexey Bataev	99203f2004	[Analysis]Add getPointersDiff function to improve compile time. Added getPointersDiff function to LoopAccessAnalysis and used it instead of direct calculation of the distance between pointers and/or isConsecutiveAccess function in SLP vectorizer to improve compile time and detection of consecutive store chains. Part of D57059 Differential Revision: https://reviews.llvm.org/D98967	2021-03-23 14:25:36 -07:00
Alexey Bataev	f1b47ad278	Revert "[Analysis]Add getPointersDiff function to improve compile time." This reverts commit `065a14a12d` to investigate and fix crash in SLP vectorizer.	2021-03-23 13:17:54 -07:00
Alexey Bataev	065a14a12d	[Analysis]Add getPointersDiff function to improve compile time. Added getPointersDiff function to LoopAccessAnalysis and used it instead of direct calculation of the distance between pointers and/or isConsecutiveAccess function in SLP vectorizer to improve compile time and detection of consecutive store chains. Part of D57059 Differential Revision: https://reviews.llvm.org/D98967	2021-03-23 12:58:42 -07:00
Roman Lebedev	b5822026dd	[SimplifyCFG] 'Fold branch to common dest': don't overestimate the cost `FoldBranchToCommonDest()` has a certain budget (`-bonus-inst-threshold=`) for bonus instruction duplication. And currently it calculates the cost as-if it will actually duplicate into each predecessor. But ignoring the budget, it won't always duplicate into each predecessor, there are some correctness and profitability checks. So when calculating the cost, we should first check into which blocks will we actually duplicate, and only then use that block count to do budgeting.	2021-03-23 18:30:26 +03:00
Roman Lebedev	514bc01ca3	[SimplifyCFG] FoldBranchToCommonDest(): properly handle same-block external uses (PR49510/PR49689) We clone bonus instructions to the end of the predecessor block, and then use `SSAUpdater::RewriteUseAfterInsertions()`. But that only deals with the cases where the use-to-be-rewritten are either in different block from the def, or come after the def. But in some loop cases, the external use may be in the beginning of predecessor block, before the newly cloned bonus instruction. `SSAUpdater::RewriteUseAfterInsertions()` does not deal with that. Notably, the external use can't happen to be both in the same block and after the newly-cloned instruction, because of the fold preconditions. To properly handle these cases, when the use is in the same block, we should instead use `SSAUpdater::RewriteUse()`. TBN, they do the same thing for PHI users. Fixes https://bugs.llvm.org/show_bug.cgi?id=49510 Likely Fixes https://bugs.llvm.org/show_bug.cgi?id=49689	2021-03-23 17:37:28 +03:00
Sanjay Patel	1bf8f9e228	[SimplifyCFG] use profile metadata to refine merging branch conditions 2nd try (original: `27ae17a6b0`) with fix/test for crash. We must make sure that TTI is available before trying to use it because it is not required (might be another bug). Original commit message: This is one step towards solving: https://llvm.org/PR49336 In that example, we disregard the recommended usage of builtin_expect, so an expensive (unpredictable) branch is folded into another branch that is guarding it. Here, we read the profile metadata to see if the 1st (predecessor) condition is likely to cause execution to bypass the 2nd (successor) condition before merging conditions by using logic ops. Differential Revision: https://reviews.llvm.org/D98898	2021-03-23 10:19:37 -04:00
Sanjay Patel	3c8473ba53	[SLP] allow matching integer min/max intrinsics as reduction ops As noted in D98152, we need to patch SLP to avoid regressions when we start canonicalizing to integer min/max intrinsics. Most of the real work to make this possible was in: `7202f47508` Differential Revision: https://reviews.llvm.org/D98981	2021-03-23 08:56:44 -04:00
Luke Drummond	520f70e94d	[NFC] clang-format llvm/lib/Transforms/Utils/CloneFunction.cpp Differential Revision: https://reviews.llvm.org/D98957	2021-03-23 12:53:28 +00:00
Luke Drummond	ab44ec1b22	[NFC] Minor refactor - Give unwieldy repeated expression a name - Use a ranged `for` basic block iterator Reviewed by: nikic, dexonsmith Differential Revision: https://reviews.llvm.org/D98957	2021-03-23 12:53:28 +00:00
Luke Drummond	0448ddd169	[NFCI] cleanup CloneFunctionInto Hoist early return for decl-only clones to before DIFinder calculation. Also fix an out of date assert message after invariants changed in `22a52dfddc`. Reviewed by: nikic, dexonsmith Differential Revision: https://reviews.llvm.org/D98957	2021-03-23 12:53:27 +00:00
Nashe Mncube	5d929794a8	[llvm-opt] Bug fix within combining FP vectors A bug was found within InstCombineCasts where a function call is only implemented to work with FixedVectors. This caused a crash when a ScalableVector was passed to this function. This commit introduces a regression test which recreates the failure and a bug fix. Differential Revision: https://reviews.llvm.org/D98351	2021-03-23 12:13:41 +00:00
Florian Hahn	e43e8e9138	[AnnotationRemarks] Use subprogram location for summary remarks. The summary remarks are generated on a per-function basis. Using the first instruction's location is sub-optimal for 2 reasons: 1. Sometimes the first instruction is missing !dbg 2. The location of the first instruction may be mis-leading. Instead, just use the location of the function directly.	2021-03-23 12:05:41 +00:00
David Sherwood	d70251163f	[LoopVectorize][NFC] Refactor code to use IRBuilder::CreateStepVector In places where we create a ConstantVector whose elements are a linear sequence of the form <start, start + 1, start + 2, ...> I've changed the code to make use of CreateStepVector, which creates a vector with the sequence <0, 1, 2, ...>, and a vector addition operation. This patch is a non-functional change, since the output from the vectoriser remains unchanged for fixed length vectors and there are existing asserts that still fire when attempting to use scalable vectors for vectorising induction variables. In a later patch we will enable support for scalable vectors in InnerLoopVectorizer::getStepVector(), which relies upon the new stepvector intrinsic in IRBuilder::CreateStepVector. Differential Revision: https://reviews.llvm.org/D97861	2021-03-23 11:29:05 +00:00
Florian Hahn	f759d512c8	[VPlan] Include name when printing after `93a9d2de8f`. The name is included when printing in DOT mode. Also print it in non-DOT mode after `93a9d2de8f`. This will become more important to distinguish different plans once VPlans are gradually refined.	2021-03-23 09:50:14 +00:00
Juneyoung Lee	960a767368	Reland "[InstCombine] Add simplification of two logical and/ors" This relands `07c3b97e18` (D96945) which was reverted by commit `f49354838e`. The two-stage compilation successfully tests passes on my machine.	2021-03-23 16:24:50 +09:00
Fangrui Song	3c81822ec5	[SanitizerCoverage] Use External on Windows This should fix https://reviews.llvm.org/D98903#2643589 though it is not clear to me why ExternalWeak does not work.	2021-03-22 23:05:36 -07:00
Serguei Katkov	9fec382601	[RS4GC] Fix hang on infinite loop The meetBDVState utility may set the base pointer for the conflict state. At this moment the base for the conflict state does not have any meaning but is used in comparison of BDV states. This comparison is used as an indicator of progress made on an iteration, and the RS4GC pass uses an infinite loop to reach a fixed point. As a result, for the added test, on each iteration the state for some phi nodes is updated with another base value for the conflict state, and this is reported as progress even though no further progress is possible for the conflict state. In reality the base value is transferred from one state to another and the pass detects progress on these states. The test is very fragile. The traversal order of states and operands of phi nodes plays an important role. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D99058	2021-03-23 12:54:51 +07:00
Gulfem Savrun Yeniceri	e3a6d70c68	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `78a65cd945` which caused buildbot failures.	2021-03-23 00:43:16 +00:00
Juneyoung Lee	5c2e50b5d2	Reland "[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe" This relands commit `99108c791d` (D95026) which was reverted by `8d5a981a13` because the underlying problem (https://llvm.org/pr49495) is fixed.	2021-03-23 09:19:53 +09:00
Gulfem Savrun Yeniceri	78a65cd945	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-22 22:09:02 +00:00
Roman Lebedev	d37fe26a2b	[NFC][IR] Type: add getWithNewType() method Sometimes you want to get a type with same vector element count as the current type, but different element type, but there's no QOL wrapper to do that. Add one.	2021-03-23 00:50:58 +03:00
Sanjay Patel	95f7f7c21b	Revert "[SimplifyCFG] use profile metadata to refine merging branch conditions" This reverts commit `27ae17a6b0`. There are bot failures that end with: #4 0x00007fff7ae3c9b8 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0 #5 0x00007fff84e504d8 (linux-vdso64.so.1+0x4d8) #6 0x00007fff7c419a5c llvm::TargetTransformInfo::getPredictableBranchThreshold() const (/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libLLVMAnalysis.so.13git+0x479a5c) ...but not sure how to trigger that yet.	2021-03-22 17:48:06 -04:00
Sanjay Patel	27ae17a6b0	[SimplifyCFG] use profile metadata to refine merging branch conditions This is one step towards solving: https://llvm.org/PR49336 In that example, we disregard the recommended usage of builtin_expect, so an expensive (unpredictable) branch is folded into another branch that is guarding it. Here, we read the profile metadata to see if the 1st (predecessor) condition is likely to cause execution to bypass the 2nd (successor) condition before merging conditions by using logic ops. Differential Revision: https://reviews.llvm.org/D98898	2021-03-22 16:49:21 -04:00
Sanjay Patel	664d0c052c	[TargetTransformInfo] move branch probability query from TargetLoweringInfo This is no-functional-change intended (NFC), but needed to allow optimizer passes to use the API. See D98898 for a proposed usage by SimplifyCFG. I'm simplifying the code by removing the cl::opt. That was added back with the original commit in D19488, but I don't see any evidence in regression tests that it was used. Target-specific overrides can use the usual patterns to adjust as necessary. We could also restore that cl::opt, but it was not clear to me exactly how to do it in the convoluted TTI class structure.	2021-03-22 15:55:34 -04:00
Bjorn Pettersson	688cdddafb	[SLP] Honor min/max regsize and min/max VF in vectorizeStores Make sure we use PowerOf2Floor instead of PowerOf2Ceil when calculating max number of elements that fits inside a vector register (otherwise we could end up creating vectors larger than the maximum vector register size). Also make sure we honor the min/max VF (as given by TTI or cmd line parameters) when doing vectorizeStores. Reviewed By: anton-afanasyev Differential Revision: https://reviews.llvm.org/D97691	2021-03-22 17:29:35 +01:00
Matt Morehouse	772851ca4e	[HWASan] Disable stack, globals and force callbacks for x86_64. Subsequent patches will implement page-aliasing mode for x86_64, which will initially only work for the primary heap allocator. We force callback instrumentation to simplify the initial aliasing implementation. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D98069	2021-03-22 08:02:27 -07:00
Bradley Smith	48f5a392cb	[IR] Add vscale_range IR function attribute This attribute represents the minimum and maximum values vscale can take. For now this attribute is not hooked up to anything during codegen, this will be added in the future when such codegen is considered stable. Additionally hook up the -msve-vector-bits=<x> clang option to emit this attribute. Differential Revision: https://reviews.llvm.org/D98030	2021-03-22 12:05:06 +00:00
Max Kazantsev	8fab9f824f	[IndVars] Sharpen context in eliminateIVComparison When eliminating comparisons, we can use common dominator of all its users as context. This gives better results when ICMP is not computed right before the branch that uses it. Differential Revision: https://reviews.llvm.org/D98924 Reviewed By: lebedev.ri	2021-03-22 11:55:57 +07:00
Roman Lebedev	e3a4701627	[clang][CodeGen] Lower Likelihood attributes to @llvm.expect intrin instead of branch weights `08196e0b2e` exposed LowerExpectIntrinsic's internal implementation detail in the form of LikelyBranchWeight/UnlikelyBranchWeight options to the outside. While this isn't incorrect from the results viewpoint, this is suboptimal from the layering viewpoint, and causes confusion - should transforms also use those weights, or should they use something else, D98898? So go back to status quo by making LikelyBranchWeight/UnlikelyBranchWeight internal again, and fixing all the code that used it directly, which currently is only clang codegen, thankfully, to emit proper @llvm.expect intrinsics instead.	2021-03-21 22:50:21 +03:00
Roman Lebedev	37d6be9052	Revert "[BranchProbability] move options for 'likely' and 'unlikely'" Upon reviewing D98898 i've come to realization that these are implementation detail of LowerExpectIntrinsicPass, and they should not be exposed to outside of it. This reverts commit `ee8b53815d`.	2021-03-21 22:50:21 +03:00
Sanjay Patel	ee8b53815d	[BranchProbability] move options for 'likely' and 'unlikely' This makes the settings available for use in other passes by housing them within the Support lib, but NFC otherwise. See D98898 for the proposed usage in SimplifyCFG (where this change was originally included). Differential Revision: https://reviews.llvm.org/D98945	2021-03-20 14:46:46 -04:00
Jeroen Dobbelaere	77080a1eb6	Revert of D49126 [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types. Now that intrinsic name mangling can cope with unnamed types, the custom name mangling in PredicateInfo (introduced by D49126) can be removed. (See D91250, D48541) Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D91661	2021-03-20 11:37:09 +01:00
Arthur Eubanks	a17394dc88	[NewPM] Verify LoopAnalysisResults after a loop pass All loop passes should preserve all analyses in LoopAnalysisResults. Add checks for those when the checks are enabled (which is by default with expensive checks on). Note that due to PR44815, we don't check LAR's ScalarEvolution. Apparently calling SE.verify() can change its results. This is a reland of https://reviews.llvm.org/D98820 which was reverted due to unacceptably large compile time regressions in normal debug builds.	2021-03-19 14:56:37 -07:00
Arthur Eubanks	a1ab5627f0	Revert "[NewPM] Verify LoopAnalysisResults after a loop pass" This reverts commit `94c269baf5`. Still causes too large of compile time regression in normal debug builds. Will put under expensive checks instead.	2021-03-19 14:31:08 -07:00
Arthur Eubanks	94c269baf5	[NewPM] Verify LoopAnalysisResults after a loop pass All loop passes should preserve all analyses in LoopAnalysisResults. Add checks for those. Note that due to PR44815, we don't check LAR's ScalarEvolution. Apparently calling SE.verify() can change its results. Only verify MSSA when VerifyMemorySSA, normally it's very expensive. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98820	2021-03-19 13:26:45 -07:00
Philip Reames	5698537f81	Update basic deref API to account for possibility of free [NFC] This patch is plumbing to support work towards the goal outlined in the recent llvm-dev post "[llvm-dev] RFC: Decomposing deref(N) into deref(N) + nofree". The point of this change is purely to simplify iteration on other pieces on the way to making the switch. Rebuilding with a change to Value.h is slow and painful, so I want to get the API change landed. Once that's done, I plan to more closely audit each caller, add the inference rules in their own patch, then post a patch with the langref changes and test diffs. The value of the command line flag is that we can exercise the inference logic in standalone patches without needing the whole switch ready to go just yet. Differential Revision: https://reviews.llvm.org/D98908	2021-03-19 11:17:19 -07:00
Andrei Elovikov	92205cb27f	[NFC][VPlan] Guard print routines with "#if !defined(NDEBUG) \|\| defined(LLVM_ENABLE_DUMP)" Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D98897	2021-03-19 10:50:12 -07:00
Andrei Elovikov	93a9d2de8f	[VPlan] Add plain text (not DOT's digraph) dumps I foresee two uses for this: 1) It's easier to use those in debugger. 2) Once we start implementing more VPlan-to-VPlan transformations (especially inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in LIT test would become too obscure. I can imagine that we'd want to CHECK against VPlan dumps after multiple transformations instead. That would be easier with plain text dumps than with DOT format. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96628	2021-03-19 10:50:12 -07:00
Max Kazantsev	8eefa07fcf	[NFC] Move function up in code	2021-03-19 14:03:31 +07:00
Max Kazantsev	8bb952b57f	[NFC] Factor out utility function for finding common dom of user set	2021-03-19 13:49:29 +07:00
Max Kazantsev	16370e02a7	[IndVars] Provide eliminateIVComparison with context We can prove more predicates when we have a context when eliminating ICmp. As first (and very obvious) approximation we can use the ICmp instruction itself, though in the future we are going to use a common dominator of all its users. Need some refactoring before that. Observed ~0.5% negative compile time impact. Differential Revision: https://reviews.llvm.org/D98697 Reviewed By: lebedev.ri	2021-03-19 12:28:22 +07:00
Fangrui Song	9558456b53	[SanitizerCoverage] Make __start_/__stop_ symbols extern_weak On ELF, we place the metadata sections (`__sancov_guards`, `__sancov_cntrs`, `__sancov_bools`, `__sancov_pcs`) in section groups (either `comdat any` or `comdat noduplicates`). With `--gc-sections`, LLD since D96753 and GNU ld `-z start-stop-gc` may garbage collect such sections. If all `__sancov_bools` are discarded, LLD will error `error: undefined hidden symbol: __start___sancov_cntrs` (other sections are similar). ``` % cat a.c void discarded() {} % clang -fsanitize-coverage=func,trace-pc-guard -fpic -fvisibility=hidden a.c -shared -fuse-ld=lld -Wl,--gc-sections ... ld.lld: error: undefined hidden symbol: __start___sancov_guards >>> referenced by a.c >>> /tmp/a-456662.o:(sancov.module_ctor_trace_pc_guard) ``` Use the `extern_weak` linkage (lowered to undefined weak symbols) to avoid the undefined error. Differential Revision: https://reviews.llvm.org/D98903	2021-03-18 16:46:04 -07:00
George Balatsouras	d10f173f34	[dfsan] Add -dfsan-fast-8-labels flag This is only adding support to the dfsan instrumentation pass but not to the runtime. Added more RUN lines for testing: for each instrumentation test that had a -dfsan-fast-16-labels invocation, a new invocation was added using fast8. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D98734	2021-03-18 16:28:42 -07:00
Mehdi Amini	3614df3537	Revert "[VPlan] Add plain text (not DOT's digraph) dumps" This reverts commit `6b053c9867`. The build is broken: ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const >>> referenced by LoopVectorize.cpp >>> LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a	2021-03-18 19:20:39 +00:00
Andrei Elovikov	6b053c9867	[VPlan] Add plain text (not DOT's digraph) dumps I foresee two uses for this: 1) It's easier to use those in debugger. 2) Once we start implementing more VPlan-to-VPlan transformations (especially inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in LIT test would become too obscure. I can imagine that we'd want to CHECK against VPlan dumps after multiple transformations instead. That would be easier with plain text dumps than with DOT format. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96628	2021-03-18 11:33:39 -07:00
Wei Mi	14756b70ee	[SampleFDO] Don't mix up the existing indirect call value profile with the new value profile annotated after inlining. In https://reviews.llvm.org/D96806 and https://reviews.llvm.org/D97350, we use the magic number -1 in the value profile to avoid repeated indirect call promotion to the same target for an indirect call. Function updateIDTMetaData is used to mark a target as being promoted in the value profile with the magic number. updateIDTMetaData is also used to update the value profile when an indirect call is inlined and new inline instance profile should be applied. For the second case, currently updateIDTMetaData mixes up the existing value profile of the indirect call with the new profile, leading to the problematic scenario that a target count is larger than the total count in the value profile. The patch fixes the problem. When updateIDTMetaData is used to update the value profile after inlining, all the values in the existing value profile will be dropped except the values with the magic number counts. Differential Revision: https://reviews.llvm.org/D98835	2021-03-18 09:54:34 -07:00
Mircea Trofin	4b1c8070bb	[NFC][ArgumentPromotion] Clear FAM cached results of erased function. Not doing it here can lead to subtle bugs - the analysis results are associated by the Function object's address. Nothing stops the memory allocator from allocating new functions at the same address.	2021-03-18 09:17:32 -07:00
Alexey Bataev	b3ced9852c	[SLP]Fix crash on extending scheduling region. If SLP vectorizer tries to extend the scheduling region and runs out of the budget too early, but still extends the region to the new ending instructions (i.e., it was able to extend the region for the first instruction in the bundle, but not for the second), the compiler needs to recalculate dependencies in full, just as if the extension had succeeded. Without it, the schedule data chunks may end up with the wrong number of (unscheduled) dependencies and it may end up with the incorrect function, where the vectorized instruction does not dominate the extractelement instruction. Differential Revision: https://reviews.llvm.org/D98531	2021-03-18 06:11:08 -07:00
Max Kazantsev	26ec76add5	[NFC] One more use case for evaluatePredicate	2021-03-18 19:21:29 +07:00
Max Kazantsev	1067a13cc1	[NFC] Use evaluatePredicate in eliminateComparison Just makes code simpler.	2021-03-18 19:21:29 +07:00
Arthur Eubanks	792bed6a4c	Revert "[NewPM] Verify LoopAnalysisResults after a loop pass" This reverts commit `6db3ab2903`. Causing too large of compile time regression.	2021-03-17 15:22:52 -07:00
Arthur Eubanks	6db3ab2903	[NewPM] Verify LoopAnalysisResults after a loop pass All loop passes should preserve all analyses in LoopAnalysisResults. Add checks for those. Note that due to PR44815, we don't check LAR's ScalarEvolution. Apparently calling SE.verify() can change its results. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98805	2021-03-17 13:37:22 -07:00
Philip Reames	31764ea295	[LCSSA] Extract a utility for deciding if a new use requires a new lcssa phi [NFC] (Triggered by a review comment on D98728, but otherwise unrelated.)	2021-03-17 12:14:01 -07:00
Philip Reames	7c7f4676cd	[LICM] Fix a crash when sinking instructions w/token operands It is not legal to form a phi node with token type. The generic LCSSA construction code handles this correctly - by not forming LCSSA for such cases - but the adhoc fixup implementation in LICM did not. This was noticed in the context of PR49607, but can be demonstrated on ToT with the tweaked test case. This is not specific to gc.relocate btw, it also applies to usage of the preallocated family of intrinsics as well. Differential Revision: https://reviews.llvm.org/D98728	2021-03-17 11:18:46 -07:00
David Green	e2935dcfc4	[TTI] Add a Mask to getShuffleCost This adds an Mask ArrayRef to getShuffleCost, so that if an exact mask can be provided a more accurate cost can be provided by the backend. For example VREV costs could be returned by the ARM backend. This should be an NFC until then, laying the groundwork for that to be added. Differential Revision: https://reviews.llvm.org/D98206	2021-03-17 17:46:26 +00:00
Stephen Tozer	3bfddc2593	Reapply "[DebugInfo] Handle multiple variable location operands in IR" Fixed section of code that iterated through a SmallDenseMap and added instructions in each iteration, causing non-deterministic code; replaced SmallDenseMap with MapVector to prevent non-determinism. This reverts commit `01ac6d1587`.	2021-03-17 16:45:25 +00:00
LemonBoy	4f024938e4	[LoopVectorize] Refine hasIrregularType predicate The `hasIrregularType` predicate checks whether an array of N values of type Ty is "bitcast-compatible" with a <N x Ty> vector. The previous check returned invalid results in some cases where there's some padding between the array elements: eg. a 4-element array of u7 values is considered as compatible with <4 x u7>, even though the vector is only loading/storing 28 bits instead of 32. The problem causes LLVM to generate incorrect code for some targets: for AArch64 the vector loads/stores are lowered in terms of ubfx/bfi, effectively losing the top (N * padding bits). Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D97465	2021-03-17 17:03:47 +01:00
Hans Wennborg	01ac6d1587	Revert "[DebugInfo] Handle multiple variable location operands in IR" This caused non-deterministic compiler output; see comment on the code review. > This patch updates the various IR passes to correctly handle dbg.values with a > DIArgList location. This patch does not actually allow DIArgLists to be produced > by salvageDebugInfo, and it does not affect any pass after codegen-prepare. > Other than that, it should cover every IR pass. > > Most of the changes simply extend code that operated on a single debug value to > operate on the list of debug values in the style of any_of, all_of, for_each, > etc. Instances of setOperand(0, ...) have been replaced with > replaceVariableLocationOp, which takes the value that is being replaced as an > additional argument. In places where this value isn't readily available, we have > to track the old value through to the point where it gets replaced. > > Differential Revision: https://reviews.llvm.org/D88232 This reverts commit `df69c69427`.	2021-03-17 13:36:48 +01:00
David Green	3c25c40d51	[LV] Account for the cost of predication of scalarized load/store This adds the cost of an i1 extract and a branch to the cost in getMemInstScalarizationCost when the instruction is predicated. These predicated loads/store would generate blocks of something like: %c1 = extractelement <4 x i1> %C, i32 1 br i1 %c1, label %if, label %else if: %sa = extractelement <4 x i32> %a, i32 1 %sb = getelementptr inbounds float, float* %pg, i32 %sa %sv = extractelement <4 x float> %x, i32 1 store float %sa, float* %sb, align 4 else: So this increases the cost by the extract and branch. This is probably still too low in many cases due to the cost of all that branching, but there is already an existing hack increasing the cost using useEmulatedMaskMemRefHack. It will increase the cost of a memop if it is a load or there are more than one store. This patch improves the cost for when there is only a single store, and hopefully at some point in the future the hack can be removed. Differential Revision: https://reviews.llvm.org/D98243	2021-03-17 10:57:50 +00:00
Bu Le	9abe500473	[SLP] Fix the trunc instruction insertion problem The current SLP pass has a piece of code that inserts a trunc instruction after the vectorized instruction. If the vectorized instruction is a phi node and not the last phi node in the BB, the trunc instruction will be inserted between two phi nodes, which will trigger a verifier failure in debug builds or an unpredictable error in another pass. This patch changes the algorithm to 'if the last vectorized instruction is a phi, insert it after the last phi node in current BB' to fix this problem.	2021-03-17 13:51:08 +03:00
Arthur Eubanks	70af2924a7	[Unswitch] Guard dbgs logging with LLVM_DEBUG	2021-03-16 22:31:57 -07:00
Sanjay Patel	7202f47508	[SLP] separate min/max matching from its instruction-level implementation; NFC The motivation is to handle integer min/max reductions independently of whether they are in the current cmp+sel form or the planned intrinsic form. We assumed that min/max included a select instruction, but we can decouple that implementation detail by checking the instructions themselves rather than relying on the recurrence (reduction) type.	2021-03-16 17:16:11 -04:00
Mohammad Hadi Jooybar	302b80abf0	[InstCombine] Avoid Bitcast-GEP fusion for pointers directly from allocation functions Elimination of bitcasts with void pointer arguments results in GEPs with pure byte indexes. These GEPs do not preserve struct/array information and interrupt phi address translation in later pipeline stages. Here is the original motivation for this patch: ``` #include<stdio.h> #include<malloc.h> typedef struct __Node{ double f; struct __Node *next; } Node; void foo () { Node *a = (Node*) malloc (sizeof(Node)); a->next = NULL; a->f = 11.5f; Node *ptr = a; double sum = 0.0f; while (ptr) { sum += ptr->f; ptr = ptr->next; } printf("%f\n", sum); } ``` By explicit assignment `a->next = NULL`, we can infer the length of the linked list is `1`. In this case we can eliminate while loop traversal entirely. This elimination is supposed to be performed by GVN/MemoryDependencyAnalysis/PhiTranslation. The final IR before this patch: ``` define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2 %next = getelementptr inbounds i8, i8* %call, i64 8 %0 = bitcast i8* %next to %struct.__Node** store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2 %f = bitcast i8* %call to double* store double 1.150000e+01, double* %f, align 8, !tbaa !8 %tobool12 = icmp eq i8* %call, null br i1 %tobool12, label %while.end, label %while.body.lr.ph while.body.lr.ph: ; preds = %entry %1 = bitcast i8* %call to %struct.__Node* br label %while.body while.body: ; preds = %while.body.lr.ph, %while.body %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ] %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ] %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0 %2 = load double, double* %f1, align 8, !tbaa !8 %add = fadd contract double %sum.014, %2 %next2 = getelementptr inbounds %struct.__Node, 
%struct.__Node* %ptr.013, i64 0, i32 1 %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2 %tobool = icmp eq %struct.__Node* %3, null br i1 %tobool, label %while.end, label %while.body while.end: ; preds = %while.body, %entry %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %while.body ] %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa) ret void } ``` Final IR after this patch: ``` ; Function Attrs: nofree nounwind define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { while.end: %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 1.150000e+01) ret void } ``` IR before GVN before this patch: ``` define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2 %next = getelementptr inbounds i8, i8* %call, i64 8 %0 = bitcast i8* %next to %struct.__Node** store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2 %f = bitcast i8* %call to double* store double 1.150000e+01, double* %f, align 8, !tbaa !8 %tobool12 = icmp eq i8* %call, null br i1 %tobool12, label %while.end, label %while.body.lr.ph while.body.lr.ph: ; preds = %entry %1 = bitcast i8* %call to %struct.__Node* br label %while.body while.body: ; preds = %while.body.lr.ph, %while.body %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ] %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ] %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0 %2 = load double, double* %f1, align 8, !tbaa !8 %add = fadd contract double %sum.014, %2 %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1 %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, 
!tbaa !2 %tobool = icmp eq %struct.__Node* %3, null br i1 %tobool, label %while.end.loopexit, label %while.body while.end.loopexit: ; preds = %while.body %add.lcssa = phi double [ %add, %while.body ] br label %while.end while.end: ; preds = %while.end.loopexit, %entry %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ] %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa) ret void } ``` IR before GVN after this patch: ``` define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2 %0 = bitcast i8* %call to %struct.__Node* %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1 store %struct.__Node* null, %struct.__Node** %next, align 8, !tbaa !2 %f = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 0 store double 1.150000e+01, double* %f, align 8, !tbaa !8 %tobool12 = icmp eq i8* %call, null br i1 %tobool12, label %while.end, label %while.body.preheader while.body.preheader: ; preds = %entry br label %while.body while.body: ; preds = %while.body.preheader, %while.body %sum.014 = phi double [ %add, %while.body ], [ 0.000000e+00, %while.body.preheader ] %ptr.013 = phi %struct.__Node* [ %2, %while.body ], [ %0, %while.body.preheader ] %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0 %1 = load double, double* %f1, align 8, !tbaa !8 %add = fadd contract double %sum.014, %1 %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1 %2 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2 %tobool = icmp eq %struct.__Node* %2, null br i1 %tobool, label %while.end.loopexit, label %while.body while.end.loopexit: ; preds = %while.body %add.lcssa = phi double [ %add, %while.body ] br label %while.end while.end: ; preds = 
%while.end.loopexit, %entry %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ] %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa) ret void } ``` Phi translation fails before this patch, which prevents GVN from removing the loop. The reason for this failure is in InstCombine. When the instruction combining pass decides to convert: ``` %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) %0 = bitcast i8* %call to %struct.__Node* %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1 store %struct.__Node* null, %struct.__Node** %next ``` to ``` %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) %next = getelementptr inbounds i8, i8* %call, i64 8 %0 = bitcast i8* %next to %struct.__Node** store %struct.__Node* null, %struct.__Node** %0 ``` GEP instructions with pure byte indexes (e.g. `getelementptr inbounds i8, i8* %call, i64 8`) are obstacles for address translation. Address translation looks for structural similarity between GEPs, and these GEPs usually do not match since they have different structure. This change will cause a couple of failures in the LLVM tests. However, in all cases we need to change the result expected by the test. I will update those tests as soon as I get a green light on this patch. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96881	2021-03-16 17:05:44 -04:00
Philip Reames	cec9e7352b	[rs4gc] Simplify code by cloning existing instructions when inserting base chain [NFC] Previously we created a new node, then filled in the pieces. Now, we clone the existing node, then change the respective fields. The only change in handling is with phis since we have to handle multiple incoming edges from the same block a bit differently. Differential Revision: https://reviews.llvm.org/D98316	2021-03-16 13:10:32 -07:00
Philip Reames	ef884e155d	[rs4gc] don't force a conflict for a canonical broadcast A broadcast is a shufflevector where only one input is used. Because of the way we handle constants (undef is a constant), the canonical shuffle sees a meet of (some value) and (nullptr). Given this, every broadcast gets treated as a conflict and a new base pointer computation is added. The other way to tackle this would be to change constant handling specifically for undefs, but this seems easier. Differential Revision: https://reviews.llvm.org/D98315	2021-03-16 12:59:06 -07:00
Philip Reames	5cabf472cb	[rs4gc] don't duplicate existing values which are provably base pointers RS4GC needs to rewrite the IR to ensure that every relocated pointer has an associated base pointer. The existing code isn't particularly smart about avoiding duplication of existing IR when it turns out the original pointer we were asked to materialize a base pointer for is itself a base pointer. This patch adds a stage to the algorithm which prunes nodes proven (with a simple forward dataflow fixed point) to be base pointers from the list of nodes considered for duplication. This does require changing some of the later invariants slightly, that's probably the riskiest part of the change. Differential Revision: D98122	2021-03-16 12:51:21 -07:00
Liam Keegan	edf9565a86	[MemCpyOpt] Add missing MemorySSAWrapperPass dependency macro Add MemorySSAWrapperPass as a dependency to MemCpyOptLegacyPass, since MemCpyOpt now uses MemorySSA by default. Differential Revision: https://reviews.llvm.org/D98484	2021-03-16 20:30:00 +01:00
Philip Reames	6972e39d47	[gvn] CSE gc.relocates based on meaning, not spelling (try 2) This was (partially) reverted in `cfe8f8e0` because the conversion from readonly to readnone in Intrinsics.td exposed a couple of problems. This change has been reworked to not need that change (via some explicit checks in client code). This is being done to address the original optimization issue and simplify the testing of the readonly changes. I'm working on that piece under 49607. Original commit message follows: The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoint's multiple return values.) We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particularly useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass. Differential Revision: https://reviews.llvm.org/D97974	2021-03-16 10:59:31 -07:00
Florian Hahn	f586de8459	[VPlan] Remove PredInst2Recipe, use VP operands instead. (NFC) Instead of maintaining a separate map from predicated instructions to recipes, we can instead directly look at the VP operands. If the operand comes from a predicated instruction, the operand will be a VPPredInstPHIRecipe with a VPReplicateRecipe as its operand.	2021-03-16 17:40:35 +00:00
Sanjay Patel	40fdb43d30	[SLP] improve readability in reduction logic; NFC We had 2 different and ambiguously-named 'I' variables.	2021-03-16 07:35:13 -04:00
Caroline Concatto	3c03635d53	[SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse This patch adds support for reverse loop vectorization. It is possible to vectorize the following loop: ``` for (int i = n-1; i >= 0; --i) a[i] = b[i] + 1.0; ``` with fixed or scalable vectors. The loop-vectorizer will use 'reverse' on the loads/stores to make sure the lanes themselves are also handled in the right order. This patch adds support for scalable vectors to the IRBuilder interface for creating a reverse vector. The IR function CreateVectorReverse lowers to experimental.vector.reverse for scalable vectors and keeps the original behavior for fixed vectors using shuffle reverse. Differential Revision: https://reviews.llvm.org/D95363	2021-03-16 07:51:59 +00:00
Wenlei He	a5d30421a6	[CSSPGO] Load context profile for external functions in PreLink and populate ThinLTO import list For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to the function's entry count metadata. This enables ThinLink to treat such cross module callees as hot in the summary index, and later helps postlink to import them for profile guided cross module inlining. For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since the profile is flattened, a few things need to happen for it to work: - When loading the input profile in extended binary format, we need to load all child context profiles whose parent is in the current module, so the context trie for the current module includes potential cross module inlinees. - In order to make the above happen, we need to know whether the input profile is a CSSPGO profile before starting to read function profiles, hence a flag for the profile summary section is added. - When searching for cross module inline candidates, we need to walk through the context trie instead of the nested inlinee profile (callsite sample of AutoFDO profile). - Now that we have more accurate counts with CSSPGO, we switched to using entry count instead of total count to decide if an external callee is potentially beneficial to inline. This makes it consistent with how we determine whether a call target is a potential inline candidate. Differential Revision: https://reviews.llvm.org/D98590	2021-03-15 12:22:15 -07:00
Juneyoung Lee	edf634ebc2	[AssumeBundles] Add nonnull/align to op bundle if noundef exists This is a patch to add nonnull and align to assume's operand bundle only if noundef exists. Since nonnull and align in fn attr have poison semantics, they should be paired with noundef or noundef-implying attributes to be immediate UB. Reviewed By: jdoerfert, Tyker Differential Revision: https://reviews.llvm.org/D98228	2021-03-16 10:23:42 +09:00
Hongtao Yu	beea06c106	[NFC][Inliner] Debugging support to print function size after each inlining. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D98439	2021-03-14 22:11:53 -07:00
Chenguang Wang	166620a4f0	[ArgPromotion] Copy additional metadata for loads. Current ArgPromotion implementation does not copy it: https://godbolt.org/z/zzTKof Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D93927	2021-03-14 21:28:14 +00:00
Simonas Kazlauskas	7d7001b2cb	[InstCombine] Restrict a GEP transform to avoid changing provenance This is an alternative to D98120. Herein, instead of deleting the transformation entirely, we check that the underlying objects are both the same and therefore this transformation wouldn't incur a provenance change, if applied. https://alive2.llvm.org/ce/z/SYF_yv Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98588	2021-03-14 16:32:04 +02:00
Luo, Yuanke	66fbf5fafb	[X86][AMX] Prevent transforming load pointer from <256 x i32>* to x86_amx*. The load/store instruction will be transformed to amx intrinsics in the pass of AMX type lowering. Prohibiting the pointer cast make that pass happy. Differential Revision: https://reviews.llvm.org/D98247	2021-03-14 09:24:56 +08:00
Nikita Popov	5556660971	[MemCpyOpt] Handle read from lifetime.start with offset This fixes a regression from the MemDep-based implementation: MemDep completely ignores lifetime.start intrinsics that aren't MustAlias -- this is probably unsound, but it does mean that the MemDep based implementation successfully eliminated memcpy's from lifetime.start if the memcpy happens at an offset, rather than the base address of the alloca. Add a special case for the case where the lifetime.start spans the whole alloca (which is pretty much the only kind of lifetime.start that frontends ever emit), as we don't need to figure out our exact aliasing relationship in that case, the whole alloca is dead prior to the call. If this doesn't cover all practically relevant cases, then it would be possible to make use of the recently added PartialAlias clobber offsets to make this more precise.	2021-03-13 20:38:09 +01:00
Sanjay Patel	4224a36957	[InstCombine] avoid creating an extra instruction in zext fold and possible inf-loop The structure of this fold is suspect vs. most of instcombine because it creates instructions and tries to delete them immediately after. If we don't have the operand types for the icmps, then we are not behaving as assumed. And as shown in PR49475, we can inf-loop.	2021-03-13 08:30:51 -05:00
Roman Lebedev	6e9b9978cf	[LSR] Don't try to fixup uses in 'EH pad' instructions The added test case crashes before this fix: ``` opt: /repositories/llvm-project/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp:5172: BasicBlock::iterator (anonymous namespace)::LSRInstance::AdjustInsertPositionForExpand(BasicBlock::iterator, const (anonymous namespace)::LSRFixup &, const (anonymous namespace)::LSRUse &, llvm::SCEVExpander &) const: Assertion `!isa<PHINode>(LowestIP) && !LowestIP->isEHPad() && !isa<DbgInfoIntrinsic>(LowestIP) && "Insertion point must be a normal instruction"' failed. ``` This is fully analogous to the previous commit, with the pointer constant replaced to be something non-null. The comparison here can be strength-reduced, but the second operand of the comparison happens to be identical to the constant pointer in the `catch` case of `landingpad`. While LSRInstance::CollectLoopInvariantFixupsAndFormulae() already gave up on uses in blocks ending up with EH pads, it didn't consider this case. Eventually, `LSRInstance::AdjustInsertPositionForExpand()` will be called, but the original insertion point it will get is the user instruction itself, and it doesn't want to deal with EH pads, and asserts as much. It would seem that this basically never happens in-the-wild, otherwise it would have been reported already, so it seems safe to take the cautious approach, and just not deal with such users.	2021-03-13 16:05:34 +03:00
Nikita Popov	2902bdeea1	[MemCpyOpt] Use AA to check for MustAlias between memset and memcpy Rather than checking for simple equality, check for MustAlias, as we do in other transforms. This catches equivalent GEPs.	2021-03-13 11:41:15 +01:00
Nikita Popov	9080444f33	[MemCpyOpt] Don't generate zero-size memset If a memset destination is overwritten by a memcpy and the sizes are exactly the same, then the memset is simply dead. We can directly drop it, instead of replacing it with a memset of zero size, which is particularly ugly for the case of a dynamic size.	2021-03-13 11:41:15 +01:00
Wei Mi	ef9d7db723	[IndirectCallPromotion] Recommit "Don't strip ".__uniq." suffix when it strips ".llvm." suffix". The recommit fixes a bug in the last commit where symbols with "." at the beginning were not properly handled. Original commit message: Currently IndirectCallPromotion simply strips everything after the first "." in LTO mode, in order to match the symbol name and the name with ".llvm." suffix in the value profile. However, if -funique-internal-linkage-names and thinlto are both enabled, the name may have both ".__uniq." suffix and ".llvm." suffix, and the current mechanism will strip them both, which is unexpected. The patch fixes the problem. Differential Revision: https://reviews.llvm.org/D98389	2021-03-12 13:48:14 -08:00
Nikita Popov	42eb658f65	[OpaquePtrs] Remove some uses of type-less CreateGEP() (NFC) This removes some (but not all) uses of type-less CreateGEP() and CreateInBoundsGEP() APIs, which are incompatible with opaque pointers. There are still a number of tricky uses left, as well as many more variation APIs for CreateGEP.	2021-03-12 21:01:16 +01:00
Florian Hahn	fb3ca70761	[LV] Account IV recipes being uniform in VPTransformState::get(). This patch fixes a crash when trying to get a scalar value using VPTransformState::get() for uniform induction values or truncated induction values. IVs and truncated IVs can be uniform and the updated code accounts for that, fixing the crash. This should fix https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=31981	2021-03-12 13:29:06 +00:00
Sanjay Patel	bd197ed0a5	[SimplifyCFG] avoid sinking insts within an infinite-loop The test is reduced from a C source example in: https://llvm.org/PR49541 It's possible that the test could be reduced further or the predicate generalized further, but it seems to require a few ingredients (including the "late" SimplifyCFG options on the RUN line) to fall into the infinite-loop trap.	2021-03-12 08:04:57 -05:00
Hans Wennborg	f50aef745c	Revert "[InstrProfiling] Don't generate __llvm_profile_runtime_user" This broke the check-profile tests on Mac, see comment on the code review. > This is no longer needed, we can add __llvm_profile_runtime directly > to llvm.compiler.used or llvm.used to achieve the same effect. > > Differential Revision: https://reviews.llvm.org/D98325 This reverts commit `c7712087cb`. Also reverting the dependent follow-up commit: Revert "[InstrProfiling] Generate runtime hook for ELF platforms" > When using -fprofile-list to selectively apply instrumentation only > to certain files or functions, we may end up with a binary that doesn't > have any counters in the case where no files were selected. However, > because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the > runtime would still be pulled in and incur some non-trivial overhead, > especially in the case when the continuous or runtime counter relocation > mode is being used. A better way would be to pull in the profile runtime > only when needed by declaring the __llvm_profile_runtime symbol in the > translation unit only when needed. > > This approach was already used prior to `9a041a7522`, but we changed it > to always generate the __llvm_profile_runtime due to a TAPI limitation. > Since TAPI is only used on Mach-O platforms, we could use the early > emission of __llvm_profile_runtime there, and on other platforms we > could change back to the earlier approach where the symbol is generated > later only when needed. We can stop passing -u__llvm_profile_runtime to > the linker on Linux and Fuchsia since the generated undefined symbol in > each translation unit that needed it serves the same purpose. > > Differential Revision: https://reviews.llvm.org/D98061 This reverts commit `87fd09b25f`.	2021-03-12 13:53:46 +01:00
Serguei Katkov	cfe8f8e0f0	Revert "Mark gc.relocate and gc.result as readnone" As readnone functions they become movable and LICM can hoist them out of a loop. As a result, a phi node of token type is created in LCSSA form. Nothing is prepared for the first operand of GCRelocate to be a phi node; it is expected to be a token. GVN tests were also updated; it seems they do not do what is expected. A test for LICM is also added. This reverts commit `f352463ade`.	2021-03-12 16:59:17 +07:00
Johannes Doerfert	ff256c1376	[Attributor] Derive `willreturn` based on `mustprogress` Since D86233 we have `mustprogress` which, in combination with `readonly`, implies `willreturn`. The idea is that every side-effect has to be modeled as a "write". Consequently, `readonly` means there is no side-effect, and `mustprogress` guarantees that we cannot "loop" forever without side-effect. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94125	2021-03-11 23:31:44 -06:00
Nikita Popov	2fe85dd289	[Attributor] Don't access pointer elem type in constructPointer (NFC) Splitting this out as the change is non-trivial: The way this code handled pointer types doesn't really make sense, as GEPs can only apply an offset to the outermost pointer, but can't drill down into interior pointer types (which would require dereferencing memory). Instead give special treatment to the first (pointer) index. I've hardcoded it to zero as that's the only way the function is used right now, but handling non-zero indexes would be straightforward. The original goal here was to have an element type for CreateGEP.	2021-03-11 21:36:40 +01:00
Petr Hosek	87fd09b25f	[InstrProfiling] Generate runtime hook for ELF platforms When using -fprofile-list to selectively apply instrumentation only to certain files or functions, we may end up with a binary that doesn't have any counters in the case where no files were selected. However, because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the runtime would still be pulled in and incur some non-trivial overhead, especially in the case when the continuous or runtime counter relocation mode is being used. A better way would be to pull in the profile runtime only when needed by declaring the __llvm_profile_runtime symbol in the translation unit only when needed. This approach was already used prior to `9a041a7522`, but we changed it to always generate the __llvm_profile_runtime due to a TAPI limitation. Since TAPI is only used on Mach-O platforms, we could use the early emission of __llvm_profile_runtime there, and on other platforms we could change back to the earlier approach where the symbol is generated later only when needed. We can stop passing -u__llvm_profile_runtime to the linker on Linux and Fuchsia since the generated undefined symbol in each translation unit that needed it serves the same purpose. Differential Revision: https://reviews.llvm.org/D98061	2021-03-11 12:29:01 -08:00
Valery N Dmitriev	73f94969b2	[SLP] Fix crash when matching associative reduction for integer min/max. Associative reduction matcher in SLP begins with select instruction but when it reached call to llvm.umax (or alike) via def-use chain the latter also matched as UMax kind. The routine's later code assumes matched instruction to be a select and thus it merely died on the first encountered cast that did not fit. Differential Revision: https://reviews.llvm.org/D98432	2021-03-11 11:52:57 -08:00
Wenlei He	051f2c144e	[SamplePGO] Skip inlinee profile scaling for sample loader inlining For CGSCC inline, we need to scale down a function's branch weights and entry counts when it's inlined at a callsite. This is done through updateCallProfile. Additionally, we also scale the weights for the inlined clone based on call site count in updateCallerBFI. Neither is needed for inlining during sample profile loader as it's using context profile that is separated from inlinee's own profile. This change skips the inlinee profile scaling for sample loader inlining. Differential Revision: https://reviews.llvm.org/D98187	2021-03-11 10:18:26 -08:00
Hiroshi Yamauchi	365b225d46	[PGO] Fix two issues in PGOMemOPSizeOpt. 1. PGOMemOPSizeOpt grabs only the first, up to five (by default) entries from the value profile metadata and preserves the remaining entries for the fallback memop call site. If there are more than five entries, the rest of the entries would get dropped. This is fine for PGOMemOPSizeOpt itself as it only promotes up to 3 (by default) values, but potentially not for other downstream passes that may use the value profile metadata. 2. PGOMemOPSizeOpt originally assumed that only values 0 through 8 are kept track of. When the range buckets were introduced, it was changed to skip the range buckets, but since it does not grab all entries (only five), if some range buckets exist in the first five entries, it could potentially cause fewer promotion opportunities (e.g. if 4 out of 5 were range buckets, it may be able to promote up to one non-range bucket, as opposed to 3). Also, combined with 1, it means that wrong entries may be preserved, as it didn't correctly keep track of which entries were skipped. To fix this, PGOMemOPSizeOpt now grabs all the entries (up to the maximum number of value profile buckets), keeps track of which entries were skipped, and preserves all the remaining entries. Differential Revision: https://reviews.llvm.org/D97592	2021-03-11 09:53:05 -08:00
Stephen Tozer	f40976bd01	Revert "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" This reverts commit `c0f3dfb9f1`. Reverted due to an error on the clang-x64-windows-msvc buildbot.	2021-03-11 14:48:01 +00:00
Nikita Popov	46354bac76	[OpaquePtrs] Remove some uses of type-less CreateLoad APIs (NFC) Explicitly pass loaded type when creating loads, in preparation for the deprecation of these APIs. There are still a couple of uses left.	2021-03-11 14:40:57 +01:00
gbtozers	c0f3dfb9f1	[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic operations with two or more non-const operands; this includes the GetElementPtr instruction, and most Binary Operator instructions. These salvages produce DIArgList locations and are only valid for dbg.values, as currently variadic DIExpressions must use DW_OP_stack_value. This functionality is also only added for salvageDebugInfoForDbgValues; other functions that directly call salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be updated in a later patch. Differential Revision: https://reviews.llvm.org/D91722	2021-03-11 13:33:49 +00:00
Nikita Popov	403da6a69a	Reapply [LICM] Make promotion faster Relative to the previous implementation, this always uses aliasesUnknownInst() instead of aliasesPointer() to correctly handle atomics. The added test case was previously miscompiled. ----- Even when MemorySSA-based LICM is used, an AST is still populated for scalar promotion. As the AST has quadratic complexity, a lot of time is spent in this step despite the existing access count limit. This patch optimizes the identification of promotable stores. The idea here is pretty simple: We're only interested in must-alias mod sets of loop invariant pointers. As such, only populate the AST with loop-invariant loads and stores (anything else is definitely not promotable) and then discard any sets which alias with any of the remaining, definitely non-promotable accesses. If we promoted something, check whether this has made some other accesses loop invariant and thus possible promotion candidates. This is much faster in practice, because we need to perform AA queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable) instead of O(NumTotal^2), and NumPromotable tends to be small. Additionally, promotable accesses have loop invariant pointers, for which AA is cheaper. This has a significant positive compile-time impact. We save ~1.8% geomean on CTMark at O3, with 6% on lencod in particular and 25% on individual files. Conceptually, this change is NFC, but may not be so in practice, because the AST is only an approximation, and can produce different results depending on the order in which accesses are added. However, there is at least no impact on the number of promotions (licm.NumPromoted) in test-suite O3 configuration with this change. Differential Revision: https://reviews.llvm.org/D89264	2021-03-11 10:50:28 +01:00
Djordje Todorovic	9f41c03f82	[Debugify][OriginalDIMode] Export the report into JSON file By using the original-di check with debugify in the combination with the llvm/utils/llvm-original-di-preservation.py it becomes very user friendly tool. An example of the HTML page with the issues related to debug info can be found at [0]. [0] https://djolertrk.github.io/di-checker-html-report-example/ Differential Revision: https://reviews.llvm.org/D82546	2021-03-11 01:11:13 -08:00
Petr Hosek	c7712087cb	[InstrProfiling] Don't generate __llvm_profile_runtime_user This is no longer needed, we can add __llvm_profile_runtime directly to llvm.compiler.used or llvm.used to achieve the same effect. Differential Revision: https://reviews.llvm.org/D98325	2021-03-10 22:33:51 -08:00
Ruiling Song	8b7d3bed0f	[ValueMapper] Add debug output for metadata remapping This is useful for debugging which pointers are updated during remapping process. Differential Revision: https://reviews.llvm.org/D95775	2021-03-11 09:54:55 +08:00
kuterd	d75c9e61a5	[Attributor] Attributor call site specific AAValueConstantRange This patch makes uses of the context bridges introduced in D83299 to make AAValueConstantRange call site specific. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D83744	2021-03-11 01:19:44 +03:00
Mauri Mustonen	0de8aeae72	[VPlan] Support to widen select instructions in VPlan native path Add support to widen select instructions in VPlan native path by using a correct recipe when such instructions are encountered. This is already used by inner loop vectorizer. Previously select instructions got handled by the wrong recipe and resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D97136	2021-03-10 20:59:53 +00:00
Matteo Favaro	989051d5f8	[DSE] Extending isOverwrite to support offsetted fully overlapping stores The isOverwrite function is making sure to identify if two stores are fully overlapping and ideally we would like to identify all the instances of OW_Complete as they'll yield possibly killable stores. The current implementation is incapable of spotting instances where the earlier store is offsetted compared to the later store, but still fully overlapped. The limitation seems to lie on the computation of the base pointers with the GetPointerBaseWithConstantOffset API that often yields different base pointers even if the stores are guaranteed to partially overlap (e.g. the alias analysis is returning AliasResult::PartialAlias). The patch relies on the offsets computed and cached by BatchAAResults (available after D93529) to determine if the offsetted overlapping is OW_Complete. Differential Revision: https://reviews.llvm.org/D97676	2021-03-10 21:09:33 +01:00
Sriraman Tallam	0ba1ebcbb7	Remove original implementation of UniqueInternalLinkageNames pass. D96109 was recently submitted which contains the refactored implementation of -funique-internal-linkage-names by adding the unique suffixes in clang rather than as an LLVM pass. Deleting the former implementation in this change. Differential Revision: https://reviews.llvm.org/D98234	2021-03-10 11:57:40 -08:00
gbtozers	81b8357e70	[DebugInfo][NFC] Refactor BinOp+GEP salvaging in salvageDebugInfoImpl This patch refactors out the salvaging of GEP and BinOp instructions into separate functions, in preparation for further changes to the salvaging of these instructions coming in another patch; there should be no functional change as a result of this refactor. Differential Revision: https://reviews.llvm.org/D92851	2021-03-10 18:03:12 +00:00
Daniil Seredkin	7c49f3c75b	[InstCombine][SimplifyLibCalls] An extra sqrtf was produced because of transformations in optimizePow function See: https://bugs.llvm.org/show_bug.cgi?id=47613 There was an extra sqrt call because shrinking emitted a new powf and at the same time optimizePow replaces the previous pow with sqrt, and as a result we have two instructions that will be in the worklist of InstCombine despite the fact that %powf is not used by anyone (it is alive because of errno). As a result we have two instructions: %powf = call fast float @powf(float %x, float 5.000000e-01) %sqrt = call fast double @sqrt(double %dx) %powf will be converted to %sqrtf on a later iteration. As a quick fix for that I moved shrinking to the end of optimizePow so that pow is replaced with sqrt first, which avoids emitting a new shrunk powf. Differential Revision: https://reviews.llvm.org/D98235	2021-03-10 12:33:05 -05:00
Ta-Wei Tu	7ff2768be1	Revert "[LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest`" This reverts commit `df9158c9a4`.	2021-03-11 01:24:43 +08:00
Jianzhou Zhao	6a9a686ce7	[dfsan] Tracking origins at phi nodes This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D98268	2021-03-10 17:02:58 +00:00
Dávid Bolvanský	c68b560be3	[DSE] Handle memmove with equal non-const sizes Follow up for fhahn's D98284. Also fixes a case from PR47644. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D98346	2021-03-10 17:52:00 +01:00
Florian Hahn	8d9b9c0edc	[DSE] Handle memcpy/memset with equal non-const sizes. Currently DSE misses cases where the size is a non-const IR value, even if they match. For example, this means that llvm.memcpy/llvm.memset calls are not eliminated, even if they write the same number of bytes. This patch extends isOverwrite to try to get IR values for the number of bytes written from the analyzed instructions. If the values match, alias checks are performed and the result is returned. At the moment this only covers llvm.memcpy/llvm.memset. In the future, we may enable MemoryLocation to also track variable sizes, but this simple approach should allow us to cover the important cases in DSE. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98284	2021-03-10 10:13:58 +00:00
Wei Mi	ee35784a90	[SampleFDO] Support enabling -funique-internal-linkage-name. now -funique-internal-linkage-name flag is available, and we want to flip it on by default since it is beneficial to have separate sample profiles for different internal symbols with the same name. As a preparation, we want to avoid regression caused by the flip. When we flip -funique-internal-linkage-name on, the profile is collected from binary built without -funique-internal-linkage-name so it has no uniq suffix, but the IR in the optimized build contains the suffix. This kind of mismatch may introduce transient regression. To avoid such mismatch, we introduce a NameTable section flag indicating whether there is any name in the profile containing uniq suffix. Compiler will decide whether to keep uniq suffix during name canonicalization depending on the NameTable section flag. The flag is only available for extbinary format. For other formats, by default compiler will keep uniq suffix so they will only experience transient regression when -funique-internal-linkage-name is just flipped. Another type of regression is caused by places where we miss to call getCanonicalFnName. Those places are fixed. Differential Revision: https://reviews.llvm.org/D96932	2021-03-09 21:41:40 -08:00
Philip Reames	cf1899e0a9	[rs4gc] common bdv operand visitation [nfc]	2021-03-09 20:28:47 -08:00
Arnold Schwaighofer	590ac0a26a	[coro async] Transfer the original function's attributes to the clone rdar://75052917 Differential Revision: https://reviews.llvm.org/D98051	2021-03-09 17:01:41 -08:00
Leonard Chan	cf371573b0	[llvm] Change DSOLocalEquivalent type if the underlying global value type changes We encountered an issue where LTO running on IR that used the DSOLocalEquivalent constant would result in bad codegen. The underlying issue was ValueMapper wasn't properly handling DSOLocalEquivalent, so this just adds the machinery for handling it. This code path is triggered by a fix to DSOLocalEquivalent::handleOperandChangeImpl where DSOLocalEquivalent could potentially not have the same type as its underlying GV. This updates DSOLocalEquivalent::handleOperandChangeImpl to change the type if the GV type changes and handles this constant in ValueMapper. Differential Revision: https://reviews.llvm.org/D97978	2021-03-09 15:09:48 -08:00
Sanjay Patel	23fd647cc6	[SLP] remove dead null check; NFC We cast<> to Instruction (not dyn_cast<>), so we already required/assumed that Cmp is not null.	2021-03-09 17:43:07 -05:00
Jianzhou Zhao	8506fe5b41	[dfsan] Tracking origins at memory transfer This is a part of https://reviews.llvm.org/D95835. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D98192	2021-03-09 22:15:07 +00:00
Juneyoung Lee	f49354838e	Revert "[InstCombine] Add simplification of two logical and/ors" This reverts commit `07c3b97e18` due to a reported failure in two-stage build.	2021-03-10 05:48:31 +09:00
gbtozers	df69c69427	[DebugInfo] Handle multiple variable location operands in IR This patch updates the various IR passes to correctly handle dbg.values with a DIArgList location. This patch does not actually allow DIArgLists to be produced by salvageDebugInfo, and it does not affect any pass after codegen-prepare. Other than that, it should cover every IR pass. Most of the changes simply extend code that operated on a single debug value to operate on the list of debug values in the style of any_of, all_of, for_each, etc. Instances of setOperand(0, ...) have been replaced with replaceVariableLocationOp, which takes the value that is being replaced as an additional argument. In places where this value isn't readily available, we have to track the old value through to the point where it gets replaced. Differential Revision: https://reviews.llvm.org/D88232	2021-03-09 16:44:38 +00:00
Sanjay Patel	2986a9c7e2	[InstCombine] canonicalize 'not' op after min/max intrinsic This is another step towards parity between existing select transforms and min/max intrinsics (D98152). The existing 'not' folds around select are complicated, so it's likely that we will need to enhance this, but this should be a safe step.	2021-03-09 11:33:28 -05:00
Sanjay Patel	41b9209a12	[InstCombine] fold min/max intrinsics with not ops This is a partial translation of the existing select-based folds. We need to recreate several different transforms to avoid regressions as noted in D98152. https://alive2.llvm.org/ce/z/teuZ_J	2021-03-09 08:55:48 -05:00
Florian Hahn	92da5b7119	[InstCombine] Simplify phis with incoming pointer-casts. If the incoming values of a phi are pointer casts of the same original value, replace the phi with a single cast. Such redundant phis are somewhat common after loop-rotate and removing them can avoid some unnecessary code bloat, e.g. because an iteration of a loop is peeled off to make the phi invariant. It should also simplify further analysis on its own. InstCombine already uses stripPointerCasts in a couple of places and also simplifies phis based on the incoming values, so the patch should fit in the existing scope. The patch causes binary changes in 47 out of 237 benchmarks in MultiSource/SPEC2000/SPEC2006 with -O3 -flto on X86. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98058	2021-03-09 11:40:18 +00:00
Hongtao Yu	4c3d759d00	[CSSPGO] Always use callsite samples as callsite probe counts. For CS profile, the callsite count of previously inlined callees is populated with the entry count of the callees. Therefore when trying to get a weight for a callsite probe after inlining, the callsite count should always be used. The same fix has already been made for the non-probe case. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D98094	2021-03-08 22:52:36 -08:00
Akira Hatanaka	dca5737945	Move ObjCARCUtil.h back to llvm/Analysis Instead of adding the header to llvm/IR, just duplicate the marker string in the auto upgrader.	2021-03-08 16:35:24 -08:00
Alina Sbirlea	29482426b5	Revert "[LICM] Make promotion faster" Revert `3d8f842712` Revision triggers a miscompile sinking a store incorrectly outside a threading loop. Detected by tsan. Reverting while investigating. Differential Revision: https://reviews.llvm.org/D89264	2021-03-08 12:53:03 -08:00
Philip Reames	ebc61f9d3c	[instcombine] Collapse trivial or recurrences If we have a recurrence of the form <Start, Or, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence. Differential Revision: https://reviews.llvm.org/D97578 (or part)	2021-03-08 09:21:38 -08:00
Philip Reames	239a618180	[instcombine] Collapse trivial and recurrences If we have a recurrence of the form <Start, And, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence. Differential Revision: https://reviews.llvm.org/D97578 (and part)	2021-03-08 09:21:38 -08:00
Philip Reames	97a7bc5831	[gvn] Precisely propagate equalities to phi operands The code used for propagating equalities (e.g. assume facts) was conservative in two ways - one of which this patch fixes. Specifically, it shifts the code reasoning about whether a use is dominated by the end of the assume block to consider phi uses to exist on the predecessor edge. This matches the dominator tree handling for dominates(Edge, Use), and simply extends it to dominates(BB, Use). Note that the decision to use the end of the block is itself a conservative choice. The more precise option would be to use the later of the assume and the value, and replace all uses after that. GVN handles that case separately (with the replace operand mechanism) because it used to be expensive to ask dominator questions within blocks. With the new instruction ordering support, we should probably rewrite this code at some point to simplify. Differential Revision: https://reviews.llvm.org/D98082	2021-03-08 08:59:00 -08:00
Sanne Wouda	05a6e2eb9a	[InstCombine] Add a combine for a shuffle of similar bitcasts Some intrinsics wrapper code has the habit of ignoring the type of the elements in vectors, thinking of vector registers as a "bag of bits". As a consequence, some operations are shared between vectors of different types. For example, functions that rearrange elements in a vector can be shared between vectors of int32 and float. This can result in bitcasts in awkward places that prevent the backend from recognizing some instructions. For AArch64 in particular, it inhibits the selection of dup from a general purpose register (GPR), and mov from GPR to a vector lane. This patch adds a pattern in InstCombine to move the bitcasts past the shufflevector if this is possible. Sometimes this even allows InstCombine to remove the bitcast entirely, as in the included tests. Alternatively this could be done with a few extra patterns in the AArch64 backend, but InstCombine seems like a better place for this. Differential Revision: https://reviews.llvm.org/D97397	2021-03-08 16:32:30 +00:00
Sanne Wouda	5e963a2441	Rehome an orphaned comment [NFC] As seen in `35827164c4`, the "shuffle x, x, mask" comment has drifted away from the implementation of the pattern. Put it back.	2021-03-08 16:32:30 +00:00
Stephen Tozer	4343c68fa3	Fix: [DebugInfo] Support DIArgList in DbgVariableIntrinsic This patch removed the only use of a lambda capture, triggering an error on `-Werror -Wunused-lambda-capture` builds.	2021-03-08 14:57:11 +00:00
gbtozers	e5d958c456	[DebugInfo] Support DIArgList in DbgVariableIntrinsic This patch updates DbgVariableIntrinsics to support use of a DIArgList for the location operand, resulting in a significant change to its interface. This patch does not update all IR passes to support multiple location operands in a dbg.value; the only change is to update the DbgVariableIntrinsic interface and its uses. All code outside of the intrinsic classes assumes that an intrinsic will always have exactly one location operand; they will still support DIArgLists, but only if they contain exactly one Value. Among other changes, the setOperand and setArgOperand functions in DbgVariableIntrinsic have been made private. This is to prevent code from setting the operands of these intrinsics directly, which could easily result in incorrect/invalid operands being set. This does not prevent these functions from being called on a debug intrinsic at all, as they can still be called on any CallInst pointer; it is assumed that any code directly setting the operands on a generic call instruction is doing so safely. The intention for making these functions private is to prevent DIArgLists from being overwritten by code that's naively trying to replace one of the Values it points to, and also to fail fast if a DbgVariableIntrinsic is updated to use a DIArgList without a valid corresponding DIExpression.	2021-03-08 14:36:13 +00:00
Ta-Wei Tu	df9158c9a4	[LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest` The check `tightlyNested()` in `LoopInterchange` is similar to the one in `LoopNest`. In fact, the former misses some cases where loop-interchange is not feasible and results in incorrect behaviour. Replacing it with the much more robust version provided by `LoopNest` reduces code duplication and fixes https://bugs.llvm.org/show_bug.cgi?id=48113. `LoopInterchange` has a weaker definition of tightly or perfectly nesting-ness than the one implemented in `LoopNest::arePerfectlyNested()`. Therefore, `tightlyNested()` is instead implemented with `LoopNest::checkLoopsStructure` and additional checks for unsafe instructions. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D97290	2021-03-08 11:36:08 +08:00
Mehdi Amini	8d5a981a13	Revert "[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe" This reverts commit `99108c791d`. Clang is miscompiling LLVM with this change, a stage-2 build hits multiple failures. As a repro, I built clang in a stage1 directory and used it this way: cmake -G Ninja ../llvm \ -DCMAKE_CXX_COMPILER=`pwd`/../build-stage1/bin/clang++ \ -DCMAKE_C_COMPILER=`pwd`/../build-stage1/bin/clang \ -DLLVM_TARGETS_TO_BUILD="X86;NVPTX;AMDGPU" \ -DLLVM_ENABLE_PROJECTS=mlir \ -DLLVM_BUILD_EXAMPLES=ON \ -DCMAKE_BUILD_TYPE=Release \ -DLLVM_ENABLE_ASSERTIONS=On ninja check-mlir	2021-03-08 00:15:47 +00:00
Juneyoung Lee	07c3b97e18	[InstCombine] Add simplification of two logical and/ors This is a patch that adds folding of two logical and/ors that share one variable: a && (a && b) -> a && b a && (a & b) -> a && b ... This is towards removing the poison-unsafe select optimization (D93065 has more context). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96945	2021-03-08 02:38:43 +09:00
Juneyoung Lee	d672c81126	[InstCombine] use safe transformation by default .. since it will be folded into and/or anyway	2021-03-08 02:25:29 +09:00
Nikita Popov	2b494f85f1	[CVP] Remove -cvp-dont-add-nowrap-flags option This option was originally added to work around a bug in LFTR. The bug has long since been fixed.	2021-03-07 18:19:31 +01:00
Nikita Popov	176bbcae11	[DSE] Remove MemDep-based implementation The MemorySSA-based implementation has been enabled without issue for a while now, so keeping the old implementation around doesn't seem useful anymore. This drops the MemDep-based implementation. Differential Revision: https://reviews.llvm.org/D97877	2021-03-07 18:17:31 +01:00
Juneyoung Lee	33590ed4f2	[InstCombine] fix another poison-unsafe select transformation This fixes another unsafe select folding by disabling it if EnableUnsafeSelectTransform is set to false. EnableUnsafeSelectTransform's default value is true, hence it won't affect generated code (unless the flag is explicitly set to false).	2021-03-08 02:11:04 +09:00
Juneyoung Lee	99108c791d	[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe This patch makes FoldBranchToCommonDest merge branch conditions into `select i1` rather than `and/or i1` when it is called by SimplifyCFG. It is known that merging conditions into and/or is poison-unsafe, and this is towards making things more correct by removing possible miscompilations. Currently, InstCombine simply consumes these selects into and/or of i1 (which is also unsafe), so the visible effect would be very small. The unsafe select -> and/or transformation will be removed in the future. There have been efforts to update optimizations to support the select form as well, and they are linked to D93065. The safe transformation is fired when it is called by SimplifyCFG only. This is done by setting the new `PoisonSafe` argument as true. Another place that calls FoldBranchToCommonDest is LoopSimplify. `PoisonSafe` flag is set to false in this case because enabling it has a nontrivial impact on performance because SCEV is more conservative with select form and InductiveRangeCheckElimination isn't aware of select form of and/or i1. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95026	2021-03-08 01:38:03 +09:00
Juneyoung Lee	5bb38e84d3	[LoopUnswitch] unswitch if cond is in select form of and/or as well Hello all, I'm trying to fix unsafe propagation of poison values in and/or conditions by using equivalent select forms (`select i1 A, i1 B, i1 false` and `select i1 A, i1 true, i1 false`) instead. D93065 has links to patches for this. This patch allows unswitch to happen if the condition is in this form as well. `collectHomogenousInstGraphLoopInvariants` is updated to keep traversal if Root and the visiting I matches both m_LogicalOr()/m_LogicalAnd(). Other than this, the remaining changes are almost straightforward and simply replaces Instruction::And/Or check with match(m_LogicalOr()/m_LogicalAnd()). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D97756	2021-03-08 01:19:43 +09:00
Nikita Popov	3fedaf2a52	[GVN] Don't explicitly materialize undefs from dead blocks When materializing an available load value, do not explicitly materialize the undef values from dead blocks. Doing so will force creation of a phi with an undef operand, even if there is a dominating definition. The phi will be folded away on subsequent GVN iterations, but by then we may have already poisoned MDA cache slots. Simply don't register these values in the first place, and let SSAUpdater do its thing.	2021-03-06 23:46:24 +01:00
Fangrui Song	fb2cf0dd60	[FunctionImport] Delete unneeded setLive. NFC ValueInfo's in Worklist are guaranteed to be live.	2021-03-06 14:09:54 -08:00
Mauri Mustonen	494b5ba364	[VPlan] Support to widen call instructions in VPlan native path Add support to widen call instructions in VPlan native path by using a correct recipe when such instructions are encountered. This is already used by inner loop vectorizer. Previously call instructions got handled by wrong recipes and resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139. Patch by Mauri Mustonen <mauri.mustonen@tuni.fi> Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D97278	2021-03-06 21:59:52 +00:00
Roman Lebedev	2ad1f5eb1a	[InstCombine] Don't canonicalize (gep i8* X, -(ptrtoint Y)) as (inttoptr (sub (ptrtoint X), (ptrtoint Y))) It's just a wrong thing to do. We introduce inttoptr where there were none, which results in losing all provenance information because we no longer have a GEP{i,}, and pessimize all future optimizations, because we are basically not allowed to look past `inttoptr`. (gep i8* X, -(ptrtoint Y)) is the canonical form. So just drop this fold. Noticed while reviewing D98120.	2021-03-06 23:00:25 +03:00
Fangrui Song	9fb6782c69	[rs4gc] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds	2021-03-06 11:42:27 -08:00
Roman Lebedev	b46c085d2b	[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions These intrinsics, not the icmp+select, are the canonical form nowadays, so we might as well directly emit them. This should not cause any regressions, but if it does, then they would need to be fixed regardless. Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`, but that is a pessimization, not a correctness issue. Additionally, the non-intrinsic form has issues with undef, see https://reviews.llvm.org/D88287#2587863	2021-03-06 21:52:46 +03:00
Ta-Wei Tu	8a003861a3	[NPM] Add -enable-loopinterchange option to NPM We have the `enable-loopinterchange` option in legacy pass manager but not in NPM. Add `LoopInterchange` pass to the optimization pipeline (at the same position as before) when `enable-loopinterchange` is turned on. Reviewed By: aeubanks, fhahn Differential Revision: https://reviews.llvm.org/D98116	2021-03-07 02:39:28 +08:00
William S. Moses	d163e75c81	[Attributor] Enable heap-to-stack of any size Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1 Differential Revision: https://reviews.llvm.org/D97873	2021-03-06 12:57:32 -05:00
Philip Reames	5db2735af9	[gvn] Handle simple phi equivalence cases GVN basically doesn't handle phi nodes at all. This is for a reason - we can't value number their inputs since the predecessor blocks have probably not been visited yet. However, it also creates a significant pass ordering problem. As it stands, instcombine and simplifycfg end up implementing CSE of phi nodes. This means that for any series of CSE opportunities intermixed with phi nodes, we end up having to alternate instcombine/simplifycfg and gvn to make progress. This patch handles the simplest case by simply preprocessing the phi instructions in a block, and CSEing them if they are syntactically identical. This turns out to be powerful enough to handle many cases in a single invocation of GVN since blocks which use the cse'd phi results are visited after the block containing the phi. If there's a CSE opportunity in one of the phi predecessors required to recognize the phi CSE opportunity, that will require a second iteration on the function. (Still within a single run of gvn though.) Compile time wise, this could go either way. On one hand, we're potentially causing GVN to iterate over the function more. On the other, we're cutting down on iterations between two passes and potentially shrinking the IR aggressively. So, a bit unclear what to expect. Note that this does still rely on instcombine to canonicalize block order of the phis, but that's a one time transformation independent of the values incoming to the phi. Differential Revision: https://reviews.llvm.org/D98080	2021-03-06 09:31:12 -08:00
Philip Reames	8fe59ba51e	[rs4gc] track the original value in the state use for base pointer rewriting I'd originally intended to build on this for another purpose and have decided not to, but at a minimum, the stronger asserts are useful.	2021-03-06 08:46:15 -08:00
Philip Reames	6334952ff0	[rs4gc] minor code style improvement	2021-03-06 08:46:15 -08:00
Philip Reames	51b13a7ea0	[gvn] CSE gc.relocates based on meaning, not spelling The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoint's multiple return values.) We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particularly useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass. Differential Revision: https://reviews.llvm.org/D97974 (Note: Part of the reviewed change was split and landed as `f352463a`)	2021-03-05 10:16:12 -08:00
Philip Reames	f352463ade	Mark gc.relocate and gc.result as readnone For some reason, we had been marking gc.relocates as reading memory. There's no known reason for this, and I suspect it to be a legacy of very early implementation conservatism. gc.relocate and gc.result are simply projections of the return values from the associated statepoint. Note that the LangRef has always declared them readnone. The EarlyCSE change is simply moving the special casing from readonly to readnone handling. As noted by the test diffs, this does allow some additional CSE when relocates are separated by stores, but since we generate gc.relocates in batches, this is unlikely to help anything in practice. This was reviewed as part of https://reviews.llvm.org/D97974, but split at reviewer request before landing. The motivation is to enable the GVN changes in that patch.	2021-03-05 10:07:17 -08:00
Philip Reames	99f93dd3a5	[rs4gc] avoid insert base computation instructions for deopt uses If we have a value live over a call which is used for deopt at the call, we know that the value must be a base pointer. We can avoid potentially inserting IR to materialize a base for this value. In its current form, this is mostly a compile time optimization. Building the base pointer graph (and then optimizing it away again) is a relatively expensive operation. We also sometimes end up with better codegen in practice - due to failures in optimizing away the inserted base pointer propagation - but those are optimization bugs we're fixing concurrently. The alternative to this would be to extend the base pointer inference with the ability to generally reuse multiple-base input instructions (phis and selects). That's somewhat invasive and complicated, so we're deferring it a bit longer. Differential Revision: https://reviews.llvm.org/D97885	2021-03-05 09:55:36 -08:00
gbtozers	65600cb2a7	[DebugInfo] Add DIArgList MD to store multiple values in DbgVariableIntrinsics This patch adds a new metadata node, DIArgList, which contains a list of SSA values. This node is in many ways similar in function to the existing ValueAsMetadata node, with the difference being that it tracks a list instead of a single value. Internally, it uses ValueAsMetadata to track the individual values, but there is also a reasonable amount of DIArgList-specific value-tracking logic on top of that. Similar to ValueAsMetadata, it is a special case in parsing and printing due to the fact that it requires a function state (as it may reference function-local values). This patch should not result in any immediate functional change; it allows for DIArgLists to be parsed and printed, but debug variable intrinsics do not yet recognize them as a valid argument (outside of parsing). Differential Revision: https://reviews.llvm.org/D88175	2021-03-05 17:02:24 +00:00
David Sherwood	fec0a0adac	[SVE][LoopVectorize] Add support for extracting the last lane of a scalable vector There are certain loops like this below: for (int i = 0; i < n; i++) { a[i] = b[i] + 1; *inv = a[i]; } that can only be vectorised if we are able to extract the last lane of the vectorised form of 'a[i]'. For fixed width vectors this already works since we know at compile time what the final lane is, however for scalable vectors this is a different story. This patch adds support for extracting the last lane from a scalable vector using a runtime determined lane value. I have added support to VPIteration for runtime-determined lanes that still permit the caching of values. I did this by introducing a new class called VPLane, which describes the lane we're dealing with and provides interfaces to get both the compile-time known lane and the runtime determined value. Whilst doing this work I couldn't find any explicit tests for extracting the last lane values of fixed width vectors so I added tests for both scalable and fixed width vectors. Differential Revision: https://reviews.llvm.org/D95139	2021-03-05 09:57:56 +00:00
Michael Kruse	b119120673	[clang][OpenMP] Use OpenMPIRBuilder for workshare loops. Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to: * Only the worksharing-loop directive. * Recognizes only the nowait clause. * No loop nests with more than one loop. * Untested with templates, exceptions. * Semantic checking left to the existing infrastructure. This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adhere to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics: * The distance function: How many loop iterations there will be before entering the loop nest. * The loop variable function: Conversion from a logical iteration number to the loop variable. These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logic should be done by the OpenMPIRBuilder such that it can be reused by the MLIR OpenMP dialect and thus by flang. The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more intertwined with the parser). It is up to the OpenMPIRBuilder how they are called which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does. For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. 
Although OMPCanonicalLoop's are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting. Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset. Reviewed By: jdenny Differential Revision: https://reviews.llvm.org/D94973	2021-03-04 22:52:59 -06:00
Wei Mi	2357d29335	[SampleFDO] Another fix to prevent repeated indirect call promotion in sample loader pass. In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9, to prevent repeated indirect call promotion for the same indirect call and the same target, we used zero-count value profile to indicate an indirect call has been promoted for a certain target. We removed PromotedInsns cache in the same patch. However, there was a problem in that patch described below, and that problem led me to add PromotedInsns back as a mitigation in https://reviews.llvm.org/rG4ffad1fb489f691825d6c7d78e1626de142f26cf. When we get value profile from metadata by calling getValueProfDataFromInst, we need to specify the maximum possible number of values we expect to read. We used MaxNumPromotions in the last patch so the maximum number of value information extracted from metadata is MaxNumPromotions. If we have many values including zero-count values when we write the metadata, some of them will be dropped when we read them because we only read MaxNumPromotions values. It will allow repeated indirect call promotion again. We need to make sure if there are values indicating promoted targets, those values need to be saved in metadata with higher priority than other values. The patch fixed that problem. We change to use -1 to represent the count of a promoted target instead of 0 so it is easier to sort the values. When we prepare to update the metadata in updateIDTMetaData, we will sort the values in the descending count order and extract only MaxNumPromotions values to write into metadata. Since -1 is the max uint64_t number, if we have equal to or less than MaxNumPromotions of -1 count values, they will all be kept in metadata. If we have more than MaxNumPromotions of -1 count values, we will only save MaxNumPromotions such values maximally. 
In such case, we have logic in place in doesHistoryAllowICP to guarantee no more promotion in sample loader pass will happen for the indirect call, because it has been promoted enough. With this change, now we can remove PromotedInsns without problem. Differential Revision: https://reviews.llvm.org/D97350	2021-03-04 18:44:12 -08:00
David Blaikie	a2a55def35	Move llvm/Analysis/ObjCARCUtil.h to IR to fix layering. This is included from IR files, and IR doesn't/can't depend on Analysis (because Analysis depends on IR). Also fix the implementation - don't use non-member static in headers, as it leads to ODR violations, inaccurate "unused function" warnings, etc. And fix the header protection macro name (we don't generally include "LIB" in the names, so far as I can tell).	2021-03-04 16:14:53 -08:00
Jianzhou Zhao	db7fe6cd4b	[dfsan] Propagate origin tracking at store This is a part of https://reviews.llvm.org/D95835. Reviewed By: morehouse, gbalats Differential Revision: https://reviews.llvm.org/D97789	2021-03-04 23:34:44 +00:00
William S. Moses	2b896e39bf	Revert "[Attributor] Enable heap-to-stack of any size" This reverts commit `51bd42ef9b`.	2021-03-04 17:24:56 -05:00
Sanjay Patel	1bee549737	[LoopVectorize] propagate fast-math-flags from induction instructions This code assumed that FP math was only permissible if it was fully "fast", so it hard-coded "fast" when creating new instructions. The underlying code already allows matching recurrences/reductions that are only "reassoc", so this change should prevent the potential miscompile seen in the test diffs (we created "fast" ops even though none existed in the original code). I don't know if we need to create the temporary IRBuilder objects used here, so that could be follow-up clean-up. There's an open question about whether we should require "nsz" in addition to "reassoc" here. InstCombine uses that combo for its reassociative folds, but I think codegen is not as strict.	2021-03-04 17:21:32 -05:00
William S. Moses	51bd42ef9b	[Attributor] Enable heap-to-stack of any size Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1 Differential Revision: https://reviews.llvm.org/D97873	2021-03-04 17:17:23 -05:00
Francis Visoiu Mistrih	365b78396a	[Remarks] Emit variable info in auto-init remarks This enhances the auto-init remark with information about the variable that is auto-initialized. This is based on debug info if available, or alloca names (mostly for development purposes). ``` auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes.Variables: var (4096 bytes). [-Rpass-missed=annotation-remarks] int var[1024]; ^ ``` This makes it possible to see things like partial initialization of a variable that the optimizer won't be able to completely remove. Differential Revision: https://reviews.llvm.org/D97734	2021-03-04 12:51:22 -08:00
Akira Hatanaka	1900503595	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `ed4718eccb`, which was reverted because it was causing a miscompile. The bug that was causing the miscompile has been fixed in `75805dce5f`. Original commit message: Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). 
- The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-03-04 11:22:30 -08:00
Adrian Prantl	d268febc56	Improve the debug info for coro-split .resume functions This patch updates the scope line to point to the suspend point. This makes the first address in the function point to the first source line in the resume function rather than the function declaration. Without this the line table "jumps" from the beginning of the function to the suspend point at the beginning. rdar://73386346 Differential Revision: https://reviews.llvm.org/D97345	2021-03-04 11:05:35 -08:00
Jianzhou Zhao	72abc9bf07	[dfsan] add a missing zero origin at atomic commands	2021-03-04 16:50:05 +00:00
Alexey Bataev	04ba80ca4d	[Instcombiner]Improve emission of logical or/and reductions. For logical or/and reductions we emit regular intrinsics @llvm.vector.reduce.or/and.vxi1 calls. These intrinsics are not effective for the logical or/and reductions, especially if the optimizer is able to emit short circuit versions of the scalar or/and instructions and vector code gets less effective than the scalar version. Instead, or reduction for i1 can be represented as: ``` %val = bitcast <ReduxWidth x i1> to iReduxWidth %res = cmp ne iReduxWidth %val, 0 ``` and reduction for i1 can be represented as: ``` %val = bitcast <ReduxWidth x i1> to iReduxWidth %res = cmp eq iReduxWidth %val, 11111 ``` This improves performance of the vector code significantly and makes it outperform short circuit scalar code. Part of D57059. Differential Revision: https://reviews.llvm.org/D97406	2021-03-04 08:01:02 -08:00
Sanjay Patel	36a489d194	[Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC Similar to `b3a33553ae`, but this shows a TODO and a potential miscompile is already present. We are tracking an FP instruction that does not have FMF (reassoc) properties, so calling that "Unsafe" seems opposite of the common reading. I also removed one getter method by rolling the null check into the access. Further simplification may be possible. The motivation is to clean up the interactions between FMF and function-level attributes in these classes and their callers. The new test shows that there is an existing bug somewhere in the callers. We assumed that the original code was fully 'fast' and so we produced IR with 'fast' even though it was just 'reassoc'.	2021-03-04 10:40:26 -05:00
Sanjay Patel	b3a33553ae	[Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC We are tracking an FP instruction that does not have FMF (reassoc) properties, so calling that "Unsafe" seems opposite of the common reading. I also removed one getter method by rolling the null check into the access. Further simplification seems possible. The motivation is to clean up the interactions between FMF and function-level attributes in these classes and their callers.	2021-03-04 08:53:04 -05:00
Hongtao Yu	c75da238b4	[CSSPGO] Deduplicating dangling pseudo probes. Same dangling probes are redundant since they all have the same semantic that is to rely on the counts inference tool to get reasonable count for the same original block. Therefore, there's no need to keep multiple copies of them. I've seen jump threading created tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also result in redundant probes though without an observed impact so far. This change removes block-wise redundant dangling probes specifically introduced by jump threading. To support removing redundant dangling probes caused by all other passes, a final function-wise deduplication is also added. An 18% size win of the .pseudo_probe section was seen for SPEC2017. No performance difference was observed. Differential Revision: https://reviews.llvm.org/D97482	2021-03-03 22:44:42 -08:00
Hongtao Yu	8985515822	[CSSPGO] Unblocking optimizations by dangling pseudo probes. This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments. The optimizations being unblocked are: 1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted. 2. Unblocking jump threading from removing empty blocks. Pseudo probes prevent jump threading from removing logically empty blocks that only have one unconditional jump instruction. 3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks. Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D97481	2021-03-03 22:44:42 -08:00
Johannes Doerfert	5b70c12f3e	[Attributor] Make DepClass a required argument We often used a sub-optimal dependence class in the past because we didn't see the argument. Let's make it explicit so we remember to think about it.	2021-03-04 00:35:52 -06:00
Johannes Doerfert	e592dad82e	[Attributor] Fold "TrackDependence" into the DepClassTy enum We don't need a bool and an enum to express the three options we currently have. This makes the interface nicer and much easier to use optional dependencies. Also avoids mistakes where the bool is false and enum ignored.	2021-03-04 00:35:52 -06:00
Johannes Doerfert	c8c93fdf0a	[Attributor] Avoid work for GEPs and wait till the users are visited	2021-03-04 00:35:52 -06:00
Johannes Doerfert	f3f88287c5	[Attributor] Use known alignment as lower bound to avoid work If we know already more than available from a use, we don't need to invest time on it.	2021-03-04 00:35:52 -06:00
Johannes Doerfert	c14213e030	[Attributor][NFC] Move some trivial checks up	2021-03-04 00:35:52 -06:00
Johannes Doerfert	09c3eebf5f	[Attributor] Use sensible initialization in AANoCaptureCallSiteReturned	2021-03-04 00:35:51 -06:00
Evgeniy Brevnov	e94125f054	[DSE] Add support for not aligned begin/end This is an attempt to improve handling of partial overlaps in case of unaligned begin/end. Existing implementation just bails out if it encounters such cases. Even when it doesn't, I believe existing code checking alignment constraints is not quite correct. It tries to ensure alignment of the "later" start/end offset when it should be preserving relative alignment between earlier and later start/end. The idea behind the change is simple. When start/end is not aligned as we wish, instead of bailing out let's adjust it as necessary to get desired alignment. I'll update with performance results as measured by the test-suite...it's still running... Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D93530	2021-03-04 12:24:23 +07:00
Serguei Katkov	a0ff0f30df	[InstCombine] Move statepoint intrinsic handling from visitCall to visitCallBase statepoint intrinsic can be used in invoke context, so it should be handled in visitCallBase to cover both call and invoke. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D97833	2021-03-04 11:00:22 +07:00
Xun Li	03f668613c	[LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions See pr46990(https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions to loop exit blocks which cross coro.suspend intrinsics. This breaks the semantics of the coro.suspend intrinsic, which returns to the caller directly. Also this leads to use-after-free if the coroutine is freed before control returns to the caller in a multithreaded environment. This patch disables promotion by checking whether the loop contains coro.suspend intrinsics. This is a resubmit of D86190. Disabling LICM for loops with coroutine suspension is a better option not only for correctness purpose but also for performance purpose. In most cases LICM sinks memory operations. In the case of coroutine, sinking memory operation out of the loop does not improve performance since coroutine needs to get data from the frame anyway. In fact LICM would hurt coroutine performance since it adds more entries to the frame. Differential Revision: https://reviews.llvm.org/D96928	2021-03-03 15:21:57 -08:00
Whitney Tsang	58d531fd6f	[LoopUnrollRuntime] Add option to assume the non latch exit block to be predictable. Reviewed By: Meinersbur, bmahjour Differential Revision: https://reviews.llvm.org/D97747	2021-03-03 20:43:31 +00:00
Philip Reames	89d331a31e	Address review comment from D97219 (follow up to `8051156`) Probably should have done this before landing, but I forgot. Basic idea is to avoid using the SCEV predicate when it doesn't buy us anything. Also happens to set us up for handling non-add recurrences in the future if desired.	2021-03-03 12:20:27 -08:00
Philip Reames	99f5417346	Sink routine for replacing an operand bundle to CallBase [NFC] We had equivalent code for both CallInst and InvokeInst, but never cared about the result type.	2021-03-03 12:07:55 -08:00
Philip Reames	805115655e	[LSR] Unify scheduling of existing and inserted addrecs LSR goes to some lengths to schedule IV increments such that %iv and %iv.next never need to overlap. This is fairly fundamental to LSR's cost model. LSR assumes that an addrec can be represented with a single register. If %iv and %iv.next have to overlap, then that assumption does not hold. The bug - which this patch is fixing - is that LSR only does this scheduling for IVs which it inserts, but its cost model assumes the same for existing IVs that it reuses. It will rewrite existing IV users such that the no-overlap property holds, but will not actually reschedule said IV increment. As you can see from the relative lack of test updates, this doesn't actually impact codegen much. The main reason for doing it is to make a follow up patch series which improves post-increment use and scheduling easier to follow. Differential Revision: https://reviews.llvm.org/D97219	2021-03-03 12:07:55 -08:00
Fangrui Song	a84f4fc0df	[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF `__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not referenced via relocation in the translation unit. With `-z start-stop-gc` (LLD 13 (D96914); GNU ld 2.37 https://sourceware.org/bugzilla/show_bug.cgi?id=27451), the linker does not let `__start_/__stop_` references retain their sections. Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make them retained by the linker. This patch changes most existing `UsedVars` cases to `CompilerUsedVars` to reflect the ideal state - if the binary format properly supports section based GC (dead stripping), `llvm.compiler.used` should be sufficient. `__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars` since we want them to be unconditionally retained by both compiler and linker. Behaviors on COFF/Mach-O are not affected. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D97649	2021-03-03 11:32:24 -08:00
Arnold Schwaighofer	a42bea211a	[coro async] Allow a coro.suspend.async to specify which argument is the context argument Before we used the same argument as the entry point. The resume partial function might want to use a different ABI for its context argument Differential Revision: https://reviews.llvm.org/D97333	2021-03-03 08:27:37 -08:00
Nico Weber	64f5d7e972	Revert "[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF" This reverts commit `04c3040f41`. Breaks instrprof-value-merge.c in bootstrap builds.	2021-03-03 10:21:17 -05:00
Hans Wennborg	0a5dd06718	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR" This caused miscompiles of Chromium tests for iOS due to clobbering of live registers. See discussion on the code review for details. > Background: > > This fixes a longstanding problem where llvm breaks ARC's autorelease > optimization (see the link below) by separating calls from the marker > instructions or retainRV/claimRV calls. The backend changes are in > https://reviews.llvm.org/D92569. > > https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue > > What this patch does to fix the problem: > > - The front-end adds operand bundle "clang.arc.attachedcall" to calls, > which indicates the call is implicitly followed by a marker > instruction and an implicit retainRV/claimRV call that consumes the > call result. In addition, it emits a call to > @llvm.objc.clang.arc.noop.use, which consumes the call result, to > prevent the middle-end passes from changing the return type of the > called function. This is currently done only when the target is arm64 > and the optimization level is higher than -O0. > > - ARC optimizer temporarily emits retainRV/claimRV calls after the calls > with the operand bundle in the IR and removes the inserted calls after > processing the function. > > - ARC contract pass emits retainRV/claimRV calls after the call with the > operand bundle. It doesn't remove the operand bundle on the call since > the backend needs it to emit the marker instruction. The retainRV and > claimRV calls are emitted late in the pipeline to prevent optimization > passes from transforming the IR in a way that makes it harder for the > ARC middle-end passes to figure out the def-use relationship between > the call and the retainRV/claimRV calls (which is the cause of > PR31925). 
> > - The function inliner removes an autoreleaseRV call in the callee if > nothing in the callee prevents it from being paired up with the > retainRV/claimRV call in the caller. It then inserts a release call if > claimRV is attached to the call since autoreleaseRV+claimRV is > equivalent to a release. If it cannot find an autoreleaseRV call, it > tries to transfer the operand bundle to a function call in the callee. > This is important since the ARC optimizer can remove the autoreleaseRV > returning the callee result, which makes it impossible to pair it up > with the retainRV/claimRV call in the caller. If that fails, it simply > emits a retain call in the IR if retainRV is attached to the call and > does nothing if claimRV is attached to it. > > - SCCP refrains from replacing the return value of a call with a > constant value if the call has the operand bundle. This ensures the > call always has at least one user (the call to > @llvm.objc.clang.arc.noop.use). > > - This patch also fixes a bug in replaceUsesOfNonProtoConstant where > multiple operand bundles of the same kind were being added to a call. > > Future work: > > - Use the operand bundle on x86-64. > > - Fix the auto upgrader to convert call+retainRV/claimRV pairs into > calls with the operand bundles. > > rdar://71443534 > > Differential Revision: https://reviews.llvm.org/D92808 This reverts commit `ed4718eccb`.	2021-03-03 15:51:40 +01:00
Jianzhou Zhao	ac4c1760b2	Fix the build error caused by D97570	2021-03-03 04:47:00 +00:00
Jianzhou Zhao	d866b9c99d	[dfsan] Propagate origin tracking at load This is a part of https://reviews.llvm.org/D95835. One issue is about origin load optimization: see the comments of useCallbackLoadLabelAndOrigin @gbalats This change may have some conflicts with your 8bit change. PTAL the change at visitLoad. Reviewed By: morehouse, gbalats Differential Revision: https://reviews.llvm.org/D97570	2021-03-03 04:32:30 +00:00
George Balatsouras	6ff18b08e6	[dfsan] Fix clang-tidy warnings This addresses ~50 clang-tidy warnings on dfsan instrumentation pass. It also contains some refactoring (all non-functional changes) to eliminate some variables and simplify code. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D97714	2021-03-02 17:37:45 -08:00
Andrei Elovikov	b24afec8ae	[NFCI][VPlan] Modify Recipes' print methods to honor Indent parameter Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D97787	2021-03-02 15:32:10 -08:00
Nikita Popov	3d8f842712	[LICM] Make promotion faster Even when MemorySSA-based LICM is used, an AST is still populated for scalar promotion. As the AST has quadratic complexity, a lot of time is spent in this step despite the existing access count limit. This patch optimizes the identification of promotable stores. The idea here is pretty simple: We're only interested in must-alias mod sets of loop invariant pointers. As such, only populate the AST with loop-invariant loads and stores (anything else is definitely not promotable) and then discard any sets which alias with any of the remaining, definitely non-promotable accesses. If we promoted something, check whether this has made some other accesses loop invariant and thus possible promotion candidates. This is much faster in practice, because we need to perform AA queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable) instead of O(NumTotal^2), and NumPromotable tends to be small. Additionally, promotable accesses have loop invariant pointers, for which AA is cheaper. This has a significant positive compile-time impact. We save ~1.8% geomean on CTMark at O3, with 6% on lencod in particular and 25% on individual files. Conceptually, this change is NFC, but may not be so in practice, because the AST is only an approximation, and can produce different results depending on the order in which accesses are added. However, there is at least no impact on the number of promotions (licm.NumPromoted) in test-suite O3 configuration with this change. Differential Revision: https://reviews.llvm.org/D89264	2021-03-02 22:10:48 +01:00
Simon Pilgrim	232f32f0da	[DSE] eliminateDeadStoresMemorySSA - fix "initialization is never read" clang-tidy warning. NFCI.	2021-03-02 15:01:33 +00:00
Alexey Bataev	a054e94e9e	[SLP]Merge reorder and reuse shuffles. It is possible to merge reuse and reorder shuffles and reduce the total cost of the vectorization tree/number of final instructions. Differential Revision: https://reviews.llvm.org/D94992	2021-03-02 06:39:47 -08:00
Juneyoung Lee	365f5e2475	[JumpThreading] Fix tryToUnfoldSelectInCurrBB to treat and/or and its select form equally This is a minor fix to update tryToUnfoldSelectInCurrBB to ignore select form of and/ors because the function does not look into binops as well	2021-03-02 18:35:18 +09:00
Ta-Wei Tu	ea1a1ebbc6	[NFC] Use std::swap in LoopInterchange	2021-03-02 11:42:48 +08:00
Fangrui Song	04c3040f41	[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF `__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not referenced via relocation in the translation unit. With `-z start-stop-gc` (D96914 https://sourceware.org/bugzilla/show_bug.cgi?id=27451), the linker no longer lets `__start_/__stop_` references retain them. Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make them retained by the linker. This patch changes most existing `UsedVars` cases to `CompilerUsedVars` to reflect the ideal state - if the binary format properly supports section based GC (dead stripping), `llvm.compiler.used` should be sufficient. `__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars` since we want them to be unconditionally retained by both compiler and linker. Behaviors on other COFF/Mach-O are not affected. Differential Revision: https://reviews.llvm.org/D97649	2021-03-01 13:43:23 -08:00
Arthur Eubanks	040c1b49d7	Move EntryExitInstrumentation pass location This seems to be more of a Clang thing rather than a generic LLVM thing, so this moves it out of LLVM pipelines and as Clang extension hooks into LLVM pipelines. Move the post-inline EEInstrumentation out of the backend pipeline and into a late pass, similar to other sanitizer passes. It doesn't fit into the codegen pipeline. Also fix up EntryExitInstrumentation not running at -O0 under the new PM. PR49143 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D97608	2021-03-01 10:08:10 -08:00
Florian Hahn	a6c81d3366	[VPlan] Remove recipes from back to front. Update the deletion order when destroying VPBasicBlocks. This ensures recipes that depend on earlier ones in the block are removed first. Otherwise this may cause issues when recipes have remaining users later in the block.	2021-03-01 16:06:30 +00:00
Florian Hahn	53dacb7b67	[LV] Generate RT checks up-front and remove them if required. This patch updates LV to generate the runtime checks just after cost modeling, to allow a more precise estimate of the actual cost of the checks. This information will be used in future patches to generate larger runtime checks in cases where the checks only make up a small fraction of the expected scalar loop execution time. The runtime checks are created up-front in a temporary block to allow better estimating the cost and un-linked from the existing IR. After deciding to vectorize, the checks are moved back. If deciding not to vectorize, the temporary block is completely removed. This patch is similar in spirit to D71053, but explores a different direction: instead of delaying the decision on whether to vectorize in the presence of runtime checks it instead optimistically creates the runtime checks early and discards them later if decided to not vectorize. This has the advantage that the cost-modeling decisions can be kept together and can be done up-front and thus preserving the general code structure. I think delaying (part) of the decision to vectorize would also make the VPlan migration a bit harder. One potential drawback of this patch is that we speculatively generate IR which we might have to clean up later. However it seems like the code required to do so is quite manageable. Reviewed By: lebedev.ri, ebrevnov Differential Revision: https://reviews.llvm.org/D75980	2021-03-01 10:48:04 +00:00
Juneyoung Lee	5419b67137	[SimplifyCFG] Update FoldTwoEntryPHINode to handle and/or of select and binop equally This is a minor change that fixes FoldTwoEntryPHINode to handle phis with and/ors of select form and binop form equally.	2021-03-01 13:34:51 +09:00
Kazu Hirata	d639120983	[llvm] Use set_is_subset (NFC)	2021-02-28 10:59:20 -08:00
Sanjay Patel	9502061bcc	[InstCombine] avoid infinite loop in demanded bits for select https://llvm.org/PR49205	2021-02-28 10:17:53 -05:00
William S. Moses	b077d82b00	[Attributor] Conditionally delete fns Allow the attributor to delete functions only if requested Differential Revision: https://reviews.llvm.org/D97238	2021-02-27 20:37:42 -05:00
Sanjay Patel	356cdabd3a	[SimplifyCFG] avoid illegal phi with both poison and undef In the example based on: https://llvm.org/PR49218 ...we are crashing because poison is a subclass of undef, so we merge blocks and create: PHI node has multiple entries for the same basic block with different incoming values! %k3 = phi i64 [ poison, %entry ], [ %k3, %g ], [ undef, %entry ] If both poison and undef values are incoming, we soften the poison values to undef. Differential Revision: https://reviews.llvm.org/D97495	2021-02-27 09:10:32 -05:00
Kazu Hirata	1d4a2f3778	[Transforms/Utils] Use range-based for loops (NFC)	2021-02-26 22:36:40 -08:00
Fangrui Song	bf176c49e8	[InstrProfiling] Use llvm.compiler.used instead of llvm.used for ELF Many optimizers (e.g. GlobalOpt/ConstantMerge) do not respect linker semantics for comdat and may not discard the sections as a unit. The interconnected `__llvm_prf_{cnts,data}` sections (in comdat for ELF) are similar to D97432: `__profd_` is not directly referenced, so `__profd_` may be discarded while `__profc_` is retained, breaking the interconnection. We currently conservatively add all such sections to `llvm.used` and let the linker do GC for ELF. In D97448, we will change GlobalObject's in the llvm.used list to use SHF_GNU_RETAIN, causing the metadata sections to be unnecessarily retained (some `check-profile` tests check for GC). Use `llvm.compiler.used` to retain the current GC behavior. Differential Revision: https://reviews.llvm.org/D97585	2021-02-26 16:14:03 -08:00
George Balatsouras	c9075a1c8e	[dfsan] Record dfsan metadata in globals This will allow identifying exactly how many shadow bytes were used during compilation, for when fast8 mode is introduced. Also, it will provide a consistent matching point for instrumentation tests so that the exact llvm type used (i8 or i16) for the shadow can be replaced by a pattern substitution. This is handy for tests with multiple prefixes. Reviewed by: stephan.yichao.zhao, morehouse Differential Revision: https://reviews.llvm.org/D97409	2021-02-26 14:42:46 -08:00
Jianzhou Zhao	a47d435bc4	[dfsan] Propagate origins for callsites This is a part of https://reviews.llvm.org/D95835. Each customized function has two wrappers. The first one dfsw is for the normal shadow propagation. The second one dfso is used when origin tracking is on. It calls the first one, and does additional origin propagation. Which one to use can be decided at instrumentation time. This is to ensure minimal additional overhead when origin tracking is off. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97483	2021-02-26 19:12:03 +00:00
Fangrui Song	b55f29c194	[SanitizerCoverage] Clarify llvm.used/llvm.compiler.used and partially fix unmatched metadata sections on Windows `__sancov_pcs` parallels the other metadata section(s). While some optimizers (e.g. GlobalDCE) respect linker semantics for comdat and retain or discard the sections as a unit, some (e.g. GlobalOpt/ConstantMerge) do not. So we have to conservatively retain all unconditionally in the compiler. When a comdat is used, the COFF/ELF linkers' GC semantics ensure the associated parallel array elements are retained or discarded together, so `llvm.compiler.used` is sufficient. Otherwise (MachO (see rL311955/rL311959), COFF special case where comdat is not used), we have to use `llvm.used` to conservatively make all sections retain by the linker. This will fix the Windows problem once internal linkage GlobalObject's in `llvm.used` are retained via `/INCLUDE:`. Reviewed By: morehouse, vitalybuka Differential Revision: https://reviews.llvm.org/D97432	2021-02-26 11:10:03 -08:00
Simon Pilgrim	455d43b951	[Utils] collectBitParts - bail for integers > 128-bits collectBitParts uses int8_t for the bit indices, leaving a 128-bit limit. We already test for this before calling collectBitParts, but rGb94c215592bd added truncate handling which meant we could end up processing wider integers. Thanks to @manojgupta for the repro.	2021-02-26 14:58:01 +00:00
Stephen Tozer	ec7b9b0c18	[InstCombine] Avoid redundant or out-of-order debug value sinking This patch modifies TryToSinkInstruction in the InstCombine pass, to prevent redundant debug intrinsics from being produced, and also prevent the intrinsics from being emitted in an incorrect order. It does this by ensuring that when this pass sinks an instruction and creates clones of the debug intrinsics that use that instruction, it inserts those debug intrinsics in their original order, and only inserts the last debug intrinsic for each variable in the Instruction's block. Differential revision: https://reviews.llvm.org/D95463	2021-02-26 13:04:33 +00:00
Evgeniy Brevnov	13a5cac2ba	Revert "[NARY-REASSOCIATE] Support reassociation of min/max" This reverts commit `83d134c3c4`.	2021-02-26 19:47:54 +07:00
Kazu Hirata	5fc9e30985	[Scalar] Use range-based for loops (NFC)	2021-02-25 19:54:38 -08:00
Jianzhou Zhao	c88fedef2a	[dfsan] Conservative solution to atomic load/store DFSan at store does store shadow data; store app data; and at load does load shadow data; load app data. When an application data is atomic, one overtainting case is thread A: load shadow thread B: store shadow thread B: store app thread A: load app If the application address had been used by other flows, thread A reads previous shadow, causing overtainting. The change is similar to MSan's solution. 1) enforce ordering of app load/store 2) load shadow after load app; store shadow before store app 3) do not track atomic store by resetting its shadow to be 0. The last one is to address a case like this. Thread A: load app Thread B: store shadow Thread A: load shadow Thread B: store app This approach eliminates overtainting as a trade-off between undertainting flows via shadow data race. Note that this change addresses only native atomic instructions, but does not support builtin libcalls yet. https://llvm.org/docs/Atomics.html#libcalls-atomic Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97310	2021-02-25 23:34:58 +00:00
James Y Knight	24539f1ef2	Add Alignment argument to IRBuilder CreateAtomicRMW and CreateAtomicCmpXchg. And then push those change throughout LLVM. Keep the old signature in Clang's CGBuilder for now -- that will be updated in a follow-on patch (D97224). The MLIR LLVM-IR dialect is not updated to support the new alignment attribute, but preserves its existing behavior. Differential Revision: https://reviews.llvm.org/D97223	2021-02-25 18:29:42 -05:00
Francis Visoiu Mistrih	fee9abe69c	[Remarks] Provide more information about auto-init calls This now analyzes calls to both intrinsics and functions. For intrinsics, grab the ones we know and care about (mem* family) and analyze the arguments. For calls, use TLI to get more information about the libcalls, then analyze the arguments if known. ``` auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes. [-Rpass-missed=annotation-remarks] int var[1024]; ^ ``` Differential Revision: https://reviews.llvm.org/D97489	2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih	4753a69a31	[Remarks] Provide more information about auto-init stores This adds support for analyzing the instruction with the !annotation "auto-init" in order to generate a more user-friendly remark. For now, support the store size, and whether it's atomic/volatile. Example: ``` auto-init.c:4:7: remark: Store inserted by -ftrivial-auto-var-init.Store size: 4 bytes. [-Rpass-missed=annotation-remarks] int var; ^ ``` Differential Revision: https://reviews.llvm.org/D97412	2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih	c49b600b2f	[Remarks] Emit remarks for "auto-init" !annotations Using the !annotation metadata, emit remarks pointing to code added by `-ftrivial-auto-var-init` that survived the optimizer. Example: ``` auto-init.c:4:7: remark: Initialization inserted by -ftrivial-auto-var-init. [-Rpass-missed=annotation-remarks] int buf[1024]; ^ ``` The tests are testing various situations like calls/stores/other instructions, with debug locations, and extra debug information on purpose: more patches will come to improve the reporting to make it more user-friendly, and these tests will show how the reporting evolves. Differential Revision: https://reviews.llvm.org/D97405	2021-02-25 15:14:09 -08:00
Adrian Prantl	1693180884	Add a nullptr check. This doesn't actually reproduce with a dbg.declare(i8* null, ...) which produces a non-null null Value, but I have seen this show up in crash logs. I'm suspecting that there may be another pass forcibly setting the operand to a nullptr.	2021-02-25 12:01:11 -08:00
Fangrui Song	4d63892acb	[SanitizerCoverage] Drop !associated on metadata sections In SanitizerCoverage, the metadata sections (`__sancov_guards`, `__sancov_cntrs`, `__sancov_bools`) are referenced by functions. After inlining, such a `__sancov_*` section can be referenced by more than one function, but its sh_link still refers to the original function's section. (Note: a SHF_LINK_ORDER section referenced by a section other than its linked-to section violates the invariant.) If the original function's section is discarded (e.g. LTO internalization + `ld.lld --gc-sections`), ld.lld may report a `sh_link points to discarded section` error. The above reasoning means that `!associated` is not appropriate to be called by an inlinable function. Non-interposable functions are inline candidates, so we have to drop `!associated`. A `__sancov_pcs` is not referenced by other sections but is expected to parallel a metadata section, so we have to make sure the two sections are retained or discarded at the same time. A section group does the trick. (Note: we have a module ctor, so `getUniqueModuleId` guarantees to return a non-empty string, and `GetOrCreateFunctionComdat` guarantees to return non-null.) For interposable functions, we could keep using `!associated`, but LTO can change the linkage to `internal` and allow such functions to be inlinable, so we have to drop `!associated`, too. To not interfere with section group resolution, we need to use the `noduplicates` variant (section group flag 0). (This allows us to get rid of the ModuleID parameter.) In -fno-pie and -fpie code (mostly dso_local), instrumented interposable functions have WeakAny/LinkOnceAny linkages, which are rare. So the section group header overhead should be low. This patch does not change the object file output for COFF (where `!associated` is ignored). Reviewed By: morehouse, rnk, vitalybuka Differential Revision: https://reviews.llvm.org/D97430	2021-02-25 11:59:23 -08:00
Jon Roelofs	7f6e331645	Support `#pragma clang section` directives on MachO targets rdar://59560986 Differential Revision: https://reviews.llvm.org/D97233	2021-02-25 09:30:10 -08:00
Rong Xu	6103b6ad69	[SampleFDO][NFC] Refactor: make SampleProfileLoaderBaseImpl a template class This patch makes SampleProfileLoaderBaseImpl a template class so it can be used in CodeGen transformation. Noticeable changes: * use one template parameter and use IRTraits to get other used types and type-specific functions. * remove the temporary "inline" keywords in previous refactor patch. * change the template function findEquivalencesFor to a regular function. This function has a single caller with type of PostDominatorTree. It's simpler to use the type directly because MachinePostDominatorTree is not a derived type of template DominatorTreeBase. Differential Revision: https://reviews.llvm.org/D96981	2021-02-25 08:26:17 -08:00
Evgeniy Brevnov	d0a6f8bb65	[NFC] Fix build failure after `83d134c3c4`	2021-02-25 18:43:00 +07:00
Evgeniy Brevnov	83d134c3c4	[NARY-REASSOCIATE] Support reassociation of min/max Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D88287	2021-02-25 18:22:39 +07:00
Xun Li	c38000a9fb	[Coroutine] Check indirect uses of alloca when checking lifetime info In the existing logic, we look at the lifetime.start marker of each alloca, and check all uses of the alloca, to see if any pair of the lifetime marker and a use of alloca crosses suspension point. This approach is unfortunately incorrect. A use of alloca does not need to be a direct use, but can be an indirect use through alias. Only checking direct uses can miss cases where indirect uses are crossing suspension point. This can be demonstrated in the newly added test case 007. In the test case, both x and y are only directly used prior to suspend, but they are captured into an alias, merged through a PHINode (so they couldn't be materialized), and used after CoroSuspend. If we only check whether the lifetime starts cross suspension points with direct uses, we will put the allocas to the stack, and then capture their addresses in the frame. Instead of fixing it in D96441 and D96566, this patch takes a different approach which I think is better. We still check the lifetime info in the same way as before, but with two differences: 1. The collection of lifetime.start is moved into AllocaUseVisitor to make the logic more concentrated. 2. When looking at lifetime.start and use pairs, we not only check the direct uses as before, but in this patch we check all uses collected by AllocaUseVisitor, which would include all indirect uses through alias. This will make the analysis more accurate without throwing away the lifetime optimization. Differential Revision: https://reviews.llvm.org/D96922	2021-02-24 18:29:23 -08:00
Sanjay Patel	a7cee55762	[InstCombine] fold fdiv with powi divisor (PR49147) This extends `b40fde062c` for the especially non-standard powi pattern. We want to avoid being completely wrong on the negation-of-int-min corner case, so I'm adding an extra FMF check for 'ninf' assuming that gives us the flexibility to handle that possibility. https://llvm.org/PR49147	2021-02-24 16:44:36 -05:00
Sanjay Patel	868d43fbd6	[InstCombine] add helper for x/pow(); NFC We at least want to add powi to this list, so split it off into a switch to reduce code duplication.	2021-02-24 16:44:36 -05:00
Duncan P. N. Exon Smith	01701646d5	Transforms: Clone distinct nodes in metadata mapper unless RF_ReuseAndMutateDistinctMDs This is a follow up to `22a52dfddc` and a revert of `df763188c9`. With this change, we only skip cloning distinct nodes in MDNodeMapper::mapDistinct if RF_ReuseAndMutateDistinctMDs, dropping the no-longer-needed local helper `cloneOrBuildODR()`. Skipping cloning in other cases is unsound and breaks CloneModule, which is why the textual IR for PR48841 didn't pass previously. This commit adds the test as: Transforms/ThinLTOBitcodeWriter/cfi-debug-info-cloned-type-references-global-value.ll Cloning less often exposed a hole in subprogram cloning in CloneFunctionInto thanks to df763188c9a1ecb1e7e5c4d4ea53a99fbb755903's test ThinLTO/X86/Inputs/dicompositetype-unique-alias.ll. If a function has a subprogram attachment whose scope is a DICompositeType that shouldn't be cloned, but it has no internal debug info pointing at that type, that composite type was being cloned. This commit plugs that hole, calling DebugInfoFinder::processSubprogram from CloneFunctionInto. As hinted at in 22a52dfddcefad4f275eb8ad1cc0e200074c2d8a's commit message, I think we need to formalize ownership of metadata a bit more so that ValueMapper/CloneFunctionInto (and similar functions) can deal with cloning (or not) metadata in a more generic, less fragile way. This fixes PR48841. Differential Revision: https://reviews.llvm.org/D96734	2021-02-24 12:57:52 -08:00
Sander de Smalen	5e19208d96	[InstructionCost] NFC: Fix up missing cases in LoopVectorize and CodeGenPrep. This fixes the types of a few more cost variables to be of type InstructionCost.	2021-02-24 14:30:03 +00:00
Pierre Gousseau	27830bc2b1	[asan] Avoid putting globals in a comdat section when targeting elf. Putting globals in a comdat for dead-stripping changes the semantics and can potentially cause false negative odr violations at link time. If odr indicators are used, we keep the comdat sections, as link time odr violations will be detected for the odr indicator symbols. This fixes PR 47925	2021-02-24 12:01:56 +00:00
Simon Pilgrim	b94c215592	[Utils] collectBitParts - add truncate() handling	2021-02-24 11:48:34 +00:00
Florian Hahn	6240f436dd	Recommit "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends." This reverts the revert commit `437f0bbcd5`. It adds a new toVPRecipeResult, which forces VPRecipeOrVPValueTy to be constructed with a VPRecipeBase *. This should address ambiguous constructor issues for recipe sub-types that also inherit from VPValue.	2021-02-24 10:36:02 +00:00
Dan Liew	7d3ef103b5	[ASan] Introduce a way to set different ways of emitting module destructors. Previously there was no way to control how module destructors were emitted by `ModuleAddressSanitizerPass`. However, we want language frontends (e.g. Clang) to be able to decide how to emit these destructors (if at all). This patch introduces the `AsanDtorKind` enum that represents the different ways destructors can be emitted. There are currently only two valid ways to emit destructors. * `Global` - Use `llvm.global_dtors`. This was the previous behavior and is the default. * `None` - Do not emit module destructors. The `ModuleAddressSanitizerPass` and the various wrappers around it have been updated to take the `AsanDtorKind` as an argument. The `-asan-destructor-kind=` command line argument has been introduced to make this easy to test from `opt`. If this argument is specified it overrides the value passed to the `ModuleAddressSanitizerPass` constructor. Note that `AsanDtorKind` is not `bool` because we will introduce a new way to emit destructors in a subsequent patch. Note that `AsanDtorKind` is given its own header file because if it is declared in `Transforms/Instrumentation/AddressSanitizer.h` it leads to compile error (Module is ambiguous) when trying to use it in `clang/Basic/CodeGenOptions.def`. rdar://71609176 Differential Revision: https://reviews.llvm.org/D96571	2021-02-23 20:01:21 -08:00
Juneyoung Lee	56d228a14e	[SimplifyCFG] Update passingValueIsAlwaysUndefined to check more attributes This is a simple patch to update SimplifyCFG's passingValueIsAlwaysUndefined to inspect more attributes. A new function `CallBase::isPassingUndefUB` checks attributes that imply noundef. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D97244	2021-02-24 10:40:50 +09:00
Fangrui Song	ef312951fd	collectUsedGlobalVariables: migrate SmallPtrSetImpl overload to SmallVecImpl overload after D97128 And delete the SmallPtrSetImpl overload. While here, decrease inline element counts from 8 to 4. See D97128 for the choice. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D97257	2021-02-23 16:09:06 -08:00
Fangrui Song	ed02f52d28	Fix unstable SmallPtrSet iteration issues due to collectUsedGlobalVariables While here, decrease inline element counts from 8 to 4. See D97128 for the choice. Depends on D97128 (which added a new SmallVecImpl overload for collectUsedGlobalVariables). Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D97139	2021-02-23 16:09:05 -08:00
Fangrui Song	3adb89bb9f	[ThinLTO] Make cloneUsedGlobalVariables deterministic Iterating on `SmallPtrSet<GlobalValue *, 8>` with more than 8 elements is not deterministic. Use a SmallVector instead because `Used` is guaranteed to contain unique elements. While here, decrease inline element counts from 8 to 4. The number of `llvm.used`/`llvm.compiler.used` elements is usually 0 or 1. For full LTO/hybrid LTO, the number may be large, so we need to be careful. According to tejohnson's analysis https://reviews.llvm.org/D97128#2582399 , 4 is good for a large project with WholeProgramDevirt, when available_externally vtables are placed in the llvm.compiler.used set. Differential Revision: https://reviews.llvm.org/D97128	2021-02-23 16:09:05 -08:00
Teresa Johnson	0a5949dcfa	[WPD] Fix handling of pure virtual base class The fix in `3c4c205060` caused an assert in the case of a pure virtual base class. In that case, the vTableFuncs list on the summary will be empty, so we were hitting the new assert that the linkage type was not available_externally. In the case of pure virtual, we do not want to assert, and additionally need to set VS so that we don't treat it conservatively and quit the analysis of the type id early. This exposed a pre-existing issue where we were not updating the vcall visibility on pure virtual functions when whole program visibility was specified. We were skipping updating the visibility on any global vars that didn't have any vTableFuncs, which meant all pure virtual were not updated, and the later analysis would block any devirtualization of calls that had a type id used on those pure virtual vtables (see the handling in the other code modified in this patch). Simply remove that check. It will mean that we may update the vcall visibility on global vars that aren't vtables, but that setting is ignored for any global vars that didn't have type metadata anyway. Added a new test case that asserted without removing the assert, and that requires the other fixes in this patch (updateVCallVisibilityInIndex and not skipping all vtables without virtual funcs) to get a successful devirtualization with index-only WPD. I added cases to test hybrid and regular LTO for completeness, although those already worked without the fixes here. With this final fix, a clang multistage bootstrap with WPD builds and runs all tests successfully. Differential Revision: https://reviews.llvm.org/D97126	2021-02-23 16:07:09 -08:00
Jianzhou Zhao	a05aa0dd5e	[dfsan] Update memset and dfsan_(set\|add)_label with origin tracking This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97302	2021-02-23 23:16:33 +00:00
Matthew Voss	6da7d31416	[llvm-profdata] Emit Error when Invalid MemOpSize Section is Created by llvm-profdata Under certain (currently unknown) conditions, llvm-profdata is outputting profiles that have two consecutive entries in the MemOPSize section for the value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch instruction with two cases for 0. As mentioned, we’re not quite sure what’s causing this to happen, but this patch prevents llvm-profdata from outputting a profile that has this problem and gives an error with a request for a reproducible. Differential Revision: https://reviews.llvm.org/D92074	2021-02-23 12:51:54 -08:00
Andrei Elovikov	3605b873f6	[NFC][VPlan] Use VPUser to store block's predicate Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96529	2021-02-23 11:08:27 -08:00
Florian Hahn	de40423c85	[LV] Ensure fixNonInductionPHIs uses a valid insertion point. In some cases, Builder's insertion point may be invalidated before using it in VPTransformState::get. Make sure the insertion point is up-to-date. This should fix various sanitizer errors, like https://lab.llvm.org/buildbot/#/builders/5/builds/4933/steps/9/logs/stdio	2021-02-23 18:51:05 +00:00
Florian Hahn	437f0bbcd5	Revert "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends." This reverts commit `4efa097eb4`, because some of the compilers used for some bots do not support automatic conversions to PointerUnion.	2021-02-23 16:57:21 +00:00
Florian Hahn	4efa097eb4	[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends. Generalize the return value of tryToCreateWidenRecipe to return either a newly created recipe or an existing VPValue. Use this to avoid creating unnecessary VPBlendRecipes. Fixes PR44800.	2021-02-23 16:52:03 +00:00
Juneyoung Lee	19c2e12947	[JumpThreading] Update computeValueKnownInPredecessors to recognize logical and/or patterns This allows JumpThreading's computeValueKnownInPredecessors to recognize select form of and/or patterns as well.	2021-02-24 00:06:10 +09:00
Nate Chandler	01b4890e47	Add @llvm.coro.async.size.replace intrinsic. The new intrinsic replaces the size in one specified AsyncFunctionPointer with the size in another. This ability is necessary for functions which merely forward to async functions such as those defined for partial applications. Reviewed By: aschwaighofer Differential Revision: https://reviews.llvm.org/D97229	2021-02-23 06:43:52 -08:00
David Green	dd2dbf7ee2	[TTI] Change getOperandsScalarizationOverhead to take Type args As a followup to D95291, getOperandsScalarizationOverhead was still using a VF as a vector factor if the arguments were scalar, and would assert on certain matrix intrinsics with differently sized vector arguments. This patch removes the VF arg, instead passing the Types through directly. This should allow it to more accurately compute the cost without having to guess at which operands will be vectorized, something difficult with more complex intrinsics. This adjusts one SVE test as it is now calling the wrong intrinsic vs veccall. Without invalid InstructCosts the cost of the scalarized intrinsic is too low. This should get fixed when the cost of scalarization is accounted for with scalable types. Differential Revision: https://reviews.llvm.org/D96287	2021-02-23 13:04:59 +00:00
David Green	bd4b61efbd	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-23 13:03:26 +00:00
Matteo Favaro	633e090528	[DSE] Allow ptrs defined in the entry block in IsGuaranteedLoopInvariant. The IsGuaranteedLoopInvariant function is making sure to check if the incoming pointer is guaranteed to be loop invariant, therefore I think the case where the pointer is defined in the entry block of a function automatically guarantees the pointer to be loop invariant, as the entry block of a function cannot have predecessors or be part of a loop. I implemented this small patch and tested it using ninja check-llvm-unit and ninja check-llvm. I added a contained test file that shows the problem and used opt -O3 -debug on it to make sure the case is not currently handled (in fact the debug log is showing that the DSE pass is bailing out when testing if the killer store is able to clobber the dead store). Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96979	2021-02-23 12:00:44 +00:00
Juneyoung Lee	481c62277d	[BuildLibCalls] Add noundef to allocator fns' size This is a patch to explicitly mark the size parameter of allocator functions like malloc/realloc/... as noundef. For C/C++: undef can be created from reading an uninitialized variable or padding. Calling a function with uninitialized variable is already UB. Calling malloc with padding value is.. something that's not expected. Padding bits may appear in a coerced aggregate, which doesn't apply to malloc's size. Therefore, malloc's size can be marked as noundef. For transformations that introduce malloc/realloc/..: I ran LLVM unit tests with an updated Alive2 semantics, and found no regression, so it seems okay. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97045	2021-02-23 13:58:03 +09:00
Kazu Hirata	4ed47858ab	[llvm] Use llvm::drop_begin (NFC)	2021-02-22 20:17:16 -08:00
ksyx	4125cabce1	[GVN] Fix a typo in comment NFC. Differential Revision: https://reviews.llvm.org/D97200 Reviewed By: fhahn	2021-02-23 10:39:34 +08:00
Jianzhou Zhao	7424efd5ad	[dfsan] Propagate origins at non-memory/phi/call instructions This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97200	2021-02-23 02:12:45 +00:00
Petr Hosek	c24b7a16b1	[InstrProfiling] Use ELF section groups for counters, data and values __start_/__stop_ references retain C identifier name sections such as __llvm_prf_*. Putting these into a section group disables this logic. The ELF section group semantics ensures that group members are retained or discarded as a unit. When a function symbol is discarded, this allows the linker to discard counters, data and values associated with that function symbol as well. Note that `noduplicates` COMDAT is lowered to zero-flag section group in ELF. We only set this for functions that aren't already in a COMDAT and for those that don't have available_externally linkage since we already use regular COMDAT groups for those. Differential Revision: https://reviews.llvm.org/D96757	2021-02-22 14:00:02 -08:00
Alexey Bataev	9a4dd4de9d	[SLP]No need to mark scatter load pointer as scalar as it gets vectorized. Pointer operand of scatter loads does not remain scalar in the tree (it gets vectorized) and thus must not be marked as the scalar that remains scalar in vectorized form. Differential Revision: https://reviews.llvm.org/D96818	2021-02-22 11:58:28 -08:00
Petr Hosek	4827492d9f	Revert "[InstrProfiling] Use ELF section groups for counters, data and values" This reverts commits: `5ca21175e0` `97184ab99c` The instrprof-gc-sections.c is failing on AArch64 LLD bot.	2021-02-22 11:13:55 -08:00
Florian Hahn	95daec6a84	[ConstraintElimination] Use unsigned > 0 instead of != 0. ICMP_NE predicates cannot be directly represented as a constraint. But we can use ICMP_UGT instead of ICMP_NE for %x != 0. See https://alive2.llvm.org/ce/z/XlLCsW	2021-02-22 17:54:36 +00:00
Nikita Popov	4125afc357	[MemCpyOpt] Fix handling of readnone byval arguments If the call is readnone, then there may not be any MemoryAccess associated with the call. Bail out in that case. This fixes the issue reported at https://reviews.llvm.org/D94376#2578312.	2021-02-22 18:48:31 +01:00
Nikita Popov	5e7e499b91	[JumpThreading] Clone noalias.scope.decl when threading blocks When cloning instructions during jump threading, also clone and adapt any declared scopes. This is primarily important when threading loop exits, because we'll end up with two dominating scope declarations in that case (at least after additional loop rotation). This addresses a loose thread from https://reviews.llvm.org/rG2556b413a7b8#975012. Differential Revision: https://reviews.llvm.org/D97154	2021-02-22 18:35:30 +01:00
Florian Hahn	c7ee57f1dc	[LV] Directly use incoming value for single VPBlendRecipes. VPBlendRecipes with single incoming (value, mask) pair are no-ops. Use the incoming value directly.	2021-02-22 16:10:08 +00:00
Florian Hahn	c11fd0df64	[VPlan] Skip VPWidenPHIRecipe in VPInterleavedAccessInfo. Update unit tests that did not expect VPWidenPHIRecipes after `15a74b64df`.	2021-02-22 10:35:09 +00:00
Florian Hahn	15a74b64df	[VPlan] Manage pairs of incoming (VPValue, VPBB) in VPWidenPHIRecipe. This patch extends VPWidenPHIRecipe to manage pairs of incoming (VPValue, VPBasicBlock) in the VPlan native path. This is made possible because we now directly manage defined VPValues for recipes. By keeping both the incoming value and block in the recipe directly, code-generation in the VPlan native path becomes independent of the predecessor ordering when fixing up non-induction phis, which currently can cause crashes in the VPlan native path. This fixes PR45958. Reviewed By: sguggill Differential Revision: https://reviews.llvm.org/D96773	2021-02-22 09:44:25 +00:00
Petr Hosek	5ca21175e0	[InstrProfiling] Use ELF section groups for counters, data and values __start_/__stop_ references retain C identifier name sections such as __llvm_prf_*. Putting these into a section group disables this logic. The ELF section group semantics ensures that group members are retained or discarded as a unit. When a function symbol is discarded, this allows the linker to discard counters, data and values associated with that function symbol as well. Note that `noduplicates` COMDAT is lowered to zero-flag section group in ELF. We only set this for functions that aren't already in a COMDAT and for those that don't have available_externally linkage since we already use regular COMDAT groups for those. Differential Revision: https://reviews.llvm.org/D96757	2021-02-21 16:13:06 -08:00
Nikita Popov	e0615bcd39	[Loads] Add optimized FindAvailableLoadedValue() overload (NFCI) FindAvailableLoadedValue() accepts an iterator by reference. If no available value is found, then the iterator will either be left at a clobbering instruction or the beginning of the basic block. This allows using FindAvailableLoadedValue() across multiple blocks. If this functionality is not needed, as is the case in InstCombine, then we can use a much more efficient implementation: First try to find an available value, and only perform clobber checks if we actually found one. As this function only looks at a very small number of instructions (6 by default) and usually doesn't find an available value, this saves many expensive alias analysis queries.	2021-02-21 18:42:56 +01:00
Kristina Bessonova	e97aab8d15	[ThinLTO] Fix import of multiply defined global variables Currently, if there is a module that contains a strong definition of a global variable and a module that has both a weak definition for the same global and a reference to it, it may result in an undefined symbol error while linking with ThinLTO. It happens because: * the strong definition becomes internal because it is read-only and can be imported; * the weak definition gets replaced by a declaration because it's non-prevailing; * the strong definition fails to be imported because the destination module already contains another definition of the global, even though this def is non-prevailing. The patch adds a check to computeImportForReferencedGlobals() that allows considering a global variable for import even if the module contains a definition of it, provided that this def has an interposable linkage type. Note that currently the check is based only on the linkage type (and this seems to be enough at the moment), but it might be worth taking into account whether the def is prevailing or not. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95943	2021-02-21 18:34:12 +02:00
Jianzhou Zhao	9524632fa2	[dfsan] Comment out unused methods by D97087 temporarily	2021-02-21 03:31:19 +00:00
Sanjay Patel	e772618f1e	[InstCombine] fold fdiv with exp/exp2 divisor (PR49147) Follow-up to: D96648 / `b40fde062` ...for the special-case base calls. From the earlier commit: This is unusual in the general (non-reciprocal) case because we need an extra instruction, but that should be better for general FP reassociation and codegen. We conservatively check for "arcp" FMF here as we do with existing fdiv folds, but it is not strictly necessary to have that.	2021-02-20 16:02:58 -05:00
Teresa Johnson	fde55a9c9b	[LTO] Fix cloning of llvm.used when splitting module Refines the fix in `3c4c205060` to only put globals whose defs were cloned into the split regular LTO module on the cloned llvm.used globals. This avoids an issue where one of the attached values was a local that was promoted in the original module after the module was cloned. We only need to have the values defined in the new module on those globals. Fixes PR49251. Differential Revision: https://reviews.llvm.org/D97013	2021-02-20 09:46:43 -08:00
Simon Pilgrim	609d0c9772	[InstCombine] matchBSwapOrBitReverse - remove pattern matching early-out. NFCI. recognizeBSwapOrBitReverseIdiom + collectBitParts have pattern matching to bail out early if a bswap/bitreverse pattern isn't possible - we should be able to rely on this instead without any notable change in compile time. This is part of a cleanup towards letting matchBSwapOrBitReverse /recognizeBSwapOrBitReverseIdiom use 'root' instructions that aren't ORs (FSHL/FSHRs in particular which can be prematurely created). Differential Revision: https://reviews.llvm.org/D97056	2021-02-20 13:15:34 +00:00
Dávid Bolvanský	cd54c57919	Reland "[Libcalls, Attrs] Annotate libcalls with noundef" Fixed Clang tests.	2021-02-20 06:18:48 +01:00
Dávid Bolvanský	94d034fb86	Revert "[Libcalls, Attrs] Annotate libcalls with noundef" This reverts commit `33b0c63775`. Bots are failing. Some Clang tests need to be updated too.	2021-02-20 04:18:42 +01:00
Dávid Bolvanský	33b0c63775	[Libcalls, Attrs] Annotate libcalls with noundef I think we can use the same logic here as for nonnull. strlen(X) - X must be noundef => valid pointer. For libcalls with a size arg, we add noundef only if the size is known and greater than 0 - so pointers must be noundef (valid ones). Reviewed By: jdoerfert, aqjune Differential Revision: https://reviews.llvm.org/D95122	2021-02-20 04:10:07 +01:00
Dávid Bolvanský	68e6025cf7	Revert "[BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly" This reverts commit `05d891a19e`.	2021-02-20 03:58:53 +01:00
Dávid Bolvanský	05d891a19e	[BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94850	2021-02-20 03:56:01 +01:00
Jianzhou Zhao	dab953c8e4	[dfsan] Add utils that get/set origins This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97087	2021-02-20 00:52:33 +00:00
Jianzhou Zhao	cb1f1aab90	[dfsan] Add origin address calculation This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97065	2021-02-19 21:30:07 +00:00
Jianzhou Zhao	efc8f3311b	[msan] Set cmpxchg shadow precisely In terms of https://llvm.org/docs/LangRef.html#cmpxchg-instruction, the return type of cmpxchg is a pair {ty, i1}, while I think we only wanted to set the shadow for the address 0th op, and it has type ty. Reviewed-by: eugenis Differential Revision: https://reviews.llvm.org/D97029	2021-02-19 20:23:23 +00:00
Wei Mi	4ffad1fb48	[SampleFDO] Add PromotedInsns to prevent repeated ICP. In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9, we use a 0 count value profile to memorize which target has been promoted and prevent repeated ICP for the same target, so we deleted PromotedInsns. However, I found that the implementation in that patch has some shortcomings that need to be fixed, otherwise there will still be repeated ICP. So I add PromotedInsns back temporarily. Will remove it after I get a thorough fix.	2021-02-19 10:01:49 -08:00
Benjamin Kramer	59f442e6bb	[LV] Fold single-use variable into assert. NFC.	2021-02-19 18:11:39 +01:00
Nikita Popov	71a8e4e7d6	[MemCopyOpt] Enable MemorySSA by default This enables use of MemorySSA instead of MemDep in MemCpyOpt. To allow this without significant compile-time impact, the MemCpyOpt pass is moved directly before DSE (in the cases where this was not already the case), which allows us to reuse the existing MemorySSA analysis. Unlike the MemDep-based implementation, the MemorySSA-based MemCpyOpt can also perform simple optimizations across basic blocks. Differential Revision: https://reviews.llvm.org/D94376	2021-02-19 18:06:25 +01:00
Florian Hahn	edc92a1c42	[LV] Remove VPCallback. Now that all state for generated instructions is managed directly in VPTransformState, VPCallBack is no longer needed. This patch updates the last use of `getOrCreateScalarValue` to instead manage the value directly in VPTransformState and removes VPCallback. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D95383	2021-02-19 12:50:41 +00:00
Nikita Popov	2f17ed294f	[DCE] Don't remove non-willreturn calls In both ADCE and BDCE (via DemandedBits) we should not remove instructions that are not guaranteed to return. This issue was pointed out by fhahn in the recent llvm-dev thread. Differential Revision: https://reviews.llvm.org/D96993	2021-02-19 12:35:40 +01:00
Nikita Popov	370addb996	[IR] Move willReturn() to Instruction This moves the willReturn() helper from CallBase to Instruction, so that it can be used in a more generic manner. This will make it easier to fix additional passes (ADCE and BDCE), and will give us one place to change if additional instructions should become non-willreturn (e.g. there has been talk about handling volatile operations this way). I have also included the IntrinsicInst workaround directly in here, so that it gets applied consistently. (As such this change is not entirely NFC -- FuncAttrs will now use this as well.) Differential Revision: https://reviews.llvm.org/D96992	2021-02-19 11:56:01 +01:00
Djordje Todorovic	1a2b3536ef	Reland "[Debugify] Make the debugify aware of the original (-g) Debug Info" As discussed on the RFC [0], I am sharing the set of patches that enables checking of original Debug Info metadata preservation in optimizations. The proof-of-concept/proposal can be found at [1]. The implementation from [1] was full of duplicated code, so this set of patches tries to merge this approach into the existing debugify utility. For example, the utility pass in the original-debuginfo-check mode could be invoked as follows: $ opt -verify-debuginfo-preserve -pass-to-test sample.ll Since this is a very early stage of the implementation, there is room for improvements such as: - Add support for the new pass manager - Add support for metadata other than DILocations and DISubprograms [0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ [1] https://github.com/djolertrk/llvm-di-checker Differential Revision: https://reviews.llvm.org/D82545 The test that was failing is now forced to use the old PM.	2021-02-18 23:29:22 -08:00
Xun Li	3bf8f162a0	[Coroutine] Relax CoroElide musttail check As discussed in D94834, we don't really need to do complicated analysis. It's safe to just drop the tail call attribute. Differential Revision: https://reviews.llvm.org/D96926	2021-02-18 19:36:11 -08:00
Wei Mi	5fb65c02ca	[SampleFDO] Stop repeated indirect call promotion for the same target. Found a problem in indirect call promotion in sample loader pass. Currently if an indirect call is promoted for a target, and if the parent function is inlined into some other function, the indirect call can be promoted for the same target again. That is redundant, can harm performance, and can cause excessive compile time in some extreme cases. The patch fixes the issue. If a target is promoted for an indirect call, the patch will write ICP metadata with the target call count being set to 0. In the later ICP in sample profile loader, if it sees a target has 0 count for an indirect call, it knows the target has been promoted and won't do indirect call promotion for the indirect call. The fix brings 0.1~0.2% performance on our search benchmark. Differential Revision: https://reviews.llvm.org/D96806	2021-02-18 17:01:32 -08:00
Jianzhou Zhao	7e658b2fdc	[dfsan] Instrument origin variable and function definitions This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse, gbalats Differential Revision: https://reviews.llvm.org/D96977	2021-02-18 23:50:05 +00:00
Hongtao Yu	e87b1b1d4e	[CSSPGO] Use callsite sample counts to annotate indirect call sites. With CSSPGO all indirect call targets are counted towards the original indirect call site in the profile, including both inlined and non-inlined targets. Therefore there is no need to look for callee entry counts. This also fixes the issue where callee entry count doesn't match callsite count due to the nature of CS sampling. I'm also cleaning up the original code that called `findIndirectCallFunctionSamples` just to compute the sum, the return value of which was discarded. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D96990	2021-02-18 14:52:34 -08:00
Nikita Popov	70e3c9a8b6	[BasicAA] Always strip single-argument phi nodes We can always look through single-argument (LCSSA) phi nodes when performing alias analysis. getUnderlyingObject() already does this, but stripPointerCastsAndInvariantGroups() does not. We still look through these phi nodes with the usual aliasPhi() logic, but sometimes get sub-optimal results due to the restrictions on value equivalence when looking through arbitrary phi nodes. I think it's generally beneficial to keep the underlying object logic and the pointer cast stripping logic in sync, insofar as it is possible. With this patch we get marginally better results: aa.NumMayAlias \| 5010069 \| 5009861 aa.NumMustAlias \| 347518 \| 347674 aa.NumNoAlias \| 27201336 \| 27201528 ... licm.NumPromoted \| 1293 \| 1296 I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(), as we're past the point where we can explicitly spell out everything that's getting stripped. Differential Revision: https://reviews.llvm.org/D96668	2021-02-18 23:07:50 +01:00
Ta-Wei Tu	f70cdc5b5c	[NPM] Properly reset parent loop after loop passes This fixes https://bugs.llvm.org/show_bug.cgi?id=49185 When `NDEBUG` is not set, `LPMUpdater` checks if the added loops have the same parent loop as the current one in `addSiblingLoops`. If multiple loop passes are executed through `LoopPassManager`, `U.ParentL` will be the same across all passes. However, the parent loop might change after running a loop pass, resulting in assertion failures in subsequent passes. This patch resets `U.ParentL` after running individual loop passes in `LoopPassManager`. Reviewed By: asbirlea, ychen Differential Revision: https://reviews.llvm.org/D96727	2021-02-19 02:50:53 +08:00
Jianzhou Zhao	406dc54903	[dfsan] Refactor defining TLS variables This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D96941	2021-02-18 18:04:21 +00:00
Jianzhou Zhao	2e6cd338c6	[dfsan] Refactor runtime functions checking This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D96940	2021-02-18 18:01:46 +00:00
Philip Reames	8666463889	[instcombine] Exploit UB implied by nofree attributes This patch simply implements the documented UB of the current nofree attributes as specified. It doesn't try to be fancy about inference (yet), it just implements the cases already specified and inferred. Note: When this lands, it may expose miscompiles. If so, please revert and provide a test case. It's likely the bug is in the existing inference code and without a relatively complete test case, it will be hard to debug. Differential Revision: https://reviews.llvm.org/D96349	2021-02-18 08:34:22 -08:00
Djordje Todorovic	c1e23894fc	Revert "[Debugify] Make the debugify aware of the original (-g) Debug Info" This reverts rG8ee7c7e02953. One test is failing, I'll reland this as soon as possible.	2021-02-18 02:04:27 -08:00
Djordje Todorovic	8ee7c7e029	[Debugify] Make the debugify aware of the original (-g) Debug Info As discussed on the RFC [0], I am sharing the set of patches that enables checking of original Debug Info metadata preservation in optimizations. The proof-of-concept/proposal can be found at [1]. The implementation from [1] was full of duplicated code, so this set of patches tries to merge this approach into the existing debugify utility. For example, the utility pass in the original-debuginfo-check mode could be invoked as follows: $ opt -verify-debuginfo-preserve -pass-to-test sample.ll Since this is a very early stage of the implementation, there is room for improvements such as: - Add support for the new pass manager - Add support for metadata other than DILocations and DISubprograms [0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ [1] https://github.com/djolertrk/llvm-di-checker Differential Revision: https://reviews.llvm.org/D82545	2021-02-18 01:52:16 -08:00
Joseph Huber	c3a3d20093	[LV] Add analysis remark for mixed precision conversions Floating point conversions inside vectorized loops have performance implications but are very subtle. The user could specify a floating point constant, or call a function without realizing that it will force a change in the vector width. An example of this behaviour is seen in https://godbolt.org/z/M3nT6c . The vectorizer should indicate when this happens because it is most likely unintended behaviour. This patch adds a simple check for this behaviour by following floating point stores in the original loop and checking if a floating point conversion operation occurs. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D95539	2021-02-17 21:37:08 -05:00
Teresa Johnson	d55d46f43b	[WPD] Add an optional checking mode for debugging devirtualization This adds an internal option -wholeprogramdevirt-check which if enabled will guard each devirtualization with a runtime check against the expected target, and an invocation of a debug trap if the check fails. This is useful for debugging WPD failures involving undefined behavior (e.g. casting to another class type not in the inheritance chain). Differential Revision: https://reviews.llvm.org/D95969	2021-02-17 16:46:15 -08:00
Rong Xu	7397905ab0	[SampleFDO] Third Try: Refactor SampleProfile.cpp Apply the patch for the third time after fixing buildbot failures. Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a header file and a cpp file. (3) Move the common codes (common options and callsiteIsHot()) to the common cpp file. (4) Add inline keyword to avoid duplicated symbols -- they will be removed later when the class is changed to a template. Differential Revision: https://reviews.llvm.org/D96455	2021-02-17 15:31:50 -08:00
Teresa Johnson	3c4c205060	[WPD][lld] Test handling of vtable definition from shared libraries Adds a lld test for a case that the handling added for dynamically exported symbols in `1487747e99` already fixes. Because isExportDynamic returns true when the symbol is SharedKind with default visibility, it will be treated as dynamically exported and block devirtualization when the definition of a vtable comes from a shared library. This is desirable as it is dangerous to devirtualize in that case, since there could be hidden overrides in the shared library. Typically that happens when the shared library header contains available externally definitions, which applications can override. An example is std::error_category, which is overridden in LLVM and causes failures after a self build with WPD enabled, because libstdc++ contains hidden overrides of the virtual base class methods. The regular LTO case in the new test already worked, but there are 2 fixes in this patch needed for the index-only case and the hybrid LTO case. For the index-only case, WPD should not simply ignore available externally vtables. A follow on fix will be made to clang to emit type metadata for those vtables, which the new test is modeling. For the hybrid case, we need to ensure when the module is split that any llvm.*used globals are cloned to the regular LTO split module so available externally vtable definitions are not prematurely deleted. Another follow on fix will add the equivalent gold test, which requires a small fix to the plugin to treat symbols in dynamic libraries the same way lld already is. Differential Revision: https://reviews.llvm.org/D96721	2021-02-17 12:49:24 -08:00
Vedant Kumar	c28622fbf3	Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp" Revert "[SampleFDO] Add missing #includes to unbreak modules build after D96455" This reverts commit `c73cbf218a`. Revert "[SampleFDO] Fix MSVC "namespace uses itself" warning (NFC)" This reverts commit `a23e6b321c`. Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp" This reverts commit `6fd5ccff72`. Still seeing link failures when building llc (or other tools), due to the new SampleProfileLoaderBaseImpl.h containing definitions that get duplicated across multiple TU's. ``` duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::findEquivalenceClasses(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::buildEdges(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::computeDominanceAndLoopInfo(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getFunctionLoc(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getBlockWeight(llvm::BasicBlock const)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockWeight(llvm::raw_ostream&, llvm::BasicBlock const) const' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockEquivalence(llvm::raw_ostream&, llvm::BasicBlock const)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 
'llvm::SampleProfileLoaderBaseImpl::printEdgeWeight(llvm::raw_ostream&, std::__1::pair<llvm::BasicBlock const, llvm::BasicBlock const*>)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) ```	2021-02-17 10:22:24 -08:00
William S. Moses	40862b1a74	[SROA] Propagate correct TBAA/TBAA Struct offsets SROA does not correctly account for offsets in TBAA/TBAA struct metadata. This patch creates functionality for generating new MD with the corresponding offset and updates SROA to use this functionality. Differential Revision: https://reviews.llvm.org/D95826	2021-02-17 11:59:00 -05:00
Ta-Wei Tu	0eeaec2a6d	[NFC] Refactor LoopInterchange into a loop-nest pass This is the preliminary patch of converting `LoopInterchange` pass to a loop-nest pass and has no intended functional change. Changes that are not loop-nest related are split to D96650. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D96644	2021-02-18 00:55:38 +08:00
Sjoerd Meijer	f78aa8b2c2	[LSR] Add a flag that overrides the target's preferred addressing mode This adds a new flag -lsr-preferred-addressing-mode to override the target's preferred addressing mode. It replaces flag -lsr-backedge-indexing, which is equivalent to preindexed addressing that is one of the options that -lsr-preferred-addressing-mode accepts. Differential Revision: https://reviews.llvm.org/D96855	2021-02-17 16:50:21 +00:00
Sanjay Patel	85294703a7	[InstCombine] fold fcmp-of-copysign idiom As discussed in: https://llvm.org/PR49179 ...this pattern shows up in library code. There are several potential generalizations as noted, but we need to be careful that we get FP special-values right, and it's not clear how much variation we should expect to see from this exact idiom.	2021-02-17 10:32:33 -05:00
Sjoerd Meijer	5a641cf194	Follow up of rGdea4a63e6359, which committed a slightly different version than intended.	2021-02-17 10:07:26 +00:00
Sjoerd Meijer	dea4a63e63	[LSR] Cleanup of getPreferredAddressingMode. NFC. This is a follow-up to D96600 and cleans up most calls to getPreferredAddressingMode. I.e., we really don't need to query the same things again and again, but get the preferred addressing mode once for each loop. So this should be a lot friendlier for compile times, especially if we start implementing getPreferredAddressingMode. Differential Revision: https://reviews.llvm.org/D96772	2021-02-17 09:45:29 +00:00
Rong Xu	6fd5ccff72	[SampleFDO] Reapply: Refactor SampleProfile.cpp Reapply patch after fixing buildbot failure. Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a header file and a cpp file. (3) Move the common codes (common options and callsiteIsHot()) to the common cpp file. Differential Revision: https://reviews.llvm.org/D96455	2021-02-16 16:43:21 -08:00
Mehdi Amini	c761fe77bd	Revert "[SampleFDO][NFC] Refactor SampleProfile.cpp" This reverts commit `310b35304c`. The build is broken with -DBUILD_SHARED_LIBS=ON : lib/ProfileData/CMakeFiles/LLVMProfileData.dir/SampleProfileLoaderBaseUtil.cpp.o: In function `llvm::sampleprofutil::callsiteIsHot(llvm::sampleprof::FunctionSamples const, llvm::ProfileSummaryInfo, bool)': SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x1a): undefined reference to `llvm::ProfileSummaryInfo::isColdCount(unsigned long) const' SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x28): undefined reference to `llvm::ProfileSummaryInfo::isHotCount(unsigned long) const' ...	2021-02-16 22:11:42 +00:00
Rong Xu	310b35304c	[SampleFDO][NFC] Refactor SampleProfile.cpp Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a header file and a cpp file. (3) Move the common codes (common options and callsiteIsHot()) to the common cpp file. Differential Revision: https://reviews.llvm.org/D96455	2021-02-16 11:18:21 -08:00
Arnold Schwaighofer	627cfd4394	[coro async] Don't promote allocas to the frame or rewrite swifterror if there are no suspend points Also don't call function to update the call graph if there are no clones. The function will fail. rdar://74277860 Differential Revision: https://reviews.llvm.org/D96620	2021-02-16 09:05:38 -08:00
Ta-Wei Tu	6b612a7baf	[NFC][LoopInterchange] Explicitly pass both `InnerLoop` and `OuterLoop` to `processLoop` This is a split patch of D96644. Explicitly pass both `InnerLoop` and `OuterLoop` to function `processLoop` to remove the need to swap elements in loop list and allow making loop list an `ArrayRef`. Also, fix inconsistent spellings of `OuterLoopId` and `Inner Loop Id` in debug log. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96650	2021-02-16 22:17:44 +08:00
Florian Hahn	f64c626069	[VPlan] Remove unused Phi member from VPWidenPHIRecipe (NFC). The member is not needed any longer after recent changes.	2021-02-16 13:53:06 +00:00
Kerry McLaughlin	ba1e150d03	[SVE] Add support for scalable vectorization of loops with int/fast FP reductions This patch enables scalable vectorization of loops with integer/fast reductions, e.g: ``` unsigned sum = 0; for (int i = 0; i < n; ++i) { sum += a[i]; } ``` A new TTI interface, isLegalToVectorizeReduction, has been added to prevent reductions which are not supported for scalable types from vectorizing. If the reduction is not supported for a given scalable VF, computeFeasibleMaxVF will fall back to using fixed-width vectorization. Reviewed By: david-arm, fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D95245	2021-02-16 13:50:06 +00:00
Sander de Smalen	00fe10c6a6	[SCEVExpander] Migrate costAndCollectOperands to use InstructionCost. This patch changes costAndCollectOperands to use InstructionCost for accumulated cost values. isHighCostExpansion will return true if the cost has exceeded the budget. Reviewed By: CarolineConcatto, ctetreau Differential Revision: https://reviews.llvm.org/D92238	2021-02-16 09:27:34 +00:00
Florian Hahn	54a14c264a	[VPlan] Manage scalarized values using VPValues. This patch updates codegen to use VPValues to manage the generated scalarized instructions. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92285	2021-02-16 09:04:10 +00:00
Akira Hatanaka	32dc79c5ef	[ObjC][ARC] Do not perform code motion on precise release calls This fixes a bug where an object can get deallocated before reaching the end of its full formal lifetime. rdar://72110887 rdar://74123176	2021-02-15 17:39:37 -08:00
Duncan P. N. Exon Smith	22a52dfddc	TransformUtils: Fix metadata handling in CloneModule (and improve CloneFunctionInto) This commit fixes how metadata is handled in CloneModule to be sound, and improves how it's handled in CloneFunctionInto (although the latter is still awkward when called within a module). Ruiling Song pointed out in PR48841 that CloneModule was changed to unsoundly use the RF_ReuseAndMutateDistinctMDs flag (renamed in `fa35c1f80f` for clarity). This flag papered over a crash caused by other various changes made to CloneFunctionInto over the past few years that made it unsound to use cloning between different modules. (This commit partially addresses PR48841, fixing the repro from preprocessed source but not textual IR. MDNodeMapper::mapDistinctNode became unsound in `df763188c9` and this commit does not address that regression.) RF_ReuseAndMutateDistinctMDs is designed for the IRMover to use, avoiding unnecessary clones of all referenced metadata when linking between modules (with IRMover, the source module is discarded after linking). It never makes sense to use when you're not discarding the source. This commit drops its incorrect use in CloneModule. Sadly, the right thing to do with metadata when cloning a function is complicated, and this patch doesn't totally fix it. The first problem is that there are two different types of referenceable metadata and it's not obvious what to do with one of them when remapping. - `!0 = !{!1}` is metadata's version of a constant. Programmatically it's called "uniqued" (probably a better term would be "constant") because, like `ConstantArray`, it's stored in uniquing tables. Once it's constructed, it's illegal to change its arguments. - `!0 = distinct !{!1}` is a bit closer to a global variable. It's legal to change the operands after construction. What should be done with distinct metadata when cloning functions within the same module? - Should new, cloned nodes be created? 
- Should all references point to the same, old nodes? The answer depends on whether that metadata is effectively owned by a function. And that's the second problem. Referenceable metadata's ownership model is not clear or explicit. Technically, it's all stored on an LLVMContext. However, any metadata that is `distinct`, that transitively references a `distinct` node, or that transitively references a GlobalValue is specific to a Module and is effectively owned by it. More specifically, some metadata is effectively owned by a specific Function within a module. Effectively function-local metadata was introduced somewhere around `c10d0e5ccd`, which made it illegal for two functions to share a DISubprogram attachment. When cloning a function within a module, you need to clone the function-local debug info and suppress cloning of global debug info (the status quo suppresses cloning some global debug info but not all). When cloning a function to a new/different module, you need to clone all of the debug info. Here's what I think we should do (eventually? soon? not this patch though): - Distinguish explicitly (somehow) between pure constant metadata owned by the LLVMContext, global metadata owned by the Module, and local metadata owned by a GlobalValue (such as a function). - Update CloneFunctionInto to trigger cloning of all "local" metadata (only), perhaps by adding a bit to RemapFlag. Alternatively, split out a separate function CloneFunctionMetadataInto to prime the metadata map that callers are updated to call ahead of time as appropriate. Here's the somewhat more isolated fix in this patch: - Converted the `ModuleLevelChanges` parameter to `CloneFunctionInto` to an enum called `CloneFunctionChangeType` that is one of LocalChangesOnly, GlobalChanges, DifferentModule, and ClonedModule. - The code maintaining the "functions uniquely own subprograms" invariant is now only active in the first two cases, where a function is being cloned within a single module. 
That's necessary because this code inhibits cloning of (some) "global" metadata that's effectively owned by the module. - The code maintaining the "all compile units must be explicitly referenced by !llvm.dbg.cu" invariant is now only active in the DifferentModule case, where a function is being cloned into a new module in isolation. - CoroSplit.cpp's call to CloneFunctionInto in CoroCloner::create uses LocalChangesOnly, since `fa635d730f` only set `ModuleLevelChanges` to trigger cloning of local metadata. - CloneModule drops its unsound use of RF_ReuseAndMutateDistinctMDs and special handling of !llvm.dbg.cu. - Fixed some outdated header docs and left a couple of FIXMEs. Differential Revision: https://reviews.llvm.org/D96531	2021-02-15 11:56:00 -08:00
Sjoerd Meijer	357237e93e	Recommit "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `effc3b0799`, with the build problem fixed.	2021-02-15 11:33:00 +00:00
Max Kazantsev	e3c759bd58	[LoopLoadElim] Pass ScalarEvolution in old pass manager. PR49141 Loop canonicalization may end up deleting blocks from the CFG, and Scalar Evolution may still keep cached references to those blocks unless updated properly.	2021-02-15 18:08:23 +07:00
Sjoerd Meijer	effc3b0799	Revert "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `cd6de0e8de`.	2021-02-15 11:01:23 +00:00
Sjoerd Meijer	cd6de0e8de	[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into getPreferredAddressingMode() so that we have one interface to steer LSR in generating the preferred addressing mode. Differential Revision: https://reviews.llvm.org/D96600	2021-02-15 10:44:15 +00:00
Florian Hahn	3df5d5aace	[ConstraintElimination] Fix variables used for pattern matching. Re-using the matched variable in the pattern does not work as expected. This patch fixes that by introducing a new variable for the 2nd level match.	2021-02-14 18:42:37 +00:00
Kazu Hirata	910e2d1e57	[llvm] Use llvm::is_contained (NFC)	2021-02-14 08:36:20 -08:00
Sanjay Patel	b40fde062c	[InstCombine] fold fdiv with pow divisor (PR49147) This is unusual in the general (non-reciprocal) case because we need an extra instruction, but that should be better for general FP reassociation and codegen. We conservatively check for "arcp" FMF here as we do with existing fdiv folds, but it is not strictly necessary to have that. This is part of solving: https://llvm.org/PR49147 (The powi variant potentially has a different constraint.) Differential Revision: https://reviews.llvm.org/D96648	2021-02-14 08:07:36 -05:00
Juneyoung Lee	ed253ef772	[LoopVectorize] Fix VPRecipeBuilder::createEdgeMask to correctly generate the mask This patch fixes pr48832 by correctly generating the mask when a poison value is involved. Consider this CFG (which is a part of the input): ``` for.body: ; preds = %for.cond br i1 true, label %cond.false, label %land.rhs land.rhs: ; preds = %for.body br i1 poison, label %cond.end, label %cond.false cond.false: ; preds = %for.body, %land.rhs br label %cond.end cond.end: ; preds = %land.rhs, %cond.false %cond = phi i32 [ 0, %cond.false ], [ 1, %land.rhs ] ``` The path for.body -> land.rhs -> cond.end should be taken when 'select i1 false, i1 poison, i1 false' holds (which means it's never taken); but VPRecipeBuilder::createEdgeMask was emitting 'and i1 false, poison' instead. The former one successfully blocks poison propagation whereas the latter one doesn't, making the condition poison and thus causing the miscompilation. SimplifyCFG has a similar bug (which didn't expose a real-world bug yet), and a patch for this is also ongoing (see https://reviews.llvm.org/D95026). Reviewed By: bjope Differential Revision: https://reviews.llvm.org/D95217	2021-02-14 21:12:34 +09:00
Teresa Johnson	a80232bd5f	[LTT] Address post-review comments (NFC) Implement some post-review cleanup suggestions for D96083.	2021-02-13 15:52:59 -08:00
Tyker	642e9225c6	reland [InstCombine] convert assumes to operand bundles InstCombine will convert the nonnull and alignment assumptions that use the boolean condition to assumptions that use operand bundles when knowledge retention is enabled. Differential Revision: https://reviews.llvm.org/D82703	2021-02-13 13:03:11 +01:00
Wei Wang	80dc0661bd	[LTO] Perform DSOLocal propagation in combined index Perform DSOLocal propagation within summary list of every GV. This avoids the repeated query of this information during function importing. Differential Revision: https://reviews.llvm.org/D96398	2021-02-12 22:58:26 -08:00
Arnold Schwaighofer	e760ec2a01	[coro] Add support for polymorphic return typed coro.suspend.async This allows for suspend point specific resume function types. Return values from a suspend point can therefore be modelled as arguments to the resume function. Allowing for directly passed return types. Differential Revision: https://reviews.llvm.org/D96136	2021-02-12 10:08:00 -08:00
Akira Hatanaka	ed4718eccb	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. 
If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-12 09:51:57 -08:00
Kerry McLaughlin	fea06efe7c	[SVE][LoopVectorize] Support for vectorization of loops with function calls Changes `getScalarizationOverhead` to return an invalid cost for scalable VFs and adds some simple tests for loops containing a function for which there is a vectorized variant available. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96356	2021-02-12 13:47:43 +00:00
Sanjay Patel	79b1b4a581	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552	2021-02-12 08:13:50 -05:00
Florian Hahn	85fe5c9345	[VPlan] Make VPRecipeBase inherit from VPUser directly (NFC). The individual recipes have been updated to manage their operands using VPUser a while back. Now that the transition is done, we can instead make VPRecipeBase a VPUser and get rid of the toVPUser helper.	2021-02-12 13:06:58 +00:00
David Sherwood	01b87444cb	[NFC][Analysis] Change struct VecDesc to use ElementCount This patch changes the VecDesc struct to use ElementCount instead of an unsigned VF value, in preparation for future work that adds support for vectorized versions of math functions using scalable vectors. Since all I'm doing in this patch is switching the type I believe it's a non-functional change. I changed getWidestVF to now return both the widest fixed-width and scalable VF values, but currently the widest scalable value will be zero. Differential Revision: https://reviews.llvm.org/D96011	2021-02-12 11:07:58 +00:00
David Sherwood	9700228abc	[Analysis] Change VFABI::mangleTLIVectorName to use ElementCount Adds support for mangling TLI vector names for scalable vectors. Differential Revision: https://reviews.llvm.org/D96338	2021-02-12 09:38:12 +00:00
Kazu Hirata	9dc62d1dc1	[PGO] Drop unnecessary const from return types (NFC)	2021-02-11 23:31:29 -08:00
Hongtao Yu	de40f6d623	[CSSPGO] Process functions in a top-down order on a dynamic call graph. Functions are currently processed by the sample profiler loader in a top-down order defined by the static call graph. The order is being adjusted to be a top-down order based on the input context-sensitive profile. One benefit is that the processing order of caller and callee in one SCC would follow the context order in the profile to favor more inlining. Another benefit is that the processing order of caller and callee through an indirect call (which is not on the static call graph) can be honored which in turn allows for more inlining. The profile top-down order for SCC is also extended to support non-CS profiles. Two switches `-mllvm -use-profile-indirect-call-edges` and `-mllvm -use-profile-top-down-order` are being introduced. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95988	2021-02-11 12:36:59 -08:00
Michael Kruse	606aa622b2	Revert "[AssumptionCache] Avoid dangling llvm.assume calls in the cache" This reverts commit `b7d870eae7` and the subsequent fix "[Polly] Fix build after AssumptionCache change (D96168)" (commit `e6810cab09`). It caused nondeterminism in the output, such that e.g. the polly-x86_64-linux buildbot failed occasionally.	2021-02-11 12:17:38 -06:00
Sander de Smalen	703130fb01	[TTI] Change TargetTransformInfo::getMinimumVF to return ElementCount This will be needed in the loop-vectorizer where the minimum VF requested may be a scalable VF. getMinimumVF now takes an additional operand 'IsScalableVF' that indicates whether a scalable VF is required. Reviewed By: kparzysz, rampitec Differential Revision: https://reviews.llvm.org/D96020	2021-02-11 09:08:48 +00:00
Markus Lavin	9498315c9b	Expand masked mem intrinsics correctly wrt big-endian Need to take endianness into account when doing vector to scalar casts such as %bc = bitcast <8 x i1> %v to i8 Companion commit for https://reviews.llvm.org/D94867 Upload in response to https://lists.llvm.org/pipermail/llvm-dev/2021-January/147862.html Attempting to document the actual memory layout rules for vectors in https://reviews.llvm.org/D94964 Differential Revision: https://reviews.llvm.org/D94765	2021-02-11 08:59:52 +00:00
Sander de Smalen	be9bbb57f4	[LoopVectorize] NFC: Change selectVectorizationFactor to work on ElementCount. This patch is NFC and changes occurrences of `unsigned Width` and `unsigned i` to work on type ElementCount instead. This patch is a preparatory patch with the ultimate goal of making `computeMaxVF()` return both a max fixed VF and a max scalable VF, so that `selectVectorizationFactor()` can pick the most cost-effective vectorization factor. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96019	2021-02-11 08:47:59 +00:00
Kazu Hirata	d12a0f4fc0	[GCOV] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2021-02-10 20:01:18 -08:00
Duncan P. N. Exon Smith	fa35c1f80f	ValueMapper: Rename RF_MoveDistinctMDs => RF_ReuseAndMutateDistinctMDs, NFC Rename the `RF_MoveDistinctMDs` flag passed into `MapValue` and `MapMetadata` to `RF_ReuseAndMutateDistinctMDs` in order to more precisely describe its effect and clarify the header documentation. Found this while helping to investigate PR48841, which pointed out an unsound use of the flag in `CloneModule()`. For now I've just added a FIXME there, but I'm hopeful that the new (more precise) name will prevent other similar errors.	2021-02-10 16:53:21 -08:00
Benjamin Kramer	8fb4a4f7bb	[SampleFDO] Silence -Wnon-virtual-dtor warning There's no polymorphic deletion happening here.	2021-02-10 23:37:15 +01:00
Rong Xu	db0d7d0ba9	[SampleFDO][NFC] Refactor SampleProfileLoader to reuse in CodeGen Break SampleProfileLoader into a base and a derived class. The base class (SampleProfileLoaderBaseImpl) includes the common code for the IR and MachineIR (CodeGen) sample loaders. It will be templatized in a later patch. Inline- and probe-related code will remain in the derived class SampleProfileLoader and stay in SampleProfile.cpp. We need to refactor some functions: (1) getInstWeight() to enable the code sharing -- put the core into getInstWeightImpl(). (2) emitAnnotation() and propagateWeights() to carve out the code specific to SampleProfileLoader. (3) make getInstWeight() and findFunctionSamples() virtual and override them in SampleProfileLoader as they need to access the fields in the derived class. Differential Revision: https://reviews.llvm.org/D95832	2021-02-10 13:29:15 -08:00
Hongtao Yu	1cb47a063e	[CSSPGO] Unblock optimizations with pseudo probe instrumentation. The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking is intentional (such as blocking code merge) for better counts quality, while the rest is accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include: 1. IR InstCombine, sinking load operation to shorten lifetimes. 2. MIR LiveRangeShrink, similar to #1 3. MIR TwoAddressInstructionPass, i.e., opeq transform 4. MIR function argument copy elision 5. IR stack protection (not perf-critical, but nice to have). Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95982	2021-02-10 12:43:17 -08:00
Adrian Prantl	19fc8eede4	Add missing nullptr check. salvageDebugInfoImpl() may fail and return a nullptr.	2021-02-10 12:15:24 -08:00
Sanjay Patel	6e2053983e	[InstCombine] fold lshr(mul X, SplatC), C2 This is a special-case multiply that replicates bits of the source operand. We need this fold to avoid regression if we make canonicalization to `mul` more aggressive for shl+or patterns. I did not see a way to make Alive generalize the bit width condition for even-number-of-bits only, but an example of the proof is: Name: i32 Pre: isPowerOf2(C1 - 1) && log2(C1) == C2 && (C2 * 2 == width(C2)) %m = mul nuw i32 %x, C1 %t = lshr i32 %m, C2 => %t = and i32 %x, C1 - 2 Name: i14 %m = mul nuw i14 %x, 129 %t = lshr i14 %m, 7 => %t = and i14 %x, 127 https://rise4fun.com/Alive/e52	2021-02-10 15:02:31 -05:00
Sander de Smalen	9db6e97a86	[LoopVectorize] NFC: Change computeFeasibleMaxVF to operate on ElementCount. This patch is NFC and changes occurrences of `unsigned MaxVectorSize` to work on type ElementCount. This patch is a preparatory patch with the ultimate goal of making `computeMaxVF()` return both a max fixed VF and a max scalable VF, so that `selectVectorizationFactor()` can pick the most cost-effective vectorization factor. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D96018	2021-02-10 08:52:10 +00:00
Tyker	5652e192fc	Revert "[InstCombine] convert assumes to operand bundles" This reverts commit `5eb2e994f9`.	2021-02-10 01:32:00 +01:00
Florian Hahn	fd8afa41eb	[VPlan] Use VPUser to manage CondBit VP blocks keep track of a condition, which is a VPValue. This patch updates VPBlockBase to manage the value using VPUser, so replaceAllUsesWith properly updates the condition bit as well. This is required to enable VP2VP transformations and it helps with simplifying some of the code required to manage condition bits. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D95382	2021-02-09 21:53:50 +00:00
Johannes Doerfert	81429abd83	[Attributor][FIX] Do not create UB by introducing a `noundef undef` This was reported as PR49104. The reproducer uses varargs but the issue is the same: we know an argument is dead but can't change the signature for some reason. The PR49104 situation was: We are in a CG-SCC traversal and we remove all the uses of an argument and thereby prove it dead. However, if we do not remove the argument, via signature rewrite, we need to ensure that the `undef` we introduce at the call site doesn't clash with a `noundef` attribute.	2021-02-09 13:02:38 -06:00
Tyker	5eb2e994f9	[InstCombine] convert assumes to operand bundles InstCombine will convert the nonnull and alignment assumptions that use the boolean condition to assumptions that use operand bundles when knowledge retention is enabled. Differential Revision: https://reviews.llvm.org/D82703	2021-02-09 19:33:53 +01:00
Jianzhou Zhao	9887fdebd6	[dfsan] Refactor loadShadow To simplify the review of https://reviews.llvm.org/D95835. Reviewed-by: gbalats, morehouse Differential Revision: https://reviews.llvm.org/D96180	2021-02-09 17:21:41 +00:00
Nico Weber	de1966e542	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `4a64d8fe39`. Makes clang crash when building trivial iOS programs, see comment after https://reviews.llvm.org/D92808#2551401	2021-02-09 11:06:32 -05:00
Chuanqi Xu	88d7876e1e	[NFC] [Coroutine] Remove Unused Variables	2021-02-09 15:55:06 +08:00
Kazu Hirata	302313a264	[Transforms] Use range-based for loops (NFC)	2021-02-08 22:33:53 -08:00
Kazu Hirata	de6c49ae31	[Transforms/Utils] Drop unnecessary const from a return type (NFC) Identified with const-return-type.	2021-02-08 22:33:49 -08:00
Jinsong Ji	9202806241	Revert "[CostModel] Remove VF from IntrinsicCostAttributes" This reverts commit `502a67dd7f`. This exposes a failure in the test-suite build on PowerPC; reverting to unblock the buildbot first. Dave will re-commit in https://reviews.llvm.org/D96287. Thanks Dave.	2021-02-09 02:14:14 +00:00
Arthur Eubanks	0eda454796	[SimpleLoopUnswitch] Don't non-trivially unswitch loops that are unsafe to clone Non-trivial unswitching can clone loops. The legacy -loop-unswitch pass also checks for this. Fixes PR49085. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D96288	2021-02-08 13:19:24 -08:00
Jianzhou Zhao	64b448b983	[dfsan] Refactor visitCallBase To simplify the review of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D96177	2021-02-08 19:55:18 +00:00
Florian Hahn	68dc90b347	[ConstraintElimination] Decompose a few more GEP indices. This patch adds handling for zero-extended GEP indices.	2021-02-08 18:06:38 +00:00
Florian Hahn	1f1f037ed3	[ConstraintElimination] Improve index handling during constraint building. This patch improves the index management during constraint building. Previously, the code rejected constraints that used values not in Value2Index, even when the coefficients of the new indices were 0 after combining (if ShouldAdd was 0). In those cases, no new indices need to be added. Instead of adding to Value2Index directly, add new indices to the NewIndices map. The caller can then check if it needs to add any new indices. This enables simplifying constraints like `a + x <= a + n` to `x <= n`, even if there is no constraint for `a` directly.	2021-02-08 13:05:13 +00:00
Florian Hahn	ca268ed285	[ConstraintElimination] Decompose zext for unsigned compares. For unsigned compares, zext should be a no-op and we can add the extended value to the constraint system.	2021-02-07 20:53:06 +00:00
Florian Hahn	3bb6dc0b26	[LV] Replace some uses of VectorLoopValueMap with VPTransformState (NFC) This patch updates some places where VectorLoopValueMap is accessed directly to instead go through VPTransformState. As we move towards managing created values exclusively in VPTransformState, this ensures the use always can fetch the correct value. This is in preparation for D92285, which switches to managing scalarized values through VPValues. In the future, the various fix* functions should be moved directly into the VPlan codegen stage. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D95757	2021-02-07 18:28:21 +00:00
Kazu Hirata	be23012d5a	[Transforms/Utils] Use range-based for loops (NFC)	2021-02-07 09:49:36 -08:00
Sanjay Patel	6fd91be354	[Reassociate] allow or->add with shl operands As discussed in: https://llvm.org/PR49055 We invert instcombine's add->or transform here because it makes it easier to identify factorization transforms like the mul in the motivating test. This extends the logic added with: https://reviews.llvm.org/rG70472f3 https://reviews.llvm.org/rG93f3d7f (I intentionally kept the formatting fix in this patch to provide more context about the calling logic.)	2021-02-07 09:45:19 -05:00
Florian Hahn	853c52c988	[ConstraintElimination] Require GEPs to be inbounds for decomposition. During decomposition, we assume the no-wrap properties of inbounds GEPs. Make sure the decomposed GEP is actually inbounds.	2021-02-07 11:08:53 +00:00
Johannes Doerfert	b7d870eae7	[AssumptionCache] Avoid dangling llvm.assume calls in the cache PR49043 exposed a problem when it comes to RAUW llvm.assumes. While D96106 would fix it for GVNSink, it seems a more general concern. To avoid future problems this patch moves away from the vector of weak reference model used in the assumption cache. Instead, we track the llvm.assume calls with a callback handle which will remove itself from the cache if the call is deleted. Fixes PR49043. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96168	2021-02-06 12:18:39 -06:00
Teresa Johnson	3a27933ec2	[LTT] Don't attempt to lower type tests used only by assumes Type tests used only by assumes were originally for devirtualization, but are meant to be kept through the first invocation of LTT so that they can be used for additional optimization. In the regular LTO case where the IR is analyzed we may find a resolution for the type test and end up rewriting the associated vtable global, which can have implications on section splitting. Simply ignore these type tests. Fixes PR48245. Differential Revision: https://reviews.llvm.org/D96083	2021-02-06 09:02:10 -08:00
Sander de Smalen	79a6cfc29e	NFC: Migrate LoopIdiomRecognize to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html	2021-02-06 14:39:19 +00:00
Sander de Smalen	ae27274b2f	NFC: Migrate LoopFlatten to work on InstructionCost. This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96029	2021-02-06 11:47:04 +00:00
Kazu Hirata	ea3175c15b	[Transforms/Instrumentation] Use range-based for loops (NFC)	2021-02-05 21:02:08 -08:00
Sidharth Baveja	22ebbc4765	[LoopUnrollAndJam] Only allow loops with single exit(ing) blocks Summary: This resolves an issue posted on Bugzilla. https://bugs.llvm.org/show_bug.cgi?id=48764 In this issue, the loop had multiple exit blocks, which caused the function getExitBlock to return a nullptr, which in turn hit the assert. This patch ensures that only loops with one exit block are allowed to be unrolled and jammed. Reviewed By: Whitney, Meinersbur, dmgreen Differential Revision: https://reviews.llvm.org/D95806	2021-02-05 16:10:53 +00:00
Akira Hatanaka	4a64d8fe39	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `3fe3946d9a` without the changes made to lib/IR/AutoUpgrade.cpp, which was violating layering. Original commit message: Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). 
- The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 06:09:42 -08:00
Arnold Schwaighofer	8a7f5ad0fd	We can only move static allocas into the resume entry points Dynamic allocas that still exist have been verified to be only used 'locally', not across a suspend point. rdar://73903220 Differential Revision: https://reviews.llvm.org/D96071	2021-02-05 06:06:10 -08:00
Akira Hatanaka	2fbbb18c1d	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `3fe3946d9a`. The commit violates layering by including a header from Analysis in lib/IR/AutoUpgrade.cpp.	2021-02-05 06:00:05 -08:00
Akira Hatanaka	3fe3946d9a	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. 
It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 05:55:18 -08:00
Adrian Kuegel	7fe41ac3df	Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute" This reverts commit `3e5ce49e53`. Tests started failing on PPC, for example: http://lab.llvm.org:8011/#/builders/105/builds/5569	2021-02-05 12:51:03 +01:00
Simon Pilgrim	f7d07dbb29	IROutliner.cpp - fix Wdocumentation warning. NFCI. Remove duplicate param	2021-02-05 11:38:09 +00:00
Simon Pilgrim	476b912e7c	SampleProfile.cpp - fix Wdocumentation warning. NFCI. Remove duplicate param	2021-02-05 11:31:17 +00:00
Simon Pilgrim	89edda7084	IROutliner.cpp - fix Wdocumentation warnings. NFCI.	2021-02-05 11:21:00 +00:00
David Green	502a67dd7f	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. For most backends this looks like it will be an improvement (or they were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-05 09:34:24 +00:00
Kazu Hirata	fb74e1e78a	[Transforms/Scalar] Use range-based for loops (NFC)	2021-02-04 21:18:05 -08:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Philip Reames	3e5ce49e53	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCI-ish prep work, but the changes are a bit too involved for me to feel comfortable tagging the change that way. Differential Revision: https://reviews.llvm.org/D94892	2021-02-04 17:28:30 -08:00
Richard Smith	ab243efb26	Don't infer attributes on '::operator new'. These attributes were all incorrect or inappropriate for LLVM to infer: - inaccessiblememonly is generally wrong; user replacement operator new can access memory that's visible to the caller, as can a new_handler function. - willreturn is generally wrong; a custom new_handler is not guaranteed to terminate. - noalias is inappropriate: Clang has a flag to determine whether this attribute should be present and adds it itself when appropriate. - noundef and nonnull on the return value should be specified by the frontend on all 'operator new' functions if we want them, not here. In any case, inferring attributes on functions declared 'nobuiltin' (as these are when Clang emits them) seems questionable.	2021-02-04 13:59:49 -08:00
Richard Smith	1484ad4137	Revert "[BuildLibcalls, Attrs] Support more variants of C++'s new, add attributes for C++'s delete" Several of the new attributes here were incorrect, and even the ones that are generally correct were being added even to nobuiltin calls. This reverts commit `bb3f169b59`.	2021-02-04 13:59:49 -08:00
Sander de Smalen	75b2555d6e	NFC: Migrate LoopUnrollPass to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm, fhahn Differential Revision: https://reviews.llvm.org/D95817	2021-02-04 14:05:40 +00:00
Florian Hahn	703f6a6828	[ConstraintElimination] Support conditions from loop preheaders This patch extends the condition collection logic to allow adding conditions from pre-headers to loop headers, by allowing cases where the target block dominates some of its predecessors.	2021-02-04 13:58:32 +00:00
Chuanqi Xu	9511fa2dda	[NFC][Coroutine] Remove redundant comment The functionality in the TODO was added before: https://reviews.llvm.org/rGb3a722e66b75328ab5e2eb5c8572022cb083855b	2021-02-04 12:54:30 +08:00
Kazu Hirata	be37475897	[Transforms/IPO] Use range-based for loops (NFC)	2021-02-03 20:41:20 -08:00
Nico Weber	b995314143	Revert "[InstrProfiling] Use !associated metadata for counters, data and values" This reverts commit `97ba5cde52`. Still breaks tests: https://reviews.llvm.org/D76802#2540647	2021-02-03 19:14:34 -05:00
Arthur Eubanks	f020544601	[NewPM][HelloWorld] Move HelloWorld to Utils To prevent creating a new component, which creates a new library. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D95907	2021-02-03 12:59:40 -08:00
Rong Xu	b8f13db5b7	[SampleFDO][NFC] Detach SampleProfileLoader from SampleCoverageTracker This patch detaches SampleProfileLoader from class SampleCoverageTracker. We plan to move SampleProfileLoader to a template class, while SampleCoverageTracker would remain a regular class. Also make callsiteIsHot() a file-static function. Differential Revision: https://reviews.llvm.org/D95823	2021-02-03 11:38:04 -08:00
Florian Hahn	daaa0e3501	[VPlan] Manage induction value creation using VPValues. This patch updates the induction value creation to use VPValues of recipes to map the created values. This should bring us one step closer to being able to optimize induction recipes directly in VPlan. Currently widenIntOrFpInduction also generates vector values for a cast of the induction, if it exists. Make this explicit by adding the cast instruction to the values defined by the recipe. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92284	2021-02-03 17:45:03 +00:00
David Sherwood	d4626eb0bd	[VPlan][NFC] Introduce constructors for VPIteration This patch adds constructors to VPIteration as a cleaner way of initialising the struct and replaces existing constructions of the form: {Part, Lane} with VPIteration(Part, Lane) I have also added a default constructor, which is used by VPlan.cpp when deciding whether to replicate a block or not. This refactoring will be required in a later patch that adds more members and functions to VPIteration. Differential Revision: https://reviews.llvm.org/D95676	2021-02-03 08:52:27 +00:00
Petr Hosek	97ba5cde52	[InstrProfiling] Use !associated metadata for counters, data and values C identifier name input sections such as __llvm_prf_* are GC roots so they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics. The !associated metadata may be attached to a global object declaration with a single argument that references another global object, and it gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded by the linker, setting up !associated metadata allows linker to discard counters, data and values associated with that function symbol. Note that !associated metadata is only supported by ELF, it does not have any effect on non-ELF targets. Differential Revision: https://reviews.llvm.org/D76802	2021-02-02 23:19:51 -08:00
Kazu Hirata	dc3d5453bc	[Transforms/Utils] Use range-based for loops (NFC)	2021-02-02 22:52:47 -08:00
Florian Hahn	d8e90716df	[ConstraintElimination] Skip pointer casts. We should be able to look through pointer casts that do not impact the value.	2021-02-02 21:25:29 +00:00
Hongtao Yu	3d89b3cbec	[CSSPGO] Introducing distribution factor for pseudo probe. Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication that is done by passes such as loop unrolling, jump threading, indirect call promotion etc, where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing a concept of distribution factor for pseudo probes so that samples can be distributed for duplicated probes scaled by a factor. We hope that optimizations duplicating code maintain the branch frequency information (BFI) on which the probe distribution factors are calculated. Distribution factors are updated at the end of preLTO pipeline to reflect an estimated portion of the real execution count. This change also introduces a pseudo probe verifier that can be run after each IR pass to detect duplicated pseudo probes. A saturated distribution factor stands for 1.0. A pseudo probe will carry a factor with a value ranging from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated with each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead. Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later on inlined in LTO time, should have the callee's probe prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, indirect call promotion results in multiple call sites. The original samples should be distributed across them. 
This is fixed by adjusting the call sites' distribution factors. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D93264	2021-02-02 11:55:01 -08:00
Fangrui Song	51da12680f	[ConstraintElimination] Fix -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build	2021-02-02 10:23:14 -08:00
Jeroen Dobbelaere	50c523a9d4	[InlineFunction] Only update noalias scopes once for an instruction. Inlining sometimes maps different instructions to be inlined onto the same instruction. We must ensure to only remap the noalias scopes once. Otherwise the scope might disappear (at best). This patch ensures that we only replace scopes for which the mapping is known. This approach is preferred over tracking which instructions we already handled in a SmallPtrSet, as that one will need more memory. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95862	2021-02-02 17:57:10 +01:00
Florian Hahn	3e09bc2500	[ConstraintElimination] Add nicer way to dump constraints (NFC). Use ConstraintSystem::dump(Names) to display the result of decomposing a condition.	2021-02-02 16:36:45 +00:00
Wenlei He	1645f465be	[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path. This is a resubmit of D95024, with the build break and an overly tight assertion fixed. Test Plan:	2021-02-02 07:55:08 -08:00
Roman Lebedev	485c4b552b	[InstCombine] Hoist inversion out of ashr's value operand (PR48995) This is yet another hint that we will eventually need InstCombineInverter, which would consistently sink inversions, but for that we'll need to consistently hoist inversions where possible, so let's do that here. Example of a proof: https://alive2.llvm.org/ce/z/78SbDq See https://bugs.llvm.org/show_bug.cgi?id=48995	2021-02-02 17:56:43 +03:00
Tom Weaver	4f1320b77d	Revert "[InstrProfiling] Use !associated metadata for counters, data and values" This reverts commit `df3e39f60b`. It introduced the failing test instrprof-gc-sections.c, causing the build bot to fail: http://lab.llvm.org:8011/#/builders/53/builds/1184	2021-02-02 14:19:31 +00:00
Sander de Smalen	3d3ca8f8eb	NFC: Migrate SpeculateAroundPHIs to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: ctetreau Differential Revision: https://reviews.llvm.org/D95353	2021-02-02 13:32:45 +00:00
Sander de Smalen	00da322788	NFC: Migrate SimpleLoopUnswitch to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D95352	2021-02-02 13:32:44 +00:00
Adrian Kuegel	48ca6da9d2	Revert "[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline" This reverts commit `9a03058d63`.	2021-02-02 11:51:04 +01:00
Adrian Kuegel	3a65ec4bf9	Revert "Fix build break from D95024" This reverts commit `09cd849fde`.	2021-02-02 11:51:04 +01:00
David Sherwood	d4d4ceeb8f	[SVE][LoopVectorize] Add masked load/store and gather/scatter support for SVE This patch updates IRBuilder::CreateMaskedGather/Scatter to work with ScalableVectorType and adds isLegalMaskedGather/Scatter functions to AArch64TargetTransformInfo. In addition I've fixed up isLegalMaskedLoad/Store to return true for supported scalar types, since this is what the vectorizer asks for. In LoopVectorize.cpp I've changed LoopVectorizationCostModel::getInterleaveGroupCost to return an invalid cost for scalable vectors, since currently this relies upon using shuffle vector for reversing vectors. In addition, in LoopVectorizationCostModel::setCostBasedWideningDecision I have assumed that the cost of scalarising memory ops is infinitely expensive. I have added some simple masked load/store and gather/scatter tests, including cases where we use gathers and scatters for conditional invariant loads and stores. Differential Revision: https://reviews.llvm.org/D95350	2021-02-02 09:52:39 +00:00
Wenlei He	09cd849fde	Fix build break from D95024	2021-02-02 01:01:12 -08:00
Wenlei He	9a03058d63	[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path. Test Plan: Differential Revision: https://reviews.llvm.org/D95024	2021-02-02 00:34:06 -08:00
Wenlei He	6bae5973c4	[CSSPGO] Call site prioritized inlining for sample PGO This change implements call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximizes the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`). Motivation With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO. - It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat. - The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to DWARF-based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately. - It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inaccurate hotness heuristic. With pseudo-probe and the change above, this is no longer an issue for CSSPGO. 
New FDO Inliner Putting all the requirements for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here are the reasons why each component is needed. - Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655. - Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check. - Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites. - BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path. Note that the new inliner avoids repeatedly evaluating the same set of call sites, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of the same call site (TODO). Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO. Tunings and knobs I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO. Results Evaluated with an internal LLVM fork a couple of months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner shows ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. 
The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it). Differential Revision: https://reviews.llvm.org/D94001	2021-02-01 23:46:34 -08:00
Gil Rapaport	d475030dc2	[SCEV] Apply loop guards to divisibility tests Extend applyLoopGuards() to take into account conditions/assumes proving some value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the loop unroller and the loop vectorizer identify more loops as not requiring remainder loops. Differential Revision: https://reviews.llvm.org/D95521	2021-02-02 08:09:39 +02:00
Petr Hosek	df3e39f60b	[InstrProfiling] Use !associated metadata for counters, data and values C identifier name input sections such as __llvm_prf_* are GC roots so they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics. The !associated metadata may be attached to a global object declaration with a single argument that references another global object, and it gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded by the linker, setting up !associated metadata allows linker to discard counters, data and values associated with that function symbol. Note that !associated metadata is only supported by ELF, it does not have any effect on non-ELF targets. Differential Revision: https://reviews.llvm.org/D76802	2021-02-01 15:01:43 -08:00
Hongtao Yu	224fee8219	[CSSPGO] Tweaking inlining with pseudo probes. Fixing up a couple places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites. Also fixing an issue in the extbinary profile reader where the metadata section is not fully scanned based on the number of profiles loaded only for the current module. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95791	2021-02-01 13:56:40 -08:00
Sanjay Patel	bbed5f2f8a	[LoopVectorize] improve IR fast-math-flags propagation in reductions This is another step (see D95452) towards correcting fast-math-flags bugs in vector reductions. There are multiple bugs visible in the test diffs, and this is still not working as it should. We still use function attributes (rather than FMF) to drive part of the logic, but we are not checking for the correct FP function attributes. Note that FMF may not be propagated optimally on selects (example in https://llvm.org/PR35607 ). That's why I'm proposing to union the FMF of a fcmp+select pair and avoid regressions on existing vectorizer tests. Differential Revision: https://reviews.llvm.org/D95690	2021-02-01 16:21:36 -05:00
Florian Hahn	0b28d756af	[ConstraintElimination] Add support for EQ predicates. A == B maps to A >= B && A <= B (https://alive2.llvm.org/ce/z/_dwxKn). This extends the constraint construction to return a list of constraints, which can be used to properly de-compose nested AND & OR.	2021-02-01 20:48:31 +00:00
Michael Holman	8bfef78722	[ConstantHoisting] Fix bug where constant materialization could insert into EH pad If the incoming block to a phi node is an EH pad, then we will materialize into an EH pad, which is not supposed to happen. To fix this, I added a check to see if incoming block of a phi node is an EH pad before using it as the insertion point. Differential Revision: https://reviews.llvm.org/D95019	2021-02-01 11:23:56 -08:00
Sanjay Patel	0ce2920f17	[InstCombine] try to narrow min/max intrinsics with constant operand The constant trunc/ext may not be the optimal pre-condition, but I think that handles the common cases. Example of Alive2 proof: https://alive2.llvm.org/ce/z/sREeLC This is another step towards canonicalizing to the intrinsics. Narrowing was identified as source of potential regression for abs(), so we need to handle this for min/max - see: https://llvm.org/PR48816 If this is not enough, we could process intrinsics in the trunc-driven matching in canEvaluateTruncated().	2021-02-01 13:44:13 -05:00
Florian Hahn	ce190e4144	[ConstraintElimination] Negate IR condition directly. Instead of using ConstraintSystem::negate when adding new constraints, flip the condition in IR. The main advantage is that EQ predicates can be represented by 2 constraints, which makes negating based on the constraint tricky. The IR condition can easily be negated.	2021-02-01 17:21:40 +00:00
Sander de Smalen	bf294953e7	NFC: Migrate SimplifyCFG to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D95351	2021-02-01 16:14:05 +00:00
Sander de Smalen	880b64aa22	[SimplifyCFG] NFC: Rename static methods to clang-tidy standards. This patch is a precursor to D95351, which changes the signature of these methods.	2021-02-01 16:14:05 +00:00
Cullen Rhodes	8cda227432	[LV] Fix crash when computing max VF too early D90687 introduced a crash: llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int): Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() && "No decisions should have been taken at this point"' failed. when compiling the following C code: typedef struct { char a; } b; b *c; int d, e; int f() { int g = 0; for (; d; d++) { e = 0; for (; e < c[d].a; e++) g++; } return g; } with: clang -Os -target hexagon -mhvx -fvectorize -mv67 testcase.c -S -o - This occurred since prior to D90687 computeFeasibleMaxVF would only be called in computeMaxVF when a scalar epilogue was allowed, but now it's always called. This causes the assert above since computeFeasibleMaxVF collects all viable VFs larger than the default MaxVF, and for each VF calculates the register usage which results in analysis being done the assert above guards against. This can occur in computeFeasibleMaxVF if TTI.shouldMaximizeVectorBandwidth and this target hook is implemented in the hexagon backend to always return true. Reported by @iajbar. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94869	2021-02-01 12:14:59 +00:00
Sander de Smalen	3b8a1d581e	NFC: Migrate SpeculativeExecution to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D95356	2021-02-01 12:13:23 +00:00
Florian Hahn	a9583a1923	[LoopUnswitch] Pacify compiler warnings. Attempt to fix some compiler warnings on some bots after `b8c81fa5c7`.	2021-02-01 09:13:06 +00:00
Florian Hahn	b8c81fa5c7	[LoopUnswitch] Add shortcut if unswitched path is a no-op. If we determine that the invariant path through the loop has no effects, we can directly branch to the exit block, instead of unswitching first. Besides avoiding some extra work (unswitching first, then deleting the loop again) this allows us to be more aggressive than regular unswitching with respect to cost-modeling. This approach should always be desirable. This is similar in spirit to D93734, just that it uses the previously added checks for loop-unswitching. I tried to add the required no-op checks from scratch, as we only check a subset of the loop. There is potential to unify the checks with LoopDeletion, at the cost of adding a predicate whether a block should be considered. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95468	2021-02-01 09:03:30 +00:00
Jeroen Dobbelaere	80cdd30eb9	[LoopPeel] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed. The reduction of a sanitizer build failure when enabling the dominance check (D95335) showed that loop peeling also needs to take care of scope duplication, just like loop unrolling (D92887). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95544	2021-02-01 10:01:17 +01:00
Kazu Hirata	3d1200b9f6	[llvm] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-01-31 10:23:43 -08:00
Florian Hahn	39486753d5	[ConstraintElimination] Verify CS and DFSInStack are in sync.(NFC) After the main loop is done, we should have one constraint per item in DFSInStack. Otherwise we added a constraint without a proper DFSInStack item.	2021-01-30 18:27:04 +00:00
Florian Hahn	10c57268c0	[LoopUnswitch] Properly update MSSA if header has non-clobbering stores. This patch fixes updating MemorySSA if the header contains memory defs that do not clobber a duplicated instruction. We need to find the first defining access outside the loop body and use that as defining access of the duplicated instruction. This fixes a crash caused by `bee486851c`.	2021-01-30 13:51:05 +00:00
Kazu Hirata	8ed1636184	[llvm] Use isa instead of dyn_cast (NFC)	2021-01-29 23:23:37 -08:00
Roman Lebedev	c2534a7097	[ShadowStackGCLowering] Preserve Dominator Tree, if available This doesn't help avoid any Dominator Tree recalculations just yet, there's one more pass to go..	2021-01-30 01:14:51 +03:00
Roman Lebedev	a78d8feb48	[LowerConstantIntrinsics] Preserve Dominator Tree, if available	2021-01-30 01:14:50 +03:00
Sriraman Tallam	9a81a4ef79	Emit metadata when instr. profiles hash mismatch occurs. This patch emits "instr_prof_hash_mismatch" function annotation metadata if there is a hash mismatch while applying instrumented profiles. During the PGO optimized build using instrumented profiles, if the CFG of the function has changed since generating the profile, a hash mismatch is encountered. This patch emits this information as annotation metadata. We plan to use this with Propeller which is done at the machine IR level. Propeller is usually applied on top of PGO and a hash mismatch during PGO could be used to detect source drift. Differential Revision: https://reviews.llvm.org/D95495	2021-01-29 12:56:01 -08:00
Florian Hahn	f3a710cade	[LTO] Update splitCodeGen to take a reference to the module. (NFC) splitCodeGen does not need to take ownership of the module, as it currently clones the original module for each split operation. There is an ~4 year old fixme to change that, but until this is addressed, the function can just take a reference to the module. This makes the transition of LTOCodeGenerator to use LTOBackend a bit easier, because under some circumstances, LTOCodeGenerator needs to write the original module back after codegen. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95222	2021-01-29 11:53:11 +00:00
Yang Fan	59bd2068e9	[NFC][ScalarizeMaskedMemIntrin] Fix unused variable warning GCC warning: ``` /llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp: In function ‘void scalarizeMaskedStore(llvm::CallInst, llvm::DomTreeUpdater, bool&)’: /llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp:295:15: warning: variable ‘IfBlock’ set but not used [-Wunused-but-set-variable] 295 \| BasicBlock IfBlock = CI->getParent(); \| ^~~~~~~ /llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp: In function ‘void scalarizeMaskedScatter(llvm::CallInst, llvm::DomTreeUpdater, bool&)’: /llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp:555:15: warning: variable ‘IfBlock’ set but not used [-Wunused-but-set-variable] 555 \| BasicBlock IfBlock = CI->getParent(); \| ^~~~~~~ ```	2021-01-29 15:15:58 +08:00
Roman Lebedev	056385921d	[ScalarizeMaskedMemIntrin] Preserve Dominator Tree, if available This de-pessimizes the arguably more usual case of no masked mem intrinsics, and gets rid of one more Dominator Tree recalculation. As per llvm/test/CodeGen/X86/opt-pipeline.ll, there's one more Dominator Tree recalculation left that we could get rid of.	2021-01-29 01:11:36 +03:00
Roman Lebedev	577fdcaa93	[PartiallyInlineLibCalls] Preserve Dominator Tree, if available This doesn't get rid of any Dominator Tree recalculations just yet; there is one more pass to update.	2021-01-29 01:11:36 +03:00
Roman Lebedev	573f74117b	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedCompressStore(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:35 +03:00
Roman Lebedev	2e4bb3f119	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedExpandLoad(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:35 +03:00
Roman Lebedev	e8efc03a1e	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedScatter(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:35 +03:00
Roman Lebedev	1356399a11	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedGather(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:34 +03:00
Roman Lebedev	22b8421156	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedStore(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:34 +03:00
Roman Lebedev	0ea45a412a	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedLoad(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:34 +03:00
Roman Lebedev	394685481c	[NFC][PartiallyInlineLibCalls] Port to SplitBlockAndInsertIfThen() This makes follow-up patch for Dominator Tree preservation somewhat more straight-forward.	2021-01-29 01:11:33 +03:00
Roman Lebedev	2de2d84ed0	[NFC][EntryExitInstrumenter] Mark Dominator Tree as preserved in legacy-PM too This is correctly handled in new-PM wrappers, but not in old-PM.	2021-01-29 01:11:33 +03:00
Adrian Prantl	62140d943c	Better document the limitations of coro::salvageDebugInfo() and fix a few edge cases that show up in the Swift compiler but weren't caught by the existing tests. Most notably the old code wasn't salvaging load operations correctly. The patch also gets rid of the LoadFromFramePtr argument and replaces it with a more generalized mechanism.	2021-01-28 09:53:19 -08:00
Roman Lebedev	8cfa963463	[SimplifyCFG] If provided, preserve Dominator Tree SimplifyCFG is a utility pass, and the fact that it does not preserve DomTrees forces its users to somehow work around that, likely by not preserving DomTrees themselves. Indeed, the simplifycfg pass didn't know how to preserve the dominator tree; it took me just under a month (starting with `e113317958`) to rectify that. Now it fully knows how to. There are likely still some problems with that, but I've dealt with everything I can spot so far. I think we can now flip the switch. Note that this is functionally an NFC change, since this doesn't change the users to pass in the DomTree; that is a separate question. Reviewed By: kuhar, nikic Differential Revision: https://reviews.llvm.org/D94827	2021-01-28 14:11:34 +03:00
Yang Fan	8644eb024b	[NFC][Transforms][Coroutines] Remove unused variable	2021-01-28 16:42:30 +08:00
Kazu Hirata	0da15ea581	[llvm] Use append_range (NFC)	2021-01-27 23:25:41 -08:00
Hongtao Yu	7e99bddfea	[CSSPGO] Support of CS profiles in extended binary format. This change brings up support of context-sensitive profiles in the format of extended binary. Existing sample profile reader/writer/merger code is being tweaked to reflect the fact of bracketed input contexts, like (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have an otherwise good way to tell calling contexts apart from regular function names since the context delimiter `@` can somehow serve as a part of the C++ mangled names. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95547	2021-01-27 21:29:46 -08:00
Teresa Johnson	1487747e99	[LTO] Prevent devirtualization for symbols dynamically exported Identify dynamically exported symbols (--export-dynamic[-symbol=], --dynamic-list=, or definitions needed to preempt shared objects) and prevent their LTO visibility from being upgraded. This helps avoid use of whole program devirtualization when there may be overrides in dynamic libraries. Differential Revision: https://reviews.llvm.org/D91583	2021-01-27 15:54:13 -08:00
Sanjay Patel	ab93c18c12	[LoopVectorize] use IR fast-math-flags exclusively (not FP function attributes) I am trying to untangle the fast-math-flags propagation logic in the vectorizers (see `a6f022127` for SLP). The loop vectorizer has a mix of checking FP function attributes, IR-level FMF, and just wrong assumptions. I am trying to avoid regressions while fixing this, and I think the IR-level logic is good enough for that, but it's hard to say for sure. This would be the 1st step in the clean-up. The existing test that I changed to include 'fast' actually shows a miscompile: the function only had the equivalent of nnan, but we created new instructions that had fast (all FMF set). This is similar to the example in https://llvm.org/PR35538 Differential Revision: https://reviews.llvm.org/D95452	2021-01-27 14:17:11 -05:00
Fangrui Song	54fb3ca96e	[ThinLTO] Add Visibility bits to GlobalValueSummary::GVFlags Imported functions and variable get the visibility from the module supplying the definition. However, non-imported definitions do not get the visibility from (ELF) the most constraining visibility among all modules (Mach-O) the visibility of the prevailing definition. This patch * adds visibility bits to GlobalValueSummary::GVFlags * computes the result visibility and propagates it to all definitions Protected/hidden can imply dso_local which can enable some optimizations (this is stronger than GVFlags::DSOLocal because the implied dso_local can be leveraged for ELF -shared while default visibility dso_local has to be cleared for ELF -shared). Note: we don't have summaries for declarations, so for ELF if a declaration has the most constraining visibility, the result visibility may not be that one. Differential Revision: https://reviews.llvm.org/D92900	2021-01-27 10:43:51 -08:00
Florian Hahn	28410d17f5	[LoopUtils] Pass SCEVExpander instead of SE to addRuntimeChecks. This gives the user control over which expander to use, which in turn allows the user to decide what to do with the expanded instructions. Used in D75980. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D94295	2021-01-27 17:36:19 +00:00
Petr Hosek	bb9eb19829	Support for instrumenting only selected files or functions This change implements support for applying profile instrumentation only to selected files or functions. The implementation uses the sanitizer special case list format to select which files and functions to instrument, and relies on the new noprofile IR attribute to exclude functions from instrumentation. Differential Revision: https://reviews.llvm.org/D94820	2021-01-26 17:13:34 -08:00
Adrian Prantl	0554541b44	Salvage debug info for function arguments in coro-split funclets. This patch improves the availability for variables stored in the coroutine frame by emitting an alloca to hold the pointer to the frame object and rewriting dbg.declare intrinsics to point inside the frame object using salvaged DIExpressions. Finally, a new alloca is created in the funclet to hold the FramePtr pointer to ensure that it is available throughout the entire function at -O0. This patch also effectively reverts D90772. The testcase updates highlight nicely how every removed CHECK for a dbg.value is preceded by a new CHECK for a dbg.declare. Thanks to JunMa, Yifeng, and Bruno for their thoughtful reviews! Differential Revision: https://reviews.llvm.org/D93497 rdar://71866936	2021-01-26 15:01:26 -08:00
Bjorn Pettersson	a9bd3d37bd	[NewPM] Add ExtraVectorizerPasses support As it looks like NewPM generally is using SimpleLoopUnswitch instead of LoopUnswitch, this patch also uses SimpleLoopUnswitch in the ExtraVectorizerPasses sequence (compared with LegacyPM, which uses the LoopUnswitch pass). Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D95457	2021-01-26 22:59:10 +01:00
Valery N Dmitriev	716b9dd0d8	[InstCombine] Preserve FMF for powi simplifications. Differential Revision: https://reviews.llvm.org/D95455	2021-01-26 13:26:06 -08:00
Petr Hosek	1e634f3952	Revert "Support for instrumenting only selected files or functions" This reverts commit `4edf35f11a` because the test fails on Windows bots.	2021-01-26 12:25:28 -08:00
Petr Hosek	4edf35f11a	Support for instrumenting only selected files or functions This change implements support for applying profile instrumentation only to selected files or functions. The implementation uses the sanitizer special case list format to select which files and functions to instrument, and relies on the new noprofile IR attribute to exclude functions from instrumentation. Differential Revision: https://reviews.llvm.org/D94820	2021-01-26 11:11:39 -08:00
Sanjay Patel	09b1c56366	[LoopUtils] do not initialize Cmp predicate unnecessarily; NFC The switch must set the predicate correctly; anything else should lead to unreachable/assert. I'm trying to fix FMF propagation here and the callers, so this is a preliminary cleanup.	2021-01-26 11:22:51 -05:00
Florian Hahn	1272f16d14	[LoopUnswitch] Avoid partially unswitching too aggressively. This patch adds additional checks to avoid partial unswitching in cases where it won't be profitable, e.g. because the path directly exits the loop anyways.	2021-01-26 15:18:41 +00:00
Florian Hahn	35b3989a30	[Passes] Run peeling as part of simple/full loop unrolling. Loop peeling removes conditions from loop bodies that become invariant after a small number of iterations. When triggered, this leads to fewer compares and possibly PHIs in loop bodies, enabling further optimizations. The current cost-model of loop peeling should be quite conservative/safe, i.e. only peel if a condition in the loop becomes known after peeling. For example, see PR47671, where loop peeling enables vectorization by removing a PHI the vectorizer does not understand. Granted, the loop-vectorizer could also be taught about constant PHIs, but loop peeling is likely to enable other optimizations as well. This has an impact on quite a few benchmarks from MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto, for example
Same hash: 186 (filtered out)
Remaining: 51
Metric: loop-vectorize.LoopsVectorized
Program                                         base    patch   diff
test-suite...ve-susan/automotive-susan.test     8.00    9.00    12.5%
test-suite...nal/skidmarks10/skidmarks.test     35.00   31.00   -11.4%
test-suite...lications/sqlite3/sqlite3.test     41.00   43.00   4.9%
test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test     25.00   26.00   4.0%
test-suite...006/450.soplex/450.soplex.test     88.00   89.00   1.1%
test-suite...TimberWolfMC/timberwolfmc.test     120.00  119.00  -0.8%
test-suite.../CINT2006/403.gcc/403.gcc.test     215.00  216.00  0.5%
test-suite...006/447.dealII/447.dealII.test     957.00  958.00  0.1%
test-suite...ternal/HMMER/hmmcalibrate.test     75.00   75.00   0.0%
Same hash: 186 (filtered out)
Remaining: 51
Metric: loop-vectorize.LoopsAnalyzed
Program                                         base     patch    diff
test-suite...ks/Prolangs-C/agrep/agrep.test     440.00   434.00   -1.4%
test-suite...nal/skidmarks10/skidmarks.test     312.00   308.00   -1.3%
test-suite...marks/7zip/7zip-benchmark.test     6399.00  6323.00  -1.2%
test-suite...lications/minisat/minisat.test     134.00   135.00   0.7%
test-suite...rks/FreeBench/pifft/pifft.test     295.00   297.00   0.7%
test-suite...TimberWolfMC/timberwolfmc.test     1879.00  1869.00  -0.5%
test-suite...pplications/treecc/treecc.test     689.00   691.00   0.3%
test-suite...T2000/300.twolf/300.twolf.test     1593.00  1597.00  0.3%
test-suite.../Benchmarks/Bullet/bullet.test     1394.00  1392.00  -0.1%
test-suite...ications/JM/ldecod/ldecod.test     1431.00  1429.00  -0.1%
test-suite...6/464.h264ref/464.h264ref.test     2229.00  2230.00  0.0%
test-suite...lications/sqlite3/sqlite3.test     2590.00  2589.00  -0.0%
test-suite...ications/JM/lencod/lencod.test     2732.00  2733.00  0.0%
test-suite...006/453.povray/453.povray.test     3395.00  3394.00  -0.0%
Note the -11% regression in number of loops vectorized for skidmarks. I suspect this corresponds to the fact that those loops are gone now (see the reduction in number of loops analyzed by LV). Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88471	2021-01-26 13:52:30 +00:00
Sergey Dmitriev	13cedcaf45	[llvm-link] Fix crash when materializing appending global This patch fixes llvm-link crash when materializing global variable with appending linkage and initializer that depends on another global with appending linkage. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D95329	2021-01-25 18:08:07 -08:00
modimo	ce7f9cdb50	[InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks This change leverages the work done in D83743 to replay in the SampleProfile inliner to also be used in the CGSCC inliner. NOTE: currently restricted to non-ML advisors only. The added switch `-cgscc-inline-replay=<remarks file>` will replay the inlining decisions in that file where the remarks file is generated via `-Rpass=inline`. The aim here is to make it easier to analyze changes that would modify inlining heuristics to be separated from this behavior. Doing so allows easier examination of assembly and runtime behavior compared to the baseline rather than trying to dig through the large churn caused by inlining. In LTO compilation, since inlining is done twice you can separately specify replay by passing the flag to the FE (`-cgscc-inline-replay=`) and to the linker (`-Wl,cgscc-inline-replay=`) with the remarks generated from their respective places. Testing on mysqld by comparing the inline decisions between base (generates remarks.txt) and diff (replay using identical input/tools with remarks.txt) and examining the inlining sites with `diff` shows 14,000 mismatches out of 247,341 for a ~94% replay accuracy. I believe this gap can be narrowed further though for the general case we may never achieve full accuracy. For my personal use, this is close enough to be representative: I set the baseline as the one generated by the replay on identical input/toolset and compare that to my modified input/toolset using the same replay. Testing: ninja check-llvm newly added test correctly replays CGSCC inlining decisions Reviewed By: mtrofin, wenlei Differential Revision: https://reviews.llvm.org/D94334	2021-01-25 15:38:57 -08:00
Nikita Popov	835104a114	[LSR] Drop potentially invalid nowrap flags when switching to post-inc IV (PR46943) When LSR converts a branch on the pre-inc IV into a branch on the post-inc IV, the nowrap flags on the addition may no longer be valid. Previously, a poison result of the addition might have been ignored, in which case the program was well defined. After branching on the post-inc IV, we might be branching on poison, which is undefined behavior. Fix this by discarding nowrap flags which are not present on the SCEV expression. Nowrap flags on the SCEV expression are proven by SCEV to always hold, independently of how the expression will be used. This is essentially the same fix we applied to IndVars LFTR, which also performs this kind of pre-inc to post-inc conversion. I believe a similar problem can also exist for getelementptr inbounds, but I was not able to come up with a problematic test case. The inbounds case would have to be addressed differently anyway (as SCEV does not track this property). Fixes https://bugs.llvm.org/show_bug.cgi?id=46943. Differential Revision: https://reviews.llvm.org/D95286	2021-01-25 23:13:48 +01:00
Richard Smith	925ae8c790	Revert "[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV" This reverts commit `53176c1680`, which introduced a layering violation. LLVM's IR library can't include headers from Analysis.	2021-01-25 13:53:38 -08:00
Akira Hatanaka	53176c1680	[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end annotates calls with attribute "clang.arc.rv"="retain" or "clang.arc.rv"="claim", which indicates the call is implicitly followed by a marker instruction and a retainRV/claimRV call that consumes the call result. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the annotated calls in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the annotated calls. It doesn't remove the attribute on the call since the backend needs it to emit the marker instruction. The retainRV/claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes the autoreleaseRV call in the callee that returns the result if nothing in the callee prevents it from being paired up with the calls annotated with "clang.arc.rv"="retain/claim" in the caller. If the call is annotated with "claim", a release call is inserted since autoreleaseRV+claimRV is equivalent to a release. 
If it cannot find an autoreleaseRV call, it tries to transfer the attributes to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV call returning the callee result, which makes it impossible to pair it up with the retainRV or claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the call is annotated with "retain" and does nothing if it's annotated with "claim". - This patch teaches dead argument elimination pass not to change the return type of a function if any of the calls to the function are annotated with attribute "clang.arc.rv". This is necessary since the pass can incorrectly determine nothing in the IR uses the function return, which can happen since the front-end no longer explicitly emits retainRV/claimRV calls in the IR, and change its return type to 'void'. Future work: - Use the attribute on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the attributes. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-01-25 11:57:08 -08:00
Florian Hahn	76afbf60ed	[VPlan] Replace uses with new value in VPInstructionsToVPRecipe (NFC). Now that VPRecipeBase inherits from VPDef, we can always use the new VPValue for replacement, if the recipe defines one. Given the recipes that are supported at the moment, all new recipes must have either 0 or 1 defined values.	2021-01-25 19:38:08 +00:00
Nick Desaulniers	d36812892c	[GVN] do not repeat PRE on failure to split critical edge Fixes an infinite loop encountered in GVN. GVN will delay PRE if it encounters critical edges, attempt to split them later via calls to SplitCriticalEdge(), then restart. The caller of GVN::splitCriticalEdges() assumed a return value of true meant that critical edges were split, that the IR had changed, and that PRE should be re-attempted, upon which we loop infinitely. This was exposed after D88438, by compiling the Linux kernel for s390, but the test case is reproducible on x86. Fixes: https://github.com/ClangBuiltLinux/linux/issues/1261 Reviewed By: void Differential Revision: https://reviews.llvm.org/D94996	2021-01-25 11:23:44 -08:00
Wei Mi	c9cd9a0066	[SampleFDO] Report error when reading a bad/incompatible profile instead of turning off SampleFDO silently. Currently the sample loader pass turns off SampleFDO optimization silently when it sees an error in reading the profile. This behavior will defeat the tests which could have caught those bad/incompatible profile problems. This patch changes the behavior to report an error. Differential Revision: https://reviews.llvm.org/D95269	2021-01-25 10:28:23 -08:00
Xun Li	17c3538aef	Revert "Fix unused variable in CoroFrame.cpp when building Release with GCC 10" This reverts commit `ff5e896425`.	2021-01-25 08:37:45 -08:00
Florian Hahn	3201274dea	[VPlan] Handle scalarized values in VPTransformState. This patch adds plumbing to handle scalarized values directly in VPTransformState. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92282	2021-01-25 14:21:56 +00:00
Sanjay Patel	09a136bcc6	[InstCombine] narrow min/max intrinsics with extended inputs We can sink extends after min/max if they match and would not change the sign-interpreted compare. The only combo that doesn't work is zext+smin/smax because the zexts could change a negative number into positive: https://alive2.llvm.org/ce/z/D6sz6J Sext+umax/umin works: define i32 @src(i8 %x, i8 %y) { %0: %sx = sext i8 %x to i32 %sy = sext i8 %y to i32 %m = umax i32 %sx, %sy ret i32 %m } => define i32 @tgt(i8 %x, i8 %y) { %0: %m = umax i8 %x, %y %r = sext i8 %m to i32 ret i32 %r } Transformation seems to be correct!	2021-01-25 07:52:50 -05:00
Sander de Smalen	171d12489f	[SLPVectorizer] NFC: Migrate getVectorCallCosts to use InstructionCost. This change also changes getReductionCost to return InstructionCost, and it simplifies two expressions by removing a redundant 'isValid' check.	2021-01-25 12:27:01 +00:00
Nikita Popov	8b9df70bf7	[Utils] Use NoAliasScopeDeclInst in a few more places (NFC) In the cloning infrastructure, only track an MDNode mapping, without explicitly storing the Metadata mapping, same as is done during inlining. This makes things slightly simpler.	2021-01-24 16:24:11 +01:00
Sanjay Patel	77adbe6a8c	[SLP] fix fast-math requirements for fmin/fmax reductions `a6f0221276` enabled intersection of FMF on reduction instructions, so it is safe to ease the check here. There is still some room to improve here - it looks like we have nearly duplicate flags propagation logic inside of the LoopUtils helper but it is limited to targets that do not form reduction intrinsics (they form the shuffle expansion).	2021-01-24 08:55:56 -05:00
Jeroen Dobbelaere	dcc7706fcf	[InstCombine] Remove unused llvm.experimental.noalias.scope.decl A @llvm.experimental.noalias.scope.decl is only useful if there is !alias.scope and !noalias metadata that uses the declared scope. When that is not the case for at least one of the two, the intrinsic call can as well be removed. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95141	2021-01-24 13:55:50 +01:00
Jeroen Dobbelaere	659c7bcde6	[LoopRotate] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed Similar to D92887, LoopRotation also needs to duplicate the noalias scopes when rotating a `@llvm.experimental.noalias.scope.decl` across a block boundary. This is based on the version from the Full Restrict patches (D68511). The problem it fixes also showed up in Transforms/Coroutines/ex5.ll after D93040 (when enabling strict checking with -verify-noalias-scope-decl-dom). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94306	2021-01-24 13:53:13 +01:00
Jeroen Dobbelaere	774629641b	[LoopUnroll] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed This is a fix for https://bugs.llvm.org/show_bug.cgi?id=39282. Compared to D90104, this version is based on part of the full restrict patches (D68484) and uses the `@llvm.experimental.noalias.scope.decl` intrinsic to track the location where !noalias and !alias.scope scopes have been introduced. This allows us to only duplicate the scopes that are really needed. Notes: - it also includes changes and tests from D90104 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D92887	2021-01-24 13:48:20 +01:00
Roman Lebedev	6f2753273e	[NFC][SimplifyCFG] Extract CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses() out of PerformBranchToCommonDestFolding() To be used in PerformValueComparisonIntoPredecessorFolding()	2021-01-24 00:54:55 +03:00
Roman Lebedev	67f9c87a65	[NFC][SimplifyCFG] Perform early-continue in FoldValueComparisonIntoPredecessors() per-pred loop	2021-01-24 00:54:54 +03:00
Roman Lebedev	a4e6c2e647	[NFC][SimplifyCFG] Extract PerformValueComparisonIntoPredecessorFolding() out of FoldValueComparisonIntoPredecessors() Less nested code is much easier to follow and modify.	2021-01-24 00:54:54 +03:00
Nikita Popov	c83cff45c7	[IR] Add NoAliasScopeDeclInst (NFC) Add an intrinsic type class to represent the llvm.experimental.noalias.scope.decl intrinsic, to make code working with it a bit nicer by hiding the metadata extraction from view.	2021-01-23 22:40:32 +01:00
Kazu Hirata	1238378f18	[llvm] Use pop_back_val (NFC)	2021-01-23 10:56:33 -08:00
Florian Hahn	d60b74c28a	[InstCombine] Set MadeIRChange in replaceInstUsesWith. Some utilities used by InstCombine, like SimplifyLibCalls, may add new instructions and replace the uses of a call, but return nullptr because the inserted call produces multiple results. Previously, the replaced library calls would get removed by InstCombine's deleter, but after `292077072e` this may not happen, if the willreturn attribute is missing. As a work-around, update replaceInstUsesWith to set MadeIRChange, if it replaces any uses. This catches the cases where it is used as replacer by utilities used by InstCombine and seems useful in general; updating uses will modify the IR. This fixes an expensive-check failure when replacing @__sinpif/@__cospifi with @__sincospif_sret.	2021-01-23 17:52:59 +00:00
Sanjay Patel	a6f0221276	[SLP] fix fast-math-flag propagation on FP reductions As shown in the test diffs, we could miscompile by propagating flags that did not exist in the original code. The flags required for fmin/fmax reductions will be fixed in a follow-up patch.	2021-01-23 11:17:20 -05:00
Florian Hahn	292077072e	[Local] Treat calls that may not return as being alive. With the addition of the `willreturn` attribute, functions that may not return (e.g. due to an infinite loop) are well defined, if they are not marked as `willreturn`. This patch updates `wouldInstructionBeTriviallyDead` to not consider calls that may not return as dead. This patch still provides an escape hatch for intrinsics, which are still assumed as willreturn unconditionally. It will be removed once all intrinsics definitions have been reviewed and updated. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94106	2021-01-23 16:05:14 +00:00
Roman Lebedev	022da61f6b	[SimplifyCFG] Change 'LoopHeaders' to be ArrayRef<WeakVH>, not a naked set, thus avoiding dangling pointers If I change it to AssertingVH instead, a number of existing tests fail, which means we don't consistently remove from the set when deleting blocks, which means newly-created blocks may happen to appear in that set if they happen to occupy the same memory chunk as did some block that was in the set originally. There are many places where we delete blocks, and while we could probably consistently delete from LoopHeaders when deleting a block in transforms located in SimplifyCFG.cpp itself, transforms located elsewhere (Local.cpp/BasicBlockUtils.cpp) also may delete blocks, and it doesn't seem good to teach them to deal with it. Since we at most only ever delete from LoopHeaders, let's just delegate to WeakVH to do that automatically. But to be honest, personally, I'm not sure that the idea behind LoopHeaders is sound.	2021-01-23 16:48:35 +03:00
Jeroen Dobbelaere	2b9a834c43	[InlineFunction] Use llvm.experimental.noalias.scope.decl for noalias arguments. Insert a llvm.experimental.noalias.scope.decl intrinsic that identifies where a noalias argument was inlined. This patch includes some refactorings from D90104. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93040	2021-01-23 12:10:57 +01:00
Zequan Wu	867bdfeff1	[InstCombine] remove incompatible attribute when simplifying some lib calls Like D95088, remove incompatible attribute in more lib calls. Differential Revision: https://reviews.llvm.org/D95278	2021-01-22 17:27:36 -08:00
Philip Reames	ef51eed37b	[LoopDeletion] Handle inner loops w/untaken backedges This builds on the restricted after initial revert form of D93906, and adds back support for breaking backedges of inner loops. It turns out the original invalidation logic wasn't quite right, specifically around the handling of LCSSA. When breaking the backedge of an inner loop, we can cause blocks which were in the outer loop only because they were also included in a sub-loop to be removed from both loops. This results in the exit block set for our original parent loop changing, and thus a need for new LCSSA phi nodes. This case happens when the inner loop has an exit block which is also an exit block of the parent, and there's a block in the child which reaches an exit to said block without also reaching an exit to the parent loop. (I'm describing this in terms of the immediate parent, but the problem is general for any transitive parent in the nest.) The approach implemented here involves a potentially expensive LCSSA rebuild. Perf testing during review didn't show anything concerning, but we may end up needing to revert this if anyone encounters a practical compile time issue. Differential Revision: https://reviews.llvm.org/D94378	2021-01-22 16:31:29 -08:00
Francis Visoiu Mistrih	0cc38acfc4	[Matrix] Propagate shape information through fneg Similar to binary operators like fadd/fmul/fsub, propagate shape info through unary operators (fneg is the only one?). Differential Revision: https://reviews.llvm.org/D95252	2021-01-22 14:34:28 -08:00
Roman Lebedev	1742203844	[SimplifyCFG] FoldBranchToCommonDest(): re-lift restrictions on liveout uses of bonus instructions I have previously tried doing that in `b33fbbaa34` / `d38205144f`, but eventually it was pointed out that the approach taken there was just broken wrt how the uses of bonus instructions are updated to account for the fact that they should now use either bonus instruction or the cloned bonus instruction. In particular, all that manual handling of PHI nodes in successors was just wrong. But, the fix is actually much much simpler than my initial approach: just tell SSAUpdate about both instances of bonus instruction, and let it deal with all the PHI handling. Alive2 confirms that the reproducers from the original bugs (@pr48450*) are now handled correctly. This effectively reverts commit `59560e8589`, effectively relanding `b33fbbaa34`.	2021-01-23 01:29:05 +03:00
Roman Lebedev	eae1cc0de5	[NFC][SimplifyCFG] PerformBranchToCommonDestFolding(): move instruction cloning to after CFG update This simplifies follow-up patch, and is NFC otherwise.	2021-01-23 01:29:04 +03:00
Roman Lebedev	9bd8bcf993	[NFC][SimplifyCFG] PerformBranchToCommonDestFolding(): fix instruction name preservation NewBonusInst just took name from BonusInst, so BonusInst has no name, so BonusInst.getName() makes no sense. So we need to ask NewBonusInst for the name.	2021-01-23 01:29:03 +03:00
Shimin Cui	99a0aa07e9	[Analysis] Support AIX vec_malloc routines This is to support the memory routines vec_malloc, vec_calloc, vec_realloc, and vec_free. These routines manage memory that is 16-byte aligned. And they are only available on AIX. Differential Revision: https://reviews.llvm.org/D94710	2021-01-22 16:03:01 -05:00
Nikita Popov	45b259f995	[SimplifyLibCalls] Skip unused calls in sincos transform If the call result is unused, we should let it get DCEd rather than replacing it. Also, don't try to replace an existing sincos with another one (unless it's as part of combining sin and cos). This avoids an infinite combine loop if the calls are not DCEd as expected, which can happen with D94106 and lack of willreturn annotation in hand-crafted IR.	2021-01-22 20:57:13 +01:00
Sanjay Patel	411c144e4c	[InstCombine] narrow abs with sign-extended input In the motivating cases from https://llvm.org/PR48816 , we have a trailing trunc. But that is not required to reduce the abs width: https://alive2.llvm.org/ce/z/ECaz-p ...as long as we clear the int-min-is-poison bit (nsw). We have some existing tests that are affected, and I'm not sure what the overall implications are, but in general we favor narrowing operations over preserving nsw/nuw. If that causes problems, we could restrict this transform based on type (shouldChangeType() and/or vector vs. scalar). Differential Revision: https://reviews.llvm.org/D95235	2021-01-22 13:36:04 -05:00
Florian Hahn	86991d3231	[LoopUnswitch] Fix logic to avoid unswitching with atomic loads. The existing code did not deal with atomic loads correctly. Such loads are represented as MemoryDefs. Bail out on any MemoryAccess that is not a MemoryUse.	2021-01-22 15:10:12 +00:00
Arnold Schwaighofer	87b628dadd	[coro.async] Make sure we process async coroutines Because we were not looking for the llvm.coro.id.async intrinsic in the early coro pass which triggers follow-up passes we relied on the llvm.coro.end intrinsic being present. This might not be the case in functions that end in unreachable code. Differential Revision: https://reviews.llvm.org/D95144	2021-01-22 07:04:01 -08:00
Roman Lebedev	85e7578c6d	Revert "[NFCI-ish][SimplifyCFG] FoldBranchToCommonDest(): really don't deal with uncond branches" Does not build in XCode: http://green.lab.llvm.org/green/job/clang-stage1-RA/17963/consoleFull#-1704658317a1ca8a51-895e-46c6-af87-ce24fa4cd561 This reverts commit `aabed3718a`.	2021-01-22 17:37:11 +03:00
Roman Lebedev	d1a6f92fd5	[InstCombine] Fold `(~x) \| y` --> `~(x & (~y))` iff it is free to do so Iff we know we can get rid of the inversions in the new pattern, we can thus get rid of the inversion in the old pattern, thus decreasing the instruction count. Note that we could position this transformation as just hoisting of the `not` (still, iff y is freely negatible), but the test changes show a number of regressions, so let's not do that.	2021-01-22 17:23:54 +03:00
Roman Lebedev	79b0d21ce9	[InstCombine] Fold `(~x) & y` --> `~(x \| (~y))` iff it is free to do so Iff we know we can get rid of the inversions in the new pattern, we can thus get rid of the inversion in the old pattern, thus decreasing the instruction count.	2021-01-22 17:23:54 +03:00
Roman Lebedev	4ed0d8f2f0	[NFC][InstCombine] Extract freelyInvertAllUsersOf() out of canonicalizeICmpPredicate() I'd like to use it in an upcoming fold.	2021-01-22 17:23:53 +03:00
Roman Lebedev	efeb8caf8b	[NFC][SimplifyCFG] FoldBranchToCommonDest(): extract the actual transform into helper function I'm intentionally structuring it this way, so that the actual fold only does the fold, and no legality/correctness checks, all of which must be done by the caller. This allows for the fold code to be more compact and more easily grokable.	2021-01-22 17:23:53 +03:00
Roman Lebedev	b482560a59	[NFC][SimplifyCFG] FoldBranchToCommonDest(): extract check for destination sharing into a helper function As a follow-up, i'll extract the actual transform into a function, and this helper will be called from both places, so this avoids code duplication.	2021-01-22 17:23:53 +03:00
Roman Lebedev	7b89efb55e	[NFC][SimplifyCFG] FoldBranchToCommonDest(): somewhat better structure weight updating code Hoist the successor updating out of the code that deals with branch weight updating, and hoist the 'has weights' check from the latter, making code more consistent and easier to follow.	2021-01-22 17:23:41 +03:00
Roman Lebedev	256a035752	[NFC][SimplifyCFG] FoldBranchToCommonDest(): unclutter Cond/CondInPred handling We don't need those variables, we can just get the final value directly.	2021-01-22 17:23:11 +03:00
Roman Lebedev	aabed3718a	[NFCI-ish][SimplifyCFG] FoldBranchToCommonDest(): really don't deal with uncond branches While we already ignore uncond branches, we could still potentially end up with a conditional branch with identical destinations due to the visitation order, or because we were called as a utility. But if we have such a disguised uncond branch, we still probably shouldn't deal with it here.	2021-01-22 17:23:10 +03:00
Roman Lebedev	0895b836d7	[SimplifyCFG] FoldBranchToCommonDest(): don't deal with unconditional branches The case where BB ends with an unconditional branch, and has a single predecessor w/ conditional branch to BB and a single successor of BB is exactly the pattern SpeculativelyExecuteBB() transform deals with. (and in this case they both allow speculating only a single instruction) Well, or FoldTwoEntryPHINode(), if the final block has only those two predecessors. Here, in FoldBranchToCommonDest(), only a weird subset of that transform is supported, and it's glued on the side in a weird way. In particular, it took me a bit to understand that the Cond isn't actually a branch condition in that case, but just the value we allow to speculate (otherwise it reads as a miscompile to me). Additionally, this only supports for the speculated instruction to be an ICmp. So let's just unclutter FoldBranchToCommonDest(), and leave this transform up to SpeculativelyExecuteBB(). As far as i can tell, this shouldn't really impact optimization potential, but if it does, improving SpeculativelyExecuteBB() will be more beneficial anyways. Notably, this only affects a single test, but EarlyCSE should have run beforehand in the pipeline, and then FoldTwoEntryPHINode() would have caught it. This reverts commit rL158392 / commit `d33f4efbfd`.	2021-01-22 17:22:49 +03:00
Anton Rapetov	a4914dc1f2	[SLP] do not traverse constant uses Walking the use list of a Constant (particularly, ConstantData) is not scalable, since a given constant may be used by many instructions in many functions in many modules. Differential Revision: https://reviews.llvm.org/D94713	2021-01-22 08:14:09 -05:00
David Sherwood	2e080eb00a	[SVE] Add support for scalable vectorization of loops with selects and cmps I have removed an unnecessary assert in LoopVectorizationCostModel::getInstructionCost that prevented a cost being calculated for select instructions when using scalable vectors. In addition, I have changed AArch64TTIImpl::getCmpSelInstrCost to only do special cost calculations for fixed width vectors and fall back to the base version for scalable vectors. I have added a simple cost model test for cmps and selects: test/Analysis/CostModel/sve-cmpsel.ll and some simple tests that show we vectorize loops with cmp and select: test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll Differential Revision: https://reviews.llvm.org/D95039	2021-01-22 09:48:13 +00:00
Xun Li	bd3ca6666d	[Inlining] Delete redundant optnone/alwaysinline check The same check is done in InlineCost: `8b0bd54d0e/llvm/lib/Analysis/InlineCost.cpp (L2537-L2552)` Also, doing a check on the callee here is confusing, because anything that deals with callee should be done in the inner loop where we process all calls from the same caller. Differential Revision: https://reviews.llvm.org/D95186	2021-01-21 18:38:10 -08:00
David Green	39db5753f9	[LV][ARM] Inloop reduction cost modelling This adds cost modelling for the inloop vectorization added in `745bf6cf44`. Up until now they have been modelled as the original underlying instruction, usually an add. This happens to work OK for MVE with instructions that are reducing into the same type as they are working on. But MVE's instructions can perform the equivalent of an extended MLA as a single instruction: %sa = sext <16 x i8> A to <16 x i32> %sb = sext <16 x i8> B to <16 x i32> %m = mul <16 x i32> %sa, %sb %r = vecreduce.add(%m) -> R = VMLADAV A, B There are other instructions for performing add reductions of v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64 (VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV). The i64 are particularly interesting as there are no native i64 add/mul instructions, leading to the i64 add and mul naturally getting very high costs. Also worth mentioning, under NEON there is the concept of a sdot/udot instruction which performs a partial reduction from a v16i8 to a v4i32. They extend and mul/sum the first four elements from the inputs into the first element of the output, repeating for each of the four output lanes. They could possibly be represented in the same way as above in llvm, so long as a vecreduce.add could perform a partial reduction. The vectorizer would then produce a combination of in and outer loop reductions to efficiently use the sdot and udot instructions. Although this patch does not do that yet, it does suggest that separating the input reduction type from the produced result type is a useful concept to model. It also shows that a MLA reduction as a single instruction is fairly common. This patch attempts to improve the cost modelling of in-loop reductions by: - Adding some pattern matching in the loop vectorizer cost model to match extended reduction patterns that are optionally extended and/or MLA patterns. 
This marks the cost of the reduction instruction correctly and the sext/zext/mul leading up to it as free, which is otherwise difficult to tell and may get a very high cost. (In the long run this can hopefully be replaced by vplan producing a single node and costing it correctly, but that is not yet something that vplan can do). - getExtendedAddReductionCost is added to query the cost of these extended reduction patterns. - Expanded the ARM costs to account for these expanded sizes, which is a fairly simple change in itself. - Some minor alterations to allow inloop reduction larger than the highest vector width and i64 MVE reductions. - An extra InLoopReductionImmediateChains map was added to the vectorizer for it to efficiently detect which instructions are reductions in the cost model. - The tests have some updates to show what I believe is optimal vectorization and where we are now. Put together this can greatly improve performance for reduction loop under MVE. Differential Revision: https://reviews.llvm.org/D93476	2021-01-21 21:03:41 +00:00
Sanjay Patel	2f03528f5e	[SLP] rename reduction variable to avoid shadowing; NFC The code structure can likely be improved now that 'OperationData' is gone.	2021-01-21 16:02:38 -05:00
Anton Rapetov	bfec9148a0	Scalar: Don't visit constants in findInnerReductionPhi in LoopInterchange In LoopInterchange, `findInnerReductionPhi()` looks for reduction variables, which cannot be constants. Update it to return early in that case. This also addresses a blocker for removing use-lists from ConstantData, whose users could be spread across arbitrary modules in the same LLVMContext. Differential Revision: https://reviews.llvm.org/D94712	2021-01-21 12:33:06 -08:00
Sanjay Patel	d77753381f	[SLP] simplify reduction matching This is NFC-intended and removes the "OperationData" class which had become nothing more than a recurrence (reduction) type. I adjusted the matching logic to distinguish instructions from non-instructions - that's all that the "IsLeafValue" member was keeping track of.	2021-01-21 14:58:57 -05:00
Nikita Popov	65fd034b95	[FunctionAttrs] Infer willreturn for functions without loops If a function doesn't contain loops and does not call non-willreturn functions, then it is willreturn. Loops are detected by checking for backedges in the function. We don't attempt to handle finite loops at this point. Differential Revision: https://reviews.llvm.org/D94633	2021-01-21 20:29:33 +01:00
Sanjay Patel	070af1b788	[InstCombine] avoid crashing on attribute propagation In https://llvm.org/PR48810 , we are crashing while trying to propagate attributes from mempcpy (returns void*) to memcpy (returns nothing - void). We can avoid the crash by removing known incompatible attributes for the void return type. I'm not sure if this goes far enough (should we just drop all attributes since this isn't the same function?). We also need to audit other transforms in LibCallSimplifier to make sure there are no other cases that have the same problem. Differential Revision: https://reviews.llvm.org/D95088	2021-01-21 08:13:26 -05:00
Florian Hahn	bee486851c	[LoopUnswitch] Implement first version of partial unswitching. This patch applies the idea from D93734 to LoopUnswitch. It adds support for unswitching on conditions that are only invariant along certain paths through a loop. In particular, it targets conditions in the loop header that depend on values loaded from memory. If either path from the true or false successor through the loop does not modify memory, perform partial loop unswitching. That is, duplicate the instructions feeding the condition in the pre-header. Then unswitch on the duplicated condition. The condition is now known in the unswitched version for the 'invariant' path through the original loop. One caveat of this approach is that one of the loops created can be partially unswitched again. To avoid this behavior, `llvm.loop.unswitch.partial.disable` metadata is added to the unswitched loops, to avoid subsequent partial unswitching. If that's the approach to go, I can move the code handling the metadata kind into separate functions. This increases the cases we unswitch quite a bit in SPEC2006/SPEC2000 & MultiSource. 
It also allows us to eliminate a dead loop in SPEC2017's omnetpp ``` Tests: 236 Same hash: 170 (filtered out) Remaining: 66 Metric: loop-unswitch.NumBranches Program base patch diff test-suite...000/255.vortex/255.vortex.test 2.00 23.00 1050.0% test-suite...T2006/401.bzip2/401.bzip2.test 7.00 55.00 685.7% test-suite :: External/Nurbs/nurbs.test 5.00 26.00 420.0% test-suite...s-C/unix-smail/unix-smail.test 1.00 3.00 200.0% test-suite.../Prolangs-C++/ocean/ocean.test 1.00 3.00 200.0% test-suite...tions/lambda-0.1.3/lambda.test 1.00 3.00 200.0% test-suite...yApps-C++/PENNANT/PENNANT.test 2.00 5.00 150.0% test-suite...marks/Ptrdist/yacr2/yacr2.test 1.00 2.00 100.0% test-suite...lications/viterbi/viterbi.test 1.00 2.00 100.0% test-suite...plications/d/make_dparser.test 12.00 24.00 100.0% test-suite...CFP2006/433.milc/433.milc.test 14.00 27.00 92.9% test-suite.../Applications/lemon/lemon.test 7.00 12.00 71.4% test-suite...ce/Applications/Burg/burg.test 6.00 10.00 66.7% test-suite...T2006/473.astar/473.astar.test 16.00 26.00 62.5% test-suite...marks/7zip/7zip-benchmark.test 78.00 121.00 55.1% ``` Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93764	2021-01-21 09:46:41 +00:00
Kazu Hirata	e53472de68	[Transforms] Use llvm::append_range (NFC)	2021-01-20 21:35:54 -08:00
Kazu Hirata	8f5da41c4d	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-20 21:35:52 -08:00
Dávid Bolvanský	bb3f169b59	[BuildLibcalls, Attrs] Support more variants of C++'s new, add attributes for C++'s delete Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95095	2021-01-21 00:12:37 +01:00
Mircea Trofin	ccec2cf1d9	Reland "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit `d97f776be5`. The original problem was due to build failures in shared lib builds. D95079 moved ImportedFunctionsInliningStatistics under Analysis, unblocking this.	2021-01-20 13:33:43 -08:00
Mircea Trofin	95ce32c787	[NFC] Move ImportedFunctionsInliningStatistics to Analysis This is related to D94982. We want to call these APIs from the Analysis component, so we can't leave them under Transforms. Differential Revision: https://reviews.llvm.org/D95079	2021-01-20 13:18:03 -08:00
Nikita Popov	1c6d1e57c1	[PredicateInfo] Handle logical and/or Teach PredicateInfo to handle logical and/or the same way as bitwise and/or. This allows handling logical and/or inside IPSCCP and NewGVN.	2021-01-20 21:03:07 +01:00
Nikita Popov	ca4ed1e7ae	[PredicateInfo] Generalize processing of conditions Branch/assume conditions in PredicateInfo are currently handled in a rather ad-hoc manner, with some arbitrary limitations. For example, an `and` of two `icmp`s will be handled, but an `and` of an `icmp` and some other condition will not. That also includes the case where more than two conditions and and'ed together. This patch makes the handling more general by looking through and/ors up to a limit and considering all kinds of conditions (though operands will only be taken for cmps of course). Differential Revision: https://reviews.llvm.org/D94447	2021-01-20 20:40:41 +01:00
Mircea Trofin	d97f776be5	Revert "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit `e8aec763a5`.	2021-01-20 11:19:34 -08:00
Mircea Trofin	e8aec763a5	[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor When using 2 InlinePass instances in the same CGSCC - one for other mandatory inlinings, the other for the heuristic-driven ones - the order in which the ImportedFunctionStats would be output-ed would depend on the destruction order of the inline passes, which is not deterministic. This patch moves the ImportedFunctionStats responsibility to the InlineAdvisor to address this problem. Differential Revision: https://reviews.llvm.org/D94982	2021-01-20 11:07:36 -08:00
Dávid Bolvanský	16d6e85271	[BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94850	2021-01-20 19:45:23 +01:00
Sanjay Patel	c09be0d2a0	[SLP] reduce reduction code for checking vectorizable ops; NFC This is another step towards removing `OperationData` and fixing FMF matching/propagation bugs when forming reductions.	2021-01-20 11:14:48 -05:00
Sanjay Patel	1c54112a57	[SLP] refactor more reduction functions; NFC We were able to remove almost all of the state from OperationData, so these don't make sense as members of that class - just pass the RecurKind in as a param. More streamlining is possible, but I'm trying to avoid logic/typo bugs while fixing this. Eventually, we should not need the `OperationData` class.	2021-01-20 11:14:48 -05:00
Sanjay Patel	8590d24543	[SLP] move reduction createOp functions; NFC We were able to remove almost all of the state from OperationData, so these don't make sense as members of that class - just pass the RecurKind in as a param.	2021-01-20 11:14:48 -05:00
Joseph Tremoulet	40cd262c43	Loop peeling: check that latch is conditional branch Loop peeling assumes that the loop's latch is a conditional branch. Add a check to canPeel that explicitly checks for this, and testcases that otherwise fail an assertion when trying to peel a loop whose back-edge is a switch case or the non-unwind edge of an invoke. Reviewed By: skatkov, fhahn Differential Revision: https://reviews.llvm.org/D94995	2021-01-20 11:01:16 -05:00
Chuanqi Xu	c1bc7981ba	[Coroutine] Remain alignment information when merging frame variables Summary: This is to address bug48712. The solution in this patch is to merge a variable a into the storage frame of variable b only if the alignment of a is a multiple of that of b. There may be other strategies, but for now I think they are hard to handle and bring little benefit. We can implement them in the future if needed. Test-plan: check-llvm Reviewers: jmorse, lxfind, junparser Differential Revision: https://reviews.llvm.org/D94891	2021-01-20 18:59:00 +08:00
David Sherwood	255a507716	[NFC][InstructionCost] Use InstructionCost in lib/Transforms/IPO/IROutliner.cpp In places where we call a TTI.getXXCost() function I have changed the code to use InstructionCost instead of unsigned. This is in preparation for later on when we will change the TTI interfaces to return InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94427	2021-01-20 08:33:59 +00:00
Kazu Hirata	b023cdeacc	[llvm] Use llvm::all_of (NFC)	2021-01-19 20:19:17 -08:00
Kazu Hirata	8857202489	[llvm] Use llvm::find (NFC)	2021-01-19 20:19:14 -08:00
Juneyoung Lee	4479c0c2c0	Allow nonnull/align attribute to accept poison Currently LLVM is relying on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison. To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this makes `nonnull` accept poison, and return poison if the input pointer is null. This makes many transformations like below legal: ``` %p = gep inbounds %x, 1 ; %p is non-null pointer or poison call void @f(%p) ; instcombine converts this to call void @f(nonnull %p) ``` On the other hand, this semantics makes propagation of `nonnull` to the caller illegal. The reason is that passing poison to `nonnull` does not immediately raise UB anymore, so such a program is still well defined if the callee does not use the argument. Having the `noundef` attribute there re-allows this. ``` define void @f(i8* %p) { ; functionattr cannot mark %p nonnull here anymore call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p. ret void } ``` Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90529	2021-01-20 11:31:23 +09:00
Wei Mi	21b1ad0340	[SampleFDO] Add the support to split the function profiles with context into separate sections. For ThinLTO, all the function profiles without context have been annotated to outline functions if possible in the prelink phase. In the postlink phase, profile annotation is only meaningful for function profiles with context. If the profile is large, it is better to split the profile into two parts, one with context and one without, so the profile reading in the postlink phase only has to read the part with context. To support the profile splitting, we extend the ExtBinary format to support different section arrangements. It will be flexible to add other section layouts in the future without the need to create a new class inheriting from the ExtBinary class. Differential Revision: https://reviews.llvm.org/D94435	2021-01-19 15:16:19 -08:00
Alexey Bataev	e463bd53c0	Revert "[SLP]Merge reorder and reuse shuffles." This reverts commit `438682de6a` to fix the bug with the reducing size of the resulting vector for the entry node with multiple users.	2021-01-19 11:48:04 -08:00
Mariya Podchishchaeva	7113de301a	[ScalarizeMaskedMemIntrin] Add missing dependency The pass has dependency on 'TargetTransformInfoWrapperPass', but the corresponding call to INITIALIZE_PASS_DEPENDENCY was missing. Differential Revision: https://reviews.llvm.org/D94916	2021-01-19 22:33:47 +03:00
Nikita Popov	21443381c0	Reapply [InstCombine] Replace one-use select operand based on condition Relative to the original change, this adds a check that the instruction on which we're replacing operands is safe to speculatively execute, because that's what we're effectively doing. We're executing the instruction with the replaced operand, which is fine if it's pure, but not fine if it can cause side-effects or UB (aka is not speculatable). Additionally, we cannot (generally) replace operands in phi nodes, as these may refer to a different loop iteration. This is also covered by the speculation check. ----- InstCombine already performs a fold where X == Y ? f(X) : Z is transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However, if f(X) only has one use, then we can always directly replace the use inside the instruction. To actually be profitable, limit it to the case where Y is a non-expr constant. This could be further extended to replace uses further up a one-use instruction chain, but for now this only looks one level up. Among other things, this also subsumes D94860. Differential Revision: https://reviews.llvm.org/D94862	2021-01-19 20:26:38 +01:00
Jeroen Dobbelaere	121cac01e8	[noalias.decl] Look through llvm.experimental.noalias.scope.decl Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93042	2021-01-19 20:09:42 +01:00
Hans Wennborg	58bdfcfac0	Revert `5238e7b302` "[InstCombine] Replace one-use select operand based on condition" This caused a miscompile in Chromium, see comments on the codereview for discussion and pointer to a reproducer. > InstCombine already performs a fold where X == Y ? f(X) : Z is > transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However, > if f(X) only has one use, then we can always directly replace the > use inside the instruction. To actually be profitable, limit it to > the case where Y is a non-expr constant. > > This could be further extended to replace uses further up a one-use > instruction chain, but for now this only looks one level up. > > Among other things, this also subsumes D94860. > > Differential Revision: https://reviews.llvm.org/D94862 This also reverts the follow-up a003f26539cf4db744655e76c41f4c4a8913f116: > [llvm] Prevent infinite loop in InstCombine of select statements > > This fixes an issue where the RHS and LHS the comparison operation > creating the predicate were swapped back and forth forever. > > Differential Revision: https://reviews.llvm.org/D94934	2021-01-19 11:50:56 +01:00
Florian Hahn	83daa49758	[LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands. D84108 exposed a bad interaction between inlining and loop-rotation during regular LTO, which is causing notable regressions in at least CINT2006/473.astar. The problem boils down to: we now rotate a loop just before the vectorizer which requires duplicating a function call in the preheader when compiling the individual files ('prepare for LTO'). But this then prevents further inlining of the function during LTO. This patch tries to resolve this issue by making LoopRotate more conservative with respect to rotating loops that have inline-able calls during the 'prepare for LTO' stage. I think this change intuitively improves the current situation in general. Loop-rotate tries hard to avoid creating headers that are 'too big'. At the moment, it assumes all inlining already happened and the cost of duplicating a call is equal to just doing the call. But with LTO, inlining also happens during full LTO and it is possible that a previously duplicated call is actually a huge function which gets inlined during LTO. From the perspective of LV, not much should change overall. Most loops calling user-provided functions won't get vectorized to start with (unless we can infer that the function does not touch memory, has no other side effects). If we do not inline the 'inline-able' call during the LTO stage, we merely delayed loop-rotation & vectorization. If we inline during LTO, chances should be very high that the inlined code is itself vectorizable or the user call was not vectorizable to start with. There could of course be scenarios where we inline a sufficiently large function with code not profitable to vectorize, which would have been vectorized earlier (by scalarizing the call). But even in that case, there probably is no big performance impact, because it should be mostly down to the cost-model to reject vectorization in that case. 
And then the version with scalarized calls should also not be beneficial. In a way, LV should have strictly more information after inlining and make more accurate decisions (barring cost-model issues). There is of course plenty of room for things to go wrong unexpectedly, so we need to keep a close look at actual performance and address any follow-up issues. I took a look at the impact on statistics for MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer loops rotated, but no change to the number of loops vectorized. Reviewed By: sanwou01 Differential Revision: https://reviews.llvm.org/D94232	2021-01-19 10:15:29 +00:00
Tres Popp	a003f26539	[llvm] Prevent infinite loop in InstCombine of select statements This fixes an issue where the RHS and LHS of the comparison operation creating the predicate were swapped back and forth forever. Differential Revision: https://reviews.llvm.org/D94934	2021-01-19 10:31:48 +01:00
David Sherwood	c3ce262794	[NFC] Make remaining cost functions in LoopVectorize.cpp use InstructionCost A previous patch has already changed getInstructionCost to return an InstructionCost type. This patch changes the other various getXXXCost functions to return an InstructionCost too. This is a non-functional change - I've added a few asserts that the costs are valid in places where we're selecting between vector call and intrinsic costs. However, since we don't yet return invalid costs from any of the TTI implementations these asserts should not fire. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94065	2021-01-19 09:08:40 +00:00
Juneyoung Lee	2d89ebd5d1	Address unused variable warning	2021-01-19 09:30:16 +09:00
Juneyoung Lee	0441df94ad	[InstCombine,InstSimplify] Optimize select followed by and/or/xor This patch adds `A & (A && B)` -> `A && B` (similarly for or + logical or) Also, this patch adds `~(select C, (icmp pred X, Y), const)` -> `select C, (icmp pred' X, Y), ~const`. Alive2 proof: merge_and: https://alive2.llvm.org/ce/z/teMR97 merge_or: https://alive2.llvm.org/ce/z/b4yZUp xor_and: https://alive2.llvm.org/ce/z/_-TXHi xor_or: https://alive2.llvm.org/ce/z/2uYx_a Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94861	2021-01-19 09:14:17 +09:00
Juneyoung Lee	395c737d9f	[SimplifyCFG] Update SimplifyBranchOnICmpChain to recognize select form of and/or This patch teaches SimplifyCFG::SimplifyBranchOnICmpChain to understand select form of (x == C1 \|\| x == C2 \|\| ...) / (x != C1 && x != C2 && ...) and optimize them into switch if possible. D93065 has more context about the transition, including links to the list of optimizations being updated. Differential Revision: https://reviews.llvm.org/D93943	2021-01-19 08:53:40 +09:00
Sanjay Patel	5b77ac32b1	[SLP] match maxnum/minnum intrinsics as FP reduction ops After much refactoring over the last 2 weeks to the reduction matching code, I think this change is finally ready. We effectively broke fmax/fmin vector reduction optimization when we started canonicalizing to intrinsics in instcombine, so this should restore that functionality for SLP. There are still FMF problems here as noted in the code comments, but we should be avoiding miscompiles on those for fmax/fmin by restricting to full 'fast' ops (negative tests are included). Fixing FMF propagation is a planned follow-up. Differential Revision: https://reviews.llvm.org/D94913	2021-01-18 17:37:16 -05:00
Kazu Hirata	23b0ab2acb	[llvm] Use the default value of drop_begin (NFC)	2021-01-18 10:16:36 -08:00
Kazu Hirata	dc300beba7	[STLExtras] Add a default value to drop_begin This patch adds the default value of 1 to drop_begin. In the llvm codebase, 70% of calls to drop_begin have 1 as the second argument. An interface similar to std::next should improve readability. This patch converts a couple of calls to drop_begin as examples. Differential Revision: https://reviews.llvm.org/D94858	2021-01-18 10:16:34 -08:00
Xun Li	1d04dc52dd	[Coroutine] Do not CoroElide if there are musttail calls This is to address https://bugs.llvm.org/show_bug.cgi?id=48626. When there are musttail calls that use parameters aliasing the newly created coroutine frame, the existing implementation will fatal. We simply cannot perform CoroElide in such cases. In theory a precise analysis can be done to check whether the parameters of the musttail call actually alias the frame, but it's very hard to do it before the transformation happens. Also in most cases the existence of musttail call is generated due to symmetric transfers, and in those cases alias analysis won't be able to tell that they don't alias anyway. Differential Revision: https://reviews.llvm.org/D94834	2021-01-18 09:06:21 -08:00
Sanjay Patel	3dbbadb8ef	[SLP] rename reduction query for min/max ops; NFC This will avoid confusion once we start matching min/max intrinsics. All of these hacks to accommodate cmp+sel idioms should disappear once we canonicalize to min/max intrinsics.	2021-01-18 09:32:57 -05:00
Sanjay Patel	d1c4e859ce	[SLP] reduce opcode API dependency in reduction cost calc; NFC The icmp opcode is now hard-coded in the cost model call. This will make it easier to eventually remove all opcode queries for min/max patterns as we transition to intrinsics.	2021-01-18 09:32:57 -05:00
Florian Hahn	e6d758de82	[InferAttrs] Mark some library functions as willreturn. This patch marks some library functions as willreturn. On the first pass, I excluded most functions that interact with streams/the filesystem. Along with willreturn, it also adds nounwind to a set of math functions. There probably are a few additional attributes we can add for those, but that should be done separately. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94684	2021-01-18 13:40:21 +00:00
Caroline Concatto	36710c38c1	[NFC] Migrate VectorCombine.cpp to use InstructionCost This patch changes these functions: vectorizeLoadInsert isExtractExtractCheap foldExtractedCmps scalarizeBinopOrCmp getShuffleExtract foldBitcastShuf to use the class InstructionCost when calling TTI.get<something>Cost(). This patch is part of a series of patches to use InstructionCost instead of unsigned/int for the cost model functions. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174 P.S.: This patch adds the test || !NewCost.isValid(), because we want to return false both when !NewCost.isValid() && !OldCost.isValid() (the cost of the transform is expensive) and when !NewCost.isValid() && OldCost.isValid(). Therefore, for simplification, we only add a test for !NewCost.isValid(). Differential Revision: https://reviews.llvm.org/D94069	2021-01-18 13:37:21 +00:00
Dávid Bolvanský	ed396212da	[InstCombine] Transform abs pattern using multiplication to abs intrinsic (PR45691) ``` unsigned r(int v) { return (1 | -(v < 0)) * v; } `r` is equivalent to `abs(v)`. ``` ``` define <4 x i8> @src(<4 x i8> %0) { %1: %2 = ashr <4 x i8> %0, { 31, undef, 31, 31 } %3 = or <4 x i8> %2, { 1, 1, 1, undef } %4 = mul nsw <4 x i8> %3, %0 ret <4 x i8> %4 } => define <4 x i8> @tgt(<4 x i8> %0) { %1: %2 = icmp slt <4 x i8> %0, { 0, 0, 0, 0 } %3 = sub nsw <4 x i8> { 0, 0, 0, 0 }, %0 %4 = select <4 x i1> %2, <4 x i8> %3, <4 x i8> %0 ret <4 x i8> %4 } Transformation seems to be correct! ``` Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94874	2021-01-17 17:06:14 +01:00
Nikita Popov	5238e7b302	[InstCombine] Replace one-use select operand based on condition InstCombine already performs a fold where X == Y ? f(X) : Z is transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However, if f(X) only has one use, then we can always directly replace the use inside the instruction. To actually be profitable, limit it to the case where Y is a non-expr constant. This could be further extended to replace uses further up a one-use instruction chain, but for now this only looks one level up. Among other things, this also subsumes D94860. Differential Revision: https://reviews.llvm.org/D94862	2021-01-16 23:25:02 +01:00
Roman Lebedev	32fc32317a	[SimplifyCFG] markAliveBlocks(): catchswitch: preserve PostDomTree When removing catchpad's from catchswitch, if that removes a successor, we need to record that in DomTreeUpdater. This fixes PostDomTree preservation failure in an existing test. This appears to be the single issue that i see in my current test coverage.	2021-01-17 01:21:05 +03:00
Sanjay Patel	49b96cd9ef	[SLP] remove opcode field from reduction data class This is NFC-intended and another step towards supporting intrinsics as reduction candidates. The remaining bits of the OperationData class do not make much sense as-is, so I will try to improve that, but I'm trying to take minimal steps because it's still not clear how this was intended to work.	2021-01-16 13:55:52 -05:00
Sanjay Patel	fcfcc3cc6b	[SLP] fix typos; NFC	2021-01-16 13:55:52 -05:00
Sanjay Patel	48dbac5b6b	[SLP] remove unnecessary use of 'OperationData' This is another NFC-intended patch to allow matching intrinsics (example: maxnum) as candidates for reductions. It's possible that the loop/if logic can be reduced now, but it's still difficult to understand how this all works.	2021-01-16 13:55:52 -05:00
Kazu Hirata	2082b10d10	[llvm] Use *::empty (NFC)	2021-01-16 09:40:55 -08:00
Kazu Hirata	19aacdb715	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-16 09:40:53 -08:00
Mircea Trofin	e8049dc3c8	[NewPM][Inliner] Move the 'always inliner' case in the same CGSCC pass as 'regular' inliner Expanding from D94808 - we ensure the same InlineAdvisor is used by both InlinerPass instances. The notion of mandatory inlining is moved into the core InlineAdvisor: advisors anyway have to handle that case, so this change also factors out that a bit better. Differential Revision: https://reviews.llvm.org/D94825	2021-01-15 17:59:38 -08:00
Dávid Bolvanský	a1500105ee	[SimplifyCFG] Optimize CFG when null is passed to a function with nonnull argument Example: ``` __attribute__((nonnull,noinline)) char * pinc(char *p) { return ++p; } char * foo(bool b, char *a) { return pinc(b ? 0 : a); } ``` optimize to ``` char * foo(bool b, char *a) { return pinc(a); } ``` Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94180	2021-01-15 23:53:43 +01:00
Sanjay Patel	ceb3cdccd0	[SLP] remove dead code in reduction matching; NFC To get into this block we had: !A || B || C and we checked C in the first 'if' clause leaving !A || B. But the 2nd 'if' is checking: A && !B --> !(!A || B)	2021-01-15 17:03:26 -05:00
Nick Desaulniers	ed0fd567eb	BreakCriticalEdges: do not split the critical edge from a CallBr indirect successor Otherwise we'll fail the assertion in SplitBlockPredecessors() related to splitting the edges from CallBr's. Fixes: https://github.com/ClangBuiltLinux/linux/issues/1161 Fixes: https://github.com/ClangBuiltLinux/linux/issues/1252 Reviewed By: void, MaskRay, jyknight Differential Revision: https://reviews.llvm.org/D88438	2021-01-15 13:51:47 -08:00
Roman Lebedev	a14c36fe27	[SimplifyCFG] switchToSelect(): don't forget to insert DomTree edge iff needed DestBB might or might not already be a successor of SelectBB, and if it wasn't, we need to ensure that we record the fact in DomTree. The testcase used to crash in lazy domtree updater mode + non-per-function domtree validity checks disabled.	2021-01-15 23:35:57 +03:00
Roman Lebedev	c6654a4cda	[SimplifyCFG][BasicBlockUtils] Port SplitBlockPredecessors()/SplitLandingPadPredecessors() to DomTreeUpdater This is not nice, but it's the best transient solution possible, and is better than just duplicating the whole function. The problem is, this function is widely used, and it is not at all obvious that all the users could be painlessly switched to operate on DomTreeUpdater, and somehow I don't feel like porting all those users first. This function is one of the last three that do not operate on DomTreeUpdater.	2021-01-15 23:35:56 +03:00
Roman Lebedev	286cf6cb02	[SimplifyCFG] Port SplitBlockAndInsertIfThen() to DomTreeUpdater This is not nice, but it's the best transient solution possible, and is better than just duplicating the whole function. The problem is, this function is widely used, and it is not at all obvious that all the users could be painlessly switched to operate on DomTreeUpdater, and somehow I don't feel like porting all those users first. This function is one of the last three that do not operate on DomTreeUpdater.	2021-01-15 23:35:56 +03:00
Roman Lebedev	c845c724c2	[Utils][SimplifyCFG] Port SplitBlock() to DomTreeUpdater This is not nice, but it's the best transient solution possible, and is better than just duplicating the whole function. The problem is, this function is widely used, and it is not at all obvious that all the users could be painlessly switched to operate on DomTreeUpdater, and somehow I don't feel like porting all those users first. This function is one of the last three that do not operate on DomTreeUpdater.	2021-01-15 23:35:56 +03:00
Roman Lebedev	b81f75fa79	[Utils] splitBlockBefore() always operates on DomTreeUpdater, so take it, not DomTree Even though not all its users operate on DomTreeUpdater, it itself internally operates on DomTreeUpdater, so it must mean everything is fine with that, so just do that globally.	2021-01-15 23:35:56 +03:00
Sanjay Patel	1f21de535d	[SLP] remove unused reduction functions; NFC These were made obsolete by simplifying the code in recent patches.	2021-01-15 14:59:33 -05:00
Jamie Schmeiser	17d0fb7f57	Set option default for enabling memory ssa for new pass manager loop sink pass to true. Summary: Set the default for the option enabling memory ssa use in the loop sink pass to true for the new pass manager. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: asbirlea (Alina Sbirlea) Differential Revision: https://reviews.llvm.org/D92486	2021-01-15 09:56:44 -05:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Kazu Hirata	2efcbe24a7	[llvm] Use llvm::drop_begin (NFC)	2021-01-14 20:30:33 -08:00
Kazu Hirata	9bcc0d1040	[CodeGen, Transforms] Use llvm::sort (NFC)	2021-01-14 20:30:31 -08:00
Sanjay Patel	b21905dfe3	[SLP] remove unnecessary state in matching reductions This is NFC-intended. I'm still trying to figure out how the loop where this is used works. It does not seem like we require this data at all, but it's hard to confirm given the complicated predicates.	2021-01-14 18:32:37 -05:00
Bjorn Pettersson	d58512b2e3	[SLP] Don't vectorize stores of non-packed types (like i1, i2) In the spirit of commit `fc783e91e0` (llvm-svn: 248943) we shouldn't vectorize stores of non-packed types (i.e. types that have padding between consecutive variables in a scalar layout, but are packed in a vector layout). The problem was detected as a miscompile in a downstream test case. Reviewed By: anton-afanasyev Differential Revision: https://reviews.llvm.org/D94446	2021-01-14 11:30:33 +01:00
Daniel Paoliello	ff5e896425	Fix unused variable in CoroFrame.cpp when building Release with GCC 10 When building with GCC 10, the following warning is reported: ``` /llvm-project/llvm/lib/Transforms/Coroutines/CoroFrame.cpp:1527:28: warning: unused variable ‘CS’ [-Wunused-variable] 1527 | if (CatchSwitchInst *CS = ``` This change adds a cast to `void` to avoid the warning. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D94456	2021-01-13 22:53:25 -08:00
Kazu Hirata	125ea20d55	[llvm] Use llvm::stable_sort (NFC)	2021-01-13 19:14:43 -08:00
Kazu Hirata	5c1c39e8d8	[llvm] Use *Set::contains (NFC)	2021-01-13 19:14:41 -08:00
Wei Mi	86341247c4	[NFC] Rename ThinLTOPhase to ThinOrFullLTOPhase and move it from PassBuilder.h to Pass.h. In some compiler passes like SampleProfileLoaderPass, we want to know which LTO/ThinLTO phase the pass is in. Currently the phase is represented in enum class PassBuilder::ThinLTOPhase, so it is only available in PassBuilder and it also cannot represent phases in full LTO. The patch extends it to include full LTO phases and moves it from PassBuilder.h to Pass.h, so that it is much easier for PassBuilder to communicate with each pass about the current LTO phase. Differential Revision: https://reviews.llvm.org/D94613	2021-01-13 15:55:40 -08:00
Arthur Eubanks	39e6d24237	[NewPM] Only non-trivially loop unswitch at -O3 and for non-optsize functions This matches the legacy pipeline/pass. Reviewed By: asbirlea, SjoerdMeijer Differential Revision: https://reviews.llvm.org/D94559	2021-01-13 14:54:49 -08:00
Kazu Hirata	fb98a1be43	Fix the warnings on unused variables (NFC)	2021-01-13 13:32:40 -08:00
Sanjay Patel	123674a816	[SLP] simplify type check for reductions This is NFC-intended. The 'valid' call allows int/FP/pointers for other parts of SLP. The difference here is that we can't reduce pointers.	2021-01-13 13:30:46 -05:00
Andrew Litteken	05b1a15f70	[IROutliner] Adapting to hoisted bitcasts in CodeExtractor In commit `700d2417d8` the CodeExtractor was updated so that bitcasts that have lifetime markers that begin outside of the region are deduplicated outside the region and are not used as an output. This caused a discrepancy in the IROutliner, where in these cases there were arguments added to the aggregate function that were not needed, causing assertion errors. The IROutliner queries the CodeExtractor twice to determine the inputs and outputs, before and after `findAllocas` is called, with the same ValueSet for the outputs, causing the duplication. This has been fixed with a dummy ValueSet for the first call. However, the additional bitcasts prevent us from using the same similarity relationships that were previously defined by the IR Similarity Analysis Pass. In these cases, we check whether the initial version of the region being analyzed for outlining is still the same as it was previously. If it is not, i.e. because of the additional bitcast instructions from the CodeExtractor, we discard the region. Reviewers: yroux Differential Revision: https://reviews.llvm.org/D94303	2021-01-13 11:10:37 -06:00
Nikita Popov	17863614da	[InstCombine] Fold select -> and/or using impliesPoison We can fold a ? b : false to a & b if is_poison(b) implies that is_poison(a), at which point we're able to reuse all the usual fold on ands. In particular, this covers the very common case of icmp X, C && icmp X, C'. The same applies to ors. This currently only has an effect if the -instcombine-unsafe-select-transform=0 option is set. Differential Revision: https://reviews.llvm.org/D94550	2021-01-13 17:45:40 +01:00
David Sherwood	4cd48535ec	[NFC][InstructionCost] Use InstructionCost in Transforms/Scalar/RewriteStatepointsForGC.cpp In places where we calculate costs using TTI.getXXXCost() interfaces I have changed the code to use InstructionCost instead of unsigned. The change is non functional since InstructionCost behaves in the same way as an integer for valid costs. Currently the getXXXCost() functions used in this file do not return invalid costs. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential revision: https://reviews.llvm.org/D94484	2021-01-13 09:42:58 +00:00
Kazu Hirata	8a20e2b3d3	[llvm] Use Optional::getValueOr (NFC)	2021-01-12 21:43:50 -08:00
Kazu Hirata	12fc9ca3a4	[llvm] Remove redundant string initialization (NFC) Identified with readability-redundant-string-init.	2021-01-12 21:43:46 -08:00
Yuanfang Chen	5c7dcd7aea	[Coroutine] Update promise object's final layout index promise is a header field but it is not guaranteed that it would be the third field of the frame due to `performOptimizedStructLayout`. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D94137	2021-01-12 17:44:02 -08:00
Luo, Yuanke	055644cc45	[X86][AMX] Prohibit pointer cast on load. The load/store instruction will be transformed to amx intrinsics in the pass of AMX type lowering. Prohibiting the pointer cast make that pass happy. Differential Revision: https://reviews.llvm.org/D94372	2021-01-13 09:39:19 +08:00
Hongtao Yu	175288a1af	Add sample-profile-suffix-elision-policy attribute with -funique-internal-linkage-names. Adding sample-profile-suffix-elision-policy attribute to functions whose linkage names are uniquefied so that their unique name suffix won't be trimmed when applying AutoFDO profiles. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D94455	2021-01-12 15:15:53 -08:00
modimo	2a49b7c64a	[Inliner] Change inline remark format and update ReplayInlineAdvisor to use it This change modifies the source location formatting from: LineNumber.Discriminator to: LineNumber:ColumnNumber.Discriminator The motivation here is to enhance location information for inline replay that currently exists for the SampleProfile inliner. This will be leveraged further in inline replay for the CGSCC inliner in the related diff. The ReplayInlineAdvisor is also modified to read the new format and now takes into account the callee for greater accuracy. Testing: ninja check-llvm Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D94333	2021-01-12 13:43:48 -08:00
Nikita Popov	23390e7a13	[InstCombine] Handle logical and/or in assume optimization assume(a && b) can be converted to assume(a); assume(b) even if the condition is logical. Same for assume(!(a || b)).	2021-01-12 22:36:40 +01:00
Sanjay Patel	9e7895a868	[SLP] reduce code duplication while processing reductions; NFC	2021-01-12 16:03:57 -05:00
Sanjay Patel	92fb5c49e8	[SLP] rename variable to improve readability; NFC The OperationData in the 2nd block (visiting the operands) is completely independent of the 1st block.	2021-01-12 16:03:57 -05:00
Sanjay Patel	554be30a42	[SLP] reduce code duplication in processing reductions; NFC	2021-01-12 16:03:57 -05:00
Sanjay Patel	46507a96fc	[SLP] reduce code duplication while matching reductions; NFC	2021-01-12 16:03:57 -05:00
Philip Reames	caafdf07bb	[LV] Weaken spuriously strong assert in LoopVersioning LoopVectorize uses some utilities on LoopVersioning, but doesn't actually use it for, you know, versioning. As a result, the precondition LoopVersioning expects is too strong for this user. At the moment, LoopVectorize supports any loop with a unique exit block, so check the same precondition here. Really, the whole class structure here is a mess. We should separate the actual versioning from the metadata updates, but that's a bigger problem.	2021-01-12 12:57:13 -08:00
Philip Reames	9f61fbd75a	[LV] Relax assumption that LCSSA implies single entry This relates to the ongoing effort to support vectorization of multiple exit loops (see D93317). The previous code assumed that LCSSA phis were always single entry before the vectorizer ran. This was correct, but only because the vectorizer allowed only a single exiting edge. There's nothing in the definition of LCSSA which requires single entry phis. A common case where this comes up is with a loop with multiple exiting blocks which all reach a common exit block. (e.g. see the test updates) Differential Revision: https://reviews.llvm.org/D93725	2021-01-12 12:34:52 -08:00
Florian Hahn	6cd44b204c	[FunctionAttrs] Derive willreturn for fns with `readonly` & `mustprogress`. Similar to D94125, derive `willreturn` for functions that are `readonly` and `mustprogress` in FunctionAttrs. To quote the reasoning from D94125: Since D86233 we have `mustprogress` which, in combination with `readonly`, implies `willreturn`. The idea is that every side-effect has to be modeled as a "write". Consequently, `readonly` means there is no side-effect, and `mustprogress` guarantees that we cannot "loop" forever without side-effect. Reviewed By: jdoerfert, nikic Differential Revision: https://reviews.llvm.org/D94502	2021-01-12 20:02:34 +00:00
Dávid Bolvanský	0529946b5b	[instCombine] Add (A ^ B) | ~(A | B) -> ~(A & B) define i32 @src(i32 %x, i32 %y) { %0: %xor = xor i32 %y, %x %or = or i32 %y, %x %neg = xor i32 %or, 4294967295 %or1 = or i32 %xor, %neg ret i32 %or1 } => define i32 @tgt(i32 %x, i32 %y) { %0: %and = and i32 %x, %y %neg = xor i32 %and, 4294967295 ret i32 %neg } Transformation seems to be correct! https://alive2.llvm.org/ce/z/Cvca4a	2021-01-12 19:29:17 +01:00
Quentin Colombet	905623b64d	[NFC][LICM] Minor improvements to debug output Added a utility function in Value class to print block name and use block labels for unnamed blocks. Changed LICM to call this function in its debug output. Patch by Xiaoqing Wu <xiaoqing_wu@apple.com> Differential Revision: https://reviews.llvm.org/D93577	2021-01-11 18:02:49 -08:00
Roman Lebedev	ec8a6c11db	[SimplifyCFGPass] iterativelySimplifyCFG(): support lazy DomTreeUpdater This boils down to how we deal with early-increment iterator over function's basic blocks: not only we need to early-increment, after that we also need to skip all the blocks that are scheduled for removal, as per DomTreeUpdater.	2021-01-12 02:09:47 +03:00
Roman Lebedev	81afeacd37	[SimplifyCFGPass] mergeEmptyReturnBlocks(): skip blocks scheduled for removal as per DomTreeUpdater Thus supporting lazy DomTreeUpdater mode, where the domtree updates (and thus block removals) aren't applied immediately, but are delayed until last possible moment.	2021-01-12 02:09:47 +03:00
Roman Lebedev	90a92f8b4d	[NFCI][Utils/Local] removeUnreachableBlocks(): cleanup support for lazy DomTreeUpdater When DomTreeUpdater is in lazy update mode, the blocks that were scheduled to be removed won't be removed until the updates are flushed, e.g. by asking DomTreeUpdater for an up-to-date DomTree. From the function's current code, it is pretty evident that the support for the lazy mode is an afterthought, see e.g. how we roll back the NumRemoved statistic. So instead of considering all the unreachable blocks as the blocks-to-be-removed, simply additionally skip all the blocks that are already scheduled to be removed.	2021-01-12 02:09:47 +03:00
Roman Lebedev	f9ba347706	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): don't insert a DomTree edge if it already exists When we are adding edges to the terminator and potentially turning it into a switch (if it wasn't already), it is possible that the case we're adding will share its destination with one of the preexisting cases, in which case there is no domtree edge to add. Indeed, this change does not have a test coverage change. This failure has been exposed in an existing test coverage by a follow-up patch that switches to lazy domtreeupdater mode, and removes domtree verification from SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(), IOW it does not appear feasible to add dedicated test coverage here.	2021-01-12 02:09:47 +03:00
Roman Lebedev	c0de0a1b72	[SimplifyCFG] SimplifyBranchOnICmpChain(): don't insert a DomTree edge that already exists BB was already always branching to EdgeBB, there is no edge to add. Indeed, this change does not have a test coverage change. This failure has been exposed in an existing test coverage by a follow-up patch that switches to lazy domtreeupdater mode, and removes domtree verification from SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(), IOW it does not appear feasible to add dedicated test coverage here.	2021-01-12 02:09:46 +03:00
Roman Lebedev	c22bc5f1f8	[SimplifyCFG] SwitchToLookupTable(): don't insert a DomTree edge that already exists SI is the terminator of BB, so the edge we are adding obviously already existed. Indeed, this change does not have a test coverage change. This failure has been exposed in an existing test coverage by a follow-up patch that switches to lazy domtreeupdater mode, and removes domtree verification from SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(), IOW it does not appear feasible to add dedicated test coverage here.	2021-01-12 02:09:46 +03:00
Hongtao Yu	32bcfcda4e	Rename debug linkage name with -funique-internal-linkage-names Functions that are renamed under -funique-internal-linkage-names have their debug linkage name updated as well. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D93747	2021-01-11 13:56:07 -08:00
Sanjay Patel	288f3fc5df	[InstCombine] reduce icmp(ashr X, C1), C2 to sign-bit test This is a more basic pattern that we should handle before trying to solve: https://llvm.org/PR48640 There might be a better way to think about this because the pre-condition that I came up with (number of sign bits in the compare constant) misses a potential transform for each of ugt and ult as commented on in the test file. Tried to model this in Alive: https://rise4fun.com/Alive/juX1 ...but I couldn't get the ComputeNumSignBits() pre-condition to work as expected, so replaced with leading 0/1 preconditions instead. Name: ugt Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1 %a = ashr %x, C1 %r = icmp ugt i8 %a, C2 => %r = icmp slt i8 %x, 0 Name: ult Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1 %a = ashr %x, C1 %r = icmp ult i4 %a, C2 => %r = icmp sgt i4 %x, -1 Also approximated in Alive2: https://alive2.llvm.org/ce/z/u5hCcz https://alive2.llvm.org/ce/z/__szVL Differential Revision: https://reviews.llvm.org/D94014	2021-01-11 15:53:39 -05:00
Sriraman Tallam	d8c6d24359	-funique-internal-linkage-names appends a hex md5hash suffix to the symbol name which is not demangler friendly, convert it to decimal. Please see D93747 for more context which tries to make linkage names of internal linkage functions to be the uniqueified names. This causes a problem with gdb because breaking using the demangled function name will not work if the new uniqueified name cannot be demangled. The problem is the generated suffix which is a mix of integers and letters which do not demangle. The demangler accepts either all numbers or all letters. This patch simply converts the hash to decimal. There is no loss of uniqueness by doing this as the precision is maintained. The symbol names get longer by a few characters though. Differential Revision: https://reviews.llvm.org/D94154	2021-01-11 11:10:29 -08:00
Giorgis Georgakoudis	9751705512	[OpenMPOpt][WIP] Expand parallel region merging The existing implementation of parallel region merging applies only to consecutive parallel regions that have speculatable sequential instructions in-between. This patch lifts this limitation to expand merging with any sequential instructions in-between, except calls to unmergable OpenMP runtime functions. In-between sequential instructions in the merged region are sequentialized in a "master" region and any output values are broadcasted to the following parallel regions and the sequential region continuation of the merged region. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90909	2021-01-11 08:06:23 -08:00
Florian Hahn	eb0371e403	[VPlan] Unify value/recipe printing after VPDef transition. This patch unifies the way recipes and VPValues are printed after the transition to VPDef. VPSlotTracker has been updated to iterate over all recipes and all their defined values to number those. There is no need to number values in Value2VPValue. It also updates a few places that only used slot numbers for VPInstruction. All recipes now can produce numbered VPValues.	2021-01-11 14:42:46 +00:00
Florian Hahn	a94497a342	[VPlan] Move initial quote emission from ::print to ::dumpBasicBlock. This means there will be no stray " when printing individual recipes using print()/dump() in a debugger, for example.	2021-01-11 12:22:15 +00:00
Bjorn Pettersson	675be65106	Require chained analyses in BasicAA and AAResults to be transitive This patch fixes a bug that could result in miscompiles (at least in an OOT target). The problem could be seen by adding checks that the DominatorTree used in BasicAliasAnalysis and ValueTracking was valid (e.g. by adding DT->verify() call before every DT dereference and then running all tests in test/CodeGen). Problem was that the LegacyPassManager calculated "last user" incorrectly for passes such as the DominatorTree when not telling the pass manager that there was a transitive dependency between the different analyses. And then it could happen that an incorrect dominator tree was used when doing alias analysis (which was a pretty serious bug as the alias analysis result could be invalid). Fixes: https://bugs.llvm.org/show_bug.cgi?id=48709 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94138	2021-01-11 11:50:07 +01:00
David Sherwood	40abeb11f4	[NFC][InstructionCost] Change LoopVectorizationCostModel::getInstructionCost to return InstructionCost This patch is part of a series of patches that migrate integer instruction costs to use InstructionCost. In the function selectVectorizationFactor I have simply asserted that the cost is valid and extracted the value as is. In future we expect to encounter invalid costs, but we should filter out those vectorization factors that lead to such invalid costs. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D92178	2021-01-11 09:22:37 +00:00
David Sherwood	b7ccaca537	[NFC] Remove min/max functions from InstructionCost Removed the InstructionCost::min/max functions because it's fine to use std::min/max instead. Differential Revision: https://reviews.llvm.org/D94301	2021-01-11 09:00:12 +00:00
Serguei Katkov	7f69860243	[LoopUnroll] Fix a crash Loop peeling as a last step triggers loop simplification and this can change the loop structure. As a result all cached values like the latch branch become invalid. The patch restructures the code to take into account the possible changes caused by peeling. Reviewers: dmgreen, Meinersbur, etiotto, fhahn, efriedma, bmahjour Reviewed By: Meinersbur, fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D93686	2021-01-11 10:19:26 +07:00
Philip Reames	4739dd67e7	[LoopDeletion] Break backedge of outermost loops when known not taken This is a resubmit of `dd6bb367` (which was reverted due to stage2 build failures in `7c63aac`), with the additional restriction added to the transform to only consider outer most loops. As shown in the added test case, ensuring LCSSA is up to date when deleting an inner loop is tricky as we may actually need to remove blocks from any outer loops, thus changing the exit block set. For the moment, just avoid transforming this case. I plan to return to this case in a follow up patch and see if we can do better. Original commit message follows... The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects. Differential Revision: https://reviews.llvm.org/D93906	2021-01-10 16:02:33 -08:00
Roman Lebedev	8e8d214c4a	[NFCI][SimplifyCFG] Prefer to add Insert edges before Delete edges into DomTreeUpdater, if reasonable This has a measurable impact on the number of DomTree recalculations. While this doesn't handle all the cases, it deals with the most obvious ones.	2021-01-11 00:30:44 +03:00
Sanjay Patel	3f09c77d33	[SLP] fix typo in assert This snuck into `0aa75fb12f` , but I didn't catch it locally.	2021-01-10 13:15:04 -05:00
Sanjay Patel	0aa75fb12f	[SLP] put verifyFunction call behind EXPENSIVE_CHECKS A severe compile-time slowdown from this call is noted in: https://llvm.org/PR48689 My naive fix was to put it under LLVM_DEBUG ( `267ff79` ), but that's not limiting in the way we want. This is a quick fix (or we could just remove the call completely and rely on some later pass to discover potentially wrong IR?). A bigger/better fix would be to improve/limit verifyFunction() as noted in: https://llvm.org/PR47712 Differential Revision: https://reviews.llvm.org/D94328	2021-01-10 12:32:21 -05:00
Florian Hahn	c701f85c45	[STLExtras] Use return type from operator* of the wrapped iter. Currently make_early_inc_range cannot be used with iterators with operator* implementations that do not return a reference. Most notably in the LLVM codebase, this means the User iterator ranges cannot be used with make_early_inc_range, which slightly simplifies iterating over ranges while elements are removed. Instead of directly using BaseT::reference as the return type of operator*, this patch uses decltype to get the actual return type of the operator implementation in WrappedIteratorT. This patch also updates a few places to make use of make_early_inc_range. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D93992	2021-01-10 14:41:13 +00:00
Florian Hahn	d98fc62ae6	[SimplifyCFG] Keep !dgb metadata of moved instruction, if they match. Currently SimplifyCFG drops the debug locations of 'bonus' instructions. Such instructions are moved before the first branch. The reason for the current behavior is that this could lead to surprising debug stepping, if the block that's folded is dead. In case the first branch and the instructions to be folded have the same debug location, this shouldn't be an issue and we can keep the debug location. Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D93662	2021-01-09 19:15:16 +00:00
Kazu Hirata	6a6e382161	[llvm] Drop unnecessary make_range (NFC)	2021-01-09 09:25:00 -08:00
Kazu Hirata	4d92ab1669	[Transforms] Use llvm::find_if (NFC)	2021-01-09 09:24:58 -08:00
Kazu Hirata	9a7c03b800	[SCEV] Remove unused getOrInsertCanonicalInductionVariable (NFC) The last use was removed on Mar 22, 2012 in commit `f47d0af551`.	2021-01-09 09:24:56 -08:00
Florian Hahn	65f578fc0e	[VPlan] Keep start value of VPWidenPHIRecipe as VPValue. Similar to D92129, update VPWidenPHIRecipe to manage the start value as VPValue. This allows adjusting the start value as a VPlan transform, which will be used in a follow-up patch to support reductions during epilogue vectorization. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D93975	2021-01-09 16:34:15 +00:00
Kazu Hirata	f62b93b9a2	[SCEV] Remove unused getExactExistingExpansion (NFC) The last use was removed on Sep 4, 2018 in commit `2cbba56337`.	2021-01-08 18:39:57 -08:00
Kazu Hirata	b7c5e0b02c	[Target, Transforms] Use *Set::contains (NFC)	2021-01-08 18:39:54 -08:00
Arthur Eubanks	756dd70766	[NewPM] Run ObjC ARC passes Match the legacy PM in running various ObjC ARC passes. This requires making some module passes into function passes. These were initially ported as module passes since they add function declarations (e.g. https://reviews.llvm.org/D86178), but that's still up for debate and other passes do so. Reviewed By: ahatanak Differential Revision: https://reviews.llvm.org/D93743	2021-01-08 15:47:11 -08:00
Florian Hahn	c493e9216b	[VPlan] Move reduction start value creation to widenPHIRecipe. This was suggested to prepare for D93975. By moving the start value creation to widenPHInstruction, we set the stage to manage the start value directly in VPWidenPHIRecipe, which will be used subsequently to set the 'resume' value for reductions during epilogue vectorization. It also moves RdxDesc to the recipe, so we do not have to rely on Legal to look it up later. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D94175	2021-01-08 17:49:43 +00:00
Alexander Belyaev	bcbdeafa9c	Revert "[SLP]Need shrink the load vector after reordering." This reverts commit `4284afdf94`. This changes computed values in fused_batchnorm_test_cpu. Not equal to tolerance rtol=1e-06, atol=0.001 Mismatched value: a is different from b. not close where = (array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]), array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5])) not close lhs = [-0.6636615 -0.9804948 -1.148275 -0.68193716 -0.8572368 -0.65046215 -0.6993756 -1.2244141 -1.0938729 -0.50369143 -0.51830524 -0.738452 -0.7214286 -0.48115745 -0.9380924 -0.9341769 -0.5916775 -1.2896856 -0.7264182 -0.9746917 -0.783249 -0.7659018 -0.86214024 -0.47784212] not close rhs = [ 0.44102234 0.12418899 -0.04359123 0.42274666 0.24744703 0.45422167 0.40530816 -0.11973029 0.01081094 0.6009924 0.5863786 0.3662318 0.38325527 0.62352633 0.1665914 0.1705069 0.5130063 -0.18500176 0.37826565 0.12999213 0.3214348 0.338782 0.24254355 0.62684166] not close dif = [1.1046839 1.1046838 1.1046838 1.1046839 1.1046839 1.1046839 1.1046838 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046838 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046838 1.1046838] not close tol = [0.00100044 0.00100012 0.00100004 0.00100042 0.00100025 0.00100045 0.00100041 0.00100012 0.00100001 0.0010006 0.00100059 0.00100037 0.00100038 0.00100062 0.00100017 0.00100017 0.00100051 0.00100019 0.00100038 0.00100013 0.00100032 0.00100034 0.00100024 0.00100063]	2021-01-08 14:42:26 +01:00
Sanjay Patel	267ff7901c	[SLP] limit verifyFunction to debug build (PR48689) As noted in PR48689, the verifier may have some kind of exponential behavior that should be addressed separately. For now, only run it in debug mode to prevent problems for release+asserts. That limit is what we had before D80401, and I'm not sure if there was a reason to change it in that patch.	2021-01-08 08:10:17 -05:00
Cullen Rhodes	1e7efd397a	[LV] Legalize scalable VF hints In the following loop: void foo(int *a, int *b, int N) { for (int i=0; i<N; ++i) a[i + 4] = a[i] + b[i]; } The loop dependence constrains the VF to a maximum of (4, fixed), which would mean using <4 x i32> as the vector type in vectorization. Extending this to scalable vectorization, a VF of (4, scalable) implies a vector type of <vscale x 4 x i32>. To determine if this is legal vscale must be taken into account. For this example, unless max(vscale)=1, it's unsafe to vectorize. For SVE, the number of bits in an SVE register is architecturally defined to be a multiple of 128 bits with a maximum of 2048 bits, thus the maximum vscale is 16. In the loop above it is therefore unfeasible to vectorize with SVE. However, in this loop: void foo(int *a, int *b, int N) { #pragma clang loop vectorize_width(X, scalable) for (int i=0; i<N; ++i) a[i + 32] = a[i] + b[i]; } As long as max(vscale) multiplied by the number of lanes 'X' doesn't exceed the dependence distance, it is safe to vectorize. For SVE a VF of (2, scalable) is within this constraint, since a vector of <16 x 2 x 32> will have no dependencies between lanes. For any number of lanes larger than this it would be unsafe to vectorize. This patch extends 'computeFeasibleMaxVF' to legalize scalable VFs specified as loop hints, implementing the following behaviour: * If the backend does not support scalable vectors, ignore the hint. * If scalable vectorization is unfeasible given the loop dependence, like in the first example above for SVE, then use a fixed VF. * Accept scalable VFs if it's safe to do so. * Otherwise, clamp scalable VFs that exceed the maximum safe VF. Reviewed By: sdesmalen, fhahn, david-arm Differential Revision: https://reviews.llvm.org/D91718	2021-01-08 10:49:44 +00:00
David Green	72fb5ba079	[LV] Don't sink into replication regions The new test case here contains a first-order recurrence and an instruction that is replicated. The first-order recurrence forces an instruction to be sunk _into_, as opposed to after, the replication region. That causes several things to go wrong, including registering vector instructions multiple times and failing to create dominance relations correctly. Instead we should be sinking to after the replication region, which is what this patch makes sure happens. Differential Revision: https://reviews.llvm.org/D93629	2021-01-08 09:50:10 +00:00
Kazu Hirata	33bf1cad75	[llvm] Use *Set::contains (NFC)	2021-01-07 20:29:34 -08:00
Ruiling Song	8dddcc762d	[Cloning] Copy metadata of global declarations We have modules with metadata on declarations, and out-of-tree passes use that metadata, and we need to clone those modules. We really expect such metadata is kept during the clone operation. Reviewed by: arsenm, aprantl Differential Revision: https://reviews.llvm.org/D93451	2021-01-08 08:21:18 +08:00
Roman Lebedev	f2f81c554b	[SimplifyCFG] markAliveBlocks(): switch to non-permissive DomTree updates No actual changes needed, invoke can't have the same block as an unwind destination and a normal destination.	2021-01-08 02:15:27 +03:00
Roman Lebedev	d59f97bb3a	[SimplifyCFG] removeUnwindEdge(): switch to non-permissive DomTree updates No actual changes needed, Catchswitch cannot unwind to one of its catchpads.	2021-01-08 02:15:27 +03:00
Roman Lebedev	f0eba8ce2d	[SimplifyCFG] changeToCall(): switch to non-permissive DomTree updates No actual changes needed, normal and unwind destinations of an invoke can never be identical.	2021-01-08 02:15:27 +03:00
Roman Lebedev	be0a31d13b	[SimplifyCFG] DeleteDeadBlocks(): switch to non-permissive DomTree updates No actual changes needed, DetatchDeadBlocks() was already doing the right thing.	2021-01-08 02:15:27 +03:00
Roman Lebedev	66189212bb	[SimplifyCFG] MergeBlockIntoPredecessor(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same successor more than once.	2021-01-08 02:15:26 +03:00
Roman Lebedev	05adc73db0	[SimplifyCFG] changeToUnreachable(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same predecessor more than once.	2021-01-08 02:15:26 +03:00
Roman Lebedev	7600d7c7be	[SimplifyCFG] removeUnreachableBlocks(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same predecessor more than once.	2021-01-08 02:15:26 +03:00
Roman Lebedev	1f9b591ee6	[SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same predecessor more than once.	2021-01-08 02:15:25 +03:00
Roman Lebedev	b3822728fa	[SimplifyCFG] ConstantFoldTerminator(): switch to non-permissive DomTree updates in `indirectbr` handling ... which requires not deleting edges that were just deleted already.	2021-01-08 02:15:25 +03:00
Roman Lebedev	36593a30a4	[SimplifyCFG] ConstantFoldTerminator(): switch to non-permissive DomTree updates in `SwitchInst` handling ... which requires not deleting edges that will still be present.	2021-01-08 02:15:24 +03:00
Roman Lebedev	16ab8e5f6d	[SimplifyCFG] ConstantFoldTerminator(): handle matching destinations of condbr earlier We need to handle this case before dealing with the case of constant branch condition, because if the destinations match, the latter fold would try to remove the DomTree edge that would still be present. This allows making that particular DomTree update non-permissive.	2021-01-08 02:15:24 +03:00
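
The legality arithmetic described in commit 1e7efd397a ([LV] Legalize scalable VF hints) can be sketched as follows. This is only an illustration of the numbers quoted in that message, not LLVM's `computeFeasibleMaxVF`; the helper names `sveMaxVScale`, `isScalableVFSafe`, and `maxSafeScalableLanes` are hypothetical, the dependence distance is assumed to be given in elements, and the power-of-two rounding a real vectorizer would apply is ignored.

```cpp
#include <cstdint>

// Hypothetical helpers illustrating the commit message's arithmetic.
// They are NOT part of LLVM.

// SVE registers are a multiple of 128 bits, up to 2048 bits, so the
// largest possible vscale is 2048 / 128.
constexpr std::uint64_t sveMaxVScale() { return 2048 / 128; }

// A VF of (lanes, scalable) covers up to maxVScale * lanes elements per
// vector iteration. It is safe only if that never exceeds the loop's
// dependence distance (measured in elements).
constexpr bool isScalableVFSafe(std::uint64_t lanes,
                                std::uint64_t depDistance,
                                std::uint64_t maxVScale) {
  return lanes * maxVScale <= depDistance;
}

// Largest lane count X for which (X, scalable) is still safe.
// A result of 0 means no scalable VF is feasible, in which case the
// commit message says a fixed VF is used instead.
constexpr std::uint64_t maxSafeScalableLanes(std::uint64_t depDistance,
                                             std::uint64_t maxVScale) {
  return maxVScale == 0 ? 0 : depDistance / maxVScale;
}

// First loop in the message: a[i + 4] = a[i] + b[i], distance 4.
static_assert(!isScalableVFSafe(4, 4, sveMaxVScale()),
              "(4, scalable) is unsafe for SVE at distance 4");
// Second loop: a[i + 32] = a[i] + b[i], distance 32.
static_assert(isScalableVFSafe(2, 32, sveMaxVScale()),
              "(2, scalable) is safe for SVE at distance 32");
```

Under these assumptions, a hint of (4, scalable) on the second loop exceeds the safe maximum and would be clamped to (2, scalable), matching the last bullet of the commit message.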

27735 Commits