llvm-project

Commit Graph

Author	SHA1	Message	Date
Kerry McLaughlin	857b8a73da	[LoopVectorize] Change the identity element for FAdd Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd. Reviewed By: dmgreen, spatel Differential Revision: https://reviews.llvm.org/D98963	2021-04-06 12:13:43 +01:00
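A short check of why -0.0, rather than +0.0, is the neutral value for floating-point addition: under IEEE 754 round-to-nearest, `-0.0 + x` reproduces `x` for every non-NaN `x`, including `x = -0.0`, while `+0.0 + (-0.0)` yields `+0.0` and loses the sign. This is a Python sketch for illustration only; the helper names `same_float` and `is_neutral_for_fadd` are hypothetical and not part of LLVM.

```python
import math

def same_float(a: float, b: float) -> bool:
    """Equality that also distinguishes +0.0 from -0.0 (NaN not handled)."""
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)

def is_neutral_for_fadd(identity: float, samples) -> bool:
    """True if identity + x gives back exactly x for every sample."""
    return all(same_float(identity + x, x) for x in samples)

# Illustrative sample values, including both signed zeros.
SAMPLES = [-0.0, 0.0, 1.5, -2.25, 1e300, float("inf"), float("-inf")]

neg_zero_is_neutral = is_neutral_for_fadd(-0.0, SAMPLES)
pos_zero_is_neutral = is_neutral_for_fadd(0.0, SAMPLES)
```

Here `pos_zero_is_neutral` is false only because of the `-0.0` sample, which is exactly the case a reduction seeded with +0.0 would get wrong.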
Florian Hahn	a6b06b785c	[VPlan] Print VPValue operands for VPWidenPHI if possible. For VPWidenPHIRecipes that model all incoming values as VPValue operands, print those operands instead of printing the original PHI. D99294 updates recipes of reduction PHIs to use the VPValue for the incoming value from the loop backedge, making use of this new printing.	2021-04-06 12:11:21 +01:00
madhur13490	167ea67d76	[IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction This patch enhances hasAddressTaken() to ignore bitcasts used as the callee of a callbase instruction. Such bitcast usage doesn't take the address in any meaningful way. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D98884	2021-04-06 09:23:46 +00:00
Arthur Eubanks	ea0e2ca1ac	[SROA] Allow SROA on pointers with invariant group intrinsic uses When we are able to SROA an alloca, we know all uses of it, meaning we don't have to preserve the invariant group intrinsics and metadata. It's possible that we could lose information regarding redundant loads/stores, but that's unlikely to have any real impact since right now the only user is Clang and vtables. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99760	2021-04-05 19:53:40 -07:00
Ta-Wei Tu	6a82ace5f2	[LoopFusion] Bails out if only the second candidate is guarded (PR48060) If only the second candidate loop is guarded while the first one is not, fusing the two loops might not be valid, but this check is currently missing. Fixes https://bugs.llvm.org/show_bug.cgi?id=48060 Reviewed By: sidbav Differential Revision: https://reviews.llvm.org/D99716	2021-04-06 01:08:56 +08:00
Sanjay Patel	c590a9880d	[InstCombine] fix potential miscompile in select value equivalence As shown in the example based on: https://llvm.org/PR49832 ...and the existing test, we can't substitute a vector value because the equality compare replacement that we are attempting requires that the comparison is true for the entire value. Vector select can be partly true/false.	2021-04-05 12:25:40 -04:00
Alexey Bataev	00a84f9a7f	[SLP]Improve vectorization of the CmpInst instructions. During vectorization, it is better to postpone vectorizing the CmpInst instructions until the end of the basic block. Otherwise we may vectorize them too early and miss some vectorization patterns, like reductions. Reworked part of D57059 Differential Revision: https://reviews.llvm.org/D99796	2021-04-05 06:22:51 -07:00
Roman Lebedev	2760a808b9	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): check that adding shift amounts doesn't overflow (PR49778) This is identical to `781d077afb`, but for the other function. For certain shift amount bit widths, we must first ensure that adding shift amounts is safe, that the sum won't have an unsigned overflow. Fixes https://bugs.llvm.org/show_bug.cgi?id=49778	2021-04-04 23:26:41 +03:00
Roman Lebedev	dceb3e5996	[NFC][InstCombine] Extract canTryToConstantAddTwoShiftAmounts() as helper	2021-04-04 23:26:41 +03:00
Sanjay Patel	c0645f1324	[InstCombine] fold popcount of exactly one bit to shift This is discussed in https://llvm.org/PR48999 , but it does not solve that request. The difference in the vector test shows that some other logic transform is limited to scalar types.	2021-04-04 11:43:49 -04:00
Nikita Popov	9bad7de9a3	[SimplifyCFG] Handle two equal cases in switch to select When converting a switch with two cases and a default into a select, also handle the degenerate case where two cases have the same value. Generate this case directly as
    %or = or i1 %cmp1, %cmp2
    %res = select i1 %or, i32 %val, i32 %default
rather than
    %sel1 = select i1 %cmp1, i32 %val, i32 %default
    %res = select i1 %cmp2, i32 %val, i32 %sel1
as InstCombine is going to canonicalize to the former anyway.	2021-04-04 17:27:28 +02:00
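The two forms described in the SimplifyCFG message above can be compared with a small truth-table check. This is a Python model only: it treats `i1` values as plain booleans and ignores poison/undef, and the function names are made up for the illustration.

```python
from itertools import product

def select(cond, if_true, if_false):
    """Model of an IR select on ordinary (non-poison) values."""
    return if_true if cond else if_false

def or_form(cmp1, cmp2, val, default):
    # %or = or i1 %cmp1, %cmp2 ; %res = select i1 %or, %val, %default
    return select(cmp1 or cmp2, val, default)

def nested_form(cmp1, cmp2, val, default):
    # %sel1 = select %cmp1, %val, %default ; %res = select %cmp2, %val, %sel1
    sel1 = select(cmp1, val, default)
    return select(cmp2, val, sel1)

def forms_agree(val=7, default=-1):
    """Check every combination of the two compare results."""
    return all(
        or_form(c1, c2, val, default) == nested_form(c1, c2, val, default)
        for c1, c2 in product([False, True], repeat=2)
    )
```

Both forms pick `val` whenever either compare is true, which is why the single `or` plus one `select` is an acceptable direct encoding.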
Juneyoung Lee	5207cde5cb	[InstCombine] Conditionally fold select i1 into and/or This patch fixes llvm.org/pr49688 by conditionally folding select i1 into and/or: ``` select cond, cond2, false -> and cond, cond2 ``` This is not safe if cond2 is poison whereas cond isn’t. Unconditionally disabling this transformation affects later pipelines that depend on and/or i1s. To minimize its impact, this patch conservatively checks whether cond2 is an instruction that creates a poison or its operand creates a poison. This approach is similar to what InstSimplify's SimplifyWithOpReplaced is doing. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99674	2021-04-04 14:11:28 +09:00
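To see why the fold is unsafe when `cond2` may be poison, here is a toy Python model. It assumes the usual LLVM rules that `select` only propagates poison from its condition and from the chosen arm, while `and` is poison if either operand is. `POISON`, `select_i1` and `and_i1` are hypothetical names for this sketch.

```python
POISON = object()  # sentinel standing in for an LLVM poison value

def select_i1(cond, true_val, false_val):
    """select: poison condition poisons the result; otherwise only the chosen arm matters."""
    if cond is POISON:
        return POISON
    return true_val if cond else false_val

def and_i1(a, b):
    """and: the result is poison if either operand is poison."""
    if a is POISON or b is POISON:
        return POISON
    return a and b

# cond is false, cond2 is poison: select hides the poison, and does not.
select_result = select_i1(False, POISON, False)
and_result = and_i1(False, POISON)
```

In this model `select_result` is a well-defined `False` while `and_result` is poison, so replacing the former with the latter makes the program more poisonous. When `cond2` is known not to be poison, the two forms agree.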
Fangrui Song	8e5f3d04f2	[SLPVectorizer] Fix divide-by-zero after D99719 Will add a test case later.	2021-04-02 11:13:51 -07:00
Sanjay Patel	412fc74140	[InstCombine] fold not+or+neg ~((-X) \| Y) --> (X - 1) & (~Y) We generally prefer 'add' over 'sub', this reduces the dependency chain, and this looks better for codegen on x86, ARM, and AArch64 targets. https://llvm.org/PR45755 https://alive2.llvm.org/ce/z/cxZDSp	2021-04-02 13:16:36 -04:00
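The identity behind this fold follows from De Morgan plus `~(-X) == X - 1` in two's complement. A brute-force Python check over all 8-bit values is sketched below; the 8-bit width and the function names are choices made for this illustration, not something taken from the patch (the alive2 link in the message is the authoritative proof).

```python
BITS = 8
MASK = (1 << BITS) - 1  # wrap results to an 8-bit unsigned value

def before(x: int, y: int) -> int:
    """~((-X) | Y), evaluated with 8-bit wraparound."""
    return ~(((-x) & MASK) | y) & MASK

def after(x: int, y: int) -> int:
    """(X - 1) & (~Y), evaluated with 8-bit wraparound."""
    return ((x - 1) & MASK) & (~y & MASK)

def fold_holds() -> bool:
    """Exhaustively compare both forms for every 8-bit X and Y."""
    return all(
        before(x, y) == after(x, y)
        for x in range(1 << BITS)
        for y in range(1 << BITS)
    )
```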
Dimitry Andric	6abb92f210	[SCCP] Avoid modifying AdditionalUsers while iterating over it When run under valgrind, or with a malloc that poisons freed memory, this can lead to segfaults or other problems. To avoid modifying the AdditionalUsers DenseMap while still iterating, save the instructions to be notified in a separate SmallPtrSet, and use this to later call OperandChangedState on each instruction. Fixes PR49582. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D98602	2021-04-02 19:05:59 +02:00
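The general pattern of the SCCP fix above, sketched in Python rather than C++: copy the entries to be notified into a separate container first, then run the callbacks, so a callback that adds new entries cannot disturb an in-progress iteration. The names `notify_additional_users` and `on_operand_changed` are hypothetical stand-ins, not the actual SCCP code.

```python
def notify_additional_users(additional_users, changed, on_operand_changed):
    """Notify every user registered for `changed`.

    `additional_users` maps a value to a list of users. The callback may
    register further users while we are notifying, so iterate over a copy.
    """
    to_notify = list(additional_users.get(changed, ()))  # snapshot first
    for user in to_notify:
        on_operand_changed(user)
    return to_notify

# Example: the callback itself mutates the map being consulted.
users = {"v": ["a", "b"]}
seen = []

def callback(user):
    seen.append(user)
    users.setdefault(user, []).append("extra")  # new key added mid-walk
    users["v"].append("late")                   # same list grows mid-walk

notified = notify_additional_users(users, "v", callback)
```

Users added during the walk (`"late"` here) are not notified in this pass; whether that is acceptable depends on the caller.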
Florian Hahn	8867fc69f0	[LV] Hoist mapping of IR operands to VPValues (NFC). This patch moves mapping of IR operands to VPValues out of tryToCreateWidenRecipe. This allows using existing VPValue operands when widening recipes directly, which will be introduced in future patches.	2021-04-02 17:57:20 +01:00
Philip Reames	2c4548e18e	[rs4gc] Use loops instead of straightline code for attribute stripping [nfc] Mostly because I'm about to add more attributes and the straightline copies get much uglier. What's currently there isn't too bad.	2021-04-02 09:25:15 -07:00
Philip Reames	a505801e2b	[rs4gc] Strip nofree and nosync attributes when lowering from abstract model The safepoints being inserted exist to free memory, or coordinate with another thread to do so. Thus, we must strip any inferred attributes and reinfer them after the lowering. I'm not aware of any active miscompiles caused by this, but since I'm working on strengthening inference of both and leveraging them in the optimization decisions, I figured a bit of future proofing was warranted.	2021-04-02 09:12:24 -07:00
Alexey Bataev	5fcb07a070	[SLP]Fix a bug in min/max reduction, number of condition uses. The ultimate reduction node may have multiple uses, but if the ultimate reduction is min/max reduction and based on SelectInstruction, the condition of this select instruction must have only single use. Differential Revision: https://reviews.llvm.org/D99753	2021-04-02 07:09:44 -07:00
Jeroen Dobbelaere	b82b305cf9	[InstCombine] Fix out-of-bounds ashr(shl) optimization This fixes a crash found by the oss fuzzer and reported by @fhahn. The suggestion of @RKSimon seems to be the correct fix here. (See D91343). The oss fuzz report can be found here: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32759 Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D99792	2021-04-02 13:45:11 +02:00
Florian Hahn	0f3230390b	[SLP] Better estimate cost of no-op extracts on target vectors. The motivation for this patch is to better estimate the cost of extractelement instructions in cases where they are going to be free, because the source vector can be used directly. A simple example is
    %v1.lane.0 = extractelement <2 x double> %v.1, i32 0
    %v1.lane.1 = extractelement <2 x double> %v.1, i32 1
    %a.lane.0 = fmul double %v1.lane.0, %x
    %a.lane.1 = fmul double %v1.lane.1, %y
Currently we only consider the extracts free if there are no other users. In this particular case, on AArch64, which can fit <2 x double> in a vector register, the extracts should be free independently of other users, because the source vector of the extracts will already be in a vector register and can be used directly. The SLP vectorized version of noop_extracts_9_lanes is 30%-50% faster on certain AArch64 CPUs. It looks like this does not impact any code in SPEC2000/SPEC2006/MultiSource both on X86 and AArch64 with -O3 -flto. This originally regressed after D80773, so if there's a better alternative to explore, I'd be more than happy to do that. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D99719	2021-04-02 10:40:12 +01:00
Evgeniy Brevnov	2388aae401	[NARY-REASSOCIATE] Support reassociation of min/max Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available. Reviewed By: mkazantsev, lebedev.ri Differential Revision: https://reviews.llvm.org/D88287	2021-04-02 15:30:13 +07:00
Roman Lebedev	a26f1bf67e	[PassManager] Run additional LICM before LoopRotate

Loop rotation often has to perform code duplication from header into preheader, which introduces PHI nodes.

>>! In D99204, @thopre wrote:
>
> With loop peeling, it is important that unnecessary PHIs be avoided or
> it will lead to spurious peeling. One source of such PHIs is loop
> rotation which creates PHIs for invariant loads. Those PHIs are
> particularly problematic since loop peeling is now run as part of simple
> loop unrolling before GVN is run, and are thus a source of spurious
> peeling.
>
> Note that while some of the loads can be hoisted and eventually
> eliminated by instruction combine, this is not always possible due to
> alignment issues. In particular, the motivating example [1] was a load
> inside a class instance which cannot be hoisted because the `this`
> pointer has an alignment of 1.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp

Now, we could enhance LoopRotate to avoid duplicating code when not needed, but instead hoist loop-invariant code, but isn't that a code duplication? (sic) We have LICM, and in fact we already run it right after LoopRotation.

We could try to move it to before LoopRotation, that is basically free from compile-time perspective: https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions

But, looking at stats, I think it isn't great that we would no longer do LICM after LoopRotation, in particular:

| statistic name | LoopRotate-LICM | LICM-LoopRotate | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015930 | 9015799 | -131 | 0.00% | 0.00% |
| indvars.NumElimCmp | 3536 | 3544 | 8 | 0.23% | 0.23% |
| indvars.NumElimExt | 36725 | 36580 | -145 | -0.39% | 0.39% |
| indvars.NumElimIV | 1197 | 1187 | -10 | -0.84% | 0.84% |
| indvars.NumElimIdentity | 143 | 136 | -7 | -4.90% | 4.90% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29890 | 48 | 0.16% | 0.16% |
| indvars.NumReplaced | 2293 | 2227 | -66 | -2.88% | 2.88% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26329 | -109 | -0.41% | 0.41% |
| instcount.TotalBlocks | 1178338 | 1173840 | -4498 | -0.38% | 0.38% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9896139 | -9303 | -0.09% | 0.09% |
| lcssa.NumLCSSA | 425871 | 423961 | -1910 | -0.45% | 0.45% |
| licm.NumHoisted | 378357 | 378753 | 396 | 0.10% | 0.10% |
| licm.NumMovedCalls | 2193 | 2208 | 15 | 0.68% | 0.68% |
| licm.NumMovedLoads | 35899 | 31821 | -4078 | -11.36% | 11.36% |
| licm.NumPromoted | 11178 | 11154 | -24 | -0.21% | 0.21% |
| licm.NumSunk | 13359 | 13587 | 228 | 1.71% | 1.71% |
| loop-delete.NumDeleted | 8547 | 8402 | -145 | -1.70% | 1.70% |
| loop-instsimplify.NumSimplified | 12876 | 11890 | -986 | -7.66% | 7.66% |
| loop-peel.NumPeeled | 1008 | 925 | -83 | -8.23% | 8.23% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42003 | -12 | -0.03% | 0.03% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 242 | 2 | 0.83% | 0.83% |
| loop-simplifycfg.NumLoopExitsDeleted | 497 | 20 | -477 | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 336 | -282 | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11032 | 4 | 0.04% | 0.04% |
| loop-unroll.NumUnrolled | 12608 | 12529 | -79 | -0.63% | 0.63% |
| mem2reg.NumDeadAlloca | 10222 | 10221 | -1 | -0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192110 | 192106 | -4 | 0.00% | 0.00% |
| mem2reg.NumSingleStore | 637650 | 637643 | -7 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 814 | 812 | -2 | -0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282934 | -174 | -0.06% | 0.06% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106718 | 6 | 0.01% | 0.01% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |

... but that actually regresses LICM (-12% `licm.NumMovedLoads`), loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`), simple-loop-unswitch (`NumTrivial`).

What if we instead have LICM both before and after LoopRotate?

| statistic name | LoopRotate-LICM | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015930 | 9014474 | -1456 | -0.02% | 0.02% |
| indvars.NumElimCmp | 3536 | 3546 | 10 | 0.28% | 0.28% |
| indvars.NumElimExt | 36725 | 36681 | -44 | -0.12% | 0.12% |
| indvars.NumElimIV | 1197 | 1185 | -12 | -1.00% | 1.00% |
| indvars.NumElimIdentity | 143 | 146 | 3 | 2.10% | 2.10% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29899 | 57 | 0.19% | 0.19% |
| indvars.NumReplaced | 2293 | 2299 | 6 | 0.26% | 0.26% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26404 | -34 | -0.13% | 0.13% |
| instcount.TotalBlocks | 1178338 | 1173652 | -4686 | -0.40% | 0.40% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9895452 | -9990 | -0.10% | 0.10% |
| lcssa.NumLCSSA | 425871 | 425373 | -498 | -0.12% | 0.12% |
| licm.NumHoisted | 378357 | 383352 | 4995 | 1.32% | 1.32% |
| licm.NumMovedCalls | 2193 | 2204 | 11 | 0.50% | 0.50% |
| licm.NumMovedLoads | 35899 | 35755 | -144 | -0.40% | 0.40% |
| licm.NumPromoted | 11178 | 11163 | -15 | -0.13% | 0.13% |
| licm.NumSunk | 13359 | 14321 | 962 | 7.20% | 7.20% |
| loop-delete.NumDeleted | 8547 | 8538 | -9 | -0.11% | 0.11% |
| loop-instsimplify.NumSimplified | 12876 | 12041 | -835 | -6.48% | 6.48% |
| loop-peel.NumPeeled | 1008 | 924 | -84 | -8.33% | 8.33% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42005 | -10 | -0.02% | 0.02% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 241 | 1 | 0.42% | 0.42% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 619 | 1 | 0.16% | 0.16% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11029 | 1 | 0.01% | 0.01% |
| loop-unroll.NumUnrolled | 12608 | 12525 | -83 | -0.66% | 0.66% |
| mem2reg.NumPHIInsert | 192110 | 192073 | -37 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637650 | 637652 | 2 | 0.00% | 0.00% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282998 | -110 | -0.04% | 0.04% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106691 | -21 | -0.02% | 0.02% |
| simple-loop-unswitch.NumBranches | 5178 | 5185 | 7 | 0.14% | 0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 925 | 11 | 1.20% | 1.20% |
| simple-loop-unswitch.NumTrivial | 183 | 179 | -4 | -2.19% | 2.19% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |

I.e. we end up with fewer instructions, less peeling, more LICM activity, also note how none of those 4 regressions are here. Namely:

| statistic name | LICM-LoopRotate | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015799 | 9014474 | -1325 | -0.01% | 0.01% |
| indvars.NumElimCmp | 3544 | 3546 | 2 | 0.06% | 0.06% |
| indvars.NumElimExt | 36580 | 36681 | 101 | 0.28% | 0.28% |
| indvars.NumElimIV | 1187 | 1185 | -2 | -0.17% | 0.17% |
| indvars.NumElimIdentity | 136 | 146 | 10 | 7.35% | 7.35% |
| indvars.NumLFTR | 29890 | 29899 | 9 | 0.03% | 0.03% |
| indvars.NumReplaced | 2227 | 2299 | 72 | 3.23% | 3.23% |
| indvars.NumWidened | 26329 | 26404 | 75 | 0.28% | 0.28% |
| instcount.TotalBlocks | 1173840 | 1173652 | -188 | -0.02% | 0.02% |
| instcount.TotalInsts | 9896139 | 9895452 | -687 | -0.01% | 0.01% |
| lcssa.NumLCSSA | 423961 | 425373 | 1412 | 0.33% | 0.33% |
| licm.NumHoisted | 378753 | 383352 | 4599 | 1.21% | 1.21% |
| licm.NumMovedCalls | 2208 | 2204 | -4 | -0.18% | 0.18% |
| licm.NumMovedLoads | 31821 | 35755 | 3934 | 12.36% | 12.36% |
| licm.NumPromoted | 11154 | 11163 | 9 | 0.08% | 0.08% |
| licm.NumSunk | 13587 | 14321 | 734 | 5.40% | 5.40% |
| loop-delete.NumDeleted | 8402 | 8538 | 136 | 1.62% | 1.62% |
| loop-instsimplify.NumSimplified | 11890 | 12041 | 151 | 1.27% | 1.27% |
| loop-peel.NumPeeled | 925 | 924 | -1 | -0.11% | 0.11% |
| loop-rotate.NumRotated | 42003 | 42005 | 2 | 0.00% | 0.00% |
| loop-simplifycfg.NumLoopBlocksDeleted | 242 | 241 | -1 | -0.41% | 0.41% |
| loop-simplifycfg.NumLoopExitsDeleted | 20 | 497 | 477 | 2385.00% | 2385.00% |
| loop-simplifycfg.NumTerminatorsFolded | 336 | 619 | 283 | 84.23% | 84.23% |
| loop-unroll.NumCompletelyUnrolled | 11032 | 11029 | -3 | -0.03% | 0.03% |
| loop-unroll.NumUnrolled | 12529 | 12525 | -4 | -0.03% | 0.03% |
| mem2reg.NumDeadAlloca | 10221 | 10222 | 1 | 0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192106 | 192073 | -33 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637643 | 637652 | 9 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 812 | 814 | 2 | 0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 282934 | 282998 | 64 | 0.02% | 0.02% |
| scalar-evolution.NumTripCountsNotComputed | 106718 | 106691 | -27 | -0.03% | 0.03% |
| simple-loop-unswitch.NumBranches | 4752 | 5185 | 433 | 9.11% | 9.11% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 503 | 925 | 422 | 83.90% | 83.90% |
| simple-loop-unswitch.NumSwitches | 18 | 20 | 2 | 11.11% | 11.11% |
| simple-loop-unswitch.NumTrivial | 95 | 179 | 84 | 88.42% | 88.42% |

{F15983613} {F15983615} {F15983616}

(this is vanilla llvm testsuite + rawspeed + darktable)

As an example of the code where early LICM only is bad, see: https://godbolt.org/z/GzEbacs4K

This does have an observable compile-time regression of +~0.5% geomean https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions but I think that's basically nothing, and there's potential that it might be avoidable in the future by fixing clang to produce alignment information on function arguments, thus making the second run unneeded.

Differential Revision: https://reviews.llvm.org/D99249	2021-04-02 11:11:42 +03:00
Juneyoung Lee	c664769330	[AssumeBundles] offset should be added to correctly calculate align This is a patch to fix the bug in alignment calculation (see https://reviews.llvm.org/D90529#2619492). Consider this code:
```
call void @llvm.assume(i1 true) ["align"(i32* %a, i32 32, i32 28)]
%arrayidx = getelementptr inbounds i32, i32* %a, i64 -1
; alignment of %arrayidx?
```
The llvm.assume guarantees that `%a - 28` is 32-byte aligned, meaning that `%a` is 32k + 28 for some k. Therefore `a - 4` cannot be 32-byte aligned, but the existing code was calculating the pointer as 32-byte aligned. The reason why this happened is as follows. `DiffSCEV` stores `%arrayidx - %a` which is -4. `OffSCEV` stores the offset value of “align”, which is 28. `DiffSCEV` + `OffSCEV` = 24 should be used for `a - 4`'s offset from 32k, but `DiffSCEV` - `OffSCEV` = 32 was being used instead. Reviewed By: Tyker Differential Revision: https://reviews.llvm.org/D98759	2021-04-02 12:32:05 +09:00
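The arithmetic in the message above can be replayed numerically. The Python sketch below is not LLVM's implementation: `known_alignment` is a hypothetical helper that takes the assumed alignment, the "align" bundle offset, and the pointer difference, and returns the largest power-of-two alignment that is actually guaranteed.

```python
def known_alignment(align: int, offset: int, diff: int) -> int:
    """Alignment provable for (p + diff), given (p - offset) % align == 0.

    `align` must be a power of two. Since p = align*k + offset, the derived
    pointer is align*k + (offset + diff), so only that residue matters.
    """
    residue = (offset + diff) % align
    if residue == 0:
        return align
    return residue & -residue  # lowest set bit = largest power of two dividing it

# Values from the example: align 32, offset 28, %arrayidx - %a = -4.
correct = known_alignment(32, 28, -4)    # uses diff + offset = 24
buggy_residue = (-4 - 28) % 32           # diff - offset, as in the old code
```

With `diff + offset = 24` the pointer is only known to be 8-byte aligned, while the subtracted form leaves a residue of 0 and wrongly reports the full 32 bytes.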
Philip Reames	91790c6785	[indvars] Fix pr49802 by checking for SCEVCouldNotCompute The code is assuming that having an exact exit count for the loop implies that exit counts for every exit are known. This used to be true, but when we added handling for dead exits we broke this invariant. The new invariant is that an exact loop count implies that any exits that are not trivially dead have exit counts. We could have fixed this by either a) explicitly checking for a dead exit, or b) just testing for SCEVCouldNotCompute. I chose the second as it was simpler. (Debugging this took longer than it should have since I'd mistyped the original assert and it wasn't checking what it was meant to...) p.s. Sorry for the lack of test case. Getting things into a state to actually hit this is difficult and fragile. The original repro involves loop-deletion leaving SCEV in a slightly imprecise state which lets us bypass other transforms in IndVarSimplify on the way to this one. All of my attempts to separate it into a standalone test failed.	2021-04-01 17:53:44 -07:00
Philip Reames	b23a314146	[funcattrs] Respect nofree attribute on callsites (not just callee)	2021-04-01 14:45:49 -07:00
Philip Reames	1e69a5af92	[Attributor] Cleanup detection of non-relaxed atomics in nosync inference The code was checking for cases which are disallowed by the verifier. Delete dead code and adjust style.	2021-04-01 12:01:29 -07:00
Philip Reames	8e596f7e27	[Attributor] Cleanup intrinsic handling in nosync inference [mostly NFC] Mostly stylistic adjustment, but the old code didn't handle the memcpy.inline intrinsic. By using the matcher class, we now do.	2021-04-01 11:49:59 -07:00
Philip Reames	6ef4505298	[funcattrs] Infer nosync from readnone and non-convergent This implements the most basic possible nosync inference. The choice of inference rule is taken from the comments in attributor and the discussion on the review of the change which introduced the nosync attribute (`0626367202`). This is deliberately minimal. As noted in code comments, I do plan to add a more robust inference which actually scans the function IR directly, but a) I need to do some refactoring of the attributor code to use common interfaces, and b) I wanted to get something in. I also wanted to minimize the "interesting" analysis discussion since that's time intensive. Context: This combines with existing nofree attribute inference to help prove dereferenceability in the ongoing deref-at-point semantics work. Differential Revision: https://reviews.llvm.org/D99749	2021-04-01 11:37:34 -07:00
Philip Reames	ffa15e9463	Extract isVolatile helper on Instruction [NFCI] We have this logic duplicated in several cases, none of which were exhaustive. Consolidate it in one place. I don't believe this actually impacts behavior of the callers. I think they all filter their inputs such that their partial implementations were correct. If not, this might be fixing a cornercase bug.	2021-04-01 11:24:02 -07:00
Philip Reames	6b05d753e0	Mark unordered memset/memmove/memcpy as nosync Mostly a means to remove a bit of code from attributor in advance of implementing a FuncAttr inference for nosync.	2021-04-01 10:38:54 -07:00
Alexey Bataev	c03696da5e	[SLP]Improve and fix getVectorElementSize. 1. Need to cleanup InstrElementSize map for each new tree, otherwise might use sizes from the previous run of the vectorization attempt. 2. No need to include into analysis the instructions from the different basic blocks to save compile time. Differential Revision: https://reviews.llvm.org/D99677	2021-04-01 06:51:26 -07:00
Alexey Bataev	ce98a0556a	[SLP]Remove `else` after `return`, NFC.	2021-04-01 05:33:01 -07:00
Yevgeny Rouban	1ed53d44d8	[LoopFlatten] Do not report CFG analyses as up-to-date Removes CFGAnalyses from the preserved analyses set returned by LoopFlattenPass::run(). Reviewed By: Dave Green, Ta-Wei Tu Differential Revision: https://reviews.llvm.org/D99700	2021-04-01 15:52:36 +07:00
Max Kazantsev	a1d83776bf	[NFC] Undo some erroneous renamings Some vars renamed by mistake during auto-replacements. Undoing them.	2021-04-01 13:10:10 +07:00
Max Kazantsev	630818a850	[NFC] Disambiguate LI in GVN GVN uses the name 'LI' for two different unrelated things: LoadInst and LoopInfo. This patch renames the variables with the former meaning to 'Load' to disambiguate the code.	2021-04-01 12:40:35 +07:00
KAWASHIMA Takahiro	5fac7c6046	[GVN] Propagate llvm.access.group metadata of loads Before this change, the `llvm.access.group` metadata was dropped when moving a load instruction in GVN. This prevents vectorizing a C/C++ loop with `#pragma clang loop vectorize(assume_safety)`. This change propagates the metadata as well as other metadata if it is safe (the move-destination basic block and source basic block belong to the same loop). Differential Revision: https://reviews.llvm.org/D93503	2021-04-01 10:00:48 +09:00
qixingxue	62b74f7564	[GVN][NFC] Refactor analyzeLoadFromClobberingWrite This commit adjusts the order of two swappable if statements to make code cleaner. Reviewed By: lattner, nikic Differential Revision: https://reviews.llvm.org/D99648	2021-04-01 08:35:35 +08:00
Roman Lebedev	43ded90094	[NFC][LoopRotation] Count the number of instructions hoisted/cloned into preheader	2021-03-31 23:27:36 +03:00
Huihui Zhang	fe5c4a06a4	[LoopVectorize] Use SetVector to track uniform uses to prevent non-determinism. Use SetVector instead of SmallPtrSet to track values with uniform use. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turned on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM test: consecutive-ptr-uniforms.ll. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99549	2021-03-31 11:21:07 -07:00
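The idea behind swapping an unordered set for a SetVector is that membership stays unique while iteration follows insertion order, so output no longer depends on hashing or pointer values. A rough Python analogue follows; this `SetVector` class is a toy written for the illustration (it relies on `dict` preserving insertion order) and is not a binding to LLVM's `llvm::SetVector`.

```python
class SetVector:
    """Set with deterministic, insertion-ordered iteration."""

    def __init__(self):
        self._items = {}  # dict keys keep insertion order

    def insert(self, item) -> bool:
        """Add item; return False if it was already present."""
        if item in self._items:
            return False
        self._items[item] = None
        return True

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

uniform_uses = SetVector()
for value in ["%c", "%a", "%b", "%a"]:
    uniform_uses.insert(value)
ordered = list(uniform_uses)
```

Iterating `uniform_uses` visits the values in first-insertion order on every run, which is the property the reverse-iteration build is designed to test for.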
Sanjay Patel	1462bdf1b9	[InstCombine] fold abs(srem X, 2) This is a missing optimization based on an example in: https://llvm.org/PR49763 As noted there and the test here, we could add a more general fold if that is shown useful. https://alive2.llvm.org/ce/z/xEHdTv https://alive2.llvm.org/ce/z/97dcY5	2021-03-31 11:29:20 -04:00
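The message does not spell out the replacement, so the following is only an arithmetic observation, not a description of the patch: since `srem X, 2` is always -1, 0 or 1, its absolute value equals the low bit of `X`. A brute-force Python check over 8-bit signed values, with a hypothetical `srem` helper that mimics the truncating remainder:

```python
def srem(x: int, y: int) -> int:
    """Signed remainder that truncates toward zero (sign follows the dividend)."""
    r = abs(x) % abs(y)
    return -r if x < 0 else r

def abs_srem2_is_low_bit() -> bool:
    """Compare abs(srem(x, 2)) with (x & 1) for every 8-bit signed x."""
    return all(abs(srem(x, 2)) == (x & 1) for x in range(-128, 128))
```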
Sander de Smalen	7108b2dec1	[SVE] Fix LoopVectorizer test scalalable-call.ll This marks FSIN and other operations to EXPAND for scalable vectors, so that they are not assumed to be legal by the cost-model. Depends on D97470 Reviewed By: dmgreen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97471	2021-03-31 14:52:49 +01:00
Chuanqi Xu	eb51dd719f	[Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2 Summary: Try to insert dbg.declare to entry.resume basic block in resume function. In this way, we could print alloca such as __promise in gdb/lldb under O2, which would be beneficial to debug coroutine program. Test Plan: check-llvm Reviewed by: aprantl Differential Revision: https://reviews.llvm.org/D96938	2021-03-31 10:37:06 +08:00
Fangrui Song	3e5ee194c0	[SimpleLoopUnswitch] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after `431a40e1e2`	2021-03-30 19:27:10 -07:00
Juneyoung Lee	431a40e1e2	[LoopUnswitch] Assert that branch condition is either and/or but not both as suggested at https://reviews.llvm.org/rG5bb38e84d3d0#986321	2021-03-31 10:35:22 +09:00
Sanjay Patel	c2ebad8d55	[InstCombine] add fold for demand of low bit of abs() This is one problem shown in https://llvm.org/PR49763 https://alive2.llvm.org/ce/z/cV6-4K https://alive2.llvm.org/ce/z/9_3g-L	2021-03-30 15:14:37 -04:00
Huihui Zhang	d857a81437	[VPlan] Use SetVector for VPExternalDefs to prevent non-determinism. Use SetVector instead of SmallPtrSet for external definitions created for VPlan. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turned on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM-Unit test: VPRecipeTest.dump. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99544	2021-03-30 12:10:56 -07:00
spupyrev	22998738e8	[SamplePGO] Keeping prof metadata for IndirectBrInst Currently prof metadata with branch counts is added only for BranchInst and SwitchInst, but not for IndirectBrInst. As a result, BPI/BFI make incorrect inferences for indirect branches, which can be very hot. This diff adds metadata for IndirectBrInst, in addition to BranchInst and SwitchInst. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D99550	2021-03-30 10:44:48 -07:00
Hongtao Yu	3e3fc431df	[CSSPGO] Top-down processing order based on full profile. Use profiled call edges to augment the top-down order. There are cases that the top-down order computed based on the static call graph doesn't reflect real execution order. For example: 1. Incomplete static call graph due to unknown indirect call targets. Adjusting the order by considering indirect call edges from the profile can enable the inlining of indirect call targets by allowing the caller processed before them. 2. Mutual call edges in an SCC. The static processing order computed for an SCC may not reflect the call contexts in the context-sensitive profile, thus may cause potential inlining to be overlooked. The function order in one SCC is being adjusted to a top-down order based on the profile to favor more inlining. 3. Transitive indirect call edges due to inlining. When a callee function is inlined into into a caller function in LTO prelink, every call edge originated from the callee will be transferred to the caller. If any of the transferred edges is indirect, the original profiled indirect edge, even if considered, would not enforce a top-down order from the caller to the potential indirect call target in LTO postlink since the inlined callee is gone from the static call graph. 4. #3 can happen even for direct call targets, due to functions defined in header files. Header functions, when included into source files, are defined multiple times but only one definition survives due to ODR. Therefore, the LTO prelink inlining done on those dropped definitions can be useless based on a local file scope. More importantly, the inlinee, once fully inlined to a to-be-dropped inliner, will have no profile to consume when its outlined version is compiled. This can lead to a profile-less prelink compilation for the outlined version of the inlinee function which may be called from external modules. 
While this isn't easy to fix, we rely on the postlink AutoFDO pipeline to optimize the inlinee. Since the surviving copy of the inliner (defined in headers) can be inlined in its local scope in prelink, it may not exist in the merged IR in postlink, and we'll need the profiled call edges to enforce a top-down order for the rest of the functions. Considering those cases, a profiled call graph completely independent of the static call graph is constructed based on profile data, where function objects are not even needed to handle case #3 and case #4. I'm seeing an average 0.4% perf win out of SPEC2017. For certain benchmarks such as Xalancbmk and GCC, the win is bigger, above 2%. The change is an enhancement to https://reviews.llvm.org/D95988. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D99351	2021-03-30 10:42:22 -07:00
Krasimir Georgiev	c51e91e046	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `5178ffc7cf`. Compiling `llvm-profdata` with a compiler build from this produces a crashing binary.	2021-03-30 14:13:37 +02:00
Juneyoung Lee	6b4b1dc6ec	[LoopUnswitch] Simplify branch condition if it is select with constant operands This fixes the miscompilation reported in https://reviews.llvm.org/rG5bb38e84d3d0#986154 . `select _, true, false` matches both m_LogicalAnd and m_LogicalOr, making later transformations confused. Simplify the branch condition to not have the form.	2021-03-30 20:09:42 +09:00
Sander de Smalen	f71ed5dfe2	NFC: Migrate PartialInlining to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D97382	2021-03-30 11:59:45 +01:00
David Sherwood	a08c7736a7	[LoopVectorize] Add support for scalable vectorization of induction variables This patch adds support for the vectorization of induction variables when using scalable vectors, which required the following changes: 1. Removed assert from InnerLoopVectorizer::getStepVector. 2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use a runtime determined value for VF and removed an assert. 3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable vectors. I did this by calculating the full vector value for each Part of the unroll factor (UF) and caching this in the VP state. This means that we are always able to extract an arbitrary element from the vector if necessary. In addition to this, I also permitted the caching of the individual lane values themselves for the known minimum number of elements in the same way we do for fixed width vectors. This is a further optimisation that improves the code quality since it avoids unnecessary extractelement operations when extracting the first lane. 4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while testing some code paths I noticed this is currently broken for scalable vectors. Various tests to support different cases have been added here: Transforms/LoopVectorize/AArch64/sve-inductions.ll Differential Revision: https://reviews.llvm.org/D98715	2021-03-30 11:13:31 +01:00
Krasimir Georgiev	8e7df996e3	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `92ddd3c1b6`. Causes multistage clang crashes, e.g.: https://lab.llvm.org/buildbot/#/builders/36/builds/6678	2021-03-30 11:47:12 +02:00
Han Zhu	92ddd3c1b6	[loop-idiom] Hoist loop memcpys to loop preheader For a simple loop like: ``` struct S { int x; int y; char b; }; unsigned foo(S* __restrict__ a, S* b, int n) { for (int i = 0; i < n; i++) a[i] = b[i]; return sizeof(a[0]); } ``` We could eliminate the loop and convert it to a large memcpy of 12n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll` ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %0 = bitcast %struct.S* %arrayidx2 to i8* %1 = bitcast %struct.S* %arrayidx to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false) %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass. ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %a1 = bitcast %struct.S* %a to i8* %b2 = bitcast %struct.S* %b to i8* %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry %0 = zext i32 %n to i64 %1 = mul nuw nsw i64 %0, 12 call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false) br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %2 = bitcast %struct.S* %arrayidx2 to i8* %3 = bitcast %struct.S* %arrayidx to i8* %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` Reviewed By: zino Differential Revision: https://reviews.llvm.org/D97667	2021-03-29 23:36:26 -07:00
Han Zhu	2bd4049ceb	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `deb5095833`. Bad commit message.	2021-03-29 23:35:35 -07:00
Han Zhu	deb5095833	[loop-idiom] Hoist loop memcpys to loop preheader Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Blame Revision: Differential Revision: https://phabricator.intern.facebook.com/D26380397	2021-03-29 23:14:42 -07:00
Huihui Zhang	ca721042f1	[IPO][SampleContextTracker] Use SmallVector to track context profiles to prevent non-determinism. Use SmallVector instead of SmallSet to track the context profiles mapped. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turned on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM test: profile-context-tracker-debug.ll. Reviewed By: MaskRay, wenlei Differential Revision: https://reviews.llvm.org/D99547	2021-03-29 16:37:10 -07:00
Gulfem Savrun Yeniceri	5178ffc7cf	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-29 21:53:32 +00:00
Wenlei He	30b0232336	[CSSPGO][llvm-profgen] Context-sensitive global pre-inliner This change sets up a framework in llvm-profgen to estimate inline decisions and adjust the context-sensitive profile based on them. We call it a global pre-inliner in llvm-profgen. It will serve two purposes: 1) Since the context profile for a not-inlined context will be merged into the base profile, if we estimate a context will not be inlined, we can merge the context profile in the output to save profile size. 2) For thinLTO, when a context involving functions from different modules is not inlined, we can't merge function profiles across modules, leading to suboptimal post-inline count quality. By estimating some inline decisions, we would be able to adjust/merge context profiles beforehand as a mitigation. Compiler inline heuristic uses inline cost which is not available in llvm-profgen. But since inline cost is closely related to size, we could get an estimate through function size from debug info. Because the size we have in llvm-profgen is the final size, it could also be more accurate than the inline cost estimation in the compiler. This change only has the framework, with a few TODOs left for follow up patches for a complete implementation: 1) We need to retrieve size for function/inlinee from debug info for inlining estimation. Currently we use number of samples in a profile as placeholder for size estimation. 2) Currently the thresholds are using the values used by sample loader inliner. But they need to be tuned since the size here is fully optimized machine code size, instead of inline cost based on not yet fully optimized IR. Differential Revision: https://reviews.llvm.org/D99146	2021-03-29 09:46:14 -07:00
Florian Hahn	c773d0f973	Recommit "[LV] Move runtime pointer size check to LVP::plan()." Re-apply `25fbe803d4`, with a small update to emit the right remark class. Original message: [LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime to consider their cost more accurately. Reviewed By: lebedev.ri	2021-03-29 16:14:27 +01:00
Florian Hahn	485c8ce733	Revert "[LV] Move runtime pointer size check to LVP::plan()." This reverts commit `25fbe803d4`. This breaks a clang test which filters for the wrong remark type.	2021-03-29 14:41:53 +01:00
Sanjay Patel	da381cf7ce	[SLP] allow matching integer min/max intrinsics as reduction ops This is a 2nd try of: `3c8473ba53` which was reverted at: `a26312f9d4` because of crashing. This version includes extra code and tests to avoid the known crashing examples as discussed in PR49730. Original commit message: As noted in D98152, we need to patch SLP to avoid regressions when we start canonicalizing to integer min/max intrinsics. Most of the real work to make this possible was in: `7202f47508` Differential Revision: https://reviews.llvm.org/D98981	2021-03-29 09:38:18 -04:00
Florian Hahn	25fbe803d4	[LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime to consider their cost more accurately. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98634	2021-03-29 14:12:29 +01:00
Matt Arsenault	9a0c9402fa	Reapply "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `07e46367ba`.	2021-03-29 08:55:30 -04:00
Jingu Kang	e4abb64100	[LoopUnswitch] Use reference variables instead of pointer one Differential Revision: https://reviews.llvm.org/D99496	2021-03-29 13:08:46 +01:00
Hans Wennborg	c6e5c4654b	Don't use $ as suffix for symbol names in ThinLTOBitcodeWriter and other places Using $ breaks demangling of the symbols. For example, $ c++filt _Z3foov\$123 _Z3foov$123 This causes problems for developers who would like to see nice stack traces etc., but also for automatic crash tracking systems which try to organize crashes based on the stack traces. Instead, use the period as suffix separator, since Itanium demanglers normally ignore such suffixes: $ c++filt _Z3foov.123 foo() [clone .123] This is already done in some places; try to do it everywhere. Differential revision: https://reviews.llvm.org/D97484	2021-03-29 13:03:52 +02:00
Oliver Stannard	07e46367ba	Revert "Reapply "OpaquePtr: Turn inalloca into a type attribute"" Reverting because test 'Bindings/Go/go.test' is failing on most buildbots. This reverts commit `fc9df30991`.	2021-03-29 11:32:22 +01:00
Jingu Kang	cfe87d4edd	[NFC][LoopUnswitch] Move hasPartialIVCondition to LoopUtils Differential revision: https://reviews.llvm.org/D99490	2021-03-29 10:29:45 +01:00
Matt Arsenault	fc9df30991	Reapply "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `20d5c42e0e`.	2021-03-28 13:35:21 -04:00
Sanjay Patel	01ae6e5ead	[InstCombine] sink min/max intrinsics with common op after select This is another step towards parity with cmp+select min/max idioms. See D98152.	2021-03-28 13:13:04 -04:00
Nico Weber	20d5c42e0e	Revert "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `4fefed6563`. Broke check-clang everywhere.	2021-03-28 13:02:52 -04:00
Matt Arsenault	4fefed6563	OpaquePtr: Turn inalloca into a type attribute I think byval/sret and the others are close to being able to rip out the code to support the missing type case. A lot of this code is shared with inalloca, so catch this up to the others so that can happen.	2021-03-28 11:12:23 -04:00
Florian Hahn	8c6c357897	[LV] Mark a few more cost-model members as const (NFC).	2021-03-28 14:59:48 +01:00
Florian Hahn	d2855eba81	[LV] Fix formatting from `2f9d68c3f1`.	2021-03-27 21:29:56 +00:00
Florian Hahn	2f9d68c3f1	[LV] Mark some methods as const (NFC). Mark a few methods as const, as they do not modify any state.	2021-03-27 21:27:53 +00:00
Juneyoung Lee	05884d3b52	Make FoldBranchToCommonDest poison-safe by default This is a small patch to make FoldBranchToCommonDest poison-safe by default. After `fc3f0c9c`, only two syntactic changes are needed to fix unit tests. This does not cause any assembly difference in testsuite as well (-O3, X86-64 Manjaro). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99452	2021-03-27 19:05:12 +09:00
Juneyoung Lee	fc3f0c9cc0	[IRCE] Use m_LogicalAnd This is a minor fix to use m_LogicalAnd. This allows IRCE to recognize select form of and conditions as well.	2021-03-27 15:23:18 +09:00
Hongtao Yu	12ac0403b1	[CSSPGO][NFC] Fix a debug dump issue. During context promotion, intermediate nodes that are on a call path but do not come with a profile can be promoted together with their parent nodes. Do not print sample context string for such nodes since they do not have profile. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D99441	2021-03-26 16:06:56 -07:00
Chris Lattner	62c41cfba1	Add a missing file header comment, NFC.	2021-03-26 15:34:04 -07:00
Nikita Popov	4622648a06	Revert "[ArgPromotion] Copy additional metadata for loads." This reverts commit `166620a4f0`. A miscompile has been reported in https://reviews.llvm.org/D93927#2653480 and following.	2021-03-26 21:34:54 +01:00
Florian Hahn	4858e081d7	[ConstraintElimination] Only strip casts preserving the representation. Things like addrspacecast may not be no-ops, so we should not look through them.	2021-03-26 20:07:41 +00:00
Sanjay Patel	b0797e0c12	[SLP] use dyn_cast instead of isa + cast; NFC	2021-03-26 13:52:31 -04:00
Sanjay Patel	a26312f9d4	Revert "[SLP] allow matching integer min/max intrinsics as reduction ops" This reverts commit `3c8473ba53` and includes test diffs to maintain testing status. There's at least 1 place that was not updated with `7202f47508` , so we can crash mismatching select and intrinsics as shown in PR49730.	2021-03-26 09:59:14 -04:00
David Sherwood	c39460cc4f	Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost" This reverts commit `240aa96cf2`.	2021-03-26 11:36:53 +00:00
David Sherwood	240aa96cf2	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. Differential Revision: https://reviews.llvm.org/D98512	2021-03-26 11:27:12 +00:00
Wenlei He	5f59f407f5	[CSSPGO] Minor tweak for inline candidate priority tie breaker When prioritizing call sites to consider for inlining in sample loader, use number of samples as a first tie breaker before using name/guid comparison. This would favor smaller functions when hotness is the same (from the same block). We could try to retrieve accurate function size if this turns out to be more important. Differential Revision: https://reviews.llvm.org/D99370	2021-03-25 21:15:36 -07:00
Leonard Chan	36eaeaf728	[llvm][hwasan] Add Fuchsia shadow mapping configuration Ensure that Fuchsia shadow memory starts at zero. Differential Revision: https://reviews.llvm.org/D99380	2021-03-25 15:28:59 -07:00
Guozhi Wei	3240910f00	[DAE] Adjust param/arg attributes when changing parameter to undef In DeadArgumentElimination pass, if a function's argument is never used, the corresponding caller's parameter can be changed to undef. If the param/arg has attribute noundef or other related attributes, LLVM LangRef (https://llvm.org/docs/LangRef.html#parameter-attributes) says its behavior is undefined. SimplifyCFG (D97244) takes advantage of this behavior and does a bad transformation on valid code. To avoid this undefined behavior when changing the caller's parameter to undef, this patch removes the noundef attribute and other attributes that imply noundef on param/arg. Differential Revision: https://reviews.llvm.org/D98899	2021-03-25 14:53:22 -07:00
Roman Lebedev	1c55dcbca7	[NFCI][SimplifyCFG] Don't pay for a Small{Map,Set}Vector when plain SmallSet will suffice This only changes the cases where we really don't care about the iteration order of the underlying container, namely when we will use the values from it to form DTU updates.	2021-03-25 23:25:40 +03:00
Yevgeny Rouban	f7ef26ef0b	[SLP] Fix crash in reduction for integer min/max The SCEV commit `b46c085d2b` [NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions seems to reveal a new crash in SLPVectorizer. SLP crashes expecting a SelectInst as an externally used value but umin() call is found. The patch relaxes the assumption to make the IR flag propagation safe. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D99328	2021-03-25 21:44:21 +07:00
Matt Morehouse	96a4167b4c	[HWASan] Use page aliasing on x86_64. Userspace page aliasing allows us to use middle pointer bits for tags without untagging them before syscalls or accesses. This should enable easier experimentation with HWASan on x86_64 platforms. Currently stack, global, and secondary heap tagging are unsupported. Only primary heap allocations get tagged. Note that aliasing mode will not work properly in the presence of fork(), since heap memory will be shared between the parent and child processes. This mode is non-ideal; we expect Intel LAM to enable full HWASan support on x86_64 in the future. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D98875	2021-03-25 07:04:14 -07:00
Alexey Bataev	568c874117	[SLP]Improve and simplify extendSchedulingRegion. We do not need to scan further if the upper end or lower end of the basic block is reached already and the instruction is not found. It means that the instruction is definitely in the lower part of the basic block or in the upper part, respectively. This should improve compile time for very big basic blocks. Differential Revision: https://reviews.llvm.org/D99266	2021-03-25 05:31:58 -07:00
Sameer Sahasrabuddhe	b92c8c22b9	[NewPM] Disable non-trivial loop-unswitch on targets with divergence Unswitching a loop on a non-trivial divergent branch is expensive since it serializes the execution of both versions of the loop. But identifying a divergent branch needs divergence analysis, which is a function level analysis. The legacy pass manager handles this dependency by isolating such a loop transform and rerunning the required function analyses. This functionality is currently missing in the new pass manager, and there is no safe way for the SimpleLoopUnswitch pass to depend on DivergenceAnalysis. So we conservatively assume that all non-trivial branches are divergent if the target has divergence. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D98958	2021-03-25 11:27:10 +00:00
Philip Reames	9a82f42d12	Plumb TLI through isSafeToExecuteUnconditionally [NFC] Split from D95815 to reduce patch size. Isn't (yet) used for anything, only the client side is wired up.	2021-03-24 17:52:04 -07:00
Matt Morehouse	c8ef98e5de	Revert "[HWASan] Use page aliasing on x86_64." This reverts commit `63f73c3eb9` due to breakage on aarch64 without TBI.	2021-03-24 16:18:29 -07:00
Roman Lebedev	2070fe7144	[NFCI][SimplifyCFG] Don't form DTU updates if we aren't going to apply them I think we may want to have a thin wrapper over a vector to deduplicate those `if(DTU)` predicates, and instead do them in the `insert()` itself.	2021-03-25 00:02:37 +03:00
Congzhe Cao	829c1b6443	[LoopInterchange] fix tightlyNested() in LoopInterchange legality This is yet another attempt to fix tightlyNested(). Add checks in tightlyNested() for the inner loop exit block, such that 1) if there is control-flow divergence in between the inner loop exit block and the outer loop latch, or 2) if the inner loop exit block contains unsafe instructions, tightlyNested() returns false. The reasoning is that after interchange, the original inner loop exit block, which was part of the outer loop, would be put into the new inner loop, and will be executed a different number of times before and after interchange. Thus it should be dealt with appropriately. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D98263	2021-03-24 15:49:25 -04:00
Florian Hahn	9d45579279	[LV] Factor out phi type access to variable (NFC). A slight simplification of the code to reduce future diffs.	2021-03-24 19:25:22 +00:00
Florian Hahn	8d1342f79d	[LV] Remove redundant access to Legal::getReductionVars() (NFC). The reduction descriptor is retrieved earlier and stored in a variable RdxDesc already.	2021-03-24 19:15:14 +00:00
Gulfem Savrun Yeniceri	5fbe1fdf17	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `5fd001a5ff` because it broke clang-with-thin-lto-ubuntu bot.	2021-03-24 18:59:33 +00:00
Matt Morehouse	63f73c3eb9	[HWASan] Use page aliasing on x86_64. Userspace page aliasing allows us to use middle pointer bits for tags without untagging them before syscalls or accesses. This should enable easier experimentation with HWASan on x86_64 platforms. Currently stack, global, and secondary heap tagging are unsupported. Only primary heap allocations get tagged. Note that aliasing mode will not work properly in the presence of fork(), since heap memory will be shared between the parent and child processes. This mode is non-ideal; we expect Intel LAM to enable full HWASan support on x86_64 in the future. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D98875	2021-03-24 11:43:41 -07:00
Gulfem Savrun Yeniceri	5fd001a5ff	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-24 17:31:18 +00:00
Nikita Popov	8a168d2d70	[LICM] Fix NumSunk statistic (NFC) LICM can sink instructions that have uses inside the loop, as long as these uses are considered "free". However, if there were only free uses inside the loop, and no uses outside the loop at all, the instruction would still count towards the NumSunk statistic. This resulted in a wild inflation of the NumSunk metric. After this patch it drops down from 1141787 to 5852 on test-suite O3.	2021-03-24 18:28:19 +01:00
Thomas Preud'homme	3b52c04e82	Make FindAvailableLoadedValue TBAA aware FindAvailableLoadedValue() relies on FindAvailablePtrLoadStore() to run the alias analysis when searching for an equivalent value. However, FindAvailablePtrLoadStore() calls the alias analysis framework with a memory location for the load constructed from an address and a size, which thus lacks TBAA metadata info. This commit modifies FindAvailablePtrLoadStore() to accept an optional memory location as parameter to allow FindAvailableLoadedValue() to create it based on the load instruction, which would then have TBAA metadata info attached. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99206	2021-03-24 17:20:26 +00:00
Roman Lebedev	fe36b834db	[NFCI][SimplifyCFG] Fold branch to common dest: don't check cost if no qualified preds	2021-03-24 19:01:47 +03:00
Sander de Smalen	55d18b3cc2	[TTI] Return a TypeSize from getRegisterBitWidth. This patch changes the interface to take a RegisterKind, to indicate whether the register bitwidth of a scalar register, fixed-width vector register, or scalable vector register must be returned. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98874	2021-03-24 14:45:13 +00:00
Florian Hahn	cd0c00c9fe	[LV] Move exact FP math check out of Requirements. We know if the loop contains FP instructions preventing vectorization after we are done with legality checks. This patch updates the code to check for un-vectorizable FP operations earlier, to avoid unnecessarily running the cost model and picking a vectorization factor. It also makes the code more direct and moves the check to a position where similar checks are done. I might be missing something, but I don't see any reason to handle this check differently to other, similar checks. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98633	2021-03-24 11:01:44 +00:00
Ta-Wei Tu	4d9d736875	[NFC] Improve debug message and test description in `4c1f74a`	2021-03-24 18:21:13 +08:00
Ta-Wei Tu	4c1f74a76c	[LoopFlatten] Fix invalid assertion (PR49571) The `InductionPHI` is not necessarily the increment instruction, as demonstrated in pr49571.ll. This patch removes the assertion and instead bails out from the `LoopFlatten` pass if that happens. This fixes https://bugs.llvm.org/show_bug.cgi?id=49571 Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D99252	2021-03-24 18:08:27 +08:00
Ta-Wei Tu	8fde25b3c3	[NFC] Remove redundant `struct` prefix Reviewed By: SjoerdMeijer, fhahn Differential Revision: https://reviews.llvm.org/D99251	2021-03-24 17:58:33 +08:00
Alexey Bataev	99203f2004	[Analysis]Add getPointersDiff function to improve compile time. Added getPointersDiff function to LoopAccessAnalysis and used it instead of direct calculation of the distance between pointers and/or isConsecutiveAccess function in SLP vectorizer to improve compile time and detection of consecutive store chains. Part of D57059 Differential Revision: https://reviews.llvm.org/D98967	2021-03-23 14:25:36 -07:00
Alexey Bataev	f1b47ad278	Revert "[Analysis]Add getPointersDiff function to improve compile time." This reverts commit `065a14a12d` to investigate and fix crash in SLP vectorizer.	2021-03-23 13:17:54 -07:00
Alexey Bataev	065a14a12d	[Analysis]Add getPointersDiff function to improve compile time. Added getPointersDiff function to LoopAccessAnalysis and used it instead of direct calculation of the distance between pointers and/or isConsecutiveAccess function in SLP vectorizer to improve compile time and detection of consecutive store chains. Part of D57059 Differential Revision: https://reviews.llvm.org/D98967	2021-03-23 12:58:42 -07:00
Roman Lebedev	b5822026dd	[SimplifyCFG] 'Fold branch to common dest': don't overestimate the cost `FoldBranchToCommonDest()` has a certain budget (`-bonus-inst-threshold=`) for bonus instruction duplication. And currently it calculates the cost as-if it will actually duplicate into each predecessor. But ignoring the budget, it won't always duplicate into each predecessor, there are some correctness and profitability checks. So when calculating the cost, we should first check into which blocks will we actually duplicate, and only then use that block count to do budgeting.	2021-03-23 18:30:26 +03:00
Roman Lebedev	514bc01ca3	[SimplifyCFG] FoldBranchToCommonDest(): properly handle same-block external uses (PR49510/PR49689) We clone bonus instructions to the end of the predecessor block, and then use `SSAUpdater::RewriteUseAfterInsertions()`. But that only deals with the cases where the uses to be rewritten are either in a different block from the def, or come after the def. But in some loop cases, the external use may be at the beginning of the predecessor block, before the newly cloned bonus instruction. `SSAUpdater::RewriteUseAfterInsertions()` does not deal with that. Notably, the external use can't happen to be both in the same block and after the newly-cloned instruction, because of the fold preconditions. To properly handle these cases, when the use is in the same block, we should instead use `SSAUpdater::RewriteUse()`. TBN, they do the same thing for PHI users. Fixes https://bugs.llvm.org/show_bug.cgi?id=49510 Likely Fixes https://bugs.llvm.org/show_bug.cgi?id=49689	2021-03-23 17:37:28 +03:00
Sanjay Patel	1bf8f9e228	[SimplifyCFG] use profile metadata to refine merging branch conditions 2nd try (original: `27ae17a6b0`) with fix/test for crash. We must make sure that TTI is available before trying to use it because it is not required (might be another bug). Original commit message: This is one step towards solving: https://llvm.org/PR49336 In that example, we disregard the recommended usage of builtin_expect, so an expensive (unpredictable) branch is folded into another branch that is guarding it. Here, we read the profile metadata to see if the 1st (predecessor) condition is likely to cause execution to bypass the 2nd (successor) condition before merging conditions by using logic ops. Differential Revision: https://reviews.llvm.org/D98898	2021-03-23 10:19:37 -04:00
Sanjay Patel	3c8473ba53	[SLP] allow matching integer min/max intrinsics as reduction ops As noted in D98152, we need to patch SLP to avoid regressions when we start canonicalizing to integer min/max intrinsics. Most of the real work to make this possible was in: `7202f47508` Differential Revision: https://reviews.llvm.org/D98981	2021-03-23 08:56:44 -04:00
Luke Drummond	520f70e94d	[NFC] clang-format llvm/lib/Transforms/Utils/CloneFunction.cpp Differential Revision: https://reviews.llvm.org/D98957	2021-03-23 12:53:28 +00:00
Luke Drummond	ab44ec1b22	[NFC] Minor refactor - Give unwieldy repeated expression a name - Use a ranged `for` basic block iterator Reviewed by: nikic, dexonsmith Differential Revision: https://reviews.llvm.org/D98957	2021-03-23 12:53:28 +00:00
Luke Drummond	0448ddd169	[NFCI] cleanup CloneFunctionInto Hoist early return for decl-only clones to before DIFinder calculation. Also fix an out of date assert message after invariants changed in `22a52dfddc`. Reviewed by: nikic, dexonsmith Differential Revision: https://reviews.llvm.org/D98957	2021-03-23 12:53:27 +00:00
Nashe Mncube	5d929794a8	[llvm-opt] Bug fix within combining FP vectors A bug was found within InstCombineCasts where a function call is only implemented to work with FixedVectors. This caused a crash when a ScalableVector was passed to this function. This commit introduces a regression test which recreates the failure and a bug fix. Differential Revision: https://reviews.llvm.org/D98351	2021-03-23 12:13:41 +00:00
Florian Hahn	e43e8e9138	[AnnotationRemarks] Use subprogram location for summary remarks. The summary remarks are generated on a per-function basis. Using the first instruction's location is sub-optimal for 2 reasons: 1. Sometimes the first instruction is missing !dbg 2. The location of the first instruction may be mis-leading. Instead, just use the location of the function directly.	2021-03-23 12:05:41 +00:00
David Sherwood	d70251163f	[LoopVectorize][NFC] Refactor code to use IRBuilder::CreateStepVector In places where we create a ConstantVector whose elements are a linear sequence of the form <start, start + 1, start + 2, ...> I've changed the code to make use of CreateStepVector, which creates a vector with the sequence <0, 1, 2, ...>, and a vector addition operation. This patch is a non-functional change, since the output from the vectoriser remains unchanged for fixed length vectors and there are existing asserts that still fire when attempting to use scalable vectors for vectorising induction variables. In a later patch we will enable support for scalable vectors in InnerLoopVectorizer::getStepVector(), which relies upon the new stepvector intrinsic in IRBuilder::CreateStepVector. Differential Revision: https://reviews.llvm.org/D97861	2021-03-23 11:29:05 +00:00
Florian Hahn	f759d512c8	[VPlan] Include name when printing after `93a9d2de8f`. The name is included when printing in DOT mode. Also print it in non-DOT mode after `93a9d2de8f`. This will become more important to distinguish different plans once VPlans are gradually refined.	2021-03-23 09:50:14 +00:00
Juneyoung Lee	960a767368	Reland "[InstCombine] Add simplification of two logical and/ors" This relands `07c3b97e18` (D96945) which was reverted by commit `f49354838e`. The two-stage compilation test passes successfully on my machine.	2021-03-23 16:24:50 +09:00
Fangrui Song	3c81822ec5	[SanitizerCoverage] Use External on Windows This should fix https://reviews.llvm.org/D98903#2643589 though it is not clear to me why ExternalWeak does not work.	2021-03-22 23:05:36 -07:00
Serguei Katkov	9fec382601	[RS4GC] Fix hang on infinite loop The meetBDVState utility may set the base pointer for the conflict state. At this point the base for a conflict state has no meaning, but it is still used when comparing BDV states. That comparison serves as the indicator of progress made in an iteration, and the RS4GC pass uses an infinite loop to reach a fixed point. As a result, for the added test, on each iteration the state for some phi nodes is updated with a different base value for the conflict state, which registers as progress even though no further progress is possible for a conflict state. In reality the base value is just transferred from one state to another, and the pass detects progress on these states. The test is very fragile: the traversal order of states and of phi node operands plays an important role. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D99058	2021-03-23 12:54:51 +07:00
Gulfem Savrun Yeniceri	e3a6d70c68	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `78a65cd945` which caused buildbot failures.	2021-03-23 00:43:16 +00:00
Juneyoung Lee	5c2e50b5d2	Reland "[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe" This relands commit `99108c791d` (D95026) which was reverted by `8d5a981a13` because the underlying problem (https://llvm.org/pr49495) is fixed.	2021-03-23 09:19:53 +09:00
Gulfem Savrun Yeniceri	78a65cd945	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-22 22:09:02 +00:00
Roman Lebedev	d37fe26a2b	[NFC][IR] Type: add getWithNewType() method Sometimes you want to get a type with same vector element count as the current type, but different element type, but there's no QOL wrapper to do that. Add one.	2021-03-23 00:50:58 +03:00
Sanjay Patel	95f7f7c21b	Revert "[SimplifyCFG] use profile metadata to refine merging branch conditions" This reverts commit `27ae17a6b0`. There are bot failures that end with: #4 0x00007fff7ae3c9b8 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0 #5 0x00007fff84e504d8 (linux-vdso64.so.1+0x4d8) #6 0x00007fff7c419a5c llvm::TargetTransformInfo::getPredictableBranchThreshold() const (/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libLLVMAnalysis.so.13git+0x479a5c) ...but not sure how to trigger that yet.	2021-03-22 17:48:06 -04:00
Sanjay Patel	27ae17a6b0	[SimplifyCFG] use profile metadata to refine merging branch conditions This is one step towards solving: https://llvm.org/PR49336 In that example, we disregard the recommended usage of builtin_expect, so an expensive (unpredictable) branch is folded into another branch that is guarding it. Here, we read the profile metadata to see if the 1st (predecessor) condition is likely to cause execution to bypass the 2nd (successor) condition before merging conditions by using logic ops. Differential Revision: https://reviews.llvm.org/D98898	2021-03-22 16:49:21 -04:00
Sanjay Patel	664d0c052c	[TargetTransformInfo] move branch probability query from TargetLoweringInfo This is no-functional-change intended (NFC), but needed to allow optimizer passes to use the API. See D98898 for a proposed usage by SimplifyCFG. I'm simplifying the code by removing the cl::opt. That was added back with the original commit in D19488, but I don't see any evidence in regression tests that it was used. Target-specific overrides can use the usual patterns to adjust as necessary. We could also restore that cl::opt, but it was not clear to me exactly how to do it in the convoluted TTI class structure.	2021-03-22 15:55:34 -04:00
Bjorn Pettersson	688cdddafb	[SLP] Honor min/max regsize and min/max VF in vectorizeStores Make sure we use PowerOf2Floor instead of PowerOf2Ceil when calculating max number of elements that fits inside a vector register (otherwise we could end up creating vectors larger than the maximum vector register size). Also make sure we honor the min/max VF (as given by TTI or cmd line parameters) when doing vectorizeStores. Reviewed By: anton-afanasyev Differential Revision: https://reviews.llvm.org/D97691	2021-03-22 17:29:35 +01:00
Matt Morehouse	772851ca4e	[HWASan] Disable stack, globals and force callbacks for x86_64. Subsequent patches will implement page-aliasing mode for x86_64, which will initially only work for the primary heap allocator. We force callback instrumentation to simplify the initial aliasing implementation. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D98069	2021-03-22 08:02:27 -07:00
Bradley Smith	48f5a392cb	[IR] Add vscale_range IR function attribute This attribute represents the minimum and maximum values vscale can take. For now this attribute is not hooked up to anything during codegen, this will be added in the future when such codegen is considered stable. Additionally hook up the -msve-vector-bits=<x> clang option to emit this attribute. Differential Revision: https://reviews.llvm.org/D98030	2021-03-22 12:05:06 +00:00
Max Kazantsev	8fab9f824f	[IndVars] Sharpen context in eliminateIVComparison When eliminating comparisons, we can use common dominator of all its users as context. This gives better results when ICMP is not computed right before the branch that uses it. Differential Revision: https://reviews.llvm.org/D98924 Reviewed By: lebedev.ri	2021-03-22 11:55:57 +07:00
Roman Lebedev	e3a4701627	[clang][CodeGen] Lower Likelihood attributes to @llvm.expect intrin instead of branch weights `08196e0b2e` exposed LowerExpectIntrinsic's internal implementation detail in the form of LikelyBranchWeight/UnlikelyBranchWeight options to the outside. While this isn't incorrect from the results viewpoint, this is suboptimal from the layering viewpoint, and causes confusion - should transforms also use those weights, or should they use something else, D98898? So go back to status quo by making LikelyBranchWeight/UnlikelyBranchWeight internal again, and fixing all the code that used it directly, which currently is only clang codegen, thankfully, to emit proper @llvm.expect intrinsics instead.	2021-03-21 22:50:21 +03:00
Roman Lebedev	37d6be9052	Revert "[BranchProbability] move options for 'likely' and 'unlikely'" Upon reviewing D98898 I've come to the realization that these are an implementation detail of LowerExpectIntrinsicPass, and they should not be exposed outside of it. This reverts commit `ee8b53815d`.	2021-03-21 22:50:21 +03:00
Sanjay Patel	ee8b53815d	[BranchProbability] move options for 'likely' and 'unlikely' This makes the settings available for use in other passes by housing them within the Support lib, but NFC otherwise. See D98898 for the proposed usage in SimplifyCFG (where this change was originally included). Differential Revision: https://reviews.llvm.org/D98945	2021-03-20 14:46:46 -04:00
Jeroen Dobbelaere	77080a1eb6	Revert of D49126 [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types. Now that intrinsic name mangling can cope with unnamed types, the custom name mangling in PredicateInfo (introduced by D49126) can be removed. (See D91250, D48541) Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D91661	2021-03-20 11:37:09 +01:00
Arthur Eubanks	a17394dc88	[NewPM] Verify LoopAnalysisResults after a loop pass All loop passes should preserve all analyses in LoopAnalysisResults. Add checks for those when the checks are enabled (which is by default with expensive checks on). Note that due to PR44815, we don't check LAR's ScalarEvolution. Apparently calling SE.verify() can change its results. This is a reland of https://reviews.llvm.org/D98820 which was reverted due to unacceptably large compile time regressions in normal debug builds.	2021-03-19 14:56:37 -07:00
Arthur Eubanks	a1ab5627f0	Revert "[NewPM] Verify LoopAnalysisResults after a loop pass" This reverts commit `94c269baf5`. Still causes too large of compile time regression in normal debug builds. Will put under expensive checks instead.	2021-03-19 14:31:08 -07:00
Arthur Eubanks	94c269baf5	[NewPM] Verify LoopAnalysisResults after a loop pass All loop passes should preserve all analyses in LoopAnalysisResults. Add checks for those. Note that due to PR44815, we don't check LAR's ScalarEvolution. Apparently calling SE.verify() can change its results. Only verify MSSA when VerifyMemorySSA, normally it's very expensive. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98820	2021-03-19 13:26:45 -07:00
Philip Reames	5698537f81	Update basic deref API to account for possibility of free [NFC] This patch is plumbing to support work towards the goal outlined in the recent llvm-dev post "[llvm-dev] RFC: Decomposing deref(N) into deref(N) + nofree". The point of this change is purely to simplify iteration on other pieces on the way to making the switch. Rebuilding with a change to Value.h is slow and painful, so I want to get the API change landed. Once that's done, I plan to more closely audit each caller, add the inference rules in their own patch, then post a patch with the langref changes and test diffs. The value of the command line flag is that we can exercise the inference logic in standalone patches without needing the whole switch ready to go just yet. Differential Revision: https://reviews.llvm.org/D98908	2021-03-19 11:17:19 -07:00
Andrei Elovikov	92205cb27f	[NFC][VPlan] Guard print routines with "#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)" Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D98897	2021-03-19 10:50:12 -07:00
Andrei Elovikov	93a9d2de8f	[VPlan] Add plain text (not DOT's digraph) dumps I foresee two uses for this: 1) It's easier to use those in debugger. 2) Once we start implementing more VPlan-to-VPlan transformations (especially inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in LIT test would become too obscure. I can imagine that we'd want to CHECK against VPlan dumps after multiple transformations instead. That would be easier with plain text dumps than with DOT format. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96628	2021-03-19 10:50:12 -07:00
Max Kazantsev	8eefa07fcf	[NFC] Move function up in code	2021-03-19 14:03:31 +07:00
Max Kazantsev	8bb952b57f	[NFC] Factor out utility function for finding common dom of user set	2021-03-19 13:49:29 +07:00
Max Kazantsev	16370e02a7	[IndVars] Provide eliminateIVComparison with context We can prove more predicates when we have a context when eliminating ICmp. As first (and very obvious) approximation we can use the ICmp instruction itself, though in the future we are going to use a common dominator of all its users. Need some refactoring before that. Observed ~0.5% negative compile time impact. Differential Revision: https://reviews.llvm.org/D98697 Reviewed By: lebedev.ri	2021-03-19 12:28:22 +07:00
Fangrui Song	9558456b53	[SanitizerCoverage] Make __start_/__stop_ symbols extern_weak On ELF, we place the metadata sections (`__sancov_guards`, `__sancov_cntrs`, `__sancov_bools`, `__sancov_pcs`) in section groups (either `comdat any` or `comdat noduplicates`). With `--gc-sections`, LLD since D96753 and GNU ld `-z start-stop-gc` may garbage collect such sections. If all `__sancov_bools` are discarded, LLD will error `error: undefined hidden symbol: __start___sancov_cntrs` (other sections are similar). ``` % cat a.c void discarded() {} % clang -fsanitize-coverage=func,trace-pc-guard -fpic -fvisibility=hidden a.c -shared -fuse-ld=lld -Wl,--gc-sections ... ld.lld: error: undefined hidden symbol: __start___sancov_guards >>> referenced by a.c >>> /tmp/a-456662.o:(sancov.module_ctor_trace_pc_guard) ``` Use the `extern_weak` linkage (lowered to undefined weak symbols) to avoid the undefined error. Differential Revision: https://reviews.llvm.org/D98903	2021-03-18 16:46:04 -07:00
George Balatsouras	d10f173f34	[dfsan] Add -dfsan-fast-8-labels flag This is only adding support to the dfsan instrumentation pass but not to the runtime. Added more RUN lines for testing: for each instrumentation test that had a -dfsan-fast-16-labels invocation, a new invocation was added using fast8. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D98734	2021-03-18 16:28:42 -07:00
Mehdi Amini	3614df3537	Revert "[VPlan] Add plain text (not DOT's digraph) dumps" This reverts commit `6b053c9867`. The build is broken: ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const >>> referenced by LoopVectorize.cpp >>> LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a	2021-03-18 19:20:39 +00:00
Andrei Elovikov	6b053c9867	[VPlan] Add plain text (not DOT's digraph) dumps I foresee two uses for this: 1) It's easier to use those in debugger. 2) Once we start implementing more VPlan-to-VPlan transformations (especially inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in LIT test would become too obscure. I can imagine that we'd want to CHECK against VPlan dumps after multiple transformations instead. That would be easier with plain text dumps than with DOT format. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96628	2021-03-18 11:33:39 -07:00
Wei Mi	14756b70ee	[SampleFDO] Don't mix up the existing indirect call value profile with the new value profile annotated after inlining. In https://reviews.llvm.org/D96806 and https://reviews.llvm.org/D97350, we use the magic number -1 in the value profile to avoid repeated indirect call promotion to the same target for an indirect call. Function updateIDTMetaData is used to mark a target as being promoted in the value profile with the magic number. updateIDTMetaData is also used to update the value profile when an indirect call is inlined and the new inline instance profile should be applied. For the second case, currently updateIDTMetaData mixes up the existing value profile of the indirect call with the new profile, leading to the problematic scenario where a target count is larger than the total count in the value profile. The patch fixes the problem. When updateIDTMetaData is used to update the value profile after inlining, all the values in the existing value profile will be dropped except the values with the magic number counts. Differential Revision: https://reviews.llvm.org/D98835	2021-03-18 09:54:34 -07:00
Mircea Trofin	4b1c8070bb	[NFC][ArgumentPromotion] Clear FAM cached results of erased function. Not doing it here can lead to subtle bugs - the analysis results are associated by the Function object's address. Nothing stops the memory allocator from allocating new functions at the same address.	2021-03-18 09:17:32 -07:00
Alexey Bataev	b3ced9852c	[SLP] Fix crash on extending scheduling region. If the SLP vectorizer tries to extend the scheduling region and runs out of the budget too early, but still extends the region to the new ending instructions (i.e., it was able to extend the region for the first instruction in the bundle, but not for the second), the compiler needs to recalculate dependencies in full, just as if the extension had been successful. Without it, the schedule data chunks may end up with the wrong number of (unscheduled) dependencies, and the result may be an incorrect function in which the vectorized instruction does not dominate the extractelement instruction. Differential Revision: https://reviews.llvm.org/D98531	2021-03-18 06:11:08 -07:00
Max Kazantsev	26ec76add5	[NFC] One more use case for evaluatePredicate	2021-03-18 19:21:29 +07:00
Max Kazantsev	1067a13cc1	[NFC] Use evaluatePredicate in eliminateComparison Just makes code simpler.	2021-03-18 19:21:29 +07:00
Arthur Eubanks	792bed6a4c	Revert "[NewPM] Verify LoopAnalysisResults after a loop pass" This reverts commit `6db3ab2903`. Causing too large of compile time regression.	2021-03-17 15:22:52 -07:00
Arthur Eubanks	6db3ab2903	[NewPM] Verify LoopAnalysisResults after a loop pass All loop passes should preserve all analyses in LoopAnalysisResults. Add checks for those. Note that due to PR44815, we don't check LAR's ScalarEvolution. Apparently calling SE.verify() can change its results. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98805	2021-03-17 13:37:22 -07:00
Philip Reames	31764ea295	[LCSSA] Extract a utility for deciding if a new use requires a new lcssa phi [NFC] (Triggered by a review comment on D98728, but otherwise unrelated.)	2021-03-17 12:14:01 -07:00
Philip Reames	7c7f4676cd	[LICM] Fix a crash when sinking instructions w/token operands It is not legal to form a phi node with token type. The generic LCSSA construction code handles this correctly - by not forming LCSSA for such cases - but the adhoc fixup implementation in LICM did not. This was noticed in the context of PR49607, but can be demonstrated on ToT with the tweaked test case. This is not specific to gc.relocate btw, it also applies to usage of the preallocated family of intrinsics as well. Differential Revision: https://reviews.llvm.org/D98728	2021-03-17 11:18:46 -07:00
David Green	e2935dcfc4	[TTI] Add a Mask to getShuffleCost This adds a Mask ArrayRef to getShuffleCost, so that if an exact mask can be provided a more accurate cost can be provided by the backend. For example VREV costs could be returned by the ARM backend. This should be an NFC until then, laying the groundwork for that to be added. Differential Revision: https://reviews.llvm.org/D98206	2021-03-17 17:46:26 +00:00
Stephen Tozer	3bfddc2593	Reapply "[DebugInfo] Handle multiple variable location operands in IR" Fixed section of code that iterated through a SmallDenseMap and added instructions in each iteration, causing non-deterministic code; replaced SmallDenseMap with MapVector to prevent non-determinism. This reverts commit `01ac6d1587`.	2021-03-17 16:45:25 +00:00
LemonBoy	4f024938e4	[LoopVectorize] Refine hasIrregularType predicate The `hasIrregularType` predicate checks whether an array of N values of type Ty is "bitcast-compatible" with a <N x Ty> vector. The previous check returned invalid results in some cases where there's some padding between the array elements: eg. a 4-element array of u7 values is considered as compatible with <4 x u7>, even though the vector is only loading/storing 28 bits instead of 32. The problem causes LLVM to generate incorrect code for some targets: for AArch64 the vector loads/stores are lowered in terms of ubfx/bfi, effectively losing the top (N * padding bits). Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D97465	2021-03-17 17:03:47 +01:00
Hans Wennborg	01ac6d1587	Revert "[DebugInfo] Handle multiple variable location operands in IR" This caused non-deterministic compiler output; see comment on the code review. > This patch updates the various IR passes to correctly handle dbg.values with a > DIArgList location. This patch does not actually allow DIArgLists to be produced > by salvageDebugInfo, and it does not affect any pass after codegen-prepare. > Other than that, it should cover every IR pass. > > Most of the changes simply extend code that operated on a single debug value to > operate on the list of debug values in the style of any_of, all_of, for_each, > etc. Instances of setOperand(0, ...) have been replaced with > replaceVariableLocationOp, which takes the value that is being replaced as an > additional argument. In places where this value isn't readily available, we have > to track the old value through to the point where it gets replaced. > > Differential Revision: https://reviews.llvm.org/D88232 This reverts commit `df69c69427`.	2021-03-17 13:36:48 +01:00
David Green	3c25c40d51	[LV] Account for the cost of predication of scalarized load/store This adds the cost of an i1 extract and a branch to the cost in getMemInstScalarizationCost when the instruction is predicated. These predicated loads/store would generate blocks of something like: %c1 = extractelement <4 x i1> %C, i32 1 br i1 %c1, label %if, label %else if: %sa = extractelement <4 x i32> %a, i32 1 %sb = getelementptr inbounds float, float* %pg, i32 %sa %sv = extractelement <4 x float> %x, i32 1 store float %sa, float* %sb, align 4 else: So this increases the cost by the extract and branch. This is probably still too low in many cases due to the cost of all that branching, but there is already an existing hack increasing the cost using useEmulatedMaskMemRefHack. It will increase the cost of a memop if it is a load or there are more than one store. This patch improves the cost for when there is only a single store, and hopefully at some point in the future the hack can be removed. Differential Revision: https://reviews.llvm.org/D98243	2021-03-17 10:57:50 +00:00
Bu Le	9abe500473	[SLP] Fix the trunc instruction insertion problem The current SLP pass has a piece of code that inserts a trunc instruction after the vectorized instruction. If the vectorized instruction is a phi node and not the last phi node in the BB, the trunc instruction will be inserted between two phi nodes, which will trigger a verifier failure in debug builds or an unpredictable error in another pass. This patch changes the algorithm to 'if the last vectorized instruction is a phi, insert it after the last phi node in current BB' to fix this problem.	2021-03-17 13:51:08 +03:00
Arthur Eubanks	70af2924a7	[Unswitch] Guard dbgs logging with LLVM_DEBUG	2021-03-16 22:31:57 -07:00
Sanjay Patel	7202f47508	[SLP] separate min/max matching from its instruction-level implementation; NFC The motivation is to handle integer min/max reductions independently of whether they are in the current cmp+sel form or the planned intrinsic form. We assumed that min/max included a select instruction, but we can decouple that implementation detail by checking the instructions themselves rather than relying on the recurrence (reduction) type.	2021-03-16 17:16:11 -04:00
Mohammad Hadi Jooybar	302b80abf0	[InstCombine] Avoid Bitcast-GEP fusion for pointers directly from allocation functions Elimination of bitcasts with void pointer arguments results in GEPs with pure byte indexes. These GEPs do not preserve struct/array information and interrupt phi address translation in later pipeline stages. Here is the original motivation for this patch: ``` #include<stdio.h> #include<malloc.h> typedef struct __Node{ double f; struct __Node *next; } Node; void foo () { Node *a = (Node*) malloc (sizeof(Node)); a->next = NULL; a->f = 11.5f; Node *ptr = a; double sum = 0.0f; while (ptr) { sum += ptr->f; ptr = ptr->next; } printf("%f\n", sum); } ``` By explicit assignment `a->next = NULL`, we can infer the length of the link list is `1`. In this case we can eliminate while loop traversal entirely. This elimination is supposed to be performed by GVN/MemoryDependencyAnalysis/PhiTranslation . The final IR before this patch: ``` define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2 %next = getelementptr inbounds i8, i8* %call, i64 8 %0 = bitcast i8* %next to %struct.__Node** store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2 %f = bitcast i8* %call to double* store double 1.150000e+01, double* %f, align 8, !tbaa !8 %tobool12 = icmp eq i8* %call, null br i1 %tobool12, label %while.end, label %while.body.lr.ph while.body.lr.ph: ; preds = %entry %1 = bitcast i8* %call to %struct.__Node* br label %while.body while.body: ; preds = %while.body.lr.ph, %while.body %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ] %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ] %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0 %2 = load double, double* %f1, align 8, !tbaa !8 %add = fadd contract double %sum.014, %2 %next2 = getelementptr inbounds %struct.__Node, 
%struct.__Node* %ptr.013, i64 0, i32 1 %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2 %tobool = icmp eq %struct.__Node* %3, null br i1 %tobool, label %while.end, label %while.body while.end: ; preds = %while.body, %entry %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %while.body ] %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa) ret void } ``` Final IR after this patch: ``` ; Function Attrs: nofree nounwind define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { while.end: %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 1.150000e+01) ret void } ``` IR before GVN before this patch: ``` define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2 %next = getelementptr inbounds i8, i8* %call, i64 8 %0 = bitcast i8* %next to %struct.__Node** store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2 %f = bitcast i8* %call to double* store double 1.150000e+01, double* %f, align 8, !tbaa !8 %tobool12 = icmp eq i8* %call, null br i1 %tobool12, label %while.end, label %while.body.lr.ph while.body.lr.ph: ; preds = %entry %1 = bitcast i8* %call to %struct.__Node* br label %while.body while.body: ; preds = %while.body.lr.ph, %while.body %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ] %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ] %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0 %2 = load double, double* %f1, align 8, !tbaa !8 %add = fadd contract double %sum.014, %2 %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1 %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, 
!tbaa !2 %tobool = icmp eq %struct.__Node* %3, null br i1 %tobool, label %while.end.loopexit, label %while.body while.end.loopexit: ; preds = %while.body %add.lcssa = phi double [ %add, %while.body ] br label %while.end while.end: ; preds = %while.end.loopexit, %entry %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ] %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa) ret void } ``` IR before GVN after this patch: ``` define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2 %0 = bitcast i8* %call to %struct.__Node* %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1 store %struct.__Node* null, %struct.__Node** %next, align 8, !tbaa !2 %f = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 0 store double 1.150000e+01, double* %f, align 8, !tbaa !8 %tobool12 = icmp eq i8* %call, null br i1 %tobool12, label %while.end, label %while.body.preheader while.body.preheader: ; preds = %entry br label %while.body while.body: ; preds = %while.body.preheader, %while.body %sum.014 = phi double [ %add, %while.body ], [ 0.000000e+00, %while.body.preheader ] %ptr.013 = phi %struct.__Node* [ %2, %while.body ], [ %0, %while.body.preheader ] %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0 %1 = load double, double* %f1, align 8, !tbaa !8 %add = fadd contract double %sum.014, %1 %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1 %2 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2 %tobool = icmp eq %struct.__Node* %2, null br i1 %tobool, label %while.end.loopexit, label %while.body while.end.loopexit: ; preds = %while.body %add.lcssa = phi double [ %add, %while.body ] br label %while.end while.end: ; preds = 
%while.end.loopexit, %entry %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ] %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa) ret void } ``` The phi translation fails before this patch, and that prevents GVN from removing the loop. The reason for this failure is in InstCombine. When the instruction combining pass decides to convert: ``` %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) %0 = bitcast i8* %call to %struct.__Node* %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1 store %struct.__Node* null, %struct.__Node** %next ``` to ``` %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) %next = getelementptr inbounds i8, i8* %call, i64 8 %0 = bitcast i8* %next to %struct.__Node** store %struct.__Node* null, %struct.__Node** %0 ``` GEP instructions with pure byte indexes (e.g. `getelementptr inbounds i8, i8* %call, i64 8`) are obstacles for address translation. Address translation looks for structural similarity between GEPs, and these GEPs usually do not match since they have a different structure. This change will cause a couple of failures in LLVM tests. However, in all cases we need to change the result expected by the test. I will update those tests as soon as I get the green light on this patch. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96881	2021-03-16 17:05:44 -04:00
Philip Reames	cec9e7352b	[rs4gc] Simplify code by cloning existing instructions when inserting base chain [NFC] Previously we created a new node, then filled in the pieces. Now, we clone the existing node, then change the respective fields. The only change in handling is with phis since we have to handle multiple incoming edges from the same block a bit differently. Differential Revision: https://reviews.llvm.org/D98316	2021-03-16 13:10:32 -07:00
Philip Reames	ef884e155d	[rs4gc] don't force a conflict for a canonical broadcast A broadcast is a shufflevector where only one input is used. Because of the way we handle constants (undef is a constant), the canonical shuffle sees a meet of (some value) and (nullptr). Given this, every broadcast gets treated as a conflict and a new base pointer computation is added. The other way to tackle this would be to change constant handling specifically for undefs, but this seems easier. Differential Revision: https://reviews.llvm.org/D98315	2021-03-16 12:59:06 -07:00
Philip Reames	5cabf472cb	[rs4gc] don't duplicate existing values which are provably base pointers RS4GC needs to rewrite the IR to ensure that every relocated pointer has an associated base pointer. The existing code isn't particularly smart about avoiding duplication of existing IR when it turns out the original pointer we were asked to materialize a base pointer for is itself a base pointer. This patch adds a stage to the algorithm which prunes nodes proven (with a simple forward dataflow fixed point) to be base pointers from the list of nodes considered for duplication. This does require changing some of the later invariants slightly; that is probably the riskiest part of the change. Differential Revision: D98122	2021-03-16 12:51:21 -07:00
Liam Keegan	edf9565a86	[MemCpyOpt] Add missing MemorySSAWrapperPass dependency macro Add MemorySSAWrapperPass as a dependency to MemCpyOptLegacyPass, since MemCpyOpt now uses MemorySSA by default. Differential Revision: https://reviews.llvm.org/D98484	2021-03-16 20:30:00 +01:00
Philip Reames	6972e39d47	[gvn] CSE gc.relocates based on meaning, not spelling (try 2) This was (partially) reverted in `cfe8f8e0` because the conversion from readonly to readnone in Intrinsics.td exposed a couple of problems. This change has been reworked to not need that change (via some explicit checks in client code). This is being done to address the original optimization issue and simplify the testing of the readonly changes. I'm working on that piece under 49607. Original commit message follows: The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoint's multiple return values.) We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particularly useful when considering a chain of multiple statepoints, as it lets us eliminate all duplicate gc.relocates in a single pass. Differential Revision: https://reviews.llvm.org/D97974	2021-03-16 10:59:31 -07:00
Florian Hahn	f586de8459	[VPlan] Remove PredInst2Recipe, use VP operands instead. (NFC) Instead of maintaining a separate map from predicated instructions to recipes, we can instead directly look at the VP operands. If the operand comes from a predicated instruction, the operand will be a VPPredInstPHIRecipe with a VPReplicateRecipe as its operand.	2021-03-16 17:40:35 +00:00
Sanjay Patel	40fdb43d30	[SLP] improve readability in reduction logic; NFC We had 2 different and ambiguously-named 'I' variables.	2021-03-16 07:35:13 -04:00
Caroline Concatto	3c03635d53	[SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse This patch adds support for reverse loop vectorization. It is possible to vectorize the following loop: ``` for (int i = n-1; i >= 0; --i) a[i] = b[i] + 1.0; ``` with fixed or scalable vectors. The loop-vectorizer will use 'reverse' on the loads/stores to make sure the lanes themselves are also handled in the right order. This patch adds support for scalable vectors to the IRBuilder interface to create a reverse vector. The IR function CreateVectorReverse lowers to experimental.vector.reverse for scalable vectors and keeps the original behavior for fixed vectors, using a reverse shuffle. Differential Revision: https://reviews.llvm.org/D95363	2021-03-16 07:51:59 +00:00
Wenlei He	a5d30421a6	[CSSPGO] Load context profile for external functions in PreLink and populate ThinLTO import list For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to the function's entry count metadata. This enables ThinLink to treat such cross module callees as hot in the summary index, and later helps postlink to import them for profile guided cross module inlining. For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since the profile is flattened, a few things need to happen for it to work: - When loading the input profile in extended binary format, we need to load all child context profiles whose parent is in the current module, so the context trie for the current module includes potential cross module inlinees. - In order to make the above happen, we need to know whether the input profile is a CSSPGO profile before we start reading function profiles, hence a flag for the profile summary section is added. - When searching for cross module inline candidates, we need to walk through the context trie instead of the nested inlinee profile (callsite sample of AutoFDO profile). - Now that we have more accurate counts with CSSPGO, we switched to using entry count instead of total count to decide if an external callee is potentially beneficial to inline. This makes it consistent with how we determine whether a call target is a potential inline candidate. Differential Revision: https://reviews.llvm.org/D98590	2021-03-15 12:22:15 -07:00
Juneyoung Lee	edf634ebc2	[AssumeBundles] Add nonnull/align to op bundle if noundef exists This is a patch to add nonnull and align to assume's operand bundle only if noundef exists. Since nonnull and align in fn attr have poison semantics, they should be paired with noundef or noundef-implying attributes to be immediate UB. Reviewed By: jdoerfert, Tyker Differential Revision: https://reviews.llvm.org/D98228	2021-03-16 10:23:42 +09:00
Hongtao Yu	beea06c106	[NFC][Inliner] Debugging support to print function size after each inlining. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D98439	2021-03-14 22:11:53 -07:00
Chenguang Wang	166620a4f0	[ArgPromotion] Copy additional metadata for loads. Current ArgPromotion implementation does not copy it: https://godbolt.org/z/zzTKof Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D93927	2021-03-14 21:28:14 +00:00
Simonas Kazlauskas	7d7001b2cb	[InstCombine] Restrict a GEP transform to avoid changing provenance This is an alternative to D98120. Herein, instead of deleting the transformation entirely, we check that the underlying objects are both the same and therefore this transformation wouldn't incur a provenance change, if applied. https://alive2.llvm.org/ce/z/SYF_yv Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98588	2021-03-14 16:32:04 +02:00
Luo, Yuanke	66fbf5fafb	[X86][AMX] Prevent transforming load pointer from <256 x i32>* to x86_amx*. The load/store instruction will be transformed to amx intrinsics in the AMX type lowering pass. Prohibiting the pointer cast keeps that pass happy. Differential Revision: https://reviews.llvm.org/D98247	2021-03-14 09:24:56 +08:00
Nikita Popov	5556660971	[MemCpyOpt] Handle read from lifetime.start with offset This fixes a regression from the MemDep-based implementation: MemDep completely ignores lifetime.start intrinsics that aren't MustAlias -- this is probably unsound, but it does mean that the MemDep based implementation successfully eliminated memcpy's from lifetime.start if the memcpy happens at an offset, rather than the base address of the alloca. Add a special case for the case where the lifetime.start spans the whole alloca (which is pretty much the only kind of lifetime.start that frontends ever emit), as we don't need to figure out our exact aliasing relationship in that case, the whole alloca is dead prior to the call. If this doesn't cover all practically relevant cases, then it would be possible to make use of the recently added PartialAlias clobber offsets to make this more precise.	2021-03-13 20:38:09 +01:00
Sanjay Patel	4224a36957	[InstCombine] avoid creating an extra instruction in zext fold and possible inf-loop The structure of this fold is suspect vs. most of instcombine because it creates instructions and tries to delete them immediately after. If we don't have the operand types for the icmps, then we are not behaving as assumed. And as shown in PR49475, we can inf-loop.	2021-03-13 08:30:51 -05:00
Roman Lebedev	6e9b9978cf	[LSR] Don't try to fixup uses in 'EH pad' instructions The added test case crashes before this fix: ``` opt: /repositories/llvm-project/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp:5172: BasicBlock::iterator (anonymous namespace)::LSRInstance::AdjustInsertPositionForExpand(BasicBlock::iterator, const (anonymous namespace)::LSRFixup &, const (anonymous namespace)::LSRUse &, llvm::SCEVExpander &) const: Assertion `!isa<PHINode>(LowestIP) && !LowestIP->isEHPad() && !isa<DbgInfoIntrinsic>(LowestIP) && "Insertion point must be a normal instruction"' failed. ``` This is fully analogous to the previous commit, with the pointer constant replaced to be something non-null. The comparison here can be strength-reduced, but the second operand of the comparison happens to be identical to the constant pointer in the `catch` case of `landingpad`. While LSRInstance::CollectLoopInvariantFixupsAndFormulae() already gave up on uses in blocks ending up with EH pads, it didn't consider this case. Eventually, `LSRInstance::AdjustInsertPositionForExpand()` will be called, but the original insertion point it will get is the user instruction itself, and it doesn't want to deal with EH pads, and asserts as much. It would seem that this basically never happens in-the-wild, otherwise it would have been reported already, so it seems safe to take the cautious approach, and just not deal with such users.	2021-03-13 16:05:34 +03:00
Nikita Popov	2902bdeea1	[MemCpyOpt] Use AA to check for MustAlias between memset and memcpy Rather than checking for simple equality, check for MustAlias, as we do in other transforms. This catches equivalent GEPs.	2021-03-13 11:41:15 +01:00
Nikita Popov	9080444f33	[MemCpyOpt] Don't generate zero-size memset If a memset destination is overwritten by a memcpy and the sizes are exactly the same, then the memset is simply dead. We can directly drop it, instead of replacing it with a memset of zero size, which is particularly ugly for the case of a dynamic size.	2021-03-13 11:41:15 +01:00
Wei Mi	ef9d7db723	[IndirectCallPromotion] Recommit "Don't strip ".__uniq." suffix when it strips ".llvm." suffix". The recommit fixes a bug in the last commit where symbols with "." at the beginning were not properly handled. Original commit message: Currently IndirectCallPromotion simply strips everything after the first "." in LTO mode, in order to match the symbol name and the name with ".llvm." suffix in the value profile. However, if -funique-internal-linkage-names and thinlto are both enabled, the name may have both a ".__uniq." suffix and a ".llvm." suffix, and the current mechanism will strip them both, which is unexpected. The patch fixes the problem. Differential Revision: https://reviews.llvm.org/D98389	2021-03-12 13:48:14 -08:00
Nikita Popov	42eb658f65	[OpaquePtrs] Remove some uses of type-less CreateGEP() (NFC) This removes some (but not all) uses of type-less CreateGEP() and CreateInBoundsGEP() APIs, which are incompatible with opaque pointers. There are still a number of tricky uses left, as well as many more variation APIs for CreateGEP.	2021-03-12 21:01:16 +01:00
Florian Hahn	fb3ca70761	[LV] Account IV recipes being uniform in VPTransformState::get(). This patch fixes a crash when trying to get a scalar value using VPTransformState::get() for uniform induction values or truncated induction values. IVs and truncated IVs can be uniform and the updated code accounts for that, fixing the crash. This should fix https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=31981	2021-03-12 13:29:06 +00:00
Sanjay Patel	bd197ed0a5	[SimplifyCFG] avoid sinking insts within an infinite-loop The test is reduced from a C source example in: https://llvm.org/PR49541 It's possible that the test could be reduced further or the predicate generalized further, but it seems to require a few ingredients (including the "late" SimplifyCFG options on the RUN line) to fall into the infinite-loop trap.	2021-03-12 08:04:57 -05:00
Hans Wennborg	f50aef745c	Revert "[InstrProfiling] Don't generate __llvm_profile_runtime_user" This broke the check-profile tests on Mac, see comment on the code review. > This is no longer needed, we can add __llvm_profile_runtime directly > to llvm.compiler.used or llvm.used to achieve the same effect. > > Differential Revision: https://reviews.llvm.org/D98325 This reverts commit `c7712087cb`. Also reverting the dependent follow-up commit: Revert "[InstrProfiling] Generate runtime hook for ELF platforms" > When using -fprofile-list to selectively apply instrumentation only > to certain files or functions, we may end up with a binary that doesn't > have any counters in the case where no files were selected. However, > because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the > runtime would still be pulled in and incur some non-trivial overhead, > especially in the case when the continuous or runtime counter relocation > mode is being used. A better way would be to pull in the profile runtime > only when needed by declaring the __llvm_profile_runtime symbol in the > translation unit only when needed. > > This approach was already used prior to `9a041a7522`, but we changed it > to always generate the __llvm_profile_runtime due to a TAPI limitation. > Since TAPI is only used on Mach-O platforms, we could use the early > emission of __llvm_profile_runtime there, and on other platforms we > could change back to the earlier approach where the symbol is generated > later only when needed. We can stop passing -u__llvm_profile_runtime to > the linker on Linux and Fuchsia since the generated undefined symbol in > each translation unit that needed it serves the same purpose. > > Differential Revision: https://reviews.llvm.org/D98061 This reverts commit `87fd09b25f`.	2021-03-12 13:53:46 +01:00
Serguei Katkov	cfe8f8e0f0	Revert "Mark gc.relocate and gc.result as readnone" As readnone functions they become movable and LICM can hoist them out of a loop. As a result, in LCSSA form a phi node of type token is created. No code is prepared for the first operand of a GCRelocate to be a phi node; it is expected to be a token. The GVN test was also updated; it seems it does not do what is expected. A test for LICM is also added. This reverts commit `f352463ade`.	2021-03-12 16:59:17 +07:00
Johannes Doerfert	ff256c1376	[Attributor] Derive `willreturn` based on `mustprogress` Since D86233 we have `mustprogress` which, in combination with `readonly`, implies `willreturn`. The idea is that every side-effect has to be modeled as a "write". Consequently, `readonly` means there is no side-effect, and `mustprogress` guarantees that we cannot "loop" forever without side-effect. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94125	2021-03-11 23:31:44 -06:00


27111 Commits