Commit Graph

17930 Commits

Author SHA1 Message Date
Dávid Bolvanský c68b560be3 [DSE] Handle memmove with equal non-const sizes
Follow up for fhahn's D98284. Also fixes a case from PR47644.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D98346
2021-03-10 17:52:00 +01:00
Florian Hahn 077dc5c87b
[DSE] Add tests that require phi translation to be removed. 2021-03-10 16:32:55 +00:00
Alex Richardson b26d6758f0 [SLC] Simplify strcpy and friends with non-zero address spaces
The current logic in TargetLibraryInfoImpl::getLibFunc() was only treating
strcpy, etc. with i8* arguments in address space zero as a valid library
function. However, in the CHERI and Morello targets we expect all libc
functions to use address space 200 arguments.

This commit updates isValidProtoForLibFunc() to check that the argument
is a pointer type. This also drops the check for i8* since we should not
be checking the pointee type any more.

Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D95142
2021-03-10 11:17:34 +00:00
Alex Richardson 81e2550f94 [SLC] Baseline test for missed strcpy optimizations in non-zero AS
This will be fixed in D95142

Differential Revision: https://reviews.llvm.org/D95138
2021-03-10 11:17:34 +00:00
Florian Hahn 8d9b9c0edc
[DSE] Handle memcpy/memset with equal non-const sizes.
Currently DSE misses cases where the size is a non-const IR value, even
if they match. For example, this means that llvm.memcpy/llvm.memset
calls are not eliminated, even if they write the same number of bytes.

This patch extends isOverwite to try to get IR values for the number of
bytes written from the analyzed instructions. If the values match,
alias checks are performed and the result is returned.

At the moment this only covers llvm.memcpy/llvm.memset. In the future,
we may enable MemoryLocation to also track variable sizes, but this
simple approach should allow us to cover the important cases in DSE.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98284
2021-03-10 10:13:58 +00:00
Florian Hahn 5293287630 [DSE] Add tests with memset & memcpy combinations and non-const sizes. 2021-03-10 09:46:54 +00:00
Juneyoung Lee 3170978173 [InstSimplify] Add tests for pr49495 (NFC) 2021-03-10 17:54:46 +09:00
Wei Mi ee35784a90 [SampleFDO] Support enabling -funique-internal-linkage-name.
now -funique-internal-linkage-name flag is available, and we want to flip
it on by default since it is beneficial to have separate sample profiles
for different internal symbols with the same name. As a preparation, we
want to avoid regression caused by the flip.

When we flip -funique-internal-linkage-name on, the profile is collected
from binary built without -funique-internal-linkage-name so it has no uniq
suffix, but the IR in the optimized build contains the suffix. This kind of
mismatch may introduce transient regression.

To avoid such mismatch, we introduce a NameTable section flag indicating
whether there is any name in the profile containing uniq suffix. Compiler
will decide whether to keep uniq suffix during name canonicalization
depending on the NameTable section flag. The flag is only available for
extbinary format. For other formats, by default compiler will keep uniq
suffix so they will only experience transient regression when
-funique-internal-linkage-name is just flipped.

Another type of regression is caused by places where we miss to call
getCanonicalFnName. Those places are fixed.

Differential Revision: https://reviews.llvm.org/D96932
2021-03-09 21:41:40 -08:00
Philip Reames b7fc372987 [tests] add a few more tests for D98122 2021-03-09 19:18:22 -08:00
Arnold Schwaighofer 590ac0a26a [coro async] Transfer the original function's attributes to the clone
rdar://75052917

Differential Revision: https://reviews.llvm.org/D98051
2021-03-09 17:01:41 -08:00
William S. Moses 875891a10d [MemoryDependence] Fix invariant group store
Fix bug in MemoryDependence [and thus GVN] for invariant group.

Previously MemDep didn't verify that the store was storing into a
pointer rather than a store simply using a pointer.

Differential Revision: https://reviews.llvm.org/D98267
2021-03-09 19:03:39 -05:00
David Green fa450e98c5 [ARM] Test for predicated scalar memops. NFC
This test shows a case where we can potentially scalarize the store in a
predicated loop, creating a lot of instructions that would be much
slower than scalar.
2021-03-09 21:57:18 +00:00
Philip Reames 82400ae016 [tests] add tests to show effects of D98122 2021-03-09 13:54:15 -08:00
Juneyoung Lee f49354838e Revert "[InstCombine] Add simplification of two logical and/ors"
This reverts commit 07c3b97e18 due to a reported failure in two-stage build.
2021-03-10 05:48:31 +09:00
Florian Hahn 5a3bb7dde3
[DSE] Add test cases with memory intrinsics and varying size values.
This patch adds a few tests for memset/memcyp with non-constant size
values. Some of the tests will be optimized in further patches.
2021-03-09 20:31:21 +00:00
Sanjay Patel 2986a9c7e2 [InstCombine] canonicalize 'not' op after min/max intrinsic
This is another step towards parity between existing select
transforms and min/max intrinsics (D98152)..

The existing 'not' folds around select are complicated, so
it's likely that we will need to enhance this, but this
should be a safe step.
2021-03-09 11:33:28 -05:00
Sanjay Patel ef19f6cbf3 [InstCombine] add tests for min/max intrinsics with not+constant; NFC 2021-03-09 11:33:28 -05:00
Sanjay Patel 41b9209a12 [InstCombine] fold min/max intrinsics with not ops
This is a partial translation of the existing select-based
folds. We need to recreate several different transforms to
avoid regressions as noted in D98152.

https://alive2.llvm.org/ce/z/teuZ_J
2021-03-09 08:55:48 -05:00
Luo, Yuanke 0875c2f7f6 [X86][AMX] Add test case for combining AMX bitcast. 2021-03-09 19:48:01 +08:00
Florian Hahn 92da5b7119
[InstCombine] Simplify phis with incoming pointer-casts.
If the incoming values of a phi are pointer casts of the same original
value, replace the phi with a single cast. Such redundant phis are
somewhat common after loop-rotate and removing them can avoid some
unnecessary code bloat, e.g. because an iteration of a loop is peeled
off to make the phi invariant. It should also simplify further analysis
on its own.

InstCombine already uses stripPointerCasts in a couple of places and
also simplifies phis based on the incoming values, so the patch should
fit in the existing scope.

The patch causes binary changes in 47 out of 237 benchmarks in
MultiSource/SPEC2000/SPEC2006 with -O3 -flto on X86.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98058
2021-03-09 11:40:18 +00:00
Sanjay Patel c05d574a98 [InstCombine] add tests for min/max intrinsics with not ops; NFC 2021-03-08 17:38:23 -05:00
Masoud Ataei 820f508b08 [PowerPC] Removing _massv place holder
Since P8 is the oldest machine supported by MASSV pass,
_massv place holder is removed and the oldest version of
MASSV functions is assumed. If the P9 vector specific is
detected in the compilation process, the P8 prefix will
be updated to P9.

Differential Revision: https://reviews.llvm.org/D98064
2021-03-08 21:43:24 +00:00
Sanjay Patel 0a2d69480d [InstSimplify] cttz(1<<x) --> x
https://alive2.llvm.org/ce/z/TDacYu
https://alive2.llvm.org/ce/z/KF84S3
2021-03-08 16:30:14 -05:00
Sanjay Patel afa443831b [InstSimplify] add tests for cttz of shifted-1; NFC 2021-03-08 16:30:13 -05:00
Philip Reames f1f9cc6c40 Fix ppc build bot after 239a6181
(Yes, I checked, return undef is the right result for the function.)
2021-03-08 10:00:56 -08:00
Philip Reames ebc61f9d3c [instcombine] Collapse trivial or recurrences
If we have a recurrence of the form <Start, Or, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence.

Differential Revision: https://reviews.llvm.org/D97578 (or part)
2021-03-08 09:21:38 -08:00
Philip Reames 239a618180 [instcombine] Collapse trivial and recurrences
If we have a recurrence of the form <Start, And, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence.

Differential Revision: https://reviews.llvm.org/D97578 (and part)
2021-03-08 09:21:38 -08:00
Philip Reames 97a7bc5831 [gvn] Precisely propagate equalities to phi operands
The code used for propagating equalities (e.g. assume facts) was conservative in two ways - one of which this patch fixes. Specifically, it shifts the code reasoning about whether a use is dominated by the end of the assume block to consider phi uses to exist on the predecessor edge. This matches the dominator tree handling for dominates(Edge, Use), and simply extends it to dominates(BB, Use).

Note that the decision to use the end of the block is itself a conservative choice. The more precise option would be to use the later of the assume and the value, and replace all uses after that. GVN handles that case separately (with the replace operand mechanism) because it used to be expensive to ask dominator questions within blocks. With the new instruction ordering support, we should probably rewrite this code at some point to simplify.

Differential Revision: https://reviews.llvm.org/D98082
2021-03-08 08:59:00 -08:00
Sanne Wouda 05a6e2eb9a [InstCombine] Add a combine for a shuffle of similar bitcasts
Some intrinsics wrapper code has the habit of ignoring the type of the
elements in vectors, thinking of vector registers as a "bag of bits". As
a consequence, some operations are shared between vectors of different
types are shared. For example, functions that rearrange elements in a
vector can be shared between vectors of int32 and float.

This can result in bitcasts in awkward places that prevent the backend
from recognizing some instructions. For AArch64 in particular, it
inhibits the selection of dup from a general purpose register (GPR), and
mov from GPR to a vector lane.

This patch adds a pattern in InstCombine to move the bitcasts past the
shufflevector if this is possible. Sometimes this even allows
InstCombine to remove the bitcast entirely, as in the included tests.

Alternatively this could be done with a few extra patterns in the
AArch64 backend, but InstCombine seems like a better place for this.

Differential Revision: https://reviews.llvm.org/D97397
2021-03-08 16:32:30 +00:00
Florian Hahn 2bf1955f8b
[InstCombine] Pre-commit tests for redundant phis with pointer casts.
Pre-commit tests for D98058.
2021-03-08 16:25:50 +00:00
Nikita Popov 7faad5c900 [ConstantFold] Handle icmp of global and null consistently
Return UGT rather than NE for icmp @g, null, which is slightly
stronger. This is consistent with what we do for more complex
folds. It is somewhat silly that @g ugt null does not get folded
while (gep @g) ugt null does.
2021-03-08 17:18:01 +01:00
Nikita Popov f08148e874 [ConstProp] Fix folding of pointer icmp with signed predicates
While @g ugt null is always true (ignoring weak symbols),
@g sgt null is not necessarily the case -- that would imply that
it is forbidden to place globals in the high half of the address
space.
2021-03-08 17:12:12 +01:00
Nikita Popov 2ef03bc3a8 [ConstProp] Add more tests for pointer icmp folding (NFC) 2021-03-08 17:06:12 +01:00
Juneyoung Lee ff58b243ac Apply update_test_checks.py to test/Transforms/Util/assume-builder.ll (NFC) 2021-03-09 01:03:03 +09:00
Sanjay Patel f75b5305f4 [ConstantFold] allow folding icmp of null and constexpr
I noticed that we were not folding expressions like this:
icmp ult (constexpr), null
in https://llvm.org/PR49355, so we end up with extremely large
icmp instructions as the constant expressions pile up on each other.

There is no potential to mis-fold an unsigned boundary condition
with a zero/null, so this is just falling through a crack in the
pattern matching.

The more general case of comparisons of non-zero constants and
constexpr are more tricky and may require the datalayout to know
how to cast to different types, etc. Negative tests verify that
we are only changing a subset of potential patterns.

Differential Revision: https://reviews.llvm.org/D98150
2021-03-08 08:53:59 -05:00
Sanjay Patel a093942c28 [ConstProp][JumpThreading] add more test coverage for potential nullptr folds; NFC
See D98150.
2021-03-08 08:53:59 -05:00
Sanjay Patel 962c6fda4d [JumpThreading] auto-generate complete test checks; NFC 2021-03-08 08:26:16 -05:00
Haojian Wu c9ff39a3f9 Add "assert require" for the test added in df9158c9a4
The test is using "debug-only", it was failing in opt built mode.
2021-03-08 14:17:26 +01:00
Simon Pilgrim c2d18d7005 [KnownBits] Add min/max shift amount handling to shl/lshr/ashr KnownBits helpers
Pulled out of the original D90479 patch - also includes the "impossible shift amount" filtering from computeKnownBitsFromShiftOperator.

Differential Revision: https://reviews.llvm.org/D90479
2021-03-08 11:44:31 +00:00
David Sherwood de3185647d [LoopVectorize][SVE] Add tests for vectorising conditional loads of invariant addresses
For loops of the form:

 void foo(int *a, int *cond, short *inv, long long n) {
   for (long long i=0; i<n; ++i) {
     if (cond[i])
       a[i] = *inv;
   }
 }

we can vectorise for SVE using masked gather loads where the array
of pointers is simply a vector splat of 'inv' and the mask comes
from the condition 'cond[i] != 0'.

This patch simply adds tests upstream to defend this capability.

Differential Revision: https://reviews.llvm.org/D98043
2021-03-08 08:38:31 +00:00
Ta-Wei Tu df9158c9a4 [LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest`
The check `tightlyNested()` in `LoopInterchange` is similar to the one in `LoopNest`.
In fact, the former misses some cases where loop-interchange is not feasible and results in incorrect behaviour.
Replacing it with the much robust version provided by `LoopNest` reduces code duplications and fixes https://bugs.llvm.org/show_bug.cgi?id=48113.

`LoopInterchange` has a weaker definition of tightly or perfectly nesting-ness than the one implemented in `LoopNest::arePerfectlyNested()`.
Therefore, `tightlyNested()` is instead implemented with `LoopNest::checkLoopsStructure` and additional checks for unsafe instructions.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D97290
2021-03-08 11:36:08 +08:00
Mehdi Amini 8d5a981a13 Revert "[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe"
This reverts commit 99108c791d.
Clang is miscompiling LLVM with this change, a stage-2 build hits
multiple failures.

As a repro, I built clang in a stage1 directory and used it this way:

cmake -G Ninja ../llvm \
  -DCMAKE_CXX_COMPILER=`pwd`/../build-stage1/bin/clang++ \
  -DCMAKE_C_COMPILER=`pwd`/../build-stage1/bin/clang \
  -DLLVM_TARGETS_TO_BUILD="X86;NVPTX;AMDGPU" \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DLLVM_BUILD_EXAMPLES=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=On
ninja check-mlir
2021-03-08 00:15:47 +00:00
Whitney Tsang 0d8f102809 [NFC][LoopUnroll] Add `-unroll-runtime-other-exit-predictable=false` in
`runtime-multiexit-heuristic.ll`

Added -unroll-runtime-other-exit-predictable=false in
runtime-multiexit-heuristic.ll to make it more robust.
runtime-multiexit-heuristic.ll intention is to test
-unroll-runtime-multi-exit=false, so the default value of
-unroll-runtime-other-exit-predictable should not impact the result.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D98098
2021-03-07 23:51:09 +00:00
Whitney Tsang 40391cef61 [LoopUnrollRuntime] Add option to assume the non latch exit block to be
predictable. (Add LIT)

Reviewed By: Meinersbur, bmahjour

Differential Revision: https://reviews.llvm.org/D97747
2021-03-07 23:48:00 +00:00
Sanjay Patel 898b40645d [ConstProp] add tests for cmp with null and constexpr; NFC 2021-03-07 14:02:44 -05:00
Juneyoung Lee 07c3b97e18 [InstCombine] Add simplification of two logical and/ors
This is a patch that adds folding of two logical and/ors that share one variable:

a && (a && b) -> a && b
a && (a & b)  -> a && b
...

This is towards removing the poison-unsafe select optimization (D93065 has more context).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96945
2021-03-08 02:38:43 +09:00
Nikita Popov 2b494f85f1 [CVP] Remove -cvp-dont-add-nowrap-flags option
This option was originally added to work around a bug in LFTR.
The bug has long since been fixed.
2021-03-07 18:19:31 +01:00
Nikita Popov 176bbcae11 [DSE] Remove MemDep-based implementation
The MemorySSA-based implementation has been enabled without issue
for a while now, so keeping the old implementation around doesn't
seem useful anymore. This drops the MemDep-based implementation.

Differential Revision: https://reviews.llvm.org/D97877
2021-03-07 18:17:31 +01:00
Juneyoung Lee 33590ed4f2 [InstCombine] fix another poison-unsafe select transformation
This fixes another unsafe select folding by disabling it if
EnableUnsafeSelectTransform is set to false.

EnableUnsafeSelectTransform's default value is true, hence it won't
affect generated code (unless the flag is explicitly set to false).
2021-03-08 02:11:04 +09:00
Juneyoung Lee 99108c791d [SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe
This patch makes FoldBranchToCommonDest merge branch conditions into `select i1` rather than `and/or i1` when it is called by SimplifyCFG.
It is known that merging conditions into and/or is poison-unsafe, and this is towards making things *more* correct by removing possible miscompilations.
Currently, InstCombine simply consumes these selects into and/or of i1 (which is also unsafe), so the visible effect would be very small. The unsafe select -> and/or transformation will be removed in the future.
There has been efforts for updating optimizations to support the select form as well, and they are linked to D93065.

The safe transformation is fired when it is called by SimplifyCFG only. This is done by setting the new `PoisonSafe` argument as true.
Another place that calls FoldBranchToCommonDest is LoopSimplify. `PoisonSafe` flag is set to false in this case because enabling it has a nontrivial impact in performance because SCEV is more conservative with select form and InductiveRangeCheckElimination isn't aware of select form of and/or i1.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95026
2021-03-08 01:38:03 +09:00
Juneyoung Lee 5bb38e84d3 [LoopUnswitch] unswitch if cond is in select form of and/or as well
Hello all,
I'm trying to fix unsafe propagation of poison values in and/or conditions by using
equivalent select forms (`select i1 A, i1 B, i1 false` and `select i1 A, i1 true, i1 false`)
instead.
D93065 has links to patches for this.

This patch allows unswitch to happen if the condition is in this form as well.
`collectHomogenousInstGraphLoopInvariants` is updated to keep traversal if
Root and the visiting I matches both m_LogicalOr()/m_LogicalAnd().
Other than this, the remaining changes are almost straightforward and simply replaces
Instruction::And/Or check with match(m_LogicalOr()/m_LogicalAnd()).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D97756
2021-03-08 01:19:43 +09:00
Juneyoung Lee d65c947600 [InstCombine] enrich select-safe-bool-transforms.ll test (NFC)
for https://reviews.llvm.org/D96945
2021-03-08 00:01:08 +09:00
Juneyoung Lee 2c16c4a43c [ValueTracking] update directlyImpliesPoison to look into select's condition
This is a minor update in directlyImpliesPoison and makes it look into select's
condition.
Splitted from https://reviews.llvm.org/D96945
2021-03-07 23:16:44 +09:00
Nikita Popov 3fedaf2a52 [GVN] Don't explicitly materialize undefs from dead blocks
When materializing an available load value, do not explicitly
materialize the undef values from dead blocks. Doing so will
will force creation of a phi with an undef operand, even if there
is a dominating definition. The phi will be folded away on
subsequent GVN iterations, but by then we may have already
poisoned MDA cache slots.

Simply don't register these values in the first place, and let
SSAUpdater do its thing.
2021-03-06 23:46:24 +01:00
Nikita Popov 5f319fc444 [GVN] Add test for load GVN with dead block (NFC)
What this test illustrates is that GVN inserts an unnecessary
phi node initially, which prevents alias analysis from establishing
NoAlias, and MDA caches that result. We would be able to fully fold
this after another -gvn run with clean MDA.
2021-03-06 23:19:58 +01:00
Mauri Mustonen 494b5ba364
[VPlan] Support to widen call intructions in VPlan native path
Add support to widen call instructions in VPlan native path by using a correct recipe when such instructions are encountered. This is already used by inner loop vectorizer.

Previously call instructions got handled by wrong recipes and resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139.

Patch by Mauri Mustonen <mauri.mustonen@tuni.fi>

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D97278
2021-03-06 21:59:52 +00:00
Roman Lebedev 2ad1f5eb1a
[InstCombine] Don't canonicalize (gep i8* X, -(ptrtoint Y)) as (inttoptr (sub (ptrtoint X), (ptrtoint Y)))
It's just a wrong thing to do.

We introduce inttoptr where there were none, which results in
loosing all provenance information because we no longer have a GEP{i,},
and pessimize all future optimizations,
because we are basically not allowed to look past `inttoptr`.

(gep i8* X, -(ptrtoint Y))  *is* the canonical form.
So just drop this fold.

Noticed while reviewing D98120.
2021-03-06 23:00:25 +03:00
Roman Lebedev 75c7e3e314
[NFC][InstCombine] Add plain GEP test for (gep i8* X, -(ptrtoint Y)) --> (inttoptr (sub (ptrtoint X), (ptrtoint Y))) fold 2021-03-06 23:00:25 +03:00
Roman Lebedev b46c085d2b
[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions
These intrinsics, not the icmp+select are the canonical form nowadays,
so we might as well directly emit them.

This should not cause any regressions, but if it does,
then then they would needed to be fixed regardless.

Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.

Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
2021-03-06 21:52:46 +03:00
William S. Moses d163e75c81 [Attributor] Enable heap-to-stack of any size
Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1

Differential Revision: https://reviews.llvm.org/D97873
2021-03-06 12:57:32 -05:00
Philip Reames 9c139c50c9 [tests] Update an autogen test for format change 2021-03-06 09:49:27 -08:00
Philip Reames 5db2735af9 [gvn] Handle simply phi equivalence cases
GVN basically doesn't handle phi nodes at all. This is for a reason - we can't value number their inputs since the predecessor blocks have probably not been visited yet.

However, it also creates a significant pass ordering problem. As it stands, instcombine and simplifycfg ends up implementing CSE of phi nodes. This means that for any series of CSE opportunities intermixed with phi nodes, we end up having to alternate instcombine/simplifycfg and gvn to make progress.

This patch handles the simplest case by simply preprocessing the phi instructions in a block, and CSEing them if they are syntactically identical. This turns out to be powerful enough to handle many cases in a single invocation of GVN since blocks which use the cse'd phi results are visited after the block containing the phi. If there's a CSE opportunity in one the phi predecessors required to recognize the phi CSE opportunity, that will require a second iteration on the function. (Still within a single run of gvn though.)

Compile time wise, this could go either way. On one hand, we're potentially causing GVN to iterate over the function more. On the other, we're cutting down on iterations between two passes and potentially shrinking the IR aggressively. So, a bit unclear what to expect.

Note that this does still rely on instcombine to canonicalize block order of the phis, but that's a one time transformation independent of the values incoming to the phi.

Differential Revision: https://reviews.llvm.org/D98080
2021-03-06 09:31:12 -08:00
Philip Reames 06a8a867d1 [rs4gc/tests] Remove use of internal debug flags
As a pragmatic tradeoff, the ease of updating the tests outweighs the slightly easier to understand test conditions.  Where revevant, debug output was converted to comments to help human understanding.
2021-03-06 09:20:02 -08:00
Philip Reames c6ec563f02 [rs4gc] autogen a bunch of tests for ease of update 2021-03-06 09:04:00 -08:00
Nikita Popov 1c59bf4d4d [InstCombine] Add tests for non-trivial store to load forward (NFC)
Examples of things we mostly don't handle.
2021-03-06 16:58:11 +01:00
Nikita Popov edf7004851 [ConstantFold] Handle vectors in ConstantFoldLoadThroughBitcast()
There seems to be an impedance mismatch between what the type
system considers an aggregate (structs and arrays) and what
constants consider an aggregate (structs, arrays and vectors).

Rather than adjusting the type check, simply drop it entirely,
as getAggregateElement() is well-defined for non-aggregates: It
simply returns null in that case.
2021-03-06 12:17:56 +01:00
Nikita Popov be58465591 [GVN] Regenerate test checks (NFC) 2021-03-06 12:11:16 +01:00
Nikita Popov a917fb89dc [LVI] Simplify and generalize handling of clamp patterns
Instead of handling a number of special cases for selects, handle
this generally when inferring ranges from conditions. We already
infer ranges from `x + C pred C2` to `x`, so doing the same for
`x pred C2` to `x + C` is straightforward.
2021-03-06 10:42:41 +01:00
Nikita Popov 906deaa0d9 [CVP] Add additional tests for clamp patterns (NFC)
These are the same as the existing tests, but using different
predicates that are not handled by the current code.
2021-03-06 10:42:40 +01:00
Nikita Popov 019ae8220f [CVP] Fix tests for clamp patterns (NFC)
These tests didn't test the pattern they were supposed to, because
%a instead of %add was used in the select, which turned this into
a normal min/max).

Noticed this when commenting out the clamp handling code did not
result in any test failures...
2021-03-06 10:24:44 +01:00
Philip Reames 8bdb5ecd82 [tests] precommit tests for D98082 2021-03-05 15:21:34 -08:00
Philip Reames c0d390d0d2 [tests] precommit tests for phi handling in GVN 2021-03-05 14:24:21 -08:00
Philip Reames 51b13a7ea0 [gvn] CSE gc.relocates based on meaning, not spelling
The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoints multiple return values.)

We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particular useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass.

Differential Revision: https://reviews.llvm.org/D97974

(Note: Part of the reviewed change was split and landed as f352463a)
2021-03-05 10:16:12 -08:00
Philip Reames f352463ade Mark gc.relocate and gc.result as readnone
For some reason, we had been marking gc.relocates as reading memory. There's no known reason for this, and I suspect it to be a legacy of very early implementation conservatism.  gc.relocate and gc.result are simply projections of the return values from the associated statepoint.  Note that the LangRef has always declared them readnone.

The EarlyCSE change is simply moving the special casing from readonly to readnone handling.

As noted by the test diffs, this does allow some additional CSE when relocates are separated by stores, but since we generate gc.relocates in batches, this is unlikely to help anything in practice.

This was reviewed as part of https://reviews.llvm.org/D97974, but split at reviewer request before landing.  The motivation is to enable the GVN changes in that patch.
2021-03-05 10:07:17 -08:00
Philip Reames 9fe46d6487 [tests] precommit some additional tests for D97974 2021-03-05 10:04:07 -08:00
Philip Reames 99f93dd3a5 [rs4gc] avoid insert base computation instructions for deopt uses
If we have a value live over a call which is used for deopt at the call, we know that the value must be a base pointer. We can avoid potentially inserting IR to materialize a base for this value.

In it's current form, this is mostly a compile time optimization.   Building the base pointer graph (and then optimizing it away again) is a relatively expensive operation.  We also sometimes end up with better codegen in practice - due to failures in optimizing away the inserted base pointer propogation - but those are optimization bugs we're fixing concurrently.

The alternative to this would be to extend the base pointer inference with the ability to generally reuse multiple-base input instructions (phis and selects).  That's somewhat invasive and complicated, so we're defering it a bit longer.

Differential Revision: https://reviews.llvm.org/D97885
2021-03-05 09:55:36 -08:00
Jann Horn 202ae987d3 [test] Fix new CodeGenPrepare test for non-X86 systems
The new test llvm/test/Transforms/CodeGenPrepare/remove-assume-block.ll
breaks on non-X86 machines. Change it to look like the existing test
llvm/test/Transforms/CodeGenPrepare/X86/delete-assume-dead-code.ll
to fix it.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D97952
2021-03-05 11:48:38 +01:00
David Sherwood fec0a0adac [SVE][LoopVectorize] Add support for extracting the last lane of a scalable vector
There are certain loops like this below:

  for (int i = 0; i < n; i++) {
    a[i] = b[i] + 1;
    *inv = a[i];
  }

that can only be vectorised if we are able to extract the last lane of the
vectorised form of 'a[i]'. For fixed width vectors this already works since
we know at compile time what the final lane is, however for scalable vectors
this is a different story. This patch adds support for extracting the last
lane from a scalable vector using a runtime determined lane value. I have
added support to VPIteration for runtime-determined lanes that still permit
the caching of values. I did this by introducing a new class called VPLane,
which describes the lane we're dealing with and provides interfaces to get
both the compile-time known lane and the runtime determined value. Whilst
doing this work I couldn't find any explicit tests for extracting the last
lane values of fixed width vectors so I added tests for both scalable and
fixed width vectors.

Differential Revision: https://reviews.llvm.org/D95139
2021-03-05 09:57:56 +00:00
Luke d28297ff68 [RISCV] Enable fixed-length vectorization of LoopVectorizer for RISC-V Vector
By implementing the method "unsigned RISCVTTIImpl::getRegisterBitWidth(bool Vector)",
fixed-length vectorization is enabled when possible. Without this method, the
"#pragma clang loop" directive is needed to enable vectorization(or the cost model
may inform LLVM that "Vectorization is possible but not beneficial").

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97549
2021-03-05 10:54:51 +08:00
Wei Mi 2357d29335 [SampleFDO] Another fix to prevent repeated indirect call promotion in
sample loader pass.

In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9,
to prevent repeated indirect call promotion for the same indirect call
and the same target, we used zero-count value profile to indicate an
indirect call has been promoted for a certain target. We removed
PromotedInsns cache in the same patch. However, there was a problem in
that patch described below, and that problem led me to add PromotedInsns
back as a mitigation in
https://reviews.llvm.org/rG4ffad1fb489f691825d6c7d78e1626de142f26cf.

When we get value profile from metadata by calling getValueProfDataFromInst,
we need to specify the maximum possible number of values we expect to read.
We uses MaxNumPromotions in the last patch so the maximum number of value
information extracted from metadata is MaxNumPromotions. If we have many
values including zero-count values when we write the metadata, some of them
will be dropped when we read them because we only read MaxNumPromotions
values. It will allow repeated indirect call promotion again. We need to
make sure if there are values indicating promoted targets, those values need
to be saved in metadata with higher priority than other values.

The patch fixed that problem. We change to use -1 to represent the count
of a promoted target instead of 0 so it is easier to sort the values.
When we prepare to update the metadata in updateIDTMetaData, we will sort
the values in the descending count order and extract only MaxNumPromotions
values to write into metadata. Since -1 is the max uint64_t number, if we
have equal to or less than MaxNumPromotions of -1 count values, they will
all be kept in metadata. If we have more than MaxNumPromotions of -1 count
values, we will only save MaxNumPromotions such values maximally. In such
case, we have logic in place in doesHistoryAllowICP to guarantee no more
promotion in sample loader pass will happen for the indirect call, because
it has been promoted enough.

With this change, now we can remove PromotedInsns without problem.

Differential Revision: https://reviews.llvm.org/D97350
2021-03-04 18:44:12 -08:00
William S. Moses 2b896e39bf Revert "[Attributor] Enable heap-to-stack of any size"
This reverts commit 51bd42ef9b.
2021-03-04 17:24:56 -05:00
Sanjay Patel 1bee549737 [LoopVectorize] propagate fast-math-flags from induction instructions
This code assumed that FP math was only permissable if it was
fully "fast", so it hard-coded "fast" when creating new instructions.

The underlying code already allows matching recurrences/reductions
that are only "reassoc", so this change should prevent the potential
miscompile seen in the test diffs (we created "fast" ops even though
none existed in the original code).

I don't know if we need to create the temporary IRBuilder objects
used here, so that could be follow-up clean-up.

There's an open question about whether we should require "nsz" in
addition to "reassoc" here. InstCombine uses that combo for its
reassociative folds, but I think codegen is not as strict.
2021-03-04 17:21:32 -05:00
William S. Moses 51bd42ef9b [Attributor] Enable heap-to-stack of any size
Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1

Differential Revision: https://reviews.llvm.org/D97873
2021-03-04 17:17:23 -05:00
Francis Visoiu Mistrih 365b78396a [Remarks] Emit variable info in auto-init remarks
This enhances the auto-init remark with information about the variable
that is auto-initialized.

This is based of debug info if available, or alloca names (mostly for
development purposes).

```
auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes.Variables: var (4096 bytes). [-Rpass-missed=annotation-remarks]
  int var[1024];
      ^
```

This allows to see things like partial initialization of a variable that
the optimizer won't be able to completely remove.

Differential Revision: https://reviews.llvm.org/D97734
2021-03-04 12:51:22 -08:00
Philip Reames 45fc4487c5 [tests] Precommit tests for upcoming patch to support CSE of gc.relocates in gvn 2021-03-04 12:23:27 -08:00
Philip Reames cf40539eac Use the right pass in test introduced in f1fdbd67 2021-03-04 12:21:13 -08:00
Philip Reames f1fdbd671b [test] Add DCE coverage for gc.relocate 2021-03-04 12:18:06 -08:00
Philip Reames 8998b811c9 [tests] Expand coverage of gc.relocate CSE in early-cse 2021-03-04 12:12:55 -08:00
Akira Hatanaka 1900503595 [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.

Original commit message:

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-03-04 11:22:30 -08:00
Adrian Prantl d268febc56 Improve the debug info for coro-split .resume functions
This patch updates the scope line to point to the suspend point. This
makes the first address in the function point to the first source line
in the resume function rather than the function declaration. Without
this the line table "jumps" from the beginning of the function to the
suspend point at the beginning.

rdar://73386346

Differential Revision: https://reviews.llvm.org/D97345
2021-03-04 11:05:35 -08:00
Alexey Bataev 04ba80ca4d [Instcombiner]Improve emission of logical or/and reductions.
For logical or/and reductions we emit regular intrinsics @llvm.vector.reduce.or/and.vxi1 calls.
These intrinsics are not effective for the logical or/and reductions,
especially if the optimizer is able to emit short circuit versions of
the scalar or/and instructions and vector code gets less effective than
the scalar version.
Instead, or reduction for i1 can be represented as:
```
%val = bitcast <ReduxWidth x i1> to iReduxWidth
%res = cmp ne iReduxWidth %val, 0
```
and reduction for i1 can be represented as:
```
%val = bitcast <ReduxWidth x i1> to iReduxWidth
%res = cmp eq iReduxWidth %val, 11111
```
This improves perfromance of the vector code significantly and make it
to outperform short circuit scalar code.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D97406
2021-03-04 08:01:02 -08:00
Sanjay Patel 36a489d194 [Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC
Similar to b3a33553ae, but this shows a TODO and a potential
miscompile is already present.

We are tracking an FP instruction that does *not* have FMF (reassoc)
properties, so calling that "Unsafe" seems opposite of the common
reading.

I also removed one getter method by rolling the null check into
the access. Further simplification may be possible.

The motivation is to clean up the interactions between FMF and
function-level attributes in these classes and their callers.

The new test shows that there is an existing bug somewhere in
the callers. We assumed that the original code was fully 'fast'
and so we produced IR with 'fast' even though it was just 'reassoc'.
2021-03-04 10:40:26 -05:00
Jann Horn 91c9dee3fb [CodeGenPrepare] Eliminate llvm.expect before removing empty blocks
CodeGenPrepare currently first removes empty blocks, then in a loop
performs other optimizations. One of those optimizations is the removal
of call instructions that invoke @llvm.assume, which can create new
empty blocks.

This means that when a branch only contains a call to __builtin_assume(),
the empty branch will survive into MIR, and will then only be
half-removed by MIR-level optimizations (e.g. removing the branch but
leaving the condition intact).

Fix it by eliminating @llvm.expect builtin calls before removing empty
blocks.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D97848
2021-03-04 14:48:26 +01:00
Hongtao Yu c75da238b4 [CSSPGO] Deduplicating dangling pseudo probes.
Same dangling probes are redundant since they all have the same semantic that is to rely on the counts inference tool to get reasonable count for the same original block. Therefore, there's no need to keep multiple copies of them. I've seen jump threading created tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also result in redundant probes though without an observed impact so far.

This change removes block-wise redundant dangling probes specifically introduced by jump threading. To support removing redundant dangling probes caused by all other passes, a final function-wise deduplication is also added.

An 18% size win of the .pseudo_probe section was seen for SPEC2017. No performance difference was observed.

Differential Revision: https://reviews.llvm.org/D97482
2021-03-03 22:44:42 -08:00
Hongtao Yu 8985515822 [CSSPGO] Unblocking optimizations by dangling pseudo probes.
This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments.

The optimizations being unblocked are:

	1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted.
	2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions.
	3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks.

Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D97481
2021-03-03 22:44:42 -08:00
Hongtao Yu ad2a59f584 [CSSPGO] Introducing dangling pseudo probes.
Dangling probes are the probes associated to an empty block. This usually happens when all real instructions are optimized away from the block. There is a problem with dangling probes during the offline counts processing. The way the sample profiler works is that samples collected on the first physical instruction following a probe will be counted towards the probe. This logically equals to treating the instruction next to a probe as if it is from the same block of the probe. In the dangling probe case, the real instruction following a dangling probe actually starts a new block, and samples collected on the new block may cause issues when counted towards the empty block.

To mitigate this issue, we first try to move around a dangling probe inside its owning block. If there are still native instructions preceding the probe in the same block, we can then use them as a place holder to collect samples for the probe. A pass is added to walk each block backwards looking for probes not followed by any real instruction and moving them before the first real instruction. This is done right before the object emission.

If we are unlucky to find such in-block preceding instructions for a probe, the solution we are taking is to tag such probe as dangling so that the samples reported for them will not be trusted by the compiler. We leave it up to the counts inference algorithm to get such probes a reasonable count. The number `UINT64_MAX` is used to mark sample count as collected for a dangling probe.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95962
2021-03-03 22:44:41 -08:00
Johannes Doerfert 5b70c12f3e [Attributor] Make DepClass a required argument
We often used a sub-optimal dependence class in the past because we
didn't see the argument. Let's make it explicit so we remember to think
about it.
2021-03-04 00:35:52 -06:00
Johannes Doerfert e592dad82e [Attributor] Fold "TrackDependence" into the DepClassTy enum
We don't need a bool and an enum to express the three options we
currently have. This makes the interface nicer and much easier to
use optional dependencies. Also avoids mistakes where the bool is
false and enum ignored.
2021-03-04 00:35:52 -06:00
Johannes Doerfert f3f88287c5 [Attributor] Use known alignment as lower bound to avoid work
If we know already more than available from a use, we don't need to
invest time on it.
2021-03-04 00:35:52 -06:00
Evgeniy Brevnov e94125f054 [DSE] Add support for not aligned begin/end
This is an attempt to improve handling of partial overlaps in case of unaligned begin\end.

Existing implementation just bails out if it encounters such cases. Even when it doesn't I believe existing code checking alignment constraints is not quite correct. It tries to ensure alignment of the "later" start/end offset while should be preserving relative alignment between earlier and later start/end.

The idea behind the change is simple. When start/end is not aligned as we wish instead of bailing out let's adjust it as necessary to get desired alignment.

I'll update with performance results as measured by the test-suite...it's still running...

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D93530
2021-03-04 12:24:23 +07:00
Serguei Katkov a0ff0f30df [InstCombine] Move statepoint intrinsic handling from visitCall to visitCallBase
statepoint intrinsic can be used in invoke context,
so it should be handled in visitCallBase to cover both call and invoke.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97833
2021-03-04 11:00:22 +07:00
Xun Li 03f668613c [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions
See pr46990(https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions to loop exit blocks which cross coro.suspend intrinsics. This breaks semantic of coro.suspend intrinsic which return to caller directly. Also this leads to use-after-free if the coroutine is freed before control returns to the caller in multithread environment.

This patch disable promotion by check whether loop contains coro.suspend intrinsics.
This is a resubmit of D86190.
Disabling LICM for loops with coroutine suspension is a better option not only for correctness purpose but also for performance purpose.
In most cases LICM sinks memory operations. In the case of coroutine, sinking memory operation out of the loop does not improve performance since coroutien needs to get data from the frame anyway. In fact LICM would hurt coroutine performance since it adds more entries to the frame.

Differential Revision: https://reviews.llvm.org/D96928
2021-03-03 15:21:57 -08:00
Philip Reames 805115655e [LSR] Unify scheduling of existing and inserted addrecs
LSR goes to some lengths to schedule IV increments such that %iv and %iv.next never need to overlap. This is fairly fundamental to LSRs cost model. LSR assumes that an addrec can be represented with a single register. If %iv and %iv.next have to overlap, then that assumption does not hold.

The bug - which this patch is fixing - is that LSR only does this scheduling for IVs which it inserts, but it's cost model assumes the same for existing IVs that it reuses. It will rewrite existing IV users such that the no-overlap property holds, but will not actually reschedule said IV increment.

As you can see from the relatively lack of test updates, this doesn't actually impact codegen much. The main reason for doing it is to make a follow up patch series which improves post-increment use and scheduling easier to follow.

Differential Revision: https://reviews.llvm.org/D97219
2021-03-03 12:07:55 -08:00
Arnold Schwaighofer a42bea211a [coro async] Allow a coro.suspend.async to specify which argument is the context argument
Before we used the same argument as the entry point. The resume partial
function might want to use a different ABI for its context argument

Differential Revision: https://reviews.llvm.org/D97333
2021-03-03 08:27:37 -08:00
Hans Wennborg 0a5dd06718 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR"
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.

> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
>   which indicates the call is implicitly followed by a marker
>   instruction and an implicit retainRV/claimRV call that consumes the
>   call result. In addition, it emits a call to
>   @llvm.objc.clang.arc.noop.use, which consumes the call result, to
>   prevent the middle-end passes from changing the return type of the
>   called function. This is currently done only when the target is arm64
>   and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
>   with the operand bundle in the IR and removes the inserted calls after
>   processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
>   operand bundle. It doesn't remove the operand bundle on the call since
>   the backend needs it to emit the marker instruction. The retainRV and
>   claimRV calls are emitted late in the pipeline to prevent optimization
>   passes from transforming the IR in a way that makes it harder for the
>   ARC middle-end passes to figure out the def-use relationship between
>   the call and the retainRV/claimRV calls (which is the cause of
>   PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
>   nothing in the callee prevents it from being paired up with the
>   retainRV/claimRV call in the caller. It then inserts a release call if
>   claimRV is attached to the call since autoreleaseRV+claimRV is
>   equivalent to a release. If it cannot find an autoreleaseRV call, it
>   tries to transfer the operand bundle to a function call in the callee.
>   This is important since the ARC optimizer can remove the autoreleaseRV
>   returning the callee result, which makes it impossible to pair it up
>   with the retainRV/claimRV call in the caller. If that fails, it simply
>   emits a retain call in the IR if retainRV is attached to the call and
>   does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
>   constant value if the call has the operand bundle. This ensures the
>   call always has at least one user (the call to
>   @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
>   multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
>   calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808

This reverts commit ed4718eccb.
2021-03-03 15:51:40 +01:00
Alexey Bataev 3c3c4ee24f [Instcombine][NFC]Simplify logical reductions tests, NFC. 2021-03-02 08:27:42 -08:00
Alexey Bataev a054e94e9e [SLP]Merge reorder and reuse shuffles.
It is possible to merge reuse and reorder shuffles and reduce the total
cost of the vectorization tree/number of final instructions.

Differential Revision: https://reviews.llvm.org/D94992
2021-03-02 06:39:47 -08:00
Florian Hahn 0cb9d8acbc
[LV] Add test cases that require a larger number of RT checks.
Precommit tests cases for D75981.
2021-03-02 10:49:38 +00:00
David Green 54e2876132 [ARM] Update and add extra WLS testing. NFC 2021-03-01 21:46:09 +00:00
Masoud Ataei 5fe0cab79e [PowerPC] Removing sqrtd2 and sqrtf4 from list of vectorizable function with MASSV
Under -O3 and -Ofast, the MASSV conversion prevents the sqrt call to be inlined.
Inline sqrt is faster than MASSV call on leppc.

Differential Revision: https://reviews.llvm.org/D97487
2021-03-01 15:42:19 +00:00
Florian Hahn 53dacb7b67
[LV] Generate RT checks up-front and remove them if required.
This patch updates LV to generate the runtime checks just after cost
modeling, to allow a more precise estimate of the actual cost of the
checks. This information will be used in future patches to generate
larger runtime checks in cases where the checks only make up a small
fraction of the expected scalar loop execution time.

The runtime checks are created up-front in a temporary block to allow better
estimating the cost and un-linked from the existing IR. After deciding to
vectorize, the checks are moved backed. If deciding not to vectorize, the
temporary block is completely removed.

This patch is similar in spirit to D71053, but explores a different
direction: instead of delaying the decision on whether to vectorize in
the presence of runtime checks it instead optimistically creates the
runtime checks early and discards them later if decided to not
vectorize. This has the advantage that the cost-modeling decisions
can be kept together and can be done up-front and thus preserving the
general code structure. I think delaying (part) of the decision to
vectorize would also make the VPlan migration a bit harder.

One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However it seems like
the code required to do so is quite manageable.

Reviewed By: lebedev.ri, ebrevnov

Differential Revision: https://reviews.llvm.org/D75980
2021-03-01 10:48:04 +00:00
Sanjay Patel 9502061bcc [InstCombine] avoid infinite loop in demanded bits for select
https://llvm.org/PR49205
2021-02-28 10:17:53 -05:00
Wei Mi 7fb400112f [SampleFDO] Add a cutoff flag to control how many symbols will be included
into profile symbol list.

When test is unrepresentative to production behavior, sample profile
collected from production can cause unexpected performance behavior
in test. To triage such issue, it is useful to have a cutoff flag
to control how many symbols will be included into profile symbol list
in order to do binary search.

Differential Revision: https://reviews.llvm.org/D97623
2021-02-27 23:15:31 -08:00
William S. Moses b077d82b00 [Attributor] Conditinoally delete fns
Allow the attributor to delete functions only if requested

Differential Revision: https://reviews.llvm.org/D97238
2021-02-27 20:37:42 -05:00
Sanjay Patel 356cdabd3a [SimplifyCFG] avoid illegal phi with both poison and undef
In the example based on:
https://llvm.org/PR49218
...we are crashing because poison is a subclass of undef, so we merge blocks and create:

PHI node has multiple entries for the same basic block with different incoming values!
  %k3 = phi i64 [ poison, %entry ], [ %k3, %g ], [ undef, %entry ]

If both poison and undef values are incoming, we soften the poison values to undef.

Differential Revision: https://reviews.llvm.org/D97495
2021-02-27 09:10:32 -05:00
Fangrui Song 1d7f8c7517 [test] Fix PGOProfile/comdat_internal.ll 2021-02-26 16:27:23 -08:00
Philip Reames 83bc7815c4 [tests] Precommit for upcoming patch 2021-02-26 13:11:25 -08:00
Alexey Bataev 7820518d55 [InstCombine][NFC]Add a test for logical reductions. 2021-02-26 10:08:48 -08:00
Simon Pilgrim 455d43b951 [Utils] collectBitParts - bail for integers > 128-bits
collectBitParts uses int8_t for the bit indices, leaving a 128-bit limit.

We already test for this before calling collectBitParts, but rGb94c215592bd added truncate handling which meant we could end up processing wider integers.

Thanks to @manojgupta for the repro.
2021-02-26 14:58:01 +00:00
Stephen Tozer ec7b9b0c18 [InstCombine] Avoid redundant or out-of-order debug value sinking
This patch modifies TryToSinkInstruction in the InstCombine pass, to prevent
redundant debug intrinsics from being produced, and also prevent the intrinsics
from being emitted in an incorrect order. It does this by ensuring that when
this pass sinks an instruction and creates clones of the debug intrinsics that
use that instruction, it inserts those debug intrinsics in their original order,
and only inserts the last debug intrinsic for each variable in the Instruction's
block.

Differential revision: https://reviews.llvm.org/D95463
2021-02-26 13:04:33 +00:00
Evgeniy Brevnov 13a5cac2ba Revert "[NARY-REASSOCIATE] Support reassociation of min/max"
This reverts commit 83d134c3c4.
2021-02-26 19:47:54 +07:00
Francis Visoiu Mistrih fee9abe69c [Remarks] Provide more information about auto-init calls
This now analyzes calls to both intrinsics and functions.

For intrinsics, grab the ones we know and care about (mem* family) and
analyze the arguments.

For calls, use TLI to get more information about the libcalls, then
analyze the arguments if known.

```
auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes. [-Rpass-missed=annotation-remarks]
  int var[1024];
      ^
```

Differential Revision: https://reviews.llvm.org/D97489
2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih 4753a69a31 [Remarks] Provide more information about auto-init stores
This adds support for analyzing the instruction with the !annotation
"auto-init" in order to generate a more user-friendly remark.

For now, support the store size, and whether it's atomic/volatile.

Example:

```
auto-init.c:4:7: remark: Store inserted by -ftrivial-auto-var-init.Store size: 4 bytes. [-Rpass-missed=annotation-remarks]
  int var;
      ^
```

Differential Revision: https://reviews.llvm.org/D97412
2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih c49b600b2f [Remarks] Emit remarks for "auto-init" !annotations
Using the !annotation metadata, emit remarks pointing to code added by
`-ftrivial-auto-var-init` that survived the optimizer.

Example:

```
auto-init.c:4:7: remark: Initialization inserted by -ftrivial-auto-var-init. [-Rpass-missed=annotation-remarks]
  int buf[1024];
      ^
```

The tests are testing various situations like calls/stores/other
instructions, with debug locations, and extra debug information on
purpose: more patches will come to improve the reporting to make it more
user-friendly, and these tests will show how the reporting evolves.

Differential Revision: https://reviews.llvm.org/D97405
2021-02-25 15:14:09 -08:00
Nicolas Guillemot 3573a90b8a [PM] Show the pass argument in pre/post-pass IR dumps
This patch adds each pass' pass argument in the header for IR dumps.
For example:

Before:

```
    *** IR Dump Before InstructionSelect ***
```

After:

```
    *** IR Dump Before InstructionSelect (instruction-select) ***
```

The goal is to make it easier to know what argument to pass to
command line options like `debug-only` or `run-pass` to further
investigate a given pass.
2021-02-25 14:02:00 -08:00
Adrian Prantl 1693180884 Add a nullptr check.
This doesn't actually reproduce with a dbg.declare(i8* null, ...)
which produces a non-null null Value, but I have seen this show up in
crash logs. I'm suspecting that there may be another pass forcibly
setting the operand to a nullptr.
2021-02-25 12:01:11 -08:00
Florian Hahn 261f219ffc
[IndVars] Add test cases inspired by PR48965. 2021-02-25 15:54:18 +00:00
Evgeniy Brevnov 83d134c3c4 [NARY-REASSOCIATE] Support reassociation of min/max
Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88287
2021-02-25 18:22:39 +07:00
Evgeniy Brevnov 6d31ee1cea [NARY][NFC] New tests for upcoming changes. 2021-02-25 10:52:35 +07:00
Xun Li c38000a9fb [Coroutine] Check indirect uses of alloca when checking lifetime info
In the existing logic, we look at the lifetime.start marker of each alloca, and check all uses of the alloca, to see if any pair of the lifetime marker and an use of alloca crosses suspension point.
This approach is unfortunately incorrect. An use of alloca does not need to be a direct use, but can be an indirect use through alias.
Only checking direct uses can miss cases where indirect uses are crossing suspension point.
This can be demonstrated in the newly added test case 007.
In the test case, both x and y are only directly used prior to suspend, but they are captured into an alias, merged through a PHINode (so they couldn't be materialized), and used after CoroSuspend.
If we only check whether the lifetime starts cross suspension points with direct uses, we will put the allocas to the stack, and then capture their addresses in the frame.

Instead of fixing it in D96441 and D96566, this patch takes a different approach which I think is better.
We still checks the lifetime info in the same way as before, but with two differences:
1. The collection of liftime.start is moved into AllocaUseVisitor to make the logic more concentrated.
2. When looking at lifetime.start and use pairs, we not only checks the direct uses as before, but in this patch we check all uses collected by AllocaUseVisitor, which would include all indirect uses through alias. This will make the analysis more accurate without throwing away the lifetime optimization.

Differential Revision: https://reviews.llvm.org/D96922
2021-02-24 18:29:23 -08:00
Sanjay Patel a7cee55762 [InstCombine] fold fdiv with powi divisor (PR49147)
This extends b40fde062c for the especially non-standard
powi pattern. We want to avoid being completely wrong
on the negation-of-int-min corner case, so I'm adding
an extra FMF check for 'ninf' assuming that gives us
the flexibility to handle that possibility.
https://llvm.org/PR49147
2021-02-24 16:44:36 -05:00
Duncan P. N. Exon Smith 01701646d5 Transforms: Clone distinct nodes in metadata mapper unless RF_ReuseAndMutateDistinctMDs
This is a follow up to 22a52dfddc and a
revert of df763188c9.

With this change, we only skip cloning distinct nodes in
MDNodeMapper::mapDistinct if RF_ReuseAndMutateDistinctMDs, dropping the
no-longer-needed local helper `cloneOrBuildODR()`.  Skipping cloning in
other cases is unsound and breaks CloneModule, which is why the textual
IR for PR48841 didn't pass previously. This commit adds the test as:
Transforms/ThinLTOBitcodeWriter/cfi-debug-info-cloned-type-references-global-value.ll

Cloning less often exposed a hole in subprogram cloning in
CloneFunctionInto thanks to df763188c9a1ecb1e7e5c4d4ea53a99fbb755903's
test ThinLTO/X86/Inputs/dicompositetype-unique-alias.ll. If a function
has a subprogram attachment whose scope is a DICompositeType that
shouldn't be cloned, but it has no internal debug info pointing at that
type, that composite type was being cloned. This commit plugs that hole,
calling DebugInfoFinder::processSubprogram from CloneFunctionInto.

As hinted at in 22a52dfddcefad4f275eb8ad1cc0e200074c2d8a's commit
message, I think we need to formalize ownership of metadata a bit more
so that ValueMapper/CloneFunctionInto (and similar functions) can deal
with cloning (or not) metadata in a more generic, less fragile way.

This fixes PR48841.

Differential Revision: https://reviews.llvm.org/D96734
2021-02-24 12:57:52 -08:00
Philip Reames 52745e4d90 [tests] precommit tests for D97219 2021-02-24 12:44:12 -08:00
Sanjay Patel 3475159122 [InstCombine] add tests for fdiv+powi; NFC 2021-02-24 15:08:00 -05:00
Simon Pilgrim 96a3dfeb93 Revert rGd65ddca83ff85c7345fe9a0f5a15750f01e38420 - "[ValueTracking] ComputeKnownBits - minimum leading/trailing zero bits in LSHR/SHL (PR44526)"
This is causing sanitizer test failures that I haven't been able to fix yet.
2021-02-24 18:03:17 +00:00
Simon Pilgrim d65ddca83f [ValueTracking] ComputeKnownBits - minimum leading/trailing zero bits in LSHR/SHL (PR44526)
Followup to D72573 - as detailed in https://blog.regehr.org/archives/1709 we don't make use of the known leading/trailing zeros for shifted values in cases where we don't know the shift amount value.

Stop ValueTracking returning zero for poison shift patterns and use the KnownBits shift helpers directly.

Extend KnownBits::shl to combine all possible shifted combinations if both min/max shift amount values are in range.

Differential Revision: https://reviews.llvm.org/D90479
2021-02-24 12:15:45 +00:00
Simon Pilgrim b94c215592 [Utils] collectBitParts - add truncate() handling 2021-02-24 11:48:34 +00:00
Florian Hahn 6240f436dd
Recommit "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends."
This reverts the revert commit 437f0bbcd5.

It adds a new toVPRecipeResult, which forces VPRecipeOrVPValueTy to be
constructed with a VPRecipeBase *. This should address ambiguous
constructor issues for recipe sub-types that also inherit from VPValue.
2021-02-24 10:36:02 +00:00
Juneyoung Lee 56d228a14e [SimplifyCFG] Update passingValueIsAlwaysUndefined to check more attributes
This is a simple patch to update SimplifyCFG's passingValueIsAlwaysUndefined to inspect more attributes.

A new function `CallBase::isPassingUndefUB` checks attributes that imply noundef.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D97244
2021-02-24 10:40:50 +09:00
Matthew Voss 1d7f1d15c5 [LTO] Fix test failures caused by 6da7d31416
Adds "REQUIRES: asserts", since the test uses debug messages
2021-02-23 14:58:30 -08:00
Matthew Voss 6da7d31416 [llvm-profdata] Emit Error when Invalid MemOpSize Section is Created by llvm-profdata
Under certain (currently unknown) conditions, llvm-profdata is outputting
profiles that have two consecutive entries in the MemOPSize section for the
value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch
instruction with two cases for 0. As mentioned, we’re not quite sure what’s
causing this to happen, but this patch prevents llvm-profdata from outputting a
profile that has this problem and gives an error with a request for a
reproducible.

Differential Revision: https://reviews.llvm.org/D92074
2021-02-23 12:51:54 -08:00
Simon Pilgrim 1020d16156 [InstSimplify] Handle nsw shl -> poison patterns
Pulled out from D90479 - this recognises invalid nsw shl patterns with signbit changes that result in poison.

Differential Revision: https://reviews.llvm.org/D97305
2021-02-23 18:26:56 +00:00
Florian Hahn 437f0bbcd5
Revert "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends."
This reverts commit 4efa097eb4, because
some the compilers used for some bots do not support automatic
conversions to PointerUnion.
2021-02-23 16:57:21 +00:00
Florian Hahn 4efa097eb4
[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends.
Generalize the return value of tryToCreateWidenRecipe to return either a
newly create recipe or an existing VPValue. Use this to avoid creating
unnecessary VPBlendRecipes.

Fixes PR44800.
2021-02-23 16:52:03 +00:00
Nate Chandler 01b4890e47 Add @llvm.coro.async.size.replace intrinsic.
The new intrinsic replaces the size in one specified AsyncFunctionPointer with
the size in another.  This ability is necessary for functions which merely
forward to async functions such as those defined for partial applications.

Reviewed By: aschwaighofer

Differential Revision: https://reviews.llvm.org/D97229
2021-02-23 06:43:52 -08:00
David Green dd2dbf7ee2 [TTI] Change getOperandsScalarizationOverhead to take Type args
As a followup to D95291, getOperandsScalarizationOverhead was still
using a VF as a vector factor if the arguments were scalar, and would
assert on certain matrix intrinsics with differently sized vector
arguments. This patch removes the VF arg, instead passing the Types
through directly. This should allow it to more accurately compute the
cost without having to guess at which operands will be vectorized,
something difficult with more complex intrinsics.

This adjusts one SVE test as it is now calling the wrong intrinsic vs
veccall. Without invalid InstructCosts the cost of the scalarized
intrinsic is too low. This should get fixed when the cost of
scalarization is accounted for with scalable types.

Differential Revision: https://reviews.llvm.org/D96287
2021-02-23 13:04:59 +00:00
David Green bd4b61efbd [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.

Instead of trying work with those two values separately this removes the
VF parameter, widening the RetTy/ArgTys by VF used called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

Most backends look like this will be an improvement (or were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-23 13:03:26 +00:00
Matteo Favaro 633e090528
[DSE] Allow ptrs defined in the entry block in IsGuaranteedLoopInvariant.
The **IsGuaranteedLoopInvariant** function is making sure to check if the
incoming pointer is guaranteed to be loop invariant, therefore I think
the case where the pointer is defined in the entry block of a function
automatically guarantees the pointer to be loop invariant, as the entry
block of a function cannot have predecessors or be part of a loop.

I implemented this small patch and tested it using
**ninja check-llvm-unit** and **ninja check-llvm**. I added a contained test
file that shows the problem and used **opt -O3 -debug** on it to make sure
the case is not currently handled (in fact the debug log is showing that
the DSE pass is bailing out when testing if the killer store is able to
clobber the dead store).

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D96979
2021-02-23 12:00:44 +00:00
Juneyoung Lee edf2e96742 [SimplifyCFG] Minor tweaks to the added tests (NFC) 2021-02-23 17:32:28 +09:00
Juneyoung Lee 28be9af0f8 [SimplifyCFG] Add tests for D97244 (NFC) 2021-02-23 17:15:17 +09:00
Anton Afanasyev 5207151cf6 [SLP][Test] Add test for PR49081.ll 2021-02-23 09:37:42 +03:00
Juneyoung Lee 481c62277d [BuildLibCalls] Add noundef to allocator fns' size
This is a patch to explicitly mark the size parameter of allocator functions like malloc/realloc/... as noundef.

For C/C++: undef can be created from reading an uninitialized variable or padding.
Calling a function with uninitialized variable is already UB.
Calling malloc with padding value is.. something that's not expected. Padding bits may appear in a coerced aggregate, which doesn't apply to malloc's size.
Therefore, malloc's size can be marked as noundef.

For transformations that introduce malloc/realloc/..: I ran LLVM unit tests with an updated Alive2 semantics, and found no regression, so it seems okay.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97045
2021-02-23 13:58:03 +09:00
Craig Topper 89440df64a [ValueTracking] Improve ComputeNumSignBits for SRem.
The result will have the same sign as the dividend unless the
result is 0. The magnitude of the result will always be less
than or equal to the dividend. So the result will have at least
as many sign bits as the dividend.

Previously we would do this if the divisor was a positive constant,
but that isn't required.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97170
2021-02-22 14:36:25 -08:00
Petr Hosek c24b7a16b1 [InstrProfiling] Use ELF section groups for counters, data and values
__start_/__stop_ references retain C identifier name sections such as
__llvm_prf_*. Putting these into a section group disables this logic.

The ELF section group semantics ensures that group members are retained
or discarded as a unit. When a function symbol is discarded, this allows
allows linker to discard counters, data and values associated with that
function symbol as well.

Note that `noduplicates` COMDAT is lowered to zero-flag section group in
ELF. We only set this for functions that aren't already in a COMDAT and
for those that don't have available_externally linkage since we already
use regular COMDAT groups for those.

Differential Revision: https://reviews.llvm.org/D96757
2021-02-22 14:00:02 -08:00
Alexey Bataev 9a4dd4de9d [SLP]No need to mark scatter load pointer as scalar as it gets vectorized.
Pointer operand of scatter loads does not remain scalar in the tree (it
gest vectorized) and thus must not be marked as the scalar that remains
scalar in vectorized form.

Differential Revision: https://reviews.llvm.org/D96818
2021-02-22 11:58:28 -08:00
Petr Hosek 4827492d9f Revert "[InstrProfiling] Use ELF section groups for counters, data and values"
This reverts commits:
5ca21175e0
97184ab99c

The instrprof-gc-sections.c is failing on AArch64 LLD bot.
2021-02-22 11:13:55 -08:00
Florian Hahn 95daec6a84
[ConstraintElimination] Use unsigned > 0 instead of != 0.
ICMP_NE predicates cannot be directly represented as constraint. But we
can use ICMP_UGT instead ICMP_NE for %x != 0.

See https://alive2.llvm.org/ce/z/XlLCsW
2021-02-22 17:54:36 +00:00
Nikita Popov 4125afc357 [MemCpyOpt] Fix handling of readnone byval arguments
If the call is readnone, then there may not be any MemoryAccess
associated with the call. Bail out in that case.

This fixes the issue reported at
https://reviews.llvm.org/D94376#2578312.
2021-02-22 18:48:31 +01:00
Nikita Popov 5e7e499b91 [JumpThreading] Clone noalias.scope.decl when threading blocks
When cloning instructions during jump threading, also clone and
adapt any declared scopes. This is primarily important when
threading loop exits, because we'll end up with two dominating
scope declarations in that case (at least after additional loop
rotation). This addresses a loose thread from
https://reviews.llvm.org/rG2556b413a7b8#975012.

Differential Revision: https://reviews.llvm.org/D97154
2021-02-22 18:35:30 +01:00
Florian Hahn 784c70d704
[ConstraintElimination] Add initial ICMP_NE test cases. 2021-02-22 17:11:04 +00:00
Florian Hahn c7ee57f1dc
[LV] Directly use incoming value for single VPBlendRecipes.
VPBlendRecipes with single incoming (value, mask) pair are no-ops. Use
the incoming value directly.
2021-02-22 16:10:08 +00:00
Simon Pilgrim 19084887d9 [InstCombine] Add PR45977 test coverage 2021-02-22 12:20:35 +00:00
Simon Pilgrim 106b63de3a [InstCombine] Add smulo NumSignBits test from D97170 2021-02-22 10:25:13 +00:00
Florian Hahn 15a74b64df
[VPlan] Manage pairs of incoming (VPValue, VPBB) in VPWidenPHIRecipe.
This patch extends VPWidenPHIRecipe to manage pairs of incoming
(VPValue, VPBasicBlock) in the VPlan native path. This is made possible
because we now directly manage defined VPValues for recipes.

By keeping both the incoming value and block in the recipe directly,
code-generation in the VPlan native path becomes independent of the
predecessor ordering when fixing up non-induction phis, which currently
can cause crashes in the VPlan native path.

This fixes PR45958.

Reviewed By: sguggill

Differential Revision: https://reviews.llvm.org/D96773
2021-02-22 09:44:25 +00:00
Petr Hosek 5ca21175e0 [InstrProfiling] Use ELF section groups for counters, data and values
__start_/__stop_ references retain C identifier name sections such as
__llvm_prf_*. Putting these into a section group disables this logic.

The ELF section group semantics ensures that group members are retained
or discarded as a unit. When a function symbol is discarded, this allows
allows linker to discard counters, data and values associated with that
function symbol as well.

Note that `noduplicates` COMDAT is lowered to zero-flag section group in
ELF. We only set this for functions that aren't already in a COMDAT and
for those that don't have available_externally linkage since we already
use regular COMDAT groups for those.

Differential Revision: https://reviews.llvm.org/D96757
2021-02-21 16:13:06 -08:00
Sanjay Patel e772618f1e [InstCombine] fold fdiv with exp/exp2 divisor (PR49147)
Follow-up to:
D96648 / b40fde062
...for the special-case base calls.

From the earlier commit:
This is unusual in the general (non-reciprocal) case because we need
an extra instruction, but that should be better for general FP
reassociation and codegen. We conservatively check for "arcp" FMF
here as we do with existing fdiv folds, but it is not strictly
necessary to have that.
2021-02-20 16:02:58 -05:00
Sanjay Patel fbca27bf29 [InstCombine] add tests for fdiv of exp/exp2; NFC 2021-02-20 16:02:58 -05:00
Teresa Johnson fde55a9c9b [LTO] Fix cloning of llvm*.used when splitting module
Refines the fix in 3c4c205060 to only
put globals whose defs were cloned into the split regular LTO module
on the cloned llvm*.used globals. This avoids an issue where one of the
attached values was a local that was promoted in the original module
after the module was cloned. We only need to have the values defined in
the new module on those globals.

Fixes PR49251.

Differential Revision: https://reviews.llvm.org/D97013
2021-02-20 09:46:43 -08:00
Juneyoung Lee 3b8cfef486 [InstCombine] Add more tests to nonnull-select.ll (NFC) 2021-02-20 16:59:52 +09:00
Dávid Bolvanský cd54c57919 Reland "[Libcalls, Attrs] Annotate libcalls with noundef"
Fixed Clang tests.
2021-02-20 06:18:48 +01:00
Juneyoung Lee aacf7878bc [ValueTracking] Improve impliesPoison
This patch improves ValueTracking's impliesPoison(V1, V2) to do this reasoning:

```
  %res = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
  %overflow = extractvalue { i64, i1 } %res, 1
  %mul      = extractvalue { i64, i1 } %res, 0

	; If %mul is poison, %overflow is also poison, and vice versa.
```

This improvement leads to supporting this optimization under `-instcombine-unsafe-select-transform=0`:

```
define i1 @test2_logical(i64 %a, i64 %b, i64* %ptr) {
; CHECK-LABEL: @test2_logical(
; CHECK-NEXT:    [[MUL:%.*]] = mul i64 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT:    [[TMP1:%.*]] = icmp ne i64 [[A]], 0
; CHECK-NEXT:    [[TMP2:%.*]] = icmp ne i64 [[B]], 0
; CHECK-NEXT:    [[OVERFLOW_1:%.*]] = and i1 [[TMP1]], [[TMP2]]
; CHECK-NEXT:    [[NEG:%.*]] = sub i64 0, [[MUL]]
; CHECK-NEXT:    store i64 [[NEG]], i64* [[PTR:%.*]], align 8
; CHECK-NEXT:    ret i1 [[OVERFLOW_1]]
;

  %res = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
  %overflow = extractvalue { i64, i1 } %res, 1
  %mul = extractvalue { i64, i1 } %res, 0
  %cmp = icmp ne i64 %mul, 0
  %overflow.1 = select i1 %overflow, i1 true, i1 %cmp
  %neg = sub i64 0, %mul
  store i64 %neg, i64* %ptr, align 8
  ret i1 %overflow.1
}
```

Previously, this didn't happen because the flag prevented `select i1 %overflow, i1 true, i1 %cmp` from being `or i1 %overflow, %cmp`.
Note that the select -> or conversion happens only when `impliesPoison(%cmp, %overflow)` returns true.
This improvement allows `impliesPoison` to do the reasoning.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96929
2021-02-20 13:22:34 +09:00
Dávid Bolvanský 94d034fb86 Revert "[Libcalls, Attrs] Annotate libcalls with noundef"
This reverts commit 33b0c63775. Bots are failing. Some Clang tests need to be updated too.
2021-02-20 04:18:42 +01:00
Dávid Bolvanský 33b0c63775 [Libcalls, Attrs] Annotate libcalls with noundef
I think we can use here same logic as for nonnull.

strlen(X) - X must be noundef => valid pointer.

for libcalls with size arg, we add noundef only if size is known and greater than 0 - so pointers must be noundef (valid ones)

Reviewed By: jdoerfert, aqjune

Differential Revision: https://reviews.llvm.org/D95122
2021-02-20 04:10:07 +01:00
Dávid Bolvanský 68e6025cf7 Revert "[BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly"
This reverts commit 05d891a19e.
2021-02-20 03:58:53 +01:00
Dávid Bolvanský 05d891a19e [BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94850
2021-02-20 03:56:01 +01:00
Sanjay Patel 5b250a27ec [Analysis][LoopVectorize] do not form reductions of pointers
This is a fix for https://llvm.org/PR49215 either before/after
we make a verifier enhancement for vector reductions with D96904.

I'm not sure what the current thinking is for pointer math/logic
in IR. We allow icmp on pointer values. Therefore, we match min/max
patterns, so without this patch, the vectorizer could form a vector
reduction from that sequence.

But the LangRef definitions for min/max and vector reduction
intrinsics do not allow pointer types:
https://llvm.org/docs/LangRef.html#llvm-smax-intrinsic
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-umax-intrinsic

So we would crash/assert at some point - either in IR verification,
in the cost model, or in codegen. If we do want to allow this kind
of transform, we will need to update the LangRef and all of those
parts of the compiler.

Differential Revision: https://reviews.llvm.org/D97047
2021-02-19 14:01:57 -05:00
Philip Reames 4a5edea193 [SCEV] Use both known bits and sign bits when computing range of SCEV unknowns
When computing a range for a SCEVUnknown, today we use computeKnownBits for unsigned ranges, and computeNumSignBots for signed ranges. This means we miss opportunities to improve range results.

One common missed pattern is that we have a signed range of a value which CKB can determine is positive, but CNSB doesn't convey that information. The current range includes the negative part, and is thus double the size.

Per the removed comment, the original concern which delayed using both (after some code merging years back) was a compile time concern. CTMark results (provided by Nikita, thanks!) showed a geomean impact of about 0.1%. This doesn't seem large enough to avoid higher quality results.

Differential Revision: https://reviews.llvm.org/D96534
2021-02-19 08:29:12 -08:00
David Green a1c34a9d6a [ARM] Correct vector predicate type in MVE getCmpSelInstrCost 2021-02-19 14:43:51 +00:00
Florian Hahn edc92a1c42
[LV] Remove VPCallback.
Now that all state for generated instructions is managed directly in
VPTransformState, VPCallBack is no longer needed. This patch updates the
last use of `getOrCreateScalarValue` to instead manage the value
directly in VPTransformState and removes VPCallback.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D95383
2021-02-19 12:50:41 +00:00
Nikita Popov 2f17ed294f [DCE] Don't remove non-willreturn calls
In both ADCE and BDCE (via DemandedBits) we should not remove
instructions that are not guaranteed to return. This issue was
pointed out by fhahn in the recent llvm-dev thread.

Differential Revision: https://reviews.llvm.org/D96993
2021-02-19 12:35:40 +01:00
Xun Li 3bf8f162a0 [Coroutine] Relax CoroElide musttail check
As discussed in D94834, we don't really need to do complicated analysis. It's safe to just drop the tail call attribute.

Differential Revision: https://reviews.llvm.org/D96926
2021-02-18 19:36:11 -08:00
Wei Mi 5fb65c02ca [SampleFDO] Stop repeated indirect call promotion for the same target.
Found a problem in indirect call promotion in sample loader pass. Currently
if an indirect call is promoted for a target, and if the parent function is
inlined into some other function, the indirect call can be promoted for the
same target again. That is redundent which can harm performance and can cause
excessive compile time in some extreme case.

The patch fixes the issue. If a target is promoted for an indirect call, the
patch will write ICP metadata with the target call count being set to 0.
In the later ICP in sample profile loader, if it sees a target has 0 count
for an indirect call, it knows the target has been promoted and won't do
indirect call promotion for the indirect call.

The fix brings 0.1~0.2% performance on our search benchmark.

Differential Revision: https://reviews.llvm.org/D96806
2021-02-18 17:01:32 -08:00
Nikita Popov 4045ad6b0c [DCE] Add tests for non-willreturn function being removed (NFC) 2021-02-18 21:56:20 +01:00
Ta-Wei Tu 0c087a6c85 Remove redundent types in pr49185.ll 2021-02-19 04:44:27 +08:00
Ta-Wei Tu f70cdc5b5c [NPM] Properly reset parent loop after loop passes
This fixes https://bugs.llvm.org/show_bug.cgi?id=49185

When `NDEBUG` is not set, `LPMUpdater` checks if the added loops have the same parent loop as the current one in `addSiblingLoops`.
If multiple loop passes are executed through `LoopPassManager`, `U.ParentL` will be the same across all passes.
However, the parent loop might change after running a loop pass, resulting in assertion failures in subsequent passes.

This patch resets `U.ParentL` after running individual loop passes in `LoopPassManager`.

Reviewed By: asbirlea, ychen

Differential Revision: https://reviews.llvm.org/D96727
2021-02-19 02:50:53 +08:00
Philip Reames 8666463889 [instcombine] Exploit UB implied by nofree attributes
This patch simply implements the documented UB of the current nofree attributes as specified. It doesn't try to be fancy about inference (yet), it just implements the cases already specified and inferred.

Note: When this lands, it may expose miscompiles. If so, please revert and provide a test case. It's likely the bug is in the existing inference code and without a relatively complete test case, it will be hard to debug.

Differential Revision: https://reviews.llvm.org/D96349
2021-02-18 08:34:22 -08:00
David Green 1a6744e3dc [ARM] Add larger than legal ICmp costs
A v8i32 compare will produce a v8i1 predicate, but during codegen the
v8i32 will be split into two v4i32, potentially requiring two v4i1
predicates to be merged into a single v8i1. Because this merging of two
v4i1's into a v8i1 is very expensive, we need to make the cost of the
compare equally high.

This patch adds the cost of that to ARMTTIImpl::getCmpSelInstrCost.
Because we don't know whether the user of the predicate can be split,
and the cost model is mostly pre-instruction, we may be pessimistic but
that should only be for larger and legal types. This also adds min/max
detection to the costmodel where it can be detected, to keep those in
line with the cost of simple min/max instructions. Otherwise for the
most part, costs that were already expensive have become more expensive.

Differential Revision: https://reviews.llvm.org/D96692
2021-02-18 11:42:17 +00:00
Florian Hahn 059cfe3093
[FuncAttrs] Add tests for willreturn callsite inference. 2021-02-18 11:41:42 +00:00
David Green 1fbb3287fc [ARM] MVE ICmp costing tests. NFC 2021-02-18 10:50:34 +00:00
Juneyoung Lee 5260873c3b [InstCombine] add tests for simplification of logical and/ors (NFC) 2021-02-18 17:42:06 +09:00
Joseph Huber c3a3d20093 [LV] Add analysis remark for mixed precision conversions
Floating point conversions inside vectorized loops have performance
implications but are very subtle. The user could specify a floating
point constant, or call a function without realizing that it will
force a change in the vector width. An example of this behaviour is
seen in https://godbolt.org/z/M3nT6c . The vectorizer should indicate
when this happens becuase it is most likely unintended behaviour.

This patch adds a simple check for this behaviour by following floating
point stores in the original loop and checking if a floating point
conversion operation occurs.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D95539
2021-02-17 21:37:08 -05:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
William S. Moses 892d2822b6 [SROA] Amend failing test from D95826 2021-02-17 13:58:34 -05:00
William S. Moses 40862b1a74 [SROA] Propagate correct TBAA/TBAA Struct offsets
SROA does not correctly account for offsets in TBAA/TBAA struct metadata.
This patch creates functionality for generating new MD with the corresponding
offset and updates SROA to use this functionality.

Differential Revision: https://reviews.llvm.org/D95826
2021-02-17 11:59:00 -05:00
Ta-Wei Tu 0eeaec2a6d [NFC] Refactor LoopInterchange into a loop-nest pass
This is the preliminary patch of converting `LoopInterchange` pass to a loop-nest pass and has no intended functional change.
Changes that are not loop-nest related are split to D96650.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D96644
2021-02-18 00:55:38 +08:00
Sanjay Patel 85294703a7 [InstCombine] fold fcmp-of-copysign idiom
As discussed in:
https://llvm.org/PR49179
...this pattern shows up in library code.
There are several potential generalizations as noted,
but we need to be careful that we get FP special-values
right, and it's not clear how much variation we should
expect to see from this exact idiom.
2021-02-17 10:32:33 -05:00
Sanjay Patel 236f82c640 [InstCombine] add tests for fcmp-of-copysign; NFC
https://llvm.org/PR49179
2021-02-17 10:32:33 -05:00
Arnold Schwaighofer 627cfd4394 [coro async] Don't promote allocas to the frame or rewrite swifterror if there are no suspend points
Also don't call function to update the call graph if there are no
clones. The function will fail.

rdar://74277860

Differential Revision: https://reviews.llvm.org/D96620
2021-02-16 09:05:38 -08:00
Ta-Wei Tu 6b612a7baf [NFC][LoopInterchange] Explicitly pass both `InnerLoop` and `OuterLoop` to `processLoop`
This is a split patch of D96644.

Explicitly pass both `InnerLoop` and `OuterLoop` to function `processLoop` to remove the need to swap elements in loop list and allow making loop list an `ArrayRef`.
Also, fix inconsistent spellings of `OuterLoopId` and `Inner Loop Id` in debug log.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D96650
2021-02-16 22:17:44 +08:00
Kerry McLaughlin ba1e150d03 [SVE] Add support for scalable vectorization of loops with int/fast FP reductions
This patch enables scalable vectorization of loops with integer/fast reductions, e.g:

```
unsigned sum = 0;
for (int i = 0; i < n; ++i) {
  sum += a[i];
}
```

A new TTI interface, isLegalToVectorizeReduction, has been added to prevent
reductions which are not supported for scalable types from vectorizing.
If the reduction is not supported for a given scalable VF,
computeFeasibleMaxVF will fall back to using fixed-width vectorization.

Reviewed By: david-arm, fhahn, dmgreen

Differential Revision: https://reviews.llvm.org/D95245
2021-02-16 13:50:06 +00:00
Florian Hahn 54a14c264a
[VPlan] Manage scalarized values using VPValues.
This patch updates codegen to use VPValues to manage the generated
scalarized instructions.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D92285
2021-02-16 09:04:10 +00:00
Akira Hatanaka 32dc79c5ef [ObjC][ARC] Do not perform code motion on precise release calls
This fixes a bug where an object can get deallocated before reaching the
end of its full formal lifetime.

rdar://72110887
rdar://74123176
2021-02-15 17:39:37 -08:00
Sanjay Patel 378941f611 [ValueTracking] add scan limit for assumes
In the motivating example from https://llvm.org/PR49171 and
reduced test here, we would unroll and clone assumes so much
that compile-time effectively became infinite while analyzing
all of those assumes.
2021-02-15 15:24:20 -05:00
Kerry McLaughlin 5fe1593438 [LoopVectorizer] Require no-signed-zeros-fp-math=true for fmin/fmax
Currently, setting the `no-nans-fp-math` attribute to true will allow
loops with fmin/fmax to vectorize, though we should be requiring that
`no-signed-zeros-fp-math` is also set.

This patch adds the check for no-signed-zeros at the function level and includes
tests to make sure we don't vectorize functions with only one of the attributes
associated.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D96604
2021-02-15 13:47:05 +00:00
Caroline Concatto 2d728bbff5 [CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse
This patch adds  a new intrinsic experimental.vector.reduce that takes a single
vector and returns a vector of matching type but with the original lane order
 reversed. For example:

```
vector.reverse(<A,B,C,D>) ==> <D,C,B,A>
```

The new intrinsic supports fixed and scalable vectors types.
The fixed-width vector relies on shufflevector to maintain existing behaviour.
Scalable vector uses the new ISD node - VECTOR_REVERSE.

This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker (@paulwalker-arm).

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Differential Revision: https://reviews.llvm.org/D94883
2021-02-15 13:39:43 +00:00
Max Kazantsev e3c759bd58 [LoopLoadElim] Pass ScalarEvolution in old pass manager. PR49141
Loop canonicalization may end up deleting blocks from CFG. And
Scalar Evolution may still keep cached referenced to those blocks
unless updated properly.
2021-02-15 18:08:23 +07:00
David Green 08c09bb89f [ARM] Move PhaseOrdering test to the correct place. NFC 2021-02-14 18:43:39 +00:00
Florian Hahn 3df5d5aace [ConstraintElimination] Fix variables used for pattern matching.
Re-using the matched variable in the pattern does not work as expected.
This patch fixes that by introducing a new variable for the 2nd level
match.
2021-02-14 18:42:37 +00:00
David Green bc2e843839 [ARM] A couple of small MVE reduction tests from intrinsics. NFC
Also added a PhaseOrdering test, to make sure they are not broken by
VectorCombine cost changes.
2021-02-14 18:26:22 +00:00
aqjune 5f3c99085d [ValueTracking] Dereferenced pointers are noundef
This is a follow-up of D95238's LangRef update.
This patch updates `programUndefinedIfUndefOrPoison(V)` to return true if
`V` is used by any memory-accessing instruction.
Interestingly, this affected many tests in Attributors, mainly about adding noundefs.
The tests are updated using llvm/utils/update_test_checks.py. I checked that the diffs
are about updating noundefs.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96642
2021-02-14 22:50:48 +09:00
Sanjay Patel b40fde062c [InstCombine] fold fdiv with pow divisor (PR49147)
This is unusual in the general (non-reciprocal) case because we need
an extra instruction, but that should be better for general FP
reassociation and codegen. We conservatively check for "arcp" FMF
here as we do with existing fdiv folds, but it is not strictly
necessary to have that.

This is part of solving:
https://llvm.org/PR49147
(The powi variant potentially has a different constraint.)

Differential Revision: https://reviews.llvm.org/D96648
2021-02-14 08:07:36 -05:00
Juneyoung Lee 2c5c7d5feb [InstCombine] Add nonnull(select c, null, p) tests (NFC) 2021-02-14 22:03:41 +09:00
Juneyoung Lee ed253ef772 [LoopVectorize] Fix VPRecipeBuilder::createEdgeMask to correctly generate the mask
This patch fixes pr48832 by correctly generating the mask when a poison value is involved.

Consider this CFG (which is a part of the input):

```
for.body:                                         ; preds = %for.cond
  br i1 true, label %cond.false, label %land.rhs

land.rhs:                                         ; preds = %for.body
  br i1 poison, label %cond.end, label %cond.false

cond.false:                                       ; preds = %for.body, %land.rhs
  br label %cond.end

cond.end:                                         ; preds = %land.rhs, %cond.false
  %cond = phi i32 [ 0, %cond.false ], [ 1, %land.rhs ]

```

The path for.body -> land.rhs -> cond.end should be taken when 'select i1 false, i1 poison, i1 false' holds (which means it's never taken); but VPRecipeBuilder::createEdgeMask was emitting 'and i1 false, poison' instead.
The former one successfully blocks poison propagation whereas the latter one doesn't, making the condition poison and thus causing the miscompilation.

SimplifyCFG has a similar bug (which didn't expose a real-world bug yet), and a patch for this is also ongoing (see https://reviews.llvm.org/D95026).

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D95217
2021-02-14 21:12:34 +09:00
Sanjay Patel b45fd233ad [InstCombine] add tests for pow() divisor; NFC 2021-02-13 13:04:38 -05:00
Tyker 642e9225c6 reland [InstCombine] convert assumes to operand bundles
Instcombine will convert the nonnull and alignment assumption that use the boolean condtion
to an assumption that uses the operand bundles when knowledge retention is enabled.

Differential Revision: https://reviews.llvm.org/D82703
2021-02-13 13:03:11 +01:00
Juneyoung Lee debaf942cf [InstSimplify] add tests that look into pointer operands of instructions 2021-02-13 16:25:27 +09:00
Arnold Schwaighofer e760ec2a01 [coro] Add support for polymorphic return typed coro.suspend.async
This allows for suspend point specific resume function types.

Return values from a suspend point can therefore be modelled as
arguments to the resume function. Allowing for directly passed return
types.

Differential Revision: https://reviews.llvm.org/D96136
2021-02-12 10:08:00 -08:00
Akira Hatanaka ed4718eccb [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-12 09:51:57 -08:00
Kerry McLaughlin fea06efe7c [SVE][LoopVectorize] Support for vectorization of loops with function calls
Changes `getScalarizationOverhead` to return an invalid cost for scalable VFs
and adds some simple tests for loops containing a function for which
there is a vectorized variant available.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D96356
2021-02-12 13:47:43 +00:00
Sanjay Patel 79b1b4a581 [Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics
The vector reduction intrinsics started life as experimental ops, so backend support
was lacking. As part of promoting them to 1st-class intrinsics, however, codegen
support was added/improved:
D58015
D90247

So I think it is safe to now remove this complication from IR.

Note that we still have an IR-level codegen expansion pass for these as discussed
in D95690. Removing that is another step in simplifying the logic. Also note that
x86 was already unconditionally forming reductions in IR, so there should be no
difference for x86.

I spot checked a couple of the tests here by running them through opt+llc and did
not see any asm diffs.

If we do find functional differences for other targets, it should be possible
to (at least temporarily) restore the shuffle IR with the ExpandReductions IR
pass.

Differential Revision: https://reviews.llvm.org/D96552
2021-02-12 08:13:50 -05:00
Hongtao Yu 0eed2b1a3c Remove test code that cause MSAN failure.
Summary:
The negative test (with the feature being added disabled) caused MSAN failure and that's the added feature is supposed to fix. Therefore the negative test code is being removed.
2021-02-11 14:51:55 -08:00
Hongtao Yu de40f6d623 [CSSPGO] Process functions in a top-down order on a dynamic call graph.
Functions are currently processed by the sample profiler loader in a top-down order defined by the static call graph. The order is being adjusted to be a top-down order based on the input context-sensitive profile. One benefit is that the processing order of caller and callee in one SCC would follow the context order in the profile to favor more inlining. Another benefit is that the processing order of caller and callee through an indirect call (which is not on the static call graph) can be honored which in turn allows for more inlining.

The profile top-down order for SCC is also extended to support non-CS profiles.

Two switches `-mllvm -use-profile-indirect-call-edges` and `-mllvm -use-profile-top-down-order` are being introduced.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95988
2021-02-11 12:36:59 -08:00
Guillaume Chatelet d06ab79816 Encode alignment attribute for `atomicrmw`
This is a follow up patch to D83136 adding the align attribute to `atomicwmw`.

Differential Revision: https://reviews.llvm.org/D83465
2021-02-11 15:17:37 -05:00
Sanjay Patel 6ef8473015 [InstCombine] add tests for disguised mul ops; NFC 2021-02-11 13:39:52 -05:00
Florian Hahn d5387ec267
[LV] Add tests showing suboptimal vectorization for narrow types.
This patch adds additional test cases showing missing/sub-optimal
vectorization for loops which contain small and wider memory ops on
AArch64.
2021-02-11 17:24:28 +00:00
Markus Lavin 9498315c9b Expand masked mem intrinsics correctly wrt big-endian
Need to take endianness into account when doing vector to scalar casts
such as %bc = bitcast <8 x i1> %v to i8
Companion commit for https://reviews.llvm.org/D94867
Upload in response to
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147862.html
Attempting to document the actual memory layout rules for vectors in
https://reviews.llvm.org/D94964

Differential Revision: https://reviews.llvm.org/D94765
2021-02-11 08:59:52 +00:00
Hongtao Yu 3a5f8a3ea3 [CSSPGO] Restrict pseudo probe tests to x86_64 only. 2021-02-10 14:41:10 -08:00
Hongtao Yu 1cb47a063e [CSSPGO] Unblock optimizations with pseudo probe instrumentation.
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:

1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e, opeq transform
4. MIR function argument copy elision
5. IR stack protection. (though not perf-critical but nice to have).

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95982
2021-02-10 12:43:17 -08:00
Adrian Prantl 19fc8eede4 Add missing nullptr check.
salvageDebugInfoImpl() may fail and return a nullptr.
2021-02-10 12:15:24 -08:00
Sanjay Patel 6e2053983e [InstCombine] fold lshr(mul X, SplatC), C2
This is a special-case multiply that replicates bits of
the source operand. We need this fold to avoid regression
if we make canonicalization to `mul` more aggressive for
shl+or patterns.

I did not see a way to make Alive generalize the bit width
condition for even-number-of-bits only, but an example of
the proof is:
  Name: i32
  Pre: isPowerOf2(C1 - 1) && log2(C1) == C2 && (C2 * 2 == width(C2))
  %m = mul nuw i32 %x, C1
  %t = lshr i32 %m, C2
  =>
  %t = and i32 %x, C1 - 2

  Name: i14
  %m = mul nuw i14 %x, 129
  %t = lshr i14 %m, 7
  =>
  %t = and i14 %x, 127

https://rise4fun.com/Alive/e52
2021-02-10 15:02:31 -05:00
Sanjay Patel 6bcc1fd461 [InstCombine] add tests for lshr with mul; NFC 2021-02-10 15:02:31 -05:00
Sam Parker 9d81ccc02f [WebAssembly] Enable loop unrolling
Enable partial and runtime unrolling with a threshold of 30, which
was derived from a large number of kernels running on node and
wasmtime for amd64 and aarch64.

Unrolling is enabled by default at -O2 and -O3 and is disabled at
-Oz and -Os. Compiling with -Os is recommended if the wasm binary
size is the most important factor.

Differential Revision: https://reviews.llvm.org/D95125
2021-02-10 08:25:46 +00:00
Tyker 5652e192fc Revert "[InstCombine] convert assumes to operand bundles"
This reverts commit 5eb2e994f9.
2021-02-10 01:32:00 +01:00
Johannes Doerfert 81429abd83 [Attributor][FIX] Do not create UB by introducing a `noundef undef`
This was reported as PR49104. The reproducer uses varargs but the issue
is the same, we know an argument is dead but can't change the signature
for some reason. The PR49104 situation was: We are in an CG-SCC
traversal and we remove all the uses of an argument and proof it thereby
dead. However, if we do not remove the argument, via signature rewrite,
we need to ensure that the `undef` we introduce at the call site doesn't
clash with a `noundef` attribute.
2021-02-09 13:02:38 -06:00
Craig Topper 18ff7e045a [RISCV] Make the min and max vector width command line options more consistent and check their relationship to each other. 2021-02-09 10:47:23 -08:00
Tyker 5eb2e994f9 [InstCombine] convert assumes to operand bundles
Instcombine will convert the nonnull and alignment assumption that use the boolean condtion
to an assumption that uses the operand bundles when knowledge retention is enabled.

Differential Revision: https://reviews.llvm.org/D82703
2021-02-09 19:33:53 +01:00
Sanjay Patel 0be0a1237c [ValueTracking] improve analysis for "C << X" and "C >> X"
This is based on the example/comments in:
https://llvm.org/PR48984

I tried just lifting the restriction in computeKnownBitsFromShiftOperator()
as suggested in the bug report, but that doesn't catch all of the cases
shown here. I didn't step through to see exactly why that happened. But it
seems like a reasonable compromise to cheaply check the special-case of
shifting a constant.

There's a slight regression on a cmp transform as noted, but this is likely
the more important/common pattern, so we can fix that icmp pattern later if
needed.

Differential Revision: https://reviews.llvm.org/D95959
2021-02-09 12:38:06 -05:00
Nico Weber de1966e542 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 4a64d8fe39.
Makes clang crash when buildling trivial iOS programs, see comment
after https://reviews.llvm.org/D92808#2551401
2021-02-09 11:06:32 -05:00
Jinsong Ji 9202806241 Revert "[CostModel] Remove VF from IntrinsicCostAttributes"
This reverts commit 502a67dd7f.

This expose a failure in test-suite build on PowerPC,
revert to unblock buildbot first,
Dave will re-commit in https://reviews.llvm.org/D96287.

Thanks Dave.
2021-02-09 02:14:14 +00:00
Hsiangkai Wang a5b07a221a [RISCV] Initial support of LoopVectorizer for RISC-V Vector.
Define an option -riscv-vector-bits-max to specify the maximum vector
bits for vectorizer. Loop vectorizer will use the value to check if it
is safe to use the whole vector registers to vectorize the loop.

It is not the optimum solution for loop vectorizing for scalable vector.
It assumed the whole vector registers will be used to vectorize the code.
If it is possible, we should configure vl to do vectorize instead of
using whole vector registers.

We only consider LMUL = 1 in this patch.

This patch just an initial work for loop vectorizer for RISC-V Vector.

Differential Revision: https://reviews.llvm.org/D95659
2021-02-09 06:32:18 +08:00
Fangrui Song 87dbdd2e3b [FileCheck] Default --allow-unused-prefixes to false
Link: https://lists.llvm.org/pipermail/llvm-dev/2020-October/146162.html "[RFC] FileCheck: (dis)allowing unused prefixes"

If a downstream project using lit needs time for transition,
add the following to `lit.local.cfg`:

```
from lit.llvm.subst import ToolSubst

fc = ToolSubst('FileCheck', unresolved='fatal')
config.substitutions.insert(0, (fc.regex, 'FileCheck --allow-unused-prefixes'))
```

Differential Revision: https://reviews.llvm.org/D95849
2021-02-08 13:37:04 -08:00
Arthur Eubanks 0eda454796 [SimpleLoopUnswitch] Don't non-trivially unswitch loops that are unsafe to clone
Non-trivial unswitching can clone loops.

The legacy -loop-unswitch pass also checks for this.

Fixes PR49085.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D96288
2021-02-08 13:19:24 -08:00
Florian Hahn 68dc90b347
[ConstraintElimination] Decompose a few more GEP indices.
This patch adds handling for zero-extended GEP indices.
2021-02-08 18:06:38 +00:00
Florian Hahn 1f1f037ed3
[ConstraintElimination] Improve index handing during constraint building.
This patch improves the index management during constraint building.
Previously, the code rejected constraints which used values that were not
part of Value2Index, but after combining the coefficients of the new
indices were 0 (if ShouldAdd was 0).

In those cases, no new indices need to be added. Instead of adding to
Value2Index directly, add new indices to the NewIndices map. The caller
can then check if it needs to add any new indices.

This enables checking constraints like `a + x <= a + n` to `x <= n`,
even if there is no constraint for `a` directly.
2021-02-08 13:05:13 +00:00
Florian Hahn ca268ed285
[ConstraintElimination] Decompose zext for unsigned compares.
For unsigned compares, zext should be a no-op and we can add the
extended value to the constraint system.
2021-02-07 20:53:06 +00:00
Florian Hahn a14a59f2f2
[ConstraintElimination] Add additional tests.
Adds additional test cases with zexts, conditions that require temporary
indices.
2021-02-07 18:00:17 +00:00
Sanjay Patel 6fd91be354 [Reassociate] allow or->add with shl operands
As discussed in:
https://llvm.org/PR49055

We invert instcombine's add->or transform here
because it makes it easier to identify factorization
transforms like the mul in the motivating test.

This extends the logic added with:
https://reviews.llvm.org/rG70472f3
https://reviews.llvm.org/rG93f3d7f

(I intentionally kept the formatting fix in this patch
to provide more context about the calling logic.)
2021-02-07 09:45:19 -05:00
Florian Hahn 853c52c988
[ConstraintElimination] Require GEPs to be inbounds for decomposition.
During decomposition, we assume the no-wrap properties of inbound GEPs.
Make sure the decomposed GEP is actually inbounds.
2021-02-07 11:08:53 +00:00
Florian Hahn 14da287e18
[ConstraintElimination] Extend test coverage.
This patch adds a lot of additional tests, focusing on loops and GEP
arithmetic. Some of the tests expose existing problems, which will be
fixed soon.
2021-02-06 21:21:48 +00:00
Johannes Doerfert 378f4e5ec2 [AssumptionCache] Do not track llvm.assume calls (PR49043)
This fixes PR49043 by invalidating the handle on RAUW. This will work
fine assuming all existing RAUW users add the new assumption to the
cache. That means, if a new llvm.assume call replaces an old one, you
need to add the new one now as a RAUW is not enough anymore.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96208
2021-02-06 12:18:30 -06:00