11816 Commits

Author SHA1 Message Date
Nuno Lopes 64af9f61c3 [InstSimplify] add 'x + poison -> poison' (needed for NewGVN) 2021-12-30 11:52:42 +00:00
Fangrui Song b69fe48ccf [IROutliner] Move global namespace cl::opt inside llvm:: 2021-12-30 01:12:55 -08:00
Sanjay Patel 0edf99950e [Analysis] allow caller to choose signed/unsigned when computing constant range
We should not lose analysis precision if an 'add' has both no-wrap
flags (nsw and nuw) compared to just one or the other.

This patch is modeled on a similar construct that was added with
D59386.

I don't think it is possible to expose a problem with an unsigned
compare because of the way this was coded (nuw is handled first).

InstCombine has an assert that fires with the example from:
https://github.com/llvm/llvm-project/issues/52884
...because it was expecting InstSimplify to handle this kind of
pattern with an smax.

Fixes #52884

Differential Revision: https://reviews.llvm.org/D116322
2021-12-28 09:45:37 -05:00
Sanjay Patel 773ab3c665 [Analysis] remove unneeded casts; NFC
The callee does the casting too; this matches a plain call later in the same function for 'shl'.
2021-12-27 13:41:50 -05:00
Nikita Popov ae64c5a0fd [DSE][MemLoc] Handle intrinsics more generically
Remove the special casing for intrinsics in MemoryLocation::getForDest()
and handle them through the general attribute based code. On the DSE
side, this means that isRemovable() now needs to handle more than a
hardcoded list of intrinsics. We consider everything apart from
volatile memory intrinsics and lifetime markers to be removable.

This allows us to perform DSE on intrinsics that DSE has not been
specially taught about, using a matrix store as an example here.

There is an interesting test change for invariant.start, but I
believe that optimization is correct. It only looks a bit odd
because the code is immediate UB anyway.

Differential Revision: https://reviews.llvm.org/D116210
2021-12-24 09:29:57 +01:00
Mehrnoosh Heidarpour 0ff20f2f44 [InstSimplify] Fold logic AND to zero
Adding following fold opportunity:
((A | B) ^ A) & ((A | B) ^ B) --> 0

Reviewed By: spatel, rampitec

Differential Revision: https://reviews.llvm.org/D115755
2021-12-23 10:06:26 -05:00
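The fold above can be sanity-checked by brute force. The following is a minimal Python sketch, not part of the patch: it models A and B as 8-bit integers (an assumed width; the identity is bitwise, so the width should not matter), and the helper name `and_of_xors` is hypothetical.

```python
from itertools import product

def and_of_xors(a: int, b: int) -> int:
    """Evaluate ((A | B) ^ A) & ((A | B) ^ B) on non-negative Python ints."""
    a_or_b = a | b
    # (A | B) ^ A keeps the bits set only in B;
    # (A | B) ^ B keeps the bits set only in A.
    # Those two bit sets are disjoint, so their AND has no bits set.
    return (a_or_b ^ a) & (a_or_b ^ b)

# Exhaustive check over all pairs of 8-bit values.
all_zero = all(and_of_xors(a, b) == 0 for a, b in product(range(256), repeat=2))
print(all_zero)
```

An exhaustive check like this only covers scalar integers; it says nothing about undef or poison elements in vectors.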
Mircea Trofin edf8e3ea5e [NFC][mlgo]Make the test model generator inlining-specific
When looking at building the generator for regalloc, we realized we'd
need quite a bit of custom logic, and that perhaps it'd be easier to
just have each usecase (each kind of mlgo policy) have its own
stand-alone test generator.

This patch just consolidates the old `config.py` and
`generate_mock_model.py` into one file, and does away with
subdirectories under Analysis/models.
2021-12-22 13:38:45 -08:00
Nikita Popov 8a0e35f3a7 [MemoryLocation] Don't require nocapture in getForDest()
As reames mentioned on related reviews, we don't need the nocapture
requirement here. First of all, from an API perspective, this is
not something that MemoryLocation::getForDest() should be checking
in the first place, because it does not affect which memory this
particular call can access; it's an orthogonal concern that should
be handled by the caller if necessary.

However, for both of the motivating users in DSE and InstCombine,
we don't need the nocapture requirement, because the capture can
either be purely local to the call (a pointer identity check that
is irrelevant to us), be part of the return value (which we check
is unused), or be written in the dest location, which we have
determined to be dead.

This allows us to remove the special handling for libcalls as well.

Differential Revision: https://reviews.llvm.org/D116148
2021-12-22 12:20:13 +01:00
Nikita Popov f5ac23b5ae [ArgPromotion][TTI] Pass types to ABI compatibility hook
The areFunctionArgsABICompatible() hook currently accepts a list of
pointer arguments, though what we're actually interested in is the
ABI compatibility after these pointer arguments have been converted
into value arguments.

This means that a) the current API is incompatible with opaque
pointers (because it requires inspection of pointee types) and
b) it can only be used in the specific context of ArgPromotion.
I would like to reuse the API when inspecting calls during inlining.

This patch converts it into an areTypesABICompatible() hook, which
accepts a list of types. This makes the method more generally usable,
and compatible with opaque pointers from an API perspective (the
actual usage in ArgPromotion/Attributor is still incompatible,
I'll follow up on that in separate patches).

Differential Revision: https://reviews.llvm.org/D116031
2021-12-22 09:37:51 +01:00
Serge Pavlov 77b923d0db [ConstantFolding] Do not remove side effect from constrained functions
According to the discussion in https://reviews.llvm.org/D110322, the code
that removes the side effect from a replaced function call is deleted.

Differential Revision: https://reviews.llvm.org/D115870
2021-12-22 13:45:49 +07:00
Nikita Popov 2926d6d335 [ConstantFold][GlobalOpt] Don't create x86_mmx null value
This fixes the assertion failure reported at
https://reviews.llvm.org/D114889#3198921 with a straightforward
check, until the cleaner fix in D115924 can be reapplied.
2021-12-21 09:11:41 +01:00
Kazu Hirata 500c4b68dc [llvm] Construct SmallVector with iterator ranges (NFC) 2021-12-20 23:43:24 -08:00
Philip Reames 44d23d5345 [DSE] Remove calls with known writes to dead memory
This is a reapply of a8a51fe5, which was reverted in 1ba99e due to a failing compiler-rt test. That test was a false positive because it was checking asan failures without accounting for the fact that the call could be validly optimized out. I hopefully managed to stabilize that test in 9b955f. (That's a speculative fix, because the disk consumption needed to build compiler-rt tests locally is absurd.)

Original commit message follows.

The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size.

Merging the code in this way unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet).

In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest.

Differential Revision: https://reviews.llvm.org/D115904
2021-12-20 18:10:23 -08:00
Sanjay Patel a56803b8f8 [Analysis] fix cast in ValueTracking to allow constant expression
The test would crash because a non-instruction negate op made it in here.

Fixes #51506
2021-12-20 17:16:47 -05:00
Sander de Smalen b1ff20fd35 [LV] Enable scalable vectorization by default for SVE cores.
The availability of SVE should be sufficient to enable scalable
auto-vectorization.

This patch adds a new TTI interface to query the target what style of
vectorization it wants when scalable vectors are available. For targets
other than AArch64, this currently defaults to 'FixedWidthOnly'.

Differential Revision: https://reviews.llvm.org/D115651
2021-12-20 16:23:29 +00:00
Nikita Popov aeb36ae0f4 Revert "[ConstantFolding] Unify handling of load from uniform value"
This reverts commit 9fd4f80e33.

This breaks SingleSource/Regression/C/gcc-c-torture/execute/pr19687.c
in test-suite. Either the test is incorrect, or clang is generating
incorrect union initialization code. I've submitted
https://reviews.llvm.org/D115994 to fix the test, assuming my
interpretation is correct. Reverting this in the meantime as it
may take some time to resolve.
2021-12-18 20:46:52 +01:00
Ricky Zhou 9927a06f74 [AA] Handle callbr instructions in alias analysis
Before this change, AAResults::getModRefInfo() was missing a case for
callbr instructions (asm goto), which may read/write memory. In PR52735,
this led to a miscompile where a load was incorrectly eliminated.

Add this missing case, as well as an assert verifying that all
memory-accessing instructions are handled properly.

Fixes #52735.

Differential Revision: https://reviews.llvm.org/D115992
2021-12-18 18:49:17 +01:00
Nikita Popov 1ba99eaf70 Revert "[DSE] Remove calls with known writes to dead memory"
This reverts commit a8a51fe556.

This breaks the strncpy-overflow.cpp test case.
2021-12-18 09:23:41 +01:00
Philip Reames a8a51fe556 [DSE] Remove calls with known writes to dead memory
The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size.

Merging the code in this way unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet).

In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest.

Differential Revision: https://reviews.llvm.org/D115904
2021-12-17 13:42:36 -08:00
Philip Reames 793c0da89e [capturetracking] Explicitly check for callee operand [NFC]
Pull out an explicit check rather than relying on the fact that the callee operand is not a data operand.  The only real value is it gives us a clear place to move the comment, and makes the code slightly more understandable.
2021-12-17 09:21:35 -08:00
Nikita Popov 9fd4f80e33 [ConstantFolding] Unify handling of load from uniform value
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case and b) it bypasses casts that might not be legal generally
but do work with uniform values.

We had multiple implementations of this, with a different set of
supported values each time, as well as incomplete type checks in
some cases. In particular, this fixes the assertion reported in
https://reviews.llvm.org/D114889#3198921, as well as a similar
assertion that could be triggered via constant folding.

Differential Revision: https://reviews.llvm.org/D115924
2021-12-17 17:05:06 +01:00
Momchil Velikov 6192c312cf [AA] Correctly maintain the sign of PartialAlias offset
Preserve the invariant that offset reported in the case of a
`PartialAlias` between `Loc1` and `Loc2`, is such that
`Loc1 + Offset = Loc2`, where `Loc1` and `Loc2` are the first and
the second argument, respectively, in alias queries.

Differential Revision: https://reviews.llvm.org/D115927
2021-12-17 15:45:26 +00:00
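A small sketch of the stated invariant, written in Python rather than LLVM's C++. The function name and the use of plain integer start addresses are assumptions for illustration only; LLVM's alias analysis reasons about IR values, not raw addresses.

```python
def partial_alias_offset(loc1_start: int, loc2_start: int) -> int:
    """Return Offset such that loc1_start + Offset == loc2_start."""
    return loc2_start - loc1_start

# Loc2 begins 4 bytes after Loc1, so the offset is positive.
forward = partial_alias_offset(100, 104)
# Swapping the two query arguments flips the sign of the offset.
backward = partial_alias_offset(104, 100)
print(forward, backward)
```

The point of the sketch is only that the offset is directional: it depends on which location is passed first in the query.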
Florian Hahn f5f421e0ee [SCEV] Apply loop guards in reverse order.
This patch updates applyLoopGuards to first collect all conditions and
then applies them in reverse order. This ensures the SCEVs with the
shortest dependency chains are constructed first, limiting the required
stack size.

This fixes a crash reported in D113578.

Note that the order in which conditions are applied can impact the accuracy of
the result, mostly due to missing min/max simplifications when
constructing SCEVs.

The changed test highlights the impact of the evaluation order. I will
follow up with a SCEV patch to improve min/max simplifications to get
the same results for both orders.
2021-12-16 10:52:37 +00:00
Nikita Popov a8c2ba105d [Inline] Disable deferred inlining
After the switch to the new pass manager, we have observed multiple
instances of catastrophic inlining, where the inliner produces huge
functions with many hundreds of thousands of instructions from small
input IR. We were forced to back out the switch to the new pass
manager for this reason. This patch fixes at least one of the root
cause issues.

LLVM uses a bottom-up inliner, and the fact that functions are processed
bottom-up is not just a question of optimality -- it is an important
requirement to prevent runaway inlining. The premise of the current
inlining approach and cost model is that after all calls inside a function
have been inlined, it may get large enough that inlining it into its
callers is no longer considered profitable. This safeguard does not
exist if inlining doesn't happen bottom-up, as inlining the callees,
and their callees, and their callees etc. will always seem individually
profitable, and the inliner can easily flatten the whole call tree.

There are instances where we necessarily have to deviate from bottom-up
inlining: When inlining in an SCC there is no natural "bottom", so
inlining effectively happens top-down. This requires special care,
and the inliner avoids exponential blowup by ensuring that functions
in the SCC grow in a balanced way and will eventually hit the threshold.

However, there is one instance where the inlining advisor explicitly
violates the bottom-up principle: Deferred inlining tries to "defer"
inlining a call if it determines that inlining the caller into all
its call-sites would be more profitable. Something very important to
understand about deferred inlining is that it doesn't make one inlining
choice in place of another -- it effectively chooses to do both. If we
have a call chain A -> B -> C and cost modelling tells us that inlining
B -> C is profitable, but we defer this and instead inline A -> B first,
then we'll now have a call A -> C, and the cost model will (a few special
cases notwithstanding) still tell us that this is profitable. So the end
result is that we inlined *both* B and C, even though under the usual
cost model function B would have been too large to further inline after
C has been integrated into it.

Because deferred inlining violates the bottom-up invariant of the inliner,
it can result in exponential inlining. The exponential-deferred-inlining.ll
test case illustrates this on a simple example (see
https://gist.github.com/nikic/1262b5f7d27278e1b34a190ae10947f5 for a
much more catastrophic case with about 5000x size blowup). If the call
chain A -> B -> C is not a chain but a tree of calls, then we end up
deferring inlining across the tree and end up flattening everything into
the root node.

This patch proposes to address this by disabling deferred inlining
entirely (currently still behind an option). Beyond the issue of
exponential inlining, I don't think that the whole concept makes sense,
at least as long as deferred inlining still ends up inlining both call
edges.

I believe the motivation for having deferred inlining in the first place
is that you might have a small wrapper function with local linkage that
could be eliminated if inlined. This would automatically happen if there
was a single caller, due to the large "last call to local" bonus. However,
this bonus is not extended if there are multiple callers, even if we
would eventually end up inlining into all of them (if the bonus were
extended).

Now, unlike the normal inlining cost model, the deferred inlining cost
model does look at all callers, and will extend the "last call to local"
bonus if it determines that we could inline all of them as long as we
defer the current inlining decision. This makes very little sense.
The "last call to local" bonus doesn't really cost model anything.
It's basically an "infinite" bonus that ensures we always inline the
last call to a local. The fact that it's not literally infinite just
prevents inlining of huge functions, which can easily result in
scalability issues. I very much doubt that it was an intentional
cost-modelling choice to say that getting rid of a small local function
is worth adding 15000 instructions elsewhere, yet this is exactly how
this value is getting used here.

The main alternative I see to complete removal is to change deferred
inlining to an actual either/or decision. That is, to mark deferred
calls as noinline so we're actually trading off one inlining decision
against another, and not just adding a side-channel to the cost model
to do both.

Apart from fixing the catastrophic inlining case, the effect on rustc
is a modest compile-time improvement on average (up to 8% for a
parsing-type crate, where tree-like calls are expected) and pretty
neutral where run-time performance is concerned (mix of small wins
and losses, usually in the sub-1% category).

Differential Revision: https://reviews.llvm.org/D115497
2021-12-16 09:59:50 +01:00
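The blowup argument can be illustrated with a toy model. This is a hedged Python sketch, not LLVM's cost model: it assumes a complete binary call tree, a fixed per-function body size `own`, and a single size `threshold`; the function names and all numbers are made up for illustration.

```python
def bottom_up_size(depth: int, own: int, threshold: int) -> int:
    """Final size of the root when callees are fully processed before callers.

    A callee is inlined only if its size *after* its own inlining is within
    the threshold, so growth is cut off once a function gets too large.
    """
    if depth == 0:
        return own  # leaf function: no calls to inline
    child = bottom_up_size(depth - 1, own, threshold)
    if child <= threshold:
        return own + 2 * child  # both callees are inlined
    return own  # callees are too large: keep the calls

def top_down_size(depth: int, own: int, threshold: int) -> int:
    """Final size of the root when each callee is judged at its original size.

    Every callee still looks small when it is considered, so the whole
    tree of 2**(depth + 1) - 1 functions ends up in the root.
    """
    if own > threshold:
        return own
    return own * (2 ** (depth + 1) - 1)

small = bottom_up_size(10, own=10, threshold=100)
huge = top_down_size(10, own=10, threshold=100)
print(small, huge)
```

In this model the bottom-up order keeps every function bounded by a small multiple of the threshold, while the order that always sees unprocessed callees flattens the whole tree, which is the behaviour the message attributes to deferred inlining.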
Mircea Trofin db5aceb979 [NFC] Expose the ReleaseModeModelRunner
The type was pretty much generic, just needed a bit of parameterization.

Differential Revision: https://reviews.llvm.org/D115764
2021-12-15 23:21:58 -08:00
Fangrui Song cf9e61a9bb [LTO][WPD] Simplify mustBeUnreachableFunction and test after D115492
A well-formed IR function definition must have an entry basic block and
a well-formed IR basic block must have one terminator so the emptiness
check can be simplified.
Also simplify the test a bit.

Reviewed By: luna

Differential Revision: https://reviews.llvm.org/D115780
2021-12-15 15:43:35 -08:00
Arthur Eubanks 5a81a60391 [NFC] Remove more calls to getAlignment()
These are deprecated and should be replaced with getAlign().

Some of these asserts don't do anything because Load/Store/AllocaInst never have a 0 align value.
2021-12-15 14:40:57 -08:00
Mingming Liu 09a704c5ef [LTO] Ignore unreachable virtual functions in WPD in hybrid LTO.
Differential Revision: https://reviews.llvm.org/D115492
2021-12-14 20:18:04 +00:00
Philip Reames 423f19680a Add FMF to hasPoisonGeneratingFlags/dropPoisonGeneratingFlags
These flags are documented as generating poison values for particular input values. As such, we should really be consistent about their handling with how we handle nsw/nuw/exact/inbounds.

Differential Revision: https://reviews.llvm.org/D115460
2021-12-14 08:43:00 -08:00
Florian Hahn ddfac0759c Revert "[MemoryLocation] Handle memset_pattern{4,8,16} in getForDest."
This reverts commit ac60263ad1.

It looks like the test fails on certain non-Darwin systems, even though
the triple is explicitly set to macos. Revert while I investigate.
2021-12-14 14:48:47 +00:00
Nikita Popov 7abf299fed [InlineAdvisor] Add option to control deferred inlining (NFC)
This change is split out from D115497 to add the option
independently from the switch of the default value.
2021-12-14 15:46:11 +01:00
Florian Hahn ac60263ad1 [MemoryLocation] Handle memset_pattern{4,8,16} in getForDest.
memset_pattern{4,8,16} writes to the first argument. Use getForDest
to return the corresponding MemoryLocation.

Reviewed By: ab

Differential Revision: https://reviews.llvm.org/D114906
2021-12-14 14:41:28 +00:00
Kazu Hirata d2377f24e1 Ensure newlines at the end of files (NFC) 2021-12-12 11:04:44 -08:00
Nikita Popov 9932d4db0d [SCEV] Fix unused variable warning (NFC) 2021-12-11 21:03:54 +01:00
Mircea Trofin 04f2712ef4 [NFC][MLGO] Factor ModelUnderTrainingRunner for reuse
This is so we may reuse it. It was very non-inliner specific already.

Differential Revision: https://reviews.llvm.org/D115465
2021-12-10 11:24:15 -08:00
Nikita Popov 65bec04295 [ConstantFold] Handle same type in ConstantFoldLoadThroughBitcast
Usually the case where the types are the same ends up being handled
fine because it's legal to do a trivial bitcast to the same type.
However, this is not true for aggregate types. Short-circuit the
whole code if the types match exactly to account for this.
2021-12-10 16:39:50 +01:00
Sameer Sahasrabuddhe 1d0244aed7 Reapply CycleInfo: Introduce cycles as a generalization of loops
Reverts 02940d6d22. Fixes breakage in the modules build.

LLVM loops cannot represent irreducible structures in the CFG. This
change introduces the concept of cycles as a generalization of loops,
along with a CycleInfo analysis that discovers a nested
hierarchy of such cycles. This is based on Havlak (1997), Nesting of
Reducible and Irreducible Loops.

The cycle analysis is implemented as a generic template and then
instantiated for LLVM IR and Machine IR. The template relies on a new
GenericSSAContext template which must be specialized when used for
each IR.

This review is a restart of an older review request:
https://reviews.llvm.org/D83094

Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>,
with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>

Differential Revision: https://reviews.llvm.org/D112696
2021-12-10 14:36:43 +05:30
Hasyimi Bahrudin c1cd698a52 [InstSimplify] Simplify bool icmp with not in LHS
Refer to https://llvm.org/PR52546.

Simplifies the following cases:
    not(X) == 0 -> X != 0 -> X
    not(X) <=u 0 -> X >u 0 -> X
    not(X) >=s 0 -> X <s 0 -> X
    not(X) != 1 -> X == 1 -> X
    not(X) <u 1 -> X >=u 1 -> X
    not(X) >s 1 -> X <=s -1 -> X

Differential Revision: https://reviews.llvm.org/D114666
2021-12-09 16:26:46 -05:00
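The listed cases can be checked exhaustively, since an i1 has only two values. The following Python sketch is an illustration, not the patch's code: it assumes a bit `b` is read as `b` when unsigned and as `-b` when signed (so the constant 1 is -1 in signed compares), and the names `signed`, `cases`, and `results` are hypothetical. The fifth case is checked with a strict `<u`, because `not(X) <=u 1` is true for both i1 values and so cannot equal X.

```python
def signed(bit: int) -> int:
    """Signed interpretation of an i1: 0 stays 0, 1 becomes -1."""
    return -bit

# Each predicate receives n = not(X) as an unsigned bit (0 or 1).
cases = {
    "not(X) == 0":  lambda n: n == 0,
    "not(X) <=u 0": lambda n: n <= 0,
    "not(X) >=s 0": lambda n: signed(n) >= 0,
    "not(X) != 1":  lambda n: n != 1,
    "not(X) <u 1":  lambda n: n < 1,
    "not(X) >s 1":  lambda n: signed(n) > signed(1),
}

# A case folds to X if the compare result equals X for both values of X.
results = {
    name: all(int(pred(x ^ 1)) == x for x in (0, 1))
    for name, pred in cases.items()
}

# The non-strict unsigned form is always true, so it does not fold to X.
ule_one_folds = all(int((x ^ 1) <= 1) == x for x in (0, 1))
print(results, ule_one_folds)
```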
Arthur Eubanks 1172712f46 [NFC] Replace some deprecated getAlignment() calls with getAlign()
Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D115370
2021-12-09 08:43:19 -08:00
Nikita Popov 3beafecedf [InlineAdvisor] Remove outdated comment (NFC)
This just returns None nowadays, so this comment doesn't apply
anymore.
2021-12-09 15:11:56 +01:00
Florian Hahn d74a8a78ad [LV] Mark various functions as const (NFC).
Make sure various accessors do not modify any state, in preparation for
D115111.
2021-12-09 10:51:29 +00:00
Mircea Trofin 059e03476c [NFC][mlgo] Generalize model runner interface
This prepares it for the regalloc work. Part of it is making model
evaluation across 'development' and 'release' scenarios more reusable.
This patch:
- extends support to tensors of any shape (not just scalars, like we had
in the inliner -Oz case). While the tensor shape can be anything, we
assume row-major layout and expose the tensor as a buffer.
- exposes the NoInferenceModelRunner, which we use in the 'development'
mode to keep the evaluation code path consistent and simplify logging,
as we'll want to reuse it in the regalloc case.

Differential Revision: https://reviews.llvm.org/D115306
2021-12-08 20:10:58 -08:00
Florian Hahn 3c55acc4a6 [MemoryLocation] Support memset_pattern{4,8} in getForArgument.
memset_pattern{4,8} behave as memset_pattern16, with the only difference
being the size of the pattern location.

Reviewed By: ab

Differential Revision: https://reviews.llvm.org/D114905
2021-12-08 19:39:45 +00:00
Jolanta Jensen 77b2bb5567 [LAA] Use type sizes when determining dependence.
In the isDependence function the code does not try hard enough
to determine the dependence between types. If the types are
different it simply gives up, whereas in fact what we really
care about are the type sizes. I've changed the code to compare
sizes instead of types.

Reviewed By: fhahn, sdesmalen

Differential Revision: https://reviews.llvm.org/D108763
2021-12-08 15:00:58 +00:00
James Farrell 219672b8dd Revert "Revert "Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible.""
This reverts commit 63a6348cad.

Differential Revision: https://reviews.llvm.org/D115254
2021-12-07 23:15:21 +00:00
Jonas Devlieghere 02940d6d22 Revert "CycleInfo: Introduce cycles as a generalization of loops"
This reverts commit 0fe61ecc2c because it
breaks the modules build.

https://green.lab.llvm.org/green/job/clang-stage2-rthinlto/4858/
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/39112/
2021-12-07 13:06:34 -08:00
Sanjay Patel 8a69b04478 [InstSimplify] add logic fold for 'or' with 'xor'+'and'
This replaces the 'or' from 4b30076f16 with an 'and'.
We have to guard against propagating undef elements from
vector 'not' values:
https://alive2.llvm.org/ce/z/irMwRc
2021-12-07 11:08:26 -05:00
Cullen Rhodes 0395e01583 [IR] Split vscale_range interface
Interface is split from:

  std::pair<unsigned, unsigned> getVScaleRangeArgs()

into separate functions for min/max:

  unsigned getVScaleRangeMin();
  Optional<unsigned> getVScaleRangeMax();

Reviewed By: sdesmalen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D114075
2021-12-07 10:38:26 +00:00
Sameer Sahasrabuddhe 0fe61ecc2c CycleInfo: Introduce cycles as a generalization of loops
LLVM loops cannot represent irreducible structures in the CFG. This
change introduces the concept of cycles as a generalization of loops,
along with a CycleInfo analysis that discovers a nested
hierarchy of such cycles. This is based on Havlak (1997), Nesting of
Reducible and Irreducible Loops.

The cycle analysis is implemented as a generic template and then
instantiated for LLVM IR and Machine IR. The template relies on a new
GenericSSAContext template which must be specialized when used for
each IR.

This review is a restart of an older review request:
https://reviews.llvm.org/D83094

Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>,
with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>

Differential Revision: https://reviews.llvm.org/D112696
2021-12-07 12:02:34 +05:30
James Farrell 63a6348cad Revert "Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible."
This reverts commit 5032467034.
2021-12-06 17:35:26 +00:00
Bardia Mahjour dfcfd14070 [VP] getVPMemoryOpCost interface
Added TTI queries for the cost of a VP Memory operation, and added Opcode,
DataType and Alignment to the hasActiveVectorLength() interface.

Reviewed By: Roland Froese

Differential Revision: https://reviews.llvm.org/D109416
2021-12-06 11:27:07 -05:00
James Farrell 5032467034 Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible.
This reverts commit 40d5eeac6c.

Differential Revision: https://reviews.llvm.org/D114885
2021-12-06 14:57:47 +00:00
Kazu Hirata 1457e78352 [llvm] Use range-based for loops (NFC) 2021-12-05 08:33:02 -08:00
Sanjay Patel c65e651e60 [InstSimplify] fix logic fold of 'or' for vectors
Reduce code duplication for commutative pattern matching
and fix a miscompile.

We can't safely propagate an undef element in this transform:
https://alive2.llvm.org/ce/z/s5xy55
2021-12-05 09:57:07 -05:00
Florian Hahn 203f29b40c [MemoryLocation] Use getForArgument in getForSource/getForDest. (NFC)
getForArgument already knows how to extract a memory location for all
memory intrinsics. Use it instead of duplicating the logic.
2021-12-05 11:13:14 +00:00
Florian Hahn a9125792b3 [MemoryLocation] Support missing atomic intrinsics in getForArg.
getForArgument is missing support for atomic memory transfer
intrinsics. In terms of accessed locations they behave like regular
memory transfer intrinsics and we already support them as such in
getForSource/getForDest.
2021-12-04 22:18:39 +00:00
Mehrnoosh Heidarpour e94134052f [InstSimplify] Add logic 'or' fold to -1
Adding the following folding opportunity:
(~A | B) | (A ^ B) --> -1

https://alive2.llvm.org/ce/z/PMtdYB

Differential revision: https://reviews.llvm.org/D114996
2021-12-04 15:04:18 -05:00
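As a quick cross-check of the fold above, here is a minimal brute-force sketch in Python. It assumes 8-bit operands, so -1 is represented as the all-ones mask 0xFF; the names `MASK` and `or_of_not_and_xor` are hypothetical and not taken from the patch.

```python
from itertools import product

MASK = 0xFF  # all-ones value for an assumed 8-bit type, i.e. -1

def or_of_not_and_xor(a: int, b: int) -> int:
    """Evaluate (~A | B) | (A ^ B), truncated to 8 bits."""
    # Where a bit of A is 0, ~A supplies a 1.
    # Where a bit of A is 1, A ^ B is ~B there, and B | ~B is 1.
    return ((~a | b) | (a ^ b)) & MASK

all_ones = all(
    or_of_not_and_xor(a, b) == MASK for a, b in product(range(256), repeat=2)
)
print(all_ones)
```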
Florian Hahn ead3979a92 [MemoryLocation] Move DSE intrinsic handling to MemoryLocation. (NFC)
Suggested in D114872.
2021-12-03 16:00:39 +00:00
David Green ab0c5cea0b [ARM] Use v2i1 for MVE and CDE intrinsics
This adjusts all the MVE and CDE intrinsics now that v2i1 is a legal
type, to use a <2 x i1> as opposed to emulating the predicate with a
<4 x i1>. The v4i1 workarounds have been removed leaving the natural
v2i1 types, notably in vctp64 which now generates a v2i1 type.

AutoUpgrade code has been added to upgrade old IR, which needs to
convert the old v4i1 to a v2i1 by converting it back and forth to an
integer with arm.mve.v2i and arm.mve.i2v intrinsics. These should be
optimized away in the final assembly.

Differential Revision: https://reviews.llvm.org/D114455
2021-12-03 15:27:58 +00:00
Florian Hahn af86aa7980 [MemoryLocation] Use None instead of {}. (NFC) 2021-12-03 13:19:00 +00:00
Florian Hahn f078536f46 [MemoryLocation] Move DSE's logic to new MemLoc::getForDest helper (NFC).
DSE has some extra logic to determine the write location of library
calls like str*cpy and str*cat. This patch moves the logic to a new
MemoryLocation::getForDest variant, which takes a call and TLI.

This patch should be NFC, because no other places take advantage of the
new helper yet.

Suggested by @reames post-commit 7eec832def.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114872
2021-12-03 09:12:01 +00:00
Nikita Popov 49d040ac97 [SCEV] Fix ValuesAtScopesUsers consistency
Fixes verification failure reported at:
https://reviews.llvm.org/rGc9f9be0381d1

The issue is that getSCEVAtScope() might compute a result without
inserting it in the ValuesAtScopes map in degenerate cases,
specifically if the ValuesAtScopes entry is invalidated during the
calculation. Arguably we should still insert the result if no
existing placeholder is found, but for now just tweak the logic
to only update ValuesAtScopesUsers if ValuesAtScopes is updated.
2021-12-03 10:03:10 +01:00
Florian Hahn 829b29b619 [MemoryLocation] strcat/strncat/strcpy read/write after their args.
strcpy/strcat/strncat access memory starting from the passed in
pointers. Construct memory locations for their args using getAfter.

Discussed in D114872.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D114969
2021-12-03 08:48:23 +00:00
Florian Hahn 639a78a4bf [MemoryLocation] Support strncpy in getForArgument.
The size argument of strncpy can be used as bound for the size of
its pointer arguments.

strncpy is guaranteed to write N bytes and reads up to N bytes.

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D114871
2021-12-02 14:18:05 +00:00
Sanjay Patel 97e921c81f [PatternMatch] create and use matcher for 'not' that excludes undef elements
We needed a stricter version of m_Not for D114462, but I wasn't
sure if that was going to be required anywhere else, so I didn't bother
to make that reusable.

It turns out we have one more existing simplification that needs
this (currently miscompiles):
https://alive2.llvm.org/ce/z/9-nTKi

And there's at least one more fold in that family that we could add.

Differential Revision: https://reviews.llvm.org/D114882
2021-12-02 08:51:13 -05:00
Florian Hahn 9f9e8ba114 [MemoryLocation] Support memset_chk in getForArgument.
The size argument for memset_chk is an upper bound for the size of the
pointer argument. memset_chk may write less than the specified length,
if it exceeds the specified max size and aborts.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114870
2021-12-02 13:45:58 +00:00
Florian Hahn ad88a37cea [TLI] Add memset_pattern4, memset_pattern8 lib functions.
Similar to memset_pattern16, memset_pattern4, memset_pattern8 are
available on Darwin platforms.

https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/memset_pattern4.3.html

Reviewed By: ab

Differential Revision: https://reviews.llvm.org/D114881
2021-12-01 21:18:19 +00:00
Nikita Popov 67704801c6 [SCEV] Track backedge taken count users (NFCI)
Track which SCEVs are used as ExactNotTaken counts in
BackedgeTakenInfo structures, so we can directly determine which
loops need to be invalidated, rather than iterating over all BECounts.

This gives a small compile-time improvement on average, but the
motivation here is more to ensure there are no degenerate cases,
if the number of backedge taken counts is large.

Differential Revision: https://reviews.llvm.org/D114784
2021-12-01 10:16:47 +01:00
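The general idea of tracking users so that invalidation does not need a full scan can be sketched as a forward map plus a reverse index. This is a generic Python illustration under assumed names (`BackedgeCountCache`, `record`, `invalidate_expr`), with strings standing in for loops and SCEV expressions; it is not how ScalarEvolution is actually implemented.

```python
from collections import defaultdict

class BackedgeCountCache:
    """Toy cache: loop -> count expression, plus expression -> loops using it."""

    def __init__(self):
        self.count_for_loop = {}             # forward map
        self.loops_using = defaultdict(set)  # reverse index

    def record(self, loop, expr):
        """Store expr as the count for loop and keep the reverse index in sync."""
        old = self.count_for_loop.get(loop)
        if old is not None:
            self.loops_using[old].discard(loop)
        self.count_for_loop[loop] = expr
        self.loops_using[expr].add(loop)

    def invalidate_expr(self, expr):
        """Drop the counts of exactly the loops that use expr; return those loops."""
        loops = self.loops_using.pop(expr, set())
        for loop in loops:
            del self.count_for_loop[loop]
        return loops

cache = BackedgeCountCache()
cache.record("loop1", "n - 1")
cache.record("loop2", "n - 1")
cache.record("loop3", "m")
invalidated = sorted(cache.invalidate_expr("n - 1"))
remaining = dict(cache.count_for_loop)
print(invalidated, remaining)
```

The reverse index makes invalidation proportional to the number of affected loops rather than to the total number of cached counts, which matches the motivation given in the message about avoiding degenerate cases.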
Nikita Popov c9f9be0381 [SCEV] Verify integrity of ValuesAtScopes and users (NFC)
Make sure that ValuesAtScopes and ValuesAtScopesUsers are
consistent during SCEV verification.
2021-11-30 21:08:40 +01:00
Sanjay Patel 4b30076f16 [InstSimplify] add logic fold for 'or'
https://alive2.llvm.org/ce/z/4PaPDy

There's a related fold where the inner 'or' is replaced by 'and',
but that needs to be more careful about matching a 'not'.
2021-11-30 14:08:54 -05:00
Sanjay Patel c49ef1448d [InstSimplify] reduce code duplication for 'or' logic folds; NFC 2021-11-30 14:08:54 -05:00
Sanjay Patel 7a7c059d86 [InstSimplify] reduce code duplication for 'or' logic fold; NFC 2021-11-30 12:55:37 -05:00
Sanjay Patel 8dec0b23da [InstSimplify] refactor 'or' logic folds; NFC
Reduce duplication for handling the top-level commuted operands.
There are several other folds that should be moved in here, but
we need to make sure there's good test coverage.
2021-11-30 12:55:36 -05:00
Nikita Popov 40d5eeac6c Revert "Use VersionTuple for parsing versions in Triple. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible."
This reverts commit 1e82864670.

llvm/test/Transforms/LoopStrengthReduce/X86/2009-11-10-LSRCrash.ll fails
with assertion failure:

llc: /home/nikic/llvm-project/llvm/include/llvm/ADT/Optional.h:196: T& llvm::optional_detail::OptionalStorage<T, true>::getValue() & [with T = unsigned int]: Assertion `hasVal' failed.
...
 #8 0x00005633843af5cb llvm::MCStreamer::emitVersionForTarget(llvm::Triple const&, llvm::VersionTuple const&)
 #9 0x0000563383b47f14 llvm::AsmPrinter::doInitialization(llvm::Module&)
2021-11-30 18:36:32 +01:00
kpyzhov a356dae74c [RegionPass] Added check for -filter-print-funcs option to the region IR dumps.
Differential Revision: https://reviews.llvm.org/D114310
2021-11-30 12:30:15 -05:00
Nikita Popov 37d72991c1 [SCEV] Track and invalidate ValuesAtScopes users
ValuesAtScopes maps a SCEV and a Loop to another SCEV. While we
invalidate entries if the left-hand SCEV is invalidated, we
currently don't do this for the right-hand SCEV. Fix this by
tracking users in a reverse map and using it for invalidation.

This is conceptually the same change as D114738, but using the
reverse map to avoid performance issues.
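
A loose model of the reverse-map idea, written in Python rather than the C++ used by the patch. The class and method names here are hypothetical and only mirror the description above: a forward map from (SCEV, Loop) to a value, plus a reverse "users" map so that invalidating a SCEV also drops entries where it appears on the right-hand side.

```python
from collections import defaultdict


class ValuesAtScopesCache:
    """Toy stand-in for the forward map and its reverse user map."""

    def __init__(self):
        # Forward map: scev -> {loop: value at that loop's scope}.
        self.values_at_scopes = defaultdict(dict)
        # Reverse map: value -> {(loop, scev)} entries that produced it.
        self.users = defaultdict(set)

    def insert(self, scev, loop, value):
        self.values_at_scopes[scev][loop] = value
        self.users[value].add((loop, scev))

    def forget(self, scev):
        # Left-hand side: drop every entry keyed by this SCEV.
        for loop, value in self.values_at_scopes.pop(scev, {}).items():
            self.users[value].discard((loop, scev))
        # Right-hand side: drop every entry whose value is this SCEV.
        for loop, user in self.users.pop(scev, set()):
            entry = self.values_at_scopes.get(user)
            if entry is not None:
                entry.pop(loop, None)
                if not entry:
                    del self.values_at_scopes[user]


cache = ValuesAtScopesCache()
cache.insert("A", "loop1", "B")   # A evaluates to B at loop1
cache.insert("C", "loop1", "A")   # C evaluates to A at loop1
cache.forget("A")                 # must clear both entries
```

Without the reverse map, the second entry (C mapping to the now-stale A) would survive the call to `forget("A")`.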

Differential Revision: https://reviews.llvm.org/D114788
2021-11-30 18:21:14 +01:00
James Farrell 1e82864670 Use VersionTuple for parsing versions in Triple. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible.
See also https://github.com/android/ndk/issues/1455.

Differential Revision: https://reviews.llvm.org/D114163
2021-11-30 15:44:23 +00:00
Nikita Popov 77dd579827 [SCEV] Remove incorrect assert
Fix assertion failure reported on D113349 by removing the assert.
While the produced expression should be equivalent, it may not
be strictly the same, e.g. due to lazy nowrap flag updates. Similar
to what the main createSCEV() code does, simply retain the old
value map entry if one already exists.
2021-11-29 17:09:12 +01:00
Florian Hahn 7b75110fac
[SCEV] Turn validity check in getExistingSCEV into assert (NFC).
Now that we track users of SCEV expressions, we should be able to always
invalidate containing expressions.

With that, I think the case where a value gets removed but SCEVs
containing references to it remain should not be possible any longer.
Turn the check into an assert.

This slightly reduces compile-time:

NewPM-O3: -0.27%
NewPM-ReleaseThinLTO: -0.21%
NewPM-ReleaseLTO-g: -0.26%

http://llvm-compile-time-tracker.com/compare.php?from=c3dc6b081da6ba503e67d260033f81f61eb38ea3&to=95a4a028b1f1dd0bc3d221435953b7d2c031b3d5&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114633
2021-11-28 12:16:55 +00:00
Nikita Popov f492a414ba [SCEV] Simplify forgetSymbolicName() (NFCI)
With the recently introduced tracking as well as D113349, we can
greatly simplify forgetSymbolicName(). In fact, we can simply
replace it with forgetMemoizedResults().

What forgetSymbolicName() used to do is to walk the IR use-def
chain to find all SCEVs that mention the SymbolicName. However,
thanks to use tracking, we can now determine the relevant SCEVs
in a more direct way. D113349 is needed to also clear out the
actual IR to SCEV mapping in ValueExprMap.

Differential Revision: https://reviews.llvm.org/D114263
2021-11-27 16:42:38 +01:00
Nikita Popov c2550e3427 [SCEV] Simplify invalidation after BE count calculation (NFCI)
After backedge taken counts have been calculated, we want to
invalidate all addrecs and dependent expressions in the loop,
because we might compute better results with the newly available
backedge taken counts. Previously this was done with a forgetLoop()
style use-def walk. With recent improvements to SCEV invalidation,
we can instead directly invalidate any SCEVs using addrecs in this
loop. This requires a great deal less subtlety to avoid invalidating
more than necessary, and in particular gets rid of the hack from
D113349. The change is similar to D114263 in spirit.
2021-11-27 16:35:06 +01:00
Nikita Popov 2b160e95c8 Reland [SCEV] Fix and validate ValueExprMap/ExprValueMap consistency
Relative to the previous landing attempt, this introduces an additional
flag on forgetMemoizedResults() to not remove SCEVUnknown phis from
the value map. The invalidation after BECount calculation wants to
leave these alone and skips them in its own use-def walk, but we can
still end up invalidating them via forgetMemoizedResults() if there
is another IR value with the same SCEV. This is intended as a temporary
workaround only, and the need for this should go away once the
getBackedgeTakenInfo() invalidation is refactored in the spirit of
D114263.

-----

This adds validation for consistency of ValueExprMap and
ExprValueMap, and fixes identified issues:

* Addrec construction directly wrote to ValueExprMap in a few places,
  without updating ExprValueMap. Add a helper to ensure they stay
  consistent. The adjustment in forgetSymbolicName() explicitly
  drops the old value from the map, so that we don't rely on it
  being overwritten.
* forgetMemoizedResultsImpl() was dropping the SCEV from
  ExprValueMap, but not dropping the corresponding entries from
  ValueExprMap.

Differential Revision: https://reviews.llvm.org/D113349
2021-11-27 12:37:15 +01:00
Erik Desjardins 53b00b8215 [InstSimplify] Fold X {lshr,udiv} C <u X --> true for nonzero X, non-identity C
This eliminates the bounds check in Rust code like

pub fn mid(data: &[i32]) -> i32 {
  if data.is_empty() { return 0; }
  return data[data.len()/2];
}

(from https://blog.sigplan.org/2021/11/18/undefined-behavior-deserves-a-better-reputation/)

Alive proofs:
lshr https://alive2.llvm.org/ce/z/nyTu8D
udiv https://alive2.llvm.org/ce/z/CNUZH7
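
As an informal complement to the Alive proofs, the fold can be brute-forced over 8-bit unsigned values. This is a Python sketch, not part of the patch; it assumes "non-identity C" means C != 0 for lshr and C != 1 for udiv, and it skips C == 0 for udiv (division by zero) and shift amounts of 8 or more.

```python
BITS = 8
LIMIT = 1 << BITS

# X lshr C <u X for every non-zero X and every shift amount 1..BITS-1.
lshr_ok = all((x >> c) < x for x in range(1, LIMIT) for c in range(1, BITS))

# X udiv C <u X for every non-zero X and every divisor C >= 2.
udiv_ok = all((x // c) < x for x in range(1, LIMIT) for c in range(2, LIMIT))

# The side conditions matter: X == 0 or an identity C breaks the inequality.
zero_x_fails = not ((0 >> 1) < 0)
identity_c_fails = not ((5 // 1) < 5)
```

The Rust example corresponds to the udiv case with C = 2: once `data` is known to be non-empty, `data.len()/2` is strictly less than `data.len()`.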

Differential Revision: https://reviews.llvm.org/D114279
2021-11-26 16:48:33 -05:00
Nikita Popov 719354a571 Revert "[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency"
This reverts commit bee8dcda1f.

Some sanitizer buildbots fail with:
> Attempt to use a SCEVCouldNotCompute object!

For example:
https://lab.llvm.org/buildbot/#/builders/85/builds/7020/steps/9/logs/stdio
2021-11-26 22:18:23 +01:00
Nikita Popov bee8dcda1f [SCEV] Fix and validate ValueExprMap/ExprValueMap consistency
Relative to the previous landing attempt, this makes
insertValueToMap() resilient against the value already being
present in the map -- previously I only checked this for the
createSimpleAffineAddRec() case, but the same issue can also
occur for the general createNodeForPHI(). In both cases, the
addrec may be constructed and added to the map in a recursive
query trying to create said addrec. In this case, this happens
due to the invalidation when the BE count is computed, which
ends up clearing out the symbolic name as well.

-----

This adds validation for consistency of ValueExprMap and
ExprValueMap, and fixes identified issues:

* Addrec construction directly wrote to ValueExprMap in a few places,
  without updating ExprValueMap. Add a helper to ensure they stay
  consistent. The adjustment in forgetSymbolicName() explicitly
  drops the old value from the map, so that we don't rely on it
  being overwritten.
* forgetMemoizedResultsImpl() was dropping the SCEV from
  ExprValueMap, but not dropping the corresponding entries from
  ValueExprMap.

Differential Revision: https://reviews.llvm.org/D113349
2021-11-26 20:57:47 +01:00
Florian Hahn b927aa69bf
[SCEV] Turn check in createSimpleAffineAddRec to assertion. (NFC)
Accum is guaranteed to be defined outside L (via Loop::isLoopInvariant
checks above). I think that should guarantee that the more powerful
ScalarEvolution::isLoopInvariant also determines that the value is loop
invariant.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114634
2021-11-26 13:23:48 +00:00
Zarko Todorovski 95875d246a [LLVM][NFC]Inclusive language: remove occurrences of sanity check/test from llvm
Part of the work to use more inclusive language in clang/llvm. Reword
some comments and change function and variable names.
2021-11-24 17:29:55 -05:00
Peter Waller 787b66eb5f [LoopAccessAnalysis][SVE] Bail out for scalable vectors
The supplied test case, reduced from real world code, crashes with a
'Invalid size request on a scalable vector.' error.

Since it's similar in spirit to an existing LAA test, rename the file to
generalize it to both.

Differential Revision: https://reviews.llvm.org/D114155
2021-11-24 15:52:20 +00:00
Sanjay Patel b326c05814 [InstSimplify] fold xor logic of 2 variables, part 2
(~a & b) ^ (a | b) --> a

This is the swapped and/or (Demorgan?) sibling fold for
the fold added with D114462 ( 892648b18a ).

This case is easier to specify because we are returning
a root value, not a 'not':
https://alive2.llvm.org/ce/z/SRzj4f
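
A quick exhaustive check of the identity over 8-bit values, as a Python sketch (the helper names are made up for illustration; the Alive link above is the authoritative proof):

```python
MASK = 0xFF  # model i8 values


def not8(x):
    """Bitwise 'not' restricted to 8 bits."""
    return ~x & MASK


def fold_holds(a, b):
    """(~a & b) ^ (a | b) == a"""
    return ((not8(a) & b) ^ (a | b)) == a


all_ok = all(fold_holds(a, b) for a in range(256) for b in range(256))
```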
2021-11-24 08:15:47 -05:00
Rosie Sumpter c2441b6b89 [LoopVectorize] Add vector reduction support for fmuladd intrinsic
Enables LoopVectorize to handle reduction patterns involving the
llvm.fmuladd intrinsic.

Differential Revision: https://reviews.llvm.org/D111555
2021-11-24 08:50:04 +00:00
Florian Mayer 6c06d8e310 [stack-safety] Check SCEV constraints at memory instructions.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D113160
2021-11-23 15:29:23 -08:00
Florian Hahn 73a05cc8df
[LAA] Move visitPointers up in file (NFC).
This allows easier re-use in earlier functions.
2021-11-23 22:47:26 +00:00
Sanjay Patel 892648b18a [InstSimplify] fold xor logic of 2 variables
(a & b) ^ (~a | b) --> ~a

I was looking for a shortcut to reduce some of the complex logic
folds that are currently up for review (D113216
and others in that stack), and I found this missing from
instcombine/instsimplify.

There is a trade-off in putting it into instsimplify: because
we can't create new values here, we need a strict 'not' op (no
undef elements). Otherwise, the fold is not valid:
https://alive2.llvm.org/ce/z/k_AGGj

If this was in instcombine instead, we could create the proper
'not'. But having the fold here benefits other passes like GVN
that use instsimplify as an analysis.

There is a related fold where 'and' and 'or' are swapped, and
that is planned as a follow-up commit.
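
For the well-defined case (no undef elements), the identity can be checked exhaustively over 8-bit values. This Python sketch uses illustrative helper names and says nothing about the undef issue discussed above, which only Alive models:

```python
MASK = 0xFF  # model i8 values


def not8(x):
    """Bitwise 'not' restricted to 8 bits."""
    return ~x & MASK


def fold_holds(a, b):
    """(a & b) ^ (~a | b) == ~a"""
    return ((a & b) ^ (not8(a) | b)) == not8(a)


all_ok = all(fold_holds(a, b) for a in range(256) for b in range(256))
```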

Differential Revision: https://reviews.llvm.org/D114462
2021-11-23 16:50:23 -05:00
Florian Hahn 0a00d64e32
[LAA] Turn aggregate type check into assertion (NFCI).
getPtrStride should not be called with aggregate access types. There's
also an old TODO.

Turn the check into an assertion.
2021-11-23 17:37:30 +00:00
Paul Robinson c075566c8d [PS4][TLI] Remove redundant line 2021-11-23 08:42:32 -08:00
Nikita Popov 62e9acad0a Revert "[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency"
This reverts commit d633db8f9d.

Causes bootstrap assertion failures:
https://lab.llvm.org/buildbot/#/builders/168/builds/3459/steps/9/logs/stdio
2021-11-22 15:47:33 +01:00
Nikita Popov d633db8f9d [SCEV] Fix and validate ValueExprMap/ExprValueMap consistency
This adds validation for consistency of ValueExprMap and
ExprValueMap, and fixes identified issues:

* Addrec construction directly wrote to ValueExprMap in a few places,
  without updating ExprValueMap. Add a helper to ensure they stay
  consistent. The adjustment in forgetSymbolicName() explicitly
  drops the old value from the map, so that we don't rely on it
  being overwritten.
* forgetMemoizedResultsImpl() was dropping the SCEV from
  ExprValueMap, but not dropping the corresponding entries from
  ValueExprMap.

Differential Revision: https://reviews.llvm.org/D113349
2021-11-22 15:27:25 +01:00
Simon Moll 56db1c072c [DA][NFC] Update publication - add remarks
Update the reference publication for the SyncDependenceAnalysis and Divergence Analysis.  Fix phrasing, formatting. Add comments on reducible loop limitation.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D114146
2021-11-22 12:58:19 +01:00
Sjoerd Meijer 4d21b64464 [BPI] Look-up tables for non-loop branches. NFC.
This adds and uses look-up tables for non-loop branch probabilities, which
have probabilities directly encoded into the tables for the different condition
codes. Compared to having this logic inlined in different functions, as used
to be the case, I think this is more compact and thus also easier to check and
cross-reference. This also adds a test for pointer heuristics that was missing.

Differential Revision: https://reviews.llvm.org/D114009
2021-11-22 10:30:42 +00:00
Kazu Hirata f6bce30cf9 [llvm] Use range-based for loops (NFC) 2021-11-20 18:42:10 -08:00
Nikita Popov 0a2bde94a0 [LVI] Drop requirement that modulus is constant
If we're looking only at the lower bound, the actual modulus
doesn't matter. This is a leftover from when I wanted to consider
the upper bound as well, where the modulus does matter.
2021-11-20 21:06:08 +01:00
Nikita Popov cd84cab6b3 [LVI] Support urem in implied conditions
If (X urem M) >= C we know that X >= C. Make use of this fact
when computing the implied condition range.

In some cases we could also establish an upper bound, but that's
both trickier and not interesting in practice.

Alive: https://alive2.llvm.org/ce/z/R5ZGSW
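
The implication follows from `X urem M <= X` for any non-zero M. A small exhaustive check in Python (a sketch over 5-bit values to keep the search quick; the function name is illustrative):

```python
BITS = 5  # small width keeps the exhaustive search quick
LIMIT = 1 << BITS


def implication_holds(x, m, c):
    """If (x urem m) >= c, then x >= c. m must be non-zero."""
    return not (x % m >= c) or x >= c


all_ok = all(
    implication_holds(x, m, c)
    for x in range(LIMIT)
    for m in range(1, LIMIT)
    for c in range(LIMIT)
)
```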
2021-11-20 21:01:26 +01:00
Philip Reames 28000587e1 [SCEV] Revert two speculative compile time optimizations which made no difference
Revert "[SCEV] Defer all work from ea12c2cb as late as possible"
Revert "[SCEV] Defer loop property checks from ea12c2cb as late as possible"

This reverts commit 734abbad79 and  1a5666acb2.

Both of these changes were speculative attempts to address a compile time regression.  Neither worked, and both complicated the code in undesirable ways.
2021-11-19 08:45:56 -08:00
Philip Reames 734abbad79 [SCEV] Defer all work from ea12c2cb as late as possible
This is a second speculative compile time optimization to address a reported regression.  My actual suspicion is that availability of no-self-wrap is making some *other* bit of code trigger, but let's rule this out.
2021-11-18 17:19:52 -08:00
Philip Reames 1a5666acb2 [SCEV] Defer loop property checks from ea12c2cb as late as possible
This is a speculative compile time optimization to address a reported regression.  It's the only thing which vaguely makes sense.
2021-11-18 13:47:45 -08:00
Philip Reames ea12c2cb9c [SCEV] Move mustprogress based no-self-wrap logic so it applies to all exit conditions
This change moves logic which we'd added specifically for less than tests so that it applies to equalities and greater than tests as well. The basic idea is that if we can show an IV cycles infinitely through the same series on self-wrap, and that the exit condition must be taken to prevent UB, we can conclude that it must be taken before self-wrap and thus infer said flag.

The motivation here is simple loops with unsigned induction variables w/non-one steps and inequality tests. A toy example would be:
for (unsigned i = 0; i != N; i += 2) { body; }

If body contains no side effects, and this is a mustprogress function, we can assume that this must be a finite loop and thus that the exit count is N/2.
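
To illustrate why finiteness has to be assumed rather than proven, here is a Python simulation of the toy loop with an 8-bit wrapping counter. It is a sketch only; `trip_count` and its `limit` parameter are hypothetical helpers, not anything in SCEV.

```python
BITS = 8
MOD = 1 << BITS


def trip_count(n, step=2, limit=MOD):
    """Count body executions of `for (i = 0; i != n; i += step)`.

    Returns None if the loop has not exited after `limit` iterations,
    which for this wrapping counter means it never will.
    """
    i, count = 0, 0
    while i != n:
        if count >= limit:
            return None
        i = (i + step) % MOD  # unsigned wrap-around
        count += 1
    return count


even_case = trip_count(10)  # exits after N/2 iterations
odd_case = trip_count(7)    # i only visits even values, so it never hits 7
```

For even N the loop exits after N/2 iterations; for odd N the counter wraps and cycles forever. In a mustprogress function with a side-effect-free body the second case would be UB, which is what lets the analysis assume the first.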

Differential Revision: https://reviews.llvm.org/D103991
2021-11-18 10:07:44 -08:00
Kazu Hirata 7ca14f6044 [llvm] Use range-based for loops (NFC) 2021-11-18 09:09:52 -08:00
Kerry McLaughlin ff64b2933a [LoopVectorize] Check the number of uses of an FAdd before classifying as ordered
checkOrderedReductions looks for Phi nodes which can be classified as in-order,
meaning they can be vectorised without unsafe math. In order to vectorise the
reduction it should also be classified as in-loop by getReductionOpChain, which
checks that the reduction has two uses.

In this patch, a similar check is added to checkOrderedReductions so that we
now return false if there are more than two uses of the FAdd instruction.
This fixes PR52515.

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D114002
2021-11-18 16:41:19 +00:00
Florian Hahn da9f2ba3b1
[SCEV] Reorder operands checks in collectConditions.
The initial two cases require a SCEVConstant as RHS. Pull up the condition
to check and swap SCEVConstants from below. Also remove a redundant
check & swap if RHS is SCEVUnknown.
2021-11-18 09:36:16 +00:00
Philip Reames ad69402f3e [SCEVAA] Avoid forming malformed pointer diff expressions
This solves the same crash as in D104503, but with a different approach.

The test case test_non_dom demonstrates a case where scev-aa crashes today. (If exercised either by -eval-aa or -licm.) The basic problem is that SCEV-AA expects to be able to compute a pointer difference between two SCEVs for any pair of pointers we do an alias query on. For (valid, but out of scope) reasons, we can end up asking whether expressions in different sub-loops can alias each other. This results in a subtraction expression being formed where neither operand dominates the other.

The approach this patch takes is to leverage the "defining scope" notion we introduced for flag semantics to detect and disallow the formation of the problematic SCEV. This ends up being relatively straightforward on that new infrastructure. This change does hint that we should probably be verifying a similar property for all SCEVs somewhere, but I'll leave that to a follow-on change.

Differential Revision: D114112
2021-11-17 12:38:04 -08:00
Arthur Eubanks e3e25b5112 [NewPM] Add option to prevent rerunning function pipeline on functions in CGSCC adaptor
In a CGSCC pass manager, we may visit the same function multiple times
due to SCC mutations. In the inliner pipeline, this results in running
the function simplification pipeline on a function multiple times even
if it hasn't been changed since the last function simplification
pipeline run.

We use a newly introduced analysis to keep track of whether or not a
function has changed since the last time the function simplification
pipeline has run on it. If we see this analysis available for a function
in a CGSCCToFunctionPassAdaptor, we skip running the function passes on
the function. The analysis is queried at the end of the function passes
so that it's available after the first time the function simplification
pipeline runs on a function. This is a per-adaptor option so it doesn't
apply to every adaptor.

The goal of this is to improve compile times. However, currently we
can't turn this on by default at least for the higher optimization
levels since the function simplification pipeline is not robust enough
to be idempotent in many cases, resulting in performance regressions if
we stop running the function simplification pipeline on a function
multiple times. We may be able to turn this on for -O1 in the near
future, but turning this on for higher optimization levels would require
more investment in the function simplification pipeline.

Heavily inspired by D98103.

Example compile time improvements with flag turned on:
https://llvm-compile-time-tracker.com/compare.php?from=998dc4a5d3491d2ae8cbe742d2e13bc1b0cacc5f&to=5c27c913687d3d5559ef3ab42b5a3d513531d61c&stat=instructions

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D113947
2021-11-17 09:06:46 -08:00
Florian Hahn e8b55cf7b7
[SCEV] Apply loop guards when computing max BTC for arbitrary steps.
Similar to other cases in the current function (e.g. when the step is 1 or
-1), applying loop guards can lead to tighter upper bounds for the
backedge-taken counts.

Fixes PR52464.

Reviewed By: reames, nikic

Differential Revision: https://reviews.llvm.org/D113578
2021-11-17 11:00:49 +00:00
Philip Reames 8d85e945b2 [SCEV] Canonicalize X - urem X, Y patterns
There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could get different results. The sub representation appears to produce strictly inferior results in practice, so I decided to canonicalize to the Y * X/Y version.

The motivation here is that runtime unroll produces the sub X - (and X, Y-1) pattern when Y is a power of two. SCEV is thus unable to recognize that an unrolled loop exits because we don't figure out that the new unrolled step evenly divides the trip count of the unrolled loop. After instcombine runs, we convert to the andn form, which SCEV recognizes, so essentially, this is just fixing a nasty pass ordering dependency.
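
The equivalence of the representations mentioned above can be checked exhaustively for 8-bit unsigned values. This is a Python sketch with illustrative function names; it only shows that the forms compute the same value, not how SCEV canonicalizes them.

```python
BITS = 8
LIMIT = 1 << BITS


def sub_form(x, y):
    """X - (X urem Y)"""
    return x - (x % y)


def mul_form(x, y):
    """Y * (X udiv Y), the form chosen as canonical"""
    return y * (x // y)


def masked_form(x, y):
    """X - (X & (Y - 1)), as produced when Y is a power of two"""
    return x - (x & (y - 1))


forms_agree = all(
    sub_form(x, y) == mul_form(x, y)
    for x in range(LIMIT)
    for y in range(1, LIMIT)
)
pow2_agree = all(
    masked_form(x, 1 << k) == mul_form(x, 1 << k)
    for x in range(LIMIT)
    for k in range(BITS)
)
```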

The ARM loop hardware interaction in the test diff is opaque to me, but the review comments from others with knowledge of the infrastructure appear to indicate these are improvements in loop recognition, not regressions.

Differential Revision: https://reviews.llvm.org/D114018
2021-11-16 11:59:21 -08:00
Arthur Eubanks c95a9f46c9 [Loads] Handle addrspacecast constant expressions when determining dereferenceability
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114015
2021-11-16 11:17:57 -08:00
Florian Hahn b7aec4f08e
[SCEV] Support rewriting ZExt expressions with loop guard info.
So far, applying loop guard information has been restricted to
SCEVUnknown. In a few cases, like PR40961 and PR52464, this leads to
SCEV failing to determine tight upper bounds for the backedge taken
count.

This patch adjusts SCEVLoopGuardRewriter and applyLoopGuards to support
re-writing ZExt expressions.

This is a first step towards fixing PR40961 and PR52464.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D113577
2021-11-16 11:16:07 +00:00
Mehrnoosh Heidarpour 62c51a72f9 [InstSimplify] Fold A|B | (A^B) --> A|B
This patch adds the following fold opportunity:
A|B | (A^B) --> A|B

that is reported here : https://bugs.llvm.org/show_bug.cgi?id=52479

https://alive2.llvm.org/ce/z/33-My-
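
The fold holds because every bit set in A^B is already set in A|B. A brute-force confirmation over 8-bit values, as a Python sketch with an illustrative helper name:

```python
def fold_holds(a, b):
    """(A | B) | (A ^ B) == A | B"""
    return ((a | b) | (a ^ b)) == (a | b)


all_ok = all(fold_holds(a, b) for a in range(256) for b in range(256))

# The reason it works: A ^ B never sets a bit outside A | B.
xor_is_subset = all(
    ((a ^ b) & ~(a | b)) == 0 for a in range(256) for b in range(256)
)
```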

Test cases with base results are added in D113860

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D113861
2021-11-15 18:55:04 -05:00
Stanislav Mekhanoshin 833cdb0a07 Revert "[InstSimplify] Fold A|B | (A^B) --> A|B"
This reverts commit 193c40e966.
2021-11-15 14:56:20 -08:00
Arthur Eubanks 19867de9e7 [NewPM] Only invalidate modified functions' analyses in CGSCC passes + turn on eagerly invalidate analyses
Previously, any change in any function in an SCC would cause all
analyses for all functions in the SCC to be invalidated. With this
change, we now manually invalidate analyses for functions we modify,
then let the pass manager know that all function analyses should be
preserved since we've already handled function analysis invalidation.

So far this only touches the inliner, argpromotion, function-attrs, and
updateCGAndAnalysisManager(), since they are the most used.

This is part of an effort to investigate running the function
simplification pipeline less on functions we visit multiple times in the
inliner pipeline.

However, this causes major memory regressions especially on larger IR.
To counteract this, turn on the option to eagerly invalidate function
analyses. This invalidates analyses on functions immediately after
they're processed in a module or scc to function adaptor for specific
parts of the pipeline.

Within an SCC, if a pass only modifies one function, other functions in
the SCC do not have their analyses invalidated, so in later function
passes in the SCC pass manager the analyses may still be cached. It is
only after the function passes that the eager invalidation takes effect.
For the default pipelines this makes sense because the inliner pipeline
runs the function simplification pipeline after all other SCC passes
(except CoroSplit which doesn't request any analyses).

Overall this has mostly positive effects on compile time and positive effects on memory usage.
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss

D113196 shows that we slightly regressed compile times in exchange for
some memory improvements when turning on eager invalidation.  D100917
shows that we slightly improved compile times in exchange for major
memory regressions in some cases when invalidating less in SCC passes.
Turning these on at the same time keeps the memory improvements while
keeping compile times neutral/slightly positive.

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D113304
2021-11-15 14:44:53 -08:00
Stanislav Mekhanoshin 193c40e966 [InstSimplify] Fold A|B | (A^B) --> A|B
This patch adds the following fold opportunity:
A|B | (A^B) --> A|B

that is reported here : https://bugs.llvm.org/show_bug.cgi?id=52479

https://alive2.llvm.org/ce/z/33-My-

Test cases with base results are added in D113860

(authored by MehrHeidar, committed by rampitec).

Differential Revision:  https://reviews.llvm.org/D113861
2021-11-15 13:49:20 -08:00
Florian Hahn 112c1c346a
[IVDescriptor] Make sure the sign is included for negative extension.
At the moment, computeRecurrenceType does not include any sign bits in
the maximum bit width. If the value can be negative, this means the sign
bit will be missing and the sext won't properly extend the value.

If the value can be negative, increment the bitwidth by one to make sure
there is at least one sign bit in the result value.

Note that the increment is also needed *if* the value is *known* to be
negative, as a sign bit needs to be preserved for the sext to work.
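
A small Python sketch of why the extra bit matters. It models truncating a value to a narrow width and then sign-extending it back; `sext_roundtrip` is a hypothetical helper, not the logic in computeRecurrenceType.

```python
def sext_roundtrip(value, width):
    """Truncate `value` to `width` bits, then sign-extend the result."""
    mask = (1 << width) - 1
    truncated = value & mask
    if truncated & (1 << (width - 1)):  # top bit set: negative after sext
        truncated -= 1 << width
    return truncated


# -3 is ...11101: two bits below the sign. Keeping only those two bits
# loses the sign, so the sext yields the wrong value.
without_sign_bit = sext_roundtrip(-3, 2)
# With one more bit for the sign, the value survives the round trip.
with_sign_bit = sext_roundtrip(-3, 3)
```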

Note that this at the moment prevents vectorization, because the
analysis computes i1 as type for the recurrence when looking through the
AND in lookThroughAnd.

Fixes PR51794, PR52485.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D113056
2021-11-15 13:12:57 +00:00
Kazu Hirata d243cbf8ea [llvm] Use isa instead of dyn_cast (NFC) 2021-11-14 19:40:46 -08:00
Mircea Trofin a32c2c3808 [NFC] Use Optional<ProfileCount> to model invalid counts
ProfileCount could model invalid values, but a user had no indication
that the getCount method could return bogus data. Optional<ProfileCount>
addresses that, because the user must dereference the optional. In
addition, the patch removes concept duplication.

Differential Revision: https://reviews.llvm.org/D113839
2021-11-14 19:03:30 -08:00
Kazu Hirata 7379736774 [llvm] Use range-based for loops with User::operands (NFC) 2021-11-14 09:32:38 -08:00
Roman Lebedev e876698a5d
[NFC][TTI] `getReplicationShuffleCost()`: s/Replicated/Dst/
'Replicated' is a mouthful and somewhat ambiguous,
while 'destination' is pretty self-explanatory.
2021-11-14 20:01:38 +03:00
Florian Hahn 8ed8d37088
[SCEV] Update SCEVLoopGuardRewriter to hold reference to map. (NFC)
SCEVLoopGuardRewriter doesn't need to copy the rewrite map. It can just
hold a const reference instead, to avoid an unnecessary copy.
2021-11-13 09:39:14 +00:00
Florian Hahn 03cfea68c6
[SCEV] Update SCEVLoopGuardRewriter to take SCEV -> SCEV map (NFC).
Split off refactoring from D113577 to reduce the diff. NFC as the new
interface will only be used in D113577.
2021-11-12 18:16:03 +00:00
Florian Hahn 819bca9b90
[SCEV] Use APIntOps::umin to select best max BC count (NFC).
Suggested in D102267, but I missed this in the committed version.
2021-11-12 12:20:01 +00:00
Mircea Trofin f64eee1625 [NFC][InlineAdvisor] Inform advisor when the module is invalidated
This avoids unnecessary re-calculation of module-wide features in the
MLInlineAdvisor. In cases where function passes don't invalidate
functions (and, thus, don't invalidate the module), but we re-process a
CGSCC, we currently refresh module features unnecessarily. The
overhead of fetching cached results (albeit they weren't themselves
invalidated) was noticeable in certain modules' compilations.

We don't want to just invalidate the advisor object, though, via the
analysis manager, because we'd then need to re-create expensive state
(like the model evaluator in the ML 'development' mode).

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D113644
2021-11-11 10:23:49 -08:00
duanbo.db 53dc525828 [LoopInfo] Fix function getInductionVariable
The function finds the induction variable by checking whether
StepInst or IndVar in the phi statement is one of the operands of CMP.
But if LatchCmpOp0/LatchCmpOp1 is a constant, the subsequent
comparison may result in null == null, which is meaningless. This patch
fixes the typo.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D112980
2021-11-11 16:22:42 +08:00
Bin Cheng bf76e64854 [BPI] Push exit block rather than exiting ones in getSccExitBlocks
The function BranchProbabilityInfo::SccInfo::getSccExitBlocks is
supposed to collect all exit blocks for SCC rather than all exiting
blocks. This patch fixes the typo.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D113344
2021-11-11 14:22:19 +08:00
Chris Jackson 116dc70cf3 [DebugInfo][LSR] Add more stringent checks on IV selection and salvage attempts

Prevent the selection of IVs that have a SCEV containing an undef. Also
prevent salvaging attempts for values for which a SCEV could not be
created by ScalarEvolution and have only SCEVUnknown.

Reviewed by: Orlando

Differential Revision: https://reviews.llvm.org/D111810
2021-11-09 13:09:37 +00:00
Roman Lebedev d484cc152b
[TTI] Adjust `getReplicationShuffleCost()` interface
It is trivial to produce DemandedSrcElts given DemandedReplicatedElts,
so don't pass the former. Also, it isn't really useful so far
to have the overload taking the Mask, so just inline it.
2021-11-09 14:07:59 +03:00
Michael Liao bf225939bc [InferAddressSpaces] Support assumed addrspaces from addrspace predicates.
- CUDA cannot associate memory space with pointer types. Even though Clang could add extra attributes to specify the address space explicitly on a pointer type, it breaks the portability between Clang and NVCC.
- This change proposes to infer the address space of a pointer from assumptions built upon target-specific address space predicates, such as `__isGlobal` from CUDA. E.g.,

```
  foo(float *p) {
    __builtin_assume(__isGlobal(p));
    // From there, we could assume p is a global pointer instead of a
    // generic one.
  }
```

This makes the code portable without introducing the implementation-specific features.

Note that NVCC supports __builtin_assume starting with version 11.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112041
2021-11-08 16:51:57 -05:00
Sander de Smalen 2829376bb2 [LV] Use VScaleForTuning to fine-tune the cost per lane.
When targeting a specific CPU with scalable vectorization, the knowledge
of that particular CPU's vscale value can be used to tune the cost-model
and make the cost per lane less pessimistic.

If the target implements 'TTI.getVScaleForTuning()', the cost-per-lane
is calculated as:

  Cost / (VScaleForTuning * VF.KnownMinLanes)

Otherwise, it assumes a value of 1 meaning that the behavior
is unchanged and calculated as:

  Cost / VF.KnownMinLanes
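
The two formulas can be expressed as one function with a default of 1. This is a Python sketch; the function and parameter names are illustrative, and whether the real computation uses integer or floating-point division is not stated here.

```python
def cost_per_lane(cost, known_min_lanes, vscale_for_tuning=None):
    """Cost / (VScaleForTuning * KnownMinLanes), with vscale defaulting to 1.

    `vscale_for_tuning=None` models a target that does not provide a
    tuning value, which leaves the result at Cost / KnownMinLanes.
    """
    vscale = 1 if vscale_for_tuning is None else vscale_for_tuning
    return cost / (vscale * known_min_lanes)


unchanged = cost_per_lane(8, 4)    # no tuning value available
tuned = cost_per_lane(8, 4, 2)     # target reports a vscale of 2
```

With a tuning value of 2, the same vector cost is spread over twice as many assumed lanes, so the per-lane cost halves.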

Reviewed By: kmclaughlin, david-arm

Differential Revision: https://reviews.llvm.org/D113209
2021-11-08 16:59:46 +00:00
Nikita Popov a8c318b50e [BasicAA] Use index size instead of pointer size
When accumulating the GEP offset in BasicAA, we should use the
pointer index size rather than the pointer size.

Differential Revision: https://reviews.llvm.org/D112370
2021-11-07 18:56:11 +01:00
Benjamin Kramer 9b8b16457c Put implementation details into anonymous namespaces. NFCI. 2021-11-07 15:18:30 +01:00
Kazu Hirata 843d1eda18 [llvm] Use llvm::reverse (NFC) 2021-11-06 19:31:18 -07:00
Nikita Popov e3cec17b2d [InstSimplify] Remove incorrect icmp of gep fold (PR52429)
As described in https://bugs.llvm.org/show_bug.cgi?id=52429 this
fold is incorrect, because inbounds only guarantees that the
pointers don't wrap in the unsigned space: It is possible that
the sign boundary is crossed by an object.

I'm dropping the fold entirely rather than adjusting it, because
computePointerICmp() fully subsumes it (just with correct predicate
handling).
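
To illustrate why a signed comparison can go wrong, here is a small sketch that models an 8-bit address space. It is not the removed fold; `comparePointers` and `Compare` are hypothetical names, and the 8-bit width is an assumption chosen to keep the numbers small.

```cpp
#include <cassert>
#include <cstdint>

struct Compare {
  bool unsignedLess; // a <u b
  bool signedLess;   // a <s b
};

// Compare two addresses inside one object in an 8-bit address space.
// The preconditions mirror the inbounds guarantee: neither address
// computation wraps in the unsigned space.
Compare comparePointers(uint8_t base, uint8_t offA, uint8_t offB) {
  assert(base + offA <= 0xFF && base + offB <= 0xFF && "no unsigned wrap");
  uint8_t a = static_cast<uint8_t>(base + offA);
  uint8_t b = static_cast<uint8_t>(base + offB);
  // Reinterpreting as int8_t assumes two's complement conversion
  // (guaranteed since C++20, universal in practice before that).
  return {a < b, static_cast<int8_t>(a) < static_cast<int8_t>(b)};
}
```

An object starting at 0x70 that extends to 0x90 crosses the sign boundary at 0x80: the unsigned comparison orders the two addresses correctly, while the signed one reverses them.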

Differential Revision: https://reviews.llvm.org/D113343
2021-11-06 21:03:21 +01:00
Roman Lebedev a30ec4778a
[TTI][CostModel] `getUserCost()`: recognize replication shuffles and query their cost
This finally creates proper test coverage for replication shuffles,
which are used by LV for conditional loads, and will allow adding
a proper cost model, at least for AVX512.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113324
2021-11-06 16:45:15 +03:00
Roman Lebedev f8efc5c0ac
[NFC][TTI] Add/extract `getReplicationShuffleCost()` method, deduplicate its implementations
Hiding it in `getInterleavedMemoryOpCost()` is problematic for a number of reasons,
including testability and reuse, let's do better.

In a followup `getUserCost()` will be taught to use it to estimate the mask costs,
which will allow for better cost model tests for it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113313
2021-11-06 16:45:15 +03:00
Kazu Hirata 87e53a0ad8 [llvm] Use make_early_inc_range (NFC) 2021-11-05 19:39:07 -07:00
Philip Reames d24a0e8857 [SCEV] Use constant range of RHS to prove NUW on narrow IV in trip count logic
The basic idea here is that given a zero extended narrow IV, we can prove the inner IV to be NUW if we can prove there's a value the inner IV must take before overflow which must exit the loop.

Differential Revision: https://reviews.llvm.org/D109457
2021-11-05 15:36:47 -07:00
David Green 61225c0818 [ValueTracking][InstCombine] Introduce and use ComputeMinSignedBits
This introduces a new ComputeMinSignedBits method for ValueTracking that
returns the BitWidth - SignBits + 1 from ComputeSignBits, and represents
the minimum bit size for the value as a signed integer.  Similar to the
existing APInt::getMinSignedBits method, this can make some of the
reasoning around ComputeSignBits more natural.
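
The relationship can be sketched for a fully known 32-bit value. This is only an illustration: the real analysis works on values that are not fully known, and `numSignBits` and `minSignedBits` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Number of leading bits equal to the sign bit, counting the sign bit
// itself, for a concrete 32-bit value.
unsigned numSignBits(int32_t v) {
  uint32_t bits = static_cast<uint32_t>(v);
  uint32_t sign = bits >> 31;
  unsigned n = 1; // the sign bit itself
  for (int i = 30; i >= 0 && ((bits >> i) & 1u) == sign; --i)
    ++n;
  return n;
}

// BitWidth - SignBits + 1: the minimum width that still represents the
// value as a signed integer.
unsigned minSignedBits(int32_t v) {
  unsigned signBits = numSignBits(v);
  assert(signBits >= 1 && signBits <= 32);
  return 32 - signBits + 1;
}
```

For example, 127 and -128 both need 8 signed bits, while 128 needs 9.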

See https://reviews.llvm.org/D112298
2021-11-05 14:41:37 +00:00
Arthur Eubanks 7175886a0f [NewPM] Make eager analysis invalidation per-adaptor
Follow-up change to D111575.
We don't need eager invalidation on every adaptor. Most notably, it is
unnecessary for adaptors running passes that use very few analyses, or
passes that purely invalidate specific analyses.

Also allow testing of this via a pipeline string
"function<eager-inv>()".

The compile time/memory impact of this is very comparable to D111575.
https://llvm-compile-time-tracker.com/compare.php?from=9a2eec512a29df45c90c2fcb741e9d5c693b1383&to=b9f20bcdea138060967d95a98eab87ce725b22bb&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D113196
2021-11-04 17:16:11 -07:00
Liren Peng 57e093162e [ScalarEvolution] Infer loop max trip count from array accesses
Data references in a loop should not access elements over the
statically allocated size. So we can infer a loop max trip count
from this undefined behavior.

Reviewed By: reames, mkazantsev, nikic

Differential Revision: https://reviews.llvm.org/D109821
2021-11-03 10:40:18 +08:00
Nikita Popov c00e9c6345 [BasicAA] Check known access sizes earlier (NFC)
All heuristics for variable accesses require both access sizes to
be known, so check this once at the start, rather than for each
particular heuristic.
2021-11-02 21:26:26 +01:00
Nikita Popov 0b6ed92c8a [BasicAA] Use early returns (NFC)
Reduce nesting in aliasGEP() a bit by returning early.
2021-11-02 21:17:36 +01:00
Nikita Popov 51e9f33603 [BasicAA] Use saturating multiply on range if nsw
If we know that the var * scale multiplication is nsw, we can use
a saturating multiplication on the range (as a good approximation
of an nsw multiply). This recovers some cases where the fix from
D112611 is unnecessarily strict. (This can be further strengthened
by using a saturating add, but we currently don't track all the
necessary information for that.)

This exposes an issue in our NSW tracking for multiplies. The code
was assuming that (X +nsw Y) *nsw Z results in
(X *nsw Z) +nsw (Y *nsw Z) -- however, it is possible that the
distributed multiplications overflow, even if the non-distributed
one does not. We should discard the nsw flag if the offset is
non-zero. If we just have (X *nsw Y) *nsw Z then concluding
X *nsw (Y *nsw Z) is fine.
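
A concrete counterexample for the distribution can be sketched in 8-bit arithmetic. The helpers below are hypothetical and only model "does this operation overflow i8"; they are not BasicAA code.

```cpp
#include <cstdint>

// True if the exact result fits in i8, i.e. the operation could carry nsw.
bool addFitsI8(int a, int b) {
  int r = a + b;
  return r >= INT8_MIN && r <= INT8_MAX;
}
bool mulFitsI8(int a, int b) {
  int r = a * b;
  return r >= INT8_MIN && r <= INT8_MAX;
}

struct NswCheck {
  bool original;    // (X +nsw Y) *nsw Z does not overflow
  bool distributed; // (X *nsw Z) +nsw (Y *nsw Z) does not overflow
};

NswCheck checkDistribution(int8_t x, int8_t y, int8_t z) {
  bool original = addFitsI8(x, y) && mulFitsI8(x + y, z);
  bool distributed =
      mulFitsI8(x, z) && mulFitsI8(y, z) && addFitsI8(x * z, y * z);
  return {original, distributed};
}
```

With X = 100, Y = -100, Z = 2 the original expression is 0 * 2 and does not overflow, but the distributed form computes 100 * 2 = 200, which does not fit in i8.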

Differential Revision: https://reviews.llvm.org/D112848
2021-11-02 20:27:39 +01:00
Arthur Eubanks e2024d72fa Revert "[NFC] Remove LinkAll*.h"
This reverts commit fe364e5dc7.

Causes breakages, e.g. https://lab.llvm.org/buildbot/#/builders/188/builds/5266
2021-11-02 09:08:09 -07:00
Arthur Eubanks fe364e5dc7 [NFC] Remove LinkAll*.h
These were added to prevent functions from being removed by WPO.

But that doesn't make sense; correct WPO will not remove functions we actually use.

I noticed these because compiling cc1_main.cpp was pulling in random LLVM pass headers.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112971
2021-11-02 08:43:17 -07:00
Arthur Eubanks 029f1a5344 [LazyCallGraph] Skip blockaddresses
blockaddresses do not participate in the call graph since the only
instructions that use them must all return to someplace within the
current function. And passes cannot retrieve a function address from a
blockaddress.

This was suggested by efriedma in D58260.

Fixes PR50881.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D112178
2021-11-01 13:10:24 -07:00
Nikita Popov 4972d12185 [SCEV] Only add direct loop users (NFC)
It is now sufficient to track only direct addrec users of a loop,
and let the SCEVUsers mechanism track and invalidate transitive users.

Differential Revision: https://reviews.llvm.org/D112875
2021-11-01 18:49:43 +01:00
Max Kazantsev e512c5b166 [SCEV][NFC] Factor out common API for getting unique operands of a SCEV
This function is used in at least 2 places, so it makes sense to make it separate.

Differential Revision: https://reviews.llvm.org/D112516
Reviewed By: reames
2021-11-01 11:36:47 +07:00
Kazu Hirata c8b1ed5fb2 [clang, llvm] Use Optional::getValueOr (NFC) 2021-10-30 19:00:21 -07:00
David Green 2c4a9e830c [ValueTracking] Teach computeConstantRange that the maximum value of a half is 65504
The maximal value of a half is 0x7bff, which is 65504 when converted to
an integer. This patch teaches that to computeConstantRange to compute a
constant range with the correct maximum value.
https://alive2.llvm.org/ce/z/BV_Spb
https://alive2.llvm.org/ce/z/Nwuqvb

The maximum value for a float converted in the same way is 3.4e38, which
requires 129 bits of data. I have not added that here, as integer types that
large are rare compared to integer types larger than the 17 bits required
for half floats.
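
The two numbers quoted above (65504 and 17 bits) can be checked with a short sketch. It assumes the standard IEEE 754 binary16 layout; `decodeNormalHalf` and `signedBitsFor` are hypothetical helpers, not part of the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decode a finite, normal binary16 bit pattern into a double.
// Layout: 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits.
double decodeNormalHalf(uint16_t bits) {
  unsigned sign = bits >> 15;
  unsigned exponent = (bits >> 10) & 0x1Fu;
  unsigned fraction = bits & 0x3FFu;
  assert(exponent != 0 && exponent != 0x1F && "only normal values handled");
  double magnitude =
      std::ldexp(1.0 + fraction / 1024.0, static_cast<int>(exponent) - 15);
  return sign ? -magnitude : magnitude;
}

// Smallest signed integer width that can hold a non-negative value:
// its significant bits plus one sign bit.
unsigned signedBitsFor(uint64_t v) {
  unsigned bits = 1; // sign bit
  while (v) {
    ++bits;
    v >>= 1;
  }
  return bits;
}
```

Decoding 0x7bff yields 65504, which needs 16 value bits plus a sign bit, hence the 17-bit threshold mentioned above.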

The MVE tests change because instsimplify happens to be run as a part of
the backend, where it doesn't tend to for other backends.

Differential Revision: https://reviews.llvm.org/D112694
2021-10-30 14:27:38 +01:00
Kazu Hirata 972d4133e9 Use {DenseSet,SmallPtrSet}::contains (NFC) 2021-10-29 20:26:07 -07:00
Nikita Popov cdf45f98ca [BasicAA] Extract linear expression multiplication (NFC)
Extract a common method for multiplying a linear expression by a
factor.
2021-10-29 22:41:40 +02:00
Nikita Popov 7cf7378a9d [BasicAA] Don't treat non-inbounds GEP as nsw
The scale multiplication is only guaranteed to be nsw if the GEP
is inbounds (or the multiplication is trivial). Previously we were
only considering explicit muls in GEP indices.
2021-10-29 22:30:44 +02:00
modimo 5caad9b5d3 [InlineAdvisor] Add fallback/format switches and negative remark processing to Replay Inliner
Adds the following switches:

1. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback: controls what the replay advisor does for inline sites that are not present in the replay. Options are:

 1. Original: defers to original advisor
 2. AlwaysInline: inline all sites not in replay
 3. NeverInline: inline no sites not in replay

2. --sample-profile-inline-replay-format/--cgscc-inline-replay-format: controls what format should be generated to match against the replay remarks. Options are:

  1. Line
  2. LineColumn
  3. LineDiscriminator
  4. LineColumnDiscriminator

Adds support for negative inlining decisions. These are denoted by "will not be inlined into" as compared to the positive "inlined into" in the remarks.

All of these together with the previous `--sample-profile-inline-replay-scope/--cgscc-inline-replay-scope` allow tweaking in how to apply replay. In my testing, I'm using:
1. --sample-profile-inline-replay-scope/--cgscc-inline-replay-scope = Function to only replay on a function
2. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback = NeverInline since I'm feeding in only positive remarks to the replay system
3. --sample-profile-inline-replay-format/--cgscc-inline-replay-format = Line since I'm generating the remarks from DWARF information from GCC which can conflict quite heavily in column number compared to Clang

An alternative configuration could be to do Function, AlwaysInline, Line fallback with negative remarks which closer matches the final call-sites. Note that this can lead to unbounded inlining if a negative remark doesn't match/exist for one reason or another.

Updated various tests to cover the new switches and negative remarks

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D112040
2021-10-29 12:32:03 -07:00
Peter Waller 98f08752f7 [InstCombine][ConstantFolding] Make ConstantFoldLoadThroughBitcast TypeSize-aware
The newly added test previously caused the compiler to fail an
assertion. It looks like a straightforward TypeSize upgrade.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D112142
2021-10-28 12:15:15 +00:00
Max Kazantsev 513914e1f3 [SCEV] Invalidate user SCEVs along with operand SCEVs to avoid cache corruption
Following discussion in D110390, it seems that we are suffering from an inability
to traverse users of a SCEV being invalidated. The result of that is that ScalarEvolution's
inner caches may store obsolete data about SCEVs even if their operands are
forgotten. It creates problems when we try to verify the contents of those caches.

It's also a frequent situation when messing with cache causes very sneaky and
hard-to-analyze bugs related to corruption of memory when dealing with cached
data. They are lurking there because ScalarEvolution's verification is not powerful
enough and misses many problematic cases. I plan to make SCEV's verification
much stricter in follow-ups, and this requires dangling-pointers-free caches.

This patch makes sure that, whenever we forget cached information for a SCEV,
we also forget it for all SCEVs that (transitively) use it.

This may have negative compile time impact. It's a sacrifice we are more
than willing to make to enforce correctness. We can also save some time by
reworking invokers of forgetMemoizedResults (maybe we can forget multiple
SCEVs with single query).

Differential Revision: https://reviews.llvm.org/D111533
Reviewed By: reames
2021-10-28 09:39:24 +07:00
Nikita Popov 665060ea45 [BasicAA] Remove misleading overflow check
GEP decomposition currently checks whether the multiplication of
the linear expression offset and GEP scale overflows. However, if
everything else works correctly, this overflow check is both
unnecessary and dangerously misleading. While it will avoid an
overflow in Scale * Offset in particular, other parts of the
calculation (including those on dynamic values) may still overflow.
The code working on the decomposed GEPs is responsible for ensuring
that it remains correct in the presence of overflow. D112611 fixes
the last issue of that kind that I'm aware of (in fact, the overflow
check was originally introduced to work around precisely that issue).

Differential Revision: https://reviews.llvm.org/D112618
2021-10-27 20:56:03 +02:00
Philip Reames 425cbbc602 [Operator] Add hasPoisonGeneratingFlags [mostly NFC]
This method parallels the dropPoisonGeneratingFlags on Instruction, but is hoisted to operator to handle constant expressions as well.

This is mostly code movement, but I did go ahead and add the inrange constexpr gep case.  This had been discussed previously, but apparently never followed up on.
2021-10-27 11:25:40 -07:00
Nikita Popov fbc0c308d5 [BasicAA] Handle known bits as ranges
BasicAA currently tries to determine that the offset is positive by
checking whether all variable indices are positive based on known
bits, multiplied by a positive scale. However, this is incorrect
if the scale multiplication might overflow. In the modified test
case the original value is positive, but may be negative after a
left shift.

Fix this by converting known bits into a constant range and reusing
the range-based logic, which handles overflow correctly.
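
The overflow problem can be reproduced at 8 bits. The sketch below is an illustration under that assumption, with hypothetical helper names; it is not the BasicAA implementation.

```cpp
#include <cassert>
#include <cstdint>

// Scale an i8 value by a power of two with two's complement wrapping.
// The final conversion to int8_t assumes modular conversion (guaranteed
// since C++20, universal in practice before that).
int8_t scaleWrapping(int8_t value, unsigned shift) {
  assert(shift < 8);
  uint8_t wrapped =
      static_cast<uint8_t>(static_cast<uint8_t>(value) << shift);
  return static_cast<int8_t>(wrapped);
}

// Range-based check: the scaled value is known non-negative only if the
// whole input range is non-negative and its scaled maximum still fits.
bool scaledRangeStaysNonNegative(int8_t minValue, int8_t maxValue,
                                 unsigned shift) {
  assert(minValue <= maxValue && shift < 8);
  if (minValue < 0)
    return false;
  int widenedMax = static_cast<int>(maxValue) << shift;
  return widenedMax <= INT8_MAX;
}
```

The value 64 is positive, yet 64 << 1 wraps to -128 in i8; a check that only looks at the sign of the input would miss this, while the range-based check rejects any range whose scaled maximum overflows.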

Differential Revision: https://reviews.llvm.org/D112611
2021-10-27 14:41:31 +02:00
Nikita Popov 9bc7e543b4 [BasicAA] Make range check more precise
Make the range check more precise by calculating the range of
potentially accessed bytes for both accesses and checking whether
their intersection is empty. In that case there can be no overlap
between the accesses and the result is NoAlias.

This is more powerful than the previous approach, because it can
deal with sign-wrapped ranges. In the test case the original range
is [-1, INT_MAX] but becomes [0, INT_MIN] after applying the offset.
This is a wrapping range, so getSignedMin/getSignedMax will treat
it as a full range. However, the range excludes the elements
[INT_MIN+1, -1], which is enough to prove NoAlias with an access
at offset -1.
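
The example can be replayed at 8 bits, with INT8_MAX and INT8_MIN standing in for INT_MAX and INT_MIN. `WrappedRange` is a hypothetical inclusive-range model written for this sketch; it is not the ConstantRange class.

```cpp
#include <cstdint>

// Add two i8 values with two's complement wrapping (the conversion back to
// int8_t is modular since C++20 and in practice before that).
int8_t wrapAdd(int8_t a, int8_t b) {
  return static_cast<int8_t>(static_cast<uint8_t>(
      static_cast<uint8_t>(a) + static_cast<uint8_t>(b)));
}

// Inclusive range [lo, hi], read by walking upward from lo with wrapping
// until hi is reached.
struct WrappedRange {
  int8_t lo, hi;

  bool contains(int8_t v) const {
    uint8_t span = static_cast<uint8_t>(static_cast<uint8_t>(hi) -
                                        static_cast<uint8_t>(lo));
    uint8_t dist = static_cast<uint8_t>(static_cast<uint8_t>(v) -
                                        static_cast<uint8_t>(lo));
    return dist <= span;
  }

  // When the range crosses from INT8_MAX to INT8_MIN, its signed minimum and
  // maximum are INT8_MIN and INT8_MAX, i.e. they look like a full range.
  bool signedBoundsAreFull() const { return hi < lo; }
};

WrappedRange addOffset(WrappedRange r, int8_t offset) {
  return {wrapAdd(r.lo, offset), wrapAdd(r.hi, offset)};
}
```

Shifting [-1, INT8_MAX] by 1 gives [0, INT8_MIN]: the signed bounds alone say "anything", but a membership test still shows that -1 is excluded.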

Differential Revision: https://reviews.llvm.org/D112486
2021-10-27 12:40:58 +02:00
Max Kazantsev 5961f0308f [SCEV][NFC] Verify integrity of SCEVUsers
Make sure that, for every living SCEV, we have all its direct
operand tracking it as their user.

Differential Revision: https://reviews.llvm.org/D112402
Reviewed By: reames
2021-10-27 09:54:49 +07:00
Nikita Popov 3a995c918e [SCEV] Move SCEVLostPoisonFlags() check into SCEVExpander
Always insert values into ExprValueMap, and instead skip using them
in SCEVExpander if poison-generating flags have been lost. This
ensures that all values that are in ValueExprMap are also in
ExprValueMap, so we can use the latter to invalidate the former.

This change is probably not entirely NFC for the case where
originally the SCEV had no nowrap flags but they were inferred
later, in which case that would now allow reusing the existing
value for expansion.

Differential Revision: https://reviews.llvm.org/D112389
2021-10-25 22:37:20 +02:00
Nikita Popov 0d20ebf686 [BasicAA] Use ranges for more than one index
D109746 made BasicAA use range information to determine the
minimum/maximum GEP offset. However, it was limited to the case of
a single variable index. This patch extends support to multiple
indices by adding all the ranges together.

Differential Revision: https://reviews.llvm.org/D112378
2021-10-25 15:30:50 +02:00
Nikita Popov 75384ecdf8 [InstSimplify] Refactor invariant.group load folding
Currently strip.invariant/launder.invariant are handled by
constructing constant expressions with the intrinsics skipped.
This takes an alternative approach of accumulating the offset
using stripAndAccumulateConstantOffsets(), with a flag to look
through invariant.group intrinsics.

Differential Revision: https://reviews.llvm.org/D112382
2021-10-25 10:56:25 +02:00
Kazu Hirata 3729a5abf4 [SCEV] Fix a warning on an unused lambda capture
This patch fixes:

  llvm/lib/Analysis/ScalarEvolution.cpp:12770:37: error: lambda
  capture 'this' is not used [-Werror,-Wunused-lambda-capture]
2021-10-25 00:45:18 -07:00
Max Kazantsev f8623b0783 [SCEV][NFC] Win some compile time from mass forgetMemoizedResults
Mass forgetMemoizedResults can be done more efficiently than a bunch
of individual invocations of the helper because we can traverse maps being
updated just once, rather than doing this for each individual SCEV.

Should be NFC and supposedly improves compile time.

Differential Revision: https://reviews.llvm.org/D112294
Reviewed By: reames
2021-10-25 14:09:41 +07:00
Max Kazantsev dbab339ea4 [SCEV][NFC] Apply mass forgetMemoizedResults queries where possible
When forgetting multiple SCEVs, rather than doing this one by one, we can
instead use mass updates. We plan to make them more efficient than they
are now, potentially improving compile time.

Differential Revision: https://reviews.llvm.org/D111602
Reviewed By: reames
2021-10-25 13:50:49 +07:00
Max Kazantsev a6096b7f9e [SCEV][NFC] Introduce API for mass forgetMemoizedResults query
This patch changes signature of forgetMemoizedResults to be able to work with
multiple SCEVs. Usage will come in follow-ups. We also plan to optimize it in the
future to work faster than individual invalidation updates. Should not change
behavior in any sense.

Split-off from D111602.

Differential Revision: https://reviews.llvm.org/D112293
Reviewed By: reames
2021-10-25 13:49:31 +07:00
Max Kazantsev 1c18ebb2cc [NFC][SCEV] Do not track users of SCEVConstants
Follow-up from D112295, suggested by Nikita: we can avoid tracking
users of SCEVConstants because dropping their cached info is unlikely
to give any new prospects for fact inference, and it should not introduce
any correctness problems.
2021-10-25 12:30:46 +07:00
Max Kazantsev fea4a48c0b [SCEV][NFC] API for tracking of SCEV users
This patch introduces an API that keeps track of the SCEVs that use
other SCEVs. It is required to handle invalidation of users along
with operands, which comes in follow-up patches.

Differential Revision: https://reviews.llvm.org/D112295
Reviewed By: reames
2021-10-25 12:14:18 +07:00
Kazu Hirata 4bd46501c3 Use llvm::any_of and llvm::none_of (NFC) 2021-10-24 17:35:33 -07:00
Philip Reames a461fa64bb Treat branch on poison as immediate UB (under an off by default flag)
The LangRef clearly states that branching on an undef or poison value is immediate undefined behavior, but historically, we have not been consistent about implementing that interpretation in the optimizer. Historically, we used (in some cases) a more relaxed model which essentially looked for provable UB along both paths which was control dependent on the condition. However, we've never been 100% consistent here. For instance, SCEV uses the strong model for increments which form AddRecs (and only addrecs).

At the moment, the last big blocker for finally making this switch is enabling the fix that landed in D106041. Loop unswitching (in its classic form) is incorrect as it creates many "branch on poisons" when unswitching conditions originally unreachable within the loop.

This change adds a flag to value tracking which makes it easy to test the optimization potential of treating branch on poison as immediate UB. It's intended to help ease work on getting us finally through this transition and avoid multiple independent rediscoveries of the same issues.

Differential Revision: https://reviews.llvm.org/D112026
2021-10-24 14:42:03 -07:00
Nikita Popov 0c7f85d786 [InstSimplify] Simplify fetching of index size (NFC)
Directly fetch the size instead of going through the index type
first.
2021-10-23 22:08:15 +02:00
Nikita Popov 710596a1e1 [ConstantFolding] Accept offset in ConstantFoldLoadFromConstPtr (NFCI)
As this API is now internally offset-based, we can accept a starting
offset and remove the need to create a temporary bitcast+gep
sequence to perform an offset load. The API now mirrors the
ConstantFoldLoadFromConst() API.
2021-10-23 17:59:39 +02:00
Kazu Hirata d8e4170b0a Ensure newlines at the end of files (NFC) 2021-10-23 08:45:29 -07:00
Kazu Hirata d14d7068b6 [llvm] Use StringRef::contains (NFC) 2021-10-23 08:45:27 -07:00
Nikita Popov c5b5b7f621 [ConstantFolding] Remove ConstantFoldLoadThroughGEPIndices() API (NFC)
The last user of this API went away in
4f5e9a2bb2.
2021-10-23 16:59:29 +02:00
Nikita Popov 4f5e9a2bb2 [SCEV] Remove computeLoadConstantCompareExitLimit() (NFCI)
The functionality of this method is already covered by
computeExitCountExhaustively() in a more general fashion. It was
added at a time when exhaustive exit count calculation did not
support constant folding loads yet. I double checked that dropping
this code causes no binary changes in test-suite.

Differential Revision: https://reviews.llvm.org/D112343
2021-10-23 15:34:25 +02:00
Nikita Popov 61cfdf636d [BasicAA] Model implicit trunc of GEP indices
GEP indices larger than the GEP index size are implicitly truncated
to the index size. BasicAA currently doesn't model this, resulting
in incorrect alias analysis results.

Fix this by explicitly modelling truncation in CastedValue in the
same way we do zext and sext. Additionally we need to disable a
number of optimizations for truncated values, in particular
"non-zero" and "non-equal" may no longer hold after truncation.
I believe the constant offset heuristic is also not necessarily
correct for truncated values, but wasn't able to come up with a
test for that one.
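
A small sketch of why "non-zero" and "non-equal" stop holding after truncation, assuming a 64-bit index and a 32-bit index size. `truncToIndexSize` is a hypothetical helper for illustration only.

```cpp
#include <cstdint>

// Model a 64-bit GEP index being implicitly truncated to a 32-bit index
// size: only the low 32 bits survive.
uint32_t truncToIndexSize(uint64_t index) {
  return static_cast<uint32_t>(index);
}
```

The index 2^32 is non-zero but truncates to 0, and the distinct indices 1 and 2^32 + 1 truncate to the same value, so facts proven about the wide values cannot be reused for the truncated ones.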

A possible followup here would be to use the new mechanism to
model explicit trunc as well (which should be much more common,
as it is the canonical form). This is straightforward, but omitted
here to separate the correctness fix from the analysis improvement.

(Side note: While I say "index size" above, BasicAA currently uses
the pointer size instead. Something for another day...)

Differential Revision: https://reviews.llvm.org/D110977
2021-10-22 23:47:02 +02:00
Nikita Popov 3a10fe2d89 [Loads] Use more powerful constant folding API
This follows up on D111023 by exporting the generic "load value
from constant at given offset as given type" and using it in the
store to load forwarding code. We now need to make sure that the
load size is smaller than the store size, previously this was
implicitly ensured by ConstantFoldLoadThroughBitcast().

Differential Revision: https://reviews.llvm.org/D112260
2021-10-22 18:33:03 +02:00
Nikita Popov 1848525842 [CodeMetrics] Don't require speculatability for ephemeral values
As discussed in D112016, our current requirement of speculatability
for ephemeral is overly strict: What we really care about is that
the instruction will be DCEd once the assume is dropped. For that
it is sufficient that the instruction is side-effect free and not
a terminator.

In particular, this allows non-dereferenceable loads to be ephemeral
values.

Differential Revision: https://reviews.llvm.org/D112179
2021-10-21 20:30:01 +02:00
Arthur Eubanks 3781a46c3c Revert "[IPT] Restructure cache to allow lazy update following invalidation [NFC]"
This reverts commit baea663a6e.

Causes crashes, e.g. https://lab.llvm.org/buildbot/#/builders/77/builds/10715.
2021-10-21 10:48:41 -07:00
Philip Reames baea663a6e [IPT] Restructure cache to allow lazy update following invalidation [NFC]
This change restructures the cache used in IPT to point not to the first special instruction, but to the first instruction which *could* be special. That is, the cached reference is always equal to the first special, or comes before it in the block.

This avoids expensive block scans when we are removing special instructions from the beginning of the block. At the moment, this case is not heavily used, though it does trigger in GVN when doing CSE of calls. The main motivation was a change I'm no longer planning to move forward with, but the cache optimization seemed worthwhile as a minor perf win at low cost.

Differential Revision: https://reviews.llvm.org/D111768
2021-10-21 09:16:21 -07:00
Arthur Eubanks 00500d5bad [NFC] De-template LazyCallGraph::visitReferences() and move into .cpp file
This makes changing it and recompiling it much faster.
2021-10-20 10:50:00 -07:00
Bjorn Pettersson 9c44a0996c [SCEV] Fix formatting error introduced by D112080
Accidentally pushed D112080 without this clang-format cleanup.
2021-10-19 21:44:07 +02:00
Bjorn Pettersson 08619006a0 [SCEV] Avoid compile time explosion in ScalarEvolution::isImpliedCond
As seen in PR51869 the ScalarEvolution::isImpliedCond function might
end up spending lots of time when doing the isKnownPredicate checks.

Calling isKnownPredicate, for example, results in isKnownViaInduction
being called, which might result in isLoopBackedgeGuardedByCond being
called, and then we might get one or more new calls to isImpliedCond.
Even if the scenario described here isn't an infinite loop, using
some randomly generated C programs as input indicates that those
isKnownPredicate checks quite often return true. On the other hand,
the third condition that needs to be fulfilled in order to "prove
implications via truncation", i.e. the isImpliedCondBalancedTypes
check, is rarely fulfilled.
I also made some similar experiments to look at how often we would
get the same result when using isKnownViaNonRecursiveReasoning instead
of isKnownPredicate. So far I haven't seen a single case when codegen
is negatively impacted by using isKnownViaNonRecursiveReasoning. On
the other hand, it seems like we get rid of the compile time explosion
seen in PR51869 that way. Hence this patch.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112080
2021-10-19 21:37:57 +02:00
Arthur Eubanks ecd25edfc5 [InlineCost] Add empty line between call sites when printing inline costs 2021-10-18 13:56:48 -07:00
Arthur Eubanks b8ce97372d [NewPM] Add PipelineTuningOption to eagerly invalidate analyses
This trades off more compile time for less peak memory usage. Right now
it invalidates all function analyses after a module->function or
cgscc->function adaptor.

https://llvm-compile-time-tracker.com/compare.php?from=1fb24fe85a19ae71b00875ff6c96ef1831dcf7e3&to=cb28ddb063c87f0d5df89812ab2de9a69dd276db&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=1fb24fe85a19ae71b00875ff6c96ef1831dcf7e3&to=cb28ddb063c87f0d5df89812ab2de9a69dd276db&stat=max-rss

For now this is just experimental.

See comments on why this may affect optimizations.

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D111575
2021-10-18 13:20:35 -07:00
modimo 313c657fce [InlineAdvisor] Add -inline-replay-scope=<Function|Module> to control replay scope
The goal is to allow grafting an inline tree from Clang or GCC into a new compilation without affecting other functions. For GCC, we're doing this by extracting the inline tree from dwarf information and generating the equivalent remarks.

This allows easier side-by-side asm analysis and a trial way to see if a particular inlining setup provides benefits by itself.

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D110658
2021-10-18 13:08:39 -07:00
Kirill Stoimenov 62627c7217 [Sanitizers] Replaced getMaxPointerSizeInBits with getPointerSizeInBits, which was causing failures for 32bit x86.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D111829
2021-10-18 09:31:14 -07:00
Sanjay Patel 2a3cc4d461 [Analysis] add utility function for unary shuffle mask creation
This is NFC-intended for the callers. Posting in case there are
other potential users that I missed.
I would also use this from VectorCombine in a patch for:
https://llvm.org/PR52178 ( D111901 )

Differential Revision: https://reviews.llvm.org/D111891
2021-10-18 09:00:39 -04:00
Nikita Popov 274b2439f8 [ConstantRange] Add fast signed multiply
The multiply() implementation is very slow -- it performs six
multiplications in double the bitwidth, which means that it will
typically work on allocated APInts and bypass fast-path
implementations. Add an additional implementation that doesn't
try to produce anything better than a full range if overflow is
possible. At least for the BasicAA use-case, we really don't care
about more precise modeling of overflow behavior. The current
use of multiply() is fine while the implementation is limited to
a single index, but extending it to the multiple-index case makes
the compile-time impact untenable.
2021-10-17 16:41:49 +02:00
Simon Pilgrim d464a9d476 [Analysis] Replace assert(isa)/dyn_cast with cast. NFC.
cast<> will perform the assertion for us.

Removes a static analysis null dereference warning.
2021-10-16 11:40:19 +01:00
Simon Pilgrim a1b43d2bc9 [LazyValueInfo] getPredicateAt - remove unnecessary null pointer check. NFC.
We already dereference the CxtI pointer several times before reaching the "if(CxtI)", we have no need to check it again.

Fixes a coverity warning.
2021-10-16 11:20:19 +01:00
Simon Pilgrim c288241795 [ConstantFolding] ConstantFoldScalarCall2 - early-out if getLibFunc fails. NFC. 2021-10-16 11:20:19 +01:00
Simon Pilgrim c18cf10a04 [ConstantFolding] Use getValueAPF const ref value where possible. NFC.
Don't copy the value if we can avoid it.
2021-10-16 11:20:19 +01:00
Simon Pilgrim 76ca0d67ab [ConstantFolding] ConstantFoldScalarCall1 - early-out if getLibFunc fails. NFC. 2021-10-16 11:20:18 +01:00
Nikita Popov 0c52c271a5 [BasicAA] Rename ExtendedValue to CastedValue (NFC)
As suggested on D110977, rename ExtendedValue to CastedValue,
because it will contain more than just extensions in the future.
2021-10-15 21:56:54 +02:00
Max Kazantsev 90ae538cab [SCEV] Prove implication of predicates to their sign-flipped counterparts
This patch teaches SCEV two implication rules:

  x <u y && y >=s 0 --> x <s y,
  x <s y && y <s 0 --> x <u y.

And all equivalents with signs/parts swapped.
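
Both rules can be checked exhaustively at a small bit width. The sketch below does so for 8-bit values; it is a sanity check written for this note, not the SCEV implementation, and the function name is hypothetical.

```cpp
#include <cstdint>

// Exhaustively verify, over all pairs of 8-bit values, that
//   x <u y && y >=s 0  implies  x <s y
//   x <s y && y <s 0   implies  x <u y
bool signFlipRulesHoldForI8() {
  for (int xi = 0; xi < 256; ++xi) {
    for (int yi = 0; yi < 256; ++yi) {
      uint8_t xu = static_cast<uint8_t>(xi);
      uint8_t yu = static_cast<uint8_t>(yi);
      // Same bit patterns viewed as signed (two's complement conversion).
      int8_t xs = static_cast<int8_t>(xu);
      int8_t ys = static_cast<int8_t>(yu);
      bool ult = xu < yu;
      bool slt = xs < ys;
      if (ult && ys >= 0 && !slt)
        return false;
      if (slt && ys < 0 && !ult)
        return false;
    }
  }
  return true;
}
```

The side condition on y matters: without it, x = 1 and y = 255 satisfy x <u y, but as signed values y is -1 and x <s y fails.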

Differential Revision: https://reviews.llvm.org/D110517
Reviewed By: nikic
2021-10-15 11:49:18 +07:00
Max Kazantsev 1202d280c6 [SCEV][NFC] Reduce memory footprint & compile time via DFS refactoring
Current implementations of DFS in SCEV check whether a traversed value
has already been visited on pop, and not on push. As a result, the same value may be pushed
multiple times just to be thrown away when popped. These operations are
meaningless and only waste time and increase memory footprint of the
worklist.

This patch reworks the DFS strategy to check uniqueness before push.
Should be NFC.
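
The difference between the two strategies can be sketched on a plain adjacency-list graph. This is an illustration with hypothetical names (`Graph`, `pushesWhenCheckingOnPop`, `pushesWhenCheckingOnPush`), not the SCEV worklist code; each function returns how many worklist pushes it performed.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Adjacency list: g[n] holds the nodes reachable from n in one step.
using Graph = std::vector<std::vector<int>>;

// Old strategy: push everything, discard already visited nodes on pop.
std::size_t pushesWhenCheckingOnPop(const Graph &g, int root) {
  std::set<int> visited;
  std::vector<int> worklist{root};
  std::size_t pushes = 1;
  while (!worklist.empty()) {
    int node = worklist.back();
    worklist.pop_back();
    if (!visited.insert(node).second)
      continue; // pushed earlier for nothing
    for (int succ : g[node]) {
      worklist.push_back(succ);
      ++pushes;
    }
  }
  return pushes;
}

// New strategy: check uniqueness before pushing.
std::size_t pushesWhenCheckingOnPush(const Graph &g, int root) {
  std::set<int> visited{root};
  std::vector<int> worklist{root};
  std::size_t pushes = 1;
  while (!worklist.empty()) {
    int node = worklist.back();
    worklist.pop_back();
    for (int succ : g[node]) {
      if (visited.insert(succ).second) {
        worklist.push_back(succ);
        ++pushes;
      }
    }
  }
  return pushes;
}
```

On a diamond-shaped graph (0 reaches 1 and 2, both reach 3) the first variant pushes node 3 twice, the second only once; the gap grows with the amount of sharing in the graph.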

Differential Revision: https://reviews.llvm.org/D111774
Reviewed By: nikic, reames
2021-10-15 10:19:15 +07:00
Artur Pilipenko 3f96f7b30c Fix getInlineCost with ComputeFullInlineCost enabled
Fix a bug where getInlineCost incorrectly returns a
cost/threshold pair instead of an explicit never inline.

Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D111687
2021-10-14 17:41:41 -07:00
Nikita Popov 69853f9920 [IVUsers] Move preheader check into SCEVExpander
Rather than checking for loop nest preheaders upfront in IVUsers,
move this requirement into isSafeToExpand() from SCEVExpander.

Historically, LSR did not check whether SCEVs are safe to expand
and fully relied on IVUsers to validate this. Later, support for
non-expandable SCEVs was added via rigid formulas.

Checking this in isSafeToExpand() makes it more obvious what
exactly this check is guarding against, and avoids the awkward
loop nest scan.

This is a followup to https://reviews.llvm.org/D111493#3055286.

Differential Revision: https://reviews.llvm.org/D111681
2021-10-14 21:52:31 +02:00
Nikita Popov 5f05ff081f [BasicAA] Improve scalable vector handling
Currently, DecomposeGEP() bails out on the whole decomposition if
it encounters a scalable GEP type anywhere. However, it is fine to
still analyze other GEPs that we look through before hitting the
scalable GEP. This does mean that the decomposed GEP base is no
longer required to be the same as the underlying object. However,
I don't believe this property is necessary for correctness anymore.

This allows us to compute slightly more precise aliasing results
for GEP chains containing scalable vectors, though my primary
interest here is simplifying the code.

Differential Revision: https://reviews.llvm.org/D110511
2021-10-14 20:23:50 +02:00
Kevin P. Neal 727a891ec8 [FPEnv][InstSimplify] Fold fadd X, 0 ==> X, when we know X is not -0
Currently the fadd optimizations in InstSimplify don't know how to do this
NoSignedZeros "X + 0.0 ==> X" fold when using the constrained intrinsics.
This adds the support.

This review is derived from D106362 with some improvements from D107285
and is a follow-on to D111085.

Differential Revision: https://reviews.llvm.org/D111450
2021-10-14 12:32:45 -04:00
Nikita Popov a8e7d11aca [ValueTracking] Simplify getKnowledgeValidInContext() call (NFC)
This accepts an ArrayRef, there's no need to create a SmallVector.
2021-10-14 18:17:54 +02:00
Max Kazantsev 6e1308bc10 [SCEV][NFC] Simplify check with CI->isZero() exit condition
Replace check with
    if ((ExitIfTrue && CI->isZero()) || (!ExitIfTrue && CI->isOne()))
with equivalent and simpler version
    if (ExitIfTrue == CI->isZero())
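
The equivalence can be checked over all four input combinations. This sketch assumes CI is a 1-bit constant, so that isOne() is the negation of isZero(); the Bit struct is a hypothetical stand-in, not llvm::ConstantInt:

```cpp
#include <cassert>

// Hypothetical stand-in for a 1-bit constant.
struct Bit {
  bool Value;
  bool isZero() const { return !Value; }
  bool isOne() const { return Value; }
};

// The condition before the change.
bool originalCheck(bool ExitIfTrue, Bit CI) {
  return (ExitIfTrue && CI.isZero()) || (!ExitIfTrue && CI.isOne());
}

// The condition after the change.
bool simplifiedCheck(bool ExitIfTrue, Bit CI) {
  return ExitIfTrue == CI.isZero();
}

// Compares both forms over every combination of inputs.
bool checksAgree() {
  for (int E = 0; E < 2; ++E)
    for (int V = 0; V < 2; ++V)
      if (originalCheck(E != 0, Bit{V != 0}) !=
          simplifiedCheck(E != 0, Bit{V != 0}))
        return false;
  return true;
}
```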
2021-10-14 14:06:52 +07:00
Max Kazantsev 46a1dd47e6 [SCEV][NFC] Reorder checks to delay call of all_of
Check lightweight getter condition before calling all_of.
2021-10-14 13:30:51 +07:00
Mircea Trofin 6c76d01011 [mlgo][aot] require the model is autogenerated for test determinism
The tests that exercise the 'release' mode, where the model is AOT-ed,
check the output has certain properties, to validate that, indeed, a
different policy from the default one was exercised. For determinism, we
can't reliably check that output for an arbitrary learned policy, since
it could be that policy happens to mimic the default one in that
particular case.

This patch adds a requirement that those tests run only when the model
is autogenerated (e.g. on build bots).

Differential Revision: https://reviews.llvm.org/D111747
2021-10-13 14:02:41 -07:00
Arthur Eubanks 3628bb7436 Make various assume bundle data structures use uint64_t
Following D110451, we need to make sure to support 64 bit values.
2021-10-13 10:38:41 -07:00
Philip Reames 24c9016574 [instcombine] propagate single use freeze(gep inbounds X)
This is a follow on for D111675 which implements the gep case. I'd originally left it out because I was hoping to actually implement the inrange todo, but after a bit of staring at the code, decided to leave it as is since it doesn't affect this use case (i.e. instcombine requires the op to freeze to be an instruction).

Differential Revision: https://reviews.llvm.org/D111691
2021-10-13 09:25:00 -07:00
Philip Reames 4c5702cb12 Fix bug introduced with 6f34839 (poison flags on floating point ops)
The newly introduced API for checking whether poison comes solely from flags which can be dropped was out of sync.  This was noticed by a reviewer post commit.

For the moment, disable the floating point flags.  In a follow up change, I plan to add support in dropPoisonGeneratingFlags, but that deserves to be a change of its own.
2021-10-12 20:25:00 -07:00
Philip Reames 6f34839407 [instcombine] propagate freeze through single use poison producing flag instruction
If we have an instruction which produces poison only when flags are specified on the instruction, then we know that freezing the operands and dropping flags is equivalent to freezing the result. If we know those flags don't result in any undefined behavior being executed, then there's no point in preserving the flags as we gain no knowledge by having them.

This patch extends the existing propagation logic which sinks freeze to single potential non-poison operands to allow dropping of flags when we know the freeze is the sole use of the instruction with poison flags.

The main value is that we tend to sink freezes towards the phi in IV cycles where the incoming value to the phi is the freeze of an IV increment. This will in turn (in a future patch), let us fold the freeze through the phi into the loop preheader. Motivated by eliminating need for CanonicalizeFreezeInLoops for the clearly profitable cases from onephi.ll test case in the test directory.

Differential Revision: https://reviews.llvm.org/D111675
2021-10-12 13:52:41 -07:00
Hongtao Yu 098a0d8fbc [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3.
This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

Not exactly like DbgIntrinsics, PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are:
- Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes
- Unblocked CSE by avoiding pseudo probe from clobbering memory SSA
- Unblocked induction variable simplification
- Allow empty loop deletion by treating probe intrinsic isDroppable
- Some refactoring.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110847
2021-10-12 09:44:12 -07:00
Nikita Popov 2a2a37d972 [IVUsers] Check for preheader instead of loop simplify form
IVUsers currently makes sure that all loops dominating a user are
in loop simplify form, because SCEVExpander needs a preheader to
insert into. However, loop simplify form requires much more than
that. In particular, it requires dedicated exits, which means that
exits need to be found and walked. For large functions with many
nested loops, this can result in pathological compile-time explosion.

Fix this by only checking the property we're actually interested in,
which is incidentally cheap to check.

Differential Revision: https://reviews.llvm.org/D111493
2021-10-11 23:13:13 +02:00
Roman Lebedev 684cbae89a [KnownBits] Introduce `countMaxActiveBits()` and use it in a few places 2021-10-11 23:36:06 +03:00
Arthur Eubanks 259390de9a [LCG] Don't skip invalidation of LazyCallGraph if CFG analyses are preserved
The CFG being changed and the overall call graph are not related, we can introduce/remove calls without changing the CFG.

Resolves one of the issues in PR51946.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D111275
2021-10-11 13:30:47 -07:00
Philip Reames 7f55209cee [SCEV] Extend trip count to avoid overflow by default
As a brief reminder, an "exit count" is the number of times the backedge executes before some event. It can be zero if we exit before the backedge is reached. A "trip count" is the number of times the loop header is entered if we branch into the loop. In general, TC = BTC + 1 and thus a zero trip count is ill defined.

There is a corner case which we don't handle well. Let's assume i8 for our examples to keep things simple. If BTC = 255, then the correct trip count is 256. However, 256 is not representable in i8.

In theory, code which needs to reason about trip counts is responsible for checking for this cornercase, and either bailing out, or handling it correctly. Historically, we don't have a great track record about actually doing so.

When reviewing D109676, I found myself asking a basic question. Was there any good reason to preserve the current wrap-to-zero behavior when converting from backedge taken counts to trip counts? After reviewing existing code, I could not find a single case which appears to correctly and precisely handle the overflow case.

This patch changes the default behavior to extend instead of wrap. That is, if the result might be 256, we return a value of i9 type to ensure we interpret the count correctly. I did leave the legacy behavior as an option since a) loop-flatten stops triggering if I extend due to weirdly specific pattern matching I didn't understand and b) we could reasonably use the mode if we'd externally established a lack of overflow.
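
A small sketch of the two behaviors on the i8 example. The function names are hypothetical, and uint16_t stands in for the i9 type since C++ has no 9-bit integer:

```cpp
#include <cassert>
#include <cstdint>

// Legacy behavior: the +1 happens in the original width and may wrap to zero.
uint8_t tripCountWrapping(uint8_t BTC) {
  return static_cast<uint8_t>(BTC + 1);
}

// Extending behavior: widen first, then add one, so 256 is representable.
uint16_t tripCountExtended(uint8_t BTC) {
  return static_cast<uint16_t>(static_cast<uint16_t>(BTC) + 1);
}
```

For BTC = 255 the wrapping form yields 0 while the extended form yields 256; for every other BTC the two agree.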

I want to emphasize that this change is *not* NFC. There are two call sites (one in ScalarEvolution.cpp, one in LoopCacheAnalysis.cpp) which are switched to the extend semantics. The former appears imprecise (but correct) for a constant 255 BTC. The latter appears incorrect, though I don't have a test case.

Differential Revision: https://reviews.llvm.org/D110587
2021-10-11 09:55:55 -07:00
David Sherwood 26b7d9d622 [LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:

  int r = a;
  for (int i = 0; i < n; i++) {
    if (src[i] > 3) {
      r = b;
    }
    src[i] += 2;
  }

We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.

The IR generated by clang typically looks like this:

  %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
  ...
  %pred = icmp ugt i32 %val, i32 3
  %phi.update = select i1 %pred, i32 %b, i32 %phi

We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant,
however we can support any kind of compare - integer or float.
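
To illustrate the idea (this is a scalar model, not the vectorizer's actual output), the sketch below compares the original loop with a lane-wise version. The vector width of 4 and both function names are assumptions made for the example:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: the loop from the commit message.
int selectCmpScalar(std::vector<int> &Src, int A, int B) {
  int R = A;
  for (std::size_t I = 0; I < Src.size(); ++I) {
    if (Src[I] > 3)
      R = B;
    Src[I] += 2;
  }
  return R;
}

// Lane-wise model: each lane holds either A or B, and the final reduction
// picks B if any lane holds B.
int selectCmpLanes(std::vector<int> &Src, int A, int B) {
  constexpr std::size_t VF = 4; // assumed vector width
  std::array<int, VF> Lanes;
  Lanes.fill(A);
  std::size_t I = 0;
  for (; I + VF <= Src.size(); I += VF) {
    for (std::size_t L = 0; L < VF; ++L) {
      Lanes[L] = (Src[I + L] > 3) ? B : Lanes[L]; // select(cmp, B, phi)
      Src[I + L] += 2;
    }
  }
  int R = A;
  for (int Lane : Lanes)
    if (Lane == B)
      R = B;
  // Scalar remainder for the elements that do not fill a whole vector.
  for (; I < Src.size(); ++I) {
    if (Src[I] > 3)
      R = B;
    Src[I] += 2;
  }
  return R;
}
```

Because 'a' and 'b' are loop invariant, the order in which lanes observe a true compare does not matter, which is what makes the reduction legal.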

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
  Transforms/LoopVectorize/select-cmp-predicated.ll
  Transforms/LoopVectorize/select-cmp.ll

Differential Revision: https://reviews.llvm.org/D108136
2021-10-11 09:41:38 +01:00
Clement Courbet 83ded5d323 re-land "[AA] Teach BasicAA to recognize basic GEP range information."
Now that PR52104 is fixed.
2021-10-11 10:04:22 +02:00
Nick Desaulniers 9697f93587 [InlineCost] model calls to llvm.is.constant* more carefully
llvm.is.constant* intrinsics are evaluated to 0 or 1 integral values.

A common use case for llvm.is.constant comes from the higher level
__builtin_constant_p. A common usage pattern of __builtin_constant_p in
the Linux kernel is:

    void foo (int bar) {
      if (__builtin_constant_p(bar)) {
        // lots of code that will fold away to a constant.
      } else {
        // a little bit of code, usually a libcall.
      }
    }

A minor issue in InlineCost calculations is when `bar` is _not_ Constant
and still will not be after inlining, we don't discount the true branch
and the inline cost of `foo` ends up being the cost of both branches
together, rather than just the false branch.

This leads to code like the above where inlining will not help prove bar
Constant, but it still would be beneficial to inline foo, because the
"true" branch is irrelevant from a cost perspective.

For example, IPSCCP can sink a passed constant argument to foo:

    const int x = 42;
    void bar (void) { foo(x); }

This improves our inlining decisions, and fixes a few head scratching
cases where the disassembly shows a relatively small `foo` not inlined
into a lone caller.

We could further improve this modeling by tracking whether the argument
to llvm.is.constant* is a parameter of the function, and if inlining
would allow that parameter to become Constant. This idea is noted in a
FIXME comment.

Link: https://github.com/ClangBuiltLinux/linux/issues/1302

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D111272
2021-10-08 15:27:30 -07:00
Philip Reames edf31b4db1 [IPT] Add a statistic to track instructions scanned to answer queries
I'm planning some changes to the invalidation mechanism here, and having a concrete mechanism to track progress is key.
2021-10-08 10:59:35 -07:00
Philip Reames b4498e6b8d [IPT] Narrow scope of removeInstruction invalidation [NFC]
We only need to invalidate if the instruction being removed is the cached "first special instruction".  If the instruction is before that one, it can't (by assumption) be special.  If it is after that one, it wasn't the first.
2021-10-08 10:35:03 -07:00
Philip Reames d694dd0f0d Add iterator range variants of isGuaranteedToTransferExecutionToSuccessor [mostly-nfc]
This factors out utilities for scanning a bounded block of instructions since we have this code repeated in a bunch of places.  The change to InlineFunction isn't strictly NFC as the limit mechanism there didn't handle debug instructions correctly.
2021-10-08 09:50:10 -07:00
Nikita Popov c77a5c21bb [BasicAA] Use base of decomposed GEP in recursive queries (NFC)
DecompGEP.Base and UnderlyingV are currently always the same.
However, logically DecompGEP.Base is the right value to use here,
because the decomposed offset is relative to that base.
2021-10-07 22:08:41 +02:00
Paul Robinson aec66f895b [PS4][TargetLibraryInfo] Set TLI info correctly for PS4 2021-10-07 10:03:31 -07:00
Sanjay Patel fdbf2bb4ee [InstSimplify] (x || y) && (x || !y) --> x
https://alive2.llvm.org/ce/z/4BE33w

This is the logical (select-form) equivalent of the bitwise logic fold:
e36d351d19

This is another part of solving the regression from:
https://llvm.org/PR52077
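
As a quick sanity check of the fold on plain booleans (hypothetical helper names; poison semantics, which the Alive2 proof covers, are not modeled here):

```cpp
#include <cassert>

// The unfolded expression, using short-circuit logic as in the select form.
bool unfoldedLogical(bool X, bool Y) { return (X || Y) && (X || !Y); }

// Checks that the expression equals X for all four input combinations.
bool logicalFoldHolds() {
  for (int X = 0; X < 2; ++X)
    for (int Y = 0; Y < 2; ++Y)
      if (unfoldedLogical(X != 0, Y != 0) != (X != 0))
        return false;
  return true;
}
```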
2021-10-07 12:25:25 -04:00
Erik Desjardins 11c8efd4db [Inline] Introduce Constant::hasOneLiveUse, use it instead of hasOneUse in inline cost model (PR51667)
Otherwise, inlining costs may be pessimized by dead constants.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51667.

Reviewed By: mtrofin, aeubanks

Differential Revision: https://reviews.llvm.org/D109294
2021-10-07 08:33:25 -07:00
Itay Bookstein 40ec1c0f16 [IR][NFC] Rename getBaseObject to getAliaseeObject
To better reflect the meaning of the now-disambiguated {GlobalValue,
GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction
(D109792), the function is renamed to getAliaseeObject.
2021-10-06 19:33:10 -07:00
Kuba Mracek 7329abf2f8 [GlobalDCE] In VFE, replace the whole 'sub' expression of unused relative-pointer-based vtable slots
Differential Revision: https://reviews.llvm.org/D109114
2021-10-06 15:55:55 -07:00
Philip Reames 1183d65b4d [SCEV] Search operand tree for scope bound when inferring flags from IR
When checking to see if we can apply IR flags to a SCEV, we need to identify a bound on the defining scope of the SCEV to be produced.  We'd previously added support for a couple SCEVExpr types which trivially imply bounds, but hadn't handled types such as umax where the bounds come from the bounds of the operands.  This does the obvious thing, and recurses through operands searching for a tighter bound on the defining scope.

I'm honestly surprised by how little this seems to matter on existing tests, but it's worth doing for completeness' sake alone.

Differential Revision: https://reviews.llvm.org/D111191
2021-10-06 15:10:02 -07:00
Nikita Popov 17c20a6dfb [SCEV] Avoid unnecessary domination checks (NFC)
When determining the defining scope, avoid repeatedly querying
domination against the function entry instruction. This ends up
being a very common case that we can handle more efficiently.
2021-10-06 22:14:04 +02:00
Philip Reames a7ae227baf [scev] minor style improvement [nfc] 2021-10-06 12:15:16 -07:00
Philip Reames 67896f494e Returning poison from a function w/ noundef return attribute is UB
This does for readability of returns within said function what we do for the caller side when reasoning about what might be poison.

Differential Revision: https://reviews.llvm.org/D111180
2021-10-06 11:52:18 -07:00
Philip Reames 0658bab870 [SCEV] Infer flags from add/gep in any block
This patch removes a compile time restriction from isSCEVExprNeverPoison. We've strengthened our ability to reason about flags on scopes other than addrecs, and this bailout prevents us from using it. The comment is also suspect in that we're in the middle of constructing a SCEV for I. As such, we're going to visit all operands *anyways*.

Differential Revision: https://reviews.llvm.org/D111186
2021-10-06 11:11:54 -07:00
Kevin P. Neal f86c930cc9 [FPEnv][InstSimplify] Fold constrained X + -0.0 ==> X
Currently the fadd optimizations in InstSimplify don't know how to do this
"X + -0.0 ==> X" fold when using the constrained intrinsics. This adds the
support.

This commit is derived from D106362 with some improvements from D107285.

Differential Revision: https://reviews.llvm.org/D111085
2021-10-06 13:52:31 -04:00
Nikita Popov 1301a8b473 [BasicAA] Don't unnecessarily extend pointer size
BasicAA GEP decomposition currently performs all calculation on the
maximum pointer size, but at least 64-bit, with an option to double
the size. The code comment claims that this improves analysis power
when working with uint64_t indices on 32-bit systems. However, I don't
see how this can be, at least while maintaining correctness:

When working on canonical code, the GEP indices will have GEP index
size. If the original code worked on uint64_t with a 32-bit size_t,
then there will be truncs inserted before use as a GEP index. Linear
expression decomposition does not look through truncs, so this will
be an opaque value as far as GEP decomposition is concerned. Working
on a wider pointer size does not help here (or have any effect at all).

When working on non-canonical code (before first InstCombine), the
GEP indices are implicitly truncated to GEP index size. The BasicAA
code currently just ignores this fact completely, and pretends that
this truncation doesn't happen. This is incorrect and will be
addressed by D110977.

I believe that for correctness reasons, it is important to work on
the actual GEP index size to properly model potential overflow.
BasicAA tries to patch over the fact that it uses the wrong size
(see adjustToPointerSize), but it only does that in limited cases
(only for constant values, and not all of them either). I'd like to
move this code towards always working on the correct size, and
dropping these artificial pointer size adjustments is the first step
towards that.

Differential Revision: https://reviews.llvm.org/D110657
2021-10-06 18:40:21 +02:00
Sanjay Patel e36d351d19 [InstSimplify] (x | y) & (x | !y) --> x
https://alive2.llvm.org/ce/z/QagQMn

This fold is handled by instcombine via SimplifyUsingDistributiveLaws(),
but we are missing the sibling fold for 'logical and' (implemented with
'select'). Retrofitting the code in instcombine looks much harder
than just adding a small adjustment here, and this is potentially more
efficient and beneficial to other passes.
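
The bitwise identity follows from distributivity: (x | y) & (x | ~y) = x | (y & ~y) = x. A brute-force check over 8-bit values, with hypothetical helper names, might look like:

```cpp
#include <cassert>
#include <cstdint>

// The unfolded bitwise expression on 8-bit values.
uint8_t unfoldedBitwise(uint8_t X, uint8_t Y) {
  uint8_t NotY = static_cast<uint8_t>(~Y);
  return static_cast<uint8_t>((X | Y) & (X | NotY));
}

// Checks that the expression equals X for every pair of 8-bit inputs.
bool bitwiseFoldHoldsForAllI8() {
  for (int X = 0; X < 256; ++X)
    for (int Y = 0; Y < 256; ++Y)
      if (unfoldedBitwise(static_cast<uint8_t>(X), static_cast<uint8_t>(Y)) !=
          static_cast<uint8_t>(X))
        return false;
  return true;
}
```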
2021-10-06 12:31:25 -04:00
Clement Courbet 3255015407 Fix incomplete conflict resolution in ff41fc07b1 2021-10-06 16:55:14 +02:00
Clement Courbet ff41fc07b1 Revert "[AA] Teach BasicAA to recognize basic GEP range information."
We have found a miscompile with this change, reverting while working on a
reproducer.

This reverts commit 455b60ccfb.
2021-10-06 16:49:10 +02:00
Mircea Trofin 7d541eb4d4 [inliner] Mandatory inlining decisions produce remarks
This also removes the need to disable the mandatory inlining phase in
tests.

In a departure from the previous remark, we don't output a 'cost' in
this case, because there's no such thing. We just report that inlining
happened because of the attribute.

Differential Revision: https://reviews.llvm.org/D110891
2021-10-05 14:01:25 -07:00
Nikita Popov 0be9940ef2 [SCEV] Don't check if propagation safe if there are no flags (NFC)
If there are no nowrap flags, then we don't need to determine
whether propagating flags is safe -- it will make no difference.
2021-10-05 22:25:41 +02:00
Philip Reames c608b49d67 [SCEV] Tweak the algorithm for figuring out if flags must apply to a SCEV [mostly-NFC]
Behavior wise, this patch should be mostly NFC.  The only behavior difference known is that on the isSCEVExprNeverPoison path we'll consider a bound imposed by the SCEVable operands (if any).

Algorithmically, it's an inversion of the existing code.  Previously, we checked for each operand if we could find a bound, then checked for must-execute given that bound.  With the patch, we use dominance to refine the innermost bound, then check must execute once.  The interesting case is when we have multiple unknowns within a single basic block.  While both dominance and must-execute are worst-case linear walks within the block, only dominance is cached.  As such, refining based on dominance should be more efficient.
2021-10-05 11:20:48 -07:00
Nikita Popov c117d77e93 [ConstantFold] Refactor load folding
This refactors load folding to happen in two cleanly separated
steps: ConstantFoldLoadFromConstPtr() takes a pointer to load from
and decomposes it into a constant initializer base and an offset.
Then ConstantFoldLoadFromConst() loads from that initializer at
the given offset. This makes the core logic independent of having
actual GEP expressions (and those GEP expressions having certain
structure) and will allow exposing ConstantFoldLoadFromConst() as
an independent API in the future.

This is mostly only a refactoring, but it does make the folding
logic slightly more powerful.

Differential Revision: https://reviews.llvm.org/D111023
2021-10-05 18:07:57 +02:00
Nikita Popov 30001af84e [BasicAA] Ignore CanBeFreed in minimal extent reasoning
When determining NoAlias based on object size and dereferenceability
information, we can ignore frees for the same reason we can ignore
possible null pointers (if null is not a valid pointer): Actually
accessing the null pointer / freed pointer would be immediate UB,
and AA results are only valid under the assumption of an access.

This addresses a minor regression from D110745.

Differential Revision: https://reviews.llvm.org/D111028
2021-10-04 22:08:57 +02:00
Bjorn Pettersson 7f84fa4ad4 [TargetLibraryInfo] Refactor size_t checks in isValidProtoForLibFunc. NFC
In TargetLibraryInfoImpl::isValidProtoForLibFunc we no longer
need the IsSizeTTy lambda function and the SizeTTy object. Instead
we just follow the regular structure of checking for integer types
given an expected number of bits.
2021-10-04 15:46:39 +02:00
Jay Foad a9bceb2b05 [APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807
2021-10-04 08:57:44 +01:00
Philip Reames 5f7a535330 [SCEV] Cap the number of instructions scanned when inferring flags
This addresses a comment from review on D109845.  The concern was raised that an unbounded scan would be expensive.  Long term plan is to cache this search - likely reusing the existing mechanism for loop side effects - but let's be simple and conservative for now.
2021-10-03 16:14:06 -07:00
Philip Reames 35ab211c37 [SCEV] Use trivial bound on defining scope of all SCEVs when computing flags
This addresses a comment from review on D109845.  Even for SCEVs which we can't find true bounds without recursing through operands, entry to the function forms a trivial upper bound.  In some cases, this trivial bound is enough to prove safety of flag inference.
2021-10-03 16:01:30 -07:00
Philip Reames d02db32644 [SCEV] Use full logic when inferring flags on add and gep
This is a followon to D109845. With that landed, we will have fixed all known instances of pr51817, and can thus start inferring flags more aggressively with greatly reduced risk of miscompiles. This patch simply applies the same inference logic used in that patch to our other major flag inference path.

We can still do much better here (on both paths), but this is our first step.

Differential Revision: https://reviews.llvm.org/D111003
2021-10-03 15:32:15 -07:00
Philip Reames f39978b84f [SCEV] Correctly propagate nowrap flags across scopes when folding invariant add through addrec
This fixes a violation of the wrap flag rules introduced in c4048d8f. This is an alternate fix to D106852.

The basic problem being fixed is that we infer a set of flags which is valid at some inner scope S1 (usually by correctly propagating them from IR), and then (incorrectly) extend them to a SCEV in scope S2 where S1 != S2. This is not in general safe per the wrap flags semantics recently defined.

In this patch, I include a simple inference step to handle the case where we can prove that S2 is the preheader of the loop S1, and that entry into S2 implies execution of S1. See the code for a more detailed explanation.

One worry I have with this patch is that I might be over-fitting what shows up in tests - and thus hiding negative impact we'd see in the real world. My best defense is that the rule used here very closely follows the one used to propagate the flags from IR to the inner add to start with, and thus if one is reasonable, so probably is the other. Curious what others think about that piece.

The test diffs are roughly as expected. Mostly analysis only, with two transform changes. Oddly, the result looks better in the loop-idiom test, and I don't understand the PPC output enough to tell. Nothing terrible looking though. (For context, without the scope inference peephole, the test delta includes a couple of vectorization tests. Again, not super concerning, but slightly more so.)

Differential Revision: https://reviews.llvm.org/D109845
2021-10-03 15:19:33 -07:00
Kazu Hirata d34cd75d89 [Analysis, CodeGen] Migrate from arg_operands to args (NFC)
Note that arg_operands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-10-03 08:22:20 -07:00
Dávid Bolvanský 5f2f611880 Fixed more warnings in LLVM produced by -Wbitwise-instead-of-logical 2021-10-03 13:58:10 +02:00
Philip Reames 26223af256 [SCEV] Split isSCEVExprNeverPoison reasoning explicitly into scope and mustexecute parts [NFC]
Inspired by the needs of D111001 and D109845.  The separation of concerns also makes it easier to reason about correctness and completeness.
2021-10-02 13:10:38 -07:00
Philip Reames 2ca8a3f213 [SCEV] Stop blindly propagating flags from inbound geps to SCEV nodes
This fixes a violation of the wrap flag rules introduced in c4048d8f. This was also noted in the (very old) PR23527.

The issue being fixed is that we assume the inbounds flag on any GEP implies that all users of *any* gep (or add) which happens to map to that SCEV would also be UB if the (other) gep overflowed. That's simply not true.

In terms of the test diffs, I don't see anything seriously problematic. The lost flags are expected (given the semantic restriction on when it's legal to tag the SCEV), and there are several cases where the previously inferred flags are unsound per the new semantics.

The only common trend I noticed when looking at the deltas is that by not considering branch on poison as immediate UB in ValueTracking, we do miss a few cases we could reclaim. We may be able to claw some of these back with the follow-up ideas mentioned in PR51817.

It's worth noting that most of the changes are analysis result only changes. The two transform changes are pretty minimal. In one case, we miss the opportunity to infer a nuw (correctly). In the other, we fail to fold an exit and produce a loop invariant form instead. This one is probably over-reduced as the program appears to be undefined in practice, and neither before nor after exploits that.

Differential Revision: https://reviews.llvm.org/D109789
2021-10-01 16:30:44 -07:00
Philip Reames 24cde2f602 [SCEV] Remove invariant requirement from isSCEVExprNeverPoison
This code is attempting to prove that I must execute if we enter the defining scope of the SCEV which will be created from I. In the case where it found a defining addrec scope, it had a rather odd restriction that all of the other operands must be loop invariant in that addrec's loop.

As near as I can tell here, we really only need an upper bound on the defining scope. If we can prove the stronger property, then we must also have proven the property on the exact defining scope as well.

In practice, the actual effect of this change is narrow. The compile time restriction at the top of the routine basically limits us to I being an arithmetic instruction in some loop L with both an addrec operand in L, and an unknown operand in L. Possible to demonstrate, but the main value of the change is removing unneeded code.

Differential Revision: https://reviews.llvm.org/D110892
2021-10-01 15:57:37 -07:00
Krasimir Georgiev 685f1bfd0a Revert "[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns"
It appears to cause stage2 clang build failures, e.g.,
https://lab.llvm.org/buildbot/#/builders/74/builds/7145.

This reverts commit 1fb37334bd.
2021-10-01 11:39:43 +02:00
David Sherwood 1fb37334bd [LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:

  int r = a;
  for (int i = 0; i < n; i++) {
    if (src[i] > 3) {
      r = b;
    }
    src[i] += 2;
  }

We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.

The IR generated by clang typically looks like this:

  %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
  ...
  %pred = icmp ugt i32 %val, i32 3
  %phi.update = select i1 %pred, i32 %b, i32 %phi

We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant,
however we can support any kind of compare - integer or float.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
  Transforms/LoopVectorize/select-cmp-predicated.ll
  Transforms/LoopVectorize/select-cmp.ll

Differential Revision: https://reviews.llvm.org/D108136
2021-10-01 08:41:03 +01:00
Philip Reames c5e491e6ee [SCEV] Modernize code style of isSCEVExprNeverPoison [NFC]
Use for-range and all_of to make code easier to read in advance of other changes.
2021-09-30 15:13:43 -07:00
Florian Hahn 1fbdbb5595 Revert "Recommit "[SCEV] Look through single value PHIs." (take 2)"
This reverts commit 764d9aa979.

This patch exposed a few additional cases where SCEV expressions are not
properly invalidated.

See PR52024, PR52023.
2021-09-30 20:53:51 +01:00
Nikita Popov b989211d7d [BasicAA] Move more extension logic into ExtendedValue (NFC)
Add methods to appropriately extend KnownBits/ConstantRange there,
same as with APInt. Also clean up the known bits handling by
actually doing that extension rather than checking ZExtBits. This
doesn't matter now, but becomes relevant once truncation is
involved.
2021-09-30 20:45:12 +02:00
Nikita Popov ea02f9caff [BasicAA] Use ExtendedValue in VariableGEPIndex (NFC)
Use the ExtendedValue structure which is used for LinearExpression
in VariableGEPIndex as well.
2021-09-30 18:48:51 +02:00
Kazu Hirata f631173d80 [llvm] Migrate from arg_operands to args (NFC)
Note that arg_operands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-09-30 08:51:21 -07:00
Clement Courbet 455b60ccfb [AA] Teach BasicAA to recognize basic GEP range information.
The information can be implicit (from `ValueTracking`) or explicit.

This implements the backend part of the following RFC
https://groups.google.com/g/llvm-dev/c/T9o51zB1JY.

We still need to settle on how to best represent the information in the
IR, but this is a separate discussion.

Differential Revision: https://reviews.llvm.org/D109746
2021-09-30 08:29:32 +02:00
Nikita Popov 2898101552 [BasicAA] Move DecomposedGEP out of header (NFC)
It's sufficient to have a forward declaration in the header, we
can move the definition of the struct (and VariableGEPIndex)
in the source file.
2021-09-29 23:45:15 +02:00
Nikita Popov 45288edb65 [BasicAA] Pass whole DecomposedGEP to subtraction API (NFC)
Rather than separately handling subtraction of offset and variable
indices, make this one operation. Also rewrite the implementation
to use range-based for loops.
2021-09-29 23:32:15 +02:00
Nikita Popov 49813f7fbf [BasicAA] Pass DecomposedGEP to constantOffsetHeuristic() (NFC)
Rather than separately passing VarIndices and BaseOffset, pass
the whole DecomposedGEP.
2021-09-29 22:23:27 +02:00
Sanjay Patel 4414e2ad97 [InstSimplify] (-1 << x) s>> x --> -1
This was noticed in:
https://llvm.org/PR51351

https://alive2.llvm.org/ce/z/aLxunD
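
As an informal sanity check of the fold (separate from the Alive2 proof linked above), the following sketch brute-forces it on a hypothetical 8-bit integer type. The helper names (to_signed, shl, ashr, fold_holds) are made up for illustration and are not LLVM APIs.

```python
# Brute-force check of (-1 << x) s>> x == -1 on a hypothetical 8-bit type.
BITS = 8
MASK = (1 << BITS) - 1


def to_signed(v):
    """Interpret the low BITS bits of v as a two's-complement value."""
    v &= MASK
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def shl(a, x):
    """Wrapping left shift, modelling 'shl' without nsw/nuw flags."""
    return to_signed(a << x)


def ashr(a, x):
    """Arithmetic right shift; Python's >> on ints already sign-extends."""
    return to_signed(a) >> x


def fold_holds(x):
    """True if shifting -1 left and then arithmetically right by x gives -1."""
    return ashr(shl(-1, x), x) == -1


# Shift amounts >= BITS are poison in LLVM IR, so only 0..BITS-1 are checked.
all_ok = all(fold_holds(x) for x in range(BITS))
```

The left shift always leaves the sign bit set for in-range shift amounts, so the arithmetic right shift refills the low bits with ones.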
2021-09-29 13:03:12 -04:00
Paul Robinson 56e681afcc [TargetLibraryInfo] Pick new/delete calls by target
There are two sets of new/delete functions, one with Windows/MSVC
mangling and one with Itanium mangling. Mark one set or the other
as unavailable depending on the target.

Split the test malloc-free-delete.ll into three parts: malloc-free.ll
for the C API tests, new-delete-itanium.ll and new-delete-msvc.ll for
the target-specific new/delete tests.

Differential Revision: https://reviews.llvm.org/D110419
2021-09-28 10:10:25 -07:00
Alex Richardson 9049a1c61e [ConstantFolding] Fold ptrtoint(gep i8 null, x) -> x
I was looking at some missed optimizations in CHERI-enabled targets and
noticed that we weren't removing vtable indirection for calls via known
pointers-to-members. The underlying reason for this is that we represent
pointers-to-function-members as {i8 addrspace(200)*, i64} and generate the
constant offsets using (gep i8 null, <index>). We use a constant GEP here
since inttoptr should be avoided for CHERI capabilities. The pointer-to-member
call uses ptrtoint to extract the index, and due to this missing fold we can't
infer the actual value loaded from the vtable.
This is the initial constant folding change for this pattern, I will add
an InstCombine fold as a follow-up.

We could fold all inbounds GEP to null (and therefore the ptrtoint to
zero) since zero is the only valid offset for an inbounds GEP. If the
offset is not zero, that GEP is poison and therefore returning 0 is valid
(https://alive2.llvm.org/ce/z/Gzb5iH). However, Clang currently generates
inbounds GEPs on NULL for hand-written offsetof() expressions, so this
could lead to miscompilations.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D110245
2021-09-28 17:57:36 +01:00
Alex Richardson 3c51b9e270 Fix incorrect GEP bitwidth in areNonOverlapSameBaseLoadAndStore()
When using a datalayout that has pointer index width != pointer size this
code triggers an assertion in Value::stripAndAccumulateConstantOffsets().
I encountered this while compiling FreeBSD for CHERI-RISC-V.
Also update LoadsTest.cpp to use a DataLayout with index width != pointer
width to ensure this case is tested.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D110406
2021-09-28 17:57:36 +01:00
Bjorn Pettersson 460efc1fb8 [Analysis] Be defensive when matching size_t in lib call signatures
When TargetLibraryInfoImpl::isValidProtoForLibFunc is checking
function signatures to detect lib calls it may check that a parameter
or return value matches with the "size_t" type. For this to work it
has to derive the IR type matching "size_t". Depending on whether
a DataLayout is provided or not, this has been done in two different
ways: either a stricter check based on IntPtrType (which is
given by the DataLayout) or a more relaxed check assuming that any
integer type matches "size_t".

Given that the stricter approach exists, it seems like we do not want
to trigger rewrites etc. if we aren't sure that a function call
actually matches the library function. Therefore it was questioned
why we actually have the more relaxed approach when not being able
to derive an IR type for "size_t". This patch will take a more
defensive approach, requiring that a DataLayout is passed to
isValidProtoForLibFunc.

Differential Revision: https://reviews.llvm.org/D110584
2021-09-28 15:29:37 +02:00
Bjorn Pettersson 1f5ea14bca [Analysis] Add FIXME:s related to size_t type checks
Differential Revision: https://reviews.llvm.org/D110583
2021-09-28 15:29:37 +02:00
Florian Hahn 764d9aa979
Recommit "[SCEV] Look through single value PHIs." (take 2)
This reverts commit 8fdac7cb7a.

The issue causing the revert has been fixed a while ago in 60b852092c.

Original message:

    Now that SCEVExpander can preserve LCSSA form,
    we do not have to worry about LCSSA form when
    trying to look through PHIs. SCEVExpander will take
    care of inserting LCSSA PHI nodes as required.

    This increases precision of the analysis in some cases.

    Reviewed By: mkazantsev, bmahjour

    Differential Revision: https://reviews.llvm.org/D71539
2021-09-28 10:32:17 +01:00
modimo 20faf78919 [ThinLTO] Add noRecurse and noUnwind thinlink function attribute propagation
Thinlink provides an opportunity to propagate function attributes across modules, enabling additional propagation opportunities.

This change propagates (currently default off, turn on with `disable-thinlto-funcattrs=1`) noRecurse and noUnwind based off of function summaries of the prevailing functions in bottom-up call-graph order. Testing on clang self-build:
1. There's a 35-40% increase in noUnwind functions due to the additional propagation opportunities.
2. Throughput is measured at 10-15% increase in thinlink time which itself is 1.5% of E2E link time.

Implementation-wise this adds the following summary function attributes:
1. noUnwind: function is noUnwind
2. mayThrow: function contains a non-call instruction that `Instruction::mayThrow` returns true on (e.g. windows SEH instructions)
3. hasUnknownCall: function contains calls that don't make it into the summary call-graph thus should not be propagated from (e.g. indirect for now, could add no-opt functions as well)

Testing:
Clang self-build passes and 2nd stage build passes check-all
ninja check-all with newly added tests passing

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D36850
2021-09-27 12:28:07 -07:00
Nikita Popov 7a855596c3 [BasicAA] Don't check whether GEP is sized (NFC)
GEPs are required to have sized source element type, so we can
just assert that here.
2021-09-26 21:21:54 +02:00
Nikita Popov ba664d9066 [AA] Move earliest escape tracking from DSE to AA
This is a followup to D109844 (and alternative to D109907), which
integrates the new "earliest escape" tracking into AliasAnalysis.
This is done by replacing the pre-existing context-free capture
cache in AAQueryInfo with a replaceable (virtual) object with two
implementations: The SimpleCaptureInfo implements the previous
behavior (check whether object is captured at all), while
EarliestEscapeInfo implements the new behavior from DSE.

This combines the "earliest escape" analysis with the full power of
BasicAA: It subsumes the call handling from D109907, considers a
wider range of escape sources, and works with AA recursion. The
compile-time cost is slightly higher than with D109907.

Differential Revision: https://reviews.llvm.org/D110368
2021-09-25 22:40:41 +02:00
Nikita Popov 1c3859f31d [BasicAA] Don't consider Argument as escape source (NFCI)
The case of an Argument and an identified function local is already
handled earlier, because we don't care about captures in that case.
As such, we don't need to additionally consider the combination of
an Argument with a non-escaping identified function local.

This ensures that isEscapeSource() only returns true for
instructions, which is necessary for D110368.
2021-09-25 22:08:15 +02:00
Paul Robinson 6185ad03f1 [TargetLibraryInfo] Correctly handle sqrt*_finite
Other <math>_finite calls are marked as unavailable except on GNU/Linux;
it looks like the sqrt set was just overlooked.

Differential Revision: https://reviews.llvm.org/D110418
2021-09-24 11:57:38 -07:00
Florian Hahn 6f28fb7081
Recommit "[DSE] Track earliest escape, use for loads in isReadClobber."
This reverts the revert commit df56fc6ebb.

This version of the patch adjusts the location where the EarliestEscapes
cache is cleared when an instruction gets removed. The earliest escaping
instruction does not have to be a memory instruction.

It could be a ptrtoint instruction like in the added test
@earliest_escape_ptrtoint, which subsequently gets removed. We need to
invalidate the EarliestEscape entry referring to the ptrtoint when
deleting it.

This fixes the crash mentioned in
https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6
2021-09-24 17:13:27 +01:00
Paul Robinson 1376ae9094 [TargetLibraryInfo][AMDGPU] Minor cleanup, NFC 2021-09-24 07:52:44 -07:00
Nico Weber df56fc6ebb Revert "[DSE] Track earliest escape, use for loads in isReadClobber."
This reverts commit 5ce89279c0.
Makes clang crash, see comments on https://reviews.llvm.org/D109844
2021-09-24 09:57:59 -04:00
David Sherwood 8e4f7b749c [Analysis] Fix another issue when querying vscale attributes on functions
There are several places in the code that are currently broken where
we assume an Instruction is always a member of a BasicBlock that
lives in a Function. This is a problem specifically when
attempting to get the vscale_range attribute. This patch adds checks
that an Instruction's parent also has a parent!

I've added a test for a function-less @llvm.vscale intrinsic call here:

  unittests/Analysis/ValueTrackingTest.cpp
2021-09-24 13:37:23 +01:00
David Sherwood c2634fc6ab [Analysis] Fix issues when querying vscale attributes on functions
There are several places in the code that are currently broken as
they assume an Instruction always has a parent Function when
attempting to get the vscale_range attribute. This patch adds checks
that an Instruction has a parent.

I've added a test for a parentless @llvm.vscale intrinsic call here:

  unittests/Analysis/ValueTrackingTest.cpp

Differential Revision: https://reviews.llvm.org/D110158
2021-09-24 09:58:10 +01:00
Fangrui Song 0bb767e7db [InlineAdvisor] Use one single quote 2021-09-23 12:16:15 -07:00
Florian Hahn 5ce89279c0
[DSE] Track earliest escape, use for loads in isReadClobber.
At the moment, DSE only considers whether a pointer may be captured at
all in a function. This leads to cases where we fail to remove stores to
local objects because we do not check if they escape before potential
read-clobbers or after.

Doing context-sensitive escape queries in isReadClobber has been removed
a while ago in d1a1cce5b1 to save compile-time. See PR50220 for more
context.

This patch introduces a new capture tracker, which keeps track of the
'earliest' capture. An instruction A is considered earlier than instruction
B, if A dominates B. If 2 escapes do not dominate each other, the
terminator of the common dominator is chosen. If not all uses can be
analyzed, the earliest escape is set to the first instruction in the
function entry block.

If the query instruction dominates the earliest escape and is not in a
cycle, then the pointer does not escape before the query instruction.

This patch uses this information when checking if a load of a loaded
underlying object may alias a write to a stack object. If the stack
object does not escape before the load, they do not alias.

I will share a follow-up patch to also use the information for call
instructions to fix PR50220.

In terms of compile-time, the impact is low in general,
    NewPM-O3: +0.05%
    NewPM-ReleaseThinLTO: +0.05%
    NewPM-ReleaseLTO-g: +0.03%

with the largest change being tramp3d-v4 (+0.30%)
http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Compared to always computing the capture information on demand, we get
the following benefits from the caching:
NewPM-O3: -0.03%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.04%

The biggest speedup is tramp3d-v4 (-0.21%).
http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Overall there is a small, but noticeable benefit from caching. I am not
entirely sure if the speedups warrant the extra complexity of caching.
The way the caching works also means that we might miss a few cases, as
it is less precise. Also, there may be a better way to cache things.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109844
2021-09-23 12:45:05 +01:00
Alex Richardson 05663dc146 [InstSimplify] Don't lose inbounds when simplifying a GEP
I noticed this while working on a (ptrtoint (gep null, x)) -> x fold.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D110168
2021-09-23 09:25:06 +01:00
Sanjay Patel a85d7a56c7 [ValueTracking] fix isOnlyUsedInZeroEqualityComparison with no users
This is another problem exposed by:
https://bugs.llvm.org/PR50836
2021-09-22 15:01:53 -04:00
Sanjay Patel b05804ab4c [Analysis] reduce code for isOnlyUsedInZeroEqualityComparison; NFC
There's a bug here noted by the FIXME and visible in variations of PR50836.
2021-09-22 14:57:53 -04:00
Sanjay Patel c240169ff2 [Analysis] improve function matching for strlen libcall
The return type of strlen is size_t, not just any integer.

This is a partial fix for an example based on:
https://llvm.org/PR50836

There's another bug here because we can still crash
processing a real strlen or something that looks like it.
2021-09-22 13:50:12 -04:00
Florian Mayer 36daf074d9 [hwasan] also omit safe mem[cpy|mov|set].
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D109816
2021-09-22 11:08:27 +01:00
George Burgess IV cd5f582c3d MemoryBuiltins: update comment; NFC
This comment references behavior that was removed in
ccae43a247, which is a commit from 5 years
ago. It seems safe to assume that that behavior won't be coming back
soon. If it does, we can readd this part of the comment :)
2021-09-21 13:47:26 -07:00
Michael Liao 2d1ffad010 [IR] Re-group AAMDNodes relevant interfaces. NFC. 2021-09-21 14:29:33 -04:00
Florian Hahn 5131037ea9
[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange.
isValidAssumeForContext can provide better results with access to the
dominator tree in some cases. This patch adjusts computeConstantRange to
allow passing through a dominator tree.

The use in VectorCombine is updated to pass through the DT to enable
additional scalarization.

Note that similar APIs like computeKnownBits already accept optional dominator
tree arguments.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D110175
2021-09-21 16:54:47 +01:00
Michael Liao 5fb3ae525f [SelectionDAG] Re-calculate scoped AA metadata when merging stores.
Reviewed By: jeroen.dobbelaere

Differential Revision: https://reviews.llvm.org/D102821
2021-09-21 11:41:17 -04:00
Max Kazantsev cd166fb2ef [SCEV] Use isAvailableAtLoopEntry in the asserts
This is what is supposed to be there.
2021-09-21 17:11:15 +07:00
Max Kazantsev 4d5d725428 [SCEV] Add some asserts on availability of arguments of isLoopEntryGuardedByCond
The logic in howManyLessThans is fishy. It first checks invariance of
RHS, and then uses OrigRHS as argument for isLoopEntryGuardedByCond, which
is, strictly speaking, a different thing. We are seeing a very rare intermittent
failure of availability checks, and it looks like this precondition is
sometimes broken. Before we can figure out what's going on, add asserts
that all involved values that may possibly be passed to isLoopEntryGuardedByCond
are available at loop entry.

If either of these asserts fails (OrigRHS is the most likely suspect), it
means that the logic here is flawed.
2021-09-21 17:08:52 +07:00
Max Kazantsev 2c7d5fbc9e [SCEV] Generalize implication when signedness of FoundPred doesn't matter
The implication logic for two values that are both negative or non-negative
says that it doesn't matter whether their predicate is signed or unsigned,
but only flips unsigned into signed for further inference. This patch adds
support for flipping a signed predicate into unsigned as well.

Differential Revision: https://reviews.llvm.org/D109959
Reviewed By: nikic
2021-09-21 11:17:56 +07:00
Max Kazantsev a06db78fd9 [NFC] Rename Context->CtxI in SCEV for uniformity reasons 2021-09-21 10:12:20 +07:00
Nikita Popov dd0226561e [IR] Add helper to convert offset to GEP indices
We implement logic to convert a byte offset into a sequence of GEP
indices for that offset in a number of places. This patch adds a
DataLayout::getGEPIndicesForOffset() method, which implements the
core logic. I've updated SROA, ConstantFolding and InstCombine to
use it, and there's a few more places where it looks relevant.
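
To illustrate the core idea of turning a byte offset into a sequence of indices, here is a simplified sketch. It is not the DataLayout implementation: the Scalar/Array/Struct classes, size_of() and indices_for_offset() are hypothetical names, struct fields are assumed to be laid out back to back without padding, and all types are assumed to have a non-zero size.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Scalar:
    size: int  # size in bytes


@dataclass
class Array:
    elem: "Type"
    count: int


@dataclass
class Struct:
    fields: List["Type"]  # assumed packed: no padding between fields


Type = Union[Scalar, Array, Struct]


def size_of(ty):
    """Total size in bytes of a type in this toy model."""
    if isinstance(ty, Scalar):
        return ty.size
    if isinstance(ty, Array):
        return size_of(ty.elem) * ty.count
    return sum(size_of(f) for f in ty.fields)


def indices_for_offset(ty, offset):
    """Return (indices, leftover) where leftover is the byte offset that
    could not be expressed by descending further into the type."""
    size = size_of(ty)
    # The first index steps over whole objects of the outermost type.
    indices = [offset // size]
    offset %= size
    while True:
        if isinstance(ty, Array):
            elem_size = size_of(ty.elem)
            indices.append(offset // elem_size)
            offset %= elem_size
            ty = ty.elem
        elif isinstance(ty, Struct):
            start = 0
            for i, field in enumerate(ty.fields):
                field_size = size_of(field)
                if offset < start + field_size:
                    indices.append(i)
                    offset -= start
                    ty = field
                    break
                start += field_size
        else:
            # Scalars cannot be indexed into; stop here.
            break
    return indices, offset


# A struct of a 4-byte scalar followed by four 2-byte scalars (12 bytes).
example = Struct([Scalar(4), Array(Scalar(2), 4)])
aligned = indices_for_offset(example, 18)
misaligned = indices_for_offset(example, 7)
```

In this model a non-zero leftover means the offset points into the middle of a scalar, so a caller would have to handle the remaining bytes some other way.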

Differential Revision: https://reviews.llvm.org/D110043
2021-09-20 20:18:16 +02:00
David Sherwood f988f68064 [Analysis] Add support for vscale in computeKnownBitsFromOperator
In ValueTracking.cpp we use a function called
computeKnownBitsFromOperator to determine the known bits of a value.
For the vscale intrinsic if the function contains the vscale_range
attribute we can use the maximum and minimum values of vscale to
determine some known zero and one bits. This should help to improve
code quality by allowing certain optimisations to take place.
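
The reasoning can be modelled roughly as follows. This is a simplified sketch rather than the code in ValueTracking.cpp: vscale_known_bits() is a hypothetical helper, the 32-bit default width is an assumption, and the inputs are assumed to satisfy 0 < vmin <= vmax.

```python
def vscale_known_bits(vmin, vmax, bit_width=32):
    """Return (known_zero, known_one) bit masks for a value in [vmin, vmax]."""
    full = (1 << bit_width) - 1
    if vmin == vmax:
        # Only one possible value, so every bit is known.
        return (~vmin) & full, vmin & full
    # Every bit at or above the bit length of the maximum must be zero.
    first_zero_high_bit = vmax.bit_length()
    known_zero = (full >> first_zero_high_bit) << first_zero_high_bit
    return known_zero, 0


range_1_16 = vscale_known_bits(1, 16)
fixed_4 = vscale_known_bits(4, 4)
```

For a range of 1 to 16, bits 5 and above are known to be zero, which is enough to decide some comparisons of vscale against large constants.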

Tests added here:

  Transforms/InstCombine/icmp-vscale.ll

Differential Revision: https://reviews.llvm.org/D109883
2021-09-20 15:01:59 +01:00
Florian Hahn 7f6a4826ac
[CaptureTracking] Allow passing LI to PointerMayBeCapturedBefore (NFC).
isPotentiallyReachable can use LoopInfo to return earlier. This patch
allows passing an optional LI to PointerMayBeCapturedBefore. Used in
D109844.

Reviewed By: nikic, asbirlea

Differential Revision: https://reviews.llvm.org/D109978
2021-09-20 09:07:34 +01:00
Max Kazantsev def15c5fb6 [SCEV] Support negative values in signed/unsigned predicate reasoning
There is a piece of logic that uses the fact that signed and unsigned
versions of the same predicate are equivalent when both values are
non-negative. It's also true when both of them are negative.
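
As an informal check of this claim, the sketch below exhaustively compares signed and unsigned less-than on a hypothetical 8-bit type. The helper names are made up for illustration and do not correspond to SCEV APIs.

```python
BITS = 8


def to_unsigned(v):
    """Reinterpret a signed BITS-wide value as unsigned."""
    return v & ((1 << BITS) - 1)


def same_sign(a, b):
    """True when both values are negative or both are non-negative."""
    return (a < 0) == (b < 0)


def predicates_agree(a, b):
    """True when 'slt' and 'ult' give the same answer for a and b."""
    return (a < b) == (to_unsigned(a) < to_unsigned(b))


values = range(-(1 << (BITS - 1)), 1 << (BITS - 1))
agree_when_same_sign = all(
    predicates_agree(a, b) for a in values for b in values if same_sign(a, b)
)
# With mixed signs the predicates can disagree, e.g. -1 <s 1 but not -1 <u 1.
mixed_sign_disagrees = not predicates_agree(-1, 1)
```

Reinterpreting two negative values as unsigned adds the same constant to both, so their relative order is preserved.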

Differential Revision: https://reviews.llvm.org/D109957
Reviewed By: nikic
2021-09-20 11:26:33 +07:00
Kazu Hirata 84b07c9b3a [llvm] Use pop_back_val (NFC) 2021-09-19 13:44:23 -07:00
Arthur Eubanks 0db9481208 [NFC] Remove FIXMEs about calling LLVMContext::yield()
Nobody has complained about this, and the documentation for
LLVMContext::yield() states that LLVM is allowed to never call it.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D110008
2021-09-17 14:59:34 -07:00
Hongtao Yu c5fafc1e73 [CSSPGO] Tweaks to lower pseudo probe runtime overhead
A couple tweaks to

1. allow more thinlto importing by excluding probe intrinsics from IR size in module summary

2. Allow general default attributes (nofree nosync nounwind) for pseudo probe intrinsic. Without those attributes, pseudo probes will be basically treated as unknown calls which will in turn block their containing functions from being annotated with those attributes.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D109976
2021-09-17 12:28:09 -07:00
Nikita Popov 0fc624f029 [IR] Return AAMDNodes from Instruction::getMetadata() (NFC)
getMetadata() currently uses a weird API where it populates a
structure passed to it, and optionally merges into it. Instead,
we can return the AAMDNodes and provide a separate merge() API.
This makes usages more compact.

Differential Revision: https://reviews.llvm.org/D109852
2021-09-16 21:06:57 +02:00
Michael Liao ffa5c3a555 Fix warning on `llvm-else-after-return`. NFC. 2021-09-16 11:25:43 -04:00
Kazu Hirata 385f380e80 [MemorySSA] Fix "set but not used" warnings 2021-09-15 11:41:41 -07:00
Philip Reames 9bdb19cca2 [SCEV] (udiv X, Y) * Y is always NUW
Motivated by the removal done in D109782. This implements the correct flag part generically.
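
The underlying arithmetic fact is that (X udiv Y) * Y never exceeds X, so the product cannot wrap in the unsigned sense. A brute-force check on a hypothetical 8-bit type (helper name made up for illustration):

```python
BITS = 8
LIMIT = 1 << BITS


def mul_wraps_unsigned(a, b):
    """True if a * b does not fit in BITS unsigned bits."""
    return a * b >= LIMIT


# Division by zero is undefined behaviour in LLVM IR, so Y starts at 1.
never_wraps = all(
    not mul_wraps_unsigned(x // y, y)
    for x in range(LIMIT)
    for y in range(1, LIMIT)
)
```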

Differential Revision: https://reviews.llvm.org/D109786
2021-09-15 11:34:50 -07:00
Alina Sbirlea b759381b75 [MemorySSA] Add verification levels to MemorySSA. [NFC]
Add two levels of verification for MemorySSA: Fast and Full.
The defaults are kept the same. Full verification always occurs under
EXPENSIVE_CHECKS, but now it can also be requested in a specific pass for
debugging purposes.
2021-09-15 11:09:54 -07:00
David Green 61cc873a8e [LV] Recognize intrinsic min/max reductions
This extends the reduction logic in the vectorizer to handle intrinsic
versions of min and max, both the floating point variants already
created by instcombine under fastmath and the integer variants from
D98152.

As a bonus this allows us to match a chain of min or max operations into
a single reduction, similar to how add/mul/etc work.

Differential Revision: https://reviews.llvm.org/D109645
2021-09-15 10:45:50 +01:00
Markus Lavin 1ac209ed76 [NPM] Added -print-pipeline-passes print params for a few passes.
Added '-print-pipeline-passes' printing of parameters for those passes
declared with *_WITH_PARAMS macro in PassRegistry.def.

Note that it only prints the parameters declared inside *_WITH_PARAMS as
in a few cases there appear to be additional parameters not parsable.

The following passes are now covered (i.e. all of those with *_WITH_PARAMS in
PassRegistry.def).

LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch

Differential Revision: https://reviews.llvm.org/D109310
2021-09-15 08:34:04 +02:00
Philip Reames 0dd755f027 [SCEV] Stop applying contextual flags in applyLoopGuards
This fixes a violation of the wrap flag rules introduced in c4048d8f. As noted in the original review, the NUW is legal to infer from the structure of the replacee, but a) there's no test coverage, and b) this should be done generically for all multiplies.

Differential Revision: https://reviews.llvm.org/D109782
2021-09-14 14:14:52 -07:00
Florian Hahn e248d69036
Recommit "[LAA] Support pointer phis in loop by analyzing each incoming pointer."
SCEV does not look through non-header PHIs inside the loop. Such phis
can be analyzed by adding separate accesses for each incoming pointer
value.

This results in 2 more loops vectorized in SPEC2000/186.crafty and
avoids regressions when sinking instructions before vectorizing.

Fixes PR50296, PR50288.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D102266
2021-09-14 11:19:12 +01:00
Kuba Mracek e80ee4cbd9 [GlobalDCE] In VFE support for relative pointers, allow GEP references to the base symbol
This is for Swift VFE support. In some vtable forms that Swift emits, the "base" of a relative pointer is not the global symbol itself directly, but a GEP into it -- so the pointer is relative to a particular field in the global. So getPointerAtOffset() needs to be able to see through the GEP and allow it in a SUB expression, to correctly recognize the offset as a vtable slot.

Differential Revision: https://reviews.llvm.org/D109169
2021-09-13 15:22:11 -07:00
Florian Mayer 0a22510f3e [value-tracking] see through returned attribute.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D109675
2021-09-13 20:52:26 +01:00
Florian Mayer 5b5d774f5d [hwasan] Respect returns attribute when tracking values.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D109233
2021-09-13 20:52:24 +01:00
Nikita Popov 45c467346a [LAA] Pass access type to getPtrStride()
Pass the access type to getPtrStride(), so it is not determined
from the pointer element type. Many cases still fetch the element
type at a higher level though, so this only partially addresses
the issue.
2021-09-11 19:16:49 +02:00
Johannes Doerfert c09fbbdcfb Reapply "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals"
This reapplies commit 7dbba3376f, or, put
differently, this reverts commit d9a8d20827.

The test now explicitly requires the amdgpu and nvptx backends, as it
won't work properly without them.
2021-09-10 15:22:56 -05:00
Florian Mayer 57335b6e2e [stack-safety] Allow to determine safe accesses.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D109503
2021-09-10 19:23:54 +01:00
Johannes Doerfert d9a8d20827 Revert "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals"
This reverts commit 7dbba3376f.

There seems to be a problem with the tests, investigating now:
  https://lab.llvm.org/buildbot/#/builders/61/builds/14574
2021-09-10 12:23:08 -05:00
Johannes Doerfert 7dbba3376f [GlobalOpt][FIX] Do not embed initializers into AS!=0 globals
Not all address spaces support initializers for globals and we can
therefore not set them without checking if they are allowed. This
patch adds a hook into TTI to check if an AS allows non-undef
initializers. We disable it for all but address space 0 by default,
NVPTX and AMDGPU targets allow all but address space 3.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D109337
2021-09-10 12:08:50 -05:00
Philip Reames bfa2a81e92 [ScalarEvolution] Add an additional bailout to avoid NOT of pointer.
It's possible in some cases for the LHS to be a pointer where the RHS is not. This isn't directly possible for an icmp, but the analysis mixes up operands of different icmp expressions in some cases.

This does not include a test case as the smallest reduced case we've managed is extremely fragile and unlikely to test anything meaningful in the long term.

Also add an assertion to getNotSCEV() to make tracking down this sort of issue a bit easier in the future.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51787 .

Differential Revision: https://reviews.llvm.org/D109546
2021-09-09 15:19:36 -07:00
Philip Reames eede4846a9 [SCEV] Allow negative steps for LT exit count computation for unsigned comparisons
This bit of code is incredibly suspicious. It allows fully unknown (but potentially negative) steps, but not steps known to be negative. The comment about scev flag inference is worrying, but also not correct to my knowledge.

At best, this might be covering up some related miscompile. However, there's no test in tree for it, the review history doesn't include obvious motivation, and the C++ example doesn't appear to give wrong results when hand translated to IR. I think it's time to remove this and see what falls out.

During review, there were concerns raised about the correctness of the corresponding signed case.  This change was deliberately narrowed to the unsigned case which has been audited and appears correct for negative values.  We need to get back to the known-negative signed case, but that'll be a future patch if nothing falls out from this one.

Differential Revision: https://reviews.llvm.org/D104140
2021-09-09 14:09:29 -07:00
Eli Friedman 8f792707c4 [ScalarEvolution] Fix pointer/int confusion in howManyLessThans.
In general, howManyLessThans doesn't really want to work with pointers
at all; the result is an integer, and the operands of the icmp are
effectively integers.  However, isLoopEntryGuardedByCond doesn't like
extra ptrtoint casts, so the arguments to isLoopEntryGuardedByCond need
to be computed without those casts.

Somehow, the values got mixed up with the recent howManyLessThans
improvements; fix the confused values, and add a better comment to
explain what's happening.

Differential Revision: https://reviews.llvm.org/D109465
2021-09-09 12:38:33 -07:00
Chris Lattner 735f46715d [APInt] Normalize naming on key constructors / predicate methods.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`.  This achieves two things:

1) This starts standardizing predicates across the LLVM codebase,
   following (in this case) ConstantInt.  The word "Value" doesn't
   convey anything of merit, and is missing in some of the other things.

2) Calling an integer "null" doesn't make any sense.  The original sin
   here is mine and I've regretted it for years.  This moves us to calling
   it "zero" instead, which is correct!

APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go.  As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.

Included in this patch are changes to a bunch of the codebase, but there are
more.  We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.

Differential Revision: https://reviews.llvm.org/D109483
2021-09-09 09:50:24 -07:00
Florian Mayer 6e12c73316 [NFC] [stack-safety] add placeholder addRange.
This is in preparation of D108457.
2021-09-09 13:13:18 +01:00
Florian Mayer d261d4cf55 [stack-safety] [NFC] do not terminate print with blank line. 2021-09-09 12:31:09 +01:00
Florian Mayer 08b4dd8b24 [NFC] [stack-safety] remove unused return value. 2021-09-09 12:19:47 +01:00
Philip Reames e741fabc22 [SCEV] Move getIndexExpressionsFromGEP to delinearize [NFC] 2021-09-08 16:56:49 -07:00
Philip Reames 4b5e260b1d [SCEV] Simplify findExistingSCEVInCache interface [NFC]
We were returning a tuple when all but one caller only cared about one piece of the return value.  That one caller can inline the complexity, and we can simplify all other uses.
2021-09-08 15:26:07 -07:00
Arthur Eubanks fe15347a1e Port the cost model printer to New PM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109284
2021-09-08 14:47:05 -07:00
Michael Kruse 088577a38e [Delinearization] Require byte offset to be zero.
Users of delinearization assume that the offset into the array element is zero. In most cases it will indeed be zero, but if it is not, the delinearization has to fail, since it violates that assumption and the API does not even allow signaling to the caller that the byte offset is non-zero.

This bug caused Polly to miscompile blender (526.blender_r from SPEC CPU 2017) in -polly-process-unprofitable mode. The SCEV expression incorrectly delinearized has been reduced in the test case byte_offset.ll. The dropped offset into the array element of size 4 (a float) is ((sext i32 %mul7.i4534 to i64) + {(sext i32 %i1 to i64),+,((sext i32 (1 + ((1 + %shl.i.i) * (1 + %shl.i.i)) + %shl.i.i) to i64) * (sext i32 %i1 to i64))}<%for.body703>). This significant component was just dropped, and the wrong pointer was computed when regenerating code from the remaining delinearized subscripts. This occurred during blender's subsurface scattering implementation. As a result, blender's rendering diverged from the reference image.

Patch D108885 would also fix the API.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D109133
2021-09-08 16:02:37 -05:00
Florian Hahn f4726e7238
[LAA] Remove unused OrigPtr from replaceSymbolicStrideSCEV (NFC).
The OrigPtr argument is not used in tree.
2021-09-08 22:35:36 +02:00
Arthur Eubanks b493124ae2 [MemorySSA] Support invariant.group metadata
The implementation is mostly copied from MemDepAnalysis. We want to look
at all loads and stores to the same pointer operand. Bitcasts and zero
GEPs of a pointer are considered the same pointer value. We choose the
most dominating instruction.

Since updating MemorySSA with invariant.group is non-trivial, for now
handling of invariant.group is not cached in any way, so it's part of
the walker. The number of loads/stores with invariant.group is small for
now anyway. We can revisit if this actually noticeably affects compile
times.

To avoid invariant.group affecting optimized uses, we need to have
optimizeUsesInBlock() not use invariant.group in any way.

Co-authored-by: Piotr Padlewski <prazek@google.com>

Reviewed By: asbirlea, nikic, Prazek

Differential Revision: https://reviews.llvm.org/D109134
2021-09-08 13:06:12 -07:00
Philip Reames 585c594d74 Move delinearization logic out of SCEV [NFC]
None of this logic has anything to do with SCEV's internals, it just uses the existing public APIs.  As a result, we can move the code from ScalarEvolution.cpp/hpp to Delinearization.cpp/hpp with only minor changes.

This was discussed in advance on today's loop opt call.  It turned out to be as easy as hoped.
2021-09-08 12:28:35 -07:00
Philip Reames 6cdca906c7 [SCEV] Use no-self-wrap flags inferred from exit structure to compute trip count
The basic problem being solved is that we largely give up when encountering a trip count involving an IV which is not an addrec. We will fall back to the brute force constant eval, but that doesn't have the information about the fact that we can't cycle back through the same set of values.

There's a high level design question of whether this is the right place to handle this, and if not, where that place is. The major alternative here would be to return a conservative upper bound, and then rely on two invocations of indvars to add the facts to the narrow IV, and then reconstruct SCEV. (I have not implemented the alternative and am not 100% sure this would work out.) That's arguably more in line with existing code, but I find this substantially easier to reason about.  During review, no one expressed a strong opinion, so we went with this one.

Differential Revision: D108651
2021-09-07 17:00:02 -07:00
Philip Reames 9659069978 [SCEV] Further clarify comments regarding UB and zero stride
Follow on to D109029. I realized we had no mention of mustprogress in the comment (as it predated mustprogress in the codebase). In the process of adding it, I tweaked the preconditions into something I think is clearer. Note that mustprogress is checked in the code.

Differential Revision: https://reviews.llvm.org/D109091
2021-09-07 13:53:56 -07:00
Nikita Popov 58db5f6e95 [ConstFold] Support opaque pointers in constexpr GEPs
Support opaque pointers in SymbolicallyEvaluateGEP() by using the
value type of a GlobalValue base or falling back to i8 if there
isn't one. We don't unconditionally generate i8 GEPs here because
that would lose inrange attributes, and because some optimizations
on globals currently rely on GEP types (e.g. the globals SROA
mentioned in the comment).

Differential Revision: https://reviews.llvm.org/D109297
2021-09-07 20:50:29 +02:00
Kazu Hirata 5648f7170e [Analysis, Target, Transforms] Construct SmallVector with iterator ranges (NFC) 2021-09-07 09:19:33 -07:00
Nikita Popov 8d54c8a0c3 [SCEV] Fix applyLoopGuards() with range check idiom (PR51760)
Due to a typo, this replaced %x with umax(C1, umin(C2, %x + C3))
rather than umax(C1, umin(C2, %x)). This didn't make a difference
for the existing tests, because the result is only used for range
calculation, and %x will usually have an unknown starting range,
and the additional offset keeps it unknown. However, if %x already
has a known range, we may compute a result range that is too
small.
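
A small sketch of the difference between the two rewrites, using plain unsigned integers in place of SCEV expressions (the function names and the sample constants are assumptions made for illustration):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Intended rewrite: umax(C1, umin(C2, X)), i.e. clamp X into [C1, C2].
uint32_t clampIntended(uint32_t X, uint32_t C1, uint32_t C2) {
  return std::max(C1, std::min(C2, X));
}

// Rewrite produced by the typo: umax(C1, umin(C2, X + C3)). The offset C3
// from the range check leaks into the value being clamped.
uint32_t clampWithTypo(uint32_t X, uint32_t C1, uint32_t C2, uint32_t C3) {
  return std::max(C1, std::min(C2, X + C3));
}
```

If X is already known to lie in [50, 60] and C1 = 0, C2 = 100, C3 = 5, the intended form keeps the range [50, 60], whereas the typo'd form produces [55, 65] and so excludes values that X can actually take.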
2021-09-06 22:22:41 +02:00
Andrew Litteken bd4b1b5f6d [IRSim] Adding support for recognizing branch similarity
The current IRSimilarityIdentifier does not try to find similarity across blocks, this patch provides a mechanism to compare two branches against one another, to find similarity across basic blocks, rather than just within them.

This adds a step in the similarity identification process that labels all of the basic blocks so that we can identify the relative branching locations. Within an IRSimilarityCandidate we use these relative locations to determine whether the branching to other relative locations in the same region is the same between branches. If they are, we consider them similar.

We do not consider the relative location of the branch if the target branch is outside of the region. In this case, both branches must exit to a location outside the region, but the exact relative location does not matter.

Reviewers: paquette, yroux

Differential Revision: https://reviews.llvm.org/D106989
2021-09-06 11:55:38 -07:00
Sander de Smalen 96f6785bc9 [VectorUtils] Teach findScalarElement to return splat value.
If the vector is a splat of some scalar value, findScalarElement()
can simply return the scalar value if it knows the requested lane
is in the vector.

This is only needed for scalable vectors, because the InsertElement/ShuffleVector
case is already handled explicitly for the fixed-width case.

This helps to recognize an InstCombine fold like:
  extractelt(bitcast(splat(%v))) -> bitcast(%v)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D107254
2021-09-06 10:56:06 +01:00
Michael Kruse 650bbc5620 [OpenMP][OpenMPIRBuilder] Implement loop unrolling.
Recommit of 707ce34b06. Don't introduce a
dependency to the LLVMPasses component, instead register the required
passes individually.

Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:

 * `unrollLoopFull`
 * `unrollLoopPartial`
 * `unrollLoopHeuristic`

`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heuristics as used by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic`, allows greater flexibility.

With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.

Reviewed By: jdoerfert, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107764
2021-09-04 19:18:58 -05:00
Chen Zheng 34badc409c Revert "[HardwareLoops] Change order of SCEV expression construction for InitLoopCount."
This causes https://bugs.llvm.org/show_bug.cgi?id=51714 and
is not the right patch according to comments in D91724

This reverts commit 42eaf4fe0a.
2021-09-03 02:55:43 +00:00
Arthur Eubanks 813a7f1ad7 [MemorySSA] Properly handle liveOnEntry in the walker printer
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109177
2021-09-02 12:51:27 -07:00
Wenlei He f7fff46acc [CSSPGO] Allow inlining recursive call for preinliner
When preinliner is used for CSSPGO, we try to honor global preinliner decision as much as we can except for uninlinable callees. We rely on InlineCost::Never to prevent us from illegal inlining.

However, it turns out that we use InlineCost::Never for both illegal inlining and some of the "not-so-beneficial" inlining.

The most common one is recursive inlining: while it can bloat size a lot during CGSCC bottom-up inlining, it's less of a problem when recursive inlining is guided by profile and done in a top-down manner.

Ideally it'd be better to have a clear separation between inline legality check vs cost-benefit check, but that requires a bigger change.

This change enables InlineCost computation to allow inlining recursive calls, controlled by InlineParams. In SampleLoader, we now enable recursive inlining for CSSPGO when global preinliner decision is used.

With this change, we saw a few perf improvements on SPEC2017 with CSSPGO and preinliner on: 2% for povray_r, 6% for xalancbmk_s, 3% for omnetpp_s, while size is about the same (no noticeable perf change for all other benchmarks).

Differential Revision: https://reviews.llvm.org/D109104
2021-09-02 11:24:27 -07:00
Daniil Suchkov 5c97507e2b [InlineCost] Introduce attributes to override InlineCost for inliner testing
This patch introduces four new string attributes: function-inline-cost,
function-inline-threshold, call-inline-cost and call-threshold-bonus.
These attributes allow you to selectively override some aspects of
InlineCost analysis. That would allow us to test inliner separately from
the InlineCost analysis.

That could be useful when you're trying to write tests for inliner and
you need to test some very specific situation, like "the inline cost has
to be this high", or "the threshold has to be this low". Right now every
time someone does that, they have to get creative to come up with a way to
make the InlineCost give them the number they need (like adding ~30
load/add pairs for a trivial test). This process can be somewhat tedious
which can discourage some people from writing enough tests for their
changes. Also, that results in tests that are fragile and can be easily
broken without anyone noticing it because the test writer can't
explicitly control what input the inliner will get from the inline cost
analysis.

These new attributes will alleviate those problems to an extent.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D109033
2021-09-02 17:35:06 +00:00
Roman Lebedev 3f1f08f0ed
Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore i take it upon myself
to revert the tree back to last known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
Roman Lebedev 50634deaa5
Revert "[OpenMP][OpenMPIRBuilder] Implement loop unrolling."
Breaks build with -DBUILD_SHARED_LIBS=ON
```
CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
  "LLVMFrontendOpenMP" of type SHARED_LIBRARY
    depends on "LLVMPasses" (weak)
  "LLVMipo" of type SHARED_LIBRARY
    depends on "LLVMFrontendOpenMP" (weak)
  "LLVMCoroutines" of type SHARED_LIBRARY
    depends on "LLVMipo" (weak)
  "LLVMPasses" of type SHARED_LIBRARY
    depends on "LLVMCoroutines" (weak)
    depends on "LLVMipo" (weak)
At least one of these targets is not a STATIC_LIBRARY.  Cyclic dependencies are allowed only among static libraries.
CMake Generate step failed.  Build files cannot be regenerated correctly.
```

This reverts commit 707ce34b06.
2021-09-02 12:42:23 +03:00
Michael Kruse 707ce34b06 [OpenMP][OpenMPIRBuilder] Implement loop unrolling.
Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:

 * `unrollLoopFull`
 * `unrollLoopPartial`
 * `unrollLoopHeuristic`

`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heuristics as used by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic`, allows greater flexibility.

With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.

Reviewed By: jdoerfert, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107764
2021-09-02 02:37:25 -05:00
Arthur Eubanks 7b08d9da55 Reland [MemorySSA] Add pass to print results of MemorySSA walker
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109028
2021-09-01 18:58:57 -07:00
Arthur Eubanks 0f63496ea4 Revert "[MemorySSA] Add pass to print results of MemorySSA walker"
This reverts commit 8f98477c2d.

Breaks bots
2021-09-01 18:45:19 -07:00
Arthur Eubanks 8f98477c2d [MemorySSA] Add pass to print results of MemorySSA walker
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109028
2021-09-01 18:29:15 -07:00
Philip Reames bb0fa3ea02 Revert "snapshot - do not push"
This reverts commit 91f4655d92.

This wasn't intended to be pushed, sorry.
2021-09-01 16:59:23 -07:00
Philip Reames 91f4655d92 snapshot - do not push 2021-09-01 16:59:01 -07:00
Alina Sbirlea a10409fe23 [MemorySSAUpdater] Simplify updates when only deleting edges.
When performing only edge deletion, we don't need to do the DT updates
back and forth. Check for the existence of insert updates to simplify
this.
2021-09-01 15:48:20 -07:00
Philip Reames 73b951a7f7 [SCEV] Clarify requirements for zero-stride to be UB
There's a silent bug in our reasoning about zero strides. We assume that having a single static exit implies that if that exit is not taken, then the loop must be infinite. This ignores the potential for abnormal exits via exceptions. Consider the following example:

for (uint8_t i = 0; i < 1; i += 0) {
  throw_on_thousandth_call();
}

Our reasoning is such that we'd conclude this loop can't take the backedge as that would lead to a (presumed) infinite loop.

In practice, this is a silent bug because loopIsFiniteByAssumption returns false strictly more often than the loopHasNoAbnormalExits property. We could reasonably want to change that in the future, so fixing the code flow now is worthwhile.
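
A runnable variant of the example loop, as a sketch only: the callee, its counter parameter, and the wrapper function are hypothetical and exist just to make the abnormal exit observable.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical callee: throws on its thousandth invocation.
void throwOnThousandthCall(int &Calls) {
  if (++Calls == 1000)
    throw std::runtime_error("abnormal exit");
}

// The induction variable never changes, so the static exit `I < 1` is never
// taken. The loop still terminates, because the callee eventually throws.
int runZeroStrideLoop() {
  int Calls = 0;
  try {
    for (uint8_t I = 0; I < 1; I += 0)
      throwOnThousandthCall(Calls);
  } catch (const std::runtime_error &) {
    return Calls; // Reached via the exceptional exit, not the loop condition.
  }
  return -1; // Not reached: the loop condition never becomes false.
}
```

The backedge is taken 999 times before the exception leaves the loop, so concluding that the backedge cannot be taken would be wrong here.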

Differential Revision: https://reviews.llvm.org/D109029
2021-09-01 14:01:13 -07:00
Nikita Popov 02f74eadbe [IVDescriptors] Make pointer inductions compatible with opaque pointers
Store the used element type in the InductionDescriptor. For typed
pointers, it remains the pointer element type. For opaque pointers,
we always use an i8 element type, such that the step is a simple
offset.

A previous version of this patch instead tried to guess the element
type from an induction GEP, but this is not reliable, as the GEP
may be hidden (see @both in iv_outside_user.ll).

Differential Revision: https://reviews.llvm.org/D104795
2021-09-01 21:02:05 +02:00
Philip Reames 29fa37ec9f [SCEV] If max BTC is zero, then so is the exact BTC [2 of 2]
This extends D108921 into a generic rule applied to constructing ExitLimits along all paths. The remaining paths (primarily howFarToZero) don't have the same reasoning about UB sensitivity as the howManyLessThan ones did. Instead, the remaining cause for max counts being more precise than exact counts is that we apply context sensitive loop guards on the max path, and not on the exact path. That choice is mildly suspect, but out of scope of this patch.

The MVETailPredication.cpp change deserves a bit of explanation. We were previously figuring out that two SCEVs happened to be equal because they happened to be identical. When we optimized one with context sensitive information, but not the other, we lost the ability to prove them equal. So, cover this case by subtracting and then applying loop guards again. Without this, we see changes in test/CodeGen/Thumb2/mve-blockplacement.ll

Differential Revision: https://reviews.llvm.org/D109015
2021-09-01 11:51:48 -07:00
Philip Reames 6600e1759b [SCEV] If max BTC is zero, then so is the exact BTC [1 of N]
This patch is specifically the howManyLessThan case.  There will be a couple of follow-on patches for other codepaths.

The subtle bit is explaining why the two codepaths have a difference while both are correct. The test case with modifications is a good example, so let's discuss in terms of it.
* The previous exact bounds for this example of (-126 + (126 smax %n))<nsw> can evaluate to either 0 or 1. Both are "correct" results, but only one of them results in a well defined loop. If %n were 127 (the only possible value producing a trip count of 1), then the loop must execute undefined behavior. As a result, we can ignore the TC computed when %n is 127. All other values produce 0.
* The max taken count computation uses the limit (i.e. the maximum value END can be without resulting in UB) to restrict the bound computation. As a result, it returns 0 which is also correct.
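
To make the first bullet concrete, here is a sketch that evaluates the exact-bound expression for an i8 value; the function name is made up for illustration, and the arithmetic is widened to int so the evaluation itself cannot overflow:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Evaluates (-126 + (126 smax N)) for an i8 value N.
int exactBackedgeTakenCount(int8_t N) {
  return -126 + std::max(126, static_cast<int>(N));
}
```

Only N = 127 produces 1; every other i8 value produces 0, which matches the max taken count once the UB-inducing input is ruled out.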

WARNING: The logic above only holds for a single exit loop. The current logic for max trip count would be incorrect for multiple exit loops, except that we never call computeMaxBECountForLT except when we can prove either a) no overflow occurs in this IV before exit, or b) this is the sole exit.

An alternate approach here would be to add the limit logic to the symbolic path. I haven't played with this extensively, but I'm hesitant because a) the term is optional and b) I'm not sure it'll reliably simplify away. As such, the resulting code quality from expansion might actually get worse.

This was noticed while trying to figure out why D108848 wasn't NFC, but is otherwise standalone.

Differential Revision: https://reviews.llvm.org/D108921
2021-08-31 08:50:11 -07:00
Kuba Mracek 4c066bd08b [GlobalDCE] Handle relative pointers in VFE (for Swift vtables)
To support Virtual Function Elimination for Swift, this PR adds support for Swift
vtables which contain "relative pointers" instead of direct pointer references.
These are in the form of:

@symbol = ... {
  i32 trunc (i64 sub (i64 ptrtoint (<type> @target to i64), i64 ptrtoint (... @symbol to i64)) to i32)
}

The PR extends GlobalDCE's way of looking up a vtable offset into a dependency
to be able to see through this expression and find the target symbol.
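
The arithmetic behind such a relative pointer can be sketched as follows. Addresses are modeled as plain 64-bit integers, both function names are hypothetical, and the target is assumed to lie within the signed 32-bit range of the symbol:

```cpp
#include <cassert>
#include <cstdint>

// Encodes a relative pointer the way the constant expression above does:
// trunc(ptrtoint(target) - ptrtoint(symbol)) to i32.
// The unsigned-to-signed conversion is modular (guaranteed since C++20, and
// the behavior of mainstream compilers before that).
int32_t encodeRelative(uint64_t TargetAddr, uint64_t SymbolAddr) {
  return static_cast<int32_t>(static_cast<uint32_t>(TargetAddr - SymbolAddr));
}

// Recovers the target address by adding the sign-extended offset back to the
// address of the symbol the offset is relative to.
uint64_t resolveRelative(uint64_t SymbolAddr, int32_t Offset) {
  return SymbolAddr + static_cast<uint64_t>(static_cast<int64_t>(Offset));
}
```

Looking through the expression amounts to undoing this encoding: given the symbol and the stored offset, the referenced target can be recovered.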

Differential Revision: https://reviews.llvm.org/D107645
2021-08-31 07:07:22 -07:00
Andrew Litteken cf56b08d15 [IRSim] Adding missing comments canonical relation commit
Adding missing comments to IRSimilarityIdentifier.cpp since
they were not properly added in commit 063af63b96.
2021-08-30 08:41:05 -07:00
Nikita Popov 9f7873784d [SCEVExpander] Reuse removePointerBase() for canonical addrecs
ExposePointerBase() in SCEVExpander implements basically the same
functionality as removePointerBase() in SCEV, so reuse it.

The SCEVExpander code assumes that the pointer operand on adds is
the last one -- I'm not sure that always holds. As such this might
not be strictly NFC.
2021-08-29 21:12:35 +02:00
Nikita Popov e6a5dd60ff [SCEV] Assert unique pointer base (NFC)
Add expressions can contain at most one pointer operand nowadays,
assert that in getPointerBase() and removePointerBase().
2021-08-29 20:06:24 +02:00
Kazu Hirata 0003d57434 [Analysis] Fix a "set but not used" warning 2021-08-28 06:37:01 -07:00
Andrew Litteken 063af63b96 [IRSim][IROutliner] Canonicalizing commutative value numbering between similarity sections.
When the initial relationship between two pairs of values between
similar sections is ambiguous to commutativity, arguments to the
outlined functions can be passed in such that the order is incorrect,
causing miscompilations.  This adds a canonical mapping to each
similarity section, so that we can maintain the relationship of global
value numbering from one section to another.

Added Tests:
Transforms/IROutliner/outlining-commutative-operands-opposite-order.ll
unittests/Analysis/IRSimilarityIdentifierTest.cpp - IRSimilarityCandidate:CanonicalNumbering

Reviewers: jroelofs, jpaquette, yroux

Differential Revision: https://reviews.llvm.org/D104143
2021-08-27 15:02:56 -07:00
Philip Reames ec8d87e9f5 [SCEV] Infer nuw from nw for addrecs
This was previously committed in 914836b, and reverted due to confusion on the status of the review.

Differential Revision: https://reviews.llvm.org/D108601
2021-08-24 14:24:05 -07:00
Sanjay Patel 204038d52e [InstSimplify] fold or+shifted -1 to -1
These are similar to the rotate pattern added with:
dcf659e821
...but we don't have guard ops on the shift amount,
so we don't canonicalize to the intrinsic.

  declare void @llvm.assume(i1)

  define i32 @src(i32 %shamt, i32 %bitwidth) {
    ; subtract must be in range of bitwidth
    %lt = icmp ule i32 %bitwidth, 32
    call void @llvm.assume(i1 %lt)

    %r = lshr i32 -1, %shamt
    %s = sub i32 %bitwidth, %shamt
    %l = shl i32 -1, %s
    %o = or i32 %r, %l
    ret i32 %o
  }

  define i32 @tgt(i32 %shamt, i32 %bitwidth) {
    ret i32 -1
  }

https://alive2.llvm.org/ce/z/aF7WHx
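
The same fold can be checked by brute force outside of Alive2. This sketch mirrors @src for 32-bit values; the function names are made up, and inputs whose shift amounts would be out of range (poison in IR, undefined in C++) are skipped:

```cpp
#include <cassert>
#include <cstdint>

// Mirrors @src above. The caller must keep both shift amounts below 32.
uint32_t orOfShiftedAllOnes(uint32_t Shamt, uint32_t Bitwidth) {
  uint32_t R = UINT32_MAX >> Shamt;              // lshr i32 -1, %shamt
  uint32_t L = UINT32_MAX << (Bitwidth - Shamt); // shl i32 -1, %s
  return R | L;
}

// Checks every in-range input with Bitwidth <= 32.
bool foldHoldsForAllInRangeInputs() {
  for (uint32_t Bitwidth = 0; Bitwidth <= 32; ++Bitwidth)
    for (uint32_t Shamt = 0; Shamt < 32; ++Shamt) {
      uint32_t S = Bitwidth - Shamt;
      if (S >= 32)
        continue; // shl amount out of range (covers a wrapped subtraction too)
      if (orOfShiftedAllOnes(Shamt, Bitwidth) != UINT32_MAX)
        return false;
    }
  return true;
}
```

The right shift leaves the low 32 - shamt bits set and the left shift sets bits from bitwidth - shamt upward, so the two masks overlap or meet whenever bitwidth is at most 32.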
2021-08-24 15:38:38 -04:00
Philip Reames 58582bae63 Revert "[SCEV] Infer nsw/nuw from nw for addrecs"
This reverts commit 914836b1c8.  Further comments on review came up after initial approval.  Reverting while addressing.
2021-08-24 09:28:37 -07:00
Philip Reames 914836b1c8 [SCEV] Infer nsw/nuw from nw for addrecs
If we know an addrec doesn't self-wrap, the increment is strictly positive, and the start value is the smallest representable value, then we know that the corresponding wrap type cannot occur.

Differential Revision: https://reviews.llvm.org/D108601
2021-08-24 08:53:21 -07:00
Philip Reames 96ef794fd0 [SCEV] Add a hasFlags utility to improve readability [NFC] 2021-08-23 17:36:52 -07:00
Mircea Trofin 1055c5e1d3 [MLGO] Make sure inliner logs when deleting callees
When using final reward (which is now the default), we were skipping
logging decisions that were leading to callee deletion. This fixes that.

Differential Revision: https://reviews.llvm.org/D108587
2021-08-23 14:54:46 -07:00
Florian Hahn d024a01511
Recommit "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"
This reverts the revert ab9296f13b.

The issue causing the revert should be fixed in 9baed023b4.
2021-08-23 11:25:27 +01:00
Sanjay Patel dcf659e821 [InstSimplify] fold rotate of -1 to -1
This is part of solving more general rotate patterns seen in
bugs related to:
https://llvm.org/PR51575

https://alive2.llvm.org/ce/z/GpkFCt
2021-08-22 09:15:48 -04:00
Sanjay Patel d41e308f10 [InstSimplify] fold rotate of zero to zero
This is part of solving more general rotate patterns seen in
bugs related to:
https://llvm.org/PR51575

https://alive2.llvm.org/ce/z/fjKwqv
2021-08-22 09:15:48 -04:00
Mircea Trofin 8dc3fe0cd1 [NFC][MLGO] Use std::move when moving protobufs
Because of an odd linking problem, we need to temporarily support
building with TF C API 1.15 + tensorflow 2.50 pip package in
'development' mode scenarios. Protobuf Message 'Swap' is partially
implemented in the header (2.50) and relies on a symbol not found in TF
C API 1.15. std::move avoids that, at no semantic cost.
2021-08-20 13:40:35 -07:00
Florian Hahn ab9296f13b
Revert "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"
This reverts commit f4122398e7 to
investigate a crash exposed by it.

The patch breaks building the code below with `clang -O2 --target=aarch64-linux`

     int a;
     double b, c;
     void d() {
       for (; a; a++) {
         b += c;
         c = a;
       }
     }
2021-08-20 21:24:28 +01:00
Augie Fackler e59c88294b MemoryBuiltins: trailing , on collection literal
This was probably bugging me more than is reasonable, but it makes merging
changes in this file slightly less annoying to have the trailing comma
here. I only noticed this because Rust is currently carrying a patch to
this file and it kept making life a little difficult.
2021-08-19 17:59:23 +02:00
David Sherwood f4122398e7 [LoopVectorize][AArch64] Enable ordered reductions by default for AArch64
I have added a new TTI interface called enableOrderedReductions() that
controls whether or not ordered reductions should be enabled for a
given target. By default this returns false, whereas for AArch64 it
returns true and we rely upon the cost model to make sensible
vectorisation choices. It is still possible to override the new TTI
interface by setting the command line flag:

  -force-ordered-reductions=true|false

I have added a new RUN line to show that we use ordered reductions by
default for SVE and Neon:

  Transforms/LoopVectorize/AArch64/strict-fadd.ll
  Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

Differential Revision: https://reviews.llvm.org/D106653
2021-08-19 09:29:40 +01:00
Peter Collingbourne 6f85225ef3 StackLifetime: Remove asserts for multiple lifetime intrinsics.
According to the langref, it is valid to have multiple consecutive
lifetime start or end intrinsics on the same object.

For llvm.lifetime.start:
"If ptr [...] is a stack object that is already alive, it simply
fills all bytes of the object with poison."

For llvm.lifetime.end:
"Calling llvm.lifetime.end on an already dead alloca is no-op."

However, we currently fail an assertion in such cases. I've observed
the assertion failure when the loop vectorization pass duplicates
the intrinsic.

We can conservatively handle these intrinsics by ignoring all but
the first one, which can be implemented by removing the assertions.

Differential Revision: https://reviews.llvm.org/D108337
2021-08-18 18:45:28 -07:00
Arthur Eubanks 7557d6c896 [NFC] Cleanup calls to CallBase::getAttribute() 2021-08-18 09:39:33 -07:00
Arthur Eubanks 3f4d00bc3b [NFC] More get/removeAttribute() cleanup 2021-08-17 21:05:41 -07:00
Mark Danial 4018d25da8 LoopNest Analysis expansion to return instructions that prevent a Loop
Nest from being perfect

Expand LoopNestAnalysis to return the full list of instructions that
cause a loop nest to be imperfect. This is useful for other passes to
know if they should continue into the inner loops.
Added a new function, getInterveningInstructions,
that returns a small vector with the instructions that prevent a loop
from being perfect. Also added a couple of helper functions to reduce
code duplication.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D107773
2021-08-17 22:25:49 +00:00
Nikita Popov 735a590471 [MemorySSA] Remove -enable-mssa-loop-dependency option
This option has been enabled by default for quite a while now.
The practical impact of removing the option is that MSSA use
cannot be disabled in default pipelines (both LPM and NPM) and
in manual LPM invocations. NPM can still choose to enable/disable
MSSA using loop vs loop-mssa.

The next step will be to require MSSA for LICM and drop the
AST-based implementation entirely.

Differential Revision: https://reviews.llvm.org/D108075
2021-08-16 20:59:37 +02:00
Paul Robinson 94b4598d77 [PS4] stp[n]cpy not available on PS4 2021-08-16 09:06:52 -07:00
Sanjay Patel ca637014f1 [Analysis][SimplifyLibCalls] improve function signature check for memcmp
This would assert/crash as shown in:
https://llvm.org/PR50850

The matching for bcmp/bcopy should probably also be updated,
but that's another patch.
2021-08-15 16:11:26 -04:00
Arthur Eubanks 92ce6db9ee [NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr()
This is more consistent with similar methods.
2021-08-13 11:09:18 -07:00
Arthur Eubanks a0c42ca56c [NFC] Remove AttributeList::hasParamAttribute()
It's the same as AttributeList::hasParamAttr().
2021-08-13 10:58:21 -07:00
Roman Lebedev 0dc6b597db
Revert "[SCEV] Remove premature assert. PR46786"
Since then, the SCEV pointer handling has been improved,
so the assertion should now hold.

This reverts commit b96114c1e1,
relanding the assertion from commit 141e845da5.
2021-08-13 17:50:22 +03:00
Usman Nadeem a7c4e9b1f7 [InstSimplify] Eliminate vector reverse of a splat vector
experimental.vector.reverse(splat(X)) -> splat(X)

Differential Revision: https://reviews.llvm.org/D107793

Change-Id: Id29ba88fd669ff8686712e96b1bdc46dda5b853c
2021-08-11 11:27:58 -07:00
Mircea Trofin 510402c2c8 [NFC][MLGO] 'Use' variable used for asserts 2021-08-10 19:55:17 -07:00
Christopher Di Bella c874dd5362 [llvm][clang][NFC] updates inline licence info
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.

Differential Revision: https://reviews.llvm.org/D107528
2021-08-11 02:48:53 +00:00
Fangrui Song 76093b1739 [InlineAdvisor] Add single quotes around caller/callee names
Clang diagnostics refer to identifier names in quotes.
This patch makes inline remarks conform to the convention.
New behavior:

```
% clang -O2 -Rpass=inline -Rpass-missed=inline -S a.c
a.c:4:25: remark: 'foo' inlined into 'bar' with (cost=-30, threshold=337) at callsite bar:0:25; [-Rpass=inline]
int bar(int a) { return foo(a); }
                        ^
```

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D107791
2021-08-10 11:51:31 -07:00
Sanjay Patel e260e10c4a [InstSimplify] fold min/max with limit constant
This is already done within InstCombine:
https://alive2.llvm.org/ce/z/MiGE22

...but leaving it out of analysis makes it
harder to avoid infinite loops there.
2021-08-10 10:57:25 -04:00
Sanjay Patel 188832f419 Revert "[InstSimplify] fold min/max with limit constant; NFC"
This reverts commit f43859b437.
This is not NFC, so I'll try again without that mistake in the commit message.
2021-08-10 10:50:09 -04:00
Sanjay Patel f43859b437 [InstSimplify] fold min/max with limit constant; NFC
This is already done within InstCombine:
https://alive2.llvm.org/ce/z/MiGE22

...but leaving it out of analysis makes it
harder to avoid infinite loops there.
2021-08-10 10:43:07 -04:00
Dorit Nuzman 67278b8a90 [LV] Support Interleaved Store Group With Gaps
Teach LV to use masked-store to support interleave-store-group with
gaps (instead of scatters/scalarization).

The symmetric case of using masked-load to support
interleaved-load-group with gaps was introduced a while ago, by
https://reviews.llvm.org/D53668; This patch completes the store-scenario
leftover from D53668, and solves PR50566.

Reviewed by: Ayal Zaks

Differential Revision: https://reviews.llvm.org/D104750
2021-08-08 10:32:02 +03:00
Zheng Chen 30b0c455b1 [LoopCacheAnalysis]: handle mismatch type for Numerator and CacheLineSize
Fix an assertion failure due to mismatched types for Numerator and CacheLineSize in the loop cache analysis pass.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D107618
2021-08-06 16:51:09 +00:00
Mircea Trofin ae1a2a09e4 [NFC][MLGO] Make logging more robust
1) add some self-diagnosis (when asserts are enabled) to check that all
features have the same number of entries

2) avoid storing pointers to mutable fields because the proto API
contract doesn't actually guarantee those stay fixed even if no further
mutation of the object occurs.

Differential Revision: https://reviews.llvm.org/D107594
2021-08-06 04:44:52 -07:00
Serge Pavlov 4c4093e6e3 Introduce intrinsic llvm.isnan
This is a recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:

    Clang has a builtin function '__builtin_isnan', which implements the C
    library function 'isnan'. This function is currently implemented
    entirely in clang codegen, which expands it into a set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is an unordered comparison made by the
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It is, however, not suitable if
      floating-point exceptions are tracked. The corresponding IEEE 754
      operation and C function must never raise an FP exception, even if
      the argument is a signaling NaN. Compare instructions usually do not
      have this property; they raise the 'invalid' exception in that case.
      So this mechanism is unsuitable when exception behavior is strict. In
      particular, it could result in unexpected trapping if the argument is
      an SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in cases where 'isnan' is not allowed to raise FP
      exceptions. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but it offers one solution for
      all targets, although some targets can do the check more efficiently.

    * The solution implemented by https://reviews.llvm.org/D96568 introduced
      a hook 'clang::TargetCodeGenInfo::testFPKind', which injects
      target-specific code into IR. Currently only SystemZ implements this
      hook, and it generates a call to a target-specific intrinsic function.

    Although these mechanisms make it possible to implement 'isnan'
    efficiently enough, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. This complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang; in that case the
      treatment of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of the options '-ffast-math' or '-fno-honor-nans'. If such an option
    is specified, 'fcmp uno' may be optimized to 'false'. This is a valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that the operands
    of FP operations are never NaNs. This assumption, however, should not be
    applied to functions that check FP number properties, including 'isnan'.
    If such a function returns the assumed result instead of actually making
    the check, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance-critical code, as it can speed up execution
    at the expense of manual treatment of corner cases. If 'isnan' returns
    the assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, such as making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418
    that also expresses the opinion that the limitations imposed by
    '-ffast-math' should be applied only to 'math' functions and not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by the
    IEEE-754 and C standards in a target-agnostic way. During IR
    transformations it does not undergo undesirable optimizations. It
    reaches instruction selection, where it is lowered in a target-dependent
    way. The lowering can vary depending on options like '-ffast-math' or
    '-ffp-model' so that the resulting code satisfies the requested
    semantics.

    Differential Revision: https://reviews.llvm.org/D104854
2021-08-06 14:32:27 +07:00
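
The integer-based check mentioned in the message above can be illustrated outside of LLVM. The following is a minimal sketch, assuming a 32-bit IEEE-754 `float`; the helper name `isNaNViaIntegerOps` is hypothetical and is not the code from D95948 or the lowering of `llvm.isnan`. It inspects the bit pattern only, so no floating-point comparison is executed.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Illustrative only: test a 32-bit IEEE-754 float for NaN using integer
// operations, so that no floating-point compare instruction is needed.
// The function name is hypothetical, not an LLVM or clang API.
bool isNaNViaIntegerOps(float Value) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "this sketch assumes a 32-bit float");
  std::uint32_t Bits;
  std::memcpy(&Bits, &Value, sizeof(Bits));          // copy the bit pattern
  const std::uint32_t AbsBits = Bits & 0x7FFFFFFFu;  // clear the sign bit
  // A NaN has an all-ones exponent (0x7F800000) and a nonzero mantissa, so
  // its magnitude bits are strictly greater than those of infinity.
  return AbsBits > 0x7F800000u;
}
```

Because the test is a plain unsigned comparison, it is not subject to the "operands are never NaN" assumption that fast-math options attach to floating-point comparisons.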
Ryan Prichard 623cf3dfdf Mark getc_unlocked as unavailable by default
Before D45736, getc_unlocked was available by default, but turned off
for non-Cygwin/non-MinGW Windows. D45736 then added 9 more unlocked
functions, which were unavailable by default, but it also:
 * left getc_unlocked enabled by default,
 * removed the disabling line for Windows, and
 * added code to enable getc_unlocked for GNU, Android, and OSX.

For consistency, make getc_unlocked unavailable by default. Maybe this
was the intent of D45736 anyway.

Reviewed By: MaskRay, efriedma

Differential Revision: https://reviews.llvm.org/D107527
2021-08-05 16:35:02 -07:00
Bardia Mahjour 0e08891ec1 [DA] control compile-time spent by MIV tests
Function exploreDirections() in DependenceAnalysis implements a recursive
algorithm for refining direction vectors. This algorithm has worst-case
complexity of O(3^(n+1)) where n is the number of common loop levels.
In this patch I'm adding a threshold to control the amount of time we
spend doing MIV tests (which most of the time end up producing overly
pessimistic direction vectors anyway).

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D107159
2021-08-05 09:50:11 -04:00
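
To make the exponential growth and the effect of a threshold concrete, here is a rough sketch. It is not the DependenceAnalysis code: the `Explorer` type, its node budget, and the absence of pruning are all assumptions made for illustration. With three possible directions per loop level, the recursion tree for n levels has 1 + 3 + ... + 3^n nodes.

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical model of recursive direction-vector refinement.
enum class Direction { LT, EQ, GT };

struct Explorer {
  std::size_t Budget;   // maximum number of tree nodes we agree to visit
  std::size_t Visited;  // nodes visited so far
  bool GaveUp;          // set once the budget is exhausted

  explicit Explorer(std::size_t MaxNodes)
      : Budget(MaxNodes), Visited(0), GaveUp(false) {}

  // Returns how many complete direction vectors were enumerated below this
  // node. A real dependence test would prune subtrees; this sketch does not.
  std::size_t explore(std::vector<Direction> &Prefix, std::size_t Levels) {
    if (GaveUp)
      return 0;
    if (Visited == Budget) {  // threshold reached: stop refining
      GaveUp = true;
      return 0;
    }
    ++Visited;
    if (Prefix.size() == Levels)
      return 1;  // one fully specified direction vector
    std::size_t Found = 0;
    for (Direction D : {Direction::LT, Direction::EQ, Direction::GT}) {
      Prefix.push_back(D);
      Found += explore(Prefix, Levels);
      Prefix.pop_back();
    }
    return Found;
  }
};
```

In this sketch a caller that sees `GaveUp` set would fall back to a conservative answer (all directions possible) instead of using the partial enumeration.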
Nathan Lanza 5848166369 Disable LibFuncs for stpcpy and stpncpy for Android < 21
These functions don't exist in Android API levels < 21. A change in
llvm-12 (rG6dbf0cfcf789) caused Oz builds to emit these symbols on the
assumption that they are available, which causes link errors. Simply
disable them here.

Differential Revision: https://reviews.llvm.org/D107509
2021-08-04 22:48:41 -04:00
Serge Pavlov 0c28a7c990 Revert "Introduce intrinsic llvm.isnan"
This reverts commit 16ff91ebcc.
Several errors were reported, mainly in test-suite execution time. Reverted
for investigation.
2021-08-04 17:18:15 +07:00
Serge Pavlov 16ff91ebcc Introduce intrinsic llvm.isnan
Clang has a builtin function '__builtin_isnan', which implements the C
library function 'isnan'. This function is currently implemented
entirely in clang codegen, which expands it into a set of IR operations.
There are three mechanisms by which the expansion can be made.

* The most common mechanism is an unordered comparison made by the
  instruction 'fcmp uno'. This simple solution is target-independent
  and works well in most cases. It is, however, not suitable if
  floating-point exceptions are tracked. The corresponding IEEE 754
  operation and C function must never raise an FP exception, even if
  the argument is a signaling NaN. Compare instructions usually do not
  have this property; they raise the 'invalid' exception in that case.
  So this mechanism is unsuitable when exception behavior is strict. In
  particular, it could result in unexpected trapping if the argument is
  an SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948.
  It is used in cases where 'isnan' is not allowed to raise FP
  exceptions. This solution implements 'isnan' using integer operations.
  It solves the problem of exceptions, but it offers one solution for
  all targets, although some targets can do the check more efficiently.

* The solution implemented by https://reviews.llvm.org/D96568 introduced
  a hook 'clang::TargetCodeGenInfo::testFPKind', which injects
  target-specific code into IR. Currently only SystemZ implements this
  hook, and it generates a call to a target-specific intrinsic function.

Although these mechanisms make it possible to implement 'isnan'
efficiently enough, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or
  target-specific intrinsics. This complicates analysis and can prevent
  some optimizations.

* IR can be created by tools other than clang; in that case the
  treatment of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of the options '-ffast-math' or '-fno-honor-nans'. If such an option
is specified, 'fcmp uno' may be optimized to 'false'. This is a valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that the operands
of FP operations are never NaNs. This assumption, however, should not be
applied to functions that check FP number properties, including 'isnan'.
If such a function returns the assumed result instead of actually making
the check, it becomes useless in many cases. The option '-ffast-math' is
often used for performance-critical code, as it can speed up execution
at the expense of manual treatment of corner cases. If 'isnan' returns
the assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, such as making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418
that also expresses the opinion that the limitations imposed by
'-ffast-math' should be applied only to 'math' functions and not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by the
IEEE-754 and C standards in a target-agnostic way. During IR
transformations it does not undergo undesirable optimizations. It
reaches instruction selection, where it is lowered in a target-dependent
way. The lowering can vary depending on options like '-ffast-math' or
'-ffp-model' so that the resulting code satisfies the requested
semantics.

Differential Revision: https://reviews.llvm.org/D104854
2021-08-04 15:27:49 +07:00
Jacob Hegna b16c37fa2c [MLGO] Update the current model url for the Oz inliner model. 2021-08-04 03:09:00 +00:00
Roman Lebedev 6f6e9a867f [BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop
I'm not sure this is the best way to approach this,
but the situation is hard to detect unless we explicitly call it out when refusing to advise unrolling.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107271
2021-08-03 00:57:26 +03:00
Chang-Sun Lin, Jr b58eda39eb [ValueTracking] Fix computeConstantRange to use "may" instead of "always" semantics for llvm.assume
ValueTracking should allow for value ranges that may satisfy
llvm.assume, instead of restricting the ranges only to values that
will always satisfy the condition.

Differential Revision: https://reviews.llvm.org/D107298
2021-08-02 22:20:17 +02:00
Sanjay Patel 7f55557765 [Analysis] improve function signature checking for snprintf
The check for size_t parameter 1 was already here for snprintf_chk,
but it wasn't applied to regular snprintf. This could lead to
mismatching and eventually crashing as shown in:
https://llvm.org/PR50885
2021-07-31 15:17:20 -04:00
Kerry McLaughlin 9d35594993 Reland "[LV] Use lookThroughAnd with logical reductions"
If a reduction Phi has a single user which `AND`s the Phi with a type mask,
`lookThroughAnd` will return the user of the Phi and the narrower type represented
by the mask. Currently this is only used for arithmetic reductions, whereas loops
containing logical reductions will create a reduction intrinsic using the widened
type, for example:

  for.body:
    %phi = phi i32 [ %and, %for.body ], [ 255, %entry ]
    %mask = and i32 %phi, 255
    %gep = getelementptr inbounds i8, i8* %ptr, i32 %iv
    %load = load i8, i8* %gep
    %ext = zext i8 %load to i32
    %and = and i32 %mask, %ext
    ...

^ this will generate an and reduction intrinsic such as the following:
    call i32 @llvm.vector.reduce.and.v8i32(<8 x i32>...)

The same example for an add instruction would create an intrinsic of type i8:
    call i8 @llvm.vector.reduce.add.v8i8(<8 x i8>...)

This patch changes AddReductionVar to call lookThroughAnd for other integer
reductions, allowing loops similar to the example above with reductions such
as and, or & xor to vectorize.

Reviewed By: david-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D105632
2021-07-30 18:04:09 +01:00
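
The reason the narrower type is safe for the loop in the message above can be checked with ordinary integers. This is a small sketch with hypothetical helper names, not vectorizer code: it compares the reduction as written (32-bit accumulator masked with 255 on every iteration) with the same reduction done directly in 8 bits.

```cpp
#include <cstdint>
#include <vector>

// The reduction as written in the source loop: a 32-bit accumulator that is
// masked with 255 on every iteration and combined with zero-extended bytes.
std::uint32_t andReduceWide(const std::vector<std::uint8_t> &Data) {
  std::uint32_t Acc = 255;  // start value of the phi
  for (std::uint8_t Byte : Data) {
    const std::uint32_t Masked = Acc & 255u;          // %mask
    Acc = Masked & static_cast<std::uint32_t>(Byte);  // %and with %ext
  }
  return Acc;
}

// The same reduction carried out directly in 8 bits.
std::uint8_t andReduceNarrow(const std::vector<std::uint8_t> &Data) {
  std::uint8_t Acc = 0xFF;
  for (std::uint8_t Byte : Data)
    Acc = static_cast<std::uint8_t>(Acc & Byte);
  return Acc;
}
```

Since the mask discards the upper 24 bits on every iteration, both functions compute the same value, which is what allows the reduction to be performed on i8 lanes.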
Sander de Smalen 84a4caeb84 [InstSimplify] Don't assume parent function when simplifying llvm.vscale.
D106850 introduced a simplification for llvm.vscale by looking at the
surrounding function's vscale_range attributes. The call that's being
simplified may not yet have been inserted into the IR. This happens for
example during function cloning.

This patch fixes the issue by checking if the instruction is in a
parent basic block.
2021-07-29 20:08:08 +01:00
Jun Ma e2fe26e77b [NFC][InstSimplify] Use more intuitive variable names. 2021-07-29 13:55:47 +08:00
Wenlei He 1a8087adaf [ThinLTO] Disallow importing for functions with indir branch to block address
We don't allow inlining for functions with blockaddress with uses other than strictly callbr. This is because if the blockaddress escapes the function via a global variable, inlining may lead to an invalid cross-function reference.

We check against such cases during inlining; however, the check can fail for ThinLTO post-link because CFG simplification can incorrectly remove blocks based on wrong block reachability.

When we import a function with blockaddress taken in a global variable but without importing that variable, we won't go through value mapping to reflect the real address-taken-ness of the cloned blocks. For the imported clone, this leads to blocks reachable from indirect branch through global variable being incorrectly treated as unreachable and removed by SimplifyCFG.

Since inlining for such cases shouldn't be allowed in the first place, I'm marking them as ineligible for importing during pre-link to avoid the problem of missing address-taken-ness of the imported clone as well as bad DCE and inlining.

Differential Revision: https://reviews.llvm.org/D106930
2021-07-28 18:02:48 -07:00
Jun Ma ca0fe3447f [InstSimplify] Simplify llvm.vscale when vscale_range attribute exists
Reduce llvm.vscale to constant based on vscale_range attribute.

Differential Revision: https://reviews.llvm.org/D106850
2021-07-28 21:41:52 +08:00
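
A sketch of the idea, under assumptions not stated in the message: the fold is taken to apply only when the attribute pins vscale to a single nonzero value (minimum equal to maximum). `VScaleRange` and `foldVScale` are hypothetical stand-ins, not InstSimplify code.

```cpp
#include <cstdint>

// Hypothetical stand-in for a function's vscale_range(min, max) attribute.
struct VScaleRange {
  std::uint32_t Min;
  std::uint32_t Max;
};

// Writes the constant to Result and returns true only when the range leaves
// exactly one possible value. A null Range models a missing attribute.
// Assumption of this sketch: a bound of 0 means "unknown", so it never folds.
bool foldVScale(const VScaleRange *Range, std::uint32_t &Result) {
  if (Range == nullptr)
    return false;  // no attribute: nothing is known about vscale
  if (Range->Min == 0 || Range->Min != Range->Max)
    return false;  // more than one value is possible
  Result = Range->Min;
  return true;
}
```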
Mircea Trofin 935dea2cb2 [MLGO] fix silly LLVM_DEBUG misuse 2021-07-27 15:10:28 -07:00
Mircea Trofin eb76ca573d [NFC][MLGO] Debug messages for what inline advisor is selected
We already have an indication (error) if the desired inline advisor
cannot be enabled, but we don't have a positive indication. Added
LLVM_DEBUG messages for the latter.
2021-07-27 15:05:39 -07:00
Anna Thomas 68ffed12b7 [IVDescriptors] Fix bug in checkOrderedReduction
The Exit instruction passed in for checking if it's an ordered reduction need not be
an FPAdd operation. We need to bail out at that point instead of
assuming it is an FPAdd (and hence has two operands). See added testcase.
It crashes without the patch because the Exit instruction is a phi with
exactly one operand.
This latent bug was exposed by 95346ba which added support for
multi-exit loops for vectorization.

Reviewed-By: kmclaughlin
Differential Revision: https://reviews.llvm.org/D106843
2021-07-27 09:31:44 -04:00
Johannes Doerfert 75636868e2 [InstSimplify] Expose generic interface for replaced operand simplification
Users, especially the Attributor, might replace multiple operands at
once. The actual implementation of simplifyWithOpReplaced is able to
handle that just fine; the interface simply did not allow replacing
more than one operand at a time. This exposes a more generic
interface without intended changes for existing code.

Differential Revision: https://reviews.llvm.org/D106189
2021-07-27 00:56:12 -05:00
Philip Reames f82f39b9cf [SCEV] Add a comment about invariant in howManyLessThans 2021-07-26 16:39:26 -07:00
Eli Friedman 5c486ce04d [LLVM IR] Allow volatile stores to trap.
Proposed alternative to D105338.

This is ugly, but short-term I think it's the best way forward: first,
let's formalize the hacks into a coherent model. Then we can consider
extensions of that model (we could have different flavors of volatile
with different rules).

Differential Revision: https://reviews.llvm.org/D106309
2021-07-26 10:51:00 -07:00
Florian Hahn 6d753b0751 [LAA] Remove RuntimeCheckingPtrGroup::RtCheck member (NFC).
This patch removes RtCheck from RuntimeCheckingPtrGroup to make it
possible to construct RuntimeCheckingPtrGroup objects without a
RuntimePointerChecking object. This should make it easier to
re-use the code to generate runtime checks, e.g. in D102834.

RtCheck was only used to access the pointer info for a given index.
Instead, the start and end expressions can be passed directly.

For code-gen, we also need to know the address space to use. This can
also be explicitly passed at construction.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D105481
2021-07-26 17:38:10 +01:00
Nikita Popov 33146857e9 [IR] Consider non-willreturn as side effect (PR50511)
This adjusts mayHaveSideEffect() to return true for !willReturn()
instructions. Just like other side-effects, non-willreturn calls
(aka "divergence") cannot be removed and cannot be reordered relative
to other side effects. This fixes a number of bugs where
non-willreturn calls are either incorrectly dropped or moved. In
particular, it also fixes the last open problem in
https://bugs.llvm.org/show_bug.cgi?id=50511.

I performed a cursory review of all current mayHaveSideEffect()
uses, which convinced me that these are indeed the desired default
semantics. Places that do not want to consider non-willreturn as a
side effect generally do not want mayHaveSideEffect() semantics at
all. I identified two such cases, which are addressed by D106591
and D106742. Finally, there is a use in SCEV for which we don't
really have an appropriate API right now -- what it wants is
basically "would this be considered forward progress". I've just
spelled out the previous semantics there.

Differential Revision: https://reviews.llvm.org/D106749
2021-07-26 16:35:14 +02:00
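
The adjusted predicate can be pictured with a toy model. The struct and field names below are hypothetical and only mirror the three properties named in the message; this is not LLVM's Instruction class.

```cpp
// Toy model of the per-instruction facts involved.
struct InstructionFacts {
  bool MayWriteToMemory;
  bool MayThrow;
  bool WillReturn;
};

// With the change described above, failing to return counts as a side effect.
bool mayHaveSideEffects(const InstructionFacts &I) {
  return I.MayWriteToMemory || I.MayThrow || !I.WillReturn;
}

// An unused instruction can only be dropped if removing it cannot change
// observable behavior, which now includes non-termination.
bool isSafeToDropIfUnused(const InstructionFacts &I) {
  return !mayHaveSideEffects(I);
}
```

Under this model a call that neither writes memory nor throws, but may loop forever, is no longer treated as removable.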
Paul Walker 8a8d01d58c [NFC] Change VFShape so it contains an ElementCount rather than separate VF and IsScalable properties.
Differential Revision: https://reviews.llvm.org/D106750
2021-07-26 12:25:46 +01:00
Philipp Krones 46c0366877 [Inliner] Make the CallPenalty configurable
Tests with multiple benchmarks, like Embench [1], showed that the
CallPenalty magic number has the most influence on inlining decisions
when optimizing for size.

On the other hand, there was no good default value for this parameter.
Some benchmarks profited strongly from a reduced call penalty. One
example is the picojpeg benchmark compiled for RISC-V, which got 6%
smaller with a CallPenalty of 10 instead of 12. Other benchmarks
increased in size, like matmult.

This commit makes the compromise of turning the magic number constant of
CallPenalty into a configurable value. This introduces the flag
`--inline-call-penalty`. With that flag users can fine tune the inliner
to their needs.

The CallPenalty constant was also used for loops. This commit replaces
the CallPenalty constant with a new LoopPenalty constant that is now
used instead.

This is a slimmed down version of https://reviews.llvm.org/D30899

[1]: https://github.com/embench/embench-iot

Differential Revision: https://reviews.llvm.org/D105976
2021-07-26 12:07:49 +01:00
David Sherwood 0aff1798b5 [Analysis] Add simple cost model for strict (in-order) reductions
I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:

  1. Tree-wise. This is the typical fast-math reduction that involves
  continually splitting a vector up into halves and adding each
  half together until we get a scalar result. This is the default
  behaviour for integers, whereas for floating point we only do this
  if reassociation is allowed.
  2. Ordered. This now allows us to estimate the cost of performing
  a strict vector reduction by treating it as a series of scalar
  operations in lane order. This is the case when FP reassociation
  is not permitted. For scalable vectors this is more difficult
  because at compile time we do not know how many lanes there are,
  and so we use the worst case maximum vscale value.

I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.

New tests have been added here:

  Analysis/CostModel/AArch64/reduce-fadd.ll
  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D105432
2021-07-26 10:26:06 +01:00
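
The difference between the two reduction shapes can be sketched with made-up unit costs. Everything here is an assumption for illustration (the `ReductionCosts` fields, the one-shuffle-plus-one-op cost per tree level, the function names); it is not the formula used by getArithmeticReductionCost.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical unit costs; a real target would supply its own numbers.
struct ReductionCosts {
  std::uint64_t VectorOpCost;  // one vector add at some tree level
  std::uint64_t ShuffleCost;   // one split/shuffle step
  std::uint64_t ScalarOpCost;  // one scalar add
};

// Tree-wise: halve the vector log2(NumLanes) times, paying one shuffle and
// one vector op per level. NumLanes must be a power of two.
std::uint64_t treeReductionCost(std::uint64_t NumLanes,
                                const ReductionCosts &C) {
  assert(NumLanes != 0 && (NumLanes & (NumLanes - 1)) == 0);
  std::uint64_t Levels = 0;
  for (std::uint64_t N = NumLanes; N > 1; N /= 2)
    ++Levels;
  return Levels * (C.ShuffleCost + C.VectorOpCost);
}

// Ordered: one scalar op per lane, performed in lane order.
std::uint64_t orderedReductionCost(std::uint64_t NumLanes,
                                   const ReductionCosts &C) {
  return NumLanes * C.ScalarOpCost;
}

// Scalable vectors have MinLanes * vscale lanes, unknown at compile time,
// so this sketch assumes the largest vscale, as the message describes.
std::uint64_t orderedScalableReductionCost(std::uint64_t MinLanes,
                                           std::uint64_t MaxVScale,
                                           const ReductionCosts &C) {
  return orderedReductionCost(MinLanes * MaxVScale, C);
}
```

With all unit costs set to 1, an 8-lane tree reduction costs 6 while the ordered form costs 8, and the gap widens linearly with the lane count.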
Liqiang Tao 4bdfea2c51 [llvm][Inline] Add interface to return cost-benefit stuff
Return cost-benefit stuff which is computed by cost-benefit analysis.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D105349
2021-07-25 20:18:19 +08:00
Philip Reames ec43def700 Style tweaks for SCEV's computeMaxBECountForLT [NFC] 2021-07-23 17:19:45 -07:00
Philip Reames 4a3dc7dc9a [SCEV] Fix bug involving zero step and non-invariant RHS in trip count logic
Eli pointed out the issue when reviewing D104140. The max trip count logic makes an assumption that the value of IV changes. When the step is zero, the nowrap fact becomes trivial, and thus there's nothing preventing the loop from being nearly infinite. (The "nearly" part is because mustprogress may disallow an infinite loop while still allowing 999999999 iterations before RHS happens to allow an exit.)

This is very difficult to see in practice. You need a means to produce a loop varying RHS in a mustprogress loop which doesn't allow the loop to be infinite. In most cases, LICM or SCEV are smart enough to remove the loop varying expressions.

Differential Revision: https://reviews.llvm.org/D106327
2021-07-23 15:19:23 -07:00
Mircea Trofin 55e12f7080 [NFC][MLGO] Just use the underlying protobuf object for logging
Avoid buffering just to copy the buffered data, in 'development
mode', when logging. Instead, just populate the underlying protobuf.

Differential Revision: https://reviews.llvm.org/D106592
2021-07-23 10:56:48 -07:00
Serge Pavlov 1c64b5dc5e [ConstantFolding] Fold constrained arithmetic intrinsics
Constfold constrained variants of operations fadd, fsub, fmul, fdiv,
frem, fma and fmuladd.

The change also sets up some means to support removal of unused
constrained intrinsics. They are declared as accessing memory to model
interaction with the floating-point environment, so they were not removed,
as they have side effects. Now constrained intrinsics that have
"fpexcept.ignore" as exception behavior are removed if they have no uses.
Intrinsics that have exception behavior other than "fpexcept.ignore"
can be removed if it is known that they do not raise floating-point
exceptions. This happens during constant folding: the attributes of such
an intrinsic are changed so that the intrinsic is no longer claimed to
access memory.

Differential Revision: https://reviews.llvm.org/D102673
2021-07-23 14:39:51 +07:00
Mircea Trofin df0066a1c9 [NFC][MLGO] Fix vector sizing
The bots only build release mode, and the use of `reserve` instead of
`resize`, while not causing invalid memory accesses, is incorrect.
2021-07-22 13:06:00 -07:00
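
The distinction between the two calls is standard `std::vector` behavior and can be shown in isolation. The helper names are hypothetical and unrelated to the MLGO code.

```cpp
#include <cstddef>
#include <vector>

// reserve() only allocates capacity; the vector still has zero elements,
// so indexing into it is not valid even though the memory exists.
std::vector<int> makeWithReserve(std::size_t N) {
  std::vector<int> V;
  V.reserve(N);
  return V;
}

// resize() creates N value-initialized elements, so V[0] .. V[N-1] are valid.
std::vector<int> makeWithResize(std::size_t N) {
  std::vector<int> V;
  V.resize(N);
  return V;
}
```

Writing through `operator[]` after only `reserve` tends to land in allocated storage, which is consistent with the message's remark that no invalid memory access was observed, but `size()` stays 0 and bounds-checking builds may reject the access.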
Joseph Huber 754eb1c210 [OpenMP] Change `__kmpc_free_shared` to include the paired allocation size
This patch changes `__kmpc_free_shared` to take an additional argument
corresponding to the associated allocation's size. This makes it easier to
implement the allocator in the runtime.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106496
2021-07-21 20:56:21 -04:00
Jacob Hegna cfc4def85d [NFC] Code cleanups in InlineCost.cpp.
- annotate const functions with "const"
- replace C-style casts with static_cast

Differential Revision: https://reviews.llvm.org/D105362
2021-07-22 00:03:36 +00:00
Kerry McLaughlin be753b207f Revert "[LV] Use lookThroughAnd with logical reductions"
Reverting patch due to buildbot failures.

This reverts commit e22a599672.
2021-07-21 15:16:00 +01:00
Rosie Sumpter 44c9adb414 [LoopFlatten][LoopInfo] Use Loop to identify latch compare instruction
Make getLatchCmpInst non-static and use it in LoopFlatten as a more
robust way of identifying the compare.

Differential Revision: https://reviews.llvm.org/D106256
2021-07-21 10:14:18 +01:00
Kerry McLaughlin e22a599672 [LV] Use lookThroughAnd with logical reductions
If a reduction Phi has a single user which `AND`s the Phi with a type mask,
`lookThroughAnd` will return the user of the Phi and the narrower type represented
by the mask. Currently this is only used for arithmetic reductions, whereas loops
containing logical reductions will create a reduction intrinsic using the widened
type, for example:

  for.body:
    %phi = phi i32 [ %and, %for.body ], [ 255, %entry ]
    %mask = and i32 %phi, 255
    %gep = getelementptr inbounds i8, i8* %ptr, i32 %iv
    %load = load i8, i8* %gep
    %ext = zext i8 %load to i32
    %and = and i32 %mask, %ext
    ...

^ this will generate an and reduction intrinsic such as the following:
    call i32 @llvm.vector.reduce.and.v8i32(<8 x i32>...)

The same example for an add instruction would create an intrinsic of type i8:
    call i8 @llvm.vector.reduce.add.v8i8(<8 x i8>...)

This patch changes AddReductionVar to call lookThroughAnd for other integer
reductions, allowing loops similar to the example above with reductions such
as and, or & xor to vectorize.

Reviewed By: david-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D105632
2021-07-21 09:56:00 +01:00
Sanjay Patel 13302c06cd [ConstantFolding] avoid crashing on a fake math library call
https://llvm.org/PR50960
2021-07-20 18:25:21 -04:00
Jacob Hegna 1f3e90e128 Fix Threshold overwrite bug in the Oz inlining model features.
Differential Revision: https://reviews.llvm.org/D106336
2021-07-20 18:05:06 +00:00
Eli Friedman de3ea51be4 [ScalarEvolution] Refine computeMaxBECountForLT to be accurate in more cases.
Allow arbitrary strides, and make sure we return the correct result when
the backedge-taken count is zero.

Differential Revision: https://reviews.llvm.org/D106197
2021-07-19 15:43:30 -07:00
Philip Reames 4402d0d4fb [SCEV] Add a clarifying comment in howManyLessThans
Wrap semantics are subtle when combined with multiple exits.  This has caused several rounds of confusion during recent reviews, so try to document the subtle distinction between when wrap flags provide <u and <=u facts.
2021-07-19 15:13:48 -07:00
Arthur Eubanks 6cbb35dd3b [NewPM] Bail out of devirtualization wrapper if the current SCC is invalidated
The specific case that triggered this was when inlining a recursive
internal function into itself caused the recursion to go away, allowing
the inliner to mark the function as dead. The inliner marks the SCC as
invalidated but does not provide a new SCC to continue with.

This matches the implementations of ModuleToPostOrderCGSCCPassAdaptor
and CGSCCPassManager.

Fixes PR50363.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D106306
2021-07-19 15:07:30 -07:00
Mircea Trofin 55e2d2060a [MLGO] Use binary protobufs for improved training performance.
It turns out that during training, the time required to parse the
textual protobuf of a training log is about the same as the time it
takes to compile the module generating that log. Using binary protobufs
instead elides that cost almost completely.

Differential Revision: https://reviews.llvm.org/D106157
2021-07-19 13:59:28 -07:00
Mindong Chen e908e063d1 [LoopUtils] Fix incorrect RT check bounds of loop-invariant mem accesses
This fixes the lower and upper bound calculation of a
RuntimeCheckingPtrGroup when it has more than one loop-invariant
pointer. Resolves PR50686.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D104148
2021-07-19 19:38:24 +08:00
Nikita Popov 2b17c24a03 [SCEV] Fix unused variable warning (NFC) 2021-07-18 23:12:22 +02:00
Wenlei He 68fa6f7c7c [CSSPGO][NFC] Allow cl::ZeroOrMore for use-iterative-bfi-inference 2021-07-18 13:22:32 -07:00
Eli Friedman 28a3ad3f86 [ScalarEvolution] Remove uses of PointerType::getElementType. 2021-07-18 13:14:33 -07:00
Kazu Hirata 1993b73755 [Analysis, CodeGen] Remove getHotSucc (NFC)
These functions seem to have been unused for at least 5 years.
2021-07-17 07:31:36 -07:00
Eli Friedman cbba71bfb5 [ScalarEvolution] Fix overflow in computeBECount.
The current implementation of computeBECount doesn't account for the
possibility that adding "Stride - 1" to Delta might overflow. For almost
all loops, it doesn't, but it's not actually proven anywhere.

To deal with this, use a variety of tricks to try to prove that the
addition doesn't overflow.  If the proof is impossible, use an alternate
sequence which never overflows.

Differential Revision: https://reviews.llvm.org/D105216
2021-07-16 16:15:18 -07:00
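
The overflow is easy to reproduce in a narrow type. The sketch below uses 8-bit unsigned arithmetic and hypothetical helper names; the overflow-free form shown is one well-known way to compute a ceiling division and is offered as an illustration, not as the exact sequence the patch emits.

```cpp
#include <cstdint>

// Naive ceiling division in 8-bit arithmetic: (N + D - 1) / D.
// The addition wraps when N + D - 1 exceeds 255. D must be nonzero.
std::uint8_t ceilDivNaive(std::uint8_t N, std::uint8_t D) {
  const std::uint8_t Sum = static_cast<std::uint8_t>(N + D - 1);  // may wrap
  return static_cast<std::uint8_t>(Sum / D);
}

// Overflow-free form: umin(N, 1) + (N - umin(N, 1)) / D.
// No intermediate value exceeds N, so nothing can wrap. D must be nonzero.
std::uint8_t ceilDivSafe(std::uint8_t N, std::uint8_t D) {
  const std::uint8_t MinN1 = N < 1 ? N : 1;  // umin(N, 1)
  return static_cast<std::uint8_t>(MinN1 + (N - MinN1) / D);
}
```

For N = 250 and D = 10 the naive form wraps to 3 / 10 = 0, while the safe form gives 1 + 249 / 10 = 25, the correct ceiling.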
Eli Friedman 5d5b08761f [DependenceAnalysis] Guard analysis using getPointerBase().
D104806 broke some uses of getMinusSCEV() in DependenceAnalysis:
subtraction with different pointer bases returns a SCEVCouldNotCompute.
Make sure we avoid cases involving such subtractions.

Differential Revision: https://reviews.llvm.org/D106099
2021-07-15 14:57:32 -07:00
Philip Reames a99d420a93 [SCEV] Fix unsound reasoning in howManyLessThans
This is split from D105216, it handles only a subset of the cases in that patch.

Specifically, the issue being fixed is that the code incorrectly assumed that (Start-Stride) < End implied that the backedge was taken at least once. This is not true when e.g. Start = 4, Stride = 2, and End = 3. Note that we often do produce the right backedge taken count despite the flawed reasoning.

The fix chosen here is to use an alternate form of uceil (ceiling of unsigned divide) lowering which is safe when max(RHS,Start) > Start - Stride.  (Note that signedness of both max expression and comparison depend on the signedness of the comparison being analyzed, and that overflow in the Start - Stride expression is allowed.)  Note that this is weaker than proving the backedge is taken because it allows start - stride < end < start.  Some cases which can't be proven safe are sent down the generic path, and we do end up generating less optimal expressions in a few cases.

Credit for coming up with the approach goes entirely to Eli.  I just split it off, tweaked the comments a bit, and did some additional testing.

Differential Revision: https://reviews.llvm.org/D105942
2021-07-15 10:32:47 -07:00
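
The counterexample from the message can be replayed by direct simulation. This sketch assumes a loop whose exit test is `i < End` with `i` starting at Start and advancing by Stride, unsigned values, and no wrapping; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Counts how often the exit test `i < End` succeeds (and so the backedge is
// taken) for i = Start, Start + Stride, ... Assumes Stride > 0 and that the
// values are small enough that i never wraps.
std::uint64_t backedgeTakenCount(std::uint64_t Start, std::uint64_t Stride,
                                 std::uint64_t End) {
  assert(Stride > 0);
  std::uint64_t Count = 0;
  for (std::uint64_t I = Start; I < End; I += Stride)
    ++Count;
  return Count;
}

// The flawed premise: (Start - Stride) < End. Assumes Start >= Stride so the
// subtraction does not wrap in this sketch.
bool flawedPremiseHolds(std::uint64_t Start, std::uint64_t Stride,
                        std::uint64_t End) {
  return (Start - Stride) < End;
}
```

For Start = 4, Stride = 2, End = 3 the premise holds (2 < 3), yet the first test 4 < 3 already fails and the backedge is never taken.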
Philip Reames 205ed009a4 [SCEV] Handle zero stride correctly in howManyLessThans
This is split from D105216, but the code is hoisted much earlier into
the path where we can actually get a zero stride flowing through. Some
fairly simple proofs handle the cases which show up in practice. The
only test changes are the cases where we really do need a non-zero
divider to produce the right result.

Recommitting with isLoopInvariant() check.

Differential Revision: https://reviews.llvm.org/D105921
2021-07-13 19:14:01 -07:00
Arthur Eubanks 5738819679 Revert "[SCEV] Handle zero stride correctly in howManyLessThans"
This reverts commit 4df591b5c9.

Causes crashes, see comments on D105921.
2021-07-13 17:53:48 -07:00
Eli Friedman bb8c7a980f [ScalarEvolution] Make isKnownNonZero handle more cases.
Using an unsigned range instead of signed ranges is a bit more precise.

Differential Revision: https://reviews.llvm.org/D105941
2021-07-13 15:36:45 -07:00
Philip Reames 4df591b5c9 [SCEV] Handle zero stride correctly in howManyLessThans
This is split from D105216, but the code is hoisted much earlier into the path where we can actually get a zero stride flowing through. Some fairly simple proofs handle the cases which show up in practice. The only test changes are the cases where we really do need a non-zero divider to produce the right result.

Differential Revision: https://reviews.llvm.org/D105921
2021-07-13 13:31:40 -07:00
Philip Reames 087310c71e [SCEV] Strengthen inference of RHS > Start in howManyLessThans
Split off from D105216 to simplify review.  Rewritten with a lambda to be easier to follow.  Comments clarified.

Sorry for no test case, this is tricky to exercise with the current structure of the code.  It's about to be hit more frequently in a follow up patch, and the change itself is simple.
2021-07-13 11:54:07 -07:00
Philip Reames e4b43973fb [ScalarEvolution] Fix overflow when computing max trip counts
This is split from D105216 to reduce patch complexity.  Original code by Eli with very minor modification by me.

The primary point of this patch is to add the getUDivCeilSCEV routine.  I included the two callers with constant arguments as we know those must constant fold even without any of the fancy inference logic.
2021-07-13 10:01:10 -07:00
Nikita Popov 6ac32872ee [Attributes] Replace doesAttrKindHaveArgument() (NFC)
This is now the same as isIntAttrKind(), so use that instead, as
it does not require manual maintenance. The naming is also more
accurate in that both int and type attributes have an argument,
but this method was only targeting int attributes.

I initially wanted to tighten the AttrBuilder assertion, but we
have some in-tree uses that would violate it.
2021-07-12 21:57:26 +02:00
Kazu Hirata 4f94121cce [Analysis] Remove changeCondBranchToUnconditionalTo (NFC)
The last use was removed on Jan 21, 2021 in commit
0895b836d7.
2021-07-10 17:31:43 -07:00
Eli Friedman 882ee7fbd6 Fix buildbot regression from 9c4baf5.
Apparently ScalarEvolution::isImpliedCond tries to truncate a pointer in
some obscure cases. Guard the code with a check for pointers.
2021-07-09 17:54:09 -07:00
Eli Friedman 9c4baf5101 [ScalarEvolution] Strictly enforce pointer/int type rules.
Rules:

1. SCEVUnknown is a pointer if and only if the LLVM IR value is a
   pointer.
2. SCEVPtrToInt is never a pointer.
3. If any other SCEV expression has no pointer operands, the result is
   an integer.
4. If a SCEVAddExpr has exactly one pointer operand, the result is a
   pointer.
5. If a SCEVAddRecExpr's first operand is a pointer, and it has no other
   pointer operands, the result is a pointer.
6. If every operand of a SCEVMinMaxExpr is a pointer, the result is a
   pointer.
7. Otherwise, the SCEV expression is invalid.

I'm not sure how useful rule 6 is in practice.  If we exclude it, we can
guarantee that ScalarEvolution::getPointerBase always returns a
SCEVUnknown, which might be a helpful property. Anyway, I'll leave that
for a followup.

This is basically mop-up at this point; all the changes with significant
functional effects have landed.  Some of the remaining changes could be
split off, but I don't see much point.

Differential Revision: https://reviews.llvm.org/D105510
2021-07-09 17:29:26 -07:00
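The seven pointer/int rules in the commit above can be read as a small decision procedure. The following is a minimal sketch of that reading, not LLVM's implementation: `ExprKind`, `ExprType`, and `classifyExpr` are hypothetical names invented for illustration, and operands are reduced to a flag saying whether each one is a pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of SCEV expression kinds (not LLVM's classes).
enum class ExprKind { Unknown, PtrToInt, Add, AddRec, MinMax, Other };
enum class ExprType { Integer, Pointer, Invalid };

// operandIsPointer[i] tells whether operand i is pointer-typed.
// irValueIsPointer is only consulted for ExprKind::Unknown (rule 1).
ExprType classifyExpr(ExprKind kind, const std::vector<bool>& operandIsPointer,
                      bool irValueIsPointer = false) {
  if (kind == ExprKind::Unknown) {  // Rule 1
    assert(operandIsPointer.empty() && "an unknown value is a leaf");
    return irValueIsPointer ? ExprType::Pointer : ExprType::Integer;
  }
  if (kind == ExprKind::PtrToInt)  // Rule 2
    return ExprType::Integer;

  const std::size_t numPointers = static_cast<std::size_t>(
      std::count(operandIsPointer.begin(), operandIsPointer.end(), true));
  if (numPointers == 0)  // Rule 3
    return ExprType::Integer;
  if (kind == ExprKind::Add && numPointers == 1)  // Rule 4
    return ExprType::Pointer;
  if (kind == ExprKind::AddRec && operandIsPointer.front() &&
      numPointers == 1)  // Rule 5
    return ExprType::Pointer;
  if (kind == ExprKind::MinMax &&
      numPointers == operandIsPointer.size())  // Rule 6
    return ExprType::Pointer;
  return ExprType::Invalid;  // Rule 7
}
```

Under this model, an add of two pointers or a multiply with a pointer operand falls through to rule 7 and is reported as invalid.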
Nikita Popov 2e3f4694d6 [IR] Add GEPOperator::indices() (NFC)
In order to mirror the GetElementPtrInst::indices() API.

Wanted to use this in the IRForTarget code, and was surprised to
find that it didn't exist yet.
2021-07-09 21:41:20 +02:00
Kevin P. Neal 52900486a1 [FPEnv][InstSimplify] Constrained FP support for NaN
Currently InstructionSimplify.cpp knows how to simplify floating point
instructions that have a NaN operand. It does not know how to handle the
matching constrained FP intrinsic.

This patch teaches it how to simplify so long as the exception handling
is not "fpexcept.strict".

Differential Revision: https://reviews.llvm.org/D103169
2021-07-09 11:26:28 -04:00
Martin Storsjö e479777d3c Revert "[ScalarEvolution] Fix overflow in computeBECount."
This reverts commit 5b350183cd (and
also "[NFC][ScalarEvolution] Cleanup howManyLessThans.",
009436e9c1, to make it apply).

See https://reviews.llvm.org/D105216 for discussion on various
miscompilations caused by that commit.
2021-07-09 14:26:48 +03:00
David Green 38c9a4068d [TTI] Remove IsPairwiseForm from getArithmeticReductionCost
This patch removes the IsPairwiseForm flag from the Reduction Cost TTI
hooks, along with some accompanying code for pattern matching reductions
from trees starting at extract elements. IsPairWise is now assumed to be
false, which was the predominant way that the value was used from both
the Loop and SLP vectorizers. Since the adjustments such as D93860, the
SLP vectorizer has not relied upon this distinction between pairwise and
non-pairwise reductions.

This also removes some code that was detecting reduction trees starting
from extract elements inside the costmodel. This case was
double-counting costs though, adding the individual costs on the
individual instruction _and_ the total cost of the reduction. Removing
it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to
not double count. The cost of reduction intrinsics is still tested
through the various tests in
llvm/test/Analysis/CostModel/X86/reduce-xyz.ll.

Differential Revision: https://reviews.llvm.org/D105484
2021-07-09 11:51:16 +01:00
Bjorn Pettersson 472462c472 [NewPM] Consistently use 'simplifycfg' rather than 'simplify-cfg'
There was an alias between 'simplifycfg' and 'simplify-cfg' in the
PassRegistry. That was the original reason for this patch, which
effectively removes the alias.

This patch also replaces all occurrences of 'simplify-cfg'
by 'simplifycfg'. Reason for choosing that form for the name is
that it matches the DEBUG_TYPE for the pass, and the legacy PM name
and also how it is spelled out in other passes such as
'loop-simplifycfg', and in other options such as
'simplifycfg-merge-cond-stores'.

If for some reason the name should be changed to 'simplify-cfg' in
the future, then I think such a renaming should be more widely done
and not only impacting the PassRegistry.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D105627
2021-07-09 09:47:03 +02:00
Eli Friedman 009436e9c1 [NFC][ScalarEvolution] Cleanup howManyLessThans.
In preparation for D104075. Some NFC cleanup, and some test coverage for
planned changes.
2021-07-08 17:56:26 -07:00
Michael Liao 8c7ff9da90 [Metadata] Decorate methods with 'const'. NFC.
- Minor coding style fix.
2021-07-08 14:11:14 -04:00
Eli Friedman 5b350183cd [ScalarEvolution] Fix overflow in computeBECount.
There are two issues with the current implementation of computeBECount:

1. It doesn't account for the possibility that adding "Stride - 1" to
Delta might overflow. For almost all loops, it doesn't, but it's not
actually proven anywhere.
2. It doesn't account for the possibility that Stride is zero. If Delta
is zero, the backedge is never taken; the value of Stride isn't
relevant. To handle this, we have to make sure that the expression
returned by computeBECount evaluates to zero.

To deal with this, add two new checks:

1. Use a variety of tricks to try to prove that the addition doesn't
overflow.  If the proof is impossible, use an alternate sequence which
never overflows.
2. Use umax(Stride, 1) to handle the possibility that Stride is zero.

Differential Revision: https://reviews.llvm.org/D105216
2021-07-08 10:09:55 -07:00
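The two hazards named in the computeBECount commit above (overflow of "Delta + Stride - 1" and a zero Stride) can be reproduced with plain 64-bit integers. The sketch below is an illustration under assumptions: the function names are hypothetical, and the rewrite `umin(N, 1) + (N - umin(N, 1)) / D` is one overflow-free formulation of ceiling division; the commit message itself does not spell out the exact alternate sequence.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Textbook ceiling division. The sum n + d - 1 wraps around when it
// exceeds UINT64_MAX, which silently produces a wrong quotient.
std::uint64_t udivCeilNaive(std::uint64_t n, std::uint64_t d) {
  assert(d != 0 && "the naive form cannot handle a zero divisor");
  return (n + d - 1) / d;
}

// Overflow-free variant: for n > 0, ceil(n / d) == 1 + floor((n - 1) / d).
// Clamping the divisor with max(d, 1) mirrors the umax(Stride, 1) idea, so a
// zero numerator still yields zero even when the divisor is zero.
std::uint64_t udivCeilSafe(std::uint64_t n, std::uint64_t d) {
  const std::uint64_t divisor = std::max<std::uint64_t>(d, 1);
  const std::uint64_t oneIfNonZero = std::min<std::uint64_t>(n, 1);
  return oneIfNonZero + (n - oneIfNonZero) / divisor;
}
```

For example, `udivCeilNaive(UINT64_MAX, 2)` wraps to 0, while `udivCeilSafe(UINT64_MAX, 2)` returns 2^63.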
Eli Friedman f5603aa050 [ScalarEvolution] Make sure getMinusSCEV doesn't negate pointers.
Add a function removePointerBase that returns, essentially, S -
getPointerBase(S).  Use it in getMinusSCEV instead of actually
subtracting pointers.

Differential Revision: https://reviews.llvm.org/D105503
2021-07-07 10:27:10 -07:00
Eli Friedman 7ac1c7bead Recommit [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers.
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.

There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subtracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.

The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.

As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base.  This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.

This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes).  The resulting changes
seem okay to me, but suggestions welcome.  As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.

I deleted the test lsr-undef-in-binop.ll because I couldn't figure out
how to repair it to test what it was actually trying to test.

Recommitting with fix to MemoryDepChecker::isDependent.

Differential Revision: https://reviews.llvm.org/D104806
2021-07-06 12:16:05 -07:00
Eli Friedman a6d081b2cb Revert "[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers."
This reverts commit 74d6ce5d5f.

Seeing crashes on buildbots in MemoryDepChecker::isDependent.
2021-07-06 11:17:13 -07:00
Sanjay Patel 4ec7c02197 [InstSimplify] fix bug in poison propagation for FP ops
If any operand of a math op is poison, that takes
precedence over general undef/NaN.

This should not be visible with binary ops because
it requires 2 constant operands to trigger (and if
both operands of a binop are constant, that should
get handled first in ConstantFolding).
2021-07-06 14:06:50 -04:00
Eli Friedman 74d6ce5d5f [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers.
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.

There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subtracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.

The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.

As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base.  This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.

This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes).  The resulting changes
seem okay to me, but suggestions welcome.  As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.

I deleted the test lsr-undef-in-binop.ll because I couldn't figure out
how to repair it to test what it was actually trying to test.

Differential Revision: https://reviews.llvm.org/D104806
2021-07-06 10:54:41 -07:00
Kerry McLaughlin a7512401e5 [LV] Prevent vectorization with unsupported element types.
This patch adds a TTI function, isElementTypeLegalForScalableVector, to query
whether it is possible to vectorize a given element type. This is called by
isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if
any of the instruction types in the loop are unsupported, e.g:

  void foo(__int128_t* ptr, int N) {
    #pragma clang loop vectorize_width(4, scalable)
    for (int i=0; i<N; ++i)
      ptr[i] = ptr[i] + 42;
  }

This example currently crashes if we attempt to vectorize since i128 is not a
supported type for scalable vectorization.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D102253
2021-07-06 13:06:21 +01:00
Sanjay Patel 3d3c0ed932 [InstSimplify] fold extractelement of splat with variable extract index
We already have a fold for variable index with constant vector,
but if we can determine a scalar splat value, then it does not
matter whether that value is constant or not.

We overlooked this fold in D102404 and earlier patches,
but the fixed vector variant is shown in:
https://llvm.org/PR50817

Alive2 agrees on that:
https://alive2.llvm.org/ce/z/HpijPC

The same logic applies to scalable vectors.

Differential Revision: https://reviews.llvm.org/D104867
2021-07-05 08:19:40 -04:00
Paul Walker 287d39dd5a [NFC] Fix a few whitespace issues and typos. 2021-07-04 11:49:58 +01:00
Roman Lebedev fc150cecd7 [SimplifyCFG] simplifyUnreachable(): erase instructions iff they are guaranteed to transfer execution to unreachable
This replaces the current ad-hoc implementation,
by syncing the code from InstCombine's implementation in `InstCombinerImpl::visitUnreachableInst()`,
with one exception that here in SimplifyCFG we are allowed to remove EH instructions.

Effectively, this now allows SimplifyCFG to remove calls (iff they won't throw and will return),
arithmetic/logic operations, etc.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D105374
2021-07-03 10:45:44 +03:00
Jacob Hegna 8cc8caa1b1 [MLGO] Update Oz model url. 2021-07-02 17:29:15 +00:00
Jacob Hegna 99f00635d7 Unpack the CostEstimate feature in ML inlining models.
This change yields an additional 2% size reduction on an internal search
binary, and an additional 0.5% size reduction on fuchsia.

Differential Revision: https://reviews.llvm.org/D104751
2021-07-02 16:57:16 +00:00
Sanjay Patel 9eb613b2de [InstSimplify] do not propagate poison from select arm to icmp user
This is the cause of the miscompile in:
https://llvm.org/PR50944

The problem has likely existed for some time, but it was made visible with:
5af8bacc94 (D104661)
handleOtherCmpSelSimplifications() assumed it can convert select of
constants to bool logic ops, but that does not work with poison.
We had a very similar construct in InstCombine, so the fix here
mimics the fix there.

The bug is in instsimplify, but I'm not sure how to reproduce it outside of
instcombine. The reason this is visible in instcombine is because we have a
hack (FIXME) to bypass simplification of a select when it has an icmp user:
955f125899/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp (L2632)

So we get to an unusual case where we are trying to simplify an instruction
that has an operand that would have already simplified if we had processed
it in normal order.

Differential Revision: https://reviews.llvm.org/D105298
2021-07-01 17:40:07 -04:00
Florian Hahn dc4299a7f3 [BasicAA] Fix typo ScaleForGDC -> ScaleForGCD. 2021-07-01 09:58:38 +01:00
Florian Hahn e6d22d0174 [BasicAA] Use separate scale variable for GCD.
Use separate variable for adjusted scale used for GCD computations. This
fixes an issue where we incorrectly determined that all indices are
non-negative and returned noalias because of that.

Follow up to 91fa3565da.
2021-06-30 20:04:39 +01:00
Philip Reames 14d8f1546a [SCEV] Fold (0 udiv %x) to 0
We have analogous rules in instsimplify, etc., but were missing the same in SCEV.  The fold is near trivial, but came up in the context of a larger change.
2021-06-30 08:31:13 -07:00
Jacob Hegna 7b639f5095 [NFC] clang-format on InlineCost.cpp and InlineAdvisor.h. 2021-06-29 18:15:27 +00:00
Florian Hahn 91fa3565da [BasicAA] Be more careful with modulo ops on VariableGEPIndex.
(V * Scale) % X may not produce the same result for any possible value
of V, e.g. if the multiplication overflows. This means we currently
incorrectly determine NoAlias in some cases.

This patch updates LinearExpression to track whether the expression
has NSW and uses that to adjust the scale used for alias checks.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99424
2021-06-29 09:22:36 +01:00
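The overflow concern in the BasicAA commit above is easy to reproduce at a small bit width. The sketch below uses 8-bit arithmetic and hypothetical function names, and only demonstrates the arithmetic fact that a wrapped product can change the remainder; it does not model LinearExpression or the NSW tracking.

```cpp
#include <cassert>
#include <cstdint>

// (v * scale) % x evaluated in 8-bit arithmetic, where the product may wrap.
std::uint8_t mulModWrapped8(std::uint8_t v, std::uint8_t scale, std::uint8_t x) {
  assert(x != 0 && "modulus must be non-zero");
  // The multiplication happens in int; the cast truncates it modulo 256.
  const std::uint8_t product = static_cast<std::uint8_t>(v * scale);
  return static_cast<std::uint8_t>(product % x);
}

// The same expression evaluated in a type wide enough that it cannot wrap
// for 8-bit inputs.
unsigned mulModExact(unsigned v, unsigned scale, unsigned x) {
  assert(x != 0 && "modulus must be non-zero");
  return (v * scale) % x;
}
```

With v = 100, scale = 3, x = 3, the exact product 300 is a multiple of 3, but the wrapped 8-bit product is 44 and leaves remainder 2. Reasoning of the form "V * Scale is always a multiple of Scale" therefore holds only when the multiplication is known not to overflow.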
Sanjay Patel 7414bbebc2 [Analysis] improve function signature checking for calloc
This would crash later if we thought the parameters were
valid for the standard library call as shown in:
https://llvm.org/PR50846
2021-06-27 08:19:00 -04:00
Eli Friedman 8d5bf0709d [NFC] Prefer ConstantRange::makeExactICmpRegion over makeAllowedICmpRegion
The implementation is identical, but it makes the semantics a bit more
obvious.
2021-06-25 14:43:13 -07:00
Sanjay Patel 1076b6c4f0 [Analysis] use better version of getLibFunc to check for alloc/free calls
There's no reason to use the weaker name-only analysis when we
have a function prototype to check (in fact, we probably should
not even have that name-only function exposed for general use,
but removing it requires auditing all of the callers).

The version of getLibFunc that takes a Function argument also
does some prototype checking to make sure the arguments/return
type match the expected signature of a real library call.

This is NFC-intended because the code in MemoryBuiltins does its
own function signature checking. For now, that means there may
be some redundancy in the checking, but that should not be above
the noise for compile-time. Ideally, we can move the checks to
a single location.

There's still a hole in the logic that allows the example in
https://llvm.org/PR50846 to cause a compiler crash.
2021-06-25 12:14:07 -04:00
Florian Hahn 6478f3fb78 [SCEV] Support single-cond range check idiom in applyLoopGuards.
This patch extends applyLoopGuards to detect a single-cond range check
idiom that InstCombine generates.

It extends applyLoopGuards to detect conditions of the form
(-C1 + X < C2). InstCombine will create this form when combining two
checks of the form (X u< C2 + C1) and (X u>= C1).

In practice, this enables us to correctly compute tight trip count
bounds for code as in the function below. InstCombine will fold the
minimum iteration check created by LoopRotate with the user check (< 8).

    void unsigned_check(short *pred, unsigned width) {
        if (width < 8) {
            for (int x = 0; x < width; x++)
                pred[x] = pred[x] * pred[x];
        }
    }

As a consequence, LLVM creates dead vector loops for the code above,
e.g. see https://godbolt.org/z/cb8eTcqET

https://alive2.llvm.org/ce/z/SHHW4d

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104741
2021-06-25 10:24:40 +01:00
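The single-condition idiom from the applyLoopGuards commit above relies on unsigned wraparound: subtracting C1 pushes every value below C1 to the top of the range, so one unsigned compare covers both bounds. The sketch below checks that equivalence exhaustively at 8 bits. The function names are hypothetical, and it assumes C1 + C2 does not itself wrap.

```cpp
#include <cassert>
#include <cstdint>

// The two original checks: X u>= C1 and X u< C1 + C2.
bool inRangeTwoChecks(std::uint8_t x, std::uint8_t c1, std::uint8_t c2) {
  assert(c1 + c2 <= 255 && "sketch assumes C1 + C2 does not wrap");
  return x >= c1 && x < c1 + c2;
}

// The combined form: (-C1 + X) u< C2, evaluated with 8-bit wraparound.
bool inRangeSingleCheck(std::uint8_t x, std::uint8_t c1, std::uint8_t c2) {
  return static_cast<std::uint8_t>(x - c1) < c2;
}

// Compares both forms for every 8-bit value of X.
bool checksAgree(std::uint8_t c1, std::uint8_t c2) {
  for (unsigned x = 0; x <= 255; ++x) {
    const auto x8 = static_cast<std::uint8_t>(x);
    if (inRangeTwoChecks(x8, c1, c2) != inRangeSingleCheck(x8, c1, c2))
      return false;
  }
  return true;
}
```

For the `unsigned_check` function in the commit message, the guards `width >= 1` and `width < 8` correspond to C1 = 1 and C2 = 7, so `checksAgree(1, 7)` exercises the same shape at a narrower width.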
Sanjay Patel 50db987d59 [InstSimplify] move extract with undef index fold; NFC
This puts it closer to the other undef query check and
will avoid a potential ordering problem if we allow
folding non-constant-int indexes.
2021-06-24 13:22:10 -04:00
Florian Hahn 121ecb05e7 [SCEV] Generalize MatchBinaryAddToConst to support non-add expressions.
This patch generalizes MatchBinaryAddToConst to support matching
(A + C1), (A + C2), instead of just matching (A + C1), A.

The existing cases can be handled by treating non-add expressions A as
A + 0.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D104634
2021-06-24 12:16:15 +01:00
Carl Ritson ae266e743c [LVI] Remove recursion from getValueForCondition (NFCI)
Convert getValueForCondition to a worklist model instead of using
recursion.

In pathological cases getValueForCondition recurses heavily.
Stack frames are quite expensive on x86-64, and some operating
systems (e.g. Windows) have relatively low stack size limits.
Using a worklist avoids potential failures from stack overflow.

Differential Revision: https://reviews.llvm.org/D104191
2021-06-24 09:58:22 +09:00
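The recursion-to-worklist conversion described in the LVI commit above follows a common pattern: keep an explicit stack of pending nodes on the heap and visit each inner node twice, once to schedule its operands and once to combine their results. The sketch below applies that pattern to a hypothetical AND/OR condition tree. `CondNode`, `evaluateWithWorklist`, and `makeAndChain` are invented for illustration and are not the LVI data structures; the input is assumed to be acyclic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical condition tree: leaves carry a truth value, inner nodes
// combine two children (given as indices into the node vector).
struct CondNode {
  enum Kind { Leaf, And, Or } kind;
  bool value;       // used by Leaf only
  std::size_t lhs;  // used by And/Or only
  std::size_t rhs;  // used by And/Or only
};

bool evaluateWithWorklist(const std::vector<CondNode>& nodes, std::size_t root) {
  assert(root < nodes.size());
  enum State : char { NotVisited, Pending, Done };
  std::vector<State> state(nodes.size(), NotVisited);
  std::vector<bool> result(nodes.size(), false);
  std::vector<std::size_t> worklist{root};

  while (!worklist.empty()) {
    const std::size_t id = worklist.back();
    const CondNode& node = nodes[id];
    if (state[id] == Done) {  // already computed via another path
      worklist.pop_back();
    } else if (node.kind == CondNode::Leaf) {
      result[id] = node.value;
      state[id] = Done;
      worklist.pop_back();
    } else if (state[id] == NotVisited) {
      // First visit: keep the node on the worklist and schedule its operands.
      state[id] = Pending;
      worklist.push_back(node.lhs);
      worklist.push_back(node.rhs);
    } else {
      // Second visit: both operands are done, so combine them.
      result[id] = node.kind == CondNode::And
                       ? (result[node.lhs] && result[node.rhs])
                       : (result[node.lhs] || result[node.rhs]);
      state[id] = Done;
      worklist.pop_back();
    }
  }
  return result[root];
}

// Builds a chain of `depth` nested ANDs over one shared true leaf; the root is
// the last node. A recursive evaluator would need `depth` stack frames here.
std::vector<CondNode> makeAndChain(std::size_t depth) {
  std::vector<CondNode> nodes;
  nodes.push_back({CondNode::Leaf, true, 0, 0});
  for (std::size_t i = 0; i < depth; ++i)
    nodes.push_back({CondNode::And, false, nodes.size() - 1, 0});
  return nodes;
}
```

Calling `evaluateWithWorklist(makeAndChain(100000), 100000)` walks a 100000-deep chain using heap storage only, which is the property the commit is after on platforms with small stacks.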
Eli Friedman b12192f7cd [ScalarEvolution] Clarify implementation of getPointerBase().
getPointerBase should only be looking through Add and AddRec
expressions; other expressions either aren't pointers, or can't be
looked through.

Technically, this is a functional change. For a multiply or min/max
expression, if they have exactly one pointer operand, and that operand
is the first operand, the behavior here changes. Similarly, if an AddRec
has a pointer-type step, the behavior changes. But that shouldn't be
happening in practice, and we plan to make such expressions illegal.
2021-06-23 12:55:59 -07:00
Eli Friedman fdaf304e0d [NFC][ScalarEvolution] Fix SCEVNAryExpr::getType().
SCEVNAryExpr::getType() could return the wrong type for a SCEVAddExpr.
Remove it, and add getType() methods to the relevant subclasses.

NFC because nothing uses it directly, as far as I know; this is just
future-proofing.
2021-06-23 12:55:59 -07:00
Nikita Popov 00d3f7cc3c [LAA] Make getPointersDiff() API compatible with opaque pointers
Make getPointersDiff() and sortPtrAccesses() compatible with opaque
pointers by explicitly passing in the element type instead of
determining it from the pointer element type.

The SLPVectorizer result is slightly non-optimal in that unnecessary
pointer bitcasts are added.

Differential Revision: https://reviews.llvm.org/D104784
2021-06-23 18:44:34 +02:00
Sanjay Patel 656001e7b2 [ValueTracking] look through bitcast of vector in computeKnownBits
This borrows as much as possible from the SDAG version of the code
(originally added with D27129 and since updated with big endian support).

In IR, we can test more easily for correctness than we did in the
original patch. I'm using the simplest cases that I could find for
InstSimplify: we computeKnownBits on variable shift amounts to see if
they are zero or in range. So shuffle constant elements into a vector,
cast it, and shift it.

The motivating x86 example from https://llvm.org/PR50123 is also here.
We computeKnownBits in the caller code, but we only check if the shift
amount is in range. That could be enhanced to catch the 2nd x86 test -
if the shift amount is known too big, the result is 0.

Alive2 understands the datalayout and agrees that the tests here are
correct - example:
https://alive2.llvm.org/ce/z/KZJFMZ

Differential Revision: https://reviews.llvm.org/D104472
2021-06-23 11:46:46 -04:00
Juneyoung Lee 5af8bacc94 [InstSimplify] Add more poison folding optimizations
This adds more poison folding optimizations to InstSimplify.

Since all binary operators propagate poison, these are fine.

Also, the precondition of `select cond, undef, x` -> `x` is relaxed to allow the case when `x` is undef.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104661
2021-06-23 20:25:24 +09:00
Florian Hahn adee485adf [SCEV] Support signed predicates in applyLoopGuards.
This adds handling for signed predicates, similar to how unsigned
predicates are already handled.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104732
2021-06-23 10:21:05 +01:00
Joseph Huber 2662351e3b [OpenMP] Add new OpenMP globalization functions to library info
Summary:
The changes to globalization introduced in D97680 created two new functions to
push / pop shareable memory on the GPU, __kmpc_alloc_shared and
__kmpc_free_shared. This patch adds these new runtime functions to the
library info so they can be used by the HeapToStack attributor interface. This
optimization replaces malloc / free pairs with stack memory if legal.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D102087
2021-06-22 13:23:05 -04:00
Florian Hahn 6c782e6eb0 [SCEV] Reduce code to handle predicates in applyLoopGuards (NFC).
Hoist out common recurrence check and sink updating the map, to reduce
the code required to support additional predicates.
2021-06-22 15:56:45 +01:00
Nikita Popov e638a290f7 [ConstantFold] Delay fetching pointer element type
Don't do this while stripping pointer casts, instead fetch it at
the end. This improves compatibility with opaque pointers for the
case where the base object is not opaque.
2021-06-22 15:51:00 +02:00
Florian Hahn d17798823c [SCEV] Retain AddExpr flags when subtracting a foldable constant.
Currently we drop wrapping flags for expressions like (A + C1)<flags> - C2.

But we can retain flags under certain conditions:

* Adding a smaller constant is NUW if the original AddExpr was NUW.

* Adding a constant with the same sign and small magnitude is NSW, if the
  original AddExpr was NSW.

This can improve results after using `SimplifyICmpOperands`, which may
subtract one in order to use stricter predicates, as is the case for
`isKnownPredicate`.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D104319
2021-06-22 11:27:51 +01:00
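The NUW half of the flag-retention rule in the commit above can be checked by brute force at a small bit width: if A + C1 does not wrap and C2 <= C1, then A + (C1 - C2) cannot wrap either, because the added constant only got smaller. The sketch below does this for 8-bit values. The function names are hypothetical, and the NSW case is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// True when a + c does not wrap in 8-bit unsigned arithmetic (the NUW property).
bool addIsNUW8(std::uint8_t a, std::uint8_t c) {
  return static_cast<unsigned>(a) + c <= 255u;
}

// Exhaustively checks, for every 8-bit a: whenever a + c1 is NUW,
// a + (c1 - c2) is NUW as well.
bool nuwSurvivesSubtraction(std::uint8_t c1, std::uint8_t c2) {
  assert(c2 <= c1 && "the rule only covers subtracting a smaller constant");
  const auto folded = static_cast<std::uint8_t>(c1 - c2);
  for (unsigned a = 0; a <= 255; ++a) {
    const auto a8 = static_cast<std::uint8_t>(a);
    if (addIsNUW8(a8, c1) && !addIsNUW8(a8, folded))
      return false;
  }
  return true;
}
```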
Nikita Popov 04395fd6cb [ConstantFolding] Separate conditions in GEP evaluation (NFC)
Handle the gep p, 0-v case separately, and not as part of the loop
that ensures all indices are constant integers. Those two things
are not really related.
2021-06-22 11:14:47 +02:00
Eli Friedman 8f3d16905d [ScalarEvolution] Ensure backedge-taken counts are not pointers.
A backedge-taken count doesn't refer to memory; returning a pointer type
is nonsense. So make sure we always return an integer.

The obvious way to do this would be to just convert the operands of the
icmp to integers, but that doesn't quite work out at the moment:
isLoopEntryGuardedByCond currently gets confused by ptrtoint operations.
So we perform the ptrtoint conversion late for lt/gt operations.

The test changes are mostly innocuous. The most interesting changes are
more complex SCEV expressions of the form "(-1 * (ptrtoint i8* %ptr to
i64)) + %ptr)". This is expected: we can't fold this to zero because we
need to preserve the pointer base.

The call to isLoopEntryGuardedByCond in howFarToZero is less precise
because of ptrtoint operations; this shows up in the function
pr46786_c26_char in ptrtoint.ll. Fixing it here would require more
complex refactoring.  It should eventually be fixed by future
improvements to isImpliedCond.

See https://bugs.llvm.org/show_bug.cgi?id=46786 for context.

Differential Revision: https://reviews.llvm.org/D103656
2021-06-21 16:24:16 -07:00
Jacob Hegna f86d1f99b3 Remove ML inlining model artifacts.
They are not conducive to being stored in git. Instead, we autogenerate
mock model artifacts for use in tests. Production models can be
specified with the cmake flag LLVM_INLINER_MODEL_PATH.

LLVM_INLINER_MODEL_PATH has two sentinel values:
 - download, which will download the most recent compatible model.
 - autogenerate, which will autogenerate a "fake" model for testing the
 model uptake infrastructure.

Differential Revision: https://reviews.llvm.org/D104251
2021-06-21 17:38:09 +00:00
Eli Friedman 62ed024c74 [NFC][ScalarEvolution] Clean up ExitLimit constructors.
Make all the constructors forward to one constructor.  Remove redundant
assertions.
2021-06-20 17:40:30 -07:00
Juneyoung Lee 09e8c0d5aa [InstSimplify] icmp poison, X -> poison
This adds a simple transformation from icmp with poison constant to poison.
Comparing poison with something else is poison, so this is okay.

https://alive2.llvm.org/ce/z/e8iReb
https://alive2.llvm.org/ce/z/q4MurY
2021-06-20 15:39:07 +09:00
Tomas Matheson 1bcfa84ae9 Allow building for release with EXPENSIVE_CHECKS
D97225 moved LazyCallGraph verify() calls behind EXPENSIVE_CHECKS,
but verify() is defined for debug builds only so this had the unintended
effect of breaking release builds with EXPENSIVE_CHECKS.

Fix by enabling verify() for both debug and EXPENSIVE_CHECKS.

Differential Revision: https://reviews.llvm.org/D104514
2021-06-19 17:02:11 +01:00
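The build breakage described in the EXPENSIVE_CHECKS commit above comes down to a mismatch between the guard around a declaration and the guard around its call sites. A minimal sketch of the repaired shape follows; the `Graph` class and its members are hypothetical, and only the preprocessor conditions are the point.

```cpp
#include <cassert>

// Hypothetical class standing in for an analysis with a self-check.
class Graph {
public:
#if !defined(NDEBUG) || defined(EXPENSIVE_CHECKS)
  // Declared in debug builds and in any build that defines EXPENSIVE_CHECKS,
  // so the guarded call below always finds it.
  void verify() const {
    assert(numNodes >= 0 && "node count must not be negative");
  }
#endif

  void update(int newNumNodes) {
    numNodes = newNumNodes;
#ifdef EXPENSIVE_CHECKS
    verify();  // also compiles with NDEBUG, thanks to the guard above
#endif
  }

  int nodeCount() const { return numNodes; }

private:
  int numNodes = 0;
};
```

If the declaration were guarded by `#ifndef NDEBUG` alone, a release build with `EXPENSIVE_CHECKS` defined would fail to compile at the call in `update()`.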
Eli Friedman 8a567e5f22 [ScalarEvolution] Fix pointer/int type handling converting select/phi to min/max.
The old version of this code would blindly perform arithmetic without
paying attention to whether the types involved were pointers or
integers.  This could lead to weird expressions like negating a pointer.

Explicitly handle simple cases involving pointers, like "x < y ? x : y".
In all other cases, coerce the operands of the comparison to integer
types.  This avoids the weird cases, while handling most of the
interesting cases.

Differential Revision: https://reviews.llvm.org/D103660
2021-06-17 14:05:12 -07:00
Bjorn Pettersson 4c7f820b2b Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seems to have been missing in that patch was to make
sure that the rest of LLVM also handles that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions (typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
Joachim Meyer 053dbb939d Use `-cfg-func-name` value as filter for `-view-cfg`, etc.
Currently the value is only used when calling `F->viewCFG()` which is missing out on its potential and usefulness.
So I added the check to the printer passes as well.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D102011
2021-06-16 23:54:51 +02:00
Eli Friedman 27963ccf07 [NFC][ScalarEvolution] Refactor createNodeForSelectOrPHI
In preparation for D103660.
2021-06-16 12:32:32 -07:00
Sanjay Patel ce95200b79 [InstSimplify] propagate poison through FP ops
We already have this fold:
  fadd float poison, 1.0 --> poison
...via ConstantFolding, so this makes the behavior consistent
if the other operand(s) are non-constant.

The fold for undef was added before poison existed as a
value/type in IR.

This came up in D102673 / D103169
because we're trying to sort out the more complicated handling
for constrained math ops.
We should have the handling for the regular instructions done
first, so we can build on that (or diverge as needed).

Differential Revision: https://reviews.llvm.org/D104383
2021-06-16 11:31:58 -04:00
Roman Lebedev a3113df219 [SCEV] PtrToInt on non-integral pointers is allowed
As per (committed without review) @reames's rGac81cb7e6dde9b0890ee1780eae94ab96743569b change,
we are now allowed to produce `ptrtoint` for non-integral pointers.
This will unblock further unbreaking of SCEV regarding int-vs-pointer type confusion.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D104322
2021-06-16 10:24:25 +03:00
Arthur Eubanks 9aa1428174 [InstSimplify] Treat invariant group insts as bitcasts for load operands
We can look through invariant group intrinsics for the purposes of
simplifying the result of a load.

Since intrinsics can't be constants, but we also don't want to
completely rewrite load constant folding, we convert the load operand to
a constant. For GEPs and bitcasts we just treat them as constants. For
invariant group intrinsics, we treat them as a bitcast.

Relanding with a check for self-referential values.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D101103
2021-06-15 12:59:43 -07:00
spupyrev 0a0800c4d1 A post-processing for BFI inference
The current implementation for computing relative block frequencies does
not handle correctly control-flow graphs containing irreducible loops. This
results in suboptimally generated binaries, whose perf can be up to 5%
worse than optimal.

To resolve the problem, we apply a post-processing step, which iteratively
updates block frequencies based on the frequencies of their predecessors.
This corresponds to finding the stationary point of the Markov chain by
an iterative method aka "PageRank computation". The algorithm takes at
most O(|E| * IterativeBFIMaxIterations) steps but typically converges faster.

It is turned on by passing option `use-iterative-bfi-inference`
and applied only for functions containing profile data and irreducible loops.

Tested on SPEC06/17, where it is helping to get correct profile counts for one of
the binaries (403.gcc). In prod binaries, we've seen a speedup of up to 2%-5%
for binaries containing functions with hot irreducible loops.

Reviewed By: hoy, wenlei, davidxl

Differential Revision: https://reviews.llvm.org/D103289
2021-06-11 21:46:04 -07:00
Andrew Litteken f6dea2e732 [IRSim] Strip out the findSimilarity call from the constructor
Both doInitialize and runOnModule were running the entire analysis
due to the actual work being done in the constructor. Strip it out here
and only get the similarity during runOnModule.

Author: lanza
Reviewers: AndrewLitteken, paquette, plofti

Differential Revision: https://reviews.llvm.org/D92524
2021-06-11 18:41:28 -05:00
Andrew Litteken 64720f57be [IRSim] Don't copy the Mapper for createCandidatesFromSuffixTree
Every invocation this was copying the Mapper for no reason. Take a const
ref instead.

Author: lanza
Reviewers: AndrewLitteken, plofti, paquette,

Differential Review: https://reviews.llvm.org/D92532
2021-06-11 16:36:23 -05:00
Simon Pilgrim 5e6bfb661e [Analysis] Pass RecurrenceDescriptor as const reference. NFCI.
We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.).

Differential Revision: https://reviews.llvm.org/D104029
2021-06-11 10:24:14 +01:00
Philip Reames 7629b2a09c [LI] Add a cover function for checking if a loop is mustprogress [nfc]
Essentially, the cover function simply combines the loop level check and the function level scope into one call.  This simplifies several callers and is (subjectively) less error prone.
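
A rough sketch of the idea, using stand-in types rather than the real LLVM
classes (`FunctionStub`, `LoopStub` and their fields are hypothetical):

```cpp
// Hypothetical stand-in for a function carrying the mustprogress attribute.
struct FunctionStub {
  bool HasMustProgressAttr;
};

// Hypothetical stand-in for a loop that may carry mustprogress metadata.
struct LoopStub {
  const FunctionStub *Parent;
  bool HasMustProgressMetadata;
};

// Cover function: a loop must make progress if either its enclosing function
// or the loop itself says so.
bool isMustProgress(const LoopStub &L) {
  return L.Parent->HasMustProgressAttr || L.HasMustProgressMetadata;
}
```

Callers then ask one question instead of remembering to check both places.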
2021-06-10 13:37:32 -07:00
Philip Reames aaaeb4b160 [SCEV] Use mustprogress flag on loops (in addition to function attribute)
This addresses a performance regression reported against 3c6e4191.  That change (correctly) limited a transform based on assumed finiteness to mustprogress loops, but the previous change (38540d7) which introduced the mustprogress check utility only handled function attributes, not the loop metadata form.

It turns out that clang uses the function attribute form for C++, and the loop metadata form for C.  As a result, 3c6e4191 ended up being a large regression in practice for C code as loops weren't being considered mustprogress despite the language semantics.
2021-06-10 13:20:28 -07:00
Philip Reames b6ee5f2b1d Move code for checking loop metadata into Analysis [nfc]
I need the mustprogress loop metadata in ScalarEvolution and it makes sense to keep all the accessors for querying loop metadata together.
2021-06-10 13:01:22 -07:00
Serge Pavlov 8ff36aab69 [ConstantFolding] Enable folding of min/max/copysign for all floats
Previously such folding was enabled for half, float and double values
only. With this change it is allowed for other floating point values
also.

Differential Revision: https://reviews.llvm.org/D103956
2021-06-10 11:57:51 +07:00
Philip Reames b65f30d6fb [SCEV] Minor code motion to simplify a later patch [nfc] 2021-06-09 14:17:06 -07:00
Arthur Eubanks 222cce3828 Revert "[InstSimplify] Treat invariant group insts as bitcasts for load operands"
This reverts commit 26044c6a54.

Breaks on invalid IR (see D101103).
2021-06-09 11:46:10 -07:00
Florian Hahn b76f1f1202 [SCEV] Keep common NUW flags when inlining Add operands.
Currently, NoWrapFlags are dropped if we inline operands of SCEVAddExpr
operands. As a consequence, we always drop flags when building
expressions like `getAddExpr(A, getAddExpr(B, C, NUW), NUW)`.

We should be able to retain NUW flags common among all inlined
SCEVAddExpr and the original flags.
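
As a toy model of the flag handling (not the SCEV code itself; the enum values,
the `AddExpr` struct and `inlineAdd` are made up for this example), the result
keeps NUW only when every add being merged carries it:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical flag bits, loosely modelled on SCEV's no-wrap flags.
enum NoWrapFlags : std::uint8_t {
  FlagAnyWrap = 0,
  FlagNUW = 1 << 0,
  FlagNSW = 1 << 1,
};

// A toy add expression: a flat list of operand ids plus its flags.
struct AddExpr {
  std::vector<int> Operands;
  std::uint8_t Flags = FlagAnyWrap;
};

// Inline Inner's operands into Outer. NUW survives only if both adds have it;
// every other flag is dropped in this sketch.
AddExpr inlineAdd(const AddExpr &Outer, const AddExpr &Inner) {
  AddExpr Result;
  Result.Operands = Outer.Operands;
  Result.Operands.insert(Result.Operands.end(), Inner.Operands.begin(),
                         Inner.Operands.end());
  Result.Flags = Outer.Flags & Inner.Flags & FlagNUW;
  return Result;
}
```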

Reviewed By: nikic, mkazantsev

Differential Revision: https://reviews.llvm.org/D103877
2021-06-09 17:13:21 +01:00
Artur Pilipenko 9197bac297 Add an option to hide "cold" blocks from CFG graph
Introduce a new cl::opt to hide "cold" blocks from CFG DOT graphs.
Use BFI to get block relative frequency. Hide the block if the
frequency is below the threshold set by the command line option value.

Reviewed By: davidxl, hoy
Differential Revision: https://reviews.llvm.org/D103640
2021-06-08 11:29:27 -07:00
Caroline Concatto 6fd1604d14 [InstCombine] Add instcombine fold for extractelement + splat for scalable vectors
This patch allows scalable vectors to use the fold that already
exists for fixed vectors, but only when the lane index is lower than the
minimum number of elements of the vector.

Differential Revision: https://reviews.llvm.org/D102404
2021-06-08 10:43:38 +01:00
Philip Reames 3c6e419198 [SCEV] Properly guard reasoning about infinite loops being UB on mustprogress
Noticed via code inspection. We changed the semantics of the IR when we added mustprogress, and we appear to have not updated this location.

Differential Revision: https://reviews.llvm.org/D103834
2021-06-07 14:47:36 -07:00
Daniil Suchkov d32cc150fe [BasicAA] Handle PHIs without incoming values gracefully
Fix a bug introduced by f6f6f6375d.
Now for empty PHIs, instead of crashing on assert(hasVal()) in
Optional's internals, we'll return NoAlias, as we did before that patch.

Differential Revision: https://reviews.llvm.org/D103831
2021-06-07 21:39:01 +00:00
Philip Reames 38540d71c7 [SCEV] Compute exit counts for unsigned IVs using mustprogress semantics
The motivation here is simple loops with unsigned induction variables w/non-one steps. A toy example would be:
for (unsigned i = 0; i < N; i += 2) { body; }

Given C/C++ semantics, we do not get the nuw flag on the induction variable. Given that lack, we currently can't compute a bound for this loop. We can do better for many cases, depending on the contents of "body".

The basic intuition behind this patch is as follows:
* A step which evenly divides the iteration space must wrap through the same numbers repeatedly. Thus, we can ignore potential corner cases where we exit after the n-th wrap through uint32_max.
* Per C++ rules, infinite loops without side effects are UB. We already have code in SCEV which relies on this.  In LLVM, this is tied to the mustprogress attribute.

Together, these let us conclude that the trip count of this loop must come before unsigned overflow unless the body would form a well defined infinite loop.
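
To make the argument concrete, here is a small 8-bit model of the toy loop. It
is only a sketch: `simulateTripCount` and `closedFormTripCount` are names
invented for this example, and the closed form is only claimed for steps that
evenly divide the 256-value iteration space.

```cpp
#include <cstdint>
#include <optional>

// Simulate `for (uint8_t i = 0; i < N; i += Step) {}` and count iterations.
// Returns std::nullopt when the induction variable comes back to 0 without
// exiting, i.e. the loop would repeat the same values forever.
std::optional<unsigned> simulateTripCount(std::uint8_t N, std::uint8_t Step) {
  unsigned Count = 0;
  std::uint8_t I = 0;
  while (I < N) {
    ++Count;
    I = static_cast<std::uint8_t>(I + Step); // wraps modulo 256
    if (I == 0)
      return std::nullopt; // wrapped back to the start: infinite loop
  }
  return Count;
}

// Trip count computed as if the induction variable never wrapped:
// ceil(N / Step), evaluated in a wider type.
unsigned closedFormTripCount(unsigned N, unsigned Step) {
  return (N + Step - 1) / Step;
}
```

With Step = 2, every N for which the simulated loop terminates agrees with the
closed form; N = 255 is the case where the loop never exits.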

A couple notes for those reading along:
* I reused the loop properties code which is overly conservative for this case. I may follow up in another patch to generalize it for the actual UB rules.
* We could cache the n(s/u)w facts. I left that out because doing a pre-patch which cached existing inference showed a lot of diffs I had trouble fully explaining. I plan to get back to this, but I don't want it on the critical path.

Differential Revision: https://reviews.llvm.org/D103118
2021-06-07 11:24:00 -07:00
Simon Pilgrim 76a1be05fa AssumeBundleQueries.cpp - don't dereference a dyn_cast<> result. NFCI.
Use cast<> instead which will assert that the cast is correct and not just return null - the match() should have already failed if the cast isn't valid anyhow.

Fixes static analysis warning.
2021-06-06 15:25:03 +01:00
Roman Lebedev e350494fb0
[NFC] Promote willNotOverflow() / getStrengthenedNoWrapFlagsFromBinOp() from IndVars into SCEV proper
We might want to use it when creating SCEV proper in createSCEV(),
now that we don't `forgetValue()` in `SimplifyIndvar::strengthenOverflowingOperation()`,
which might have caused us to lose some optimization potential.
2021-06-05 12:17:51 +03:00
Fangrui Song 06e7de795b Fix some -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=off build 2021-06-04 23:34:43 -07:00
Artur Pilipenko a06e63fa52 NFC. Refactor DOTGraphTraits::isNodeHidden
Restructure handling of cfg-hide-unreachable-paths and
cfg-hide-deoptimize-paths options so as to make it easier
to introduce new types of hidden blocks.
2021-06-03 11:27:06 -07:00
Qunyan Mangus cbde248736 Add getDemandedBits for uses.
Add getDemandedBits method for uses so we can query demanded bits for each use.  This can help get better use information. For example, for the code below
define i32 @test_use(i32 %a) {
  %1 = and i32 %a, -256
  %2 = or i32 %1, 1
  %3 = trunc i32 %2 to i8 ; not optimized to 1, for illustration purposes
  ... some use of %3
  ret i32 %2
}
if we look at the demanded bit of %2 (which is all 32 bits because of the return), we would conclude that %a is used regardless of how its return is used. However, if we look at each use separately, we will see that the demanded bit of %2 in trunc only uses the lower 8 bits of %a which is redefined, therefore %a's usage depends on how the function return is used.
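
The per-use reasoning for this example can be worked through with plain bit
masks. This is a hand-written sketch, not the DemandedBits implementation; the
helper names are invented here.

```cpp
#include <cstdint>

// `and X, Mask`: only the bits kept by Mask can reach the result.
constexpr std::uint32_t demandedThroughAnd(std::uint32_t Demanded,
                                           std::uint32_t Mask) {
  return Demanded & Mask;
}

// `or X, C`: bits forced to one by C no longer depend on X.
constexpr std::uint32_t demandedThroughOr(std::uint32_t Demanded,
                                          std::uint32_t C) {
  return Demanded & ~C;
}

// Bits of %a that matter, given which bits of %2 = or (and %a, -256), 1
// a particular use demands.
constexpr std::uint32_t demandedOfA(std::uint32_t DemandedOfOr) {
  return demandedThroughAnd(demandedThroughOr(DemandedOfOr, 1u), 0xFFFFFF00u);
}
```

The trunc use demands only the low 8 bits of %2, which gives
demandedOfA(0xFF) == 0, while the return demands all 32 bits and gives
0xFFFFFF00.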

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97074
2021-06-02 10:07:40 -04:00
Daniil Fukalov 0195e594fe [TTI] NFC: Change getIntImmCodeSizeCost to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D102915
2021-06-02 16:04:11 +03:00
Bjorn Pettersson 9c54ee4378 [SimplifyLibCalls] Take size of int into consideration when emitting ldexp/ldexpf
When rewriting
  powf(2.0, itofp(x)) -> ldexpf(1.0, x)
  exp2(sitofp(x)) -> ldexp(1.0, sext(x))
  exp2(uitofp(x)) -> ldexp(1.0, zext(x))

the wrong type was used for the second argument in the ldexp/ldexpf
libc call, for target architectures with 16 bit "int" type.
The transform incorrectly used a bitcasted function pointer with
a 32-bit argument when emitting the ldexp/ldexpf call for such
targets.

The fault is solved by using the correct function prototype
in the call, by asking TargetLibraryInfo about the size of "int".
TargetLibraryInfo by default derives the size of the int type by
assuming that it is 16 bits for 16-bit architectures, and
32 bits otherwise. If this isn't true for a target it should be
possible to override that default in the TargetLibraryInfo
initializer.

Differential Revision: https://reviews.llvm.org/D99438
2021-06-02 11:40:34 +02:00
Arthur Eubanks 8961293851 [OpaquePtr] Create API to make a copy of a PointerType with some address space
Some existing places use getPointerElementType() to create a copy of a
pointer type with some new address space.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D103429
2021-06-01 16:52:32 -07:00
Arthur Eubanks 26044c6a54 [InstSimplify] Treat invariant group insts as bitcasts for load operands
We can look through invariant group intrinsics for the purposes of
simplifying the result of a load.

Since intrinsics can't be constants, but we also don't want to
completely rewrite load constant folding, we convert the load operand to
a constant. For GEPs and bitcasts we just treat them as constants. For
invariant group intrinsics, we treat them as a bitcast.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D101103
2021-06-01 16:33:06 -07:00
Eli Friedman fd229caa01 [polly] Fix SCEVLoopAddRecRewriter to avoid invalid AddRecs.
When we're remapping an AddRec, the AddRec constructed by a partial
rewrite might not make sense.  This triggers an assertion complaining
it's not loop-invariant.

Instead of constructing the partially rewritten AddRec, just skip
straight to calling evaluateAtIteration.

Testcase was automatically reduced using llvm-reduce, so it's a little
messy, but hopefully makes sense.

Differential Revision: https://reviews.llvm.org/D102959
2021-06-01 09:51:05 -07:00
Florian Hahn aa00b1d763 [LV] Try to sink users recursively for first-order recurrences.
Update isFirstOrderRecurrence to explore all uses of a recurrence phi
and check if we can sink them. If there are multiple users to sink, they
are all mapped to the previous instruction.

Fixes PR44286 (and another PR or two).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D84951
2021-05-31 19:55:33 +01:00
Daniil Fukalov e853d3b274 [NFC] MemoryDependenceAnalysis cleanup.
1. Removed redundant includes,
2. Removed `releaseMemory()`, which was never defined or used.
3. Fixed the first-letter case of member function names.
4. Renamed duplicate (in nested struct `NonLocalPointerInfo`) name
   `NonLocalDeps` to `NonLocalDepsMap`.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D102358
2021-05-31 18:07:55 +03:00
Roman Lebedev f7c95c3322 [NFC] ScalarEvolution: apply SSO to the ExprValueMap value
ExprValueMap is a map from SCEV * to a set-vector of (Value *, ConstantInt *) pairs,
and while the map itself will likely be big-ish (have many keys),
it is a reasonable assumption that each key will refer to a small-ish
number of pairs.

In particular looking at n=512 case from
https://bugs.llvm.org/show_bug.cgi?id=50384,
the small-size of 4 appears to be the sweet spot,
it results in the least allocations while minimizing memory footprint.
```
$ for i in $(ls heaptrack.opt.*.gz); do echo $i; heaptrack_print $i | tail -n 6; echo ""; done
heaptrack.opt.0-orig.gz
total runtime: 14.32s.
calls to allocation functions: 8222442 (574192/s)
temporary memory allocations: 2419000 (168924/s)
peak heap memory consumption: 190.98MB
peak RSS (including heaptrack overhead): 239.65MB
total memory leaked: 67.58KB

heaptrack.opt.1-n1.gz
total runtime: 13.72s.
calls to allocation functions: 7184188 (523705/s)
temporary memory allocations: 2419017 (176338/s)
peak heap memory consumption: 191.38MB
peak RSS (including heaptrack overhead): 239.64MB
total memory leaked: 67.58KB

heaptrack.opt.2-n2.gz
total runtime: 12.24s.
calls to allocation functions: 6146827 (502355/s)
temporary memory allocations: 2418997 (197695/s)
peak heap memory consumption: 163.31MB
peak RSS (including heaptrack overhead): 211.01MB
total memory leaked: 67.58KB

heaptrack.opt.3-n4.gz
total runtime: 12.28s.
calls to allocation functions: 6068532 (494260/s)
temporary memory allocations: 2418985 (197017/s)
peak heap memory consumption: 155.43MB
peak RSS (including heaptrack overhead): 201.77MB
total memory leaked: 67.58KB

heaptrack.opt.4-n8.gz
total runtime: 12.06s.
calls to allocation functions: 6068042 (503321/s)
temporary memory allocations: 2418992 (200646/s)
peak heap memory consumption: 166.03MB
peak RSS (including heaptrack overhead): 213.55MB
total memory leaked: 67.58KB

heaptrack.opt.5-n16.gz
total runtime: 12.14s.
calls to allocation functions: 6067993 (499958/s)
temporary memory allocations: 2418999 (199307/s)
peak heap memory consumption: 187.24MB
peak RSS (including heaptrack overhead): 233.69MB
total memory leaked: 67.58KB
```

While that test may be an edge worst-case scenario,
https://llvm-compile-time-tracker.com/compare.php?from=dee85d47d9f15fc268f7b18f279dac2774836615&to=98a57e31b1947d5bcdf4a5605ac2ab32b4bd5f63&stat=instructions
agrees that this also results in improvements in the usual situations.
2021-05-31 15:34:03 +03:00
Sanjay Patel 7bb8bfa062 [InstCombine] fix miscompile from vector select substitution
This is similar to the fix in c590a9880d ( PR49832 ), but
we missed handling the pattern for select of bools (no compare
inst).

We can't substitute a vector value because the equality condition
replacement that we are attempting requires that the condition
is true/false for the entire value. Vector select can be partly
true/false.

I added an assert for vector types, so we shouldn't hit this again.
Fixed formatting while auditing the callers.

https://llvm.org/PR50500
2021-05-30 07:11:58 -04:00
Mindong Chen 71acce68da [NFCI] Move DEBUG_TYPE definition below #includes
When a header file defines a new DEBUG_TYPE, a DEBUG_TYPE definition
placed among the #includes of a file that includes that header can
result in redefinition warnings or even compile errors.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D102594
2021-05-30 17:31:01 +08:00
Florian Hahn ec1f6f7e3f Revert "[LAA] Support pointer phis in loop by analyzing each incoming pointer."
This reverts commit 1ed7f8ede5.

This change can cause loop-distribute to crash in some cases. Revert
until I have more time to wrap up a fix.

See  PR50296, PR5028 and D102266.
2021-05-28 10:33:52 +01:00
Yang Fan f2264ebb08 [ConstantFolding] Fix -Wunused-variable warning (NFC)
GCC warning:
```
/llvm-project/llvm/lib/Analysis/ConstantFolding.cpp: In function ‘llvm::Constant* llvm::ConstantFoldLoadFromConstPtr(llvm::Constant*, llvm::Type*, const llvm::DataLayout&)’:
/llvm-project/llvm/lib/Analysis/ConstantFolding.cpp:713:19: warning: unused variable ‘SimplifiedGEP’ [-Wunused-variable]
  713 |         if (auto *SimplifiedGEP = dyn_cast<GEPOperator>(Simplified)) {
      |                   ^~~~~~~~~~~~~
```
2021-05-28 16:17:12 +08:00
Arthur Eubanks 8086f9d87e [ConstFold] Simplify a load's GEP operand through local aliases
MSVC-style RTTI produces loads through a GEP of a local alias which
itself is a GEP. Currently we aren't able to devirtualize any virtual
calls when MSVC RTTI is enabled.

This patch attempts to simplify a load's GEP operand by calling
SymbolicallyEvaluateGEP() with an option to look through local aliases.

Differential Revision: https://reviews.llvm.org/D101100
2021-05-27 16:04:19 -07:00
Philip Reames ff08c3468f [SCEV] Compute trip multiple for multiple exit loops
This patch implements getSmallConstantTripMultiple(L) correctly for multiple exit loops. The previous implementation was both imprecise, and violated the specified behavior of the method. This was fine in practice, because it turns out the function was both dead in real code, and not tested for the multiple exit case.

Differential Revision: https://reviews.llvm.org/D103189
2021-05-26 11:52:25 -07:00
Philip Reames 9306bb638f [SCEV] Generalize getSmallConstantTripCount(L) for multiple exit loops
This came up in review for another patch, see https://reviews.llvm.org/D102982#2782407 for full context.

I've reviewed the callers to make sure they can handle multiple exit loops w/non-zero returns.  There are two cases in target cost models where results might change (Hexagon and PowerPC), but the results looked legal and reasonable.  If a target maintainer wishes to back out the effect of the costing change, they should explicitly check for multiple exit loops and handle them as desired.

Differential Revision: https://reviews.llvm.org/D103182
2021-05-26 11:18:25 -07:00
Philip Reames 921d3f7af0 [SCEV] Add a utility for converting from "exit count" to "trip count"
(Mostly as a logical place to put a comment since this is a recurring confusion.)
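
A sketch of the distinction, assuming the usual SCEV convention that the exit
count is the number of times the backedge is taken and the trip count is one
more than that. The function below is illustrative only; in particular, the way
it reports overflow is an assumption of this example, not a description of the
utility added here.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Convert a backedge-taken ("exit") count into a trip count.
// Returns std::nullopt when adding one does not fit in 32 bits.
std::optional<std::uint32_t> tripCountFromExitCount(std::uint32_t ExitCount) {
  if (ExitCount == std::numeric_limits<std::uint32_t>::max())
    return std::nullopt;
  return ExitCount + 1;
}
```

A loop whose backedge is never taken still runs its body once, so an exit count
of 0 corresponds to a trip count of 1.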
2021-05-26 10:41:49 -07:00
Philip Reames fb14577d0c [SCEV] Extract out a helper for computing trip multiples 2021-05-26 10:15:03 -07:00
Philip Reames 9cc2181ec3 [unroll] Use value domain for symbolic execution based cost model
The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently simplifies to constants, but we can generalize to arbitrary Values using the InstructionSimplify infrastructure at very low cost.

By itself, this enables some simplifications, but it's mainly useful when combined with the branch simplification over in D102928.

Differential Revision: https://reviews.llvm.org/D102934
2021-05-26 08:41:25 -07:00
Vitaly Buka f44f2e0afc [NFC] Fix 'unused' warning 2021-05-25 12:23:57 -07:00
Nikita Popov 6300c37a46 [SCEV] Cache operands used in BEInfo (NFC)
When memoized values for a SCEV expressions are dropped, we also
drop all BECounts that make use of the SCEV expression. This is done
by iterating over all the ExitNotTaken counts and (recursively)
checking whether they use the SCEV expression. If there are many
exits, this will take a lot of time.

This patch improves the situation by pre-computing a set of all
used operands, so that we can determine whether a certain BEInfo
needs to be invalidated using a simple set lookup. Will still need
to loop over all BEInfos though.
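
The caching idea, reduced to a toy model (the `Expr` and `BackedgeInfo` types
below are hypothetical and much simpler than the real SCEV data structures):
collect every operand reachable from the exit counts once, then answer
"does this info use expression E?" with a set lookup instead of a recursive
walk.

```cpp
#include <unordered_set>
#include <vector>

// Toy expression node: an id plus its operand expressions.
struct Expr {
  int Id;
  std::vector<const Expr *> Operands;
};

// Record the ids of E and of everything reachable from it.
void collectOperands(const Expr *E, std::unordered_set<int> &Seen) {
  if (!Seen.insert(E->Id).second)
    return; // already visited
  for (const Expr *Op : E->Operands)
    collectOperands(Op, Seen);
}

// Toy backedge info that precomputes its used operands at construction.
struct BackedgeInfo {
  std::unordered_set<int> UsedOperands;

  explicit BackedgeInfo(const std::vector<const Expr *> &ExitCounts) {
    for (const Expr *E : ExitCounts)
      collectOperands(E, UsedOperands);
  }

  // Constant-time (on average) replacement for the recursive search.
  bool hasOperand(const Expr &E) const {
    return UsedOperands.count(E.Id) != 0;
  }
};
```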

This makes for a mild improvement on non-degenerate cases:
https://llvm-compile-time-tracker.com/compare.php?from=b661a55a253f4a1cf5a0fbcb86e5ba7b9fb1387b&to=be1393f450e594c53f0ad7e62339a6bc831b16f6&stat=instructions

For the degenerate case from https://bugs.llvm.org/show_bug.cgi?id=50384,
for n=128 I'm seeing run time drop from 1.6s to 1.1s.

Differential Revision: https://reviews.llvm.org/D102796
2021-05-25 21:03:33 +02:00
Sanjay Patel ca7eaa0a54 [InstSimplify] allow undef element match in vector select condition value
The semantics of select with undefined/poison condition
are not explicitly stated in the LangRef, but this matches
comments in the code and Alive2 appears to concur:
https://alive2.llvm.org/ce/z/KXytmd

We can find this pattern after demanded elements transforms.

As noted in D101191, fuzzers are finding infinite loops because
we may not account for this pattern in other passes.
2021-05-25 14:25:34 -04:00
Philip Reames aabca2d1da [SCEV] Cleanup doesIVOverflowOnX checks [NFC]
Stylistic changes only.
1) Don't pass a parameter just to do an early exit.
2) Use a name which matches actual behavior.
2021-05-25 10:12:24 -07:00
Philip Reames a47b2d4567 [SCEV] Remove unused parameter from computeBECount [NFC]
All callers pass "false" for the Equality parameter.  Kill the dead code, and update the function block comment.
2021-05-25 09:58:56 -07:00
David Goldblatt 8607a02357 [InstSimplify] Transform X * Y % Y --> 0
simplifyDiv already handles the case X * Y / Y --> X (barring overflow).
This adds the equivalent handling to simplifyRem.
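
A quick way to see why overflow matters for the remainder form as well, using
plain 8-bit unsigned arithmetic (the helper names are made up for this
example and are not part of the patch):

```cpp
#include <cstdint>

// (X * Y) % Y where the product is truncated to 8 bits, as an i8 multiply
// would be. Requires Y != 0.
constexpr std::uint8_t mulThenRem(std::uint8_t X, std::uint8_t Y) {
  return static_cast<std::uint8_t>(static_cast<std::uint8_t>(X * Y) % Y);
}

// True when the product fits in 8 bits, i.e. the multiply does not wrap.
constexpr bool mulFits(std::uint8_t X, std::uint8_t Y) {
  return static_cast<unsigned>(X) * Y <= 0xFFu;
}
```

When the product fits, the remainder is always 0 (for example X = 7, Y = 3).
When it wraps, it need not be: 100 * 3 = 300 truncates to 44, and 44 % 3 is 2.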

Correctness:
https://alive2.llvm.org/ce/z/J2cUbS
https://alive2.llvm.org/ce/z/us9NUM
https://alive2.llvm.org/ce/z/AvaDGJ
https://alive2.llvm.org/ce/z/kq9ige

Extending the situations in which we apply this transform would not be
correct:
https://alive2.llvm.org/ce/z/Lf9V63
https://alive2.llvm.org/ce/z/6RPQK3
https://alive2.llvm.org/ce/z/p9UdxC
https://alive2.llvm.org/ce/z/A2zlhE
https://alive2.llvm.org/ce/z/vHTtLw
https://alive2.llvm.org/ce/z/lvpH42

Differential Revision: https://reviews.llvm.org/D102864
2021-05-25 10:16:04 -04:00
Sanjay Patel a0e71f1832 [ConstProp] propagate poison from vector reduction element(s) to result
This follows from the underlying logic for binops and min/max.
Although it does not appear that we handle this for min/max
intrinsics currently.
https://alive2.llvm.org/ce/z/Kq9Xnh
2021-05-24 10:34:40 -04:00
Martin Storsjö c5638a71d8 [MinGW] Mark a number of library functions unavailable for mingw targets
These functions were marked unavailable for MSVC targets before,
within an "T.isOSWindows() && !T.isOSCygMing()" block, but these ones
are unavailable on MinGW targets too.

This avoids generating calls to stpcpy for MinGW targets, which has
been happening since 6dbf0cfcf7 (in
some cases).

This fixes https://github.com/mstorsjo/llvm-mingw/issues/201.

Differential Revision: https://reviews.llvm.org/D102946
2021-05-22 23:40:19 +03:00
Serge Pavlov c9c05a91c4 [ConstantFolding] Use APFloat for constant folding. NFC
Replace use of host floating types with operations on APFloat when it is
possible. Use of APFloat makes analysis more convenient and facilitates
constant folding in the case of non-default FP environment.

Differential Revision: https://reviews.llvm.org/D102672
2021-05-22 13:00:20 +07:00
Arthur Eubanks f7788e1bff Revert "[NewPM] Only invalidate modified functions' analyses in CGSCC passes"
This reverts commit d14d84af2f.

Causes unacceptable memory regressions.
2021-05-21 16:38:03 -07:00
Arthur Eubanks a52530dd6a Revert "[NPM] Do not run function simplification pipeline unnecessarily"
This reverts commit 97ab068034.

Depends on D100917, which is to be reverted.
2021-05-21 16:38:02 -07:00
Philip Reames cc5f6ae4b4 Move a definition into cpp from header in advance of other changes [nfc] 2021-05-21 09:18:04 -07:00
Daniil Fukalov e1cb98be2d [TTI] NFC: Change getCostOfKeepingLiveOverCall to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D102831
2021-05-21 15:18:12 +03:00
Daniil Fukalov e8e88c3353 [TTI] NFC: Change getRegUsageForType to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D102541
2021-05-21 15:17:23 +03:00
Joe Ellis 5a476987f7 [InstSimplify] Properly constrain {insert,extract}_subvector intrinsic fold
The previous rule:

   (insert_vector _, (extract_vector X, 0), 0) -> X

is not quite correct. The correct fold should be:

   (insert_vector Y, (extract_vector X, 0), 0) -> X
   where: Y is X, or Y is undef

This commit updates the pattern.
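
The need for the constraint on Y can be seen with ordinary vectors. This is a
sketch with invented helpers (`extractVector`, `insertVector`) that only mimic
the intrinsics on fixed-length data; undef cannot be modelled this way, but
since an undef lane may take any value, it can be chosen to match X.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Take Len elements of X starting at index Idx.
Vec extractVector(const Vec &X, std::size_t Idx, std::size_t Len) {
  return Vec(X.begin() + Idx, X.begin() + Idx + Len);
}

// Return a copy of Y with the elements starting at Idx overwritten by Sub.
Vec insertVector(Vec Y, const Vec &Sub, std::size_t Idx) {
  for (std::size_t I = 0; I < Sub.size(); ++I)
    Y[Idx + I] = Sub[I];
  return Y;
}
```

Inserting the low half of X back into X reproduces X, whereas inserting it into
an unrelated Y keeps Y's upper elements, so the result is not X.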

Reviewed By: peterwaller-arm, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D102699
2021-05-21 10:05:03 +00:00
Serge Pavlov c162f086ba [APFloat] convertToDouble/Float can work on shorter types
Previously APFloat::convertToDouble could be called only for APFloats that
were built using double semantics. Other semantics like single precision
were not allowed although corresponding numbers could be converted to
double without loss of precision. The similar restriction applied to
APFloat::convertToFloat.

With this change any APFloat that can be precisely represented by double
can be handled with convertToDouble. Behavior of convertToFloat was
updated similarly. This makes the conversion operations more convenient and
adds support for formats like half and bfloat.

Differential Revision: https://reviews.llvm.org/D102671
2021-05-21 11:02:51 +07:00
Nikita Popov b661a55a25 [ScalarEvolution] Remove unused ExitLimit::hasOperand() method (NFC)
We only use BackedgeTakenInfo::hasOperand().
2021-05-19 18:42:14 +02:00
Arthur Eubanks 6b9524a05b [NewPM] Don't mark AA analyses as preserved
Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserved, they won't be invalidated unless their analyses are.

SCEVAAResults was the one exception to this, it was treated like a
typical analysis result. Make it like the others and don't invalidate
unless SCEV is invalidated.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D102032
2021-05-18 13:49:03 -07:00
Arthur Eubanks cc64ece77d [NFC][OpaquePtr] Avoid using PointerType::getElementType() in VectorUtils.cpp
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D102533
2021-05-17 18:35:44 -07:00
Nikita Popov 7243120198 [CaptureTracking] Simplify reachability check (NFCI)
This code was re-implementing the same-BB case of
isPotentiallyReachable(). Historically, this was done because
CaptureTracking used additional caching for local dominance
queries. Now that it is no longer needed, the code is effectively
the same as isPotentiallyReachable().

The only differences are extra checks for invokes/phis. These are
misleading checks related to dominance in the value availability
sense that are not relevant for control reachability. The invoke
check was correct but redundant in that invokes are always
terminators, so `I` could never come before the invoke. The phi
check is a matter of interpretation (should an earlier phi node be
considered reachable from a later phi node in the same block?)
but ultimately doesn't matter because phis don't capture anyway.
2021-05-16 16:04:10 +02:00
Nikita Popov 656296b1c2 Reapply [CaptureTracking] Do not check domination
Reapply after adjusting the synchronized.m test case, where the
TODO is now resolved. The pointer is only captured on the exception
handling path.

-----

For the CapturesBefore tracker, it is sufficient to check that
I can not reach BeforeHere. This does not necessarily require
that BeforeHere dominates I, it can also occur if the capture
happens on an entirely disjoint path.

This change was previously accepted in D90688, but had to be
reverted due to large compile-time impact in some cases: It
increases the number of reachability queries that are performed.

After recent changes, the compile-time impact is largely mitigated,
so I'm reapplying this patch. The remaining compile-time impact
is largely proportional to changes in code-size.
2021-05-16 15:46:31 +02:00
Nikita Popov 541c2845de Revert "[CaptureTracking] Do not check domination"
This reverts commit 6b8b43e7af.

This causes clang test to fail (CodeGenObjC/synchronized.m).
Revert until I can figure out whether that's an expected change.
2021-05-16 11:04:45 +02:00
Nikita Popov 6b8b43e7af [CaptureTracking] Do not check domination
For the CapturesBefore tracker, it is sufficient to check that
I can not reach BeforeHere. This does not necessarily require
that BeforeHere dominates I, it can also occur if the capture
happens on an entirely disjoint path.

This change was previously accepted in D90688, but had to be
reverted due to large compile-time impact in some cases: It
increases the number of reachability queries that are performed.

After recent changes, the compile-time impact is largely mitigated,
so I'm reapplying this patch. The remaining compile-time impact
is largely proportional to changes in code-size.
2021-05-16 10:49:36 +02:00
Nikita Popov 6e9363c942 [CaptureTracking] Only check reachability for capture candidates
Reachability queries are very expensive, and currently performed
for each instruction we look at, even though most of them will
not lead to a capture and are thus ultimately irrelevant. It is
more efficient to walk a few unnecessary instructions than to
perform unnecessary reachability queries.

Theoretically, this may produce worse results, because the additional
instructions considered may cause us to hit the use count limit
earlier. In practice, this does not appear to be a problem, e.g.
on test-suite O3 we report only one more captured-before with this
change, with no resulting codegen differences.

This makes PointerMayBeCapturedBefore() significantly cheaper in
practice, hopefully allowing it to be used in more places.
2021-05-15 22:57:56 +02:00
Nikita Popov f9e9b0cdb4 [CFG] Move reachable from entry checks into basic block variant
These checks are not specific to the instruction based variant of
isPotentiallyReachable(), they are equally valid for the basic
block based variant. Move them there, to make sure that switching
between the instruction and basic block variants cannot introduce
regressions.
2021-05-15 15:42:02 +02:00
Nikita Popov fb9ed1979a [IR] Add BasicBlock::isEntryBlock() (NFC)
This is a recurring and somewhat awkward pattern. Add a helper
method for it.
2021-05-15 12:41:58 +02:00
Nikita Popov 6418bab6f8 [CFG] Use comesBefore() (NFC)
Use comesBefore() instead of performing an instruction walk. In
line with the previous implementation, instructions are considered
to reach themselves.
2021-05-15 12:14:30 +02:00
Nikita Popov f765e54db2 [CaptureTracking] Clean up same instruction check (NFC)
Check the BeforeHere == I case once in shouldExplore, instead of
handling it in four different places.
2021-05-15 11:58:55 +02:00
Nick Desaulniers 8c72749bd9 [LowerConstantIntrinsics] reuse isManifestLogic from ConstantFolding
GlobalVariables are Constants, yet should not unconditionally be
considered true for __builtin_constant_p.

Via the LangRef
https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic:

    This intrinsic generates no code. If its argument is known to be a
    manifest compile-time constant value, then the intrinsic will be
    converted to a constant true value. Otherwise, it will be converted
    to a constant false value.

    In particular, note that if the argument is a constant expression
    which refers to a global (the address of which _is_ a constant, but
    not manifest during the compile), then the intrinsic evaluates to
    false.

Move isManifestConstant from ConstantFolding to be a method of
Constant so that we can reuse the same logic in
LowerConstantIntrinsics.

pr/41459

Reviewed By: rsmith, george.burgess.iv

Differential Revision: https://reviews.llvm.org/D102367
2021-05-14 15:35:21 -07:00
Nikita Popov c4fb2a1fc2 [MemDep] Use BatchAA in more places (NFCI)
Previously, we already used BatchAA for individual simple pointer
dependency queries. This extends BatchAA usage for the non-local
case, so that only one BatchAA instance is used for all blocks,
instead of one instance per block.

Use of BatchAA is safe as IR cannot be modified during a MemDep
query.
2021-05-14 22:54:40 +02:00
Nikita Popov 5e289cc597 [AA] Support callCapturesBefore() on BatchAA (NFCI)
This is not expected to have any practical compile-time effect,
as the alias() calls inside callCapturesBefore() are rare. This
should still be supported for API completeness, and might be
useful for reachability caching.
2021-05-14 21:48:08 +02:00
Philip Reames 23c93c2555 Discount invariant instructions in full unrolling
This patch updates the cost model for full unrolling to discount the cost of a loop invariant expression on all but one iteration. The reasoning here is that such an expression (as determined by SCEV) will be CSEd or DSEd once the loop is unrolled. Note that SCEV's reasoning will find things which could be invariant, not simply those outside the loop.

Differential Revision: https://reviews.llvm.org/D102506
2021-05-14 11:07:19 -07:00
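The discount described in the full-unrolling commit above can be illustrated with a toy calculation. This is only a sketch in Python, not LLVM's cost model: the function name `unrolled_cost` and the per-iteration costs are hypothetical.

```python
def unrolled_cost(trip_count, variant_cost, invariant_cost, discount_invariant=True):
    """Estimate the cost of a fully unrolled loop body (toy model)."""
    if discount_invariant:
        # Loop-invariant work is expected to be CSE'd or DSE'd after
        # unrolling, so it is charged for a single iteration only.
        return trip_count * variant_cost + invariant_cost
    # Without the discount, every iteration also pays for the invariant work.
    return trip_count * (variant_cost + invariant_cost)


print(unrolled_cost(8, 3, 2, discount_invariant=False))  # 40
print(unrolled_cost(8, 3, 2))                            # 26
```

With the discount, a loop that looked too expensive to unroll may fall under the threshold.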
dfukalov fdae3fc8b3 [GVN] Clobber partially aliased loads.
Use offsets stored in `AliasResult` implemented in D98718.

Updated with fix of issue reported in https://reviews.llvm.org/D95543#2745161

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95543
2021-05-14 11:17:14 +03:00
Nikita Popov 425781bce0 [CaptureTracking] Use isIdentifiedFunctionLocal() (NFC)
These conditions together exactly match isIdentifiedFunctionLocal(),
and this is also what we logically want to check for here.
2021-05-13 23:06:42 +02:00
Nikita Popov dce158c58d [AA] Use isIdentifiedFunctionLocal() (NFC)
This condition is equivalent to isIdentifiedFunctionLocal(),
and this is also what we semantically want to check here.
2021-05-13 23:06:42 +02:00
Joe Ellis 2ed7db0d20 [InstSimplify] Remove redundant {insert,extract}_vector intrinsic chains
This commit removes some redundant {insert,extract}_vector intrinsic
chains by implementing the following patterns as instsimplifies:

   (insert_vector _, (extract_vector X, 0), 0) -> X
   (extract_vector (insert_vector _, X, 0), 0) -> X

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D101986
2021-05-13 16:09:50 +00:00
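The two patterns in the commit above can be checked on a small model that treats vectors as Python lists. This is a sketch only: `extract_vector` and `insert_vector` are hypothetical helpers, not the LLVM intrinsics, and the model assumes that X already has the type of the result (which the patterns appear to require).

```python
def extract_vector(vec, idx, length):
    """Return the `length`-element subvector of `vec` starting at `idx`."""
    return vec[idx:idx + length]


def insert_vector(vec, sub, idx):
    """Return a copy of `vec` with `sub` written at position `idx`."""
    return vec[:idx] + sub + vec[idx + len(sub):]


x = [1, 2, 3, 4]
other = [9, 9, 9, 9]

# (insert_vector _, (extract_vector X, 0), 0) -> X
print(insert_vector(other, extract_vector(x, 0, len(x)), 0) == x)
# (extract_vector (insert_vector _, X, 0), 0) -> X
print(extract_vector(insert_vector(other, x, 0), 0, len(x)) == x)
```

In both cases the base vector (`_` in the patterns) is fully overwritten or never read, which is why it does not matter.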
Florian Hahn e2759f110b [SCEV] Apply guards to max with non-unitary steps.
We already apply loop-guards when computing the maximum with unitary
steps. This extends the code to also do so when dealing with non-unitary
steps.

This allows us to infer a tighter maximum in some cases.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D102267
2021-05-13 09:47:29 +01:00
Jordan Rupprecht fec2945998 Revert "[GVN] Clobber partially aliased loads."
This reverts commit 6c57044231.

It causes assertion errors due to widening atomic loads, and potentially causes miscompiles elsewhere too. Repro, also posted to D95543:

```
$ cat repro.ll
; ModuleID = 'repro.ll'
source_filename = "repro.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.widget = type { i32 }
%struct.baz = type { i32, %struct.snork }
%struct.snork = type { %struct.spam }
%struct.spam = type { i32, i32 }

@global = external local_unnamed_addr global %struct.widget, align 4
@global.1 = external local_unnamed_addr global i8, align 1
@global.2 = external local_unnamed_addr global i32, align 4

define void @zot(%struct.baz* %arg) local_unnamed_addr align 2 {
bb:
  %tmp = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1
  %tmp1 = bitcast %struct.snork* %tmp to i64*
  %tmp2 = load i64, i64* %tmp1, align 4
  %tmp3 = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1, i32 0, i32 1
  %tmp4 = icmp ugt i64 %tmp2, 4294967295
  br label %bb5

bb5:                                              ; preds = %bb14, %bb
  %tmp6 = load i32, i32* %tmp3, align 4
  %tmp7 = icmp ne i32 %tmp6, 0
  %tmp8 = select i1 %tmp7, i1 %tmp4, i1 false
  %tmp9 = zext i1 %tmp8 to i8
  store i8 %tmp9, i8* @global.1, align 1
  %tmp10 = load i32, i32* @global.2, align 4
  switch i32 %tmp10, label %bb11 [
    i32 1, label %bb12
    i32 2, label %bb12
  ]

bb11:                                             ; preds = %bb5
  br label %bb14

bb12:                                             ; preds = %bb5, %bb5
  %tmp13 = load atomic i32, i32* getelementptr inbounds (%struct.widget, %struct.widget* @global, i64 0, i32 0) acquire, align 4
  br label %bb14

bb14:                                             ; preds = %bb12, %bb11
  br label %bb5
}
$ opt -O2 repro.ll -disable-output
opt: /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Utils/VNCoercion.cpp:496: llvm::Value *llvm::VNCoercion::getLoadValueForLoad(llvm::LoadInst *, unsigned int, llvm::Type *, llvm::Instruction *, const llvm::DataLayout &): Assertion `SrcVal->isSimple() && "Cannot widen volatile/atomic load!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/rupprecht/dev/opt -O2 repro.ll -disable-output
...
```
2021-05-11 16:08:53 -07:00
Stanislav Mekhanoshin 22d295f695 [AMDGPU] Constant fold Intrinsic::amdgcn_perm
Differential Revision: https://reviews.llvm.org/D102203
2021-05-10 16:23:11 -07:00
Florian Hahn 93a9a8a8d9 [VecLib] Add support for vector fns from Darwin's libsystem.
This patch adds support for Darwin's libsystem math vector functions to
TLI. Darwin's libsystem provides a range of vector functions for libm
functions.

This initial patch only adds the 2 x double and 4 x float versions,
which are available on both X86 and ARM64. On X86, wider vector versions
are supported as well.

Reviewed By: jroelofs

Differential Revision: https://reviews.llvm.org/D101856
2021-05-10 21:19:58 +01:00
Andy Kaylor 7086025d65 [Dependence Analysis] Enable delinearization of fixed sized arrays
Patch by Artem Radzikhovskyy!

Allow delinearization of fixed sized arrays if we can prove that the GEP indices do not overflow the array dimensions. The checks applied are similar to the ones that are used for delinearization of parametric size arrays. Make sure that the GEP indices are non-negative and that they are smaller than the range of that dimension.

Changes Summary:

- Updated the LIT tests with more exact values, as we are able to delinearize and apply more exact tests
- profitability.ll - now able to delinearize in all cases, no need to use -da-disable-delinearization-checks flag and run the test twice
- loop-interchange-optimization-remarks.ll - in one of the cases we are able to delinearize without using -da-disable-delinearization-checks
- SimpleSIVNoValidityCheckFixedSize.ll - removed unnecessary "-da-disable-delinearization-checks" flag. Now can get the exact answer without it.
- SimpleSIVNoValidityCheckFixedSize.ll and PreliminaryNoValidityCheckFixedSize.ll - made negative tests more explicit, in order to demonstrate the need for "-da-disable-delinearization-checks" flag

Differential Revision: https://reviews.llvm.org/D101486
2021-05-10 10:30:15 -07:00
Nikita Popov d26ca78c18 [SCEV] Handle and/or in applyLoopGuards()
applyLoopGuards() already combines conditions from multiple nested
guards. However, it cannot use multiple conditions on the same guard,
combined using and/or. Add support for this by recursing into either
`and` or `or`, depending on the direction of the branch.

Differential Revision: https://reviews.llvm.org/D101692
2021-05-09 21:34:28 +02:00
Arthur Eubanks 34a8a437bf [NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose
Printing pass manager invocations is fairly verbose and not super
useful.

This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.

This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D101797
2021-05-07 21:51:47 -07:00
Florian Hahn 6c99e63120 [SCEV] Be more careful when traversing phis in isImpliedViaMerge.
I think currently isImpliedViaMerge can incorrectly return true for phis
in a loop/cycle, if the found condition involves the previous value of

Consider the case in exit_cond_depends_on_inner_loop.

At some point, we call (modulo simplifications)
isImpliedViaMerge(<=, %x.lcssa, -1, %call, -1).

The existing code tries to prove IncV <= -1 for all incoming values
IncV using the found condition (%call <= -1). At the moment this succeeds,
but only because it does not compare the same runtime value. The found
condition checks the value of the last iteration, but the incoming value
is from the *previous* iteration.

Hence we incorrectly determine that the *previous* value was <= -1,
which may not be true.

I think we need to be more careful when looking at the incoming values
here. In particular, we need to rule out that a found condition refers to
any value that may refer to one of the previous iterations. I'm not sure
there's a reliable way to do so (that also works for irreducible control
flow).

So for now this patch adds an additional requirement that the incoming
value must properly dominate the phi block. This should ensure the
values do not change in a cycle. I am not entirely sure if it will catch
all cases and I would appreciate a thorough second look in that regard.

Alternatively we could also unconditionally bail out in this case,
instead of checking the incoming values.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101829
2021-05-07 19:52:29 +01:00
Krzysztof Parzyszek 50cf0a1d1a Allow empty value list in propagateMetadata(Inst, ArrayOf...)
This will allow writing
  propagateMetadata(Inst, collectInterestingValues(...))
without concern about empty lists. In case of an empty list,
Inst is returned without any changes.
2021-05-07 13:20:50 -05:00
Fangrui Song d8aba75a76 Internalize some cl::opt global variables or move them under namespace llvm 2021-05-07 11:15:43 -07:00
Whitney Tsang 1006ac3963 [LoopNest] Consider loop nest with inner loop guard using outer loop induction variable to be perfect

This patch allows more conditional branches to be considered as loop
guards, and so more loop nests can be considered perfect.

Reviewed By: bmahjour, sidbav

Differential Revision: https://reviews.llvm.org/D94717
2021-05-07 16:04:18 +00:00
Joseph Tremoulet bc302bfbef BasicAA: Recognize inttoptr as isEscapeSource
Pointers escape when converted to integers, so a pointer produced by
converting an integer to a pointer must not be a local non-escaping
object.

Reviewed By: nikic, nlopes, aqjune

Differential Revision: https://reviews.llvm.org/D101541
2021-05-07 07:48:50 -07:00
Peilin Guo 911a541620 [LazyValueInfo] Insert an Overdefined placeholder to prevent infinite recursion
getValueFromCondition() uses a Visited set to record intermediate values.
However, it works in postorder: it computes the value first and updates the
Visited set afterwards. It therefore recurses infinitely if there exists IR
in which a use is not dominated by its def, as in this example:

  %tmp3 = or i1 undef, %tmp4
  %tmp4 = or i1 undef, %tmp3

To prevent this, we can insert an Overdefined placeholder into the set
before computing the actual value.

Reviewed by: nikic

Differential Revision: https://reviews.llvm.org/D101273
2021-05-07 16:05:50 +08:00
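The placeholder technique from the LazyValueInfo commit above can be sketched on a toy operand graph. This is an illustration in Python, not the LLVM code: `OPERANDS`, `value_from_condition` and the merge rule are hypothetical stand-ins.

```python
OVERDEFINED = "overdefined"

# Hypothetical operand graph mirroring the example: each value is an `or`
# of undef and the other value, so the two definitions refer to each other.
OPERANDS = {"tmp3": ["undef", "tmp4"], "tmp4": ["undef", "tmp3"]}


def value_from_condition(name, visited):
    """Toy stand-in for a recursive, memoized value computation."""
    if name in visited:
        return visited[name]
    if name not in OPERANDS:
        return OVERDEFINED  # leaves such as undef carry no information here
    # Insert the placeholder *before* recursing, so that a cyclic use finds
    # an entry in `visited` instead of recursing forever.
    visited[name] = OVERDEFINED
    results = [value_from_condition(op, visited) for op in OPERANDS[name]]
    # Toy merge rule: any overdefined operand makes the result overdefined.
    result = OVERDEFINED if OVERDEFINED in results else results[0]
    visited[name] = result
    return result


print(value_from_condition("tmp3", {}))
```

Removing the line that stores the placeholder makes the call recurse between `tmp3` and `tmp4` until Python raises `RecursionError`, which mirrors the problem the commit describes.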
Mircea Trofin 97ab068034 [NPM] Do not run function simplification pipeline unnecessarily
The CGSCC pass manager interplay with the FunctionAnalysisManagerCGSCCProxy is 'special' in the sense that the former will rerun the latter if there are changes to a SCC structure; that being said, some of the functions in the SCC may be unchanged. In that case, the function simplification pipeline will be re-run, which impacts compile time[1].

This patch allows the function simplification pipeline to be skipped if it was already run and the function was not modified since.

The behavior is currently disabled by default. This is because, currently, the rerunning of the function simplification pipeline on an unchanged function may still result in changes. The patch simplifies investigating and fixing those cases where repeated function pass runs do actually positively impact code quality, while offering an easy workaround for those impacted negatively by compile time regressions, and not impacting mainline scenarios.

[1] A [[ http://llvm-compile-time-tracker.com/compare.php?from=eb37d3546cd0c6e67798496634c45e501f7806f1&to=ac722d1190dc7bbdd17e977ef7ec95e69eefc91e&stat=instructions | compile time tracker ]] run with the option enabled.

Differential Revision: https://reviews.llvm.org/D98103
2021-05-06 12:24:33 -07:00
Bjorn Pettersson 3ee826594a Make dependency between certain analysis passes transitive (reapply)
LazyBlockFrequencyInfoPass, LazyBranchProbabilityInfoPass and
LoopAccessLegacyAnalysis all cache pointers to their nested required
analysis passes. One needs to use addRequiredTransitive to describe
that the nested passes can't be freed until those analysis passes
are no longer used themselves.

There is still a bit of a mess considering the getLazyBPIAnalysisUsage
and getLazyBFIAnalysisUsage functions. Those functions are used from
both Transform, CodeGen and Analysis passes. I figure it is OK to
use addRequiredTransitive also when being used from Transform and
CodeGen passes. On the other hand, I figure we must do it when
used from other Analysis passes. So using addRequiredTransitive should
be more correct here. An alternative solution would be to add a
bool option in those functions to let the user tell if it is an
analysis pass or not. Since those lazy passes will be obsolete when
new PM has conquered the world I figure we can leave it like this
right now.

Intention with the patch is to fix PR49950. It at least solves the
problem for the reproducer in PR49950. However, that reproducer
needs five passes in a specific order, so there are lots of various
"solutions" that could avoid the crash without actually fixing the
root cause.

This is a reapply of commit 3655f0757f, that was reverted in
33ff3c2049 due to problems with assertions in the polly
lit tests. That problem is supposed to be solved by also adjusting
ScopPass to explicitly preserve LazyBlockFrequencyInfo and
LazyBranchProbabilityInfo (it already preserved
OptimizationRemarkEmitter which depends on those lazy passes).

Differential Revision: https://reviews.llvm.org/D100958
2021-05-05 15:17:55 +02:00
Bjorn Pettersson 33ff3c2049 Revert "Make dependency between certain analysis passes transitive"
This reverts commit 3655f0757f.

It caused assertion failures related to setLastUser in polly builds.
2021-05-04 19:08:41 +02:00
Bjorn Pettersson 3655f0757f Make dependency between certain analysis passes transitive
LazyBlockFrequencyInfoPass, LazyBranchProbabilityInfoPass and
LoopAccessLegacyAnalysis all cache pointers to their nested required
analysis passes. One needs to use addRequiredTransitive to describe
that the nested passes can't be freed until those analysis passes
are no longer used themselves.

There is still a bit of a mess considering the getLazyBPIAnalysisUsage
and getLazyBFIAnalysisUsage functions. Those functions are used from
both Transform, CodeGen and Analysis passes. I figure it is OK to
use addRequiredTransitive also when being used from Transform and
CodeGen passes. On the other hand, I figure we must do it when
used from other Analysis passes. So using addRequiredTransitive should
be more correct here. An alternative solution would be to add a
bool option in those functions to let the user tell if it is an
analysis pass or not. Since those lazy passes will be obsolete when
new PM has conquered the world I figure we can leave it like this
right now.

Intention with the patch is to fix PR49950. It at least solves the
problem for the reproducer in PR49950. However, that reproducer
needs five passes in a specific order, so there are lots of various
"solutions" that could avoid the crash without actually fixing the
root cause.

Differential Revision: https://reviews.llvm.org/D100958
2021-05-04 11:50:08 +02:00
Simon Moll 1db4dbba24 Recommit "[VP,Integer,#2] ExpandVectorPredication pass"
This reverts the revert 02c5ba8679

Fix:

Pass was registered as DUMMY_FUNCTION_PASS causing the newpm-pass
functions to be doubly defined. Triggered in -DLLVM_ENABLE_MODULES=1
builds.

Original commit:

This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).

VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ..).

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D78203
2021-05-04 11:47:52 +02:00
Arthur Eubanks d14d84af2f [NewPM] Only invalidate modified functions' analyses in CGSCC passes
Previously, any change in any function in an SCC would cause all
analyses for all functions in the SCC to be invalidated. With this
change, we now manually invalidate analyses for functions we modify,
then let the pass manager know that all function analyses should be
preserved.

So far this only touches the inliner, argpromotion, funcattrs, and
updateCGAndAnalysisManager(), since they are the most used.

Slight compile time improvements:
http://llvm-compile-time-tracker.com/compare.php?from=326da4adcb8def2abdd530299d87ce951c0edec9&to=8942c7669f330082ef159f3c6c57c3c28484f4be&stat=instructions

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D100917
2021-05-03 17:21:44 -07:00
Philip Reames e38ccb729b Recommit "Generalize getInvertibleOperand recurrence handling slightly"
This was reverted because of a reported problem.  It turned out this patch didn't introduce said problem, it just exposed it more widely.  15a4233 fixes the root issue, so this simply a) rebases over that, and b) adds a much more extensive comment explaining why that weakened assert is correct.

Original commit message follows:

Follow up to D99912, specifically the revert, fix, and reapply thereof.

This generalizes the invertible recurrence logic in two ways:
* By allowing mismatching operand numbers of the phi, we can recurse through a pair of phi recurrences whose operand orders have not been canonicalized.
* By allowing recurrences through operand 1, we can invert these odd (but legal) recurrences.

Differential Revision: https://reviews.llvm.org/D100884
2021-05-03 16:40:56 -07:00
Sanjay Patel 15a42339fe [ValueTracking] soften assert for invertible recurrence matching
There's a TODO comment in the code and discussion in D99912
about generalizing this, but I wasn't sure how to implement that,
so just going with a potential minimal fix to avoid crashing.

The test is a reduction beyond useful code (there's no user of
%user...), but it is based on https://llvm.org/PR50191, so this
is asserting on real code.

Differential Revision: https://reviews.llvm.org/D101772
2021-05-03 15:57:40 -04:00
Juneyoung Lee d4d1caafc8 Fix MSan crash after 1977c53b 2021-05-02 13:44:43 +09:00
Arthur Eubanks 07a9df5993 [NFC] Use getParamByValType instead of pointee type
To reduce dependence on pointee types for opaque pointers.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D101706
2021-05-01 21:22:41 -07:00
Juneyoung Lee 7257e6a68a [ValueTracking] ctpop propagates poison
This is a patch that adds ctpop intrinsics to propagatesPoison.

Split from D101191
2021-05-02 13:04:37 +09:00
Juneyoung Lee 64e768e816 [ValueTracking] Improve impliesPoison to look into overflow intrinsics
This update supports the following transformation:

```
select(extract(mul_with_overflow(a, _), _), (a == 0), false)
=>
and(extract(mul_with_overflow(a, _), _), (a == 0))
```

which is correct because if `a` was poison the select's condition was
also poison.

This update is split from D101423.
2021-05-02 12:03:55 +09:00
Juneyoung Lee 1977c53b2a [InstCombine] Fold overflow bit of [u|s]mul.with.overflow in a poison-safe way
As discussed in D101191, this patch adds a poison-safe folding of overflow bit check:
```
  %Op0 = icmp ne i4 %X, 0
  %Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y)
  %Op1 = extractvalue { i4, i1 } %Agg, 1
  %ret = select i1 %Op0, i1 %Op1, i1 false
=>
  %Y.fr = freeze %Y
  %Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y.fr)
  %Op1 = extractvalue { i4, i1 } %Agg, 1
  %ret = %Op1
```

https://alive2.llvm.org/ce/z/zgPUGT
https://alive2.llvm.org/ce/z/h2gZ_6

Note that there are cases where inserting freeze is not necessary: e.g. %Y is `noundef`.
In this case, LLVM is already good because `%ret` is already successfully folded into `and`,
triggering the pre-existing optimization in InstSimplify: https://godbolt.org/z/v6qena15K

Differential Revision: https://reviews.llvm.org/D101423
2021-05-02 11:54:12 +09:00
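For well-defined (non-poison) inputs, the fold in the InstCombine commit above rests on a simple fact: when %X is 0 the product is 0, so the overflow bit is already false. The following Python sketch brute-forces this over all i4 values. It does not model poison or `freeze`, which is the part the commit handles by freezing %Y; the helper names are hypothetical.

```python
BITS = 4
MASK = (1 << BITS) - 1


def to_signed(v):
    """Interpret a 4-bit pattern as a two's-complement signed value."""
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def umul_overflows(x, y):
    """Overflow bit of an unsigned 4-bit multiply."""
    return x * y > MASK


def smul_overflows(x, y):
    """Overflow bit of a signed 4-bit multiply."""
    p = to_signed(x) * to_signed(y)
    return not (-(1 << (BITS - 1)) <= p < (1 << (BITS - 1)))


def fold_is_sound(overflows):
    # select (X != 0), overflow(X, Y), false  ==  overflow(X, Y)
    return all(
        ((x != 0) and overflows(x, y)) == overflows(x, y)
        for x in range(1 << BITS)
        for y in range(1 << BITS)
    )


print(fold_is_sound(umul_overflows), fold_is_sound(smul_overflows))
```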
Nikita Popov db9d00c5e7 [LVI] Handle mask not equal zero conditions
If V & Mask != 0, we know that at least one of the bits in Mask
must be set, so the value must be >= the lowest bit in Mask.
2021-05-01 23:08:49 +02:00
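The bound stated in the LVI commit above can be checked exhaustively for small bit widths. This is a Python sketch with hypothetical helper names; values are treated as unsigned.

```python
def lowest_set_bit(mask):
    """Isolate the lowest set bit of a positive integer mask."""
    return mask & -mask


def bound_holds(bits=6):
    """Check that (v & mask) != 0 implies v >= lowest set bit of mask."""
    for mask in range(1, 1 << bits):
        low = lowest_set_bit(mask)
        for v in range(1 << bits):
            # If v shares a bit with mask, v is at least that bit, which in
            # turn is at least the lowest bit of mask.
            if (v & mask) != 0 and v < low:
                return False
    return True


print(lowest_set_bit(0b101000), bound_holds())
```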
Nikita Popov cc58e8918b [SCEV] Simplify backedge count clearing (NFC)
This seems to be a leftover from when the BackedgeTakenInfo
stored multiple exit counts with manual memory management. At
some point this was switched to a simple vector, and there should
be no need to micro-manage the clearing anymore. We can simply
drop the loop from the map and let the destructor do its job.
2021-05-01 17:50:01 +02:00
Adrian Prantl 02c5ba8679 Revert "[VP,Integer,#2] ExpandVectorPredication pass"
This reverts commit 43bc584dc0.

The commit broke the -DLLVM_ENABLE_MODULES=1 builds.

http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/31603/consoleFull#2136199809a1ca8a51-895e-46c6-af87-ce24fa4cd561
2021-04-30 17:02:28 -07:00
Nikita Popov fe230dc197 [ValueTracking] Slightly clean up programUndefinedIfUndefOrPoison() (NFC)
Use contains() to check set membership, and adjust an oddly
structured loop.
2021-04-30 23:05:41 +02:00
Nikita Popov 2cd7868605 [ValueTracking] Limit scan when checking poison UB (PR50155)
The current code can scan an unlimited number of instructions,
if the containing basic block is very large. The test case from
PR50155 contains a basic block with approximately 100k instructions.

To avoid this, limit the number of instructions we inspect. At
the same time, drop the limit on the number of basic blocks, as
this will be implicitly limited by the number of instructions as
well.
2021-04-30 23:04:49 +02:00
Duncan P. N. Exon Smith 518d955f9d Support: Stop using F_{None,Text,Append} compatibility synonyms, NFC
Stop using the compatibility spellings of `OF_{None,Text,Append}`
left behind by 1f67a3cba9. A follow-up
will remove them.

Differential Revision: https://reviews.llvm.org/D101650
2021-04-30 11:00:03 -07:00
Simon Moll 43bc584dc0 [VP,Integer,#2] ExpandVectorPredication pass
This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).

VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ..).

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D78203
2021-04-30 15:47:28 +02:00
Roman Lebedev ba5b015b0d [InlineCost] CallAnalyzer: use TTI info for extractvalue - they are free (PR50099)
It seems incorrect to use TTI data in some places,
and override it in others. In this case, TTI says
that `extractvalue` are free, yet we bill them.

While this doesn't address https://bugs.llvm.org/show_bug.cgi?id=50099 yet,
it reduces the cost from 55 to 50 while the threshold is 45.

Differential Revision: https://reviews.llvm.org/D101228
2021-04-30 13:55:11 +03:00
Arthur Eubanks a3a798d49d [InlineCost] Remove visitUnaryInstruction()
The simplifyInstruction() in visitUnaryInstruction() does not trigger
for all of check-llvm. Looking at all delegates to UnaryInstruction in
InstVisitor, the only instructions that either don't have a visitor in
CallAnalyzer, or redirect to UnaryInstruction, are VAArgInst and Alloca.
VAArgInst will never get simplified, and visitUnaryInstruction(Alloca)
would always return false anyway.

Reviewed By: mtrofin, lebedev.ri

Differential Revision: https://reviews.llvm.org/D101577
2021-04-29 20:33:30 -07:00
jasonliu 7049fbf960 [XCOFF] Handle the case when personality routine is an alias
Summary:
Personality routine could be an alias to another personality routine.
Fix the situation when we compile the file that contains the personality
routine and the file also has functions that need to refer to the
personality routine.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D101401
2021-04-29 22:03:30 +00:00
Philip Reames a047837b90 Revert "Generalize getInvertibleOperand recurrence handling slightly"
This reverts commit 0c01b37eeb while a problem reported is investigated.
2021-04-29 13:06:26 -07:00
Sanjay Patel 1089158c5a [ConstantFolding] propagate poison through vector reduction intrinsics 2021-04-29 12:54:20 -04:00
Sanjay Patel 71597d40e8 [ConstantFolding] refactor helper for vector reductions; NFC
We should handle other cases (undef/poison), so reduce
the duplication of repeated switches.
2021-04-29 12:09:22 -04:00
Craig Topper 25391cec3a [RISCV] Teach computeKnownBits that vsetvli returns number less than 2^31.
This seems like a reasonable upper bound on VL. WG discussions for
the V spec would probably allow us to use 2^16 as an upper bound
on VLEN, but this is good enough for now.

This allows us to remove sext and zext if the user happens to assign
the size_t result to an int and then uses it as a VL intrinsic
argument, which is size_t.

Reviewed By: frasercrmck, rogfer01, arcbbb

Differential Revision: https://reviews.llvm.org/D101472
2021-04-29 08:07:59 -07:00
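The reason the bound in the vsetvli commit above makes the extensions removable can be shown with plain integer arithmetic: for a value below 2^31, truncating to 32 bits and then sign- or zero-extending back to 64 bits returns the original value. The Python helpers below are hypothetical and only model the bit patterns.

```python
def trunc32(v):
    """Keep the low 32 bits, as when assigning a size_t to an int."""
    return v & 0xFFFFFFFF


def sext32_to_64(v):
    """Sign-extend a 32-bit pattern to a 64-bit pattern."""
    return (v | 0xFFFFFFFF00000000) if v & 0x80000000 else v


def zext32_to_64(v):
    """Zero-extend a 32-bit pattern to a 64-bit pattern."""
    return v


def extension_is_noop(vl):
    """True if both extensions of the truncated value give back vl."""
    as_int = trunc32(vl)
    return sext32_to_64(as_int) == vl and zext32_to_64(as_int) == vl


print(all(extension_is_noop(vl) for vl in (0, 1, 255, 2**16, 2**31 - 1)))
print(extension_is_noop(2**31))  # first value outside the bound
```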
Philip Reames 0c01b37eeb Generalize getInvertibleOperand recurrence handling slightly
Follow up to D99912, specifically the revert, fix, and reapply thereof.

This generalizes the invertible recurrence logic in two ways:
* By allowing mismatching operand numbers of the phi, we can recurse through a pair of phi recurrences whose operand orders have not been canonicalized.
* By allowing recurrences through operand 1, we can invert these odd (but legal) recurrences.

Differential Revision: https://reviews.llvm.org/D100884
2021-04-28 14:38:07 -07:00
Philip Reames 0cc3e10f5e [SCEV] Avoid range intersection idiom in getRangeForUnkownRecurrence [NFC]
Addresses a review comment from D101181
2021-04-28 12:48:17 -07:00
Philip Reames a836de0bde [SCEV] Compute ranges for ashr recurrences
Straightforward extension to the recently added infrastructure which was pioneered with shl. This was originally posted as part of D99687, but split off for ease of review.

(I also decided to exclude the unknown start sign case explicitly for simplicity of understanding.)

Differential Revision: https://reviews.llvm.org/D101181
2021-04-28 12:36:20 -07:00
Florian Hahn 1ed7f8ede5 [LAA] Support pointer phis in loop by analyzing each incoming pointer.
SCEV does not look through non-header PHIs inside the loop. Such phis
can be analyzed by adding separate accesses for each incoming pointer
value.

This results in 2 more loops vectorized in SPEC2000/186.crafty and
avoids regressions when sinking instructions before vectorizing.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D101286
2021-04-28 20:19:40 +01:00
Arthur Eubanks cbce28f07e [ConstFold] Use const-folded operands in more places
Previously we were const folding operands but not passing them.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101394
2021-04-27 14:30:19 -07:00
Nikita Popov e45168c4fa [SCEV] Handle uge/ugt predicates in applyLoopGuards()
These can be handled the same way as ule/ult, just using umax
instead of umin. This is useful in cases where the umax prevents
the upper bound from overflowing.

Differential Revision: https://reviews.llvm.org/D101196
2021-04-27 22:41:05 +02:00
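The idea behind the applyLoopGuards() commit above is that a value known to satisfy a guard can be replaced by a min/max expression that carries the bound. The commit message does not spell out the exact rewrites, so the forms below are an assumption; the Python sketch only checks that each rewrite leaves the value unchanged whenever the guard holds.

```python
def rewrites_preserve_value(bits=4):
    """Check the assumed guard rewrites on all unsigned `bits`-bit pairs."""
    limit = 1 << bits
    for c in range(limit):
        for x in range(limit):
            if x <= c and min(x, c) != x:      # ule: x -> umin(x, c)
                return False
            if x < c and min(x, c - 1) != x:   # ult: x -> umin(x, c - 1)
                return False
            if x >= c and max(x, c) != x:      # uge: x -> umax(x, c)
                return False
            if x > c and max(x, c + 1) != x:   # ugt: x -> umax(x, c + 1)
                return False
    return True


print(rewrites_preserve_value())
```

Note that `c - 1` and `c + 1` cannot wrap when the corresponding guard holds, since `x < c` implies `c > 0` and `x > c` implies `c` is not the maximum value.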
Andy Kaylor 0a82d885a4 [Dependence Analysis] Fix ExactSIV producing wrong analysis
Patch by Artem Radzikhovskyy!

Symptom: The ExactSIV test produced an incorrect dependence analysis (see the LIT tests).
Bug: At the end of the algorithm, when determining the dependence direction, the original author forgot to divide intermediate results by the gcd and round the result toward zero.

Although this bug can be fixed with significantly fewer changes, I opted to write the code in such a way that it reflects the original algorithm that Banerjee proposed, for easier reference in the future. This surprisingly results in shorter code, and fewer quotient and max/min calculations.

Changes Summary:

- Fixed findGCD to return valid x and y so that they match the function description, where: ax - by = gcd(a,b)
- Fixed ExactSIV test, to produce proper results
- Documented the extension of Banerjee's algorithm that the original code author introduced. Banerjee's original algorithm only tested whether Dst depends on Src, the extension also allows us to test whether Src depends on Dst, in one pass.
- ExactRDIV test worked fine. Since it uses findGCD(), it needed to be updated. Since ExactRDIV test has very few changes from the core algorithm of ExactSIV, I modified the test to have the same format as ExactSIV.
- Updated the LIT tests to be testing for correct values.

Differential Revision: https://reviews.llvm.org/D100331
2021-04-27 12:24:00 -07:00
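The identity that findGCD is documented to satisfy in the commit above, ax - by = gcd(a,b), can be obtained from the standard extended Euclidean algorithm by negating one coefficient. The Python sketch below shows only the arithmetic; `find_gcd` is a hypothetical name and signature, not LLVM's APInt-based function.

```python
import math


def find_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x - b*y == g, for a, b > 0."""
    # Extended Euclid maintains the invariant a*x0 + b*y0 == r0.
    r0, r1 = a, b
    x0, x1 = 1, 0
    y0, y1 = 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    # Negate y to match the a*x - b*y convention.
    return r0, x0, -y0


g, x, y = find_gcd(6, 4)
print(g == math.gcd(6, 4), 6 * x - 4 * y == g)
```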
dfukalov e4c606acaf [TTI] NFC: Change getScalarizationOverhead and getOperandsScalarizationOverhead to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D101283
2021-04-27 08:51:48 +03:00
Hongtao Yu 30bb5be389 [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 2.
As a follow-up to D95982, this patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

The optimizations unblocked are:
- In-block load propagation.
- In-block dead store elimination.
- Memory copy optimization that turns stores to consecutive memories into a memset.

These optimizations are local to a block, so they shouldn't affect the profile quality.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D100075
2021-04-26 16:52:33 -07:00
Vineet Kumar 84d16e2055 Implementation for TargetTransformInfo::hasActiveVectorLength()
This patch adds the missing implementation for
TargetTransformInfo::hasActiveVectorLength(), without which using
hasActiveVectorLength() causes a linker error.

Patch by Vineet Kumar!

Differential Revision: https://reviews.llvm.org/D100941
2021-04-26 21:20:05 +00:00
Nikita Popov a5051f2fa2 [SCEV] Fix applyLoopGuards() chaining for ne predicates
ICMP_NE predicates directly overwrote the rewritten result,
instead of chaining it with previous rewrites, as was done for
ICMP_ULT and ICMP_ULE. This means that some guards were effectively
discarded, depending on their order.
2021-04-24 21:43:46 +02:00
dfukalov 6c57044231 [GVN] Clobber partially aliased loads.
Use offsets stored in `AliasResult` implemented in D98718.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95543
2021-04-24 14:14:20 +03:00
Sander de Smalen f9a50f04ba [TTI] NFC: Change getIntImmCost[Inst|Intrin] to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100565
2021-04-23 16:06:36 +01:00
Sander de Smalen 43ace8b5ce [TTI] NFC: Change getScalingFactorCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100564
2021-04-23 16:06:36 +01:00
Sander de Smalen 008a072ded [TTI] NFC: Change getMemcpyCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100563
2021-04-23 16:06:35 +01:00
Sander de Smalen 9ba07f37f8 [TTI] NFC: Change getGEPCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100562
2021-04-23 16:06:35 +01:00
Sander de Smalen e0edfa052f [TTI] NFC: Change getAddressComputationCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100561
2021-04-23 16:06:35 +01:00
dfukalov 9ab17a60eb [TTI] NFC: Use InstructionCost to store ScalarizationCost in IntrinsicCostAttributes.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D101151
2021-04-23 18:02:00 +03:00
Philip Reames 424d6cb902 [SCEV] Compute ranges for lshr recurrences
Straightforward extension to the recently added infrastructure which was pioneered with shl.
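
As a rough illustration of why such a range is easy to bound (a sketch only, not the ScalarEvolution code): a logical right shift never increases an unsigned value, so a recurrence x' = x >> Step stays within [0, Start]. The names `Bounds`, `lshrRecurrenceBounds`, and `staysInBounds` are hypothetical, and the bound shown is deliberately coarse.

```cpp
#include <cassert>
#include <cstdint>

// Closed unsigned interval [Min, Max]; a hypothetical stand-in for a range.
struct Bounds {
  uint32_t Min;
  uint32_t Max;
};

// Coarse bound for x' = x >> Step starting at Start: a logical right shift
// can only keep or reduce an unsigned value, so it never exceeds Start.
Bounds lshrRecurrenceBounds(uint32_t Start) { return Bounds{0, Start}; }

// Simulates the recurrence for Iters steps and checks every value against
// the bound computed above.
bool staysInBounds(uint32_t Start, unsigned Step, unsigned Iters) {
  assert(Step < 32 && "shifting a 32-bit value by 32 or more is undefined");
  Bounds B = lshrRecurrenceBounds(Start);
  uint32_t X = Start;
  for (unsigned I = 0; I < Iters; ++I) {
    if (X < B.Min || X > B.Max)
      return false;
    X >>= Step;
  }
  return X >= B.Min && X <= B.Max;
}
```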

Differential Revision: https://reviews.llvm.org/D99687
2021-04-22 11:06:31 -07:00
Wenlei He dff8315892 [CSSPGO][llvm-profdata] Support trimming cold context when merging profiles
The change adds support for trimming and merging cold context when merging CSSPGO profiles using llvm-profdata. This is similar to the context profile trimming in llvm-profgen; however, the flexibility to trim cold context after the profile is generated can be useful.

Differential Revision: https://reviews.llvm.org/D100528
2021-04-22 00:42:37 -07:00
Sanjay Patel 5e6dc5e404 [InstSimplify] generalize ctlz-of-shifted-constant
https://alive2.llvm.org/ce/z/zWL_VQ
2021-04-21 14:23:55 -04:00
Nico Weber ba7a92c01e [Support] Don't include VirtualFileSystem.h in CommandLine.h
CommandLine.h is indirectly included in ~50% of TUs when building
clang, and VirtualFileSystem.h is large.

(Already remarked by jhenderson on D70769.)

No behavior change.

Differential Revision: https://reviews.llvm.org/D100957
2021-04-21 10:19:01 -04:00
Yang Fan 4307446e9f [SCEV] Fix -Wunused-variable warning (NFC)
GCC warning:
```
/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp: In member function ‘const llvm::SCEV* llvm::ScalarEvolution::getLosslessPtrToIntExpr(const llvm::SCEV*, unsigned int)::SCEVPtrToIntSinkingRewriter::visitUnknown(const llvm::SCEVUnknown*)’:
/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:1152:13: warning: unused variable ‘ExprPtrTy’ [-Wunused-variable]
 1152 |       Type *ExprPtrTy = Expr->getType();
      |             ^~~~~~~~~
```
2021-04-21 16:01:46 +08:00
Nikita Popov de18fa9e52 Revert "[InstSimplify] Bypass no-op `and`-mask, using known bits (PR49543)"
This reverts commit ea1a0d7c9a.

While this is strictly more powerful, it is also strictly slower.
InstSimplify intentionally does not perform many folds that it
is allowed to perform, if doing so requires a KnownBits calculation
that will be repeated in InstCombine.

Maybe it's worthwhile to do this here, but that needs a more
explicitly stated motivation, evaluated in a review.
2021-04-21 09:55:25 +02:00
Philip Reames 4824d876f0 Revert "Allow invokable sub-classes of IntrinsicInst"
This reverts commit d87b9b81cc.

Post commit review raised concerns, reverting while discussion happens.
2021-04-20 15:38:38 -07:00
Philip Reames d87b9b81cc Allow invokable sub-classes of IntrinsicInst
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked.

This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.

After this lands and has baked for a couple days, planned cleanups:
    Make GCStatepointInst an IntrinsicInst subclass.
    Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
    Do the same in SelectionDAG.
    Do the same in FastISEL.

Differential Revision: https://reviews.llvm.org/D99976
2021-04-20 15:03:49 -07:00
Roman Lebedev ea1a0d7c9a [InstSimplify] Bypass no-op `and`-mask, using known bits (PR49543)
We already special-cased a few interesting patterns,
but that is strictly less powerful than using KnownBits.

So instead get the known bits for the operand of `and`,
and iff all the unset bits of the `and`-mask are known to be zeros
in the operand, we can omit said `and`.
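
The condition above can be sketched in a few lines. This is a simplified model, not the InstSimplify code: `KnownBits32` and `andMaskIsNoOp` are hypothetical names, and the value is assumed to be 32 bits wide.

```cpp
#include <cstdint>

// Simplified stand-in for a known-bits result: a set bit in Zero means that
// bit of the value is known to be 0; a set bit in One means it is known to be 1.
struct KnownBits32 {
  uint32_t Zero;
  uint32_t One;
};

// Returns true when `X & Mask` is guaranteed to equal X, i.e. every bit the
// mask would clear is already known to be zero in X.
bool andMaskIsNoOp(const KnownBits32 &KnownX, uint32_t Mask) {
  uint32_t ClearedByMask = ~Mask;
  return (ClearedByMask & ~KnownX.Zero) == 0;
}
```

For example, if the upper 24 bits of X are known to be zero, masking with 0xFF or 0xFFFF changes nothing, while masking with 0x0F might.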
2021-04-21 00:31:46 +03:00
Philip Reames 6792e26c0d Reapply "Look through invertible recurrences in isKnownNonEqual"
I'd reverted this in commit 3b6acb1797 due to buildbot failures.  This patch contains the fix for said issue.  I'd forgotten to handle the case where two phis in the same block have different operand order.  We canonicalize away from this, but it's still valid IR.  The tests included in this change (as opposed to simply having test output changed), crashed without the fix.

Original commit message follows...

This extends the phi handling in isKnownNonEqual with a special case based on invertible recurrences. If we can prove the recurrence is invertible (which many common ones are), we can recurse through the start operands of the recurrence skipping the phi cycle.

(Side note: Instcombine currently does not push back through these cases. I will implement that in a follow up change w/separate review.)

Differential Revision: https://reviews.llvm.org/D99912
2021-04-20 12:47:59 -07:00
Philip Reames 3b6acb1797 Revert "Look through invertible recurrences in isKnownNonEqual"
This reverts commit be20eae25f.  It appears to have caused a crash on a buildbot (https://lab.llvm.org/buildbot#builders/77/builds/5653).  Reverting while investigating.
2021-04-20 11:47:10 -07:00
Philip Reames 9c1a145aeb Rearrange code to reduce diff for D99687 [nfc]
Adding the switches to reduce diffs.  I'm about to split that into an lshr part and an ashr part, doing the NFC part first makes it easier to maintain both diffs.
2021-04-20 11:40:15 -07:00
Roman Lebedev 7186764884 [NFC][SCEV] Split getLosslessPtrToIntExpr out of getPtrToIntExpr() 2021-04-20 21:29:21 +03:00
Philip Reames be20eae25f Look through invertible recurrences in isKnownNonEqual
This extends the phi handling in isKnownNonEqual with a special case based on invertible recurrences. If we can prove the recurrence is invertible (which many common ones are), we can recurse through the start operands of the recurrence skipping the phi cycle.

(Side note: Instcombine currently does not push back through these cases. I will implement that in a follow up change w/separate review.)

Differential Revision: https://reviews.llvm.org/D99912
2021-04-20 10:52:22 -07:00
Dávid Bolvanský 319c9f6e58 [MemoryBuiltins] Added support for memalign
memalign is the older counterpart of aligned_alloc.
2021-04-20 12:39:54 +02:00
Joe Ellis effacc1599 [AArch64] Constant fold sve_convert_from_svbool(zero) to zero
Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D100463
2021-04-20 10:02:49 +00:00
Arthur Eubanks 5e71b9fa93 Explicitly pass type to cast load constant folding result
Previously we would use the type of the pointee to determine what to
cast the result of constant folding a load. To aid with opaque pointer
types, we should explicitly pass the type of the load rather than
looking at pointee types.

ConstantFoldLoadThroughBitcast() converts the const prop'd value to the
proper load type (e.g. [1 x i32] -> i32). Instead of calling this in
every intermediate step like bitcasts, we only call this when we
actually see the global initializer value.

In some existing uses of this API, we don't know the exact type we're
loading from immediately (e.g. first we visit a bitcast, then we visit
the load using the bitcast). In those cases we have to manually call
ConstantFoldLoadThroughBitcast() when simplifying the load to make sure
that we cast to the proper type.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D100718
2021-04-20 00:53:21 -07:00
Sanjay Patel 9d43f6d7ce [LowerConstantIntrinsics] avoid crashing on alloca with unexpected operand type
The test here is reduced from the fuzzer-generated crasher in:
https://llvm.org/PR50023
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33395

I don't know if this is the best or complete solution, but the
zext of the i42 type appears to match the behavior if we run a
weird type example like this through the IR optimizer with -O1.

Differential Revision: https://reviews.llvm.org/D100766
2021-04-19 13:06:29 -04:00
Roman Lebedev 41c22acc22 [NFC][SCEV] Assert that we don't try to create SCEVPtrToIntExpr of a non-integral pointer
ptr<->int casts are only valid for integral pointers,
defensively assert that we don't try to break that here.
2021-04-19 18:38:38 +03:00
Simon Pilgrim ddcdeae358 [Analysis] ImportedFunctionsInliningStatistics.h - add <memory> and remove unused <string> include. NFCI.
Move <string> include to ImportedFunctionsInliningStatistics.cpp and add missing <memory> include as we have explicit uses of std::unique_ptr in the header.
2021-04-19 16:20:56 +01:00
Cullen Rhodes f0bc2782f2 [TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D100377
2021-04-19 11:01:34 +00:00
Roman Lebedev d480f968ad Revert "[SCEV] Model `ashr exact x, C` as `(abs(x) EXACT/u (1<<C)) * signum(x)`"
As discussed in https://reviews.llvm.org/D100721,
this modelling is lossy: we can't reconstruct `ashr`/`ashr exact`
from it, which means that whenever we actually expand the IR,
we've just pessimized the code.

It would be good to model this pattern, after all it comes up every time
you want to compute a distance between two pointers, but not at this cost.

This reverts commit ec54867df5.
2021-04-18 16:26:45 +03:00
Serge Guelton d6de1e1a71 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as strings).
Throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalizes the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true
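
A minimal sketch of the normalized check, assuming attributes are modeled as a plain string map. `AttrMap` and this free-function `getValueAsBool` are hypothetical stand-ins for illustration, not the LLVM `Attribute` API.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a function's string attributes (name -> value).
using AttrMap = std::map<std::string, std::string>;

// Mirrors the behavior listed above: a missing attribute or the value
// "false" yields false; the value "true" yields true. In this sketch any
// other value is also treated as false.
bool getValueAsBool(const AttrMap &Attrs, const std::string &Name) {
  auto It = Attrs.find(Name);
  return It != Attrs.end() && It->second == "true";
}
```

With a helper like this, both of the checks quoted earlier collapse into a single call such as `getValueAsBool(Attrs, "no-jump-tables")`.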

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
Thomas Lively 5c729750a6 [WebAssembly] Remove saturating fp-to-int target intrinsics
Use the target-independent @llvm.fptosi.sat and @llvm.fptoui.sat intrinsics instead.
This includes removing the intrinsics for i32x4.trunc_sat_zero_f64x2_{s,u},
which are now represented in IR as a saturating truncation to a v2i32 followed by
a concatenation with a zero vector.
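
For readers unfamiliar with saturating conversion, here is a scalar sketch of the semantics as I understand them from the LangRef: out-of-range inputs clamp to the nearest representable integer and NaN maps to zero. The function name `fptosiSat32` is hypothetical, and the sketch covers only the signed double-to-i32 case.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Saturating double -> int32 conversion: NaN becomes 0, values beyond the
// int32 range clamp to the nearest bound, everything else truncates toward
// zero. Both int32 bounds are exactly representable as doubles.
int32_t fptosiSat32(double X) {
  const int32_t Min = std::numeric_limits<int32_t>::min();
  const int32_t Max = std::numeric_limits<int32_t>::max();
  if (std::isnan(X))
    return 0;
  if (X <= static_cast<double>(Min))
    return Min;
  if (X >= static_cast<double>(Max))
    return Max;
  return static_cast<int32_t>(X);
}
```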

Differential Revision: https://reviews.llvm.org/D100596
2021-04-16 12:11:20 -07:00
Sanjay Patel bb907b26e2 [ValueTracking] don't recursively compute known bits using multiple llvm.assumes
This is an alternative to D99759 to avoid the compile-time explosion seen in:
https://llvm.org/PR49785

Another potential solution would make the exclusion logic stronger to avoid
blowing up, but note that we reduced the complexity of the exclusion mechanism
in D16204 because it was too costly.

So I'm questioning the need for recursion/exclusion entirely - what is the
optimization value vs. cost of recursively computing known bits based on
assumptions?
This was built into the implementation from the start with 60db058,
and we have kept adding code/cost to deal with that capability.

By clearing the query's AssumptionCache inside computeKnownBitsFromAssume(),
this patch retains all existing assume functionality except refining known
bits based on even more assumptions.

We have 1 regression test that shows a difference in optimization power.

Differential Revision: https://reviews.llvm.org/D100573
2021-04-16 08:43:35 -04:00
Mircea Trofin 0d06b14f59 [MLGO] Fix use of AM.invalidate post D100519
The ML inline advisors more aggressively invalidate certain analyses
after each call site inlining, to more accurately capture the problem
state.
2021-04-15 18:45:39 -07:00
Arthur Eubanks c8f0a7c215 [NewPM] Cleanup IR printing instrumentation
Being lazy with printing the banner seems hard to reason about; we should print it
unconditionally first (it could also lead to duplicate banners if we
have multiple functions in -filter-print-funcs).

The printIR() functions were doing too many things. I separated out the
call from PrintPassInstrumentation since we were essentially doing two
completely separate things in printIR() from different callers.

There were multiple ways to generate the name of some IR. That's all
been moved to getIRName(). The printing of the IR name was also
inconsistent, now it's always "IR Dump on $foo" where "$foo" is the
name. For a function, it's the function name. For a loop, it's what's
printed by Loop::print(), which is more detailed. For an SCC, it's the
list of functions in parentheses. For a module it's "[module]", to
differentiate between a possible SCC with a function called "module".

To preserve D74814, we have to check if we're going to print anything at
all first. This is unfortunate, but I would consider this a special
case that shouldn't be handled in the core logic.

Reviewed By: jamieschmeiser

Differential Revision: https://reviews.llvm.org/D100231
2021-04-15 09:50:55 -07:00
dfukalov ce1626f34a [AA] Updates for D95543.
Addressing later comments in D95543:
- `AliasResult::Result` renamed to `AliasResult::Kind`
- Offset printing added for `PartialAlias` case in `-aa-eval`
- Removed VisitedPhiBBs check from BasicAA

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D100454
2021-04-15 12:22:03 +03:00
Nikita Popov a1ed025d0e Revert "[SCEV] Don't walk uses of phis without SCEV expression when forgetting"
This reverts commit faf9f11589.

Issues with this patch have been reported in
https://reviews.llvm.org/D100264#2689917 and
https://bugs.llvm.org/show_bug.cgi?id=49967.
2021-04-15 09:43:52 +02:00
William S. Moses d3e2b4c0a2 [SROA][TBAA] Handle shift of regular TBAA nodes
SROA shifts TBAA nodes in a way that may present a problem for !tbaa but not !tbaa.struct nodes.

Differential Revision: https://reviews.llvm.org/D99851
2021-04-14 14:35:20 -04:00
Nikita Popov 0d91075f77 [ValueTracking] Don't require strictly positive for mul nsw recurrence
Just like in the mul nuw case, it's sufficient that the step is
non-zero. If the step is negative, then the values will jump
between positive and negative, "crossing" zero, but the value of
the recurrence is never actually zero.
2021-04-14 19:39:59 +02:00
Nikita Popov 5c0fb026c9 [ValueTracking] Don't require non-zero step for add nuw
It's okay if the step is zero, we'll just stay at the same non-zero
value in that case. The valuable part of this is that the step
doesn't even need to be a constant anymore.
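
The argument can be checked with a small simulation. This is only a sketch of the reasoning, not the ValueTracking code: `addNuwRecurrenceStaysNonZero` is a hypothetical helper, and the no-unsigned-wrap flag is modeled by stopping before any addition that would wrap.

```cpp
#include <cstdint>

// Simulates x' = x + Step under a no-unsigned-wrap assumption: the walk stops
// as soon as the next addition would wrap. Returns true if every value seen
// within MaxIters steps was non-zero.
bool addNuwRecurrenceStaysNonZero(uint32_t Start, uint32_t Step,
                                  unsigned MaxIters) {
  uint32_t X = Start;
  for (unsigned I = 0; I < MaxIters; ++I) {
    if (X == 0)
      return false;
    if (Step > UINT32_MAX - X)
      break; // The next add would wrap, which nuw rules out.
    X += Step;
  }
  return X != 0;
}
```

With a non-zero start the value can only stay the same (zero step) or grow, so it never reaches zero, whatever the step is.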
2021-04-14 19:06:18 +02:00
Sander de Smalen 4f42d873c2 [TTI] NFC: Change getArithmeticInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100317
2021-04-14 17:20:36 +01:00
Sander de Smalen d84bd951a8 [TTI] NFC: Change getFPOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D100316
2021-04-14 17:20:36 +01:00
Sander de Smalen 1af35e77f4 [TTI] NFC: Change getVectorInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100315
2021-04-14 17:20:35 +01:00
Sander de Smalen 174e8f6c5e [TTI] NFC: Change getShuffleCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100314
2021-04-14 17:20:35 +01:00
Sander de Smalen 14b934f8a6 [TTI] NFC: Change getCFInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D100313
2021-04-14 17:20:34 +01:00
Sander de Smalen 596f669cfb [TTI] NFC: Change getCallInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D100312
2021-04-14 17:20:34 +01:00
Sanjay Patel 7ef2c68a3d [InstSimplify] improve efficiency for detecting non-zero value
Stepping through callstacks in the example from D99759 reveals
this potential compile-time improvement.

The savings come from avoiding ValueTracking's computing known
bits if we have already dealt with special-case patterns.

Further improvements in this direction seem possible.

This makes a degenerate test based on PR49785 about 40x faster
(25 sec -> 0.6 sec), but it does not address the larger question
of how to limit computeKnownBitsFromAssume(). I.e., the original
test there is still infinite-time for all practical purposes.

Differential Revision: https://reviews.llvm.org/D100408
2021-04-14 09:04:15 -04:00
Sanjay Patel 5ae5d25e38 [ValueTracking] match negative-stepping non-zero recurrence
This is pulled out of D100408.

This avoids a regression that would be exposed by making the
calling code from InstSimplify more efficient.
2021-04-14 08:57:53 -04:00
Sanjay Patel 4919365397 [ValueTracking] reduce code duplication; NFC
The start value can't be null for something to be a non-zero
recurrence, so hoist that common check out of the switch.

Subsequent checks may be incomplete or over-specified as noted in:
D100408
2021-04-14 08:32:42 -04:00
Philip Reames 00c8be3f93 fix whitespace type 2021-04-13 19:02:41 -07:00
Nikita Popov faf9f11589 [SCEV] Don't walk uses of phis without SCEV expression when forgetting
I've run into some cases where a large fraction of compile-time is
spent invalidating SCEV. One of the causes is forgetLoop(), which
walks all values that are def-use reachable from the loop header
phis. When invalidating a topmost loop, that might be close to all
values in a function. Additionally, it's fairly common for there to
not actually be anything to invalidate, but we'll still be performing
this walk again and again.

My first thought was that we don't need to continue walking the uses
if the current value doesn't have a SCEV expression. However, this
isn't quite right, because SCEV construction can skip over values
(e.g. for a chain of adds, we might only create a SCEV expression
for the final value).

What this patch does instead is to only walk the (full) def-use chain
of loop phis that have a SCEV expression. If there's no expression
for a phi, then we also don't have any dependent expressions to
invalidate.

Differential Revision: https://reviews.llvm.org/D100264
2021-04-13 20:28:17 +02:00
Sander de Smalen 03f47bdcb1 [TTI] NFC: Change get[Interleaved]MemoryOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100205
2021-04-13 14:21:02 +01:00
Sander de Smalen d676b5749d [TTI] NFC: Change getMaskedMemoryOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100204
2021-04-13 14:21:01 +01:00
Sander de Smalen db134e2428 [TTI] NFC: Change getCmpSelInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100203
2021-04-13 14:21:01 +01:00
Sander de Smalen 2285dfb73f [TTI] NFC: Change getMinMaxReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100202
2021-04-13 14:21:00 +01:00
Sander de Smalen bd86824d98 [TTI] NFC: Change getArithmeticReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

This patch is practically NFC, with the exception of an AArch64 SVE related
cost-model change, where we can now return an Invalid cost instead of some
bogus number.
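
To illustrate what an explicit Invalid state buys over a bogus number, here is a simplified model of a cost type. `SketchCost` is a hypothetical class written for this note; it is not the LLVM `InstructionCost` implementation and omits most of its interface.

```cpp
#include <cassert>
#include <cstdint>

// Simplified cost value that is either a valid number or explicitly invalid.
class SketchCost {
  int64_t Value = 0;
  bool Valid = true;

public:
  SketchCost(int64_t V) : Value(V) {}

  static SketchCost getInvalid() {
    SketchCost C(0);
    C.Valid = false;
    return C;
  }

  bool isValid() const { return Valid; }

  // Only meaningful for valid costs.
  int64_t getValue() const {
    assert(Valid && "querying the value of an invalid cost");
    return Value;
  }

  // Invalidity propagates through arithmetic instead of producing a
  // misleading number.
  SketchCost operator+(const SketchCost &RHS) const {
    if (!Valid || !RHS.Valid)
      return getInvalid();
    return SketchCost(Value + RHS.Value);
  }
};
```

A caller that sums per-instruction costs can then test `isValid()` once at the end rather than guarding against sentinel values at every step.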

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100201
2021-04-13 14:20:59 +01:00
Sander de Smalen fd1f8a5462 [TTI] NFC: Change getGatherScatterOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100200
2021-04-13 14:20:59 +01:00
Sander de Smalen 92d8421f49 [TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100199
2021-04-13 14:20:58 +01:00
Gulfem Savrun Yeniceri e96df3e531 [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-04-13 01:29:41 +00:00
Yuanfang Chen c5fda0e662 Reland "Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom""
This reverts commit a3fabc79ae (relands
f4d682d6ce with fix for the compile-time
regression issue).
2021-04-12 14:50:54 -07:00
Nikita Popov a3fabc79ae Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom"
This reverts commit f4d682d6ce.

This caused a significant compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=4b7bad9eaea2233521a94f6b096aaa88dc584e23&to=f4d682d6ce6c5b3a41a0acf297507c82f5c21eef&stat=instructions

Possibly this is due to overeager parsing of target triples.
2021-04-12 22:55:59 +02:00
Arthur Eubanks 269b335bd7 [Inliner] Propagate SROA analysis through invariant group intrinsics
SROA can handle invariant group intrinsics, let the inliner know that
for better heuristics when the intrinsics are present.

This fixes size issues in a couple files when turning on
-fstrict-vtable-pointers in Chrome.

Reviewed By: rnk, mtrofin

Differential Revision: https://reviews.llvm.org/D100249
2021-04-12 10:54:22 -07:00
Hamza Sood 0a92aff721 Replace uses of std::iterator with explicit using
This patch removes all uses of `std::iterator`, which was deprecated in C++17.
While this isn't currently an issue while compiling LLVM, it's useful for those using LLVM as a library.

For some reason there're a few places that were seemingly able to use `std` functions unqualified, which no longer works after this patch. I've updated those places, but I'm not really sure why it worked in the first place.
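
The replacement pattern looks roughly like the following. `CountingIterator` is a hypothetical example type, not one of the iterators touched by the patch; the five member aliases are what `std::iterator` used to supply through inheritance.

```cpp
#include <cstddef>
#include <iterator>

// A small forward iterator that declares its traits with explicit aliases
// instead of deriving from the deprecated std::iterator.
class CountingIterator {
  int Value = 0;

public:
  using iterator_category = std::forward_iterator_tag;
  using value_type = int;
  using difference_type = std::ptrdiff_t;
  using pointer = const int *;
  using reference = const int &;

  CountingIterator() = default;
  explicit CountingIterator(int V) : Value(V) {}

  reference operator*() const { return Value; }
  CountingIterator &operator++() {
    ++Value;
    return *this;
  }
  CountingIterator operator++(int) {
    CountingIterator Tmp = *this;
    ++Value;
    return Tmp;
  }
  bool operator==(const CountingIterator &O) const { return Value == O.Value; }
  bool operator!=(const CountingIterator &O) const { return Value != O.Value; }
};
```

One plausible explanation for the unqualified calls mentioned above: deriving from `std::iterator` made `std` an associated namespace for argument-dependent lookup, which no longer applies once the base class is gone.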

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D67586
2021-04-12 10:47:14 -07:00
Yuanfang Chen f4d682d6ce [InstCombine] when calling conventions are compatible, don't convert the call to undef idiom
D24453 enabled libcalls simplification for ARM PCS. This may cause
caller/callee calling conventions mismatch in some situations such as
LTO. This patch makes instcombine aware that the compatible calling
conventions differences are benign (not emitting undef idiom).

Differential Revision: https://reviews.llvm.org/D99773
2021-04-12 09:32:23 -07:00
Roman Lebedev 6d44b3c56d [NFCI][DomTreeUpdater] applyUpdates(): reserve space for updates first
While, indeed, we may end up pushing fewer updates than we'd reserve space
for, self-dominating updates aren't frequent enough for that to matter.
But this should matter for normal updates.
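
The pattern amounts to something like the sketch below. `Update` and `queueUpdates` are hypothetical names, and edges are modeled as plain (From, To) integer pairs rather than the real update type.

```cpp
#include <utility>
#include <vector>

// Hypothetical update record: an edge between two block ids (From, To).
using Update = std::pair<int, int>;

// Reserves room for every incoming update up front, then filters out
// self-dominating ones (From == To). The reservation is an upper bound, so a
// few slots may go unused when such updates are present.
std::vector<Update> queueUpdates(const std::vector<Update> &Incoming) {
  std::vector<Update> Pending;
  Pending.reserve(Incoming.size());
  for (const Update &U : Incoming) {
    if (U.first == U.second)
      continue;
    Pending.push_back(U);
  }
  return Pending;
}
```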
2021-04-11 23:56:22 +03:00
Roman Lebedev 9829f5e6b1 [CVP] @llvm.[us]{min,max}() intrinsics handling
If we can tell that either one of the arguments is taken,
bypass the intrinsic.

Notably, we are indeed fine with non-strict predicate:
* UL: https://alive2.llvm.org/ce/z/69qVW9 https://alive2.llvm.org/ce/z/kNFTKf
      https://alive2.llvm.org/ce/z/AvaPw2 https://alive2.llvm.org/ce/z/oxo53i
* UG: https://alive2.llvm.org/ce/z/wxHeGH https://alive2.llvm.org/ce/z/Lf76qx
* SL: https://alive2.llvm.org/ce/z/hkeTGS https://alive2.llvm.org/ce/z/eR_b-W
* SG: https://alive2.llvm.org/ce/z/wEqRm7 https://alive2.llvm.org/ce/z/FpAsVr
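
The bypass for the unsigned-min case can be sketched with closed intervals. This is an illustration under simplifying assumptions (no wrapping or empty ranges), and `Interval`, `Taken`, and `uminOperandTaken` are hypothetical names rather than CVP code.

```cpp
#include <cstdint>

// Closed unsigned interval [Lo, Hi] of values an operand may take.
struct Interval {
  uint32_t Lo;
  uint32_t Hi;
};

enum class Taken { LHS, RHS, Unknown };

// Decides which operand umin(A, B) always returns, if that can be proven
// from the ranges alone. The comparison is non-strict: when the operands are
// equal, returning either one is correct.
Taken uminOperandTaken(Interval A, Interval B) {
  if (A.Hi <= B.Lo)
    return Taken::LHS;
  if (B.Hi <= A.Lo)
    return Taken::RHS;
  return Taken::Unknown;
}
```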

Much like with all other comparison handling in CVP,
while we could sort-of handle two Value's,
at least for plain ICmpInst it does not appear to be worthwhile.

This only fires 78 times on test-suite + dt + rs,
but we don't canonicalize to these yet. (only SCEV produces them)
2021-04-11 00:33:47 +03:00
Nikita Popov 8de2f1ff79 [IVUsers] Check LoopSimplify cache earlier (NFC)
Check the cache before calling isLoopSimplifyForm(). Otherwise we'd
always perform the check for the innermost loop and only skip it
for dominating loops.
2021-04-10 22:58:13 +02:00
Roman Lebedev e8c7f43e2c [NFC][ConstantRange] Add 'icmp' helper method
"Does the predicate hold between two ranges?"

Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm; clean them all up.
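
A sketch of the question such a helper answers, assuming non-wrapping, non-empty closed unsigned intervals. `URange`, `Pred`, and `icmpHolds` are hypothetical names; the real ConstantRange also handles wrapped, empty, and full sets as well as signed predicates.

```cpp
#include <cstdint>

// Closed unsigned interval [Lo, Hi].
struct URange {
  uint32_t Lo;
  uint32_t Hi;
};

enum class Pred { EQ, NE, ULT, ULE, UGT, UGE };

// Returns true iff the predicate holds for every pair (a, b) with a in A and
// b in B.
bool icmpHolds(Pred P, URange A, URange B) {
  switch (P) {
  case Pred::EQ:
    return A.Lo == A.Hi && B.Lo == B.Hi && A.Lo == B.Lo;
  case Pred::NE:
    return A.Hi < B.Lo || B.Hi < A.Lo;
  case Pred::ULT:
    return A.Hi < B.Lo;
  case Pred::ULE:
    return A.Hi <= B.Lo;
  case Pred::UGT:
    return A.Lo > B.Hi;
  case Pred::UGE:
    return A.Lo >= B.Hi;
  }
  return false;
}
```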
2021-04-10 19:38:55 +03:00
Roman Lebedev 7b12c8c59d Revert "[NFC][ConstantRange] Add 'icmp' helper method"
This reverts commit 17cf2c9423.
2021-04-10 19:37:53 +03:00
Roman Lebedev 17cf2c9423 [NFC][ConstantRange] Add 'icmp' helper method
"Does the predicate hold between two ranges?"

Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm; clean them all up.
2021-04-10 19:09:52 +03:00
dfukalov 8f4b7e94a2 [AMDGPU][CostModel] Refine cost model for control-flow instructions.
Added cost estimation for switch instruction, updated costs of branches, fixed
phi cost.
Had to increase `-amdgpu-unroll-threshold-if` default value since conditional
branch cost (size) was corrected to higher value.
Test renamed to "control-flow.ll".

Removed redundant code in `X86TTIImpl::getCFInstrCost()` and
`PPCTTIImpl::getCFInstrCost()`.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D96805
2021-04-10 09:20:24 +03:00
Roman Lebedev 077bff39d4
[Analysis] isDereferenceableAndAlignedPointer(): recurse into select's hands
By doing this within the method itself,
we support traversing multiple levels of selects (TODO: PHI's),
fixing the SROA `std::clamp()` testcase.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47271
Mostly fixes https://bugs.llvm.org/show_bug.cgi?id=49909
2021-04-10 00:56:28 +03:00
Alina Sbirlea f6bff8d157 [MSSA] Rename uses in IDF regardless of new def position in basic block.
When inserting a new def and renaming of uses is asked, always compute
IDF and do the renaming for the blocks with Phis in that IDF.
Resolves PR49859.

Differential Revision: https://reviews.llvm.org/D100163
2021-04-09 12:32:37 -07:00
Momchil Velikov acf3279a03 For non-null pointer checks, do not descend through out-of-bounds GEPs
In LazyValueInfoImpl::isNonNullAtEndOfBlock we populate a set of
pointers, known to be non-null at the end of a block (e.g. because we
did a load through them). We then infer that any pointer, based on an
element of this set is non-null as well ("based" here meaning a
non-null pointer is the underlying object). This is incorrect: even if
the base pointer is non-null, the value of a GEP that lacks the
`inbounds` attribute may be null.

This issue appeared as miscompilation of the following test case:

int puts(const char *);

typedef struct iter {
  int *val;
} iter_t;

static long distance(iter_t first, iter_t last) {
  long r = 0;
  for (; first.val != last.val; first.val++)
    ++r;
  return r;
}

int main() {
  int arr[2] = {0};
  iter_t i, j;
  i.val = arr;
  j.val = arr + 1;
  if (distance(i, j) >= 2)
    puts("failed");
  else
    puts("passed");
}

This fixes PR49662.

Differential Revision: https://reviews.llvm.org/D99642
2021-04-09 14:09:23 +01:00
dfukalov c1a88e007b [AA][NFC] Convert AliasResult to class containing offset for PartialAlias case.
Add the ability to store the `Offset` between partially aliased locations. Use this
storage within the returned `AliasResult` instead of caching it in `AAQueryInfo`.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98718
2021-04-09 13:26:09 +03:00
dfukalov d066079728 [NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset.
The main reason is to prepare for transforming AliasResult into a class that contains
the offset for the PartialAlias case.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98027
2021-04-09 12:54:22 +03:00
Arthur Eubanks c5d1ccbcdf [GVN] Properly invalidate ICF cache when we simplify a value
This fixes a "Cached first special instruction is wrong!" assert.

The assert fires because replacing a value with another can cause an
instruction to no longer be "special" to ICF. In this case,
devirtualization happened, turning an indirect call into a
call to a willreturn function, which is no longer special.

Reviewed By: nikic, rnk

Differential Revision: https://reviews.llvm.org/D99977
2021-04-08 14:01:57 -07:00
Stanislav Mekhanoshin 5f0ac1ef78 Set IgnoreLLVMUsed to false in CallGraph::addToCallGraph()
clang++ uses llvm.compiler.used in certain cases to preserve
symbol which is fully inlined. D96087 has resulted in undefined
symbols in such cases. Set it to false by default to preserve
old behavior but keep the option for specific uses where we
want to ignore these (e.g. to detect a potential indirect call
to a function).

Differential Revision: https://reviews.llvm.org/D99897
2021-04-08 11:14:09 -07:00
Max Kazantsev fee330824a [SCEV] Fix false-positive recognition of simple recurrences. PR49856
A value from a reachable block may come to a Phi node as its input from an
unreachable block. This may confuse matchSimpleRecurrence, which
has no access to DomTree and can falsely recognize something as a recurrence
because of this effect, as the attached test shows.

Patch `ae7b1e` deals with half of this problem, but it only accounts for
the case when an unreachable instruction comes to a Phi as an input.

This patch provides a generalization by checking that no Phi block's
predecessor is unreachable (no matter what the input is).

Differential Revision: https://reviews.llvm.org/D99929
Reviewed By: reames
2021-04-07 13:55:17 +07:00
Philip Reames 908215b346 Use AssumeInst in a few more places [nfc]
Follow up to a6d2a8d6f5.  These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.
2021-04-06 13:18:53 -07:00
Philip Reames 9ef6aa020b Plumb AssumeInst through operand bundle apis [nfc]
Follow up to a6d2a8d6f5.  This covers all the public interfaces of the bundle related code.  I tried to clean up the internals where the changes were obvious, but there's definitely more room for improvement.
2021-04-06 12:53:53 -07:00
Philip Reames a6d2a8d6f5 Add a subclass of IntrinsicInst for llvm.assume [nfc]
Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class.  A follow up change will do the same for the newer assumption query/bundle mechanisms.
2021-04-06 11:16:22 -07:00
Florian Hahn 4059c1c32d [SimplifyInst] Use correct type for GEPs with vector indices.
The current code does not properly handle vector indices unless they are
the first index.

At the moment LangRef gives the impression that the vector index must be
the one and only index (https://llvm.org/docs/LangRef.html#getelementptr-instruction).

But vector indices can appear at any position and according to the
verifier there may be multiple vector indices. If that's the case, the
number of elements must match.

This patch updates SimplifyGEPInst to properly handle those additional
cases.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99961
2021-04-06 17:56:10 +01:00
Philip Reames 21d4839948 Move GCRelocateInst and GCResultInst to IntrinsicInst.h [nfc]
These two are part of the IntrinsicInst class hierarchy and it helps to cut down on some redundant includes.
2021-04-06 08:33:15 -07:00
Kerry McLaughlin 7344f3d39a [LoopVectorize] Add strict in-order reduction support for fixed-width vectorization
Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to
reorder FP operations. However, it may still be beneficial to vectorize the loop by moving
the reduction inside the vectorized loop and making sure that the scalar reduction value
is an input to the horizontal reduction, e.g.:

  %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ]
  %load = load <8 x float>
  %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load)

This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added
by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions.
For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D98435
2021-04-06 14:45:34 +01:00
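Why reordering needs fast math, and what "in-order" means in the entry above, can be shown with a small scalar sketch. It is only an illustration in plain C++, not vectorizer code; `orderedSum` and `reassociatedSum` are hypothetical names, and the sketch assumes the input length is a multiple of the lane count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strict (in-order) reduction: adds the elements left to right, exactly as
// the original scalar loop would.
float orderedSum(const std::vector<float> &V, float Start) {
  float Acc = Start;
  for (float X : V)
    Acc += X;
  return Acc;
}

// Reassociated reduction: keeps one partial sum per lane and combines the
// lanes at the end. This changes the order of the additions, so it is only
// valid when reassociation of FP operations is allowed.
float reassociatedSum(const std::vector<float> &V, float Start,
                      std::size_t Lanes) {
  assert(Lanes > 0 && V.size() % Lanes == 0 && "sketch assumes whole vectors");
  std::vector<float> Partial(Lanes, 0.0f);
  for (std::size_t I = 0; I < V.size(); I += Lanes)
    for (std::size_t L = 0; L < Lanes; ++L)
      Partial[L] += V[I + L];
  float Acc = Start;
  for (float P : Partial)
    Acc += P;
  return Acc;
}
```

With the input `{1e8f, 1.0f, -1e8f, 1.0f}` and two lanes, the two functions give different results on a target that evaluates `float` arithmetic in IEEE single precision: the in-order sum absorbs the first `1.0f` into `1e8f`, while the per-lane sum cancels the large values first.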
Kerry McLaughlin 857b8a73da [LoopVectorize] Change the identity element for FAdd
Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd.

Reviewed By: dmgreen, spatel

Differential Revision: https://reviews.llvm.org/D98963
2021-04-06 12:13:43 +01:00
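The choice of -0.0 in the entry above follows from IEEE 754 addition under the default rounding mode: -0.0 + x returns x for every non-NaN x, including both zeros, whereas +0.0 + (-0.0) yields +0.0 and so loses the sign. A minimal check, where `isNeutralFor` is a hypothetical helper and not an LLVM function:

```cpp
#include <cassert>
#include <cmath>

// Returns true if Identity + X gives back X exactly, including the sign of
// a zero result. Assumes the default round-to-nearest mode.
bool isNeutralFor(float Identity, float X) {
  assert(!std::isnan(X) && "NaN never compares equal, so it is excluded");
  float Sum = Identity + X;
  return Sum == X && std::signbit(Sum) == std::signbit(X);
}
```

Here `isNeutralFor(-0.0f, -0.0f)` and `isNeutralFor(-0.0f, 0.0f)` both hold, while `isNeutralFor(0.0f, -0.0f)` does not.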
Simon Pilgrim ddbb58736a [KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI.
As promised in D98866
2021-04-06 10:11:41 +01:00
Philip Reames 58ccbd0d08 Comment adjustments for a rename 2021-04-05 21:07:42 -07:00
Philip Reames 13deb6aac7 Exact ashr/lshr don't lose any set bits and are thus trivially invertible
Use that fact to improve isKnownNonEqual.
2021-04-05 19:22:36 -07:00
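The invertibility claim in the entry above can be sketched for the logical-shift case on plain 32-bit integers. The helper names are hypothetical and this is not the ValueTracking code: if a right shift drops no set bits, shifting left by the same amount recovers the input, so two different inputs shifted exactly by the same amount must give different results.

```cpp
#include <cassert>
#include <cstdint>

// True if shifting X right by Amt drops no set bits, i.e. the shift would
// qualify as 'exact'.
bool isExactLShr(uint32_t X, unsigned Amt) {
  assert(Amt < 32 && "shift amount must be smaller than the bit width");
  return ((X >> Amt) << Amt) == X;
}

// Inverse of an exact logical right shift: shifts the bits back into place.
// Only recovers the original value if the right shift was exact.
uint32_t undoExactLShr(uint32_t Shifted, unsigned Amt) {
  assert(Amt < 32 && "shift amount must be smaller than the bit width");
  return Shifted << Amt;
}
```

By contrast, an inexact shift is not injective: `0x50 >> 4` and `0x51 >> 4` are equal even though the inputs differ, so no such conclusion can be drawn without the exact flag.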
Philip Reames dc8d864e3a Address minor post commit feedback on 0e59dd 2021-04-05 18:22:17 -07:00
Sanjay Patel e2a0f512ea [InstSimplify] fix potential miscompile in select value equivalence
This is the sibling fix to c590a9880d -
as there, we can't substitute a vector value because the equality
compare replacement that we are trying requires that the
comparison is true for the entire value. Vector select
can be partly true/false.
2021-04-05 16:52:34 -04:00
Philip Reames b0e59dd6e1 Extract a helper for figuring out if an operator is invertible [nfc]
For use in an upcoming patch.  Left out the phi case (which could otherwise fit in this framework) as it would cause infinite recursion in said patch.  We can probably also leverage this in instcombine to ensure we keep the two sets of related analysis and transforms in sync.
2021-04-05 12:14:21 -07:00
Nikita Popov 72e0846ef8 [LVI] Don't bail on overdefined value in select
Even if one of the operands is overdefined, we may still produce
a non-overdefined result, e.g. due to a min/max operation. This
matches our handling elsewhere, e.g. for binary operators.

The slot poisoning comment refers to a much older LVI cache
implementation.
2021-04-04 11:11:01 +02:00
Mircea Trofin b32e76c6d5 [mlgo] fix build rules
This was prompted by D95727, which had the side-effect to break the
'release' mode build bot for ML-driven policies. The problem is that now
the pre-compiled object files don't get transitively carried through as
'source' anymore; that being said, the previous way of consuming them
was problematic, because it was only working for static builds; in
dynamic builds, the whole tf_xla_runtime was linked, which is
undesirable.

The alternative is to treat tf_xla_runtime as an archive, which then
leads to the desired effect.

Differential Revision: https://reviews.llvm.org/D99829
2021-04-03 12:49:03 -07:00
Nikita Popov b552e16b0b [Loads] Forward constant vector store to load of first element
InstCombine performs simple forwarding from stores to loads, but
currently only handles the case where the load and store have the
same size. This extends it to also handle a store of a constant
with a larger size followed by a load with a smaller size.

This is implemented through ConstantFoldLoadThroughBitcast() which
is fairly primitive (e.g. does not allow storing a large integer
and then loading a small one), but at least can forward the first
element of a vector store. Unfortunately it seems that we currently
don't have a generic helper for "read a constant value as a different
type", it's all tangled up with other logic in either
ConstantFolding or VNCoercion.

Differential Revision: https://reviews.llvm.org/D98114
2021-04-03 12:10:31 +02:00
Nikita Popov 9d20eaf9c0 [BasicAA] Don't store AATags in cache key (NFC)
The AAMDNodes part of the MemoryLocation is not used by the BasicAA
cache, so don't store it. This reduces the size of each cache entry
from 112 bytes to 48 bytes.
2021-04-03 11:32:01 +02:00
Nikita Popov 17b4e5d456 [BasicAA] Don't pass through AA metadata (NFCI)
BasicAA itself doesn't make use of AA metadata, but passes it
through to recursive queries and makes it part of the cache key.
Aliasing decisions that are based on AA metadata (i.e. TBAA and
ScopedAA) are based *only* on AA metadata, so checking them with
different pointer values or sizes is not useful, the result will
always be the same.

While this change is a mild compile-time improvement by itself,
the actual goal here is to reduce the size of AA cache keys in
a followup change.

Differential Revision: https://reviews.llvm.org/D90098
2021-04-03 11:21:50 +02:00
Simon Pilgrim 4ea5475a3f [KnownBits] Add KnownBits::haveNoCommonBitsSet helper. NFCI.
Include exhaustive test coverage.
2021-04-02 21:44:33 +01:00
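The idea behind a "no common bits set" query, as named in the entry above, can be sketched on an 8-bit model. `Known8` is a hypothetical stand-in for KnownBits, not the LLVM class: two values cannot share a set bit if, at every bit position, at least one of them is known to be zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit stand-in for a known-bits record.
struct Known8 {
  uint8_t Zero; // bits known to be 0
  uint8_t One;  // bits known to be 1
};

// True if, for every bit position, at least one side is known to be zero,
// so L & R is provably 0 for all values matching the known bits.
bool haveNoCommonBitsSet(const Known8 &L, const Known8 &R) {
  assert((L.Zero & L.One) == 0 && (R.Zero & R.One) == 0 &&
         "a bit cannot be known zero and known one at once");
  return static_cast<uint8_t>(L.Zero | R.Zero) == 0xFF;
}
```

A value with its high nibble known zero and a value with its low nibble known zero satisfy the query; pairing either with a fully unknown value does not.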
Nikita Popov 4a3e006830 [LVI] Use range metadata on intrinsics
If we don't know how to handle an intrinsic, we should still
make use of normal call range metadata.
2021-04-02 16:45:31 +02:00
Sander de Smalen 0f7bbbc481 Always emit error for wrong interfaces to scalable vectors, unless cmdline flag is passed.
In order to bring up scalable vector support in LLVM incrementally,
we introduced behaviour to emit a warning, instead of an error, when
asking the wrong question of a scalable vector, like asking for the
fixed number of elements.

This patch puts that behaviour under a flag. The default behaviour is
that the compiler will always error, which means that all LLVM unit
tests and regression tests will now fail when a code-path is taken that
still uses the wrong interface.

The behaviour to demote an error to a warning can be individually enabled
for tools that want to support experimental use of scalable vectors.
This patch enables that behaviour when driving compilation from Clang.
This means that for users who want to try out scalable-vector support,
fixed-width codegen support, or build user-code with scalable vector
intrinsics, Clang will not crash and burn when the compiler encounters
such a case.

This allows us to do away with the following pattern in many of the SVE tests:
  RUN: .... 2>%t
  RUN: cat %t | FileCheck --check-prefix=WARN
  WARN-NOT: warning: ...

The behaviour to emit warnings is only temporary and we expect this flag
to be removed in the future when scalable vector support is more stable.

This patch also fixes the following tests:
 unittests:
   ScalableVectorMVTsTest.SizeQueries
   SelectionDAGAddressAnalysisTest.unknownSizeFrameObjects
   AArch64SelectionDAGTest.computeKnownBitsSVE_ZERO_EXTEND_VECTOR_INREG

 regression tests:
   Transforms/InstCombine/vscale_gep.ll

Reviewed By: paulwalker-arm, ctetreau

Differential Revision: https://reviews.llvm.org/D98856
2021-04-02 10:55:22 +01:00
Philip Reames db357891f0 Infer dereferenceability from malloc and friends
Hook up TLI when inferring object size from allocation calls. This allows the analysis to prove dereferenceability for known allocation functions (such as malloc/new/etc) in addition to those marked explicitly with the allocsize attribute.

This is a follow up to 0129cd5 now that the bug fixed by e2c6621e6 is resolved.

As noted in the test, this relies on being able to prove that there is no free between allocation and context (e.g. hoist location). At the moment, this is handled conservatively. I'm working on strengthening our ability to reason about no-free regions separately.

Differential Revision: https://reviews.llvm.org/D99737
2021-04-01 11:33:35 -07:00
Philip Reames ffa15e9463 Extract isVolatile helper on Instruction [NFCI]
We have this logic duplicated in several cases, none of which were exhaustive.  Consolidate it in one place.

I don't believe this actually impacts behavior of the callers.  I think they all filter their inputs such that their partial implementations were correct.  If not, this might be fixing a corner-case bug.
2021-04-01 11:24:02 -07:00
Philip Reames e2c6621e63 [deref-at-point] restrict inference of dereferenceability based on allocsize attribute
Support deriving dereferenceability facts from allocation sites with known object sizes while correctly accounting for any possible frees between allocation and use site. (At the moment, we're conservative and only allow it in functions where we know we can't free.)

This is part of the work on deref-at-point semantics. I'm making the change unconditional as the miscompile in this case is way too easy to trip by accident, and the optimization was only recently added (by me).

There will be a follow up patch wiring through TLI since that should now be doable without introducing widespread miscompiles.

Differential Revision: https://reviews.llvm.org/D95815
2021-04-01 08:34:40 -07:00
Philip Reames 4af4828a6e [ValueTracking] Handle non-zero ashr/lshr recurrences
If we know we don't shift out bits (e.g. exact), all we need to know is that input is non-zero.
2021-03-31 16:48:32 -07:00