Commit Graph

11781 Commits

Author SHA1 Message Date
Kazu Hirata 7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Kazu Hirata d66cbc565a Don't use Optional::hasValue (NFC) 2022-06-20 20:26:05 -07:00
Kazu Hirata 0916d96d12 Don't use Optional::hasValue (NFC) 2022-06-20 20:17:57 -07:00
Florian Hahn 6dd772d348
[ConstraintElimination] Move logic to get a constraint to helper (NFC). 2022-06-20 21:34:07 +02:00
Kazu Hirata e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Florian Hahn cebe7ae881
[ConstraintElimination] Move logic to add constraint to helper (NFC). 2022-06-20 17:08:35 +02:00
Florian Hahn bd9632afd2
[ConstraintElimination] Move StackEntry up, to allow use earlier (NFC). 2022-06-20 16:40:42 +02:00
Kazu Hirata 129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00
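The Optional cleanups above swap the deprecated hasValue()/getValue()/getValueOr() spellings for the std::optional-style API. A minimal sketch of the before/after naming, written here against std::optional (an assumption for illustration; the commits themselves touch llvm::Optional, which exposes the same new spellings):

```
#include <optional>

// Old accessors -> new ones: hasValue() -> has_value(),
// getValue() -> value() or operator*, getValueOr(x) -> value_or(x).
int widthOrDefault(std::optional<int> W) {
  if (W.has_value())      // was: W.hasValue()
    return *W;            // was: W.getValue()
  return W.value_or(8);   // was: W.getValueOr(8)
}
```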
Kazu Hirata b254d67160 [llvm] Call *set::insert without checking membership first (NFC) 2022-06-18 08:32:54 -07:00
Florian Hahn 7c0089d735
[Matrix] Check if iterator is at beginning of BB in optimizeTranspose.
If an instruction at the beginning of a block is erased,  this may
trigger crash due to dereferencing an invalid iterator.

Check if II is at the end before dereferencing it.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D127736
2022-06-14 21:37:02 +01:00
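A hedged sketch of the kind of guard the Matrix fix above describes; the helper and its name are illustrative only, not the actual LowerMatrixIntrinsics change, but BasicBlock::iterator behaves as shown:

```
#include <iterator>
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// After erasing instructions, confirm the iterator still points at a valid
// instruction before dereferencing or stepping backwards from it.
Instruction *prevInstOrNull(BasicBlock &BB, BasicBlock::iterator II) {
  if (II == BB.begin() || II == BB.end())
    return nullptr;         // nothing safe to dereference
  return &*std::prev(II);   // safe: II is neither begin() nor end()
}
```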
Florian Hahn 782e912246
[ConstraintElimination] Support constraints with only const ops.
Remove the early exit if both constraints contain no variables. This
restriction is unnecessary for correctness and removing it simplifies
handling of trivial constant conditions in follow-up changes.
2022-06-14 10:37:12 +01:00
Guillaume Chatelet f9bb8c24ac [NFC][Alignment] Convert MemCpyOptimizer.cpp 2022-06-13 10:07:09 +00:00
Kazu Hirata 5d7b1a5f1b [Scalar] Use llvm::append_range (NFC) 2022-06-10 23:09:01 -07:00
Guillaume Chatelet dc9c2eac98 [NFC][Alignment] Simplify code 2022-06-10 15:25:28 +00:00
Guillaume Chatelet 12ccdd67aa [NFC] Use proper getSliceAlign type in SROA 2022-06-10 12:37:41 +00:00
Philip Reames 206f10d3f6 Plumb InstructionCost through unroll costing
Teach the unroller(s) how to handle an invalid cost. This avoids crashes when the backend can't provide a cost due to either a fundamental limitation or an unimplemented cost model case.

Differential Revision: https://reviews.llvm.org/D127305
2022-06-09 15:42:53 -07:00
Philip Reames f85c5079b8 Pipe potentially invalid InstructionCost through CodeMetrics
Per the documentation in Support/InstructionCost.h, the purpose of an invalid cost is so that clients can change behavior on impossible to cost inputs. CodeMetrics was instead asserting that invalid costs never occurred.

On a target with an incomplete cost model - e.g. RISCV - this means that transformations would crash on (falsely) invalid constructs - e.g. scalable vectors. While we certainly should improve the cost model - and I plan to do so in the near future - we also shouldn't be crashing. This violates the explicitly stated purpose of an invalid InstructionCost.

I updated all of the "easy" consumers where bailouts were locally obvious. I plan to follow up with loop unroll in a following change.

Differential Revision: https://reviews.llvm.org/D127131
2022-06-09 15:17:24 -07:00
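Both InstructionCost commits above amount to the same bail-out pattern: treat an invalid cost as "do not transform" instead of asserting. A minimal sketch using llvm::InstructionCost (isValid() and the comparison operators are part of that class; the helper itself is hypothetical):

```
#include "llvm/Support/InstructionCost.h"
using namespace llvm;

// Hypothetical profitability check: bail out instead of crashing when the
// backend reports an invalid (uncostable) instruction.
bool isUnrollProfitable(InstructionCost LoopCost, InstructionCost Threshold) {
  if (!LoopCost.isValid())   // e.g. unimplemented cost model case
    return false;            // conservatively refuse to transform
  return LoopCost < Threshold;
}
```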
Simon Moll b8c2781ff6 [NFC] format InstructionSimplify & lowerCaseFunctionNames
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName".  This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.

This is the alternative to the less invasive clang-format only patch: D126783

Reviewed By: spatel, rengolin

Differential Revision: https://reviews.llvm.org/D126889
2022-06-09 16:10:08 +02:00
Philip Reames 89c4b29e8d [GuardWidening] Fix a nasty cast bug in c2eccc6
c2eccc6 introduced a call to setHasNoUnsignedWrap which implicitly assumes that Inst is an OverflowingBinaryOperator.  This is frequently untrue, but was not caught because cast<Ty>(X) has been broken, see https://discourse.llvm.org/t/cast-x-is-broken-implications-and-proposal-to-address/63033 for context.

I considered reverting this, but since doing so re-introduces a nasty miscompile of its own, I decided to fix forward instead.

I'll note that this is a particularly nasty form of the cast<Ty>(X) issue.  Because the cast was succeeding unexpectedly, we were writing data to instructions which weren't OBOs.  This could result in near arbitrary data or memory corruption.  I'm a bit shocked that the sanitizers didn't find this TBH.
2022-06-07 13:27:13 -07:00
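A sketch of the fix-forward shape described above: only touch wrap flags once the instruction is known to be an OverflowingBinaryOperator. isa<>, setHasNoUnsignedWrap() and setHasNoSignedWrap() are existing LLVM APIs; the helper is illustrative, not the actual GuardWidening code:

```
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Operator.h"
using namespace llvm;

// Clearing wrap flags on something that is not an OBO writes into unrelated
// bits, which is exactly the corruption the commit describes.
void clearWrapFlagsIfPresent(Instruction *I) {
  if (isa<OverflowingBinaryOperator>(I)) {
    I->setHasNoUnsignedWrap(false);
    I->setHasNoSignedWrap(false);
  }
}
```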
Craig Topper d73684e223 [LoopFlatten] Fix crash if the inner loop trip count comes from a sext instruction.
If we look through a truncate in matchLinearIVUser, it's possible
we find a sext/zext instruction that didn't come from widening.
This will fail the MatchedItCount->getType() == InnerInductionPHI->getType()
assertion.

Fix this by checking that we did not look through a truncate already.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D127149
2022-06-07 08:21:21 -07:00
Craig Topper fdd5843572 [LoopFlatten] Replace unchecked dyn_cast with cast.
Spotted while reading through the code.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D127146
2022-06-07 08:21:00 -07:00
Kevin P. Neal a1f1bd547b [IPSCCP] Switch away from Instruction::isSafeToRemove()
In D115737 I found that I needed to teach Instruction::isSafeToRemove()
about strictfp/constrained intrinsics. It was pointed out that this is
probably the wrong function to use and that isInstructionTriviallyDead() should
be used instead. It doesn't make sense to have a "second, worse implementation".

I also believe that the Instruction class is the wrong place for this
functionality. The information about whether or not an instruction can be
removed is in the transform passes and should stay there.

Differential Revision: https://reviews.llvm.org/D118387
2022-06-06 09:24:11 -04:00
Kazu Hirata 8daf23d364 [Scalar] Use llvm::make_early_inc_range (NFC) 2022-06-05 23:53:18 -07:00
Kazu Hirata 30f19382c6 [Scalar] Remove isValidSingle (NFC)
The last use was removed on Feb 18, 2022 in commit
00ab91b70d.
2022-06-05 08:45:11 -07:00
Fangrui Song 95a134254a Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 01:07:51 -07:00
Fangrui Song d86a206f06 Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 00:31:44 -07:00
Kazu Hirata e0039b8d6a Use llvm::less_second (NFC) 2022-06-04 22:48:32 -07:00
Kazu Hirata f83a88a179 [Transforms] Use llvm::is_contained (NFC) 2022-06-04 20:48:26 -07:00
Fangrui Song 36c7d79dc4 Remove unneeded cl::ZeroOrMore for cl::opt options
Similar to 557efc9a8b.
This commit handles options where cl::ZeroOrMore is more than one line below
cl::opt.
2022-06-04 00:10:42 -07:00
Fangrui Song 557efc9a8b [llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.

Also remove cl::init(false) while touching the lines.
2022-06-03 21:59:05 -07:00
Daniil Suchkov f1940a5895 Revert "[LoopInterchange] New cost model for loop interchange"
Reverting the commit due to numerous buildbot failures.

This reverts commit 006334470d.
2022-06-03 00:52:08 +00:00
Congzhe Cao 006334470d [LoopInterchange] New cost model for loop interchange
This patch proposes to use a new cost model for loop interchange, which
is obtained from loop cache analysis.

Given a loopnest, what loop cache analysis returns is a vector of loops
[loop0, loop1, loop2, ...] where loop0 should be placed as the outermost
loop, loop1 should be placed one more level inside, and loop2 one more level
inside, etc. What loop cache analysis does is not only more comprehensive than
the current cost model, it is also a "one-shot" query which means that we only
need to query it once during the entire loop interchange pass, which is better
than the current cost model where we query it every time we check whether it is
profitable to interchange two loops. Thus complexity is reduced, especially after
D120386 where we do more interchanges to get the globally optimal loop access pattern.

Updates made to test cases are mostly minor changes and some corrections.
Test coverage for loop interchange is not reduced.

Currently we do not completely remove the legacy cost model, but keep it as a
fall-back in case the new cost model does not run successfully. This is because
currently we have some limitations in delinearization, which sometimes makes
loop cache analysis bail out. The longer term goal is to enhance delinearization
and eventually remove the legacy cost model completely.

Reviewed By: bmahjour, #loopoptwg

Differential Revision: https://reviews.llvm.org/D124926
2022-06-02 19:07:14 -04:00
eopXD 6eab5cade7 [LSR] Early exit for RateFormula when it is already losing. NFC
This patch does not affect any behavior of the current code.

The codebase implicitly assumes that `Cost::RateFormula` is only called
when the `Cost` is not in losing status, or else it may be possible
to trigger the assertion in `Cost::isValid`.

The intention here is to prevent misuse where future development
allows a `Cost` that is already a loser to call `Cost::RateFormula`: early
exit when `Cost` is already losing.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D125670
2022-06-01 21:02:40 -07:00
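A sketch of the early-exit pattern this commit adds; the real Cost class lives inside LoopStrengthReduce.cpp, so the struct below only mirrors the names from the message:

```
// Illustrative stand-in for LSR's internal Cost class.
struct Cost {
  unsigned Insns = 0;
  bool Lost = false;

  bool isLoser() const { return Lost; }

  void RateFormula(/* const Formula &F, ... */) {
    if (isLoser())
      return;   // already losing: rating further formulas changes nothing
    // ... accumulate per-formula costs, possibly marking Lost = true ...
  }
};
```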
Eli Friedman abdf0da800 [LoopIdiom] Fix bailout for aliasing in memcpy transform.
Commit dd5991cc modified the aliasing checks here to allow transforming
a memcpy where the source and destination point into the same object.
However, the change accidentally made the code skip the alias check for
other operations in the loop.

Instead of completely skipping the alias check, just skip the check for
whether the memcpy aliases itself.

Differential Revision: https://reviews.llvm.org/D126486
2022-05-31 17:24:23 -07:00
Nuno Lopes 80b3dcc045 [Support] Make report_fatal_error respect its GenCrashDiag argument so it doesn't generate a backtrace
There are a few places where we use report_fatal_error when the input is broken.
Currently, this function always crashes LLVM with an abort signal, which
then triggers the backtrace printing code.
I think this is excessive, as wrong input shouldn't give a link to
LLVM's github issue URL and tell users to file a bug report.
We shouldn't print a stack trace either.

This patch changes report_fatal_error so it uses exit() rather than
abort() when its argument GenCrashDiag=false.

Reviewed by: nikic, MaskRay, RKSimon

Differential Revision: https://reviews.llvm.org/D126550
2022-05-30 19:19:23 +01:00
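The GenCrashDiag parameter referenced above is part of the existing report_fatal_error() declaration in llvm/Support/ErrorHandling.h; the small wrapper below is only a sketch of how a pass might report broken input without the crash banner:

```
#include "llvm/ADT/Twine.h"
#include "llvm/Support/ErrorHandling.h"
using namespace llvm;

// Report a user-input error without a stack trace or bug-report URL:
// with GenCrashDiag=false the process exits instead of aborting.
[[noreturn]] void reportBrokenInput(const Twine &Msg) {
  report_fatal_error(Msg, /*GenCrashDiag=*/false);
}
```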
Nikita Popov 1721ff1dfd [GVN] Enable enable-split-backedge-in-load-pre option by default
This option was added in D89854. It prevents GVN from performing
load PRE in a loop, if doing so would require critical edge
splitting on the backedge. From the review:

> I know that GVN Load PRE negatively impacts peeling,
> loop predication, so the passes expecting that latch has
> a conditional branch.

In the PhaseOrdering test in this patch, splitting the backedge
negatively affects vectorization: After critical edge splitting,
the loop gets rotated, effectively peeling off the first loop
iteration. The effect is that the first element is handled
separately, then the bulk of the elements use a vectorized
reduction (but using unaligned, off-by-one memory accesses) and
then a tail of 15 elements is handled separately again.

It's probably worth noting that the loop load PRE from D99926 is
not affected by this change (as it does not need backedge
splitting). This is about normal load PRE that happens to occur
inside a loop.

Differential Revision: https://reviews.llvm.org/D126382
2022-05-30 09:55:58 +02:00
Max Kazantsev 503d5771b6 [JumpThreading][NFCI] Reuse existing DT instead of recomputation
This whole part with recomputation of BPI and BFI looks redundant,
and we tried to get rid of it in D124439. Unfortunately, it causes
some hard-to-reproduce failures due to an invalid analysis state.
Until this is investigated and fixed, let's try to reuse at least
part of the available analyses.

DT is available at this point, and there is no need to recompute it.

Please revert if you see it causing *any* behavior changes.
2022-05-30 12:48:10 +07:00
Florian Hahn 0776c48f9b
Recommit "[LICM] Only create load in ph when promoting load or store doesn't exec."
This reverts the revert commit ad95255b92.

The updated version also creates a load when the store may not execute.
In those cases, we still need to introduce a load in a function where
there may not have been one before, so this doesn't completely resolve
issue #51248.

Original message:

    When only a store is sunk, there is no need to create a load in the
    pre-header, as the result of the load will never get used.

    The dead load can introduce UB, if the function is marked as
    writeonly.

    Reviewed By: nikic

    Differential Revision: https://reviews.llvm.org/D123473
2022-05-29 21:57:14 +01:00
eopXD 6a84579243 [LSR][TTI][PowerPC][SystemZ][X86] Add const-ness to TTI::isLSRCostLess. NFC
Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D126350
2022-05-27 15:22:23 -07:00
Arthur Eubanks 36096c2b38 [NFC][JumpThreading] Remove InsertFreezeWhenUnfoldingSelect pass parameter
All callers pass true.

select-unfold-freeze.ll is now a subset of select.ll so delete it.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126501
2022-05-26 16:13:34 -07:00
Owen Anderson 939a43461b Revert "Replace the custom linked list in LeaderTableEntry with TinyPtrVector."
This reverts commit 1e91149844.

Pending further discussion.
2022-05-26 09:50:36 -07:00
Alex Zhikhartsev 8b0d763474 [DFAJumpThreading] Relax analysis to handle unpredictable initial values
Responding to a feature request from the Rust community:

https://github.com/rust-lang/rust/issues/80630

    void foo(X) {
      for (...)
	switch (X)
	  case A
	    X = B
	  case B
	    X = C
    }

Even though the initial switch value is non-constant, the switch
statement can still be threaded: the initial value will hit the switch
statement but the rest of the state changes will proceed by jumping
unconditionally.

The early predictability check is relaxed to allow unpredictable values
anywhere, but later, after the paths through the switch statement have
been enumerated, no non-constant state values are allowed along the
paths. Any state value not along a path will be an initial switch value,
which can be safely ignored.

Differential Revision: https://reviews.llvm.org/D124394
2022-05-26 11:29:54 -04:00
Florian Hahn f96aa493f0
[SimpleLoopUnswitch] Always skip trivial select and set condition.
When updating the branch instruction outside the loop during non-trivial
unswitching, always skip trivial selects and update the condition.

Otherwise we might create invalid IR, because the trivial select is
inside the loop, while the condition is outside the loop.

Fixes #55697.
2022-05-26 09:46:24 +01:00
Owen Anderson 1e91149844 Replace the custom linked list in LeaderTableEntry with TinyPtrVector.
The purpose of the custom linked list was to optimize for the case
of a single-element list. It turns out that TinyPtrVector handles
the same basic scenario even better, reducing the size of
LeaderTableEntry by 33%, and requiring only log2(N) allocations
as the size of the list grows. The only downside is that we have
to store the Value's and BasicBlock's in separate vectors, which
is slightly awkward in a few cases. Fortunately that ends up being
entirely encapsulated inside helper functions.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D125205
2022-05-25 23:52:44 -07:00
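TinyPtrVector is the existing ADT from llvm/ADT/TinyPtrVector.h; the struct below is a simplified sketch of the "two parallel vectors" shape the message describes, not GVN's actual LeaderTableEntry:

```
#include "llvm/ADT/TinyPtrVector.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Value.h"
using namespace llvm;

// A single element is stored inline (no heap allocation); growth costs
// O(log N) reallocations, unlike a hand-rolled singly linked list.
struct LeaderListSketch {
  TinyPtrVector<Value *> Vals;      // leader values
  TinyPtrVector<BasicBlock *> BBs;  // parallel vector of defining blocks

  void addLeader(Value *V, BasicBlock *BB) {
    Vals.push_back(V);
    BBs.push_back(BB);
  }
};
```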
Serguei Katkov c2eccc67ce [GuardWidening] Remove nuw/nsw flags for hoisted instructions
When we hoist instructions over a guard, we must clear flags because these flags
might be implied by the guard, so they only make sense after the guard.

As an example of the bug caused by the current behavior:
L is known to be in range say [0, 100)
c1 = x u< L
guard (c1)
x1 = add x, 1
c2 = x1 u< L
guard(c2)

based on guard(c1) we can say that x1 = add nuw nsw x, 1
after guard widening we get
c1 = x u< L
x1 = add nuw nsw x, 1
c2 = x1 u< L
c = and c1, c2
guard(c)

now, based on the fact that x + 1 < L and x >= 0 (because x + 1 is nuw)
we can prove that x + 1 u< L implies that x u< L, so we can just remove c1
x1 = add nuw nsw x, 1
c2 = x1 u< L
guard(c2)

But that is not correct, because the value x == -1 will pass.

Reviewed By: mkazantsev
Subscribers: llvm-commits, nikic
Differential Revision: https://reviews.llvm.org/D126354
2022-05-26 13:20:55 +07:00
Nikita Popov 6f0ca6fd23 [JumpThreading] Insert freeze when unfolding select
JumpThreading may convert selects into branch instructions,
in which case the condition needs to be frozen (as branch on
poison is immediate undefined behavior, unlike select on poison).

The necessary code for this is already in place, this just enables
the option.

Differential Revision: https://reviews.llvm.org/D125869
2022-05-21 11:24:27 +02:00
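A sketch of the freeze-before-branch step this commit enables. IRBuilder::CreateFreeze and CreateCondBr are existing APIs; the helper and its name are illustrative, not JumpThreading's actual unfolding code:

```
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Branching on a possibly-poison value is immediate UB, so freeze the former
// select condition before using it as a branch condition.
void emitBranchOnSelectCond(IRBuilder<> &B, Value *Cond,
                            BasicBlock *TrueBB, BasicBlock *FalseBB) {
  Value *Frozen = B.CreateFreeze(Cond, "cond.fr");
  B.CreateCondBr(Frozen, TrueBB, FalseBB);
}
```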
Florian Hahn 32d6ef36d6
[SimpleLoopUnswitch] Skip trivial selects during trivial unswitching.
Update the remaining places in unswitchTrivialBranch to properly skip
trivial selects.

Fixes #55526.
2022-05-19 17:01:13 +01:00
Jay Foad 6bec3e9303 [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.

The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.

Differential Revision: https://reviews.llvm.org/D125557
2022-05-19 11:23:13 +01:00
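A small sketch of what this change permits: with same-width extension now a no-op, callers no longer need the OrSelf variants (assuming the caller never actually wants to shrink, which would still require trunc):

```
#include <cassert>
#include "llvm/ADT/APInt.h"
using namespace llvm;

APInt widenTo(const APInt &V, unsigned NewWidth) {
  assert(NewWidth >= V.getBitWidth() && "shrinking needs trunc instead");
  // Previously: V.zextOrSelf(NewWidth). Now zext() itself accepts
  // NewWidth == V.getBitWidth() and returns the value unchanged.
  return V.zext(NewWidth);
}
```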
Nikita Popov c9e7049754 [JumpThreading] Look through freeze in getPredicateAt() fold
This code is valid for any icmp, so we can safely look through a
freeze when trying to find one.

A caveat here is that replaceFoldableUses() may not end up replacing
any uses in this case. It might make sense to use the freeze as the
context instruction (rather than the terminator) if there is a
freeze, to ensure that it always gets folded. This would require
some changes to how replaceFoldableUses() works though, as it
currently assumes that the value is valid at the end of the block.
2022-05-18 12:09:59 +02:00
Nikita Popov 18c70a7bd9 [JumpThreading] Simplify getPredicateAt() based folding
It's sufficient to just fold the icmp to true/false here, and then
let constant terminator folding take care of the rest.

It should be noted that while replaceFoldableUses() may not replace
all uses of the icmp, at least the use in the terminator we're
working on is always replaceable, so terminator constant folding
should be reliably enabled as a subsequent step.
2022-05-18 11:24:52 +02:00
Nikita Popov d4cdf013c7 [JumpThreading] Use common code to skip freeze (NFC)
There are multiple places that want to look through freeze, so
store condition without freeze in a separate variable.
2022-05-18 10:49:41 +02:00
Juneyoung Lee 3adcf96b4f [JumpThreading] Let ProcessImpliedCondition look into freeze instructions
This patch makes JumpThreading's ProcessImpliedCondition deal with frozen
conditions.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84941
2022-05-18 10:41:31 +09:00
Nikita Popov 9ba452b08e [JumpThreading] Don't pass DT to isGuaranteedNotToBeUndefOrPoison()
JumpThreading intentionally does not force updating of the DT
during optimization, because this may be expensive when many CFG
updates and DT calculations are interleaved.

We shouldn't be fetching the DT just for the purpose of calling
isGuaranteedNotToBeUndefOrPoison(), especially as DT availability
doesn't even show benefit in tests.
2022-05-17 11:53:49 +02:00
Dmitry Vassiliev 7759680e2f [SROA] Avoid postponing rewriting load/store by ignoring lifetime intrinsics in partition's promotability checking
This patch fixes a bug that generates unnecessary packing/unpacking structure code because of incorrect handling of lifetime intrinsics.
For example, a partition of an alloca may contain many slices:
```
Partition [0, 4):
  Slice0: [0, 4) used by: load i32 addr;
  Slice1: [0, 4) used by: store i32 v, addr;
  Slice2: [0, 16) used by lifetime.start(16, addr);
```
When SROA determines if the partition can be promoted, lifetime.start is currently treated as a whole alloca load/store, so Slice0 and Slice1 cannot be promoted at this attempt,
but the packing/unpacking code for Slice0 and Slice1 has been generated.
After rewriting the lifetime.start/end intrinsics, SROA tries again with Slice0 and Slice1 and finally promotes them, but the redundant packing/unpacking code remains in the IR.
This patch changes the promotability check to ignore lifetime intrinsics (they will be rewritten to the correct sizes later), so we can promote the real users (load/store) at the first attempt with optimal code.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D124967
2022-05-17 11:25:59 +02:00
Fangrui Song 60e5fd00cd [RS4GC] Fix -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build after D125000 2022-05-14 10:47:50 -07:00
Nikita Popov ed1cb01baf [IRBuilder] Add IsInBounds parameter to CreateGEP()
We commonly want to create either an inbounds or non-inbounds GEP
based on a boolean value, e.g. when preserving inbounds from
existing GEPs. Directly accept such a boolean in the API, rather
than requiring a ternary between CreateGEP and CreateInBoundsGEP.

This change is not entirely NFC, because we now preserve an
inbounds flag in a constant expression edge-case in InstCombine.
2022-05-13 14:30:55 +02:00
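A sketch of the new parameter in use, propagating inbounds from an existing GEP without the CreateGEP/CreateInBoundsGEP ternary; the helper is illustrative, but the boolean parameter is the one this commit adds:

```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Operator.h"
using namespace llvm;

// Rebuild Src's indexing on a new base pointer, preserving its inbounds flag.
Value *cloneGEPIndexing(IRBuilder<> &B, GEPOperator *Src, Value *NewBase,
                        ArrayRef<Value *> Idx) {
  return B.CreateGEP(Src->getSourceElementType(), NewBase, Idx, "gep",
                     /*IsInBounds=*/Src->isInBounds());
}
```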
Florian Hahn 8e6d481f3b
[ConstraintElimination] Simplify ssub(A,B) if A s>=B && B s>=0.
A first patch to use the reasoning in ConstraintElimination to simplify
sub with overflow to a regular sub, if the operation is guaranteed to
not overflow.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D125264
2022-05-13 13:19:41 +01:00
Dmitry Makogon 1da42c9f71 [RS4GC] Cache BDVs and bases along with IsKnownBase flag (NFC)
This refactors RS4GC to cache results returned by findBaseDefiningValue
and also gets rid of BaseDefiningValueResult by caching the
IsKnownBase flag for BDVs and bases.

Differential Revision: https://reviews.llvm.org/D125000
2022-05-13 14:14:17 +07:00
Craig Topper 4b36d9bde7 [CVP] Preserve exact name when converting sext->zext and ashr->lshr.
Previously we took the old name and always appended a numeric suffix.
Since we're doing a 1:1 replacement, it's clearer to keep the original
name exactly.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D125281
2022-05-10 09:13:59 -07:00
Craig Topper 7b362ddda9 [SCCP] Preserve Name when converting SExt->ZExt.
This makes the output IR more readable since we're doing a one to
one replacement.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D125280
2022-05-10 09:13:59 -07:00
Dawid Jurczak 009f6ce0ef [GVNSink] Make GVNSink resistant against self referencing instructions (PR36954)
Before this change the GVNSink pass suffered from a stack overflow while processing a self-referencing instruction in an unreachable basic block.
According to [1] and [2] it's reasonable to make the pass resistant against self-referencing instructions.
To fix the issue we skip sinking analysis when we reach an instruction coming from an unreachable block.

[1] https://groups.google.com/g/llvm-dev/c/843Tig9IzwA
[2] https://lists.llvm.org/pipermail/llvm-dev/2015-February/082629.html

Differential Revision: https://reviews.llvm.org/D113897
2022-05-10 16:06:12 +02:00
Florian Hahn 41e142fdc7
Recommit "[SimpleLoopUnswitch] Collect either logical ANDs/ORs but not both."
This reverts commit 7211d5ce07.

This version fixes a crash that caused buildbot failures with the first
version.
2022-05-09 13:49:12 +01:00
Alexander Shaposhnikov f827ee671f [Scalar][NFC] Minor cleanups in CallSiteSplitting.cpp 2022-05-06 23:03:49 +00:00
Florian Hahn 7211d5ce07
Revert "[SimpleLoopUnswitch] Collect either logical ANDs/ORs but not both."
This reverts commit db7a87ed4f.

This seems to cause a PPC buildbot failure:
https://lab.llvm.org/buildbot#builders/93/builds/8787
2022-05-06 22:38:15 +01:00
Max Kazantsev 5a08e81779 [RS4GC] Add support for 'freeze' instruction to findBaseDefiningValue
Because this instruction is a noop, we can simply go through it in
search of the base.
2022-05-06 20:46:29 +07:00
Max Kazantsev e6a7afae03 [NFC] Fix typo in assert message 2022-05-06 20:31:34 +07:00
Florian Hahn db7a87ed4f
[SimpleLoopUnswitch] Collect either logical ANDs/ORs but not both.
After D97756, collectHomogenousInstGraphLoopInvariants may collect
conditions for both logical ANDs and logical ORs in case the root is a
select that matches both logical AND & OR.

This means the function won't return invariant values of either AND/OR
chains, but both. This can result in incorrect transformations.

See llvm/test/Transforms/SimpleLoopUnswitch/trivial-unswitch-logical-and-or.ll.
Without the patch, Alive2 rejects the modified tests with:
    Source and target don't have the same return domain.

Note that this also applies to the test case added in D97756
(@test_partial_condition_unswitch_or_select). We can't unswitch on
%cond6, because the graph leading to it contains an AND and an OR.

This only fixes trivial unswitching for now, but a similar problem
likely exists with non-trivial unswitching.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D124526
2022-05-06 09:50:03 +01:00
Florian Hahn 6bd2b70877
[SimpleLoopUnswitch] Add freeze if branch execs for partial unswitching.
We cannot skip freezing the condition if the unswitched branch
executes and the condition is a chain of ANDs/ORs. For example, if we
have an AND %c1, %c2 with %c1 == undef and %c2 == 0, there would be no
branch on undef in the original code, but a branch on undef if we
unswitch %c1.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D124603
2022-05-05 09:44:07 +01:00
Dawid Jurczak 9c46a9cf61 [NFC][GVNSink] Don't pretend that iteration is over instructions when it's actually over blocks
Differential Revision: https://reviews.llvm.org/D124764
2022-05-03 17:19:40 +02:00
David Green 6f81903e89 [LV][SLP] Mark fptosi_sat as vectorizable
This adds fptosi_sat and fptoui_sat to the list of trivially
vectorizable functions, mainly so that the loop vectorizer can vectorize
the instruction. Marking them as trivially vectorizable also allows them
to be SLP vectorized, and Scalarized.

The signature of a fptosi_sat requires two type overrides
(@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only
take a single one. This patch alters hasVectorInstrinsicOverloadedScalarOpd
to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first
operand of the intrinsic as an overloaded (but not scalar) operand.

Differential Revision: https://reviews.llvm.org/D124358
2022-05-03 09:32:34 +01:00
Jonas Paulsson 304378fd09 Reapply "[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building
libcalls." (was 0f8c626). This reverts commit 14d9390.

The patch previously failed to recognize cases where the user had defined a
function alias with an identical name as that of the library
function. Module::getFunction() would then return nullptr which is what the
sanitizer discovered.

In this updated version a new function isLibFuncEmittable() has also been
introduced, which is now used instead of TLI->has() anytime a library function
is to be emitted. It additionally makes sure there is e.g. no function
alias with the same name in the module.

Reviewed By: Eli Friedman

Differential Revision: https://reviews.llvm.org/D123198
2022-05-02 19:37:00 +02:00
Florian Hahn 5387a38c38
[SimpleLoopUnswitch] Freeze individual OR/AND operands.
In some cases, it is not enough to freeze the final AND/OR operation
when chaining a number of invariant conditions together.

After creating a chain of ANDs/ORs, we assume all unswitched operands to
be either true or false. But if any of the operands is poison, the rest
of the operands could have any value after branching on the frozen
condition.

To avoid that, freeze individual operands, if needed. In some cases this
may lead to unnecessary freezes, but it seems required at least for some
cases (see trivial-unswitch-freeze-individual-conditions.ll)

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D124554
2022-05-01 20:11:05 +01:00
Florian Hahn 8b022f87b0
[SimpleLoopUnswitch] Freeze trivial conditions if needed.
Trivial unswitching can also introduce new branches on undef/poison.
Freeze the conditions if needed.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D124549
2022-04-30 19:53:36 +01:00
James Y Knight 02aa795785 Revert "[JumpThreading][NFC][CompileTime] Do not recompute BPI/BFI analyzes"
This change has caused non-reproducibility of a self-build of Clang
when using NewPM and providing profile data.

This reverts commit 35f38583d2.
2022-04-29 21:15:47 +00:00
Anna Thomas 205246cb64 [CompileTime] [Passes] Avoid computing unnecessary analyses. NFC
Similar to c515b2f39e, if there are no loops in the function as seen
through LI, we should avoid computing the remaining expensive analyses
(such as SCEV, BPI).  Reordered the analyses requests and early return
if there are no loops.

The logic of avoiding expensive analyses is applied to LoopVectorizer,
LoopLoadElimination and LoopUnrollPass, i.e. all function passes which operate
on loops.

This is an NFC with compile time improvement.

Differential Revision: https://reviews.llvm.org/D124529
2022-04-29 10:00:06 -04:00
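A sketch of the reordering this compile-time commit (and the related IRCE change it references) applies: request the cheap LoopInfo first and return before touching the expensive analyses when the function has no loops. The free function below stands in for the relevant pass's run() method:

```
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/PassManager.h"
using namespace llvm;

PreservedAnalyses runLoopPassSketch(Function &F, FunctionAnalysisManager &AM) {
  auto &LI = AM.getResult<LoopAnalysis>(F);
  if (LI.empty())
    return PreservedAnalyses::all();  // no loops: skip SCEV, BPI, ...
  // Only now request the expensive analyses, e.g.:
  // auto &SE = AM.getResult<ScalarEvolutionAnalysis>(F);
  // ... transform the loops ...
  return PreservedAnalyses::none();
}
```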
Florian Hahn fb4113ef0c
[Passes] Remove legacy LoopUnswitch pass.
The legacy LoopUnswitch pass is only used in the legacy pass manager
pipeline, which is deprecated.

The NewPM replacement is SimpleLoopUnswitch and I think it is time to
remove the legacy LoopUnswitch code.

Fixes #31000.

Reviewed By: aeubanks, Meinersbur, asbirlea

Differential Revision: https://reviews.llvm.org/D124376
2022-04-29 10:30:49 +01:00
Chris Jackson c792884589 [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]
Reland 3f2b76ec90 with the test corrected
to require x86-registered-target.

Differential Revision: https://reviews.llvm.org/D120169
2022-04-28 14:21:56 +01:00
Chris Jackson cd5f9efc4d Revert "[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]"
This reverts commit 3f2b76ec90.
2022-04-28 14:07:31 +01:00
Chris Jackson 3f2b76ec90 [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]
Reland commit 74273d575f following a fix
for a memory leak. The DVIRecoveryRecord vectors now use unique_ptr.

Differential Revision: https://reviews.llvm.org/D120169
2022-04-28 13:55:49 +01:00
Nikita Popov b9dc565147 [GVN] Encode GEPs in offset representation
When using opaque pointers, convert GEPs into offset representation
of the form P + V1 * Scale1 + V2 * Scale2 + ... + ConstantOffset.
This allows us to recognize equivalent address calculations even if
the GEPs don't use the same source element type.

This fixes an opaque pointer codegen regression seen in rustc.

Differential Revision: https://reviews.llvm.org/D124527
2022-04-28 09:32:05 +02:00
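A plain C++ illustration (not the GVN data structure) of why two GEPs with different source element types can name the same address, which the P + V1 * Scale1 + ... + ConstantOffset form makes explicit:

```
#include <cstdint>

// Both functions address element I of a buffer of 4-byte elements; in offset
// form both are Base + I*4 + 0, even though the "element types" differ.
int32_t *viaTypedGEP(int32_t *Base, int64_t I) { return Base + I; }

int32_t *viaByteGEP(int32_t *Base, int64_t I) {
  return reinterpret_cast<int32_t *>(
      reinterpret_cast<char *>(Base) + I * int64_t(sizeof(int32_t)));
}
```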
Max Kazantsev 35f38583d2 [JumpThreading][NFC][CompileTime] Do not recompute BPI/BFI analyzes
They can already be available, and even if not, DT/LI can be available.
We should not recompute them. Old PM is unchanged because it would
require changing dependencies, and we don't care enough about it.

Differential Revision: https://reviews.llvm.org/D124439
Reviewed By: nikic, aeubanks
2022-04-28 10:46:08 +07:00
Wenju He 96d3be8443 [InferAddressSpaces] Check if AS are the same in isNoopPtrIntCastPair
isNoopAddrSpaceCast expects SrcAS to be different from DestAS.
If the two AS are the same, consider the ptrtoint/inttoptr pair as a no-op cast.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D123573
2022-04-28 11:10:55 +08:00
Kirill Stoimenov 761366e6ae Revert "[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]"
This reverts commit 74273d575f.

Buildbot: https://lab.llvm.org/buildbot/#/builders/5/builds/22795
Failing with memory leak.
2022-04-27 23:11:48 +00:00
Anna Thomas c515b2f39e [IRCE] Avoid computing potentially unnecessary analyses. NFC
IRCE is a function pass that operates on loops. If there are no loops in
the function (as seen through LI), we should avoid computing the
remaining expensive analyses (such as BPI). Reordered the analyses
requests and early return if there are no loops. This is an NFC with
compile time improvement.

The same will be done in a follow-up patch for the loop vectorizer.

Reviewed-By: nikic
Differential Revision: https://reviews.llvm.org/D124478
2022-04-27 09:22:10 -04:00
Chris Jackson 74273d575f [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]
This relands commit 8f550368b1.

The test is amended with REQUIRES: x86-registered-target, in line with
the other debuginfo-scev-salvage tests.

Differential Revision: https://reviews.llvm.org/D120169
2022-04-27 13:10:30 +01:00
Chris Jackson 855752e563 Revert [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics[2/2]
This reverts commit 8f550368b1.
2022-04-27 13:06:03 +01:00
Chris Jackson 8f550368b1 [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]
Second of two patches to extend SCEV-based salvaging to dbg.value
intrinsics that have multiple location ops pre-LSR. This second patch
adds the core implementation.

Reviewers: @StephenTozer, @djtodoro

Differential Revision: https://reviews.llvm.org/D120169
2022-04-27 12:47:35 +01:00
Chris Jackson c45e4c140f [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [1/2] [NFC]
First of two patches that extends SCEV-based salvaging to enable
salvaging of dbg.value intrinsics that have multiple location ops
before the Loop Strength Reduction pass.

The existing single-op SCEV-based salvaging can generate variadic
dbg.value intrinsics in order to salvage a dbg.value that has a single
location op. If a dbg.value has multiple location ops before LSR, and
LSR optimises away one or more of the location operands, then currently
no salvaging will be attempted.

Salvaging can now be added, but first this patch cleans up consistency
in both the code and comments, and applies some refactoring to make
application of the new salvaging implementation more straightforward.

- Use SCEVDbgValueBuilder for both types of recovery expressions:
  IV-offset based and iteration count based.
- Combine the functions that write the final DIExpression.
- Move some static functions into member functions.

Reviewers: @Orlando

Differential Revision: https://reviews.llvm.org/D120168
2022-04-27 11:43:05 +01:00
Florian Hahn c59d95f6a4
[ConstraintElimination] Check if const. is small enough before using it
Check if the value of a ConstantInt is small enough to be used for
solving before calling getSExtValue.

Fixes #55085
2022-04-26 13:56:32 +01:00
Florian Hahn 857c612d89
[IPSCCP] Support unfeasible default dests for switch.
At the moment, unfeasible default destinations are not handled properly
in removeNonFeasibleEdges. So far, only unfeasible cases are removed,
but later code expects unreachable blocks to have no predecessors.

This is causing the crash reported in PR49573.

If the default destination is unfeasible it won't be executed. Create
a new unreachable block on demand and use that as default
destination.

Note that at the moment this only is relevant for cases where
resolvedUndefsIn marks the first case as executable. Regular switch
handling has a FIXME/TODO to support determining whether the default
case is feasible or not.

Fixes #48917.

Differential Revision: https://reviews.llvm.org/D113497
2022-04-26 12:41:41 +01:00
Dmitry Makogon d03d2d8aea [RS4GC] Prune inputs of BDV if they are BDV themselves
Don't check whether an input of a BDV can be pruned if the input
is the BDV itself. The BDV is present in the states map, so if
the input is the BDV itself, we'd return false. So explicitly check this case.

Differential Revision: https://reviews.llvm.org/D123846
2022-04-26 16:05:00 +07:00
David Green 9727c77d58 [NFC] Rename Instrinsic to Intrinsic 2022-04-25 18:13:23 +01:00
Florian Hahn 6a6cc5542b
[SimpleLoopUnswitch] Enable freezing of conditions by default.
This fixes a series of mis-compiles by SimpleLoopUnswitch.

My measurements showed no performance regression with -O3 on AArch64
in SPEC2006, SPEC2017 and a set of internal benchmarks.

Fixes #50387, #50430

Depends on D124251.

Reviewed By: nikic, aqjune

Differential Revision: https://reviews.llvm.org/D124252
2022-04-25 14:26:41 +01:00
Max Kazantsev 606a000d1a [LoopInstSimplify] Ignore users in unreachable blocks. PR55072
Logic in this pass assumes that all users of loop instructions are
either in the same loop or are LCSSA Phis. In fact, there can also
be users in unreachable blocks that currently break assertions.
Such users don't need to go to the next round of simplifications.

Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D124368
2022-04-25 17:35:28 +07:00
Florian Hahn b341c44010
[SimpleLoopUnswitch] Check if freeze is needed for partial unswitching.
We only need to insert a Freeze instruction if any of the conditions
may be poison. Similar checks are already done in the other places
SimpleLoopUnswitch creates Freeze instruction.

Reviewed By: aeubanks, efriedma

Differential Revision: https://reviews.llvm.org/D124259
2022-04-22 21:24:55 +01:00
Fangrui Song 14d9390721 Revert D123198 "[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building libcalls."
test/Transforms/InstCombine/pr39177.ll failed in a -DLLVM_USE_SANITIZER=Undefined build.
```
lib/Transforms/Utils/BuildLibCalls.cpp:1217:17: runtime error: reference binding to null pointer of type 'llvm::Function'
```
`Function &F = *M->getFunction(Name);`

This reverts commit 0f8c626723.
2022-04-19 22:26:10 -07:00
Paul Kirth bac6cd5bf8 [misexpect] Re-implement MisExpect Diagnostics
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.

New checks rely on 2 invariants:

1) For frontend instrumentation, MD_prof branch_weights will always be
   populated before llvm.expect intrinsics are lowered.

2) for IR and sample profiling, llvm.expect intrinsics will always be
   lowered before branch_weights are populated from the IR profiles.

These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.

Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D115907
2022-04-19 21:23:48 +00:00
Jonas Paulsson 0f8c626723 [BuildLibCalls] Introduce getOrInsertLibFunc() for use when building libcalls.
A new set of overloaded functions named getOrInsertLibFunc() are now supposed
to be used instead of getOrInsertFunction() when building a libcall from
within an LLVM optimizer(). The idea is that this new function also makes
sure that any mandatory argument attributes are added to the function
prototype (after calling getOrInsertFunction()).

inferLibFuncAttributes() is renamed to inferNonMandatoryLibFuncAttrs() as it
only adds attributes that are not necessary for correctness but merely
helping with later optimizations.

Generally, the front end is responsible for building a correct function
prototype with the needed argument attributes. If the middle end however is
the one creating the call, e.g. when replacing one libcall with another, it
then must take this responsibility.

This continues the work of properly handling argument extension if required
by the target ABI when building a lib call. getOrInsertLibFunc() now does
this for all libcalls currently built by any LLVM optimizer. It is expected
that when in the future a new optimization builds a new libcall with an
integer argument it is to be added to getOrInsertLibFunc() with the proper
handling. Note that not all targets have it in their ABI to sign/zero extend
integer arguments to the full register width, but this will be done
selectively as determined by getExtAttrForI32Param().

Review: Eli Friedman, Nikita Popov, Dávid Bolvanský

Differential Revision: https://reviews.llvm.org/D123198
2022-04-19 21:22:07 +02:00
eop Chen 38ec33d6b9 [LSR] Update outdated comment 2022-04-16 12:11:15 -07:00
Florian Hahn ad95255b92
Revert "[LICM] Only create load in pre-header when promoting load."
This reverts commit 4bf3b7dc92.

This might be causing another buildbot failure.
2022-04-13 20:24:28 +02:00
Anna Thomas 28f27dd264 Check users of intrinsics instead of traversing the entire function. NFC
Updated LowerGuardIntrinsic and LowerWidenableCondition to check for
users of the respective intrinsic, instead of checking for guards and
widenable conditions by traversing the entire function.

This is an NFC. Should save some compile time.
2022-04-13 12:28:51 -04:00
Florian Hahn 4bf3b7dc92
Recommit "[LICM] Only create load in pre-header when promoting load."
This reverts the revert commit 1ddc719680.

This version of the patch sets the initial available value to poison,
which resolves an issue with the SSAUpdater breaking LCSSA form.
2022-04-13 17:20:39 +02:00
Whitney Tsang 80304c5f88 [LoopUnroll] Always respect user unroll pragma
IMO when the user provides an unroll pragma, the compiler should always respect it.
It is not clear to me why the loop unroll pass currently ensures that the
unrolled loop size is limited by PragmaUnrollThreshold.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D119148
2022-04-11 14:33:24 -04:00
Florian Hahn 1ddc719680
Revert "[LICM] Only create load in pre-header when promoting load."
This reverts commit 42229b96bf.

This appears to cause crashes on multiple bots.
2022-04-11 17:37:23 +02:00
Florian Hahn 42229b96bf
[LICM] Only create load in pre-header when promoting load.
When only a store is sunk, there is no need to create a load in the
pre-header, as the result of the load will never get used.

The dead load can introduce UB, if the function is marked as
writeonly.

Fixes #51248.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D123473
2022-04-11 16:45:18 +02:00
Arthur Eubanks b22ffc7b98 [CaptureTracking] Ignore ephemeral values in EarliestEscapeInfo
And thread DSE's ephemeral values to EarliestEscapeInfo.

This allows more precise analysis in DSEState::isReadClobber() via BatchAA.

Followup to D123162.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D123342
2022-04-08 10:07:26 -07:00
Zaara Syeda 07005440ae [LSR] Optimize unused IVs to final values in the exit block
Loop Strength Reduce sometimes optimizes away all uses of an induction variable
from a loop but leaves the IV increments. When the only remaining use of the IV
is the PHI in the exit block, this patch will call rewriteLoopExitValues to
replace the exit block PHI with the final value of the IV to skip the updates
in each loop iteration.

Differential Revision: https://reviews.llvm.org/D118808
2022-04-08 11:16:37 -04:00
Nikita Popov c8c6362560 [LICM] Pass MemorySSAUpdater by reference (NFC)
Make it clearer that this is a required dependency.
2022-04-08 10:08:57 +02:00
Nikita Popov 5cefe7d9f5 [LoopSink] Require MemorySSA
This makes MemorySSA in LoopSink required, and removes the AST-based
implementation, as well as the related support code in LICM.

Differential Revision: https://reviews.llvm.org/D123288
2022-04-08 09:49:44 +02:00
Austin Kerbow 26b14c3ea7 [InferAddressSpaces] Fix assert on invalid bitcast placement
Similar to the problem in 0bb25b4603, bitcasts that are inserted must
dominate all uses. When rewriting "values" with "new values" that have
the updated address space, we may replace the "new value" with a bitcast
if one of the original users is an addresspace cast. This bitcast must
be inserted before ALL users, not only before the addresspace cast.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122964
2022-04-07 20:07:53 -07:00
Arthur Eubanks 17fdaccccf [CaptureTracking] Ignore ephemeral values when determining pointer escapeness
Ephemeral values cannot cause a pointer to escape.

No change in compile time:
https://llvm-compile-time-tracker.com/compare.php?from=4371710085ba1c376a094948b806ddd3b88319de&to=c5ddbcc4866f38026737762ee8d7b9b00395d4f4&stat=instructions

This partially fixes some regressions caused by more calls to `__builtin_assume` (D122397).

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D123162
2022-04-07 10:11:14 -07:00
Nikita Popov e22af03a79 [Sink] Don't sink non-willreturn calls (PR51188)
Fixes https://github.com/llvm/llvm-project/issues/51188.
2022-04-07 16:35:05 +02:00
Nikita Popov 674ee4d353 [LoopSink] Use MemorySSA with legacy pass manager
LoopSink with the legacy pass manager still uses AST, because we
can't compute MemorySSA conditionally. I think now that the legacy
pass manager will be removed soon(TM) we don't need to care about
compile-time impact here anymore. Additionally, since MemorySSA is
no longer eagerly optimized, the impact is actually not that high
anymore (~0.2% geomean regression on CTMark).

This just makes legacy PM and new PM behavior line up -- as a
followup I'll drop these options entirely and make MemorySSA use
mandatory.

Differential Revision: https://reviews.llvm.org/D123216
2022-04-07 09:40:29 +02:00
Matt Arsenault 39f1568633 Transforms: Split LowerAtomics into separate Utils and pass
This will allow code sharing from AtomicExpandPass. Not entirely sure
why these exist as separate passes though.
2022-04-06 20:54:45 -04:00
Alina Sbirlea 08075a7ee8 Revert f7381a795a
Roll-forward 29fada4a3d.
Issue triggered was due to UB.

Differential Revision: https://reviews.llvm.org/D121987
2022-04-06 16:06:14 -07:00
Congzhe Cao eac3487510 [LoopInterchange] Try to achieve the most optimal access pattern after interchange
Motivated by pr43326 (https://bugs.llvm.org/show_bug.cgi?id=43326), where a slightly
modified case is as follows.

 void f(int e[10][10][10], int f[10][10][10]) {
   for (int a = 0; a < 10; a++)
     for (int b = 0; b < 10; b++)
       for (int c = 0; c < 10; c++)
         f[c][b][a] = e[c][b][a];
 }

The ideal optimal access pattern after running interchange is supposed to be the following

 void f(int e[10][10][10], int f[10][10][10]) {
   for (int c = 0;  c < 10; c++)
     for (int b = 0; b < 10; b++)
       for (int a = 0; a < 10; a++)
         f[c][b][a] = e[c][b][a];
 }

Currently loop interchange is limited to picking up the innermost loop and finding an order
that is locally optimal for it. However, the pass fails to produce the globally optimal
loop access order. For more complex examples what we get could be quite far from the
globally optimal ordering.

What is proposed in this patch is to do the interchange in a "bubble-sort" fashion.
By comparing neighbors in `LoopList` in each iteration, we would be able to move each loop
onto a most appropriate place, hence this is an approach that tries to achieve the
globally optimal ordering.

The motivating example above is added as a test case.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D120386
2022-04-06 15:31:56 -04:00
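A self-contained sketch of the "bubble-sort" fashion pass over LoopList described above. Loops are represented here only by an integer rank standing in for what loop cache analysis returns (lower rank = better candidate for the outermost position); the real pass compares llvm::Loop objects and calls its profitability/interchange machinery instead of std::swap:

```
#include <utility>
#include <vector>

void sortLoopNestByRank(std::vector<int> &LoopRank) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    // Compare neighbours; swap whenever moving the inner loop outward is
    // profitable, so each loop bubbles toward its best position.
    for (size_t I = 0; I + 1 < LoopRank.size(); ++I) {
      if (LoopRank[I] > LoopRank[I + 1]) {
        std::swap(LoopRank[I], LoopRank[I + 1]);
        Changed = true;
      }
    }
  }
}
```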
Martin Storsjö 46776f7556 Fix warnings about variables that are set but only used in debug mode
Add void casts to mark the variables used, next to the places where
they are used in assert or `LLVM_DEBUG()` expressions.

Differential Revision: https://reviews.llvm.org/D123117
2022-04-06 10:01:46 +03:00
Bert Abrath 019e7b7f6e
[PartiallyInlineLibCalls] Don't partially inline a musttail libcall.
Partially inlining a libcall that has the musttail attribute
leads to broken LLVM IR, triggering an assertion in the IR verifier.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D123116
2022-04-05 22:30:50 +03:00
serge-sans-paille 1e02737593 [iwyu] Fix some header include regression
Running iwyu-diff from https://github.com/serge-sans-paille/preprocessor-utils
makes it possible to quickly spot regressions in unused includes. This patch
contains the few regressions since the last header cleanup.

Differential Revision: https://reviews.llvm.org/D123036
2022-04-05 15:02:03 +02:00
Nikita Popov a5c3b5748c [MemCpyOpt] Work around PR54682
As discussed on https://github.com/llvm/llvm-project/issues/54682,
MemorySSA currently has a bug when computing the clobber of calls
that access loop-varying locations. I think a "proper" fix for this
on the MemorySSA side might be non-trivial, but we can easily work
around this in MemCpyOpt:

Currently, MemCpyOpt uses a location-less getClobberingMemoryAccess()
call to find a clobber on either the src or dest location, and then
refines it for the src and dest clobber. This was intended as an
optimization, as the location-less API is cached, while the
location-affected APIs are not.

However, I don't think this really makes a difference in practice,
because I don't think anything will use the cached clobbers on
those calls later anyway. On CTMark, this patch seems to be very
mildly positive actually.

So I think this is a reasonable way to avoid the problem for now,
though MemorySSA should also get a fix.

Differential Revision: https://reviews.llvm.org/D122911
2022-04-04 10:19:51 +02:00
Nikita Popov c0cc98251a [Float2Int] Make sure dependent ranges are calculated first (PR54669)
The range calculation in walkForwards() assumes that the ranges of
the operands have already been calculated. With the used visit
order, this is not necessarily the case when there are multiple
roots. (There is nothing guaranteeing that instructions are visited
in topological order.)

Fix this by queuing instructions for reprocessing if the operand
ranges haven't been calculated yet.

Fixes https://github.com/llvm/llvm-project/issues/54669.

Differential Revision: https://reviews.llvm.org/D122817
2022-04-04 10:18:39 +02:00
Philip Reames 7c51669c21 [memcpyopt] Restructure store(load src, dest) form of callslotopt for compile time
The search for the clobbering call is fairly expensive if uses are not optimized at construction.  Defer the clobber walk to the point in the implementation we need it; there are a bunch of bailouts before that point.  (e.g. If the source pointer is not an alloca, we can't do callslotopt.)

On a test case which involves a bunch of copies from argument pointers, this switches memcpyopt from > 1/2 second to < 10ms.
2022-04-03 20:16:20 -07:00
Xiang1 Zhang f830392be7 Correct spelling error in TLS-Load-Hoist 2022-04-04 08:27:54 +08:00
Florian Hahn 5bedc1f093
[ConstraintElimination] Move logic to build worklist to helper (NFC).
This refactor makes it easier to extend the logic to collect information
from blocks in the future, without even further increasing the size of
eliminateConstraints.
2022-04-02 16:55:05 +01:00
Xiang1 Zhang a56f264958 Refine tls-load-hoist llvm option
Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D122890
2022-04-01 19:03:58 +08:00
Jorge Gorbe Moya fc7573f29c Revert "[misexpect] Re-implement MisExpect Diagnostics"
This reverts commit 46774df307.
2022-03-31 14:54:41 -07:00
Paul Kirth 46774df307 [misexpect] Re-implement MisExpect Diagnostics
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.

New checks rely on 2 invariants:

1) For frontend instrumentation, MD_prof branch_weights will always be
   populated before llvm.expect intrinsics are lowered.

2) for IR and sample profiling, llvm.expect intrinsics will always be
   lowered before branch_weights are populated from the IR profiles.

These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.

Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D115907
2022-03-31 17:38:21 +00:00
Nikita Popov 33ac23e7cf [Float2Int] Avoid unnecessary lamdbas (NFC)
Instead of first creating a lambda for calculating the range,
then collecting the ranges for the operands, and then calling the
lambda on those ranges, we can first calculate the operand ranges
and then calculate the result directly in the switch.
2022-03-31 16:13:13 +02:00
Nikita Popov f66975555f [Float2Int] Extract calcRange() method (NFC)
This avoids the awkward "Abort" flag, because we can simply
early-return instead.
2022-03-31 16:13:13 +02:00
Aditya Kumar 368681f803 [GVNHoist] drop debug location according to the debug info guide
According to the LLVM debug info update guide: https://llvm.org/docs/HowToUpdateDebugInfo.html,
"Hoisting identical instructions which appear in several successor
blocks into a predecessor block. In this case there is no single
merged instruction. The rule for dropping locations applies".

Thanks to Yuanbo Li for reporting this.

Reviewed By: dblaikie

Reviewers: sebpop, tejohnson, dblaikie

Differential Revision: https://reviews.llvm.org/D122730
2022-03-30 20:17:53 -07:00
Stephen Long e02f4976ac [LoopIdiom] Merge TBAA of adjacent stores when creating memset
Factor in the TBAA of adjacent stores instead of just the head store
when merging stores into a memset. We were seeing GVN remove a load that
had a TBAA that matched the 2nd store because GVN determined it didn't
match the TBAA of the memset. The memset had the TBAA of only the first
store.

i.e. Loading the field pi_ of shared_count after memset to create an
array of shared_ptr

template<class T>
class shared_ptr {
  T *p;
  shared_count refcount;
};

class shared_count {
  sp_counted_base *pi_;
};

Differential Revision: https://reviews.llvm.org/D122205
2022-03-30 16:54:49 -07:00
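A hedged sketch of the merging idea: fold every adjacent store's TBAA tag into the running tag instead of keeping only the head store's. MDNode::getMostGenericTBAA and LLVMContext::MD_tbaa are existing APIs, but the helper below is illustrative and not the actual LoopIdiomRecognize code:

```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
using namespace llvm;

// Returns a TBAA tag valid for all stores, or nullptr if any store lacks one.
MDNode *mergedTBAATag(ArrayRef<StoreInst *> Stores) {
  if (Stores.empty())
    return nullptr;
  MDNode *Tag = Stores.front()->getMetadata(LLVMContext::MD_tbaa);
  for (StoreInst *SI : Stores.drop_front()) {
    MDNode *STag = SI->getMetadata(LLVMContext::MD_tbaa);
    if (!Tag || !STag)
      return nullptr;                             // conservatively drop TBAA
    Tag = MDNode::getMostGenericTBAA(Tag, STag);  // widen to the common tag
  }
  return Tag;
}
```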
Chang-Sun Lin Jr c28ce745cf Value-number GVNHoist loads by result type as well as pointer address.
Avoids merge errors when opaque pointers are loaded into different types.

Reviewed by: jcranmer-intel, hiraditya
Differential Revision: https://reviews.llvm.org/D122521
2022-03-30 11:33:49 -07:00
Florian Hahn 3dbb5eb2cd
[ConstraintElimination] Move ConstraintInfo after ConstraintTy. (NFC)
Code movement to make it slightly easier to use ConstraintTy & co in
ConstraintInfo directly, for follow-up patches.
2022-03-29 09:59:03 +01:00
Serguei Katkov 6444a65514 [LSR] Fixup canonicalization formula and its checker.
According to the definition of canonical form, a formula is canonical if,
when the scale reg does not contain an addrec for loop L, none of the base
regs contains an addrec for this loop.

The critical word here is "contains".

The current checker of canonical form checks the "is" property rather than
the "contains" property: it does not check whether a reg contains an addrec,
but whether it is one.

Fix the checker and canonicalizing utility to follow definition.

Without this fix, in the attached test the base formula looking like
reg((-1 * {0,+,8}<nuw><nsw><%bb2>)<nsw>) + 1*reg((8 * (%arg /u 8))<nuw>)
is considered canonical even though the base contains an addrec.
And the modified formula we want to insert,
reg({0,+,8}<nuw><nsw><%bb2>) + 1*reg((-8 * (%arg /u 8)))
is considered not canonical.

Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D122457
2022-03-29 14:05:04 +07:00
Paul Kirth 90cb325abd Revert "[misexpect] Re-implement MisExpect Diagnostics"
This reverts commit 2add3fbd97.
2022-03-29 06:20:30 +00:00
Philip Reames 33deaa13b8 [memcpyopt] Common code into performCallSlotOptzn [NFC]
We have the same code repeated in both callers, sink it into callee.

The motivation here isn't just code style, we can also defer the relatively expensive aliasing checks until the cheap structural preconditions have been validated.  (e.g. Don't bother aliasing if src is not an alloca.)  This helps compile time significantly.
2022-03-28 20:10:13 -07:00
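A hedged sketch of that ordering, with made-up helper names rather than the real performCallSlotOptzn: the cheap structural precondition runs first, and the alias-analysis query is only issued once it passes.

#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Illustrative helper; the point is the ordering of the checks.
static bool tryCallSlotOptzn(Value *CpyDest, Value *CpySrc, CallBase *C,
                             AAResults &AA) {
  // Cheap structural precondition first: bail out early on non-allocas.
  auto *SrcAlloca = dyn_cast<AllocaInst>(CpySrc);
  if (!SrcAlloca)
    return false;

  // Only now pay for the relatively expensive alias query (illustrative
  // check: give up if the call may already access the destination).
  if (isModSet(AA.getModRefInfo(C, MemoryLocation::getBeforeOrAfter(CpyDest))))
    return false;

  // ... actual transformation elided ...
  return true;
}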
Paul Kirth 2add3fbd97 [misexpect] Re-implement MisExpect Diagnostics
Reimplements the MisExpect diagnostics from D66324, reconstructing the
original checking methodology using only MD_prof branch_weights
metadata.

The new checks rely on two invariants:

1) For frontend instrumentation, MD_prof branch_weights will always be
   populated before llvm.expect intrinsics are lowered.

2) For IR and sample profiling, llvm.expect intrinsics will always be
   lowered before branch_weights are populated from the IR profiles.

These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect-related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata and ensuring it will always be
transformed the same way as branch_weights in other optimization passes.

Frontend-based profiling is now enabled without using LLVM args, by
introducing a new CodeGen option and checking whether the -Wmisexpect flag
has been passed on the command line.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D115907
2022-03-28 23:30:04 +00:00
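For reference, a hedged example of the kind of source this targets: __builtin_expect lowers to llvm.expect, and per the message above -Wmisexpect reports when the resulting annotation contradicts the profile-derived branch_weights.

// Compiled with profile-guided optimization and -Wmisexpect, a diagnostic
// is expected if the collected profile shows this branch is usually *not*
// taken, contradicting the __builtin_expect annotation below.
int mostly_cold(int x) {
  if (__builtin_expect(x == 0, 1))   // annotated as likely
    return 42;                       // ...but the profile may say otherwise
  return 0;
}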
Alina Sbirlea f7381a795a Revert 29fada4a3d
Seeing a test failure with ASan in Halide-generated code; reverting
while I investigate.

Differential Revision: https://reviews.llvm.org/D121987
2022-03-28 16:17:41 -07:00
Florian Hahn 8c3281db49
[ConstraintElimination] Use AddOverflow for offset summation.
Fixes an incorrect transformation due to values overflowing:
https://alive2.llvm.org/ce/z/uizoea
2022-03-25 18:08:24 +00:00
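A hedged illustration of why a checked add matters here: summing two large offsets with plain + can wrap and yield a bogus but plausible-looking constraint. The patch uses LLVM's AddOverflow helper per its title; the standalone sketch below uses the equivalent __builtin_add_overflow builtin.

#include <cstdint>
#include <cstdio>

int main() {
  int64_t A = INT64_MAX - 1, B = 16, Sum;
  if (__builtin_add_overflow(A, B, &Sum)) {
    // Overflow detected: give up on the constraint instead of using a
    // wrapped (and therefore meaningless) offset sum.
    std::puts("offset sum overflows; bail out");
    return 0;
  }
  std::printf("sum = %lld\n", (long long)Sum);
  return 0;
}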
Djordje Todorovic 9dbc687a5e NFC: [LICM] Update some stale comments
After removing MaybePromotable, some comments became stale. This patch
improves them.

Differential Revision: https://reviews.llvm.org/D122319
2022-03-24 14:37:20 +01:00
Nikita Popov 29fada4a3d [EarlyCSE] Don't eagerly optimize MemoryUses
EarlyCSE currently optimizes all MemoryUses upfront. However,
EarlyCSE only actually queries the clobbering memory access for
a subset of uses, namely those where a CSE candidate has already
been identified. Delaying use optimization to the clobber query
improves compile-time in practice.

This change is not NFC because EarlyCSE has a limit on the number
of clobber queries (EarlyCSEMssaOptCap), in which case it falls
back to the defining access. The defining access for uses will now
no longer coincide with the optimized access.

If there are performance regressions from this change, we should
be able to address them by raising this limit.

Differential Revision: https://reviews.llvm.org/D121987
2022-03-23 16:47:35 +01:00
Nikita Popov afb526b3f4 [LICM] Handle store of pointer to itself (PR54495)
Rather than iterating over users and comparing operands, iterate
over uses and check operand number. Otherwise, we'll end up
promoting a store twice if it has two equal operands.

This can only happen with opaque pointers, as otherwise both
operands differ by a level of indirection, so a bitcast would have
to be involved.

Fixes https://github.com/llvm/llvm-project/issues/54495.
2022-03-22 14:00:07 +01:00
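A hedged sketch of the uses-versus-users distinction (collectStoresToPromote is a made-up helper, not the LICM code): for "store ptr %p, ptr %p" the promoted pointer occupies both operand slots, so it appears twice when walking users, while walking uses and checking the operand number records the store exactly once.

#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Walk uses, not users: a store with the same value in both operand slots
// contributes two uses, and only the pointer-operand use should count.
static void collectStoresToPromote(Value *Ptr,
                                   SmallVectorImpl<StoreInst *> &Stores) {
  for (Use &U : Ptr->uses()) {
    auto *SI = dyn_cast<StoreInst>(U.getUser());
    if (SI && U.getOperandNo() == StoreInst::getPointerOperandIndex())
      Stores.push_back(SI);
  }
}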
Philip Reames ee7324b898 Rename mayBeMemoryDependent to mayHaveNonDefUseDependency [nfc] 2022-03-21 10:01:40 -07:00
psamolysov-intel 2ed030ba88 [InferAddressSpaces][NFC] Small code improvements for the InferAddressSpaces pass
There are a number of small code improvements in the patch: marking as const
everything that can be const and fixing some typos in comments.

The patch also removes the shadowing parameter TTI from the
rewriteWithNewAddressSpaces method; the parameter is not required because the
class already has a field of the same name.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D121671
2022-03-21 11:03:12 -05:00
Kazu Hirata bce1bf0ee2 [Transform] Apply clang-tidy fixes for readability-redundant-smartptr-get (NFC) 2022-03-20 10:41:22 -07:00
Nick Desaulniers e1bae23f6f [SCCP] do not clean up dead blocks that have their address taken

Fixes a crash observed in IPSCCP.

Because the SCCPSolver has already internalized BlockAddresses as
Constants or ConstantExprs, we don't want to try to update their Values
in the ValueLatticeElement. Instead, continue to propagate these
BlockAddress Constants, continue converting BasicBlocks to unreachable,
but don't delete the "dead" BasicBlocks which happen to have their
address taken.  Leave replacing the BlockAddresses to another pass.

Fixes: https://github.com/llvm/llvm-project/issues/54238
Fixes: https://github.com/llvm/llvm-project/issues/54251

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D121744
2022-03-18 11:02:15 -07:00
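A hedged example of the kind of input involved: taking the address of a label produces a BlockAddress constant, so even if SCCP proves the labelled block unreachable, the block can be converted to unreachable but must not be deleted while something still refers to its address.

// Classic computed-goto pattern (GNU address-of-label extension, supported
// by Clang and GCC): the label addresses become BlockAddress constants in
// IR, so the labelled blocks have their address taken.
int run(const unsigned char *ops) {
  static void *dispatch[] = {&&op_halt, &&op_inc};
  int acc = 0;
  goto *dispatch[*ops++];            // lowered to indirectbr
op_inc:
  ++acc;
  goto *dispatch[*ops++];
op_halt:
  return acc;
}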
Florian Hahn 5ab421fb4e
[LICM] Add allowspeculation pass options.
This adds a new option to control the AllowSpeculation setting added in
D119965 when using `-passes=...`.

This allows reproducing #54023 using opt.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D121944
2022-03-18 16:51:57 +00:00
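Assuming the new option follows the usual spelling of LICM pass parameters (the exact syntax below is an assumption, not stated in the commit), speculation could then be toggled directly from the opt command line:

    opt -passes='loop-mssa(licm<no-allowspeculation>)' -S input.ll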
Nikita Popov ab2284a643 [LowerConstantIntrinsics] Make TLI a required dependency
When the pass is used in the normal optimization pipeline, TLI will be
available, but this is not the case when running just
-lower-constant-intrinsics in tests, which ends up being quite
confusing.

Require TLI unconditionally, as we usually do.
2022-03-18 14:59:18 +01:00
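A hedged sketch of what requiring TLI unconditionally looks like in a legacy pass (the pass name is made up; only the TargetLibraryInfoWrapperPass plumbing is the point):

#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Pass.h"
using namespace llvm;

// Illustrative legacy pass that requires TLI instead of treating it as an
// optional analysis that may be missing in opt-driven tests.
struct ExampleLoweringLegacyPass : FunctionPass {
  static char ID;
  ExampleLoweringLegacyPass() : FunctionPass(ID) {}

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addRequired<TargetLibraryInfoWrapperPass>();   // always required
  }

  bool runOnFunction(Function &F) override {
    const TargetLibraryInfo &TLI =
        getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
    (void)TLI; // ... use TLI during lowering ...
    return false;
  }
};
char ExampleLoweringLegacyPass::ID = 0;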
Nikita Popov f96428e16d [MemorySSA] Don't optimize uses during construction
This changes MemorySSA to be constructed in unoptimized form.
MemorySSA::ensureOptimizedUses() can be called to optimize all
uses (once). This should be done by passes where having optimized
uses is beneficial, either because we're going to query all uses
anyway, or because we're doing def-use walks.

This should help reduce the compile-time impact of MemorySSA for
some use cases (the reason why I started looking into this is
D117926), which can avoid optimizing all uses upfront, and instead
only optimize those that are actually queried.

Actually, we have an existing use-case for this, which is EarlyCSE.
Disabling eager use optimization there gives a significant
compile-time improvement, because EarlyCSE will generally only query
clobbers for a subset of all uses (this change is not included in
this patch).

Differential Revision: https://reviews.llvm.org/D121381
2022-03-18 09:56:16 +01:00
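A hedged sketch of how a pass that walks essentially all uses might opt in (the pass itself is illustrative; ensureOptimizedUses is the API named in the message above):

#include "llvm/Analysis/MemorySSA.h"
#include "llvm/IR/PassManager.h"
using namespace llvm;

struct ExampleMSSAUserPass : PassInfoMixin<ExampleMSSAUserPass> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
    MemorySSA &MSSA = AM.getResult<MemorySSAAnalysis>(F).getMSSA();
    // Opt in to eager use optimization only because this pass will query
    // essentially every use anyway.
    MSSA.ensureOptimizedUses();
    // ... queries against MSSA ...
    return PreservedAnalyses::all();
  }
};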
Florian Hahn 4a699ae9c6
[LoopSimplifyCFG] Check predecessors of exits before marking them dead.
LoopSimplifyCFG may process loops that are not in
loop-simplify/canonical form. For loops not in canonical form, exit
blocks may be reachable from non-loop blocks, and we cannot consider
them dead merely because they are not reachable from the loop itself.

Unfortunately the smallest test I could come up with requires running
multiple passes:
    -passes='loop-mssa(loop-instsimplify,loop-simplifycfg,simple-loop-unswitch)'

The reason is that loops are canonicalized at the beginning of loop
pipelines, so a later transform has to break canonical form in a way
that breaks LoopSimplifyCFG's dead-exit analysis.

Alternatively we could try to require all loop passes to maintain
canonical form. That in turn would also require additional verification.

Fixes #54023, #49931.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D121925
2022-03-18 08:54:44 +00:00
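A hedged sketch of the kind of guard described above (a stand-alone helper, not the actual LoopSimplifyCFG code): an exit block may only be treated as dead together with the loop if every predecessor lies inside the loop.

#include "llvm/ADT/STLExtras.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/CFG.h"
using namespace llvm;

// For loops not in canonical form, an exit block may also be reached from
// unrelated blocks, so check all predecessors before declaring it dead.
static bool exitOnlyReachableFromLoop(const Loop *L, const BasicBlock *Exit) {
  return all_of(predecessors(Exit),
                [&](const BasicBlock *Pred) { return L->contains(Pred); });
}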