Commit Graph

10733 Commits

Author SHA1 Message Date
Sanjay Patel 1089158c5a [ConstantFolding] propagate poison through vector reduction intrinsics 2021-04-29 12:54:20 -04:00
Sanjay Patel 71597d40e8 [ConstantFolding] refactor helper for vector reductions; NFC
We should handle other cases (undef/poison), so reduce
the duplication of repeated switches.
2021-04-29 12:09:22 -04:00
Craig Topper 25391cec3a [RISCV] Teach computeKnownBits that vsetvli returns number less than 2^31.
This seems like a reasonable upper bound on VL. WG discussions for
the V spec would probably allow us to use 2^16 as an upper bound
on VLEN, but this is good enough for now.

This allows us to remove sext and zext if user happens to assign
the size_t result into an int and then uses it as a VL intrinsic
argument which is size_t.

Reviewed By: frasercrmck, rogfer01, arcbbb

Differential Revision: https://reviews.llvm.org/D101472
2021-04-29 08:07:59 -07:00
Philip Reames 0c01b37eeb Generalize getInvertibleOperand recurrence handling slightly
Follow up to D99912, specifically the revert, fix, and reapply thereof.

This generalizes the invertible recurrence logic in two ways:
* By allowing mismatching operand numbers of the phi, we can recurse through a pair of phi recurrences whose operand orders have not been canonicalized.
* By allowing recurrences through operand 1, we can invert these odd (but legal) recurrence.

Differential Revision: https://reviews.llvm.org/D100884
2021-04-28 14:38:07 -07:00
Philip Reames 0cc3e10f5e [SCEV] Avoid range intersection idiom in getRangeForUnkownRecurrence [NFC]
Addresses a review comment from D101181
2021-04-28 12:48:17 -07:00
Philip Reames a836de0bde [SCEV] Compute ranges for ashr recurrences
Straight forward extension to the recently added infrastructure which was pioneered with shl. This was originally posted as part of D99687, but split off for ease of review.

(I also decided to exclude the unknown start sign case explicitly for simplicity of understanding.)

Differential Revision: https://reviews.llvm.org/D101181
2021-04-28 12:36:20 -07:00
Florian Hahn 1ed7f8ede5
[LAA] Support pointer phis in loop by analyzing each incoming pointer.
SCEV does not look through non-header PHIs inside the loop. Such phis
can be analyzed by adding separate accesses for each incoming pointer
value.

This results in 2 more loops vectorized in SPEC2000/186.crafty and
avoids regressions when sinking instructions before vectorizing.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D101286
2021-04-28 20:19:40 +01:00
Arthur Eubanks cbce28f07e [ConstFold] Use const-folded operands in more places
Previously we were const folding operands but not passing them.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101394
2021-04-27 14:30:19 -07:00
Nikita Popov e45168c4fa [SCEV] Handle uge/ugt predicates in applyLoopGuards()
These can be handled the same way as ule/ult, just using umax
instead of umin. This is useful in cases where the umax prevents
the upper bound from overflowing.

Differential Revision: https://reviews.llvm.org/D101196
2021-04-27 22:41:05 +02:00
Andy Kaylor 0a82d885a4 [Dependence Analysis] Fix ExactSIV producing wrong analysis
Patch by Artem Radzikhovskyy!

Symptom: ExactSIV test produced incorrect analysis of dependencies see LIT tests
Bug: At the end of the algorithm when determining dependence direction original author forgot to divide intermediate results by gcd and round result toward zero

Although this bug can be fixed with significantly fewer changes I opted to write the code in such a way that reflects the original algorithm that Banerjee proposed, for easier reference in the future. This surprisingly results in shorter code, and fewer quotient and max/min calculations.

Changes Summary:

- fixed findGCD to return valid x and y so that they match the function description where: ax - by = gcd(a,b)
- Fixed ExactSIV test, to produce proper results
- Documented the extension of Banerjee's algorithm that the original code author introduced. Banerjee's original algorithm only tested whether Dst depends on Src, the extension also allows us to test whether Src depends on Dst, in one pass.
- ExactRDIV test worked fine. Since it uses findGCD(), it needed to be updated.Since ExactRDIV test has very few changes from the core algorithm of ExactSIV I modified the test to have consistent format as ExactSIV.
- Updated the LIT tests to be testing for correct values.

Differential Revision: https://reviews.llvm.org/D100331
2021-04-27 12:24:00 -07:00
dfukalov e4c606acaf [TTI] NFC: Change getScalarizationOverhead and getOperandsScalarizationOverhead to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D101283
2021-04-27 08:51:48 +03:00
Hongtao Yu 30bb5be389 [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 2.
As a follow-up to D95982, this patch continues unblocking optimizations that are blocked by pseudu probe instrumention.

The optimizations unblocked are:
		- In-block load propagation.
		- In-block dead store elimination
		- Memory copy optimization that turns stores to consecutive memories into a memset.

These optimizations are local to a block, so they shouldn't affect the profile quality.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D100075
2021-04-26 16:52:33 -07:00
Vineet Kumar 84d16e2055 Implementation for TargetTransformInfo::hasActiveVectorLength()
This patch adds the missing implementation for
TargetTransformInfo::hasActiveVectorLength() without which using
hasActiveVectorLength() causes linker error.

Patch by Vineet Kumar!

Differential Revision: https://reviews.llvm.org/D100941
2021-04-26 21:20:05 +00:00
Nikita Popov a5051f2fa2 [SCEV] Fix applyLoopGuards() chaining for ne predicates
ICMP_NE predicates directly overwrote the rewritten result,
instead of chaining it with previous rewrites, as was done for
ICMP_ULT and ICMP_ULE. This means that some guards were effectively
discarded, depending on their order.
2021-04-24 21:43:46 +02:00
dfukalov 6c57044231 [GVN] Clobber partially aliased loads.
Use offsets stored in `AliasResult` implemented in D98718.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95543
2021-04-24 14:14:20 +03:00
Sander de Smalen f9a50f04ba [TTI] NFC: Change getIntImmCost[Inst|Intrin] to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100565
2021-04-23 16:06:36 +01:00
Sander de Smalen 43ace8b5ce [TTI] NFC: Change getScalingFactorCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100564
2021-04-23 16:06:36 +01:00
Sander de Smalen 008a072ded [TTI] NFC: Change getMemcpyCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100563
2021-04-23 16:06:35 +01:00
Sander de Smalen 9ba07f37f8 [TTI] NFC: Change getGEPCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100562
2021-04-23 16:06:35 +01:00
Sander de Smalen e0edfa052f [TTI] NFC: Change getAddressComputationCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100561
2021-04-23 16:06:35 +01:00
dfukalov 9ab17a60eb [TTI] NFC: Use InstructionCost to store ScalarizationCost in IntrinsicCostAttributes.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D101151
2021-04-23 18:02:00 +03:00
Philip Reames 424d6cb902 [SCEV] Compute ranges for lshr recurrences
Straight forward extension to the recently added infrastructure which was pioneered with shl.

Differential Revision: https://reviews.llvm.org/D99687
2021-04-22 11:06:31 -07:00
Wenlei He dff8315892 [CSSPGO][llvm-profdata] Support trimming cold context when merging profiles
The change adds support for triming and merging cold context when mergine CSSPGO profiles using llvm-profdata. This is similar to the context profile trimming in llvm-profgen, however the flexibility to trim cold context after profile is generated can be useful.

Differential Revision: https://reviews.llvm.org/D100528
2021-04-22 00:42:37 -07:00
Sanjay Patel 5e6dc5e404 [InstSimplify] generalize ctlz-of-shifted-constant
https://alive2.llvm.org/ce/z/zWL_VQ
2021-04-21 14:23:55 -04:00
Nico Weber ba7a92c01e [Support] Don't include VirtualFileSystem.h in CommandLine.h
CommandLine.h is indirectly included in ~50% of TUs when building
clang, and VirtualFileSystem.h is large.

(Already remarked by jhenderson on D70769.)

No behavior change.

Differential Revision: https://reviews.llvm.org/D100957
2021-04-21 10:19:01 -04:00
Yang Fan 4307446e9f
[SCEV] Fix -Wunused-variable warning (NFC)
GCC warning:
```
/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp: In member function ‘const llvm::SCEV* llvm::ScalarEvolution::getLosslessPtrToIntExpr(const llvm::SCEV*, unsigned int)::SCEVPtrToIntSinkingRewriter::visitUnknown(const llvm::SCEVUnknown*)’:
/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:1152:13: warning: unused variable ‘ExprPtrTy’ [-Wunused-variable]
 1152 |       Type *ExprPtrTy = Expr->getType();
      |             ^~~~~~~~~
```
2021-04-21 16:01:46 +08:00
Nikita Popov de18fa9e52 Revert "[InstSimplify] Bypass no-op `and`-mask, using known bits (PR49543)"
This reverts commit ea1a0d7c9a.

While this is strictly more powerful, it is also strictly slower.
InstSimplify intentionally does not perform many folds that it
is allowed to perform, if doing so requires a KnownBits calculation
that will be repeated in InstCombine.

Maybe it's worthwhile to do this here, but that needs a more
explicitly stated motivation, evaluated in a review.
2021-04-21 09:55:25 +02:00
Philip Reames 4824d876f0 Revert "Allow invokable sub-classes of IntrinsicInst"
This reverts commit d87b9b81cc.

Post commit review raised concerns, reverting while discussion happens.
2021-04-20 15:38:38 -07:00
Philip Reames d87b9b81cc Allow invokable sub-classes of IntrinsicInst
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked.

This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.

After this lands and has baked for a couple days, planned cleanups:
    Make GCStatepointInst a IntrinsicInst subclass.
    Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
    Do the same in SelectionDAG.
    Do the same in FastISEL.

Differential Revision: https://reviews.llvm.org/D99976
2021-04-20 15:03:49 -07:00
Roman Lebedev ea1a0d7c9a
[InstSimplify] Bypass no-op `and`-mask, using known bits (PR49543)
We already special-cased a few interesting patterns,
but that is strictly less powerful than using KnownBits.

So instead get the known bits for the operand of `and`,
and iff all the unset bits of the `and`-mask are known to be zeros
in the operand, we can omit said `and`.
2021-04-21 00:31:46 +03:00
Philip Reames 6792e26c0d Reapply "Look through invertible recurrences in isKnownNonEqual"
I'd reverted this in commit 3b6acb1797 due to buildbot failures.  This patch contains the fix for said issue.  I'd forgotten to handle the case where two phis in the same block have different operand order.  We canonicalize away from this, but it's still valid IR.  The tests included in this change (as opposed to simply having test output changed), crashed without the fix.

Original commit message follows...

This extends the phi handling in isKnownNonEqual with a special case based on invertible recurrences. If we can prove the recurrence is invertible (which many common ones are), we can recurse through the start operands of the recurrence skipping the phi cycle.

(Side note: Instcombine currently does not push back through these cases. I will implement that in a follow up change w/separate review.)

Differential Revision: https://reviews.llvm.org/D99912
2021-04-20 12:47:59 -07:00
Philip Reames 3b6acb1797 Revert "Look through invertible recurrences in isKnownNonEqual"
This reverts commit be20eae25f.  It appears to have caused a crash on a buildbot (https://lab.llvm.org/buildbot#builders/77/builds/5653).  Reverting while investigating.
2021-04-20 11:47:10 -07:00
Philip Reames 9c1a145aeb Rearrange code to reduce diff for D99687 [nfc]
Adding the switches to reduce diffs.  I'm about to split that into an lshr part and an ashr part, doing the NFC part first makes it easier to maintain both diffs.
2021-04-20 11:40:15 -07:00
Roman Lebedev 7186764884
[NFC][SCEV] Split getLosslessPtrToIntExpr out of getPtrToIntExpr() 2021-04-20 21:29:21 +03:00
Philip Reames be20eae25f Look through invertible recurrences in isKnownNonEqual
This extends the phi handling in isKnownNonEqual with a special case based on invertible recurrences. If we can prove the recurrence is invertible (which many common ones are), we can recurse through the start operands of the recurrence skipping the phi cycle.

(Side note: Instcombine currently does not push back through these cases. I will implement that in a follow up change w/separate review.)

Differential Revision: https://reviews.llvm.org/D99912
2021-04-20 10:52:22 -07:00
Dávid Bolvanský 319c9f6e58 [MemoryBuiltins] Added support for memalign
memalign is older aligned_alloc.
2021-04-20 12:39:54 +02:00
Joe Ellis effacc1599 [AArch64] Constant fold sve_convert_from_svbool(zero) to zero
Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D100463
2021-04-20 10:02:49 +00:00
Arthur Eubanks 5e71b9fa93 Explicitly pass type to cast load constant folding result
Previously we would use the type of the pointee to determine what to
cast the result of constant folding a load. To aid with opaque pointer
types, we should explicitly pass the type of the load rather than
looking at pointee types.

ConstantFoldLoadThroughBitcast() converts the const prop'd value to the
proper load type (e.g. [1 x i32] -> i32). Instead of calling this in
every intermediate step like bitcasts, we only call this when we
actually see the global initializer value.

In some existing uses of this API, we don't know the exact type we're
loading from immediately (e.g. first we visit a bitcast, then we visit
the load using the bitcast). In those cases we have to manually call
ConstantFoldLoadThroughBitcast() when simplifying the load to make sure
that we cast to the proper type.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D100718
2021-04-20 00:53:21 -07:00
Sanjay Patel 9d43f6d7ce [LowerConstantIntrinsics] avoid crashing on alloca with unexpected operand type
The test here is reduced from the fuzzer-generated crasher in:
https://llvm.org/PR50023
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33395

I don't know if this is the best or complete solution, but the
zext of the i42 type appears to match the behavior if we run a
weird type example like this through the IR optimizer with -O1.

Differential Revision: https://reviews.llvm.org/D100766
2021-04-19 13:06:29 -04:00
Roman Lebedev 41c22acc22
[NFC][SCEV] Assert that we don't try to create SCEVPtrToIntExpr of a non-integral pointer
ptr<->int casts are only valid for integral pointes,
defensively assert that we don't try to break that here.
2021-04-19 18:38:38 +03:00
Simon Pilgrim ddcdeae358 [Analysis] ImportedFunctionsInliningStatistics.h - add <memory> and remove unused <string> include. NFCI.
Move <string> include to ImportedFunctionsInliningStatistics.cpp and add missing <memory> include as we have explicit uses of std::unique_ptr in the header.
2021-04-19 16:20:56 +01:00
Cullen Rhodes f0bc2782f2 [TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D100377
2021-04-19 11:01:34 +00:00
Roman Lebedev d480f968ad
Revert "[SCEV] Model `ashr exact x, C` as `(abs(x) EXACT/u (1<<C)) * signum(x)`"
As being discussed in https://reviews.llvm.org/D100721,
this modelling is lossy, we can't reconstruct `ash`/`ashr exact`
from it, which means that whenever we actually expand the IR,
we've just pessimized the code..

It would be good to model this pattern, after all it comes up every time
you want to compute a distance between two pointers, but not at this cost.

This reverts commit ec54867df5.
2021-04-18 16:26:45 +03:00
Serge Guelton d6de1e1a71 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalize the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
Thomas Lively 5c729750a6 [WebAssembly] Remove saturating fp-to-int target intrinsics
Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead.
This includes removing the instrinsics for i32x4.trunc_sat_zero_f64x2_{s,u},
which are now represented in IR as a saturating truncation to a v2i32 followed by
a concatenation with a zero vector.

Differential Revision: https://reviews.llvm.org/D100596
2021-04-16 12:11:20 -07:00
Sanjay Patel bb907b26e2 [ValueTracking] don't recursively compute known bits using multiple llvm.assumes
This is an alternative to D99759 to avoid the compile-time explosion seen in:
https://llvm.org/PR49785

Another potential solution would make the exclusion logic stronger to avoid
blowing up, but note that we reduced the complexity of the exclusion mechanism
in D16204 because it was too costly.

So I'm questioning the need for recursion/exclusion entirely - what is the
optimization value vs. cost of recursively computing known bits based on
assumptions?
This was built into the implementation from the start with 60db058,
and we have kept adding code/cost to deal with that capability.

By clearing the query's AssumptionCache inside computeKnownBitsFromAssume(),
this patch retains all existing assume functionality except refining known
bits based on even more assumptions.

We have 1 regression test that shows a difference in optimization power.

Differential Revision: https://reviews.llvm.org/D100573
2021-04-16 08:43:35 -04:00
Mircea Trofin 0d06b14f59 [MLGO] Fix use of AM.invalidate post D100519
The ML inline advisors more aggressively invalidate certain analyses
after each call site inlining, to more accurately capture the problem
state.
2021-04-15 18:45:39 -07:00
Arthur Eubanks c8f0a7c215 [NewPM] Cleanup IR printing instrumentation
Being lazy with printing the banner seems hard to reason with, we should print it
unconditionally first (it could also lead to duplicate banners if we
have multiple functions in -filter-print-funcs).

The printIR() functions were doing too many things. I separated out the
call from PrintPassInstrumentation since we were essentially doing two
completely separate things in printIR() from different callers.

There were multiple ways to generate the name of some IR. That's all
been moved to getIRName(). The printing of the IR name was also
inconsistent, now it's always "IR Dump on $foo" where "$foo" is the
name. For a function, it's the function name. For a loop, it's what's
printed by Loop::print(), which is more detailed. For an SCC, it's the
list of functions in parentheses. For a module it's "[module]", to
differentiate between a possible SCC with a function called "module".

To preserve D74814, we have to check if we're going to print anything at
all first. This is unfortunate, but I would consider this a special
case that shouldn't be handled in the core logic.

Reviewed By: jamieschmeiser

Differential Revision: https://reviews.llvm.org/D100231
2021-04-15 09:50:55 -07:00
dfukalov ce1626f34a [AA] Updates for D95543.
Addressing latter comments in D95543:
- `AliasResult::Result` renamed to `AliasResult::Kind`
- Offset printing added for `PartialAlias` case in `-aa-eval`
- Removed VisitedPhiBBs check from BasicAA'

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D100454
2021-04-15 12:22:03 +03:00
Nikita Popov a1ed025d0e Revert "[SCEV] Don't walk uses of phis without SCEV expression when forgetting"
This reverts commit faf9f11589.

Issues with this patch have been reported in
https://reviews.llvm.org/D100264#2689917 and
https://bugs.llvm.org/show_bug.cgi?id=49967.
2021-04-15 09:43:52 +02:00
William S. Moses d3e2b4c0a2 [SROA][TBAA] Handle shift of regular TBAA nodes
SROA shifts TBAA nodes in a way that may present a problem for !tbaa but not !tbaa.struct nodes.

Differential Revision: https://reviews.llvm.org/D99851
2021-04-14 14:35:20 -04:00
Nikita Popov 0d91075f77 [ValueTracking] Don't require strictly positive for mul nsw recurrence
Just like in the mul nuw case, it's sufficient that the step is
non-zero. If the step is negative, then the values will jump
between positive and negative, "crossing" zero, but the value of
the recurrence is never actually zero.
2021-04-14 19:39:59 +02:00
Nikita Popov 5c0fb026c9 [ValueTracking] Don't require non-zero step for add nuw
It's okay if the step is zero, we'll just stay at the same non-zero
value in that case. The valuable part of this is that the step
doesn't even need to be a constant anymore.
2021-04-14 19:06:18 +02:00
Sander de Smalen 4f42d873c2 [TTI] NFC: Change getArithmeticInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100317
2021-04-14 17:20:36 +01:00
Sander de Smalen d84bd951a8 [TTI] NFC: Change getFPOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D100316
2021-04-14 17:20:36 +01:00
Sander de Smalen 1af35e77f4 [TTI] NFC: Change getVectorInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100315
2021-04-14 17:20:35 +01:00
Sander de Smalen 174e8f6c5e [TTI] NFC: Change getShuffleCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100314
2021-04-14 17:20:35 +01:00
Sander de Smalen 14b934f8a6 [TTI] NFC: Change getCFInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D100313
2021-04-14 17:20:34 +01:00
Sander de Smalen 596f669cfb [TTI] NFC: Change getCallInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D100312
2021-04-14 17:20:34 +01:00
Sanjay Patel 7ef2c68a3d [InstSimplify] improve efficiency for detecting non-zero value
Stepping through callstacks in the example from D99759 reveals
this potential compile-time improvement.

The savings come from avoiding ValueTracking's computing known
bits if we have already dealt with special-case patterns.

Further improvements in this direction seem possible.

This makes a degenerate test based on PR49785 about 40x faster
(25 sec -> 0.6 sec), but it does not address the larger question
of how to limit computeKnownBitsFromAssume(). Ie, the original
test there is still infinite-time for all practical purposes.

Differential Revision: https://reviews.llvm.org/D100408
2021-04-14 09:04:15 -04:00
Sanjay Patel 5ae5d25e38 [ValueTracking] match negative-stepping non-zero recurrence
This is pulled out of D100408.

This avoids a regression that would be exposed by making the
calling code from InstSimplify more efficient.
2021-04-14 08:57:53 -04:00
Sanjay Patel 4919365397 [ValueTracking] reduce code duplication; NFC
The start value can't be null for something to be a non-zero
recurrence, so hoist that common check out of the switch.

Subsequent checks may be incomplete or over-specified as noted in:
D100408
2021-04-14 08:32:42 -04:00
Philip Reames 00c8be3f93 fix whitespace type 2021-04-13 19:02:41 -07:00
Nikita Popov faf9f11589 [SCEV] Don't walk uses of phis without SCEV expression when forgetting
I've run into some cases where a large fraction of compile-time is
spent invalidating SCEV. One of the causes is forgetLoop(), which
walks all values that are def-use reachable from the loop header
phis. When invalidating a topmost loop, that might be close to all
values in a function. Additionally, it's fairly common for there to
not actually be anything to invalidate, but we'll still be performing
this walk again and again.

My first thought was that we don't need to continue walking the uses
if the current value doesn't have a SCEV expression. However, this
isn't quite right, because SCEV construction can skip over values
(e.g. for a chain of adds, we might only create a SCEV expression
for the final value).

What this patch does instead is to only walk the (full) def-use chain
of loop phis that have a SCEV expression. If there's no expression
for a phi, then we also don't have any dependent expressions to
invalidate.

Differential Revision: https://reviews.llvm.org/D100264
2021-04-13 20:28:17 +02:00
Sander de Smalen 03f47bdcb1 [TTI] NFC: Change get[Interleaved]MemoryOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100205
2021-04-13 14:21:02 +01:00
Sander de Smalen d676b5749d [TTI] NFC: Change getMaskedMemoryOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100204
2021-04-13 14:21:01 +01:00
Sander de Smalen db134e2428 [TTI] NFC: Change getCmpSelInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100203
2021-04-13 14:21:01 +01:00
Sander de Smalen 2285dfb73f [TTI] NFC: Change getMinMaxReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100202
2021-04-13 14:21:00 +01:00
Sander de Smalen bd86824d98 [TTI] NFC: Change getArithmeticReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

This patch is practically NFC, with the exception of an AArch64 SVE related
cost-model change, where we can now return an Invalid cost instead of some
bogus number.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100201
2021-04-13 14:20:59 +01:00
Sander de Smalen fd1f8a5462 [TTI] NFC: Change getGatherScatterOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100200
2021-04-13 14:20:59 +01:00
Sander de Smalen 92d8421f49 [TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100199
2021-04-13 14:20:58 +01:00
Gulfem Savrun Yeniceri e96df3e531 [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-04-13 01:29:41 +00:00
Yuanfang Chen c5fda0e662 Reland "Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom""
This reverts commit a3fabc79ae (relands
f4d682d6ce with fix for the compile-time
regression issue).
2021-04-12 14:50:54 -07:00
Nikita Popov a3fabc79ae Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom"
This reverts commit f4d682d6ce.

This caused a significant compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=4b7bad9eaea2233521a94f6b096aaa88dc584e23&to=f4d682d6ce6c5b3a41a0acf297507c82f5c21eef&stat=instructions

Possibly this is due to overeager parsing of target triples.
2021-04-12 22:55:59 +02:00
Arthur Eubanks 269b335bd7 [Inliner] Propagate SROA analysis through invariant group intrinsics
SROA can handle invariant group intrinsics, let the inliner know that
for better heuristics when the intrinsics are present.

This fixes size issues in a couple files when turning on
-fstrict-vtable-pointers in Chrome.

Reviewed By: rnk, mtrofin

Differential Revision: https://reviews.llvm.org/D100249
2021-04-12 10:54:22 -07:00
Hamza Sood 0a92aff721 Replace uses of std::iterator with explicit using
This patch removes all uses of `std::iterator`, which was deprecated in C++17.
While this isn't currently an issue while compiling LLVM, it's useful for those using LLVM as a library.

For some reason there're a few places that were seemingly able to use `std` functions unqualified, which no longer works after this patch. I've updated those places, but I'm not really sure why it worked in the first place.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D67586
2021-04-12 10:47:14 -07:00
Yuanfang Chen f4d682d6ce [InstCombine] when calling conventions are compatible, don't convert the call to undef idiom
D24453 enabled libcalls simplication for ARM PCS. This may cause
caller/callee calling conventions mismatch in some situations such as
LTO. This patch makes instcombine aware that the compatible calling
conventions differences are benign (not emitting undef idom).

Differential Revision: https://reviews.llvm.org/D99773
2021-04-12 09:32:23 -07:00
Roman Lebedev 6d44b3c56d
[NFCI][DomTreeUpdater] applyUpdates(): reserve space for updates first
While, indeed, we may end up pushing less updates that we'd reserve space
for, self-dominating updates aren't often enough for that to matter.
But this should matter for normal updates.
2021-04-11 23:56:22 +03:00
Roman Lebedev 9829f5e6b1
[CVP] @llvm.[us]{min,max}() intrinsics handling
If we can tell that either one of the arguments is taken,
bypass the intrinsic.

Notably, we are indeed fine with non-strict predicate:
* UL: https://alive2.llvm.org/ce/z/69qVW9 https://alive2.llvm.org/ce/z/kNFTKf
      https://alive2.llvm.org/ce/z/AvaPw2 https://alive2.llvm.org/ce/z/oxo53i
* UG: https://alive2.llvm.org/ce/z/wxHeGH https://alive2.llvm.org/ce/z/Lf76qx
* SL: https://alive2.llvm.org/ce/z/hkeTGS https://alive2.llvm.org/ce/z/eR_b-W
* SG: https://alive2.llvm.org/ce/z/wEqRm7 https://alive2.llvm.org/ce/z/FpAsVr

Much like with all other comparison handling in CVP,
while we could sort-of handle two Value's,
at least for plain ICmpInst it does not appear to be worthwhile.

This only fires 78 times on test-suite + dt + rs,
but we don't canonicalize to these yet. (only SCEV produces them)
2021-04-11 00:33:47 +03:00
Nikita Popov 8de2f1ff79 [IVUsers] Check LoopSimplify cache earlier (NFC)
Check the cache before calling isLoopSimplifyForm(). Otherwise we'd
always perform the check for the innermost loop and only skip it
for dominating loops.
2021-04-10 22:58:13 +02:00
Roman Lebedev e8c7f43e2c
[NFC][ConstantRange] Add 'icmp' helper method
"Does the predicate hold between two ranges?"

Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm, cleanup them all.
2021-04-10 19:38:55 +03:00
Roman Lebedev 7b12c8c59d
Revert "[NFC][ConstantRange] Add 'icmp' helper method"
This reverts commit 17cf2c9423.
2021-04-10 19:37:53 +03:00
Roman Lebedev 17cf2c9423
[NFC][ConstantRange] Add 'icmp' helper method
"Does the predicate hold between two ranges?"

Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm, cleanup them all.
2021-04-10 19:09:52 +03:00
dfukalov 8f4b7e94a2 [AMDGPU][CostModel] Refine cost model for control-flow instructions.
Added cost estimation for switch instruction, updated costs of branches, fixed
phi cost.
Had to increase `-amdgpu-unroll-threshold-if` default value since conditional
branch cost (size) was corrected to higher value.
Test renamed to "control-flow.ll".

Removed redundant code in `X86TTIImpl::getCFInstrCost()` and
`PPCTTIImpl::getCFInstrCost()`.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D96805
2021-04-10 09:20:24 +03:00
Roman Lebedev 077bff39d4
[Analysis] isDereferenceableAndAlignedPointer(): recurse into select's hands
By doing this within the method itself,
we support traversing multiple levels of selects (TODO: PHI's),
fixing the SROA `std::clamp()` testcase.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47271
Mostly fixes https://bugs.llvm.org/show_bug.cgi?id=49909
2021-04-10 00:56:28 +03:00
Alina Sbirlea f6bff8d157 [MSSA] Rename uses in IDF regardless of new def position in basic block.
When inserting a new def and renaming of uses is asked, always compute
IDF and do the renaming for the blocks with Phis in that IDF.
Resolves PR49859.

Differential Revision: https://reviews.llvm.org/D100163
2021-04-09 12:32:37 -07:00
Momchil Velikov acf3279a03 For non-null pointer checks, do not descend through out-of-bounds GEPs
In LazyValueInfoImpl::isNonNullAtEndOfBlock we populate a set of
pointers, known to be non-null at the end of a block (e.g. because we
did a load through them). We then infer that any pointer, based on an
element of this set is non-null as well ("based" here meaning a
non-null pointer is the underlying object). This is incorrect, even if
the base pointer was non-null, the value of a GEP, that lacks the
inbounds` attribute, may be null.

This issue appeared as miscompilation of the following test case:

int puts(const char *);

typedef struct iter {
  int *val;
} iter_t;

static long distance(iter_t first, iter_t last) {
  long r = 0;
  for (; first.val != last.val; first.val++)
    ++r;
  return r;
}

int main() {
  int arr[2] = {0};
  iter_t i, j;
  i.val = arr;
  j.val = arr + 1;
  if (distance(i, j) >= 2)
    puts("failed");
  else
    puts("passed");
}

This fixes PR49662.

Differential Revision: https://reviews.llvm.org/D99642
2021-04-09 14:09:23 +01:00
dfukalov c1a88e007b [AA][NFC] Convert AliasResult to class containing offset for PartialAlias case.
Add an ability to store `Offset` between partially aliased location. Use this
storage within returned `ResultAlias` instead of caching it in `AAQueryInfo`.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98718
2021-04-09 13:26:09 +03:00
dfukalov d066079728 [NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset.
Main reason is preparation to transform AliasResult to class that contains
offset for PartialAlias case.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98027
2021-04-09 12:54:22 +03:00
Arthur Eubanks c5d1ccbcdf [GVN] Properly invalidate ICF cache when we simplify a value
This fixes a "Cached first special instruction is wrong!" assert.

The assert fires because replacing a value with another can cause an
instruction to no longer be "special" to ICF. In this case,
devirtualization happened, turning an indirect call to a
call to a willreturn function which is no longer special.

Reviewed By: nikic, rnk

Differential Revision: https://reviews.llvm.org/D99977
2021-04-08 14:01:57 -07:00
Stanislav Mekhanoshin 5f0ac1ef78 Set IgnoreLLVMUsed to false in CallGraph::addToCallGraph()
clang++ uses llvm.compiler.used in certain cases to preserve
symbol which is fully inlined. D96087 has resulted in undefined
symbols in such cases. Set it to false by default to preserve
old behavior but keep the option for specific uses where we
want to ignore these (e.g. to detect a potential indirect call
to a function).

Differential Revision: https://reviews.llvm.org/D99897
2021-04-08 11:14:09 -07:00
Max Kazantsev fee330824a [SCEV] Fix false-positive recognition of simple recurrences. PR49856
A value from reachable block may come to a Phi node as its input from
unreachable block. This may confuse matchSimpleRecurrence  which
has no access to DomTree and can falsely recognize something as a recurrency
because of this effect, as the attached test shows.

Patch `ae7b1e` deals with half of this problem, but it only accounts from
the case when an unreachable instruction comes to Phi as an input.

This patch provides a generalization by checking that no Phi block's
predecessor is unreachable (no matter what the input is).

Differential Revision: https://reviews.llvm.org/D99929
Reviewed By: reames
2021-04-07 13:55:17 +07:00
Philip Reames 908215b346 Use AssumeInst in a few more places [nfc]
Follow up to a6d2a8d6f5.  These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.
2021-04-06 13:18:53 -07:00
Philip Reames 9ef6aa020b Plumb AssumeInst through operand bundle apis [nfc]
Follow up to a6d2a8d6f5.  This covers all the public interfaces of the bundle related code.  I tried to cleanup the internals where the changes were obvious, but there's definitely more room for improvement.
2021-04-06 12:53:53 -07:00
Philip Reames a6d2a8d6f5 Add a subclass of IntrinsicInst for llvm.assume [nfc]
Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class.  A follow up change will do the same for the newer assumption query/bundle mechanisms.
2021-04-06 11:16:22 -07:00
Florian Hahn 4059c1c32d [SimplifyInst] Use correct type for GEPs with vector indices.
The current code does not properly handle vector indices unless they are
the first index.

At the moment LangRef gives the impression that the vector index must be
the one and only index (https://llvm.org/docs/LangRef.html#getelementptr-instruction).

But vector indices can appear at any position and according to the
verifier there may be multiple vector indices. If that's the case, the
number of elements must match.

This patch updates SimplifyGEPInst to properly handle those additional
cases.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99961
2021-04-06 17:56:10 +01:00
Philip Reames 21d4839948 Move GCRelocateInst and GCResultInst to IntrinsicInst.h [nfc]
These two are part of the IntrinsicInst class hierarchy and it helps to cut down on some redundant includes.
2021-04-06 08:33:15 -07:00
Kerry McLaughlin 7344f3d39a [LoopVectorize] Add strict in-order reduction support for fixed-width vectorization
Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to
reorder FP operations. However, it may still be beneficial to vectorize the loop by moving
the reduction inside the vectorized loop and making sure that the scalar reduction value
be an input to the horizontal reduction, e.g:

  %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ]
  %load = load <8 x float>
  %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load)

This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added
by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions.
For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D98435
2021-04-06 14:45:34 +01:00
Kerry McLaughlin 857b8a73da [LoopVectorize] Change the identity element for FAdd
Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd.

Reviewed By: dmgreen, spatel

Differential Revision: https://reviews.llvm.org/D98963
2021-04-06 12:13:43 +01:00
Simon Pilgrim ddbb58736a [KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI.
As promised in D98866
2021-04-06 10:11:41 +01:00
Philip Reames 58ccbd0d08 Comment adjustments for a rename 2021-04-05 21:07:42 -07:00
Philip Reames 13deb6aac7 Exact ashr/lshr don't loose any set bits and are thus trivially invertible
Use that fact to improve isKnownNonEqual.
2021-04-05 19:22:36 -07:00
Philip Reames dc8d864e3a Address minor post commit feedback on 0e59dd 2021-04-05 18:22:17 -07:00
Sanjay Patel e2a0f512ea [InstSimplify] fix potential miscompile in select value equivalence
This is the sibling fix to c590a9880d -
as there, we can't subsitute a vector value the equality
compare replacement that we are trying requires that the
comparison is true for the entire value. Vector select
can be partly true/false.
2021-04-05 16:52:34 -04:00
Philip Reames b0e59dd6e1 Extract a helper for figuring out if an operator is invertible [nfc]
For use in an uncoming patch.  Left out the phi case (which could otherwise fit in this framework) as it would cause infinite recursion in said patch.  We can probably also leverage this in instcombine to ensure we keep the two sets of related analysis and transforms in sync.
2021-04-05 12:14:21 -07:00
Nikita Popov 72e0846ef8 [LVI] Don't bail on overdefined value in select
Even if one of the operands is overdefined, we may still produce
a non-overdefined result, e.g. due to a min/max operation. This
matches our handling elsewhere, e.g. for binary operators.

The slot poisoning comment refers to a much older LVI cache
implementation.
2021-04-04 11:11:01 +02:00
Mircea Trofin b32e76c6d5 [mlgo] fix build rules
This was prompted by D95727, which had the side-effect to break the
'release' mode build bot for ML-driven policies. The problem is that now
the pre-compiled object files don't get transitively carried through as
'source' anymore; that being said, the previous way of consuming them
was problematic, because it was only working for static builds; in
dynamic builds, the whole tf_xla_runtime was linked, which is
undesirable.

The alternative is to treat tf_xla_runtime as an archive, which then
leads to the desired effect.

Differential Revision: https://reviews.llvm.org/D99829
2021-04-03 12:49:03 -07:00
Nikita Popov b552e16b0b [Loads] Forward constant vector store to load of first element
InstCombine performs simple forwarding from stores to loads, but
currently only handles the case where the load and store have the
same size. This extends it to also handle a store of a constant
with a larger size followed by a load with a smaller size.

This is implemented through ConstantFoldLoadThroughBitcast() which
is fairly primitive (e.g. does not allow storing a large integer
and then loading a small one), but at least can forward the first
element of a vector store. Unfortunately it seems that we currently
don't have a generic helper for "read a constant value as a different
type", it's all tangled up with other logic in either
ConstantFolding or VNCoercion.

Differential Revision: https://reviews.llvm.org/D98114
2021-04-03 12:10:31 +02:00
Nikita Popov 9d20eaf9c0 [BasicAA] Don't store AATags in cache key (NFC)
The AAMDNodes part of the MemoryLocation is not used by the BasicAA
cache, so don't store it. This reduces the size of each cache entry
from 112 bytes to 48 bytes.
2021-04-03 11:32:01 +02:00
Nikita Popov 17b4e5d456 [BasicAA] Don't pass through AA metadata (NFCI)
BasicAA itself doesn't make use of AA metadata, but passes it
through to recursive queries and makes it part of the cache key.
Aliasing decisions that are based on AA metadata (i.e. TBAA and
ScopedAA) are based *only* on AA metadata, so checking them with
different pointer values or sizes is not useful, the result will
always be the same.

While this change is a mild compile-time improvement by itself,
the actual goal here is to reduce the size of AA cache keys in
a followup change.

Differential Revision: https://reviews.llvm.org/D90098
2021-04-03 11:21:50 +02:00
Simon Pilgrim 4ea5475a3f [KnownBits] Add KnownBits::haveNoCommonBitsSet helper. NFCI.
Include exhaustive test coverage.
2021-04-02 21:44:33 +01:00
Nikita Popov 4a3e006830 [LVI] Use range metadata on intrinsics
If we don't know how to handle an intrinsic, we should still
make use of normal call range metadata.
2021-04-02 16:45:31 +02:00
Sander de Smalen 0f7bbbc481 Always emit error for wrong interfaces to scalable vectors, unless cmdline flag is passed.
In order to bring up scalable vector support in LLVM incrementally,
we introduced behaviour to emit a warning, instead of an error, when
asking the wrong question of a scalable vector, like asking for the
fixed number of elements.

This patch puts that behaviour under a flag. The default behaviour is
that the compiler will always error, which means that all LLVM unit
tests and regression tests will now fail when a code-path is taken that
still uses the wrong interface.

The behaviour to demote an error to a warning can be individually enabled
for tools that want to support experimental use of scalable vectors.
This patch enables that behaviour when driving compilation from Clang.
This means that for users who want to try out scalable-vector support,
fixed-width codegen support, or build user-code with scalable vector
intrinsics, Clang will not crash and burn when the compiler encounters
such a case.

This allows us to do away with the following pattern in many of the SVE tests:
  RUN: .... 2>%t
  RUN: cat %t | FileCheck --check-prefix=WARN
  WARN-NOT: warning: ...

The behaviour to emit warnings is only temporary and we expect this flag
to be removed in the future when scalable vector support is more stable.

This patch also has fixes the following tests:
 unittests:
   ScalableVectorMVTsTest.SizeQueries
   SelectionDAGAddressAnalysisTest.unknownSizeFrameObjects
   AArch64SelectionDAGTest.computeKnownBitsSVE_ZERO_EXTEND_VECTOR_INREG

 regression tests:
   Transforms/InstCombine/vscale_gep.ll

Reviewed By: paulwalker-arm, ctetreau

Differential Revision: https://reviews.llvm.org/D98856
2021-04-02 10:55:22 +01:00
Philip Reames db357891f0 Infer dereferenceability from malloc and friends
Hookup TLI when inferring object size from allocation calls. This allows the analysis to prove dereferenceability for known allocation functions (such as malloc/new/etc) in addition to those marked explicitly with the allocsize attribute.

This is a follow up to 0129cd5 now that the bug fixed by e2c6621e6 is resolved.

As noted in the test, this relies on being able to prove that there is no free between allocation and context (e.g. hoist location). At the moment, this is handled conservatively. I'm working strengthening out ability to reason about no-free regions separately.

Differential Revision: https://reviews.llvm.org/D99737
2021-04-01 11:33:35 -07:00
Philip Reames ffa15e9463 Extract isVolatile helper on Instruction [NFCI]
We have this logic duplicated in several cases, none of which were exhaustive.  Consolidate it in one place.

I don't believe this actually impacts behavior of the callers.  I think they all filter their inputs such that their partial implementations were correct.  If not, this might be fixing a cornercase bug.
2021-04-01 11:24:02 -07:00
Philip Reames e2c6621e63 [deref-at-point] restrict inference of dereferenceability based on allocsize attribute
Support deriving dereferenceability facts from allocation sites with known object sizes while correctly accounting for any possibly frees between allocation and use site. (At the moment, we're conservative and only allowing it in functions where we know we can't free.)

This is part of the work on deref-at-point semantics. I'm making the change unconditional as the miscompile in this case is way too easy to trip by accident, and the optimization was only recently added (by me).

There will be a follow up patch wiring through TLI since that should now be doable without introducing widespread miscompiles.

Differential Revision: https://reviews.llvm.org/D95815
2021-04-01 08:34:40 -07:00
Philip Reames 4af4828a6e [ValueTracking] Handle non-zero ashr/lshr recurrences
If we know we don't shift out bits (e.g. exact), all we need to know is that input is non-zero.
2021-03-31 16:48:32 -07:00
George Mitenkov 807b019ca2 [ConstantFolding] Fixing addo/subo with undef
When folding addo/subo with undef, the current
convention is to use { -1, false } for addo and
{ 0, false } for subo. This was fixed for InstSimplify in
https://reviews.llvm.org/rGf094d65beaa492e845b03561eddd75b5be653a01,
but not in ConstantFolding.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D99564
2021-03-31 21:47:29 +03:00
Juneyoung Lee df0b97dab0 [ValueTracking] Add with.overflow intrinsics to poison analysis functions
This is a patch teaching ValueTracking that `s/u*.with.overflow` intrinsics do not
create undef/poison and they propagate poison.
I couldn't write a nice example like the one with ctpop; ValueTrackingTest.cpp were simply updated
to check these instead.
This patch helps reducing regression while fixing https://llvm.org/pr49688 .

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99671
2021-04-01 02:41:38 +09:00
Philip Reames ae7b1e8823 [SCEV] Handle unreachable binop when matching shift recurrence
This fixes an issue introduced with my change d4648e, and reported in pr49768.

The root problem is that dominance collapses in unreachable code, and that LoopInfo explicitly only models reachable code.  Since the recurrence matcher doesn't filter by reachability (and can't easily because not all consumers have domtree), we need to bailout before assuming that finding a recurrence implies we found a loop.
2021-03-31 10:33:34 -07:00
Sander de Smalen 2f6f249a49 NFC: Change getIntrinsicInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Depends on D97468

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D97469
2021-03-31 14:04:41 +01:00
Liqiang Tao d2d6720a93 [InlineCost] Remove TODO comment that consider other forms of savings in the cost-benefit analysis
Attempts to compute savings more accurately cannot impact the set of critically important call sites.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D98577
2021-03-31 20:11:32 +08:00
Sander de Smalen 3ccbd4f3c7 NFC: Change getUserCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Depends on D97382

Reviewed By: ctetreau, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D97466
2021-03-31 10:13:09 +01:00
Krasimir Georgiev c51e91e046 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 5178ffc7cf.

Compiling `llvm-profdata` with a compiler build from this produces a
crashing binary.
2021-03-30 14:13:37 +02:00
Sander de Smalen 4ca860742d [InstructionCost] Don't conflate Invalid costs with Unknown costs.
We previously made a change to getUserCost to return a Invalid cost
when one of the TTI costs returned '-1' (meaning 'unknown' or
'infinitely expensive'). It makes no sense to say that:

  shufflevector <2 x i8> %x, <2 x i8> %y, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

has an invalid cost. Perhaps the cost is not known, but the IR is valid
and can be code-generated. Invalid should only be used for IR that
cannot possibly be code-generated and where a cost is nonsensical.

With more passes now asserting that the cost must be valid, it is possible
that those assertions will fail for perfectly valid IR. An incomplete
cost-model probably shouldn't be a reason for the compiler to break.

It's better to consider these costs as 'very expensive' and ignore them
for other reasons. At some point, we should consider replacing -1 with
some other mechanism.

Reviewed By: paulwalker-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D99502
2021-03-30 09:29:42 +01:00
Gulfem Savrun Yeniceri 5178ffc7cf [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-29 21:53:32 +00:00
Nikita Popov ce066da81c [BasicAA] Make sure types match in constant offset heuristic
This can only happen if offset types that are larger than the
pointer size are involved. The previous implementation did not
assert in this case because it initialized the APInts to the
width of one of the variables -- though I strongly suspect it
did not compute correct results in this case.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32621
reported by fhahn.
2021-03-28 21:38:09 +02:00
Nikita Popov 3df3f3df45 [BasicAA] Handle gep with unknown sizes earlier (NFCI)
If the sizes of both memory locations are unknown, we can only
perform a check on the underlying objects. There's no point in
going through GEP decomposition in this case.
2021-03-28 15:48:49 +02:00
Nikita Popov 9075864b73 [BasicAA] Refactor linear expression decomposition
The current linear expression decomposition handles zext/sext by
decomposing the casted operand, and then checking NUW/NSW flags
to determine whether the extension can be distributed. This has
some disadvantages:

First, it is not possible to perform a partial decomposition. If
we have zext((x + C1) +<nuw> C2) then we will fail to decompose
the expression entirely, even though it would be safe and
profitable to decompose it to zext(x + C1) +<nuw> zext(C2)

Second, we may end up performing unnecessary decompositions,
which will later be discarded because they lack nowrap flags
necessary for extensions.

Third, correctness of the code is not entirely obvious: At a high
level, we encounter zext(x -<nuw> C) in the form of a zext on the
linear expression x + (-C) with nuw flag set. Notably, this case
must be treated as zext(x) + -zext(C) rather than zext(x) + zext(-C).
The code handles this correctly by speculatively zexting constants
to the final bitwidth, and performing additional fixup if the
actual extension turns out to be an sext. This was not immediately
obvious to me.

This patch inverts the approach: An ExtendedValue represents a
zext(sext(V)), and linear expression decomposition will try to
decompose V further, either by absorbing another sext/zext into the
ExtendedValue, or by distributing zext(sext(x op C)) over a binary
operator with appropriate nsw/nuw flags. At each step we can
determine whether distribution is legal and abort with a partial
decomposition if not. We also know which extensions we need to
apply to constants, and don't need to speculate or fixup.
2021-03-27 23:31:58 +01:00
Nikita Popov b981bc30bf [BasicAA] Correct handle implicit sext in decomposition
While explicit sext instructions were handled correctly, the
implicit sext that occurs if the offset is smaller than the
pointer size blindly assumed that sext(X * Scale + Offset) is the
same as sext(X) * Scale + Offset, which is obviously not correct.

Fix this by extracting the code that handles linear expression
extension and reusing it for the implicit sext as well.
2021-03-27 15:15:47 +01:00
Nikita Popov 60f3e8fbe4 [BasicAA] Clarify entry values of GetLinearExpression() (NFC)
A number of variables need to be correctly initialized on entry
to GetLinearExpression() for the implementation to behave reasonably.

The fact that SExtBits can currenlty be non-zero on entry is a bug,
as demonstrated by the added test: For implicit sexts by the GEP,
we do currently skip legality checks.
2021-03-27 14:50:09 +01:00
Nikita Popov ad9dad93ff [BasicAA] Bail out earlier for invalid shift amount
Currently, we'd produce an incorrect decomposition, because we
already recursively called GetLinearExpression(), so the Scale=1,
Offset=0 will not necessarily be relative to the shl itself.

Now, this doesn't actually matter for functional correctness,
because such a shift is poison anyway, so its okay to return
an incorrect decomposition. It's still unnecessarily confusing
though, and we can easily avoid this by checking the bitwidth
earlier.
2021-03-27 12:41:16 +01:00
Nikita Popov 5a5a8088cc [BasicAA] Retain shl nowrap flags in GetLinearExpression()
Nowrap flags between mul and shl differ in that mul nsw allows
multiplication of 1 * INT_MIN, while shl nsw does not. This means
that it is always fine to transfer shl nowrap flags to muls, but
not necessarily the other way around. In this case the NUW/NSW
results refer to mul/add operations, so it's fine to retain the
flags from the shl.
2021-03-27 12:26:22 +01:00
Nikita Popov fd7df0cf38 [ValueTracking] Handle shl pair in isKnownNonEqual()
Handle (x << s) != (y << s) where x != y and the shifts are
non-wrapping. Once again, this establishes parity with the
corresponing mul fold that already exists. The shift case is
more powerful because we don't need to guard against multiplies
by zero.
2021-03-26 20:21:05 +01:00
Nikita Popov 9666e89d57 [ValueTracking] Handle shl in isKnownNonEqual()
This handles the pattern X != X << C for non-zero X and C and a
non-overflowing shift. This establishes parity with the corresponing
fold for multiplies.
2021-03-26 20:21:05 +01:00
Nikita Popov caf92a8a92 [ValueTracking] Handle non-zero shl recurrence
In this case we don't care about the step at all, and only require
that the starting value is non-zero.
2021-03-26 18:39:06 +01:00
Nikita Popov 938d05b814 [ValueTracking] Handle non-zero add/mul recurrences more precisely
This is mainly for clarity: It doesn't make sense to do any
negative/positive checks when dealing with a nuw add/mul. These
only make sense to nsw add/mul.
2021-03-26 18:30:07 +01:00
Kazu Hirata 9d375a40c3 Reapply [InlineCost] Enable the cost benefit analysis on FDO
This patch enables the cost-benefit-analysis-based inliner by default
if we have instrumentation profile.

- SPEC CPU 2017 shows a 0.4% improvement.

- An internal large benchmark shows a 0.9% reduction in the cycle
  count along with 14.6% reduction in the number of call instructions
  executed.

Differential Revision: https://reviews.llvm.org/D98213
2021-03-25 21:51:38 -07:00
Kazu Hirata 3c775d93a1 [InlineCost] Reject a zero entry count
This patch teaches the cost-benefit-analysis-based inliner to reject a
zero entry count so that we don't trigger a divide-by-zero.
2021-03-25 21:51:36 -07:00
Richard Smith 040c60d9b6 Fix a miscompile introduced by 99203f2.
getPointersDiff would previously round down the difference between two
pointers to a multiple of the element size of the pointee, which could
result in a pointer value being decreased a little.

Alexey Bataev has graciously agreed to add a testcase for this;
submitting the bugfix now to unblock.
2021-03-25 16:53:58 -07:00
Jingu Kang 3fd64cc7a3 [ValueTracking] Handle two PHIs in isKnownNonEqual()
loop:
  %cmp.0 = phi i32 [ 3, %entry ], [ %inc, %loop ]
  %pos.0 = phi i32 [ 1, %entry ], [ %cmp.0, %loop ]
  ...
  %inc = add i32 %cmp.0, 1
  br label %loop

On above example, %pos.0 uses previous iteration's %cmp.0 with backedge
according to PHI's instruction's defintion. If the %inc is not same among
iterations, we can say the two PHIs are not same.

Differential Revision: https://reviews.llvm.org/D98422
2021-03-25 22:56:05 +00:00
Nico Weber a60ffee3f4 Revert "[InlineCost] Enable the cost benefit analysis on FDO"
This reverts commit ef69aa961d.
Makes clang assert in PGO builds, see repro tgz in
https://bugs.chromium.org/p/chromium/issues/detail?id=1192783#c6
2021-03-25 16:42:19 -04:00
Nikita Popov 93a636d9f6 [IR] Lift attribute handling for assume bundles into CallBase
Rather than special-casing assume in BasicAA getModRefBehavior(),
do this one level higher, in the attribute handling of CallBase.

For assumes with operand bundles, the inaccessiblememonly attribute
applies regardless of operand bundles.
2021-03-25 21:15:39 +01:00
Philip Reames 9a82f42d12 Plumb TLI through isSafeToExecuteUnconditionally [NFC]
Split from D95815 to reduce patch size.  Isn't (yet) used for anything, only the client side is wired up.
2021-03-24 17:52:04 -07:00
Wenlei He 6869e6c1e7 [InlineCost] Make cost-benefit decision explicit
With cost-benefit analysis for inlining, we bypass the cost-threshold by returning inline result from call analyzer early.

However the cost and threshold are still available from call analyzer, and when cost is actually higher than threshold, we incorrect set the reason.

The change makes the decision from cost-benefit analysis explicit. It's mostly NFC, except that it allows the priority-based sample loader inliner used by CSSPGO to use cost-benefit heuristic.

Differential Revision: https://reviews.llvm.org/D99302
2021-03-24 16:10:58 -07:00
Kazu Hirata ef69aa961d [InlineCost] Enable the cost benefit analysis on FDO
This patch enables the cost-benefit-analysis-based inliner by default
if we have instrumentation profile.

- SPEC CPU 2017 shows a 0.4% improvement.

- An internal large benchmark shows a 0.9% reduction in the cycle
  count along with 14.6% reduction in the number of call instructions
  executed.

Differential Revision: https://reviews.llvm.org/D98213
2021-03-24 15:36:49 -07:00
Sanjay Patel adf42dff42 [ValueTracking] peek through min/max to find isKnownToBeAPowerOfTwo
This is similar to the select logic just ahead of the new code.
Min/max choose exactly one value from the inputs, so if both of
those are a power-of-2, then the result must be a power-of-2.

This might help with D98152, but we likely still need other
pieces of the puzzle to avoid regressions.

The change in PatternMatch.h is needed to build with clang.
It's possible there is a better way to deal with the 'const'
incompatibities.

Differential Revision: https://reviews.llvm.org/D99276
2021-03-24 17:54:38 -04:00
Nikita Popov a7efed5a20 [SCEV] Improve handling of not expressions in isImpliedCond()
SCEV currently tries to prove implications of x pred y by also
trying to imply ~y pred ~x. This is expensive in terms of
compile-time (in fact, the majority of isImpliedCond compile-time
is spent here) and generally not fruitful. The issue is that this
also swaps the operands and thus breaks canonical ordering. If
originally we were trying to prove an implication like
X > C1 -> Y > C2, then we'll now try to prove X > C1 -> C3 > ~Y,
which will not work.

The only real case where we can get some use out of this transform
is if the original conditions were in the form X > C1 -> Y < C2, were
then swapped to X > C1 -> C2 > Y and are then swapped again here to
X > C1 -> ~Y > C3.

As such, handle this at a higher level, where we are doing the
swapping in the first place. There's four different ways that we
can line up a predicate and a swapped predicate, so we use some
heuristics to pick some profitable way.

Because we now try this transform at a higher level
(isImpliedCondOperands rather than isImpliedCondOperandsHelper),
we can also prove additional facts. Of the added tests, one was
proven previously while the other wasn't.

Differential Revision: https://reviews.llvm.org/D90926
2021-03-24 21:53:02 +01:00
Gulfem Savrun Yeniceri 5fbe1fdf17 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 5fd001a5ff
because it broke clang-with-thin-lto-ubuntu bot.
2021-03-24 18:59:33 +00:00
Gulfem Savrun Yeniceri 5fd001a5ff [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-24 17:31:18 +00:00
Thomas Preud'homme 3b52c04e82 Make FindAvailableLoadedValue TBAA aware
FindAvailableLoadedValue() relies on FindAvailablePtrLoadStore() to run
the alias analysis when searching for an equivalent value. However,
FindAvailablePtrLoadStore() calls the alias analysis framework with a
memory location for the load constructed from an address and a size,
which thus lacks TBAA metadata info. This commit modifies
FindAvailablePtrLoadStore() to accept an optional memory location as
parameter to allow FindAvailableLoadedValue() to create it based on the
load instruction, which would then have TBAA metadata info attached.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99206
2021-03-24 17:20:26 +00:00
Sander de Smalen 55d18b3cc2 [TTI] Return a TypeSize from getRegisterBitWidth.
This patch changes the interface to take a RegisterKind, to indicate
whether the register bitwidth of a scalar register, fixed-width vector
register, or scalable vector register must be returned.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D98874
2021-03-24 14:45:13 +00:00
Alexey Bataev 2f1b439089 [LoopAnalysis][NFC]Remove redundant code.
Removed redundant code for IsConsecutive variable.
2021-03-24 05:37:19 -07:00
Yang Fan 279d74ffd1
[InstSimplify] Fix unused variable warning (NFC)
GCC warning:
```
/llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp: In function ‘llvm::Value* SimplifyWithOpReplaced(llvm::Value*, llvm::Value*, llvm::Value*, const llvm::SimplifyQuery&, bool, unsigned int)’:
/llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp:3993:15: warning: unused variable ‘SI’ [-Wunused-variable]
 3993 |     if (auto *SI = dyn_cast<SelectInst>(I))
      |               ^~
```
2021-03-24 09:56:36 +08:00
Jingu Kang 2e2740b859 [ValueTracking] Handle increasing mul recurrence in isKnownNonZero()
Differential Revision: https://reviews.llvm.org/D99069
2021-03-23 23:04:41 +00:00
Matteo Favaro a4fb88669c
[MSSA] Extending IsGuaranteedLoopInvariant to support an instruction defined in the entry block
As mentioned in [[ https://reviews.llvm.org/D96979 | D96979 ]], I'm extending the **IsGuaranteedLoopInvariant** check also to the `MemorySSA.cpp` file.

@fhahn For now I didn't unify the function into `MemorySSA.h` because, as you mentioned, it's not directly MSSA related. I'm open to suggestions to find a better place so we can improve the unification process.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D97155
2021-03-23 21:50:56 +00:00
Alexey Bataev 99203f2004 [Analysis]Add getPointersDiff function to improve compile time.
Added getPointersDiff function to LoopAccessAnalysis and used it instead
direct calculatoin of the distance between pointers and/or
isConsecutiveAccess function in SLP vectorizer to improve compile time
and detection of stores consecutive chains.

Part of D57059

Differential Revision: https://reviews.llvm.org/D98967
2021-03-23 14:25:36 -07:00
Nikita Popov 931b6066ac [BasicAA] Handle assumes with operand bundles
This fixes a regression reported on D99022: If a call has operand
bundles, then the inaccessiblememonly attribute on the function
will be ignored, as operand bundles can affect modref behavior in
the general case. However, for assume operand bundles in particular
this is not the case.

Adjust getModRefBehavior() to always report inaccessiblememonly
for assumes, regardless of presence of operand bundles.
2021-03-23 21:21:19 +01:00
Alexey Bataev f1b47ad278 Revert "[Analysis]Add getPointersDiff function to improve compile time."
This reverts commit 065a14a12d to
investigate and fix crash in SLP vectorizer.
2021-03-23 13:17:54 -07:00
Alexey Bataev 065a14a12d [Analysis]Add getPointersDiff function to improve compile time.
Added getPointersDiff function to LoopAccessAnalysis and used it instead
direct calculatoin of the distance between pointers and/or
isConsecutiveAccess function in SLP vectorizer to improve compile time
and detection of stores consecutive chains.

Part of D57059

Differential Revision: https://reviews.llvm.org/D98967
2021-03-23 12:58:42 -07:00
Craig Topper 4c38c35c8d [ValueTracking] Teach canCreateUndefOrPoison that ctpop does not create undef or poison.
This select of ctpop with 0 pattern can get left behind after
loop idiom recognize converts a loop to ctpop. LLVM 10 was able
to optimize this, but LLVM 11 and later is not. The difference
seems to be that some select transforms are now limited based
on canCreateUndefOrPoison.

Teaching canCreateUndefOrPoison about ctpop restores the
LLVM 10 codegen.

Differential Revision: https://reviews.llvm.org/D99207
2021-03-23 12:42:18 -07:00
Juneyoung Lee 960a767368 Reland "[InstCombine] Add simplification of two logical and/ors"
This relands 07c3b97e18 (D96945) which was reverted by
commit f49354838e.
The two-stage compilation successfully tests passes on my machine.
2021-03-23 16:24:50 +09:00
Philip Reames 013449299c Minor format tweak to deref analysis printer 2021-03-22 18:44:18 -07:00
Gulfem Savrun Yeniceri e3a6d70c68 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 78a65cd945 which
caused buildbot failures.
2021-03-23 00:43:16 +00:00
Gulfem Savrun Yeniceri 78a65cd945 [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-22 22:09:02 +00:00
Nikita Popov 7e18cd887c [InstCombine] Whitelist non-refining folds in SimplifyWithOpReplaced
This is an alternative to D98391/D98585, playing things more
conservatively. If AllowRefinement == false, then we don't use
InstSimplify methods at all, and instead explicitly implement a
small number of non-refining folds. Most cases are handled by
constant folding, and I only had to add three folds to cover
our unit tests / test-suite. While this may lose some optimization
power, I think it is safer to approach from this direction, given
how many issues this code has already caused.

Differential Revision: https://reviews.llvm.org/D99027
2021-03-22 22:12:56 +01:00
Nikita Popov ca28e32359 [IR] Mark assume/annotation as InaccessibleMemOnly
These intrinsics don't need to be marked as arbitrary writing,
it's sufficient to write inaccessible memory (aka "side effect")
to preserve control dependencies. This means less special-casing
in BasicAA. This is intended as an alternative to D98925.

Differential Revision: https://reviews.llvm.org/D99022
2021-03-22 22:01:03 +01:00
Juneyoung Lee b00209ed10 [SCEV] Use logical and/or matcher
This is a minor patch that updates ScalarEvolution::isImpliedCond to use logical and/or matcher.
2021-03-23 06:00:54 +09:00
Sanjay Patel 664d0c052c [TargetTransformInfo] move branch probability query from TargetLoweringInfo
This is no-functional-change intended (NFC), but needed to allow
optimizer passes to use the API. See D98898 for a proposed usage
by SimplifyCFG.

I'm simplifying the code by removing the cl::opt. That was added
back with the original commit in D19488, but I don't see any
evidence in regression tests that it was used. Target-specific
overrides can use the usual patterns to adjust as necessary.
We could also restore that cl::opt, but it was not clear to me
exactly how to do it in the convoluted TTI class structure.
2021-03-22 15:55:34 -04:00
Philip Reames 93ce855d4b 2nd attempt at a speculative fix for windows builders after d4648eea 2021-03-22 10:32:57 -07:00
Philip Reames 6ba73c4743 Speculative fix for windows builders after d4648eea 2021-03-22 10:22:01 -07:00
Philip Reames d4648eeaa2 [SCEV] Use trip count information to improve shift recurrence ranges
This patch exploits the knowledge that we may be running many fewer than bitwidth iterations of the loop, and may be able to disallow the overflow case. This patch specifically implements only the shl case, but this can be generalized to ashr and lshr without difficulty.

Differential Revision: https://reviews.llvm.org/D98222
2021-03-22 09:38:43 -07:00
Wenlei He ce6bfe9411 [CSSPGO][llvm-profgen] Use profile summary based threshold for context trimming and merging
Switch to use cold threshold from profile summary for cold context merging and trimming, instead of relying on hard coded values. Minor refactoring included for switch names, etc.

Differential Revision: https://reviews.llvm.org/D98921
2021-03-22 08:56:59 -07:00
Nikita Popov d11d5d1c5f [ValueTracking] Improve mul handling in isKnownNonEqual()
X != X * C is true if:
 * C is not 0 or 1
 * X is not 0
 * mul is nsw or nuw

Proof: https://alive2.llvm.org/ce/z/uwF29z

This is motivated by one of the cases in D98422.
2021-03-21 18:41:35 +01:00
Nikita Popov 9f864d2025 Reapply [ConstantFold] Handle vectors in ConstantFoldLoadThroughBitcast()
There seems to be an impedance mismatch between what the type
system considers an aggregate (structs and arrays) and what
constants consider an aggregate (structs, arrays and vectors).

Adjust the type check to consider vectors as well. The previous
version of the patch dropped the type check entirely, but it
turns out that getAggregateElement() does require the constant
to be an aggregate in some edge cases: For Poison/Undef the
getNumElements() API is called, without checking in advance that
we're dealing with an aggregate. Possibly the implementation should
avoid doing that, but for now I'm adding an assert so the next
person doesn't fall into this trap.
2021-03-21 17:48:21 +01:00
Nikita Popov daae927f9c [InstSimplify] Clean up SimplifyReplacedWithOp implementation (NFCI)
Replace Op with RepOp up-front, and then always work with the new
operands, rather than checking for replacement in various places.
2021-03-21 15:30:30 +01:00
Juneyoung Lee 319d093b87 [CFLGraph] Fix a crash due to missing handling of freeze
https://reviews.llvm.org/D85534#2636321
2021-03-21 02:14:13 +09:00
Philip Reames 5698537f81 Update basic deref API to account for possiblity of free [NFC]
This patch is plumbing to support work towards the goal outlined in the recent llvm-dev post "[llvm-dev] RFC: Decomposing deref(N) into deref(N) + nofree".

The point of this change is purely to simplify iteration on other pieces on way to making the switch. Rebuilding with a change to Value.h is slow and painful, so I want to get the API change landed. Once that's done, I plan to more closely audit each caller, add the inference rules in their own patch, then post a patch with the langref changes and test diffs. The value of the command line flag is that we can exercise the inference logic in standalone patches without needing the whole switch ready to go just yet.

Differential Revision: https://reviews.llvm.org/D98908
2021-03-19 11:17:19 -07:00
Philip Reames 00d0315a7c [SCEV] Factor out a lambda for strict condition splitting [NFC] 2021-03-19 10:07:12 -07:00
Max Kazantsev fff1363ba0 [SCEV] Add false->any implication
By definition of Implication operator, `false -> true` and `false -> false`. It means that
`false` implies any predicate, no matter true or false. We don't need to go any further
trying to prove the statement we need and just always say that `false` implies it in this case.

In practice it means that we are trying to prove something guarded by `false` condition,
which means that this code is unreachable, and we can safely prove any fact or perform any
transform in this code.

Differential Revision: https://reviews.llvm.org/D98706
Reviewed By: lebedev.ri
2021-03-19 11:29:48 +07:00
Mircea Trofin 92ccc6cb17 Reapply "[NPM][CGSCC] FunctionAnalysisManagerCGSCCProxy: do not clear immutable function passes"
This reverts commit 11b70b9e3a.

The bot failure was due to ArgumentPromotion deleting functions
without deleting their analyses. This was separately fixed in 4b1c807.
2021-03-18 09:44:34 -07:00
Max Kazantsev b3a1500ea8 [SCEV][NFC] API for predicate evaluation
Provides API that allows to check predicate for being true or
false with one call. Current implementation is naive and just
calls isKnownPredicate twice, but further we can rework this
logic trying to use one check to prove both facts.
2021-03-18 19:21:29 +07:00
Philip Reames 31764ea295 [LCSSA] Extract a utility for deciding if a new use requires a new lcssa phi [NFC]
(Triggered by a review comment on D98728, but otherwise unrelated.)
2021-03-17 12:14:01 -07:00
David Green e2935dcfc4 [TTI] Add a Mask to getShuffleCost
This adds an Mask ArrayRef to getShuffleCost, so that if an exact mask
can be provided a more accurate cost can be provided by the backend.
For example VREV costs could be returned by the ARM backend. This should
be an NFC until then, laying the groundwork for that to be added.

Differential Revision: https://reviews.llvm.org/D98206
2021-03-17 17:46:26 +00:00
Bardia Mahjour fa9d8ace09 [CGSCC] Print CG node itself instead of its address
Fix the debug output from cgscc
2021-03-17 12:36:55 -04:00
Max Kazantsev a6074b092c [BasicAA] Drop dependency on Loop Info. PR43276
BasicAA stores a reference to LoopInfo inside. This imposes an implicit
requirement of keeping it up to date whenever we modify the IR (in particular,
whenever we modify terminators of blocks that belong to loops). Failing
to do so leads to incorrect state of the LoopInfo.

Because general AA does not require loop info updates and provides to API to
update it properly, the users of AA reasonably assume that there is no need to
update the loop info. It may be a reason of bugs, as example in PR43276 shows.

This patch drops dependence of BasicAA on LoopInfo to avoid this problem.

This may potentially pessimize the result of queries to BasicAA.

Differential Revision: https://reviews.llvm.org/D98627
Reviewed By: nikic
2021-03-17 11:43:44 +07:00
Zequan Wu cbd7eabea8 Revert "[ConstantFold] Handle vectors in ConstantFoldLoadThroughBitcast()"
That commit caused chromium build to crash: https://bugs.chromium.org/p/chromium/issues/detail?id=1188885

This reverts commit edf7004851.
2021-03-16 14:36:21 -07:00
Simonas Kazlauskas 6513995be3 [InstSimplify] Restrict a GEP transform to avoid provenance changes
This is a follow-up to D98588, and fixes the inline `FIXME` about a GEP-related simplification not
preserving the provenance.

https://alive2.llvm.org/ce/z/qbQoAY

Additional tests were added in {rGf125f28afdb59eba29d2491dac0dfc0a7bf1b60b}

Depends on D98672

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98611
2021-03-16 18:53:05 +02:00
Thomas Preud'homme f12433f127 [MemDepAnalysis] Remove redundant comment.
Exact same comment is found 2 lines above.
2021-03-16 15:51:17 +00:00
Max Kazantsev 5097143f0e [SCEV][NFC] Move check up the stack
One of (and primary) callers of isBasicBlockEntryGuardedByCond is
isKnownPredicateAt, which makes isKnownPredicate check before it.
It already makes non-recursive check inside. So, on this execution
path this check is made twice. The only other caller is
isLoopEntryGuardedByCond. Moving the check there should save some
compile time.
2021-03-16 22:09:17 +07:00
Simonas Kazlauskas a977324800 [InstSimplify] Match PtrToInt more directly in a GEP transform (NFC)
In preparation for D98611, the upcoming change will need to apply additional checks to `P` and `V`,
and so this refactor paves the way for adding additional checks in a less awkward way.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98672
2021-03-16 15:45:19 +02:00
Johannes Doerfert f40a2c3bef [NVPTX] CUDA does provide malloc/free since compute capability 2.X
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D98606
2021-03-15 22:45:56 -05:00
Sanjay Patel 660728acd4 [InstSimplify] ctlz({signbit} >>u x) --> x
The motivating pattern was handled in 0a2d69480d ,
but we should have this for symmetry.

But this really highlights that we could generalize for
any shifted constant if we match this in instcombine.

https://alive2.llvm.org/ce/z/MrmVNt
2021-03-15 12:03:35 -04:00
Nikita Popov 5fb43477dc Revert "[NFCI][ValueTracking] getUnderlyingObject(): gracefully handle cycles"
This reverts commit aa440ba24d.

This has a non-trivial compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=0c5b789c7342ee8384507c3242fc256e23248c4d&to=aa440ba24dc25e4c95f6dcf8ff647024f3b12661&stat=instructions

I don't believe this is the correct way to address the issue in
this case.
2021-03-15 13:12:39 +01:00
Roman Lebedev aa440ba24d
[NFCI][ValueTracking] getUnderlyingObject(): gracefully handle cycles
Normally, this function just doesn't bother about cycles,
and hopes that the caller supplied small-enough depth
so that at worst it will take a potentially large,
but limited amount of time. But that obviously doesn't work
if there is no depth limit.

This reapples 36f1c3db66,
but without asserting, just bailout once cycle is detected.
2021-03-15 13:51:02 +03:00
Roman Lebedev f247d2ab9a
Revert "[NFCI][ValueTracking] getUnderlyingObject(): assert that no cycles are encountered"
This reverts commit 36f1c3db66.
Seems to make bots unhappy.
2021-03-15 12:00:59 +03:00
Roman Lebedev 36f1c3db66
[NFCI][ValueTracking] getUnderlyingObject(): assert that no cycles are encountered
Jeroen Dobbelaere in
https://lists.llvm.org/pipermail/llvm-dev/2021-March/149206.html
is reporting that this function can end up in an endless loop
when called from SROA w/ full restrict patches.

For now, simply ensure that such problems are caught earlier/easier.
2021-03-15 11:52:31 +03:00
Roman Lebedev 78b8ce40ef
Reland [SCEV] Improve modelling for (null) pointer constants
This reverts commit 329aeb5db4,
and relands commit 61f006ac65.

This is a continuation of D89456.

As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.

This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D98147
2021-03-13 16:05:34 +03:00
Nikita Popov b2f933a6ce [MemorySSA] Don't bail on phi starting access
When calling getClobberingMemoryAccess() with MemoryLocation on a
MemoryPHI starting access, the walker currently immediately bails
and returns the starting access. This makes sense for the API that
does not accept a location (as we wouldn't know what clobber we
should be checking for), but doesn't make sense for the
MemoryLocation-based API. This means that it can't look through
a MemoryPHI if it's the starting access, but can if there is one
more non-clobbering def in between. This patch removes the limitation.

Differential Revision: https://reviews.llvm.org/D98557
2021-03-13 10:53:13 +01:00
Roman Lebedev 329aeb5db4
Temporairly evert "[SCEV] Improve modelling for (null) pointer constants"
This appears to have broken ubsan bot:
https://lab.llvm.org/buildbot/#/builders/85/builds/3062
https://reviews.llvm.org/D98147#2623549

It looks like LSR needs some kind of a change around insertion point handling.
Reverting until i have a fix.

This reverts commit 61f006ac65.
2021-03-13 09:10:28 +03:00
Roman Lebedev 61f006ac65
[SCEV] Improve modelling for (null) pointer constants
This is a continuation of D89456.

As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.

This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D98147
2021-03-12 22:11:58 +03:00
Stanislav Mekhanoshin b7b99b0799 [AMDGPU] Fix -amdgpu-inline-arg-alloca-cost
Before D94153 this threshold was in a pre-scaled units.
After D94153 inlining threshold multiplier is not applied
to this portion of the threshold anymore. Restore the
threshold by applying the multiplier.

Differential Revision: https://reviews.llvm.org/D98362
2021-03-12 10:19:50 -08:00
serge-sans-paille 1ce2b58454 [NFC] Use llvm::raw_string_ostream instead of std::stringstream
That's more efficient and we don't loose any valuable feature when doing so.
2021-03-12 18:43:59 +01:00
Bjorn Pettersson 529c8e8dc6 [InstSimplify] Simplify smul.fix and smul.fix.sat
Add simplification of smul.fix and smul.fix.sat according to
  X * 0 -> 0
  X * undef -> 0
  X * (1 << scale) -> X

This includes the commuted patterns and splatted vectors.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D98299
2021-03-12 09:09:58 +01:00
Bjorn Pettersson 3638bdfbda [ConstantFold] Handle undef/poison when constant folding smul_fix/smul_fix_sat
Do constant folding according to
  posion * C -> poison
  C * poison -> poison
  undef * C -> 0
  C * undef -> 0
for smul_fix and smul_fix_sat intrinsics (for any scale).

Reviewed By: nikic, aqjune, nagisa

Differential Revision: https://reviews.llvm.org/D98410
2021-03-12 09:09:58 +01:00
Johannes Doerfert d22fbccfe2 [FIX] Allow non-constant assume operand bundle operands.
Fixes PR49545

Reviewed By: zequanwu, fhahn, lebedev.ri

Differential Revision: https://reviews.llvm.org/D98444
2021-03-11 23:31:09 -06:00
Mircea Trofin 11b70b9e3a Revert "[NPM][CGSCC] FunctionAnalysisManagerCGSCCProxy: do not clear immutable function passes"
This reverts commit 5eaeb0fa67.

It appears there are analyses that assume clearing - example:
https://lab.llvm.org/buildbot#builders/36/builds/5964
2021-03-11 18:31:19 -08:00
Mircea Trofin 5eaeb0fa67 [NPM][CGSCC] FunctionAnalysisManagerCGSCCProxy: do not clear immutable function passes
Check with the analysis result by calling invalidate instead of clear on
the analysis manager.

Differential Revision: https://reviews.llvm.org/D98440
2021-03-11 18:15:28 -08:00
Nikita Popov 403da6a69a Reapply [LICM] Make promotion faster
Relative to the previous implementation, this always uses
aliasesUnknownInst() instead of aliasesPointer() to correctly
handle atomics. The added test case was previously miscompiled.

-----

Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.

The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.

If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.

This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.

This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.

Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-11 10:50:28 +01:00
Juneyoung Lee 720a828045 Resolve unused variable warning (NFC) 2021-03-11 12:03:03 +09:00
Juneyoung Lee 8652c3e1a3 [InstSimplify] Pass SimplifyQuery to computePointerICmp (NFC) 2021-03-11 11:13:46 +09:00
Ta-Wei Tu 7ff2768be1 Revert "[LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest`"
This reverts commit df9158c9a4.
2021-03-11 01:24:43 +08:00
Alex Richardson b26d6758f0 [SLC] Simplify strcpy and friends with non-zero address spaces
The current logic in TargetLibraryInfoImpl::getLibFunc() was only treating
strcpy, etc. with i8* arguments in address space zero as a valid library
function. However, in the CHERI and Morello targets we expect all libc
functions to use address space 200 arguments.

This commit updates isValidProtoForLibFunc() to check that the argument
is a pointer type. This also drops the check for i8* since we should not
be checking the pointee type any more.

Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D95142
2021-03-10 11:17:34 +00:00
William S. Moses 875891a10d [MemoryDependence] Fix invariant group store
Fix bug in MemoryDependence [and thus GVN] for invariant group.

Previously MemDep didn't verify that the store was storing into a
pointer rather than a store simply using a pointer.

Differential Revision: https://reviews.llvm.org/D98267
2021-03-09 19:03:39 -05:00
Juneyoung Lee f49354838e Revert "[InstCombine] Add simplification of two logical and/ors"
This reverts commit 07c3b97e18 due to a reported failure in two-stage build.
2021-03-10 05:48:31 +09:00
Philip Reames a25b537bf4 [SCEV] Infer known bits from known sign bits
This was suggested by lebedev.ri over on D96534.  You'll note lack of tests.  During review, we weren't actually able to find a case which exercises it, but both I and lebedev.ri feel it's a reasonable change, straight forward, and near free.

Differential Revision: https://reviews.llvm.org/D97064
2021-03-09 12:37:17 -08:00
Benjamin Kramer 0d96ea0792 [ValueTracking] Move matchSimpleRecurrence out of line
The header only has a forward declaration of PHINode available, and this
function doesn't seem to get much out of inlining.
2021-03-09 00:04:47 +01:00
Sanjay Patel 34d0d644ff [ValueTracking] move/add helper to get inverse min/max; NFC
We will need to this functionality to improve min/max folds
in instcombine when we canonicalize to intrinsics.
2021-03-08 17:38:22 -05:00
Sanjay Patel 0a2d69480d [InstSimplify] cttz(1<<x) --> x
https://alive2.llvm.org/ce/z/TDacYu
https://alive2.llvm.org/ce/z/KF84S3
2021-03-08 16:30:14 -05:00
Alina Sbirlea 29482426b5 Revert "[LICM] Make promotion faster"
Revert 3d8f842712
Revision triggers a miscompile sinking a store incorrectly outside a
threading loop. Detected by tsan.
Reverting while investigating.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-08 12:53:03 -08:00
Philip Reames d9a29a6752 constify getUnderlyingObject implementation [nfc] 2021-03-08 11:32:54 -08:00
Ta-Wei Tu df9158c9a4 [LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest`
The check `tightlyNested()` in `LoopInterchange` is similar to the one in `LoopNest`.
In fact, the former misses some cases where loop-interchange is not feasible and results in incorrect behaviour.
Replacing it with the much robust version provided by `LoopNest` reduces code duplications and fixes https://bugs.llvm.org/show_bug.cgi?id=48113.

`LoopInterchange` has a weaker definition of tightly or perfectly nesting-ness than the one implemented in `LoopNest::arePerfectlyNested()`.
Therefore, `tightlyNested()` is instead implemented with `LoopNest::checkLoopsStructure` and additional checks for unsafe instructions.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D97290
2021-03-08 11:36:08 +08:00
Juneyoung Lee 07c3b97e18 [InstCombine] Add simplification of two logical and/ors
This is a patch that adds folding of two logical and/ors that share one variable:

a && (a && b) -> a && b
a && (a & b)  -> a && b
...

This is towards removing the poison-unsafe select optimization (D93065 has more context).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96945
2021-03-08 02:38:43 +09:00
Juneyoung Lee 2c16c4a43c [ValueTracking] update directlyImpliesPoison to look into select's condition
This is a minor update in directlyImpliesPoison and makes it look into select's
condition.
Splitted from https://reviews.llvm.org/D96945
2021-03-07 23:16:44 +09:00
Fangrui Song e6a104465d [ModuleSummaryAnalysis] Avoid duplicate elements in Worklist. NFC 2021-03-06 14:19:22 -08:00
Nikita Popov f278734bf1 [Loads] Restructure getAvailableLoadStore implementation (NFC)
Separate out some conditions with early exits, to make it easier to
support additional cases.
2021-03-06 16:58:11 +01:00
Nikita Popov edf7004851 [ConstantFold] Handle vectors in ConstantFoldLoadThroughBitcast()
There seems to be an impedance mismatch between what the type
system considers an aggregate (structs and arrays) and what
constants consider an aggregate (structs, arrays and vectors).

Rather than adjusting the type check, simply drop it entirely,
as getAggregateElement() is well-defined for non-aggregates: It
simply returns null in that case.
2021-03-06 12:17:56 +01:00
Nikita Popov a917fb89dc [LVI] Simplify and generalize handling of clamp patterns
Instead of handling a number of special cases for selects, handle
this generally when inferring ranges from conditions. We already
infer ranges from `x + C pred C2` to `x`, so doing the same for
`x pred C2` to `x + C` is straightforward.
2021-03-06 10:42:41 +01:00
Nikita Popov b42be01788 [LVI] Pass offset by reference (NFC)
Instead of by pointer. This allows us to use offsets that are not
materialized in the IR.
2021-03-06 10:24:44 +01:00
Wei Mi 2357d29335 [SampleFDO] Another fix to prevent repeated indirect call promotion in
sample loader pass.

In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9,
to prevent repeated indirect call promotion for the same indirect call
and the same target, we used zero-count value profile to indicate an
indirect call has been promoted for a certain target. We removed
PromotedInsns cache in the same patch. However, there was a problem in
that patch described below, and that problem led me to add PromotedInsns
back as a mitigation in
https://reviews.llvm.org/rG4ffad1fb489f691825d6c7d78e1626de142f26cf.

When we get value profile from metadata by calling getValueProfDataFromInst,
we need to specify the maximum possible number of values we expect to read.
We uses MaxNumPromotions in the last patch so the maximum number of value
information extracted from metadata is MaxNumPromotions. If we have many
values including zero-count values when we write the metadata, some of them
will be dropped when we read them because we only read MaxNumPromotions
values. It will allow repeated indirect call promotion again. We need to
make sure if there are values indicating promoted targets, those values need
to be saved in metadata with higher priority than other values.

The patch fixed that problem. We change to use -1 to represent the count
of a promoted target instead of 0 so it is easier to sort the values.
When we prepare to update the metadata in updateIDTMetaData, we will sort
the values in the descending count order and extract only MaxNumPromotions
values to write into metadata. Since -1 is the max uint64_t number, if we
have equal to or less than MaxNumPromotions of -1 count values, they will
all be kept in metadata. If we have more than MaxNumPromotions of -1 count
values, we will only save MaxNumPromotions such values maximally. In such
case, we have logic in place in doesHistoryAllowICP to guarantee no more
promotion in sample loader pass will happen for the indirect call, because
it has been promoted enough.

With this change, now we can remove PromotedInsns without problem.

Differential Revision: https://reviews.llvm.org/D97350
2021-03-04 18:44:12 -08:00
dfukalov 9899427174 [NFC][AliasSetTracker] Remove implicit conversion AliasResult to integer.
Preparation to make AliasResult scoped enumeration.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D97973
2021-03-05 00:53:27 +03:00
Philip Reames 83ae49671d [basicaa] Recurse through a single phi input
BasicAA knows how to analyze phis, but to control compile time, we're fairly limited in doing so. This patch loosens that restriction just slightly when there is exactly one phi input (after discounting induction variable increments). The result of this is that we can handle more cases around nested and sibling loops with pointer induction variables.

A few points to note.
* This is deliberately extremely restrictive about recursing through at most one input of the phi.  There's a known general problem with BasicAA sometimes hitting exponential compile time already, and this patch makes every effort not to compound the problem.  Once the root issue is fixed, we can probably loosen the restrictions here a bit.
* As seen in the test file, we're still missing cases which aren't *directly* based on phis (e.g. using the indvar increment). I believe this to be a separate problem and am going to explore this in another patch once this one lands.
* As seen in the test file, this results in the unfortunate fact that using phivalues sometimes results in worse quality results. I believe this comes down to an oversight in how recursive phi detection was implemented for phivalues. I'm happy to tackle this in a follow up change.

Differential Revision: https://reviews.llvm.org/D97401
2021-03-04 13:07:06 -08:00
Akira Hatanaka 1900503595 [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.

Original commit message:

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-03-04 11:22:30 -08:00
Sanjay Patel b3a33553ae [Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC
We are tracking an FP instruction that does *not* have FMF (reassoc)
properties, so calling that "Unsafe" seems opposite of the common
reading.

I also removed one getter method by rolling the null check into
the access. Further simplification seems possible.

The motivation is to clean up the interactions between FMF and
function-level attributes in these classes and their callers.
2021-03-04 08:53:04 -05:00
Sanjay Patel b3f0c2653b [Analysis] simplify propagation of FMF in recurrences; NFC
This is a mess, but this is hopefully no-functional-change.
The 'Prev' descriptor is only used for min/max recurrences
or when starting a match from a phi, so it should not be a
factor when propagating FMF for fmul/fadd.

The API is confusing (and should be reduced in subsequent steps)
because the "UnsafeAlgebraInst" appears to actually be a placeholder
for a recurrence that does NOT have FMF, but we still want to
treat it as reassociative.
2021-03-03 17:28:10 -05:00
Philip Reames c8cf27e333 Fix a build warning from ea7d208 2021-03-03 09:16:56 -08:00
Philip Reames ea7d208b78 [basicaa] Rewrite isGEPBaseAtNegativeOffset in terms of index difference [mostly NFC]
This is almost purely NFC, it just fits more obviously in the flow of the code now that we've standardized on the index different approach.  The non-NFC bit is that because of canceling the VariableOffsets in the subtract, we can now handle the case where both sides involve a common variable offset.  This isn't an "interesting" improvement; it just happens to fall out of the natural code structure.

One subtle point - the placement of this above the BaseAlias check is important in the original code as this can return NoAlias even when we can't find a relation between the bases otherwise.

Also added some enhancement TODOs noticed while understanding the existing code.

Note: This is slightly different than the LGTMed version.  I fixed the "inbounds" issue Nikita noticed with the original code in e6e5ef4 and rebased this to include the same fix.

Differential Revision: https://reviews.llvm.org/D97520
2021-03-03 09:03:28 -08:00
Philip Reames e6e5ef40cb [basicaa] Fix a latent bug in isGEPBaseAtNegativeOffset
This was pointed out in review of D97520 by Nikita, but existed in the original code as well.

The basic issue is that a decomposed GEP expression describes (potentially) more than one getelementptr.  The "inbounds" derived UB which justifies this aliasing rule requires that the entire offset be composed of "inbounds" geps.  Otherwise, as can be seen in the recently added and changes in this patch test, we can end up with a large commulative offset with only a small sub-offset actually being "inbounds".  If that small sub-offset lies within the object, the result was unsound.

We could potentially be fancier here, but for the moment, simply be conservative when any of the GEPs parsed aren't inbounds.
2021-03-03 08:43:32 -08:00
Hans Wennborg 0a5dd06718 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR"
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.

> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
>   which indicates the call is implicitly followed by a marker
>   instruction and an implicit retainRV/claimRV call that consumes the
>   call result. In addition, it emits a call to
>   @llvm.objc.clang.arc.noop.use, which consumes the call result, to
>   prevent the middle-end passes from changing the return type of the
>   called function. This is currently done only when the target is arm64
>   and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
>   with the operand bundle in the IR and removes the inserted calls after
>   processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
>   operand bundle. It doesn't remove the operand bundle on the call since
>   the backend needs it to emit the marker instruction. The retainRV and
>   claimRV calls are emitted late in the pipeline to prevent optimization
>   passes from transforming the IR in a way that makes it harder for the
>   ARC middle-end passes to figure out the def-use relationship between
>   the call and the retainRV/claimRV calls (which is the cause of
>   PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
>   nothing in the callee prevents it from being paired up with the
>   retainRV/claimRV call in the caller. It then inserts a release call if
>   claimRV is attached to the call since autoreleaseRV+claimRV is
>   equivalent to a release. If it cannot find an autoreleaseRV call, it
>   tries to transfer the operand bundle to a function call in the callee.
>   This is important since the ARC optimizer can remove the autoreleaseRV
>   returning the callee result, which makes it impossible to pair it up
>   with the retainRV/claimRV call in the caller. If that fails, it simply
>   emits a retain call in the IR if retainRV is attached to the call and
>   does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
>   constant value if the call has the operand bundle. This ensures the
>   call always has at least one user (the call to
>   @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
>   multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
>   calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808

This reverts commit ed4718eccb.
2021-03-03 15:51:40 +01:00
Nikita Popov 3d8f842712 [LICM] Make promotion faster
Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.

The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.

If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.

This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.

This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.

Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-02 22:10:48 +01:00
dfukalov 6e967834b9 [AA] Cache (optionally) estimated PartialAlias offsets.
For the cases of two clobbering loads and one loaded object is fully contained
in the second `BasicAAResult::aliasGEP` returns just `PartialAlias` that
is actually more common case of partial overlap, it doesn't say anything about
actual overlapping sizes.

AA users such as GVN and DSE have no functionality to estimate aliasing of GEPs
with non-constant offsets. The change stores estimated relative offsets so they
can be used further.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93529
2021-03-02 19:04:15 +03:00
Kazu Hirata d639120983 [llvm] Use set_is_subset (NFC) 2021-02-28 10:59:20 -08:00
Philip Reames f2cfef3596 Be more mathematicly precise about definition of recurrence [NFC]
This clarifies the interface of the matchSimpleRecurrence helper introduced in 8020be0b8 for non-commutative operators.  After ebd3aeba, I realized the original way I framed the routine was inconsistent.  For shifts, we only matched the the LHS form, but for sub we matched both and the caller wanted that information.  So, instead, we now consistently match both forms for non-commutative operators and the caller becomes responsible for filtering if needed.  I tried to put a clear warning in the header because I suspect the RHS form of e.g. a sub recurrence is non-obvious for most folks.  (It was for me.)
2021-02-26 11:22:01 -08:00
Philip Reames ebd3aeba27 Use helper introduced in 8020be0b8 to simplify ValueTracking [NFC]
Direct rewrite of the code the helper was extracted from.
2021-02-26 10:47:26 -08:00
Philip Reames 8020be0b8b Add a helper for matching simple recurrence cycles
This helper came up in another review, and I've got about 4 different patches with copies of this copied into it.  Time to precommit the routine.  :)
2021-02-26 10:21:23 -08:00
Stanislav Mekhanoshin d9c99043bd Option to ignore llvm[.compiler].used uses in hasAddressTaken()
Differential Revision: https://reviews.llvm.org/D96087
2021-02-25 10:06:24 -08:00
Stanislav Mekhanoshin 29e2d9461a Option to ignore assume like intrinsic uses in hasAddressTaken()
Differential Revision: https://reviews.llvm.org/D96081
2021-02-25 09:48:29 -08:00
Simon Pilgrim 96a3dfeb93 Revert rGd65ddca83ff85c7345fe9a0f5a15750f01e38420 - "[ValueTracking] ComputeKnownBits - minimum leading/trailing zero bits in LSHR/SHL (PR44526)"
This is causing sanitizer test failures that I haven't been able to fix yet.
2021-02-24 18:03:17 +00:00
Nico Weber 3d837ad704 Revert "[ValueTracking] computeKnownBitsFromShiftOperator - remove non-zero shift amount handling."
This reverts commit d37400168c.
Breaks Analysis/./AnalysisTests/ComputeKnownBitsTest.KnownNonZeroShift
2021-02-24 09:06:12 -05:00
Simon Pilgrim d37400168c [ValueTracking] computeKnownBitsFromShiftOperator - remove non-zero shift amount handling.
This no longer affects any tests after the improvements to the KnownBits shift helpers.
2021-02-24 13:49:13 +00:00
Simon Pilgrim d65ddca83f [ValueTracking] ComputeKnownBits - minimum leading/trailing zero bits in LSHR/SHL (PR44526)
Followup to D72573 - as detailed in https://blog.regehr.org/archives/1709 we don't make use of the known leading/trailing zeros for shifted values in cases where we don't know the shift amount value.

Stop ValueTracking returning zero for poison shift patterns and use the KnownBits shift helpers directly.

Extend KnownBits::shl to combine all possible shifted combinations if both min/max shift amount values are in range.

Differential Revision: https://reviews.llvm.org/D90479
2021-02-24 12:15:45 +00:00
Ta-Wei Tu 98c6110d9b [LoopNest] Use `getUniqueSuccessor()` instead when checking empty blocks
Blocks that contain only a single branch instruction to the next block can be skipped in analyzing the loop-nest structure.
This is currently done by `getSingleSuccessor()`.
However, the branch instruction might have multiple targets which happen to all be the same.
In this case, the block should still be considered as empty and skipped.

An example is `test/Transforms/LoopInterchange/update-condbranch-duplicate-successors.ll` (the LIT test for this patch is modified from it as well).

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D97286
2021-02-24 09:53:12 +08:00
Fangrui Song ef312951fd collectUsedGlobalVariables: migrate SmallPtrSetImpl overload to SmallVecImpl overload after D97128
And delete the SmallPtrSetImpl overload.

While here, decrease inline element counts from 8 to 4. See D97128 for the choice.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D97257
2021-02-23 16:09:06 -08:00
Simon Pilgrim 1020d16156 [InstSimplify] Handle nsw shl -> poison patterns
Pulled out from D90479 - this recognises invalid nsw shl patterns with signbit changes that result in poison.

Differential Revision: https://reviews.llvm.org/D97305
2021-02-23 18:26:56 +00:00
Simon Pilgrim 18b9fc48f1 [InstructionSimplify] SimplifyShift - rename shift amount KnownBits. NFCI.
As suggested on D97305.
2021-02-23 18:12:59 +00:00
David Green dd2dbf7ee2 [TTI] Change getOperandsScalarizationOverhead to take Type args
As a followup to D95291, getOperandsScalarizationOverhead was still
using a VF as a vector factor if the arguments were scalar, and would
assert on certain matrix intrinsics with differently sized vector
arguments. This patch removes the VF arg, instead passing the Types
through directly. This should allow it to more accurately compute the
cost without having to guess at which operands will be vectorized,
something difficult with more complex intrinsics.

This adjusts one SVE test as it is now calling the wrong intrinsic vs
veccall. Without invalid InstructCosts the cost of the scalarized
intrinsic is too low. This should get fixed when the cost of
scalarization is accounted for with scalable types.

Differential Revision: https://reviews.llvm.org/D96287
2021-02-23 13:04:59 +00:00
David Green bd4b61efbd [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.

Instead of trying work with those two values separately this removes the
VF parameter, widening the RetTy/ArgTys by VF used called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

Most backends look like this will be an improvement (or were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-23 13:03:26 +00:00
Arthur Eubanks 468fa037b2 Only verify LazyCallGraph under expensive checks
These verify calls are causing a lot of slowdown on some files, up to 8x.
The LazyCallGraph infra has been tested a lot over the years, so I'm fairly confident that we don't always need to run the verifys.

These verifies took >90% of total time in one of the compilations I looked at.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D97225
2021-02-22 20:18:59 -08:00
Kazu Hirata 896d0e1a2a [Analysis] Use range-based for loops (NFC) 2021-02-22 20:17:18 -08:00
Kazu Hirata 4ed47858ab [llvm] Use llvm::drop_begin (NFC) 2021-02-22 20:17:16 -08:00
Kazu Hirata 871affc5e7 [Analysis] Use ListSeparator (NFC) 2021-02-22 20:17:15 -08:00
Craig Topper 89440df64a [ValueTracking] Improve ComputeNumSignBits for SRem.
The result will have the same sign as the dividend unless the
result is 0. The magnitude of the result will always be less
than or equal to the dividend. So the result will have at least
as many sign bits as the dividend.

Previously we would do this if the divisor was a positive constant,
but that isn't required.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97170
2021-02-22 14:36:25 -08:00
Simon Pilgrim 476ff0327b [InstSimplify] Cleanup out-of-range shift amount handling.
Use APInt::uge() direct instead of getLimitedValue().

Use KnownBits::getMinValue() to make the bounds check more obvious.
2021-02-22 17:00:49 +00:00
Kazu Hirata 047fc3bf20 [Analysis] Use ListSeparator (NFC) 2021-02-21 19:58:04 -08:00
Nikita Popov e0615bcd39 [Loads] Add optimized FindAvailableLoadedValue() overload (NFCI)
FindAvailableLoadedValue() accepts an iterator by reference. If no
available value is found, then the iterator will either be left
at a clobbering instruction or the beginning of the basic block.
This allows using FindAvailableLoadedValue() across multiple blocks.

If this functionality is not needed, as is the case in InstCombine,
then we can use a much more efficient implementation: First try
to find an available value, and only perform clobber checks if
we actually found one. As this function only looks at a very small
number of instructions (6 by default) and usually doesn't find an
available value, this saves many expensive alias analysis queries.
2021-02-21 18:42:56 +01:00
Nikita Popov 7c706aa0d8 [Loads] Extract helper frunction for available load/store (NFC)
This contains the logic for extracting an available load/store
from a given instruction, to be reused in a following patch.
2021-02-21 18:24:58 +01:00
Juneyoung Lee aacf7878bc [ValueTracking] Improve impliesPoison
This patch improves ValueTracking's impliesPoison(V1, V2) to do this reasoning:

```
  %res = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
  %overflow = extractvalue { i64, i1 } %res, 1
  %mul      = extractvalue { i64, i1 } %res, 0

	; If %mul is poison, %overflow is also poison, and vice versa.
```

This improvement leads to supporting this optimization under `-instcombine-unsafe-select-transform=0`:

```
define i1 @test2_logical(i64 %a, i64 %b, i64* %ptr) {
; CHECK-LABEL: @test2_logical(
; CHECK-NEXT:    [[MUL:%.*]] = mul i64 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT:    [[TMP1:%.*]] = icmp ne i64 [[A]], 0
; CHECK-NEXT:    [[TMP2:%.*]] = icmp ne i64 [[B]], 0
; CHECK-NEXT:    [[OVERFLOW_1:%.*]] = and i1 [[TMP1]], [[TMP2]]
; CHECK-NEXT:    [[NEG:%.*]] = sub i64 0, [[MUL]]
; CHECK-NEXT:    store i64 [[NEG]], i64* [[PTR:%.*]], align 8
; CHECK-NEXT:    ret i1 [[OVERFLOW_1]]
;

  %res = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
  %overflow = extractvalue { i64, i1 } %res, 1
  %mul = extractvalue { i64, i1 } %res, 0
  %cmp = icmp ne i64 %mul, 0
  %overflow.1 = select i1 %overflow, i1 true, i1 %cmp
  %neg = sub i64 0, %mul
  store i64 %neg, i64* %ptr, align 8
  ret i1 %overflow.1
}
```

Previously, this didn't happen because the flag prevented `select i1 %overflow, i1 true, i1 %cmp` from being `or i1 %overflow, %cmp`.
Note that the select -> or conversion happens only when `impliesPoison(%cmp, %overflow)` returns true.
This improvement allows `impliesPoison` to do the reasoning.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96929
2021-02-20 13:22:34 +09:00
Philip Reames b13e942224 [ValueTracking] Add a two argument form of safeCtxI [NFC]
The existing implementation was relying on order of evaluation to achieve a particular result.  This got really confusing when wanting to change the handling for arguments in a later patch.
2021-02-19 14:52:51 -08:00
Sanjay Patel 5b250a27ec [Analysis][LoopVectorize] do not form reductions of pointers
This is a fix for https://llvm.org/PR49215 either before/after
we make a verifier enhancement for vector reductions with D96904.

I'm not sure what the current thinking is for pointer math/logic
in IR. We allow icmp on pointer values. Therefore, we match min/max
patterns, so without this patch, the vectorizer could form a vector
reduction from that sequence.

But the LangRef definitions for min/max and vector reduction
intrinsics do not allow pointer types:
https://llvm.org/docs/LangRef.html#llvm-smax-intrinsic
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-umax-intrinsic

So we would crash/assert at some point - either in IR verification,
in the cost model, or in codegen. If we do want to allow this kind
of transform, we will need to update the LangRef and all of those
parts of the compiler.

Differential Revision: https://reviews.llvm.org/D97047
2021-02-19 14:01:57 -05:00
Philip Reames 4a5edea193 [SCEV] Use both known bits and sign bits when computing range of SCEV unknowns
When computing a range for a SCEVUnknown, today we use computeKnownBits for unsigned ranges, and computeNumSignBots for signed ranges. This means we miss opportunities to improve range results.

One common missed pattern is that we have a signed range of a value which CKB can determine is positive, but CNSB doesn't convey that information. The current range includes the negative part, and is thus double the size.

Per the removed comment, the original concern which delayed using both (after some code merging years back) was a compile time concern. CTMark results (provided by Nikita, thanks!) showed a geomean impact of about 0.1%. This doesn't seem large enough to avoid higher quality results.

Differential Revision: https://reviews.llvm.org/D96534
2021-02-19 08:29:12 -08:00
Nikita Popov 2f17ed294f [DCE] Don't remove non-willreturn calls
In both ADCE and BDCE (via DemandedBits) we should not remove
instructions that are not guaranteed to return. This issue was
pointed out by fhahn in the recent llvm-dev thread.

Differential Revision: https://reviews.llvm.org/D96993
2021-02-19 12:35:40 +01:00
Nikita Popov 370addb996 [IR] Move willReturn() to Instruction
This moves the willReturn() helper from CallBase to Instruction,
so that it can be used in a more generic manner. This will make
it easier to fix additional passes (ADCE and BDCE), and will give
us one place to change if additional instructions should become
non-willreturn (e.g. there has been talk about handling volatile
operations this way).

I have also included the IntrinsicInst workaround directly in
here, so that it gets applied consistently. (As such this change
is not entirely NFC -- FuncAttrs will now use this as well.)

Differential Revision: https://reviews.llvm.org/D96992
2021-02-19 11:56:01 +01:00
Nikita Popov 1d9f4903c6 [BasicAA] Add simple depth limit to avoid stack overflow (PR49151)
This is a simpler variant of D96647. It just adds a straightforward
depth limit with a high cutoff, without introducing complex logic
for BatchAA consistency. It accepts that we may cache a sub-optimal
result if the depth limit is hit.

Eventually this should be more fully addressed by D96647 or similar,
but in the meantime this avoids stack overflows in a cheap way.

Differential Revision: https://reviews.llvm.org/D96996
2021-02-19 11:05:42 +01:00
Wei Mi 5fb65c02ca [SampleFDO] Stop repeated indirect call promotion for the same target.
Found a problem in indirect call promotion in sample loader pass. Currently
if an indirect call is promoted for a target, and if the parent function is
inlined into some other function, the indirect call can be promoted for the
same target again. That is redundent which can harm performance and can cause
excessive compile time in some extreme case.

The patch fixes the issue. If a target is promoted for an indirect call, the
patch will write ICP metadata with the target call count being set to 0.
In the later ICP in sample profile loader, if it sees a target has 0 count
for an indirect call, it knows the target has been promoted and won't do
indirect call promotion for the indirect call.

The fix brings 0.1~0.2% performance on our search benchmark.

Differential Revision: https://reviews.llvm.org/D96806
2021-02-18 17:01:32 -08:00
Nikita Popov 70e3c9a8b6 [BasicAA] Always strip single-argument phi nodes
We can always look through single-argument (LCSSA) phi nodes when
performing alias analysis. getUnderlyingObject() already does this,
but stripPointerCastsAndInvariantGroups() does not. We still look
through these phi nodes with the usual aliasPhi() logic, but
sometimes get sub-optimal results due to the restrictions on value
equivalence when looking through arbitrary phi nodes. I think it's
generally beneficial to keep the underlying object logic and the
pointer cast stripping logic in sync, insofar as it is possible.

With this patch we get marginally better results:

  aa.NumMayAlias | 5010069 | 5009861
  aa.NumMustAlias | 347518 | 347674
  aa.NumNoAlias | 27201336 | 27201528
  ...
  licm.NumPromoted | 1293 | 1296

I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(),
as we're past the point where we can explicitly spell out everything
that's getting stripped.

Differential Revision: https://reviews.llvm.org/D96668
2021-02-18 23:07:50 +01:00
Kazu Hirata e54579307b [llvm] Ensure newlines at the end of files (NFC)
This patch eliminates pesky "No newline at end of file" messages from
git diff.
2021-02-17 23:58:44 -08:00
William S. Moses 40862b1a74 [SROA] Propagate correct TBAA/TBAA Struct offsets
SROA does not correctly account for offsets in TBAA/TBAA struct metadata.
This patch creates functionality for generating new MD with the corresponding
offset and updates SROA to use this functionality.

Differential Revision: https://reviews.llvm.org/D95826
2021-02-17 11:59:00 -05:00
Kazu Hirata df35a183d7 [SCEV] Use ListSeparator (NFC) 2021-02-16 23:23:05 -08:00
Mircea Trofin 33481c9997 [mlgo] Fetch models from path / URL
Allow custom location for pre-trained models used when AOT-compiling
policies.

Differential Revision: https://reviews.llvm.org/D96796
2021-02-16 22:47:14 -08:00
Kerry McLaughlin ba1e150d03 [SVE] Add support for scalable vectorization of loops with int/fast FP reductions
This patch enables scalable vectorization of loops with integer/fast reductions, e.g:

```
unsigned sum = 0;
for (int i = 0; i < n; ++i) {
  sum += a[i];
}
```

A new TTI interface, isLegalToVectorizeReduction, has been added to prevent
reductions which are not supported for scalable types from vectorizing.
If the reduction is not supported for a given scalable VF,
computeFeasibleMaxVF will fall back to using fixed-width vectorization.

Reviewed By: david-arm, fhahn, dmgreen

Differential Revision: https://reviews.llvm.org/D95245
2021-02-16 13:50:06 +00:00
Sameer Sahasrabuddhe 11bf7da64a [NewPM] Introduce (GPU)DivergenceAnalysis in the new pass manager
The GPUDivergenceAnalysis is now renamed to just "DivergenceAnalysis"
since there is no conflict with LegacyDivergenceAnalysis. In the
legacy PM, this analysis can only be used through the legacy DA
serving as a wrapper. It is now made available as a pass in the new
PM, and has no relation with the legacy DA.

The new DA currently cannot handle irreducible control flow; its
presence can cause the analysis to run indefinitely. The analysis is
now modified to detect this and report all instructions in the
function as divergent. This is super conservative, but allows the
analysis to be used without hanging the compiler.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D96615
2021-02-16 10:26:45 +05:30
Sanjay Patel 378941f611 [ValueTracking] add scan limit for assumes
In the motivating example from https://llvm.org/PR49171 and
reduced test here, we would unroll and clone assumes so much
that compile-time effectively became infinite while analyzing
all of those assumes.
2021-02-15 15:24:20 -05:00
Kerry McLaughlin 5fe1593438 [LoopVectorizer] Require no-signed-zeros-fp-math=true for fmin/fmax
Currently, setting the `no-nans-fp-math` attribute to true will allow
loops with fmin/fmax to vectorize, though we should be requiring that
`no-signed-zeros-fp-math` is also set.

This patch adds the check for no-signed-zeros at the function level and includes
tests to make sure we don't vectorize functions with only one of the attributes
associated.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D96604
2021-02-15 13:47:05 +00:00
Caroline Concatto 2d728bbff5 [CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse
This patch adds  a new intrinsic experimental.vector.reduce that takes a single
vector and returns a vector of matching type but with the original lane order
 reversed. For example:

```
vector.reverse(<A,B,C,D>) ==> <D,C,B,A>
```

The new intrinsic supports fixed and scalable vectors types.
The fixed-width vector relies on shufflevector to maintain existing behaviour.
Scalable vector uses the new ISD node - VECTOR_REVERSE.

This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker (@paulwalker-arm).

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Differential Revision: https://reviews.llvm.org/D94883
2021-02-15 13:39:43 +00:00
Sjoerd Meijer 357237e93e Recommit "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode"
This reverts commit effc3b0799, with the build
problem fixed.
2021-02-15 11:33:00 +00:00
Sjoerd Meijer effc3b0799 Revert "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode"
This reverts commit cd6de0e8de.
2021-02-15 11:01:23 +00:00
Sjoerd Meijer cd6de0e8de [TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode
This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into
getPreferredAddressingMode() so that we have one interface to steer LSR in
generating the preferred addressing mode.

Differential Revision: https://reviews.llvm.org/D96600
2021-02-15 10:44:15 +00:00
Nikita Popov f197cf2126 [BasicAA] Merge aliasGEP code paths
At this point, we can treat the case of GEP/GEP aliasing and
GEP/non-GEP aliasing in essentially the same way. The only
differences are that we need to do an additional negative GEP base
check, and that we perform a bailout on unknown sizes for the
GEP/non-GEP case (the latter exists only to limit compile-time).

This change is not quite NFC due to the peculiar effect that
the DecomposedGEP for V2 can actually be non-trivial even if V2
is not a GEP. The reason for this is that getUnderlyingObject()
can look through LCSSA phi nodes, while stripPointerCasts() doesn't.
This can lead to slightly better results if single-entry phi nodes
occur inside a loop, where looking through the phi node via aliasPhi()
would subject it to phi cycle equivalence restrictions. It would
probably make sense to adjust pointer cast stripping (for AA) to
handle this case, and ensure consistent results.
2021-02-14 19:35:36 +01:00
Kazu Hirata 910e2d1e57 [llvm] Use llvm::is_contained (NFC) 2021-02-14 08:36:20 -08:00
Kazu Hirata 6ac12e4b34 [Analysis] Use ListSeparator (NFC) 2021-02-14 08:36:14 -08:00
Nikita Popov 53ae96d4bb [BasicAA] Avoid duplicate query for GEPs with identical offsets (NFCI)
For two GEPs with identical offsets, we currently first perform
a base address query without size information, and then if it is
MayAlias, perform another with size information. This is pointless,
as the latter query should produce strictly better results.

This was not quite true historically due to the way that NoAlias
assumptions were handled, but that issue has since been resolved.
2021-02-14 17:18:28 +01:00
Nikita Popov 728803ed74 [BasicAA] Use index difference to detect GEPs with identical indexes
We currently detect GEPs that have exactly the same indexes by
comparing the Offsets and VarIndices. However, the latter implicitly
performs equality comparisons between two values, which is not
generally legal inside BasicAA, due to the possibility of comparisons
across phi cycles.

I believe that in this particular instance this actually ends up being
unproblematic, at least I wasn't able to come up with any cases that
could result in an incorrect root query result.

In the interest of being defensive, compute GetIndexDifference earlier
(which knows how to handle phi cycles properly) and use the result of
that to determine whether the offsets are identical.
2021-02-14 17:11:03 +01:00
aqjune 5f3c99085d [ValueTracking] Dereferenced pointers are noundef
This is a follow-up of D95238's LangRef update.
This patch updates `programUndefinedIfUndefOrPoison(V)` to return true if
`V` is used by any memory-accessing instruction.
Interestingly, this affected many tests in Attributors, mainly about adding noundefs.
The tests are updated using llvm/utils/update_test_checks.py. I checked that the diffs
are about updating noundefs.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96642
2021-02-14 22:50:48 +09:00
Tyker 642e9225c6 reland [InstCombine] convert assumes to operand bundles
Instcombine will convert the nonnull and alignment assumption that use the boolean condtion
to an assumption that uses the operand bundles when knowledge retention is enabled.

Differential Revision: https://reviews.llvm.org/D82703
2021-02-13 13:03:11 +01:00
Nikita Popov 20cb6c7ceb [AA] Add option for tracing AA queries (NFC)
Add an -aa-trace debug option that can be used to print AA queries,
including any recursive queries and their results.
2021-02-12 21:42:49 +01:00
Nikita Popov 191e469ede [AA] Move Depth member from AAResults to AAQI (NFC)
Rather than storing the query depth in AAResults, store it in AAQI.
This makes more sense, as it is a property of the query. This
sidesteps the issue of D94363, fixing slightly inaccurate AA
statistics. Additionally, I plan to use the Depth from BasicAA in
the future, where fetching it from AAResults would be unreliable.

This change is not quite as straightforward as it seems, because
we need to preserve the depth when creating a new AAQI for recursive
queries across phis. I'm adding a new method for this, as we may
need to preserve additional information here in the future.
2021-02-12 21:42:36 +01:00
Akira Hatanaka ed4718eccb [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-12 09:51:57 -08:00
Kerry McLaughlin fea06efe7c [SVE][LoopVectorize] Support for vectorization of loops with function calls
Changes `getScalarizationOverhead` to return an invalid cost for scalable VFs
and adds some simple tests for loops containing a function for which
there is a vectorized variant available.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D96356
2021-02-12 13:47:43 +00:00
Sanjay Patel 79b1b4a581 [Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics
The vector reduction intrinsics started life as experimental ops, so backend support
was lacking. As part of promoting them to 1st-class intrinsics, however, codegen
support was added/improved:
D58015
D90247

So I think it is safe to now remove this complication from IR.

Note that we still have an IR-level codegen expansion pass for these as discussed
in D95690. Removing that is another step in simplifying the logic. Also note that
x86 was already unconditionally forming reductions in IR, so there should be no
difference for x86.

I spot checked a couple of the tests here by running them through opt+llc and did
not see any asm diffs.

If we do find functional differences for other targets, it should be possible
to (at least temporarily) restore the shuffle IR with the ExpandReductions IR
pass.

Differential Revision: https://reviews.llvm.org/D96552
2021-02-12 08:13:50 -05:00
David Sherwood 01b87444cb [NFC][Analysis] Change struct VecDesc to use ElementCount
This patch changes the VecDesc struct to use ElementCount
instead of an unsigned VF value, in preparation for
future work that adds support for vectorized versions of
math functions using scalable vectors. Since all I'm doing
in this patch is switching the type I believe it's a
non-functional change. I changed getWidestVF to now return
both the widest fixed-width and scalable VF values, but
currently the widest scalable value will be zero.

Differential Revision: https://reviews.llvm.org/D96011
2021-02-12 11:07:58 +00:00