Commit Graph

25239 Commits

Author SHA1 Message Date
Max Kazantsev 795e4ee9d2 [NFC] Move functon from IndVarSimplify to SCEV
This function can be reused in other places.

Differential Revision: https://reviews.llvm.org/D87274
Reviewed By: fhahn, lebedev.ri
2020-09-09 11:20:59 +07:00
David Stenberg 17dce2fe43 [UnifyFunctionExitNodes] Remove unused getters, NFC
The get{Return,Unwind,Unreachable}Block functions in
UnifyFunctionExitNodes have not been used for many years,
so just remove them.

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D87078
2020-09-08 20:42:28 +02:00
Nikita Popov f6b87da0c7 [InstCombine] Fold comparison of abs with int min
If the abs is poisoning, this is already folded to true/false.
For non-poisoning abs, we can convert this to a comparison with
the operand.
2020-09-08 20:23:03 +02:00
Nikita Popov e97f3b1b43 [InstCombine] Fold abs of known negative operand
If we know that the abs operand is known negative, we can replace
it with a neg.

To avoid computing known bits twice, I've removed the fold for the
non-negative case from InstSimplify. Both the non-negative and the
negative case are handled by InstCombine now, with one known bits call.

Differential Revision: https://reviews.llvm.org/D87196
2020-09-08 20:14:35 +02:00
Xun Li 59a467ee4f [Coroutine] Make dealing with alloca spills more robust
D66230 attempted to fix a problem where when there are allocas used before CoroBegin.
It keeps allocas and their uses stay in put if there are no escapse/changes to the data before CoroBegin.
Unfortunately that's incorrect.
Consider this code:

%var = alloca i32
%1 = getelementptr .. %var; stays put
%f = call i8* @llvm.coro.begin
store ... %1
After this fix, %1 will now stay put, however if a store happens after coro.begin and hence modifies the content, this change will not be reflected in the coroutine frame (and will eventually be DCEed).
To generalize the problem, if any alias ptr is created before coro.begin for an Alloca and that alias ptr is latter written into after coro.begin, it will lead to incorrect behavior.

There are also a few other minor issues, such as incorrect dominate condition check in the ptr visitor, unhandled memory intrinsics and etc.
Ths patch attempts to fix some of these issue, and make it more robust to deal with aliases.

While visiting through the alloca pointer, we also keep track of all aliases created that will be used after CoroBegin. We track the offset of each alias, and then reacreate these aliases after CoroBegin using these offset.
It's worth noting that this is not perfect and there will still be cases we cannot handle. I think it's impractical to handle all cases given the current design.
This patch makes it more robust and should be a pure win.
In the meantime, we need to think about what how to completely elimiante these issues, likely through the route as @rjmccall mentioned in D66230.

Differential Revision: https://reviews.llvm.org/D86859
2020-09-08 10:59:13 -07:00
Florian Hahn c7b7c32f4a [DSE,MemorySSA] Increase walker limit a bit.
This slightly bumps the walker limit so that it covers more cases while
not increasing compile-time too much:
http://llvm-compile-time-tracker.com/compare.php?from=0fc1c2b51ba0cfb9145139af35be638333865251&to=91144a50ea4fa82c0c877e77784f60371640b263&stat=instructions
2020-09-08 14:55:46 +01:00
Andrew Wei 78071fb524 [LSR] Canonicalize a formula before insert it into the list
In GenerateConstantOffsetsImpl, we may generate non canonical Formula
if BaseRegs of that Formula is updated and includes a recurrent expr reg
related with current loop while its ScaledReg is not.

Patched by: mdchen
Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D86939
2020-09-08 13:14:53 +08:00
Johannes Doerfert 711bf7dcf9 [Attributor][FIX] Don't crash on internalizing linkonce_odr hidden functions
The CloneFunctionInto has implicit requirements with regards to the
linkage and visibility of the function. We now update these after we did
the CloneFunctionInto on the copy with the same linkage and visibility
as the original.
2020-09-07 23:38:09 -05:00
Johannes Doerfert e6208849c8 [Attributor][NFC] Change variable spelling 2020-09-07 23:38:09 -05:00
Johannes Doerfert 8637acac5a [Attributor][NFC] Clang tidy: no else after continue 2020-09-07 23:38:08 -05:00
Johannes Doerfert ff70c25d76 [Attributor][NFC] Expand `auto` types (clang-fix-it) 2020-09-07 23:38:08 -05:00
Johannes Doerfert 79651265b2 [Attributor][FIX] Properly return changed if the IR was modified
Deleting or replacing anything is certainly a modification. This caused
a later assertion in IPSCCP when compiling 400.perlbench with the new PM.
I'm not sure how to test this.
2020-09-07 23:38:08 -05:00
Florian Hahn efb8e156da [DSE,MemorySSA] Add an early check for read clobbers to traversal.
Depending on the benchmark, this early exit can save a substantial
amount of compile-time:

http://llvm-compile-time-tracker.com/compare.php?from=505f2d817aa8e07ba98e5fd4a8f6ff0666f89df1&to=eb4e441147f9b4b7a5fcbbc57428cadbe9e01f10&stat=instructions
2020-09-07 23:22:10 +01:00
Roman Lebedev bb7d3af113
Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
This was reverted in 503deec218
because it caused gigantic increase (3x) in branch mispredictions
in certain benchmarks on certain CPU's,
see https://reviews.llvm.org/D84108#2227365.

It has since been investigated and here are the results:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html
> It's an amazingly severe regression, but it's also all due to branch
> mispredicts (about 3x without this). The code layout looks ok so there's
> probably something else to deal with. I'm not sure there's anything we can
> reasonably do so we'll just have to take the hit for now and wait for
> another code reorganization to make the branch predictor a bit more happy :)
>
> Thanks for giving us some time to investigate and feel free to recommit
> whenever you'd like.
>
> -eric

So let's just reland this.
Original commit message:


I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108

This reverts commit 503deec218.
2020-09-08 00:24:03 +03:00
Nikita Popov 9fb46a452d [SCCP] Compute ranges for supported intrinsics
For intrinsics supported by ConstantRange, compute the result range
based on the argument ranges. We do this independently of whether
some or all of the input ranges are full, as we can often still
constrain the result in some way.

Differential Revision: https://reviews.llvm.org/D87183
2020-09-07 22:16:06 +02:00
Sanjay Patel 8b30067919 [InstCombine] improve fold of pointer differences
This was supposed to be an NFC cleanup, but there's
a real logic difference (did not drop 'nsw') visible
in some tests in addition to an efficiency improvement.

This is because in the case where we have 2 GEPs,
the code was *always* swapping the operands and
negating the result. But if we have 2 GEPs, we
should *never* need swapping/negation AFAICT.

This is part of improving flags propagation noticed
with PR47430.
2020-09-07 15:54:32 -04:00
Simon Pilgrim 5ea9e655ef VPlan.h - remove unnecessary forward declarations. NFCI.
Already defined in includes.
2020-09-07 18:35:06 +01:00
Sanjay Patel 7a6d6f0f70 [InstCombine] improve folds for icmp with multiply operands (PR47432)
Check for no overflow along with an odd constant before
we lose information by converting to bitwise logic.

https://rise4fun.com/Alive/2Xl

  Pre: C1 != 0
  %mx = mul nsw i8 %x, C1
  %my = mul nsw i8 %y, C1
  %r = icmp eq i8 %mx, %my
  =>
  %r = icmp eq i8 %x, %y

  Name: nuw ne
  Pre: C1 != 0
  %mx = mul nuw i8 %x, C1
  %my = mul nuw i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y

  Name: odd ne
  Pre: C1 % 2 != 0
  %mx = mul i8 %x, C1
  %my = mul i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y
2020-09-07 12:40:37 -04:00
Sanjay Patel b22910daab [InstCombine] erase instructions leading up to unreachable
Normal dead code elimination ignores assume intrinsics, so we fail to
delete assumes that are not meaningful (and potentially worse if they
cause conflicts with other assumptions).

The motivating example in https://llvm.org/PR47416 suggests that we
might have problems upstream from here (difference between C and C++),
but this should be a cheap way to make sure we remove more dead code.

Differential Revision: https://reviews.llvm.org/D87149
2020-09-07 10:44:08 -04:00
Sanjay Patel 3ca8b9a560 [InstCombine] give a name to an intermediate value for easier tracking; NFC
As noted in PR47430, we probably want to conditionally include 'nsw'
here anyway, so we are going to need to fill out the optional args.
2020-09-07 08:19:42 -04:00
Sam Parker 928c4b4b49 [SCEV] Refactor isHighCostExpansionHelper
To enable the cost of constants, the helper function has been
reorganised:
- A struct has been introduced to hold SCEV operand information so
  that we know the user of the operand, as well as the operand index.
  The Worklist now uses instead instead of a bare SCEV.
- The costing of each SCEV, and collection of its operands, is now
  performed in a helper function.

Differential Revision: https://reviews.llvm.org/D86050
2020-09-07 11:57:46 +01:00
Sam Parker 65f78e73ad [SimplifyCFG] Consider cost of combining predicates.
Modify FoldBranchToCommonDest to consider the cost of inserting
instructions when attempting to combine predicates to fold blocks.
The threshold can be controlled via a new option:
-simplifycfg-branch-fold-threshold which defaults to '2' to allow
the insertion of a not and another logical operator.

Differential Revision: https://reviews.llvm.org/D86526
2020-09-07 10:04:50 +01:00
Florian Hahn 16bb71fd4f [DSE,MemorySSA] Add a few additional debug messages. 2020-09-06 20:31:00 +01:00
Nikita Popov 4892d3a198 [InstCombine] Fold abs with dominating condition
Similar to D87168, but for abs. If we have a dominating x >= 0
condition, then we know that abs(x) is x. This fold is in
InstCombine, because we need to create a sub instruction for
the x < 0 case.

Differential Revision: https://reviews.llvm.org/D87184
2020-09-05 16:18:35 +02:00
Nikita Popov ada8a17d94 [InstCombine] Fold abs intrinsic eq zero
Following the same transform for the select version of abs.
2020-09-05 15:11:38 +02:00
Nikita Popov 58b28fa7a2 [InstCombine] Fold mul of abs intrinsic
Same as the existing SPF_ABS fold. We don't need to explicitly
handle NABS, as the negs will get folded away first.
2020-09-05 12:37:45 +02:00
Nikita Popov 10cb23c6ca [InstCombine] Fold cttz of abs intrinsic
Same as the existing fold for SPF_ABS. We don't need to explicitly
handle the NABS variant, as we'll first fold away the neg in that
case.
2020-09-05 12:25:41 +02:00
serge-sans-paille 3a6f3fc160 Fix return status of SimplifyCFG
When a switch case is folded into default's case, that's an IR change that
should be reported, update ConstantFoldTerminator accordingly.

Differential Revision: https://reviews.llvm.org/D87142
2020-09-05 07:54:15 +02:00
Florian Hahn 00eb6fef08 [DSE,MemorySSA] Check for throwing instrs between killing/killed def.
We also have to check all uses between the killing & killed def and
check if any of them is throwing.
2020-09-04 18:54:59 +01:00
Wei Wang 4eef14f978 [OpenMPOpt] Assume indirect call always changes ICV
When checking call sites, give special handling to indirect call, as the
callee may be unknown and can lead to nullptr dereference later. Assume
conservatively that the ICV always changes in such case.

Reviewed By: sstefan1

Differential Revision: https://reviews.llvm.org/D87104
2020-09-04 09:05:32 -07:00
Teresa Johnson 45c3560384 [HeapProf] Address post-review comments in instrumentation code
Addresses post-review comments from D85948, which can be found here:
https://reviews.llvm.org/rG7ed8124d46f9.
2020-09-04 08:59:00 -07:00
Florian Hahn 6bc5e866bd [MemCpyOpt] Account for case that MemInsertPoint == BI.
In that case, the new MemoryDef needs to be inserted *before*
MemInsertPoint.
2020-09-04 14:04:08 +01:00
Florian Hahn e2fc6a31d3 [MemCpyOpt] Preserve MemorySSA.
This patch updates MemCpyOpt to preserve MemorySSA. It uses the
MemoryDef at the insertion point of the builder and inserts the new def
after that def.

In some cases, we just modify a memory instruction. In that case, get
the defining access, then remove the memory access and add a new one.
If the defining access is in a different block, insert a new def at the
beginning of the current block, otherwise after the defining access.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86651
2020-09-04 09:05:33 +01:00
Sanjay Patel 2391a34f9f [InstCombine] canonicalize all commutative intrinsics with constant arg 2020-09-03 12:42:04 -04:00
Sanjay Patel bdd5bfd0e4 [IR][GVN] add/allow commutative intrinsics with >2 args
Follow-up to D86798 and rGe25449f.
2020-09-03 10:14:53 -04:00
Florian Hahn 6de51189b0 [PassManager] Move load/store motion pass after DSE in LTO pipeline.
As far as I am aware, the placement of MergedLoadStoreMotion in the
pipeline is not heavily tuned currently. It seems to not matter much if
we do it after DSE in the LTO pipeline (no binary changes for -O3 -flto
on MultiSource/SPEC2000/SPEC2006). Moving it after DSE however has a
major benefit: MemorySSA is constructed by LICM and is consumed by DSE,
so if MergedLoadStoreMotion happens after DSE, we do not need to
preserve MemorySSA in it.

If there are any concerns with this move, I can also update
MergedLoadStoreMotion to preserve MemorySSA.

This patch together with D86651 (preserve MemSSA in MemCpyOpt) and
D86534 (preserve MemSSA in GVN) are the remaining patches to bring down
compile-time for DSE + MemorySSA to the levels outlined in
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

Once they land, we should be able to start with flipping the switch on
enabling DSE + MmeorySSA.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86967
2020-09-03 13:47:50 +01:00
Florian Hahn a344b382a0 [GVN] Preserve MemorySSA if it is available.
Preserve MemorySSA if it is available before running GVN.

DSE with MemorySSA will run closely after GVN. If GVN and 2 other
passes preserve MemorySSA, DSE can re-use MemorySSA used by LICM
when doing LTO.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86534
2020-09-03 12:28:13 +01:00
David Green 245f846c4e [MemCpyOptimizer] Change required analysis order for BasicAA/PhiValuesAnalysis
This is a followup to 1ccfb52a61, which made a number of changes
including the apparently innocuous reordering of required passes in
MemCpyOptimizer. This however altered the creation order of BasicAA vs
Phi Values analysis, meaning BasicAA did not pick up PhiValues as a
cached result. Instead if we require MemoryDependence first it will
require PhiValuesAnalysis allowing BasicAA to use it for better results.

I don't claim this is an excellent design, but it fixes a nasty little
regressions where a query later in JumpThreading was getting worse
results.

Differential Revision: https://reviews.llvm.org/D87027
2020-09-03 12:01:51 +01:00
Florian Hahn 4c5e4aa89b Revert "[SCCP] Do not replace deref'able ptr with un-deref'able one."
This reverts commit 3542feeb20.

This seems to be causing issues with a sanitizer build
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-msan/builds/21677
2020-09-03 10:28:42 +01:00
Florian Hahn 3542feeb20 [SCCP] Do not replace deref'able ptr with un-deref'able one.
Currently IPSCCP (and others like CVP/GVN) blindly propagate pointer
equalities. In certain cases, that leads to dereferenceable pointers
being replaced, as in the example test case.

I think this is not allowed, as it introduces an access of an
un-dereferenceable pointer. Note that the pointer is inbounds, but one
past the last element, so it is valid, but not dereferenceable.

This patch is mostly to highlight the issue and start a discussion.
Currently it only checks for specifically looking
one-past-the-last-element pointers with array typed bases.

This causes the mis-compile outlined in
https://stackoverflow.com/questions/55754313/is-this-gcc-clang-past-one-pointer-comparison-behavior-conforming-or-non-standar

In the test case, if we replace %p with the GEP for the store, we
subsequently determine that the store and the load cannot alias, because
they are to different underlying objects.

Note that Alive2 seems to think that the replacement is valid:
https://alive2.llvm.org/ce/z/2rorhk

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85332
2020-09-03 10:22:21 +01:00
Eli Friedman 96ef6998df [InstCombine] Fix a couple crashes with extractelement on a scalable vector.
Differential Revision: https://reviews.llvm.org/D86989
2020-09-02 18:02:07 -07:00
Huihui Zhang b4f04d7135 [VectorCombine][SVE] Do not fold bitcast shuffle for scalable type.
First, shuffle cost for scalable type is not known for scalable type;
Second, we cannot reason if the narrowed shuffle mask for scalable type
is a splat or not.

E.g., Bitcast splat vector from type <vscale x 4 x i32> to <vscale x 8 x i16>
will involve narrowing shuffle mask <vscale x 4 x i32> zeroinitializer to
<vscale x 8 x i32> with element sequence of <0, 1, 0, 1, ...>, which cannot be
reasoned if it's a valid splat or not.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86995
2020-09-02 15:02:16 -07:00
Arthur Eubanks 352cf57cfb [Bindings] Move LLVMAddInstructionSimplifyPass to Scalar.cpp
Should not be with the pass, but alongside all the other C bindings.

Reviewed By: sroland

Differential Revision: https://reviews.llvm.org/D87041
2020-09-02 10:35:39 -07:00
Congzhe Cao ec489ae048 [IPSCCP] Fix a bug that the "returned" attribute is not cleared when function is optimized to return undef
In IPSCCP when a function is optimized to return undef, it should clear the returned attribute for all its input arguments
and its corresponding call sites.

The bug is exposed when the value of an input argument of the function is assigned to a physical register and
because of the argument having a returned attribute, the value of this physical register will continue to be used
as the function return value right after the call instruction returns, even if the value that this register holds may
be clobbered during the function call. This potentially results in incorrect values being used afterwards.

Reviewed By: jdoerfert, fhahn

Differential Revision: https://reviews.llvm.org/D84220
2020-09-02 11:21:48 -04:00
David Stenberg 6d36b22b21 [GlobalOpt] Fix an incorrect Modified status
When marking a global variable constant, and simplifying users using
CleanupConstantGlobalUsers(), the pass could incorrectly return false if
there were still some uses left, and no further optimizations was done.

This was caught using the check introduced by D80916.

This fixes PR46749.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D85837
2020-09-02 15:00:45 +02:00
Venkataramanan Kumar 626c3738cd [InstCombine] Transform 1.0/sqrt(X) * X to X/sqrt(X)
These transforms will now be performed irrespective of the number of uses for the expression "1.0/sqrt(X)":
1.0/sqrt(X) * X => X/sqrt(X)
X * 1.0/sqrt(X) => X/sqrt(X)

We already handle more general cases, and we are intentionally not creating extra (and likely expensive)
fdiv ops in IR. This pattern is the exception to the rule because we always expect the Backend to reduce
X/sqrt(X) to sqrt(X), if it has the necessary (reassoc) fast-math-flags.

Ref: DagCombiner optimizes the X/sqrt(X) to sqrt(X).

Differential Revision: https://reviews.llvm.org/D86726
2020-09-02 08:23:48 -04:00
Sanjay Patel 8fb055932c [VectorCombine] allow vector loads with mismatched insert type
This is an enhancement to D81766 to allow loading the minimum target
vector type into an IR vector with a different number of elements.

In one of the motivating tests from PR16739, SLP creates <2 x float>
load ops mixed with <4 x float> insert ops, so we want to handle that
pattern in addition to potential oversized vectors created by the
vectorizers.

For now, we are assuming the insert/extract subvector with undef is
free because there is no exact corresponding TTI modeling for that.

Differential Revision: https://reviews.llvm.org/D86160
2020-09-02 08:11:36 -04:00
Shinji Okumura 5d13479574 [Attributor] Make use of AANoUndef in AAUndefinedBehavior
This patch makes it possible for AAUB to use information from AANoUndef.
This is the next patch of D86983

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86984
2020-09-02 16:08:03 +09:00
Shinji Okumura 7558e9e5a2 [Attributor] Fix AANoUndef initialization
When the associated value is undef, we immediately forced to indicate a pessimistic fixpoint so far.
This patch changes the initialization to check the attribute given in IR at first and to indicate an optimistic fixpoint when it is given.
This change will enable us to catch , for example, the following case in AAUB.
```
call void @foo(i32 noundef undef)
```

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86983
2020-09-02 15:40:43 +09:00
Alina Sbirlea 1ccfb52a61 [MemCpyOptimizer] Preserve analyses and replace use of lambdas to get them.
Summary:
Analyses are preserved in MemCpyOptimizer.
Get analyses before running the pass and store the pointers, instead of
using lambdas and getting them every time on demand.

Reviewers: lenary, deadalnix, mehdi_amini, nikic, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74494
2020-09-01 17:35:40 -07:00
Aaron Liu d7e16ca28f [LV] Interleave to expose ILP for small loops with scalar reductions.
Interleave for small loops that have reductions inside,
which breaks dependencies and expose.

This gives very significant performance improvements for some benchmarks.
Because small loops could be in very hot functions in real applications.

Differential Revision: https://reviews.llvm.org/D81416
2020-09-01 19:47:32 +00:00
Arthur Eubanks 96f0b57568 [Bindings] Add LLVMAddInstructionSimplifyPass
Reviewed By: sroland

Differential Revision: https://reviews.llvm.org/D86764
2020-09-01 12:38:49 -07:00
Anh Tuyen Tran 68717acb24 [LoopIdiomRecognizePass] Options to disable part or the entire Loop Idiom Recognize Pass
Loop Idiom Recognize Pass (LIRP) attempts to transform loops with subscripted arrays
into memcpy/memset function calls. In some particular situation, this transformation
introduces negative impacts. For example: https://bugs.llvm.org/show_bug.cgi?id=47300

This patch will enable users to disable a particular part of the transformation, while
he/she can still enjoy the benefit brought about by the rest of LIRP. The default
behavior stays unchanged: no part of LIRP is disabled by default.

Reviewed By: etiotto (Ettore Tiotto)

Differential Revision: https://reviews.llvm.org/D86262
2020-09-01 13:59:24 +00:00
Hamilton Tobon Mosquera 1d3d9b9cd8 [OpenMPOpt][NFC] Moving constants as struct static attributes 2020-08-31 19:05:00 -05:00
Hamilton Tobon Mosquera 8931add617 [OpenMPOpt][HideMemTransfersLatency] Get values stored in offload arrays
getValuesInOffloadArrays goes through the offload arrays in __tgt_target_data_begin_mapper getting the values stored in them before the call is issued.

call void @__tgt_target_data_begin_mapper(arg0, arg1,
    i8** %offload_baseptrs, i8** %offload_ptrs, i64* %offload_sizes,
...)

Diferential Revision: https://reviews.llvm.org/D86300
2020-08-31 15:33:05 -05:00
Sanjay Patel e25449ff57 [IR][GVN] allow intrinsics in Instruction's isCommutative query (2nd try)
The 1st try was reverted because I missed an assert that
needed softening.

As discussed in D86798 / rG09652721 , we were potentially
returning a different result for whether an Instruction
is commutable depending on if we call the base class or
derived class method.

This requires relaxing asserts in GVN, but that pass
seems to be working otherwise.

NewGVN requires more work because it uses different
code paths for numbering binops and calls.
2020-08-31 16:01:19 -04:00
Christopher Tetreault 640f20b0c7 [SVE] Remove calls to VectorType::getNumElements from InstCombine
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D82237
2020-08-31 12:59:10 -07:00
Roman Lebedev c23aefd7c3
[NFC][InstCombine] visitPHINode(): cleanup PHI CSE instruction replacement
As @nikic is pointing out in https://reviews.llvm.org/rGbf21ce7b908e#inline-4647
this must be sufficient otherwise `EliminateDuplicatePHINodes()`
would have hit issues with it already.
2020-08-31 22:29:39 +03:00
Fangrui Song f2284e3405 [Sink] Optimize/simplify sink candidate finding with nearest common dominator
For an instruction in the basic block BB, SinkingPass enumerates basic blocks
dominated by BB and BB's successors. For each enumerated basic block,
SinkingPass uses `AllUsesDominatedByBlock` to check whether the basic
block dominates all of the instruction's users. This is inefficient.

Use the nearest common dominator of all users to avoid enumerating the
candidate. The nearest common dominator may be in a parent loop which is
not beneficial. In that case, find the ancestors in the dominator tree.

In the case that the instruction has no user, with this change we will
not perform unnecessary move. This causes some amdgpu test changes.

A stage-2 x86-64 clang is a byte identical with this change.
2020-08-30 22:51:00 -07:00
Sanjay Patel badd7264e1 Revert "[IR][GVN] allow intrinsics in Instruction's isCommutative query"
This reverts commit 25597f7783.
It is causing crashing on bots such as:
http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10523/steps/ninja-build/logs/stdio
2020-08-30 17:02:51 -04:00
Florian Hahn 86d817d7cf [DSE,MemorySSA] Skip defs without analyzable write locations.
Similar to other checks above, if there is no write location for a def,
it cannot be considered for elimination and can be skipped.
2020-08-30 21:56:25 +01:00
Sanjay Patel 25597f7783 [IR][GVN] allow intrinsics in Instruction's isCommutative query
As discussed in D86798 / rG09652721 , we were potentially
returning a different result for whether an Instruction
is commutable depending on if we call the base class or
derived class method.

This requires relaxing an assert in GVN, but that pass
seems to be working otherwise.

NewGVN requires more work because it uses different
code paths for numbering binops and calls.
2020-08-30 16:49:22 -04:00
Florian Hahn 42c57c294d [DSE,MemorySSA] Simplify code, EarlierAccess is be a MemoryDef (NFC).
After recent changes, we return early if Current is a MemoryPhi, so
EarlierAccess can only be a MemoryDef.
2020-08-30 21:31:57 +01:00
Florian Hahn eb35ebb3a2 [LV] Update CFG before adding runtime checks.
addRuntimeChecks uses SCEVExpander, which relies on the DT/LoopInfo to
be up-to-date. Changing the CFG afterwards may invalidate some inserted
instructions, especially LCSSA phis.

Reorder the code to first update the CFG and then create the runtime
checks. This should not have any impact on the generated code, as we
adjust the CFG and generate runtime checks together.

Fixes PR47343.
2020-08-30 18:21:44 +01:00
Sanjay Patel af4581e8ab [SLP] make commutative check apply only to binops; NFC
As discussed in D86798, it's not clear if the caller code
works with a more liberal definition of "commutative" that
includes intrinsics like min/max. This makes the binop
restriction (current functionality is unchanged) explicit
until the code is audited/tested.
2020-08-30 10:55:44 -04:00
David Green 543c5425f1 [LV] Add some const to RecurrenceDescriptor. NFC 2020-08-30 12:27:51 +01:00
sstefan1 5dfd7cc46c Reland [OpenMPOpt] ICV tracking for calls
The problem with module slice has been addressed in D86319

Introduce two new AAs. AAICVTrackerFunctionReturned which checks if a
function can have a unique ICV value after it is finished, and
AAICVCallSiteReturned which checks AAICVTrackerFunctionReturned for a
call site. This enables us to check the value of a call and if it
changes the ICV. This also changes the approach in
`getReplacementValues()` to a worklist-based approach so we can explore
all relevant BBs.

Differential Revision: https://reviews.llvm.org/D85544
2020-08-30 11:27:48 +02:00
sstefan1 8d8ce85b23 [Attributor] Introduce module slice.
Summary:
The module slice describes which functions we can analyze and transform
while working on an SCC as part of the Attributor-CGSCC pass. So far we
simply restricted it to the SCC.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D86319
2020-08-30 10:30:44 +02:00
Shinji Okumura a7ca9e09bd [Attributor] Fix callsite check in AAUndefinedBehavior
This is the next patch of D86842
When we check `noundef` attribute violation at callsites, we do not have to require `nonnull` in the following two cases.
1. An argument is known to be simplified to undef
2. An argument is known to be dead

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86845
2020-08-30 13:17:02 +09:00
Shinji Okumura 7082381735 [Attributor][NFC] Fix dependency type in AAUndefinedBehaviorImpl::updateImpl
This patch fixes wrong dependency type in AAUB.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86842
2020-08-30 12:34:50 +09:00
Shinji Okumura 7a15dfd056 [Attributor] Fix AANoUndef identification
Even though `noundef` IR attribute might be attached to non-void type values, AANoUndef is mistakenly identified for pointer type values only.
This patch fixes that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86737
2020-08-30 05:39:25 +09:00
Florian Hahn 5067f4b626 [LV] Check opt-for-size before expanding runtime checks.
Move bail out when optimizing for size before runtime check generation.
In that case, we do not use the result of the expansion, the expanded
instruction will be dead and cleaned up later.

By doing the check before expanding the runtime-checks, we can save a
bit of unnecessary work.
2020-08-29 20:35:14 +01:00
Roman Lebedev 1dcb936cf6
[NFC][Local] EliminateDuplicatePHINodes(): add STATISTIC() 2020-08-29 22:03:18 +03:00
Roman Lebedev 961483a5ea
[NFCI][Local] Rewrite EliminateDuplicatePHINodes to optionally check hashing invariants
EarlyCSE has a mode to verify the invariant that hash equality equals
key equality, but EliminateDuplicatePHINodes() doesn't.

I've verified that this would have caught the stage2-stage3 mismatches
5ec2b757cc revert has fixed,
that were introduced last time in 3e69871ab5.
2020-08-29 22:03:10 +03:00
Shinji Okumura 1364d856f4 [Attributor][NFC] Do not manifest noundef for positions to be changed to undef
This patch fixes AANoUndef manifestation.
We should not manifest noundef for positions that will be changed to undef.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86835
2020-08-30 03:23:41 +09:00
Florian Hahn 31cdb29de4 [DSE,MemorySSA] Return early when hitting a MemoryPhi.
A MemoryPhi can never be eliminated. If we hit one, return the Phi, so
the caller can continue traversing the incoming accesses.

This saves some unnecessary read clobber checks and improves
compile-time
http://llvm-compile-time-tracker.com/compare.php?from=1ffc58b6d098ce8fa71f3a80fe75b990f633f921&to=d0fa8d1982380b57d7b6067528104bc373dbe07a&stat=instructions
2020-08-29 18:28:26 +01:00
Roman Lebedev 5ec2b757cc
[Instruction] Speculatively undo isIdenticalToWhenDefined() PHI handling changes
The stage2-stage3 differences persist even without instcombine-based
PHI CSE, so this is the only possible reason.
2020-08-29 19:38:57 +03:00
Sanjay Patel 0965272140 [EarlyCSE] fold commutable intrinsics
Handling the new min/max intrinsics is the motivation, but it
turns out that we have a bunch of other intrinsics with this
missing bit of analysis too.

The FP min/max tests show that we are intersecting FMF,
so that part should be safe too.

As noted in https://llvm.org/PR46897 , there is a commutative
property specifier for intrinsics, but no corresponding function
attribute, and so apparently no uses of that bit. We may want to
remove that next.

Follow-up patches should wire up the Instruction::isCommutative()
to this IntrinsicInst specialization. That requires updating
callers to be aware of the more general commutative property
(not just binops).

Differential Revision: https://reviews.llvm.org/D86798
2020-08-29 12:11:01 -04:00
Roman Lebedev bf21ce7b90
[InstCombine] Take 3: Perform trivial PHI CSE
The original take 1 was 6102310d81,
which taught InstSimplify to do that, which seemed better at time,
since we got EarlyCSE support for free.

However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb
that reverted it:
> It appears to cause compilation non-determinism and caused stage3 mismatches.

Then there was take 2 3e69871ab5,
which was InstCombine-specific, but it again showed stage2-stage3 differences,
and reverted in bdaa3f86a0.
This is quite alarming.

Here, let's try to change how we find existing PHI candidate:
due to the worklist order, and the way PHI nodes are inserted
(it may be inserted as the first one, or maybe not), let's look at *all*
PHI nodes in the block.

Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |    \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| asm-printer.EmittedInsts                           | 7942329   | 7942457   |    128 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 254295632 | 254312480 |  16848 |    0.01% |    0.01% |
| correlated-value-propagation.NumPhis               | 18412     | 18347     |    -65 |   -0.35% |    0.35% |
| early-cse.NumCSE                                   | 2183283   | 2183267   |    -16 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 541842    |  -8263 |   -1.50% |    1.50% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640311   | 3644419   |   4108 |    0.11% |    0.11% |
| instcombine.NumDeadInst                            | 1778204   | 1783205   |   5001 |    0.28% |    0.28% |
| instcombine.NumPHICSEs                             | 0         | 22490     |  22490 |    0.00% |    0.00% |
| instcombine.NumWorklistIterations                  | 2023272   | 2024400   |   1128 |    0.06% |    0.06% |
| instcount.NumCallInst                              | 1758395   | 1758802   |    407 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330545    |    -12 |    0.00% |    0.00% |
| instcount.TotalBlocks                              | 1077138   | 1077220   |     82 |    0.01% |    0.01% |
| instcount.TotalFuncs                               | 101442    | 101441    |     -1 |    0.00% |    0.00% |
| instcount.TotalInsts                               | 8831946   | 8832606   |    660 |    0.01% |    0.01% |
| simplifycfg.NumHoistCommonCode                     | 24186     | 24187     |      1 |    0.00% |    0.00% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019813   | 999767    | -20046 |   -1.97% |    1.97% |
```
So it fires 22490 times, which is less than ~24k the take 1 did,
but more than what take 2 did (22228 times)
.
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).

All in all, expectedly, this catches less things overall,
but all the motivational cases are still caught, so all good.
2020-08-29 18:21:24 +03:00
Roman Lebedev bdaa3f86a0
Revert "[InstCombine] Take 2: Perform trivial PHI CSE"
While the original variant with doing this in InstSimplify (rightfully)
caused questions and ultimately was detected to be a culprit
of stage2-stage3 mismatch, it was expected that
InstCombine-based implementation would be fine.

But apparently it's not, as
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24095/steps/compare-compilers/logs/stdio
suggests.

Which suggests that somewhere in InstCombine there is a loop
over nondeterministically sorted container, which causes
different worklist ordering.

This reverts commit 3e69871ab5.
2020-08-29 16:05:02 +03:00
Nikita Popov 6093b14c2c [InstCombine] Return replaceInstUsesWith() result (NFC)
Follow the usual usage pattern for this function and return the
result.
2020-08-29 14:49:57 +02:00
Roman Lebedev 71ac9105cd
[InstCombine] foldAggregateConstructionIntoAggregateReuse(): use InstCombiner::replaceInstUsesWith() instead of RAUW
We really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:14 +03:00
Roman Lebedev e65f213178
[InstCombine] canonicalizeICmpPredicate(): use InstCombiner::replaceInstUsesWith() instead of RAUW
We really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:14 +03:00
Roman Lebedev bd12113f57
[NFC][InstCombine] Fix some comments: the code already uses IC::replaceInstUsesWith() 2020-08-29 15:10:14 +03:00
Roman Lebedev 49d223274f
[NFC][InstCombine] Add STATISTIC() for how many iterations we did
As we've established, if it takes more than two iterations
(one to perform folding and one to ensure that no folding opportunities
remain) per function, then there are worklist management issues.
So it may be interesting to keep track of it.
2020-08-29 15:10:13 +03:00
Roman Lebedev 4f4eecf0ec
[InstCombine] visitPHINode(): use InstCombiner::replaceInstUsesWith() instead of RAUW
As noted in post-commit review, we really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:00 +03:00
Roman Lebedev 3e69871ab5
[InstCombine] Take 2: Perform trivial PHI CSE
The original take was 6102310d81,
which taught InstSimplify to do that, which seemed better at time,
since we got EarlyCSE support for free.

However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb
that reverted it :
> It appears to cause compilation non-determinism and caused stage3 mismatches.

However InstCombine already does many different optimizations,
so it should be a safe place to do it here.

Note that we still can't just compare incoming values ranges,
because there is no guarantee that these PHI's we'd simplify to
were already re-visited and sorted.
However coming up with a test is problematic.

Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |      |%| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instcombine.NumPHICSEs                             | 0         | 22228     |  22228 |    0.00% |    0.00% |
| asm-printer.EmittedInsts                           | 7942329   | 7942456   |    127 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 254295632 | 254313792 |  18160 |    0.01% |    0.01% |
| early-cse.NumCSE                                   | 2183283   | 2183272   |    -11 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 541842    |  -8263 |   -1.50% |    1.50% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640311   | 3666911   |  26600 |    0.73% |    0.73% |
| instcombine.NumDeadInst                            | 1778204   | 1783318   |   5114 |    0.29% |    0.29% |
| instcount.NumCallInst                              | 1758395   | 1758804   |    409 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330549    |     -8 |    0.00% |    0.00% |
| instcount.TotalBlocks                              | 1077138   | 1077221   |     83 |    0.01% |    0.01% |
| instcount.TotalFuncs                               | 101442    | 101441    |     -1 |    0.00% |    0.00% |
| instcount.TotalInsts                               | 8831946   | 8832611   |    665 |    0.01% |    0.01% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019813   | 999740    | -20073 |   -1.97% |    1.97% |
```
So it fires ~22k times, which is less than ~24k the take 1 did.
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).

All in all, expectedly, this catches less things overall,
but all the motivational cases are still caught, so all good.
2020-08-29 13:13:06 +03:00
Nikita Popov 57a26bb7b4 [InstCombine] Fix typo in comment (NFC)
As pointed out in post-commit review of D63060.
2020-08-29 10:17:17 +02:00
Akira Hatanaka 0231a4e5bd [ObjC][ARC] In HandlePotentialAlterRefCount, check whether an
instruction can decrement the reference count, not whether it can alter
it

This prevents the state transition from S_Use to S_CanRelease when doing
a bottom-up traversal and the transition from S_Retain to S_CanRelease
when doing a top-down traversal when the visited instruction can
increment the ref count but cannot decrement it. This allows the ARC
optimizer to remove retain/release pairs which were previously not
removed.

rdar://problem/21793154
2020-08-28 17:45:14 -07:00
Fangrui Song b5ef137c11 [gcov] Increment counters with atomicrmw if -fsanitize=thread
Without this patch, `clang --coverage -fsanitize=thread` may fail spuriously
because non-atomic counter increments can be detected as data races.
2020-08-28 16:32:35 -07:00
Craig Topper aab90384a3 [Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None)
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.

So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.

This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.

Differential Revision: https://reviews.llvm.org/D86744
2020-08-28 13:23:45 -07:00
Arthur Eubanks cfde93e5d6 [ObjCARCOpt] Port objc-arc to NPM
Since doInitialization() in the legacy pass modifies the module, the NPM
pass is a Module pass.

Reviewed By: ahatanak, ychen

Differential Revision: https://reviews.llvm.org/D86178
2020-08-28 12:59:33 -07:00
Tyker 6d3657417e [SROA] Improve handleling of assumes bundles by SROA
This patch fixes this crash https://gcc.godbolt.org/z/Ps8d1e
And gives SROA the ability to remove assumes if it allows promoting an alloca to register
Without removing assumes when it can't promote to register.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86570
2020-08-28 21:55:45 +02:00
Nikita Popov ffe05dd125 [InstCombine] usub.sat(a, b) + b => umax(a, b) (PR42178)
Fixes https://bugs.llvm.org/show_bug.cgi?id=42178 by folding
usub.sat(a, b) + b to umax(a, b). The backend will expand umax
back to usubsat if that is profitable.

We may also want to handle uadd.sat(a, b) - b in the future.

Differential Revision: https://reviews.llvm.org/D63060
2020-08-28 21:52:29 +02:00
Benjamin Kramer 8782c72765 Strength-reduce SmallVectors to arrays. NFCI. 2020-08-28 21:14:20 +02:00
David Sherwood f4257c5832 [SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().

Differential Revision: https://reviews.llvm.org/D86065
2020-08-28 14:43:53 +01:00
Benjamin Kramer 3524c23ff2 [SCCP] Use bulk-remove API to bulk-remove attributes. NFCI. 2020-08-28 14:44:14 +02:00
Benjamin Kramer dce72dc870 [FunctionAttrs] Bulk remove attributes. NFC. 2020-08-28 12:56:19 +02:00
Florian Hahn 43aa7227df [DSE,MemorySSA] Check if Current is valid for elimination first.
This changes getDomMemoryDef to check if a Current is a valid
candidate for elimination before checking for reads. Before the change,
we were spending a lot of compile-time in checking for read accesses for
Current that might not even be removable.

This patch flips the logic, so we skip Current if they cannot be
removed before checking all their uses. This is much more efficient in
practice.

It also adds a more aggressive limit for checking partially overlapping
stores. The main problem with overlapping stores is that we do not know
if they will lead to elimination until seeing all of them. This patch
limits adds a new limit for overlapping store candidates, which keeps
the number of modified overlapping stores roughly the same.

This is another substantial compile-time improvement (while also
increasing the number of stores eliminated). Geomean -O3 -0.67%,
ReleaseThinLTO -0.97%.

http://llvm-compile-time-tracker.com/compare.php?from=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&to=2e630629b43f64b60b282e90f0d96082fde2dacc&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86487
2020-08-28 11:19:04 +01:00
Florian Hahn 20e989e9de [BuildLibCalls] Add argmemonly to more lib calls.
strspn, strncmp, strcspn, strcasecmp, strncasecmp, memcmp, memchr,
memrchr, memcpy, memmove, memcpy, mempcpy, strchr, strrchr, bcmp
should all only access memory through their arguments.

I broke out strcoll, strcasecmp, strncasecmp because the result
depends on the locale, which might get accessed through memory.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86724
2020-08-28 09:50:38 +01:00
Shinji Okumura 50ebd1afa9 [Attributor] Do not manifest noundef for dead positions
Even if noundef is deduced for a position, we should not manifest it when the position is dead.
This is because the associated values with dead positions are replaced with undef values by AAIsDead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86565
2020-08-28 05:58:18 +09:00
Christopher Tetreault 035833ae42 [SVE] Remove bad call to VectorType::getNumElements() from HeapProfiler
Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D86727
2020-08-27 12:16:00 -07:00
Shinji Okumura c5e6872ec6 [Attributor] Guarantee getAAFor not to update AA in the manifestation stage
If we query an AA with `Attributor::getAAFor` in `AbstractAttribute::manifest`, the AA may be updated.
This patch makes use of the phase flag in Attributor, and handle `getAAFor` behavior according to the flag.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86635
2020-08-28 04:07:42 +09:00
Christopher Tetreault 5e63083435 [SVE] Remove calls to VectorType::getNumElements from Transforms/Vectorize
Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D82056
2020-08-27 12:02:20 -07:00
Teresa Johnson 5b9d462b7d [HeapProf] Fix bot failures from instrumentation pass
Fix bot failure from 7ed8124d46f94601d5f1364becee9cee8538265e:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/8533

Since we are always using dynamic shadow,
insertDynamicShadowAtFunctionEntry should always return true for
modifying the function.
2020-08-27 10:21:19 -07:00
Shinji Okumura 7a68f0f1e0 [Attributor] Add a phase flag to Attributor
Add a new flag that indicates which stage in the process we are in.
This flag is introduced for handling behavior of `getAAFor` according to the stage. (discussed in D86635)

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86678
2020-08-28 01:16:38 +09:00
Teresa Johnson 7ed8124d46 [HeapProf] Clang and LLVM support for heap profiling instrumentation
See RFC for background:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html

Note that the runtime changes will be sent separately (hopefully this
week, need to add some tests).

This patch includes the LLVM pass to instrument memory accesses with
either inline sequences to increment the access count in the shadow
location, or alternatively to call into the runtime. It also changes
calls to memset/memcpy/memmove to the equivalent runtime version.
The pass is modeled on the address sanitizer pass.

The clang changes add the driver option to invoke the new pass, and to
link with the upcoming heap profiling runtime libraries.

Currently there is no attempt to optimize the instrumentation, e.g. to
aggregate updates to the same memory allocation. That will be
implemented as follow on work.

Differential Revision: https://reviews.llvm.org/D85948
2020-08-27 08:50:35 -07:00
Florian Hahn 419c6948df [SimplifyLibCalls] Remove over-eager early return in strlen optzns.
Currently we bail out early for strlen calls with a GEP operand, if none
of the GEP specific optimizations fire. But there could be later
optimizations that still apply,  which we currently miss out on.

An example is that we do not apply the following optimization
   strlen(x) == 0 --> *x == 0

Unless I am missing something, there seems to be no reason for bailing
out early there.

Fixes PR47149.

Reviewed By: lebedev.ri, xbolva00

Differential Revision: https://reviews.llvm.org/D85886
2020-08-27 15:19:45 +01:00
serge-sans-paille 4e29d25669 Fix OpenMP deduplicateRuntimeCalls return status
Differential Revision: https://reviews.llvm.org/D86705
2020-08-27 15:01:04 +02:00
serge-sans-paille 5621571fc7 Fix Attributor return status
Differential Revision: https://reviews.llvm.org/D86703
2020-08-27 15:01:04 +02:00
Florian Hahn bb024c3c4e [DSE,MemorySSA] Remove short-cut to check if all paths are covered.
The post-order number early continue does not work in some cases, e.g.
if a path from EarlierAccess to an exit includes a node that dominates
EarlierAccess in a cycle.

The short-cut only has very minor impact on compile-time, so it seems
straight-forward to remove it for now:

http://llvm-compile-time-tracker.com/compare.php?from=062412e79fcfedf2cf004433e42036b0333e3f83&to=d7386016a77ce1387bdbbf360f1de157faea9d31&stat=instructions

Fixes PR47285.
2020-08-27 12:42:40 +01:00
Florian Hahn e717fdb0f1 [DSE,MemorySSA] Traverse use-def chain without MemSSA Walker.
For DSE with MemorySSA it is beneficial to manually traverse the
defining access, instead of using a MemorySSA walker, so we can
better control the number of steps together with other limits and
also weed out invalid/unprofitable paths early on.

This patch requires a follow-up patch to be most effective, which I will
share soon after putting this patch up.

This temporarily XFAIL's the limit tests, because we now explore more
MemoryDefs that may not alias/clobber the killing def. This will be
improved/fixed by the follow-up patch.

This patch also renames some `Dom*` variables to `Earlier*`, because the
dominance relation is not really used/important here and potentially
confusing.

This patch allows us to aggressively cut down compile time, geomean
-O3 -0.64%, ReleaseThinLTO -1.65%, at the expense of fewer stores
removed. Subsequent patches will increase the number of removed stores
again, while keeping compile-time in check.

http://llvm-compile-time-tracker.com/compare.php?from=d8e3294118a8c5f3f97688a704d5a05b67646012&to=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86486
2020-08-27 10:02:02 +01:00
Shinji Okumura 6c25eca614 [Attributor] Add flag for undef value to the state of AAPotentialValues
Currently, an undef value is reduced to 0 when it is added to a set of potential values.
This patch introduces a flag for under values. By this, for example, we can merge two states `{undef}`, `{1}` to `{1}` (because we can reduce the undef to 1).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85592
2020-08-27 16:30:29 +09:00
Arthur Eubanks 486ed88533 [ConstProp] Remove ConstantPropagation
As discussed in
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143801.html.

Currently no users outside of unit tests.

Replace all instances in tests of -constprop with -instsimplify.
Notable changes in tests:
* vscale.ll - @llvm.sadd.sat.nxv16i8 is evaluated by instsimplify, use a fake intrinsic instead
* InsertElement.ll - insertelement undef is removed by instsimplify in @insertelement_undef
llvm/test/Transforms/ConstProp moved to llvm/test/Transforms/InstSimplify/ConstProp

Reviewed By: lattner, nikic

Differential Revision: https://reviews.llvm.org/D85159
2020-08-26 15:51:30 -07:00
Wei Mi c67ccf5faf [SampleFDO] Enhance profile remapping support for searching inline instance
and indirect call promotion candidate.

Profile remapping is a feature to match a function in the module with its
profile in sample profile if the function name and the name in profile look
different but are equivalent using given remapping rules. This is a useful
feature to keep the performance stable by specifying some remapping rules
when sampleFDO targets are going through some large scale function signature
change.

However, currently profile remapping support is only valid for outline
function profile in SampleFDO. It cannot match a callee with an inline
instance profile if they have different but equivalent names. We found
that without the support for inline instance profile, remapping is less
effective for some large scale change.

To add that support, before any remapping lookup happens, all the names
in the profile will be inserted into remapper and the Key to the name
mapping will be recorded in a map called NameMap in the remapper. During
name lookup, a Key will be returned for the given name and it will be used
to extract an equivalent name in the profile from NameMap. So with the help
of the NameMap, we can translate any given name to an equivalent name in
the profile if it exists. Whenever we try to match a name in the module to
a name in the profile, we will try the match with the original name first,
and if it doesn't match, we will use the equivalent name got from remapper
to try the match for another time. In this way, the patch can enhance the
profile remapping support for searching inline instance and searching
indirect call promotion candidate.

In a planned large scale change of int64 type (long long) to int64_t (long),
we found the performance of a google internal benchmark degraded by 2% if
nothing was done. If existing profile remapping was enabled, the performance
degradation dropped to 1.2%. If the profile remapping with the current patch
was enabled, the performance degradation further dropped to 0.14% (Note the
experiment was done before searching indirect call promotion candidate was
added. We hope with the remapping support of searching indirect call promotion
candidate, the degradation can drop to 0% in the end. It will be evaluated
post commit).

Differential Revision: https://reviews.llvm.org/D86332
2020-08-26 11:07:35 -07:00
Roman Lebedev 95848ea101
[Value][InstCombine] Fix one-use checks in PHI-of-op -> Op-of-PHI[s] transforms to be one-user checks
As FIXME said, they really should be checking for a single user,
not use, so let's do that. It is not *that* unusual to have
the same value as incoming value in a PHI node, not unlike
how a PHI may have the same incoming basic block more than once.

There isn't a nice way to do that, Value::users() isn't uniqified,
and Value only tracks it's uses, not Users, so the check is
potentially costly since it does indeed potentially involes
traversing the entire use list of a value.
2020-08-26 20:20:41 +03:00
Sjoerd Meijer bda8fbe2d2 [LV] Fallback strategies if tail-folding fails
This implements 2 different vectorisation fallback strategies if tail-folding
fails: 1) don't vectorise at all, or 2) vectorise using a scalar epilogue. This
can be controlled with option -prefer-predicate-over-epilogue, that has been
changed to take a numeric value corresponding to the tail-folding preference
and preferred fallback.

Patch by: Pierre van Houtryve, Sjoerd Meijer.

Differential Revision: https://reviews.llvm.org/D79783
2020-08-26 16:55:25 +01:00
Shinji Okumura 3050713798 [Attributor] Provide an edge-based interface in AAIsDead
This patch produces an edge-based interface in AAIsDead.
By this, we can query a set of basic blocks that are directly reachable from a given basic block.
This is specifically useful for implementation of AAReachability.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85547
2020-08-26 16:57:52 +09:00
Roman Lebedev 1f90d45b9e
[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
While since D86306 we do it's sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.

And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:

```
| statistic name                                     | baseline  | proposed  |       Δ |       % |    |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts                           | 7945095   | 7942507   |   -2588 |  -0.03% |  0.03% |
| assembler.ObjectBytes                              | 273209920 | 273069800 | -140120 |  -0.05% |  0.05% |
| early-cse.NumCSE                                   | 2183363   | 2183398   |      35 |   0.00% |  0.00% |
| early-cse.NumSimplify                              | 541847    | 550017    |    8170 |   1.51% |  1.51% |
| instcombine.NumAggregateReconstructionsSimplified  | 2139      | 108       |   -2031 | -94.95% | 94.95% |
| instcombine.NumCombined                            | 3601364   | 3635448   |   34084 |   0.95% |  0.95% |
| instcombine.NumConstProp                           | 27153     | 27157     |       4 |   0.01% |  0.01% |
| instcombine.NumDeadInst                            | 1694521   | 1765022   |   70501 |   4.16% |  4.16% |
| instcombine.NumPHIsOfExtractValues                 | 0         | 37546     |   37546 |   0.00% |  0.00% |
| instcombine.NumSunkInst                            | 63158     | 63686     |     528 |   0.84% |  0.84% |
| instcount.NumBrInst                                | 874304    | 871857    |   -2447 |  -0.28% |  0.28% |
| instcount.NumCallInst                              | 1757657   | 1758402   |     745 |   0.04% |  0.04% |
| instcount.NumExtractValueInst                      | 45623     | 11483     |  -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst                       | 4983      | 580       |   -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst                            | 61018     | 59478     |   -1540 |  -2.52% |  2.52% |
| instcount.NumLandingPadInst                        | 35334     | 34215     |   -1119 |  -3.17% |  3.17% |
| instcount.NumPHIInst                               | 344428    | 331116    |  -13312 |  -3.86% |  3.86% |
| instcount.NumRetInst                               | 100773    | 100772    |      -1 |   0.00% |  0.00% |
| instcount.TotalBlocks                              | 1081154   | 1077166   |   -3988 |  -0.37% |  0.37% |
| instcount.TotalFuncs                               | 101443    | 101442    |      -1 |   0.00% |  0.00% |
| instcount.TotalInsts                               | 8890201   | 8833747   |  -56454 |  -0.64% |  0.64% |
| instsimplify.NumSimplified                         | 75822     | 75707     |    -115 |  -0.15% |  0.15% |
| simplifycfg.NumHoistCommonCode                     | 24203     | 24197     |      -6 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                   | 48201     | 48195     |      -6 |  -0.01% |  0.01% |
| simplifycfg.NumInvokes                             | 2785      | 4298      |    1513 |  54.33% | 54.33% |
| simplifycfg.NumSimpl                               | 997332    | 1018189   |   20857 |   2.09% |  2.09% |
| simplifycfg.NumSinkCommonCode                      | 7088      | 6464      |    -624 |  -8.80% |  8.80% |
| simplifycfg.NumSinkCommonInstrs                    | 15117     | 14021     |   -1096 |  -7.25% |  7.25% |
```
... which tells us that this new fold fires whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).

This causes geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions

The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform
`InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
   We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
   it also sees that the extract-insert chain recreates the original aggregate,
   and replaces it with the original aggregate.

The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check if what other patterns remain, and how they can be resolved.
   (i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)

This is a reland of the original commit fcb51d8c24,
because originally i forgot to ensure that the base aggregate types match.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86530
2020-08-26 09:57:50 +03:00
Roman Lebedev c295c6f2c0
Revert "[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad"
This reverts commit fcb51d8c24.

As buildbots report, there's apparently some missing check to ensure
that the types of incoming values match the type of PHI.
Let's revert for a moment.
2020-08-26 09:23:22 +03:00
Roman Lebedev fcb51d8c24
[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
While since D86306 we do it's sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.

And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:

```
| statistic name                                     | baseline  | proposed  |       Δ |       % |    |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts                           | 7945095   | 7942507   |   -2588 |  -0.03% |  0.03% |
| assembler.ObjectBytes                              | 273209920 | 273069800 | -140120 |  -0.05% |  0.05% |
| early-cse.NumCSE                                   | 2183363   | 2183398   |      35 |   0.00% |  0.00% |
| early-cse.NumSimplify                              | 541847    | 550017    |    8170 |   1.51% |  1.51% |
| instcombine.NumAggregateReconstructionsSimplified  | 2139      | 108       |   -2031 | -94.95% | 94.95% |
| instcombine.NumCombined                            | 3601364   | 3635448   |   34084 |   0.95% |  0.95% |
| instcombine.NumConstProp                           | 27153     | 27157     |       4 |   0.01% |  0.01% |
| instcombine.NumDeadInst                            | 1694521   | 1765022   |   70501 |   4.16% |  4.16% |
| instcombine.NumPHIsOfExtractValues                 | 0         | 37546     |   37546 |   0.00% |  0.00% |
| instcombine.NumSunkInst                            | 63158     | 63686     |     528 |   0.84% |  0.84% |
| instcount.NumBrInst                                | 874304    | 871857    |   -2447 |  -0.28% |  0.28% |
| instcount.NumCallInst                              | 1757657   | 1758402   |     745 |   0.04% |  0.04% |
| instcount.NumExtractValueInst                      | 45623     | 11483     |  -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst                       | 4983      | 580       |   -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst                            | 61018     | 59478     |   -1540 |  -2.52% |  2.52% |
| instcount.NumLandingPadInst                        | 35334     | 34215     |   -1119 |  -3.17% |  3.17% |
| instcount.NumPHIInst                               | 344428    | 331116    |  -13312 |  -3.86% |  3.86% |
| instcount.NumRetInst                               | 100773    | 100772    |      -1 |   0.00% |  0.00% |
| instcount.TotalBlocks                              | 1081154   | 1077166   |   -3988 |  -0.37% |  0.37% |
| instcount.TotalFuncs                               | 101443    | 101442    |      -1 |   0.00% |  0.00% |
| instcount.TotalInsts                               | 8890201   | 8833747   |  -56454 |  -0.64% |  0.64% |
| instsimplify.NumSimplified                         | 75822     | 75707     |    -115 |  -0.15% |  0.15% |
| simplifycfg.NumHoistCommonCode                     | 24203     | 24197     |      -6 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                   | 48201     | 48195     |      -6 |  -0.01% |  0.01% |
| simplifycfg.NumInvokes                             | 2785      | 4298      |    1513 |  54.33% | 54.33% |
| simplifycfg.NumSimpl                               | 997332    | 1018189   |   20857 |   2.09% |  2.09% |
| simplifycfg.NumSinkCommonCode                      | 7088      | 6464      |    -624 |  -8.80% |  8.80% |
| simplifycfg.NumSinkCommonInstrs                    | 15117     | 14021     |   -1096 |  -7.25% |  7.25% |
```
... which tells us that this new fold fires whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).

This causes geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions

The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform
`InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
   We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
   it also sees that the extract-insert chain recreates the original aggregate,
   and replaces it with the original aggregate.

The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check if what other patterns remain, and how they can be resolved.
   (i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86530
2020-08-26 09:08:24 +03:00
Juneyoung Lee f753f5b050 [ValueTracking] Let getGuaranteedNonPoisonOp find multiple non-poison operands
This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands.

Instead of special-casing llvm.assume, I think it is also a viable option to
add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86477
2020-08-26 04:40:21 +09:00
Sanjay Patel c4f0a0896f [InstCombine] improve demanded element analysis for vector insert-of-extract (2nd try)
The 1st attempt (rG557b890) was reverted because it caused miscompiles.
That bug is avoided here by changing the order of folds and as verified
in the new tests.

Original commit message:
InstCombine currently has odd rules for folding insert-extract chains to shuffles,
so we miss collapsing seemingly simple cases as shown in the tests here.

But poison makes this not quite as easy as we might have guessed. Alive2 tests to
show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)

SLP tends to create these patterns (as shown in the SLP tests), and this could
help with solving PR16739.

Differential Revision: https://reviews.llvm.org/D86460
2020-08-25 11:19:36 -04:00
Sjoerd Meijer ae366479e8 [LV] get.active.lane.mask consuming tripcount instead of backedge-taken count
This adapts LV to the new semantics of get.active.lane.mask as discussed in
D86147, which means that the LV now emits intrinsic get.active.lane.mask with
the loop tripcount instead of the backedge-taken count as its second argument.
The motivation for this is described in D86147.

Differential Revision: https://reviews.llvm.org/D86304
2020-08-25 13:49:19 +01:00
Shinji Okumura 05390440a2 [Attributor][NFC] Clang format 2020-08-25 19:32:58 +09:00
Benjamin Kramer c6fb72de4f Revert "[InstCombine] improve demanded element analysis for vector insert-of-extract"
This reverts commit 557b890ff4. Causing
miscompiles, test case is on llvm-commits.
2020-08-25 11:31:31 +02:00
David Sherwood 7b64765cd1 [SVE] Fix TypeSize related warnings with IR truncates of scalable vectors
In getCastInstrCost when the instruction is a truncate we were relying
upon the implicit TypeSize -> uint64_t cast when asking if a given type
has the same size as a legal integer. I've changed the code to only
ask the question if the type is fixed length.

I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail
out for now if the type is a scalable vector.

I've added the following new tests:

  Analysis/CostModel/AArch64/sve-trunc.ll
  Transforms/InstCombine/AArch64/sve-trunc.ll

for both of these fixes.

Differential revision: https://reviews.llvm.org/D86432
2020-08-25 09:17:56 +01:00
Florian Hahn e19ef1aab5 [DSE,MemorySSA] Cache accesses with/without reachable read-clobbers.
Currently we repeatedly check the same uses for read clobbers in some
cases. We can avoid unnecessary checks by keeping track of the memory
accesses we already found read clobbers for. To do so, we just add
memory access causing read-clobbers to a set. Note that marking all
visited accesses as read-clobbers would be to pessimistic, as that might
include accesses not on any path to  the actual read clobber.

If we do not find any read-clobbers, we can add all visited instructions
to another set and use that to skip the same accesses in the next call.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D75025
2020-08-25 08:48:46 +01:00
Roman Lebedev cdd339c568
[InstCombine] PHI-of-insertvalues -> insertvalue-of-PHI's
As per statistic, this happens pretty exceedingly rare,
but i have seen it in exactly the situations the
Phi-aware aggregate reconstruction would have handled,
eventually, and allowed invoke -> call fold later on.

So while this might be something that other fold
will have to learn about, i believe we should be
doing this transform in general.

Here, we are okay with adding two PHI's to get both the base aggregate,
and the inserted value. I'm not sure it makes much sense to restrict
it to a single phi (to just the inserted value?), because originally
we'd be receiving the final aggregate already..

llvm test-suite + RawSpeed:
```
| statistic name                             | baseline  | proposed  |    Δ |      % | \|%\| |
|--------------------------------------------|-----------|-----------|-----:|-------:|------:|
| instcombine.NumPHIsOfInsertValues          | 0         | 12        |  12  |  0.00% | 0.00% |
| asm-printer.EmittedInsts                   | 8926643   | 8926595   | -48  |  0.00% | 0.00% |
| instcombine.NumCombined                    | 3846614   | 3846640   |  26  |  0.00% | 0.00% |
| instcombine.NumConstProp                   | 24302     | 24293     |  -9  | -0.04% | 0.04% |
| instcombine.NumDeadInst                    | 1620140   | 1620112   | -28  |  0.00% | 0.00% |
| instcount.NumBrInst                        | 898466    | 898464    |  -2  |  0.00% | 0.00% |
| instcount.NumCallInst                      | 1760819   | 1760875   |  56  |  0.00% | 0.00% |
| instcount.NumExtractValueInst              | 45659     | 45649     | -10  | -0.02% | 0.02% |
| instcount.NumInsertValueInst               | 4991      | 4981      | -10  | -0.20% | 0.20% |
| instcount.NumIntToPtrInst                  | 27084     | 27087     |   3  |  0.01% | 0.01% |
| instcount.NumPHIInst                       | 371435    | 371429    |  -6  |  0.00% | 0.00% |
| instcount.NumStoreInst                     | 906011    | 906019    |   8  |  0.00% | 0.00% |
| instcount.TotalBlocks                      | 1105520   | 1105518   |  -2  |  0.00% | 0.00% |
| instcount.TotalInsts                       | 9795737   | 9795776   |  39  |  0.00% | 0.00% |
| simplifycfg.NumInvokes                     | 2784      | 2786      |   2  |  0.07% | 0.07% |
| simplifycfg.NumSimpl                       | 1001840   | 1001850   |  10  |  0.00% | 0.00% |
| simplifycfg.NumSinkCommonInstrs            | 15174     | 15170     |  -4  | -0.03% | 0.03% |
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86306
2020-08-25 10:38:11 +03:00
Sanjay Patel 557b890ff4 [InstCombine] improve demanded element analysis for vector insert-of-extract
InstCombine currently has odd rules for folding insert-extract chains to shuffles,
so we miss collapsing seemingly simple cases as shown in the tests here.

But poison makes this not quite as easy as we might have guessed. Alive2 tests to
show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)

SLP tends to create these patterns (as shown in the SLP tests), and this could
help with solving PR16739.

Differential Revision: https://reviews.llvm.org/D86460
2020-08-24 17:00:16 -04:00
Bjorn Pettersson fce44ff5da [Scalarizer] Avoid updating the name of globals
The "takeName" logic at the end of ScalarizerVisitor::finish
could end up renaming global variables when having simplified
and extractelement instruction to simply pick a single vector
element. If the input vector to the extractelement instruction
held pointers to global variables we ended up renaming the global
variable.
The patch make sure we only take the name of the replaced Op when
we have added new instructions that might need a useful name.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D86472
2020-08-24 21:55:03 +02:00
Roman Lebedev 56c529300e
[NFC][InstCombine] Adjust naming for some methods to match coding standards
Requested as preparatory cleanup in https://reviews.llvm.org/D86306#inline-799065
2020-08-24 22:39:34 +03:00
Fangrui Song 44ee9d070a Revert D85812 "[coroutine] should disable inline before calling coro split"
This reverts commit 2e43acfed8.

LLVMCoroutines (the library which contains Coroutines.h) depends on LLVMipo (the
library which contains SampleProfile.cpp). It is inappropriate for
SampleProfile.cpp to depent on Coroutines.h (circular dependency).

The test inverted dependencies as well:
llvm/test/Transforms/Coroutines/coro-inline.ll uses -sample-profile.
2020-08-24 11:41:05 -07:00
Florian Hahn d1a1cce5b1 [DSE,MemorySSA] Do not use callCapturesBefore in isReadClobber.
Using callCapturesBefore potentially improves the precision and the
number of stores we can remove. But in practice, it seems to have very
little impact in terms of stores removed. For example, for
SPEC2000/SPEC2006/MultiSource with -O3 -flto, ~50 more stores are
removed (out of ~26900 stores removed). But in terms of compile-time, it
is very expensive and the patch gives substantial compile-time
improvements: Geomean O3 -0.24%, ReleaseThinLTO -0.47%, ReleaseLTO-g
-0.39%.

http://llvm-compile-time-tracker.com/compare.php?from=612a0bff88ed906c83b82f079d4c49e5fecfb9d0&to=e6c86b96d20d97dd88e903a409bd8d39b6114312&stat=instructions
2020-08-24 16:19:42 +01:00
dongAxis 2e43acfed8 [coroutine] should disable inline before calling coro split
summary:
When callee coroutine function is inlined into caller coroutine
function before coro-split pass, llvm will emits "coroutine should
have exactly one defining @llvm.coro.begin". It seems that coro-early
pass can not handle this quiet well.
So we believe that unsplited coroutine function should not be inlined.
This patch fix such issue by not inlining function if it has attribute
"coroutine.presplit" (it means the function has not been splited) to
fix this issue

TestPlan: check-llvm

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D85812
2020-08-24 22:22:08 +08:00
Francesco Petrogalli 5a34b3ab95 [llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]
Changes:

* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >=2` replaced with `VF.isVector()`.
* `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`.
* Replaced the uses of `llvm::SmallSet<ElementCount, ...>` with
   `llvm::SmallSetVector<ElementCount, ...>`. This avoids the need of an
   ordering function for the `ElementCount` class.
* Bits and pieces around printing the `ElementCount` to string streams.

To guarantee that this change is a NFC, `VF.Min` and asserts are used
in the following places:

1. When it doesn't make sense to deal with the scalable property, for
example:
   a. When computing unrolling factors.
   b. When shuffle masks are built for fixed width vector types
In this cases, an
assert(!VF.Scalable && "<mgs>") has been added to make sure we don't
enter coepaths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operation _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` woudl require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
7. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that migh require changes at command line.

Note that in some places the idiom

```
unsigned VF = ...
auto VTy = FixedVectorType::get(ScalarTy, VF)
```

has been replaced with

```
ElementCount VF = ...
assert(!VF.Scalable && ...);
auto VTy = VectorType::get(ScalarTy, VF)
```

The assertion guarantees that the new code is (at least in debug mode)
functionally equivalent to the old version. Notice that this change had been
possible because none of the methods that are specific to `FixedVectorType`
were used after the instantiation of `VTy`.

Reviewed By: rengolin, ctetreau

Differential Revision: https://reviews.llvm.org/D85794
2020-08-24 13:54:03 +00:00
Francesco Petrogalli bad7d6b373 Revert "[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]"
Reverting because the commit message doesn't reflect the one agreed on
phabricator at https://reviews.llvm.org/D85794.

This reverts commit c8d2b065b9.
2020-08-24 13:50:55 +00:00
Francesco Petrogalli c8d2b065b9 [llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]
Changes:

* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >=2` replaced with `VF.isVector()`.
* `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`.
* Add `<` operator to `ElementCount` to be able to use
`llvm::SmallSetVector<ElementCount, ...>`.
* Bits and pieces around printing the ElementCount to string streams.
* Added a static method to `ElementCount` to represent a scalar.

To guarantee that this change is a NFC, `VF.Min` and asserts are used
in the following places:

1. When it doesn't make sense to deal with the scalable property, for
example:
   a. When computing unrolling factors.
   b. When shuffle masks are built for fixed width vector types
In this cases, an
assert(!VF.Scalable && "<mgs>") has been added to make sure we don't
enter coepaths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operation _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` woudl require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
7. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that migh require changes at command line.

Differential Revision: https://reviews.llvm.org/D85794
2020-08-24 13:39:42 +00:00
Florian Hahn b99a5eb659 [DSE,MemorySSA] Delay PointerMayBeCaptured calls until actually needed.
Avoid computing InvisibleToCallerBefore/AfterRet up front. In most
cases, this information is not really needed. Instead, introduce helper
functions to compute and cache the result on demand.

Notably, this also does not use PointerMayBeCapturedBefore for
isInvisibleToCallerBeforeRet, as it requires the killing MemoryDef as
starting instruction, making the caching ineffective. But it appears the
use of PointerMayBeCapturedBefore has very limited benefits in practice
(e.g. on SPEC2000/SPEC2006/MultiSource there are no binary changes with
-O3 -flto). Refrain from using it for now, to limit-compile-time.

This gives some nice compile-time improvements:
http://llvm-compile-time-tracker.com/compare.php?from=db9345f6810f379a36752dc52caf5230585d0ebd&to=b4d091047e1b8a3d377d200137b79d03aca65663&stat=instructions
2020-08-24 14:05:44 +01:00
Florian Hahn 2431b143ae [DSE,MemorySSA] Limit elimination at end of function to single UO.
Limit elimination of stores at the end of a function to MemoryDefs with
a single underlying object, to save compile time.

In practice, the case with multiple underlying objects seems not very
important in practice. For -O3 -flto on MultiSource/SPEC2000/SPEC2006
this results in a total of 2 more stores being eliminated.

We can always re-visit that in the future.
2020-08-24 13:00:17 +01:00
Sanjay Patel 6a44edb8da [InstCombine] fold abs of select with negated op (PR39474)
Similar to the existing transform - peek through a select
to match a value and its negation.

https://alive2.llvm.org/ce/z/MXi5KG

  define i8 @src(i1 %b, i8 %x) {
  %0:
    %neg = sub i8 0, %x
    %sel = select i1 %b, i8 %x, i8 %neg
    %abs = abs i8 %sel, 1
    ret i8 %abs
  }
  =>
  define i8 @tgt(i1 %b, i8 %x) {
  %0:
    %abs = abs i8 %x, 1
    ret i8 %abs
  }
  Transformation seems to be correct!
2020-08-24 07:37:55 -04:00
Sam Parker 8ce450da32 [NFCI][SimplifyCFG] Combine select costs and checks
Combine the cost modelling and validity checks for the phi to select
conversion in SpeculativelyExecuteBB, extracting the logic out into
a function.
2020-08-24 09:16:11 +01:00
Roman Lebedev f6decfa36d
[InstCombine] Negator: freeze is freely negatible if it's operand is negatible 2020-08-23 23:28:19 +03:00
Florian Hahn 2843c9fe0a [DSE,MemorySSA] Keep single DL instance in DSEState (NFC).
Small cleanup, also removes one instance of getting DataLayout without
using it later.
2020-08-23 15:56:38 +01:00
Sanjay Patel ec06b38130 [InstCombine] canonicalize 'not' ops before logical shifts
This reverses the existing transform that would uniformly canonicalize any 'xor' after any shift. In the case of logical shifts, that turns a 'not' into an arbitrary 'xor' with constant, and that's probably not as good for analysis, SCEV, or codegen.

The SCEV motivating case is discussed in:
http://bugs.llvm.org/PR47136

There's an analysis motivating case at:
http://bugs.llvm.org/PR38781

I did draft a patch that would do the same for 'ashr' but that's questionable because it's just swapping the position of a 'not' and uncovers at least 2 missing folds that we would probably need to deal with as preliminary steps.

Alive proofs:
https://rise4fun.com/Alive/BBV

  Name: shift right of 'not'
  Pre: C2 == (-1 u>> C1)
  %a = lshr i8 %x, C1
  %r = xor i8 %a, C2
  =>
  %n = xor i8 %x, -1
  %r = lshr i8 %n, C1

  Name: shift left of 'not'
  Pre: C2 == (-1 << C1)
  %a = shl i8 %x, C1
  %r = xor i8 %a, C2
  =>
  %n = xor i8 %x, -1
  %r = shl i8 %n, C1

  Name: ashr of 'not'
  %a = ashr i8 %x, C1
  %r = xor i8 %a, -1
  =>
  %n = xor i8 %x, -1
  %r = ashr i8 %n, C1

Differential Revision: https://reviews.llvm.org/D86243
2020-08-22 09:38:13 -04:00
Florian Hahn 5e7e2162d4 [DSE,MemorySSA] Use BatchAA for AA queries.
We can use BatchAA to avoid some repeated AA queries. We only remove
stores, so I think we will get away with using a single BatchAA instance
for the complete run.

The changes in AliasAnalysis.h mirror the changes in D85583.

The change improves compile-time by roughly 1%.
http://llvm-compile-time-tracker.com/compare.php?from=67ad786353dfcc7633c65de11601d7823746378e&to=10529e5b43809808e8c198f88fffd8f756554e45&stat=instructions

This is part of the patches to bring down compile-time to the level
referenced in
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86275
2020-08-22 08:36:35 +01:00
Roman Lebedev 503deec218
Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline"
As disscussed in post-commit review starting with
	https://reviews.llvm.org/D84108#2227365
while this appears to be mostly a win overall, especially code-size-wise,
this appears to shake //certain// code pattens in a way that is extremely
unfavorable for performance (+30% runtime regression)
on certain CPU's (i personally can't reproduce).

So until the behaviour is better understood, and a path forward is mapped,
let's back this out for now.

This reverts commit 1d51dc38d8.
2020-08-22 00:33:22 +03:00
kuterd 65fcc0ee31 [Attributor] Function seed allow list
-  Adds a command line option to seed only selected functions.
  - Makes seed allow listing exclusive to assertions enabled builds.

Reviewed By: sstefan1

Differential Revision: https://reviews.llvm.org/D86129
2020-08-21 23:55:26 +03:00
Shinji Okumura e21a22a7a8 [Attributor] fix AANoUndef initialization
Currently, `AANoUndefImpl::initialize` mistakenly always indicates optimistic fixpoint for function returned position.
 This is because an associated value is `Function` in the case, and `isGuaranteedNotToBeUndefOrPoison` returns true for Function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86361
2020-08-22 05:06:14 +09:00
Amy Huang 5e3fd471ac [Cloning] Fix to cloning DISubprograms.
When trying to enable -debug-info-kind=constructor there was an assert
that occurs during debug info cloning ("mismatched subprogram between
llvm.dbg.value variable and !dbg attachment").
It appears that during llvm::CloneFunctionInto, a DISubprogram could be
duplicated when MapMetadata is called, and then added to the MD map again
when DIFinder gets a list of subprograms. This results in two different
versions of the DISubprogram.

This patch switches the order so that the DIFinder subprograms are
added before MapMetadata is called.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46784

Differential Revision: https://reviews.llvm.org/D86185
2020-08-21 11:54:56 -07:00
Serguei Katkov 9e362bb0eb [InstCombine] Remove unused entries in gc-live bundle of statepoint
If some of gc live value are not used in gc.relocate we can remove them
from gc-live bundle of statepoint instruction.

Also the CL removes duplicated Values in gc-live bundle.

Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D85959
2020-08-22 01:36:22 +07:00
Serguei Katkov 63d9d56a55 [InstCombine] Move handling of gc.relocate in a gc.statepoint
The only def for gc.relocate is a gc.statepoint. But real dependency is not
described by def-use chain. Instead this dependency is encoded by indecies
of operands in gc-live bundle of statepoint as integer constants in gc.relocate.

InstCombine operates by def-use chain. As a result when value in gc-live bundle
is simplified the gc.statepoint itself is not simplified but it might simplify dependent
gc.relocates. To trigger the optimization of gc.relocate we now unconditionally trigger
check of all dependent gc.relocates by adding them to worklist.

This CL handles of gc.relocates as process of gc.statepoint optimization considering
gc.statepoint and related gc.relocate as whole entity.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D85954
2020-08-21 23:44:23 +07:00
Florian Hahn 8eded24bf4 Recommit "[SCEVExpander] Add helper to clean up instrs inserted while expanding."
Recommit the patch after fixing an issue reported caused by the fact
that re-used values are also added to InsertedValues.

Additional tests have been added in 88818491b9

This reverts the revert commit 38884641f2.
2020-08-21 15:04:17 +01:00
Sam Parker bfc6d8b59b [NFC][SimplifyCFG] Formatting and variable rename 2020-08-21 13:11:17 +01:00
Florian Hahn 9f7350672e [DSE,MemorySSA] Handle atomicrmw/cmpxchg conservatively.
This adds conservative handling of AtomicRMW/AtomicCmpXChg to
isDSEBarrier, similar to atomic loads and stores.
2020-08-21 10:42:42 +01:00
Yevgeny Rouban 18bc400f97 [NewPM][PassInstrumentation] Add PreservedAnalyses parameter to AfterPass* callbacks
Both AfterPass and AfterPassInvalidated pass instrumentation
callbacks get additional parameter of type PreservedAnalyses.
This patch was created by @fedor.sergeev. I have just slightly
changed it.

Reviewers: fedor.sergeev

Differential Revision: https://reviews.llvm.org/D81555
2020-08-21 16:10:42 +07:00
Sam Parker 47251582f5 [SimplifyCFG] Cost required selects
Before we speculatively execute a basic block, query the cost of
inserting the necessary select instructions against the phi folding
threshold. For non-trivial insertions, a more accurate decision can
probably be made during machine if-conversion. With minsize we query
the CodeSize cost, otherwise we use SizeAndLatency.

Differential Revision: https://reviews.llvm.org/D82438
2020-08-21 09:52:52 +01:00
Florian Hahn a0e92ffd0d [DSE,MemorySSA] Split off partial tracking from isOverwite.
When traversing memory uses to look for aliasing reads/writes, we only
care about complete overwrites. This patch splits off the partial
overwrite tracking from isOverwrite This avoids some unnecessary work
when checking for read/write clobbers with MemorySSA-DSE.
isOverwrite, which skips the partial overwrite tracking.

This gives a relatively small improvement
http://llvm-compile-time-tracker.com/compare.php?from=ef2a2f77f87553a0a4a39f518eb9ac86b756bda6&to=658f3905dd96d3415f3782adc712c79fa59a4665&stat=instructions

This is part of the patches to bring down compile-time to the level
referenced in
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86280
2020-08-21 09:13:59 +01:00
David Green 2b69efded0 [ARM][LV] Add a preferPredicatedReductionSelect target hook
As part of D84741, this adds a target hook for the
preferPredicatedReductionSelect option and makes use
of it under MVE, allowing us to tail predicate most
reduction loops.

Differential Revision: https://reviews.llvm.org/D85980
2020-08-21 08:48:12 +01:00
David Green 816097e4e5 [LV] Allow tail folded reduction selects to remain in the loop
The normal scheme for tail folding reductions is to use:

loop:
  p = phi(0, a)
  mask = ...
  x = masked_load(..., mask)
  a = add(x, p)
s = select(mask, a, p)

This means we need to keep the register p and a alive out of the loop, plus
the mask. On a target with predicated operations we can instead generate
the phi as p = phi(0, s). This ensures the select in the loop and we can
fold select(m, add(a, b), c) to something like a vaddt c, a, b using the
m predicate. This in turn allows us to tail predicate the entire loop.

Differential Revision: https://reviews.llvm.org/D84741
2020-08-20 14:31:14 +01:00
Shinji Okumura 835cfa5def [Attributor] Handle CallBase case in AAValueConstantRange::initialize
Currently, although we handle `CallBase` case in updateImpl, we give up in initialize in the case.
That is problematic when we propagate a range from call site returned position to floating position.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86196
2020-08-20 20:15:19 +09:00
David Stenberg 8206257cb8 [GlobalOpt] Fix an incorrect Modified status
When removing a non-constant store to a global in
CleanupPointerRootUsers(), the GlobalOpt pass could incorrectly return
false.

This was caught using the check introduced by D80916.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86149
2020-08-20 11:52:09 +02:00
David Stenberg 7a1029fd1e Reland "[LoopUnswitch] Fix incorrect Modified status"
Relanded since the buildbot issue was unrelated to this commit.

When hoisting simple values out from a loop, and an optsize attribute, a
convergent call, or an invoke instruction hindered the pass from
unswitching the loop, the pass would return an incorrect Modified
status.

This was caught using the check introduced by D80916.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86085
2020-08-20 11:52:09 +02:00
David Stenberg ca688ae497 Revert "[LoopUnswitch] Fix incorrect Modified status"
This reverts commit dfd447c220.

After I pushed this commit, llvm-sphinx-docs started failing, due to:

  Warning, treated as error:
  extension 'recommonmark' has no setup() function;
  is it really a Sphinx extension module?

I don't see how this commit may have caused that, but I'm still
reverting it since I don't know how to proceed with that
troubleshooting.
2020-08-20 11:14:23 +02:00
Evgeny Leviant d5b701b972 [ThinLTO] Import globals recursively
Differential revision: https://reviews.llvm.org/D73698
2020-08-20 12:13:43 +03:00
David Stenberg dfd447c220 [LoopUnswitch] Fix incorrect Modified status
When hoisting simple values out from a loop, and an optsize attribute, a
convergent call, or an invoke instruction hindered the pass from
unswitching the loop, the pass would return an incorrect Modified
status.

This was caught using the check introduced by D80916.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86085
2020-08-20 09:04:16 +02:00
Johannes Doerfert 012819f301 [Attributor][FIX] Update the call graph properly when internalizing functions
The internal version is now part of the SCC, make sure to perform this
update.
2020-08-20 01:44:58 -05:00
Johannes Doerfert 3edea15f9a [Attributor] Simplify comparison against constant null pointer
Comparison against null is a common pattern that usually is followed by
error handling code and the likes. We now use AANonNull to simplify
these comparisons optimistically in order to make more code dead early
on.

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D86145
2020-08-20 01:44:58 -05:00
Johannes Doerfert d01ad217ba [Attributor][FIX] Do not use cyclic arguments for `nonnull`
`AADereferenceable::getAssumedDereferenceableBytes()` is actually
deducing `dereferenceable_or_null`. We should not use that information
to deduce `nonnull`, since it doesn't imply `nonnull`.
2020-08-20 01:44:58 -05:00
Johannes Doerfert a49dae0e38 [Attributor][AAIsDead][NFC] Skip uninteresting instructions early 2020-08-20 01:44:58 -05:00
Johannes Doerfert 08f33756e6 [Attributor][NFC] Extract functionality into own member 2020-08-20 01:44:58 -05:00
Johannes Doerfert 1de70a724e Revert "[OpenMPOpt] ICV tracking for calls"
This commits breaks certain OpenMP codes (on power) because it expanded
the Attributor scope without telling the Attributor about the SCC
extend. See: https://reviews.llvm.org/D85544#2227611

This reverts commit b0b32e6490.
2020-08-20 00:00:35 -05:00
Kyungwoo Lee 7a028fe702 Force Remove Attribute
-force-attribute adds an attribute to function via command-line.
However, there was no counter-part to remove an attribute.  This patch
adds -force-remove-attribute that removes an attribute from function.

Differential Revision: https://reviews.llvm.org/D85586
2020-08-19 17:30:13 -04:00
Sanjay Patel 6f3511a01a [ValueTracking] define/use max recursion depth in header
There's a potential motivating case to increase this limit in PR47191:
http://bugs.llvm.org/PR47191

But first we should make it less hacky. The limit in InstCombine is directly tied
to this value because an increase there can cause asserts in the underlying value
tracking calls if not changed together. The usage in VectorUtils is independent,
but the comment suggests that we should use the same value unless there's a known
reason to diverge. There are similar limits in codegen analysis, but I think we
should leave those independent in case we intentionally want the optimization
power/cost to be different there.

Differential Revision: https://reviews.llvm.org/D86113
2020-08-19 16:56:59 -04:00
Hiroshi Yamauchi ab401a8c8a [PGO][PGSO][LV] Fix loop not vectorized issue under profile guided size opts.
D81345 appears to accidentally disables vectorization when explicitly
enabled. As PGSO isn't currently accessible from LoopAccessInfo, revert back to
the vectorization with versioning-for-unit-stride for PGSO.

Differential Revision: https://reviews.llvm.org/D85784
2020-08-19 12:13:34 -07:00
Florian Hahn c0cbe6453a [DSE] Remove dead argument from removePartiallyOverlappedStores (NFC).
The argument is unused and can be removed.
2020-08-19 19:33:52 +01:00
Mehdi Amini a407ec9b6d Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private.""
Was reverted because MLIR/Flang builds were broken, these APIs have been
fixed in the meantime.
2020-08-19 17:26:36 +00:00
Mehdi Amini 4fc56d70aa Revert "[NFC][llvm] Make the contructors of `ElementCount` private."
This reverts commit 264afb9e6a.
(and dependent 6b742cc48 and fc53bd610f)

MLIR/Flang are broken.
2020-08-19 17:21:37 +00:00
Hamilton Tobon Mosquera bd2fa1819b [OpenMPOpt][HideMemTransfersLatency] Moving the 'wait' counterpart of __tgt_target_data_begin_mapper
canBeMovedDownwards checks if the "wait" counterpart of __tgt_target_data_begin_mapper can be moved downwards, returning a pointer to the instruction that might require/modify the data transferred, and returning null it the movement is not possible or not worth it. The function splitTargetDataBeginRTC receives that returned instruction and instead of moving the "wait" it creates it at that point.

Differential Revision: https://reviews.llvm.org/D86155
2020-08-19 11:42:22 -05:00
Francesco Petrogalli 264afb9e6a [NFC][llvm] Make the contructors of `ElementCount` private.
Differential Revision: https://reviews.llvm.org/D86120
2020-08-19 16:26:44 +00:00
Sanjay Patel c8d711adae [InstCombine] reduce code duplication; NFC 2020-08-19 12:05:12 -04:00
Benjamin Kramer b98e25b6d7 Make helpers static. NFC. 2020-08-19 16:00:03 +02:00
Roman Lebedev 3d76a133c7
Revert "[InstCombine] Lower infinite combine loop detection thresholds"
And as being reported by Florian Hahn, there's a hit
in MultiSource/Benchmarks/mafft from the test-suite on X86 with -O3 -flto,
so reverting until addressed.

This reverts commit 71e0b82c9f.
2020-08-19 16:53:30 +03:00
Roman Lebedev 71e0b82c9f
[InstCombine] Lower infinite combine loop detection thresholds
It's been a month since 2f3862eb9f,
and no new bug reports about the threshold were filled,
so let's bump it again and wait again.
2020-08-19 14:37:57 +03:00
sstefan1 b0b32e6490 [OpenMPOpt] ICV tracking for calls
Introduce two new AAs. AAICVTrackerFunctionReturned which checks if a
function can have a unique ICV value after it is finished, and
AAICVCallSiteReturned which checks AAICVTrackerFunctionReturned for a
call site. This enables us to check the value of a call and if it
changes the ICV. This also changes the approach in
`getReplacementValues()` to a worklist-based approach so we can explore
all relevant BBs.

Differential Revision: https://reviews.llvm.org/D85544
2020-08-19 11:43:12 +02:00
Florian Hahn 1a55fbceaa [DSE,MemorySSA] Use NumRedundantStores instead of NumNoopStores.
Legacy DSE uses NumRedundantStores, while MemorySSA DSE uses
NumNoopStores. We should just use the same counter.
2020-08-19 08:50:33 +01:00
Roman Lebedev 2f01785857
[NFC][InstCombine] Aggregate reconstruction: use plain map
Now that we no longer require for this map to have stable iteration order,
we no longer need to pay for keeping the iteration order stable,
so switch from `SmallMapVector` to `SmallDenseMap`.
2020-08-19 01:09:25 +03:00
Roman Lebedev 78bd4231bf
[InstCombine] PHI-aware aggregate reconstruction: properly handle duplicate predecessors
While it may seem like we can just "deduplicate" the case where
some basic block happens to be a predecessor more than once,
which happens for e.g. switches, that is not correct thing to do.
We must actually add a PHI operand for each predecessor.

This was initially reported to me by David Major
as a clang crash during gecko build for android.
2020-08-19 01:00:42 +03:00
Sanjay Patel 139da9c4d7 [InstCombine] fold fabs of select with negated operand
This is the FP example shown in:
https://bugs.llvm.org/PR39474
2020-08-18 09:23:07 -04:00
Shinji Okumura 5e361e2aa4 [Attributor] Deduce noundef attribute
This patch introduces a new abstract attribute `AANoUndef` which corresponds to `noundef` IR attribute and deduce them.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85184
2020-08-18 18:05:54 +09:00
Johannes Doerfert 8abd69aa9e [Attributor] Bail early if AAMemoryLocation cannot derive anything
Before this change we looked through all memory operations in a function
even if the first was an unknown call that could do anything. This did
cost a lot of time but there is little use to do so. We also avoid
creating AAs for things that we would have looked at in case no other AA
will; that is the reason for the test changes.

Running only the attributor-cgscc pass on a IR version of
`llvm-test-suite/MultiSource/Applications/SPASS/clause.c` reduced the
time we spend in `AAMemoryLocation::update` from 4% total to
0.9% (disclaimer: no accurate measurements).
2020-08-17 23:36:36 -05:00
Johannes Doerfert 1d99c3d707 [Attributor] We (should) keep the CG updated so we can mark it as preserved 2020-08-17 23:36:36 -05:00
Johannes Doerfert 858c75f7d1 [Attributor][NFC] Directly return proper type to avoid casts 2020-08-17 23:36:36 -05:00
Johannes Doerfert b27bdf955a [Attributor][FIX] Handle function pointers properly in AANonNull
Before we tired to create a dominator tree for a declaration when we
wanted to determine if the function pointer is `nonnull`. We now avoid
looking at global values if `Value::getPointerDereferenceableBytes` not
already determined `nonnull`.
2020-08-17 23:36:35 -05:00
Hamilton Tobon Mosquera 496f8e5b36 [OpenMPOpt][HideMemTransfersLatency] Split __tgt_target_data_begin_mapper into its "issue" and "wait" counterparts.
WIP that tries to hide the latency of runtime calls that involve host to
device memory transfers by splitting them into their "issue" and "wait"
versions. The "issue" is moved upwards as much as possible. The "wait" is
moved downards as much as possible. The "issue" issues the memory transfer
asynchronously, returning a handle. The "wait" waits in the returned
handle for the memory transfer to finish. We still lack of the movement.
2020-08-17 20:56:10 -05:00
Aditya Kumar 370330f084 NFC: [GVNHoist] Outline functions from the class
Reviewers: sebpop
Reviewed By: hiraditya

Differential Revision: https://reviews.llvm.org/D86032
2020-08-17 17:40:04 -07:00
Johannes Doerfert 19bd4ef157 [Attributor] Properly use the call site argument position 2020-08-17 18:21:09 -05:00
Johannes Doerfert 5dfc207c53 [Attributor][FIX] Do not request an AANonNull for non-pointer types 2020-08-17 18:21:08 -05:00
Roman Lebedev 03127f795b
[InstCombine] PHI-aware aggregate reconstruction: correctly detect "use" basic block
While the original implementation added in D85787 / ae7f08812e
is not incorrect, it is known to be suboptimal.

In particular, it is not incorrect to use the basic block
in which the original `insertvalue` instruction is located
as the merge point, that is not necessarily optimal,
as `@test6` shows.

We should look at all the AggElts, and, if they are all defined
in the same basic block, then that is the basic block we should use.

On RawSpeed library, this catches +4% (+50) more cases.
On vanilla LLVM test-suits, this catches +12% (+92) more cases.
2020-08-18 00:45:18 +03:00
Roman Lebedev f4f673e0e3
[NFC][InstCombine] PHI-aware aggregate reconstruction: don't capture UseBB in lambdas, take it as argument
In a following patch, UseBB will be detected later,
so capturing it is potentially error-prone (capture by ref vs by val).

Also, parametrized UseBB will likely be needed
for multiple levels of PHI indirections later on anyways.
2020-08-18 00:45:18 +03:00