Commit Graph

1066 Commits

Author SHA1 Message Date
Roman Lebedev d15d81ce15
[SimplifyCFG] FoldValueComparisonIntoPredecessors(): deal with each predecessor only once
If the predecessor is a switch, and BB is not the default destination,
multiple cases could have the same destination. and it doesn't
make sense to re-process the predecessor, because we won't make any changes,
once is enough.

I'm not sure this can be really tested, other than via the assertion
being added here, which fires without the fix.
2021-01-06 01:52:37 +03:00
Roman Lebedev fc96cb2dad
[SimplifyCFG] FoldValueComparisonIntoPredecessors(): switch to non-permissive DomTree updates
... which requires not adding a DomTree edge that we just added.
2021-01-06 01:52:37 +03:00
Roman Lebedev 29ca7d5a1a
[SimplifyCFG] simplifyUnreachable(): fix handling of degenerate same-destination conditional branch
One would hope that it would have been already canonicalized into an
unconditional branch, but that isn't really guaranteed to happen
with SimplifyCFG's visitation order.
2021-01-06 01:52:36 +03:00
Roman Lebedev 3460719f58
[NFC][SimplifyCFG] Add a test with same-destination condidional branch
Reported by Mikael Holmén as post-commit feedback on
https://reviews.llvm.org/rG2d07414ee5f74a09fb89723b4a9bb0818bdc2e18#968162
2021-01-06 01:52:36 +03:00
Roman Lebedev f98535686e
[SimplifyCFG] simplifyUnreachable(): switch to non-permissive DomTree updates
... which requires not removing a DomTree edge if the switch's default
still points at that destination, because it can't be removed;
... and not processing the same predecessor more than once.
2021-01-06 01:52:36 +03:00
Roman Lebedev 32c47ebef1
[SimplifyCFG] SimplifyCondBranchToTwoReturns(): switch to non-permissive DomTree updates
... which requires not deleting an edge that just got deleted,
because we could be dealing with a block that didn't go through
ConstantFoldTerminator() yet, and thus has a degenerate cond br
with matching true/false destinations.
2021-01-05 01:26:37 +03:00
Roman Lebedev 110b3d7855
[SimplifyCFG] SimplifyEqualityComparisonWithOnlyPredecessor(): switch to non-permissive DomTree updates
... which requires not deleting an edge that just got deleted.
2021-01-05 01:26:37 +03:00
Roman Lebedev a8604e3d5b
[SimplifyCFG] simplifyIndirectBr(): switch to non-permissive DomTree updates
... which requires not deleting an edge that just got deleted.
2021-01-05 01:26:36 +03:00
Roman Lebedev 3fb57222c4
[NFCI] SimplifyCFG: switch to non-permissive DomTree updates, where possible
Notably, this doesn't switch *every* case, remaining cases
don't actually pass sanity checks in non-permissve mode,
and therefore require further analysis.

Note that SimplifyCFG still defaults to not preserving DomTree by default,
so this is effectively a NFC change.
2021-01-05 01:26:36 +03:00
Roman Lebedev 98cd1c33e3
[NFC][SimplifyCFG] Hoist 'original' DomTree verification from simplifyOnce() into run()
This is NFC since SimplifyCFG still currently defaults to not preserving DomTree.

SimplifyCFGOpt::simplifyOnce() is only be called from SimplifyCFGOpt::run(),
and can not be called externally, since SimplifyCFGOpt is defined in .cpp
This avoids some needless verifications, and is thus a bit faster
without sacrificing precision.
2021-01-04 01:02:02 +03:00
Roman Lebedev a7684940f0
[SimplifyCFG] SimplifyTerminatorOnSelect(): fix/tune DomTree updates
We only need to remove non-TrueBB/non-FalseBB successors,
and we only need to do that once. We don't need to insert
any new edges, because no new successors will be added.
2021-01-04 01:02:02 +03:00
Roman Lebedev 70935b9595
[NFC][SimplifyCFG] SimplifyTerminatorOnSelect(): pull out OldTerm->getParent() into a variable 2021-01-04 01:02:02 +03:00
Roman Lebedev 5fa241a657
[SimplifyCFG] FoldValueComparisonIntoPredecessors(): fine-tune/fix DomTree preservation, take 2 2021-01-03 01:45:48 +03:00
Roman Lebedev 6a3a8d17eb
[SimplifyCFG] FoldValueComparisonIntoPredecessors(): fine-tune/fix DomTree preservation 2021-01-03 01:45:48 +03:00
Roman Lebedev 7c8b8063b6
[SimplifyCFG][AMDGPU] AMDGPUUnifyDivergentExitNodes: SimplifyCFG isn't ready to preserve PostDomTree
There is a number of transforms in SimplifyCFG that take DomTree out of
DomTreeUpdater, and do updates manually. Until they are fixed,
user passes are unable to claim that PDT is preserved.

Note that the default for SimplifyCFG is still not to preserve DomTree,
so this is still effectively NFC.
2021-01-03 01:45:46 +03:00
Roman Lebedev b9da488ad7
[SimplifyCFG] Don't actually take DomTreeUpdater unless we intend to maintain DomTree validity
This guards against unintentional mistakes
like the one i just fixed in previous commit.
2021-01-02 14:40:55 +03:00
Roman Lebedev b4429f3cdd
[SimplifyCFG] Teach removeUndefIntroducingPredecessor to preserve DomTree 2021-01-02 01:01:20 +03:00
Roman Lebedev 657c1e09da
[SimplifyCFG] Teach eliminateDeadSwitchCases() to preserve DomTree, part 2 2021-01-02 01:01:18 +03:00
Roman Lebedev f1ce696056
[SimplifyCFG] Teach tryWidenCondBranchToCondBranch() to preserve DomTree 2021-01-02 01:01:17 +03:00
Roman Lebedev 831636b0e6
[SimplifyCFG] SUCCESS! Teach createUnreachableSwitchDefault() to preserve DomTree
This pretty much concludes patch series for updating SimplifyCFG
to preserve DomTree. All 318 dedicated `-simplifycfg` tests now pass
with `-simplifycfg-require-and-preserve-domtree=1`.

There are a few leftovers that apparently don't have good test coverage.
I do not yet know what gaps in test coverage will the wider-scale testing
reveal, but the default flip might be close.
2021-01-01 03:25:25 +03:00
Roman Lebedev e1440d43bc
[SimplifyCFG] Teach tryToSimplifyUncondBranchWithICmpInIt() to preserve DomTree 2021-01-01 03:25:25 +03:00
Roman Lebedev 8866583953
[SimplifyCFG] Teach FoldValueComparisonIntoPredecessors() to preserve DomTree, part 2 2021-01-01 03:25:24 +03:00
Roman Lebedev a815b6b2b2
[SimplifyCFG] Teach eliminateDeadSwitchCases() to preserve DomTree, part 1 2021-01-01 03:25:24 +03:00
Roman Lebedev 0d2f219d4d
[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 3 2021-01-01 03:25:23 +03:00
Roman Lebedev 9f17dab1f4
[SimplifyCFG] Teach simplifyIndirectBr() to preserve DomTree 2021-01-01 03:25:23 +03:00
Roman Lebedev b7c463d7b8
[SimplifyCFG] Teach FoldBranchToCommonDest() to preserve DomTree, part 2 2021-01-01 03:25:23 +03:00
Roman Lebedev c1b825d4b8
[SimplifyCFG] Teach FoldValueComparisonIntoPredecessors() to preserve DomTree, part 1 2021-01-01 03:25:22 +03:00
Roman Lebedev 7f221c9196
[SimplifyCFG] Teach SwitchToLookupTable() to preserve DomTree 2020-12-30 23:58:41 +03:00
Roman Lebedev a17025aa61
[SimplifyCFG] Teach switchToSelect() to preserve DomTree 2020-12-30 23:58:40 +03:00
Roman Lebedev c45f765c0d
[SimplifyCFG] Teach SimplifyBranchOnICmpChain() to preserve DomTree 2020-12-30 23:58:40 +03:00
Kazu Hirata 16d20e2554 [Transforms/Utils] Construct SmallVector with iterator ranges (NFC) 2020-12-29 19:23:23 -08:00
Roman Lebedev 39a56f7f17
[SimplifyCFG] Teach SimplifyTerminatorOnSelect() to preserve DomTree 2020-12-30 00:48:12 +03:00
Roman Lebedev ec0b671a61
[SimplifyCFG] Teach SimplifyCondBranchToCondBranch() to preserve DomTree 2020-12-30 00:48:12 +03:00
Roman Lebedev 307156246f
[SimplifyCFG] Teach mergeConditionalStoreToAddress() to preserve DomTree 2020-12-30 00:48:11 +03:00
Roman Lebedev d4c0abb4a3
[SimplifyCFG] Teach FoldCondBranchOnPHI() to preserve DomTree 2020-12-30 00:48:11 +03:00
Roman Lebedev b8121b2e62
[SimplifyCFG] Teach SinkCommonCodeFromPredecessors() to preserve DomTree 2020-12-30 00:48:11 +03:00
Roman Lebedev 18c407bf4c
[SimplifyCFG] Teach HoistThenElseCodeToIf() to preserve DomTree 2020-12-30 00:48:10 +03:00
Roman Lebedev fe9bdd9621
[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 2 2020-12-30 00:48:10 +03:00
Roman Lebedev 6027e05dbf
[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 1 2020-12-30 00:48:10 +03:00
Roman Lebedev ef93f7a11c
[SimplifyCFG] FoldBranchToCommonDest: gracefully handle unreachable code ()
We might be dealing with an unreachable code,
so the bonus instruction we clone might be self-referencing.

There is a sanity check that all uses of bonus instructions
that are not in the original block with said bonus instructions
are PHI nodes, and that is obviously not the case
for self-referencing instructions..

So if we find such an use, just rewrite it.

Thanks to Mikael Holmén for the reproducer!

Fixes https://bugs.llvm.org/show_bug.cgi?id=48450#c8
2020-12-28 23:31:19 +03:00
Roman Lebedev ff3749fc79
[NFC] SimplifyCFGOpt::simplifyUnreachable(): pacify unused variable warning
Thanks to Luke Benes for pointing it out.
2020-12-24 21:20:46 +03:00
Roman Lebedev c043f5055e
[SimplifyCFG] Teach FoldBranchToCommonDest() to preserve DomTree, part 1
... for conditional branch case
2020-12-20 00:18:36 +03:00
Roman Lebedev 262ff9c23e
[SimplifyCFG] Teach TryToMergeLandingPad() to preserve DomTree 2020-12-20 00:18:36 +03:00
Roman Lebedev 6a1617d67c
[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 2
... for the custom case returning void.
2020-12-20 00:18:36 +03:00
Roman Lebedev b94520c9ee
[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 1
... for the general case of returning a value.
2020-12-20 00:18:35 +03:00
Roman Lebedev 4d87a6ad13
[NFCI][SimplifyCFG] SimplifyCondBranchToTwoReturns(): pull out BI->getParent() into a variable 2020-12-20 00:18:35 +03:00
Roman Lebedev 83659c7076
[SimplifyCFG] simplifySingleResume(): FoldReturnIntoUncondBranch() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Apparently, there were no dedicated tests just for that functionality,
so i'm adding one here.
2020-12-20 00:18:34 +03:00
Roman Lebedev b7d00e29b7
[SimplifyCFG] Teach simplifySingleResume() to preserve DomTree 2020-12-20 00:18:34 +03:00
Roman Lebedev c209b88dd4
[SimplifyCFG] Teach simplifyCommonResume() to preserve DomTree 2020-12-20 00:18:34 +03:00
Roman Lebedev 76e74d9395
[SimplifyCFG] Teach removeEmptyCleanup() to preserve DomTree 2020-12-20 00:18:33 +03:00
Roman Lebedev 4be8707e64
[SimplifyCFG] Teach FoldTwoEntryPHINode() to preserve DomTree
Still boring, simply drop all edges to successors of DomBlock,
and add an edge to to BB instead.
2020-12-20 00:18:33 +03:00
Roman Lebedev b43b77ff9b
[NFCI][SimlifyCFG] simplifyOnce(): also perform DomTree validation
And that exposes that a number of tests don't *actually* manage to
maintain DomTree validity, which is inline with my observations.

Once again, SimlifyCFG pass currently does not require/preserve DomTree
by default, so this is effectively NFC.
2020-12-20 00:18:32 +03:00
Roman Lebedev 2d07414ee5
[SimplifyCFG] Teach simplifyUnreachable() to preserve DomTree
Pretty boring, removeUnwindEdge() already known how to update DomTree,
so if we are to call it, we must first flush our own pending updates;
otherwise, we just stop predecessors from branching to us,
and for certain predecessors, stop their predecessors from
branching to them also.
2020-12-18 00:37:22 +03:00
Roman Lebedev 2ee724863e
[SimplifyCFG] ConstantFoldTerminator() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a number of tests,
all of which are marked as such so that they do not regress.
2020-12-18 00:37:22 +03:00
Roman Lebedev 164e0847a5
[SimplifyCFG] DeleteDeadBlock() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-18 00:37:21 +03:00
Florian Hahn 75c04bfc61
[SimplifyCFG] Preserve !annotation in FoldBranchToCommonDest.
When folding a branch to a common destination, preserve !annotation on
the created instruction, if the terminator of the BB that is going to be
removed has !annotation. This should ensure that !annotation is attached
to the instructions that 'replace' the original terminator.

Reviewed By: jdoerfert, lebedev.ri

Differential Revision: https://reviews.llvm.org/D93410
2020-12-17 14:06:58 +00:00
Roman Lebedev 5cce4aff18
[SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-17 01:03:49 +03:00
Roman Lebedev 49dac4aca0
[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-17 01:03:49 +03:00
Roman Lebedev e113317958
[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware
Two observations:
1. Unavailability of DomTree makes it impossible to make
  `FoldBranchToCommonDest()` transform in certain cases,
   where the successor is dominated by predecessor,
   because we then don't have PHI's, and can't recreate them,
   well, without handrolling 'is dominated by' check,
   which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
   decrease the number of `Dominator Tree Construction` by 5
   (from 28 now, i.e. -18%) in `-O3` old-pm pipeline
   (as per `llvm/test/Other/opt-O3-pipeline.ll`)
   This might or might not be beneficial for compile time.

So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.

Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.

But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.

Then, eventually, the default should be flipped,
and eventually^2 the flag removed.

And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.
2020-12-16 00:38:00 +03:00
Roman Lebedev 59560e8589
[SimplifyCFG] FoldBranchToCommonDest(): temporairly put back restrictions on liveout uses of bonus instructions (PR48450)
Even though d38205144f was mostly a correct
fix for the external non-PHI users, it's not a *generally* correct fix,
because the 'placeholder' values in those trivial PHI's we create
shouldn't be *always* 'undef', but the PHI itself for the backedges,
else we end up with wrong value, as the `@pr48450_2` test shows.

But we can't just do that, because we can't check that the PHI
can be it's own incoming value when coming from certain predecessor,
because we don't have a dominator tree.

So until we can address this correctness problem properly,
ensure that we don't perform the transformation
if there are such problematic external uses.

Making dominator tree available there is going to be involved,
since `-simplifycfg` pass currently does not preserve/update domtree...
2020-12-14 20:14:31 +03:00
Roman Lebedev e8360a8e1e
[NFC][SimplifyCFG] FoldBranchToCommonDest(): pull out 'common successor' into a variable
Makes it easier to use it elsewhere
2020-12-14 20:14:31 +03:00
Kazu Hirata 5891ad4e22 [Transforms] Use llvm::erase_value (NFC) 2020-12-13 09:48:47 -08:00
Roman Lebedev d38205144f
[SimplifyCFG] FoldBranchToCommonDest(): bonus instrns must only be used by PHI nodes in successors (PR48450)
In particular, if the successor block, which is about to get a new
predecessor block, currently only has a single predecessor,
then the bonus instructions will be directly used within said successor,
which is fine, since the block with bonus instructions dominates that
successor. But once there's a new predecessor, the IR is no longer valid,
and we don't fix it, because we only update PHI nodes.

Which means, the live-out bonus instructions must be exclusively used
by the PHI nodes in successor blocks. So we have to form trivial PHI nodes.
which will then be successfully updated to recieve cloned bonus instns.

This all works fine, except for the fact that we don't have access to
the dominator tree, and we don't ignore unreachable code,
so we sometimes do end up having to deal with some weird IR.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48450
2020-12-13 00:06:57 +03:00
Roman Lebedev 15f8060f6f
[SimplifyCFG] FoldBranchToCommonDest: don't require that cmp of br is last instruction
There is no correctness need for that, and since we allow live-out
uses, this could theoretically happen, because currently nothing
will move the cond to right before the branch in those tests.
But regardless, lifting that restriction even makes the transform
easier to understand.

This makes the transform happen in 81 more cases (+0.55%)
)
2020-12-01 15:13:06 +03:00
Roman Lebedev b0e9b7c59f
[NFC][SimplifyCFG] Add STATISTIC() to the FoldValueComparisonIntoPredecessors() fold 2020-11-30 12:27:16 +03:00
Roman Lebedev b33fbbaa34
Reland [SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions
This was orginally committed in 2245fb8aaa.
but was immediately reverted in f3abd54958
because of a PHI handling issue.

Original commit message:

1. It doesn't make sense to enforce that the bonus instruction
   is only used once in it's basic block. What matters is
   whether those user instructions fit within our budget, sure,
   but that is another question.
2. It doesn't make sense to enforce that said bonus instructions
   are only used within their basic block. Perhaps the branch
   condition isn't using the value computed by said bonus instruction,
   and said bonus instruction is simply being calculated
   to be used in successors?

So iff we can clone bonus instructions, to lift these restrictions,
we just need to carefully update their external uses
to use the new cloned instructions.

Notably, this transform (even without this change) appears to be
poison-unsafe as per alive2, but is otherwise (including the patch) legal.

We don't introduce any new PHI nodes, but only "move" the instructions
around, i'm not really seeing much potential for extra cost modelling
for the transform, especially since now we allow at most one such
bonus instruction by default.

This causes the fold to fire +11.4% more (13216 -> 14725)
as of vanilla llvm test-suite + RawSpeed.

The motivational pattern is IEEE-754-2008 Binary16->Binary32
extension code:
ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)
^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v
That being said, even thought this seemed like this would fix it: https://godbolt.org/z/xGq3TM
apparently that fold is happening somewhere else afterall,
so something else also has a similar 'artificial' restriction.
2020-11-27 12:47:15 +03:00
Roman Lebedev f3abd54958
Revert "[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions"
Many bots are unhappy, at the very least missed a few codegen tests,
and possibly this has a logic hole inducing a miscompile
(will be really awesome to have ready reproducer..)

Need to investigate.

This reverts commit 2245fb8aaa.
2020-11-26 23:13:43 +03:00
Roman Lebedev 2245fb8aaa
[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions
1. It doesn't make sense to enforce that the bonus instruction
   is only used once in it's basic block. What matters is
   whether those user instructions fit within our budget, sure,
   but that is another question.
2. It doesn't make sense to enforce that said bonus instructions
   are only used within their basic block. Perhaps the branch
   condition isn't using the value computed by said bonus instruction,
   and said bonus instruction is simply being calculated
   to be used in successors?

So iff we can clone bonus instructions, to lift these restrictions,
we just need to carefully update their external uses
to use the new cloned instructions.

Notably, this transform (even without this change) appears to be
poison-unsafe as per alive2, but is otherwise (including the patch) legal.

We don't introduce any new PHI nodes, but only "move" the instructions
around, i'm not really seeing much potential for extra cost modelling
for the transform, especially since now we allow at most one such
bonus instruction by default.

This causes the fold to fire +11.4% more (13216 -> 14725)
as of vanilla llvm test-suite + RawSpeed.

The motivational pattern is IEEE-754-2008 Binary16->Binary32
extension code:
ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)
^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v
That being said, even thought this seemed like this would fix it: https://godbolt.org/z/xGq3TM
apparently that fold is happening somewhere else afterall,
so something else also has a similar 'artificial' restriction.
2020-11-26 22:51:22 +03:00
Roman Lebedev 65db7d38e0
[NFC][SimplifyCFG] Add statistic to `FoldBranchToCommonDest()` fold 2020-11-26 22:51:21 +03:00
Hongtao Yu f3c445697d [CSSPGO] IR intrinsic for pseudo-probe block instrumentation
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues:

1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.

We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality.

Let's now look at an example. Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}

```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86490
2020-11-20 10:39:24 -08:00
Florian Hahn b3b993a7ad Reland "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts the revert commit 408c4408fa.

This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.

Original message:

On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
2020-11-02 15:39:29 +00:00
Arthur Eubanks 5c31b8b94f Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit 10f2a0d662.

More uint64_t overflows.
2020-10-31 00:25:32 -07:00
Florian Hahn 408c4408fa Revert "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts commit 73f01e3df5.

This appears to break
http://lab.llvm.org:8011/#/builders/85/builds/383.
2020-10-30 21:26:14 +00:00
Arthur Eubanks 10f2a0d662 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-30 10:03:46 -07:00
Florian Hahn 73f01e3df5 [TTI] Add VecPred argument to getCmpSelInstrCost.
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.

Reviewed By: dmgreen, RKSimon

Differential Revision: https://reviews.llvm.org/D90070
2020-10-30 13:49:08 +00:00
Nico Weber 2a4e704c92 Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit e5766f25c6.
Makes clang assert when building Chromium, see https://crbug.com/1142813
for a repro.
2020-10-27 09:26:21 -04:00
Arthur Eubanks e5766f25c6 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-26 20:24:04 -07:00
Zequan Wu 2f29341114 Revert "Revert "SimplifyCFG: Clean up optforfuzzing implementation""
This reverts commit 716f7636e1.
2020-10-21 17:08:56 -07:00
Zequan Wu 716f7636e1 Revert "SimplifyCFG: Clean up optforfuzzing implementation"
See discussion: https://reviews.llvm.org/D89590
This reverts commit cdd006eec9.
2020-10-21 16:56:32 -07:00
Jeremy Morse 05659606a2 Revert "[gardening] Replace some uses of setDebugLoc(DebugLoc()) with dropLocation(), NFC"
Some of the buildbots have croaked with this patch, for examples failures
that begin in this build:

  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/29933

This reverts commit 674f57870f.
2020-09-30 09:52:12 +01:00
Vedant Kumar 674f57870f [gardening] Replace some uses of setDebugLoc(DebugLoc()) with dropLocation(), NFC 2020-09-29 17:39:07 -07:00
Sam Parker 65f78e73ad [SimplifyCFG] Consider cost of combining predicates.
Modify FoldBranchToCommonDest to consider the cost of inserting
instructions when attempting to combine predicates to fold blocks.
The threshold can be controlled via a new option:
-simplifycfg-branch-fold-threshold which defaults to '2' to allow
the insertion of a not and another logical operator.

Differential Revision: https://reviews.llvm.org/D86526
2020-09-07 10:04:50 +01:00
Benjamin Kramer 8782c72765 Strength-reduce SmallVectors to arrays. NFCI. 2020-08-28 21:14:20 +02:00
Sam Parker 8ce450da32 [NFCI][SimplifyCFG] Combine select costs and checks
Combine the cost modelling and validity checks for the phi to select
conversion in SpeculativelyExecuteBB, extracting the logic out into
a function.
2020-08-24 09:16:11 +01:00
Sam Parker bfc6d8b59b [NFC][SimplifyCFG] Formatting and variable rename 2020-08-21 13:11:17 +01:00
Sam Parker 47251582f5 [SimplifyCFG] Cost required selects
Before we speculatively execute a basic block, query the cost of
inserting the necessary select instructions against the phi folding
threshold. For non-trivial insertions, a more accurate decision can
probably be made during machine if-conversion. With minsize we query
the CodeSize cost, otherwise we use SizeAndLatency.

Differential Revision: https://reviews.llvm.org/D82438
2020-08-21 09:52:52 +01:00
Roman Lebedev e492f0e03b
[SimplifyCFG] Fix invoke->call fold w/ multiple invokes in presence of lifetime intrinsics
SimplifyCFG has two main folds for resumes - one when resume is directly
using the landingpad, and the other one where resume is using a PHI node.

While for the first case, we were already correctly ignoring all the
PHI nodes, and both the debug info intrinsics and lifetime intrinsics,
in the PHI-based-one, we weren't ignoring PHI's in the resume block,
and weren't ignoring lifetime intrinsics. That is clearly a bug.

On RawSpeed library, this results in +9.34% (+81) more invoke->call folds,
-0.19% (-39) landing pads, -0.24% (-81) invoke instructions
but +51 call instructions and -132 basic blocks.

Though, the run-time performance impact appears to be within the noise.
2020-08-08 20:00:28 +03:00
Roman Lebedev 1f452ac1d7
[NFC][SimplifyCFG] Rewrite isCleanupBlockEmpty() to be iterator_range-based 2020-08-08 20:00:28 +03:00
Roman Lebedev a587bf3eb0
[NFC][SimplifyCFG] Count the number of invokes turned into calls due to empty cleanup blocks 2020-08-08 20:00:27 +03:00
Max Kazantsev 360ab70712 [SimplifyCFG] Do not create unneeded PR Phi in block with convergent calls
We do not thread blocks with convergent calls, but this check was missing
when we decide to insert PR Phis into it (which we only do for threading).

Differential Revision: https://reviews.llvm.org/D83936
Reviewed By: nikic
2020-07-22 13:53:50 +07:00
Roman Lebedev 04b729d076
[NFCI][SimplifyCFG] Guard common code hoisting with a (default-on) flag
Common code sinking is already guarded with a (with default-off!) flag,
so add a flag for hoisting, too.

D84108 will hopefully make hoisting off-by-default too.
2020-07-20 10:29:57 +03:00
Jon Roelofs a0537fc35f [SimplifyCFG] Fix crash in the EXPENSIVE_CHECKS build
SimplifyCFG was incorrectly reporting to the pass manager that it had not made
changes after folding away a PHI.  This is detected in the EXPENSIVE_CHECKS
build when the function's hash changes.

Differential Revision: https://reviews.llvm.org/D83985
2020-07-16 15:34:41 -06:00
Roman Lebedev 2815429d08
[NFC][SimplifyCFG] HoistThenElseCodeToIf(): after hoisting terminator, do return Changed, not just true
Otherwise, if Changed was still false before that,
we would not account for that hoist in NumHoistCommonCode statistic.
2020-07-16 00:32:48 +03:00
Roman Lebedev 1cfc24fd67
[NFC][SimplifyCFG] HoistThenElseCodeToIf(): count number of common instruction "blocks" hoisted
I.e. out of all the times HoistThenElseCodeToIf() was called,
how many times did it actually hoist something?
2020-07-16 00:21:56 +03:00
Roman Lebedev 7b53ad88d4
[NFC][SimplifyCFG] HoistThenElseCodeToIf(): count number of common instructions hoisted 2020-07-16 00:21:56 +03:00
Roman Lebedev 3fc1defc0b
[NFC][SimplifyCFG] SinkCommonCodeFromPredecessors(): count number of instruction "blocks" actually sunk
Out of all the times the function was called,
how many times did we actually sink anything?
2020-07-16 00:21:56 +03:00
Roman Lebedev 9ed65c76c0
[NFC][SimplifyCFG] SinkCommonCodeFromPredecessors(): add debug output when failing to actually sink instr 2020-07-16 00:21:55 +03:00
Roman Lebedev 4c79864488
[NFC][SimplifyCFG] SinkCommonCodeFromPredecessors(): early return if nothing to sink
If we can't sink even one instruction, early return, to increase readability.
2020-07-16 00:21:55 +03:00
Roman Lebedev 702a3c6410
[NFC][SimplifyCFG] Rename statistic NumSinkCommons into NumSinkCommonInstrs
It really counts instructions added into common block,
not number of instruction groups sunk.
2020-07-16 00:21:55 +03:00
Guillaume Chatelet 8dbafd24d6 [Alignment][NFC] Transition and simplify calls to DL::getABITypeAlignment
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82977
2020-07-02 11:28:02 +00:00