Commit Graph

953 Commits

Author SHA1 Message Date
Nikita Popov 54db12ff5a [SimplifyCFG] Regenerate test checks (NFC)
Regenerate the branch weight test using --check-globals.
2021-05-04 19:51:30 +02:00
Teresa Johnson ea817d79be [SimplifyCFG] Look for control flow changes instead of side effects.
When passingValueIsAlwaysUndefined scans for an instruction between an
inst with a null or undef argument and its first use, it was checking
for instructions that may have side effects, which is a superset of the
instructions it intended to find (as per the comments, control flow
changing instructions that would prevent reaching the uses). Switch
to using isGuaranteedToTransferExecutionToSuccessor() instead.

Without this change, when enabling -fwhole-program-vtables, which causes
assumes to be inserted by clang, we can get different simplification
decisions. In particular, when building with instrumentation FDO it can
affect the optimizations decisions before FDO matching, leading to some
mismatches.

I had to modify d83507-knowledge-retention-bug.ll since this fix enables
more aggressive optimization of that code such that it no longer tested
the original bug it was meant to test. I removed the undef which still
provokes the original failure (confirmed by temporarily reverting the
fix) and also changed it to just invoke the passes of interest to narrow
the testing.

Similarly I needed to adjust code for UnreachableEliminate.ll to avoid
an undef which was causing the function body to get optimized away with
this fix.

Differential Revision: https://reviews.llvm.org/D101507
2021-05-03 13:32:22 -07:00
Roman Lebedev cc63203908
[SimplifyCFG] Common code sinking: fix application of profitability check
The profitability check is: we don't want to create more than a single PHI
per instruction sunk. We need to create the PHI unless we'll sink
all of it's would-be incoming values.

But there is a caveat there.
This profitability check doesn't converge on the first iteration!
If we first decide that we want to sink 10 instructions,
but then determine that 5'th one is unprofitable to sink,
that may result in us not sinking some instructions that
resulted in determining that some other instruction
we've determined to be profitable to sink becoming unprofitable.

So we need to iterate until we converge, as in determine
that all leftover instructions are profitable to sink.

But, the direct approach of just re-iterating seems dumb,
because in the worst case we'd find that the last instruction
is unprofitable, which would result in revisiting instructions
many many times.

Instead, i think we can get away with just two passes - forward and backward.
However then it isn't obvious what is the most performant way to update
InstructionsToSink.
2021-04-29 21:11:40 +03:00
Roman Lebedev 1886aad9d0
[SimplifyCFG] Common code sinking: relax restriction on non-uncond predecessors
While we have a known profitability issue for sinking in presence of
non-unconditional predecessors, there isn't any known issues
for having multiple such non-unconditional predecessors,
so said restriction appears to be artificial. Lift it.
2021-04-29 01:01:01 +03:00
Roman Lebedev 410d03aabf
[NFC][SimplifyCFG] Add test for sinking common code with multuple cond predecessors 2021-04-29 01:01:00 +03:00
Roman Lebedev a8e273f2ed
[NFC][SimplifyCFG] Add test showing that profitability check for sinking is broken
Essentially, we can't promise that the instruction is sinkable without
introducing PHI's until we know that it is profitable to sink.
2021-04-29 01:01:00 +03:00
Roman Lebedev d16d820c2e
[SimplifyCFG] Try 2: sink all-indirect indirect calls
Note that we don't want to turn a partially-direct call
into an indirect one, that will break ICP amongst other things.
2021-04-28 19:08:54 +03:00
Roman Lebedev 38dd222b4a
[NFC][SimplifyCFG] Add common code sinking test with direct and indirect callees
This is the pattern ICP produces.
We shouldn't fold this back into an indirect call.
2021-04-28 19:08:54 +03:00
Roman Lebedev 262c679d32
Revert "[SimplifyCFG] Sinking indirect calls - they're already indirect anyways"
Seems to break indirect call promotion, LTO/Resolution/X86/load-sample-prof-icp.ll fails.

This reverts commit e57cf128b3.
2021-04-28 17:46:59 +03:00
Roman Lebedev e57cf128b3
[SimplifyCFG] Sinking indirect calls - they're already indirect anyways 2021-04-28 17:36:23 +03:00
Roman Lebedev 677a0dee64
[NFC][SimplifyCFG] Add test for sinking indirect calls 2021-04-28 17:36:23 +03:00
Roman Lebedev a95a5dc5ab
[NFC][SimplifyCFG] Move sink-common-code.ll into X86
There are post-commit notest for e4c61d5 that suggest
the test is failing on certain bots. It looks like
the code there isn't being moved, which suggests
cost-model involvement, which suggests that we need to
hardcode the target triple.

Hopefully this helps?
2021-04-28 14:10:25 +03:00
Reid Kleckner a495b672b7
[NFC][SimplifyCFG] Precommit SimplifyCFG tests from D29428 2021-04-28 00:35:44 +03:00
Roman Lebedev 134f3ba3ae
[NFC][SimplifyCFG] Autogenerate check lines in few more tests 2021-04-28 00:35:44 +03:00
Roman Lebedev e4c61d5f83
[NFC][SimplifyCFG] Autogenerate check lines in many test files
These are potentially being affected by an upcoming patch.
2021-04-27 22:05:42 +03:00
Michael Kruse b99466eb45 [SimplifyCFG] Preserve metadata when unconditionalizing branches (same target).
When replacing a conditional branch by an unconditional one because the targets are identical, transfer the metadata to the new branch instruction.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D101226
2021-04-26 17:23:01 -05:00
Michael Kruse 153144be40 [SimplifyCFG] Preserve metadata when unconditionalizing branches (constant condition).
When replacing a conditional branch by an unconditional one because the condition is a constant, transfer the metadata to the new branch instruction.

Part of fix for llvm.org/PR50060

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D101141
2021-04-26 10:57:31 -05:00
Teresa Johnson 10b781fb03 Mark type test intrinsics as speculatable to fix inline cost
There is already code in InlineCost.cpp to identify and ignore ephemeral
values (llvm.assume intrinsics and other side-effect free instructions
only feeding the assumes). However, because llvm.type.test intrinsics
were not marked speculatable, they and any instructions specifically
feeding the type test (typically a bitcast) were being counted towards
the instruction cost when inlining. This was causing profile matching
issues in some cases when enabling -fwhole-program-vtables for whole
program devirtualization.

According to the language reference, the speculatable attribute means:
"the function does not have any effects besides calculating its result
and does not have undefined behavior". I see no reason why type tests
cannot be marked with this attribute.

There are 2 test changes:

llvm/test/Transforms/Inline/ephemeral.ll: I added a type test intrinsic
here to verify the fix. Also, I found the test was not actually testing
what it originally intended. Many of the existing instructions were
optimized away by -Oz, and the cost of inlining was negative due to the
benefit of removing the call. So I changed the test to simply invoke the
inline pass and check the number of instructions computed by InlineCost.
I also fixed an instruction that was not actually used anywhere.

llvm/test/Transforms/SimplifyCFG/no-md-sink.ll needed to be made more
robust to code changes that reordered the metadata.

Differential Revision: https://reviews.llvm.org/D101180
2021-04-23 10:02:31 -07:00
Florian Hahn af523514c4
[SimplifyCFG] Skip dbg intrinsics when checking for branch-only BBs.
Debug intrinsics are free to hoist and should be skipped when looking
for terminator-only blocks. As a consequence, we have to delegate to the
main hoisting loop to hoist any dbg intrinsics instead of jumping to the
terminator case directly.

This fixes PR49982.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D100640
2021-04-17 15:17:50 +01:00
Florian Hahn 31b5c2b1d2
[SimplifyCFG] Regenerate CHECK lines and add test for PR49982. 2021-04-16 12:05:54 +01:00
Florian Hahn 467b1f1cd2
[SimplifyCFG] Allow hoisting terminators only with HoistCommonInsts=false.
As a side-effect of the change to default HoistCommonInsts to false
early in the pipeline, we fail to convert conditional branch & phis to
selects early on, which prevents vectorization for loops that contain
conditional branches that effectively are selects (or if the loop gets
vectorized, it will get vectorized very inefficiently).

This patch updates SimplifyCFG to perform hoisting if the only
instruction in both BBs is an equal branch. In this case, the only
additional instructions are selects for phis, which should be cheap.

Even though we perform hoisting, the benefits of this kind of hoisting
should by far outweigh the negatives.

For example, the loop in the code below will not get vectorized on
AArch64 with the current default, but will with the patch. This is a
fundamental pattern we should definitely vectorize. Besides that, I
think the select variants should be easier to use for reasoning across
other passes as well.

https://clang.godbolt.org/z/sbjd8Wshx

```
double clamp(double v) {
  if (v < 0.0)
    return 0.0;
  if (v > 6.0)
    return 6.0;
  return v;
}

void loop(double* X, double *Y) {
  for (unsigned i = 0; i < 20000; i++) {
    X[i] = clamp(Y[i]);
  }
}
```

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D100329
2021-04-13 10:33:35 +01:00
Florian Hahn 59334755e4
[SimplifyCFG] Add test requiring only hoisting a branch. 2021-04-12 22:29:34 +01:00
Nikita Popov 9bad7de9a3 [SimplifyCFG] Handle two equal cases in switch to select
When converting a switch with two cases and a default into a
select, also handle the denegerate case where two cases have the
same value.

Generate this case directly as

  %or = or i1 %cmp1, %cmp2
  %res = select i1 %or, i32 %val, i32 %default

rather than

  %sel1 = select i1 %cmp1, i32 %val, i32 %default
  %res = select i1 %cmp2, i32 %val, i32 %sel1

as InstCombine is going to canonicalize to the former anyway.
2021-04-04 17:27:28 +02:00
Nikita Popov 7ca168dd5a [SimplifyCFG] Add switch-to-select test with two equal cases (NFC)
We handle the case where we have two cases and a default all having
different values, but not the case where two cases happen to have
the same one.

The PhaseOrdering test is a particularly bad example where this
showed up.
2021-04-04 17:14:59 +02:00
Nikita Popov cb4443994e [SimplifyCFG] Make test more robust (NFC)
These are supposed to test creation of a switch, so make sure
there is some actual code in the branches. Otherwise this could
be turned into a select instead.
2021-04-04 17:14:59 +02:00
Roman Lebedev b5822026dd
[SimplifyCFG] 'Fold branch to common dest': don't overestimate the cost
`FoldBranchToCommonDest()` has a certain budget (`-bonus-inst-threshold=`)
for bonus instruction duplication. And currently it calculates the cost
as-if it will actually duplicate into each predecessor.

But ignoring the budget, it won't always duplicate into each predecessor,
there are some correctness and profitability checks.
So when calculating the cost, we should first check into which blocks
will we *actually* duplicate, and only then use that block count
to do budgeting.
2021-03-23 18:30:26 +03:00
Roman Lebedev a866f72eb2
[NFC][SimplifyCFG] 'Fold branch to common dest': add test for cost overestimation
We should not count the cost of duplication into predecessors into which
we won't ultimately duplicate.
2021-03-23 18:30:26 +03:00
Roman Lebedev 514bc01ca3
[SimplifyCFG] FoldBranchToCommonDest(): properly handle same-block external uses (PR49510/PR49689)
We clone bonus instructions to the end of the predecessor block,
and then use `SSAUpdater::RewriteUseAfterInsertions()`.
But that only deals with the cases where the use-to-be-rewritten
are either in different block from the def, or come after the def.

But in some loop cases, the external use may be in the beginning of
predecessor block, before the newly cloned bonus instruction.
`SSAUpdater::RewriteUseAfterInsertions()` does not deal with that.
Notably, the external use can't happen to be both in the same block
and *after* the newly-cloned instruction, because of the fold preconditions.

To properly handle these cases, when the use is in the same block,
we should instead use `SSAUpdater::RewriteUse()`.
TBN, they do the same thing for PHI users.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49510
Likely Fixes https://bugs.llvm.org/show_bug.cgi?id=49689
2021-03-23 17:37:28 +03:00
Sanjay Patel 1bf8f9e228 [SimplifyCFG] use profile metadata to refine merging branch conditions
2nd try (original: 27ae17a6b0) with fix/test for crash. We must make
sure that TTI is available before trying to use it because it is not
required (might be another bug).

Original commit message:

This is one step towards solving:
https://llvm.org/PR49336

In that example, we disregard the recommended usage of builtin_expect,
so an expensive (unpredictable) branch is folded into another branch
that is guarding it.
Here, we read the profile metadata to see if the 1st (predecessor)
condition is likely to cause execution to bypass the 2nd (successor)
condition before merging conditions by using logic ops.

Differential Revision: https://reviews.llvm.org/D98898
2021-03-23 10:19:37 -04:00
Juneyoung Lee 960a767368 Reland "[InstCombine] Add simplification of two logical and/ors"
This relands 07c3b97e18 (D96945) which was reverted by
commit f49354838e.
The two-stage compilation successfully tests passes on my machine.
2021-03-23 16:24:50 +09:00
Juneyoung Lee 5c2e50b5d2 Reland "[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe"
This relands commit 99108c791d (D95026) which was
reverted by 8d5a981a13 because the underlying
problem (https://llvm.org/pr49495) is fixed.
2021-03-23 09:19:53 +09:00
Sanjay Patel 95f7f7c21b Revert "[SimplifyCFG] use profile metadata to refine merging branch conditions"
This reverts commit 27ae17a6b0.
There are bot failures that end with:
 #4 0x00007fff7ae3c9b8 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #5 0x00007fff84e504d8 (linux-vdso64.so.1+0x4d8)
 #6 0x00007fff7c419a5c llvm::TargetTransformInfo::getPredictableBranchThreshold() const (/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libLLVMAnalysis.so.13git+0x479a5c)

...but not sure how to trigger that yet.
2021-03-22 17:48:06 -04:00
Sanjay Patel 27ae17a6b0 [SimplifyCFG] use profile metadata to refine merging branch conditions
This is one step towards solving:
https://llvm.org/PR49336

In that example, we disregard the recommended usage of builtin_expect,
so an expensive (unpredictable) branch is folded into another branch
that is guarding it.
Here, we read the profile metadata to see if the 1st (predecessor)
condition is likely to cause execution to bypass the 2nd (successor)
condition before merging conditions by using logic ops.

Differential Revision: https://reviews.llvm.org/D98898
2021-03-22 16:49:21 -04:00
Sanjay Patel c21016715f [SimplifyCFG] adjust test branchweights; NFC
This will check the boundary conditions of the
revised change proposed in D98898.
2021-03-22 15:55:34 -04:00
Philip Reames 854de7c4d0 [tests] Refresh a bunch of autogen test to adjust for format changes 2021-03-22 10:41:39 -07:00
Sanjay Patel 92068d6c31 [SimplifyCFG] add tests for branch cond merging with prof metadata; NFC
See PR49336.
2021-03-18 15:39:50 -04:00
Sanjay Patel bd197ed0a5 [SimplifyCFG] avoid sinking insts within an infinite-loop
The test is reduced from a C source example in:
https://llvm.org/PR49541

It's possible that the test could be reduced further or
the predicate generalized further, but it seems to require
a few ingredients (including the "late" SimplifyCFG options
on the RUN line) to fall into the infinite-loop trap.
2021-03-12 08:04:57 -05:00
Juneyoung Lee f49354838e Revert "[InstCombine] Add simplification of two logical and/ors"
This reverts commit 07c3b97e18 due to a reported failure in two-stage build.
2021-03-10 05:48:31 +09:00
Mehdi Amini 8d5a981a13 Revert "[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe"
This reverts commit 99108c791d.
Clang is miscompiling LLVM with this change, a stage-2 build hits
multiple failures.

As a repro, I built clang in a stage1 directory and used it this way:

cmake -G Ninja ../llvm \
  -DCMAKE_CXX_COMPILER=`pwd`/../build-stage1/bin/clang++ \
  -DCMAKE_C_COMPILER=`pwd`/../build-stage1/bin/clang \
  -DLLVM_TARGETS_TO_BUILD="X86;NVPTX;AMDGPU" \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DLLVM_BUILD_EXAMPLES=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=On
ninja check-mlir
2021-03-08 00:15:47 +00:00
Juneyoung Lee 07c3b97e18 [InstCombine] Add simplification of two logical and/ors
This is a patch that adds folding of two logical and/ors that share one variable:

a && (a && b) -> a && b
a && (a & b)  -> a && b
...

This is towards removing the poison-unsafe select optimization (D93065 has more context).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96945
2021-03-08 02:38:43 +09:00
Juneyoung Lee 99108c791d [SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe
This patch makes FoldBranchToCommonDest merge branch conditions into `select i1` rather than `and/or i1` when it is called by SimplifyCFG.
It is known that merging conditions into and/or is poison-unsafe, and this is towards making things *more* correct by removing possible miscompilations.
Currently, InstCombine simply consumes these selects into and/or of i1 (which is also unsafe), so the visible effect would be very small. The unsafe select -> and/or transformation will be removed in the future.
There has been efforts for updating optimizations to support the select form as well, and they are linked to D93065.

The safe transformation is fired when it is called by SimplifyCFG only. This is done by setting the new `PoisonSafe` argument as true.
Another place that calls FoldBranchToCommonDest is LoopSimplify. `PoisonSafe` flag is set to false in this case because enabling it has a nontrivial impact in performance because SCEV is more conservative with select form and InductiveRangeCheckElimination isn't aware of select form of and/or i1.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95026
2021-03-08 01:38:03 +09:00
Sanjay Patel 356cdabd3a [SimplifyCFG] avoid illegal phi with both poison and undef
In the example based on:
https://llvm.org/PR49218
...we are crashing because poison is a subclass of undef, so we merge blocks and create:

PHI node has multiple entries for the same basic block with different incoming values!
  %k3 = phi i64 [ poison, %entry ], [ %k3, %g ], [ undef, %entry ]

If both poison and undef values are incoming, we soften the poison values to undef.

Differential Revision: https://reviews.llvm.org/D97495
2021-02-27 09:10:32 -05:00
Juneyoung Lee 56d228a14e [SimplifyCFG] Update passingValueIsAlwaysUndefined to check more attributes
This is a simple patch to update SimplifyCFG's passingValueIsAlwaysUndefined to inspect more attributes.

A new function `CallBase::isPassingUndefUB` checks attributes that imply noundef.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D97244
2021-02-24 10:40:50 +09:00
Juneyoung Lee edf2e96742 [SimplifyCFG] Minor tweaks to the added tests (NFC) 2021-02-23 17:32:28 +09:00
Juneyoung Lee 28be9af0f8 [SimplifyCFG] Add tests for D97244 (NFC) 2021-02-23 17:15:17 +09:00
Nikita Popov 5c62d66131 [SimplifyCFG] Regenerate test checks (NFC) 2021-01-23 21:24:54 +01:00
Roman Lebedev 1742203844
[SimplifyCFG] FoldBranchToCommonDest(): re-lift restrictions on liveout uses of bonus instructions
I have previously tried doing that in
b33fbbaa34 / d38205144f,
but eventually it was pointed out that the approach taken there
was just broken wrt how the uses of bonus instructions are updated
to account for the fact that they should now use either bonus instruction
or the cloned bonus instruction. In particluar, all that manual handling
of PHI nodes in successors was just wrong.

But, the fix is actually much much simpler than my initial approach:
just tell SSAUpdate about both instances of bonus instruction,
and let it deal with all the PHI handling.

Alive2 confirms that the reproducers from the original bugs (@pr48450*)
are now handled correctly.

This effectively reverts commit 59560e8589,
effectively relanding b33fbbaa34.
2021-01-23 01:29:05 +03:00
Roman Lebedev e838750005
[NFC][SimplifyCFG] fold-branch-to-common-dest.ll: reduce complexity of @pr48450* test
We don't need that many iterations there,
having less iterations helps alive2 verify it.
2021-01-23 01:29:04 +03:00
Roman Lebedev 9bd8bcf993
[NFC][SimplifyCFG] PerformBranchToCommonDestFolding(): fix instruction name preservation
NewBonusInst just took name from BonusInst, so BonusInst has no name,
so BonusInst.getName() makes no sense.
So we need to ask NewBonusInst for the name.
2021-01-23 01:29:03 +03:00
Roman Lebedev d1a6f92fd5
[InstCombine] Fold `(~x) | y` --> `~(x & (~y))` iff it is free to do so
Iff we know we can get rid of the inversions in the new pattern,
we can thus get rid of the inversion in the old pattern,
this decreasing instruction count.

Note that we could position this transformation as just hoisting
of the `not` (still, iff y is freely negatible), but the test changes
show a number of regressions, so let's not do that.
2021-01-22 17:23:54 +03:00
Roman Lebedev 0895b836d7
[SimplifyCFG] FoldBranchToCommonDest(): don't deal with unconditional branches
The case where BB ends with an unconditional branch,
and has a single predecessor w/ conditional branch
to BB and a single successor of BB is exactly the pattern
SpeculativelyExecuteBB() transform deals with.
(and in this case they both allow speculating only a single instruction)

Well, or FoldTwoEntryPHINode(), if the final block
has only those two predecessors.

Here, in FoldBranchToCommonDest(), only a weird subset of that
transform is supported, and it's glued on the side in a weird way.
  In particular, it took me a bit to understand that the Cond
isn't actually a branch condition in that case, but just the value
we allow to speculate (otherwise it reads as a miscompile to me).
  Additionally, this only supports for the speculated instruction
to be an ICmp.

So let's just unclutter FoldBranchToCommonDest(), and leave
this transform up to SpeculativelyExecuteBB(). As far as i can tell,
this shouldn't really impact optimization potential, but if it does,
improving SpeculativelyExecuteBB() will be more beneficial anyways.

Notably, this only affects a single test,
but EarlyCSE should have run beforehand in the pipeline,
and then FoldTwoEntryPHINode() would have caught it.

This reverts commit rL158392 / commit d33f4efbfd.
2021-01-22 17:22:49 +03:00
Arthur Eubanks 6699029b67 [NewPM][opt] Run the "default" AA pipeline by default
We tend to assume that the AA pipeline is by default the default AA
pipeline and it's confusing when it's empty instead.

PR48779

Initially reverted due to BasicAA running analyses in an unspecified
order (multiple function calls as parameters), fixed by fetching
analyses before the call to construct BasicAA.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D95117
2021-01-21 21:08:54 -08:00
Arthur Eubanks ba9b4ea4ee Revert "[NewPM][opt] Run the "default" AA pipeline by default"
This reverts commit be611431cd.

Other/new-pm-lto-defaults.ll failing
2021-01-21 20:16:34 -08:00
Arthur Eubanks be611431cd [NewPM][opt] Run the "default" AA pipeline by default
We tend to assume that the AA pipeline is by default the default AA
pipeline and it's confusing when it's empty instead.

PR48779

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D95117
2021-01-21 19:46:38 -08:00
Juneyoung Lee 2e74a27756 [SimplifyCFG] Reapply update_test_checks.py (NFC) 2021-01-20 12:41:30 +09:00
Juneyoung Lee 395c737d9f [SimplifyCFG] Update SimplifyBranchOnICmpChain to recognize select form of and/or
This patch teaches SimplifyCFG::SimplifyBranchOnICmpChain to understand select form of
(x == C1 || x == C2 || ...) / (x != C1 && x != C2 && ...) and optimize them into switch if possible.
D93065 has more context about the transition, including links to the list of optimizations being updated.

Differential Revision: https://reviews.llvm.org/D93943
2021-01-19 08:53:40 +09:00
Nikita Popov 4229b87ed3 [ValueTracking] Fix isSafeToSpeculativelyExecute for sdiv (PR48778)
The != -1 check does not work correctly for all bitwidths. Use
isAllOnesValue() instead.
2021-01-17 20:06:17 +01:00
Nikita Popov 1cc477f030 [SimplifyCFG] Add test for PR48778 (NFC)
The sdiv is incorrectly speculated.
2021-01-17 20:06:17 +01:00
Dávid Bolvanský a1500105ee [SimplifyCFG] Optimize CFG when null is passed to a function with nonnull argument
Example:

```
__attribute__((nonnull,noinline)) char * pinc(char *p)  {
  return ++p;
}

char * foo(bool b, char *a) {
       return pinc(b ? 0 : a);

}
```

optimize to

```
char * foo(bool b, char *a) {
       return pinc(a);

}
```

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94180
2021-01-15 23:53:43 +01:00
Roman Lebedev a14c36fe27
[SimplifyCFG] switchToSelect(): don't forget to insert DomTree edge iff needed
DestBB might or might not already be a successor of SelectBB,
and it wasn't we need to ensure that we record the fact in DomTree.

The testcase used to crash in lazy domtree updater mode + non-per-function
domtree validity checks disabled.
2021-01-15 23:35:57 +03:00
Roman Lebedev 61ec228030
[NFC][SimplifyCFG] Add testcase showing that we fail to preserve DomTree in switchToSelect() 2021-01-15 23:35:55 +03:00
Florian Hahn d98fc62ae6 [SimplifyCFG] Keep !dgb metadata of moved instruction, if they match.
Currently SimplifyCFG drops the debug locations of 'bonus' instructions.
Such instructions are moved before the first branch. The reason for the
current behavior is that this could lead to surprising debug stepping,
if the block that's folded is dead.

In case the first branch and the instructions to be folded have the same
debug location, this shouldn't be an issue and we can keep the debug
location.

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D93662
2021-01-09 19:15:16 +00:00
Roman Lebedev 6984781df9
[NFC][SimplifyCFG] Add a test with an undef cond branch to identical destinations 2021-01-08 02:15:26 +03:00
Roman Lebedev f8875c313c
[NFC][SimplifyCFG] Add test with an unreachable block with two identical successors 2021-01-08 02:15:25 +03:00
Roman Lebedev 8b9a0e6f7e
[NFC][SimlifyCFG] Add some indirectbr-of-blockaddress tests 2021-01-08 02:15:25 +03:00
Roman Lebedev 087be536fe
[NFC][SimplifyCFG] Add a test with cond br on constant w/ identical destinations 2021-01-08 02:15:24 +03:00
Roman Lebedev 6be1fd6b20
[SimplifyCFG] FoldValueComparisonIntoPredecessors(): drop reachable errneous assert
I have added it in d15d81c because it *seemed* correct, was holding
for all the tests so far, and was validating the fix added in the same
commit, but as David Major is pointing out (with a reproducer),
the assertion isn't really correct after all. So remove it.

Note that the d15d81c still fine.
2021-01-07 18:05:04 +03:00
Roman Lebedev a14945c1db
[SimplifyCFG] SimplifyEqualityComparisonWithOnlyPredecessor(): really don't delete DomTree edges multiple times 2021-01-06 01:52:39 +03:00
Roman Lebedev 0a87e53fc4
[NFC][SimplifyCFG] Add a test where SimplifyEqualityComparisonWithOnlyPredecessor() deletes existing edge 2021-01-06 01:52:39 +03:00
Roman Lebedev 29ca7d5a1a
[SimplifyCFG] simplifyUnreachable(): fix handling of degenerate same-destination conditional branch
One would hope that it would have been already canonicalized into an
unconditional branch, but that isn't really guaranteed to happen
with SimplifyCFG's visitation order.
2021-01-06 01:52:36 +03:00
Roman Lebedev 3460719f58
[NFC][SimplifyCFG] Add a test with same-destination condidional branch
Reported by Mikael Holmén as post-commit feedback on
https://reviews.llvm.org/rG2d07414ee5f74a09fb89723b4a9bb0818bdc2e18#968162
2021-01-06 01:52:36 +03:00
Roman Lebedev a7684940f0
[SimplifyCFG] SimplifyTerminatorOnSelect(): fix/tune DomTree updates
We only need to remove non-TrueBB/non-FalseBB successors,
and we only need to do that once. We don't need to insert
any new edges, because no new successors will be added.
2021-01-04 01:02:02 +03:00
Roman Lebedev 4fc908025f
[NFC][SimplifyCFG] Add a test where we fail to preserve DomTree validity 2021-01-04 01:02:01 +03:00
Roman Lebedev 5fa241a657
[SimplifyCFG] FoldValueComparisonIntoPredecessors(): fine-tune/fix DomTree preservation, take 2 2021-01-03 01:45:48 +03:00
Roman Lebedev a0013934b6
[NFC][SimplifyCFG] Add another test for switch creation where we fail to maintain DomTree 2021-01-03 01:45:48 +03:00
Roman Lebedev 6a3a8d17eb
[SimplifyCFG] FoldValueComparisonIntoPredecessors(): fine-tune/fix DomTree preservation 2021-01-03 01:45:48 +03:00
Roman Lebedev eda50309f5
[NFC][SimplifyCFG] Add test for switch creation where we fail to maintain DomTree
Reduced from vanilla test-suite
2021-01-03 01:45:47 +03:00
Roman Lebedev 657c1e09da
[SimplifyCFG] Teach eliminateDeadSwitchCases() to preserve DomTree, part 2 2021-01-02 01:01:18 +03:00
Roman Lebedev f1ce696056
[SimplifyCFG] Teach tryWidenCondBranchToCondBranch() to preserve DomTree 2021-01-02 01:01:17 +03:00
Roman Lebedev 831636b0e6
[SimplifyCFG] SUCCESS! Teach createUnreachableSwitchDefault() to preserve DomTree
This pretty much concludes patch series for updating SimplifyCFG
to preserve DomTree. All 318 dedicated `-simplifycfg` tests now pass
with `-simplifycfg-require-and-preserve-domtree=1`.

There are a few leftovers that apparently don't have good test coverage.
I do not yet know what gaps in test coverage will the wider-scale testing
reveal, but the default flip might be close.
2021-01-01 03:25:25 +03:00
Roman Lebedev e1440d43bc
[SimplifyCFG] Teach tryToSimplifyUncondBranchWithICmpInIt() to preserve DomTree 2021-01-01 03:25:25 +03:00
Roman Lebedev 8866583953
[SimplifyCFG] Teach FoldValueComparisonIntoPredecessors() to preserve DomTree, part 2 2021-01-01 03:25:24 +03:00
Roman Lebedev a815b6b2b2
[SimplifyCFG] Teach eliminateDeadSwitchCases() to preserve DomTree, part 1 2021-01-01 03:25:24 +03:00
Roman Lebedev 0d2f219d4d
[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 3 2021-01-01 03:25:23 +03:00
Roman Lebedev 9f17dab1f4
[SimplifyCFG] Teach simplifyIndirectBr() to preserve DomTree 2021-01-01 03:25:23 +03:00
Roman Lebedev b7c463d7b8
[SimplifyCFG] Teach FoldBranchToCommonDest() to preserve DomTree, part 2 2021-01-01 03:25:23 +03:00
Roman Lebedev c1b825d4b8
[SimplifyCFG] Teach FoldValueComparisonIntoPredecessors() to preserve DomTree, part 1 2021-01-01 03:25:22 +03:00
Juneyoung Lee 3c60e9bac8 Add tests for D93943 (NFC) 2021-01-01 05:59:52 +09:00
Roman Lebedev 7f221c9196
[SimplifyCFG] Teach SwitchToLookupTable() to preserve DomTree 2020-12-30 23:58:41 +03:00
Roman Lebedev a17025aa61
[SimplifyCFG] Teach switchToSelect() to preserve DomTree 2020-12-30 23:58:40 +03:00
Roman Lebedev c45f765c0d
[SimplifyCFG] Teach SimplifyBranchOnICmpChain() to preserve DomTree 2020-12-30 23:58:40 +03:00
Juneyoung Lee e6e6404600 [SimplifyCFG] Add tests for select form and/or for creating select from icmps 2020-12-30 22:15:39 +09:00
Roman Lebedev 39a56f7f17
[SimplifyCFG] Teach SimplifyTerminatorOnSelect() to preserve DomTree 2020-12-30 00:48:12 +03:00
Roman Lebedev ec0b671a61
[SimplifyCFG] Teach SimplifyCondBranchToCondBranch() to preserve DomTree 2020-12-30 00:48:12 +03:00
Roman Lebedev 307156246f
[SimplifyCFG] Teach mergeConditionalStoreToAddress() to preserve DomTree 2020-12-30 00:48:11 +03:00
Roman Lebedev d4c0abb4a3
[SimplifyCFG] Teach FoldCondBranchOnPHI() to preserve DomTree 2020-12-30 00:48:11 +03:00
Roman Lebedev b8121b2e62
[SimplifyCFG] Teach SinkCommonCodeFromPredecessors() to preserve DomTree 2020-12-30 00:48:11 +03:00
Roman Lebedev 18c407bf4c
[SimplifyCFG] Teach HoistThenElseCodeToIf() to preserve DomTree 2020-12-30 00:48:10 +03:00
Roman Lebedev fe9bdd9621
[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 2 2020-12-30 00:48:10 +03:00
Roman Lebedev 6027e05dbf
[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 1 2020-12-30 00:48:10 +03:00
Roman Lebedev ef93f7a11c
[SimplifyCFG] FoldBranchToCommonDest: gracefully handle unreachable code ()
We might be dealing with an unreachable code,
so the bonus instruction we clone might be self-referencing.

There is a sanity check that all uses of bonus instructions
that are not in the original block with said bonus instructions
are PHI nodes, and that is obviously not the case
for self-referencing instructions..

So if we find such an use, just rewrite it.

Thanks to Mikael Holmén for the reproducer!

Fixes https://bugs.llvm.org/show_bug.cgi?id=48450#c8
2020-12-28 23:31:19 +03:00
Juneyoung Lee db7a2f347f Precommit transform tests that have poison as insertelement's placeholder
This commit copies existing tests at llvm/Transforms and replaces
'insertelement undef' in those files with 'insertelement poison'.
(see https://reviews.llvm.org/D93586)

Tests listed using this script:

grep -R -E '^[^;]*insertelement <.*> undef,' . | cut -d":" -f1 | uniq |
wc -l

Tests updated:

file_org=llvm/test/Transforms/$1
file=${file_org%.ll}-inseltpoison.ll
cp $file_org $file
sed -i -E 's/^([^;]*)insertelement <(.*)> undef/\1insertelement <\2> poison/g' $file
head -1 $file | grep "Assertions have been autogenerated by utils/update_test_checks.py" -q
if [ "$?" == 1 ]; then
  echo "$file : should be manually updated"
  # I manually updated the script
  exit 1
fi
python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file
2020-12-24 11:46:17 +09:00
Roman Lebedev c043f5055e
[SimplifyCFG] Teach FoldBranchToCommonDest() to preserve DomTree, part 1
... for conditional branch case
2020-12-20 00:18:36 +03:00
Roman Lebedev 262ff9c23e
[SimplifyCFG] Teach TryToMergeLandingPad() to preserve DomTree 2020-12-20 00:18:36 +03:00
Roman Lebedev 6a1617d67c
[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 2
... for the custom case returning void.
2020-12-20 00:18:36 +03:00
Roman Lebedev b94520c9ee
[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 1
... for the general case of returning a value.
2020-12-20 00:18:35 +03:00
Roman Lebedev 83659c7076
[SimplifyCFG] simplifySingleResume(): FoldReturnIntoUncondBranch() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Apparently, there were no dedicated tests just for that functionality,
so i'm adding one here.
2020-12-20 00:18:34 +03:00
Roman Lebedev b7d00e29b7
[SimplifyCFG] Teach simplifySingleResume() to preserve DomTree 2020-12-20 00:18:34 +03:00
Roman Lebedev c209b88dd4
[SimplifyCFG] Teach simplifyCommonResume() to preserve DomTree 2020-12-20 00:18:34 +03:00
Roman Lebedev 4be8707e64
[SimplifyCFG] Teach FoldTwoEntryPHINode() to preserve DomTree
Still boring, simply drop all edges to successors of DomBlock,
and add an edge to to BB instead.
2020-12-20 00:18:33 +03:00
Roman Lebedev b43b77ff9b
[NFCI][SimlifyCFG] simplifyOnce(): also perform DomTree validation
And that exposes that a number of tests don't *actually* manage to
maintain DomTree validity, which is inline with my observations.

Once again, SimlifyCFG pass currently does not require/preserve DomTree
by default, so this is effectively NFC.
2020-12-20 00:18:32 +03:00
Roman Lebedev 2d07414ee5
[SimplifyCFG] Teach simplifyUnreachable() to preserve DomTree
Pretty boring, removeUnwindEdge() already known how to update DomTree,
so if we are to call it, we must first flush our own pending updates;
otherwise, we just stop predecessors from branching to us,
and for certain predecessors, stop their predecessors from
branching to them also.
2020-12-18 00:37:22 +03:00
Roman Lebedev 2ee724863e
[SimplifyCFG] ConstantFoldTerminator() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a number of tests,
all of which are marked as such so that they do not regress.
2020-12-18 00:37:22 +03:00
Roman Lebedev 164e0847a5
[SimplifyCFG] DeleteDeadBlock() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-18 00:37:21 +03:00
Florian Hahn 75c04bfc61
[SimplifyCFG] Preserve !annotation in FoldBranchToCommonDest.
When folding a branch to a common destination, preserve !annotation on
the created instruction, if the terminator of the BB that is going to be
removed has !annotation. This should ensure that !annotation is attached
to the instructions that 'replace' the original terminator.

Reviewed By: jdoerfert, lebedev.ri

Differential Revision: https://reviews.llvm.org/D93410
2020-12-17 14:06:58 +00:00
Roman Lebedev d22a47e9ff
[SimplifyCFG] Teach mergeEmptyReturnBlocks() to preserve DomTree
A first real transformation that didn't already knew how to do that,
but it's pretty tame - either change successor of all the predecessors
of a block and carefully delay deletion of the block until afterwards
the DomTree updates are appled, or add a successor to the block.

There wasn't a great test coverage for this, so i added extra, to be sure.
2020-12-17 01:03:50 +03:00
Roman Lebedev 5cce4aff18
[SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-17 01:03:49 +03:00
Roman Lebedev 49dac4aca0
[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-17 01:03:49 +03:00
Roman Lebedev 4fc169f664
[SimplifyCFG] removeUnreachableBlocks() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Apparently, there were no dedicated tests just for that functionality,
so i'm adding one here.
2020-12-17 01:03:49 +03:00
Roman Lebedev aa2009fe78
[NFCI][SimplifyCFG] Mark all the SimplifyCFG tests that already don't invalidate DomTree as such
First step after e113317958,
in these tests, DomTree is valid afterwards, so mark them as such,
so that they don't regress.

In further steps, SimplifyCFG transforms shall taught to preserve DomTree,
in as small steps as possible.
2020-12-17 01:03:49 +03:00
Florian Hahn 70bd75426e [SimplifyCFG] Precommit test for preserving !annotation. 2020-12-16 18:47:45 +00:00
Roman Lebedev a7deedc414
[NFC][Tests][SimplifyCFG] Trim whitespaces at the end of lines 2020-12-16 00:38:00 +03:00
Roman Lebedev 59560e8589
[SimplifyCFG] FoldBranchToCommonDest(): temporairly put back restrictions on liveout uses of bonus instructions (PR48450)
Even though d38205144f was mostly a correct
fix for the external non-PHI users, it's not a *generally* correct fix,
because the 'placeholder' values in those trivial PHI's we create
shouldn't be *always* 'undef', but the PHI itself for the backedges,
else we end up with wrong value, as the `@pr48450_2` test shows.

But we can't just do that, because we can't check that the PHI
can be it's own incoming value when coming from certain predecessor,
because we don't have a dominator tree.

So until we can address this correctness problem properly,
ensure that we don't perform the transformation
if there are such problematic external uses.

Making dominator tree available there is going to be involved,
since `-simplifycfg` pass currently does not preserve/update domtree...
2020-12-14 20:14:31 +03:00
Roman Lebedev effbbdec6e
[NFC][SimplifyCFG] Add another miscompiled test for PR48450 2020-12-14 20:14:31 +03:00
Roman Lebedev d38205144f
[SimplifyCFG] FoldBranchToCommonDest(): bonus instrns must only be used by PHI nodes in successors (PR48450)
In particular, if the successor block, which is about to get a new
predecessor block, currently only has a single predecessor,
then the bonus instructions will be directly used within said successor,
which is fine, since the block with bonus instructions dominates that
successor. But once there's a new predecessor, the IR is no longer valid,
and we don't fix it, because we only update PHI nodes.

Which means, the live-out bonus instructions must be exclusively used
by the PHI nodes in successor blocks. So we have to form trivial PHI nodes.
which will then be successfully updated to recieve cloned bonus instns.

This all works fine, except for the fact that we don't have access to
the dominator tree, and we don't ignore unreachable code,
so we sometimes do end up having to deal with some weird IR.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48450
2020-12-13 00:06:57 +03:00
Roman Lebedev 15f8060f6f
[SimplifyCFG] FoldBranchToCommonDest: don't require that cmp of br is last instruction
There is no correctness need for that, and since we allow live-out
uses, this could theoretically happen, because currently nothing
will move the cond to right before the branch in those tests.
But regardless, lifting that restriction even makes the transform
easier to understand.

This makes the transform happen in 81 more cases (+0.55%)
)
2020-12-01 15:13:06 +03:00
Roman Lebedev b52029224c
[NFC][SimplifyCFG] fold-branch-to-common-dest: add tests with cond of br not being the last op 2020-12-01 15:13:05 +03:00
Roman Lebedev b33fbbaa34
Reland [SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions
This was orginally committed in 2245fb8aaa.
but was immediately reverted in f3abd54958
because of a PHI handling issue.

Original commit message:

1. It doesn't make sense to enforce that the bonus instruction
   is only used once in it's basic block. What matters is
   whether those user instructions fit within our budget, sure,
   but that is another question.
2. It doesn't make sense to enforce that said bonus instructions
   are only used within their basic block. Perhaps the branch
   condition isn't using the value computed by said bonus instruction,
   and said bonus instruction is simply being calculated
   to be used in successors?

So iff we can clone bonus instructions, to lift these restrictions,
we just need to carefully update their external uses
to use the new cloned instructions.

Notably, this transform (even without this change) appears to be
poison-unsafe as per alive2, but is otherwise (including the patch) legal.

We don't introduce any new PHI nodes, but only "move" the instructions
around, i'm not really seeing much potential for extra cost modelling
for the transform, especially since now we allow at most one such
bonus instruction by default.

This causes the fold to fire +11.4% more (13216 -> 14725)
as of vanilla llvm test-suite + RawSpeed.

The motivational pattern is IEEE-754-2008 Binary16->Binary32
extension code:
ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)
^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v
That being said, even thought this seemed like this would fix it: https://godbolt.org/z/xGq3TM
apparently that fold is happening somewhere else afterall,
so something else also has a similar 'artificial' restriction.
2020-11-27 12:47:15 +03:00
Roman Lebedev 4018806329
[NFC][SimplifyCFG] FoldBranchToCommonDest: add one more test with PHI
This is the problematic pattern i didn't think of,
that lead to revert of 2245fb8aaa
in f3abd54958.
2020-11-27 12:47:14 +03:00
Roman Lebedev f3abd54958
Revert "[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions"
Many bots are unhappy, at the very least missed a few codegen tests,
and possibly this has a logic hole inducing a miscompile
(will be really awesome to have ready reproducer..)

Need to investigate.

This reverts commit 2245fb8aaa.
2020-11-26 23:13:43 +03:00
Roman Lebedev 2245fb8aaa
[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions
1. It doesn't make sense to enforce that the bonus instruction
   is only used once in it's basic block. What matters is
   whether those user instructions fit within our budget, sure,
   but that is another question.
2. It doesn't make sense to enforce that said bonus instructions
   are only used within their basic block. Perhaps the branch
   condition isn't using the value computed by said bonus instruction,
   and said bonus instruction is simply being calculated
   to be used in successors?

So iff we can clone bonus instructions, to lift these restrictions,
we just need to carefully update their external uses
to use the new cloned instructions.

Notably, this transform (even without this change) appears to be
poison-unsafe as per alive2, but is otherwise (including the patch) legal.

We don't introduce any new PHI nodes, but only "move" the instructions
around, i'm not really seeing much potential for extra cost modelling
for the transform, especially since now we allow at most one such
bonus instruction by default.

This causes the fold to fire +11.4% more (13216 -> 14725)
as of vanilla llvm test-suite + RawSpeed.

The motivational pattern is IEEE-754-2008 Binary16->Binary32
extension code:
ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)
^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v
That being said, even thought this seemed like this would fix it: https://godbolt.org/z/xGq3TM
apparently that fold is happening somewhere else afterall,
so something else also has a similar 'artificial' restriction.
2020-11-26 22:51:22 +03:00
Roman Lebedev 394b4fdb41
[NFC][SimplifyCFG] Add test coverage for FoldBranchToCommonDest xform with live-out bonus instuctions
The uses of the bonus instructions should not be preventing the transformation.
2020-11-26 22:51:21 +03:00
Arthur Eubanks aeb0fdff35 [SimplifyCFG] Respect optforfuzzing in NPM pass
Regression caused by refactoring in
cdd006eec9.

See discussion in https://reviews.llvm.org/D89917.

Reviewed By: arsenm, morehouse

Differential Revision: https://reviews.llvm.org/D91473
2020-11-16 09:56:37 -08:00
Simon Pilgrim 5c987212b7 [SimplifyCFG] Remove unused check-prefixes 2020-11-09 10:37:18 +00:00
Sanjay Patel c40126e740 [ARM] remove cost-kind predicate for most math op costs
This is based on the same idea that I am using for the basic model implementation
and what I have partly already done for x86: throughput cost is number of
instructions/uops, so size/blended costs are identical except in special cases
(for example, fdiv or other known-expensive machine instructions or things like
MVE that may require cracking into >1 uop)).

Differential Revision: https://reviews.llvm.org/D90692
2020-11-03 17:23:46 -05:00
Florian Hahn b3b993a7ad Reland "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts the revert commit 408c4408fa.

This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.

Original message:

On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
2020-11-02 15:39:29 +00:00
Arthur Eubanks 5c31b8b94f Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit 10f2a0d662.

More uint64_t overflows.
2020-10-31 00:25:32 -07:00
Arthur Eubanks 10f2a0d662 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-30 10:03:46 -07:00
Vedant Kumar 5a3ef55a52 [Utils] Skip RemoveRedundantDbgInstrs in MergeBlockIntoPredecessor (PR47746)
This patch changes MergeBlockIntoPredecessor to skip the call to
RemoveRedundantDbgInstrs, in effect partially reverting D71480 due to
some compile-time issues spotted in LoopUnroll and SimplifyCFG.

The call to RemoveRedundantDbgInstrs appears to have changed the
worst-case behavior of the merging utility. Loosely speaking, it seems
to have gone from O(#phis) to O(#insts).

It might not be possible to mitigate this by scanning a block to
determine whether there are any debug intrinsics to remove, since such a
scan costs O(#insts).

So: skip the call to RemoveRedundantDbgInstrs. There's surprisingly
little fallout from this, and most of it can be addressed by doing
RemoveRedundantDbgInstrs later. The exception is (the block-local
version of) SimplifyCFG, where it might just be too expensive to call
RemoveRedundantDbgInstrs.

Differential Revision: https://reviews.llvm.org/D88928
2020-10-27 10:12:59 -07:00
Nico Weber 2a4e704c92 Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit e5766f25c6.
Makes clang assert when building Chromium, see https://crbug.com/1142813
for a repro.
2020-10-27 09:26:21 -04:00
Arthur Eubanks e5766f25c6 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-26 20:24:04 -07:00
Arthur Eubanks 61ac58e10a [NewPM] Pin tests with -debug-pass to legacy PM
-debug-pass is a legacy PM only option.

Some tests checks that the pass returned that it made a change,
which is not relevant to the NPM, since passes return PreservedAnalyses.

Some tests check that passes are freed at the proper time, which is also
not relevant to the NPM.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87945
2020-09-22 17:54:25 -07:00
Arthur Eubanks 1747f77764 [SimplifyCFG] Override options in default constructor
SimplifyCFG's options should always be overridden by command line flags,
but they mistakenly weren't in the default constructor.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87718
2020-09-21 16:33:01 -07:00
Roman Lebedev bb7d3af113
Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
This was reverted in 503deec218
because it caused gigantic increase (3x) in branch mispredictions
in certain benchmarks on certain CPU's,
see https://reviews.llvm.org/D84108#2227365.

It has since been investigated and here are the results:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html
> It's an amazingly severe regression, but it's also all due to branch
> mispredicts (about 3x without this). The code layout looks ok so there's
> probably something else to deal with. I'm not sure there's anything we can
> reasonably do so we'll just have to take the hit for now and wait for
> another code reorganization to make the branch predictor a bit more happy :)
>
> Thanks for giving us some time to investigate and feel free to recommit
> whenever you'd like.
>
> -eric

So let's just reland this.
Original commit message:


I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108

This reverts commit 503deec218.
2020-09-08 00:24:03 +03:00
Sam Parker 65f78e73ad [SimplifyCFG] Consider cost of combining predicates.
Modify FoldBranchToCommonDest to consider the cost of inserting
instructions when attempting to combine predicates to fold blocks.
The threshold can be controlled via a new option:
-simplifycfg-branch-fold-threshold which defaults to '2' to allow
the insertion of a not and another logical operator.

Differential Revision: https://reviews.llvm.org/D86526
2020-09-07 10:04:50 +01:00
serge-sans-paille 3a6f3fc160 Fix return status of SimplifyCFG
When a switch case is folded into default's case, that's an IR change that
should be reported, update ConstantFoldTerminator accordingly.

Differential Revision: https://reviews.llvm.org/D87142
2020-09-05 07:54:15 +02:00
Arthur Eubanks 486ed88533 [ConstProp] Remove ConstantPropagation
As discussed in
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143801.html.

Currently no users outside of unit tests.

Replace all instances in tests of -constprop with -instsimplify.
Notable changes in tests:
* vscale.ll - @llvm.sadd.sat.nxv16i8 is evaluated by instsimplify, use a fake intrinsic instead
* InsertElement.ll - insertelement undef is removed by instsimplify in @insertelement_undef
llvm/test/Transforms/ConstProp moved to llvm/test/Transforms/InstSimplify/ConstProp

Reviewed By: lattner, nikic

Differential Revision: https://reviews.llvm.org/D85159
2020-08-26 15:51:30 -07:00
Sam Parker ee2fdedd84 [NFC][SimplifyCFG] More tests for Arm 2020-08-25 12:13:48 +01:00
Sam Parker d4225b8f17 [NFC][SimplifyCFG] Add some more tests for Arm. 2020-08-25 11:44:17 +01:00
Roman Lebedev 503deec218
Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline"
As disscussed in post-commit review starting with
	https://reviews.llvm.org/D84108#2227365
while this appears to be mostly a win overall, especially code-size-wise,
this appears to shake //certain// code pattens in a way that is extremely
unfavorable for performance (+30% runtime regression)
on certain CPU's (i personally can't reproduce).

So until the behaviour is better understood, and a path forward is mapped,
let's back this out for now.

This reverts commit 1d51dc38d8.
2020-08-22 00:33:22 +03:00