This reverts commit 7aa8a67882.
This version includes fixes to address issues uncovered after
the commit landed and discussed at D11448.
Those include:
* Limit select-traversal to selects inside the loop.
* Freeze pointers resulting from looking through selects to avoid
branch-on-poison.
The memcmp simplifier is limited to folding to constants calls with constant
arrays and constant sizes. This change adds the ability to simplify
memcmp(A, B, N) calls with constant A and B and variable N to the pseudocode
equivalent of
N <= Pos ? 0 : (A < B ? -1 : B < A ? +1 : 0)
where Pos is the offset of the first mismatch between A and B.
Differential Revision: https://reviews.llvm.org/D127766
When the mask is a power-of-2 constant and op0 is a shifted-power-of-2
constant, test if the shift amount equals the offset bit index:
(ShiftC << X) & C --> X == (log2(C) - log2(ShiftC)) ? C : 0
(ShiftC >> X) & C --> X == (log2(ShiftC) - log2(C)) ? C : 0
This is an alternate to D127610 with a more general pattern.
We match only shift+and instead of the trailing xor, so we see a few
more tests diffs. I think we discussed this initially in D126617.
Here are proofs for shifts in both directions:
https://alive2.llvm.org/ce/z/CFrLs4
The test diffs look equal or better for IR, and this makes the
patterns more uniform in IR. The backend can partially invert this
in both cases if that is profitable. It is not trivially reversible,
however, so if we find perf regressions that are not easy to undo,
then we may want to revert this.
Differential Revision: https://reviews.llvm.org/D127801
We really want to push freezes through recurrence phis, so that we
freeze only the start value, rather than the IV value on every
iteration. foldOpIntoPhi() already handles this for the case where
the transfer function doesn't produce poison, e.g.
%iv.next = add %iv, 1. However, this does not work if nowrap flags
are present, e.g. the very common %iv.next = add nuw %iv, 1 case.
This patch adds a fold that pushes freeze instructions to the start
value by checking whether all backedge values will be non-poison
after poison generating flags have been dropped. This allows pushing
freezes out of loops in most cases. I suspect that this also
obsoletes the CanonicalizeFreezeInLoops pass, and we can probably
drop it.
Fixes https://github.com/llvm/llvm-project/issues/56048.
Differential Revision: https://reviews.llvm.org/D127960
llvm.used and llvm.compiler.used are often used with inline assembly
that refers to a specific symbol so that the symbol is kept through to
the linker even though there are no references to it from LLVM IR.
This fixes the MergeFunctions pass to preserve references to these
symbols in llvm.used/llvm.compiler.used so they are not deleted from the
IR. This doesn't prevent these functions from being merged, but
guarantees that an alias or thunk with the expected symbol name is kept
in the IR.
Differential Revision: https://reviews.llvm.org/D127751
Profiling stopped working for us after D98061, which was largely a
Fuschia-specific patch but in one place used `isOSBinFormatELF` to
make a decision. I'm adding a PS4/PS5 exception to that, so we can
get profiling to work again.
Differential Revision: https://reviews.llvm.org/D127506
Profiling stopped working for us after D98061, which was largely a
Fuschia-specific patch but in one place used `isOSBinFormatELF` to
make a decision. I'm adding a PS4/PS5 exception to that, so we can
get profiling to work again.
Differential Revision: https://reviews.llvm.org/D127506
If the root scalar is mapped to to the smallest bit width, the vector is
truncated and the types between original buildvector and extracted value
mismatched. For extract, we emit sext/zext instructions, for shuffles we
can reuse oringal vector instead of the truncated one.
Differential Revision: https://reviews.llvm.org/D127974
Instead of using the underlying instruction and VF to get the type, use
the type of the incoming value. This removes an unnecessary dependence
on the underlying instruction and enables using the recipe without an
underlying instruction.
Currently scatter vectorize nodes can be emitted only for GEPs with
constant indices. But we can also emit such nodes for GEPs with the same
ptr and non-constant vectorizable/gathered indices, if profitable. Patch
adds support for such nodes and tries to improve handling of GEPs with
non-const indeces for such nodes.
Metric: SLP.NumVectorInstructions
Program SLP.NumVectorInstructions
results results0 diff
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 5243.00 5240.00 -0.1%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 5243.00 5240.00 -0.1%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 27550.00 27507.00 -0.2%
test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test 5395.00 5380.00 -0.3%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 5389.00 5374.00 -0.3%
test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test 961.00 958.00 -0.3%
test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test 961.00 958.00 -0.3%
test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test 5664.00 5643.00 -0.4%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 13202.00 13127.00 -0.6%
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 212.00 207.00 -2.4%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 890.00 850.00 -4.5%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 1695.00 1581.00 -6.7%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test 2338.00 2140.00 -8.5%
test-suite :: SingleSource/UnitTests/matrix-types-spec.test 63.00 55.00 -12.7%
test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 468.00 356.00 -23.9%
Geomean difference -0.3%
All numbers show increased number of generated vector instructions.
Diff:
SingleSource/Benchmarks/Adobe-C++/loop_unroll - better without LTO, but
need an extra analysis with LTO (with LTO compiler generates
masked_gather, while before regular loads were emitted because of extra
data, availbale at LTO time).
SingleSource/UnitTests/matrix-types-spec - more vector code.
MultiSource/Applications/JM/lencod/lencod - same.
External/SPEC/CINT2006/464.h264ref/464.h264ref - same.
MultiSource/Benchmarks/7zip/7zip-benchmark - same.
External/SPEC/CINT2006/445.gobmk/445.gobmk - no changes.
External/SPEC/CFP2017rate/510.parest_r/510.parest_r - more vector code.
External/SPEC/CFP2006/447.dealII/447.dealII - same
External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s - same
External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp - same
External/SPEC/CFP2017rate/511.povray_r/511.povray - same
External/SPEC/CFP2006/453.povray/453.povray - same
External/SPEC/CFP2017rate/526.blender_r/526.blender_r - same
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r - same
External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s - same
Differential Revision: https://reviews.llvm.org/D127219
Previously if the inliner split an SCC such that an empty one remained, the MLInlineAdvisor could potentially lose track of the EdgeCount if a subsequent CGSCC pass modified the calls of a function that was initially in the SCC pre-split. Saving the seen nodes in onPassEntry resolves this.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D127693
We can skip the analysis of the constant nodes, their order should not
affect the ordering of the trees/subtrees.
Differential Revision: https://reviews.llvm.org/D127775
Adding the `DW_CC_nocall` calling convention to the function debug metadata is needed when either the return values or the arguments of a function are removed as this helps in informing debugger that it may not be safe to call this function or try to interpret the return value.
This translates to setting `DW_AT_calling_convention` with `DW_CC_nocall` for appropriate DWARF DIEs.
The DWARF5 spec (section 3.3.1.1 Calling Convention Information) says:
If the `DW_AT_calling_convention` attribute is not present, or its value is the constant `DW_CC_normal`, then the subroutine may be safely called by obeying the `standard` calling conventions of the target architecture. If the value of the calling convention attribute is the constant `DW_CC_nocall`, the subroutine does not obey standard calling conventions, and it may not be safe for the debugger to call this subroutine.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D127134
If an instruction at the beginning of a block is erased, this may
trigger crash due to dereferencing an invalid iterator.
Check if II is at the end before dereferencing it.
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D127736
Adds option to print the contents of the Inline Advisor after each SCC Inliner pass
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D127689
GetValueInMiddleOfBlock uses result of GetValueAtEndOfBlockInternal if there is no value
defined for current basic block.
If there is already a value it tries (in this order):
to find single register coming from all predecessors
find existing phi node which matches our incoming registers
build new phi.
The compile time improvement is to use current available value if
it is defined out of current BB or it is a PHI register.
This is due to it can be used in the middle basic block.
Reviewed By: sameerds
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D126523
Remove the early exit if both constraints contain no variables. This
restriction is unnecessayr for correctness and removing it simplifies
handling of trivial constant conditions in follow-up changes.
If an integer PHI has an illegal type (according to the data layout) and
it is only used by `trunc` or `trunc(lshr)` operations, we split the PHI
into various instructions in its predecessors:
6d1543a167/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp (L1536-L1543)
So this can produce code like the following:
Before:
```
pred:
...
bb:
%p = phi i8 [ %somevalue, %pred ], ...
...
%tobool = trunc i8 %p to i1
use %tobool
...
```
In this code, `%p` has an illegal integer type, `i8`, and its only used
in a `trunc` instruction later. In this case this pass puts extraction
code in its predecessors:
After:
```
pred:
...
%t = and i8 %somevalue, 1
%extract = icmp ne i8 %t, 0
bb:
%p.new = phi i1 [ %extract, %pred ], ...
use %p.new instead of %tobool
```
But this doesn't work if `pred` is a `catchswitch` BB because it cannot
have any non-PHI instructions. This CL ensures we bail out in that case.
Fixes https://github.com/llvm/llvm-project/issues/55803.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D127699
OrigPHIsToFix is only used in the native path. Collecting phis can be
replaced by iterating over the plan. This also removes another
unnecessary use of a late getVPValue.
This also reduces the coupling between ILV and the VPlan utilities.
Removes the workaround from https://reviews.llvm.org/D98509#2732628 for
an AIX build compiler issue.
The AIX build compiler product that caused the issue has since been
fixed. Also, the AIX build compiler has been changed to one based on
LLVM.
This shows narrowing improvements on the logic tests
(transforms recently added with e247b0e5c9).
This is not a complete fix. That would require adding
folds to visitOr/visitXor. But it enables the expected
transforms for the basic patterns in the affected tests.
Handle the fact that not only constant expressions, but also
constant aggregates containing expressions can trap.
This still doesn't fix the original C reproducer, probably due to
more issues remaining in other passes.
When pushing an operation across a phi node, we should avoid doing
so across a loop backedge. This is generally non-profitable, because
it does not reduce the number of times the operation is executed,
and could lead to an infinite combine loop.
The code was already guarding against this, but using an
insufficiently strong condition, which did not cover the case where
the operation was originally outside the loop (in which case the
transform moves the operation from outside the loop into the loop,
which is particularly undesirable).
Differential Revision: https://reviews.llvm.org/D127499
The 1st try ( afa192cfb6 ) was reverted because it could
cause an infinite loop with constant expressions.
A test for that and an extra condition to enable the transform
are added now. I also added code comments to better describe
the transform and the existing, related transform.
Original commit message:
https://alive2.llvm.org/ce/z/hRy3rE
As shown in D123408, we can produce this pattern when moving
casts around, and we already have a related fold for a binop
with a constant operand.