Commit Graph

30753 Commits

Author SHA1 Message Date
Florian Hahn e9cced2739
Recommit "[LAA] Initial support for runtime checks with pointer selects."
This reverts commit 7aa8a67882.

This version includes fixes to address issues uncovered after
the commit landed and discussed at D11448.

Those include:

* Limit select-traversal to selects inside the loop.
* Freeze pointers resulting from looking through selects to avoid
  branch-on-poison.
2022-06-17 21:06:26 +02:00
Martin Sebor 5fb67e32f8 [InstCombine] Fold memcmp of constant arrays and variable size
The memcmp simplifier is limited to folding to constants calls with constant
arrays and constant sizes.  This change adds the ability to simplify
memcmp(A, B, N) calls with constant A and B and variable N to the pseudocode
equivalent of

N <= Pos ? 0 : (A < B ? -1 : B < A ? +1 : 0)

where Pos is the offset of the first mismatch between A and B.

Differential Revision: https://reviews.llvm.org/D127766
2022-06-17 10:35:35 -06:00
Sanjay Patel bfde861935 [InstCombine] convert mask and shift of power-of-2 to cmp+select
When the mask is a power-of-2 constant and op0 is a shifted-power-of-2
constant, test if the shift amount equals the offset bit index:

(ShiftC << X) & C --> X == (log2(C) - log2(ShiftC)) ? C : 0
(ShiftC >> X) & C --> X == (log2(ShiftC) - log2(C)) ? C : 0

This is an alternate to D127610 with a more general pattern.
We match only shift+and instead of the trailing xor, so we see a few
more tests diffs. I think we discussed this initially in D126617.

Here are proofs for shifts in both directions:
https://alive2.llvm.org/ce/z/CFrLs4

The test diffs look equal or better for IR, and this makes the
patterns more uniform in IR. The backend can partially invert this
in both cases if that is profitable. It is not trivially reversible,
however, so if we find perf regressions that are not easy to undo,
then we may want to revert this.

Differential Revision: https://reviews.llvm.org/D127801
2022-06-17 10:51:57 -04:00
Malhar Jajoo 6bb40552f2 [LoopVectorize] Add support for invariant stores of ordered reductions
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D126772
2022-06-17 14:56:21 +01:00
Nikita Popov c6b88cb918 [InstCombine] Push freeze through recurrence phi
We really want to push freezes through recurrence phis, so that we
freeze only the start value, rather than the IV value on every
iteration. foldOpIntoPhi() already handles this for the case where
the transfer function doesn't produce poison, e.g.
%iv.next = add %iv, 1. However, this does not work if nowrap flags
are present, e.g. the very common %iv.next = add nuw %iv, 1 case.

This patch adds a fold that pushes freeze instructions to the start
value by checking whether all backedge values will be non-poison
after poison generating flags have been dropped. This allows pushing
freezes out of loops in most cases. I suspect that this also
obsoletes the CanonicalizeFreezeInLoops pass, and we can probably
drop it.

Fixes https://github.com/llvm/llvm-project/issues/56048.

Differential Revision: https://reviews.llvm.org/D127960
2022-06-17 15:01:41 +02:00
Amanieu d'Antras caa2a829cd [MergeFunctions] Preserve symbols used llvm.used/llvm.compiler.used
llvm.used and llvm.compiler.used are often used with inline assembly
that refers to a specific symbol so that the symbol is kept through to
the linker even though there are no references to it from LLVM IR.

This fixes the MergeFunctions pass to preserve references to these
symbols in llvm.used/llvm.compiler.used so they are not deleted from the
IR. This doesn't prevent these functions from being merged, but
guarantees that an alias or thunk with the expected symbol name is kept
in the IR.

Differential Revision: https://reviews.llvm.org/D127751
2022-06-16 21:36:39 +01:00
Paul Robinson 3f6030255d Reland "[PS4/PS5][profiling] Go back to the old way of doing a runtime hook"
Profiling stopped working for us after D98061, which was largely a
Fuschia-specific patch but in one place used `isOSBinFormatELF` to
make a decision.  I'm adding a PS4/PS5 exception to that, so we can
get profiling to work again.

Differential Revision: https://reviews.llvm.org/D127506
2022-06-16 11:53:48 -07:00
Paul Robinson d0e60b6d7e Revert "[PS4/PS5][profiling] Go back to the old way of doing a runtime hook"
This reverts commit 39fb84343e.

Pushed without verifying the test still works.
2022-06-16 11:41:23 -07:00
Paul Robinson 39fb84343e [PS4/PS5][profiling] Go back to the old way of doing a runtime hook
Profiling stopped working for us after D98061, which was largely a
Fuschia-specific patch but in one place used `isOSBinFormatELF` to
make a decision.  I'm adding a PS4/PS5 exception to that, so we can
get profiling to work again.

Differential Revision: https://reviews.llvm.org/D127506
2022-06-16 11:37:07 -07:00
Paul Robinson 593fa3ab30 [PS5] Set address sanitizer shadow offset 2022-06-16 11:28:30 -07:00
Alexey Bataev 76782a65ee [SLP]Use original vector if need to shuffle truncated root.
If the root scalar is mapped to to the smallest bit width, the vector is
truncated and the types between original buildvector and extracted value
mismatched. For extract, we emit sext/zext instructions, for shuffles we
can reuse oringal vector instead of the truncated one.

Differential Revision: https://reviews.llvm.org/D127974
2022-06-16 10:41:18 -07:00
Samuel Eubanks bf02ed240d Prevent crash when TurnSwitchRangeIntoICmp receives default unreachable destination
TurnSwitchRangeIntoICmp crashes when given a switch with a default
destination of unreachable
Addresses issue #53208
https://github.com/llvm/llvm-project/issues/53208

Differential revision: https://reviews.llvm.org/D127712
2022-06-16 16:11:24 +02:00
Florian Hahn 949c13649c
[LV] Remove widenPHIInstruction dependence on underlying instr (NFC).
Instead of using the underlying instruction and VF to get the type, use
the type of the incoming value. This removes an unnecessary dependence
on the underlying instruction and enables using the recipe without an
underlying instruction.
2022-06-16 16:03:01 +02:00
Alexey Bataev 7236d49fd5 [SLP]Extend vectorization for scatter vectorize nodes.
Currently scatter vectorize nodes can be emitted only for GEPs with
constant indices. But we can also emit such nodes for GEPs with the same
ptr and non-constant vectorizable/gathered indices, if profitable. Patch
adds support for such nodes and tries to improve handling of GEPs with
non-const indeces for such nodes.

Metric: SLP.NumVectorInstructions

Program                                                                                       SLP.NumVectorInstructions
                                                                                              results                   results0 diff
                    test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  5243.00                   5240.00  -0.1%
                     test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  5243.00                   5240.00  -0.1%
                     test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 27550.00                  27507.00  -0.2%
                               test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  5395.00                   5380.00  -0.3%
                       test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  5389.00                   5374.00  -0.3%
                    test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   961.00                    958.00  -0.3%
                   test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test   961.00                    958.00  -0.3%
                               test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test  5664.00                   5643.00  -0.4%
                       test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 13202.00                  13127.00  -0.6%
                                test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   212.00                    207.00  -2.4%
                                test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test   890.00                    850.00  -4.5%
                            test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  1695.00                   1581.00  -6.7%
                                 test-suite :: MultiSource/Applications/JM/lencod/lencod.test  2338.00                   2140.00  -8.5%
                                  test-suite :: SingleSource/UnitTests/matrix-types-spec.test    63.00                     55.00 -12.7%
                             test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   468.00                    356.00 -23.9%
                                                                           Geomean difference                                     -0.3%

All numbers show increased number of generated vector instructions.

Diff:
SingleSource/Benchmarks/Adobe-C++/loop_unroll - better without LTO, but
need an extra analysis with LTO (with LTO compiler generates
masked_gather, while before regular loads were emitted because of extra
data, availbale at LTO time).
SingleSource/UnitTests/matrix-types-spec - more vector code.
MultiSource/Applications/JM/lencod/lencod - same.
External/SPEC/CINT2006/464.h264ref/464.h264ref - same.
MultiSource/Benchmarks/7zip/7zip-benchmark - same.
External/SPEC/CINT2006/445.gobmk/445.gobmk - no changes.
External/SPEC/CFP2017rate/510.parest_r/510.parest_r - more vector code.
External/SPEC/CFP2006/447.dealII/447.dealII - same
External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s - same
External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp - same
External/SPEC/CFP2017rate/511.povray_r/511.povray - same
External/SPEC/CFP2006/453.povray/453.povray - same
External/SPEC/CFP2017rate/526.blender_r/526.blender_r - same
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r - same
External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s - same

Differential Revision: https://reviews.llvm.org/D127219
2022-06-16 06:05:48 -07:00
Jin Xin Ng aaff3fb6d5 [mlgo] Fix accounting for SCC splits
Previously if the inliner split an SCC such that an empty one remained, the MLInlineAdvisor could potentially lose track of the EdgeCount if a subsequent CGSCC pass modified the calls of a function that was initially in the SCC pre-split. Saving the seen nodes in onPassEntry resolves this.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D127693
2022-06-15 10:53:23 -07:00
Florian Hahn 5ff5b460d9
[LV] Remove unneeded CustomBuilder arg from setDebugLocFromInst (NFC).
The only user that passed in a custom builder was passing in
VPTransformState::Builder, which is the same as ILV::Builder.
2022-06-15 18:48:02 +01:00
Alexey Bataev c60c13f7eb [SLP] Improve reordering in presence of constant only nodes.
We can skip the analysis of the constant nodes, their order should not
affect the ordering of the trees/subtrees.

Differential Revision: https://reviews.llvm.org/D127775
2022-06-15 06:17:34 -07:00
Heejin Ahn b2f4112f25 [InstCombine] Improve check for catchswitch BBs (NFC)
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D127810
2022-06-15 01:06:13 -07:00
Nikita Popov 2dac2c4f76 [SimplifyLibCalls] Drop duplicate check (NFC)
The same condition already exists inside optimizeMemCmpConstantSize().
2022-06-15 09:37:09 +02:00
Venkata Ramanaiah Nalamothu 340b0ca900 [llvm] Add DW_CC_nocall to function debug metadata when either return values or arguments are removed
Adding the `DW_CC_nocall` calling convention to the function debug metadata is needed when either the return values or the arguments of a function are removed as this helps in informing debugger that it may not be safe to call this function or try to interpret the return value.
This translates to setting `DW_AT_calling_convention` with `DW_CC_nocall` for appropriate DWARF DIEs.

The DWARF5 spec (section 3.3.1.1 Calling Convention Information) says:

If the `DW_AT_calling_convention` attribute is not present, or its value is the constant `DW_CC_normal`, then the subroutine may be safely called by obeying the `standard` calling conventions of the target architecture. If the value of the calling convention attribute is the constant `DW_CC_nocall`, the subroutine does not obey standard calling conventions, and it may not be safe for the debugger to call this subroutine.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D127134
2022-06-15 03:30:15 +05:30
Florian Hahn 7c0089d735
[Matrix] Check if iterator is at beginning of BB in optimizeTranspose.
If an instruction at the beginning of a block is erased,  this may
trigger crash due to dereferencing an invalid iterator.

Check if II is at the end before dereferencing it.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D127736
2022-06-14 21:37:02 +01:00
Jin Xin Ng 9f2b873a7d [inliner] Add per-SCC-pass InlineAdvisor printing option
Adds option to print the contents of the Inline Advisor after each SCC Inliner pass

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D127689
2022-06-14 08:06:52 -07:00
Serguei Katkov d713f0eab8 Revert "[MachineSSAUpdater] compile time improvement in GetValueInMiddleOfBlock"
It looks like it causes buildbot failures.
As an example: https://lab.llvm.org/buildbot/#/builders/121/builds/20364

Revert to investigate...

This reverts commit 6bf2791814.
2022-06-14 20:27:21 +07:00
Serguei Katkov 6bf2791814 [MachineSSAUpdater] compile time improvement in GetValueInMiddleOfBlock
GetValueInMiddleOfBlock uses result of GetValueAtEndOfBlockInternal if there is no value
defined for current basic block.

If there is already a value it tries (in this order):

to find single register coming from all predecessors
find existing phi node which matches our incoming registers
build new phi.
The compile time improvement is to use current available value if
it is defined out of current BB or it is a PHI register.
This is due to it can be used in the middle basic block.

Reviewed By: sameerds
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D126523
2022-06-14 18:00:34 +07:00
Guillaume Chatelet 77bba68de6 [NFC][Alignment] Use Align in CoroFrame 2022-06-14 10:56:36 +00:00
Guillaume Chatelet 6fd480d957 [NFC][Alignment] use getAlign in AddressSanitizer 2022-06-14 10:56:36 +00:00
Florian Hahn 782e912246
[ConstraintElimination] Support constraints with only const ops.
Remove the early exit if both constraints contain no variables. This
restriction is unnecessayr for correctness and removing it simplifies
handling of trivial constant conditions in follow-up changes.
2022-06-14 10:37:12 +01:00
Chuanqi Xu d029db9e8a [NFC] Fix Wswitch warning triggered by 735e6c 2022-06-14 14:45:15 +08:00
Chuanqi Xu 735e6c40b5 [Coroutines] Convert coroutine.presplit to enum attr
This is required by @nikic in https://reviews.llvm.org/D127383 to
decrease the cost to check whether a function is a coroutine and this
fixes a FIXME too.

Reviewed By: rjmccall, ezhulenev

Differential Revision: https://reviews.llvm.org/D127471
2022-06-14 14:23:46 +08:00
chenglin.bi 286198ff04 [InstCombine] Optimize lshr+shl+and conversion pattern
if `C1` and `C3` are pow2 and `Log2(C3) >= C2`:
    ((C1 >> X) << C2) & C3 -> X == (Log2(C1)+C2-Log2(C3)) ? C3 : 0
https://alive2.llvm.org/ce/z/zvrkKF

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D127469
2022-06-14 11:06:10 +08:00
Heejin Ahn ac4006b0d6 [InstCombine] Don't slice up PHIs when pred BB has catchswitch
If an integer PHI has an illegal type (according to the data layout) and
it is only used by `trunc` or `trunc(lshr)` operations, we split the PHI
into various instructions in its predecessors:
6d1543a167/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp (L1536-L1543)

So this can produce code like the following:
Before:
```
pred:
  ...

bb:
  %p = phi i8 [ %somevalue, %pred ], ...
  ...
  %tobool = trunc i8 %p to i1
  use %tobool
  ...
```
In this code, `%p` has an illegal integer type, `i8`, and its only used
in a `trunc` instruction later. In this case this pass puts extraction
code in its predecessors:

After:
```
pred:
  ...
  %t = and i8 %somevalue, 1
  %extract = icmp ne i8 %t, 0

bb:
  %p.new = phi i1 [ %extract, %pred ], ...
  use %p.new instead of %tobool
```

But this doesn't work if `pred` is a `catchswitch` BB because it cannot
have any non-PHI instructions. This CL ensures we bail out in that case.

Fixes https://github.com/llvm/llvm-project/issues/55803.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D127699
2022-06-13 18:32:09 -07:00
Florian Hahn 9129e7bb54
[LV] Replace OrigPHIsToFix in native with VPlan traversal. (NFC)
OrigPHIsToFix is only used in the native path. Collecting phis can be
replaced by iterating over the plan. This also removes another
unnecessary use of a late getVPValue.

This also reduces the coupling between ILV and the VPlan utilities.
2022-06-13 22:20:58 +01:00
Hubert Tong 5efb380c26 [NFC] Undo AIX build compiler workaround
Removes the workaround from https://reviews.llvm.org/D98509#2732628 for
an AIX build compiler issue.

The AIX build compiler product that caused the issue has since been
fixed. Also, the AIX build compiler has been changed to one based on
LLVM.
2022-06-13 17:00:33 -04:00
Guillaume Chatelet 111b32ecb4 [NFC][Alignment] Use getAlign in Attributor classes 2022-06-13 15:13:05 +00:00
Guillaume Chatelet 2887dd754e [NFC][Alignment] Use getAlign in VNCoercion 2022-06-13 15:13:05 +00:00
Sanjay Patel 310adb658c [InstCombine] reorder mask folds for efficiency
This shows narrowing improvements on the logic tests
(transforms recently added with e247b0e5c9).

This is not a complete fix. That would require adding
folds to visitOr/visitXor. But it enables the expected
transforms for the basic patterns in the affected tests.
2022-06-13 09:49:57 -04:00
Guillaume Chatelet 45a5cd41e5 [NFC][Alignment] Simplify code in MemorySanitizer 2022-06-13 13:36:36 +00:00
Guillaume Chatelet 86f455750b [NFC] Remove dead code 2022-06-13 12:59:38 +00:00
Guillaume Chatelet a6c2ab0c3f [NFC][Alignment] Use proper type in instrumentLoadOrStore 2022-06-13 12:59:38 +00:00
Nikita Popov 571c713144 [SimplifyCFG] Handle trapping aggregates (PR49839)
Handle the fact that not only constant expressions, but also
constant aggregates containing expressions can trap.

This still doesn't fix the original C reproducer, probably due to
more issues remaining in other passes.
2022-06-13 14:56:49 +02:00
Guillaume Chatelet f9bb8c24ac [NFC][Alignment] Convert MemCpyOptimizer.cpp 2022-06-13 10:07:09 +00:00
David Sherwood 83251896d7 [NFC][InstCombine] Refactor InstCombinerImpl::foldSelectIntoOp
Introduce a lambda function so that we remove a lot of code
duplication.

Differential Revision: https://reviews.llvm.org/D127493
2022-06-13 10:37:07 +01:00
Nikita Popov 92a9b1c918 [InstCombine] Don't push operation across loop phi
When pushing an operation across a phi node, we should avoid doing
so across a loop backedge. This is generally non-profitable, because
it does not reduce the number of times the operation is executed,
and could lead to an infinite combine loop.

The code was already guarding against this, but using an
insufficiently strong condition, which did not cover the case where
the operation was originally outside the loop (in which case the
transform moves the operation from outside the loop into the loop,
which is particularly undesirable).

Differential Revision: https://reviews.llvm.org/D127499
2022-06-13 10:48:09 +02:00
Kazu Hirata df792bcb02 [Transforms] Use default member initialization (NFC)
Identified with modernize-use-default-member-init.
2022-06-12 18:39:05 -07:00
Florian Hahn 763f2bdba5
[VPlan] Remove dead OrigLoop argument from removeDeadRecipes (NFC).
The use of the argument has been remove a while ago. Remove the dead
argument.
2022-06-11 23:36:47 +01:00
Kazu Hirata a838043f38 [llvm] Use contains (NFC) 2022-06-11 11:46:16 -07:00
Kazu Hirata 5d7b1a5f1b [Scalar] Use llvm::append_range (NFC) 2022-06-10 23:09:01 -07:00
Mitch Phillips db68a25ca9 Revert "[Attributor] Ensure to use the proper liveness AA"
This reverts commit a3273c0c06.

Reason: Broke the ASan buildbots with a memory leak. See
https://reviews.llvm.org/rG94841c713fdd2bce3276015d1e946d414bb74ee8 for
more information.
2022-06-10 14:05:09 -07:00
Nuno Lopes e5c5f92e12 [InstCombine] switch synthetic unreachable to use undef instead of poison (NFC) 2022-06-10 21:54:09 +01:00
Sanjay Patel e247b0e5c9 [InstCombine] add narrowing transform for low-masked binop with zext operand (2nd try)
The 1st try ( afa192cfb6 ) was reverted because it could
cause an infinite loop with constant expressions.

A test for that and an extra condition to enable the transform
are added now. I also added code comments to better describe
the transform and the existing, related transform.

Original commit message:
https://alive2.llvm.org/ce/z/hRy3rE

As shown in D123408, we can produce this pattern when moving
casts around, and we already have a related fold for a binop
with a constant operand.
2022-06-10 12:42:27 -04:00